
I know it can happen on some older ARM designs, microcontrollers, and odd DSP-like chips. But I can't think of any case on modern x86 chips, at least.

Some low-performance ARMv7/8 designs can split NEON SIMD instructions across multiple clock cycles, but I think even then NEON is going to perform better than scalar code.



I started digging into it, but I don't have an Intel machine with Linux easily available, and that's a prerequisite for looking at micro-op performance. I think you win.


I don't need to win, I just want the truth to win. In other words, I'd consider it a win if I, you or anyone else learns something new.

SIMD is highly optimized at this point, with a sustainable throughput of two instructions per clock. It's hard to imagine scalar code getting anywhere near that, even at 4 inst/clk rates.
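
To put rough numbers on that, here is a back-of-the-envelope sketch. The 2 and 4 instructions per clock come from the comment above; the 256-bit vector width and 32-bit element size are my own assumptions for illustration, not measurements from any particular chip, and the helper name is made up.

```python
# Rough throughput comparison in elements processed per clock.
# The vector and element widths below are assumptions, not measured values.

def elements_per_clock(instructions_per_clock: int, lanes_per_instruction: int) -> int:
    """Elements processed per clock if every instruction slot is kept busy."""
    return instructions_per_clock * lanes_per_instruction

VECTOR_BITS = 256   # assumed vector register width
ELEMENT_BITS = 32   # assumed element size (e.g. single-precision float)
lanes = VECTOR_BITS // ELEMENT_BITS  # elements handled by one SIMD instruction

simd = elements_per_clock(2, lanes)   # two SIMD instructions per clock
scalar = elements_per_clock(4, 1)     # four scalar instructions per clock

print(f"SIMD: {simd} elements/clk, scalar: {scalar} elements/clk")
```

Under those assumptions SIMD still comes out several times ahead; with narrower vectors or wider elements the gap shrinks, so treat this as an idealized upper bound rather than a benchmark.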



