I think most HPC people would disagree with this statement. State-of-the-art HPC code is still written in ASM (see, e.g., https://github.com/xianyi/OpenBLAS) [that's what Intel is doing too].
ASM makes sense when the time spent in a specific routine exceeds the time it takes to write the ASM. That holds for BLAS, less so for HPC projects that are more speculative or less fundamental. CVODES, for instance, doesn't need to be written in ASM, and I think Julia makes a strong case that it could have been written in Julia.
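To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch. It reads "time spent in a routine" as the runtime saved by the hand-written version; the function name and all figures are made up for illustration.

```python
def asm_pays_off(calls, seconds_saved_per_call, hours_to_write):
    """Return True when the total runtime saved exceeds the time spent writing the ASM.

    All three inputs are hypothetical estimates, not measurements.
    """
    total_saved_hours = calls * seconds_saved_per_call / 3600.0
    return total_saved_hours > hours_to_write


# A kernel called a trillion times, saving a microsecond per call,
# versus 200 hours of hand-tuning.
print(asm_pays_off(10**12, 1e-6, 200))

# The same saving in a routine called only a million times.
print(asm_pays_off(10**6, 1e-6, 200))
```

A BLAS-style kernel sits in the first regime; most one-off project code sits in the second.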
I don't think they would. I think they realize that state-of-the-art HPC code is a small fraction of all the code written. I doubt that these people write ASM instead of Python or JS or C or whatever when doing simple scripts.
That ASM code is, however, not necessarily written by hand. You'd think that for high-performance code with limited scope, a superoptimizer would be used.
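For anyone unfamiliar with the term, the core idea is brute force: enumerate instruction sequences from shortest to longest and keep the first one that matches the target function. Below is a toy sketch of that idea only. The one-register 8-bit machine and its five ops are invented for illustration; real superoptimizers work on actual machine instructions and need far smarter search and verification.

```python
from itertools import product

MASK = 0xFF  # model an 8-bit register

# Hypothetical instruction set: each op maps (r, x) -> new r,
# where r is the register and x is the original input value.
OPS = {
    "not":  lambda r, x: ~r & MASK,
    "inc":  lambda r, x: (r + 1) & MASK,
    "dec":  lambda r, x: (r - 1) & MASK,
    "shl":  lambda r, x: (r << 1) & MASK,
    "addx": lambda r, x: (r + x) & MASK,
}


def run(program, x):
    """Execute a sequence of op names with the register initialised to x."""
    r = x
    for name in program:
        r = OPS[name](r, x)
    return r


def superoptimize(target, max_len=4):
    """Return the first shortest program matching target on all 256 inputs, or None."""
    for length in range(max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, x) == target(x) & MASK for x in range(256)):
                return program
    return None


print(superoptimize(lambda x: -x))     # two's-complement negation
print(superoptimize(lambda x: 3 * x))  # multiply by three without a multiply op
```

Because the register is only 8 bits wide, checking all 256 inputs is a full proof of equivalence. The search space grows exponentially with program length, which is why this only makes sense for very small kernels.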
Not sure what a "superoptimizer" would look like in this context. For reference, I know for sure that this https://github.com/giaf/blasfeo (which beats Intel MKL) was coded entirely by hand.
There is more and more effort in the automatic development of high-performance linear algebra kernels.
But based on my experience, it would certainly be a big challenge to have a tool able to exploit the subtle differences in the assembly languages of different architectures, if the aim is to match or even exceed an expert-crafted assembly kernel.
Anyway, that's surely a very promising active research direction.
Good point. And you don't have to go that low. Maybe go use Object Pascal, Nim, or Vlang. I know... the libraries. But a lot of them are bindings of C libraries. So, you can create bindings in other languages too or use Python from those languages. There are various options.
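To show how thin such a binding can be, here is a minimal sketch from the Python side using the standard `ctypes` module to call `cos` from the C math library. It assumes a Unix-like system where that library can be located; the wrapper name `c_cos` is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. If the lookup returns None,
# CDLL(None) falls back to the symbols already loaded in the process.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() directly."""
    return libm.cos(x)


print(c_cos(0.0))
```

Other languages with a C FFI can wrap the same library in much the same way: load it, declare the signature, call it.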