I think the sentiment (at least my sentiment) is that "mainstream ML" has fallen into the transformer local minimum, and given the weight of the players in that space it will take a huge amount of force to move them out of it.
The likes of this, Mercury Coder, and even RWKV are definitely hopeful - but there's a pitch-black shadow of hype and speculation to outshine.
I disagree. Most AI innovation today is around things like agents, integrations, and building out use cases. This is possible because transformers have made human-like AI possible for the first time in the history of humanity. These use cases will remain the same even if the underlying architecture changes. The number of people working on new architectures today is way more than were working on neural networks in 2017 when 'Attention Is All You Need' came out. Nevertheless, actual ML model researchers are only a small portion of the total ML/AI community, and this is fine.
If you consider most of the dominant architectures in deep-learning-type approaches, transformers are remarkably generic. If you reduce transformer-like architectures to "position-independent iterated self-attention with intermediate transformations", they can support ~all modalities and incorporate other representations (e.g. convolutions, CLIP-style embeddings, graphs or sequences encoded with additional position embeddings). On top of that, they're very compute friendly.
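To make "position independent" concrete, here is a minimal sketch, assuming a single head with identity Q/K/V projections (a simplification, not how any production model is configured). It shows that shuffling the input tokens just shuffles the outputs the same way, which is why position has to be added separately via embeddings:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head self-attention; Q, K and V are the raw token vectors
    # (identity projections are an assumption made for brevity).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]  # arbitrary reordering of the three tokens
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)
# out_shuffled[j] matches out[perm[j]] up to floating-point rounding.
```

Nothing in the function knows where a token sits in the sequence, so the same code would work on a set of graph nodes or image patches.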
Two of the largest weaknesses seem to be auto-regressive sampling (not unique to the base architecture) and expensive self attention over very long contexts (whether sequence shaped or generic graph shaped). Many researchers are focusing efforts there!
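A rough sketch of both weaknesses, under toy assumptions (`toy_model` is a hypothetical stand-in, not a real model): full self-attention computes one score per (query, key) pair, so the work grows with the square of the context length, and autoregressive sampling cannot start step t+1 until step t is done.

```python
def attention_scores(n_tokens):
    # One score per (query, key) pair in a single full self-attention layer.
    return n_tokens * n_tokens

def sample(next_token, prompt, n_new):
    # Autoregressive sampling: every new token depends on all earlier ones,
    # so the loop iterations have to run one after another.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq

# Hypothetical stand-in for a model: "predicts" the last token plus one.
def toy_model(seq):
    return seq[-1] + 1

generated = sample(toy_model, [0], 4)
ratio = attention_scores(2000) / attention_scores(1000)  # doubling context -> 4x the scores
```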
Transformers are very close to some types of feed-forward networks. The difference is that transformers can be trained in parallel without the need for auto-regression (which is slow for training, but kind of nice for streaming, low-latency inference). It's a mathematical trick. RWKV makes it obvious.
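A toy illustration of the kind of trick I mean, assuming a plain exponentially decayed running sum (this is not RWKV's actual formulation, just the same flavor of identity): the step-by-step recurrent form and the "every position computed independently" form give the same numbers.

```python
import math

def recurrent(xs, decay):
    # Sequential form: one state carried from step to step (good for streaming).
    state, out = 0.0, []
    for x in xs:
        state = decay * state + x
        out.append(state)
    return out

def parallel(xs, decay):
    # Unrolled form: each position t is a weighted sum over inputs 0..t and
    # depends on no other output, so all positions could be computed at once.
    return [
        sum(decay ** (t - i) * xs[i] for i in range(t + 1))
        for t in range(len(xs))
    ]

xs = [1.0, 2.0, 3.0, 4.0]
seq_out = recurrent(xs, 0.5)
par_out = parallel(xs, 0.5)
same = all(math.isclose(a, b) for a, b in zip(seq_out, par_out))
```

The unrolled form does more arithmetic in total, but none of it has to wait on a previous step, which is what matters on parallel hardware during training.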
It's true, but you can't deny the importance of the architecture. It's pretty clear that using simple perceptrons would not have led us down the same path.
Sure, but I think a reasonable corollary is that new algorithms and architectures will show their strengths when new realms of computation become available.
I've secretly wondered if the next (ahem) quantum leap in output quality will arrive with quantum computing, wherein answering 10,000 if statements simultaneously would radically change the inference pipeline.
But I am also open to the possibility that I am thinking of this in terms of 'faster horses' and not asking the right question.
It's not clear how your perception of quantum computing would lead to 'faster horses' in the current view of NN architectures - keep in mind that the common view of 'exploring many paths simultaneously' is at best an oversimplification (https://scottaaronson.blog/?p=2026).
That said, perhaps advances in computing fundamentals would lead to something entirely new (and not at all horselike).
If you can tie a neural network's loss function to the excitation state of a quantum system, then presumably, letting the system settle at its energy minimum would be equivalent to a training step, but perhaps much faster.
There is more fundamental ML research today than at any other point in history, including in non-transformer architectures. That is my point. It doesn't seem that way because 90%+ of 'ML research' has nothing to do with fundamental ML and is instead research around applications, which are indifferent to the underlying model at the end of the day. That was the point of my comment.