I think the sentiment (at least my sentiment) is that "mainstream ML" has fallen...

anon291 · on May 12, 2025

I disagree. Most AI innovation today is around things like agents, integrations, and building out use cases. This is possible because transformers have made human-like AI possible for the first time in human history. These use cases will remain the same even if the underlying architecture changes. Far more people are working on new architectures today than were working on neural networks in 2017, when 'Attention Is All You Need' came out. Nevertheless, actual ML model researchers are only a small portion of the total ML/AI community, and this is fine.

spindump8930 · on May 12, 2025

If you consider most of the dominant architectures in deep-learning-type approaches, transformers are remarkably generic. If you reduce transformer-like architectures to "position-independent iterated self-attention with intermediate transformations", they can support ~all modalities and incorporate other representations (e.g. convolutions, CLIP-style embeddings, graphs or sequences encoded with additional position embeddings). On top of that, they're very compute-friendly.
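A minimal sketch of the "position-independent iterated self-attention with intermediate transformations" reduction, in plain Python so it has no dependencies. Assumptions (mine, not the commenter's): a single attention head, made-up 2-D weights, a residual connection, and a ReLU per-token layer standing in for the "intermediate transformation". Nothing in the computation looks at token order, so permuting the input tokens only permutes the outputs; order enters only if you add position embeddings to the input vectors.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        # Every token attends to every token; no positional information is used.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K])
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

def transformer_like(X, layers):
    """Iterate: self-attention with a residual add, then a per-token transformation."""
    for Wq, Wk, Wv, Wff in layers:
        A = self_attention(X, Wq, Wk, Wv)
        X = [[a + b for a, b in zip(x, att)] for x, att in zip(X, A)]
        X = [relu(matvec(Wff, x)) for x in X]  # applied to each token independently
    return X

# Hypothetical toy inputs and weights, for illustration only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layers = [([[1.0, 0.0], [0.0, 1.0]],        # Wq
           [[0.5, 0.2], [0.1, 0.3]],        # Wk
           [[1.0, -1.0], [0.5, 2.0]],       # Wv
           [[1.0, 0.5], [-0.3, 1.0]])] * 2  # Wff; same weights reused for 2 iterations

out = transformer_like(X, layers)
perm = [2, 0, 1]
out_perm = transformer_like([X[i] for i in perm], layers)
# out_perm[i] matches out[perm[i]] (up to floating-point rounding)
```

This is why the same core can be pointed at images, audio, or graph nodes: the architecture sees a set of vectors, and any structure has to be encoded into those vectors.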

Two of the largest weaknesses seem to be auto-regressive sampling (not unique to the base architecture) and expensive self-attention over very long contexts (whether sequence-shaped or generic graph-shaped). Many researchers are focusing their efforts there!
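To make the "expensive self-attention" point concrete, here is the standard back-of-envelope cost for one attention head over a sequence of n tokens with head dimension d (the symbols are my notation, not the commenter's). It counts the naive computation and ignores implementation tricks that avoid storing the full score matrix.

```latex
\[
S = \frac{Q K^\top}{\sqrt{d}} \in \mathbb{R}^{n \times n}, \qquad
\mathrm{Attn}(Q, K, V) = \operatorname{softmax}(S)\, V
\]
\[
\text{time} = \mathcal{O}(n^2 d), \qquad \text{memory for } S = \mathcal{O}(n^2)
\]
\[
n = 8192 \;\Rightarrow\; n^2 = 67{,}108{,}864, \qquad
n = 16384 \;\Rightarrow\; n^2 = 268{,}435{,}456
\]
```

Doubling the context quadruples the score matrix per head, per layer. Auto-regressive sampling is a separate cost: generating T tokens takes T sequential forward passes, which cannot be parallelized across those steps the way training can.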

Also see: https://www.isattentionallyouneed.com/

anon291 · on May 12, 2025

Transformers are very close to some types of feed-forward networks. The difference is that transformers can be trained in parallel without the need for auto-regression (which is slow for training, but kind of nice for streaming, low-latency inference). It's a mathematical trick. RWKV makes it obvious.
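A sketch of the "mathematical trick", using causal *linear* attention (softmax replaced by a positive feature map, as in Katharopoulos et al., "Transformers are RNNs", 2020) rather than RWKV's actual WKV formula, which adds a learned time decay. The same outputs can be computed in a parallel form (every position at once, as in training) or in a recurrent form that carries a fixed-size state from token to token (as in streaming inference). Standard softmax attention does not admit this exact fixed-size recurrence, so treat this as an illustration of the idea, not of a production transformer; the toy Q, K, V values are made up.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1, so attention weights stay positive.
    return [a + 1.0 if a > 0 else math.exp(a) for a in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention_parallel(Q, K, V):
    """Each position attends over its causal prefix; positions are independent of each other."""
    out = []
    for t in range(len(Q)):
        fq = phi(Q[t])
        weights = [dot(fq, phi(K[s])) for s in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * V[s][j] for s, w in enumerate(weights)) / total
                    for j in range(len(V[0]))])
    return out

def linear_attention_recurrent(Q, K, V):
    """One token at a time with a fixed-size state (RNN-style)."""
    dk, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(dk)]  # running sum of phi(k_s) v_s^T
    z = [0.0] * dk                       # running sum of phi(k_s)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for i in range(dk):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(dk)) / denom for j in range(dv)])
    return out

# Hypothetical toy sequence of 4 tokens.
Q = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 1.0]]
K = [[0.3, 0.1], [-0.2, 0.5], [0.6, -0.4], [0.2, 0.2]]
V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5], [-1.0, 2.0, 0.0]]

par = linear_attention_parallel(Q, K, V)
rec = linear_attention_recurrent(Q, K, V)
# par and rec agree up to floating-point rounding.
```

The equivalence is just reassociation: the numerator sums (φ(q)·φ(k_s)) v_s over the prefix, which equals φ(q) applied to the running sum of φ(k_s) v_sᵀ.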

Retric · on May 12, 2025

The sheer scale of computation and data available is what’s pushing AI to near human levels. The same algorithms in 1980 wouldn’t be nearly as useful.

anon291 · on May 12, 2025

It's true, but you can't deny the importance of the architecture. It's pretty clear that using simple perceptrons would not have led us down the same path.
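One classic way to see the limits of "simple perceptrons", reading that as a single-layer perceptron: the perceptron learning rule finds a separator for a linearly separable function like AND, but no single linear threshold unit can represent XOR, so at best it gets 3 of the 4 cases right. The data encoding and epoch count below are illustrative choices.

```python
def train_perceptron(data, epochs=100):
    """Rosenblatt perceptron rule on 2-D inputs with labels in {-1, +1}; returns (weights, bias)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:  # misclassified (or on the boundary)
                w = [w[0] + y * x[0], w[1] + y * x[1]]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged: every example is on the correct side
            break
    return w, b

def accuracy(w, b, data):
    correct = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1] + b) > 0)
    return correct / len(data)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

and_acc = accuracy(*train_perceptron(AND), AND)  # linearly separable: reaches 1.0
xor_acc = accuracy(*train_perceptron(XOR), XOR)  # never above 0.75
```

Stacking units with a nonlinearity in between (a multi-layer network) removes this particular limit, which is the usual starting point for the architecture story.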

Retric · on May 12, 2025

Sure, but I think a reasonable corollary is that new algorithms and architectures will show their strengths when new realms of computation become available.

mdaniel · on May 12, 2025

I've secretly wondered if the next (ahem) quantum leap in output quality will arrive with quantum computing, wherein answering 10,000 if-statements simultaneously would radically change the inference pipeline.

But I am also open to the possibility that I'm thinking of this in terms of 'faster horses' and not asking the right question.

spindump8930 · on May 12, 2025

It's not clear how your perception of quantum computing would lead to 'faster horses' in the current view of NN architectures. Keep in mind that the common view of 'exploring many paths simultaneously' is at best an oversimplification (https://scottaaronson.blog/?p=2026).

That said, perhaps advances in computing fundamentals would lead to something entirely new (and not at all horselike).

anon291 · on May 12, 2025

If you can tie a neural network's loss function to the energy state of a quantum system, then presumably letting the system settle at its energy minimum would be equivalent to a training step, but perhaps much faster.
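The loss-as-energy idea can be sketched classically with simulated annealing: treat the loss as the energy, propose random parameter moves, and accept uphill moves with a probability that shrinks as a "temperature" is lowered (the Metropolis rule). This is a purely classical stand-in chosen for illustration; it does not model a quantum system or say anything about quantum speedups, and the toy linear model, step size, and cooling schedule are all hypothetical.

```python
import math
import random

def loss(params, data):
    """Mean squared error of a hypothetical 1-D linear model y = w*x + b; plays the role of 'energy'."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def anneal(data, steps=5000, t0=1.0, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    params = [0.0, 0.0]
    energy = loss(params, data)
    best, best_energy = list(params), energy
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-6  # linear cooling schedule, never exactly zero
        candidate = [p + rng.gauss(0.0, 0.1) for p in params]
        cand_energy = loss(candidate, data)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if cand_energy < energy or rng.random() < math.exp((energy - cand_energy) / temp):
            params, energy = candidate, cand_energy
            if energy < best_energy:
                best, best_energy = list(params), energy
    return best, best_energy

# Toy data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-1.0, 0.0, 1.0, 2.0]]
best, best_energy = anneal(data)
```

Here "settling" takes thousands of sequential proposals; the hope expressed above is that a physical system could do the equivalent relaxation natively.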

janalsncm · on May 12, 2025

> AI innovation today

I think you are talking about something else. In my opinion, integration is very different from fundamental ML research.

anon291 · on May 12, 2025

There is more fundamental ML research today than at any other point in history, including in non-transformer architectures. That is my point. It doesn't seem that way because 90%+ of 'ML research' has nothing to do with fundamental ML and is instead research around applications, which are indifferent to the underlying model at the end of the day. That was the point of my comment.