A better metaphor would be to say it compresses the internet and creates a Markov chain based on that compression. Then, to generate text, it compresses your prompt so it can find it in the Markov chain, moves to the next step, lossily decompresses that step into a text token, and appends it. The lossy decompression here is the temperature: higher temperature means lossier decompression and more random words, but since the loss happens in "meaning" space, the random words still carry very similar meaning to before.
That isn't a perfect metaphor, but it explains very well how it can do most of the things it can do. The lossy compression means it can take large prompts and capture their essence instead of trying to look them up literally, and the lossy decompression lets it vary its output, so the text moves in slightly different directions instead of just repeating text it has seen. The magical bit is that this compression and decompression are much smarter than anything before: the model parses text into a format much closer to its meaning, and that lets it do the above much more intelligently.
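As a toy illustration of the temperature-as-lossy-decompression idea (not how any real sampler is implemented, just the standard temperature-scaled softmax over made-up logits):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits after temperature scaling.

    Dividing logits by the temperature before the softmax flattens the
    distribution when temperature > 1 (lossier, more random picks) and
    sharpens it when temperature < 1 (almost always the top token).
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cumulative = 0.0
    for i, e in enumerate(exps):
        cumulative += e / total
        if r <= cumulative:
            return i
    return len(exps) - 1

# At temperature 0.01 the top logit wins essentially every draw;
# at temperature 10 all three tokens get comparable probability.
rng = random.Random(0)
cold = [sample_with_temperature([2.0, 1.0, 0.5], 0.01, rng) for _ in range(100)]
hot = [sample_with_temperature([2.0, 1.0, 0.5], 10.0, rng) for _ in range(100)]
```

The point of the sketch is just that "lossiness" is a dial: the same model state decompresses to different nearby words depending on the temperature.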
Edit: Thinking about it a bit, maybe you could make these models way cheaper to run by making them work as a compression to meaning, rather than as the huge models they are now? They do have an internal understanding/meaning of the tokens they receive, so it should be possible to create a compression/decompression function based on these models that transforms text into its world-model state, and once we start working with world-model states, things should be super cheap relative to what we have now.
Also, maybe it doesn't do lossy decompression into words with similar meaning; that is just another way I could see the models becoming smaller and cheaper while keeping their essence. The Markov chain step could be all it uses currently. But it definitely creates that space and that Markov chain, because it parses the previous thousand or so tokens and uses those to guess the next token, and that is a Markov chain. It just has a very sophisticated way of parsing those thousand tokens into a logical format.
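For what it's worth, the bare Markov-chain step described above, minus all the learned compression, is tiny to sketch. The corpus and context length here are made up for illustration:

```python
from collections import Counter, defaultdict

def train_markov(tokens, order=2):
    """Count next-token frequencies for each context of `order` tokens."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        model[context][tokens[i + order]] += 1
    return model

def most_likely_next(model, context):
    """Return the most frequent continuation of `context`, or None if unseen."""
    counts = model.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

tokens = "the cat sat on the mat the cat ran on the mat".split()
model = train_markov(tokens, order=2)
# The context ("on", "the") was followed by "mat" both times it appeared,
# so "mat" is the chain's prediction for that context.
```

The difference the comment is pointing at is that a real LLM replaces the literal `tuple(tokens...)` lookup with a learned, lossy representation of the context.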
> creates a Markov chain based on that compression
I dislike that interpretation. It suggests the model builds a very basic statistical model, but a very basic statistical model simply wouldn't be able to do what these models can do.
Or alternatively, if you want to consider the model as a Markov chain mapping the probability from the previous four thousand tokens to the next token, then the space is astronomically large. Beyond astronomically large, beyond even economically large: there are ~50,000^4096 possible input states.
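Python's arbitrary-precision integers make it easy to sanity-check that figure; 50,000^4096 works out to a number with about 19,000 decimal digits, dwarfing the often-quoted ~10^80 atoms in the observable universe:

```python
# One state per possible 4096-token window over a ~50,000-token vocabulary.
states = 50_000 ** 4096

# 50,000^4096 = 5^4096 * 10^16384, which has 19,247 decimal digits.
digits = len(str(states))  # 19247
```

So a literal transition table over raw contexts is not just impractical; almost every state it contains could never be visited even once.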
> but a very basic statistical model simply wouldn't be able to do what these models can do.
Why do you think that? Why wouldn't a basic statistical continuation of a text's logic do what the current model does? There are trillions of conversations out there for it to draw on when continuing the text: people playing theatre, people roleplaying, tutorials, people playing opposite games, people brainstorming, etc. Create a parser that can reduce those down to logic, then make a Markov chain based on that, and I have no problem seeing the current ChatGPT skills manifesting from that.
> Or alternatively, if you want to consider the model as a Markov chain mapping the probability from the previous four thousand tokens to the next token, then the space is astronomically large. Beyond astronomically large, beyond even economically large: there are ~50,000^4096 possible input states.
Yes, that is the novel thing: it compresses the states down to something manageable without losing the essence of the text, and then builds a model of the likely next token over that compressed space.
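A toy sketch of that idea, with a deliberately crude stand-in for the learned compression (a bag of words instead of a dense meaning vector — purely illustrative, nothing like what a real model learns):

```python
from collections import Counter, defaultdict

def compress(context):
    """Toy lossy 'compression': reduce a context to its bag of words.

    A real model maps the context to a dense vector that preserves meaning;
    this stand-in only shows the mechanism: many distinct literal contexts
    collapse into one manageable state.
    """
    return frozenset(context)

def train_compressed(tokens, order=3):
    """A Markov chain over compressed states instead of literal token windows."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = compress(tokens[i:i + order])
        model[state][tokens[i + order]] += 1
    return model

model = train_compressed("the cat sat on the mat".split(), order=3)

# A reordered context like "sat cat the" compresses to the same state as
# the seen context "the cat sat", so the chain can still predict "on" for
# a literal window it has never encountered.
state = compress(["sat", "cat", "the"])
```

The lossy step is what buys generalization: contexts that were never seen verbatim still land on a state the chain knows about, at the cost of throwing some information away.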