I guess I'm saying their goal here was to do no search and get high performance, not create the highest possible performance. The trade off between run time search and more training (better models) is an explicit area of research, and this paper is in that realm. Noam Brown has talked a lot about this in the context of poker and diplomacy.
I guess I'm saying their goal here was to do no search and get high performance, not create the highest possible performance. The trade off between run time search and more training (better models) is an explicit area of research, and this paper is in that realm. Noam Brown has talked a lot about this in the context of poker and diplomacy.