It's funny that the context window size is still such a thing. The whole LLM 'thing' is compression, so why can't we figure out some equally brilliant way of handling context besides just storing text somewhere and feeding it to the LLM? RAG is the best attempt so far. We need something like a dynamic, in-flight LLM/data structure generated from the context that the agent can query as it goes.
My favorite solution is a low-parameter, five-layer model trained on the data that acts as a local compression and response layer, a neocortex wrapped around any large persistent data you have to interact with... and maybe also a specialist tool that spins up, built with that data in mind but deterministic in its approach: sort of a just-in-time index, or adaptive indexing.
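The just-in-time index part is sketchable without any ML at all. Here's a toy illustration (all names are made up for this example): context chunks sit as raw text, and the inverted index over them is built lazily, one term at a time, the first time a query asks for that term; the agent's queries drive which parts of the data ever get indexed.

```python
class JITContextIndex:
    """Toy just-in-time index over context chunks.

    The inverted index is built lazily, per term, on first query
    ("adaptive indexing" in miniature): the query workload decides
    which terms ever get indexed.
    """

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.index = {}  # term -> set of chunk ids, built on demand

    def _build_term(self, term):
        # One linear scan, done at most once per term.
        postings = {
            i for i, chunk in enumerate(self.chunks)
            if term in chunk.lower().split()
        }
        self.index[term] = postings
        return postings

    def query(self, *terms):
        """Return chunks containing every query term."""
        hits = None
        for term in (t.lower() for t in terms):
            postings = self.index.get(term)
            if postings is None:
                postings = self._build_term(term)  # index just-in-time
            hits = postings if hits is None else hits & postings
        return [self.chunks[i] for i in sorted(hits or set())]


chunks = [
    "the parser walks the AST",
    "the cache stores parsed results",
    "agents query the cache for context",
]
idx = JITContextIndex(chunks)
print(idx.query("cache"))
```

A real version would presumably tokenize properly and rank results, but the shape is the point: deterministic, cheap, and built from the agent's own access pattern rather than up front.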
That’s actually a pretty cool idea. When I think about my internal mental model of a codebase I’m working on, it’s definitely a compacted, lossy thing that evolves as I learn more.