If a commit's job is to capture state at a particular point in time, so that it can be reproduced and understood, then it _also_ needs to include the exact model used. This is only useful if you can ensure access to the previous versions of the model -- which is not something that providers are willing to do (in fact, they regularly "retire" old models). The only transparent way forward is to open source the models, along with their weights, and their training set (to verify that the weights match, and to retrain the model when new architectures and new hardware are released).
Not insisting upon this would be similar to depending on a SaaS to compile and package software, and being totally cool with it. Both LLMs and build systems convert human-friendly notation into machine-friendly notation. We should hold the LLM companies to the same standards of transparency that we hold the people who make things like nix, clang, llvm, cmake, cargo, etc. to.
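To make "include the exact model used" concrete, here is a rough sketch of what pinning a model in a commit could look like, assuming the weights are available as a local file. The trailer names `Model-Name` and `Model-Weights-SHA256` and both function names are made up for illustration; they are not an existing git or vendor convention.

```python
import hashlib
from pathlib import Path


def weights_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a weights file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def model_trailers(model_name: str, weights_path: Path) -> str:
    """Build commit-message trailers that pin the model by content hash.

    The trailer keys are hypothetical, chosen only for this example.
    """
    digest = weights_digest(weights_path)
    return f"Model-Name: {model_name}\nModel-Weights-SHA256: {digest}"
```

The output could be appended to a commit message, so that anyone with the published weights can recompute the hash and check they have the same model. That check is only possible when the weights are actually published, which is the whole point.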