It is, considering what you get for it, and it's not lower-end six figures; most likely it's seven. The JetMoE team released their training cost estimate: it took them $100k to train what's effectively a 2.2B model on 1.25T tokens. Compare that to the still-tiny Mistral 7B, which is 3x larger and was trained on 4x more data, and you get a figure closer to $1.7M. These are the absolute smallest production-viable LLMs.
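If you want to redo that back-of-envelope math yourself, here's a rough sketch. It assumes training cost scales linearly with both parameter count and token count (my simplification, not something from the JetMoE estimate), and `scaled_cost` is just a made-up helper name. Hardware, utilization, and failed runs will all move the real number.

```python
def scaled_cost(base_cost_usd, param_ratio, token_ratio):
    """Scale a known training cost to a bigger run.

    Assumption: cost is proportional to (parameters * training tokens).
    """
    return base_cost_usd * param_ratio * token_ratio


# Baseline from the JetMoE team's estimate: ~$100k.
JETMOE_COST_USD = 100_000

# Rounded multipliers for a 7B-class model: ~3x the params, ~4x the data.
estimate = scaled_cost(JETMOE_COST_USD, param_ratio=3, token_ratio=4)
print(f"${estimate:,.0f}")  # $1,200,000
```

With the rounded 3x and 4x multipliers you land around $1.2M, so the same low-seven-figure ballpark as the number above; the exact figure depends on the real ratios and how efficient the training setup is.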
For something like Mixtral 8x22B with 40B active params you'd be looking at the $10M range, and if something gets screwed up during training you can be left with a dud and nothing to show for it, like Llama-2-34B. It's like buying millions of dollars' worth of lootboxes and hoping something good drops.