new blog post
"There Are No New Ideas In AI.... Only New Datasets"
in which i summarize the history of LLMs in exactly four breakthroughs and explain why it was really *data* all along that mattered... not algorithms
So the main recent additions are efficiency tweaks, like Mixture of Experts (though that idea is also a few years old) and Multi-head Latent Attention.
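To see why MoE counts as an efficiency tweak rather than a new idea: only a few experts run per token, so compute stays roughly constant while parameter count grows. Here's a toy sketch of top-k gated routing, with all names and shapes invented for illustration, not taken from any particular model:

```python
import numpy as np

def moe_layer(x, gate_W, expert_Ws, k=2):
    """Toy Mixture-of-Experts layer: a gate scores every expert,
    we keep the top-k, renormalize their weights with a softmax,
    and mix only those k experts' outputs."""
    logits = x @ gate_W                       # one gating score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over selected experts
    # each "expert" here is a bare linear map; real experts are MLP blocks
    return sum(wi * (x @ expert_Ws[i]) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal(d)
gate_W = rng.standard_normal((d, n_experts))
expert_Ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_layer(x, gate_W, expert_Ws, k=2)      # only 2 of the 4 experts run
```

The point is architectural bookkeeping, not a new learning principle: the same transformer, just with its feed-forward compute spread across conditionally-activated experts.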
There are a few newer architectural ideas like Mamba, Samba, etc., but they haven't been scaled up yet, so it's not clear whether they can compete with SOTA LLMs.