Genuine question: all the breakthrough optimizations I see (KV cache, flash attention, quantization) seem to originate from CUDA/GPU land. Are TPUs innovating differently, or is my feed just GPU-biased? Would love examples of TPU-first optimization techniques that later crossed over. Drop links if you've got them!
jax-ml.github.io/scaling-boo… Aditya Wagh shared this link on LinkedIn... looks interesting

Sep 15, 2025 · 7:28 PM UTC