Genuine question: all the breakthrough optimizations I see (KV cache, flash attention, quantization) seem to originate from CUDA/GPU land. Are TPUs innovating differently, or is my feed just GPU-biased? Would love examples of TPU-first optimization techniques that later crossed over. Drop links if you've got them!
jax-ml.github.io/scaling-boo… Aditya Wagh shared this link on LinkedIn... looks interesting

Sep 15, 2025 · 7:28 PM UTC