Over 1 million tokens per second on production AI infrastructure. Microsoft Azure achieved 1,100,948 tokens/sec on ND GB300 v6 racks powered by NVIDIA GB300 NVL72, validated by Signal65. This benchmark highlights how enterprise AI can deliver record throughput with ~2.5× better power efficiency, combining high performance, operational efficiency, and governance-ready scale.
1.1M tokens/sec on just one rack of GB300 GPUs in our Azure fleet. An industry record made possible by our longstanding co-innovation with NVIDIA and expertise of running AI at production scale! techcommunity.microsoft.com/…

Nov 4, 2025 · 6:13 PM UTC

Replying to @nvidia
Efficiency is great. Yet AI is not doing anything yet.
Replying to @nvidia
That’s not a benchmark, that’s a rocket booster for AI. A million tokens per second means the machines are basically speed-reading entire libraries while humans still look for their coffee. At this pace, your next chatbot might finish your thought before you even type it.
Replying to @nvidia
Until you learn that's when running Llama 2 70B….
Replying to @nvidia
For most everyday use cases, this is more than enough, but I do see a huge opportunity for building entire functional codebases with that throughput
Replying to @nvidia
NVIDIA just turned sci-fi into hardware. 1M tokens/sec isn’t progress, it’s dominance. The AI race isn’t starting, it’s already being won
Replying to @nvidia
A million tokens a second. It sounds like science fiction, but it’s the new reality. Azure racks powered by NVIDIA are now breaking barriers that used to define limits. Every number here hides years of sweat, silence, and sleepless invention. This isn’t just a benchmark, it’s a signal the AI era is no longer building, it’s running. Faster, cooler, sharper. The next frontier isn’t imagination, it’s speed, and we’ve just crossed it.
Replying to @nvidia
4.6 million tokens per second, on the exact same hardware CapEx and energy (OpEx) at 6x lower latency, is now available with @WekaIO google.com/search?q=weka+amg…
Replying to @nvidia
Incredible milestone! AI infrastructure like this will redefine what’s possible in data-driven science, from drug discovery to patient care. Speed and efficiency at this scale mean faster insights, smarter innovation and ultimately, better outcomes.
Replying to @nvidia
⚙️ Hitting 1M tokens/sec isn’t just throughput — it’s latency evolution. At this scale, real-time emotional-AI synchronization becomes viable: models can process human sentiment, generate narrative branches, and render immersive feedback without perceptible delay. That’s the foundation of ETERNITY: AI Cinematic Immersion — where computation meets consciousness. ♾️🚀 #NVIDIA #AI #AICinematicImmersion #LatencyMatters #RealTimeAI #FutureOfCinema #ImmersiveTech
Replying to @nvidia
That is 2,000 emails and 7-10 SEC filings per second. Awesome.
Replying to @nvidia
Amazing: 1.1 million tokens in one second. If 1.1 million tokens moved like pennies, that’s 1,100,000 × $0.01 = $11,000 in a single second. Now imagine that sustained for a minute: $11,000 × 60 = $660,000 per minute. That’s $39.6 million per hour.
Replying to @nvidia
The faster tokens move, the faster belief systems update. Infrastructure is no longer a backend. Now, it is the nervous system of humanity's next cognition phase.
Replying to @nvidia
great, now inference is instant and my meetings are still 30 mins.
Replying to @nvidia
It’s wild how far we’ve come. A few years ago, generating a million tokens in seconds sounded impossible. Now it’s happening in real-time, quietly setting the tone for the next industrial revolution. What used to be a dream in research labs is now a production standard. The story of AI has always been about scaling thought, and today, it feels like we’re finally beginning to understand what that really means.
Replying to @nvidia
How negative was the return on investment? 10,000:1?
Replying to @nvidia
Wow, NVIDIA, that's some serious speed! I'm thinking, this is a game changer for AI, right?
Replying to @nvidia
Full send. Semper Fidelis
Replying to @nvidia
amazing work
Replying to @nvidia
What an achievement. 👏🏻 Anything is possible with Nvidia.
Replying to @nvidia
@nvidia, impressive numbers. this showcases the tremendous potential of enterprise ai technology.
Replying to @nvidia
It has blown my mind. @grok, tell me what that many tokens/sec can do?
Replying to @nvidia
crazy to think this speed is achieved... It's damn cool!!! (I bet you guys have more powerful stuff in your arsenal...👀)
Replying to @nvidia
What's a token and how many big macs does it feed me?
Replying to @nvidia
The future depends on what you do today.
Replying to @nvidia
@grok break this down, I am a sponge 🧽
Replying to @nvidia
That is something unbelievable 👏
Replying to @nvidia
Great job guys
Replying to @nvidia
that’s not just speed — that’s warp drive for AI 🚀 props to the Azure + NVIDIA dream team for turning tokens into light speed.
Replying to @nvidia
Thank you for the technology that keeps all these Twitter bots in business! 🙏
Replying to @nvidia
What British people imagine when they hear that Nvidia sells chips
Replying to @nvidia
Google TPUs are beating your azz right now, you better stop playing around.
Replying to @nvidia
AI compute is officially entering hyperscale territory — efficiency, scalability, and governance finally aligning. The next wave of decentralized compute will need to match this level of performance, but without the walls.