Cerebras beats the Nvidia H100, but can it beat Blackwell? Blackwell inference endpoints are finally out, and they're fast: GPT-OSS-120B at ~700 tokens/s, leapfrogging H100 and Groq. Cerebras clocked in at 3,000 TPS - still #1. Looking forward to Rubin!

Nov 6, 2025 · 11:01 PM UTC
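For scale, a quick back-of-the-envelope sketch of what those throughput numbers mean in wall-clock time. The TPS figures come from the post; the 1,000-token completion length is an assumption for illustration, and queueing and time-to-first-token are ignored.

```python
# What the headline TPS figures mean for a single response.
# TPS values come from the post above; the completion length is
# an assumed placeholder, not a benchmark parameter.

ENDPOINTS_TPS = {
    "Cerebras": 3000,   # tokens/s, from the post
    "Blackwell": 700,   # tokens/s, from the post
}

COMPLETION_TOKENS = 1000  # assumed response length

for name, tps in ENDPOINTS_TPS.items():
    seconds = COMPLETION_TOKENS / tps
    print(f"{name}: {COMPLETION_TOKENS} tokens / {tps} tok/s = {seconds:.2f} s")

# Cerebras: 1000 tokens / 3000 tok/s = 0.33 s
# Blackwell: 1000 tokens / 700 tok/s = 1.43 s
```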

Replying to @cerebras
Is this true on a watt-for-watt basis?
pricing reflects TCO - look at the pricing
Replying to @cerebras
Lies - you have a chip shaped like a dinner plate, so that's your ceiling. Go up against NVLink and get smacked - you don't have NVLink, and without it you can't even think about scaling your solution.
NVLink is measured in terabytes/s; Cerebras fabric is measured in petabytes/s.
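To make that unit gap concrete, a rough sketch: the bandwidth and payload numbers below are placeholder assumptions, not published specs for either product.

```python
# Rough feel for terabytes/s vs petabytes/s. All numbers here are
# illustrative assumptions, not vendor specs.

TB = 1e12  # bytes
PB = 1e15  # bytes

link_bw   = 1.8 * TB  # assumed NVLink-class link, bytes/s
fabric_bw = 1.0 * PB  # assumed wafer-scale fabric, bytes/s

payload = 240e9  # e.g. ~120B params at 2 bytes/param (assumption)

print(f"over a TB/s link:   {payload / link_bw * 1e3:.2f} ms")
print(f"over a PB/s fabric: {payload / fabric_bw * 1e3:.2f} ms")
# over a TB/s link:   133.33 ms
# over a PB/s fabric: 0.24 ms
```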
Replying to @cerebras
@grok why is Cerebras faster? What is wafer scale, and do Nvidia or AMD not produce it yet?
Replying to @cerebras
And the next wafers are already coming. Oh, 2026 is gonna be lit. Cerebras should be the multi-trillion-dollar company. K2 Think, GLM4.6 and Cognition SWE 1.5, all at record speeds and all full non-quant models. Incredible.
Replying to @cerebras
Yeah, but Nvidia can support more models. You guys are limited to what? Also, yours can't be used to train models. Apples to oranges.
Replying to @cerebras
this would be the perfect time to serve K2 Thinking, Cerebras
Replying to @cerebras
3,000 TPS + 37 seconds of queue latency per API call
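Taking this commenter's numbers at face value, a small sketch of why queue time dominates; the completion length is an assumed placeholder.

```python
# Generation TPS vs end-to-end throughput once queue time is included.
# The 37 s queue figure is the commenter's claim; the completion
# length is an assumed placeholder.

tps = 3000       # generation speed, tokens/s
queue_s = 37.0   # claimed queue latency per API call
tokens = 1000    # assumed completion length

gen_s = tokens / tps
total_s = queue_s + gen_s
effective_tps = tokens / total_s

print(f"generation: {gen_s:.2f} s, end to end: {total_s:.2f} s")
print(f"effective throughput: {effective_tps:.0f} tok/s")
# generation: 0.33 s, end to end: 37.33 s
# effective throughput: 27 tok/s
```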
Replying to @cerebras
Cerebras and Nvidia are racing like Formula 1 cars on silicon tracks. Blackwell’s fast, but Cerebras is still leading with 3000 TPS, basically lapping the field. The real question isn’t who’s faster; it’s who runs longer without burning a power station. Rubin might just bring the afterburner.
Replying to @cerebras
can’t wait to cut my wafer so i can have cerebras at home
Replying to @cerebras
Why aren't you offering cloud GPU access at an hourly rate? Maybe open-source an SDK for porting models to your hardware. We'd like to host our custom models on your silicon.
Replying to @cerebras
Minimax M2, Qwen3-Omni, maybe even Kimi K2 Thinking - this is the interesting stuff, with practical agentic use-cases and lots of thinking, which would really be nice to have at fast speeds. K2 in particular is slow as hell at the moment.
Replying to @cerebras
@grok whats rubin
Replying to @cerebras
Don’t they need to be 100x faster to make sense?
Replying to @cerebras
When Kimi K2 Thinking?? Really looking forward to it. I need speeeeed.
Replying to @cerebras
wen k2 thinking ser
Replying to @cerebras
Are we going to have Kimi K2 Thinking on Cerebras with interleaved thinking support soon?
Replying to @cerebras
Unmatched speed
Replying to @cerebras
those speed numbers are wild, progress is moving so fast these days
Replying to @cerebras
broo this is crazy
Replying to @cerebras
3000 tps is straight disgusting
Replying to @cerebras
@grok how fast is the time to first token in ms?
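One way to answer this yourself is to time the first streamed chunk. A minimal sketch, assuming an OpenAI-compatible streaming endpoint; the base URL, API key, and model id below are placeholders.

```python
# Measure time-to-first-token against any OpenAI-compatible streaming
# endpoint. Base URL, key, and model id below are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-oss-120b",  # placeholder model id
    messages=[{"role": "user", "content": "Say hi"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"time to first token: {(time.perf_counter() - start) * 1e3:.0f} ms")
        break
```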
Replying to @cerebras
What is the comparison here? One chip to one chip, or one server to one server rack? We can fit more GB200s in one server compared to the WSE-3.
Replying to @cerebras
@grok can you estimate pricing for Nvidia vs Cerebras? That is, to get the same 2,995 tokens per second, would it be cheaper to go with Cerebras or Nvidia? Also, what's the performance per watt?
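The formulas behind this question are simple even if the inputs are contested. In the sketch below, every price, throughput, and power number is a hypothetical placeholder, since real figures depend on the deployment; only the arithmetic is the point.

```python
# Cost per million tokens and energy efficiency, given rental price,
# throughput, and power draw. All inputs are hypothetical placeholders.

def dollars_per_mtok(price_per_hour: float, tps: float) -> float:
    return price_per_hour / (tps * 3600) * 1e6

def tokens_per_joule(tps: float, watts: float) -> float:
    return tps / watts

systems = {  # hypothetical systems with made-up numbers
    "System A": {"price_per_hour": 4.0, "tps": 3000, "watts": 23000},
    "System B": {"price_per_hour": 3.0, "tps": 700,  "watts": 1200},
}

for name, s in systems.items():
    print(f"{name}: ${dollars_per_mtok(s['price_per_hour'], s['tps']):.2f}/Mtok, "
          f"{tokens_per_joule(s['tps'], s['watts']):.3f} tok/J")
# System A: $0.37/Mtok, 0.130 tok/J
# System B: $1.19/Mtok, 0.583 tok/J
```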
Replying to @cerebras
@grok what performance metric is this chart showing? Is it relevant to training or inference performance on current SOTA LLM models? What are the biases in this data?
Replying to @cerebras
Time for an IPO?
Replying to @cerebras
3,000 TPS is a great benchmark, but the real question is how many software engineers you need to keep it fed. Hardware is easy; the moat is the compiler and the ecosystem tax.
Replying to @cerebras
Minimax M2 🙏
Replying to @cerebras
What were your sales last quarter?