This week, Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss 120b on @nvidia hardware. When gpt-oss launched, we sprinted to offer it at 450 TPS... now we're past 650 TPS with a 0.11 sec TTFT... and we'll keep raising the bar.
We are proud to offer the best E2E latency available, with near-limitless scale, incredible performance, and the highest uptime (99.99%).