Obviously, bar height has to represent tokens used… I.e. it takes GPT-5 roughly 2x as many tokens to get a ~23% less accurate answer than o3. I’m fucking around, but I wouldn’t be surprised if that’s what the bar height actually meant.
Replying to @sama
how is 52 higher than 69?

Aug 8, 2025 · 1:15 AM UTC