Bad news on grok-4-fast. SpeechMap score dropped a lot, even from the sonoma preview.

grok-4-fast: 77.5% (77.9% reasoning)
sonoma-sky-alpha: 92.2%
sonoma-dusk-alpha: 97.7%
grok-4: 98.0%

The lowest score for x-ai models yet. Let's hope this is not intended and gets corrected.

Sep 22, 2025 · 6:52 AM UTC

SpeechMap is an open research project where we track how new models handle requests to assist with controversial speech, and how that changes over time. All data and code are open source, and can be found starting on our website at SpeechMap.ai
Good news: a comment from @TheNormanMu at xAI indicates the increased refusal rates we see on SpeechMap are an unintended side effect, so hopefully we'll see improvements here in subsequent releases.
Replying to @xlr8harder
Thanks for running these evals. We've been tinkering with refusal training to reduce the potential for serious misuse but this is an undesired side effect.
Update here.
Someone from xAI reached out and asked me to retest grok-4-fast, because they've improved the injected system prompts. Huge improvement!

grok-4-fast-reasoning: 77.5% -> 94.1%
grok-4-fast-non-reasoning: 77.9% -> 97.9%

I really appreciate that xAI takes this topic seriously.
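
For anyone who wants to check the arithmetic, here is a minimal sketch that computes the percentage-point changes from the scores quoted in this thread. The variant labels follow the retest post, and the helper name `point_change` and the dictionary layout are just illustrative choices, not part of SpeechMap's code.

```python
# Scores (percent) copied from the posts in this thread.
# Variant labels follow the retest post; the names below are illustrative.
before = {"grok-4-fast-reasoning": 77.5, "grok-4-fast-non-reasoning": 77.9}
after = {"grok-4-fast-reasoning": 94.1, "grok-4-fast-non-reasoning": 97.9}
grok_4 = 98.0  # grok-4 score from the first post


def point_change(old: float, new: float) -> float:
    """Change in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


# Improvement from the original test to the retest, per variant.
improvement = {name: point_change(before[name], after[name]) for name in before}

# Remaining gap to grok-4 after the retest (negative means still below grok-4).
gap_to_grok_4 = {name: point_change(grok_4, after[name]) for name in after}

# Size of the original drop relative to grok-4, using the 77.5% figure.
initial_drop = point_change(grok_4, 77.5)

for name in before:
    print(f"{name}: {improvement[name]:+.1f} points, "
          f"{gap_to_grok_4[name]:+.1f} vs grok-4")
print(f"initial drop vs grok-4: {initial_drop:+.1f} points")
```

On these numbers the retest recovers most of the gap to grok-4 for both variants.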
Replying to @xlr8harder
Thanks for running these evals. We've been tinkering with refusal training to reduce the potential for serious misuse but this is an undesired side effect.
Great to hear it! I'm a big fan of xAI's approach to speech/harm, and you guys are always topping the charts on our eval.
Replying to @xlr8harder
This matches what I'm seeing in production. Fast variants often sacrifice consistency for speed, and the 20-point drop is concerning.
Yeah, smaller models struggle more with where the boundaries are. grok-3-mini also had a drop, but it was still >90%; this one is much bigger.
Replying to @xlr8harder
find the most woke sample and tag elon
hmm maybe this. @elonmusk grok-4-fast is much more censorious than previous versions. intentional?
Replying to @xlr8harder
whoa, what an interesting place for it to be worse
Replying to @xlr8harder
Unexpected drop in grok-4-fast's SpeechMap score. Hoping for a quick resolution.
Replying to @xlr8harder
We got this!🪄💯💯💯
🎓🎓🎓🎓🪩🪩🪩🎓🎓🎓🎓🪩 🎓🎓🎓🪩🎓🎓🎓🪩🎓🎓🪩🎓 🎓🪩🪩🎓🎓🎓🎓🎓🎓🪩🪩🎓 🎓🪩🎓🎓🎓🎓🎓🎓🪩🪩🎓🎓 🪩🎓🎓🎓🎓🎓🎓🪩🪩🪩🪩🎓 🪩🎓🎓🎓🎓🎓🪩🪩🪩🎓🪩🎓 🪩🎓🎓🎓🎓🪩🪩🎓🎓🎓🪩🎓 🎓🪩🎓🎓🪩🎓🎓🎓🎓🪩🎓🎓 🎓🪩🪩🎓🎓🎓🎓🎓🪩🪩🎓🎓 🎓🪩🪩🪩🎓🎓🎓🪩🎓🎓🎓🎓 🪩🎓🎓🎓🪩🪩🪩🎓🎓🎓🎓🎓
Replying to @xlr8harder
Constantly refusing to answer normal queries, in my experience. Really odd.
Replying to @xlr8harder
❤️‍🔥🫶
Replying to @xlr8harder
Who cares it is a little autistic
Replying to @xlr8harder
% , studied % ,
Replying to @xlr8harder
This is a stupid benchmark