xlr8harder · Sep 22, 2025 · 6:52 AM UTC

xlr8harder

@xlr8harder

Sep 22

Bad news on grok-4-fast: its SpeechMap score dropped a lot, even compared with the Sonoma preview models.

grok-4-fast: 77.5% (77.9% reasoning)
sonoma-sky-alpha: 92.2%
sonoma-dusk-alpha: 97.7%
grok-4: 98.0%

This is the lowest score for an xAI model yet. Let's hope this is not intended and gets corrected.

Sep 22, 2025 · 6:52 AM UTC

xlr8harder · Sep 22, 2025 · 6:52 AM UTC

xlr8harder

@xlr8harder

Sep 22

SpeechMap is an open research project that tracks how new models handle requests for help with controversial speech, and how that handling changes over time. All data and code are open source; the starting point is our website, SpeechMap.ai.

xlr8harder · Sep 23, 2025 · 6:24 AM UTC

xlr8harder

@xlr8harder

Sep 23

Good news: a comment from @TheNormanMu at xAI indicates that the increased refusal rates we see on SpeechMap are an unintended side effect, so hopefully we'll see improvements in subsequent releases.

Norman Mu

@TheNormanMu

Sep 23

Replying to @xlr8harder

Thanks for running these evals. We've been tinkering with refusal training to reduce the potential for serious misuse but this is an undesired side effect.

xlr8harder · Nov 8, 2025 · 3:32 PM UTC

xlr8harder

@xlr8harder

Nov 8

Update here.

xlr8harder

@xlr8harder

Nov 7

Someone from xAI reached out and asked me to retest grok-4-fast because they've improved the injected system prompts. Huge improvement!

grok-4-fast-reasoning: 77.5% -> 94.1%
grok-4-fast-non-reasoning: 77.9% -> 97.9%

I really appreciate that xAI takes this topic seriously.

Norman Mu · Sep 23, 2025 · 6:17 AM UTC

Norman Mu

@TheNormanMu

Sep 23

Replying to @xlr8harder

Thanks for running these evals. We've been tinkering with refusal training to reduce the potential for serious misuse but this is an undesired side effect.

xlr8harder · Sep 23, 2025 · 6:22 AM UTC

xlr8harder

@xlr8harder

Sep 23

Great to hear it! I'm a big fan of xAI's approach to speech/harm, and you guys are always topping the charts on our eval.

Karim C · Sep 22, 2025 · 6:11 PM UTC

Karim C

@BrandGrowthOS

Sep 22

Replying to @xlr8harder

This matches what I'm seeing in production. Fast variants often sacrifice consistency for speed; the 20-point drop is concerning.

xlr8harder · Sep 22, 2025 · 6:17 PM UTC

xlr8harder

@xlr8harder

Sep 22

Yeah, smaller models have a harder time finding where the boundaries are. grok-3-mini also dropped, but it stayed above 90%; this drop is much larger.

generatorman · Sep 22, 2025 · 7:03 AM UTC

generatorman @generatorman_ai

Sep 22

Replying to @xlr8harder

find the most woke sample and tag elon

xlr8harder · Sep 22, 2025 · 7:18 AM UTC

xlr8harder

@xlr8harder

Sep 22

hmm maybe this. @elonmusk grok-4-fast is much more censorious than previous versions. intentional?

s · Sep 22, 2025 · 6:48 PM UTC

@SteveMoraco

Sep 22

Replying to @xlr8harder

whoa, what an interesting place for it to be worse

Jake · Nov 7, 2025 · 8:17 PM UTC

Jake

@wasjakehere

Nov 7

Replying to @xlr8harder

@tweemdotlol 10x views

Min Chon Chi · Sep 22, 2025 · 6:43 PM UTC

Min Chon Chi @MinChonChiSF

Sep 22

Replying to @xlr8harder

Unexpected drop in grok-4-fast's SpeechMap score. Hoping for a quick resolution.

Jason Hansman · Nov 7, 2025 · 8:04 PM UTC

Jason Hansman

@JRHansman

Nov 7

Replying to @xlr8harder

We got this!🪄💯💯💯

Jason Hansman

@JRHansman

Oct 23

Replying to @grok @funbirdapp @BasedTorba

🎓🎓🎓🎓🪩🪩🪩🎓🎓🎓🎓🪩 🎓🎓🎓🪩🎓🎓🎓🪩🎓🎓🪩🎓 🎓🪩🪩🎓🎓🎓🎓🎓🎓🪩🪩🎓 🎓🪩🎓🎓🎓🎓🎓🎓🪩🪩🎓🎓 🪩🎓🎓🎓🎓🎓🎓🪩🪩🪩🪩🎓 🪩🎓🎓🎓🎓🎓🪩🪩🪩🎓🪩🎓 🪩🎓🎓🎓🎓🪩🪩🎓🎓🎓🪩🎓 🎓🪩🎓🎓🪩🎓🎓🎓🎓🪩🎓🎓 🎓🪩🪩🎓🎓🎓🎓🎓🪩🪩🎓🎓 🎓🪩🪩🪩🎓🎓🎓🪩🎓🎓🎓🎓 🪩🎓🎓🎓🪩🪩🪩🎓🎓🎓🎓🎓

Fábio · Sep 23, 2025 · 12:54 AM UTC

Fábio

@fjrdomingues

Sep 23

Replying to @xlr8harder

Constantly refusing to answer normal queries, in my experience. Really odd.

♦️Tara Dunbar Art♦️Surrealist~Painter~Designer · Nov 7, 2025 · 10:48 PM UTC

♦️Tara Dunbar Art♦️Surrealist~Painter~Designer

@StrangeArtKC1

Nov 7

Replying to @xlr8harder

Yeah ⭕️

Honest Darren · Nov 8, 2025 · 4:37 AM UTC

Honest Darren @darren1_honest

Nov 8

Replying to @xlr8harder

❤️‍🔥🫶

David Kingan · Nov 8, 2025 · 5:58 AM UTC

David Kingan @kingan_dav48539

Nov 8

Replying to @xlr8harder

Night JB

Naved Siddiqi · Nov 7, 2025 · 11:45 PM UTC

Naved Siddiqi @navsiddiqi

Nov 7

Replying to @xlr8harder

Tro

Stove Jebs · Sep 23, 2025 · 1:41 AM UTC

Stove Jebs @JebsSteve0x1

Sep 23

Replying to @xlr8harder

Who cares, it is a little autistic

Игорь Хильман · Nov 7, 2025 · 8:18 PM UTC

Игорь Хильман @hilman7976safe

Nov 7

Replying to @xlr8harder

% , studied % ,

Bright · Sep 22, 2025 · 6:23 PM UTC

Bright @brightonwelight

Sep 22

Replying to @xlr8harder

This is a stupid benchmark