Damn. And the original Kimi was already on par with or at SOTA on creative writing x.com/garyfung/status/194667… so the gap is likely widening. Remains to be seen how good this is at coding, and whether it beats MiniMax M2. ❤️ seeing open weights winning or closely following ClosedAI
From my tests, Kimi K2 Thinking is better than everything xAI, Anthropic, and Google have to offer atm. The only things better than this are GPT-5 Codex (at code) and GPT-5 Pro (at high-level algorithm design). It beats the SOTA at creative writing by a mile. Good work @crystalsssup!
Holy shit, it beats Grok 4 Heavy on HLE too, with Kimi K2 Thinking's own Heavy Mode
ok we're at 51% with "heavy" mode
> Heavy Mode: K2 Thinking Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result.

Nov 7, 2025 · 2:55 AM UTC
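A rough sketch of the Heavy Mode pattern quoted above (parallel rollouts, then one aggregation step). This is not Moonshot's implementation: `heavy_mode`, `toy_rollout`, and `majority_vote` are hypothetical names, the rollout is a stub instead of a model call, and the "reflective" aggregation (presumably another model pass over all outputs) is swapped for a plain majority vote so the example runs offline.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def heavy_mode(
    prompt: str,
    rollout: Callable[[str, int], str],
    aggregate: Callable[[str, List[str]], str],
    n_trajectories: int = 8,  # the quoted description says eight
) -> str:
    """Run n independent rollouts in parallel, then aggregate them once."""
    with ThreadPoolExecutor(max_workers=n_trajectories) as pool:
        trajectories = list(
            pool.map(lambda seed: rollout(prompt, seed), range(n_trajectories))
        )
    return aggregate(prompt, trajectories)


def toy_rollout(prompt: str, seed: int) -> str:
    # Stub standing in for one sampled model trajectory (assumption).
    return "41" if seed % 4 == 0 else "42"


def majority_vote(prompt: str, outputs: List[str]) -> str:
    # Stand-in for reflective aggregation: pick the most common answer.
    return Counter(outputs).most_common(1)[0][0]


if __name__ == "__main__":
    print(heavy_mode("What is 6 * 7?", toy_rollout, majority_vote))
```

To try it against a real model, swap `toy_rollout` for a sampling call and `majority_vote` for a prompt that asks the model to review all eight outputs and write a final answer.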

Since you already have Kimi K2 non-thinking in your garden, when is K2 Thinking getting added, @windsurf? GPT-5 high reasoning intelligence with higher speed and lower cost, want! grok.com/share/bGVnYWN5_de3c…
This one-shotted a Word clone? 🤯 How much more can this thing do when iterating with me? Functional app showcase at moonshotai.github.io/Kimi-K2…
For agentic coding: an updated table of additional models I care about using, with direct or normalized scores for SWE-bench Verified and Terminal-Bench, according to Kimi kimi.com/share/19a5c71b-18e2… while I also test its agentic web search capability. A bit better on factuality & comprehensiveness than Grok, which has been my daily driver!
Reranked with the telecom bench added. Weighted score based on 50% SWE-bench Verified, 25% Terminal-Bench, 25% τ²-Bench Telecom. With that, the top 3 models are nearly neck and neck for agentic coding: a mix of intelligent "swe" problem solving and coding + reliable tool calls
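The 50/25/25 weighting above works out to a simple weighted sum. A minimal sketch, assuming every score is already on the same 0-100 scale; the dictionary keys and function names are hypothetical, and the numbers in the demo are made-up placeholders, not real benchmark results.

```python
from typing import Dict, List

# Weights as stated in the post: 50% / 25% / 25%.
WEIGHTS: Dict[str, float] = {
    "swe_bench_verified": 0.50,
    "terminal_bench": 0.25,
    "tau2_bench_telecom": 0.25,
}


def weighted_score(scores: Dict[str, float]) -> float:
    """Combine per-benchmark scores (assumed 0-100) into one number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Return model names sorted from highest to lowest weighted score."""
    return sorted(models, key=lambda m: weighted_score(models[m]), reverse=True)


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not real results).
    demo = {
        "model_a": {"swe_bench_verified": 80, "terminal_bench": 40, "tau2_bench_telecom": 60},
        "model_b": {"swe_bench_verified": 70, "terminal_bench": 50, "tau2_bench_telecom": 90},
    }
    for name in rank(demo):
        print(name, weighted_score(demo[name]))
```

Note how a strong tool-calling score can outweigh a weaker SWE-bench result under this weighting, which is why the ranking shifts once the telecom bench is added.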
AA full scores:
Replying to @ArtificialAnlys
Individual results across all evaluations in the Artificial Analysis Intelligence Index: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom
Near SOTA at math too. How does that work alongside SOTA in creative writing? Entirely different disciplines 🤔
Kimi K2 Thinking is now the top-performing non-TTC model for math