Damn. And the original Kimi was already on par with or at SOTA on creative writing x.com/garyfung/status/194667… so the gap is likely widening
Remains to be seen how good this is at coding, and whether it beats MiniMax M2
❤️ seeing open weights winning or closely following ClosedAI
From my tests, Kimi K2 Thinking is better than everything xAI, Anthropic, and Google have to offer atm.
The only things better than this are GPT-5 Codex (at code) and GPT-5 Pro (at high-level algorithm design)
It beats the SOTA at creative writing by a mile.
Good work @crystalsssup!
Holy shit, it beats Grok 4 Heavy on HLE too, with Kimi K2 Thinking's own heavy mode
ok we're at 51% with "heavy" mode
> Heavy Mode: K2 Thinking Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result.
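A rough sketch of what that quoted description amounts to: run eight rollouts in parallel, then hand all of the outputs to one aggregation step. Everything named here (`heavy_mode`, `fake_rollout`, `majority_vote`) is hypothetical and not Moonshot's API. In particular, the quote says the aggregation is reflective (presumably another model pass over the eight outputs), while this sketch swaps in a plain majority vote so that it runs without a model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def heavy_mode(
    prompt: str,
    rollout: Callable[[str, int], str],
    aggregate: Callable[[str, List[str]], str],
    n_trajectories: int = 8,
) -> str:
    """Run n_trajectories rollouts concurrently, then aggregate their outputs."""
    with ThreadPoolExecutor(max_workers=n_trajectories) as pool:
        # pool.map preserves input order, so outputs[i] belongs to seed i.
        outputs = list(
            pool.map(lambda seed: rollout(prompt, seed), range(n_trajectories))
        )
    return aggregate(prompt, outputs)


def fake_rollout(prompt: str, seed: int) -> str:
    # Stand-in for one model trajectory (assumption): seeds 0 and 4 "go wrong".
    return "42" if seed % 4 else "41"


def majority_vote(prompt: str, outputs: List[str]) -> str:
    # Stand-in aggregator (assumption): the real step is described as a
    # reflective pass over all outputs, not a vote.
    return Counter(outputs).most_common(1)[0][0]


answer = heavy_mode("What is 6 * 7?", fake_rollout, majority_vote)
print(answer)
```

To get closer to the quoted behavior, `aggregate` would be a model call that reads all eight outputs and writes the final answer.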
Nov 7, 2025 · 2:55 AM UTC
since you already got Kimi K2 non-think in your garden. K2 Thinking addition when @windsurf ?
GPT-5 high reasoning intelligence with higher speed and lower cost, want!
grok.com/share/bGVnYWN5_de3c…
this 1 shotted a Word clone? 🤯
how much more this thing can do, iterating with me
function app showcase at moonshotai.github.io/Kimi-K2…
For agentic coding: updated table of additional models I care about, using direct or normalized scores on
- SWE-bench Verified
- Terminal-Bench
according to Kimi kimi.com/share/19a5c71b-18e2… while I also test its agentic web search capability. A bit better on factuality and comprehensiveness than Grok, which has been my daily driver!
reranked with added telecom bench. Weighted score based on 50% SWE-bench Verified, 25% Terminal-Bench, 25% τ²-Bench Telecom
With that, the top 3 models are nearly neck and neck for agentic coding: a mix of intelligent "swe" problem solving and coding + reliable tool calls
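For anyone who wants to redo the rerank, the 50/25/25 weighting is just a weighted sum. A minimal sketch, assuming all three scores are already on the same 0-100 scale; the model names and numbers below are made-up placeholders, not real benchmark results.

```python
from typing import Dict, List

# Weights from the post: 50% SWE-bench Verified, 25% Terminal-Bench,
# 25% tau2-Bench Telecom. The dictionary keys are hypothetical names.
WEIGHTS: Dict[str, float] = {
    "swe_bench_verified": 0.50,
    "terminal_bench": 0.25,
    "tau2_bench_telecom": 0.25,
}


def weighted_score(scores: Dict[str, float]) -> float:
    """Combine per-benchmark scores (same scale assumed) into one number."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def rerank(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Return model names sorted from highest to lowest weighted score."""
    return sorted(models, key=lambda name: weighted_score(models[name]), reverse=True)


# Placeholder inputs only (assumption), chosen to make the arithmetic easy.
placeholder_models = {
    "model_a": {"swe_bench_verified": 70.0, "terminal_bench": 40.0, "tau2_bench_telecom": 90.0},
    "model_b": {"swe_bench_verified": 60.0, "terminal_bench": 50.0, "tau2_bench_telecom": 80.0},
}

ranking = rerank(placeholder_models)
print(ranking)
```

Here `model_a` works out to 0.5 × 70 + 0.25 × 40 + 0.25 × 90 = 67.5 and `model_b` to 62.5, so `model_a` ranks first.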







