Holy shit… Meta might've just solved self-improving AI 🤯

Their new paper, SPICE (Self-Play In Corpus Environments), basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here's the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, which tries to solve them without access to the source. They compete, learn, and evolve together, forming an automatic curriculum with real-world grounding so it never collapses into hallucinations.

The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats prior self-play methods like R-Zero and Absolute Zero.

This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.
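For the curious, here's a rough sketch of what a SPICE-style Challenger/Reasoner loop could look like. This is a minimal illustration, not the paper's implementation: `generate` is a stub for a shared base model, the verifier is a placeholder, and the "reward peaks near 50% pass rate" shaping for the Challenger is my assumption about how a frontier curriculum might be encouraged.

```python
# Minimal sketch of a SPICE-style self-play round (illustrative only).
import random
import statistics

# Stand-in corpus; the real system mines large real-document collections.
corpus = [
    "The Treaty of Tordesillas (1494) divided newly discovered lands "
    "between Spain and Portugal along a meridian.",
    "Photosynthesis converts CO2 and water into glucose and oxygen "
    "using light energy absorbed by chlorophyll.",
]

def generate(role: str, prompt: str) -> str:
    """Placeholder for one shared base LM sampled under a role-specific prompt."""
    return f"[{role} output for: {prompt[:40]}...]"

def challenger_make_task(doc: str) -> tuple[str, str]:
    # Challenger reads a real document and emits a question plus a gold
    # answer that stays verifiable against the source text.
    question = generate("challenger", f"Write a hard question grounded in: {doc}")
    answer = generate("challenger", f"Answer it using only: {doc}")
    return question, answer

def reasoner_solve(question: str) -> str:
    # Reasoner only sees the question, never the source document.
    return generate("reasoner", question)

def correct(prediction: str, gold: str) -> bool:
    """Placeholder verifier (random here); a real one checks against the corpus."""
    return random.random() < 0.5

def self_play_round(n_attempts: int = 8) -> None:
    doc = random.choice(corpus)
    question, gold = challenger_make_task(doc)
    # Several Reasoner attempts estimate how hard the task currently is.
    scores = [float(correct(reasoner_solve(question), gold)) for _ in range(n_attempts)]
    pass_rate = statistics.mean(scores)
    reasoner_reward = pass_rate  # rewarded for solving grounded tasks
    # Assumed shaping: Challenger reward peaks when the task sits at the
    # frontier of the Reasoner's ability (pass rate near 50%), keeping the
    # curriculum hard but solvable instead of collapsing into noise.
    challenger_reward = 1.0 - abs(pass_rate - 0.5) * 2.0
    print(f"pass_rate={pass_rate:.2f}  R_reasoner={reasoner_reward:.2f}  "
          f"R_challenger={challenger_reward:.2f}")
    # In the real system, both roles are then updated with RL on these rewards.

if __name__ == "__main__":
    for _ in range(3):
        self_play_round()
```

The key design point is the grounding: because the Challenger must derive each task from a real document and the answer is checked against that source, the loop can't drift into rewarding its own hallucinations the way purely synthetic self-play can.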
Replying to @rryssf_
How does this "self-improving" framework scale to domains with ambiguous, multi-modal, or subjective objectives, such as natural language creativity, visual aesthetics, or complex strategic planning? If the model "invents its own challenges," how does it learn to invent meaningful challenges in a domain where "correctness" is not easily defined by a simple reward signal?

Nov 2, 2025 · 3:46 AM UTC

Replying to @aerlabs_ @rryssf_
Good catch. I think it will fall short once it reaches an ambiguity limit, the point where answers can no longer be verified and it cannot proceed.