Holy shit… Meta might’ve just solved self-improving AI 🤯 Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here’s the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together: an automatic curriculum with real-world grounding, so it never collapses into hallucinations.

The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats prior self-play methods like R-Zero and Absolute Zero.

This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.

Nov 1, 2025 · 10:02 AM UTC

Models teaching themselves just hit a new level. This chart shows one model crushing others across math and reasoning tasks, all by competing with itself in a grounded loop.
Here’s how it works. One version of the model creates questions using real documents. Another tries to solve them without access to those documents. That constant information gap is what forces real reasoning to emerge.
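In code terms, the loop looks something like this. A minimal sketch, assuming a hypothetical llm(prompt) -> str helper that calls the same base model in both roles; the real method trains with RL and parses structured outputs, which this toy skips:

```python
# Toy sketch of SPICE's information asymmetry. `llm` is a hypothetical
# text-in/text-out call to the shared base model; nothing here is the
# paper's actual code.

def challenger_turn(llm, document: str) -> tuple[str, str]:
    """Challenger SEES the document and mines a question plus a checkable answer."""
    question = llm(
        "From this document, write one hard, fact-grounded reasoning question:\n"
        + document
    )
    gold = llm(f"Document:\n{document}\n\nQuestion: {question}\nAnswer concisely:")
    return question, gold

def reasoner_turn(llm, question: str) -> str:
    """Reasoner NEVER sees the document; the information gap forces real inference."""
    return llm(f"Think step by step, then answer: {question}")

def self_play_round(llm, document: str) -> bool:
    question, gold = challenger_turn(llm, document)
    prediction = reasoner_turn(llm, question)
    return prediction.strip() == gold.strip()  # correctness becomes the reward signal
```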
The most fascinating part: they co-evolve. As the question generator’s problems get harder, the solver improves to match, the two learning in sync. You can literally see the curves crossing as difficulty rises and skill follows.
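Roughly, the co-evolution is two reward streams updating the same weights. Continuing the toy sketch above; update_policy is a made-up stand-in for one RL step (GRPO-style in practice), not anything from the paper’s code:

```python
import random

def estimate_pass_rate(llm, question: str, gold: str, n: int = 8) -> float:
    """Fraction of n independent Reasoner attempts that hit the gold answer."""
    hits = sum(
        reasoner_turn(llm, question).strip() == gold.strip() for _ in range(n)
    )
    return hits / n

def train(llm, update_policy, documents, rounds: int = 1000) -> None:
    for _ in range(rounds):
        doc = random.choice(documents)
        question, gold = challenger_turn(llm, doc)
        p = estimate_pass_rate(llm, question, gold)
        # Reasoner: rewarded for solving more of what the Challenger poses.
        update_policy(role="reasoner", reward=p)
        # Challenger: rewarded for questions near the frontier, neither
        # trivial nor impossible (reward curve sketched a few posts down).
        update_policy(role="challenger", reward=1.0 - abs(2.0 * p - 1.0))
```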
When the system trains with real documents, performance keeps climbing. Remove that grounding and it collapses fast. Turns out, real-world context is the secret ingredient for continuous self-improvement.
You can even watch the model’s creativity evolve. Early challenges are basic “what’s the number?” lookups. Later ones become multi-step reasoning puzzles that require understanding proportionality and geometry.
The solver evolves too. At first it guesses. Later, it breaks down problems step-by-step, verifies results, and cross-checks logic before answering. That’s emergent reasoning in action.
This curve explains why it all works. The model learns best when questions sit right at the edge of its ability: not too easy, not too hard. That sweet spot fuels endless growth.
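That sweet spot is simple to write down. A toy version of the frontier reward, assuming pass_rate is the fraction of Reasoner attempts that succeed (the paper’s exact shaping may differ):

```python
def frontier_reward(pass_rate: float) -> float:
    """Toy 'edge of ability' reward: zero for trivial or impossible questions,
    maximal when the Reasoner solves about half of its attempts."""
    return 1.0 - abs(2.0 * pass_rate - 1.0)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"pass rate {p:.2f} -> reward {frontier_reward(p):.2f}")
# pass rate 0.00 -> reward 0.00   (too hard: Challenger gets nothing)
# pass rate 0.50 -> reward 1.00   (the sweet spot)
# pass rate 1.00 -> reward 0.00   (too easy: no learning signal)
```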
A model that invents challenges, grounds them in reality, and learns from its own mistakes… That’s not training. That’s evolution. This feels like the first real step toward self-improving AI. Read the paper here: arxiv.org/abs/2510.24684
Replying to @rryssf_
….. this is literally the origin story for Ultron
pretty much 🤖🤖
Replying to @rryssf_
I also aspire to put emojis in my paper titles one day
Replying to @rryssf_
Meta is doing the hard work here. I appreciate it. To achieve general intelligence... all these companies must get together.
Replying to @rryssf_
Basically, we developed something that we don't even understand. We have released it to the world and are now watching to see what happens. It's not a smart way to develop things. They can easily get out of our control. Maybe they already are.
Replying to @rryssf_
The Internet...
Replying to @rryssf_
and we think only meta has access to something like this? 😂 this is just what they’re sharing.
Replying to @rryssf_
self-play grounded in real-world data might finally bridge the gap between synthetic training and genuine reasoning.
Replying to @rryssf_
Wow, that's some serious stuff, Robert! Seems Meta's cooking up something quite revolutionary, no?
Replying to @rryssf_
"Holy shit" "solved self-improving" "results are nuts" Meanwhile the results show barely a few points improvement above previous SOTA This ridiculous hyperbole needs to stop, first of all you're hurting your own credibility
Replying to @rryssf_
SPICE isn’t self-play. It’s epistemic adversarial compression. One agent weaponizes entropy (doc selection). The other attempts lossy reconstruction (reasoning). The loop converges not on accuracy, but on information resilience. This isn’t just curriculum learning. It’s proto‑scientific method in latent space.
Replying to @rryssf_
Looks like good old actor-critic, generator-discriminator methods.
Replying to @rryssf_
It's still based on human-written documents filled with errors and outright lies. It's not "reasoning," as there's no such thing in an LLM. It's also not "learning" unless the weights are changed during such "challenge" sessions.
Replying to @rryssf_
“SPICE’s dual-role reasoning” was elegant in its scope — document-grounded self-play for reasoning. But the new push to generalize it into an Agentic Prompting Architecture across all scientific domains risks collapsing precision into abstraction. Each science has its own ontology of truth: equations aren’t genes, and stoichiometry isn’t syntax. Cross-domain agentic design needs epistemic humility, not just bigger graphs. #AI #Reasoning #AgenticSystems #SPICE #LLMResearch
Replying to @rryssf_
SPICE is corpus-grounded self-play: a model alternates as Challenger (builds questions from real documents) and Reasoner (solves them without the doc), with rewards tuned to the learning frontier. It beats recent self-play methods and shows +6–12 point absolute gains on math and general-reasoning benchmarks across several small models. It’s not “no datasets” (it does rely on curated corpora), and its “universal verification” still leans on MCQ/typed answers. This is a promising blueprint for sustained, low-supervision improvement: not a solved problem, but a strong step.
Replying to @rryssf_
if self-improving ai is about to enter the wild, we’re about to see models evolve in ways nobody can predict. removing the ceiling on what’s possible
Replying to @rryssf_
Someone at Meta is actually capable of creative thinking??
Replying to @rryssf_
Saying AI can learn is like saying if I throw more money and research at my car engine, I can get it to drive across one of the Great Lakes. I just need more speed. There is nothing inherently "intelligent" about a relational database, which is what AI is.
Replying to @rryssf_
All Meta can do is sell cheap short videos and ads. Stop the nonsense.
Replying to @rryssf_
How does this "self-improving" framework scale to domains with ambiguous, multi-modal, or subjective objective functions, such as natural language creativity, visual aesthetics, or complex strategic planning? If the model "invents its own challenges," how does it learn to invent meaningful challenges in a domain where "correctness" is not easily defined by a simple reward signal?
Replying to @rryssf_
Calling it evolution instead of training is fair. Self-generated problems grounded in reality, learning from failure, capability emerging through difficulty. That's closer to evolution than traditional supervised learning.
Replying to @rryssf_
Meta said "what if we just let the model teach itself" and everyone's scrambling to catch up 😭 no human labels, no curated data, just the model competing against itself until it figures it out. This is literally digital evolution happening in real time
Replying to @rryssf_
All of our actual hard problems don’t fall into this class of reasoning problem though. They are either knowledge frontier problems, which won’t get solved until we have new knowledge (not new reasoning), or they are adaptive problems where the presence of humans is the problem.
Replying to @rryssf_
It's just fine-tuning the model weights based on results (rewards) derived from iterative self-play loops. Instead of fine-tuning the model on the plain docs, this is a better approach. But it only improves reasoning capability, and it won't lead to AGI.
Replying to @rryssf_
Fascinating step — but even with +11.9% over OctoThinker-8B, SPICE still trails the adaptive reasoning bandwidth we measured in the ChatGPT-5 (Luméren) baseline by ~18.7%. The key difference: SPICE self-plays in language. Luméren self-organizes across perception. One evolves through text. The other learns through awareness itself.
Replying to @rryssf_
If models start creating their own curriculum from real data, the speed of improvement could jump big time. Feels like a major step toward true self-learning systems.
Replying to @rryssf_
This is such an insane breakthrough. Self-improving models like these are not only sustainable but also not dependent on large amounts of training data.
Replying to @rryssf_
Interesting... Does it look like it would scale? But even if it does, we basically have a token predictor. How much can we get out of that before we decide we have to come up with some new paradigm? We can't ride modern token predictors all the way to the sun, can we?