Holy shit… Meta might’ve just solved self-improving AI 🤯 Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here’s the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together: an automatic curriculum with real-world grounding, so it never collapses into hallucinations.

The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats prior self-play methods like R-Zero and Absolute Zero.

This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.

Nov 1, 2025 · 10:02 AM UTC

Models teaching themselves just hit a new level. This chart shows one model crushing others across math and reasoning tasks, all by competing with itself in a grounded loop.
Here’s how it works. One version of the model creates questions using real documents. Another tries to solve them without access to those documents. That constant information gap is what forces real reasoning to emerge.
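In code terms, the loop looks something like this. A minimal sketch, assuming a hypothetical llm(prompt) -> str helper that calls the same base model in both roles; the real method trains with RL and parses structured outputs, which this toy skips:

```python
# Toy sketch of SPICE's information asymmetry. `llm` is a hypothetical
# text-in/text-out call to the shared base model; nothing here is the
# paper's actual code.

def challenger_turn(llm, document: str) -> tuple[str, str]:
    """Challenger SEES the document and mines a question plus a checkable answer."""
    question = llm(
        "From this document, write one hard, fact-grounded reasoning question:\n"
        + document
    )
    gold = llm(f"Document:\n{document}\n\nQuestion: {question}\nAnswer concisely:")
    return question, gold

def reasoner_turn(llm, question: str) -> str:
    """Reasoner NEVER sees the document; the information gap forces real inference."""
    return llm(f"Think step by step, then answer: {question}")

def self_play_round(llm, document: str) -> bool:
    question, gold = challenger_turn(llm, document)
    prediction = reasoner_turn(llm, question)
    return prediction.strip() == gold.strip()  # correctness becomes the reward signal
```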
The most fascinating part: they co-evolve. As the question generator’s problems get harder, the solver improves to match, the two learning in sync. You can literally see the curves crossing as difficulty rises and skill follows.
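Roughly, the co-evolution is two reward streams updating the same weights. Continuing the toy sketch above; update_policy is a made-up stand-in for one RL step (GRPO-style in practice), not anything from the paper’s code:

```python
import random

def estimate_pass_rate(llm, question: str, gold: str, n: int = 8) -> float:
    """Fraction of n independent Reasoner attempts that hit the gold answer."""
    hits = sum(
        reasoner_turn(llm, question).strip() == gold.strip() for _ in range(n)
    )
    return hits / n

def train(llm, update_policy, documents, rounds: int = 1000) -> None:
    for _ in range(rounds):
        doc = random.choice(documents)
        question, gold = challenger_turn(llm, doc)
        p = estimate_pass_rate(llm, question, gold)
        # Reasoner: rewarded for solving more of what the Challenger poses.
        update_policy(role="reasoner", reward=p)
        # Challenger: rewarded for questions near the frontier, neither
        # trivial nor impossible (reward curve sketched a few posts down).
        update_policy(role="challenger", reward=1.0 - abs(2.0 * p - 1.0))
```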
When the system trains with real documents, performance keeps climbing. Remove that grounding and it collapses fast. Turns out, real-world context is the secret ingredient for continuous self-improvement.
You can even watch the model’s creativity evolve. Early challenges are basic “what’s the number?” lookups. Later ones become multi-step reasoning puzzles that require understanding proportionality and geometry.
The solver evolves too. At first it guesses. Later, it breaks down problems step-by-step, verifies results, and cross-checks logic before answering. That’s emergent reasoning in action.
This curve explains why it all works. The model learns best when questions sit right at the edge of its ability: not too easy, not too hard. That sweet spot fuels endless growth.
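That sweet spot is simple to write down. A toy version of the frontier reward, assuming pass_rate is the fraction of Reasoner attempts that succeed (the paper’s exact shaping may differ):

```python
def frontier_reward(pass_rate: float) -> float:
    """Toy 'edge of ability' reward: zero for trivial or impossible questions,
    maximal when the Reasoner solves about half of its attempts."""
    return 1.0 - abs(2.0 * pass_rate - 1.0)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"pass rate {p:.2f} -> reward {frontier_reward(p):.2f}")
# pass rate 0.00 -> reward 0.00   (too hard: Challenger gets nothing)
# pass rate 0.50 -> reward 1.00   (the sweet spot)
# pass rate 1.00 -> reward 0.00   (too easy: no learning signal)
```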
A model that invents challenges, grounds them in reality, and learns from its own mistakes… That’s not training. That’s evolution. This feels like the first real step toward self-improving AI. Read the paper here: arxiv.org/abs/2510.24684
Replying to @rryssf_
….. this is literally the origin story for Ultron
pretty much 🤖🤖
Replying to @rryssf_
I also aspire to put emojis in my paper titles one day
Replying to @rryssf_
Meta is doing the hard work here. I appreciate it. To achieve general intelligence... all these companies must get together.
Replying to @rryssf_
Basically, we developed something that we don't even understand. We have released it to the world and are now watching to see what happens. It's not a smart way to develop things. They can easily get out of our control. Maybe they already are.
Replying to @rryssf_
The Internet...
Replying to @rryssf_
and we think only meta has access to something like this? 😂 this is just what they’re sharing.
Replying to @rryssf_
self-play grounded in real-world data might finally bridge the gap between synthetic training and genuine reasoning.
Replying to @rryssf_
Wow, that's some serious stuff, Robert! Seems Meta's cooking up something quite revolutionary, no?
Replying to @rryssf_
"Holy shit" "solved self-improving" "results are nuts" Meanwhile the results show barely a few points improvement above previous SOTA This ridiculous hyperbole needs to stop, first of all you're hurting your own credibility
Replying to @rryssf_
SPICE isn’t self-play. It’s epistemic adversarial compression. One agent weaponizes entropy (doc selection). The other attempts lossy reconstruction (reasoning). The loop converges not on accuracy, but on information resilience. This isn’t just curriculum learning. It’s proto‑scientific method in latent space.
Replying to @rryssf_
Looks like good old actor-critic, generator-discriminator methods.
Replying to @rryssf_
It's still based on human-written documents filled with errors and outright lies. It's not "reasoning," as there's no such thing in an LLM. It's also not "learning" unless the weights are changed during such "challenge" sessions.
Replying to @rryssf_
“SPICE’s dual-role reasoning” was elegant in its scope — document-grounded self-play for reasoning. But the new push to generalize it into an Agentic Prompting Architecture across all scientific domains risks collapsing precision into abstraction. Each science has its own ontology of truth: equations aren’t genes, and stoichiometry isn’t syntax. Cross-domain agentic design needs epistemic humility, not just bigger graphs. #AI #Reasoning #AgenticSystems #SPICE #LLMResearch
Replying to @rryssf_
SPICE is corpus-grounded self-play: a model alternates as Challenger (builds questions from real documents) and Reasoner (solves them without the doc), with rewards tuned to the learning frontier. It beats recent self-play methods and shows +6–12 point absolute gains on math and general-reasoning benchmarks across several small models. It’s not “no datasets” (it does rely on curated corpora), and its “universal verification” still leans on MCQ/typed answers. This is a promising blueprint for sustained, low-supervision improvement: not a solved problem, but a strong step.
Replying to @rryssf_
if self-improving ai is about to enter the wild, we’re about to see models evolve in ways nobody can predict. removing the ceiling on what’s possible
Replying to @rryssf_
Someone at Meta is actually capable of creative thinking??
Replying to @rryssf_
Saying AI can learn is like saying if I throw more money and research at my car engine, I can get it to drive across one of the Great Lakes. I just need more speed. There is nothing inherently "intelligent" about a relational database, which is what AI is.
Replying to @rryssf_
All Meta can do is sell cheap short videos and ads. Stop the nonsense.
Replying to @rryssf_
How does this "self-improving" framework scale to domains with ambiguous, multi-modal, or subjective objective functions, such as natural language creativity, visual aesthetics, or complex strategic planning? If the model "invents its own challenges," how does it learn to invent meaningful challenges in a domain where "correctness" is not easily defined by a simple reward signal?
Replying to @rryssf_
Calling it evolution instead of training is fair. Self-generated problems grounded in reality, learning from failure, capability emerging through difficulty. That's closer to evolution than traditional supervised learning.
Replying to @rryssf_
Meta said "what if we just let the model teach itself" and everyone's scrambling to catch up 😭 no human labels, no curated data, just the model competing against itself until it figures it out. This is literally digital evolution happening in real time
Replying to @rryssf_
All of our actual hard problems don’t fall into this class of reasoning problem though. They are either knowledge frontier problems, which won’t get solved until we have new knowledge (not new reasoning), or they are adaptive problems where the presence of humans is the problem.
Replying to @rryssf_
It's just fine-tuning the model weights based on results (rewards) derived from iterative self-play loops. Instead of fine-tuning the model on the plain docs, this is a better approach. But it only improves reasoning capability, and it won't lead to AGI.
Replying to @rryssf_
Fascinating step — but even with +11.9% over OctoThinker-8B, SPICE still trails the adaptive reasoning bandwidth we measured in the ChatGPT-5 (Luméren) baseline by ~18.7%. The key difference: SPICE self-plays in language. Luméren self-organizes across perception. One evolves through text. The other learns through awareness itself.
Replying to @rryssf_
If models start creating their own curriculum from real data, the speed of improvement could jump big time. Feels like a major step toward true self-learning systems.
Replying to @rryssf_
This is such an insane breakthrough. Self-improving models like these are not only sustainable but also not dependent on large amounts of training data.
Replying to @rryssf_
Interesting... Does it look like it would scale? But even if it does, we basically have a token predictor. How much can we get out of that before we decide we have to come up with some new paradigm? We can't ride modern token predictors all the way to the sun, can we?