RIP fine-tuning ☠️

This new Stanford paper just killed it.

It’s called “Agentic Context Engineering (ACE)” and it shows you can make models smarter without touching a single weight.

Instead of retraining, ACE evolves the context itself. The model writes, reflects, and edits its own prompt over and over until it becomes a self-improving system.

Think of it like the model keeping a growing notebook of what works. Each failure becomes a strategy. Each success becomes a rule.

The results are absurd:
+10.6% better than GPT-4-powered agents on AppWorld
+8.6% on finance reasoning
86.9% lower latency and 80% lower cost

No labels. Just feedback.

Everyone’s been obsessed with “short, clean” prompts. ACE flips that: it builds long, detailed, evolving playbooks that never forget. And it works because LLMs don’t want simplicity, they want *context density*.

If this scales, the next generation of AI won’t be “fine-tuned.” It’ll be self-tuned.

We’re entering the era of living prompts.

Oct 9, 2025 · 12:52 PM UTC

Here’s how ACE works 👇

It splits the model’s brain into 3 roles:

- Generator - runs the task
- Reflector - critiques what went right or wrong
- Curator - updates the context with only what matters

Each loop adds delta updates: small context changes that never overwrite old knowledge.

It’s literally the first agent framework that grows its own prompt.
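In code, one pass of that loop looks roughly like the sketch below. This is a minimal Python sketch, not the paper’s implementation: the `llm()` helper, the role prompts, and the playbook-as-list format are all assumptions for illustration.

```python
# Minimal sketch of the Generator/Reflector/Curator loop described above.
# The `llm()` stand-in, role prompts, and playbook format are assumptions.

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call (OpenAI, local model, etc.)."""
    raise NotImplementedError

def ace_step(playbook: list[str], task: str) -> list[str]:
    context = "\n".join(playbook)

    # Generator: attempts the task with the current playbook as context.
    attempt = llm(f"Playbook:\n{context}\n\nTask: {task}\nSolve it step by step.")

    # Reflector: critiques the attempt, surfacing what worked and what failed.
    critique = llm(f"Task: {task}\nAttempt:\n{attempt}\n"
                   "List what went right and what went wrong, one point per line.")

    # Curator: distills the critique into delta entries and appends them.
    # It never rewrites or deletes existing entries.
    delta = llm(f"Critique:\n{critique}\n"
                "Extract only new, reusable strategies, one per line.")
    playbook.extend(line.strip() for line in delta.splitlines() if line.strip())
    return playbook
```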
Every prior method had one fatal flaw: context collapse. Models rewrite their entire prompt each time → it gets shorter → details vanish → accuracy tanks. In the paper, one model’s accuracy fell from 66.7 → 57.1 after a single rewrite. ACE fixes that by never rewriting the full context - only updating what changed.
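A toy contrast makes the failure mode concrete. In the sketch below (my assumption: the context is just a list of strategy strings, and the function names are illustrative), a full rewrite silently drops anything the model forgets to re-emit, while a delta update can only add or explicitly retire entries.

```python
# Toy contrast between the two update styles. Names are illustrative.

context = ["retry API calls on 429", "validate dates before booking"]

# Full rewrite: the model re-emits the whole prompt from scratch, and any
# entry it forgets to re-emit is gone for good. That's context collapse.
def full_rewrite(old_context: list[str], rewritten: list[str]) -> list[str]:
    return rewritten

# ACE-style delta: updates only add (or explicitly retire) entries, so
# earlier knowledge survives every loop.
def apply_delta(context: list[str], additions: list[str],
                deprecated: tuple[str, ...] = ()) -> list[str]:
    kept = [entry for entry in context if entry not in deprecated]
    return kept + additions

context = apply_delta(context, ["log tool errors verbatim"])
assert "retry API calls on 429" in context  # nothing silently dropped
```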
The numbers are ridiculous. ACE beat every major baseline:

- +10.6% on AppWorld (agents)
- +8.6% on FiNER (finance)

and matched GPT-4.1-powered IBM CUGA using a smaller open-source model. And it cut rollout latency by 86.9% while lowering cost by 80%.
Fine-tuning updates weights. ACE updates understanding. It’s cheaper, interpretable, and reversible. You can literally watch how your AI learns, one context delta at a time. This is the start of agentic self-learning where prompts become the new model weights.
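To see why that’s interpretable and reversible, here’s a minimal sketch of a delta log, with the record format assumed for illustration: every update is a plain-text entry you can read, diff, or roll back, which a weight update never gives you.

```python
# Sketch of an inspectable, reversible delta log. The record format is an
# assumption for illustration, not the paper's.

from datetime import datetime, timezone

delta_log: list[dict] = []

def commit_delta(playbook: list[str], entry: str) -> None:
    """Append one strategy and record when it was learned."""
    playbook.append(entry)
    delta_log.append({"entry": entry,
                      "at": datetime.now(timezone.utc).isoformat()})

def revert_last(playbook: list[str]) -> None:
    """Undo the most recent delta, something a weight update can't offer."""
    if delta_log:
        playbook.remove(delta_log.pop()["entry"])
```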
ACE points to a wild future: AI systems that don’t just reason, they remember. Instead of retraining models, we’ll train contexts. Each system carries a living memory that evolves across sessions, domains, and users. The next breakthroughs won’t come from bigger models… They’ll come from smarter context architectures.
The AI prompt library your competitors don't want you to find
→ Biggest collection of text & image prompts
→ Unlimited custom prompts
→ Lifetime access & updates
Grab it before it's gone 👇 godofprompt.ai/pricing
Replying to @rryssf_
Evolving contexts, system prompts, adapters. This is the way.
this is the way
Replying to @rryssf_
It's a great method but orthogonal to fine-tuning. However, fine-tuning will always be necessary to reduce the number of tokens needed.
Replying to @rryssf_
This is why I told writers they were made for AI.
Replying to @rryssf_
oh wow! it feels like the natural evolution of prompt engineering into a self-adaptive system
Replying to @rryssf_
Context is everything.
Replying to @rryssf_
stanford is not stopping
Replying to @rryssf_
But fine-tuning has been obsolete for a while now, since LLMs keep getting smarter. It still made sense maybe a year ago.
Replying to @rryssf_
this is now a breakthrough
Replying to @rryssf_
Nice! 💯
Replying to @rryssf_
Context engineering and model fine-tuning address distinct challenges and operate under different sets of constraints.
Replying to @rryssf_
on the same shit ur on @jicapal
Replying to @rryssf_
Always wary of "RIP" claims but I think this is a great write-up. ✨🤘
Replying to @rryssf_
This is DSPy, nothing new, but yes, extremely useful.
Replying to @rryssf_
This is exactly the magic of DSPy, nothing new.
Replying to @rryssf_
This is a method of fine-tuning in itself, is it not?
Replying to @rryssf_
That is not the whole story. While the study correctly highlights the importance of refining inputs, this is not innovative and amounts to simple prompt engineering. What is truly revealing is that fine-tuning is shown to be ineffective and misguided. What the study overlooks is the mistaken belief that prompting alone is sufficient. Welcome to the AI medieval era, where ignorance and myths fill nearly every research paper. ai-cosmos.hashnode.dev/the-r…
Replying to @rryssf_
SambaNova team, lets go!!! 🦾 💜
Replying to @rryssf_
Self-improving prompts could redefine efficiency in AI deployment.
Replying to @rryssf_
Thank god! I think fine tuning has resulted in all the extra information in these responses that the prompt didn’t ask for. gpt 5 thinking is def over tuned
Replying to @rryssf_
So Mira is wrong?
Today we launched Tinker. Tinker brings frontier tools to researchers, offering clean abstractions for writing experiments and training pipelines while handling distributed training complexity. It enables novel research, custom models, and solid baselines. Excited to see what people build.
Replying to @rryssf_
killer paper love the innovation
Replying to @rryssf_
Making models smarter just by improving context does not mean there isn't value in improving the model for your needs. "RIP fine-tuning" completely missed the point.
Replying to @rryssf_
>RIP fine-tuning ☠️ A little excessive, no?
Replying to @rryssf_
It goes a lot deeper than the prompts: check out GEPA and DSPy! We’re working to streamline this whole self-improvement feedback loop in an open ecosystem @modaicdev
Replying to @rryssf_
Maybe I’m just dumb, but wouldn’t this over time stop working because of the context window?