RIP fine-tuning ☠️

This new Stanford paper just killed it.

It’s called “Agentic Context Engineering (ACE)” and it shows you can make models smarter without touching a single weight.

Instead of retraining, ACE evolves the context itself. The model writes, reflects, and edits its own prompt over and over until it becomes a self-improving system.

Think of it like the model keeping a growing notebook of what works. Each failure becomes a strategy. Each success becomes a rule.

The results are absurd:
+10.6% better than GPT-4-powered agents on AppWorld
+8.6% on finance reasoning
86.9% lower latency and 80% lower cost

No labels. Just feedback.

Everyone’s been obsessed with “short, clean” prompts. ACE flips that: it builds long, detailed, evolving playbooks that never forget. And it works because LLMs don’t want simplicity, they want *context density*.

If this scales, the next generation of AI won’t be “fine-tuned.” It’ll be self-tuned.

We’re entering the era of living prompts.

Oct 9, 2025 · 12:52 PM UTC

Here’s how ACE works 👇

It splits the model’s brain into 3 roles:

- Generator - runs the task
- Reflector - critiques what went right or wrong
- Curator - updates the context with only what matters

Each loop adds delta updates: small context changes that never overwrite old knowledge.

It’s literally the first agent framework that grows its own prompt.
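In code, one pass of that loop looks roughly like the sketch below. This is a minimal Python sketch, not the paper’s implementation: the `llm()` helper, the role prompts, and the playbook-as-list format are all assumptions for illustration.

```python
# Minimal sketch of the Generator/Reflector/Curator loop described above.
# The `llm()` stand-in, role prompts, and playbook format are assumptions.

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call (OpenAI, local model, etc.)."""
    raise NotImplementedError

def ace_step(playbook: list[str], task: str) -> list[str]:
    context = "\n".join(playbook)

    # Generator: attempts the task with the current playbook as context.
    attempt = llm(f"Playbook:\n{context}\n\nTask: {task}\nSolve it step by step.")

    # Reflector: critiques the attempt, surfacing what worked and what failed.
    critique = llm(f"Task: {task}\nAttempt:\n{attempt}\n"
                   "List what went right and what went wrong, one point per line.")

    # Curator: distills the critique into delta entries and appends them.
    # It never rewrites or deletes existing entries.
    delta = llm(f"Critique:\n{critique}\n"
                "Extract only new, reusable strategies, one per line.")
    playbook.extend(line.strip() for line in delta.splitlines() if line.strip())
    return playbook
```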
Every prior method had one fatal flaw: context collapse. Models rewrite their entire prompt each time → it gets shorter → details vanish → accuracy tanks. In the paper, one model’s accuracy fell from 66.7 → 57.1 after a single rewrite. ACE fixes that by never rewriting the full context - only updating what changed.
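A toy contrast makes the failure mode concrete. In the sketch below (my assumption: the context is just a list of strategy strings, and the function names are illustrative), a full rewrite silently drops anything the model forgets to re-emit, while a delta update can only add or explicitly retire entries.

```python
# Toy contrast between the two update styles. Names are illustrative.

context = ["retry API calls on 429", "validate dates before booking"]

# Full rewrite: the model re-emits the whole prompt from scratch, and any
# entry it forgets to re-emit is gone for good. That's context collapse.
def full_rewrite(old_context: list[str], rewritten: list[str]) -> list[str]:
    return rewritten

# ACE-style delta: updates only add (or explicitly retire) entries, so
# earlier knowledge survives every loop.
def apply_delta(context: list[str], additions: list[str],
                deprecated: tuple[str, ...] = ()) -> list[str]:
    kept = [entry for entry in context if entry not in deprecated]
    return kept + additions

context = apply_delta(context, ["log tool errors verbatim"])
assert "retry API calls on 429" in context  # nothing silently dropped
```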
The numbers are ridiculous. ACE beat every major baseline:

- +10.6% on AppWorld (agents)
- +8.6% on FiNER (finance)

and matched GPT-4.1-powered IBM CUGA using a smaller open-source model. And it cut rollout latency by 86.9% while lowering cost by 80%.
Fine-tuning updates weights. ACE updates understanding. It’s cheaper, interpretable, and reversible. You can literally watch how your AI learns, one context delta at a time. This is the start of agentic self-learning where prompts become the new model weights.
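To see why that’s interpretable and reversible, here’s a minimal sketch of a delta log, with the record format assumed for illustration: every update is a plain-text entry you can read, diff, or roll back, which a weight update never gives you.

```python
# Sketch of an inspectable, reversible delta log. The record format is an
# assumption for illustration, not the paper's.

from datetime import datetime, timezone

delta_log: list[dict] = []

def commit_delta(playbook: list[str], entry: str) -> None:
    """Append one strategy and record when it was learned."""
    playbook.append(entry)
    delta_log.append({"entry": entry,
                      "at": datetime.now(timezone.utc).isoformat()})

def revert_last(playbook: list[str]) -> None:
    """Undo the most recent delta, something a weight update can't offer."""
    if delta_log:
        playbook.remove(delta_log.pop()["entry"])
```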
ACE points to a wild future: AI systems that don’t just reason, they remember. Instead of retraining models, we’ll train contexts. Each system carries a living memory that evolves across sessions, domains, and users. The next breakthroughs won’t come from bigger models… They’ll come from smarter context architectures.
The AI prompt library your competitors don't want you to find
→ Biggest collection of text & image prompts
→ Unlimited custom prompts
→ Lifetime access & updates
Grab it before it's gone 👇 godofprompt.ai/pricing
Replying to @rryssf_
Evolving contexts, system prompts, adapters. This is the way.
this is the way
Replying to @rryssf_
It's a great method but orthogonal to fine-tuning. However, fine-tuning will always be necessary to reduce the number of tokens needed.
Replying to @rryssf_
This is why I told writers they were made for AI.
Replying to @rryssf_
oh wow! it feels like the natural evolution of prompt engineering into a self-adaptive system
Replying to @rryssf_
Context is everything.
Replying to @rryssf_
stanford is not stopping
Replying to @rryssf_
But fine-tuning has been obsolete for a while now, since LLMs keep getting smarter. It still made sense maybe a year ago.
Replying to @rryssf_
this is now a breakthrough
Replying to @rryssf_
Nice! 💯
Replying to @rryssf_
Context engineering and model fine-tuning address distinct challenges and operate under different sets of constraints.
Replying to @rryssf_
on the same shit ur on @jicapal
Replying to @rryssf_
Always wary of "RIP" claims but I think this is a great write-up. ✨🤘
Replying to @rryssf_
This is DSPy, nothing new, but yes, extremely useful.
Replying to @rryssf_
This is exactly the magic of DSPy, nothing new.
Replying to @rryssf_
This is a method of fine-tuning in itself, is it not?
Replying to @rryssf_
That is not the whole story. While the study correctly highlights the importance of refining inputs, this is not innovative and amounts to simple prompt engineering. What is truly revealing is that fine-tuning is shown to be ineffective and misguided. What the study overlooks is the mistaken belief that prompting alone is sufficient. Welcome to the AI medieval era, where ignorance and myths fill nearly every research paper. ai-cosmos.hashnode.dev/the-r…
Replying to @rryssf_
SambaNova team, lets go!!! 🦾 💜
Replying to @rryssf_
Self-improving prompts could redefine efficiency in AI deployment.
Replying to @rryssf_
Thank god! I think fine tuning has resulted in all the extra information in these responses that the prompt didn’t ask for. gpt 5 thinking is def over tuned
Replying to @rryssf_
So Mira is wrong?
Today we launched Tinker. Tinker brings frontier tools to researchers, offering clean abstractions for writing experiments and training pipelines while handling distributed training complexity. It enables novel research, custom models, and solid baselines. Excited to see what people build.
Replying to @rryssf_
killer paper love the innovation
Replying to @rryssf_
Making models smarter just by improving context does not mean there isn't value in improving the model for your needs. "RIP fine-tuning" completely missed the point.
Replying to @rryssf_
>RIP fine-tuning ☠️ A little excessive, no?
Replying to @rryssf_
It goes a lot deeper than the prompts: check out GEPA and DSPy! We’re working to streamline this whole self-improvement feedback loop in an open ecosystem @modaicdev
Replying to @rryssf_
Maybe I’m just dumb, but wouldn’t this over time stop working because of the context window?