.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase: the agent simply learns on the fly, like all humans and, indeed, all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI

Sep 26, 2025 · 4:01 PM UTC

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc., to watch there and subscribe for future episodes.
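To make the contrast concrete, a minimal sketch of the kind of loop Sutton has in mind: an online Q-learning agent that updates from every step of experience, with no separate training phase. This is a toy illustration only; `LineWorld`, the reward, and all constants are invented for the example.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy stand-in environment: states 0..4 on a line; reaching state 4 gives reward.
# (Hypothetical example; any environment with reset/step would do.)
class LineWorld:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):            # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        return self.state, (1.0 if done else 0.0), done

# Online Q-learning: the agent learns from every step of experience,
# with no distinction between a "training phase" and "deployment".
q = defaultdict(float)                 # Q(state, action) estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # step size, discount, exploration rate
actions = (-1, +1)

env = LineWorld()
for episode in range(200):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda b: q[(s, b)])
        s2, r, done = env.step(a)
        # TD update applied immediately -- continual, on-the-fly learning
        target = r + (0.0 if done else gamma * max(q[(s2, b)] for b in actions))
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s2

print({k: round(v, 2) for k, v in q.items()})
```

The point is just the shape of the loop: every interaction immediately changes the agent. That is the property a deployed LLM with frozen weights lacks, and it is the crux of the disagreement above.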
you're killing it man... holy moly, what a guest. enjoy Edmonton
much of Sutton’s critique of LLMs is virtually identical to what I have been arguing for many, many years. it is disappointing @dwarkesh_sp that you would not let me present my views.
It is not clear to me what makes the LLM architecture per se unsuited for continual learning.
Good questions and respectful pushback. You're a very good interviewer. Thanks for sharing.
The “Don’t be difficult” part was funny. Indeed, it’s interesting to see how kids learn from trial and error. It feels like 99% of what they are doing: setting their own goals and trying to reach them. Exploration is another aspect. Give a kid any new toy or game and they will start pressing all the buttons at random, observing cause and effect. Even if you try to tell them how to use a toy, they will ignore you and keep doing random stuff.
Continual learning is the key! This is probably why building human-like memory for agents is getting so big. @zep_ai @danielchalef 👀
Great discussion. The question that still nags me is… where does the reward come from? rlbrew-workshop.github.io/pa…
@grok please assess Dr. Sutton's claim: "these things are well understood. when you go to look at how psychologists think about learning, there's nothing like imitation. Maybe there are some extreme cases where humans might do that or appear to do that. But there's no basic animal learning process called imitation."
Great episode. Listening to this convo, it struck me that a big part of the tension comes from you and Richard not quite aligning on an operational definition of intelligence.

You seem to approach the core of intelligence through the lens of what sets humans apart from other animals: the kind of smarts that drive tech breakthroughs, complex societies, and all the cultural richness we’ve built over the last few millennia. Richard, on the other hand, seems to be trying to reorient us away from mimicking human uniqueness and towards a more fundamental, ancient view: intelligence as the shared machinery of all complex adaptive systems, going back hundreds of millions of years (and perhaps beyond). Something about the raw processes of prediction, action, feedback, and goal pursuit that let any system (from cells to squirrels to AIs) navigate uncertainty and build world models through experience.

In that sense, Richard’s framing of teaching/pretraining as a ‘veneer’ mirrors Michael Levin’s repeated call ‘against mind blindness’. They are both saying something like “yes, humans are uniquely great and beautiful creatures, but the core of intelligence is actually much deeper and more general”. mdpi.com/1099-4300/24/6/819
wes anderson core
Meta agrees with this: if the current architectures could scale infinitely, it would make no sense to pay any individual researcher a $100M signing bonus, because that money would give you more leverage spent on data… so the current race is for new architectures.
Is this the father of RL or the father of drip?
He spoke the truth, and he is right. But unfortunately, most people won’t be able to understand it — not because of incapacity, but in the same way large language models cannot truly learn from reading what he learned through living.
Too bad there wasn’t more time for the “succession” topic at the end. Shameless plug for my post “The Era of Experience” Has An Unsolved Technical Alignment Problem alignmentforum.org/posts/TCG… and related thread
Replying to @RichardSSutton
(I didn’t intend to caricature!) (Thanks for sending that.) AIs’ goals will be a logical consequence of their source code, etc. Someone will write the source code. That someone will be indirectly “controlling the goals of AI”, in a certain sense. Right? If so, two points: (1/6)
I swear. There are a lot of people in the field whose knowledge is so narrow that they make ridiculous statements to a public for whom tech is magic.
Very much looking forward to this one.
Amazing conversation, thank you
To all the people in shock about Rich Sutton having very bad takes on LLMs (might I go so far as to use the R-slur): just remember that many of the most august and prestigious older physicists at the turn of the century dismissed quantum theory as crazy and wrong.
RCLs will replace GPTs. 🌊 Reasoning, Continuous, @LiquidAI_ nets. Generative 🚫: Token prediction is a thing of the past. ✅ Pre-Training 🚫: Future SOTA will be continuous-time (weights computed fluidly at runtime). ✅ Liquid nets will replace all Transformers. LPU > GPU > CPU
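For context on the “liquid” claim, a rough sketch of a liquid time-constant (LTC) style cell in the spirit of Hasani et al., assuming nothing about Liquid AI’s actual implementation; strictly speaking it is the effective time constants, not the weights, that vary at runtime. All sizes and parameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 4, 3                                # hidden units, input dim (toy sizes)

# Parameters of the gating network f (randomly initialized, untrained)
W_in, W_rec = rng.normal(0, 0.5, (H, D)), rng.normal(0, 0.5, (H, H))
bias = rng.normal(0, 0.1, H)
tau, A = np.ones(H), np.ones(H)            # base time constants, equilibrium targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, u, dt=0.05):
    # Input- and state-dependent gate: this is what makes the effective
    # time constants "liquid" -- they shift at runtime with the input.
    f = sigmoid(W_rec @ x + W_in @ u + bias)
    # Fused semi-implicit Euler step of dx/dt = -x/tau + f * (A - x)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

x = np.zeros(H)
for t in range(100):                       # toy input stream
    x = ltc_step(x, np.array([np.sin(0.1 * t), 0.0, 1.0]))
print(np.round(x, 3))
```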
loved the insights he shared, thanks for interviewing him
richard sutton is the godfather of reinforcement learning and a turing award winner. he recently went on @dwarkesh_sp's podcast to explain why LLMs are a dead end. i used okara’s youtube tool to summarize the interview. here are the top 10 insights he shared 👇
AGIs can't have goals if they don't have problems.
Great guest bro
perfection is overrated, self-evolving agents are the real plot twist
I really like the point on kids, especially having an active 2-year-old and a 3-month-old. Kids mimic to an extent, but they also push boundaries and surprise you with new types of behavior you don’t expect. It’s more challenging and limit-testing the world for a reaction than mimicking.
I think I understand what he’s getting at, and why it’s easy to miss. He isn’t saying large language models can’t achieve extraordinary things; he’s highlighting a real ceiling in how they learn—one I repeatedly meet when building real-world systems on top of them. 🧶
Legendary. I'm going to have to watch this one. Get my academic grandfather Andy Barto next!
This was an amazing one, @dwarkesh_sp, definitely my favorite yet
I agree! We need a new architecture for RL, but also for hallucination prevention; otherwise AI agents won’t be able to perform in the workforce.
What you need is something that mimics the climate of the memory’s actual signal. What you need is advanced reservoir computing.