.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase: the agent simply learns on the fly, like all humans and, indeed, all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI

Sep 26, 2025 · 4:01 PM UTC

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc., to watch there and subscribe for future episodes.
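To make the contrast concrete, a minimal sketch of the kind of loop Sutton has in mind: an online Q-learning agent that updates from every step of experience, with no separate training phase. This is a toy illustration only; `LineWorld`, the reward, and all constants are invented for the example.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy stand-in environment: states 0..4 on a line; reaching state 4 gives reward.
# (Hypothetical example; any environment with reset/step would do.)
class LineWorld:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):            # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        return self.state, (1.0 if done else 0.0), done

# Online Q-learning: the agent learns from every step of experience,
# with no distinction between a "training phase" and "deployment".
q = defaultdict(float)                 # Q(state, action) estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # step size, discount, exploration rate
actions = (-1, +1)

env = LineWorld()
for episode in range(200):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda b: q[(s, b)])
        s2, r, done = env.step(a)
        # TD update applied immediately -- continual, on-the-fly learning
        target = r + (0.0 if done else gamma * max(q[(s2, b)] for b in actions))
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s2

print({k: round(v, 2) for k, v in q.items()})
```

The point is just the shape of the loop: every interaction immediately changes the agent. That is the property a deployed LLM with frozen weights lacks, and it is the crux of the disagreement above.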
you're killing it man... holy moly, what a guest. enjoy Edmonton
much of Sutton’s critique of LLMs is virtually identical to what I have been arguing for many, many years. it is disappointing @dwarkesh_sp that you would not let me present my views.
It is not clear to me what makes the LLM architecture per se unsuited for continual learning.
Good questions and respectful pushback. You're a very good interviewer. Thanks for sharing.
The “Don’t be difficult” part was funny. Indeed, it’s interesting to see how kids learn from trial and error. It feels like 99% of what they are doing: setting their own goals and trying to reach them. Exploration is another aspect. Give a kid any new toy or game and they will start pressing all the buttons at random, observing cause and effect. Even if you try to tell them how to use a toy, they will ignore you and keep doing random stuff.
Continual learning is the key! This is probably why building human-like memory for agents is getting so big. @zep_ai @danielchalef 👀
Great discussion. The question that still nags me is… where does the reward come from? rlbrew-workshop.github.io/pa…
@grok please assess Dr. Sutton's claim: "these things are well understood. when you go to look at how psychologists think about learning, there's nothing like imitation. Maybe there are some extreme cases where humans might do that or appear to do that. But there's no basic animal learning process called imitation."
Great episode. Listening to this convo, it struck me that a big part of the tension comes from you and Richard not quite aligning on an operational definition of intelligence.

You seem to approach the core of intelligence through the lens of what sets humans apart from other animals: the kind of smarts that drive tech breakthroughs, complex societies, and all the cultural richness we’ve built over the last few millennia. Richard, on the other hand, seems to be trying to reorient us away from mimicking human uniqueness and towards a more fundamental, ancient view: intelligence as the shared machinery of all complex adaptive systems, going back hundreds of millions of years (and perhaps beyond). Something about the raw processes of prediction, action, feedback, and goal pursuit that let any system (from cells to squirrels to AIs) navigate uncertainty and build world models through experience.

In that sense, Richard’s framing of teaching/pretraining as a ‘veneer’ mirrors Michael Levin’s repeated call ‘against mind blindness’. They are both saying something like “yes, humans are uniquely great and beautiful creatures, but the core of intelligence is actually much deeper and more general”. mdpi.com/1099-4300/24/6/819
wes anderson core
Meta agrees with this: if the current architectures could scale infinitely, it would make no sense to pay any individual researcher a $100M signing bonus, because that money would give you more leverage spent on data… so the current race is for new architectures.
Is this the father of RL or the father of drip?
He spoke the truth, and he is right. But unfortunately, most people won’t be able to understand it — not because of incapacity, but in the same way large language models cannot truly learn from reading what he learned through living.
Too bad there wasn’t more time for the “succession” topic at the end. Shameless plug for my post “The Era of Experience” Has An Unsolved Technical Alignment Problem alignmentforum.org/posts/TCG… and related thread
Replying to @RichardSSutton
(I didn’t intend to caricature!) (Thanks for sending that.) AIs’ goals will be a logical consequence of their source code, etc. Someone will write the source code. That someone will be indirectly “controlling the goals of AI”, in a certain sense. Right? If so, two points: (1/6)
I swear. There are a lot of people in the field whose knowledge is so narrow that they make ridiculous statements to a public for whom tech is magic.
Very much looking forward to this one.
Amazing conversation, thank you
To all the people in shock about Rich Sutton having very bad takes on LLMs (might I go so far as to use the R-slur): just remember that many of the most august and prestigious older physicists at the turn of the century dismissed quantum theory as crazy and wrong.
RCLs will replace GPTs. 🌊 Reasoning, Continuous, @LiquidAI_ nets. Generative 🚫: Token prediction is a thing of the past. ✅ Pre-Training 🚫: Future SOTA will be continuous-time (weights computed fluidly at runtime). ✅ Liquid nets will replace all Transformers. LPU > GPU > CPU
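For context on the “liquid” claim, a rough sketch of a liquid time-constant (LTC) style cell in the spirit of Hasani et al., assuming nothing about Liquid AI’s actual implementation; strictly speaking it is the effective time constants, not the weights, that vary at runtime. All sizes and parameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 4, 3                                # hidden units, input dim (toy sizes)

# Parameters of the gating network f (randomly initialized, untrained)
W_in, W_rec = rng.normal(0, 0.5, (H, D)), rng.normal(0, 0.5, (H, H))
bias = rng.normal(0, 0.1, H)
tau, A = np.ones(H), np.ones(H)            # base time constants, equilibrium targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, u, dt=0.05):
    # Input- and state-dependent gate: this is what makes the effective
    # time constants "liquid" -- they shift at runtime with the input.
    f = sigmoid(W_rec @ x + W_in @ u + bias)
    # Fused semi-implicit Euler step of dx/dt = -x/tau + f * (A - x)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

x = np.zeros(H)
for t in range(100):                       # toy input stream
    x = ltc_step(x, np.array([np.sin(0.1 * t), 0.0, 1.0]))
print(np.round(x, 3))
```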
loved the insights he shared, thanks for interviewing him
richard sutton is the godfather of reinforcement learning and a turing award winner. he recently went on @dwarkesh_sp's podcast to explain why LLMs are a dead end. i used okara’s youtube tool to summarize the interview. here are the top 10 insights he shared 👇
AGIs can't have goals if they don't have problems.
Great guest bro
perfection is overrated, self-evolving agents are the real plot twist
I really like the point on kids, especially having an active 2-year-old and a 3-month-old. Kids mimic to an extent, but they also push boundaries and surprise you with new types of behavior you don’t expect. It’s more challenging and limit-testing the world for a reaction than mimicking.
I think I understand what he’s getting at, and why it’s easy to miss. He isn’t saying large language models can’t achieve extraordinary things; he’s highlighting a real ceiling in how they learn—one I repeatedly meet when building real-world systems on top of them. 🧶
Legendary. I'm going to have to watch this one. Get my academic grandfather Andy Barto next!
This was an amazing one, @dwarkesh_sp, definitely my favorite yet
I agree! We need a new architecture for RL, but also for hallucination prevention; otherwise AI agents won’t be able to perform in the workforce.
What you need is something that mimics the climate of the memory’s actual signal. What you need is advanced reservoir computing.