.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steelman of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don’t need a special training phase - the agent just learns on the fly, like all humans, and indeed, like all animals. This new paradigm would render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
Too bad there wasn’t more time for the “succession” topic at the end. Shameless plug for my post “‘The Era of Experience’ Has an Unsolved Technical Alignment Problem” alignmentforum.org/posts/TCG… and the related thread.
Replying to @RichardSSutton
(I didn’t intend to caricature!) (Thanks for sending that.) AIs’ goals will be a logical consequence of their source code, etc. Someone will write that source code. That someone will be indirectly “controlling the goals of AI,” in a certain sense. Right? If so, two points: (1/6)

Sep 27, 2025 · 8:47 PM UTC
