.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steelman of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don’t need a special training phase - the agent just learns on the fly, like all humans, and indeed, like all animals. This new paradigm would render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
Too bad there wasn’t more time for the “succession” topic at the end. Shameless plug for my post “‘The Era of Experience’ Has an Unsolved Technical Alignment Problem” alignmentforum.org/posts/TCG… and the related thread.
Replying to @RichardSSutton
(I didn’t intend to caricature!) (Thanks for sending that.) AIs’ goals will be a logical consequence of their source code, etc. Someone will write that source code. That someone will be indirectly “controlling the goals of AI,” in a certain sense. Right? If so, two points: (1/6)

Sep 27, 2025 · 8:47 PM UTC
