.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.
0:00:00 – Are LLMs a dead-end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
much of Sutton’s critique of LLMs is virtually identical to what I have been arguing for many many years. it is disappointing @dwarkesh_sp that you would not let me present my views.
Sutton gets his own frame bias of TD learning algorithm?

Sep 27, 2025 · 2:58 AM UTC