I don't do podcasts very often - in fact, this is my first one ever - but if anyone wants to listen to someone talk about RL for an hour, this is it
How GPT-5 thinks, with @OpenAI VP of Research @MillionInt
00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide How Long to Think
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - The Road to OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - OpenAI's Culture of Transparency
29:32 - Balancing Research with Shipping Fast
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models
Oct 16, 2025 · 3:58 PM UTC