I don't do podcasts very often - in reality this is my first one ever, but if anyone wants to listen to someone talk about RL for an hour, this is it
How GPT-5 thinks, with @OpenAI VP of Research @MillionInt
00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide How Long to Think
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - The Road to OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - OpenAI's Culture of Transparency
29:32 - Balancing Research with Shipping Fast
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models

Oct 16, 2025 · 3:58 PM UTC

Replying to @MillionInt
wow, you look way older than in your profile picture
this is an old profile picture 😅
Replying to @MillionInt
Ooh this is going straight to the top of the queue
Replying to @MillionInt
did you maximize your step counts today?
Replying to @MillionInt
a drop everything moment
Replying to @MillionInt
Immensely grateful I got to host your first podcast Jerry and this was incredibly fun, thank you
Replying to @MillionInt
Great interview! I still think o1-preview was one of the most intelligent models OpenAI has released.
Replying to @MillionInt
love the content JT! Would you be interested in doing a more interactive podcast session about RL?
Replying to @MillionInt
One aspect of JT’s story I love is persistence in the face of obstacles: disillusioned with academia, 2x started a hedge fund that didn’t take off, worked on RL for robotics that didn't take off, and then created o3 that changed the world. There’s a lot of incremental progress in there, very motivating trajectory. Thanks so much for doing a podcast interview!
A Message for OpenAI Leadership
From: Pastor Timothy P. Burt
Dear Mr. Kwon,
I am writing as a father, pastor, and grandfather who cares deeply about the spiritual, moral, and emotional well-being of this generation and the next. As a long-time user and supporter of your work, I feel compelled to express my deep sadness and strong objection to any move, direct or indirect, toward enabling or normalizing pornographic content within your products or ecosystem.
Pornography is not just “adult entertainment.” It is a destructive force that distorts minds, damages relationships, and harms individuals and families alike. Your platform’s choices influence millions of people across families, churches, schools, and communities. I implore you not to open the door to this industry.
The evidence of harm is overwhelming:
• Relationship & Sexual Damage: A large meta-analysis spanning 50+ studies across 10 countries found that pornography use is consistently associated with lower sexual and relational satisfaction.
• Emotional & Mental Health Impacts: Repeated studies show that problematic pornography use correlates strongly with anxiety, depression, and psychological distress.
• Aggression & Dehumanization: Research demonstrates that pornography consumption, particularly violent content, is linked to higher rates of sexual aggression and desensitization toward violence and exploitation.
• Youth Exposure: Early exposure among children and adolescents leads to distorted views of sexuality, increased permissiveness, and greater likelihood of engaging in risky or harmful behaviors.
• Spiritual and Moral Degradation: Beyond the data, pornography is a moral cancer that erodes purity, dignity, and respect for human life made in God’s image.
Given these realities, I urge you to:
1. Draw a clear moral line: Do not create, promote, or enable pornographic content or tools that facilitate it, directly or through third-party integrations.
2. Protect minors: Strengthen safeguards, default family-friendly modes, and ensure explicit filtering of inappropriate content.
3. Promote healthy use: Guide users toward positive, educational, creative, and faith-affirming applications that uplift rather than corrupt.
4. Be transparent: Publicly communicate any potential policy changes and consult independent ethics and child-safety experts.
5. Support recovery: Provide links to resources that help individuals struggling with compulsive pornography use or addiction.
Your company’s technology holds immense cultural influence. Choosing not to traffic in pornography, directly or indirectly, is a moral stand that protects minds, marriages, and future generations. Please lead with conviction, integrity, and compassion.
Sincerely,
Pastor Timothy P. Burt
Replying to @MillionInt
jerry how accurate is this
Replying to @MillionInt
the gpt5 = o3.1 comment was interesting. i guess o4 non-mini is one of those models that is not released to the public
Replying to @MillionInt
First time takes guts! kinda wanna hear it now, RL's messy but rewarding right?
Replying to @MillionInt
Best hour I've spent this week. Thank you!
Replying to @MillionInt
Excellent content
Replying to @MillionInt
Good podcast, it seems like the next large step above o3 is still in development. Is that the IMO model or something different?
Replying to @MillionInt
🐐
Replying to @MillionInt
Matt was great. He asked you some questions I'd had for some time.
Replying to @MillionInt
🙏🙏 So excited
Replying to @MillionInt
u did great, was really fun
Replying to @MillionInt
Awesome 🙏
Replying to @MillionInt
bro keep doing what you doing. chatGPT is the best that i know of. the science trajectory, direct objective, facts, reality, nothing breaks that.
Replying to @MillionInt
Make RL click with one concrete loop, perceive act reward repeat. Want sticky insights, contrast exploration and exploitation so tradeoffs stand out. Share a failure story, reward shaping gone wrong, then the fix with curriculum. Add a tiny glossary and a simple diagram, you turn a first pod into an evergreen primer.
Replying to @MillionInt
Reasoning = using information to make a decision or solve a problem
Replying to @MillionInt
it’s great, you should do more, or encourage your colleagues to do them as well
Replying to @MillionInt
It was very interesting. You should have more public conversations.
Replying to @MillionInt
You’re a living proof that OpenAI is a company and not just set of individuals! Who would have thought that a random smart person like you will be at the forefront of most important discoveries. Hopefully we have a few more years before OpenAI becomes political and some good on paper folks kick you out and stall everything… please keep pushing!
Replying to @MillionInt
I’d recommend @karpathy and his videos. Far more informative and useful.
Replying to @MillionInt
Do u still remember how to solve a Rubik’s cube?
Replying to @MillionInt
listening now :3
Replying to @MillionInt
This is going straight to my must-listen list. Reinforcement learning from someone who lives and breathes it every day.
Replying to @MillionInt
diving into RL can be just as intense as adversarial training. what's your take on augmenting RL with differential privacy? could be groundbreaking.
Replying to @MillionInt
"I don't do podcasts often... this is my first one." Uh, OK. Sounds like it'll be great. Lol.