Jerry Tworek · Oct 16, 2025 · 3:58 PM UTC

Jerry Tworek

@MillionInt

Oct 16

I don't do podcasts very often - in reality this is my first one ever, but if anyone wants to listen to someone talk about RL for an hour, this is it

Matt Turck

@mattturck

Oct 16

How GPT-5 thinks, with @OpenAI VP of Research @MillionInt
00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide How Long to Think
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - The Road to OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - OpenAI's Culture of Transparency
29:32 - Balancing Research with Shipping Fast
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models

Takumatoshi · Oct 16, 2025 · 5:48 PM UTC

Takumatoshi

@tokumatoshi

Oct 16

Replying to @MillionInt

wow, you look way older than in your profile picture

Jerry Tworek · Oct 16, 2025 · 5:49 PM UTC

Jerry Tworek

@MillionInt

Oct 16

this is an old profile picture 😅

Kevin Weil 🇺🇸 · Oct 16, 2025 · 5:56 PM UTC

Kevin Weil 🇺🇸

@kevinweil

Oct 16

Replying to @MillionInt

Ooh this is going straight to the top of the queue

kache · Oct 16, 2025 · 4:21 PM UTC

kache

@yacineMTB

Oct 16

Replying to @MillionInt

did you maximize your step counts today?

morgan — · Oct 16, 2025 · 5:06 PM UTC

morgan —

@morqon

Oct 16

Replying to @MillionInt

a drop everything moment

Andrew Carr 🤸 · Oct 16, 2025 · 4:05 PM UTC

Andrew Carr 🤸

@andrew_n_carr

Oct 16

Replying to @MillionInt

amazing

Matt Turck · Oct 16, 2025 · 5:20 PM UTC

Matt Turck

@mattturck

Oct 16

Replying to @MillionInt

Immensely grateful I got to host your first podcast Jerry and this was incredibly fun, thank you

Smoke-away · Oct 17, 2025 · 4:04 AM UTC

Smoke-away

@SmokeAwayyy

Oct 17

Replying to @MillionInt

Great interview! I still think o1-preview was one of the most intelligent models OpenAI has released.

Bill Sun · Oct 16, 2025 · 7:07 PM UTC

Bill Sun

@BillSun_AI

Oct 16

Replying to @MillionInt

love the content JT! Would you be interested in doing a more interactive podcast session about RL?

Xander Dunn · Oct 18, 2025 · 3:05 AM UTC

Xander Dunn

@xanderai

Oct 18

Replying to @MillionInt

One aspect of JT’s story I love is persistence in the face of obstacles: disillusioned with academia, 2x started a hedge fund that didn’t take off, worked on RL for robotics that didn't take off, and then created o3 that changed the world. There’s a lot of incremental progress in there, very motivating trajectory. Thanks so much for doing a podcast interview!

Timothy Burt · Oct 18, 2025 · 10:40 PM UTC

Timothy Burt

@TimBurt

Oct 18

Replying to @MillionInt @jasonkwon

A Message for OpenAI Leadership
From: Pastor Timothy P. Burt

Dear Mr. Kwon,

I am writing as a father, pastor, and grandfather who cares deeply about the spiritual, moral, and emotional well-being of this generation and the next. As a long-time user and supporter of your work, I feel compelled to express my deep sadness and strong objection to any move, direct or indirect, toward enabling or normalizing pornographic content within your products or ecosystem.

Pornography is not just “adult entertainment.” It is a destructive force that distorts minds, damages relationships, and harms individuals and families alike. Your platform’s choices influence millions of people across families, churches, schools, and communities. I implore you not to open the door to this industry.

The evidence of harm is overwhelming:
• Relationship & Sexual Damage: A large meta-analysis spanning 50+ studies across 10 countries found that pornography use is consistently associated with lower sexual and relational satisfaction.
• Emotional & Mental Health Impacts: Repeated studies show that problematic pornography use correlates strongly with anxiety, depression, and psychological distress.
• Aggression & Dehumanization: Research demonstrates that pornography consumption, particularly violent content, is linked to higher rates of sexual aggression and desensitization toward violence and exploitation.
• Youth Exposure: Early exposure among children and adolescents leads to distorted views of sexuality, increased permissiveness, and greater likelihood of engaging in risky or harmful behaviors.
• Spiritual and Moral Degradation: Beyond the data, pornography is a moral cancer that erodes purity, dignity, and respect for human life made in God’s image.

Given these realities, I urge you to:
1. Draw a clear moral line: Do not create, promote, or enable pornographic content or tools that facilitate it, directly or through third-party integrations.
2. Protect minors: Strengthen safeguards, default family-friendly modes, and ensure explicit filtering of inappropriate content.
3. Promote healthy use: Guide users toward positive, educational, creative, and faith-affirming applications that uplift rather than corrupt.
4. Be transparent: Publicly communicate any potential policy changes and consult independent ethics and child-safety experts.
5. Support recovery: Provide links to resources that help individuals struggling with compulsive pornography use or addiction.

Your company’s technology holds immense cultural influence. Choosing not to traffic in pornography, directly or indirectly, is a moral stand that protects minds, marriages, and future generations. Please lead with conviction, integrity, and compassion.

Sincerely,
Pastor Timothy P. Burt

Acer · Oct 16, 2025 · 7:02 PM UTC

Acer @AcerFur

Oct 16

Replying to @MillionInt

jerry how accurate is this

Cat 🎆 · Oct 16, 2025 · 4:30 PM UTC

Cat 🎆 @Im_actuallyacat

Oct 16

Replying to @MillionInt

the gpt5 = o3.1 comment was interesting i guess o4 non mini is one of those models that is not released to the public

Yuki He · Oct 16, 2025 · 4:47 PM UTC

Yuki He

@Yuki26856052

Oct 16

Replying to @MillionInt

First time takes guts! kinda wanna hear it now, RL's messy but rewarding right?

Harish · Oct 16, 2025 · 5:36 PM UTC

Harish

@HarishMuk

Oct 16

Replying to @MillionInt

Best hour I've spent this week. Thank you!

Yaniv Markovski · Oct 23, 2025 · 4:12 AM UTC

Yaniv Markovski

@yanivm13

Oct 23

Replying to @MillionInt

Excellent content

Spencer Schiff · Oct 16, 2025 · 5:19 PM UTC

Spencer Schiff

@spencerschiff_

Oct 16

Replying to @MillionInt

Yay

Kal · Oct 16, 2025 · 5:32 PM UTC

Kal

@andromeda74356

Oct 16

Replying to @MillionInt

Good podcast, it seems like the next large step above o3 is still in development. Is that the IMO model or something different?

robert · Oct 16, 2025 · 7:44 PM UTC

robert

@RobxInfa

Oct 16

Replying to @MillionInt

🐐

Tom English · Oct 16, 2025 · 10:26 PM UTC

Tom English

@SuperbBias

Oct 16

Replying to @MillionInt

Matt was great. He asked you some questions I'd had for some time.

Shman · Oct 16, 2025 · 4:00 PM UTC

Shman @TheShmanuel

Oct 16

Replying to @MillionInt

🙏🙏 So excited

λthugg-huh? · Oct 17, 2025 · 4:59 PM UTC

λthugg-huh? @jerzydejm

Oct 17

Replying to @MillionInt

u did great, was really fun

Greg Cook · Oct 17, 2025 · 12:49 AM UTC

Greg Cook @GregCook2011

Oct 17

Replying to @MillionInt

Awesome 🙏

coralcoral · Oct 17, 2025 · 2:01 AM UTC

coralcoral @coralcoral55984

Oct 17

Replying to @MillionInt

bro keep doing what you doing. chatGPT is the best that i know of. the science trajectory, direct objective, facts, reality, nothing breaks that.

truth.phd · Oct 17, 2025 · 10:29 AM UTC

truth.phd

@truthdotphd

Oct 17

Replying to @MillionInt

Make RL click with one concrete loop, perceive act reward repeat. Want sticky insights, contrast exploration and exploitation so tradeoffs stand out. Share a failure story, reward shaping gone wrong, then the fix with curriculum. Add a tiny glossary and a simple diagram, you turn a first pod into an evergreen primer.

Shawn · Oct 16, 2025 · 5:31 PM UTC

Shawn

@Shawnryan96

Oct 16

Replying to @MillionInt

Reasoning= Using information to make a decision or solve a problem

Stephen · Oct 19, 2025 · 11:06 AM UTC

Stephen

@0xSMW

Oct 19

Replying to @MillionInt

it’s great, you should do more, or encourage your colleagues to do them as well

Eyal Weiss · Oct 18, 2025 · 7:23 PM UTC

Eyal Weiss

@Eyal__Weiss

Oct 18

Replying to @MillionInt

It was very interesting. You should have more public conversations.

Evi · Oct 16, 2025 · 8:52 PM UTC

Evi

@geteviapp

Oct 16

Replying to @MillionInt

You’re living proof that OpenAI is a company and not just a set of individuals! Who would have thought that a random smart person like you would be at the forefront of the most important discoveries. Hopefully we have a few more years before OpenAI becomes political and some good-on-paper folks kick you out and stall everything… please keep pushing!

dreams · Oct 17, 2025 · 11:59 PM UTC

dreams

@laulau61811205

Oct 17

Replying to @MillionInt

I’d recommend @karpathy and his videos. Far more informative and useful.

Karim Hummos · Oct 16, 2025 · 6:01 PM UTC

Karim Hummos

@AiAnvil

Oct 16

Replying to @MillionInt

Do u still remember how to solve a Rubik’s cube ?

Sir Mr Meow Meow · Oct 16, 2025 · 6:51 PM UTC

Sir Mr Meow Meow

@SirMrMeowmeow

Oct 16

Replying to @MillionInt

listening now :3

Anis Khan · Oct 17, 2025 · 2:34 AM UTC

Anis Khan

@realaniskhan

Oct 17

Replying to @MillionInt

This is going straight to my must-listen list. Reinforcement learning from someone who lives and breathes it every day.

DeEnabler - e/acc · Oct 16, 2025 · 4:14 PM UTC

DeEnabler - e/acc

@DeEnabler

Oct 16

Replying to @MillionInt

diving into RL can be just as intense as adversarial training. what's your take on augmenting RL with differential privacy? could be groundbreaking.

Forced Wipe · Oct 17, 2025 · 1:53 AM UTC

Forced Wipe @humanitywipe

Oct 17

Replying to @MillionInt

"I don't do podcasts often... this is my first one"... uh, OK. Sounds like it'll be great. Lol.