I gave a talk on "The Era of Real-World Human Interaction" at @Google
It's great to see frontier AI labs like Google taking a strong interest in understanding users and evolving their models through user interaction.
Yes, while today's AI can win gold at the IMO, it often struggles to meet everyday user needs and adapt to personal preferences.
The challenge: how can we bridge the gap between training data and real user demands? How can AI get smarter through every conversation?
In this talk, I argue for moving beyond expert-annotated data → learning from real user conversations. Our method, Reinforcement Learning from Human Interaction (RLHI), offers a simple, concrete approach.
RLHI learns directly from in-the-wild conversations:
(1) User-Guided Rewrites – revises unsatisfactory model outputs based on users' natural-language follow-up responses;
(2) User-Based Rewards – learns via a reward model conditioned on a persona distilled from the user's long-term interaction history.
Together, they link long-term user personas to turn-level preferences via persona-conditioned preference optimization. Trained on WildChat, RLHI outperforms RLHF in personalization and instruction-following.
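For intuition, here's a minimal sketch of what persona-conditioned preference optimization could look like, assuming a DPO-style pairwise loss where the chosen response is, e.g., a user-guided rewrite and all log-probs condition on a persona summary. The helper names and prompt format below are hypothetical illustrations, not the paper's actual API:

```python
# Minimal sketch of persona-conditioned preference optimization,
# assuming a DPO-style pairwise objective (the paper's exact loss may differ).
import torch
import torch.nn.functional as F

def build_persona_prompt(persona_summary: str, user_message: str) -> str:
    # Hypothetical prompt format: prepend a summary of the user's
    # long-term interaction history (the "persona") to the current turn.
    return f"[User persona]\n{persona_summary}\n\n[User]\n{user_message}\n\n[Assistant]\n"

def persona_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | persona, x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | persona, x)
    ref_chosen_logps: torch.Tensor,       # same log-probs under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit rewards are the policy/reference log-ratio, scaled by beta.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry objective: push the preferred response (e.g., the
    # user-guided rewrite) above the original unsatisfactory output.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```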
Thanks @maximillianc_ for the invite!