I gave a talk on the Era of Real-World Human Interaction @Google. It's great to see frontier AI labs like Google taking a strong interest in understanding users and evolving their models through user interaction. While today's AI can win gold at the IMO, it often struggles to meet daily user needs and adapt to personal preferences. The challenge: how can we bridge the gap between training data and real user demands? How can AI get smarter with every conversation?

In this talk, I argue for moving beyond expert-annotated data → learning from real user conversations. Our method, Reinforcement Learning from Human Interaction (RLHI), offers a simple, concrete approach. RLHI learns directly from in-the-wild conversations:
(1) User-Guided Rewrites – revise unsatisfactory model outputs based on the user's natural-language follow-up responses;
(2) User-Based Rewards – learn via a reward model conditioned on the user's long-term interaction history (termed a persona).

Together, these link long-term user personas to turn-level preferences via persona-conditioned preference optimization. Trained on WildChat, RLHI outperforms RLHF in personalization and instruction-following.

Thanks @maximillianc_ for the invite!
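For anyone curious what the persona-conditioned preference step could look like in code, here is a minimal, hypothetical sketch: a DPO-style pairwise objective where a summary of the user's long-term persona is prepended to the prompt before scoring responses. The prompt format, function names, and hyperparameters are illustrative assumptions, not the actual RLHI implementation.

import torch
import torch.nn.functional as F

def build_conditioned_prompt(persona_summary: str, prompt: str) -> str:
    # Hypothetical prompt format: the user's long-term persona summary is
    # prepended to the current turn so preferences are judged in context.
    return f"[User persona]\n{persona_summary}\n\n[User message]\n{prompt}"

def persona_conditioned_dpo_loss(policy_chosen_logp: torch.Tensor,
                                 policy_rejected_logp: torch.Tensor,
                                 ref_chosen_logp: torch.Tensor,
                                 ref_rejected_logp: torch.Tensor,
                                 beta: float = 0.1) -> torch.Tensor:
    # Standard DPO-style pairwise loss; persona conditioning enters only
    # through how the sequence log-probs above were computed (the model
    # scored chosen/rejected responses given the persona-conditioned prompt).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage with made-up sequence log-probabilities for two preference pairs.
loss = persona_conditioned_dpo_loss(
    policy_chosen_logp=torch.tensor([-12.3, -9.8]),
    policy_rejected_logp=torch.tensor([-14.1, -11.0]),
    ref_chosen_logp=torch.tensor([-12.9, -10.2]),
    ref_rejected_logp=torch.tensor([-13.8, -10.9]),
)
print(loss.item())

The point of the sketch is that persona conditioning changes only how the response log-probabilities are computed; the pairwise preference loss itself stays unchanged.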

Nov 5, 2025 · 1:08 AM UTC

It was an awesome talk, thanks so much for visiting us!
Love this direction. Moving beyond expert labels to in-the-wild learning is the missing bridge. Curious how RLHI handles persona drift over long horizons?
Thank you, Chuanyang, for the wonderful work. I've been thinking about very similar directions from an SME perspective, if it's of interest to you.
That's a fantastic talk, Chuanyang! AI really does need that human touch to understand the real world, doesn't it?
Interesting to see RL applied directly to user conversations and personalization.