I gave a talk on "The Era of Real-World Human Interaction" at @Google
It's great to see frontier AI labs like Google taking a strong interest in understanding users and evolving their models through user interaction.
Yes, while today's AI can win gold at the IMO, it often struggles to meet everyday user needs and adapt to personal preferences.
The challenge: how can we bridge the gap between training data and real user demands? How can AI get smarter through every conversation?
In this talk, I argue for moving beyond expert-annotated data → learning from real user conversations. Our method, Reinforcement Learning from Human Interaction (RLHI), offers a simple, concrete approach.
RLHI learns directly from in-the-wild conversations:
(1) User-Guided Rewrites – revises unsatisfactory model outputs based on users' natural-language follow-up responses;
(2) User-Based Rewards – learns via a reward model conditioned on a persona distilled from the user's long-term interaction history.
Together, they link long-term user personas to turn-level preferences via persona-conditioned preference optimization. Trained on WildChat, RLHI outperforms RLHF in personalization and instruction-following.
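For intuition, here's a minimal sketch of what persona-conditioned preference optimization could look like, assuming a DPO-style pairwise loss where the chosen response is, e.g., a user-guided rewrite and all log-probs condition on a persona summary. The helper names and prompt format below are hypothetical illustrations, not the paper's actual API:

```python
# Minimal sketch of persona-conditioned preference optimization,
# assuming a DPO-style pairwise objective (the paper's exact loss may differ).
import torch
import torch.nn.functional as F

def build_persona_prompt(persona_summary: str, user_message: str) -> str:
    # Hypothetical prompt format: prepend a summary of the user's
    # long-term interaction history (the "persona") to the current turn.
    return f"[User persona]\n{persona_summary}\n\n[User]\n{user_message}\n\n[Assistant]\n"

def persona_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | persona, x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | persona, x)
    ref_chosen_logps: torch.Tensor,       # same log-probs under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit rewards are the policy/reference log-ratio, scaled by beta.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry objective: push the preferred response (e.g., the
    # user-guided rewrite) above the original unsatisfactory output.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```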
Thanks @maximillianc_ for the invite!