I'm in Montréal to present new work on XAI for RL agents using World Models at the IJCAI Workshop on XAI, Aug 17
arxiv.org/abs/2505.08073
When an agent doesn't perform as expected, we run the world model backwards to generate counterfactual states from which the agent would have met expectations.
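For intuition, here is a minimal sketch of that backward counterfactual search idea. It is not the paper's implementation: the interfaces `backward_model`, `forward_model`, `policy`, and `meets_expectation` are all hypothetical stand-ins, and the naive breadth-first search is just one way to realize "running the world model backwards."

```python
import numpy as np

def generate_counterfactual(backward_model, policy, forward_model,
                            failure_state, meets_expectation,
                            horizon=5, n_candidates=64, rng=None):
    """Search backwards from a failure state for a nearby counterfactual
    state from which the agent's own policy would have met expectations.

    Assumed (hypothetical) interfaces:
      backward_model(state)      -> list of plausible predecessor states
      policy(state)              -> action
      forward_model(state, act)  -> next state
      meets_expectation(traj)    -> bool
    """
    rng = rng or np.random.default_rng(0)
    frontier = [failure_state]
    for _ in range(horizon):
        # Step the learned world model backwards to propose predecessors.
        candidates = []
        for s in frontier:
            candidates.extend(backward_model(s))
        # Keep a bounded sample of candidate predecessors.
        if len(candidates) > n_candidates:
            idx = rng.choice(len(candidates), n_candidates, replace=False)
            candidates = [candidates[i] for i in idx]
        # Roll the agent's policy forward from each candidate; the first
        # one whose rollout meets expectations is our counterfactual.
        for s0 in candidates:
            traj, s = [s0], s0
            for _ in range(horizon):
                s = forward_model(s, policy(s))
                traj.append(s)
            if meets_expectation(traj):
                # "Had the agent started here, it would have succeeded."
                return s0
        frontier = candidates
    return None  # no counterfactual found within the search budget
```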
Explaining RL agent behaviors is wickedly hard. Even harder to give non-technical end-users *actionable* insights.
We show that our explanations are actionable: people can recognize what they need to do to get the desired behavior from the agent.
This is work done by my amazing crew: Madhuri Singh, Amal Alabdulkarim, and Gennie Mansi.