Excited to release BFM-Zero, an unsupervised RL approach to learning a humanoid Behavior Foundation Model. Existing general whole-body humanoid controllers rely on explicit motion-tracking rewards, on-policy policy-gradient methods like PPO, and distillation into a single policy. In contrast, BFM-Zero directly learns an effective shared latent representation that embeds motions, goals, and rewards into a common space, which enables a single policy to perform multiple tasks zero-shot: (1) natural transition from any pose to any goal pose, (2) real-time motion following, (3) optimization of any user-specified reward function at test time, etc. How does it work? We don't give the model any task-specific reward during training. It builds on recent advances in Forward-Backward (FB) models, where a latent-conditioned policy, a deep "forward dynamics model", and a deep "inverse dynamics model" are jointly learned. This way, the learned representation space captures humanoid dynamics and unifies different tasks. More details: lecar-lab.github.io/BFM-Zero…
Meet BFM-Zero: A Promptable Humanoid Behavioral Foundation Model w/ Unsupervised RL👉 lecar-lab.github.io/BFM-Zero… 🧩ONE latent space for ALL tasks ⚡Zero-shot goal reaching, tracking, and reward optimization (any reward at test time), from ONE policy 🤖Natural recovery & transition

Nov 7, 2025 · 5:07 PM UTC
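To make the FB "prompting" idea more concrete, here is a minimal sketch of how such a model is typically used at test time. Names like `backward_net` and `policy` are placeholders for illustration, not BFM-Zero's actual API: the backward network embeds a goal pose or a sampled reward function into a latent z, and the same latent-conditioned policy serves every prompt.

```python
import numpy as np

# Illustrative sketch of test-time "prompting" with a Forward-Backward (FB) model.
# `backward_net` and `policy` are hypothetical callables, not BFM-Zero's API.

def encode_reward(backward_net, states, rewards):
    """Zero-shot reward optimization: embed an arbitrary test-time reward as
    z_r ~= E_{s ~ data}[ B(s) * r(s) ], estimated from sampled states."""
    B = backward_net(states)                 # (N, d) backward embeddings
    z = (B * rewards[:, None]).mean(axis=0)  # (d,) reward embedding
    return z / (np.linalg.norm(z) + 1e-8)    # FB methods typically rescale z

def encode_goal(backward_net, goal_state):
    """Goal reaching is just another prompt: z_g = B(s_goal)."""
    z = backward_net(goal_state[None])[0]
    return z / (np.linalg.norm(z) + 1e-8)

def act(policy, obs, z):
    """ONE latent-conditioned policy pi(a | s, z) serves every prompt."""
    return policy(obs, z)
```

Motion following fits the same interface: presumably each reference frame is encoded through the backward network and the resulting latent is updated in real time as the reference motion plays out.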

Want to make our claim more accurate and rigorous: BFM-Zero is "unsupervised" at the task level. Training doesn't involve any *task-related* reward, but it still uses "auxiliary" rewards, such as a joint-angle-limit penalty (only for sim2real) and AMP-style rewards that "shape" the latent space to be more human-like. Without the AMP-style reward it still works, but the motion won't be as natural or human-like. Similarly, without the joint-limit auxiliary reward it works in sim but cannot smoothly transfer to the real robot.
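For intuition, here is a rough illustration of what "task-free but not reward-free" auxiliary shaping could look like; the weights, the margin threshold, and the `amp_discriminator` call are assumptions for illustration, not the exact recipe used in BFM-Zero.

```python
import numpy as np

# Rough illustration only: combining task-free auxiliary terms during training.
# Weights, the 0.05 rad margin, and `amp_discriminator` are assumed placeholders.

def auxiliary_reward(qpos, obs_history, amp_discriminator,
                     joint_lower, joint_upper, w_limit=1.0, w_amp=0.5):
    # Joint-angle-limit penalty: discourage poses near hardware limits.
    # Helps sim2real, but is not tied to any specific downstream task.
    margin = np.minimum(qpos - joint_lower, joint_upper - qpos)
    r_limit = -np.sum(np.clip(0.05 - margin, 0.0, None))

    # AMP-style reward: a discriminator trained on human motion data scores
    # how "human-like" the recent trajectory looks, shaping the latent space.
    r_amp = amp_discriminator(obs_history)

    return w_limit * r_limit + w_amp * r_amp
```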