Democracy works best when we all come together. Thank you, Governor @Schwarzenegger, for joining USC today to recognize International Day of Democracy.
Many thanks to my amazing co-authors @DiChang10, Minh Tran, Hongkun Gong, Ashutosh Chaubey, and my advisor Prof. @msoleymani for their hard work and collaboration throughout this project!
Our model achieves SOTA performance on both benchmarks:
📈 +73.8% improvement in photorealism (FID on RealTalk)
⬆️ +6.1% gain in motion representation (FD on VICO)
📊 User studies show a preference for DiTaiListener in diversity, smoothness, and realism.
DiTaiListener is a two-stage system:
1️⃣DiTaiListener-Gen: Adapts a Diffusion Transformer with a Causal Temporal Multimodal Adapter to generate listener clips from multimodal input
2️⃣DiTaiListener-Edit: Refines transitions between clips to create a long video with coherent expressions
Prior methods compress listener facial motion into a low-dimensional code. DiTaiListener avoids this bottleneck with a multimodal video diffusion model that generates photorealistic, temporally coherent listener head videos directly from a speaker’s speech and facial motions.
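For a concrete picture of the two-stage flow, here is a minimal, hypothetical sketch. The `gen_model.sample()` and `edit_model.refine()` interfaces, clip length, and overlap width are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch of the two-stage pipeline; not the authors' released code.
import torch

def generate_listener_video(speaker_audio, speaker_motion, gen_model, edit_model,
                            clip_len=64, num_clips=4, overlap=8):
    """Chain short diffusion clips into one long, coherent listener video."""
    clips = []
    for i in range(num_clips):
        # Stage 1 (DiTaiListener-Gen): a video diffusion transformer conditioned on
        # the speaker's speech and facial motion through a causal temporal
        # multimodal adapter (the .sample() conditioning interface is assumed).
        a = speaker_audio[:, i * clip_len:(i + 1) * clip_len]
        m = speaker_motion[:, i * clip_len:(i + 1) * clip_len]
        clips.append(gen_model.sample(audio=a, motion=m, num_frames=clip_len))

    # Stage 2 (DiTaiListener-Edit): refine the boundary frames between consecutive
    # clips so expressions transition smoothly across the long video.
    video = clips[0]
    for nxt in clips[1:]:
        seam = edit_model.refine(prev_frames=video[:, -overlap:],
                                 next_frames=nxt[:, :overlap])
        video = torch.cat([video[:, :-overlap], seam, nxt[:, overlap:]], dim=1)
    return video
```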
Glad to share that our new work on speaker and listener head generation, Dyadic Interaction Modeling for Social Behavior Generation, has been accepted to #ECCV2024!
TL;DR: We propose Dyadic Interaction Modeling, a pre-training strategy that jointly models speakers’ and listeners’ motions and learns representations that capture the dyadic context.
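As a rough illustration of the idea (not the paper's exact formulation), the sketch below jointly encodes speaker and listener motion with a shared transformer and reconstructs masked listener frames from the dyadic context. The `DyadicEncoder` module, `pretrain_step` function, and the masked-reconstruction objective are assumptions made for illustration.

```python
# Hypothetical pre-training sketch: encode speaker and listener motion jointly and
# reconstruct masked listener frames from the dyadic context. Module names and the
# masking objective are illustrative, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DyadicEncoder(nn.Module):
    def __init__(self, motion_dim=128, d_model=256, n_layers=4):
        super().__init__()
        self.proj_speaker = nn.Linear(motion_dim, d_model)
        self.proj_listener = nn.Linear(motion_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, motion_dim)

    def forward(self, speaker_motion, listener_motion):
        # Concatenate both participants along time so attention mixes the dyad.
        x = torch.cat([self.proj_speaker(speaker_motion),
                       self.proj_listener(listener_motion)], dim=1)
        return self.head(self.encoder(x))

def pretrain_step(model, speaker_motion, listener_motion, mask_ratio=0.5):
    # Zero out a random subset of listener frames, then reconstruct them from
    # the joint speaker-listener context.
    masked = listener_motion.clone()
    mask = torch.rand(listener_motion.shape[:2]) < mask_ratio
    masked[mask] = 0.0
    recon = model(speaker_motion, masked)[:, speaker_motion.size(1):]
    return F.mse_loss(recon[mask], listener_motion[mask])
```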
Our code has been fully open-sourced. Please give the repo a star 🌟 if you find our project interesting 🥰
💻 paper: arxiv.org/abs/2403.09069
🔗website: boese0601.github.io/dim/
⌨️ code: github.com/Boese0601/Dyadic-…
A huge thanks to my amazing collaborators for their hard work, including my lab mates Minh Tran, Maksim Siniukov and faculty advisor @msoleymani at @CSatUSC and @USC_ICT. See you in Milan, Italy! 😘
#eccv #genai #computervision #videogeneration #behaviorgeneration #talkinghead