Democracy works best when we all come together. Thank you, Governor @Schwarzenegger, for joining USC today to recognize International Day of Democracy.
Many thanks to my amazing co-authors @DiChang10, Minh Tran, Hongkun Gong, Ashutosh Chaubey, and my advisor Prof. @msoleymani for their hard work and collaboration throughout this project!
Our model achieves SOTA performance on both benchmarks:
📈 +73.8% improvement in photorealism (FID on RealTalk)
⬆️ +6.1% gain in motion representation (FD on VICO)
📊 User studies show a preference for DiTaiListener in diversity, smoothness, and realism.
DiTaiListener is a two-stage system:
1️⃣DiTaiListener-Gen: Adapts a Diffusion Transformer with a Causal Temporal Multimodal Adapter to generate listener clips from multimodal input
2️⃣DiTaiListener-Edit: Refines transitions between clips to create a long video with coherent expressions
Prior methods compress listener facial motion into a low-dimensional code. DiTaiListener avoids this bottleneck with a multimodal video diffusion model that generates photorealistic, temporally coherent listener head videos directly from a speaker’s speech and facial motions.
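For a concrete picture of the two-stage flow, here is a minimal, hypothetical sketch. The `gen_model.sample()` and `edit_model.refine()` interfaces, clip length, and overlap width are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch of the two-stage pipeline; not the authors' released code.
import torch

def generate_listener_video(speaker_audio, speaker_motion, gen_model, edit_model,
                            clip_len=64, num_clips=4, overlap=8):
    """Chain short diffusion clips into one long, coherent listener video."""
    clips = []
    for i in range(num_clips):
        # Stage 1 (DiTaiListener-Gen): a video diffusion transformer conditioned on
        # the speaker's speech and facial motion through a causal temporal
        # multimodal adapter (the .sample() conditioning interface is assumed).
        a = speaker_audio[:, i * clip_len:(i + 1) * clip_len]
        m = speaker_motion[:, i * clip_len:(i + 1) * clip_len]
        clips.append(gen_model.sample(audio=a, motion=m, num_frames=clip_len))

    # Stage 2 (DiTaiListener-Edit): refine the boundary frames between consecutive
    # clips so expressions transition smoothly across the long video.
    video = clips[0]
    for nxt in clips[1:]:
        seam = edit_model.refine(prev_frames=video[:, -overlap:],
                                 next_frames=nxt[:, :overlap])
        video = torch.cat([video[:, :-overlap], seam, nxt[:, overlap:]], dim=1)
    return video
```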
Glad to share that our new work on speaker and listener head generation, Dyadic Interaction Modeling for Social Behavior Generation, has been accepted to #ECCV2024!
TL;DR: We propose Dyadic Interaction Modeling, a pre-training strategy that jointly models speakers’ and listeners’ motions and learns representations that capture the dyadic context.
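As a rough illustration of the idea (not the paper's exact formulation), the sketch below jointly encodes speaker and listener motion with a shared transformer and reconstructs masked listener frames from the dyadic context. The `DyadicEncoder` module, `pretrain_step` function, and the masked-reconstruction objective are assumptions made for illustration.

```python
# Hypothetical pre-training sketch: encode speaker and listener motion jointly and
# reconstruct masked listener frames from the dyadic context. Module names and the
# masking objective are illustrative, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DyadicEncoder(nn.Module):
    def __init__(self, motion_dim=128, d_model=256, n_layers=4):
        super().__init__()
        self.proj_speaker = nn.Linear(motion_dim, d_model)
        self.proj_listener = nn.Linear(motion_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, motion_dim)

    def forward(self, speaker_motion, listener_motion):
        # Concatenate both participants along time so attention mixes the dyad.
        x = torch.cat([self.proj_speaker(speaker_motion),
                       self.proj_listener(listener_motion)], dim=1)
        return self.head(self.encoder(x))

def pretrain_step(model, speaker_motion, listener_motion, mask_ratio=0.5):
    # Zero out a random subset of listener frames, then reconstruct them from
    # the joint speaker-listener context.
    masked = listener_motion.clone()
    mask = torch.rand(listener_motion.shape[:2]) < mask_ratio
    masked[mask] = 0.0
    recon = model(speaker_motion, masked)[:, speaker_motion.size(1):]
    return F.mse_loss(recon[mask], listener_motion[mask])
```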
Our code has been fully open-sourced. Please give the repo a star 🌟 if you find our project interesting 🥰
💻 paper: arxiv.org/abs/2403.09069
🔗website: boese0601.github.io/dim/
⌨️ code: github.com/Boese0601/Dyadic-…
A huge thanks to my amazing collaborators for their hard work, including my lab mates Minh Tran, Maksim Siniukov and faculty advisor @msoleymani at @CSatUSC and @USC_ICT. See you in Milan, Italy! 😘
#eccv #genai #computervision #videogeneration #behaviorgeneration #talkinghead