PhD student @ Tsinghua University.

Joined May 2020
🚀 Introducing Nano3D — a training-free framework for precise, coherent 3D object editing without masks! By integrating FlowEdit into TRELLIS and introducing Voxel/Slat-Merge, Nano3D preserves structure & consistency while delivering superior 3D quality.
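A minimal sketch of the mask-free merge idea behind Voxel/Slat-Merge, assuming the source and edited objects live in dense voxel feature grids; the function name, threshold, and normalization below are illustrative stand-ins, not Nano3D's actual implementation:

```python
import torch

def voxel_merge(src_feats, edit_feats, tau=0.5):
    """Hypothetical mask-free merge of two voxel feature grids.

    Voxels whose edited features stay close to the source are copied
    from the source grid (preserving original structure); voxels that
    changed substantially take the edited features.
    src_feats, edit_feats: (C, D, H, W) feature grids.
    """
    # Per-voxel change magnitude, normalized to [0, 1].
    delta = (edit_feats - src_feats).norm(dim=0)
    delta = delta / (delta.max() + 1e-8)
    keep_src = (delta < tau).unsqueeze(0)  # low-change voxels keep source
    return torch.where(keep_src, src_feats, edit_feats)

# Toy usage on random grids with a localized "edit".
src = torch.randn(8, 16, 16, 16)
edit = src + 0.1 * torch.randn_like(src)
edit[:, :4] += 2.0
merged = voxel_merge(src, edit)
```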
Zhengyi Wang retweeted
This is very solid and promising research that scales consistency models to 10B+ video diffusion models. The combination of sCM and Variational Score Distillation is a compelling direction for few-step generation!
🚀Try out rCM—the most advanced diffusion distillation!
✅ First to scale sCM/MeanFlow up to 10B+ video models
✅ Open-sourced FlashAttention-2 JVP kernel & FSDP/CP support
✅ High-quality, diverse videos in 2–4 steps
Paper: arxiv.org/abs/2510.08431
Code: github.com/NVlabs/rcm
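For context on the JVP kernel: sCM-style consistency training differentiates the model along a tangent direction, which is a Jacobian-vector product. Below is a tiny reference illustration with torch.func.jvp on plain attention; the open-sourced kernel fuses this computation into FlashAttention-2, so this snippet is only a conceptual sketch, not rCM's code:

```python
import torch
from torch.func import jvp

# Plain scaled dot-product attention as a reference function.
def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(2, 16, 64)
k = torch.randn(2, 16, 64)
v = torch.randn(2, 16, 64)
tq, tk, tv = (torch.randn_like(t) for t in (q, k, v))

# out: attention output; out_tangent: its directional derivative along
# (tq, tk, tv) -- the quantity consistency training differentiates through.
out, out_tangent = jvp(attention, (q, k, v), (tq, tk, tv))
```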
Meet RDT2, our latest foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions.🔥 Fully open-sourced: github.com/thu-ml/RDT2 Project page: rdt-robotics.github.io/rdt2/
😠💢😵‍💫Tired of endless data collection & fine-tuning every time you try out a VLA? Meet RDT2, the first foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions. No collection. No tuning. Just plug and play🚀 Witness a clear sign of embodied superintelligence:
- 7B one-step diffusion → 23 Hz inference⚡
- Re-designed UMI (from @chichengcc @SongShuran) and manufactured 100 portable devices
- Trained on 10K hours of UMI data from 100 real houses
- Zero-shot: pick, place, press, wipe… open-vocabulary
- Demos: blocks 30 m/s arrows within 500 ms🛡️; first to play ping-pong with an end-to-end model 🏓; extinguishes burning incense with a quick shake🥢
Fully open source at github.com/thu-ml/RDT2 Project page: rdt-robotics.github.io/rdt2/ Thanks to awesome collaborators @bang_guo96535 @D0g4M74794 @EthanNg51931527
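A toy sketch of why one-step diffusion enables 23 Hz control: sampling is a single forward pass from noise to an action chunk, with no iterative denoising loop. ToyPolicy and its interface are hypothetical stand-ins, not RDT2's API:

```python
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Hypothetical stand-in for a one-step diffusion policy head."""
    def __init__(self, obs_dim=32, action_dim=7, horizon=8):
        super().__init__()
        self.net = nn.Linear(obs_dim + horizon * action_dim,
                             horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, noise, obs):
        x = torch.cat([obs, noise.flatten(1)], dim=-1)
        return self.net(x).view(-1, self.horizon, self.action_dim)

@torch.no_grad()
def act(policy, obs):
    # One-step sampling: a single forward pass maps Gaussian noise plus
    # the observation straight to an action chunk. No denoising loop,
    # so the control rate is the cost of one network call.
    noise = torch.randn(1, policy.horizon, policy.action_dim)
    return policy(noise, obs)

actions = act(ToyPolicy(), torch.randn(1, 32))
```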
Zhengyi Wang retweeted
Cool
Vibe coding with @xai Grok 4 and @Alibaba_Qwen Image on my phone
So excited to share Qwen-Image—a 20B MMDiT model! 🚀 It’s been amazing to watch its accurate text rendering emerge and steadily improve during training. It’s also beginning to show preliminary abilities in understanding 3D space and handling spatial transformations.
🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source.
🔍 Key Highlights:
🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese
🔹 In-pixel text generation — no overlays, fully integrated
🔹 Bilingual support, diverse fonts, complex layouts
🎨 Also excels at general image generation — from photorealistic to anime, impressionist to minimalist. A true creative powerhouse.
Blog: qwenlm.github.io/blog/qwen-i…
Hugging Face: huggingface.co/Qwen/Qwen-Ima…
ModelScope: modelscope.cn/models/Qwen/Qw…
Github: github.com/QwenLM/Qwen-Image
Technical report: qianwen-res.oss-cn-beijing.a…
Demo: modelscope.cn/aigc/imageGene…
DeepMesh V2 drops soon! Upgraded autoregressive 3D mesh generator.🔥🔥🔥
🚀 Introducing ShapeLLM-Omni, a 3D-native multimodal large language model fine-tuned from Qwen2.5-VL-7B. It builds on a voxel-based 3D VQVAE and a 2.56M-dialogue 3D-Alpaca dataset, enabling four tasks: text-to-3D, image-to-3D, 3D comprehension, and 3D editing. Code, model, and data are open-sourced!
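A minimal sketch of the voxel-tokenization step a 3D VQVAE performs, assuming flattened voxel features and a learned codebook; the shapes and codebook size here are illustrative, not ShapeLLM-Omni's actual configuration:

```python
import torch

def quantize(voxel_feats, codebook):
    """Hypothetical nearest-neighbor VQ step: map continuous voxel
    features to discrete codebook indices, i.e. the 3D "tokens" a
    language model can be fine-tuned to read and emit.

    voxel_feats: (N, C) flattened voxel features; codebook: (K, C).
    """
    dists = torch.cdist(voxel_feats, codebook)  # (N, K) pairwise distances
    return dists.argmin(dim=-1)                 # (N,) token ids

feats = torch.randn(16 ** 3, 64)   # a flattened 16^3 voxel grid
codebook = torch.randn(8192, 64)   # illustrative codebook size
tokens = quantize(feats, codebook)
```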
Illustration video for DeepMesh, our latest model for 3D mesh generation! 🔥 DeepMesh generates high-quality meshes from raw point clouds. It can also refine existing meshes, improving their structure and quality. Open-sourced: github.com/zhaorw02/DeepMesh
Thanks to @_akhaliq for sharing our work! We're thrilled to announce DeepMesh, our latest auto-regressive artist-mesh generative model. The model weights and inference code are fully open-sourced!🎉 Code: github.com/zhaorw02/DeepM… Project page: zhaorw02.github.io/DeepMesh/
DeepMesh is out on Hugging Face
Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Conditioned on point clouds and images, DeepMesh generates meshes with intricate details and precise topology, outperforming state-of-the-art methods in both precision and quality.
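A toy sketch of the greedy decode loop such token-based mesh generators run, with a hypothetical stand-in decoder; DeepMesh's real tokenizer, vocabulary, and point-cloud conditioning are far more elaborate:

```python
import torch
import torch.nn as nn

class ToyMeshDecoder(nn.Module):
    """Hypothetical stand-in for an autoregressive mesh-token decoder."""
    def __init__(self, vocab=128, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, cond):
        # Real models attend over point-cloud features; here we just
        # add a conditioning vector to each token embedding.
        return self.head(self.emb(tokens) + cond)

@torch.no_grad()
def generate(model, cond, max_len=64, bos=0, eos=1):
    # Greedy autoregressive decode: emit one discretized-mesh token at
    # a time, conditioned on the point-cloud embedding, until EOS.
    tokens = torch.tensor([[bos]])
    for _ in range(max_len):
        nxt = model(tokens, cond)[:, -1].argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, nxt], dim=1)
        if nxt.item() == eos:
            break
    return tokens

seq = generate(ToyMeshDecoder(), cond=torch.randn(1, 1, 32))
```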
🚀 So excited to see LLaMa-Mesh integrated into Blender with #meshgen! 🎉 Now you can generate 3D meshes locally with AI in Blender. Open-source and available now! 🙌 #AI #3D #Blender #LLaMaMesh #OpenSource
Generate meshes with AI locally in Blender 📢 meshgen, a local Blender integration of LLaMa-Mesh, is now open source and available 🤗
Zhengyi Wang retweeted
LLaMa-Mesh running locally in Blender. Official @huggingface release soon 🤗
🚀 Introducing LLaMA-Mesh! 🎉 We fine-tuned LLaMA on 3D mesh data, enabling LLMs to natively generate 3D meshes through chat while retaining their original language capabilities. ✨ Model weights and inference code are fully open-sourced. 🌐 Project page: research.nvidia.com/labs/tor…
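Natively generating meshes through chat works because a mesh can be serialized as plain text the LLM emits, e.g. OBJ-style vertex/face lines. A minimal parser sketch follows; LLaMA-Mesh's coordinate quantization and exact format details are omitted, this only illustrates the mesh-as-text idea:

```python
def parse_obj(text):
    """Parse OBJ-style mesh text (the plain-text format an LLM can emit
    directly) into vertex and face lists. Minimal sketch: triangles
    only, no normals or UVs."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # OBJ face indices are 1-based.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return vertices, faces

# A single tetrahedron as an LLM might print it in a chat reply.
mesh_text = """v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
f 1 2 3
f 1 2 4
f 1 3 4
f 2 3 4"""
verts, faces = parse_obj(mesh_text)
```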
CRM was accepted to #ECCV2024! CRM generates a high-fidelity 3D textured mesh from a single image in 10 seconds. 🔥
CRM
Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Feed-forward 3D generative models like the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, the transformer-based methods do not leverage the geometric…
High-resolution 4D results.
Vidu4D
Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels
Video generative models are receiving particular attention given their ability to generate realistic and imaginative frames. Besides, these models are also observed to exhibit strong…
Amazing video generation results🔥
Thanks, everyone, for spreading the word about Vidu 🚀: elevating video creation with our revolutionary U-ViT tech. Vidu supports crafting 16-second, 1080p HD videos with multi-camera shots and seamless transitions. Explore the cutting edge of AI-driven video magic. Guess what's next! #Vidu #AIGC