Shipping super optimized AI models, endpoints and the tools to do it @PrunaAI Cofounder CEO

Paris, France
Joined July 2021
Pinned Tweet
Replying to @Alibaba_Qwen
QWEN-IMAGE NOW SUPER FASSSSSSST (<12s) replicate.com/qwen/qwen-imag… @replicate
Huge thank you to everyone who joined us yesterday to create a new merch design for @PrunaAI! The design will be launched soon, stay tuned 👀 If you’re at the @dotaiconf today, come say hi. And don’t miss @Bertrand_Charp's talk at 2:50 PM, see you there! 🚀
Rayan retweeted
HunyuanImage 3.0 on @replicate passes this test with flying colors. Try it here: replicate.com/tencent/hunyua…
Seedream 4 passed this prompt test pretty well, only missing the tear in the gold backdrop.
Rayan retweeted
You can now run Hunyuan-Image-3.0 on Replicate replicate.com/tencent/hunyua… The #1 model in Text-to-Image LMArena Create images in under 30 seconds Another @PrunaAI collab to deliver the fastest speeds possible
i love this model
You can now run Ovi by @character_ai Simultaneously generate both audio & video in under 40 seconds replicate.com/character-ai/o… Another @PrunaAI collab to deliver the fastest speeds possible
Replying to @johnrachwan
Me too. Do some extreme optimization please! @PrunaAI @wavespeed_ai !
This replacement on @replicate was actually like the highest res, but it kept me as black and white (like source image) which stands out. The jacket is CRISP tho!
Learn how our team cut 𝐀𝐈 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐭𝐢𝐦𝐞 𝐛𝐲 𝟓𝟎% in just 2 days, without compromising quality. Some torch.compile + collaboration with our partners at @PrunaAI is the secret sauce. We've captured the full story in our latest blog ➡️ go.bria.ai/4fJX6Le
Rayan retweeted
Animate Pokémon cards with Wan2.2
Rayan retweeted
🔥 Master model compression, quantization, and deployment optimization with our comprehensive learning path! Key Features: • Deep-dive lecture slides on LLM architectures & compression • 7+ hands-on coding exercises with real benchmarks • CPU vs GPU performance comparisons • Advanced quantization techniques Perfect for: ML engineers, researchers, and students ready to optimize AI models for production. Hardware: Works on modest GPUs (1080Ti+) or Google Colab 👉 Access materials now: github.com/PrunaAI/ai-effici… ⭐️ Don’t forget to star our repo!
Efficient ML papers + hot takes 👇 :)
𝗜’𝗺 𝘀𝘁𝗮𝗿𝘁𝗶𝗻𝗴 𝗮 𝗻𝗲𝘄 𝘀𝗲𝗿𝗶𝗲𝘀: 𝗣𝗮𝗽𝗲𝗿 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀! I’ll share research papers about (efficient) AI I’ve read including their code when available. 𝗣𝗮𝗽𝗲𝗿 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁 #𝟬𝟭: “Olica: Efficient Structured Pruning of Large Language Models without Retraining” | 𝗔𝘂𝘁𝗵𝗼𝗿𝘀: Jiujun He, Huazhen Lin | 𝗩𝗲𝗻𝘂𝗲: @icmlconf 2025 This paper explores how to efficiently prune large language models 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗳𝘂𝗹𝗹 𝗿𝗲𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴. Key contributions: • 𝗦𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱 𝗽𝗿𝘂𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗳𝗮𝘀𝘁 𝗣𝗖𝗔: They analyze matrix products in the multi-head attention (MHA) and remove neurons with the lowest importance scores. • 𝗙𝗮𝘀𝘁 𝗿𝗲𝗰𝗮𝗹𝗶𝗯𝗿𝗮𝘁𝗶𝗼𝗻 𝘄𝗶𝘁𝗵 𝗳𝗲𝘄 𝗱𝗮𝘁𝗮 𝘀𝗮𝗺𝗽𝗹𝗲𝘀: Residual errors are compensated via a low-rank decomposition, requiring only a small calibration dataset. • 𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: An LLM can be pruned and recalibrated in ~7 minutes using just hundreds of samples, resulting in smaller and faster models. 𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹 𝗛𝗼𝘁 𝘁𝗮𝗸𝗲: This does not feel like zero retraining because of the recalibration phase but it makes retraining very efficient!
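To make the "prune, then compensate the residual with a low-rank term" idea concrete, here is a toy sketch in plain Python. It is not the Olica algorithm: the importance score (product of weight norms), the two-layer linear setup without a nonlinearity, the rank-1 power-iteration fit, and every function name are assumptions chosen for illustration only.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def frob(a):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(x * x for row in a for x in row))

def importance(w1, w2, i):
    """Hypothetical score: norm of neuron i's input row times its output column."""
    row_norm = math.sqrt(sum(x * x for x in w1[i]))
    col_norm = math.sqrt(sum(row[i] ** 2 for row in w2))
    return row_norm * col_norm

def prune(w1, w2, keep):
    """Structured pruning: keep the `keep` highest-scoring hidden neurons."""
    ranked = sorted(range(len(w1)), key=lambda i: importance(w1, w2, i), reverse=True)
    kept = sorted(ranked[:keep])
    return [w1[i] for i in kept], [[row[i] for i in kept] for row in w2]

def rank_one_fit(r, iters=50):
    """Rank-1 approximation (R v) v^T of r, with v found by power iteration."""
    n = len(r[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        u = [sum(x * y for x, y in zip(row, v)) for row in r]  # R v
        v_new = [sum(r[i][j] * u[i] for i in range(len(r))) for j in range(n)]  # R^T u
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:
            break
        v = [x / norm for x in v_new]
    u = [sum(x * y for x, y in zip(row, v)) for row in r]
    return [[ui * vj for vj in v] for ui in u]

random.seed(0)
hidden, d_in, d_out = 8, 6, 5
w1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(hidden)]   # hidden x in
w2 = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(d_out)]  # out x hidden

dense = matmul(w2, w1)                    # original linear map
w1_p, w2_p = prune(w1, w2, keep=5)        # drop the 3 lowest-scoring neurons
residual = sub(dense, matmul(w2_p, w1_p)) # what pruning removed
correction = rank_one_fit(residual)       # cheap low-rank compensation

err_before = frob(residual)
err_after = frob(sub(residual, correction))
print(f"error after pruning: {err_before:.4f}, after rank-1 compensation: {err_after:.4f}")
```

In a real setting the correction would be stored as two thin factors rather than a full matrix, and the residual would be estimated from activations on a small calibration set rather than from the weights alone.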
Rayan retweeted
🚀 Major Partnership Drop: Pruna AI × @Wiroai! 🔬 We've sped up Wiro's video generation models with our optimization technology! Our compression delivers blazing-fast inference speeds without sacrificing quality. Performance Gain: • Wan 2.2-TI2V-5B – Text-to-Video-Fast → 16 sec wiro.ai/tools/Wan-AI/Wan2.2-… • Wan 2.2-TI2V-5B – Image-to-Video-Fast → 12 sec wiro.ai/tools/Wan-AI/Wan2.2-… • Wan 2.2-T2V-A14B-Fast → 22 sec wiro.ai/tools/Wan-AI/Wan2.2-… • Wan 2.2-I2V-A14B-Fast → 10 sec wiro.ai/tools/Wan-AI/Wan2.2-… 🏎️ This collaboration showcases how intelligent model compression can transform creative workflows - users now get enterprise-grade video generation with consumer-friendly speed! 👉 See our optimization magic in action at wiro.ai/ ⭐️ Interested in optimizing your models? Let's talk!
New efficient ML meetup, let's go!
🚀 We offer tech croissants and pizzas in post-holiday Paris | Wednesday 17 September 🔥 Join AI developers, ML engineers, and researchers for an evening of cutting-edge optimization insights and practical implementation strategies! This meetup is built for practitioners scaling LLMs, fine-tuning diffusion models, and deploying real-time inference pipelines. 📅 What's Coming: 🎯 7:00 PM: Welcome & Networking 🧠 7:30-8:15 PM: Technical Sessions 🍕 8:15-9:00 PM: Deep Tech Conversations & Food • James Martin (CEO, Better Tech): "AI's Impacts: Solving Real Problems" • Jules Belveze (Software Engineer, Dust): "Early Exiting: Under-Hyped Compression Methods" 🎤 Speaking Opportunity: Submit your talk proposal - we want YOUR breakthrough insights! 📍 Location: Paris 9th (exact venue shared with approved attendees) ⚡️ Limited spots available: lu.ma/i24gy980
Rayan retweeted
Qwen Image Edit. 3 seconds, $0.03. replicate.com/qwen/qwen-imag… @PrunaAI do the impossible. > Make the text 3D and floating on a city street
The much anticipated image editing model from Qwen is now on Replicate replicate.com/qwen/qwen-imag… Edit images in just 3 seconds, for $0.03 per image. We've worked with Pruna to deliver you the fastest way to use Qwen Image Edit.
Rayan retweeted
Replying to @Alibaba_Qwen
who wants an über fast version of the model? Available with our inference partners :)
Rayan retweeted
We are really happy to collaborate with @replicate to bring you a 3s per generation Qwen-Image-Edit!
Rayan retweeted
🌊 We didn’t just ride the wave, we shredded it and made it faster 🤘 With Wan 2.2 Juiced, we’ve launched the fastest, most affordable version of Wan 2.2 Video on Replicate, optimized with Pruna’s compression research. ➤ 100x cheaper than Veo3 at $0.05 per video ➤ Around 2x faster than the original Wan 2.2 👉 Ready to experience lightning-fast video generation? Try our endpoints on @replicate: replicate.com/wan-video/wan-… - replicate.com/wan-video/wan-… -
Just published a @Docker image using @FastAPI and @PrunaAI that lets you download and compile @bfl_ml #Flux, @StablesLabs Stable Diffusion, and many more to come!
Rayan retweeted
🚀 Big update to qwen-image on @replicate! → replicate.com/qwen/qwen-imag… ⚡ 3x faster: 3s per gen 🎨 New: img2img pipelines 🪄 Fast LoRAs: 4s per gen Check out this cool realistic image (right) I generated with a realism LoRA (input in replies) compared to base (left)
Rayan retweeted
🔥 Wan 2.2-Image is the cool new kid on the block. Wan 2.2 Image generates one 2-megapixel image for just $0.02 • 𝟮.𝟰𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝗦𝗲𝗲𝗱𝗗𝗿𝗲𝗮𝗺 • 𝟭.𝟴𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝗙𝗹𝘂𝘅-𝟭.𝟭 𝗣𝗿𝗼 • 𝟭.𝟭𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝘁𝗵𝗲 𝗮𝗹𝗿𝗲𝗮𝗱𝘆 𝗳𝗮𝘀𝘁 𝗪𝗮𝗻 𝟮.𝟭 𝗜𝗺𝗮𝗴𝗲. Try it on @replicate: replicate.com/prunaai/wan-2.… In short, have fun, make the most of the model, and help us improve it when needed!
Rayan retweeted
🚀 v0.2.8 Released - Image Generation Metrics Powerhouse! 🔥 Major Feature Drop: • ARNIQA - Advanced image quality assessment • CLIPIQA - Perceptual quality metrics • Sharpness Detection - Crystal-clear evaluation tools Now you can benchmark your compressed models with professional-grade evaluation tools. ⚡ 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗕𝗼𝗼𝘀𝘁: supercharged our CI pipeline with parallel test execution - faster merges, quicker releases! 🐛 𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗜𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀: • Fixed device state handling for memory metrics • Resolved transformers compatibility issues • Enhanced documentation efficiency • Streamlined deprecated code cleanup 👉 𝗥𝗲𝗮𝗱𝘆 𝘁𝗼 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝘆𝗼𝘂𝗿 𝗺𝗼𝗱𝗲𝗹 𝗰𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗿𝗲𝘀𝘂𝗹𝘁𝘀? github.com/PrunaAI/pruna/rel… ⭐ Star the repo if these metrics boost your AI workflow!