Pioneering a new generation of LLMs.

Joined February 2025
Mercury has been refreshed, with across-the-board improvements in coding, instruction following, math, and knowledge recall. Start building responsive, in-the-flow AI solutions! Read more: inceptionlabs.ai/blog/mercur…
Inception retweeted
Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether diffusion's delay to the first sentence is worth it. That said, the vast majority of AI workload will be video understanding and generation, so there's a good chance diffusion is the biggest winner overall. That also means the ratio of compute to memory bandwidth will increase.
Inception retweeted
Mercury is now much better at agentic tasks! You can try out the blazing speed of our dLLMs on your coding agents 😎 Here's a little zombie shooter game I spun up using Mercury with @goose_oss . Go ahead and diffuse w goose @_inception_ai
Mercury runs five times faster than Claude 4.5 Haiku at less than one-fourth the price, while delivering higher quality.
Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in @TechCrunch. techcrunch.com/2025/11/06/in…
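To make the contrast concrete, here is the shape of the sequential loop being described, with a random-logits stub standing in for a trained causal LM (the function and sizes are illustrative, not Inception's code); each generated token costs a full forward pass.

```python
import torch

VOCAB_SIZE = 1000

def model(tokens: torch.Tensor) -> torch.Tensor:
    # Stub for a causal LM: returns next-token logits for the sequence so far.
    return torch.randn(VOCAB_SIZE)

def autoregressive_sample(prompt: list[int], max_new: int = 16) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new):
        logits = model(torch.tensor(tokens))
        next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
        tokens.append(next_tok)  # one token per forward pass: One. At. A. Time.
    return tokens

print(autoregressive_sample([1, 2, 3]))
```

A dLLM instead runs a small, fixed number of passes that each update many positions at once, which is where the parallel speedup comes from.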
🚀We've partnered with ProxyAI! Our Mercury Coder dLLM is now the default for ProxyAI's autocomplete, next edit, and auto apply tooling, providing developers with lightning-fast and accurate code edits. Read more: tryproxy.io/blog/proxyai-inc… #AI #DiffusionModels #dLLM
Inception retweeted
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right) is the dominant paradigm in text. For audio I've seen a bit of both.

A lot of diffusion papers look a bit dense, but if you strip the mathematical formalism you end up with simple baseline algorithms: something a lot closer to flow matching in the continuous case, or something like this in the discrete case. It's your vanilla transformer but with bi-directional attention, where you iteratively re-sample and re-mask all tokens in your "token canvas" based on a noise schedule until you get the final sample at the last step. (Bi-directional attention is a lot more powerful, and you get a lot stronger language models if you train with it; unfortunately, it makes training a lot more expensive because you can't parallelize across the sequence dimension.)

So autoregression does an `.append(token)` to the token canvas while attending only backwards, while diffusion refreshes the entire canvas with a `.setitem(idx, token)` while attending bidirectionally.

Human thought naively feels a bit more like autoregression, but it's hard to say there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can interpolate between them, or generalize them further; it's a component of the LLM stack that still feels a bit fungible. Now I must resist the urge to side-quest into training nanochat with diffusion.
BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we've been doing since BERT in 2018. My first thought was: "can we finetune a BERT-like model to do text generation?"
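Read as pseudocode, the loop described above is only a few lines. Here is a minimal, self-contained sketch, assuming a hypothetical bidirectional denoiser `model()` (a random-logits stub here), a linear noise schedule, and a confidence-based re-masking rule; none of this is Inception's or nanochat's actual implementation.

```python
import torch

VOCAB_SIZE, MASK_ID, SEQ_LEN = 1000, 0, 32

def model(tokens: torch.Tensor) -> torch.Tensor:
    # Stub for a trained bidirectional transformer: maps a (SEQ_LEN,) canvas
    # of token ids (some masked) to per-position logits over the vocabulary.
    return torch.randn(tokens.shape[0], VOCAB_SIZE)

def diffusion_sample(num_steps: int = 8) -> torch.Tensor:
    tokens = torch.full((SEQ_LEN,), MASK_ID)  # fully-noised canvas: all MASK
    for step in range(num_steps):
        probs = torch.softmax(model(tokens), dim=-1)  # denoise all positions in parallel
        sampled = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        # Noise schedule: keep a growing fraction of the most confident
        # tokens; re-mask the rest and try again next step.
        num_keep = int((step + 1) / num_steps * SEQ_LEN)
        keep = conf.topk(num_keep).indices
        tokens = torch.full((SEQ_LEN,), MASK_ID)
        tokens[keep] = sampled[keep]  # .setitem(idx, token) over the whole canvas
    return tokens

print(diffusion_sample())
```

With `num_steps=1` the loop collapses to a single fill-in-all-masks pass, which is exactly the "BERT is one diffusion step" observation in the quoted post.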
Our CEO @StefanoErmon joined the Infinite Curiosity Podcast and shared how our Mercury diffusion LLMs deliver faster, cheaper inference and why diffusion is reshaping coding, reasoning, and multimodal AI. Thanks for having him on @PrateekVJoshi! piped.video/watch?v=BaZT4aQI…
We’re in! We are now part of the #AWSGenAIAccelerator2025. We’re looking forward to working with @AWSstartups to deliver ultra-fast and efficient diffusion large language models.
Honored that our co-founder @adityagrover_ has been named to the 2025 Mayfield | Divot AI List! Thank you to @MayfieldFund and @StartupGrind for the recognition alongside 50 innovators shaping the future of AI. See the full list here: divot.org/list
You can use Mercury Coder’s Apply-Edit functionality immediately through the @continuedev extension on VS Code. Apply-Edit is also available through the Inception Platform: platform.inceptionlabs.ai #dLLM #InceptionAI #ApplyEdit
Apply-Edit integrates suggested code changes into existing code. With its diffusion-based, parallel generation process, Mercury Coder strictly dominates both frontier LLMs and specialized Apply-Edit models in speed, quality, and cost.
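As a rough illustration of what calling an Apply-Edit style endpoint could look like, here is a sketch using an OpenAI-compatible client; the base URL, model name, and prompt format are assumptions for illustration, not documented values, so check platform.inceptionlabs.ai for the real interface.

```python
from openai import OpenAI

# Assumed endpoint and model id, for illustration only; consult the
# Inception Platform docs for the actual values.
client = OpenAI(base_url="https://api.inceptionlabs.ai/v1", api_key="YOUR_API_KEY")

original_code = "def add(a, b):\n    return a - b\n"
edit_request = "Fix the bug: add() should return the sum, not the difference."

# Apply-Edit style request: send the current file plus the proposed change
# and ask the model to return the merged file.
response = client.chat.completions.create(
    model="mercury-coder",  # assumed model name
    messages=[
        {"role": "system", "content": "Apply the requested edit and return only the full updated file."},
        {"role": "user", "content": f"File:\n{original_code}\nEdit:\n{edit_request}"},
    ],
)
print(response.choices[0].message.content)
```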
Mercury Coder now supports Apply-Edit capabilities, providing quality on par with GPT-5 at speeds 46x faster!
Our cofounder @adityagrover_ is one of MIT Technology Review's 35 Innovators Under 35, which recognizes the top young innovators around the world!!
Introducing: This year's list of 35 Innovators Under 35. Every year, MIT Technology Review recognizes 35 extraordinary young people brimming with ideas for how to crack tough problems—all of whom are under the age of 35. Get to know them all: trib.al/5cVFFbL
🚀Our CEO @StefanoErmon recently spoke on @latentspacepod about Mercury, our family of game-changing diffusion LLMs!

📊 The numbers speak for themselves:
- 1000+ tokens/second
- 5-10x faster than speed-optimized models
- #1 for speed & quality on Copilot Arena

Parallel token prediction is the future. Check out the interview👇 piped.video/watch?v=2fDBeMu6…
Try it: platform.inceptionlabs.ai | chat.inceptionlabs.ai
#AI #dLLM #Mercury
Inception retweeted
Code editing just got a lot faster.⚡Introducing Next Edit⚡ Next Edit delivers real-time, multi-line suggestions at up to 1100 tokens/sec, powered by @inceptionAILabs Mercury Coder, the world’s first commercial-scale diffusion LLM for code.
3. Unified API: Access Mercury models alongside a wide array of other Bedrock foundation models through a single, consistent API, simplifying development and integration.
4. Batch Processing: Use Bedrock’s batch processing APIs to get a 10x throughput improvement for large-scale tasks.
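A minimal sketch of the unified-API point above, using boto3's Bedrock Converse API; the Mercury model ID is a hypothetical placeholder (check the Bedrock model catalog for the real identifier), and large batch workloads would go through Bedrock's separate batch inference jobs rather than this synchronous call.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# One consistent Converse call works across Bedrock foundation models;
# only the modelId changes. The ID below is an assumed placeholder.
response = client.converse(
    modelId="inception.mercury-coder-v1",
    messages=[{"role": "user", "content": [{"text": "Write a hello-world in Python."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```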