I work on reasoning & posttraining at xAI. Ex-Google.

San Francisco, CA
Joined June 2013
So proud of our team! Math and coding remain critical to our mission. Especially proud of our work to land a new SOTA on Humanity's Last Exam: 34.8%. A +13% boost: no tools, just an intelligent base model and reasoning capabilities. blog.google/products/gemini/…
A cheekier version
Grok 4 Fast, maximizing intelligence density.
I departed Google DeepMind after 8 years. So many fond memories: from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow), to leading Gemini posttraining evals to catch up & launch in 100 days, then leading the team to leapfrog to LMArena #1 (and stay there for over a year!), and finally working on the incredible reasoning innovations behind Gemini's IMO & ICPC gold medals (w/ @HengTze @quocleix).

Gemini has been a wild journey from one paradigm to another: first, revamping our LaMDA model (the first instruction-like chatbot!) from an actual chatbot into long, contentful responses with RLHF; then, reasoning and deep thinking by training over long thinking chains, novel environments, and reward heads.

When we first started, public sentiment was bad. Everyone thought Google was doomed to fail due to its search legacy and organizational politics. Now, Gemini is consistently #1 in user preference and spearheading new scientific accomplishments, and everyone thinks Google winning is obvious. 😂 (It also used to be the case that OpenAI would jump the AI news cycle by announcing before us from a backlog of ideas for every new Google release; safe to say that backlog is empty.)

I have since joined xAI. The recipe is well known: compute, data, and O(100) brilliant, hard-working people are all that's needed to obtain a frontier-level LLM. xAI *really* believes in this. For compute, even at Google I never experienced this number of chips per capita (and 100K+ GB200/300s are incoming with Colossus 2). For data, Grok 4 made the biggest bet in scaling RL & posttraining, and xAI is making new bets to scale data, deep thinking, and the training recipe. And the team is quick: no company has gotten to where xAI is today in AI capabilities in as little time. As @elonmusk says, a company's first- and second-order derivatives are the most important, and xAI's acceleration is the highest.

I'm excited to announce that in my first few weeks, we launched Grok 4 Fast. Grok 4 is an amazing reasoning model, still top on ARC-AGI and on new benchmarks like FinSearchComp, but it's slow and was never really targeted at general-purpose user needs. Grok 4 Fast is the best mini-class model: on LMArena it is #8 (Gemini 2.5 Flash is #18!), and on core reasoning evals like AIME it is on par with Grok 4 while 15x cheaper.

S/o to @LiTianleli @jinyilll @ag_i_2211 @s_tworkowski @keirp1 @yuhu_ai_
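For readers unfamiliar with the "reward heads" mentioned above, here is a minimal sketch of the general idea, assuming PyTorch (toy dimensions and random data; an illustration of standard RLHF reward modeling, not xAI's or Google's actual recipe): a scalar head on top of a transformer backbone, trained on pairwise preferences with a Bradley-Terry loss.

```python
# Minimal reward-model sketch (toy illustration, not a production recipe).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.reward_head = nn.Linear(d_model, 1)  # the "reward head": one scalar

    def forward(self, tokens):  # tokens: (batch, seq_len) of token ids
        h = self.backbone(self.embed(tokens))
        return self.reward_head(h[:, -1]).squeeze(-1)  # score at last position

model = RewardModel()
chosen = torch.randint(0, 1000, (8, 32))    # preferred responses (random toy data)
rejected = torch.randint(0, 1000, (8, 32))  # dispreferred responses
# Bradley-Terry pairwise loss: push r(chosen) above r(rejected).
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
print(f"pairwise loss: {loss.item():.3f}")
```

The trained scalar can then serve as the reward signal for RL over sampled responses or, in reasoning training, over entire thinking chains.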
Dustin Tran retweeted
Following its IMO gold-level win, @GoogleDeepMind is sharing Gemini Deep Think with mathematicians for feedback. Excited to see what they discover! 🧠 Plus, an updated Gemini 2.5 Deep Think is now rolling out for Google AI Ultra subscribers. Learn more: bit.ly/3IWcWq0
Our latest and greatest coding model! We've made some big strides in web app and visual development. And it continues to dominate in user preference: #1, with a 37-point Elo gap over #2.
Very excited to share the best coding model we’ve ever built! Today we’re launching Gemini 2.5 Pro Preview 'I/O edition' with massively improved coding capabilities. It ranks #1 on LMArena in Coding and #1 on the WebDev Arena Leaderboard. It’s especially good at building interactive web apps - this demo shows how it can be helpful for prototyping ideas. Try it in @GeminiApp, Vertex AI, and AI Studio ai.dev. Enjoy the pre-I/O goodies!
This is so good. Love meta-analyses. With a meta-benchmark, it's much harder to optimize against the test set (implicitly or otherwise).
The Ultimate LLM Meta-Leaderboard, averaged across the 28 best benchmarks: Gemini 2.5 Pro > o3 > Sonnet 3.7 Thinking
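Since the quoted ranking is an average over benchmarks, here is a toy sketch of how such a meta-average can be computed, in Python (all names and scores below are made up for illustration; this is not the quoted leaderboard's data or methodology): min-max normalize each benchmark so no single scoring scale dominates, then average per model.

```python
# Toy meta-leaderboard average (illustrative numbers only).
scores = {  # benchmark -> {model: score}; all values are made up
    "bench_a": {"model_x": 86.7, "model_y": 88.9, "model_z": 61.3},
    "bench_b": {"model_x": 84.0, "model_y": 83.3, "model_z": 78.2},
    "bench_c": {"model_x": 63.8, "model_y": 69.1, "model_z": 70.3},
}

def meta_average(scores):
    models = list(next(iter(scores.values())))
    totals = {m: 0.0 for m in models}
    for bench in scores.values():
        lo, hi = min(bench.values()), max(bench.values())
        for m in models:
            totals[m] += (bench[m] - lo) / (hi - lo)  # normalize to [0, 1]
    return {m: t / len(scores) for m, t in totals.items()}

print(sorted(meta_average(scores).items(), key=lambda kv: -kv[1]))
```

Averaging normalized scores is one simple choice; averaging ranks instead makes the aggregate even harder to game through any single benchmark.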
2.5 Pro Exp is a model we're so proud of: #1 on LMArena, #1 on benchmarks like AIME, Aider, MMMU, and MRCR, & significant gains across coding, reasoning, multimodal, and so much more. Try it now! aistudio.google.com gemini.google.com
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf
Here is what Gemini can do on *Flash*. My favorite part: Gemini 2.0 Flash Thinking shows significant gains in core capabilities while also excelling in user preference (co-#1 with gemini-exp-1206 on @lmarena_ai). The best of both worlds.
We’ve been *thinking* about how to improve model reasoning and explainability. Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try out!
We’ve been able to ship models in less than 24 hours. I’ve heard multiple VPs state they’ve never seen Google able to ship so quickly before.
I love the team’s shipping speed: today we shipped not only the base model but also our update to Astra for real-time multimodal interactions, our Jules coding assistant, and Colab with Gemini 2.0.
Try out Gemini 2.0 Flash today. We made significant improvements across all domains, especially code, math, and multimodal reasoning. And 2.0 now has native audio and image generation! aistudio.google.com/prompts/…
We’re kicking off our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed (see chart below). I’m especially excited to see the fast progress on coding, with more to come. Developers can try an experimental version in AI Studio and Vertex AI today. It is also available to try in @GeminiApp on the web today, with mobile coming soon.
gemini-exp-1206, out now. #1 everywhere. A one-year anniversary for Gemini! aistudio.google.com/app/prom…
Gemini-Exp-1206 tops all the leaderboards, with substantial improvements in coding and hard prompts. Try it at lmarena.ai!
The team says hi again
Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind’s just-released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅 Overall with the latest GPT-4o-1120 in Arena!

Ranking gains since Gemini-Exp-1114:
- Overall: #3 → #1
- Overall (StyleCtrl): #5 → #2
- Hard Prompts (StyleCtrl): #3 → #1
- Coding: #3 → #1
- Vision: #1
- Math: #2 → #1
- Creative Writing: #2 → #1

Congrats again @GoogleDeepMind! The LLM race is on fire — progress is now measured in days! See more analysis below👇
Dustin Tran retweeted
Massive News from Chatbot Arena🔥 @GoogleDeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past week, now ranks joint #1 overall with an impressive 40+ score leap — matching 4o-latest and surpassing o1-preview! It also claims #1 on the Vision leaderboard.

Gemini-Exp-1114 excels across technical and creative domains:
- Overall: #3 → #1
- Math: #3 → #1
- Hard Prompts: #4 → #1
- Creative Writing: #2 → #1
- Vision: #2 → #1
- Coding: #5 → #3
- Overall (StyleCtrl): #4 → #4

Huge congrats to @GoogleDeepMind on this remarkable milestone! Come try the new Gemini and share your feedback!
gemini-exp-1114… available in Google AI Studio right now, enjoy : ) aistudio.google.com
Nice work on controlling style biases! In this view, many models are no longer inflated by style (e.g., response length, formatting). Gemini 1.5 Flash also outperforms gpt-4o-mini overall and across all categories except coding.
Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses? Today, we're launching style control in our regression model for Chatbot Arena — our first step in separating the impact of style from substance in rankings.

Highlights:
- GPT-4o-mini, Grok-2-mini drop below most frontier models when style is controlled
- Claude 3.5 Sonnet, Opus, and Llama-3.1-405B rise significantly
- In Hard Prompts, Claude 3.5 Sonnet ties for #1 with ChatGPT-4o-latest. Llama-405B climbs to joint #3.

More analysis in the thread below👇
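For the curious, a minimal sketch of the general technique, in Python/NumPy (my own illustration on synthetic battles, not LMArena's actual code or data): fit the preference regression with a style covariate, such as the response-length difference, alongside the model term, so the model coefficient reflects preference with style held constant.

```python
# Style control in a preference regression (synthetic illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x_model = rng.choice([-1.0, 1.0], size=n)  # which model produced the first response
x_len = rng.normal(size=n)                 # style covariate: length difference
true_beta = np.array([0.3, 0.8])           # substance effect, style effect
logits = true_beta[0] * x_model + true_beta[1] * x_len
y = rng.random(n) < 1 / (1 + np.exp(-logits))  # 1 if the first response wins

X = np.column_stack([x_model, x_len])
beta = np.zeros(2)
for _ in range(500):  # plain gradient ascent on the logistic log-likelihood
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (y - p) / n

print("model coef (style-controlled):", beta[0])  # recovers ~0.3
print("style coef:", beta[1])                     # recovers ~0.8
```

In real battles, verbose models correlate with the length covariate, so omitting it folds the length effect into the model coefficient; that is exactly the inflation the Arena update is trying to remove.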
Dustin Tran retweeted
Our latest version of Gemini 1.5 Pro in AI Studio is #1 on the LMSys leaderboard. 🚀 This is the result of various advances in post-training and we have more lined up. Congrats to the Gemini team.
Exciting News from Chatbot Arena! @GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive score of 1300 (!), and also achieving #1 on our Vision Leaderboard. Gemini 1.5 Pro (0801) excels in multi-lingual tasks and delivers robust performance in technical areas like Math, Hard Prompts, and Coding. Huge congrats to @GoogleDeepMind on this remarkable milestone!

Gemini (0801) Category Rankings:
- Overall: #1
- Math: #1-3
- Instruction-Following: #1-2
- Coding: #3-5
- Hard Prompts (English): #2-5

Come try the model and let us know your feedback! More analysis below👇
Gemini is #1 overall on both the text and vision arenas, and Gemini is #1 on a staggering 20 out of 22 leaderboard categories. It's been a journey attaining such a powerful posttrained model. Proud to have co-led the team!
On results:
* Spanish is where I expect the models to be. Gemini is within the CI of #1 and should be #1 (it is so good at multilingual, and also #1 on LMSYS non-English); see the quick bootstrap sketch below for what "within CI" means.
* Coding as well.
* Math focuses on grade-school math, which can be saturated. I expect the ranking to change on more complex problems.
* Instruction following is surprising. It would be great to iron out whether it's a quirk of the eval or generally consistent.
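Here is the quick bootstrap sketch referenced above, in Python (hypothetical battle outcomes, purely to illustrate what "within CI of #1" means): resample pairwise outcomes and check whether the interval for the win rate covers 0.5.

```python
# Bootstrap CI for a head-to-head win rate (toy data).
import numpy as np

rng = np.random.default_rng(0)
wins = rng.random(500) < 0.52  # hypothetical outcomes vs. the #1 model
boot = [rng.choice(wins, wins.size).mean() for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"win rate {wins.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# If the CI contains 0.5, the two models are statistically tied at this sample size.
```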
New public leaderboard from Scale! It looks like a solid set of evals. It mitigates two of the biggest problems in evals today: eval-set contamination in model training, and rater quality in human evaluation.
🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can’t be gamed. Vetted experts handle the ratings, and we share our methods in detail openly! Check out our leaderboards at scale.com/leaderboard! Which evals should we build next?
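On the contamination point: a toy sketch of why private eval sets matter, in Python (my own illustration, not Scale's methodology): a simple n-gram overlap check that flags eval items whose text appears nearly verbatim in a training corpus.

```python
# Toy n-gram contamination check (illustration only).
def ngrams(text, n=8):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(eval_item, train_corpus, n=8, threshold=0.5):
    """Flag an eval item if >= threshold of its n-grams occur in training docs."""
    item = ngrams(eval_item, n)
    if not item:
        return False
    train = set().union(*(ngrams(doc, n) for doc in train_corpus))
    return len(item & train) / len(item) >= threshold

train_docs = ["the quick brown fox jumps over the lazy dog near the river bank"]
print(contaminated("the quick brown fox jumps over the lazy dog near the river",
                   train_docs))  # True: the eval item leaked into training
```

A public benchmark can only run checks like this after the fact; a private, unreleased test set avoids the leak in the first place.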