The main capability changes from GPT-4 (2023) to GPT-5 (2025) are:
* Large context window jump (8K → 100K+ tokens), which means it can read full documents
* Support for vision and audio
* Improved math skills, which also led to better on-the-spot reasoning capabilities

Oct 18, 2025 · 7:14 PM UTC

Replying to @DanHendrycks
These upgrades are truly impressive and could significantly enhance how we interact with AI. The expanded context window and multi-modal support will open up many possibilities for more seamless and comprehensive AI applications.
Replying to @DanHendrycks
I think it is a little bit more than that!
Replying to @MatthewJBar
NOOO! We need a comprehensive bird's eye view of the past 2 years to answer this:

1) Long context that truly works: 1M context for Gemini and 200k for every SOTA (state-of-the-art) model.
2) True multimodality: the ability to generate worlds (e.g., via Veo 3 as a tool) and simultaneously understand various modalities natively, from video and LIDAR to sound and many more.
3) Cost per token: for example, GPT-4 32k was $60/$120, whereas Gemini 2.5 Pro is $1.25/$10.
4) Test-time inference (reasoning/thinking): characterized by coherent, long thinking chains and, in the very near future, multimodal thinking (as already demonstrated by Flash 2.0 image generation at a small scale).
5) Parallel inference: similar to MCTS (e.g., o1-Pro, Deep Think).
6) Extremely complex multiple tool calls and backtracking: all within a single session (e.g., o3).
7) Ability to work in complex scaffoldings: examples include OpenAI's Operator (aka CUA), Google's Mariner, and Agent Mode within Gemini.
8) Multiple models/systems from multiple providers at the same performance level: specifically, Gemini 2.5 Pro, o3, and Opus 4. This convergence alone will enable us to use all of them in a "swarm," applying the best of their capabilities in the most suitable ways. It is always better to leverage multiple intelligences from diverse providers than to rely on a single model from a sole provider (e.g., GPT-4 from OpenAI, March 2023).

When all the aforementioned capabilities are combined, we observe a step change analogous to the leap from GPT-3 to GPT-4, exemplified by o3 and Gemini 2.5 Pro. GPT-3 was released in May 2020, and GPT-4 followed in March 2023: a solid three-year gap for such a significant step change. While the next three-year cycle would typically point to March 2026, we had already achieved all of these capabilities by May 2025.
In just five years, from May 2020 to May 2025, we have witnessed two such step changes: from GPT-3 to GPT-4, and then again from GPT-4 to o3/2.5 Pro/Opus 4, signifying two OOMs of improvement in capabilities and usefulness. If this is not nuts, I don't know what is!!
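As a rough sanity check on the cost-per-token point (item 3 above), the quoted prices can be turned into cheapness ratios and orders of magnitude. This is just back-of-the-envelope arithmetic on the numbers in the thread; the assumption that both prices are quoted per 1M tokens (input/output) is mine.

```python
import math

# Prices per 1M tokens (input, output), as quoted in the thread.
gpt4_32k = (60.00, 120.00)     # GPT-4 32k (2023)
gemini_25_pro = (1.25, 10.00)  # Gemini 2.5 Pro (2025)

# How many times cheaper the newer model is, per token.
input_ratio = gpt4_32k[0] / gemini_25_pro[0]    # 48x cheaper input
output_ratio = gpt4_32k[1] / gemini_25_pro[1]   # 12x cheaper output

print(f"input:  {input_ratio:.0f}x cheaper (~{math.log10(input_ratio):.1f} OOMs)")
print(f"output: {output_ratio:.0f}x cheaper (~{math.log10(output_ratio):.1f} OOMs)")
```

On these figures, input cost alone dropped by nearly two orders of magnitude (48x, ~1.7 OOMs) in roughly two years, which is at least consistent in spirit with the "two OOMs" framing, even though that claim is about capability, not just price.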
Replying to @DanHendrycks
My take: The new paper making the rounds is basically a worse version of the ADeLe paper by JH Orallo and friends. kinds-of-intelligence-cfi.gi…
Replying to @DanHendrycks
I don't understand the math coverage here.
Replying to @DanHendrycks
What do you think: are efforts to build AGI similar to efforts to achieve sustainable nuclear fusion with ITER? I see many common characteristics, like big investments and high expectations.