Deep reasoning is beyond the capabilities of today’s AI models. GPT-5 shows some progress, but overall performance is a far cry from what is required to solve problems at an expert level. Statements about models reaching PhD level should be taken with a measure of skepticism.
Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new reasoning benchmark of expert-level Dynamic Programming problems. The benchmark consists of three tiers of increasing complexity, which we call ‘shallow’, ‘deeper’, and ‘deepest’.
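For a rough sense of the genre, here is a classic tree dynamic program of the kind the ‘shallow’ tier presumes familiarity with. This is a toy illustration of ours, not an actual benchmark problem: counting the independent sets of a tree with a two-state recurrence per vertex.

```python
import sys
sys.setrecursionlimit(10_000)

def count_independent_sets(adj, root=0):
    """adj: adjacency list of a tree. Returns the number of independent sets."""
    def dfs(v, parent):
        incl, excl = 1, 1  # ways to complete v's subtree with v included / excluded
        for u in adj[v]:
            if u == parent:
                continue
            c_incl, c_excl = dfs(u, v)
            incl *= c_excl            # a child of an included vertex must be excluded
            excl *= c_incl + c_excl   # a child of an excluded vertex is unconstrained
        return incl, excl

    incl, excl = dfs(root, -1)
    return incl + excl

# Path 0-1-2: the independent sets are {}, {0}, {1}, {2}, {0,2}
print(count_independent_sets([[1], [0, 2], [1]]))  # -> 5
```

As the results below suggest, the harder tiers demand far more than this kind of textbook two-state recurrence.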
The results are remarkable:
- On the ‘shallow’ tier, top models score 50%–70%, indicating that they are familiar with the subject matter.
- On ‘deeper’, Grok 4, Gemini-Pro, o3-Pro, and Opus-4 each solve at most 1/100 problems. GPT-5 Pro is significantly better, but still solves only 4/100.
- On ‘deepest’, all models collapse to 0% success rate.
🧵