Shengjia Zhao · Dec 20, 2024 · 9:52 PM UTC

@shengjia_zhao

The new AI challenge should be to come up with an eval that is actually solvable by people/gradable and survives for >1 yr.

Nat McAleese

@__nmca__

20 Dec 2024

o3 represents enormous progress in general-domain reasoning with RL — excited that we were able to announce some results today! Here’s a summary of what we shared about o3 in the livestream (1/n)

Conor · Dec 20, 2024 · 10:34 PM UTC

@jconorgrogan

Replying to @shengjia_zhao

What about an eval of evals? :D E.g., the ability of AI to create evaluations and rating systems that are solvable by people/gradable.

AI Leaks and News · Dec 20, 2024 · 10:07 PM UTC

@AILeaksAndNews

Replying to @shengjia_zhao

Might be impossible

Kiki · Jan 15, 2025 · 2:01 PM UTC

Kiki @UYv3iOABQ140338

Replying to @shengjia_zhao

Perhaps in the future, the Earth won’t need too many ordinary people; having elites might be enough.

RA · Dec 21, 2024 · 2:00 AM UTC

RA @hoangbrian

Replying to @shengjia_zhao

It's over