The new AI challenge should be to come up with an eval that is actually solvable by people/gradable and survives for >1 yr.
o3 represents enormous progress in general-domain reasoning with RL — excited that we were able to announce some results today! Here’s a summary of what we shared about o3 in the livestream (1/n)

Dec 20, 2024 · 9:52 PM UTC

Replying to @shengjia_zhao
What about an eval of evals? :D E.g., the ability for AI to create evaluations and rating systems that are solvable by people/gradable
Replying to @shengjia_zhao
Perhaps in the future, the Earth won’t need too many ordinary people; having elites might be enough.
Replying to @shengjia_zhao
It's over