The new AI challenge should be to come up with an eval that is actually solvable by people/gradable and survives for >1 yr.
o3 represents enormous progress in general-domain reasoning with RL — excited that we were able to announce some results today! Here’s a summary of what we shared about o3 in the livestream (1/n)
Dec 20, 2024 · 9:52 PM UTC





