Until this morning, I'd never spent time on this problem, because iterate convergence matters less to me than convergence rates.
What's new/difficult? First, Lemma 1 and the Lyapunov function are not new; they are central to prior work on this problem. The key difficulty is Theorem 2.
When is it easy?
If we know that f is any reasonable function -- e.g., analytic, semialgebraic (built from polynomial pieces), or more generally (locally) definable/tame -- the theorem follows immediately, because such functions satisfy a growth condition near every critical point. That is:
f(x(t)) - f* >= mu * dist^p(x(t), argmin f)
for some power p and constant mu. Then, since the function gap tends to zero like 1/t^2, it quickly follows (e.g., via Opial) that x(t) converges to the minimizing set -- even at a controlled rate of 1/t^(2/p).
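To spell out the rate step (with C standing for whatever constant sits in front of the 1/t^2 gap bound -- my notation, not from the original argument): chaining the growth inequality with the gap bound gives

mu * dist^p(x(t), argmin f) <= f(x(t)) - f* <= C / t^2

and rearranging yields

dist(x(t), argmin f) <= (C / (mu * t^2))^(1/p) = O(1 / t^(2/p)).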
This class comprises essentially every nonpathological problem you see in practice. So in that sense, the result adds little for the problems we actually care about.
But let's say you want to cover EVERY convex function. Then the proof takes more work, because a convex function can be 'infinitely flat' and we lose the growth condition above. (See the plot.) That's when Theorem 2 gives you something new.
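I don't know which function the plot actually shows, but a standard example of this kind of flatness (my choice, purely for illustration) is

f(x) = exp(-1/|x|) for 0 < |x| <= 1/2, with f(0) = 0,

extended affinely beyond |x| = 1/2 by its tangent lines (the cutoff at 1/2 is where exp(-1/x) stops being convex). This f is convex with unique minimizer 0, yet f(x) - f* = exp(-1/|x|) decays faster than |x|^p for every p, so no growth bound of the form above holds near the minimizer.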
Now, the proof of Theorem 2 is algebraically very clever. Things cancel in just the right way. I imagine someone good at ODEs could have seen this pretty quickly. I'm not one of those people.
When I attempted this problem, I tried to apply Opial's lemma from convex analysis. In this context it says that, since we know all cluster points of x(t) are minimizers of f, we just need to show that for any minimizer x_* of f, the limit of the following expression exists as t approaches infinity:
||x(t) - x_*||
It seems hard to show this directly. The main difficulty is that the term
t\dot x(t)
may not converge to zero. The cleverness in the GPT proof lies in the cancellation and in the reformulation as a linear ODE.
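For what it's worth, here is where that term shows up, assuming the standard setting of the Su-Boyd-Candès ODE \ddot x + (3/t) \dot x + \nabla f(x) = 0 (the post doesn't restate the setup, so this is my guess at it). The classical Lyapunov function is

E(t) = t^2 (f(x(t)) - f*) + 2 ||x(t) + (t/2) \dot x(t) - x_*||^2,

which is nonincreasing along trajectories. Its boundedness controls ||x(t) + (t/2) \dot x(t) - x_*||, not ||x(t) - x_*|| itself; the two differ by at most (t/2) ||\dot x(t)||, so the direct route would need t \dot x(t) to vanish (or at least be controlled), which is exactly what we don't know.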
So is this a significant breakthrough? While it's certainly a clever use of ODE techniques, I'd say its value lies more in satisfying our curiosity by answering the question: do we still get convergence for infinitely flat functions? Beyond that, the techniques do not seem to lead to new insights about other optimization algorithms.
Finally, I have been noticing more and more improvement in GPT-5 lately on math I care about, specifically basic optimization and concentration inequalities. These are areas with a lot of mechanical calculations that people perform again and again -- problems I used to hand off to graduate students when I ran out of 'calculation energy.' My bet is that OpenAI has generated a ton of synthetic data in these areas, because it's easy to create for such problems. If you ask
@littmath, they seem to have substantially less synthetic data in algebraic geometry/number theory. I'll be very curious how GPT-6 performs as that situation changes.