A set of ten math questions designed to evaluate the ability of AI systems to autonomously solve problems that arise naturally in the research process.

On Euler Day — February 7, 2026

In 1768, in his Institutiones calculi integralis, Euler proposed a method for approximating solutions to ordinary differential equations with given initial values. The idea seems simple in hindsight: compute the slope of the solution curve at a point, step forward a small distance along the tangent line, recompute the slope at the new point, and repeat. The result approximates the true solution curve by a succession of line segments.

Formally, given an equation specifying the rate of change of some quantity w as a function of its current value and time, dw/dt = g(w, t), w(0) = w₀, Euler's method updates the estimate as w(t + Δt) ≈ w(t) + Δt · g(w(t), t). Euler needed his method because the ODEs he cared about, such as the equations of motion for rigid bodies, problems in celestial mechanics, and various problems of fluid flow, lack closed-form solutions, and higher-order approximations were beyond the reach of hand computation.
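
As a concrete illustration, here is a minimal Python sketch of the update rule above. The function name euler, the test equation dw/dt = w, and the step size are illustrative choices, not anything taken from Euler's text.

    def euler(g, w0, t0, dt, n_steps):
        """Approximate the solution of dw/dt = g(w, t), w(t0) = w0."""
        w, t = w0, t0
        for _ in range(n_steps):
            w = w + dt * g(w, t)  # step forward along the tangent line
            t = t + dt
        return w

    # Test on dw/dt = w, w(0) = 1, whose exact solution is e^t.
    print(euler(lambda w, t: w, w0=1.0, t0=0.0, dt=0.01, n_steps=100))
    # prints ≈ 2.7048, close to the exact value e ≈ 2.71828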

In modern machine learning, training a neural network means minimizing a loss function F over the network's parameters w. The standard approach is gradient descent, which updates the parameters by moving a fixed amount in the direction of steepest local decrease: wₙ₊₁ = wₙ − α · ∇F(wₙ). Gradient descent, first proposed by Cauchy in 1847, is in fact Euler's method applied to the gradient flow ODE dw/dt = −∇F(w(t)), with the learning rate α playing the role of the step size Δt.
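
To see the correspondence concretely, here is a minimal sketch; the quadratic loss F(w) = 0.5 · ‖w‖² is chosen purely for illustration. Running gradient descent on this F is exactly running Euler's method on dw/dt = −∇F(w) with step size Δt = α.

    import numpy as np

    def grad_F(w):
        # ∇F for the illustrative loss F(w) = 0.5 * ||w||^2
        return w

    w = np.array([1.0, -2.0])
    alpha = 0.1  # learning rate, playing the role of the Euler step Δt
    for _ in range(50):
        w = w - alpha * grad_F(w)  # w_{n+1} = w_n - α · ∇F(w_n)
    print(w)  # ≈ [0.005, -0.010], approaching the minimizer [0, 0]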

Gradient descent is used to train neural networks today for a reason analogous to Euler's, transposed to a different scale. Better numerical methods for approximating ODEs exist; in particular, methods that incorporate curvature information via the Hessian, the d × d matrix of second partial derivatives of F, where d is the number of parameters. But for models with hundreds of billions to trillions of parameters, the Hessian has on the order of 10²²–10²⁴ entries. Computing, storing, or even approximating a Hessian at each training step is not feasible with current hardware. First-order methods, which encompass gradient descent and its modern variants such as Adagrad and Adam, require only a gradient computation at each step, whose cost scales linearly with the size of the parameter space.
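
A back-of-the-envelope calculation makes the scaling concrete. The trillion-parameter count and 4-byte precision below are illustrative assumptions, not figures from any particular model.

    d = 10**12           # an illustrative trillion-parameter model
    bytes_per_entry = 4  # assuming fp32 storage

    grad_bytes = d * bytes_per_entry     # 4e12 bytes: ~4 terabytes
    hess_bytes = d**2 * bytes_per_entry  # 4e24 bytes: ~4 million exabytes

    print(f"gradient: {grad_bytes:.1e} bytes")
    print(f"Hessian:  {hess_bytes:.1e} bytes")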

On Euler Day, we recognize that this simple discretization of the gradient flow ODE, proposed in a more general setting by Euler in the eighteenth century to make approximation by hand possible, today enables the training of neural networks at frontier scale.

Team for February 2026 Release

Mohammed Abouzaid
Stanford University

Andrew J. Blumberg
Columbia University

Martin Hairer
EPFL and Imperial

Joe Kileel
University of Texas at Austin

Tamara G. Kolda
MathSci.ai

Paul D. Nelson
Aarhus University

Daniel Spielman
Yale University

Nikhil Srivastava
University of California, Berkeley

Rachel Ward
University of Texas at Austin

Shmuel Weinberger
University of Chicago

Lauren Williams
Harvard University