A set of ten math questions to evaluate the capabilities of AI systems to autonomously solve problems that arise naturally in the research process.

About the First Batch

In baking, the first proof, or bulk fermentation process, is a crucial step in which one lets the entire batch of dough ferment as one mass, before dividing and shaping it into loaves.

This project represents our preliminary effort to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions. After letting these ideas ferment in the community, we hope to produce a more structured benchmark.

We presented a diverse set of ten research-level math questions, drawn from algebraic combinatorics, spectral graph theory, algebraic topology, stochastic analysis, symplectic geometry, representation theory, lattices in Lie groups, tensor analysis, and numerical linear algebra. Each question arose naturally in the authors' own research and has been answered with a proof of roughly five pages or fewer.

Questions & Resources

Read the Paper

Our methodology, the complete set of questions, and a discussion of related work.

View Paper on arXiv

LaTeX Source

The LaTeX source of the paper, including the problem statements.

View LaTeX Source

Solutions

Solutions were released at 11:59 pm Pacific Time on February 13, 2026. These include the authors' solutions, a link to the original encrypted solutions together with the key to decrypt them, and the AI solutions produced by the project team.

View Solutions and Commentary

Community Participation

We are thrilled by the excitement this project has generated, and we are grateful to the community for engaging with us. The Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF mathematical research institute, has generously agreed to host a web-public Zulip channel for discussion of the solutions.

Frequently Asked Questions

Team for the First Batch

The following mathematicians contributed problems and/or led the first batch:

Mohammed Abouzaid, Stanford University

Andrew J. Blumberg, Columbia University

Martin Hairer, EPFL and Imperial College

Joe Kileel, University of Texas at Austin

Tamara G. Kolda, MathSci.ai

Paul D. Nelson, Aarhus University

Daniel Spielman, Yale University

Nikhil Srivastava, University of California, Berkeley

Rachel Ward, University of Texas at Austin

Shmuel Weinberger, University of Chicago

Lauren Williams, Harvard University

Acknowledgements. We thank the Simons Institute for the Theory of Computing for hosting the organizational meeting of this project in early December 2025, with support from the Director's Opportunity Fund. We thank the Institute for Computer-Aided Reasoning in Mathematics (ICARM) for generously hosting a public Zulip channel for community discussion of this project.