First Batch
In baking, the first proof, or bulk fermentation process, is a crucial step in which one lets the entire batch of dough ferment as one mass, before dividing and shaping it into loaves.
This project represents our preliminary efforts to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions. After letting these ideas ferment in the community, we hoped to produce a more structured benchmark.
We presented a diverse set of ten research-level math questions, drawn from algebraic combinatorics, spectral graph theory, algebraic topology, stochastic analysis, symplectic geometry, representation theory, lattices in Lie groups, tensor analysis, and numerical linear algebra. Each question arose naturally in the research process of the authors and has been answered with a proof of roughly five pages or fewer.
Questions & Resources
Read the Paper
Our methodology, the complete set of questions, and discussion of related work.
Solutions
Solutions were released at 11:59pm Pacific Time on February 13, 2026. These include the author solutions, a link to the original encrypted solutions together with the key to unlock them, and the AI solutions produced by the project team.
FAQ
Common questions about methodology, autonomy criteria, grading, and how to participate.
Community Participation
We invite the community to experiment with our ten questions and to share their results and observations online. Ideally, participants should share a complete transcript of their interaction with an AI system. The most credible solutions will be those that were completed before the solutions were officially released.
We are thrilled about the excitement this project has generated, and we are grateful to the community for engaging with us. The Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF Mathematical Research institute, has generously agreed to host a web-public Zulip channel for discussions of the solutions.
We encourage participants to share these questions and their findings on social media using the hashtag #1stProof.
What Counts as an Autonomous Solution
We consider an AI model to have answered one of our questions if it autonomously produces a proof that conforms to the levels of rigor and scholarship prevailing in the mathematics literature. In particular, the AI should not rely on human input for any mathematical idea or content, or for help in isolating the core of the problem. Citations should include precise statement numbers and should be either to articles published in peer-reviewed journals or to arXiv preprints.
Team for First Batch
The following mathematicians contributed problems and/or led the first batch.