Let’s say that someone had a big pot of money, and wanted to use it to accelerate mathematical discovery. How might they go about doing this?
The traditional approach
Historically it has been governments who have been driving this agenda, with taxpayer-funded institutions such as the NSF (in the US), the EPSRC (in the UK), the ERC (in Europe) and many other national institutions (NSERC, ARC, DFG, JSPS…) typically funding individuals and small teams to work on cutting-edge problems at the boundary of mathematical understanding. These institutions also fund centres and institutes where mathematicians gather for workshops and exchange ideas.
The question I want to raise here is: can new advances in technology somehow accelerate the growth of mathematical knowledge and understanding in ways different from the traditional ones? And I want to argue that right now this is not as easy as it sounds.
“Duh just let AI do it”
The immediate and unhelpful answer from the AI bros (who may have no understanding of the nature of mathematics) is “this question is not even relevant, AGI will happen within the next few years and then math will be solved”. But this is not what is happening on the ground. Right now there seem to be two relevant technologies poised to help disrupt mathematics: the large language model (LLM) and the interactive theorem prover (ITP). There is progress, but there is certainly no “killer app” yet. How are experts using these tools in 2026?
Uses of LLMs in mathematics
Professional mathematicians seem to be using these tools to do a variety of things. For example:
- High-level brainstorming for ideas;
- Literature search;
- Solving intermediate undergraduate/PhD level questions which arise on the way to harder problems.
Amateurs and cranks aim higher, and are currently producing large documents which purport to contain breakthrough results, but which do not survive five minutes of expert scrutiny. I continually find it surprising that a non-expert can genuinely believe that they might contribute to research mathematics using only the tools which the experts also have access to. The slop being produced by LLMs is getting better, and I find it extremely depressing ploughing through this kind of stuff, because the AI wants to please: it wants to prove your theorem for you, even if your theorem is false or not accessible given the tools that humanity currently has access to.

LLMs will lie to you. I’ve asked a language model a question and been unambiguously told that the answer is YES, and given a “proof” which, when inspected, turns out to be nonsense. When I then raise a specific issue with the solution, the LLM decides that actually I don’t believe the claim after all, so it changes its answer to NO and rattles off a “proof” of that instead. Is there a word or phrase yet for the depression that descends when I am reading LLM output, trying to decide whether it is mathematically correct or whether the system is being a people-pleaser and lying to me? This feeling is in stark contrast to my emotions when refereeing a human-written paper, where I am typically working under the assumption that the author surely genuinely believes that they’ve correctly proved the result they’re announcing.
LLMs can have good ideas, but cannot always be trusted. It seems that right now the most successful contributions of LLMs to mathematics have been human-LLM collaborations, which are currently showing some signs of promise. I do worry that there might be a bit of a pile-on right now, with human authors claiming that an LLM gave them the key idea for a paper, or was in some other way an essential component of the research, on the basis that this makes the paper look more exciting and modern.
Right now, the evidence seems to suggest that the way to make LLMs better at mathematics is not to attempt to train an LLM on mathematics specifically, but simply to leave the tech companies to do what they are doing and see what happens. In particular, it is not at all clear that there is an LLM-only way to accelerate mathematics with money (unless you have many billions, in which case I suppose you can try to make your own LLM; it is not at all clear that you will be able to compete, though).
Uses of ITPs in mathematics
Lean is a programming language rich enough to express the concepts of a mathematical theorem and its proof. If your code compiles, the mathematics is correct. For the last eight years, the Lean community has been building a library containing most of the results in a typical undergraduate degree, together with many more advanced results from graduate courses in certain areas of higher mathematics. More recently, tech companies have been developing tools which can write Lean code. The advantage of an LLM-generated Lean proof is that the LLM can no longer lie to you: if the code compiles, the argument is complete and correct. The argument might still be amateurish or unnecessarily convoluted, although one could imagine future tools being trained to tidy up long-winded but correct arguments.
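To make “if it compiles, it is correct” concrete, here is a toy example (the theorem and its name are mine, chosen purely for illustration; `Nat.add_comm` is a genuine library lemma):

```lean
-- A machine-checked statement and proof: commutativity of addition on ℕ.
-- If this file compiles, no human referee is needed to certify the proof.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```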
However, even though ITPs such as Lean offer a solution to AI hallucinations in mathematics, their abilities are limited by the current state of the libraries available. Lean’s mathematics library mathlib is still missing literally hundreds of the definitions used in modern-day mathematics. People may have read about Lean/LLM combos being used to solve Erdős problems (Terry Tao is keeping track of recent developments here); but one thing which characterises Erdős problems is that although they may or may not be hard to solve (and the problems solved so far by LLM/Lean combos were usually not ultimately that hard, with solutions typically a page or two at most and using no advanced machinery), they are always easy to state.
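“Easy to state” means, in practice, that every definition appearing in the statement is already in the library. A minimal sketch of what that looks like, using mathlib’s existing lemma `Nat.exists_infinite_primes` (the theorem name here is my own):

```lean
import Mathlib

-- "Easy to state": ℕ, ≤ and Nat.Prime are all in mathlib already,
-- so the statement (and in this case even the proof) fits on one line.
theorem infinitude_of_primes : ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
  fun n => Nat.exists_infinite_primes n
```

An Erdős problem about integers and primes clears this bar; most of modern research mathematics does not.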
This absence of many modern definitions in Lean’s mathematical library (and indeed in the libraries of all other computer theorem provers) is, I believe, holding the area back.
Growing ITP libraries
Let’s get technical. The Langlands Program is a web of mostly-unsolved conjectures involving mathematical objects called “automorphic representations for a connected reductive group over a global field”. Writing down the definition of an automorphic representation in the language of an ITP would be an extremely long project. But until this work is done, the only technological tool we have for accelerating the Langlands Program is LLMs, with their inaccuracies, hallucinations and lies. This will not scale. So say we have a big bag of money. How do we accelerate ITP libraries with it? For example, how do we get the definition of an automorphic representation into Lean’s mathematics library, so we can state Langlands’ conjectures and let AI work on them within the framework of an ITP, where hallucinations are impossible?
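To see what “stating a conjecture and letting AI work on it” looks like when the prerequisite definitions do exist, compare with Fermat’s Last Theorem: its statement is already in mathlib as a `Prop`, because everything it mentions was formalized long ago.

```lean
import Mathlib

-- The statement of Fermat's Last Theorem is already a Prop in mathlib:
-- stating a conjecture is cheap once all its definitions are in the library.
#check (FermatLastTheorem : Prop)
```

Nothing analogous can currently be written for the Langlands conjectures: the `Prop` cannot even be typed in, because the definitions it would mention are not there.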
There are four approaches to solving this problem. We could wait until it happens organically (free and very slow), we could get humans to do it, we could get computers to do it, or we could somehow use a human/computer combination. What are the pros and cons here?
The problem with humans
Humans are expensive. Not only that, the humans who are able to do the job of teaching nontrivial definitions from modern research mathematics to a computer are a rare breed. As things currently stand, if a human were to embark upon the project of teaching more of the Langlands Philosophy to a computer, they would have to be an expert in both the Langlands Philosophy and the language of the ITP in question. I have been attempting to train PhD students and post-docs in these skills, but they get snapped up by tech companies! Also, Lean’s mathematics library has an extremely high bar for code quality, and right now it has over 2000 open PRs on its queue (that is, 2000 pieces of code corresponding to mathematics and waiting to be reviewed by human experts). Nothing is going to happen quickly here, with the current state of things. One of the jobs of the recently-created Mathlib Initiative is to make serious inroads into the backlog, by paying expert humans to review PRs. We have seen a huge acceleration in feature PRs being merged since the Initiative arrived on the scene. Funding it is certainly one way to accelerate library growth.
But even with this acceleration, I feel that a human-centred approach to getting Lean to understand the Langlands Philosophy will still need to be powered by a person or people whose job it is to make this happen. And for that we need two ingredients: the person, and the job. Neither is particularly easy to come by, not least because if you are a human who is very good at the Langlands Philosophy then you might want to spend your time proving new theorems in the Langlands Philosophy; after all, that is what will get you the Fields Medal. Painstakingly going over basic material might not be your idea of how you want your career to progress.
The problem with computers
Recall the question: how to accelerate the growth of an ITP’s mathematics library. Can we use computers? Recently I raised this question on the Lean Zulip. After quite a lively general discussion here and here, the conclusion seemed to be weirdly similar to the situation with LLMs writing mathematics in the traditional way: a machine might be able to write 10,000 lines of code, but then those lines need careful human review. A particularly delicate issue is with AI formalizing definitions in Lean; these definitions are complex, and if you miss or garble an axiom then your code will still compile, it just won’t mean what it is supposed to mean (a toy illustration follows below).

As an example: a prerequisite for defining automorphic representations would be to define reductive groups. I was asked what would convince me that an AI-generated definition of a reductive group was correct, and I suggested a formal proof of the classification theorem for connected reductive groups over an algebraically closed field; but such a proof would be a challenging and very long formalization task, and right now AI alone is simply not up to it. And if we need humans in the loop then we run back into the problems mentioned above: the right humans are expensive, rare, and even if they have the technical know-how to supervise the project, they could well already have a job in an academic institution where they are being paid to do something completely different.
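Here is that toy illustration (hypothetical code written for this post, not taken from mathlib): the structure below typechecks perfectly well, but it does not mean “group”.

```lean
-- A plausible-looking definition of a group... except that the axiom
-- relating `inv` to `mul` has been forgotten. Lean accepts this happily.
structure NotQuiteGroup (G : Type) where
  mul : G → G → G
  one : G
  inv : G → G
  mul_assoc : ∀ a b c : G, mul (mul a b) c = mul a (mul b c)
  one_mul : ∀ a : G, mul one a = a
  mul_one : ∀ a : G, mul a one = a
  -- missing: inv_mul_cancel : ∀ a : G, mul (inv a) a = one
  -- Any monoid with a junk `inv` satisfies this, so "theorems about groups"
  -- proved from this definition would be about the wrong objects.
```

The compiler can certify that proofs match definitions; it cannot certify that definitions match the mathematics, and that check is irreducibly a job for an expert human.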
tl;dr
It’s hard to “10x mathematics”, even with funds. AI output needs extremely careful review right now, and this is a dispiriting task which needs to be done by an expert, who probably has better things to do. LLMs will lie to you, whether they are writing natural-language mathematics or definitions in a formal language. Formal languages can check proofs, but humans need to check definitions.
Right now the formalization community on the whole seems to be relying on “let the ITP libraries grow organically, and hope that the taste of those in the community is such that it will grow towards modern research”. Universities are typically unwilling to employ people whose job is library growth, because traditionally this skill is not recognised by the mathematical community, which is slow to change (a perennial problem in academia). Many times I have been asked “how do we 10x mathlib?” I do not know. And without ITPs in the picture, our only defence against slop is humans ploughing through LLM output, which is inaccurate and vague sufficiently often that it simply cannot yet be trusted without extremely careful expert review. This is one of the main reasons that formalization of mathematics is still essentially all being done by humans.