Lean in 2024

A huge amount happened in the Lean theorem prover community in 2023; this blog post looks back at some of these events, plus some of what we have to look forward to in 2024.

Modern mathematics

I personally am a member of the Lean community because of its phenomenal mathematics library mathlib, which was born around six and a half years ago and has become a very fast-growing database of mathematics, containing many (but by no means all) of the definitions used by modern research mathematicians today. 2023 was the year in which we really began to accrue evidence that formalisation of certain parts of serious modern mathematics is feasible in “real time”. Quanta’s video on 2023’s biggest breakthroughs in mathematics talks about several results; let’s take a look at their status when it comes to formalisation in Lean.

The first result in the Quanta video is the theorem of Campos–Griffiths–Morris–Sahasrabudhe giving an exponential improvement to the upper bound for Ramsey numbers. The previous post on this blog, by Bhavik Mehta, discusses his formalisation of the result; he formalised the 50+ page paper in five months (before the paper had even been refereed).

The second result in the video is the work by Smith, Myers, Kaplan and Goodman-Strauss on aperiodic monotiles. Myers has developed enough of the theory of planar geometry to formalise the solution to a 2019 IMO problem, and announced earlier this week his intention to formalise the result in Lean in 2024.

The third and final result mentioned in Quanta’s round-up was the Kelley–Meka result bounding how large a set of positive integers can be before it must contain a 3-term arithmetic progression. A formalisation of the result is being actively worked on by Yaël Dillies; they have made substantial progress.

However, when it comes to formalising mathematics in real time, we now have an even more spectacular data point: Terry Tao led a team which formalised the main result of his breakthrough new paper with Gowers, Green and Manners proving Katalin Marton’s polynomial Freiman–Ruzsa conjecture. The 33-page paper was formalised in just three weeks (before the paper had even been submitted)! Tao used Patrick Massot’s Lean blueprint software to make this web page, a LaTeX/Lean hybrid document containing an explanation of the proof which is comprehensible to both a human and a computer. He also made very effective use of the Lean Zulip, with summaries posted every few days of what was ready to be worked upon; ultimately 25 people contributed.
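
For reference, the statement that was formalised is, as I understand it (the constant 2K^{12} being the one proved by Gowers, Green, Manners and Tao):

    If A \subseteq \mathbb{F}_2^n is finite and non-empty with |A + A| \leq K|A|, then A can be
    covered by at most 2K^{12} cosets of a subgroup H \leq \mathbb{F}_2^n with |H| \leq |A|.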

Based on these and other stories coming out of the mathlib community, one can reasonably conclude that there are parts of the modern theory of combinatorics (and in particular additive combinatorics) for which asking for a complete computer formalisation is not an unreasonable thing to do. Mehta and Bloom also formalised Bloom’s proof of the Erdős–Graham unit fractions conjecture, which for me is evidence that some parts of analytic number theory are also becoming amenable to this technique.

Fermat’s Last Theorem

However, not all of modern mathematics is ready to be eaten by these systems. For example, in September I was awarded a grant by the EPSRC (a UK research funding body) to spend five years of my life formalising a proof of Fermat’s Last Theorem in Lean! Lean has the definitions of elliptic curves and modular forms required to start us off, but we are still working on other definitions, such as automorphic representations, finite flat group schemes and the like. I cannot yet link to a blueprint because the project does not officially start until October 2024 (and right now I’m more worried about teaching my undergraduate Lean course), but I am part way through writing one, and right now the graph looks like this:

People familiar with Tao’s blueprint graph will know that green means “done” and blue means “ready to be done”. The new orange colour means “a long way away”. Already we see the first major bifurcation: we need Mazur’s theorem on the torsion subgroups of elliptic curves over the rationals, a huge project, and completely independent of the R=T theorems we’ll need to push Wiles’ ideas over the line. The route we’ll be taking is not the original Wiles/Taylor-Wiles argument, but a more modern one due to Richard Taylor, who is collaborating with me on the mathematics of the project.

The Focused Research Organisation

In July 2023 the Lean Focused Research Organisation (FRO) was launched. Focused Research Organisations are a new type of nonprofit for science, backed by Convergent Research; Lean’s FRO is funded by the Simons Foundation, Sloan, and Richard Merkin. This has enabled a team of people to work full-time on supporting the ever-growing Lean infrastructure, focusing on many things such as making the mathematics library compile faster and adding infrastructure and tactics to enhance the user experience (I give some examples below). The FRO is hiring, by the way. One example of the ways the FRO has made my own life easier: Scott Morrison has developed omega for Lean 4, a tactic for proving obvious-to-mathematicians lemmas about integers and naturals; these things will no longer slow down my undergraduate students.
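
To give a flavour of what omega does (a toy example of my own, not taken from the FRO’s announcements), these are the sorts of goals it closes in one line:

    import Mathlib.Tactic

    -- `omega` decides linear arithmetic over ℤ and ℕ, including the truncated
    -- subtraction on ℕ which so often trips up beginners.
    example (x y : ℤ) (h₁ : 2 * x + y = 4) (h₂ : x ≤ 1) : y ≥ 2 := by omega

    example (n : ℕ) (hn : n ≠ 0) : n - 1 + 1 = n := by omega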

In fact July 2023 was an amazing month for the community, because as well as the announcement of the FRO, it marked the end of the project to port all of Lean’s mathematics library from the now end-of-life Lean 3 to Lean 4.

Growing user base in computer science

But let’s get back to 2024. Sometimes it felt to me that Lean 3 was viewed as “a tool for mathematics”, with its mathematics library the flagship project. But both the FRO and other researchers in computer science are contributing code and tools to make Lean much more than this. Between 9th and 12th January 2024, the Lean Together 2024 conference showcased some of what is currently going on. As well as talks by mathematicians on growing the mathematics library, there were plenty of talks by computer scientists on tools and projects which they have been working on with Lean. For example, David Thrane Christiansen from the FRO presented Verso, his domain-specific language for documentation. Static blog posts such as Tao’s Lean 4 proof tour can now be enhanced using this tool. Take a look at Joachim Breitner’s blog post about recursive definitions on the lean-lang.org web page: it contains “live” code (hover over it!), and the entire post was generated from a Lean file; if the language updates and the code breaks then a warning will be generated. Perhaps in the future, infrastructure developed using Alex Best’s leaff tool (also presented at the conference) could even be used to suggest fixes.

Other non-mathematical highlights for me at the conference: Emina Torlak gave a presentation about how Amazon are using Lean in their Cedar project; Evgenia Karunus and Anton Kovsharov spoke about Paperproof, their visualisation tool for Lean proofs (which undergraduates seem to really like when I show it off in talks); and Kaiyu Yang spoke about LeanCopilot, an LLM tool trained on Lean’s mathematics library whose main goal is to generate Lean proofs. I have always been skeptical of people who want to claim “Lean + AI = AGI”; in fact I’ve been maximally on the other side of the argument, my opinion being that LLMs are right now of essentially no use to me as a working mathematician. But because I thought my undergraduates might like what I’d seen in the talk, I added LeanCopilot as a dependency of the repository I’m using in my undergraduate course, and it really can do a bunch of the problems (sometimes in totally different ways to my model solutions). I also tried it on a question someone asked on the Lean Zulip chat and it solved that too — and quickly. I think it will be interesting to see whether future versions of these tools will actually be of use to me in the Fermat project I mentioned above (which is funded until 2029; who knows what language models will be capable of then).
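
For anyone wanting to try it, usage looks roughly like the following sketch, which assumes the tactic name suggest_tactics from the project’s README (the final line is my own manual finish, since the suggestions themselves appear interactively):

    import Mathlib.Tactic
    import LeanCopilot

    -- `suggest_tactics` asks the model for candidate next steps and displays
    -- them in the infoview; the goal itself is left unchanged.
    example (a b : ℕ) (h : a ∣ b) (hb : b ≠ 0) : a ≤ b := by
      suggest_tactics
      exact Nat.le_of_dvd (Nat.pos_of_ne_zero hb) h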

What else is to come in 2024?

Lean’s widgets offer the user a very flexible way of presenting data interactively and graphically. One application: I am very much hoping that this year will be the year I am able to start doing high-school-level geometry interactively in Lean. I give talks in schools about the Natural Number Game, but I would like to write some new material about how to interact in Lean with statements such as “angle at centre equals twice angle at circumference” (proving them, using them). There are some subtleties here which are not usually explained to schoolchildren (such as the difference between oriented and unoriented angles) and which would be exposed by a formal approach.

As far as mathematics goes, my team at Imperial are currently working on Tate modules, Selmer groups, the local and global Langlands program, commutative algebra, Lie algebras and more. When I start working full-time on Fermat’s Last Theorem this will push the development of more of this kind of mathematics in Lean. There seems to be a coherent plan developing on the Lean Zulip for getting a basic theory of L-functions formalised (thanks to David Loeffler, Michael Stoll and others); one natural goal would be results like Chebotarev’s density theorem, which is used in the Fermat project. Heather Macbeth in 2023 formalised the definition of Riemannian manifolds; I feel like differential geometry is another area which now has huge potential for growth. The FRO have made it clear that they want to support this push for more mathematics in Lean; any mathematician interested in joining the community might want to start by working through the free online textbook Mathematics in Lean.

For plans in computer-science-related topics in 2024, I would invite you to consult the FRO roadmap. One thing I like about that page is the broad vision which the FRO has: one of the goals is “enhancing its interface and functionality to cater to a wide range of users, from mathematicians and software developers to verification engineers, AI researchers, and the academic community.”

As always, the Lean community web pages are another place where people can begin to learn about Lean (including installation instructions and so on); anyone who wants to try a fast interactive type-based programming language for their project, or who wants to formalise some mathematics, is welcome to jump in, and ask questions on the Zulip chat. Full names are preferred, and the two rules which have emerged in our community are: stay on topic, and be nice. Beginner questions are welcome in the #new-members stream.


Formalising modern research mathematics in real time

(This is a guest post by Bhavik Mehta)

On March 16, 2023, a paper by Campos, Griffiths, Morris, and Sahasrabudhe appeared on the arXiv, announcing an exponential improvement to the upper bound on Ramsey numbers, a problem which had been open since 1935. Around the same time, posts by Terence Tao, Timothy Gowers and Gil Kalai appeared, all discussing this announcement. These were soon followed by articles in Quanta, Nature and New Scientist. The new paper was clearly exciting news.

About six months after this, I finished formally verifying the titular result of Campos, Griffiths, Morris, and Sahasrabudhe’s beautiful paper in the Lean Theorem Prover. In this post I’ll discuss the gap between informal mathematics and formalised mathematics, demonstrating that very recent and involved arguments in combinatorics can often be formalised in real time. The code for this formalisation is available here: https://github.com/b-mehta/exponential-ramsey/blob/main/src/main_results.lean, and this specific file shows an overview of the relevant definitions and theorems. Another version can be found in a branch of Lean’s mathlib, which additionally shows that the proof has been checked by Lean, and further verified by an external type-checker.

This isn’t the first recent result in combinatorics to be formalised in Lean soon after its announcement: together with Thomas Bloom I formalised a 2021 paper of his within a year of it being uploaded, and Ellenberg and Gijswijt’s 2016 result giving upper bounds on the cap-set problem (which later appeared in the Annals) was formalised by Dahmen, Hölzl, and Lewis in 2019. Of course, no post about formalisation in combinatorics would be complete without a mention of the four colour theorem, proved in 1976 (with the final gaps closed in 1989) and formalised in Coq in 2005.

5 years, 50 pages, 5 months

Quite unlike the remarkably short proof of the Ellenberg–Gijswijt result on cap-sets, the new result here has a particularly involved proof, the outcome of a five-year project of the authors, and the paper comes to over 50 pages in total. In comparison, the formalisation took me about 5 months, including prerequisites, and I ended up with around 20k lines of Lean code.

For the sake of comparison with other recent large-scale formalisations: the Liquid Tensor Experiment took just over 18 months and a large number of person-hours, and the portion of the lecture notes that was formalised was around 10 pages, although that project also had a massive amount of prerequisites to formalise. The cap-set proof (4 pages long) similarly had a large proportion of prerequisites to deal with, as that project was written at a time when Lean’s mathlib had little support for polynomials, which are of course necessary for a proof using the polynomial method.

The formalised exponential Ramsey proof also relies on mathlib, most heavily on the support for manipulating finite sets, but vitally on asymptotic estimates as well, since almost all the statements are about “sufficiently large” naturals. Another key aspect of the proof is probabilistic constructions, in which we show the existence of an object because it has non-zero probability of appearing if we try to make it randomly. Fortunately, the probability spaces needed here are all finite, so they can be formally represented by sums on finite sets.¹
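
The reduction is the standard one: for the uniform measure on a finite sample space \Omega, having positive probability is literally having a positive finite sum,

    \Pr[A] = \frac{1}{|\Omega|}\sum_{\omega \in \Omega} \mathbf{1}_A(\omega) > 0 \iff A \neq \emptyset,

so existence-by-randomness arguments turn into statements about sums over finite sets, which mathlib handles well.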

The end of the proof requires surprisingly precise numerical approximations, as the final part of the argument fundamentally relies on the fact that the red and blue regions in the diagram below don’t intersect. In the original paper, it is enough to say that this can be checked by computer (though the authors give suggestions towards a slightly more detailed proof as well), but formally some more careful analysis is needed.

[Figure: a graph of the red and blue regions, which come very close to intersecting but do not.]

In terms of the global structure of the proof, the formal version follows the informal version closely, which is a testament both to the clarity of the paper and to the strength of Lean and mathlib for formalising practical arguments. In particular, no blueprint was needed for this formalisation, and it was in some sense done “on the fly”: I understood each part of the proof as I came to translate it, rather than writing a separate semi-formal version first. Indeed, when I started, my goal wasn’t to verify the entire paper; it was just to understand the argument, because I thought it was an interesting result!

Mathematical typos

An obvious question when formalising a maths paper is how much the proof needed to change in order to be formalised: in other words, whether there were any flaws in the paper. This question is surprisingly subtle, but a simple summary is that most of the mistakes I identified were easily fixable. Importantly, all mistakes were fixable in a way that preserved the spirit of the argument, so the Lean version of the proof can reasonably be said to be a translation of the informal proof.

I like to consider certain such small mistakes “mathematical typos”. It’s fair to call them errors, and just as typos in English prose can confuse a non-native speaker, mathematical typos can impede meaning for a reader who isn’t familiar with the mathematics. However, with a little experience, it’s not hard to see what the intended meaning was, and fixing it isn’t too bad. A mathematical typo certainly doesn’t constitute a flaw in the proof, and almost all the problems I found fall into this category.

That said, there were two places where I needed some help from the authors to figure out how to fix something from the paper, but it still remains true that the spirit of the argument was unchanged.²

What are Ramsey numbers?

The existence of Ramsey numbers was shown by Frank Ramsey in a 1930 paper whose goal was to show decidability of a particular fragment of first-order logic. Shortly afterward, in 1935, a paper of Erdős and Szekeres popularised Ramsey numbers, giving rise to the fundamental branch of combinatorics now known as Ramsey theory. Ramsey theory itself can fill entire lecture series and textbooks, but in this post I’ll focus on the simplest and most well-studied case: two-colour Ramsey numbers.

The definition of two-colour Ramsey numbers is fairly easy to give. For any numbers k and l, R(k, l) is the smallest number of people you need at a party to guarantee that there are k people who all know each other or l people who all don’t know each other. More mathematically: for any red/blue colouring of the edges of the complete graph on R(k, l) vertices, you can find k vertices with only red edges between them or l vertices with only blue edges between them.
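
In symbols, writing K_n for the complete graph on n vertices:

    R(k, l) = \min\{\, n : \text{every red/blue colouring of the edges of } K_n \text{ contains a red } K_k \text{ or a blue } K_l \,\}.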

Erdős and Szekeres in their 1935 paper gave an upper bound of R(k, l) \leq \binom{k + l - 2}{k - 1} (implying that the diagonal Ramsey number R(k, k) can be upper-bounded by 4^k). The research community worked on decreasing this upper bound, and in 1988 the first polynomial improvement was found by Thomason. His method was taken further by Conlon in 2009 and later by Sah, eventually reaching a bound of the form R(k, k) \leq 4^{k - (\log k)^2}. Conlon’s paper, giving the first super-polynomial improvement to the Erdős–Szekeres bound, was especially noteworthy and appeared in the Annals of Mathematics. But despite all of these successes, the question of decreasing the base of the exponent in the bound was still open. Is there a constant C < 4 for which R(k, k) \leq C^k?
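
To spell out where the 4^k comes from (a standard induction, not part of the new paper): the Erdős–Szekeres recursion combines with Pascal’s rule,

    R(k, l) \leq R(k - 1, l) + R(k, l - 1), \qquad \binom{k + l - 3}{k - 2} + \binom{k + l - 3}{k - 1} = \binom{k + l - 2}{k - 1},

to give the binomial bound, and the central binomial coefficient is at most the sum of its row:

    R(k, k) \leq \binom{2k - 2}{k - 1} \leq 2^{2k - 2} = 4^{k - 1} \leq 4^k.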

It is worth saying something briefly about the lower bound for R(k, k). It was first shown by Erdős in 1947 that R(k, k) \geq \sqrt{2}^k, using a probabilistic argument which was itself groundbreaking and gave rise to the field of probabilistic combinatorics. While a lower-order improvement has been made to this, the base \sqrt{2} has not been improved, giving a wide gap between the bases \sqrt{2} and 4 of the lower and upper bounds.
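
Erdős’s argument is short enough to sketch here (my summary of it): colour each edge of K_n red or blue independently with probability 1/2. The expected number of monochromatic copies of K_k is

    \binom{n}{k} \cdot 2^{1 - \binom{k}{2}},

and if this is less than 1 then some colouring contains no monochromatic K_k at all, so R(k, k) > n; taking n = \lfloor \sqrt{2}^{\,k} \rfloor makes this expectation less than 1 for all k \geq 3.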

Finally, in March 2023, the paper of Campos, Griffiths, Morris and Sahasrabudhe was made public, giving the desired exponential improvement, a massive strengthening of Conlon’s result. They show that there is a constant c > 0 such that for all k, we have R(k, k) \leq (4 - c) ^ k. This paper is in the publication process, and this is the result which I formalised.

As part of this formalisation project, I additionally formalised a general form of Erdős’s lower bound and gave some upper and lower bounds for the off-diagonal Ramsey numbers R(3, k).³

“Suppose aliens invade the earth and threaten to obliterate it in a year’s time unless human beings can find the Ramsey number for red five and blue five. We could marshal the world’s best minds and fastest computers, and within a year we could probably calculate the value. If the aliens demanded the Ramsey number for red six and blue six, however, we would have no choice but to launch a preemptive attack.” – Paul Erdős, relayed by Graham and Spencer.

This famous quote of Erdős discusses the difficulty of computing R(5, 5) and R(6, 6), both of which are still unknown, and the main result of this post doesn’t get us any closer. Nonetheless, as part of this project, I formalised bounds on some of the known small values of Ramsey numbers, including R(4, 4).⁴

What’s next?

To get to the primary result – an exponential improvement on the upper bound for Ramsey numbers – the authors give two proofs, the second providing a slightly better constant. Here I only formalised the first, so a future project could be to improve the constant in the Lean proof. This should be easier now that a large portion of the technical difficulties have been overcome. Furthermore, any future claimed improvements to the bound should be much quicker to verify, since the framework built for this paper can be extended.

This formalisation was done in Lean 3, for multiple reasons, but most importantly because when I started, the prerequisite mathlib lemmas were not yet implemented in Lean 4. I’m in the process of trying to port the project to Lean 4.⁵

Finally, if you want to get involved in the Lean community, contribute to mathlib, use Lean to check some part of your research, or just try to understand a new argument like I did, the community website is a great place to start.


  1. While mathlib does have support for probability spaces in general, working with them in this case ultimately requires viewing them as finite sums anyway, so for convenience it’s simpler here to avoid the formal probability world.
  2. For brevity I’ll avoid too much detail, but one example is in the proof of Theorem 9.1, where we cannot quite guarantee the red density is as large as claimed. Allowing for some small error term in \eta, however, the proof can be recovered using the fact that |U| can be made large enough.
  3. The bounds on the off-diagonal Ramsey numbers I formalised aren’t the best-known ones; there’s really exciting recent work on lower bounds for R(3, k) and R(4, k) which I’d love to see formalised.
  4. The problem of determining Schur numbers falls into Ramsey theory, and Schur numbers are also difficult to compute. But a recent result of Marijn Heule computes Schur number five using SAT solvers, and checks its result in a proof-checker which has itself been formally verified in ACL2. It would be really exciting to see similar methods applied to Ramsey number five as well!
  5. It should – in principle – be possible to port the formalisation to Lean 4, but at the moment it seems more difficult than one might hope.

Lean 2022 round-up

A brief survey post containing some of the things which happened in the Lean community in 2022.

The Liquid Tensor Experiment

In December 2020, Fields Medallist Peter Scholze challenged the formal computer proof community to verify one of his theorems with Clausen. By June 2021 a key technical lemma had been formalised by the Lean community, and there was a certain amount of media attention (e.g. here and here), to the extent that many people assumed that the challenge was over. It was not: there was the small matter of developing the theory of abelian categories, derived functors, Ext groups and various other standard tools of research mathematics. This took another year. The project was finished at the Lean for the Curious Mathematician workshop at ICERM, where Johan Commelin got the formalisation of the full Clausen–Scholze result compiling; Adam Topaz was another big player here. See the Lean community blog post announcing the result for more details, and a follow-up post discussing the question of how someone who doesn’t want to read much Lean code might attempt to check that the theorem has been correctly stated. There is also the blueprint, which describes the precise mathematical details of the route to the goal.

The Sphere Eversion Project

People who know some mathematics but have not used proof assistants might have a mental model of the following form: “discrete stuff should be easy to formalise, continuous stuff is probably harder because there are pictures involved”. Patrick Massot, Floris van Doorn and Oliver Nash ably demonstrated that continuous picture-based mathematics — in this case differential topology — was now in scope for theorem provers, with their paper “Formalising the h-principle and sphere eversion“. The work formalises a modern proof of an instance of Gromov’s h-principle, and deduces Smale’s result that you can turn a sphere inside-out without creasing or snapping it, if you allow the sphere to pass through itself. The work also comes with a blueprint which explains the mathematical route the authors took. The paper was accepted for publication at CPP 2023, and there will be a talk on the project at the conference on 16th January 2023.

Unit fractions

In late 2021, Thomas Bloom announced a proof of an old conjecture of Erdős and Graham on unit fractions: any positive density subset of the positive integers contains a finite subset the sum of whose reciprocals is 1. This result still surprises me — for example, can you really find a finite set of distinct positive integers, all of which end in 7, such that the sum of their reciprocals is 1? Apparently you can. Bhavik Mehta and Bloom started work on a full formalisation of Bloom’s result (including the prerequisites, such as an instance of the Hardy–Littlewood circle method). They finished in June 2022, before Bloom had received a referee’s report for the paper. Here’s a link to the blueprint.

p-adic Hodge theory

Fontaine kick-started modern p-adic Hodge theory with his observation that certain p-adic period rings play a key role. María Inés de Frutos-Fernández has developed enough of the theory of local fields to define the first two rings in the theory, \mathbb{C}_p and B_{HT}, together with their Galois actions. An attack on local class field theory is now viable.
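
For non-experts: B_{HT} is the simplest of Fontaine’s period rings; if I state it correctly, it is the graded ring

    B_{HT} = \bigoplus_{n \in \mathbb{Z}} \mathbb{C}_p(n),

where \mathbb{C}_p(n) denotes \mathbb{C}_p with its Galois action twisted by the n-th power of the cyclotomic character.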

Kaplansky’s unit conjecture

In 2021, Gardam disproved an old conjecture of Kaplansky. In 2022 Anand Rao Tadipatri and Siddhartha Gadgil formalised Gardam’s disproof in Lean (and they’re speaking about the work in the London Learning Lean Zoom seminar on Feb 9th), but in contrast to the three projects above, they used Lean 4, meaning that they had no access to Lean 3’s mathematics library mathlib. Talking of which…

The port has started!

The thing powering mathematics in Lean is the Lean 3 mathematics library mathlib. The problem is that we now have a viable Lean 4, so the library needs to be ported, because Lean 4 is not backwards compatible with Lean 3 (it’s better). A first pass at the problem was made by computer, but it seems that we need humans to finish the job. The port started in earnest in December, and right now we are 12 percent of the way through. Will it get harder or easier as we proceed? Nobody knows! That’s one of the fun things about doing research. Anyone interested in learning Lean 4 by porting some basic mathematics from Lean 3 (finite sets, subgroups, topology etc. all still need to be done) is welcome to take a look at the mathlib 4 porting wiki. With the advent of cached compiled olean files, the job of porting got much easier in the last week or so, as you no longer have to compile anyone else’s branch manually. I am very much hoping that the port will be complete by the summer, and the way to make this happen is to get more people on board. Please consider joining us if you want to make a contribution to open source software in mathematics!


Beyond the Liquid Tensor Experiment

This blog post is a report on how a bunch of mathematicians recently did something which five years ago I would have said was completely out of reach. Note that it’s my (Kevin Buzzard’s) personal take on the work; I am not speaking for all the authors or the Lean community or the mathlib community or any other organization. Note also that I was personally involved with the work.

The background

In December 2020, Fields Medallist Peter Scholze challenged the computer proof community to formally verify an important new theorem of himself and Dustin Clausen. Scholze issued the challenge here, and the challenge itself is now published in Experimental Mathematics, one of the very few mathematical journals currently showing an interest in the area of formalisation. The name of the challenge was the Liquid Tensor Experiment (LTE), and yes, it was a homage to the Liquid Tension Experiment, a band which Scholze is fond of. The theorem behind the challenge is a highly technical result, and I will not go into the details; check out the links I just mentioned for more on the mathematics involved. It would not be unreasonable to describe the theorem as a complex result about complex mathematical objects.

I have a bucket list of milestones for the area of mathematical formalisation, and formalising a complex theorem about complex objects was on this list. Why? One could argue that many of the major computer formalisations (the Kepler conjecture, four colour theorem and odd order theorem) were complex theorems about “simple objects” (spheres, planar graphs and finite groups, respectively). A formalisation which should not be placed in the same category as any of these milestone results, but which could perhaps be regarded as complementary to these major achievements, is my formalisation with Commelin and Massot of the definition and very basic properties of perfectoid spaces — this amounted to very trivial theorems about complex objects. For me, the formalisation of a proof of a complex theorem about complex objects was a missing part of the puzzle.

But not any more, because Johan Commelin, assisted by Adam Topaz and a team of other mathematicians, announced earlier this month that their Lean formalisation of Scholze’s challenge was complete. The final touches of the formalisation were put together at ICERM in Providence RI, USA, during the Lean for the Curious Mathematician 2022 workshop in July. These workshops are for working mathematicians with little or no Lean experience and who are interested in finding out what’s going on and how to get involved. If this sounds like you, then the videos for the ICERM workshop are now online at the above link. If you’re interested in seeing Commelin’s announcement of the completion of the project, it’s just over 40 minutes into my talk on the status of algebraic geometry in Lean.

More on the challenge and its resolution

The reference for the proof which we followed was Scholze’s lectures on analytic geometry, the notes for a course he gave in 2019–20 in Bonn. The theorem we formalised was Theorem 9.1 of these notes (see also Remark 8.15). On p59 of the pdf there is a five line argument reducing Theorem 9.1 to Theorem 9.4, and then the next six pages contain a proof of Theorem 9.4. I claimed above that Theorem 9.1 is a complex theorem about complex objects; Theorem 9.4, however, is a more low-level statement, which unravels to a technically messy but essentially completely elementary claim. It took around one month to formalise the statement of Theorem 9.4, and a further five months to formalise its proof; this averages out at about one page of mathematics per month. You can see a map of the formalisation of the proof of Theorem 9.4 here. Clicking on a rectangle (definition) or an oval (theorem) will pop up a LaTeX statement of the result and will offer you a link to the corresponding Lean code. We finished the proof of 9.4 about a year ago, and to my mild surprise the media got interested; we were even featured in Nature. What probably caused the interest: one of Scholze’s motivations for encouraging a formalisation was that he was not convinced that many humans had ploughed, or would plough, through the proof of Theorem 9.4, and the formalisation gave him the assurance that his work was correct.

All that then remained for the challenge was to formalise the five lines in the paper which showed that Theorem 9.4 implied Theorem 9.1. I’ll reproduce them here:

“By the preceding discussion, one can compute RHom_{\mathbb{Z}[T^{-1}]}(\overline{\mathcal{M}}_{r'}(S), \widehat{V}) as the derived inverse limit of C_c^{\bullet} over all c > 0; equivalently, all c \geq c_0. Theorem 9.4 implies that for any m \geq 0 the pro-system of cohomology groups H^m(C_c^{\bullet}) is pro-zero (as H^m(C^{\bullet}_{kc}) \to H^m(C_c^{\bullet}) is zero). Thus, the derived inverse limit vanishes, as desired.”

These five lines took a further year to formalise, and that’s what has just been completed. The reason it took so long was that the Lean community had to build a working theory of abelian categories, cohomology, Ext groups, derived functors and so on as prerequisites, and whilst some of these subjects had been approached before in theorem provers, we had to get everything formalised in one system at once, with everything playing well together. It is worth noting, however, that even though I’m claiming that the proof of the theorem is “complex”, the actual length of the paper proof is nowhere near as long as, for example, the proof of the Kepler conjecture. However, I do regard the result as an indication that we’re going in the right direction, pushing these systems to do things they’ve never done before (although of course I was involved in the formalisation, so one should bear this in mind when evaluating my claims here).

There’s something called the “de Bruijn factor” of a formalisation, which is the ratio of the number of lines of computer code to the number of lines of human mathematics. One major goal in the area of computer formalisation is to “get the de Bruijn factor down to 1”, that is, to make formalising mathematics in a computer no more difficult than formalising it in LaTeX. However, the Commelin-Topaz-… formalisation of the Liquid Tensor Experiment seems to make a mockery of this concept, because around 30,000 lines of code went into formalising the six-page proof of Theorem 9.4, and around 60,000 lines of code went into formalising the five lines above (a de Bruijn factor of over 10,000 for those five lines). At the very least it indicates that humans can squeeze a huge amount of mathematics into one line of text. Had the required basics (abelian categories etc.) already been in Lean’s mathematics library mathlib, the formalisation would have taken far less time and far fewer lines. One thing we did benefit from was Scott Morrison’s work — Scott has built a huge amount of category theory in mathlib over the previous six years, providing us with a stable foundation to build on.

One question which a professional mathematician with no Lean experience could ask when trying to evaluate the work is: how can we be sure that the Lean formalisation of the challenge actually corresponds to the mathematical version of the challenge? For example, how can we tell that the symbol ℝ in the Lean code actually corresponds to the real numbers, and there is no cheating in the background? Commelin and Topaz have attempted to deal with this issue by making an examples directory in the repository, where they prove basic theorems about the objects involved (for example they show that ℝ is a complete ordered field and that any complete ordered field is uniquely isomorphic to ℝ, as evidence that Lean’s ℝ really does correspond to the real numbers and there is no trickery going on). They also created evidence for Lean’s Ext groups, profinite spaces, condensed abelian groups and so on; if you understand the mathematical definitions of these objects then you can see an indication that the project’s definitions coincide with the usual mathematical definitions, even if you cannot directly read the Lean code.

What did we learn?

Peter Scholze has already explained, in a second blog post, two quite interesting things that we learnt during the process. Firstly, we learnt that the stable homotopy theory used in the original proof (via the Breen–Deligne resolution) can be eliminated. The way this was discovered was quite interesting: the formalisers had been putting off formalising this part of the proof, but we were eventually forced to start formalising the statements of what we needed; once these were clarified, Commelin and Scholze realised that they could actually get away with a weaker theorem. Initially we were calling the trick the “Commelin complex”, but later on we discovered that the construction (though not the theorem we needed about it) had been given by Mac Lane many decades ago. And secondly, Scholze seemed to learn something conceptual about the proof — he described it as “what makes the proof work”. You can see his thoughts on the matter at this mathoverflow question. But in some sense this is old news (the second blog post was published over a year ago). What else did we learn?

In my mind, one of the most important things we learnt is that if you get a group of mathematicians who are focussed on formalising an arbitrary piece of modern mathematics, then it can get done, however complex the theory is. In other words, we have a substantial piece of evidence showing that the software that computer scientists have developed for doing this task has now “caught up with humans”. One could regard this as a triumph. One could also regard it as a pretty weak statement. The computer did not “do” the mathematics — the humans did it: first Scholze and Clausen did it, then Scholze wrote the argument down (in pdf notes for the course in Bonn, linked to above) in a form suitable for mathematicians, then Commelin and Massot wrote a more detailed mathematical blueprint in a “hybrid” part-Lean part-LaTeX format (you can view the blueprint here; it goes into much of the detail of the proof), and finally a team of people transcribed what they had written from a human language into a computer language. One thing we learn is that the computer language (Lean’s dependent type theory, in this case) is rich and expressive enough to formalise recent and complex work of a Fields Medallist. One could ask whether it is feasible to formalise this work in a weaker logic, for example in Higher Order Logic, the logic used by Isabelle/HOL and the other HOL systems. I don’t think anyone actually knows the answer to this question; I have conjectured that the answer is “no”, but there are expert HOL users who seem to conjecture that the answer is “yes”. My real fear is that we will never find out. Manuel Eberl is making fabulous progress formalising what I would call “standard MSc mathematics”, such as the basic analytic theory of modular forms, in a HOL system, but what we are missing is a HOL evangelist from the mathematical community (a kind of HOL version of me) and I don’t know how to create such a person. Indeed, in some sense I am making the problem harder, because my constant evangelism for Lean to mathematicians has the consequence that research mathematicians who get interested in formalisation tend to come to Lean first, and the situation is becoming more and more like a monopoly. There are plus and minus points to this; one minus point is that nobody seems to be working on getting HOL to do things that “it wasn’t designed to do” (for example a modern treatment of sheaf cohomology, étale cohomology etc. via derived functors).

We also learn that formalisation can be, at times, very slow going. However, one should not lose faith here. The reason it took a year to formalise five lines of mathematics was that a huge amount of theory-building had to be embarked upon beforehand, and that theory only needs to be built once. Lean now has a working theory of short and long exact sequences, derived functors, abelian categories, complexes and so on; furthermore, there is evidence that the formalisation of these definitions is usable in practice, because we used them. Indeed, this is another thing that we are learning in the Lean community: instead of thinking “what MSc course shall we formalise next” we are thinking “what major result shall we start working towards”, because this not only drives development but also ensures that the theories being developed are actually usable.

We learn that formalisation can be parallelised, although we already knew this long ago from the work of Gonthier et al on the odd order theorem, and the work of Hales et al on the Kepler conjecture (both papers have 10+ authors). Different people can be put to work on different areas of a formalisation project and do not even have to know about, or understand, what others are working on. The forthcoming paper on the formalisation will have a nice long list of authors, and this is something I’m very happy about; furthermore anyone interested in seeing the precise details of what everyone did can just look at the list of commits and committers on github — the details are completely open to everybody. This is open source mathematics.

The future

So where are we going and what is the point? These are very reasonable questions. One possible reaction that a mathematician might have to the work is that it is pointless, because it is a computer verification of a result which was “basically known to be true anyway”. This is missing the point. Why are we teaching computers modern mathematics? Because I believe that one day future computer systems will be helping us, and whilst I cannot see into the future, consider the following question. Scholze and Clausen are arguing that condensed abelian groups are an important new definition in mathematics. How can AI developers create a chatbot which helps graduate students to learn about the category of condensed abelian groups, until professional mathematicians have taught computers what a condensed abelian group is? I know that some people in the machine learning crowd would answer things like “use language models, train on the papers about condensed abelian groups” but I am not yet convinced that this alone will turn into a useful tool. Lean (and other theorem provers) offer a framework where one can create a library of useful theorems, examples, and counterexamples. We do not even have to type in all the proofs. If “we” (the mathematical community, for example PhD students in arithmetic geometry) start translating important statements, examples, counterexamples, references and so on, and somehow combine this with language models, we will at least make some kind of interesting thing, which I imagine will get more helpful to humans over time. Some dream of a time where computers are better at proving theorems than humans. For me this is still complete science fiction; I have seen no evidence that we are anywhere near this milestone, and have no real vision as to how we are ever going to get there. But one can instead dream of a time where computers are helping humans to do research level mathematics, and work like this makes me believe that this goal is something I might see in my lifetime.


The Future of Interactive Theorem Proving?

This is a guest post, written by Zhangir Azerbayev. Zhangir is an undergraduate at Yale, majoring in computer science and mathematics. He completed this work while visiting Carnegie Mellon’s Hoskinson Center for Formal Mathematics.

Introduction

The history of interactive theorem proving can be told as a story of allowing the user to interact with the system at gradually higher levels of abstraction, getting further away from the axioms and closer to informal mathematics. Automath, the first interactive prover, floated very close to the axioms indeed. Mizar was the first attempt to make formal proofs comprehensible to mainstream mathematicians. Contemporary systems such as Coq and Isabelle add many useful features such as type classes and tactics, and most users don’t have to think about what the axioms even are. Today, there is still a learning curve to Lean, but once one learns the syntax, mathlib proofs are not terribly far from an informal version—at least compared to Automath. If we extrapolate this trend to its logical conclusion, we might imagine an interactive theorem prover where every single line of code is hidden behind the interface, and the user interacts with the machine purely in natural language. Recent advances in machine learning suggest such a system is not as far in the future as it may sound.

As a very small step towards realizing this vision, Edward Ayers and I have developed Lean Chat, a VS Code extension that provides an interface for autoformalizing natural language theorem statements in Lean. Under the hood, these translations are generated by OpenAI’s Codex, a text generation model trained on GitHub source code.

The interface is straightforward: enter a theorem statement in good old \LaTeX, and the app will have a go at formalizing it. If the generated formal statement is correct, fantastic! You can copy and paste the statement into your Lean source file. If it is incorrect, you can give the app feedback on what to fix in a natural, conversational way. Let’s see a few examples.

Examples

Let’s begin with a simple statement about groups.

Lean would be happy with this, and we could accept it if we’re feeling lenient. But the predicate is_group_hom is deprecated, so let’s tell the app to use the up-to-date syntax.

I’m happy with that.

Let’s give the app a slightly harder example, this time the statement of an exercise from James Munkres’ famous topology textbook.

Perfect on the first try!

The next example is due to Johan Commelin: let’s ask for a statement of Burnside’s theorem, an important result in finite group theory which is not in mathlib.
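
The statement the dialogue converged on was along the following lines (a hypothetical reconstruction of mine for illustration, not the app’s verbatim output):

    import group_theory.solvable
    import data.fintype.basic

    -- hypothetical reconstruction of a Lean 3 / mathlib-style statement
    -- of Burnside's p^a q^b theorem:
    theorem burnside (G : Type*) [group G] [fintype G]
      (p q a b : ℕ) (hp : nat.prime p) (hq : nat.prime q)
      (h : fintype.card G = p ^ a * q ^ b) :
      is_solvable G :=
    sorry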

Quite a lot is going on here. The app is able to figure out that the correct way of stating that a group is finite is [group G] [fintype G], and that the correct way of denoting the order of a group is fintype.card G. Moreover, it is able to interpret quite high-level instructions such as “no need for the existential”, suggesting the app has at least a partial grasp of the semantic connection between formal and informal mathematics. Slipping up on minor syntactic conventions, like writing solvable instead of is_solvable, is one of Lean Chat’s most common mistakes. These errors can almost always be fixed with dialogue.

Unfortunately, Lean Chat is far from perfect, and often fails in unpredictable and somewhat comical ways. This next example was also discovered by Johan Commelin:

For some completely unclear reason, the app decides to consider a Lie algebra over \mathbb{R} instead of following my explicit instruction to use a field k, and that’s even before addressing the fact that tangent_space_at_identity is not a mathlib definition. I tried a few different responses to the app’s initial attempt, and couldn’t get it to do anything helpful.

However, the app knows at least something about Lie algebras. The next example is due to Oliver Nash.

One particular class of problems Lean Chat struggles with is Olympiad-style questions. The next example was discovered by David Renshaw:

This example is particularly embarrassing, because despite my specific instructions, the app steadfastly sticks by its initial guess.

How does this work?

Under the hood, Lean Chat’s formalizations are generated by OpenAI’s Codex language model. Codex is architecturally the same as the famous GPT-3, and has a simple objective: given a string of text, predict the next word. Iterating this next-word prediction allows the model to generate coherent text. Both Codex and GPT-3 are neural networks, which means each is a function parametrized by billions of real numbers, and these parameters are tuned by gradient descent to maximize next-word prediction performance on a corpus of training data. GPT-3’s training data is virtually all the text on the internet, and Codex is trained on all the source code on GitHub. This explains why Codex has some modest ability at writing Lean, even though Lean code is a very tiny fraction of its training data.

If you take seriously the idea that neural networks are a lot like the brain, as I do, it is unsurprising that there exists some configuration of parameters that allows the network to competently use language. However, it is almost entirely a mystery why gradient descent is able to find this configuration of weights.

These language models are pattern matchers: if you ask one to complete the sentence “Not a creature was stirring, not even a _”, it will finish with the only pattern it’s seen in its training data: “mouse”. We can exploit this pattern-matching ability to steer the language model into doing a desired task, using a technique called few-shot prompting. Suppose we tell GPT-3 to complete the following text:

    English: How are you this morning?
    French: Comment vas tu ce matin?

    English: I arrived in London late because I missed my connecting flight.
    French: Je suis arrivé à Londres avec un jour de retard car j'ai raté mon vol de correspondance.

    English: Grothendieck was an algebraic geometer
    French:

Because the language model was trained to complete patterns, given this text it will generate “Grothendieck était un géomètre algébrique”. What Lean Chat does under the hood is exactly the same idea. Given the user’s input, we wrap it in the following prompt:

Natural language version: \"If $z_1, \\dots, z_n$ are complex, then $|z_1 + z_2 + \\dots + z_n|\\leq |z_1| + |z_2| + \\dots + |z_n|$.\" Translate the natural language version to a Lean mathlib version:

theorem abs_sum_leq_sum_abs (n : ℕ) (f : ℕ → ℂ) :
  abs (∑ i in finset.range n, f i) ≤ ∑ i in finset.range n, abs (f i) :=

... (two more examples go here)...

Natural language version: \"Let $X$ be a topological space; let $A$ be a subset of $X$. Suppose that for each $x\\in A$ there is an open set $U$ containing $x$ such that $U\\subset A$. Show that $A$ is open in $X$.\" Translate the natural language version to a Lean mathlib version:

theorem subset_of_open_subset_is_open (X : Type*) [topological_space X]
  (A : set X) (hA : ∀ x ∈ A, ∃ U : set X, is_open U ∧ x ∈ U ∧ U ⊆ A):
  is_open A :=

Natural language: [Your input goes here]. Translate the natural language version to a Lean mathlib version:

The Codex language model completes the pattern to generate the response you see in the app. To incorporate the user’s feedback, we simply concatenate the initial few-shot prompt, the model’s incorrect response, and the instruction “[user’s feedback goes here]. Try again:”, then feed the text back into the neural network.

Note that we don’t provide the model with any examples of failed and subsequently corrected statements, since then the model might start intentionally getting its first try wrong!

What does the future look like?

Lean Chat is only the very beginning for autoformalization. Codex is an off-the-shelf model available through OpenAI’s API. Its creators likely never imagined it would be used for formal mathematics, and as such it is not at all optimized for that use case. The few-shot autoformalization capabilities of Codex were only noticed quite recently, and the community working at the intersection of machine learning and formal math has a lot of ideas for how to improve these systems. I expect that in the coming months my examples of Lean Chat’s failures will become completely outdated.

A natural next step after progress on autoformalizing theorem statements is to work on autoformalizing proofs. This is much more daunting than autoformalizing statements, but is definitely within reach. Current language-model-based automatic theorem provers take as input the tactic state, and predict the next proof step. One could imagine building a prover that takes as input both a natural language proof and the tactic state to predict the next proof step, but otherwise works in exactly the same way.
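
Concretely, and following the same few-shot pattern as the translation prompt above, one could imagine prompts of the following shape (this is my illustrative sketch, not a description of an existing system):

    Natural language proof: "... by the inductive hypothesis n ≤ 2^n, and since
    1 ≤ 2^n we get n + 1 ≤ 2^n + 2^n = 2^(n+1) ..."
    Tactic state:
      n : ℕ
      ih : n ≤ 2 ^ n
      ⊢ n + 1 ≤ 2 ^ (n + 1)
    Next tactic: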

Armed with these concrete ideas about how one might approach autoformalizing theorem statements and proofs, we can see that building a natural language interface to interactive theorem provers is still an ambitious project, but not science fiction either.


Teaching formalisation to mathematics undergraduates

It’s been a hectic 2022 so far, but August is looking a lot calmer; this is the first of hopefully a few blog posts this month catching up on various things.

In this post I want to talk about the undergraduate course “Formalising Mathematics”, which I gave in January to March 2022 at Imperial College London. The course notes are here. The course was for 3rd and 4th year undergraduates and also for MSc students, and it was specifically about formalising mathematics in a theorem prover (in this case, the Lean theorem prover). I’m certainly not the first person to give such a course in a mathematics department — Patrick Massot has been teaching a formalisation course in Orsay to 1st year undergraduates for several years now — but probably I’m one of the first, so perhaps it’s worth recording what happened.

Overview of the course.

In short, the aim was to get students formalising undergraduate-level mathematics in Lean. In stark contrast to all the other courses offered by pure mathematicians at Imperial, there was no exam. Students taking the course were asked to submit three projects. The projects were completely open-ended: the first was “formalise some mathematics you learnt in your first year”, the second was “formalise some mathematics you learnt in your second year” and the third was “formalise some mathematics you learnt this year” (note in particular that the MSc students were thus forced to formalise harder material than the 3rd year undergraduates). The first project was due in 4 weeks after the course started, the second 4 weeks after that, and the third 4 weeks after that. I’m extremely grateful to Imperial College for letting me run the course in such an unconventional way.

Because resources for this area (i.e., information about computer formalisation written with mathematicians in mind) are hard to come by, I spent a lot of time online helping students. One of the most exciting things for me about teaching the course was that even though I was under some kind of contractual obligation to give these people a mark out of 100 at the end of term, my primary motivation was to teach undergraduate mathematicians how to formalise undergraduate mathematics, and with the set-up I’d created I found myself in a really good position to do this. Students would ask question after question about their projects and I would simply help them, or get them to help each other, or they would ask on Discord and get help there.

Nobody could copy from other people, because all students chose different projects to work on, and I made it absolutely clear to the students that it was fine for them to ask other students, or me, if they were stuck. There are other courses in my department (typically involving writing code, and run by applied mathematicians or statisticians) which are assessed by projects, but typically in these courses everybody is given the same project and the lecturer has to think very hard about how to prevent plagiarism. I had no such problems. The course was a joy to run and a joy to mark; thank you to all 23 students who attended.

The projects

So what does a “project” consist of? Well, this cohort of students had no prior examples of projects to look at, and of course I also didn’t know what a project was supposed to look like, because in some sense I was making the whole thing up as I went along. The students were told that a project consisted of a few hundred lines of Lean code, containing comments, backed up by a small pdf write-up of 5 or more pages explaining what they’d done, what they’d learnt, and what they found hard. Some students were slightly thrown by this rather informal description, but I assured them that it was not really possible to say anything more precise, because we were all on a journey together. One concrete question, which they stumbled onto after a while, was “what will you be giving marks for?”. I thought this was eminently reasonable, so I told them that each project would be graded out of 100, with 30 marks for presentation, 30 marks for content, and 40 marks for “bonus”. This description seemed to placate them (it’s precisely the mark scheme they’re given for other projects we run in the department, for example MSc projects) and we pressed on from there.

The first project.

What I got was glorious. Of course there was a huge range in quality. Remember that this is a course for 3rd and 4th year students, but some of those students have been coming to my formalising club, the Xena Project, since they were 1st years, and one or two of them know the software and the mathematics library better than I do. Kexing Ying, a regular at the club, asked if he could formalise Egorov’s theorem in measure theory for his first project despite it not being taught in the first year; I said that this was fine, and he went on to formalise the theorem and proof, write up 5 pdf pages on how he did it, and then make a pull request to mathlib, Lean’s maths library, containing the proof. One could argue that students such as Kexing had an unfair advantage. I would respond that actually they had a fair advantage. If I had instead been teaching a 3rd year Galois theory course and was running a weekly Galois theory club, and a student had been coming along since their first year and learning Galois theory, then of course after over 2 years of this they are going to get a good mark in a Galois theory course, and they deserve to, because they’ve put in the work.

At the other extreme, I had some students who after 4 weeks were still struggling to get to grips with the system, who needed a lot of support from me and others, and who formalised some very basic stuff, sometimes quite poorly (in a stylistic sense, at least: the code compiled, so the proofs were certainly correct!). Their write-ups were, however, very informative, as they explained what they’d found hard, or how their initial plans of formalising [something complicated] had to be hugely scaled back as they began to understand that sometimes even the simplest-looking things on paper can be difficult to teach a computer. Computers are relentlessly pedantic and also highly non-geometric; our intuition is that x + y = y + x is “obvious”, but this is a theorem, and you cannot make a formal proof from “draw a picture”. Of course the students did not need to worry about this, because the proof that x + y = y + x is already in the maths library, so they could just use it.
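
To make “just use it” concrete, here is roughly what this looks like; a minimal sketch in today’s Lean 4 / mathlib syntax (the course itself ran on Lean 3, where the names differ slightly):

```lean
import Mathlib.Tactic

-- Commutativity of addition is not true "by definition": it is a
-- theorem, proved once (by induction) in the library as `add_comm`,
-- after which everyone can simply apply it.
example (x y : ℕ) : x + y = y + x := add_comm x y
```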

I had told the students that each project would have an accompanying oral exam, in which I would spend 15 minutes discussing the Lean code which they had handed in, just to check that they had written it and had understood it. I (intentionally) gave few details about how the orals would run. Because we were still nominally in some kind of Covid situation, I decided to run the first set of orals online. I will be honest and say that part of the idea was that this let me get to know each student individually and make sure they understood that, for me, the important thing was the learning objective: making sure they came away from the course being moderately good at formalising mathematics in a theorem prover. It was manifestly clear that each student understood the code they’d written; I gave encouragement to the weaker students, observing that now they had clearly got the hang of it, the second project would surely be much easier.

Examples of what students proved are below. I leave it to you to guess which students had used the software before the course started.

  • All cyclic groups are abelian (a version is sketched below).
  • The first isomorphism theorem.
  • The Schroeder-Bernstein theorem.
  • (1+p)^n >= 1+np for natural numbers n and p, and other similar results.
  • Egorov’s Theorem.
  • The category of small categories is complete (note: also not taught to our first years).
  • A sequence of real numbers is Cauchy iff it’s convergent.
  • The Bolzano-Weierstrass theorem.
  • The theory of nets and its relationship to the theory of filters (not on the first year syllabus).
  • If a product of two positive coprime naturals is a k’th power then each natural is a k’th power.

NB in case you didn’t guess, the last one was done by a seasoned Lean user and is rather fiddly to formalise.
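
To give a flavour of the list above, here is one way the first item might go. This is my own sketch in current Lean 4 / mathlib syntax, not any student’s actual code (and mathlib has, if I remember the name correctly, an `IsCyclic.commGroup` construction which packages this up properly):

```lean
import Mathlib.Tactic

-- All cyclic groups are abelian: every element is a power of a
-- fixed generator g, and powers of g commute with each other.
example {G : Type*} [Group G] [IsCyclic G] (a b : G) : a * b = b * a := by
  obtain ⟨g, hg⟩ : ∃ g : G, ∀ x, x ∈ Subgroup.zpowers g :=
    IsCyclic.exists_generator
  obtain ⟨m, rfl⟩ := Subgroup.mem_zpowers_iff.mp (hg a)
  obtain ⟨n, rfl⟩ := Subgroup.mem_zpowers_iff.mp (hg b)
  rw [← zpow_add, ← zpow_add, add_comm]
```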

Marking the first project

One mistake I made was being too generous with marks for the first project. I certainly didn’t go over the top, but I wish I’d left myself some more room. There were some students who had clearly tried hard but had ultimately produced some very average work for their first project, and I rewarded them with a reasonable grade, which meant that later on, when they were becoming more competent, I was not able to say things like “you have improved vastly and this is reflected in the vast improvement in your grade”. However, something I had not mentioned yet: the relative weights of the three projects were 20:30:50, so everyone knew that the project which really mattered was the final one.

The lectures

I haven’t mentioned the lectures yet! I was given two hours with these poor souls, once a week for 11 weeks. This is the only course I have ever given where attendance at the lectures went up as the term progressed, although this might be partly because Covid was much less of a thing in the UK in March than it was in January. The lectures were extremely informal. I would typically take some topic (one of the earlier ones, for example, was “the theory of sets, subsets, unions and intersections, both finite and infinite”), formalise some basic results from first principles, typically explain that they were already in Lean’s maths library, and then go on to some more difficult stuff. I was very open to questions. I would occasionally write on the board, but it was mostly me live coding and taking questions.

The hardest part was deciding what topics to cover, but given that I was generating the course notes on the fly and the students were thinking about what topics their projects would be on, there was no shortage of suggestions. For most topics (there ended up being about 15) I would post example sheets to the course GitHub repository, record myself solving the problem sheets live, and then dump the results on a YouTube playlist. The results are not always pretty to watch, but Talia Ringer mentioned once on Twitter that she thought it was a good idea to let students see you thinking in real time about course material (not least because they’ll see that you sometimes struggle just like them), and I agree with her, so that was what the students got.
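
To give an idea of what “from first principles” meant in those lectures, here is the sort of thing (my reconstruction in today’s Lean 4 / mathlib syntax; the course used Lean 3): proving that union is commutative by unfolding membership down to pure logic.

```lean
import Mathlib.Tactic

-- A ∪ B = B ∪ A from first principles: two sets are equal iff they
-- have the same elements (`ext`), and `x ∈ A ∪ B` means `x ∈ A ∨ x ∈ B`,
-- so everything reduces to juggling `Or`. (The library lemma doing
-- all of this in one step is `Set.union_comm`.)
example {α : Type*} (A B : Set α) : A ∪ B = B ∪ A := by
  ext x
  constructor
  · rintro (hxA | hxB)
    · exact Or.inr hxA
    · exact Or.inl hxB
  · rintro (hxB | hxA)
    · exact Or.inr hxB
    · exact Or.inl hxA
```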

The lectures, and in particular their topics, were really the one part of the course where I ended up making big changes to my plans. Last academic year, as preparation for this course, I had given a multi-centre graduate-level course for students at Bath, Bristol, Imperial, Oxford and Warwick, and you can see the topics I covered at that link. The material covered there was basically what I was planning to do with the undergraduates. However, when it came to doing things like filters, the students actively discouraged me from covering the material. Why? Well, it’s obvious when you think about it. We don’t teach them filters in our undergraduate degree, and their task was to formalise material from their undergraduate degree, so why take a detour through filters when I could be doing basic ring theory and other things which they’d in practice find much more useful? This was a shock to me, and meant I had to generate a bunch of material at very short notice, but given that I am so obsessed with the software and am happy to have any excuse to formalise anything in it, I played along. In particular, we have a popular graph theory course at Imperial, so I was forced to learn how to use the graph theory part of mathlib, something I’d never looked at before.

The second project

The second project was due 8 weeks after the start of the course, and I really hoped that by this point every student would be on top of the software, but of course it’s very easy to forget (at least if you’re as naive as I am) that (a) the students are also studying other courses and (b) theorem proving software is really hard to learn. Some students were still struggling. I still had all the time in the world to help them out, though, and of course students were also developing little communities (often online) where they’d help each other out.

Something I haven’t mentioned before: I was the victim of a loophole in the rules. Some of the students doing the course were doing joint mathematics and computer science degrees, and some of those people chose to formalise results which they perceived as “mathematics” but which were getting beyond my pay grade. For their second project, one student formalised frame definability for modal logic, and I basically had to learn the theory from their write-up and the references before I could judge the project. That wasn’t supposed to happen. Another student proved completeness and compactness of a system of logic I’d never heard of. Next year, for my own sanity, I am unfortunately going to have to ban formalisation of material which was not taught in the mathematics department.

Again I got a huge variation of projects. A student formalised topology from first principles and developed a bunch of theory. Topology was quite popular, in fact, with other students proving facts such as “the continuous image of a compact set is compact” and so on. A student formalised facts about the arithmetic of the Gaussian integers Z[i] (for example that they form a UFD). Some students simply took problem sheet questions from other classes and formalised them: I got worked solutions to problem sheet questions from analysis and ring theory, and I quite liked those projects, not least because for a “prove a standard theorem” project the student may well produce a proof which is somehow much worse than the proof which is already in the maths library, whereas for worked solutions to problem sheet questions you can be almost guaranteed that the results are not in the library already. The students who were already fluent in Lean (and who typically tended to be more advanced mathematically too) again took the option to do some fancy abstract stuff (more category theory, for example).

Some people were too ambitious and realised that their plans needed to be curtailed; students began to ask whether they could assume results without proof and then build on them. I said yes, so some of the projects were incomplete in the sense that they assumed various mathematical theorems as axioms and then built on them. One student assumed something false as an axiom (it was nearly true, but they had made a slip), and this presented me with quite a conundrum; I had to think really hard about how they had used the false statement and whether the related true statement would have sufficed. This project was hard to mark.
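
To illustrate the “standard theorem” issue concretely: “the continuous image of a compact set is compact” is already in mathlib, so a from-scratch student proof is competing with a one-liner. A sketch in current Lean 4 syntax (the lemma name `IsCompact.image` is today’s mathlib name, not necessarily the Lean 3 one the students saw):

```lean
import Mathlib.Tactic

-- The library already knows that continuous images of compact sets
-- are compact; an entire student project can collapse to this line.
example {X Y : Type*} [TopologicalSpace X] [TopologicalSpace Y]
    {f : X → Y} {K : Set X} (hK : IsCompact K) (hf : Continuous f) :
    IsCompact (f '' K) :=
  hK.image hf
```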

By this point I was getting tired, and I really didn’t want to lose an entire day doing orals for 23 projects when I knew that the students were engaged with the course and were not cheating (indeed, by this point I was on first-name terms with most of the students and had talked to most of them about their projects several times before the deadline). So I told the students to pair up and give each other orals, and then report back to me. I thus had to come up with guidelines for being an oral examiner. The guidelines said: “get the other student to show you some of their code, look through it, choose a part you don’t understand and then get them to explain it to you. Once you understand it, they’ve passed; let me know this”. Note that the orals were not worth any marks, so I felt academically happy to do this.

As part of the marking process I would write feedback for the students. By the second project I had understood that this was an important part of the process, and I spent a huge amount of time writing long feedback for most of the students (which, at least in my mind, made up for the fact that I’d dodged doing the second orals). I had also noticed that some of the write-ups were twice the length I had recommended, because students were keen to explain what they had done and would ask if it was OK to write more than 5-7 pages; I said “sure”.

The final project

By the last few weeks of the course, we finally all knew what we were doing. I would spend time on material which students had requested, showing the class how to formalise it in Lean. Students came up with ideas about what to formalise very early in the four-week cycle; by this stage it seemed to me that many of them started the four weeks knowing exactly what they were going to be doing. Even the weaker students had figured out a good strategy: don’t bite off more than you can chew, and ask if you need help. Students were completely open about getting help; they would openly say in their write-ups “I got totally stuck when trying to do X, so I asked student Y, who explained a really nice trick; lines 153-163 of my code were written by them”. All this was fine. Remember that I didn’t care at all about the marks or how the students had got there; as far as I was concerned, the goal was to achieve the learning objective, which was that by the end of the course the students should know something about formalising mathematics, ideally mathematics at the level they were currently studying, in a theorem prover. Here’s some of what I got:

  • The Vitali convergence theorem.
  • Box product of two graphs and proof that the product was connected iff both graphs were connected.
  • Chinese remainder theorem for commutative rings (the concrete version is sketched after this list).
  • Lagrange’s 4 square theorem (incomplete proof but only assuming true things).
  • If P is a presheaf on a category then the category of presheaves over the category of elements of P is equivalent to the over-category of P.
  • Categorical semantics of the simply-typed lambda calculus (gaargh).
  • Classification of integer solutions to y^2=x^3-1.
  • Theorems about economic models involving quasi-concavity (gaargh).
  • Hensel’s Lemma (fully proved, for complete nonarchimedean fields).
  • Solution of a number theory exam question about which rings Z/nZ have unit group with exponent 2.
  • Existence of nontrivial integer solutions to Pell’s equation.
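
For comparison with the Chinese remainder theorem item above: the concrete version for Z/nZ is a one-liner against current mathlib; the general commutative ring version the student tackled is considerably more work. A sketch in Lean 4 syntax:

```lean
import Mathlib.Tactic

-- CRT in concrete form: for coprime m and n, the ring Z/mnZ is
-- isomorphic (as a ring) to the product Z/mZ × Z/nZ.
example (m n : ℕ) (h : Nat.Coprime m n) :
    ZMod (m * n) ≃+* ZMod m × ZMod n :=
  ZMod.chineseRemainder h
```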

By this stage we had somehow all got the hang of things. Students in general asked for much less help, and had a much better feeling for what was feasible. In contrast to the first project, where everyone proved everything, here several people assumed some facts as axioms in their projects (this time all the facts were true, though). Marking material which students had learnt this year in courses from departments other than the mathematics department was very hard and, as I’ve said, will be banned next year. Again I gave copious feedback (even though in some sense it was moot because the course was now over; I found that it was very easy to find things to say, though). The “oral” this time was replaced by an internal mini-conference, which was going to be hybrid but at the last minute was forced online because of Covid; all the students showed up and each one gave a three-minute presentation on their work to the rest of the class. Some were great, some were disastrous (when a student says “wow, is that 3 minutes already?” this can be interpreted as “I didn’t practise”), but the course was over by this point and the orals were worth nothing; I was just super-pleased that everyone showed up!

So that was it. As is probably clear, I loved it. I got very positive feedback from the students too. The course notes are still somewhat incomplete, and I am not particularly motivated to complete them, because Lean 4 is on the horizon and I’ll have to rewrite everything when the port of the mathematics library happens. In the meantime Jeremy Avigad is leading a team which is working on a much more mature document, Mathematics In Lean, and when I’ve finished writing this and a couple more blog posts I’ll start writing a chapter on linear algebra for it.

The course is running again next year 😀


2022 Xena Project undergraduate workshop

I have funding to run an undergraduate workshop!

When?

26th to 30th September 2022 (the week before term starts for many people in the UK).

What?

A week-long in-person workshop, consisting mostly of work on group projects in Lean, plus some special guest lectures. The goal is ultimately to produce some “outputs”, e.g. projects on GitHub and/or publications in journals/conference proceedings.

Who?

If you’re on, or have recently finished, a maths undergraduate or masters-level degree, and haven’t started a PhD yet, you’re eligible to apply. There are around 30 slots. For those offered a place on the workshop: I can pay for accommodation for the week in London, and for travel to/from locations within the UK only. If you’re interested but based outside the UK then you can certainly apply, but you’ll have to get to and from the UK using your own resources; once you’re in the UK I can pay to get you to/from London, and for the hotel.

How to apply?

Send an email to xenaprojectimperial@gmail.com with a brief cover letter saying something about any past Lean experience you have, or Lean projects you’ve been involved in. Preliminary indications are that I’ll get more than 30 applications, so probably not everyone who applies will get a slot. The deadline for applications is 24th July 2022. Anyone who wants to get some quick Lean experience to beef up their CV can take a look at my recent undergraduate course notes and see if they can knock something up. Or just play the natural number game and tell me how it went!

Projects?

Yeah, why not. Several people from the Lean community have suggested projects which they are interested in running, based on questions in combinatorics, probability theory, number theory and other fields. There will be projects at all levels, from accessible undergraduate material to more challenging questions.

Diversity!

I take this seriously. This will not be a room full of cis white males. Applications from anyone who is not a cis white male will be very, very welcome, and thanks to the people I’ve already heard from who are not cis white males and who have told me they’re going to apply. Remark: it looks like the majority of the project leaders will not be cis white males.

Questions?

Email to k.buzzard@imperial.ac.uk works.

Thanks

Thanks go to Imperial College and Jalex Stark for funding.


New year, new teaching material

Imperial College has gone back to in-person teaching so I have 250 new subjects to experiment on, namely the new 1st year maths undergraduates.

I’ve spent a few years now trying to figure out how best to get maths undergraduates up to speed quickly with Lean, a theorem prover which uses dependent type theory and has a large mathematics library. In theory, students can do all of the problem sheets I give them in Lean. In practice things aren’t nearly so simple. As well as a long list of tactics to internalise, it helps if students get a feeling for what’s in Lean’s maths library already, and perhaps they should also know the difference between definitional and non-definitional (i.e. propositional) equality. Unfortunately, definitional equality is a non-mathematical concept, in the following sense: if you define addition on the naturals recursively by n+0:=n and n+(succ m):=succ (n+m), then n+0=n is true by definition and 0+n=n is not. This is an asymmetry which is of no relevance in real-world mathematics.
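
This asymmetry is easy to demonstrate in Lean itself (Lean 4 syntax below; the same point holds in Lean 3):

```lean
-- n + 0 = n holds by definition, so `rfl` ("both sides are literally
-- the same after unfolding") closes the goal:
example (n : Nat) : n + 0 = n := rfl

-- 0 + n = n does NOT hold by definition; `rfl` fails here, and one
-- instead uses the library lemma `Nat.zero_add`, which is proved by
-- induction on n.
example (n : Nat) : 0 + n = n := Nat.zero_add n
```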

This year I’m still avoiding talking about definitional equality, and I’m also avoiding talking about the example sheet questions which I’m giving to my class. A typical example sheet question, even if easy from a mathematical point of view, may only yield to someone who is armed with a whole bunch of Lean tactics. So this year, instead of encouraging the students to work on the example sheets, I’m encouraging them to work on easier problems, so that we can build up to the example sheets later.

The 2021 Lean course home page is here, and I’m making an associated collection of short videos here. I’m going through basic tactics one by one, making what I hope is a more comprehensible introduction to doing mathematics in Lean. All the Lean problem sheets I’ve written so far can be tried online without having to install Lean, but installation instructions for those who want a slicker experience are here. The 1st years are busy with their coursework right now, but when it’s over I hope to get some feedback from them and others about this new approach. At the time of writing, I have finished the logic sheets and have just started pushing the sets sheets. Still to come: functions and binary relations.
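
To give an idea of the level, here is a made-up exercise in the spirit of the logic sheets (my invention for this post, not an actual sheet question), in Lean 4 syntax:

```lean
-- Transitivity of implication, using three basic tactics.
example (P Q R : Prop) (hPQ : P → Q) (hQR : Q → R) : P → R := by
  intro hP    -- assume P
  apply hQR   -- to prove R, it suffices to prove Q
  apply hPQ   -- to prove Q, it suffices to prove P
  exact hP    -- and P was our assumption
```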

Next term things get much more serious: I’m teaching an official Lean course as part of our undergraduate programme. In contrast to what I’m doing this term (evangelising), next term (January to March 2022) I will actually be figuring out how to get students to engage with some more serious undergraduate mathematics. Students will be examined by a series of projects, in contrast to the usual approach here for final-year courses (a closed-book exam under timed conditions). I’m going to cover standard topics like basic analysis and topology, and also more esoteric ideas like filters and perhaps uniform spaces. Should be interesting! This will involve yet another repository, which I suspect will be to a certain extent based on this one. But more on that later.


2021 Xena Summer Projects

So I spent July and August supervising mathematics undergraduates doing Lean projects in various areas of mathematics. Conformal maps, Euclid’s SAS theorem, nilpotent groups and the Radon-Nikodym theorem are just some of the topics I’ve been engaging with this summer. Earlier this week we had a summer project conference, where eight students presented 15-minute talks on their work. I recorded the talks, and they’re now up on YouTube. For each talk there were also 15 minutes of questions, but I decided not to record these so that attendees could feel free to ask basic questions.

The students behind four of the projects managed to get PRs accepted into mathlib, and for two of them this was their first mathlib PR (and possibly even their first contribution to open source). These students had to use git and GitHub (some for the first time), but these are skills which I now regard as important and worth teaching (it is commonplace to teach this sort of thing to computer scientists, but mathematicians seem to miss out here). In 2022 I will be teaching a formalisation course to undergraduates at Imperial, and we will be using git and GitHub for this too.

I think the video titles are pretty self-explanatory, and perhaps now isn’t the time to go through the technicalities of exactly what the students achieved. For the most part, however, we stuck to the mantra: learn the maths first, and once you think you understand it, then try to formalise it. That way, of course, you find out whether you actually do understand it 🙂

What was the set-up?

We ran the entire thing on the Xena Project Discord server, a server for undergraduate mathematicians interested in the formalisation of mathematics. This worked very well for me and, I think, for them. Students could share their screen if they had technical questions. Everything was done in voice or text channels, and in particular there are some students whom I supervised but would not recognise if I met them in the street. Not that this bothered me in the slightest.

Timetabling: I basically promised to be online every Tuesday in July and August from 10am until about 6pm, so the entire thing functioned as a drop-in. Some students talked to me a lot, some students barely talked to me at all, and of course some students talked to other people. Creating a community was, I felt, a bit harder than in the in-person summer projects I’ve run in the past (where you can just go to lunch with a bunch of UGs and let them talk to each other), but of course these are extraordinary times and we have to make do. One big advantage of running things online was that students could be in different countries and still participate, and more generally students could move around (e.g. to and from London) without it disrupting their supervision. I live and work in London, and for some students staying here without a serious source of funding is problematic; working online solved that problem too. Going forward, assuming things are far more normal next year, I might be tempted to run the summer projects in a hybrid way.

Thank you to all students who participated. If you are a mathematics undergraduate who is interested in formalising some mathematics in Lean (probably Lean 4 next year I guess!) over the summer of 2022, then get in touch with me at some point and we’ll see what we can do.


Half a year of the Liquid Tensor Experiment: Amazing developments

[This is a guest post by Peter Scholze.]

Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen. While this challenge has not been completed yet, I am excited to announce that the Experiment has verified the entire part of the argument that I was unsure about. I find it absolutely insane that interactive proof assistants are now at the level that within a very reasonable time span they can formally verify difficult original research. Congratulations to everyone involved in the formalization!!

In this Q&A-style blog post, I want to reflect on my experience watching this experiment.

Question: Who is behind the formalization?

Answer: It was formalized in the Lean Proof Assistant (whose development is led by Leonardo de Moura at Microsoft Research), building on the extensive mathematical library (mathlib) written by the Lean community over the last four years. Immediately after the blog post, the Lean prover/mathlib community discussed the feasibility of the experiment on the Lean Prover Zulip Chat. Reid Barton did some critical early work, but then Johan Commelin took the leading role in this. In outline, Johan made an attack along the path of steepest ascent towards the proof, handing off all the required self-contained lemmas to the community. In particular, to get the project started, by January 14 he had formalized the statement of Theorem 9.4 of [Analytic], whose proof became the first target; that proof was completed on May 28, with the help of the Lean community, including (mathematicians) Riccardo Brasca, Kevin Buzzard, Heather Macbeth, Patrick Massot, Bhavik Mehta, Scott Morrison, Filippo Nuccio, Damiano Testa, Adam Topaz and many others, but also with occasional help from computer scientists like Mario Carneiro. Here is a link to the repository containing the formalised proof of Theorem 9.4, and you can also view its dependency graph, now fully green and completed.

Question: How did you follow the project?

Answer: I joined the Zulip chat to answer any mathematical questions that may arise, but also as an interested spectator.

Question: What is the significance of Theorem 9.4? Is the Liquid Tensor Experiment completed?

Answer: Theorem 9.4 is an extremely technical statement, whose proof is however the heart of the challenge, and is the only result I was worried about. So with its formal verification, I have no remaining doubts about the correctness of the main proof. Thus, to me the experiment is already successful; but the challenge of my blog post has not been completed. It is probably fair to guess that the experiment is about half-way done. Note that Theorem 9.4 abstracts away from any actual condensed mathematics, so the remaining half will involve a lot of formalization of things like condensed abelian groups, Ext groups in abelian categories, and surrounding machinery. The basics for this have already been built, but much work remains to be done.

Question: How did the formalization proceed?

Answer: Initially, I imagined that the first step would be for a group of people to study the whole proof in detail and write up a heavily digested version, broken up into many, many small lemmas, and only afterwards to start the formalization of each individual lemma. This is not what happened. Instead, the formalization followed the original lecture notes quite closely, directly attacking lemma after lemma. The process seemed to be to type the proofs directly into Lean. Lean gives the user a very clear summary of what the current goal is, so one only ever needs a very clear sense of what the next few steps really are. Sometimes it was then realized that even on paper it was not clear how to proceed, and the issue was brought to attention in the chat, where it was usually quickly resolved. Only after a lemma was entirely formalized was the proof, now thoroughly digested, written up in the blueprint in human-readable form.

Question: So it’s not actually a blueprint, is it?

Answer: Right — it’s not the blueprint from which the Lean code was formed, but (largely) the other way around! The Lean Proof Assistant was really that: An assistant in navigating through the thick jungle that this proof is. Really, one key problem I had when I was trying to find this proof was that I was essentially unable to keep all the objects in my “RAM”, and I think the same problem occurs when trying to read the proof. Lean always gives you a clear formulation of the current goal, and Johan confirmed to me that when he formalized the proof of Theorem 9.4, he could — with the help of Lean — really only see one or two steps ahead, formalize those, and then proceed to the next step. So I think here we have witnessed an experiment where the proof assistant has actually assisted in understanding the proof.

Question: Was the proof in [Analytic] found to be correct?

Answer: Yes, up to some usual slight imprecisions.

Question: Were any of these imprecisions severe enough to get you worried about the veracity of the argument?

Answer: One day I was sweating a little bit. Basically, the proof uses a variant of “exactness of complexes” that is on the one hand more precise as it involves a quantitative control of norms of elements, and on the other hand weaker as it is only some kind of pro-exactness of a pro-complex. It was implicitly used that this variant notion behaves sufficiently well, and in particular that many well-known results about exact complexes adapt to this context. There was one subtlety related to quotient norms — that the infimum need not be a minimum (this would likely have been overlooked in an informal verification) — that was causing some unexpected headaches. But the issues were quickly resolved, and required only very minor changes to the argument. Still, this was precisely the kind of oversight I was worried about when I asked for the formal verification.
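
For the record, the quotient (semi)norm in question is the standard one: the norm of a coset $x + N$ in $M/N$ is given by

$$\|x + N\| = \inf_{n \in N} \|x + n\|,$$

and nothing forces this infimum to be attained by an element of the coset.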

Question: Were there any other issues?

Answer: There was another issue with the third hypothesis in Lemma 9.6 (and some imprecision around Proposition 8.17); it could quickly be corrected, but again it was the kind of thing I was worried about. The proof walks a fine line, so if some argument had needed constants quite a bit different from the ones I claimed, the whole thing might have collapsed.

Question: So, besides the authors of course, who understands the proof now?

Answer: I guess the computer does, as does Johan Commelin.

Question: Did you learn anything about mathematics during the formalization?

Answer: Yes! The first is a beautiful realization of Johan Commelin. Basically, the computation of the Ext-groups in the Liquid Tensor Experiment is done via a certain non-explicit resolution known as a Breen-Deligne resolution. Although constructed in the 1970s, it seems to have seen little use until it was applied to a couple of computations in condensed mathematics. The Breen-Deligne resolution has certain beautiful structural properties, but it is not explicit, and its existence relies on some facts from stable homotopy theory. In order to formalize Theorem 9.4, the Breen-Deligne resolution was axiomatized, formalizing only the key structural properties used in the proof. What Johan realized is that one can actually give a nice and completely explicit object satisfying those axioms, and that this is good enough for all the intended applications. This makes the rest of the proof of the Liquid Tensor Experiment considerably more explicit and more elementary, removing any use of stable homotopy theory. I expect that Commelin’s complex may become a standard tool in the coming years.

Question: Interesting! What else did you learn?

Answer: What actually makes the proof work! When I wrote the blog post half a year ago, I did not understand why the argument worked, or why we had to move from the reals to a certain ring of arithmetic Laurent series. But during the formalization, a significant amount of convex geometry had to be formalized (in order to prove a well-known result called Gordan’s lemma), and this made me realize that the key thing actually happening is a reduction from a non-convex problem over the reals to a convex problem over the integers. This led me to ask a MathOverflow question about whether such a reduction was known before; unfortunately, it has not yet really received a satisfactory answer.

Question: Did the formalization answer any questions from the lecture notes?

Answer: Yes, it did: Question 9.9, on the growth of certain constants. There are now explicit recursive definitions of these constants which are formally verified to work, and using these one can verify that they do indeed grow roughly doubly-exponentially.

Question: What did you learn about the process of formalization?

Answer: I learnt that it can now be possible to take a research paper and just start to explain lemma after lemma to a proof assistant, until you’ve formalized it all! I think this is a landmark achievement.

Question: And about the details of it?

Answer: You know this old joke where a professor gets asked whether some step really is obvious, and then he sits down for half an hour, after which he says “Yes, it is obvious”. It turns out that computers can be like that, too! Sometimes the computer asks you to prove that A=B, and the argument is “That’s obvious — it’s true by definition of A and B.” And then the computer works for quite some time until it confirms. I found that really surprising.

Question: Can you read the Lean code?

Answer: The definitions and theorems are surprisingly readable, although I did not receive any training in Lean. But I cannot read the proofs at all: they are analogous to referring to theorems only via their LaTeX labels, together with a specification of the variables to which they get applied, plus the names of some random proof-finding routines. Still, I have the feeling that it should be possible to create a completely normal mathematical manuscript, cross-linked with the Lean code in a way that makes it possible to navigate the code seamlessly; I think the creation of such an interface has also become a goal of the experiment.

Question: So there will be more documentation work to do, also for the part that has just been completed?

Answer: Definitely! Currently, the Lean code leading up to the proof of Theorem 9.4 is not well-documented, and some parts of the proof could definitely be streamlined. Moreover, large parts of it are basic material that should become part of mathlib. It should be noted that because mathlib is constantly evolving, any project that uses it has to continually make small changes so that it will still compile with the newest version of mathlib. So it is vital that the parts of the proof of general interest are moved into mathlib, where they will be maintained.

Question: What is the de Bruijn factor, i.e. the ratio between the length of the computer-verified proof and the length of the human proof?

Answer: It depends on the method of calculation, but somewhere around 20. I think this is amazingly small! I had expected that the first step of taking the lecture notes and turning them into a properly digested human proof — which as I said didn’t actually happen — would already introduce a factor of ~5. But the blueprint is actually only a factor of ~2.5 longer than the relevant part of the lecture notes right now.

Question: How do you feel now that the proof has been broken down to the axioms of mathematics?

Answer: Good question! Usually the verification of a proof involves trying small variations of the argument and seeing whether they break or not, whether they lead to statements that are too strong, etc., in order to get a sense of what is happening. Basically a proof is like a map of how to get up a mountain, say; it may be a nice, slightly winding path with a broad view, or it may lead through the jungle and up some steep wall, requiring climbing skills. Usually there is not just one way up, and one may check whether the view is nicer after a left turn, or whether a right turn gives a shortcut.

In the case at hand, it feels like the main theorem is some high plateau with a wonderful view, but the way there leads through a large detour, attacking the mountain from the other side, where it is dark and slippery and one has to climb up a certain steep wall; no other pathways are visible to the left or right. Answering the questions in the Zulip chat felt like giving instructions of the form “put your foot here, then your hand there, then pull yourself up this way” at the more difficult passages.

So I have gained the reassurance that it is possible to climb the mountain along this route, but I still have no sense of the terrain.
