Beyond the Liquid Tensor Experiment

This blog post is a report on how a bunch of mathematicians recently did something which five years ago I would have said was completely out of reach. Note that it’s my (Kevin Buzzard’s) personal take on the work; I am not speaking for all the authors or the Lean community or the mathlib community or any other organization. Note also that I was personally involved with the work.

The background

In December 2020, Fields Medallist Peter Scholze challenged the computer proof community to formally verify an important new theorem of himself and Dustin Clausen. Scholze issued the challenge here, and the challenge itself is now published in Experimental Mathematics, one of very few mathematical journals that is currently showing an interest in the area of formalisation. The name of the challenge was the Liquid Tensor Experiment (LTE), and yes it was a homage to the Liquid Tension Experiment, a band which Scholze is fond of. The theorem behind the challenge is a highly technical result, and I will not go into the details; check out the links I just mentioned for more on the mathematics involved. It would not be unreasonable to describe the theorem as a complex result about complex mathematical objects.

I have a bucket list of milestones for the area of mathematical formalisation, and formalising a complex theorem about complex objects was on this list. Why? One could argue that many of the major computer formalisations (the Kepler conjecture, four colour theorem and odd order theorem) were complex theorems about “simple objects” (spheres, planar graphs and finite groups, respectively). A formalisation which should not be placed in the same category as any of these milestone results, but which could perhaps be regarded as complementary to these major achievements, is my formalisation with Commelin and Massot of the definition and very basic properties of perfectoid spaces — this amounted to very trivial theorems about complex objects. For me, the formalisation of a proof of a complex theorem about complex objects was a missing part of the puzzle.

But not any more, because Johan Commelin, assisted by Adam Topaz and a team of other mathematicians, announced earlier this month that their Lean formalisation of Scholze’s challenge was complete. The final touches of the formalisation were put together at ICERM in Providence RI, USA, during the Lean for the Curious Mathematician 2022 workshop in July. These workshops are for working mathematicians with little or no Lean experience and who are interested in finding out what’s going on and how to get involved. If this sounds like you, then the videos for the ICERM workshop are now online at the above link. If you’re interested in seeing Commelin’s announcement of the completion of the project, it’s just over 40 minutes into my talk on the status of algebraic geometry in Lean.

More on the challenge and its resolution

The reference for the proof which we followed was Scholze’s lectures on analytic geometry, the notes for a course he gave in 2019/20 in Bonn. The theorem we formalised was Theorem 9.1 of these notes (see also Remark 8.15). On p59 of the pdf there is a five line argument reducing Theorem 9.1 to Theorem 9.4, and then the next six pages contain a proof of Theorem 9.4. I’ve claimed above that Theorem 9.1 is a complex theorem about complex objects; however Theorem 9.4 is a more low-level statement, which unravels to a technically messy but essentially completely low-level claim. It took around one month to formalise the statement of Theorem 9.4, and a further five months to formalise its proof. This averages out at about one page of mathematics per month. You can see a map of the formalisation of the proof of Theorem 9.4 here. Clicking on a rectangle (definition) or an oval (theorem) will pop up a LaTeX statement of the result and will offer you a link to the corresponding Lean code. We finished the proof of 9.4 about a year ago, and to my mild surprise the media got interested; we were even featured in Nature. What probably caused the interest was that one of Scholze’s motivations to encourage a formalisation was that he was not convinced that too many humans had, or would, plough through the proof of Theorem 9.4; the formalisation gave him the assurance that his work was correct.

All that remained then for the challenge, was to formalise the five lines in the paper which showed that Theorem 9.4 implied Theorem 9.1. I’ll reproduce them here:

“By the preceding discussion, one can compute RHom_{\mathbb{Z}[T^{-1}]}(\overline{\mathcal{M}}_{r'}(S), \widehat{V}) as the derived inverse limit of C_c^{\bullet} over all c > 0; equivalently, all c \geq c_0. Theorem 9.4 implies that
for any m \geq 0 the pro-system of cohomology groups H^m(C_c^{\bullet}) is pro-zero (as H^m(C^{\bullet}_{kc})\to H^m(C_c^{\bullet}) is zero). Thus, the derived inverse limit vanishes, as desired.”

These five lines took a further year to formalise, and that’s what has just been completed. The reason it took so long was that the Lean community had to build a working theory of abelian categories, cohomology, Ext groups, derived functors etc etc as prerequisites, and whilst some of these subjects had been approached before in theorem provers, we had to get everything formalised in one system at once and everything playing well with each other. It is worth noting however that even though I’m claiming that the proof of the theorem is “complex”, the actual length of the paper proof is nowhere near as long as, for example, the proof of the Kepler conjecture. However I do regard the result as an indication that we’re going in the right direction, pushing these systems to do things they’ve never done before (although of course I was involved in the formalisation, so one should bear this in mind when evaluating my claims here).

There’s something called the “de Bruijn factor” of a formalisation, which is the ratio of the number of lines of computer code to the number of lines of human mathematics. One major goal in the area of computer formalisation is to “get the de Bruijn factor down to 1”, that is, to make formalising mathematics in a computer no more difficult than formalising it in LaTeX. However the Commelin-Topaz-… formalisation of the Liquid Tensor Experiment seems to make a mockery of this concept, because around 30,000 lines of code went into formalising the five page proof of Theorem 9.4, and around 60,000 lines of code went into formalising the five lines above. At the very least it indicates that humans can squeeze a huge amount of mathematics into one line of text. Had the required basics (abelian categories etc) been in Lean’s mathematics library mathlib already, the formalisation would have taken far less time and far fewer lines. One thing we did benefit from was Scott Morrison’s work — Scott has built a huge amount of category theory in mathlib over the previous six years, providing us with a stable foundation to build on.

One question which a professional mathematician with no Lean experience could ask when trying to evaluate the work is: how to be sure that the Lean formalisation of the challenge actually corresponds to the mathematical version of the challenge? For example how can we tell that the symbol in the Lean code actually corresponds to the real numbers, and there is no cheating in the background? Commelin and Topaz have attempted to deal with this issue by making an examples directory in the repository, where they prove basic theorems about the objects involved (for example they show that is a complete ordered field and that any complete ordered field is uniquely isomorphic to , as evidence that Lean’s really does correspond to the real numbers and there is no trickery going on). They also created evidence for Lean’s Ext groups, profinite spaces, condensed abelian groups and so on; if you understand the mathematical definitions of these objects then you can see an indication that the project’s definitions coincide with the usual mathematical definitions, even if you cannot directly read the Lean code.

What did we learn?

Peter Scholze already explains in a second blog post two quite interesting things that we learnt during the process. Firstly, we learnt that the stable homotopy theory used in the original proof (via the Breen-Deligne resolution) can be eliminated. The way this was discovered was quite interesting; the formalisers were avoiding having to formalise this part of the proof but we were forced to start formalising the statements of what we needed; once these were clarified Commelin and Scholze realised that actually they could get away with a weaker theorem; initially we were calling the trick the “Commelin complex” but later on we discovered that the construction (but not the theorem needed about it) was made by MacLane many decades ago. And secondly Scholze seemed to learn something conceptual about the proof — he described it as “what makes the proof work”. You can see his thoughts on the matter at this mathoverflow question. But in some sense this is old news (the second blog post was published over a year ago). What else did we learn?

In my mind, one of the most important things we learnt is that if you get a group of mathematicians who are focussed on formalising an arbitrary piece of modern mathematics, then it can get done, however complex the theory is. In other words, we have a substantial piece of evidence showing that the software that computer scientists have developed for doing this task has now “caught up with humans”. One could regard this as a triumph. One could also regard it as a pretty weak statement. The computer did not “do” the mathematics — the humans did it; first Scholze and Clausen did it, and then Scholze write the argument down (in pdf notes for the course in Bonn, linked to above) in a form suitable for mathematicians, and then Commelin and Massot wrote a more detailed mathematical blueprint in a “hybrid” part-Lean part-LaTeX format (you can view the blueprint here; it goes into much of the detail of the proof), and then finally a team of people transcribed what they had written from a human language into a computer language. One thing we learn is that the computer language (Lean’s dependent type theory, in this case) is rich and expressive enough to be able to formalise recent and complex work of a Fields Medallist. One could ask whether it is feasible to formalise this work in a weaker logic, for example in Higher Order Logic, the logic used by Isabelle/HOL and the other HOL systems. I don’t think anyone actually knows the answer to this question; I have conjectured that the answer is “no” but there are expert HOL users who seem to conjecture that the answer is “yes”. My real fear is that we will never find out. Manuel Eberl is making fabulous progress formalising what I would call “standard MSc mathematics” such as the basic analytic theory of modular forms in a HOL system, but what we are missing is a HOL evangelist from the mathematical community (a kind of HOL version of me) and I don’t know how to create such a person. Indeed in some sense I am making the problem harder, because my constant evangelism for Lean to mathematicians has the consequence that research mathematicians who get interested in formalisation tend to come to Lean first, and the situation is becoming more and more like a monopoly. There are plus and minus points to this; one minus point is that nobody seems to be working on getting HOL to do things that “it wasn’t designed to do” (for example a modern treatment of sheaf cohomology, etale cohomology etc via derived functors).

We also learn that formalisation can be, at times, very slow going. However one should not lose faith here. The reason it took 1 year to formalise five lines of mathematics was that a huge amount of theory-building had to be embarked upon beforehand. That theory only needs to be built once. Lean now has a working theory of short and long exact sequences, derived functors, abelian categories, complexes and so on; furthermore there is evidence that the formalisation of these definitions is usable in practice, because we used them. Indeed this is another thing that we are learning in the Lean community; instead of thinking “what MSc course shall we formalise next” we are thinking “what major result shall we start working towards” because this not only drives development but it ensures that developments of theories are actually usable.

We learn that formalisation can be parallelised, although we already knew this long ago from the work of Gonthier et al on the odd order theorem, and the work of Hales et al on the Kepler conjecture (both papers have 10+ authors). Different people can be put to work on different areas of a formalisation project and do not even have to know about, or understand, what others are working on. The forthcoming paper on the formalisation will have a nice long list of authors, and this is something I’m very happy about; furthermore anyone interested in seeing the precise details of what everyone did can just look at the list of commits and committers on github — the details are completely open to everybody. This is open source mathematics.

The future

So where are we going and what is the point? These are very reasonable questions. One possible reaction that a mathematician might have to the work is that it is pointless, because it is a computer verification of a result which was “basically known to be true anyway”. This is missing the point. Why are we teaching computers modern mathematics? Because I believe that one day future computer systems will be helping us, and whilst I cannot see into the future, consider the following question. Scholze and Clausen are arguing that condensed abelian groups are an important new definition in mathematics. How can AI developers create a chatbot which helps graduate students to learn about the category of condensed abelian groups, until professional mathematicians have taught computers what a condensed abelian group is? I know that some people in the machine learning crowd would answer things like “use language models, train on the papers about condensed abelian groups” but I am not yet convinced that this alone will turn into a useful tool. Lean (and other theorem provers) offer a framework where one can create a library of useful theorems, examples, and counterexamples. We do not even have to type in all the proofs. If “we” (the mathematical community, for example PhD students in arithmetic geometry) start translating important statements, examples, counterexamples, references and so on, and somehow combine this with language models, we will at least make some kind of interesting thing, which I imagine will get more helpful to humans over time. Some dream of a time where computers are better at proving theorems than humans. For me this is still complete science fiction; I have seen no evidence that we are anywhere near this milestone, and have no real vision as to how we are ever going to get there. But one can instead dream of a time where computers are helping humans to do research level mathematics, and work like this makes me believe that this goal is something I might see in my lifetime.

Posted in Algebraic Geometry, liquid tensor experiment | Tagged , | 4 Comments

The Future of Interactive Theorem Proving?

This is a guest post, written by Zhangir Azerbayev. Zhangir is an undergraduate at Yale, majoring in computer science and mathematics. He completed this work while visiting Carnegie Mellon’s Hoskinson Center for Formal Mathematics.

Introduction

The history of interactive theorem proving can be told as a story of allowing the user to interact with the system at gradually higher levels of abstraction, getting further away from the axioms and closer to informal mathematics. Automath, the first interactive prover, floated very close to the axioms indeed. Mizar was the first attempt to make formal proofs comprehensible to mainstream mathematicians. Contemporary systems such as Coq and Isabelle add many useful features such as type classes and tactics, and most users don’t have to think about what the axioms even are. Today, there is still a learning curve to Lean, but once one learns the syntax, mathlib proofs are not terribly far from an informal version—at least compared to Automath. If we extrapolate this trend to its logical conclusion, we might imagine an interactive theorem prover where every single line of code is hidden behind the interface, and the user interacts with the machine purely in natural language. Recent advances in machine learning suggest such a system is not as far in the future as it may sound.

As a very small step towards realizing this vision, myself and Edward Ayers have developed Lean Chat, a VS-code extension that provides an interface for autoformalizing natural language theorem statements in Lean. Under the hood, these translation are generated by OpenAI’s Codex, a text generation model trained on Github source code.

The interface is straightforward: enter a theorem statement in good-old \LaTeX, and the app will have a go at formalizing it. If the generated formal statement is correct, fantastic! You can copy and past the statement into your Lean source file. If it is incorrect, you can give the app feedback on what to fix in a natural, conversational way. Let’s see a few examples.

Examples

Let’s begin with a simple statement about groups.

Lean would be happy with this, and we could accept this if we’re feeling lenient. But the predicate is_group_hom is deprecated, so let’s tell the app to use the up-to-date syntax.

I’m happy with that.

Let’s give the app a slightly harder example, this time the statement of an exercise from James Munkres’ famous topology textbook.

Perfect on the first try!

The next example is due to Johan Commelin: let’s ask for a statement of Burnside’s theorem, an important result in finite group theory which is not in mathlib.

Quite a lot is going on here. The app is able to figure out that the correct way of stating a group is finite is [group G] [fintype G], and the correct way of denoting the order of a group is fintype.card G. Moreover, it is able to interpret quite high-level instructions such as “no need for the existential”, suggesting the app has at least partial grasp of the semantic connection between formal and informal mathematics. Slipping up on minor syntactic conventions, like writing solvable instead of is_solvable, is one of Lean Chat’s most common mistakes. These errors can almost always be fixed with dialogue.

Unfortunately, Lean Chat is far from perfect, and often fails in unpredictable and somewhat comical ways. This next example was also discovered by Johan Commelin:

For some completely unclear reason, the app decides to consider a Lie algebra over \mathbb{R} instead of my explicit instruction to use a field k, and that’s even before addressing the fact that tangent_space_at_identity is not a mathlib definition. I tried a few different responses to the app’s initial attempt, and couldn’t get it to do anything helpful.

However, the app knows at least something about Lie algebras. The next example is due to Oliver Nash.

One particular class of problem Lean Chat struggles with is Olympiad-style questions. The next example was discovered by David Renshaw:

This example is particularly embarrassing, because despite my specific instructions, the app steadfastly sticks by its initial guess.

How does this work?

Under the hood, Lean Chat’s formalizations are generated by OpenAI’s Codex language model. The Codex model is architecturally the same as the famous GPT-3, and has a simple objective: given a string of text, predict the next word. Iterating this next-word prediction allows the model to generate coherent text. Both Codex and GPT-3 are neural networks, which means they are a function parametrized by billions of real numbers, and these parameters are tuned by gradient descent to maximize next-word prediction performance on a corpus of training data. GPT-3’s training data is virtually all the text on the internet, and Codex is trained on all the source code on Github. This explains why Codex has some modest ability at writing Lean, although Lean code is a very tiny fraction of its training data.

If you take seriously the idea that neural networks are a lot like the brain, as I do, it is unsurprising that there exists some configuration of parameters that allows the network to competently use language. However, it is almost entirely a mystery why gradient descent is able to find this configuration of weights.

These language models are pattern matchers: if you ask them to complete the sentence “Not a creature was stirring, not even a _”, it will finish with the only pattern it’s seen in it’s training data: “mouse”. We can exploit this pattern matching ability to steer our language model into doing a desired task using a technique called few-shot prompting. Suppose we tell GPT-3 to complete the following text

    English: How are you this morning?
    French: Comment vas tu ce matin?

    English: I arrived in London late because I missed my connecting flight.
    French: Je suis arrivé à Londres avec un jour de retard car j'ai raté mon vol de correspondance.

    English: Grothendieck was an algebraic geometer
    French:

Because the language model was trained to complete patterns, given this text it will generate “Grothendieck était un géomètre algébrique”. What Lean Chat does under the hood is exactly the same idea. Given the user’s input, we wrap it in the following prompt:

Natural language version: \"If $z_1, \\dots, z_n$ are complex, then $|z_1 + z_2 + \\dots + z_n|\\leq |z_1| + |z_2| + \\dots + |z_n|$.\" Translate the natural language version to a Lean mathlib version:

theorem abs_sum_leq_sum_abs (n : ℕ) (f : ℕ → ℂ) :
  abs (∑ i in finset.range n, f i) ≤ ∑ i in finset.range n, abs (f i) :=

... (two more examples go here)...

Natural language version: \"Let $X$ be a topological space; let $A$ be a subset of $X$. Suppose that for each $x\\in A$ there is an open set $U$ containing $x$ such that $U\\subset A$. Show that $A$ is open in $X$.\" Translate the natural language version to a Lean mathlib version:

theorem subset_of_open_subset_is_open (X : Type*) [topological_space X]
  (A : set X) (hA : ∀ x ∈ A, ∃ U : set X, is_open U ∧ x ∈ U ∧ U ⊆ A):
  is_open A :=

Natural language: [Your input goes here]. Translate the natural language version to a Lean mathlib version:

The Codex language model completes the pattern to generate the response you see in the app. To incorporate the user’s feedback, we simply concatenate the initial few-shot prompt, the model’s incorrect response, and the instruction “[user’s feedback goes here]. Try again:”, then feed the text back into the neural network.

Note that we don’t provide the model with any examples of failed and subsequently corrected statements, since then the model might start intentionally getting its first try wrong!

What does the future look like?

Lean chat is only the very beginning for autoformalization. Codex is an off-the-shelf model available through OpenAI’s API. Its creators likely never imagined it would be used for formal mathematics, and as such it is not at all optimized for that use case. The few-shot autoformalization capabilities of Codex were only noticed quite recently, and the community working at the intersection of machine learning and formal math has a lot of ideas for how to improve these systems. I expect that in the comings months my examples of Lean Chat’s failures will become completely outdated.

A natural next step after progress on autoformalizing theorem statements is to work on autoformalizing proofs. This is much more daunting that autoformalizing statements, but is definitely within reach. Current language model-based automatic theorem provers take as input the tactic state, and predict the next proof step. One could imagine building a prover that takes as input both a natural language proof and the tactic state to predict the next proof step, but otherwise works in exactly the same way.

Armed with these concrete ideas of how one might approach autoformalizing theorem statements and proofs, building a natural language interface to interactive theorem provers is still an ambitious project, but it is not science fiction either.

Posted in Machine Learning | Tagged , , , , | Leave a comment

Teaching formalisation to mathematics undergraduates

It’s been a hectic 2022 so far, but August is looking a lot calmer; this is the first of hopefully a few blog posts this month catching up on various things.

In this post I want to talk about the undergraduate course “Formalising Mathematics”, which I gave in January to March 2022 at Imperial College London. The course notes are here. The course was for 3rd and 4th year undergraduates and also for MSc students, and it was specifically about formalising mathematics in a theorem prover (in this case, the Lean theorem prover). I’m certainly not the first person to give such a course in a mathematics department — Patrick Massot has been teaching a formalisation course in Orsay to 1st year undergraduates for several years now — but probably I’m one of the first, so perhaps it’s worth recording what happened.

Overview of the course.

In short, the aim was to get students formalising undergraduate level mathematics in Lean. In stark contrast to all the other courses offered by pure mathematicians at Imperial, there was no exam. Students taking the course were asked to submit three projects. The projects were completely open-ended; the first was “formalise some mathematics you learnt in your first year”, the second was “formalise some mathematics you learnt in your second year” and the third was “formalise some mathematics you learnt this year” (note in particular that the MSc students are thus forced to formalise harder material than the 3rd year undergraduates). The first project was due in 4 weeks after the course started, the second 4 weeks after that, and the third 4 weeks after that. I’m extremely grateful to Imperial College for letting me run the course in such an unconventional way. Because resources for this area (i.e., information about computer formalisation written with mathematicians in mind) are hard to come by, I spent a lot of time online helping students. One of the most exciting things for me about teaching the course was that even though I was under some kind of contractual obligation to give these people a mark out of 100 at the end of term, my primary motivation was to teach undergraduate mathematicians how to formalise undergraduate mathematics and with the set-up I’d created I found myself in a really good position to do this. Students would ask question after question about their projects and I would simply help them, or get them to help each other, or they would ask on Discord and get help there. Nobody could copy from other people because all students chose different projects to work on, and I made it absolutely clear to the students that it was fine for them to ask other students, or me, if they were stuck. There are other courses (typically involving writing code, and run by applied mathematicians or statisticians) in my department which are assessed by projects, but typically in these courses everybody would be given the same project and the lecturer would have to think very hard about how to prevent plagiarism. I had no such problems. The course was a joy to run and a joy to mark; thank you to all 23 students who attended.

The projects

So what does a “project” consist of? Well, this cohort of students had no prior examples of projects, and of course I also didn’t know what a project was supposed to look like, because in some sense I was making the whole thing up as I went along. The students were told that a project consisted of a few hundred lines of Lean code, containing comments, backed up by a small pdf write-up of 5 or more pages, explaining what they’d done, what they’d learnt, and what they found hard. Some students were slightly thrown by this rather informal description but I assured them that it was not really possible to say anything more precise because we were all on a journey together. One concrete question, which they stumbled onto after a while, was “what will you be giving marks for?”. I thought this was eminently reasonable, so I told them that each project would be graded out of 100, with 30 marks for presentation, 30 marks for content, and 40 marks for “bonus”. This description seemed to placate them (it’s precisely the description of the mark scheme they’re given for other projects we run in the department, for example MSc projects) and we pressed on from there.

The first project.

What I got was glorious. Of course there was a huge range in quality. Remember that this is a course for 3rd and 4th year students, but some of those students have been coming to my formalising club, the Xena Project, since they were 1st years, and one or two of them know the software and the mathematics library better than I do. Kexing Ying, a regular at the club, asked if he could formalise Egorov’s theorem in measure theory for his first project despite it not being taught in the first year; I said that this was fine and he went on to formalise the theorem and proof, write up 5 pdf pages on how he did it, and then made a pull request containing the proof to mathlib, Lean’s maths library. One could argue that students such as Kexing had an unfair advantage. I would respond that actually they had a fair advantage. If I had instead been teaching a 3rd year Galois theory course and was running a weekly Galois theory club, and a student had been coming along since their first year and learning Galois theory, then of course after over 2 years of this they are going to get a good mark in a Galois theory course, and they deserve to, because they’ve put in the work.

At the other extreme, I had some students who after 4 weeks were still struggling to get to grips with the system and who needed a lot of support from me and others, and who formalised some very basic stuff, sometimes quite poorly (in a stylistic sense, at least: the code compiled so the proofs were certainly correct!). Their write-ups were however very informative, as they explained what they’d found hard, or how their initial plans of formalising [something complicated] had to be hugely scaled back as they began to understand that sometimes even the simplest-looking things on paper can be difficult to teach a computer. Computers are relentlessly pedantic and also highly non-geometric; our intuition is that x + y = y + x is “obvious”, but this is a theorem, and you cannot make a formal proof from “draw a picture”. Of course the students did not need to worry about this, because the proof that x + y = y + x is already in the maths library so they can just use it.

I had told the students that each project would have an accompanying oral exam, where I would spend 15 minutes discussing the Lean code which the students had handed in, just to check that they had written it and had understood it. I (intentionally) gave few details about how the orals would run. Because we were still nominally in some kind of Covid situation I decided to run the first set of orals online. I will be honest and say that actually part of the idea here was so that I could get to know each student individually and make sure that they understood that for me the important thing was the learning objective: making sure they came away from the course being moderately good at formalising mathematics in a theorem prover. It was manifestly clear that each student understood the code they’d written; I gave encouragement to the weaker students and observed that now they clearly had “got the hang of it”, the second project would surely be much easier.

Examples of what students proved are below. I leave it to you to guess which students had used the software before the course had started.

  • All cyclic groups are abelian.
  • The first isomorphism theorem.
  • The Schroeder-Bernstein theorem.
  • (1+p)^n >= 1+np for natural numbers n and p, and other similar results.
  • Egorov’s Theorem.
  • The category of small categories is complete (note: also not taught to our first years).
  • A sequence of real numbers is Cauchy iff it’s convergent.
  • The Bolzano-Weierstrass theorem.
  • The theory of nets and its relationship to the theory of filters (not on the first year syllabus).
  • If a product of two positive coprime naturals is a k’th power then each natural is a k’th power.

NB in case you didn’t guess, the last one was done by a seasoned Lean user and is rather fiddly to formalise.

Marking the first project.

One mistake I made was being too generous with marks for the first project. I certainly didn’t go over the top, but I wish I’d left myself some more room. There were some students who had clearly tried hard but had ultimately produced some very average work for their first project, and I rewarded them with a reasonable grade, which meant that later on when they were becoming more competent I was not able to say things like “you have improved vastly and this is reflected in the vast improvement in your grade”. However something I had not mentioned before was that the relative weights of the three projects were 20:30:50, so really everyone knew that the important project was the final one.

The lectures

I haven’t mentioned the lectures yet! I was given two hours with these poor souls, once a week for 11 weeks. This is the only course I have ever given where attendance at the lectures went up as the term progressed, although this might be partly due to the fact that Covid was much less of a thing in March than it was in January in the UK. The lectures were extremely informal. I would typically take some topic, e.g. one of the earlier ones was “the theory of sets, subsets, unions and intersections (both finite and infinite)” and I would just formalise some basic results from first principles, and then typically explain that they were already in Lean’s maths library, and then go on to some more difficult stuff. I was very open to questions. I would occasionally write on the board but it was mostly me live coding and taking questions. The hardest part was deciding what topics to cover, but given that I was generating the course notes on the fly and the students were thinking about what topics their projects would be on, there was no shortage of suggestions. For most topics (there ended up being about 15) I would set some example sheets for the students at the course Github repository and video myself solving the problem sheets live and then dump the results on a YouTube playlist. The results are not always pretty to watch, but Talia Ringer mentioned once on Twitter that she thought it was a good idea to let students see you thinking in real time about course material (not least because they’ll see that you sometimes struggle just like them), and I agree with her, so that was what the students got.

The lectures, and in particular the topics of the lectures, were really the one part of the course where I ended up making big changes to my plans. Last academic year, as preparation for this course, I had given a multi-center graduate level course for students at Bath, Bristol, Imperial, Oxford and Warwick, and you can see the topics I covered at that link. The material covered there was basically what I was planning to do with the undergraduates. However when it came to doing things like filters the students actively discouraged me from covering the material. Why? Well, it’s obvious why, when you think about it. We don’t teach them filters in our undergraduate degree, and their task was to formalise stuff from their undergraduate degree, so why take a detour through filters when I could be doing things like basic ring theory and other stuff which they’d in practice find much more useful? This was a shock to me and involved me having to generate a bunch of material at very short notice, but given that I am so obsessed with the software and am happy to have any excuse to formalise anything in it, I played along. In particular, we have a popular graph theory course at Imperial so I was forced to learn how to use the graph theory part of mathlib, something I’d never looked at all before.

The second project

The second project was due in 8 weeks after the start of the course, and I really hoped that by this point every student would be on top of the use of the software, but of course it’s very easy to forget (at least if you’re as naive as I am) that (a) the students are actually also studying other courses and (b) theorem proving software is really hard to learn. Some students were still struggling. I still had all the time in the world to help them out though, and of course students were also developing little communities (often online) where they’d help each other out.

Something I haven’t mentioned before: I was the victim of a loophole in the rules. Some of the students doing the course were doing joint mathematics and computer science degrees, and some of those people chose to formalise results which they perceived as “mathematics” but which was getting beyond my pay grade. For their second project, one student formalised frame definability for modal logic and basically I had to learn the theory from their write-up and the references, and then judge the project afterwards. That wasn’t supposed to happen. Another student proved completeness and compactness of a system of logic I’d never heard of. Next year I am unfortunately going to have to ban formalisation of material which was not taught in the mathematics department, for my own sanity.

Again I got a huge variation of projects. A student formalised topology from first principles and developed a bunch of theory. Topology was quite popular in fact, with other students proving facts such as “continuous image of compact is compact” and so on. A student formalised facts about the arithmetic of the Gaussian integers Z[i] (for example that it was a UFD). Some students simply took problem sheet questions from other classes and formalised these: I got worked examples to problem sheet questions from analysis and ring theory, and I quite liked those projects, not least because for some “prove a standard theorem” projects the student may well produce a proof which is somehow much worse than the proof which is already in the maths library, whereas for worked solutions to problem sheet questions you can be almost guaranteed that the results will not be in the library already. The students who were already fluent in Lean (and typically tended to be more advanced mathematically too) again took the option to do some fancy abstract stuff (more category theory, for example). Some people were too ambitious and realised that their plans needed to be curtailed; students began to ask whether they could assume results without proof and then build on them. I said yes, so some of the projects were incomplete in the sense that they assumed various mathematical theorems as axioms and then built on them. One student assumed something false as an axiom (it was nearly true but they made a slip) and this presented me with quite a conundrum; I had to really think hard about how they had used the false statement and whether the related true statement would have sufficed. This project was hard to mark.

By this point I was getting tired, and I really didn’t want to have to lose an entire day doing orals for 23 projects when I knew that students were engaged with the course and were not cheating (indeed by this point I was on first name terms with most of the students and had talked to most of them about their projects several times before the deadline), so I told the students to each find another student and ask them to give them an oral and then to report back to me. I thus had to come up with guidelines for being an oral examiner. The guidelines said “get the other student to show you some of their code, look through it, choose a part you don’t understand and then get them to explain it to you. Once you understand it, they’ve passed; let me know this”. Note that the orals were not worth any points so I felt academically happy to do this.

As part of the marking process I would write feedback for the students. By the second project I had understood that this was an important part of the process and I spent a huge amount of time writing long feedback for most of the students (which kind of made up for the fact that I’d dodged doing the second orals, at least in my mind). Conversely, I had noticed that some of the write-ups were twice the length I had recommended, because students were keen to explain what they had done and would ask if it was OK to write more than 5-7 pages; I said “sure”.

The final project

By the last few weeks of the course, we finally all knew what we were doing. I would spend time on material which students had requested, showing the class how to formalise it in Lean. Students came up with ideas about what to formalise very early in the four week cycle; they had four weeks to prepare each project but by this stage it seemed to me that many of them started the four weeks knowing exactly what they were going to be doing. Even the weaker students had figured out a good strategy — don’t bite off more than you can chew, and ask if you need help. Students were completely open about getting help — they would openly say in their write-ups “I got totally stuck when trying to do X, so I asked student Y who explained a really nice trick; lines 153-163 of my code were written by them”. All this was fine. Remember that I didn’t care at all about the marks or how the student had got there — as far as I was concerned the goal was to achieve the learning objective, which was that by the end of the course the students should know something about formalising mathematics, and ideally mathematics at the level they were currently at academically, in a theorem prover. Here’s some of what I got:

  • The Vitali convergence theorem.
  • Box product of two graphs and proof that the product was connected iff both graphs were connected.
  • Chinese remainder theorem for commutative rings.
  • Lagrange’s 4 square theorem (incomplete proof but only assuming true things).
  • If P is a presheaf on a category then the category of presheaves over the category of elements of P is equivalent to the over-category of P.
  • Categorical semantics of the simply-typed lambda calculus (gaargh).
  • Classification of integer solutions to y^2=x^3-1.
  • Theorems about economic models involving quasi-concavity (gaargh).
  • Hensel’s Lemma (fully proved, for complete nonarchimedean fields).
  • Solution of a number theory exam question about which rings Z/nZ have unit group with exponent 2.
  • Existence of nontrivial integer solutions to Pell’s equation.

By this stage we had somehow all got the hang of things. Students in general asked for much less help, and had a much better feeling for what was feasible. In contrast to the first project, where everyone proved everything, here several people assumed some facts as axioms in their projects (this time all the facts were true though). Marking stuff which students had learnt this year in courses from departments other than the mathematics department was very hard and as I’ve said, will be banned next year. Again I gave copious feedback (even though in some sense it was irrelevant because the course was now over; however I found that it was very easy to find things to say). The “oral” this time was replaced by an internal mini-conference which was going to be hybrid but at the last minute was forced online because of Covid; all the students showed up and each one gave a three minute presentation on their work to the rest of the class. Some were great, some were disastrous (when a student says “wow is that 3 minutes already?” this can be interpreted as “I didn’t practice”), but the course was over by this point and the orals were worth nothing; I was just super-pleased that everyone showed up!

So that was it. As is probably clear, I loved it. I got very positive feedback from the students too. The course notes are still kind of incomplete, and I am not particularly motivated to complete them because Lean 4 is on the horizon and I’ll have to rewrite everything when the port of the mathematics library happens. In the mean time Jeremy Avigad is leading a team which is working on a much more mature document — Mathematics In Lean — and when I’ve finished up writing this and a couple more blog posts I’ll start writing a chapter on linear algebra for that.

The course is running again next year 😀

Posted in Uncategorized | Tagged | 2 Comments

2022 Xena Project undergraduate workshop

I have funding to run an undergraduate workshop!

When?

26th to 30th September 2022 (the week before term starts for many people in the UK).

What?

A week-long in person workshop including a bunch of working on group projects in Lean, and some special guest lectures. The goal is ultimately to make some “outputs”, e.g. projects on GitHub and/or publications in journals/conference proceedings.

Who?

If you’re on, or have recently finished, a maths undergraduate or masters level degree, and haven’t started on a PhD yet, you’re eligible to apply. There are around 30 slots. For those offered a place on the workshop: I can pay for accommodation for the week in London, and travel to/from locations within the UK only. If you are a non-UK person interested then you can certainly apply but you’ll have to make it to/from the UK using your own resources, and I can pay from there to get you to/from London and for the hotel.

How to apply?

Send an email to xenaprojectimperial@gmail.com with a brief cover letter saying something about any past Lean experience you have, or Lean projects you’ve been involved in. Preliminary indications are that I’ll get more than 30 applications so probably not everyone who applies will get a slot. Deadline for applications is 24th July 2022. Anyone who wants to get some Lean experience quick to beef up their CV can take a look at my recent undergraduate course notes and see if they can knock something up. Or just play the natural number game and tell me how it went!

Projects?

Yeah, why not. Several people from the Lean community have suggested some projects which they are interested in running based on questions in combinatorics, probability theory, number theory and other fields. There will be projects at all levels from undergraduate to more challenging.

Diversity!

I take this seriously. This will not be a room full of cis white males. Applications from anyone who is not a cis white male will be very very welcome, and thanks to the people I’ve already heard from who are not cis white males but who have told me they’re going to apply. Remark: looks like the majority of the project leaders will not be cis white males.

Questions?

Email to k.buzzard@imperial.ac.uk works.

Thanks

Thanks go to Imperial College and Jalex Stark for funding.

Posted in Uncategorized | Leave a comment

New year, new teaching material

Imperial College has gone back to in-person teaching so I have 250 new subjects to experiment on, namely the new 1st year maths undergraduates.

I’ve spent a few years now trying to figure out how best to teach maths undergraduates how to get up to speed quickly with Lean, a theorem prover which uses dependent type theory and has a large mathematics library. In theory students can do all of the problem sheets I give to them in Lean. In practice things aren’t nearly so simple. As well as a long list of tactics to internalise, it helps if students get a feeling for what’s in Lean’s maths library already, and perhaps they should also know the difference between definitional and non-definitional (i.e. propositional) equality. Unfortunately definitional equality is a non-mathematical concept, in the following sense: if you define addition on the naturals recursively by n+0:=n and n+(succ m):=succ (n+m) then n+0=n is true by definition and 0+n=n is not. This is an asymmetry which is of no relevance in real-world mathematics.

This year I’m still avoiding talking about definitional equality, and I’m also avoiding talking about the example sheet questions which I’m giving to my class. A typical example sheet question, even if easy from a mathematical point of view, may only yield to someone who is armed with a whole bunch of Lean tactics. So this year, instead of encouraging the students to work on the example sheets, I’m encouraging them to work on easier problems, so that we can build up to the example sheets later.

The 2021 Lean course home page is here, and I’m making an associated collection of short videos here . I’m going through basic tactics one by one, making what I hope is a more comprehensible introduction to doing mathematics in Lean. All the Lean problem sheets I’ve written so far can be tried online without having to install Lean, but installation instructions for those who want a slicker experience are here. 1st years are busy with their coursework right now, but when it’s over hopefully I will be able to get some feedback from them and others about this new approach. At the time of writing, I have finished the logic sheets, and I’ve just started pushing sets sheets. Still to come: functions and binary relations.

Next term things are getting much more serious. I’m actually teaching an official Lean course as part of our undergraduate program. In contrast to what I’m doing this term (evangelising), next term (Jan to March 2022) I will actually be figuring out how to get students to engage with some more serious undergraduate mathematics. Students will be examined by a series of projects, in contrast to the usual approach here for final year courses (a closed book exam under timed conditions). I’m going to cover standard topics like basic analysis and topology, and also more esoteric ideas like filters and perhaps uniform spaces. Should be interesting! This will involve yet another repository, which I suspect will be to a certain extent based on this one. But more on that later.

Posted in Uncategorized | Tagged | 3 Comments

2021 Xena Summer Projects

So I spent July and August supervising mathematics undergraduates who were doing Lean projects in various areas of mathematics. Conformal maps, Euclid’s SAS theorem, nilpotent groups and the Radon-Nikodym theorem are just some of the topics I’ve been engaging with this summer. Earlier this week we had a summer project conference, where eight students presented 15 minute talks on their work. I recorded the talks, and they’re now up on YouTube. For each talk there were also 15 minutes of questions but I decided not to record these so that attendees could feel free to ask basic questions.

The students behind four of the projects managed to get PR’s accepted into mathlib, and for two of them this was their first mathlib PR (and possibly even their first contribution to open source). These students had to use git and github (some for the first time), but these are skills which I personally now value as important and worth teaching (it is commonplace to teach this sort of thing to computer scientists, but mathematicians seem to miss out here). In 2022 I will be teaching a formalisation course to undergraduates at Imperial and we will be using git and github for this too.

I think the video titles are pretty explanatory, and perhaps now isn’t the time to be going through the technicalities of exactly what the students achieved. However for the most part we stuck to the mantra of: learn the maths first, and once you think you understand it, then try to formalise it. That way, of course, you find out whether you do actually understand it 🙂

What was the set-up?

We ran the entire thing on the Xena Project Discord server, a server for undergraduate mathematicians interested in formalisation of mathematics. This worked very well for me and, I think, for them. Students could share their screen if they had technical questions. Everything was done in voice or text channels, and in particular there are some students which I supervised and who I would not recognise if I met them in the street. Not that this bothered me in the slightest.

Timetabling: I basically promised to be online every Tuesday in July and August from 10am until about 6pm, so the entire thing functioned as a drop-in. Some students talked to me a lot, some students barely talked to me at all, and of course some students talked to other people. Creating a community was, I felt, a bit harder than previous in-person summer projects which I’ve run in the past (where you can just go to lunch with a bunch of UGs and let them talk to each other) but of course these are extraordinary times and we have to make do. One big advantage of running stuff online was that students could be in different countries and still participate, and more generally students could move around (e.g. to and from London) without it disrupting their supervision. I live and work in London and for some students it’s problematic to stay without a serious source of funding, and working online also solved that problem. Going forward — assuming things are far more normal next year I might be tempted to run summer projects in a hybrid way next year.

Thank you to all students who participated. If you are a mathematics undergraduate who is interested in formalising some mathematics in Lean (probably Lean 4 next year I guess!) over the summer of 2022, then get in touch with me at some point and we’ll see what we can do.

Posted in Uncategorized | Leave a comment

Half a year of the Liquid Tensor Experiment: Amazing developments

[This is a guest post by Peter Scholze.]

Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen. While this challenge has not been completed yet, I am excited to announce that the Experiment has verified the entire part of the argument that I was unsure about. I find it absolutely insane that interactive proof assistants are now at the level that within a very reasonable time span they can formally verify difficult original research. Congratulations to everyone involved in the formalization!!

In this Q&A-style blog post, I want to reflect on my experience watching this experiment.

Question: Who is behind the formalization?

Answer: It was formalized in the Lean Proof Assistant, mostly written by Leonardo de Moura from Microsoft Research, and used the extensive mathematical library (mathlib) written by the Lean community over the last four years. Immediately after the blog post, the Lean prover/mathlib community discussed the feasibility of the experiment on the Lean Prover Zulip Chat. Reid Barton did some critical early work, but then Johan Commelin has taken the leading role in this. In outline, Johan made an attack along the path of steepest ascent towards the proof, and handed off all required self-contained lemmas to the community. In particular, to get the project started, by January 14 he had formalized the statement of Theorem 9.4 of [Analytic], whose proof became the first target, and has now been completed on May 28, with the help of the Lean community, including (mathematicians) Riccardo Brasca, Kevin Buzzard, Heather Macbeth, Patrick Massot, Bhavik Mehta, Scott Morrison, Filippo Nuccio, Damiano Testa, Adam Topaz and many others, but also with occasional help from computer scientists like Mario Carneiro. Here is a link to the repository containing the formalised proof of Theorem 9.4, and you can also view its dependency graph, now fully green and completed.

Question: How did you follow the project?

Answer: I joined the Zulip chat to answer any mathematical questions that may arise, but also as an interested spectator.

Question: What is the significance of Theorem 9.4? Is the Liquid Tensor Experiment completed?

Answer: Theorem 9.4 is an extremely technical statement, whose proof is however the heart of the challenge, and is the only result I was worried about. So with its formal verification, I have no remaining doubts about the correctness of the main proof. Thus, to me the experiment is already successful; but the challenge of my blog post has not been completed. It is probably fair to guess that the experiment is about half-way done. Note that Theorem 9.4 abstracts away from any actual condensed mathematics, so the remaining half will involve a lot of formalization of things like condensed abelian groups, Ext groups in abelian categories, and surrounding machinery. The basics for this have already been built, but much work remains to be done.

Question: How did the formalization proceed?

Answer: Initially, I imagined that the first step would be that a group of people study the whole proof in detail and write up a heavily digested version, broken up into many many small lemmas, and only afterwards start the formalization of each individual lemma. This is not what happened. Instead, the formalization followed quite closely the original lecture notes, and directly attacked Lemma after Lemma there. It did seem that the process was to directly type the proofs into Lean. Lean actually gives the user a very clear summary of what the current goal is, so one always needs to get a very clear sense of what the next few steps really are. Sometimes it was then realized that even on paper it does not seem clear how to proceed, and the issue was brought to attention in the chat, where it was usually quickly resolved. Only after a lemma was entirely formalized, the proof, now thoroughly digested, was again written up in the Blueprint in human readable form.

Question: So it’s not actually a blueprint, is it?

Answer: Right — it’s not the blueprint from which the Lean code was formed, but (largely) the other way around! The Lean Proof Assistant was really that: An assistant in navigating through the thick jungle that this proof is. Really, one key problem I had when I was trying to find this proof was that I was essentially unable to keep all the objects in my “RAM”, and I think the same problem occurs when trying to read the proof. Lean always gives you a clear formulation of the current goal, and Johan confirmed to me that when he formalized the proof of Theorem 9.4, he could — with the help of Lean — really only see one or two steps ahead, formalize those, and then proceed to the next step. So I think here we have witnessed an experiment where the proof assistant has actually assisted in understanding the proof.

Question: Was the proof in [Analytic] found to be correct?

Answer: Yes, up to some usual slight imprecisions.

Question: Were any of these imprecisions severe enough to get you worried about the veracity of the argument?

Answer: One day I was sweating a little bit. Basically, the proof uses a variant of “exactness of complexes” that is on the one hand more precise as it involves a quantitative control of norms of elements, and on the other hand weaker as it is only some kind of pro-exactness of a pro-complex. It was implicitly used that this variant notion behaves sufficiently well, and in particular that many well-known results about exact complexes adapt to this context. There was one subtlety related to quotient norms — that the infimum need not be a minimum (this would likely have been overlooked in an informal verification) — that was causing some unexpected headaches. But the issues were quickly resolved, and required only very minor changes to the argument. Still, this was precisely the kind of oversight I was worried about when I asked for the formal verification.

Question: Were there any other issues?

Answer: There was another issue with the third hypothesis in Lemma 9.6 (and some imprecision around Proposition 8.17); it could quickly be corrected, but again was the kind of thing I was worried about. The proof walks a fine line, so if some argument needs constants that are quite a bit different from what I claimed, it might have collapsed.

Question: So, besides the authors of course, who understands the proof now?

Answer: I guess the computer does, as does Johan Commelin.

Question: Did you learn anything about mathematics during the formalization?

Answer: Yes! The first is a beautiful realization of Johan Commelin. Basically, the computation of the Ext-groups in the Liquid Tensor Experiment is done via a certain non-explicit resolution known as a Breen-Deligne resolution. Although constructed in the 70’s, this seems to have not been in much use until it was used for a couple of computations in condensed mathematics. The Breen-Deligne resolution has certain beautiful structural properties, but is not explicit, and the existence relies on some facts from stable homotopy theory. In order to formalize Theorem 9.4, the Breen-Deligne resolution was axiomatized, formalizing only the key structural properties used for the proof. What Johan realized is that one can actually give a nice and completely explicit object satisfying those axioms, and this is good enough for all the intended applications. This makes the rest of the proof of the Liquid Tensor Experiment considerably more explicit and more elementary, removing any use of stable homotopy theory. I expect that Commelin’s complex may become a standard tool in the coming years.

Question: Interesting! What else did you learn?

Answer: What actually makes the proof work! When I wrote the blog post half a year ago, I did not understand why the argument worked, and why we had to move from the reals to a certain ring of arithmetic Laurent series. But during the formalization, a significant amount of convex geometry had to be formalized (in order to prove a well-known lemma known as Gordan’s lemma), and this made me realize that actually the key thing happening is a reduction from a non-convex problem over the reals to a convex problem over the integers. This led me to ask my MathOverflow question whether such a reduction was known before; unfortunately, it did not really receive a satisfactory answer yet.

Question: Did the formalization answer any questions from the lecture notes?

Answer: Yes, it did, Question 9.9 on the growth of certain constants. There are now explicit recursive definitions of these constants that are formally verified to work, and using this one can verify that indeed they grow roughly doubly-exponentially.

Question: What did you learn about the process of formalization?

Answer: I learnt that it can now be possible to take a research paper and just start to explain lemma after lemma to a proof assistant, until you’ve formalized it all! I think this is a landmark achievement.

Question: And about the details of it?

Answer: You know this old joke where a professor gets asked whether some step really is obvious, and then he sits down for half an hour, after which he says “Yes, it is obvious”. It turns out that computers can be like that, too! Sometimes the computer asks you to prove that A=B, and the argument is “That’s obvious — it’s true by definition of A and B.” And then the computer works for quite some time until it confirms. I found that really surprising.

Question: Can you read the Lean code?

Answer: The definitions and theorems are surprisingly readable, although I did not receive any training in Lean. But I cannot read the proofs at all — they are analogous to referring to theorems only via their LaTeX labels, together with a specification of the variables to which it gets applied; plus the names of some random proof finding routines. Still, I have the feeling that it should be possible to create a completely normal mathematical manuscript that is cross-linked with the Lean code that makes it possible to navigate the Lean code seamlessly — I think the creation of such an interface has also become a goal of the experiment.

Question: So there will be more documentation work to do, also for the part that has just been completed?

Answer: Definitely! Currently, the Lean code leading up to the proof of Theorem 9.4 is not well-documented, and some parts of the proof could definitely be streamlined. Moreover, large parts of it are basic material that should become part of mathlib. It should be noted that because mathlib is constantly evolving, any project that uses it has to continually make small changes so that it will still compile with the newest version of mathlib. So it is vital that the parts of the proof of general interest are moved into mathlib, where they will be maintained.

Question: What is the de Bruijn factor, i.e. the ratio between the length of the computer-verified proof and the length of the human proof?

Answer: It depends on the method of calculation, but somewhere around 20. I think this is amazingly small! I had expected that the first step of taking the lecture notes and turning them into a properly digested human proof — which as I said didn’t actually happen — would already introduce a factor of ~5. But the blueprint is actually only a factor of ~2.5 longer than the relevant part of the lecture notes right now.

Question: How do you feel now that the proof has been broken down to the axioms of mathematics?

Answer: Good question! Usually the verification of a proof involves trying small variations of the argument and seeing whether they break or not, whether they lead to statements that are too strong etc., in order to get a sense of what is happening. Basically a proof is like a map of how to get up a mountain, say; it may be a nice, slightly winding path with a broad view, or it may lead through the jungle and up some steep wall, requiring climbing skills. Usually there’s not just one way up, and one may try whether taking a left turn the view is nicer, or taking a right turn one can take a shortcut.

In the case at hand, it feels like the main theorem is some high plateau with a wonderful view, but the way there leads through a large detour, to attack the mountain from the other side, where it is dark and slippery, and one has to climb up a certain steep wall; and no other pathways are seen left or right. Answering the questions in the Zulip chat felt like I would give instructions of the form “put your foot here, then your hand there, then pull yourself up this way” at the more difficult passages.

So I have gained the reassurance that it is possible to climb the mountain along this route, but I still have no sense of the terrain.

Posted in Uncategorized | Tagged , , | 22 Comments

The trace of an endomorphism (without picking a basis)

What mathematics can we do with vector spaces, without having to pick a basis? For example, if V is a finite-dimensional vector space and \phi:V\to V is a linear map, can we define its trace “without picking a basis”?

I got interested in trying to understand if this question even has a meaning. Here are some thoughts.

Models for vector spaces

When we learn linear algebra at high school, we typically first learn the “concrete” theory, where vectors are columns of numbers, and we can multiply them by matrices and thus get a conceptual understanding of systems of linear equations. Then at university we go on to the “abstract” theory, where a real vector space is something defined by a list of axioms, and spaces like \mathbb{R}^n are now examples of these abstract objects.

We then learn about the fundamental notion of a basis of a vector space. Say we have an abstract finite-dimensional vector space. By picking a basis, the vectors in our vector space suddenly transform back into columns of numbers. Not only that, but linear maps between vector-spaces-with-a-basis turn back into matrices. By the time we’ve learnt that every vector space has a basis, we can see that our new theory is in some sense “the same as” our old theory. In some sense it took me a long time to get on top of this principle as an undergraduate; perhaps the key concepts were not emphasized to me enough, or maybe I just got lost in all the new (to me, at the time) ideas. Nowadays I think about it like this: initially we learn that \mathbb{R}^n is an example of a finite-dimensional vector space, but after learning about bases we can conclude that every finite-dimensional real vector space is isomorphic to \mathbb{R}^n for some n, so in fact \mathbb{R}^n can be thought of as a model for a finite-dimensional real vector space, just like the collection of equivalence classes can be thought of as a model for a quotient by an equivalence relation. Every vector space has a basis, so one can prove theorems about finite dimensional vector spaces by checking them on models, i.e. by picking a basis.

Don’t pick a basis (unless you have to)

After a couple of years at university the following idea had somehow sunk in: if possible, one “should not choose a basis”. The canonical example shows up when we learn about the dual of a vector space. The dual of a real vector space V is just the space of linear maps from V to \mathbb{R}; this space has a natural vector space structure and is called the dual space of V, with notation V^*. Confusing example: if V=\mathbb{R}^n, then an element of V is represented by a column of n numbers. Give V its canonical basis e_1=(1,0,0,\ldots,0), e_2=(0,1,0,\ldots), \ldots, e_n. Then any linear map V\to\mathbb{R} is uniquely determined by where it sends the e_i, and in particular an element of V^* is also uniquely determined by n numbers, so we can represent it as a vector of length n and we have proved that the dual of \mathbb{R}^n is \mathbb{R}^n again. Great! Furthermore, every finite-dimensional vector space equals \mathbb{R}^n (proof: pick a basis) so we’ve proved that duality is just the identity map!

Except that we haven’t, because we have been a bit too liberal with equality here (and as some computer scientists are very quick to point out, many mathematicians don’t understand equality properly, and hence they might accidentally end up teaching this point badly). This argument proves that if V is a finite-dimensional vector space, then it is isomorphic to its dual V^*; however it is not in general canonically isomorphic to its dual, whatever that is supposed to mean. In this instance, it means that different bases in general produce different isomorphisms between V (identified with \mathbb{R}^n) and V^* (also identified with \mathbb{R}^n). This is a bit confusing because in the group theory course running at the same time as the linear algebra course, a student is simultaneously being taught that when we say “how many groups are there of order 4” we obviously mean “…up to isomorphism”, because isomorphic groups are equal for that question.

However, if we do this trick twice, we can identify V with its double-dual (V^*)^* and it turns out that this identification is canonical, whatever that means. What it appears to mean in this case is that there is a really cool way of writing down the isomorphism from V to (V^*)^* which doesn’t ever pick a basis!

[Technical note/reminder: Here’s the explicit map from V to its double dual. Say v\in V. We need to write down a linear map V^*\to\mathbb{R} associated to v. So take \psi\in V^* and we need to construct a real number somehow. Well, what about \psi(v)? That works a treat! One can check that this map L(v) sending \psi to \psi(v) is indeed linear, and induces a map L from V to (V^*)^* which can be checked to be an injection and hence, by dimension counting, an isomorphism (NB: some magic just happened there). I suspect that this argument was my initiation into the mysterious word canonical, a word I now rail against, not least because in my opinion the Wikipedia page about canonical maps contains a definition which is full of fluff (“In general, it is the map which preserves the widest amount of structure, and it tends to be unique” — this is not a definition — this is waffle).]

The moral: all the canonical kids are too cool to pick a basis.

PS here is a funny way of thinking about it: if we identify V with \mathbb{R}^n as column vectors, then perhaps we should identify the dual of V with \mathbb{R}^n as row vectors, because multiplication on the left by a row vector sends a column vector to a number, which is what we want a dual vector to do. So identifying V with V^* is some high-falutin’ version of transpose, and if you transpose once you don’t quite get the same thing, but if you transpose twice then you’re back where you started. Canonically.

Picking a basis and then pretending you didn’t

OK so here’s a pretty cool theorem about traces (although any formaliser would tell you that it is actually a definition, not a theorem). If M is an n\times n square matrix then it has a trace, the sum of the elements on the leading diagonal. Now say V is a finite-dimensional real vector space, and \phi:V\to V is a linear map, crucially from V to itself. If we choose a basis of V and use the same basis for the source V and the target V, then \phi becomes a matrix, and we can take its trace. If we change our basis of V then the matrix representing \phi changes. But, miraculously, its trace does not! This can be proved by an explicit calculation: changing our basis for V changes the matrix M representing \phi to P^{-1}MP for a certain invertible “change of basis” matrix P (here was where it was key that the source and the target of the endomorphism were the same, otherwise we would have got P^{-1}MQ), and the traces of M and P^{-1}MP are equal because of the general fact that the traces of AB and BA are equal if A and B are square matrices of the same size (apply with A=P^{-1}M and B=P).

As a consequence, this means that if V is a finite-dimensional vector space then we can unambiguously talk about the trace of a linear map \phi:V\to V, in the following way. First do the slightly distasteful choose-a-basis thing, then take the trace of the corresponding matrix, and then prove that the calculation was independent of the basis you chose, so we can pretend you never did it. Similarly one can talk about the determinant and characteristic polynomial of \phi:V\to V, because these are also invariants of matrices which are constant on conjugacy classes.

However, you did do it — you chose a basis. Something is a little different to the map from V to its double dual — the map from V to its double dual really was defined without choosing a basis at all. Here we did something slightly different — we chose a basis, and then proved that it didn’t matter. Can we go one better, and define the trace of a linear map from V to V without choosing a basis at all?

So in the process of discussing this question on Twitter and on the Lean chat, established firstly that this very much depends on what the question actually means, and secondly I managed to learn something new about Lean, and when I learn something new about Lean I tend to blog about it, so here we are. First I’ll talk about the failed attempts to define the trace of an endomorphism, which led to some other questions and clarifications, and then I’ll talk about trunc, something which looked to me like a completely pointless operation in Lean and which up until now I’ve just ignored, but which somehow might be at the bottom of this.

A failed attempt, and some progress

In characteristic p there are some problems with the ideas below, but these are not relevant to what I want to talk about, so let’s just stick to vector spaces over the reals (or more generally any field of characteristic zero). The first observation is that computing the trace, determinant and characteristic polynomial all seem to be pretty much the same question: for example, if you can compute the char poly of \phi then you can compute its trace and det because you can read these things off from the coefficients. Conversely, if you can compute traces then applying this to some exterior powers you can read off the coefficients of the characteristic polynomial including the det. So computing any of these invariants without choosing a basis somehow boils down to the same thing.

Next let’s turn to Wikipedia, where we are informed of a basis-free way to compute the trace of an endomorphism! Here’s the trick. There’s an obvious bilinear map V\times V^*\to\mathbb{R}, sending (v,\psi) to \psi(v), and by the universal property of the tensor product this induces a linear map T:V\otimes_{\mathbb{R}}V^*\to\mathbb{R}. There is also an obvious bilinear map V\times V^*\to Hom(V,V) sending (v,\psi) to the linear map sending w to \psi(w)v, and this induces a linear map V\otimes_{\mathbb{R}}V^*\to Hom(V,V), which is easily checked to be an isomorphism if V is finite-dimensional. Composing the inverse of this isomorphism with T gives us a linear map Hom(V,V)\to\mathbb{R} which we can check to be the trace (e.g. by picking a basis). So we’re done, right?

Well, the thing about this construction is that whilst the map V\otimes_{\mathbb{R}}V^*\to Hom(V,V) is canonical (indeed, it even exists in the infinite-dimensional case), to prove that it’s surjective in the finite-dimensional case the natural thing to do is to pick a basis :-/ and to make the corresponding matrix by taking an appropriate linear combination of tensor products of elements of the basis and the dual basis. I would argue that we have made some progress here — we still picked a basis, but we used it to fill in a proof, rather than to construct data. However, we still picked a basis. Note also that the inverse of a computable bijection might not be computable, if you’re interested in that sort of thing, and I suspect that this might be a situation where this annoyance kicks in.

What is the dimension of a finite-dimensional vector space?

One might instead be tempted to argue that the map V\otimes_{\mathbb{R}}V^*\to Hom(V,V) is surjective because it is an injective map between vector spaces of the same dimension (I’m not entirely sure how to prove it’s injective without picking a basis, but it might well be possible; however I do not know how to prove that the dimension of V\otimes W is the product of the dimensions of V and W without picking a basis :-/ ). Anyway, talking of dimensions, here is the other “basis-free” method I learnt to do these sorts of things: the canonical method to work out the determinant of an endomorphism without choosing a basis. If \phi:V\to V is a linear map, and if n is the dimension of V then \phi induces a linear map on top wedge powers \bigwedge^n(V)\to\bigwedge^n(V), and an endomorphism of a 1-dimensional space is canonically a number (proof: pick a basis :-/ and check it’s independent of the choice) which can be checked to be the determinant of \phi, and if you can do determinants then you can do char polys and hence traces.

The problem with this approach is that it relies on you knowing what n is, the dimension of V, and if all you know is that V is finite-dimensional, then how do you get n, the dimension? Well obviously you pick a basis :-/ , count it, and then prove that your answer is independent of the choice of basis. So this “top exterior power” argument has moved us closer to the heart of the problem: forget defining the trace — how do we define the dimension of a finite-dimensional vector space without picking a basis? Note that the dimension is just the trace of the identity function, so we can add dimension to our list of things which we cannot “compute” without picking a basis. And now we’re getting somewhere — what does it mean to say a vector space is finite-dimensional? Did we pick a basis even to make that statement?

This is something to do with constructivism.

I am not a big fan of constructivism, as many of you will know. I think that the emphasis placed on it by the computer theorem proving community historically has held the area back; it puts off mathematicians (e.g. the 2012 version of me) who have been indoctrinated by a traditional mathematics course which assumes the law of the excluded middle from day one. One problem I have with constructivism, as someone who was classically trained, is that it turns out that sometimes there is more than one way of doing something constructively, and all these ways are the same classically. For example very early on in my Lean career I was surprised to learn that there was function.bijective, the predicate that a function was a bijection, but also there is the concept of a function with a two-sided inverse. As far as I’m concerned these are the same thing, but constructively they’re not, because given a bijection there might not be a “formula” for its inverse. The existence of a two-sided inverse is a true/false statement — but actually having the two-sided inverse is data, and, in constructive mathematics, data can sometimes be hard to come by. The function which takes as input a set/type X and a proof that X is nonempty, and outputs an element of X, is a noncomputable function and its existence, if you think about it, is closely related to the axiom of choice, which is something that the constructivists are not big fans of.

So it turns out that this whole “V is finite-dimensional” thing, which this entire post has assumed all the way through, is a victim of this subtlety. What does it mean for a vector space to be finite-dimensional? The following answers are all the same classically (i.e. in “normal maths”), but constructively they’re all different:

  • A proof that V has a finite basis;
  • An actual choice of a finite basis;
  • An element of the truncation of the set of all pairs (n,\alpha), where n\in\mathbb{N} and \alpha:V\cong\mathbb{R}^n is an isomorphism. Here by the truncation of a set we mean the quotient of the set by the “always true” equivalence relation. Yes, you read that right.

OK so we know what the first two things are. The first statement is just a proof. If your proof is nonconstructive that’s fine, I don’t care. The second thing is data. For me this is a problematic definition of finite-dimensional, precisely because it’s not a proof of a true-false statement. If I am working with a finite-dimensional vector space V in Lean then I might end up having to deal with the fact that if some argument changed V‘s basis whilst leaving V alone, I might not have V = V (as finite-dimensional vector spaces) any more, because the data going on behind the scenes saying that V is finite-dimensional might not match up. I have enough problems formalising mathematics in type theory without having to deal with this too.

This brings us to the third definition, involving trunc. OK so if X is a type or set or however you want to set up your mathematics, then, as I mentioned above, trunc X is the quotient of X by the equivalence relation which is always true. In particular if X is nonempty then trunc X has one element/term, and if X is empty then it has no elements/terms. If you’re happy about the concept that propositions can be types, with the idea that a true proposition has one term (its proof), and a false proposition has no terms, then trunc X seems to be basically the proposition that X is nonempty. However it is more than that, because it is data. It is the missing link in our story.

Let’s say that V is a real vector space, and we have a “proof” that it is finite-dimensional in the sense that we have a term t of type trunc X(V), where X(V) is the space of all pairs (n,\alpha:V\cong\mathbb{R}^n). Here’s how to define the trace of an endomorphism \phi:V\to V. We’re going to define a map from trunc X(V) to the reals, and the idea is that if you evaluate this map at t then you’ll get the trace of \phi. Now to define a map from a quotient, we use the universal property of quotients. First of all we have to define a map from the space we’re quotienting by to the reals. This is easy: given an isomorphism \alpha:V\to \mathbb{R}^n we get an induced map \alpha_*\phi:\mathbb{R}^n\to\mathbb{R}^n and we just take its trace with respect to the standard basis. And secondly, we need to prove that this map descends to the quotient, which boils down to proving a theorem, and the theorem of course turns out to be precisely the statement that the trace of an endomorphism is independent of choice of basis.

Similarly we can define a function from trunc X(V) to the naturals, such that if V is finite-dimensional in the sense that we have t : trunc X(V) then its dimension is what you get by evaluating this function. And det and char poly etc.

Note that for any vector space V, the type trunc X(V) has at most one term — it’s a so-called subsingleton type. Lean’s exact command will be fussy about subsingleton types because it’s a theorem that two terms of a subsingleton type are equal, rather than it being a definitional equality. Hence Lean’s exact tactic still might not work if we’re carrying around t : trunc X(V) as our definition of finite-dimensionality. However the convert tactic will operate fine, because it looks out for subsingletons and applies the appropriate theorem.

Type-theoretic conclusion

We now seem to have got to the bottom of this. To do this kind of metamathematics — “did we pick a basis?” — we need to think really carefully about what we even mean by the assumption of V being finite-dimensional. My preferred approach, and the one which makes mathematics easier to do in Lean, is to just use the propositional definition “there exists a basis”. This way one never runs into problems with equality, and if one needs a basis one just applies Lean’s version of the axiom of choice. Constructivists would complain that it breaks computation, but I want to prove theorems and I prefer to do computations using tactics rather than trying to persuade the kernel to reduce things. The other approach is to carry around some extra data, and this can lead to problems for beginners with equality being a bit broken, however the experts have learnt ways to deal with this. Ultimately the best choice in a theorem prover will depend on what you actually are doing with your objects, and given that I just want to prove theorems, for me, the bare existence definition of finite-dimensional is enough. However to move from the Prop world to the Type world, e.g. when defining the trace of an endomorphism, one has to at some point do some magic, and the moment where this happens is the moment you picked a basis.

digital stones
Posted in computability, undergrad maths | Tagged , | 16 Comments

Induction on equality

Or: how I learnt to stop worrying, and figured out a proof that equality was an equivalence relation.

What is the definition of = ?

I used to teach a course where I defined the notion of what it meant for two integers to be congruent modulo N. Here N is an integer, and two integers a and b are said to be congruent modulo N if their difference is a multiple of N. For example, 37 and 57 are congruent modulo 10.

I would go on to prove that congruence modulo N is an equivalence relation on the integers. Reminder: an equivalence relation on X is a binary relation on X which is reflexive, symmetric and transitive. The proof I gave never assumed that N was non-zero, and congruence modulo 0 is the same relation as equality, so you might like to deduce from this that equality on the integers is an equivalence relation. Well, tough darts. In the proof that congruence modulo N is an equivalence relation, it turns out that we assume that equality is an equivalence relation, as you can readily check if you type it into a theorem prover.

So how do we prove that equality is an equivalence relation? In my 1st year undergraduate lecture notes this is stated as an “obvious example” of an equivalence relation. Euclid was more cautious — he explicitly noted that he would be assuming transitivity and reflexivity of equality in his common notions 1 and 4, and symmetry followed from his use of language — he would say “this collection of things are equal” e.g. “all right angles are equal”: so equality of two things was a predicate on unordered pairs for him, and symmetry was hence built in.

I also looked at my 3rd year logic and set theory notes, to try and figure out the definition of = , but they were also no help. There is some waffle about “the interpretation in a model is the diagonal of A \times A” (which I think might be circular) and how x = y means that the two terms x and y are actually the same term, but don’t get me started about what mathematicians mean when they say two things are “the same”, e.g. “the same proof” or “canonical isomorphism is denoted by =” or all the other highly useful and yet irritatingly non-rigorous stuff we say. Anyway, just “defining” equality to mean another word or phrase which is synonymous with equality isn’t going to get us anywhere. Ultimately my impression is that you might just assume that equality is an equivalence relation, when setting up mathematics via first order logic and ZFC set theory. I’m no logician though — feel free to correct me in the comments. This post is about a different way to approach things, which part of me finds a whole lot more satisfactory.

A definition of equality.

Lean’s type theory contains a definition of equality! This is great because it means we know where we stand. Here’s the definition:

inductive eq {X : Type} : X → X → Prop
| refl (a : X) : eq a a

infix ` = `:50 := eq

What does all that baloney mean? Well, this is an inductive definition of a binary relation. Let X be a type or a set or however you want to think of a collection of stuff. The binary relation eq on X takes as input two things in X, call them a and b, and spits out a true-false statement eq a b. We’ll get to the definition in a second, but that last line means “the notation a = b is defined to mean eq a b, with BIDMAS-power 50”. Let’s use the = notation from now on instead of eq, because it’s familiar-looking.

So, let’s get to the heart of the matter: how do we define a = b? Well, Lean’s definition says the following: “there is only one tool we have to prove the equality of two things, and it’s the theorem refl a, which is a proof of the statement a = a. That’s it.” Thanks to Andrej Bauer who on Twitter pointed out that a neat way to think about this definition is: equality is the smallest binary relation which is reflexive. Andrej also tells me that this is Martin-Löf’s definition of equality.

OK great. We have now apparently defined the true-false statement a = b, and we can prove that a = a is true, so this is a good start. But how the heck are we going to go from this to symmetry and transitivity? We’re going to use induction!

Inductive types: a reminder

In my last post, I waffled on about inductive types. I was going to talk about equality in that post, but it had already got quite long, so I thought I’d deal with equality separately, and so here we are. The take home message from the last post was that if you define something as an inductive type in a type theory like Lean’s, then Lean automatically generates a new axiom in the system called the recursor for that type. This axiom is generated automatically by the rules of the calculus of inductive constructions. For example, if you define the natural numbers inductively using Peano’s axioms, then the recursor is just the statements that induction and recursion are true (propositions are types in Lean, so one function can be interpreted in two ways depending on whether you apply it in the Prop universe or the Type universe). The recursor attached to an inductive type is a way of saying “if you want to do something to every instance of the thing you’re defining, it suffices to do it for all the constructors”, and a special case of it is the “inductor”, which says, in our case, that if you want to deduce something from a = b then all you have to do is make sure you can prove it in the special case a = a. Formally, the inductor for = is the following statement:

eq.inductor (X : Type) (R : X → X → Prop)
  (h : ∀ x, R x x) (a b : X) : a = b → R a b

Note the fancy type theory R : X → X → Prop for what a mathematician would call “R is a binary relation on X” (reason: R : X → X → Prop means R\in \mathrm{Hom}(X,\mathrm{Hom}(X,\{\tt{true},\tt{false}\})) in maths speak, i.e., R is a true-false statement attached to every pair of elements of X). So, in words, the inductor says that if you have any binary relation R on X such that R x x is true for all x, and if you know that a = b, then R a b is true. That’s the tool we have. Again let’s go back to Andrej’s observation: the definition of equality can be thought of as saying that it is the smallest (or the initial) binary relation which is reflexive, and the inductor then makes this rigorous by saying that any other reflexive binary relation must contain equality.

Summary : we are armed with two things, and two things only. (1) a proof of ∀ a, a = a and (2) a proof that if R is a binary relation on X and ∀ x, R x x is true, then a = b implies R a b.

The game now is to prove that = is symmetric and transitive, without accidentally assuming it! And what better framework to play that game in, than Lean! (because trust me, if you play it on paper, you are so going to mess this one up).

The equivalence relation game

Insert coin. That’s hard mode, i.e. spoiler-free. Copy and paste the code at the other end of that link into VS Code if you have Lean installed locally if you want a much speedier experience. The game is to prove symm and trans from refl and ind. I would work out the maths proofs first.

If you want no spoilers at all, stop reading now. I usually post some art generated by my children in my blog posts so perhaps now is a good time to do this. If you want to hear more about equality, and in particular have one or more hints and spoilers, read on.

Tactic hint alert

refine is a great tactic. It’s like apply on steroids. You can do everything with intros, apply, refine and exact. You can read about what these tactics do here, in Lean’s documentation. All Lean proofs can be done in just two or three lines.

Hint alert

Life is easier if we have more tools than just ind. The thing about ind is it wants as input a binary relation. A simpler object than a binary relation on X is a subset of X, or equivalently its characteristic function P : X → Prop (sending the elements of the subset to true and the other elements to false). Here’s something which you believe about equality: if a = b and P a is true, then P b is true. This is a much cleaner statement than ind — it’s just as powerful, and more flexible. Logicians call it the substitution property of equality. Applying it is the way Lean’s rewrite or rw tactic works. Why not try to prove it first? It makes the proof of trans much easier — it’s basically a master sword. Here’s the statement. Just cut and paste it before symm. Of course, you’ve got to prove it (using ind and refl).

theorem subst (hab : a ∼ b) (P : X → Prop) : P a → P b :=
begin
  sorry
end

Mathematical solutions

Lean solutions

Posted in M1F, M40001, Type theory, undergrad maths | Tagged , | 8 Comments

Induction, and inductive types.

Here’s a conundrum: why does a schoolkid think recursion is a triviality (“the Fibonacci sequence is defined like this: 1,1,2,3,5,8,…, with each number in the sequence the sum of the two before it” — this is a definition very likely to be accepted by a schoolkid) but they might struggle with the principle of mathematical induction (a procedure for generating a sequence of proofs, each one a consequence of the one before it)? I was taught at university that induction and recursion were different things, but Lean has a common abstraction which makes them both instances of the same thing. In this post I’ll talk about induction, and recursion, and also about how something called the Calculus of Inductive Constructions (the way Lean’s inductive command works) provides some rather surprising proofs of basic mathematical facts.

Defining a type.

It is not unreasonable to think of what Lean calls a type as what a mathematician would usually call a set. For example a mathematician might say “let S be a set with three elements; call the elements a, b and c“. Here’s how to make that set, or type, in Lean:

inductive S : Type
| a : S
| b : S
| c : S

In fact the full names of the elements of S, or the terms of type S as we call them in type theory, are S.a, S.b and S.c, and we might want to open the S namespace so that we can just refer to them as a, b and c. An undergraduate might want to make this kind of definition when they are answering the following kind of question in Lean: “Let f : X \to Y and g : Y \to Z be functions. True or false: if the composition g \circ f is injective, then g is injective”. This is false, and to prove it’s false one can either use the sets we mathematicians have lying around (such as the naturals, integers or reals), or one can just build some explicit sets of small size like S above, and some explicit functions between those sets.

Defining functions between types.

So here’s what we’re going to do. Let’s make a type X with one term p, a type Y with two terms q and r, and a type Z with one term s. This is easy given what we’ve already seen:

inductive X : Type
| p : X

inductive Y : Type
| q : Y
| r : Y

inductive Z : Type
| s : Z

[By the way, if you want to play along but you haven’t got Lean installed on your computer, you can do all this within a web browser by clicking here (although you could instead click here to find out how to install Lean and the community tools, which give you a far slicker experience).]

Our counterexample is going to be the following: We define f : X → Y by f(p)=q and g : Y → Z by g(q)=g(r)=s. Let’s do this.

open X Y Z

def f : X → Y
| p := q

def g : Y → Z
| q := s
| r := s

As a mathematician I find the use of the | symbol quite intimidating (especially that we are now using it in a different way), but given that I’ve told you what we’re doing and now I’m telling you how we are doing it, you can probably guess what it all means. One can now go ahead and state the result:

open function

example : ¬ (∀ (X Y Z : Type) (f : X → Y) (g : Y → Z),
  injective (g ∘ f) → injective g) :=
begin
  sorry
end

and now if you fancy proving a mathematical theorem by playing a puzzle game, you can click here to get all the code at once, and have a go. Instead of talking about the proof though, I want to talk about the rather surprising (at least to me) fact that Lean is defining f and g by recursion.

What happens when we define an inductive type?

What happens under the hood when Lean sees this code

inductive X : Type
| p : X

is quite surprising (at least to me as a mathematician). I’ve been arguing above that we should think of this code as saying “Let X := \{p\} be a set with one element”. But here’s what’s really going on when Lean sees this code. Unsurprisingly, Lean defines a new type X and a new term p (or more precisely X.p) of type X. It also defines one more new thing, which expresses that p is the only element of X. But the way it does this is surprising: it defines the so-called recursor for X, which is the following statement:

X.rec : ∀ {C : X → Sort u}, C p → (∀ (x : X), C x)

Whatever does that mean? Well first I think I’d better explain what this Sort u thing is. I’ve written in the past an explanation of how sets and their elements, and theorems and their proofs, are unified in Lean’s type theory as types and their terms. The sets/elements story goes on in the Type universe, and the theorems/proofs story goes on in the Prop universe. When Lean says Sort u it means “either of these universes”. So we can rewrite X.rec as two statements:

X.recursor : ∀ {C : X → Type}, C p → (∀ (x : X), C x)
X.inductor : ∀ {C : X → Prop}, C p → (∀ (x : X), C x)

The first statement is the principle of recursion for X. In set-theoretic language it says this. “Let’s say that for every element x of X we have a set C(x), and let’s say we have an element of C(p). Then we have a method of constructing an element of C(x) for all x \in X.” This looks like a rather long-winded way of saying that p is the only element of X. In fact it is worth looking at the special case of X.recursor where C is the constant function sending every element of X to the set S:

X.recursor_constant : ∀ S, S → (X → S)

This says that if S is any set, and we have an element a of S, then we can get a function from X to S. What is unsaid here, but true by definition, is that it’s the function that sends p to S, as can be checked thus:

-- give X.rec the constant function C sending everything to S
def X.recursor_constant : ∀ S, S → (X → S) := λ S, @X.rec (λ x, S) 

example (S : Type) (a : S) : (X.recursor_constant S a) p = a :=
begin
  -- true by definition
  refl
end

Do you remember our definition of f above?

def f : X → Y
| p := q

This function f is defined using X.recursor_constant, letting S be Y and letting the element of S be q. The notation Lean uses is short, but under the hood this is how f is constructed.

So much for recursion. The second statement coming from X.rec is X.inductor, the principle of induction for X. I should probably say that I made up the word “inductor”, but inductor is to induction as recursor is to recursion. In more mathematical language the inductor says this. “Let’s say that for every element x of X we have a true-false statement C(x), and let’s say that C(p) is true. Then C(x) is true for every element x of X.” So again it is just a rather long-winded way of saying that p is the only element of X.

Why have computer scientists isolated these rather elaborate statements as the fundamental way to say that p is the only element of X? It’s actually because of a fundamental symmetry. We have defined a new type X, in a functional programming language, and now the fundamental thing we need to do next is to explain how to define functions into X, and how to define functions out of X. To define functions into X, we need to have access to terms of type X, or in computer science lingo to constructors of X. This is exactly what X.p is — a way to construct a term of type X. To define functions out of X, we need access to eliminators for X, that is, some kind of gadget whose output is a function from X to somewhere else. Because X only has one term, namely p, we need a way of saying “to give a function out of X, we only need to say what happens to p“, and this is exactly what the recursor is doing. Between them, the constructor and recursor say in a formal way that the elements of X are “at least p, and at most p, so are exactly p.”

Recursors for general inductive types.

Lean automatically generates constructors and a recursor for every type defined with the inductive command. There is a general rule for how to do this, but informally it’s pretty clear. We define inductive types using this | symbol, and you get a constructor for each line with a | in. The eliminator or recursor simply says that to define a function from the new type you’re defining, all you have to do is to make sure you’ve defined it on each constructor.

The rest of this post is the fun part. I will go through a bunch of inductive types defined in Lean, we can look at the definition, figure out the recursor attached to each of the types, and then see what this corresponds to mathematically. We will see some familiar things popping up in surprising ways.

Example: a set with two elements.

Recall our inductive type Y:

inductive Y : Type
| q : Y
| r : Y

The recursor for Y tells us that if S is a set, then to get a map from Y to S we have to give two elements of S, one corresponding to where q goes and one corresponding to where r goes.

def Y.recursor_constant : ∀ S, S → S → (Y → S) := λ S, @Y.rec (λ y, S)

The full recursor can even be used (with non-constant C) to define a function from Y which sends q into one type and r into a different type, but when defining the function g above we do not need this level of generality. If you want to see what Y‘s recursor looks like, just type #check @Y.rec in a Lean session after the definition of Y, and remember that Π is just computer science for (in Lean 4 they will be using instead of Π in fact).

Example: the natural numbers.

Mathematicians who have seen the development of mathematics in ZFC set theory know that a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers, a rational number is defined to be a multiplicative localisation of the integers at the nonzero integers, an integer is defined to be an additive localisation of the naturals at the naturals, and the naturals are defined by the ZFC axiom of infinity. In Lean’s type theory a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers etc etc etc, and it’s all just the same up to the very end, because in Lean the naturals are defined by Peano’s axioms:

inductive nat : Type
| zero : nat
| succ (n : nat) : nat

This means that we have two ways to make natural numbers. First, zero is a natural number. Second, if n is a natural number, then succ n (usually called n+1 by mathematicians) is a natural number. Now we need a way of expressing the idea that this is the only way to make natural numbers, and this is the recursor, which is automatically generated by Lean, and says a precise version of the following informal thought: “If you want to do something for all naturals, then you need to tell me how to do it for both constructors”. In other words, “…you need to tell me how to do it for zero, and then you have to tell me a way to do it for n+1 assuming we’ve already done it for n“. Sounds familiar?

The recursor in general involves a map to Sort u. Let’s just specialise to the two universes we’re interested in, and take a look at the constant recursor, and the inductor (and let’s use Lean’s notation for nat):

nat.recursor_constant : ∀ (S : Type), S → (∀ (n : ℕ), S → S) → (ℕ → S)
nat.inductor : ∀ (C : ℕ → Prop), C 0 → (∀ (n : ℕ), C n → C (succ n)) → ∀ (n : ℕ), C n

[The proof of nat.recursor_constant is λ S, @nat.rec (λ n, S) and the proof of nat.inductor is just @nat.rec. ]

The constant recursor says this: if S is a set, and we want to make a function f : ℕ → S, here’s a way of doing it. First we need an element of S (namely f(0)) and second, for each natural number we need a map from S to S (telling us how to make f(n+1) given that we know f(n)).

The inductor says this. Say we have a family C(n) of true-false statements, and that C(0) is true, and that for all n we have a proof that C(n) implies C(n+1). Then we can deduce that C(n) is true for all n.

What I think is really cute about this example is that Peano’s definition of the natural numbers makes it immediately clear why the principle of mathematical induction works. In the natural number game we use the recursor in the background to define addition and multiplication on the naturals. We also use it to prove things which I call “axioms” in the natural number game — for example the proof that 0 is not equal to succ n for any natural number n uses the recursor to define a function from ℕ to \{0,1\} sending 0 to 0 and succ n to 1, and using this function it’s easy to prove the “axiom” zero_ne_succ by contradiction. If you want an exercise, try using nat.recursor_constant to prove injectivity of the succ function, something else I also claimed was an axiom in the natural number game (as Peano did) but which was actually proved using the recursor.

Example : false

false is a true-false statement, and you can probably guess which one it is. In Lean false is defined as an inductive type! Here’s the full definition:

inductive false : Prop

This time there are no | s at all! Every constructor of false would be a proof of a false statement, so this design decision is not surprising. The recursor is

false.rec : Π (C : Sort u), false → C

In other words, to give a map from false to C you have to define it on all constructors, of which there are none. Let’s take a look at the inductor then, by changing Sort u to Prop:

false.inductor : ∀ (P : Prop), false → P

It says that if P is any true-false statement, then false implies P. This logical tautology has been automatically generated by Lean, because Lean’s model of an implication Q\implies P is a function from proofs of Q to proofs of P, and false has no terms, i.e., no proofs.

There is a similar story with inductive empty : Type, Lean’s definition of the empty type. The recursor for empty says that to give a map from the empty type to any type S, you don’t have to do anything other than feed in S.

Example : or

The logical or on propositions is defined as an inductive type in Lean!

inductive or (P Q : Prop) : Prop
| inl (hp : P) : or
| inr (hq : Q) : or

There are two constructors for P ∨ Q, where now I’m using the usual logicians’ notation for or. In other words, there are two ways to prove P ∨ Q. First you can prove P, and second you can prove Q. Lean’s auto-generated inductor for this is

or.inductor : ∀ (P Q R : Prop), (P → R) → (Q → R) → (P ∨ Q → R)

In other words, if you can prove P\implies R and you can prove Q\implies R, then you can deduce (P\lor Q)\implies R. Again no mathematician is surprised that this statement is true, but perhaps some are surprised by the fact that a computer scientist might claim that this is true by induction on or.

Example : equality.

The = symbol in Lean is defined as an inductive type! But I think I’m going to save the topic of what induction on equality is until the next post, where we will prove, by induction, that equality is an equivalence relation.

Some final notes.

I was very surprised when I realised that every inductive type came with a principle of induction. In fact one can even define the reals as an inductive type, which means that there will be an inductor for reals meaning that you can do induction on the reals! But when I figured out what the induction principle said I was disappointed — it says “if you can prove it for every real which is an equivalence class of Cauchy sequences of rationals, you can prove it for every real”. Remember that the idea of the recursor is that it is a way of saying “every term of your type can be made using the constructors”, so if your only constructor for a real is an equivalence class of Cauchy sequences of rationals then this is what you get. However these other examples, and in particular these examples coming from logic, are quite funky. An example I didn’t talk about: and is an inductive type and its inductor is ∀ (P Q R : Prop), (P → (Q → R)) → (P ∧ Q → R), which is some propositional version of uncurrying (indeed the constant recursor for prod, the product of two types, is uncurrying on the nose). The basic facts in propositional logic about and and or and are proved constructively in Lean using recursors rather than by truth tables, because directly constructing the functions corresponding to the proofs is more appealing than a case split.

Not everything is an inductive type in Lean — there are two other kinds of types. There are quotient types, which are there for some kind of computer science efficiency reasons and which could be constructed using inductive types, and then there are function types, which are a different kind of thing. I don’t think it’s of mathematical interest whether they type you’re looking at is an inductive type or a function type, but here’s an example of a function type: logical not. In Lean, ¬ P is defined to be P → false. On the other hand most of the structures used by mathematicians (groups, subgroups, rings, fields, perfectoid spaces and so on) are defined as inductive types (often however with one constructor, so their induction principle is boring). An inductive type with one constructor is known as a structure in Lean. You can read more about inductive types and structures in the very wonderful Theorem Proving In Lean, in sections 7 to 9.

In my next post I’ll talk about induction on equality.

My daughter is busy with exam revision, so here’s some old digital art by one of my sons.

Posted in Learning Lean, Type theory | Tagged , | 9 Comments