Lean Together 2021

Thank you so much to Rob Lewis and Patrick Massot for organising Lean Together 2021! This annual Lean conference, like mathlib, seems to double in size every year. This year over 200 people registered, and even though it was all online and there were a lot of talks spanning everything from research level mathematics to hard-core computer science (so nobody would understand all of them), there still seemed to be between 50 and 100 people attending every talk. I attended all of them, but certainly didn’t follow all of them; I’ll discuss some the talks that I feel qualified to talk about. Anyone jealous that they missed out can catch up with essentially all of the talks on the leanprover-community youtube channel.

Most or all of the talks mentioned below have slides and/or video available. Check out the schedule for links.

Overview of some of the talks.

Do your theorems use all their assumptions? Is your theorem about fields actually true more generally for division rings? Or even rings? Or even semirings? Lean now has a linter which will check these things for you, written by Alex Best and showcased in his talk. This is important for Lean’s maths library because, like Bourbaki, we strive to prove theorems in the correct generality, and now we have a tool which checks that this is happening.

Floris van Doorn has formalised the theory of the Haar measure in Lean. This is the sort of thing which really excites me. Haar measure is a measure on a locally compact Hausdorff topological group which is translation-invariant. I read the proof of existence and uniqueness of Haar measure when I was teaching an old course on Tate’s thesis. The proof is long, and tricky in places, but fairly low-level. It’s just the kind of thing which one might imagine “sure this could be checked by a computer — in theory”. Floris has shown that with Lean it can be checked by a computer in practice. The systems work and this is why we mathematicians should be using them.

Heather Macbeth did something equally great even though to a non-expert it might look less impressive. She gave the n-sphere the structure of a manifold. The reason this is important is that it’s essentially the first non-trivial example of a manifold that we have. Manifolds are really complicated! Sorting out all the theory of charts took a while, and the fact that Heather can make a sphere into a manifold is, for me, strong evidence that the Lean analysts (Heather included) have got the set-up right. Future work: use the inverse function theorem to make zeros of functions in affine space into manifolds. This stuff is hard work. But it’s happening.

I remember when the world wide web was born. Academics made web pages, which consisted of text, pictures, and hyperlinks to more web pages. People could see the potential, but it took many more years before the web became accessible to the masses, and web pages became powerful and flexible tools. Ed Ayers talked about widgets, his addition to core Lean which enables pictures and links to be embedded in Lean output in VS Code, and enables the user to inspect the types of terms and subterms in Lean output, jump to definitions and more. This has got huge potential, and it will be interesting to see it realised. Students on the Xena Discord have already used them to make graphical logic games. There is more to come here.

Chris Hughes talked about his group theory tactic. It is now fast, and solves problems in group theory of the form “show this finite set of equations implies this one”. In fact he has written several tactics which do this; time will tell which techniques are most useful in practice.

The blue-eyed islanders puzzle is something we’ve talked about on the blog before. One has to think hard about how to formalise the question mathematically. Paula Neeley talked about her work formalising a framework for dynamic epistemic logic in Lean, which will enable us to work on questions such as this in a much less ad-hoc way.

Kenny Lau talked about his work formalising the Deligne/Fontaine–Wintenberger/Scholze theory of tilting, part of his Imperial MSc project. Working with Kenny has been an interesting experience. When Massot, Commelin and I were working on perfectoid spaces we formalised a very general definition of a valuation, taking values in an arbitrary totally ordered abelian group with zero; this is what is needed to formalise the theory of adic spaces. For the theory of tilting, Kenny and I have realised that actually we need an even more general theory — in Deligne’s paper on fields of characteristic p which are “limits” of fields of characteristic zero, he uses valuations taking values in certain monoids with zero. I’ve been working on refactoring the definition and we are nearly there.

Yasmine Sharoda talked about her work on building algebraic hierarchies. This is a very “computer science” problem, and quite hard to explain to mathematicians. Basically mathematicians carry around their head a big web of trivial facts — a scheme is by definition a locally ringed space, which is by definition a ringed space, which is by definition a topological space. A ring is by definition a semiring, a monoid, an additive group, and additive semigroup, and so on. Computer programs like Lean need some kind of organisational principle to mean that mathematicians can move freely between these ideas without noticing that they’re doing so — either that, or libraries suddenly experience an irrelevant growth factor when the theorem that 0+x=x has to be proved for additive monoids, groups, semirings, rings, division rings, integral domains and fields. This is a problem which I do not understand the details of, but I have now seen sufficiently many talks on it to understand that computer scientists are concerned about this issue. I think they are in some sense planning for the future though — Lean has no trouble working with schemes or fields right now.

Adam Topaz talked about his work formalising theorems in anabelian geometry. There is a problem here — beyond a certain point one needs the theorems of local and global class field theory, and the proofs of these things are long and difficult. My proposed solution: let’s just formalise the statements, and worry about the proofs later. The sooner we start doing interesting maths like what Adam is attempting, the better.

Peter Nelson talked about formalising matroids. This is hard, for the wrong reasons. It seems to me that type theory is ideal if you are formalising things which can be expressed in the language of category theory (i.e., if you have objects and morphisms). Matroids are combinatorial objects and a fundamental construction stressed by Peter was that of a minor of a matroid. This is some kind of subobject, but there seems to me to be no natural general notion of a morphism of matroids. One needs to talk about minors of minors freely, and this can start to get convoluted if one is not careful. One wonders whether this kind of stuff would be easier to do in a set-theory based system. One approach would be the “Coq odd order theorem” approach, where you work in a large matroid and everything is some kind of submatroid of this (groups in the Coq odd order work were all subgroups of a large ambient group which had no mathematical relevance — it was just a way to hold things together). But I really don’t want to go down this route. Kenny Lau seems to have solved the problem of subfields of subfields using an elegant is_scalar_tower structure; I wonder if a similar idea can be used here. I’ve realised that matroids are a great example to bear in mind when one is thinking about solving these type-theoretic issues practically.

Marie Kerjean talked about her work on the foundations of complex analysis in Coq. We have essentially no complex analysis in Lean, although there is a ton of the stuff in Isabelle/HOL. I’m sure that the Lean community will be able to learn from her efforts when we turn our attention to these matters.

Damiano Testa is well on the way to proving that the image of a constructible set under a finite type morphism of schemes is constructible. What we are missing right now is the definition of a product of schemes, so he has been working in the affine case for now and has essentially finished that part. I remember ploughing through the construction of a product of schemes in Hartshorne when I was a PhD student. It will be fun to do it in Lean — this is the kind of stuff which should turn out nicely.

Amelia Livingston talked about the nature of universal objects. This talk had some ideas in which I’m still wrestling with. Here’s a funny observation, which goes back to Patrick Massot. Say R is a commutative ring and S is a multiplicative subset. How does one prove that the kernel of the canonical map R\to R[1/S] is the annihilator of S? This can’t be hard. I don’t know how to deduce it directly from the universal property of R[1/S] though. The only proof I know involves having to make the usual explicit construction of R[1/S] as R\times S/\sim and to prove it for this specific model. Is this a surprise? Well, if I told you that you could assume that the real numbers were a complete ordered archimedean field (the universal property) and then I asked you to prove a fact about the real numbers, and it turned out that to prove it you had to use the construction of the reals as, say, Cauchy sequences, I think you would be surprised. The universal property is supposed to characterise the object up to unique isomorphism and save us the trouble of having to build it every time we need it. Amelia in her talk explained that she had discovered something even weirder. Despite the fact that mathlib has both a construction of the tensor algebra T(M) of an R-module M and a proof of its universal property, the statement that M^{\otimes n}\to T[M] is injective seemed to follow from neither the universal property nor the construction in mathlib, so she had to make a second construction. This is like coming up with an assertion about the real numbers which you can’t prove from the axioms, or from the construction as Cauchy sequences, so you have to make them again as Dedekind cuts before the result becomes clear. What is going on here? Amelia is working on Koszul complexes in Lean as part of her MSc project and this is where the issue came up.

Alena Gusakov, one of the moderators of the Xena Discord, talked about her work with Bhavik Mehta (another moderator) and Kyle Miller on formalising all the forms of Hall’s Marriage Theorem that you’ve ever heard of. Graph theory is another one of these objects which don’t naturally fall into the arms of category theory (what is a morphism of graphs? I did an entire graph theory course as an undergraduate and saw some deep and tricky theorems, but this concept was not mentioned once). Like matroids, graphs get manipulated (edges get removed or contracted) rather than morphismed, and for some reason this makes them more difficult to use in type theory. What is going on here? This deserves a better explanation.

Thomas Browning and Patrick Lutz are PhD students at UC Berkeley, and, building on work of undergraduates at Imperial, they have formalised the fundamental theorem of Galois theory in Lean and are on their way to proving insolvability of the quintic, problem 16 of the Formalizing 100 Theorems challenge. To my astonishment, this is one of the problems which has not been formalised in any theorem prover, although Cyril Cohen tells me that Sophie Bernard, Pierre-Yves Strub, Assia Mahboubi and him are working on it in Coq. What I think is interesting to note here is that Cyril, Assia and Pierre-Yves are professional computer scientists who are working on this in Coq presumably with the idea of getting a publication out of it, whereas Thomas and Patrick are mathematics PhD students specialising in “normal” mathematics, who spent a few months learning how Lean worked and are proving the theorem in their spare time for fun. This is an interesting indication of the different cultures at work here.

There were other talks, but on things I understand less well. Yury Kudryashov talked about dynamics of the circle. Mario Carneiro talked about Metamath 1. Vaibhav Karve talked about axiomatising various theories he’s been thinking about such as model theory. Jannis Limberg talked about tactic-writing. Stan Polu and Jason Rute talked about Lean and AI (including demonstrations of an AI program solving levels of the natural number game!), and Koundinya Vajjha, Joe Hendrix and Logan Murphy all talked about formalisation projects which are way beyond my area of expertise.

Oh, and Leo de Moura and Sebastian Ullrich released Lean 4, and gave several talks demonstrating what it did. A lot of people were excited about this, but for me Lean 4 is currently unusable (there is no maths in it). I’m not going back to proving (x+y)^3=x^3+3x^2y+3xy^2+y^3 directly from the axioms of a ring, and unfortunately I am not capable of doing any of the things which are required in order to make Lean 4 have a ring tactic. However, there were a bunch of people at the conference who clearly are capable of doing these things, including Daniel Selsam, who is thinking about how one can import compiled Lean 3 files into Lean 4 (Lean 3 and Lean 4 share the same type theory). Once we have this, I will be extremely interested. And hopefully it won’t be too long.


I counted no fewer than four occurrences of the phrase “it’s fun!” in various slides in the talks, and there could have been more. This was within the context of people asking “why formalise X?”. Typically it would be one of several answers. I did not speak at the conference, but had I done so I suspect there would have been a fifth occurrence. Imagine people putting “it’s fun!” in the slides in a traditional number theory talk! Sure it’s interesting to do research into modern number theory — this was something I did with great verve and enthusiasm for nearly 25 years, and I suspect I probably was having fun some of the time. But “fun” is not a word I ever remember running into in a presentation. Why are people highlighting it here? Perhaps we are more conscious of the fact. Is it guilt that it’s supposed to be work but actually we’re really enjoying it? Sometimes I wake up in the middle of the night and want to get out of bed and formalise. Maybe “it’s addictive” is closer to the truth?

Whatever the real reason is behind it, Lean’s maths library continues to grow. In 2020 several more professional number theorists started formalising in Lean (Riccardo Brasca, Damiano Testa, Marc Masdeu, Filippo Nuccio) and stuff like the theory of cyclotomic polynomials has just appeared organically in mathlib as a result of this. Johan Commelin continues to work tirelessly on this stuff too; my PhD student Ashvni needed Bernoulli numbers for a project she’s working on, and it turned out that Johan had formalised them a few months earlier. As long as things like this keep happening, mathlib will more and more start to look like a respectable basis for a lot of MSc level mathematics. Perhaps my forthcoming Lean course for PhD students (starting in 10 days; more info to come) will get us a few more converts.

Thanks once again to Patrick and Rob for organising the conference.

My daughter says she hasn’t got any art ready, so here’s something one of my sons did a few years ago when he was learning Blender.

Posted in Uncategorized | Tagged | 6 Comments

The end of the summer.

So, what’s been happening? Well, the Lean 4 developers told us in mid-June that Lean 4 would be released at “the end of the summer“. And on Monday Lean 4 got released! The corollary is that it’s the end of the summer now, so it must finally be time to talk about the summer projects, and some of the things which happened since then.

Summer projects

In stark contrast to the 2018 Lean summer projects, the 2020 projects all happened online, on Discord. I had far less control of what was going on, and (something which I rather liked) I ended up working with a bunch of people, some of whom I didn’t know the real name, race or gender of. I say I ended up working with them — what happened in practice was that they worked with each other. A community was born, people did Lean stuff. People did other stuff too. The community still exists. I didn’t really expect that. Undergraduate mathematicians welcome. So, what got done?

Harun Khan proved that 144 was the largest square in the Fibonacci sequence in Lean, following an ingenious argument of Cohn from the ’60s. As far as I know this result has never been formalised.

Xiang Li formalised a crazy construction of the reals directly from the integers, called the Eudoxus reals. The idea is that the reals are equivalence classes of almost-linear maps from the integers to the integers (a function f: \mathbb{Z}\to\mathbb{Z} is almost linear if f(a+b)-f(a)-f(b) is bounded). Once the mathlib PR characterising the reals as the unique complete archimedean field hits, he’ll be able to get an isomorphism from his reals to mathlib’s reals for free.

Elizabeth Sun figured out which natural numbers were the sum of two squares. It’s great to see some basic number theory being done in Lean. I am a number theorist but for some reason a lot of my Lean work is in commutative algebra rather than the elementary stuff. Thank you Elizabeth and Harun for flying the number theory flag 🙂

James Arthur proved some theorems about some kind of twisted variants of sine and cosine. James is an undergraduate at Exeter and is currently in charge of marketing down there 😉 Thanks James.

People learnt about foundations. James Palmer, an undergraduate at Warwick, ended up writing an essay on it. Foundations are not something I know much about, but James is a maths/philosophy student so had a particular interest in it.

People made games. Kendall Frey made a Rubiks cube (although he’s not an undergraduate); Markus Himmel made formally verified Sudoku, and more recently Angela Li has made a tower of Hanoi. All these things are possible because of Ed Ayers’ widgets, part of Ed’s Cambridge university PhD thesis. Angela is now working on a 15 puzzle and thinking about “loopy” and other games from Simon Tatham’s puzzle game collection.

The summer has dragged on. People keep making stuff. Alena Gusakov and Bhavik Mehta worked hard on graph theory over the summer, initially formalising some of the problem sheets from the graph theory course Alena had attended, and now this work has culminated in a formal proof of Hall’s Marriage Theorem that Alena has put together with Bhavik Mehta and Kyle Miller! Alena gave a talk on that at Lean Together 2021 — more of that in the next post.

MSc projects

There was some sort of blurring between work done over the summer and work which is now becoming MSc projects. Chris Hughes made a group theory tactic! Given a collection of hypotheses, the tactic will try and prove another one. For example if a,b are elements of a group, the tactic will prove abab^2=1\implies ab=ba (try it online, or on paper if you’re still living in the past). What’s even better, after some discussions with Kyle Miller he made another tactic, implementing a different algorithm to solve the same problem. This work started over the summer but has now turned into his MSci project (he’s an MSc student at Imperial). Similarly Kenny Lau has defined the tilt of a perfectoid field, and Amelia Livingston has got a working API for tensor and exterior algebras and is on the way to Koszul complexes. All of these will become Imperial MSci projects.

All this has been a really positive change to my work life. For 20 years I’ve supervised MSc/MSci students in the maths department and many of the projects have been of the form “here’s a Springer GTM; now rephrase the relevant chapters”. I have grown tired of such projects. My colleague Toby Gee has started to give students research papers to read instead, which is brave, but still ultimately the results are a student giving an exposition of something which is there already. I now have students who are not only learning the material but also formalising it in Lean, and some of this stuff ultimately makes it into mathlib. My previous students have formalised schemes, group cohomology, and transcendence of e. The schemes work has quite a history, and several people were involved. Our account of what happened is now on ArXiv. It’s a joy to be writing papers with undergraduates and MSc students, and something I’m rather proud of. Before I moved into this area, such an idea was essentially inconceivable.

Undergraduate/PG teaching

I spent September trying to learn how to teach online. I made lectures for my undergraduate course, some of them cheekily using Lean. But Lean is not a compulsory part of my course; I do not think I really have the resources to teach almost 300 students how to use some complicated software; I have to rely on the fact that students will get interested and become engaged. In the introductory course, I teach the students about sets, functions and equivalence relations, and then Marie-Amélie Lawn teaches them about how to build the naturals and the reals from scratch in an axiomatic way. A lot of this material is very Lean-friendly. Welcome to the new Imperial members of the Xena community — Aksel, Jack, Deepro, Deniz, Jia, Archie, and all the other new 1st years who came along and whose names I’ve now forgotten. These kids show up on Thursday evenings on the Discord and we work through undergraduate problem sheets together. Is this good for them? Is it teaching them mathematics in a new way? Is it making them learn better, or understand the material better? I don’t know. Maybe! Is it fun? Yes. Does it change the way these undergraduates think about mathematics? Yes. I am convinced it makes them think more clearly. Students who engage with Lean seem to be more careful about their logic, and more careful about pointing out special cases. This can’t be a bad thing. But is this because they’re using Lean, or are they drawn to Lean because they are already the kind of people who think like that? I don’t know.

Should I make Lean a compulsory part of my 1st year undergraduate course? I am not convinced. People did not come to Imperial to do mathematics in a computer proof system. Some undergraduates have no interest in using computers at all. Athina Thoma, an education expert, told me that perhaps it is difficult for a student to learn new mathematics and to learn Lean at the same time. I think this is a very pertinent comment. So why do I even put time into this optional component of my course? Because I think it is time that this area begins to grow, possibly in ways I’ve not thought of yet, and the best way to make it grow is to make sure that a bunch of smart young people know about it. This is the motivation behind the Xena project. Once mathematicians can use this software, they’ll figure out interesting things to do with it. The killer apps will come.

As a reaction to Athina’s comment, I thought it might be interesting to teach Lean to people who did know the mathematics already. And so this term I am teaching a graduate course which will deal with undergraduate mathematics! As part of the Imperial/Oxford/Bath/Bristol/Warwick EPSRC Taught Course Centre, I am teaching a course on formalising mathematics this coming term! The course will comprise 8 two-hour workshops, where I get PhD students to think about stuff they understand well and to see if they understand it well enough to convince Lean that they do. Introductory lectures will be on things like equivalence relations and basic group theory; we will later move on to harder stuff (possibly guided by interests of the audience). Current ideas for what we will do in the last couple of workshops: some algebraic geometry, condensed sets, commutative algebra? We’ll see.

Oh — talking of EPSRC — they gave me a New Horizons grant! Job ad for digitising the Langlands program is out next week 😀

Posted in Imperial, Learning Lean, M40001, undergrad maths | Tagged | Leave a comment

Liquid tensor experiment

This is a guest post, written by Peter Scholze, explaining a liquid real vector space mathematical formalisation challenge. For a pdf version of the challenge, see here. For comments about formalisation, see section 6. Now over to Peter.

1. The challenge

I want to propose a challenge: Formalize the proof of the following theorem.

Theorem 1.1 (Clausen-S.) Let 0<p'<p\leq 1 be real numbers, let S be a profinite set, and let V be a p-Banach space. Let \mathcal M_{p'}(S) be the space of p'-measures on S. Then

\mathrm{Ext}^i_{\mathrm{Cond}(\mathrm{Ab})}(\mathcal M_{p'}(S),V)=0

for i\geq 1.

(This is a special case of Theorem 9.1 in www.math.uni-bonn.de/people/scholze/Analytic.pdf, and is the essence of the proof of Theorem 6.5 there.)

Below, I will explain what all the terms in the theorem are, and why I care. I apologize in advance that the background story is a little longer, but I think it’s a fascinating story and I tried to be as brief as possible.

2. Getting condensed

The first thing to explain is the category of condensed abelian groups \mathrm{Cond}(\mathrm{Ab}), in which this computation takes place. This is a variant of the category of topological abelian groups, but with much better properties. It is part of the “condensed mathematics” that Dustin Clausen and myself started to develop two years ago. I gave two courses in Bonn about it: www.math.uni-bonn.de/people/scholze/Condensed.pdf and www.math.uni-bonn.de/people/scholze/Analytic.pdf, and recently we gave a Masterclass in Copenhagen about it, https://www.math.ku.dk/english/calendar/events/condensed-mathematics/.

Condensed mathematics claims that topological spaces are the wrong definition, and that one should replace them with the slightly different notion of condensed sets. Before giving their definition, let me state a few properties:

— in practice, most topological spaces of interest are “compactly generated weak Hausdorff”. In fact, this is the class of topological spaces customarily used within algebraic topology; also, all usual topological vector spaces (e.g., all metrizable ones, or just first-countable ones) have this property. Now compactly generated weak Hausdorff spaces also embed fully faithfully into condensed sets, so in this large class of examples, the transition is completely inconsequential.

— condensed sets have very nice categorical properties, in particular all limits and colimits (like topological spaces). But they also have internal mapping objects: if X, Y are condensed sets, there is a condensed set \mathrm{Hom}(X,Y) such that for any other condensed set T, maps T\to \mathrm{Hom}(X,Y) are functorially identified with maps T \times X \to Y. In fact, the same is true relatively, for maps over some base condensed set S. In category-speak, “condensed sets are locally cartesian closed”.

— even better, up to mild set-theoretic problems, condensed sets form a topos. (More precisely, they satisfy Giraud’s axioms except for the existence of a set of generators. One can build a variant of the category of condensed sets, known as the category of pyknotic sets of Barwick–Haine, that is a topos, essentially by restricting to the subcategory generated by some set of generators.)

Let me now explain what condensed sets are. Clausen and I came to it from different trajectories (and Barwick and Haine yet from a different one); there are some precursors in the literature, notably Spanier’s quasi-topological spaces (in algebraic topology) and Waelbroeck’s compactological spaces (in functional analysis). For me, it started (unsurprisingly…) in relation to my work on perfectoid geometry, but feel free to skip this paragraph. The starting point of perfectoid geometry is the idea to study something like a p-adic annulus \{T\in \mathbb C_p, |T|=1\} by extracting all p-power roots of T, leading to an infinite tower of covers, and then pass to the limit object, which is a perfectoid space. One wants to do this as, surprisingly, perfectoid spaces are in some ways nicer than the usual annulus; in particular, there is the tilting procedure relating them to characteristic p (and again, surprisingly, characteristic p is in some ways easier than characteristic 0). Technically, one defines the pro-étale site of the annulus (or any rigid-analytic variety), and shows that locally in the pro-étale site, the space is perfectoid. But let us forget all about perfectoid geometry, and let us simply look at what the analogue of this story would be for a point: In the fibre over a point, say T=1, in each finite stage one covers it by finitely many points, and in the limit by a limit of finite sets, i.e. a profinite set. (A profinite set is by definition an object of the category Pro-(finite sets). Their category is equivalent to the category of totally disconnected compact Hausdorff spaces, and I will often tacitly identify the two concepts.)

This led Bhatt and myself to introduce the pro-étale site of a point (or a general scheme). This is the category of profinite sets S, where covers are given by finite collections of jointly surjective maps.

Definition 2.1. Condensed sets are sheaves on the pro-étale site of a point.

This makes it sound like condensed sets are a topos, but above I mentioned set-theoretic problems. Indeed, the category of profinite sets is not small, so one has to resolve this in some way. Let me gloss over this point here; it is not essential for any of the following discussion.

This definition (without the name) is already in my 2013 paper with Bhatt. But I never tried to thoroughly understand this notion. I knew that there is a functor from topological spaces to condensed sets (up to set-theoretic problems): Given a topological space T, one can send any profinite set S to the continuous maps from S to T. What I did not understand is that this functor is fully faithful on a very large class of topological spaces, and that condensed sets are actually an improvement over topological spaces. This is what Clausen quickly convinced me of when he arrived in Bonn in 2018 as a PostDoc.

Before going on, let me describe what a condensed set X “is”: For each profinite set S, it gives a set X(S), which should be thought of as the “maps from S to X”, so it is measuring how profinite sets map into X. The sheaf axiom guarantees some coherence among these values. Taking S=\ast a point, there is an “underlying set” X(\ast). Beware however that there are condensed sets X with X(\ast)=\ast a point, but with X(S) big for general S. We will see an example below. It is important to allow such examples in order to have a nice general theory. In practice, they do not produce much trouble either. It’s a free world.

Example 2.2. Let T be a compact Hausdorff space. Then a classical and somewhat weird fact is that T admits a surjection S\to T from a profinite set S. One construction is to let S be the Stone-Čech compactification of T^\delta, where T^\delta is T considered as a discrete set. This lets one recover T as the quotient of S by the equivalence relation R = S\times_T S\subset S\times S. Thus, compact Hausdorff spaces can be thought of as quotients of profinite sets by profinite equivalence relations. Of course, this seems like a horrible way to look at compact Hausdorff spaces, but this is what happens in the condensed perspective, which only records maps from profinite sets. Part of the reason for the challenge is the question: Can this possibly be a good perspective?

Example 2.3. Let T=[0,1] be the interval. In high-school, we learn to think of real numbers in terms of decimal expansions. I claim that this produces a surjection from a profinite set onto T! Why? Any point of [0,1] can be written as 0.a_1a_2a_3\ldots with coefficients a_i\in \{0,1,\ldots,9\}, however one needs to make some identifications, like 0.099999\ldots = 0.100000\ldots. Now decimal expansions per se naturally form a profinite set, namely \prod_{n\geq 1} \{0,1,\ldots,9\}. It is only when doing these extra identifications that one produces the continuum [0,1]! So secretly, in high school we learn to think of the interval as a quotient of a profinite set.

3. Condensed abelian groups

Just like topological abelian groups, one can consider condensed abelian groups, that is abelian group objects in the category of condensed sets. Equivalently, these are sheaves of abelian groups on the pro-étale site of a point. In fact, one nice feature of the condensed formalism is that for any notion of mathematical objects, one can consider their condensed version: Condensed groups, condensed rings, condensed categories, …, just by talking about sheaves of such on the pro-étale site of a point, making it possible to “put a topology on top of something” without trouble. (This is the spell of “working internally in a topos”…) However, unlike topological abelian groups, condensed abelian groups have excellent formal properties:

Proposition 3.1. The category \mathrm{Cond}(\mathrm{Ab}) of condensed abelian groups is an abelian category with all limits and colimits, and a class of compact projective generators. In particular, it satisfies Grothendieck’s axioms (AB3–6) and (AB*3–4) (i.e., the same that the category of abelian groups satisfies), in particular filtered colimits and infinite products are exact.

This makes \mathrm{Cond}(\mathrm{Ab}) exceptionally well-behaved, even more well-behaved than the category of abelian sheaves on a usual topos: Usually, infinite products fail to be exact.

Example 3.2. In topological abelian groups, a standard problem is that one can have continuous maps f: M \to N of topological abelian groups that are isomorphisms of underlying abelian groups, but the map is not a homeomorphism. For example, let N=\mathbb R be the real numbers and M=\mathbb R^\delta be the real numbers equipped with the discrete topology. Then the kernel and cokernel of f are trivial, while f is not an isomorphism. This is corrected in condensed abelian groups as follows: There is an exact sequence

0\to \mathbb R^\delta \to \mathbb R \to Q \to 0

where Q is a condensed abelian group with underlying abelian group Q(\ast)=0, but for a general profinite set S, one has Q(S) = \mathrm{Cont}(S,\mathbb R)/\mathrm{LocConst}(S,\mathbb R), which is in general not zero.

Another nice feature of the condensed world is that it’s a free world. Namely, for any condensed set X, there is a free condensed abelian group \mathbb Z[X] on X (given by the sheafification of the presheaf taking S to \mathbb Z[X(S)]). If X is a compact Hausdorff space, one can describe \mathbb Z[X] explicitly: It is a union of compact Hausdorff spaces \mathbb Z[X]_{\leq n}\subset \mathbb Z[X], with underlying set \{\sum_{s\in S} n_s[s]\mid \sum_{s\in S} |n_s|\leq n\}. This defines in fact also a (compactly generated weak Hausdorff) topological abelian group; in that context, this construction goes back to old work of Markov.

What are compact projective generators of \mathrm{Cond}(\mathrm{Ab})? One can take \mathbb Z[S] for certain special profinite sets S, namely so-called “extremally disconnected” profinite S. This includes the Stone-Čech compactifications of discrete sets, and all other examples are retracts. In fact, extremally disconnected profinite sets are exactly the projective objects of the category of compact Hausdorff spaces, by an old theorem of Gleason. Extremally disconnected profinite sets are pretty rare: The smallest infinite one has cardinality 2^{2^{\aleph_0}}. All convergent sequences in extremally disconnected sets are eventually constant. So all your favourite examples of profinite sets (Cantor sets, \mathbb Z_p, etc.) are not extremally disconnected. In fact, the existence of extremally disconnected ones is tied extremely closely to the axiom of choice (in the form of the existence of non-principal ultrafilters): The basic examples are Stone-Čech compactifications of discrete sets, which are discrete sets together with some ghostly dust: You will never be able to name any point in the boundary!

The proposition ensures that one can do homological algebra in \mathrm{Cond}(\mathrm{Ab}), but it means that for a computation, one has to resolve by free condensed abelian groups on extremally disconnected sets. These are some “clouds of dust”, and projective resolutions in \mathrm{Cond}(\mathrm{Ab}) amount to resolving nice objects by “clouds of dust”. It is basically impossible to do this completely explicitly, but one can sometimes be reasonably explicit. Clausen had developed some techniques to carry out nontrivial computations in \mathrm{Cond}(\mathrm{Ab}). In particular:

Theorem 3.3. Let A, B be locally compact abelian groups, considered as objects in \mathrm{Cond}(\mathrm{Ab}). Then \mathrm{Ext}^i_{\mathrm{Cond}(\mathrm{Ab})}(A,B)=0 for i\geq 2, and \mathrm{Ext}^1_{\mathrm{Cond}(\mathrm{Ab})}(A,B) agrees with the Yoneda-Ext in the category of locally compact abelian groups, while \mathrm{Hom}_{\mathrm{Cond}(\mathrm{Ab})}(A,B) are the usual continuous maps.

Thus, starting from locally compact abelian groups, there are no “strange” condensed abelian groups arising as extensions (or “higher” extensions): Even if \mathrm{Cond}(\mathrm{Ab}) chops things up into clouds of dust, the only ways to build the dust back together are the ways you would have thought about. The proof of the theorem is quite nontrivial; it makes use of the Breen–Deligne resolution of abelian groups. However, given the right approach the proof is quite tidy.

4. Completeness

These results show that \mathrm{Cond}(\mathrm{Ab}) behaves quite well, and if one starts with reasonable examples, then their \mathrm{Hom} and \mathrm{Ext}-groups generally stay well-behaved. However, this is not true for tensor products. Namely, if one takes say a tensor product of \mathbb R and \mathbb Z_p, one gets a condensed abelian group \mathbb R\otimes \mathbb Z_p whose underlying abelian group is the usual algebraic tensor product of \mathbb R and \mathbb Z_p, which is a bit of a mess as the real and p-adic topologies are incompatible. One would like to have a notion of “complete” condensed abelian groups, and a resulting notion of completed tensor product, such that the completed tensor product vanishes. Similarly, the completed tensor product of \mathbb Z_p with itself should just be \mathbb Z_p.

After toying around, we found a notion of completeness that works perfectly away from the real numbers.

Definition 4.1. A condensed abelian group M is solid if for any profinite set S, any map f: S\to M extends uniquely to a map \mathbb Z[S]^\blacksquare \to M, where \mathbb Z[S]^\blacksquare = \varprojlim_i \mathbb Z[S_i] when S=\varprojlim_i S_i is written as a limit of finite sets.

One can regard \mathbb Z[S]^\blacksquare also as \mathrm{Hom}(\mathrm{Cont}(S,\mathbb Z),\mathbb Z), i.e. as the \mathbb Z-valued measures on S. In this sense, this says that any map f: S\to M gives a unique map \mathbb Z[S]^\blacksquare\to M sending a measure \mu to \int f\mu. Note that there is a map S\to \mathbb Z[S]^\blacksquare given by “Dirac measures”.

Theorem 4.2. The category of solid abelian groups is stable under all limits and colimits and extensions, and is an abelian subcategory of condensed abelian groups. The inclusion into condensed abelian groups admits a left adjoint M\mapsto M^\blacksquare “solidification”, which is the unique colimit-preserving extension of \mathbb Z[S]\mapsto \mathbb Z[S]^\blacksquare. There is a unique tensor product on solid abelian groups making M\mapsto M^\blacksquare symmetric monoidal, i.e. compatible with the tensor product. The category of solid abelian groups has compact projective generators, given by infinite products of copies of \mathbb Z.

Example 4.3. The discrete group \mathbb Z is solid, and then all objects built via limits and colimits (and internal Hom’s) from \mathbb Z are still solid. This includes \mathbb Z_p, \mathbb Q_p, \prod_I \mathbb Q_p, etc., in fact everything that comes up in usual algebra. The only way to leave the category is to take tensor products; these have to be resolidified. But then \mathbb Z_p \otimes^\blacksquare \mathbb Z_p = \mathbb Z_p, \mathbb Z_p \otimes^\blacksquare \mathbb Z_\ell = 0 for p\neq \ell, etc. In fact, considering solid \mathbb Q_p-vector spaces, one gets a perfect framework for doing p-adic functional analysis. The solid tensor product recovers the usual completed tensor product of Banach spaces, and even of Fréchet spaces.

5. Road to reality

Let us recall where we stand. The condensed formalism was largely motivated by questions in p-adic geometry, is based on profinite sets, and it can beautifully handle p-adic functional analysis. All of these are very much nonarchimedean. On the other hand, we would like to claim that condensed sets are better than topological spaces, even over \mathbb R. So what about the real numbers?

The bad news is that the real numbers are definitely not solid; their solidification is equal to 0. The issue is that \mathbb Z[S]^\blacksquare was the space of \mathbb Z-valued measures on S, without any archimedean bounds.

Working over \mathbb R, one is thus led to consider for profinite sets S=\varprojlim_i S_i the space \mathcal M(S) of (signed Radon) measures on S; as a condensed abelian group,

\mathcal M(S) = \mathrm{Hom}_{\mathbb R}(\mathrm{Cont}(S,\mathbb R),\mathbb R).

This is a union of compact Hausdorff spaces \mathcal M(S)_{\leq c} (equipped with the weak-\ast-topology). One can write \mathcal M(S)_{\leq c} = \varprojlim_i \mathbb R[S_i]_{\ell^1\leq c}, where \mathbb R[S_i] is the free vector space on the finite set S_i, and \ell^1\leq c refers to the part of \ell^1-norm at most c. Originally, we were hoping that the following definition was reasonable.

Definition 5.1. A condensed \mathbb R-vector space V is \mathcal M-complete if for any profinite set S, any map f: S\to V extends uniquely to a map \mathcal M(S)\to V.

Any complete locally convex vector space gives rise to an \mathcal M-complete V. Conversely, the condition of \mathcal M-completeness is closely related to local convexity. One can translate much of functional analysis to this setting. In particular, (up to also asking for quasiseparatedness), one can define a completed tensor product in this setting, and in some sense it recovers both the injective and the projective tensor product of Banach spaces. Unfortunately, it turns out that this category is not abelian, not stable under extensions, etc. The problem is the existence of the Ribe extension, that is an extension of Banach spaces that is not itself locally convex; it arises from the entropy functional. This forces us to include non-locally convex spaces in the formalism. Traditionally, this leads to the notion of p-Banach spaces for 0<p\leq 1, which includes \ell^p-spaces: a p-Banach space is a topological \mathbb R-vector space V that is complete for a p-norm ||\cdot||: V\to \mathbb R_{\geq 0}. The only axiom that is changed is the scaling axiom: ||av||= |a|^p ||v|| for a\in \mathbb R, v\in V. In particular, ||\cdot|| still satisfies the usual triangle inequality. [There is some unfortunate clash of notation here between p always denoting a prime number for me, and it being the standard letter to denote \ell^p-norms. Denoting primes by \ell doesn’t even help much!]

There is a theorem of Kalton that any extension of p-Banach spaces is p'-Banach for all p'<p. This leads to the following definition.

Definition 5.2. For a profinite set S=\varprojlim_i S_i, let the space of p-measures be \mathcal M_p(S) = \bigcup_{c>0} \mathcal M_p(S)_{\leq c}, where

\mathcal M_p(S)_{\leq c} = \varprojlim_i \mathbb R[S_i]_{\ell^p\leq c}.

Definition 5.3. A condensed \mathbb R-vector space V is p-liquid if for any profinite set S and any p'<p, any map f: S\to V extends uniquely to a map \mathcal M_{p'}(S)\to V.

After a lot of struggles, we believe we were able to prove the following analogue of our theorem on solid abelian groups.

Theorem 5.4. The class of p-liquid \mathbb R-vector spaces is stable under all limits and colimits and extensions (even in condensed abelian groups), and forms an abelian subcategory of condensed abelian groups. The inclusion into all condensed abelian groups admits a left adjoint “p-liquidification”, which is the unique colimit-preserving extension of \mathbb Z[S]\mapsto \mathcal M_{<p}(S) = \varinjlim_{p'<p} \mathcal M_{p'}(S). There is unique tensor product of p-liquid \mathbb R-vector spaces making p-liquidification symmetric monoidal. The category of p-liquid \mathbb R-vector spaces has compact projective generators, given by \mathcal M_{<p}(S) for S extremally disconnected.

On nuclear spaces, the p-liquid tensor product agrees with the usual completed tensor product, for any choice of 0<p\leq 1. (Arguably, this is the best one can hope for, as for say Banach spaces, there is not one but several natural choices for a tensor product as defined by Grothendieck. The p-liquid one is still different, and does not usually produce a Banach space, but in the nuclear case these subtleties disappear.)

This is the theorem that I would like to see formalized. As stated in the beginning, it comes down (by a not completely obvious, but very safe reduction) to Theorem 1.1, the vanishing of \mathrm{Ext}-groups between \mathcal M_{p'}(S) and p-Banach spaces.

Remark 5.5. The class of p-liquid \mathbb R-vector spaces depends on the choice of 0<p\leq 1; for p=1, the condition is the strongest (closest to local convexity) and gets weaker as p approaches 0. For applications, usually any choice of p is fine. It is quite mysterious that there is not only one theory over \mathbb R now, but a whole family of them! In fact, it turns out that there are similar theories also over \mathbb Q_\ell. In that case, they exist for all 0<p<\infty, and in some sense the limiting theory for p=\infty is the theory of solid \mathbb Q_\ell-vector spaces. One can thus think that for increasing p, the objects get more and more “viscous” until they become solid for p=\infty. (On the other hand, the limit for p=0 is in some sense the class of all condensed \mathbb Q_\ell-vector spaces, which one might refer to as “gaseous”.) Over \mathbb R, one needs to take p\leq 1 as otherwise the transition maps in the limit \varprojlim_i \mathbb R[S_i]_{\ell^p\leq c} are not well-defined, as they can increase the \ell^p-norm.

6. Sympathy for the devil

Why do I want a formalization?

— I want to make the strong claim that in the foundations of mathematics, one should replace topological spaces with condensed sets (except when they are meant to be topoi — topoi form a separate variant of topological spaces that is useful, and somewhat incomparable to condensed sets). This claim is only tenable if condensed sets can also serve their purpose within real functional analysis.

— with this theorem, the hope that the condensed formalism can be fruitfully applied to real functional analysis stands or falls. I think the theorem is of utmost foundational importance, so being 99.9% sure is not enough.

— if it stands, the theorem gives a powerful framework for real functional analysis, making it into an essentially algebraic theory. For example, in the Masterclass, Clausen sketched how to prove basic results on compact Riemann surfaces or general compact complex manifolds (finiteness of cohomology, Serre duality), and one can black box all the functional analysis into this theorem. Generally, whenever one is trying to mix real functional analysis with the formalism of derived categories, this would be a powerful black box. As it will be used as a black box, a mistake in this proof could remain uncaught.

— I spent much of 2019 obsessed with the proof of this theorem, almost getting crazy over it. In the end, we were able to get an argument pinned down on paper, but I think nobody else has dared to look at the details of this, and so I still have some small lingering doubts.

— as I explain below, the proof of the theorem has some very unexpected features. In particular, it is very much of arithmetic nature. It is the kind of argument that needs to be closely inspected.

— while I was very happy to see many study groups on condensed mathematics throughout the world, to my knowledge all of them have stopped short of this proof. (Yes, this proof is not much fun…)

— I have occasionally been able to be very persuasive even with wrong arguments. (Fun fact: In the selection exams for the international math olympiad, twice I got full points for a wrong solution. Later, I once had a full proof of the weight-monodromy conjecture that passed the judgment of some top mathematicians, but then it turned out to contain a fatal mistake.)

— the Lean community has already showed some interest in formalizing parts of condensed mathematics, so the theorem seems like a good goalpost.

— from what I hear, it sounds like the goal is not completely out of reach. (Besides some general topos theory and homological algebra (and, for one point, a bit of stable homotopy theory(!)), the argument mostly uses undergraduate mathematics.) If achieved, it would be a strong signal that a computer verification of current research in very abstract mathematics has become possible. I’ll certainly be excited to watch any progress.

— I think this may be my most important theorem to date. (It does not really have any applications so far, but I’m sure this will change.) Better be sure it’s correct…

7. Arithmetic reality

Finally, let me say a few words about the proof of Theorem 1.1; I tried to give a summary of the idea in Lectures 17 — 18 of the Copenhagen Masterclass, you can find them on YouTube (here and here). In brief, one is battling with two enemies:

a) The real numbers are not locally profinite. So in some philosophical sense there’s a mismatch between the objects \mathcal M_{p'}(S), V (real vector spaces) and the category \mathrm{Cond}(\mathrm{Ab}) (built from profinite sets). Our task is to show that the two can still be friends, even though \mathrm{Cond}(\mathrm{Ab}) wants to chop up all real vector spaces into locally profinite clouds of dust.

b) Putting bounds on the real numbers leads to subsets that are not stable under addition anymore. That’s hardly news to a normal human being — of course 1+1 is larger than 1 — but it is very bad news for a person that has spent all of their life in the p-adic world (where 1+1 is not larger than 1). When combining bounds with homological algebra, it means that within all our complexes we have to carefully keep track of norms. This quickly gets nasty. Did you ever try to chase norms in the snake lemma? In a spectral sequence?

Let me first say a few words about how we battled a), which I believe is a most profound point. In some sense we come back to decimal expansions of the real numbers. More precisely, let 0<r<1 and consider the ring \mathbb Z((T))_{>r} of those Laurent series \sum_{n\gg -\infty} a_n T^n with integer coefficients a_n\in \mathbb Z that converge on a complex punctured disc \{0<|T|< r'\} for some r' > r. Assuming r\geq \tfrac 1{10}, we get a surjection

\mathbb Z((T))_{>r}\to \mathbb R: \sum a_n T^n\mapsto \sum \frac{a_n}{10^n}

and in fact \mathbb R\cong \mathbb Z((T))_{>r}/(10T-1). (Needless to say, the choice of \tfrac 1{10} here is completely arbitrary; you can also use \tfrac 1{\pi}. Except that it’s less obvious that the kernel is a principal ideal. A theorem of Harbater states that \mathbb Z((T))_{>r} is a principal ideal domain!) Now \mathbb Z((T))_{>r} is naturally a countable union of profinite subsets — essentially, putting bounds on each a_n is cutting them down to finite sets. This means that problem a) disappears if we work with \mathbb Z((T))_{>r}. The story of liquid modules also works over \mathbb Z((T))_{>r}, and the whole proof happens here. The variant of Theorem 1.1 for \mathbb Z((T))_{>r} is Theorem 9.1 in www.math.uni-bonn.de/people/scholze/Analytic.pdf, and its proof is given in Lecture 9.

Let me not say much about the proof in Lecture 9 — you can look for yourself — but to note that as indicated in b), it is a very delicate argument fighting with estimates against homological algebra. There is a hierarchy of implicit constants that have to be chosen, and you have to do it in exactly the right order. In the end, one formulates Theorem 9.5 which can be proved by induction; it is a statement of the form \forall \exists \forall \exists \forall \exists, and there’s no messing around with the order of the quantifiers. It may well be the most logically involved statement I have ever proved. (On the other hand, if I want to claim that the theorem on liquid vector spaces makes it possible to black box functional analysis, hard estimates have to be somewhere. Better be sure the estimates actually work…!)

Note that I’m saying that to prove a theorem about real vector spaces (in fact, to set up real functional analysis) we have to work with the arithmetic ring \mathbb Z((T))_{>r}! As a number theorist, this seems like a fair state of affairs to me, but I wonder whether I let my prejudices guide me. So here is a challenge for human mathematicians:

Question 7.1. Can one prove Theorem 1.1 directly over the reals?

Remark 7.2. In Theorem 1.1 the \mathrm{Ext}-groups are computed in \mathrm{Cond}(\mathrm{Ab}). One could also compute them in condensed \mathbb R-vector spaces. The result of Theorem 1.1 is slightly stronger (it implies the vanishing of \mathrm{Ext}-groups in condensed \mathbb R-vector spaces via a mostly formal argument), and implies that p-liquid \mathbb R-vector spaces embed fully faithfully into condensed abelian groups, i.e. the \mathbb R-linear structure is necessarily unique. One could also consider the variant of Theorem 1.1 taking \mathrm{Ext}-groups in condensed \mathbb R-vector spaces; I’d already be happy to see a direct proof of this over the reals. However, I actually believe that this is not any easier.

Finally, here is a somewhat vague question.

Question 7.3. Is the proof of Theorem 1.1 in any way related to some known arguments in functional analysis?

For example, is it related to “multi-scale” arguments? The different scales here being the powers 1,T,T^2,\ldots of the fixed element T (taken as \tfrac 1{10} above).

Peter Scholze, 5th December 2020

Posted in number theory | Tagged , | 11 Comments

Thoughts on the Pythagorean theorem

I’m sure I’m saying nothing new here. I’m just explaining another example of how thinking about how to formalise things has taught me stuff about what mathematics is.

What is the Pythagorean theorem?

The Pythagorean theorem, a.k.a. Pythagoras’ theorem, comes in two parts. Firstly there is the theorem statement, which says that in a right angled triangle (like the dark blue one below), the square of the hypotenuse equals the sum of the squares of the other two sides. And then there is the proof, which originally is either due to Pythagoras or not depending on who you believe. Let’s start with the statement.

What does the statement actually mean?

Let’s take a look at the picture of the squares on the hypotenuse and the other two sides.

Some squares

The dark blue triangle is right-angled. The claim is that the square C is equal to the sums of the squares A and B. On the face of it, this is nonsense. If you take squares A and B, you have a picture containing two squares; but square C is just one square. How can one square equal two squares? But of course the claim is not that the pictures are equal, the claim is that the areas are equal.

But what is area? To find out, let’s go back to Euclid.

Euclid’s take on the theorem

Euclid’s Elements contains a proof of the Pythagorean theorem, right at the end of book 1. The proof involves drawing some triangles and arguing that various things are “equal”. This approach is valid because Euclid has explicitly stated as his Common Notion 1 that equality, whatever it is, is transitive.

One can chase this concept of equality back to Proposition 35, which claims that two parallelograms with the same base and the same height are “equal”. In fact this seems to be the first time that the word “equal” is used to mean “having equal area” in the Elements. Halving the parallelograms we deduce the more familiar Proposition 37, that two triangles with the same base and the same height are also “equal”. So what goes into the proof of Proposition 35, that two parallelograms with the same base and height are “equal” in the sense of having equal area?

The key ideas in the proof are Euclid’s second and third common notions: that “equals added to equals are equal”, and “equals subtracted from equals are equal”. In high-level terms, these common notions imply that equality is not just an equivalence relation, but a congruence relation. But let’s see how Euclid uses these notions in his proofs.

Equals added to equals are equal.

The two orange regions have equal areas, because they are both “equals added to equals”: the small triangles and the big triangles are both congruent.

Equals subtracted from equals are equal

Here, the two larger triangles are congruent, so the two orange areas are equal, because they are equals (the dark blue triangle) subtracted from equals (the larger triangles). For Euclid, the equality of the areas of the two orange regions in these examples is axiomatic. Take a look at the proof of Proposition 35 to see how these facts are used to prove that two parallelograms with the same base and height are “equal”.

Area in Euclid book 1

So, what Euclid does mean by the “area” of a shape? Well this is the funny thing — he never says, throughout book 1! He only says what it means for two shapes to have “equal area”!

This is exactly what an equivalence relation is. An equivalence relation on a type is a concept of equality on terms of that type. It can be thought of as focussing on a particular attribute of the terms you are considering (for example the area of a shape, or the value of an integer modulo 10) and saying that two terms are equivalent if the values of those attributes are equal. Euclid is putting an equivalence relation on shapes. His definition of the relation involves cutting and pasting in geometry, and at the end of the day the proof of the Pythagorean theorem in Euclid is essentially a jigsaw puzzle. Here is an even simpler jigsaw puzzle proof:

A proof of the Pythagorean theorem

Euclid and type theory

When Euclid did mathematics, he was doing type theory. For Euclid, points were terms of the Point type, lines were terms of the Line type, and planes were terms of the Plane type. Euclid wrote down the axioms his types satisfied (for example there was a unique line between any two distinct points) and proceeded to work from there. He has a definition of a 2-dimensional shape, and assuming that a plane exists, his shapes exist too. He defined an equivalence relation on 2D shapes, and proved that the 2D shape corresponding to the square on the hypotenuse was related to the 2D shape corresponding to the union of the squares on the other two sides, using properties of this relation which he has earlier axiomatised.

The proof of Pythagoras’ theorem in Euclid is what is known as a synthetic proof. We assume that a Euclidean Plane exists and satisfies a list of axioms, which Euclid attempted to write down and which most of us never even contemplate. We then formulate the theorem, and prove it using the axioms.

Numbers from geometry?

Note that Euclid is in some kind of a position to define real numbers at this point, or at least the positive real numbers. For example, Euclid knows what it means for two line segments to have equal length — it means that you can translate and rotate one line until it coincides with the other. He could hence define the positive reals to be equivalence classes of line segments, under the equivalence relation of being the same length. However one runs into problems when it comes to completeness, something Euclid’s axioms were not really designed for.

Geometry from numbers: Enter Descartes.

Descartes suggested doing things the other way around, using numbers to do geometry rather than using geometry to define numbers. Descartes observed that one could label a point in the plane with an x and y coordinate. This changed everything. All of a sudden “the plane” (a term whose existence is never talked about in Euclid) becomes modelled by \mathbb{R}^2. Euclid’s definitions, common notions, and axioms now need to be revisited. We need to check that this more refined model satisfies the rules of Euclid’s game (a bit like checking that Einstein’s theory turns into Newton’s in the limit).

We model a point as an ordered pair of real numbers, we can define lines as solutions to linear equations because the reals are a field so we have that language available. We can prove the parallel postulate no problem. The theory of integration gives us a way to measure lines (length), angles (measure), curves (length) and 2-dimensional shapes (area), using the natural (Euclidean) Riemannian metric on the plane. We can now completely rephrase Pythagoras’ theorem: it is now an equality of numbers. We can re-interpret the “jigsaw puzzle” proof in Euclid as a consequence of finite additivity of Lebesgue measure on the plane. We can also give a completely new proof, using the theorem that the distance from (a,b) to (c,d) is \sqrt{(a-c)^2+(b-d)^2}, as one can check using a line integral (modulo the theorem that the shortest distance between two points is a straight line, which needs proving in this context).

I saw measure theory developed as an undergraduate, and probably a few years ago I would have argued that this is now the “proper” proof — but now I realise that this proof still has some synthetic element to it: namely, the real numbers. We have a non-synthetic plane, but it is made from synthetic numbers.

What are the real numbers?

I was told as an undergraduate that it was an axiom that the reals existed, and that they were a complete ordered field. All of the analysis I learnt as an undergraduate was built upon this assumption, or rather, this structure (addition, multiplication, inequality) and these axioms (associativity and commutativity of addition and multiplication etc) on this type (the reals). In some sense it is no different to Euclid, who also had types (e.g. points), structures (e.g. triangles) and axioms (e.g. the common notions, or the parallel postulate), but who was modelling the Pythagorean theorem in a different, arguably more primitive, way.

Enter the analysts, bearing sequences

Descartes solved the problem of how to represent points in a plane with real numbers, but for Descartes, the reals were a type. Many years later, Cauchy and Dedekind gave two ways to represent the reals using simpler objects. Indeed, Cauchy sequences and Dedekind cuts are (different) ways of building the reals from the rationals. Similarly, the rationals can be built from the integers, the integers from the naturals, and the naturals are…well, they’re just some synthetic thing satisfying Peano’s axioms, right? At this point one could argue that Pythagoras’ theorem has become a statement about sequences of pairs of natural numbers (or however else one is modelling positive rationals), and the natural numbers have no definition — they are synthetic. But we can go further.

Enter the logicians, bearing sets.

One thing that ZFC set theory (the usual set-theoretic foundations of 20th century mathematics) has going for it, is that it gives a very unambiguous answer to the question: “Why are the natural numbers a set?”. The answer is “It’s one of the axioms”. One thing against it is that in ZFC set theory, everything is a set, even stuff which you don’t want to be a set. For example, real numbers (like the area of the square on the hypotenuse) are now sets, and the Pythagorean Theorem is now a theorem about the equality of two sets, although we don’t know exactly what the sets are, because we can never be sure whether the real numbers in the Platonic universe (or whichever universe we’re operating in) use Cauchy sequences or Dedekind cuts. [Pro tip: if your universe offers good quotienting facilities, use Cauchy sequences: they’re cleaner.]

The fact that we don’t know whether the reals being used in Pythagoras’ theorem are Cauchy sequences or Dedekind cuts is an indication that we have unfolded things too far, as far as Pythagoras’ theorem goes. Most mathematicians regard the real numbers as a type. A real number is not a set — Gauss or Riemann could certainly have informed you of that.

It is interesting that we can keep unfolding this way — but we can never get to the bottom of things. We can’t define everything, we always have to start somewhere — a synthetic beginning. Euclid started with points, lines and planes. Descartes started with the reals. Cauchy and Dedekind observed that you could start with the naturals. Set theorists start with sets. There is no definition of a set in ZFC set theory — a set is just a term in a model of ZFC set theory. The model can be thought of as the type, and its elements (or whatever you want to call them — they are not elements of a set in the internal logic of the model) as the terms. The behaviour of sets is developed using the axioms.

Pythagoras’ theorem : refinements of equality

So what is Pythagoras’ theorem? Is it that two shapes are “equal”, two numbers are equal, or two sets are equal? In some sense, it’s all of these things. In some sense this story reminds me of chemistry at school. The joke in the chemistry class was that they were always telling you lies — models of atoms which were very naive, and then more careful models which were more accurate, culminating (for me) with an undergraduate quantum mechanics course which told me that an electron was actually some kind of a complex-valued function. It feels very similar here. Euclid had a perfectly adequate notion of geometry, he axiomatised what he needed and argued from there. Later on we found a way to model the plane from more fundamental objects such as the real numbers. After that we found a way of modelling the real numbers using the natural numbers, and in some sense this is where we stop; using either type theory or set theory, the natural numbers have no definition — their existence is asserted, and we use them via their API (addition, multiplication, the principle of induction). Maybe someone will come along in future with a theory of mathematical quarks, which can be used together with the quark axioms to build the natural numbers in a natural way, and then our understanding of what Pythagoras’ theorem is might change again.

Posted in General, Olympiad stuff, Type theory, undergrad maths | Tagged , , | 8 Comments

Two types of universe for two types of mathematician

Thank you Johan for pointing out to me that the mathlib stats page had got really good! But one thing that made me laugh is that somehow on their stats for commits

I see I have done just enough to get a mention! I was quite surprised. Let’s look at those mathlib committers, most of them far more prolific than I am. Who are they? One surprising thing about them: although we have a common goal, of formalising mathematics in Lean, we are a really mixed bag of people, some of whom you might expect to have no research interests in common with me at all.

Yury Kudryashov is a post-doc in the maths department at Toronto. I’ve never met him, because of visa issues. The dark blue line shows his meteoric rise.

Scott Morrison does TQFT‘s at ANU and was one of the founders of MathOverflow.

Mario Carneiro is a PhD student of Jeremy Avigad, in the philosophy department at CMU. He taught me Lean.

Johannes Hoelzl is a computer scientist who works for Apple on formally verifying their products. He wrote the filter library for mathlib, which was essential for Patrick Massot’s work on topology, in turn crucial for the construction of perfectoid spaces in Lean.

Chris Hughes is an undergraduate at Imperial College London. He’s going to an MSc project with me next year, on a one-relator group tactic. Chris has taught me so much about what mathematics really is. Chris learnt Lean when he was a 1st year undergraduate and began to interpret all of his lectures from a type theoretic point of view, which in turn has led to what I think is an extraordinary understanding of “what is actually going on” in a pure mathematics undergraduate degree (in particular exactly how informal some of it is).

Rob Lewis has a PhD in logic and is now a post-doc in the computer science department at VU Amsterdam. He wrote linarith (“Loves the jobs you hate”). I ran into serious technical problems with tactics malfunctioning when making the natural number game, and Rob wrote the code which enabled me to hack tactic mode and fix stuff, saving the project. He set up the mathlib documentation project. This is the community’s effort to explain Lean’s interface to mathematicians. Before his academic career, he was a teacher.

Sébastien Gouëzel is a professor of mathematics in Nantes, who won the Brin Prize in 2019. He was the driving force behind manifolds in Lean. We now have C^n and C^\infty manifolds, over general complete fields such as the complexes, reals or p-adics, and in the real case you can also have corners. It was harder than you might initially think.

Johan Commelin is a post-doc in arithmetic geometry in Freiburg, working with Annette Huber. He has a wife and three small kids, and last week uploaded not one but two papers to ArXiv about o-minimality. He developed the API for a new Lean type called a “group with zero”, for example a field or a division ring but forget about the addition. Talking to him about stuff like this made me understand that the definition of a UFD never uses addition, and is hence a special case of a more general definition. History sometimes needs rewriting, but it’s OK — we hide the details from you. A UFD is a monoid with zero, by the way (such that the underlying monoid is a product of a group and a free abelian monoid — that’s the full definition in fact, although there are two ways of interpreting “is”).

Simon Hudon is a computer scientist and an excellent functional programmer. His definition of a monad is different to mine, and he knows what all the weird >-> symbols mean. He wrote a bunch of stuff in core Lean and was around since the start. Mostly meta.

Patrick Massot has translated most of Bourbaki’s Topologie générale into Lean, and can now confirm that they were (almost always) right. He’s a topologist in Orsay (except now we say Saclay) and teaches his undergraduates using Lean. You should try his introductory analysis problem sheets and other stuff in the Lean Tutorials repo. If you have Lean installed the modern way, you can just type leanproject get tutorials and then open the project in VS Code. Patrick also wrote the leanproject tool, and it has solved the constant problems beginners on Windows machines were having when trying to get a fully compiled mathlib running locally without waiting an hour or more for it to compile.

Reid Barton won four gold medals at the IMO.

Kenny Lau is an undergraduate at Imperial College, whose first year project on formalising the statements of local class field theory for abelian algebraic groups in Lean unsurprisingly won the “best pure maths project of the year” prize. I needed localisation of rings when doing schemes and he bashed out the entire theory like it was easy.

Gabriel Ebner is our bald-headed fixer. If the mathlib people can’t get something to work and they blame Lean, he sometimes has words with it directly. He has driven Lean from 3.4.2 to 3.17.1 in the space of, what — six months? It’s better and it’s faster. To anyone still on Lean 3.4.2 — switch to the Lean Prover community version of Lean. It’s so much easier now. Anyone searching for Lean — beware of Microsoft’s old pages. Search for Leanprover community or mathlib.

And then me at the bottom. A professor of maths with a general malaise about the state of number theory, who three years ago this week tried Lean for the first time and got hooked.

We’re a really motley crew, talking to each other about different ways of thinking about common areas of interest. There are many other people who have committed to mathlib too — e.g. people I’ve met on discord whose names I don’t even know but who got interested in seeing if they could formalising a random thing in Lean, and it has turned out that the answer is “yes, and in doing so you can make mathlib better”. People like Amelia Livingston, another Imperial undergrad, who felt that Kenny Lau’s theory of localisation of rings should be generalised to localisation of monoids and rewrote the whole lot. She was right. People like Shing Tak Lam, who was practicing for his STEP exams (hard UK school maths exams for 18 year olds) by formalising the questions, and his project to formalise a question about polynomials over the reals from STEP ended up with him developing the entire mathlib theory of partial derivatives for multivariable polynomial rings. He asked what a ring R was at some point, I said “just pretend it says the real numbers \mathbb{R}“.

The two cultures of mathematics

Together we are investigating the boundary between the specification of mathematics, and the implementation of mathematics, for both definitions and for proofs. I learnt from Sébastien at LFTCM2020 that we can now prove that smooth morphism of smooth manifolds induces a morphism on tangent spaces in a manner functorial in the manifold, but it took a bunch of people to turn the theorem statement from maths into code, and then a bunch of people to translate the proof from maths into code (and everyone was standing on the shoulders of giants, in some sense).

There are two universes involved in what we are making. There’s the part that’s in Type (where the creative ideas such as perfectoid spaces and the Cauchy reals are digitised) and Prop (the part within the begin/end blocks where the computer games are). In Tim Gowers’ essay on the two cultures of mathematics, he talks about the concepts of theory-building and problem-solving. I have always believed that there was something very true at the core of this, but now I am beginning to understand it much better. Type is where the theory-building is going on, and Prop is where the problem-solving is occurring.

But these two parts of mathematics are inextricably linked. Patrick, Johan and I formalised the definition of a perfectoid space, but on the way there we had to prove a whole load of theorems — even before we could write down the definition. There are two topologies on an affinoid perfectoid space. One generated by basic open subsets, and one generated by rational open subsets. In the future, when defining sheaves on a perfectoid space, sometimes we will use one basis, and sometimes the other. The proof that they are the same is 20 pages of very tricky algebra involving valuations on topological rings, and we have not done it yet — we just picked one of the definitions when defining a perfectoid space and moved on. At some point in the development, we will need to use the other definition, and then we’ll have to prove the theorem. But in fact this sort of thing already happened to us many many times before, and then we did have to prove the theorems. Sometimes things get tough. Mathematicians so good at instantly switching between the various “obviously equivalent” ways that a mathematician looks at a complicated algebraic object (“It’s an equivalence relation! Now it’s a partition! Now it’s an equivalence relation again! Let your mental model jump freely to the point of view which makes what I’m saying in this particular paragraph obvious!”, or “Matrices are obviously associative under multiplication because functions are associative under composition.” (dutiful student realises later that this proof assumes the action of matrices on \mathbb{R}^n is faithful, and did you see that dependent type there?). Some of the proofs we’re writing are simply proofs that humans are behaving correctly when using mathematical objects.

But writing some proofs in mathlib is just fun. In fact there are now a growing number of proofs in Lean written by mathematicians who are coming to mathlib and finding that the interface to the thing they wanted (e.g. a topological space, or an equivalence relation, or a ring) is there and usable. It is hence possible for them to state and prove (or reprove, if mathlib did it already) the results they are interested in, using Lean’s tactic framework. We don’t know how far this can go, and we don’t know whether type-theoretic issues will cause problems further down the line (e.g with etale cohomology) but at this rate, it looks like we’re going to find out.

Currently on display at Royal Academy of Art (proud Dad :-))

Posted in Uncategorized | Tagged , , | 7 Comments

Lean for the Curious Mathematician 2020

I have just spent an exhilarating week in the company of a whole bunch of mathematicians, a fair few of whom are serious professors from prestigious universities, all of us learning how to use Lean to do mathematics ranging from basic logic to category theory and differential geometry.

Why? Because I’ve been at Lean for the Curious Mathematician 2020, an online meeting organised by Johan Commelin and Patrick Massot. It was like no other conference I’ve ever been to. On a typical day there were three “sessions”. A session was typically two hours long, and consisted of a mathematician who knows how to use Lean giving a 15 to 30 minute Zoom talk introducing a part of Lean’s maths library mathlib, and then the audience splitting up into breakout rooms of between 5 and 10 people, with one expert per room, and working on the exercises which the speaker had prepared. People could ask questions to each other or to the experts whenever they were stuck. A few years ago I would never have guessed that in 2020 I would have a Cambridge professor of number theory asking me for help in proving that the standard basis of \mathbb{R}^n was a basis — but when you have just started learning how to do mathematics in a new way, these are natural questions to ask. The answer is not hard — but you have to learn how to do it.

What is even better is that everything was recorded, so if you missed LftCM 2020, you can still join in. All the talks are up on a LftCM 2020 playlist on the Leanprover community YouTube site, and all of the exercises are available at the LftCM 2020 GitHub repository. If you have installed leanproject by following the instructions on the Leanprover Community website then all you have to do is to type

leanproject get lftcm2020

and then, using VS Code, open the lftcm2020 directory which just appeared on your computer. You’ll see all the exercises, and all the solutions. Furthermore, you have easy access to the very same experts who wrote the exercises because they all hang out on the Lean Zulip chat, and the #new members stream is dedicated to beginner questions. I will be live streaming my way through some of the exercises over the next few weeks on the Xena Project Discord server; I monitor the chat on most days, but aim to spend every Tuesday and Thursday on the server throughout July and August. If you’re a undergraduate mathematician, even if you’re a complete Lean beginner, you’re very welcome to join us. A good time to join us is Thursday evening UK time; there are often a whole bunch of us there then.

What did we learn?

One thing we learnt was that technology like Zoom works very well for a workshop of this nature. For a beginner, Lean’s error messages are very intimidating. But if a beginner shares their screen on Zoom so that an expert can read the error, then the expert often immediately knows what is wrong.

Something else we learnt was that a very good way to teach mathematicians how to use Lean is to give them a whole ton of exercises involving doing mathematics which they already understand conceptually, and asking them to attempt them in Lean. In some sense a large source of exercises corresponding to interesting mathematics is something which, before now, had been lacking in the Lean ecosystem. There is the very wonderful Theorem Proving In Lean, the book I read when learning Lean, but it does not talk about Lean’s mathematics library at all and focuses more on things like Lean’s underlying type system. I have always thought of it as more of a book for computer scientists. Jeremy Avigad, one of the authors of Theorem Proving In Lean, is currently working with Patrick Massot, Rob Lewis and myself on a new book, Mathematics In Lean (a book which can be read and run entirely within VS Code; read the instructions in chapter 1 on how to interact with it in this way). But this book is not yet finished. The book does not come with videos — Jeremy is writing words to explain how things work. The LftCM GitHub repository goes right to the point: every file, after a brief introduction, goes straight onto exercises, the vast majority of which can be solved in tactic mode, so it is a very natural continuation for anyone who has played the natural number game. If you don’t understand something in the file, you can watch the corresponding video and see if this helps. I will be very interested in seeing whether mathematicians find the repository a useful asset and I am almost certain that they will.

We learnt that Scott Morrison’s tireless efforts to teach both Lean and the Leanprover community category theory have now gone as far as giving us a whole host of accessible exercises, together with hints!

We learnt that smart people can pick up Lean very quickly. It was very interesting watching Sophie Morel learning to use the system, and seeing her assimilating more and more concepts as the week went on; I was especially interested in watching her learn a new trick to prove a theorem and then, instead of immediately moving on, playing around with her proof to understand the trick better. Although it sounds a bit ridiculous to compare a mathematician of her calibre to Kenny Lau (an Imperial undergraduate), watching her learning Lean this week reminded me very much of watching Kenny learning Lean back in 2017, when he was just a first year UG and within a few weeks of starting using Lean was formalising the theory of localisation of commutative rings. Kenny has gone on to contribute just under 20,000 lines of code to mathlib, including a lot of MSc level commutative ring theory. Sophie, many thanks for coming, and I hope you like what you saw. And Kenny, and Chris and Amelia, it was great to chat to you all this week and I am very much looking forward to working with all of you this forthcoming academic year on your Lean MSci projects.

One important thing, for me at least, that came out of the week, was that we got to see exactly the kind of problems which beginners run into. Lean is very much under development. New versions come out every few weeks of the community version of Lean, and the maths library is updated several times a day. There are many open issues on the Lean and mathlib github repositories, which are opened, dealt with, and closed. But as an experienced user it was interesting to see which ones were tripping people up. We know library_search sometimes fails because of the apply bug, and we know how to work around this. But this would completely throw beginners off. We know that nat.pow is not definitionally equal to monoid.pow and we know how to work around this. I would say that it is regarded as a low priority issue. But this week we saw people being totally confused by it. It is a reminder that we can still make Lean 3 a better system. I am not convinced by the “let’s just wait until Lean 4” argument. Johannes Hoelzl, a very experienced formaliser, told me that in his opinion porting mathlib from Lean 3 to Lean 4 whilst simultaneously refactoring it would be a bad idea. Despite the fact that I want to set up a theory of Picard groups, Ext and Tor, I should remember that the basics are still not 100% right and I can play my part in trying to make them better (beginning with that big subgroup refactoring).

I personally learnt that mathematicians do like Lean games. A couple of people asked me if it would be possible to make a filter game! I think this idea has definite potential! Until then, there’s always the max minigame, a game you can play in your browser like the natural number game and something which will eventually become part of the real number game.

I have learnt a lot this week. Watching other people using Lean is something I find very instructive, both as an educator and as a Lean user. Thank you to to all who attended, and to those who led sessions and wrote example sheets, and I very much hope to see many of you again at LftCM 2021.

Sketchy Dude
Posted in Learning Lean | Tagged , | Leave a comment

Division by zero in type theory: a FAQ

Hey! I heard that Lean thinks 1/0 = 0. Is that true?

Yes. So do Coq and Agda and many other theorem provers.

Doesn’t that lead to contradictions?

No. It just means that Lean’s / symbol doesn’t mean mathematical division. Let \mathbb{R} denote the real numbers. Let’s define a function f:\mathbb{R}^2\to\mathbb{R} by f(x,y)=x/y if y\not=0 and f(x,0)=0. Does making that definition give us a contradiction in mathematics? No, of course not! It’s just a definition. Lean uses the symbol / to mean f. As does Coq, Agda etc. Lean calls it real.div by the way, not f.

But doesn’t that lead to confusion?

It certainly seems to lead to confusion on Twitter. But it doesn’t lead to confusion when doing mathematics in a theorem prover. Mathematicians don’t divide by 0 and hence in practice they never notice the difference between real.div and mathematical division (for which 1/0 is undefined). Indeed, if a mathematician is asking what Lean thinks 1/0 is, one might ask the mathematician why they are even asking, because as we all know, dividing by 0 is not allowed in mathematics, and hence this cannot be relevant to their work. In fact knowing real.div is the same as knowing mathematical division; any theorem about one translates into a theorem about the other, so having real.div is equivalent to having mathematical division.

This convention is stupid though!

It gets worse. There’s a subtraction nat.sub defined on the natural numbers \{0,1,2,\ldots\}, with notation x - y, and it eats two natural numbers and spits out another natural number. If x and y are terms of type and x < y, then x - y will be 0. There’s a function called real.sqrt which takes as input a real number, and outputs a real number. If you give it 2, it outputs \sqrt{2}. I don’t know what happens if you give it the input -1, beyond the fact that it is guaranteed to output a real number. Maybe it’s 0. Maybe it’s 1. Maybe it’s 37. I don’t care. I am a mathematician, and if I want to take the square root of a negative real number, I won’t use real.sqrt because I don’t want an answer in the reals, and the type of real.sqrt is ℝ → ℝ.

Why can’t you just do it the sensible way like mathematicians do?

Great question! I tried this in 2017! Turns out it’s really inconvenient in a theorem prover!

Here’s how I learnt Lean. I came at it as a “normal mathematician”, who was thinking about integrating Lean into their undergraduate introduction to proof course. I had no prior experience with theorem provers, and no formal background in programming. As a feasibility study, I tried to use Lean to do all the problem sheets which I was planning on giving the undergraduates. This was back in 2017 when Lean’s maths library was much smaller, and real.sqrt did not yet exist. However the basic theory of sups and infs had been formalised, so I defined real.sqrt x, for x non-negative, to be Sup\{y\in\mathbb{R}\,|\,y^2\leq x\}, and proved the basic theorems that one would want in an interface for a square root function, such as \sqrt{ab}=\sqrt{a}\sqrt{b} and \sqrt{a^2}=a and \sqrt{a^2b}=a\sqrt{b} and so on (here a,b are non-negative reals, the only reals which my function would accept).

I then set out to prove \sqrt{2}+\sqrt{3}<\sqrt{10}, a question on a problem sheet from my course. The students are told not to use a calculator, and asked to find a proof which only uses algebraic manipulations, i.e. the interface for real.sqrt. Of course, the way I had set things up, every time I used the \sqrt{\phantom{2}} symbol I had to supply a proof that what I was taking the square root of was non-negative. Every time the symbol occurred in my proof. Even if I had proved 2 > 0 on the previous line, I had to prove it again on this line, because this line also had a \sqrt{2} in. Of course the proof is just by norm_num, but that was 10 characters which I soon got sick of typing.

I then moaned about this on the Lean chat, was mocked for my silly mathematician conventions, and shown the idiomatic Lean way to do it. The idiomatic way to do it is to allow garbage inputs like negative numbers into your square root function, and return garbage outputs. It is in the theorems where one puts the non-negativity hypotheses. For example, the statement of the theorem that \sqrt{ab}=\sqrt{a}\sqrt{b} has the hypotheses that a,b\geq 0. Note that it does not also have the hypothesis that ab\geq0, as one can deduce this within the proof and not bother the user with it. This is in contrast to the mathematicians’ approach, where the proof that ab\geq0 would also need to be supplied because it is in some sense part of the \sqrt{\phantom{2}} notation.

So you’re saying this crazy way is actually better?

No, not really. I’m saying that it is (a) mathematically equivalent to what we mathematicians currently do and (b) simply more convenient when formalising mathematics in dependent type theory.

What actually is a field anyway? For a mathematician, a field is a set F equipped with 0,1,a+b,-a,a\times b,a^{-1} where the inversion function a^{-1} is only defined for non-zero a. The non-zero elements of a field form a group, so we have axioms such as x\times x^{-1}=1 for x\not=0 (and this doesn’t even make sense for x=0). Let’s say we encountered an alien species, who had also discovered fields, but their set-up involved a function \iota :F\to F instead of our x^{-1}. Their \iota was defined, using our notation, by \iota(x)=x^{-1} for x\not=0, and \iota(0)=0. Their axioms are of course just the same as ours, for example they have x\times \iota(x)=1 for x\not=0. They have an extra axiom \iota(0)=0, but this is no big deal. It’s swings and roundabouts — they define a/b:=a\times\iota(b) and their theorem (a+b)/c=a/c+b/c doesn’t require c\not=0, whereas ours does. They are simply using slightly different notation to express the same idea. Their \iota is discontinuous. Ours is not defined everywhere. But there is a canonical isomorphism of categories between our category of fields and theirs. There is no difference mathematically between the two set-ups.

Lean uses the alien species convention. The aliens’ \iota is Lean’s field.inv , and Lean’s field.div x y is defined to be field.mul (x, field.inv y).

OK so I can see that it can be made to work. Why do I still feel a bit uncomfortable about all this?

It’s probably for the following reason. You are imagining that a computer proof checker will be checking your work, and in particular checking to see if you ever divided by zero, and if you did then you expect it to throw an error saying that your proof is invalid. What you need to internalise is that Lean is just using that function f above, defined by f(x,y)=x/y for y\not=0 and f(x,0)=0. In particular you cannot prove false things by applying f to an input of the form (x,0), because the way to get a contradiction by dividing by zero and then continuing will involve invoking theorems which are true for mathematical division but which are not true for f. For example perhaps a mathematician would say a/a=1 is true for all a, with the implicit assumption that a\not=0 and that this can be inferred from the notation. Lean’s theorem that real.div a a = 1 is only proved under the assumption that a\not=0, so the theorem cannot be invoked if a=0. In other words, the problem simply shows up at a different point in the argument. Lean won’t accept your proof of 1=2 which sneakily divides by 0 on line 8, but the failure will occur at a different point in the argument. The failure will still however be the assertion that you have a denominator which you have not proved is nonzero. It will simply not occur at the point when you do the division, it will occur at the point where you invoke the theorem which is not true for real.div.

Thanks to Jim Propp and Alex Kontorovich for bringing this up on Twitter. I hope that this clarifies things.

Posted in Learning Lean, M1F, M40001, Type theory, undergrad maths | Tagged , | 12 Comments

Equality, specifications, and implementations

Equality is such a dull topic of conversation to mathematicians. Equality is completely intrinsic to mathematics, it behaves very well, and if you asked a mathematician to prove that equality of real numbers is an equivalence relation then they would probably struggle to say anything of content; it’s just obviously true. Euclid included reflexivity and transitivity of equality as two of his “common notions”, and symmetry was equally clear from his language — he talks about two things being “equal to each other” rather than distinguishing between a = b and b = a.

One thing that has helped me start to understand why computer scientists make a fuss about equality is the observation that if you follow Peano’s development of the natural numbers (as I do in the natural number game) then you come to the following realisation: if you define addition by recursion in the second variable (i.e. a + 0 := a and a + succ(n) := succ(a + n) ) then life temporarily becomes asymmetric. The fact that a + 0 = a is an axiom. However the fact that 0 + a = a needs to be proved by induction. Now induction is also an axiom, so a mathematician would just say that despite the fact that the proofs have different lengths, 0 + a = a and a + 0 = a are both theorems, so the fact that digging down to the foundations shows that the proofs are of a different nature is really of no relevance.

To a computer scientist however, there is a difference in these proofs. This difference seems to be of no relevance to mathematics. But the difference is that, if you set the natural numbers up in this way, then a + 0 = a is true by definition, and 0 + a = a is not. Indeed, in Lean’s type theory (and probably in many others) there are three types of equality that you have to be aware of:

  1. Propositional equality;
  2. Definitional equality;
  3. Syntactic equality.

Let’s start by getting one thing straight: to a mathematician, all of these things are just called equality. In fact, even more is true: definitional equality is not a mathematical concept. All of these kinds of equalities are easy to explain, so let me go through them now.

Propositional equality: a = b is a propositional equality if you can prove it.

Definitional equality: a = b is a definitional equality if it’s true by definition.

Syntactic equality: a = b is a syntactic equality if a and b are literally the same string of characters.

For example, let’s go back to Peano’s definition of the natural numbers and the conventions for addition above. Then a + 0 = a is a definitional equality but not a syntactic one, 0 + a = a is a propositional equality but not a definitional one, and a = a is a syntactic equality. To show that 0 + a = a is not definitionally true, you have to ask yourself what the definition of 0 + a is; and because we don’t know whether a = 0 or not, 0 + a cannot be definitionally simplified any further (the definition of x + y depends on whether y = 0 or not).

[Technical note: syntactic equality does allow for renaming of bound variables, so \{a \in \mathbb{R}\, |\, a^2 = 2\} is syntactically equal to \{b \in \mathbb{R}\, |\, b^2=2\}. If you understand that idea that notation is syntax sugar then you’ll probably know that syntactic equality can see through notation too, which means to say that add(a,0) = a + 0 is also a syntactic equality. But that’s it.]

Of course 2 + 2 = 4 is not a syntactic equality; removing notation and writing S for “successor”, and working under the assumption that 2 is syntax sugar for S(S(0)) and 4 for S(S(S(S(0)))), we see that the left hand side is syntactically add(S(S(0),S(S(0)) and the right hand side is S(S(S(S(0)))). However it is a definitional equality! add(x,S(y))=S(add(x,y)) is true by definition, as is add(x,0)=x, and it’s fun to check that applying these rules a few times will reduce 2 + 2 to 4.

The reason that it’s important to understand the differences if you are writing Lean code, is that different tactics work with different kinds of equality. Lean’s refl tactic attempts to close the goal by showing that one side of an equality is definitionally equal to the other side. If your goal is X then change Y will work if and only if Y is definitionally equal to X. On the other hand Lean’s rw tactic works at the level of syntactic equality: if h : A = B then rw h will change everything syntactically equal to A in the goal, into a B. If h : a + 0 = b and your goal is a = b then rw h will fail because the equality a + 0 = a is not syntactic. However exact h will work fine, because exact works up to definitional equality.

Specification v implementation

The fact that a + 0 = a is a definitional equality in the natural number game, but 0 + a = a isn’t, is as far as I am concerned a proof that definitional equality is not a mathematical concept. Indeed one can clearly just define addition on the natural numbers by recursion on the first variable instead of the second, and then 0 + a = a would be definitional and a + 0 = a would not be. What is going on here was made clear to me after a conversation I had with Steve Vickers after a seminar I gave to the University of Birmingham CS theory group. Mathematicians have a specification of the natural numbers. We know what we want: we want it to satisfy induction and recursion, we want it to be a totally ordered commutative semiring (i.e. all the ring axioms other than those involving minus) and we can take it from there thank you very much. If you present me with an object which satisfies these theorems I can use it to prove quadratic reciprocity. I don’t care what the actual definition of addition is, indeed I know several definitions of addition and I can prove they’re all equivalent.

If you’re doing to do mathematics in a theorem prover, you have to make one definition. Mathematicians know that all the definitions of the natural numbers are the same. If you want to set up mathematics in set theory for example, then it doesn’t matter whether you decide to let 3 = \{2\} or 3 = \{0,1,2\}: any system which ensures that 3 isn’t any of the sets you’ve already made is clearly going to work. But in a computer theorem prover you need to make choices — you need to make implementations of 3 and of add — and the moment that choice is made you now have a dichotomy: some stuff is true by definition, and some stuff needs an argument like induction and is not true by definition.

The first time I really noticed the specification / implementation difference was when it dawned on me that Lean’s maths library had to choose a definition of the reals, and it went with the Cauchy sequence definition: a real number in Lean is an equivalence class of Cauchy sequences. An alternative approach would have been to define it as Dedekind cuts. As mathematicians, we don’t care which one is used, because we are well brought up and we promise to only ever access the real numbers via its interface, or its API to borrow a computing term. The interface is the specification. We mathematicians have a list of properties which we want the real numbers to satisfy: we want it to be a complete archimedean ordered field. Furthermore we have proved a theorem that says that any two complete archimedean ordered fields are uniquely isomorphic, and this is why we do not care one jot about whether we are using Cauchy sequences or Dedekind cuts. Lean gives me access to an interface for the real numbers which knows these things, and it’s all I need to build a theory of Banach spaces. As mathematicians we somehow know this fact implicitly. If I am trying to prove a theorem about Banach spaces, and I have a real number \lambda, I never say “Ok now for the next part it’s really important that \lambda is under the hood defined to be a Dedekind cut”. If I want the Dedekind cut associated to \lambda, I can just make it. I don’t care whether it equals \lambda by definition or not, because definitional equality is not a mathematical concept. All I care about is access to the interface — I’m proving a theorem about Banach spaces here, and I just need to have access to stuff which is true. The job of Lean’s file data.real.basic is to give me access to that interface, and I can build mathematics from there.

Computer scientists on the other hand — they have to care about definitional equality, because it’s often their job to make the interface! If two things are definitionally equal then the proof they’re equal is refl, which is pretty handy. Different definitions — different implementations of the same specification — might give you a very different experience when you are making an interface for the specification. If you really have too much time on your hands this lockdown, why not go and try proving that addition on the real numbers is associative, using both the Dedekind cuts definition and the Cauchy sequences definition? For Cauchy sequences it just boils down to the fact that addition is associative on the rationals. But you’ll find that it’s a real bore with Dedekind cuts, because Dedekind cuts have this annoying property that you need a convention for the cuts corresponding to rational numbers: whether to put the rational number itself into the lower or upper cut. Neither convention gives a nice definition of addition. You can’t just add the lower cuts and the upper cuts, because the sum of two irrationals can be a rational. Multiplication is even worse, because multiplication by a negative number switches the lower and upper cut, so you have to move a boundary rational between cuts. You can see why Lean went for the Cauchy sequences definition.

I ran into this “which implementation to use for my specification” issue myself recently. I notice that I have been encouraging students at Imperial to formalise courses which I have been lecturing, which recently have been algebra courses such as Galois theory. I am by training an algebraic number theorist, and really by now I should have turned my attention the arithmetic of number fields and their integers: as far as I can see, finiteness of the class group of a number field has been formalised in no theorem provers at all, so it is probably low-hanging fruit for anyone interested in a publication, and we surely have all the required prerequisites in Lean now. I thought that I would try and get this area moving by formalising the definitions of a Dedekind Domain and a discrete valuation ring (DVR). I looked up the definition of discrete valuation ring on Wikipedia and discovered to my amusement that there are (at the time of writing) ten definitions 🙂

Now here I am trying to be a theory builder: I want to make a basic API for DVRs so that students can use them to prove results about local and global fields. So now I have to decide upon a definition, and then prove that it is equivalent to some of the other definitions — I need to make enough interface to make it easy for someone else to take over. As far as I could see though, what the actual definition of a DVR is, is of no real importance, because it doesn’t change the contents of the theorems, it only changes the way you state them. So I just chose a random one 😛 and it’s going fine!

Equality of terms, equality of types

When talking about propositional and definitional equality above, my main examples were equality between natural numbers: 0 + a = a and what have you. Set-theoretically, we can happily think about 0 + a = a as equality between two elements of a set — the set of natural numbers. In type theory we are talking about equality between two terms of type T, where T : Type . But one can take this a level up. Say A : Type and B : Type (for example say A is the Cauchy reals, and B is the Dedekind reals). What does A = B mean? These are now not elements of a set, but objects of a category. Certainly the Cauchy reals and the Dedekind reals are not going to be definitionally equal. Can we prove they are propositionally equal though? No — of course not! Becuase they are not actually equal! They are, however canonically isomorphic. A fourth type of equality!

All this equality naval-gazing is important to understand if you are talking about equality of terms. This, we have nailed. For mathematicians there is one kind of equality, namely “it is a theorem that a = b“. For computer scientists there are three kinds, and understanding the distinction might be important for interface extraction.

But for equality of types, something funny is going on. Which is the most accurate? (A \otimes B) \otimes C \cong A \otimes (B \otimes C) or (A \otimes B) \otimes C = A \otimes (B \otimes C)? This is just more notational naval-gazing, who cares whether these things are isomorphic or equal – they are manifestly the same, and we can always replace one by the other in any reasonable situation because they both satisfy the same universal property. However, I am realising that as a Lean user I need to say very carefully what I mean by “a reasonable situation”, and actually the safest way to do that is to not prove any theorems at all about (A \otimes B) \otimes C other than the fact that it satisfies its universal property, and then instead prove theorems about all objects which satisfy that universal property. Mathematicians do not use this technique. They write their papers as if they are proving theorems about the concrete object (A \otimes B) \otimes C, but their proofs are sensible and hence apply to any object satisfying its universal property, thus can be translated without too much problem, once one has extracted enough API from the universal property. There is an art to making this painless. I learnt from Strickland the three key facts that one should prove about a localisation R \to R[1/S] of rings: the kernel is the elements annihilated by an element of S, the image of every element of S is invertible, and the map from R\times S to the target sending (r,s) to r/s is surjective. These three facts together are equivalent to the universal property of R[1/S] and now you can throw it away and build the rest of your localisation API on top of it. Indeed, when my former MSc student Ramon Fernández Mir formalised schemes in Lean, he needed this result from the stacks project to prove that affine schemes were schemes, but more generally for the case of rings only satisfying the same universal properties as the rings in the lemma. At the time he needed it (about 18 months ago) the proof only used the three facts above isolated by Strickland, and so was easy to prove in the generality he required.

However, in Milne’s book on etale cohomology, it is decreed in the Notation section that = means “canonical isomorphism”, and so there will be a lot more of this nonsense which we will have to understand properly if we want to formalise some serious arithmetic geometry in Lean.

Posted in Algebraic Geometry, General, Type theory | Tagged , , | 6 Comments

Teaching dependent type theory to 4 year olds via mathematics

We had a family Zoom chat today and I ended up talking to my relative Casper, who is 4 and likes maths. He asked me to give him some maths questions. I thought 5+5 was a good place to start, and we could go on from there depending on responses. He confidently informed me that 5+5 was 10, and that 5-5 was zero. For 5*5 he asked me “is that five fives?” and I said yes, and he asked me “Is that 5 lots of 5?” and I said yes, and then he began to count. “1,2,3,4,5. 6,7,8,9,10. 11,12,13,14,15. 16,17,18,19,20. 21,22,23,24,25. It’s 25!” he said. I guess he’s proved this by refl. Every natural number is succ (succ (succ (...(succ zero))...) so to evaluate one of them, you just break it down into this canonical normal form. He did not know what division was, so I thought it was time to move on from 5.

I figured that if he didn’t know his 5 times table we needed to be careful with multiplication, in case of overflow errors, so I decided I would go lower. I next asked him 0+0, 0-0 and 0*0 (“so that’s zero lots of zero?” Yes. “So that’s zero zeros?” Yes). He got them all right, and similarly for 1+1, 1-1 and 1*1. I then thought we could try 0 and 1 together, so I asked him 0+1 to which he confidently replied 1, and 0-1, to which he equally confidently replied 0.

This answer really made me sit up. Because Lean thinks it’s zero too.

#eval 0 - 1 

I am pretty sure that a couple of years ago I would have told him that he was wrong, possibly with a glint in my eye, and then tried to interest him in integers. Now I am no longer sure that he is wrong. I am more convinced that what has happened here is that Casper has internalised some form of Peano’s axioms for arithmetic. He knows . In some sense this is what I have been testing. He knows that every natural is succ succ … succ 0, and then for some reason he has to learn a song that goes with it, with a rigid rhythmical beat and written in 10/4 time: “One, two, three four, five, six, seven, eight, nine, ten. Eleven, twelve,…” and so on ad infinitum, some kind of system which is important for computation but is currently not of interest to me. Also, Casper knows that there’s something called subtraction, and when given junk input such as 0-1 he has attempted to make some sense of it as best he could, just like what the functional programmers who like all functions to be total have done.

We then went onto something else. I asked him what 29+58 was and he essentially said that it was too big to calculate. I asked him if he thought that whatever it was, it was the same as 58+29, and he confidently said it was, even though he did not have a clue what the answer was. I asked him how he was so sure but I did not get a coherent answer.

I asked him what three twos were, and he counted “1, 2, 3, 4, 5, 6“. Three lots of two is 6. I then asked him what two threes were and he immediately replied 6 and I asked him if he’d done the calculation and he said no, two threes were the same as three twos. I asked him why but all he could say was that it was obvious. It’s simp. I asked him if he was thinking about a rectangle and he said no. So he knows simp and refl.

I then did quite a mean thing on him, I did overflow his multiplication buffer. I asked him what ten 3’s were. We spent ages getting there, and he needed help, because he hasn’t learnt how to sing the song in 3/4 time yet. But eventually refl proved that 10*3=30. I then asked him if he thought 3*10=10*3 and he was confident that it was. I then asked him what 3 tens were and whilst he didn’t immediately do this, he knew a song (in 4/4) which started at “one ten is10 (rest)” and finished at “ten tens-are-a hun dred.”. Using the song, we figured out pretty quickly what three tens were, and deduced that ten 3’s were 30 again. I then asked him what ten sevens were, and we talked about how how difficult it would be to sing the song in 7/4 and how long the song would be, and then we talked together through the argument that it was also “obviously” seven tens, so by the song it was seventy; apparently the fact that seventy sounds similar to 7 (even though thirty doesn’t sound that much like 3) is evidence that the patterns in the counting song can be helpful to humans who want to calculate.

I then remembered the junk subtraction answer, so I thought I’d try him on another function which returns junk in Lean, namely pred, the “number before” function on the naturals. I asked him what the number before 3 was, and he said 2. He was also confident that the number before 2 was 1, and the number before 1 was 0. I then asked him what the number before 0 was and he thought a bit and then said….”1?”.

pred is defined by recursion in Lean. We have pred (succ a) = a, and pred 0 is defined to be 0, because if you follow Lean’s “make all functions total, it’s easier for programmers” convention then it has to be something. But the choice of the actual value of pred 0 is probably not seriously used in Lean, and they could have defined pred 0 to be 1 or 37 and probably not much would break, and what did break would probably be easily fixed. Because of whining linter I recently needed to supply a junk value to Lean (it wanted me to prove that the quotient of a ring by an ideal was not the empty set) and I supplied 37 as the witness. pred is a fundamental function defined by recursion, as far as computer scientists are concerned. So why is it not even mentioned in the natural number game? Because you can’t really define functions by induction, you define them by recursion, and I don’t want to start getting involved in new recursive definitions, because this sort of thing cannot go within a begin end block. I could have decided to explain pred but I would have had to address the issue of pred 0 not being error and I realised that actually there was always a “mathematical workaround”. By this I mean the following. The natural proof of succ_inj uses pred, for example. The successor function is injective. The proof using pred is easy. But I tell people in the natural number game that succ_inj is an axiom, as is zero_ne_succ, and now we mathematicians seem to be able to develop a huge amount of theory without ever even defining subtraction. We don’t really want to use Lean’s natural number subtraction. We know that 0 ≠ junk is true by definition but unfortunately `0-0=0-1` in Lean, which is not right.

We then talked about the number line a bit, and I then told him about a level in the 2d Mario franchise where the exit is way off to the right but right at the beginning you can completely counterintuitively go off to the left and pick up some bonus stuff. I then told him that there were some other kinds of numbers which he hadn’t learnt about yet, and I said they were called the integers (note: not “the negative numbers”). I said that the integers included all the numbers he knew already, and some new ones. I told him that in the integers, the number before 0 was -1. I then asked him what he thought the number before -1 would be and he said “-2?” And I told him he was right and asked him what the number before -2 was and he confidently said -3, and we did this for a bit and chatted about negative numbers in general, and then I asked him what the number before -9 was and he said -10. I then asked him what the number before -99 was and he said -100 and then I asked him what the number before -999 was and he said he didn’t know that one and did I want to play Monopoly. I took this as a sign that it was time to stop, and we agreed to play a game of monopoly via this online video chat, and then it turned out that he didn’t know any of the rules of monopoly (he’d just found the box on the floor) and he couldn’t read either, so he just put the figures on random places on the board and we talked about the figures, and what we could see on the board, and what was in the monopoly box, and the maths lesson was over.

I was pretty appalled when I saw Lean’ s definition of int, a.k.a. . It is so ugly. There are two constructors, one of_nat n which takes a natural number n and returns the integer n (an invisible function, set up to be a coercion), and the one called something really weird like neg_one_of_nat n which takes in a natural n and returns the integer -1-n, complete with hideous notation. This definition somehow arrogantly assumes some special symmetry of the integers about the point -0.5. The mathematician’s definition, that it’s a quotient of \mathbb{N}^2 by the equivalence relation (a,b)\sim(c,d)\iff a+d=b+c, a.k.a. the localisation of the additive monoid of naturals at itself (i.e. adding additive inverses to every element). This definition is a thing of beauty. The symmetry is never broken. It is the canonical definition. But even though this definition as a quotient is the most beautiful definition, classifying as it does the integers up to unique isomorphism because of the universal property of localisation (thanks so much Amelia Livingston), implementation decisions are system specific and int in Lean is the concrete inductive type with two constructors and there’s nothing we can do about this. So mathematically I am kind of embarassed to say that today I basically taught Lean’s definition of int to a kid, just as an experiment.

Another weird thing is that Lean has a really ugly proof of a+b=b+a in int but for some reason this is going to be “obvious” when negative numbers are taught to him in school, and will not need a proof. I guess ac_refl will do it.

When Casper had to go, I told him before he left to ask his parents what the number before -9 was. I think it’s a pretty tricky question if you’re caught off-guard. I told him that negative numbers were like mirror world. Is there educational research about how children model numbers? What different ideas do they have about before they even start to learn their multiplication tables? Should we teach them induction now, before they are crushed by having to learn so many long and boring songs telling them things like six sevens are 42, because of the belief in the education system currently that memorising this is somehow important in the year 2020 when most teenage kids have a calculator app in their pocket.

Casper loves my daughter Kezia’s art. He says “everything is alive except the rainbow”.

Everything is alive, except the rainbow.
Posted in computability, Learning Lean, number theory, Type theory | Tagged , , | 8 Comments

Mathematics in type theory.

What is maths? I think it can basically be classified into four types of thing. There are definitions, true/false statements, proofs, and ideas.

Definitions (for example the real numbers, or pi) and true/false statements (for example the statement of Fermat’s Last Theorem or the statement of the Riemann Hypothesis) are part of the science of mathematics: these are black and white things which have a completely rigorous meaning within some foundational system.

Proofs are in some sense the currency of mathematics: proofs win prizes. Constructing them is an art, checking them is a science. This explains, very simply, why computer proof verification systems such as Lean, Coq, Isabelle/HOL, Agda… are much better at checking proofs than constructing them.

And ideas are the purely artistic part of mathematics. That “lightbulb” moment, the insight which enables you to solve a problem — this is the elusive mathematical idea.

Ideas are the part of mathematics that I understand the least, in a formal sense. Here are two questions:

  • What is a group?
  • How do you think about groups?

The first one is a precise “scientific” question. A group is a set equipped with some extra structure, and which satisfies some axioms. The formal answer is on Wikipedia’s page on groups. A group is a definition. But the second question is a different kind of question. Different people think about groups in different ways. Say G is a group generated by an element x satisfying x^5=x^8=1. What can you say about G? If you are a mathematics undergraduate who has just seen the formal definition of a group, you can probably say nothing. If you have a more mature understanding of group theory, you instantly know that this group is trivial, because you have a far more sophisticated model of what is going on. Ideas are complicated, and human-dependent. A computer’s idea of what a group is, is literally a copy of the definition in Wikipedia, and this is one of the reasons that computers are currently bad at proving new theorems by themselves. You can develop a computer’s intuition by teaching it theorems about groups, or teaching it examples of groups, or trying to write AI’s which figure out group theory theorems or examples of groups automatically. But intuition is a very subtle thing, and I do not understand it at all well, so I will say no more about these ideas here. I think that the concept of a map being “canonical” is an idea rather than a definition — I think different mathematicians have different ways of thinking about this weasel word. In this post I’m going to talk about how the three other concepts are implemented in type theory, in the Lean theorem prover.

Definitions, true/false statements, and proofs

In contrast to ideas, the other parts of mathematics (the definitions, theorems/conjectures, and proofs) can be formalised in a foundational system, and hence can be created and stored on a computer in a precise way. By this, I don’t mean a pdf file! Pdf files are exactly what I want to move away from! I mean that people have designed computer programming languages which understand one of the various foundations of mathematics (set theory, type theory, category theory) and then mathematicians can write code in this language which represents the definition, true/false statement or proof in question.

I am certainly not qualified to explain how all this works in category theory. In set theory, let me just make one observation. A definition in set theory, for example the definition of the real numbers, or \pi, is a set. And a proof is a sequence of steps in logic. A definition and a proof seem to me to be two completely different things in set theory. A group is a mixture of these things — a group is an ordered quadruple (G,m,i,e) satisfying some axioms, so it’s a set with some logic attached.

In type theory however, things are surprisingly different. All three things — definitions, true/false statements, and proofs — are all the same kind of thing! They are all terms. A group, a proof, the real numbers — they are all terms. This unification of definitions and proofs — of sets and logic — are what seems to make type theory a practical foundational system for teaching all undergraduate level mathematics to computers.

Universes, types, and terms.

In type theory, everything is a term. But some terms are types. Not every term is a type, but every term has a type. A colon is used to express the type of a term in Lean — the notation x : T means that x is a term of type T. For example, the real number π (pi) is a term in Lean, and the real numbers is a type, and we have π : ℝ , that is, π is a term of type . In set theory one writes π ∈ ℝ, in type theory we write π : ℝ. They both express the same mathematical concept — “π is a real number”.

Now π is a term but it’s not a type. In Lean, x : π makes no sense. In set theory, x ∈ π does happen to make sense, but this is a weird coincidence because everything is a set. Furthermore, the actual elements of π will depend on how the real numbers are implemented (as Dedekind cuts or Cauchy sequences, for example), and hence in set theory x ∈ π has no mathematical meaning; it happens to make sense, but this is a quirk of the system.

I claimed above that every term has a type. So what is the type of ? It turns out that ℝ : Type. The real numbers are a term of a “universe” type called Type — the type theory analogue of the class of all sets.

Many of the mathematical objects which mathematicians think of as definitions either have type Type, or have type T where T : Type. As a vague rule of thumb, things we write using capital letters (a group, a ring,…) or fancy letters (the reals, the rationals) have type Type, and things we write using small letters (an element g of a group, a real number r or an integer n) have got type T where T is what we think of as the set which contains these elements. For example 2 : ℕ and ℕ : Type, or if g is an element of the group G then in Lean we have g : G and G : Type. You can see that there is a three-layer hiearchy here — terms at the bottom, types above them, and the universe at the top.

  • Universe : Type
  • Examples of types : , , G (a group), R (a ring), X (something a set theorist would call a set), a Banach space, etc. Formally, we say ℝ : Type.
  • Examples of terms: π (a term of type ), g (an element of the group G, so a term of type G), x (an element of X, so a term of type X). Formally, we say g : G.

This hierarchy is more expressive than the hierarchy in set theory, where there are only two levels: classes (e.g. the class of all sets), and sets.

There is a standard use of the colon in mathematics — it’s in the notation for functions. If X and Y are sets (if you’re doing set theory) or types (if you’re doing type theory), then the notation for a function from X to Y is f : X → Y. This is actually consistent with Lean’s usage of the colon; Lean’s notation for the collection \mathrm{Hom}(X,Y) of functions from X to Y is X → Y , which is a type (i.e. X → Y : Type, corresponding to the fact that set theorists think of \mathrm{Hom}(X,Y) as a set), and f : X → Y means that f is a term of type X → Y, the type-theoretic version of f \in \mathrm{Hom}(X,Y), and the way to say that f is a function from X to Y in type theory.

(Not for exam) Strictly speaking, universes are types, and types are terms, but this is a linguistic issue: often when people speak of types, they mean types which are not universes, and when people speak of terms they mean terms which are not types. But not always. This confused me when I was a beginner.

Theorems and proofs

This is where the fun starts. So far, it just looks like a type is what a type theorist calls a set, and a term is what they call an element. But let’s now look at another universe in Lean’s type theory, the universe Prop of true/false statements, where our traditional mental model of what’s going on is quite different. We will see how theorems and proofs can be modelled in the same way as types and terms.

So, how does this all work? As well as the universe Type, there is a second universe in Lean’s type theory called Prop. The terms of type Prop are true/false statements. There is an unfortunate notation clash here. In mathematics, the word proposition is often used to mean a baby theorem, and theorems are true (or else they would be conjectures or counterexamples or something). Here we are using the the word Proposition in the same way as the logicians do — a Proposition is a generic true/false statement, whose truth value is of no relevance.

This will all be clearer with examples. 2 + 2 = 4 is a Proposition, so we can write 2 + 2 = 4 : Prop. But 2 + 2 = 5 is also a Proposition, so 2 + 2 = 5 : Prop as well. I’ll say it again — Propositions do not have to be true! Propositions are true/false statements. Let’s see some more complex examples.

The true/false statement that x+0=x for all natural numbers x is a Proposition: in Lean this can be expressed as (∀ x : ℕ, x + 0 = x) : Prop . A Proposition is a term of type Prop (just like the types we saw earlier were terms of type Type). Let RH denote the statement of the Riemann Hypothesis. Then RH : Prop. We don’t care if it’s true, false, independent of the axioms of mathematics, undecidable, whatever. A Proposition is a true/false statement. Let’s look at the part of Lean’s type theory hierarchy which lives in the Prop universe.

  • Universe: Prop
  • Examples of types : 2 + 2 = 4, 2 + 2 = 5, the statement of Fermat’s Last Theorem, the statement of the Riemann Hypothesis.
  • Examples of terms: ??

So what are the terms in this three-layer Prop hierarchy? They are the proofs!

Propositions are types, proofs are terms.

This is where the world of type theory seriously diverges from the way things are set up in set theory, and also the way things were set up in my brain up until three years ago. In trying to understand what was going on here, I even realised that mathematicians take some liberties with their language here. Before we start, consider this. The Bolzano-Weierstrass theorem is some statement in analysis about a bounded sequence having a convergent subsequence. I want to talk a little bit about how mathematicians use the phrase “Bolzano-Weierstrass theorem” in practice. A mathematician would say that the Bolzano-Weierstrass theorem is this statement about sequences having convergent subsequences. But if they are in the middle of a proof and need to apply it in order to continue with their proof, they say “by the Bolzano-Weierstrass theorem we deduce that there’s a convergent subsequence”. Nothing seems at all funny about any of this. But what I want to point out is that mathematicians are using the phrase “the Bolzano-Weierstrass theorem” in two different ways. When they say what it is, they are referring to the statement of the theorem. But when they say they’re using the Bolzano Weierstrass theorem, what they are actually using is its proof. The Birch and Swinnerton-Dyer conjecture is a perfectly well-formed true/false statement, you can certainly say what it is. But you can’t use the Birch and Swinnerton-Dyer conjecture in the middle of a proof of something else if you want your proof to be complete, because at the time of writing the conjecture is an unsolved problem. Making a clear distinction between the statement of a theorem, and the proof of a theorem, is important here. A mathematician might use the phrase “the Bolzano-Weierstrass theorem” to mean either concept. This informal abuse of notation can confuse beginners, because in the below it’s really important to be able to distinguish between a theorem statement, and a theorem proof; they are two very different things.

In the natural number game, I use this abuse of notation because I am trying to communicate to mathematicians. The statement ∀ x : ℕ, x + 0 = x is a true statement, and I say things like “this is called add_zero in Lean”. In the natural number game I write statements such as add_zero : ∀ x : ℕ, x + 0 = x. But what this means is that the term called add_zero in Lean is a proof of ∀ x : ℕ, x + 0 = x! The colon is being used in the type theory way. I am intentionally vague about this concept in the natural number game. I let mathematicians believe that add_zero is somehow equal to the “idea” that x+0=x for all x. But what is going on under the hood is that ∀ x : ℕ, x + 0 = x is a Proposition, which is a type, and add_zero is its proof, which is a term. Making a clear distinction between the statement of a theorem, and its proof, is important here. The statements are the types, the proofs are the terms.

  • Universe: Prop
  • Examples of types: 2 + 2 = 4, 2 + 2 = 37, the statement of Fermat’s Last Theorem — ∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0.
  • Examples of terms: the proof that 2 + 2 = 4 (a term of type 2 + 2 = 4), the proof of Fermat’s Last Theorem (a term of type ∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0)

Elements of a theorem

So our mental model of the claim π : ℝ is that , the type, is “a collection of stuff”, and π, the term, is a member of that collection. If we continue with this analogy, it says that the statement 2 + 2 = 4 is some kind of collection, and a proof of 2 + 2 = 4 is a member of that collection. In other words, Lean is suggesting that we model the true/false statement 2 + 2 = 4 as being some sort of a set, and a proof of 2 + 2 = 4 is an element of that set. Now in Lean, it is an inbuilt axiom that all proofs of a proposition are equal. So if a : 2 + 2 = 4 and b : 2 + 2 = 4 then a = b. This is because we’re working in the Prop universe — this is how Propositions behave in Lean. In the Type universe the analogue is not remotely true — we have π : ℝ and 37 : ℝ and certainly π ≠ 37. This special quirk of the Prop universe is called “proof irrelevance”. Formally we could say that if P : Prop, if a : P and if b : P then a = b. Of course if a Proposition is false, then it has no proofs at all! It’s like the empty set. So Lean’s model of Propositions is that the true ones are like sets with 1 element, and the false ones are like sets with 0 elements.

Recall that if f : X → Y then this means that f is a function from X to Y. Now say P and Q are Propositions, and let’s say that we know P\implies Q. What does this mean? It means that P implies Q. It means that if P is true, then Q is true. It means that if we have a proof of P, we can make a proof of Q. It is a function from the proofs of P to the proofs of Q. It is a function sending an element of P to an element of Q. It is a term of type P → Q. Again: a proof h of P\implies Q is a term h : P → Q. This is why in the natural number game we use the symbol to denote implication.

Let false denote a generic false statement (thought of as a set with 0 elements), and let true denote a generic true statement (thought of as a set with 1 element). Can we construct a term of type false → false or a term of type true → true? Sure — just use the identity function. In fact, in both cases there is a unique function — the hom sets have size 1. Can we construct a term of type false → true? Sure, there is a function from the set with 0 elements to a set with 1 element, and again this function is unique. But can we construct a term of type true → false? No we can’t, because where do we send a proof of true? There are no proofs of false to send it to. So true → false is a set of size 0. This corresponds to the standard truth table for , where the first three statements we analysed were true and the last was false.

The proof of Fermat’s Last Theorem is a function

So what does a proof of ∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0 look like? Well, there is an arrow involved in that Proposition, so the statement of Fermat’s Last Theorem is some kind of set of the form \mathrm{Hom}(A,B), which means that in Lean, a proof of Fermat’s Last Theorem is actually a function! And here is what that function does. It has four inputs. The first three inputs are natural numbers x, y and z. The fourth input is a proof: it is a proof of the Proposition n > 2 ∧ x^n + y^n = z^n. And the output of this function is a proof of the Proposition x*y = 0. This is quite an unconventional way to think about what the proof of Fermat’s Last Theorem is, and let me stress that it does not help at all with actually trying to understand the proof — but it is a completely consistent mental model for how mathematics works. Unifying the concept of a number and a proof — thinking of them both as terms — enables you to think of proofs as functions. Lean is a functional programming language, and in particular it is designed with functions at its heart. This, I believe, is why theorem provers such as Lean, Coq and Isabelle/HOL, which use type theory, are now moving ahead of provers such as Metamath and Mizar, which use set theory.

Prove a theorem! Write a function!

Universe TypeProp
Type examples2 + 2 = 5, (∀ a : ℕ, a + 0 = a)
Term examples37, πadd_zero, rfl
Cheat sheet
Posted in Learning Lean, Type theory, undergrad maths | Tagged , , , | 24 Comments