## 2021 Xena Summer Projects

So I spent July and August supervising mathematics undergraduates who were doing Lean projects in various areas of mathematics. Conformal maps, Euclid’s SAS theorem, nilpotent groups and the Radon-Nikodym theorem are just some of the topics I’ve been engaging with this summer. Earlier this week we had a summer project conference, where eight students presented 15 minute talks on their work. I recorded the talks, and they’re now up on YouTube. For each talk there were also 15 minutes of questions but I decided not to record these so that attendees could feel free to ask basic questions.

The students behind four of the projects managed to get PR’s accepted into mathlib, and for two of them this was their first mathlib PR (and possibly even their first contribution to open source). These students had to use git and github (some for the first time), but these are skills which I personally now value as important and worth teaching (it is commonplace to teach this sort of thing to computer scientists, but mathematicians seem to miss out here). In 2022 I will be teaching a formalisation course to undergraduates at Imperial and we will be using git and github for this too.

I think the video titles are pretty explanatory, and perhaps now isn’t the time to be going through the technicalities of exactly what the students achieved. However for the most part we stuck to the mantra of: learn the maths first, and once you think you understand it, then try to formalise it. That way, of course, you find out whether you do actually understand it 🙂

## What was the set-up?

We ran the entire thing on the Xena Project Discord server, a server for undergraduate mathematicians interested in formalisation of mathematics. This worked very well for me and, I think, for them. Students could share their screen if they had technical questions. Everything was done in voice or text channels, and in particular there are some students which I supervised and who I would not recognise if I met them in the street. Not that this bothered me in the slightest.

Timetabling: I basically promised to be online every Tuesday in July and August from 10am until about 6pm, so the entire thing functioned as a drop-in. Some students talked to me a lot, some students barely talked to me at all, and of course some students talked to other people. Creating a community was, I felt, a bit harder than previous in-person summer projects which I’ve run in the past (where you can just go to lunch with a bunch of UGs and let them talk to each other) but of course these are extraordinary times and we have to make do. One big advantage of running stuff online was that students could be in different countries and still participate, and more generally students could move around (e.g. to and from London) without it disrupting their supervision. I live and work in London and for some students it’s problematic to stay without a serious source of funding, and working online also solved that problem. Going forward — assuming things are far more normal next year I might be tempted to run summer projects in a hybrid way next year.

Thank you to all students who participated. If you are a mathematics undergraduate who is interested in formalising some mathematics in Lean (probably Lean 4 next year I guess!) over the summer of 2022, then get in touch with me at some point and we’ll see what we can do.

## Half a year of the Liquid Tensor Experiment: Amazing developments

[This is a guest post by Peter Scholze.]

Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen. While this challenge has not been completed yet, I am excited to announce that the Experiment has verified the entire part of the argument that I was unsure about. I find it absolutely insane that interactive proof assistants are now at the level that within a very reasonable time span they can formally verify difficult original research. Congratulations to everyone involved in the formalization!!

In this Q&A-style blog post, I want to reflect on my experience watching this experiment.

## Question: Who is behind the formalization?

Answer: It was formalized in the Lean Proof Assistant, mostly written by Leonardo de Moura from Microsoft Research, and used the extensive mathematical library (mathlib) written by the Lean community over the last four years. Immediately after the blog post, the Lean prover/mathlib community discussed the feasibility of the experiment on the Lean Prover Zulip Chat. Reid Barton did some critical early work, but then Johan Commelin has taken the leading role in this. In outline, Johan made an attack along the path of steepest ascent towards the proof, and handed off all required self-contained lemmas to the community. In particular, to get the project started, by January 14 he had formalized the statement of Theorem 9.4 of [Analytic], whose proof became the first target, and has now been completed on May 28, with the help of the Lean community, including (mathematicians) Riccardo Brasca, Kevin Buzzard, Heather Macbeth, Patrick Massot, Bhavik Mehta, Scott Morrison, Filippo Nuccio, Damiano Testa, Adam Topaz and many others, but also with occasional help from computer scientists like Mario Carneiro. Here is a link to the repository containing the formalised proof of Theorem 9.4, and you can also view its dependency graph, now fully green and completed.

## Question: How did you follow the project?

Answer: I joined the Zulip chat to answer any mathematical questions that may arise, but also as an interested spectator.

## Question: What is the significance of Theorem 9.4? Is the Liquid Tensor Experiment completed?

Answer: Theorem 9.4 is an extremely technical statement, whose proof is however the heart of the challenge, and is the only result I was worried about. So with its formal verification, I have no remaining doubts about the correctness of the main proof. Thus, to me the experiment is already successful; but the challenge of my blog post has not been completed. It is probably fair to guess that the experiment is about half-way done. Note that Theorem 9.4 abstracts away from any actual condensed mathematics, so the remaining half will involve a lot of formalization of things like condensed abelian groups, Ext groups in abelian categories, and surrounding machinery. The basics for this have already been built, but much work remains to be done.

## Question: How did the formalization proceed?

Answer: Initially, I imagined that the first step would be that a group of people study the whole proof in detail and write up a heavily digested version, broken up into many many small lemmas, and only afterwards start the formalization of each individual lemma. This is not what happened. Instead, the formalization followed quite closely the original lecture notes, and directly attacked Lemma after Lemma there. It did seem that the process was to directly type the proofs into Lean. Lean actually gives the user a very clear summary of what the current goal is, so one always needs to get a very clear sense of what the next few steps really are. Sometimes it was then realized that even on paper it does not seem clear how to proceed, and the issue was brought to attention in the chat, where it was usually quickly resolved. Only after a lemma was entirely formalized, the proof, now thoroughly digested, was again written up in the Blueprint in human readable form.

## Question: So it’s not actually a blueprint, is it?

Answer: Right — it’s not the blueprint from which the Lean code was formed, but (largely) the other way around! The Lean Proof Assistant was really that: An assistant in navigating through the thick jungle that this proof is. Really, one key problem I had when I was trying to find this proof was that I was essentially unable to keep all the objects in my “RAM”, and I think the same problem occurs when trying to read the proof. Lean always gives you a clear formulation of the current goal, and Johan confirmed to me that when he formalized the proof of Theorem 9.4, he could — with the help of Lean — really only see one or two steps ahead, formalize those, and then proceed to the next step. So I think here we have witnessed an experiment where the proof assistant has actually assisted in understanding the proof.

## Question: Was the proof in [Analytic] found to be correct?

Answer: Yes, up to some usual slight imprecisions.

## Question: Were any of these imprecisions severe enough to get you worried about the veracity of the argument?

Answer: One day I was sweating a little bit. Basically, the proof uses a variant of “exactness of complexes” that is on the one hand more precise as it involves a quantitative control of norms of elements, and on the other hand weaker as it is only some kind of pro-exactness of a pro-complex. It was implicitly used that this variant notion behaves sufficiently well, and in particular that many well-known results about exact complexes adapt to this context. There was one subtlety related to quotient norms — that the infimum need not be a minimum (this would likely have been overlooked in an informal verification) — that was causing some unexpected headaches. But the issues were quickly resolved, and required only very minor changes to the argument. Still, this was precisely the kind of oversight I was worried about when I asked for the formal verification.

## Question: Were there any other issues?

Answer: There was another issue with the third hypothesis in Lemma 9.6 (and some imprecision around Proposition 8.17); it could quickly be corrected, but again was the kind of thing I was worried about. The proof walks a fine line, so if some argument needs constants that are quite a bit different from what I claimed, it might have collapsed.

## Question: So, besides the authors of course, who understands the proof now?

Answer: I guess the computer does, as does Johan Commelin.

## Question: Did you learn anything about mathematics during the formalization?

Answer: Yes! The first is a beautiful realization of Johan Commelin. Basically, the computation of the Ext-groups in the Liquid Tensor Experiment is done via a certain non-explicit resolution known as a Breen-Deligne resolution. Although constructed in the 70’s, this seems to have not been in much use until it was used for a couple of computations in condensed mathematics. The Breen-Deligne resolution has certain beautiful structural properties, but is not explicit, and the existence relies on some facts from stable homotopy theory. In order to formalize Theorem 9.4, the Breen-Deligne resolution was axiomatized, formalizing only the key structural properties used for the proof. What Johan realized is that one can actually give a nice and completely explicit object satisfying those axioms, and this is good enough for all the intended applications. This makes the rest of the proof of the Liquid Tensor Experiment considerably more explicit and more elementary, removing any use of stable homotopy theory. I expect that Commelin’s complex may become a standard tool in the coming years.

## Question: Interesting! What else did you learn?

Answer: What actually makes the proof work! When I wrote the blog post half a year ago, I did not understand why the argument worked, and why we had to move from the reals to a certain ring of arithmetic Laurent series. But during the formalization, a significant amount of convex geometry had to be formalized (in order to prove a well-known lemma known as Gordan’s lemma), and this made me realize that actually the key thing happening is a reduction from a non-convex problem over the reals to a convex problem over the integers. This led me to ask my MathOverflow question whether such a reduction was known before; unfortunately, it did not really receive a satisfactory answer yet.

## Question: Did the formalization answer any questions from the lecture notes?

Answer: Yes, it did, Question 9.9 on the growth of certain constants. There are now explicit recursive definitions of these constants that are formally verified to work, and using this one can verify that indeed they grow roughly doubly-exponentially.

## Question: What did you learn about the process of formalization?

Answer: I learnt that it can now be possible to take a research paper and just start to explain lemma after lemma to a proof assistant, until you’ve formalized it all! I think this is a landmark achievement.

## Question: And about the details of it?

Answer: You know this old joke where a professor gets asked whether some step really is obvious, and then he sits down for half an hour, after which he says “Yes, it is obvious”. It turns out that computers can be like that, too! Sometimes the computer asks you to prove that $A=B$, and the argument is “That’s obvious — it’s true by definition of $A$ and $B$.” And then the computer works for quite some time until it confirms. I found that really surprising.

## Question: Can you read the Lean code?

Answer: The definitions and theorems are surprisingly readable, although I did not receive any training in Lean. But I cannot read the proofs at all — they are analogous to referring to theorems only via their LaTeX labels, together with a specification of the variables to which it gets applied; plus the names of some random proof finding routines. Still, I have the feeling that it should be possible to create a completely normal mathematical manuscript that is cross-linked with the Lean code that makes it possible to navigate the Lean code seamlessly — I think the creation of such an interface has also become a goal of the experiment.

## Question: So there will be more documentation work to do, also for the part that has just been completed?

Answer: Definitely! Currently, the Lean code leading up to the proof of Theorem 9.4 is not well-documented, and some parts of the proof could definitely be streamlined. Moreover, large parts of it are basic material that should become part of mathlib. It should be noted that because mathlib is constantly evolving, any project that uses it has to continually make small changes so that it will still compile with the newest version of mathlib. So it is vital that the parts of the proof of general interest are moved into mathlib, where they will be maintained.

## Question: What is the de Bruijn factor, i.e. the ratio between the length of the computer-verified proof and the length of the human proof?

Answer: It depends on the method of calculation, but somewhere around 20. I think this is amazingly small! I had expected that the first step of taking the lecture notes and turning them into a properly digested human proof — which as I said didn’t actually happen — would already introduce a factor of ~5. But the blueprint is actually only a factor of ~2.5 longer than the relevant part of the lecture notes right now.

## Question: How do you feel now that the proof has been broken down to the axioms of mathematics?

Answer: Good question! Usually the verification of a proof involves trying small variations of the argument and seeing whether they break or not, whether they lead to statements that are too strong etc., in order to get a sense of what is happening. Basically a proof is like a map of how to get up a mountain, say; it may be a nice, slightly winding path with a broad view, or it may lead through the jungle and up some steep wall, requiring climbing skills. Usually there’s not just one way up, and one may try whether taking a left turn the view is nicer, or taking a right turn one can take a shortcut.

In the case at hand, it feels like the main theorem is some high plateau with a wonderful view, but the way there leads through a large detour, to attack the mountain from the other side, where it is dark and slippery, and one has to climb up a certain steep wall; and no other pathways are seen left or right. Answering the questions in the Zulip chat felt like I would give instructions of the form “put your foot here, then your hand there, then pull yourself up this way” at the more difficult passages.

So I have gained the reassurance that it is possible to climb the mountain along this route, but I still have no sense of the terrain.

Posted in Uncategorized | | 22 Comments

## The trace of an endomorphism (without picking a basis)

What mathematics can we do with vector spaces, without having to pick a basis? For example, if $V$ is a finite-dimensional vector space and $\phi:V\to V$ is a linear map, can we define its trace “without picking a basis”?

I got interested in trying to understand if this question even has a meaning. Here are some thoughts.

## Models for vector spaces

When we learn linear algebra at high school, we typically first learn the “concrete” theory, where vectors are columns of numbers, and we can multiply them by matrices and thus get a conceptual understanding of systems of linear equations. Then at university we go on to the “abstract” theory, where a real vector space is something defined by a list of axioms, and spaces like $\mathbb{R}^n$ are now examples of these abstract objects.

We then learn about the fundamental notion of a basis of a vector space. Say we have an abstract finite-dimensional vector space. By picking a basis, the vectors in our vector space suddenly transform back into columns of numbers. Not only that, but linear maps between vector-spaces-with-a-basis turn back into matrices. By the time we’ve learnt that every vector space has a basis, we can see that our new theory is in some sense “the same as” our old theory. In some sense it took me a long time to get on top of this principle as an undergraduate; perhaps the key concepts were not emphasized to me enough, or maybe I just got lost in all the new (to me, at the time) ideas. Nowadays I think about it like this: initially we learn that $\mathbb{R}^n$ is an example of a finite-dimensional vector space, but after learning about bases we can conclude that every finite-dimensional real vector space is isomorphic to $\mathbb{R}^n$ for some $n$, so in fact $\mathbb{R}^n$ can be thought of as a model for a finite-dimensional real vector space, just like the collection of equivalence classes can be thought of as a model for a quotient by an equivalence relation. Every vector space has a basis, so one can prove theorems about finite dimensional vector spaces by checking them on models, i.e. by picking a basis.

## Don’t pick a basis (unless you have to)

After a couple of years at university the following idea had somehow sunk in: if possible, one “should not choose a basis”. The canonical example shows up when we learn about the dual of a vector space. The dual of a real vector space $V$ is just the space of linear maps from $V$ to $\mathbb{R}$; this space has a natural vector space structure and is called the dual space of $V$, with notation $V^*$. Confusing example: if $V=\mathbb{R}^n$, then an element of $V$ is represented by a column of $n$ numbers. Give $V$ its canonical basis $e_1=(1,0,0,\ldots,0), e_2=(0,1,0,\ldots), \ldots, e_n$. Then any linear map $V\to\mathbb{R}$ is uniquely determined by where it sends the $e_i$, and in particular an element of $V^*$ is also uniquely determined by $n$ numbers, so we can represent it as a vector of length $n$ and we have proved that the dual of $\mathbb{R}^n$ is $\mathbb{R}^n$ again. Great! Furthermore, every finite-dimensional vector space equals $\mathbb{R}^n$ (proof: pick a basis) so we’ve proved that duality is just the identity map!

Except that we haven’t, because we have been a bit too liberal with equality here (and as some computer scientists are very quick to point out, many mathematicians don’t understand equality properly, and hence they might accidentally end up teaching this point badly). This argument proves that if $V$ is a finite-dimensional vector space, then it is isomorphic to its dual $V^*$; however it is not in general canonically isomorphic to its dual, whatever that is supposed to mean. In this instance, it means that different bases in general produce different isomorphisms between $V$ (identified with $\mathbb{R}^n$) and $V^*$ (also identified with $\mathbb{R}^n$). This is a bit confusing because in the group theory course running at the same time as the linear algebra course, a student is simultaneously being taught that when we say “how many groups are there of order 4” we obviously mean “…up to isomorphism”, because isomorphic groups are equal for that question.

However, if we do this trick twice, we can identify $V$ with its double-dual $(V^*)^*$ and it turns out that this identification is canonical, whatever that means. What it appears to mean in this case is that there is a really cool way of writing down the isomorphism from $V$ to $(V^*)^*$ which doesn’t ever pick a basis!

[Technical note/reminder: Here’s the explicit map from $V$ to its double dual. Say $v\in V$. We need to write down a linear map $V^*\to\mathbb{R}$ associated to $v$. So take $\psi\in V^*$ and we need to construct a real number somehow. Well, what about $\psi(v)$? That works a treat! One can check that this map $L(v)$ sending $\psi$ to $\psi(v)$ is indeed linear, and induces a map $L$ from $V$ to $(V^*)^*$ which can be checked to be an injection and hence, by dimension counting, an isomorphism (NB: some magic just happened there). I suspect that this argument was my initiation into the mysterious word canonical, a word I now rail against, not least because in my opinion the Wikipedia page about canonical maps contains a definition which is full of fluff (“In general, it is the map which preserves the widest amount of structure, and it tends to be unique” — this is not a definition — this is waffle).]

The moral: all the canonical kids are too cool to pick a basis.

PS here is a funny way of thinking about it: if we identify $V$ with $\mathbb{R}^n$ as column vectors, then perhaps we should identify the dual of $V$ with $\mathbb{R}^n$ as row vectors, because multiplication on the left by a row vector sends a column vector to a number, which is what we want a dual vector to do. So identifying $V$ with $V^*$ is some high-falutin’ version of transpose, and if you transpose once you don’t quite get the same thing, but if you transpose twice then you’re back where you started. Canonically.

## Picking a basis and then pretending you didn’t

OK so here’s a pretty cool theorem about traces (although any formaliser would tell you that it is actually a definition, not a theorem). If $M$ is an $n\times n$ square matrix then it has a trace, the sum of the elements on the leading diagonal. Now say $V$ is a finite-dimensional real vector space, and $\phi:V\to V$ is a linear map, crucially from $V$ to itself. If we choose a basis of $V$ and use the same basis for the source $V$ and the target $V$, then $\phi$ becomes a matrix, and we can take its trace. If we change our basis of $V$ then the matrix representing $\phi$ changes. But, miraculously, its trace does not! This can be proved by an explicit calculation: changing our basis for $V$ changes the matrix $M$ representing $\phi$ to $P^{-1}MP$ for a certain invertible “change of basis” matrix $P$ (here was where it was key that the source and the target of the endomorphism were the same, otherwise we would have got $P^{-1}MQ$), and the traces of $M$ and $P^{-1}MP$ are equal because of the general fact that the traces of $AB$ and $BA$ are equal if $A$ and $B$ are square matrices of the same size (apply with $A=P^{-1}M$ and $B=P$).

As a consequence, this means that if $V$ is a finite-dimensional vector space then we can unambiguously talk about the trace of a linear map $\phi:V\to V$, in the following way. First do the slightly distasteful choose-a-basis thing, then take the trace of the corresponding matrix, and then prove that the calculation was independent of the basis you chose, so we can pretend you never did it. Similarly one can talk about the determinant and characteristic polynomial of $\phi:V\to V$, because these are also invariants of matrices which are constant on conjugacy classes.

However, you did do it — you chose a basis. Something is a little different to the map from $V$ to its double dual — the map from $V$ to its double dual really was defined without choosing a basis at all. Here we did something slightly different — we chose a basis, and then proved that it didn’t matter. Can we go one better, and define the trace of a linear map from $V$ to $V$ without choosing a basis at all?

So in the process of discussing this question on Twitter and on the Lean chat, established firstly that this very much depends on what the question actually means, and secondly I managed to learn something new about Lean, and when I learn something new about Lean I tend to blog about it, so here we are. First I’ll talk about the failed attempts to define the trace of an endomorphism, which led to some other questions and clarifications, and then I’ll talk about trunc, something which looked to me like a completely pointless operation in Lean and which up until now I’ve just ignored, but which somehow might be at the bottom of this.

## A failed attempt, and some progress

In characteristic $p$ there are some problems with the ideas below, but these are not relevant to what I want to talk about, so let’s just stick to vector spaces over the reals (or more generally any field of characteristic zero). The first observation is that computing the trace, determinant and characteristic polynomial all seem to be pretty much the same question: for example, if you can compute the char poly of $\phi$ then you can compute its trace and det because you can read these things off from the coefficients. Conversely, if you can compute traces then applying this to some exterior powers you can read off the coefficients of the characteristic polynomial including the det. So computing any of these invariants without choosing a basis somehow boils down to the same thing.

Next let’s turn to Wikipedia, where we are informed of a basis-free way to compute the trace of an endomorphism! Here’s the trick. There’s an obvious bilinear map $V\times V^*\to\mathbb{R}$, sending $(v,\psi)$ to $\psi(v)$, and by the universal property of the tensor product this induces a linear map $T:V\otimes_{\mathbb{R}}V^*\to\mathbb{R}$. There is also an obvious bilinear map $V\times V^*\to Hom(V,V)$ sending $(v,\psi)$ to the linear map sending $w$ to $\psi(w)v$, and this induces a linear map $V\otimes_{\mathbb{R}}V^*\to Hom(V,V)$, which is easily checked to be an isomorphism if $V$ is finite-dimensional. Composing the inverse of this isomorphism with $T$ gives us a linear map $Hom(V,V)\to\mathbb{R}$ which we can check to be the trace (e.g. by picking a basis). So we’re done, right?

Well, the thing about this construction is that whilst the map $V\otimes_{\mathbb{R}}V^*\to Hom(V,V)$ is canonical (indeed, it even exists in the infinite-dimensional case), to prove that it’s surjective in the finite-dimensional case the natural thing to do is to pick a basis and to make the corresponding matrix by taking an appropriate linear combination of tensor products of elements of the basis and the dual basis. I would argue that we have made some progress here — we still picked a basis, but we used it to fill in a proof, rather than to construct data. However, we still picked a basis. Note also that the inverse of a computable bijection might not be computable, if you’re interested in that sort of thing, and I suspect that this might be a situation where this annoyance kicks in.

## What is the dimension of a finite-dimensional vector space?

One might instead be tempted to argue that the map $V\otimes_{\mathbb{R}}V^*\to Hom(V,V)$ is surjective because it is an injective map between vector spaces of the same dimension (I’m not entirely sure how to prove it’s injective without picking a basis, but it might well be possible; however I do not know how to prove that the dimension of $V\otimes W$ is the product of the dimensions of $V$ and $W$ without picking a basis ). Anyway, talking of dimensions, here is the other “basis-free” method I learnt to do these sorts of things: the canonical method to work out the determinant of an endomorphism without choosing a basis. If $\phi:V\to V$ is a linear map, and if $n$ is the dimension of $V$ then $\phi$ induces a linear map on top wedge powers $\bigwedge^n(V)\to\bigwedge^n(V)$, and an endomorphism of a 1-dimensional space is canonically a number (proof: pick a basis and check it’s independent of the choice) which can be checked to be the determinant of $\phi$, and if you can do determinants then you can do char polys and hence traces.

The problem with this approach is that it relies on you knowing what $n$ is, the dimension of $V$, and if all you know is that $V$ is finite-dimensional, then how do you get $n$, the dimension? Well obviously you pick a basis , count it, and then prove that your answer is independent of the choice of basis. So this “top exterior power” argument has moved us closer to the heart of the problem: forget defining the trace — how do we define the dimension of a finite-dimensional vector space without picking a basis? Note that the dimension is just the trace of the identity function, so we can add dimension to our list of things which we cannot “compute” without picking a basis. And now we’re getting somewhere — what does it mean to say a vector space is finite-dimensional? Did we pick a basis even to make that statement?

## This is something to do with constructivism.

I am not a big fan of constructivism, as many of you will know. I think that the emphasis placed on it by the computer theorem proving community historically has held the area back; it puts off mathematicians (e.g. the 2012 version of me) who have been indoctrinated by a traditional mathematics course which assumes the law of the excluded middle from day one. One problem I have with constructivism, as someone who was classically trained, is that it turns out that sometimes there is more than one way of doing something constructively, and all these ways are the same classically. For example very early on in my Lean career I was surprised to learn that there was function.bijective, the predicate that a function was a bijection, but also there is the concept of a function with a two-sided inverse. As far as I’m concerned these are the same thing, but constructively they’re not, because given a bijection there might not be a “formula” for its inverse. The existence of a two-sided inverse is a true/false statement — but actually having the two-sided inverse is data, and, in constructive mathematics, data can sometimes be hard to come by. The function which takes as input a set/type X and a proof that X is nonempty, and outputs an element of X, is a noncomputable function and its existence, if you think about it, is closely related to the axiom of choice, which is something that the constructivists are not big fans of.

So it turns out that this whole “$V$ is finite-dimensional” thing, which this entire post has assumed all the way through, is a victim of this subtlety. What does it mean for a vector space to be finite-dimensional? The following answers are all the same classically (i.e. in “normal maths”), but constructively they’re all different:

• A proof that $V$ has a finite basis;
• An actual choice of a finite basis;
• An element of the truncation of the set of all pairs $(n,\alpha)$, where $n\in\mathbb{N}$ and $\alpha:V\cong\mathbb{R}^n$ is an isomorphism. Here by the truncation of a set we mean the quotient of the set by the “always true” equivalence relation. Yes, you read that right.

OK so we know what the first two things are. The first statement is just a proof. If your proof is nonconstructive that’s fine, I don’t care. The second thing is data. For me this is a problematic definition of finite-dimensional, precisely because it’s not a proof of a true-false statement. If I am working with a finite-dimensional vector space $V$ in Lean then I might end up having to deal with the fact that if some argument changed $V$‘s basis whilst leaving $V$ alone, I might not have $V = V$ (as finite-dimensional vector spaces) any more, because the data going on behind the scenes saying that $V$ is finite-dimensional might not match up. I have enough problems formalising mathematics in type theory without having to deal with this too.

This brings us to the third definition, involving trunc. OK so if X is a type or set or however you want to set up your mathematics, then, as I mentioned above, trunc X is the quotient of X by the equivalence relation which is always true. In particular if X is nonempty then trunc X has one element/term, and if X is empty then it has no elements/terms. If you’re happy about the concept that propositions can be types, with the idea that a true proposition has one term (its proof), and a false proposition has no terms, then trunc X seems to be basically the proposition that X is nonempty. However it is more than that, because it is data. It is the missing link in our story.

Let’s say that V is a real vector space, and we have a “proof” that it is finite-dimensional in the sense that we have a term t of type trunc X(V), where X(V) is the space of all pairs $(n,\alpha:V\cong\mathbb{R}^n)$. Here’s how to define the trace of an endomorphism $\phi:V\to V$. We’re going to define a map from trunc X(V) to the reals, and the idea is that if you evaluate this map at t then you’ll get the trace of $\phi$. Now to define a map from a quotient, we use the universal property of quotients. First of all we have to define a map from the space we’re quotienting by to the reals. This is easy: given an isomorphism $\alpha:V\to \mathbb{R}^n$ we get an induced map $\alpha_*\phi:\mathbb{R}^n\to\mathbb{R}^n$ and we just take its trace with respect to the standard basis. And secondly, we need to prove that this map descends to the quotient, which boils down to proving a theorem, and the theorem of course turns out to be precisely the statement that the trace of an endomorphism is independent of choice of basis.

Similarly we can define a function from trunc X(V) to the naturals, such that if V is finite-dimensional in the sense that we have t : trunc X(V) then its dimension is what you get by evaluating this function. And det and char poly etc.

Note that for any vector space V, the type trunc X(V) has at most one term — it’s a so-called subsingleton type. Lean’s exact command will be fussy about subsingleton types because it’s a theorem that two terms of a subsingleton type are equal, rather than it being a definitional equality. Hence Lean’s exact tactic still might not work if we’re carrying around t : trunc X(V) as our definition of finite-dimensionality. However the convert tactic will operate fine, because it looks out for subsingletons and applies the appropriate theorem.

## Type-theoretic conclusion

We now seem to have got to the bottom of this. To do this kind of metamathematics — “did we pick a basis?” — we need to think really carefully about what we even mean by the assumption of V being finite-dimensional. My preferred approach, and the one which makes mathematics easier to do in Lean, is to just use the propositional definition “there exists a basis”. This way one never runs into problems with equality, and if one needs a basis one just applies Lean’s version of the axiom of choice. Constructivists would complain that it breaks computation, but I want to prove theorems and I prefer to do computations using tactics rather than trying to persuade the kernel to reduce things. The other approach is to carry around some extra data, and this can lead to problems for beginners with equality being a bit broken, however the experts have learnt ways to deal with this. Ultimately the best choice in a theorem prover will depend on what you actually are doing with your objects, and given that I just want to prove theorems, for me, the bare existence definition of finite-dimensional is enough. However to move from the Prop world to the Type world, e.g. when defining the trace of an endomorphism, one has to at some point do some magic, and the moment where this happens is the moment you picked a basis.

## Induction on equality

Or: how I learnt to stop worrying, and figured out a proof that equality was an equivalence relation.

## What is the definition of = ?

I used to teach a course where I defined the notion of what it meant for two integers to be congruent modulo N. Here N is an integer, and two integers a and b are said to be congruent modulo N if their difference is a multiple of N. For example, 37 and 57 are congruent modulo 10.

I would go on to prove that congruence modulo N is an equivalence relation on the integers. Reminder: an equivalence relation on X is a binary relation on X which is reflexive, symmetric and transitive. The proof I gave never assumed that N was non-zero, and congruence modulo 0 is the same relation as equality, so you might like to deduce from this that equality on the integers is an equivalence relation. Well, tough darts. In the proof that congruence modulo N is an equivalence relation, it turns out that we assume that equality is an equivalence relation, as you can readily check if you type it into a theorem prover.

So how do we prove that equality is an equivalence relation? In my 1st year undergraduate lecture notes this is stated as an “obvious example” of an equivalence relation. Euclid was more cautious — he explicitly noted that he would be assuming transitivity and reflexivity of equality in his common notions 1 and 4, and symmetry followed from his use of language — he would say “this collection of things are equal” e.g. “all right angles are equal”: so equality of two things was a predicate on unordered pairs for him, and symmetry was hence built in.

I also looked at my 3rd year logic and set theory notes, to try and figure out the definition of = , but they were also no help. There is some waffle about “the interpretation in a model is the diagonal of $A \times A$” (which I think might be circular) and how x = y means that the two terms x and y are actually the same term, but don’t get me started about what mathematicians mean when they say two things are “the same”, e.g. “the same proof” or “canonical isomorphism is denoted by =” or all the other highly useful and yet irritatingly non-rigorous stuff we say. Anyway, just “defining” equality to mean another word or phrase which is synonymous with equality isn’t going to get us anywhere. Ultimately my impression is that you might just assume that equality is an equivalence relation, when setting up mathematics via first order logic and ZFC set theory. I’m no logician though — feel free to correct me in the comments. This post is about a different way to approach things, which part of me finds a whole lot more satisfactory.

## A definition of equality.

Lean’s type theory contains a definition of equality! This is great because it means we know where we stand. Here’s the definition:

inductive eq {X : Type} : X → X → Prop
| refl (a : X) : eq a a

infix  = :50 := eq

What does all that baloney mean? Well, this is an inductive definition of a binary relation. Let X be a type or a set or however you want to think of a collection of stuff. The binary relation eq on X takes as input two things in X, call them a and b, and spits out a true-false statement eq a b. We’ll get to the definition in a second, but that last line means “the notation a = b is defined to mean eq a b, with BIDMAS-power 50”. Let’s use the = notation from now on instead of eq, because it’s familiar-looking.

So, let’s get to the heart of the matter: how do we define a = b? Well, Lean’s definition says the following: “there is only one tool we have to prove the equality of two things, and it’s the theorem refl a, which is a proof of the statement a = a. That’s it.” Thanks to Andrej Bauer who on Twitter pointed out that a neat way to think about this definition is: equality is the smallest binary relation which is reflexive. Andrej also tells me that this is Martin-Löf’s definition of equality.

OK great. We have now apparently defined the true-false statement a = b, and we can prove that a = a is true, so this is a good start. But how the heck are we going to go from this to symmetry and transitivity? We’re going to use induction!

## Inductive types: a reminder

In my last post, I waffled on about inductive types. I was going to talk about equality in that post, but it had already got quite long, so I thought I’d deal with equality separately, and so here we are. The take home message from the last post was that if you define something as an inductive type in a type theory like Lean’s, then Lean automatically generates a new axiom in the system called the recursor for that type. This axiom is generated automatically by the rules of the calculus of inductive constructions. For example, if you define the natural numbers inductively using Peano’s axioms, then the recursor is just the statements that induction and recursion are true (propositions are types in Lean, so one function can be interpreted in two ways depending on whether you apply it in the Prop universe or the Type universe). The recursor attached to an inductive type is a way of saying “if you want to do something to every instance of the thing you’re defining, it suffices to do it for all the constructors”, and a special case of it is the “inductor”, which says, in our case, that if you want to deduce something from a = b then all you have to do is make sure you can prove it in the special case a = a. Formally, the inductor for = is the following statement:

eq.inductor (X : Type) (R : X → X → Prop)
(h : ∀ x, R x x) (a b : X) : a = b → R a b

Note the fancy type theory R : X → X → Prop for what a mathematician would call “R is a binary relation on X” (reason: R : X → X → Prop means $R\in \mathrm{Hom}(X,\mathrm{Hom}(X,\{\tt{true},\tt{false}\}))$ in maths speak, i.e., $R$ is a true-false statement attached to every pair of elements of $X$). So, in words, the inductor says that if you have any binary relation R on X such that R x x is true for all x, and if you know that a = b, then R a b is true. That’s the tool we have. Again let’s go back to Andrej’s observation: the definition of equality can be thought of as saying that it is the smallest (or the initial) binary relation which is reflexive, and the inductor then makes this rigorous by saying that any other reflexive binary relation must contain equality.

Summary : we are armed with two things, and two things only. (1) a proof of ∀ a, a = a and (2) a proof that if R is a binary relation on X and ∀ x, R x x is true, then a = b implies R a b.

The game now is to prove that = is symmetric and transitive, without accidentally assuming it! And what better framework to play that game in, than Lean! (because trust me, if you play it on paper, you are so going to mess this one up).

## The equivalence relation game

Insert coin. That’s hard mode, i.e. spoiler-free. Copy and paste the code at the other end of that link into VS Code if you have Lean installed locally if you want a much speedier experience. The game is to prove symm and trans from refl and ind. I would work out the maths proofs first.

If you want no spoilers at all, stop reading now. I usually post some art generated by my children in my blog posts so perhaps now is a good time to do this. If you want to hear more about equality, and in particular have one or more hints and spoilers, read on.

refine is a great tactic. It’s like apply on steroids. You can do everything with intros, apply, refine and exact. You can read about what these tactics do here, in Lean’s documentation. All Lean proofs can be done in just two or three lines.

Life is easier if we have more tools than just ind. The thing about ind is it wants as input a binary relation. A simpler object than a binary relation on X is a subset of X, or equivalently its characteristic function P : X → Prop (sending the elements of the subset to true and the other elements to false). Here’s something which you believe about equality: if a = b and P a is true, then P b is true. This is a much cleaner statement than ind — it’s just as powerful, and more flexible. Logicians call it the substitution property of equality. Applying it is the way Lean’s rewrite or rw tactic works. Why not try to prove it first? It makes the proof of trans much easier — it’s basically a master sword. Here’s the statement. Just cut and paste it before symm. Of course, you’ve got to prove it (using ind and refl).

theorem subst (hab : a ∼ b) (P : X → Prop) : P a → P b :=
begin
sorry
end

## Mathematical solutions

• subst
• symm
• trans

## Lean solutions

• subst
• symm
• trans
Posted in M1F, M40001, Type theory, undergrad maths | Tagged , | 8 Comments

## Induction, and inductive types.

Here’s a conundrum: why does a schoolkid think recursion is a triviality (“the Fibonacci sequence is defined like this: 1,1,2,3,5,8,…, with each number in the sequence the sum of the two before it” — this is a definition very likely to be accepted by a schoolkid) but they might struggle with the principle of mathematical induction (a procedure for generating a sequence of proofs, each one a consequence of the one before it)? I was taught at university that induction and recursion were different things, but Lean has a common abstraction which makes them both instances of the same thing. In this post I’ll talk about induction, and recursion, and also about how something called the Calculus of Inductive Constructions (the way Lean’s inductive command works) provides some rather surprising proofs of basic mathematical facts.

## Defining a type.

It is not unreasonable to think of what Lean calls a type as what a mathematician would usually call a set. For example a mathematician might say “let S be a set with three elements; call the elements a, b and c“. Here’s how to make that set, or type, in Lean:

inductive S : Type
| a : S
| b : S
| c : S

In fact the full names of the elements of S, or the terms of type S as we call them in type theory, are S.a, S.b and S.c, and we might want to open the S namespace so that we can just refer to them as a, b and c. An undergraduate might want to make this kind of definition when they are answering the following kind of question in Lean: “Let $f : X \to Y$ and $g : Y \to Z$ be functions. True or false: if the composition $g \circ f$ is injective, then $g$ is injective”. This is false, and to prove it’s false one can either use the sets we mathematicians have lying around (such as the naturals, integers or reals), or one can just build some explicit sets of small size like S above, and some explicit functions between those sets.

## Defining functions between types.

So here’s what we’re going to do. Let’s make a type X with one term p, a type Y with two terms q and r, and a type Z with one term s. This is easy given what we’ve already seen:

inductive X : Type
| p : X

inductive Y : Type
| q : Y
| r : Y

inductive Z : Type
| s : Z

[By the way, if you want to play along but you haven’t got Lean installed on your computer, you can do all this within a web browser by clicking here (although you could instead click here to find out how to install Lean and the community tools, which give you a far slicker experience).]

Our counterexample is going to be the following: We define f : X → Y by f(p)=q and g : Y → Z by g(q)=g(r)=s. Let’s do this.

open X Y Z

def f : X → Y
| p := q

def g : Y → Z
| q := s
| r := s

As a mathematician I find the use of the | symbol quite intimidating (especially that we are now using it in a different way), but given that I’ve told you what we’re doing and now I’m telling you how we are doing it, you can probably guess what it all means. One can now go ahead and state the result:

open function

example : ¬ (∀ (X Y Z : Type) (f : X → Y) (g : Y → Z),
injective (g ∘ f) → injective g) :=
begin
sorry
end

and now if you fancy proving a mathematical theorem by playing a puzzle game, you can click here to get all the code at once, and have a go. Instead of talking about the proof though, I want to talk about the rather surprising (at least to me) fact that Lean is defining f and g by recursion.

## What happens when we define an inductive type?

What happens under the hood when Lean sees this code

inductive X : Type
| p : X

is quite surprising (at least to me as a mathematician). I’ve been arguing above that we should think of this code as saying “Let $X := \{p\}$ be a set with one element”. But here’s what’s really going on when Lean sees this code. Unsurprisingly, Lean defines a new type X and a new term p (or more precisely X.p) of type X. It also defines one more new thing, which expresses that p is the only element of X. But the way it does this is surprising: it defines the so-called recursor for X, which is the following statement:

X.rec : ∀ {C : X → Sort u}, C p → (∀ (x : X), C x)

Whatever does that mean? Well first I think I’d better explain what this Sort u thing is. I’ve written in the past an explanation of how sets and their elements, and theorems and their proofs, are unified in Lean’s type theory as types and their terms. The sets/elements story goes on in the Type universe, and the theorems/proofs story goes on in the Prop universe. When Lean says Sort u it means “either of these universes”. So we can rewrite X.rec as two statements:

X.recursor : ∀ {C : X → Type}, C p → (∀ (x : X), C x)
X.inductor : ∀ {C : X → Prop}, C p → (∀ (x : X), C x)

The first statement is the principle of recursion for X. In set-theoretic language it says this. “Let’s say that for every element $x$ of $X$ we have a set $C(x)$, and let’s say we have an element of $C(p)$. Then we have a method of constructing an element of $C(x)$ for all $x \in X$.” This looks like a rather long-winded way of saying that $p$ is the only element of $X$. In fact it is worth looking at the special case of X.recursor where C is the constant function sending every element of X to the set S:

X.recursor_constant : ∀ S, S → (X → S)

This says that if S is any set, and we have an element a of S, then we can get a function from X to S. What is unsaid here, but true by definition, is that it’s the function that sends p to S, as can be checked thus:

-- give X.rec the constant function C sending everything to S
def X.recursor_constant : ∀ S, S → (X → S) := λ S, @X.rec (λ x, S)

example (S : Type) (a : S) : (X.recursor_constant S a) p = a :=
begin
-- true by definition
refl
end

Do you remember our definition of f above?

def f : X → Y
| p := q

This function f is defined using X.recursor_constant, letting S be Y and letting the element of S be q. The notation Lean uses is short, but under the hood this is how f is constructed.

So much for recursion. The second statement coming from X.rec is X.inductor, the principle of induction for X. I should probably say that I made up the word “inductor”, but inductor is to induction as recursor is to recursion. In more mathematical language the inductor says this. “Let’s say that for every element x of X we have a true-false statement $C(x)$, and let’s say that $C(p)$ is true. Then $C(x)$ is true for every element x of X.” So again it is just a rather long-winded way of saying that p is the only element of X.

Why have computer scientists isolated these rather elaborate statements as the fundamental way to say that p is the only element of X? It’s actually because of a fundamental symmetry. We have defined a new type X, in a functional programming language, and now the fundamental thing we need to do next is to explain how to define functions into X, and how to define functions out of X. To define functions into X, we need to have access to terms of type X, or in computer science lingo to constructors of X. This is exactly what X.p is — a way to construct a term of type X. To define functions out of X, we need access to eliminators for X, that is, some kind of gadget whose output is a function from X to somewhere else. Because X only has one term, namely p, we need a way of saying “to give a function out of X, we only need to say what happens to p“, and this is exactly what the recursor is doing. Between them, the constructor and recursor say in a formal way that the elements of X are “at least p, and at most p, so are exactly p.”

## Recursors for general inductive types.

Lean automatically generates constructors and a recursor for every type defined with the inductive command. There is a general rule for how to do this, but informally it’s pretty clear. We define inductive types using this | symbol, and you get a constructor for each line with a | in. The eliminator or recursor simply says that to define a function from the new type you’re defining, all you have to do is to make sure you’ve defined it on each constructor.

The rest of this post is the fun part. I will go through a bunch of inductive types defined in Lean, we can look at the definition, figure out the recursor attached to each of the types, and then see what this corresponds to mathematically. We will see some familiar things popping up in surprising ways.

### Example: a set with two elements.

Recall our inductive type Y:

inductive Y : Type
| q : Y
| r : Y

The recursor for Y tells us that if S is a set, then to get a map from Y to S we have to give two elements of S, one corresponding to where q goes and one corresponding to where r goes.

def Y.recursor_constant : ∀ S, S → S → (Y → S) := λ S, @Y.rec (λ y, S)

The full recursor can even be used (with non-constant C) to define a function from Y which sends q into one type and r into a different type, but when defining the function g above we do not need this level of generality. If you want to see what Y‘s recursor looks like, just type #check @Y.rec in a Lean session after the definition of Y, and remember that Π is just computer science for ∀ (in Lean 4 they will be using ∀ instead of Π in fact).

### Example: the natural numbers.

Mathematicians who have seen the development of mathematics in ZFC set theory know that a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers, a rational number is defined to be a multiplicative localisation of the integers at the nonzero integers, an integer is defined to be an additive localisation of the naturals at the naturals, and the naturals are defined by the ZFC axiom of infinity. In Lean’s type theory a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers etc etc etc, and it’s all just the same up to the very end, because in Lean the naturals are defined by Peano’s axioms:

inductive nat : Type
| zero : nat
| succ (n : nat) : nat

This means that we have two ways to make natural numbers. First, zero is a natural number. Second, if n is a natural number, then succ n (usually called n+1 by mathematicians) is a natural number. Now we need a way of expressing the idea that this is the only way to make natural numbers, and this is the recursor, which is automatically generated by Lean, and says a precise version of the following informal thought: “If you want to do something for all naturals, then you need to tell me how to do it for both constructors”. In other words, “…you need to tell me how to do it for zero, and then you have to tell me a way to do it for n+1 assuming we’ve already done it for n“. Sounds familiar?

The recursor in general involves a map to Sort u. Let’s just specialise to the two universes we’re interested in, and take a look at the constant recursor, and the inductor (and let’s use Lean’s notation ℕ for nat):

nat.recursor_constant : ∀ (S : Type), S → (∀ (n : ℕ), S → S) → (ℕ → S)
nat.inductor : ∀ (C : ℕ → Prop), C 0 → (∀ (n : ℕ), C n → C (succ n)) → ∀ (n : ℕ), C n

[The proof of nat.recursor_constant is λ S, @nat.rec (λ n, S) and the proof of nat.inductor is just @nat.rec. ]

The constant recursor says this: if S is a set, and we want to make a function f : ℕ → S, here’s a way of doing it. First we need an element of S (namely f(0)) and second, for each natural number we need a map from S to S (telling us how to make f(n+1) given that we know f(n)).

The inductor says this. Say we have a family C(n) of true-false statements, and that C(0) is true, and that for all n we have a proof that C(n) implies C(n+1). Then we can deduce that C(n) is true for all n.

What I think is really cute about this example is that Peano’s definition of the natural numbers makes it immediately clear why the principle of mathematical induction works. In the natural number game we use the recursor in the background to define addition and multiplication on the naturals. We also use it to prove things which I call “axioms” in the natural number game — for example the proof that 0 is not equal to succ n for any natural number n uses the recursor to define a function from ℕ to $\{0,1\}$ sending 0 to 0 and succ n to 1, and using this function it’s easy to prove the “axiom” zero_ne_succ by contradiction. If you want an exercise, try using nat.recursor_constant to prove injectivity of the succ function, something else I also claimed was an axiom in the natural number game (as Peano did) but which was actually proved using the recursor.

### Example : false

false is a true-false statement, and you can probably guess which one it is. In Lean false is defined as an inductive type! Here’s the full definition:

inductive false : Prop

This time there are no | s at all! Every constructor of false would be a proof of a false statement, so this design decision is not surprising. The recursor is

false.rec : Π (C : Sort u), false → C

In other words, to give a map from false to C you have to define it on all constructors, of which there are none. Let’s take a look at the inductor then, by changing Sort u to Prop:

false.inductor : ∀ (P : Prop), false → P

It says that if P is any true-false statement, then false implies P. This logical tautology has been automatically generated by Lean, because Lean’s model of an implication $Q\implies P$ is a function from proofs of Q to proofs of P, and false has no terms, i.e., no proofs.

There is a similar story with inductive empty : Type, Lean’s definition of the empty type. The recursor for empty says that to give a map from the empty type to any type S, you don’t have to do anything other than feed in S.

### Example : or

The logical or on propositions is defined as an inductive type in Lean!

inductive or (P Q : Prop) : Prop
| inl (hp : P) : or
| inr (hq : Q) : or

There are two constructors for P ∨ Q, where now I’m using the usual logicians’ notation for or. In other words, there are two ways to prove P ∨ Q. First you can prove P, and second you can prove Q. Lean’s auto-generated inductor for this is

or.inductor : ∀ (P Q R : Prop), (P → R) → (Q → R) → (P ∨ Q → R)

In other words, if you can prove $P\implies R$ and you can prove $Q\implies R$, then you can deduce $(P\lor Q)\implies R$. Again no mathematician is surprised that this statement is true, but perhaps some are surprised by the fact that a computer scientist might claim that this is true by induction on or.

### Example : equality.

The = symbol in Lean is defined as an inductive type! But I think I’m going to save the topic of what induction on equality is until the next post, where we will prove, by induction, that equality is an equivalence relation.

## Some final notes.

I was very surprised when I realised that every inductive type came with a principle of induction. In fact one can even define the reals as an inductive type, which means that there will be an inductor for reals meaning that you can do induction on the reals! But when I figured out what the induction principle said I was disappointed — it says “if you can prove it for every real which is an equivalence class of Cauchy sequences of rationals, you can prove it for every real”. Remember that the idea of the recursor is that it is a way of saying “every term of your type can be made using the constructors”, so if your only constructor for a real is an equivalence class of Cauchy sequences of rationals then this is what you get. However these other examples, and in particular these examples coming from logic, are quite funky. An example I didn’t talk about: and is an inductive type and its inductor is ∀ (P Q R : Prop), (P → (Q → R)) → (P ∧ Q → R), which is some propositional version of uncurrying (indeed the constant recursor for prod, the product of two types, is uncurrying on the nose). The basic facts in propositional logic about and and or and are proved constructively in Lean using recursors rather than by truth tables, because directly constructing the functions corresponding to the proofs is more appealing than a case split.

Not everything is an inductive type in Lean — there are two other kinds of types. There are quotient types, which are there for some kind of computer science efficiency reasons and which could be constructed using inductive types, and then there are function types, which are a different kind of thing. I don’t think it’s of mathematical interest whether they type you’re looking at is an inductive type or a function type, but here’s an example of a function type: logical not. In Lean, ¬ P is defined to be P → false. On the other hand most of the structures used by mathematicians (groups, subgroups, rings, fields, perfectoid spaces and so on) are defined as inductive types (often however with one constructor, so their induction principle is boring). An inductive type with one constructor is known as a structure in Lean. You can read more about inductive types and structures in the very wonderful Theorem Proving In Lean, in sections 7 to 9.

In my next post I’ll talk about induction on equality.

My daughter is busy with exam revision, so here’s some old digital art by one of my sons.

Posted in Learning Lean, Type theory | Tagged , | 9 Comments

## Formalising Mathematics : workshop 8 — group cohomology

This is notes on the last of the eight workshops I am running this term as part of an EPSRC Taught Course Centre course. Many thanks to the students for coming, it was very interesting working with you all.

I promised I would do something more ambitious in week 8, and eventually I settled on group cohomology. I usually write these blog posts just before the workshop but instead this week I wrote a README and am only now writing this more detailed document after the event.

## What is group cohomology?

I don’t know the history of group cohomology, but I do know that it’s possible to invent the theory of 1-cocycles in lots of ways (for example when trying to understand what the top right hand corner of a group homomorphism from a group G into 2×2 upper triangular matrices “is”) and so they were bound to show up eventually. The basic question is this: Say we have an abelian group $N$ (group law +) and a subgroup $M$, and let $P:=N/M$ be the quotient group. Say we have an action of a group (group law *) $G$ on $N$, and say $M$ is $G$-stable, so $P$ gets an induced $G$-action too. Now take $G$-invariants of everything and denote this $N\mapsto N^G$. The $G$-invariants $M^G$ of $M$ are still a subgroup of $N^G$, but $P^G$, the $G$-invariant elements of $P$, might be bigger than the image of $N^G$ in $P$. For example if $N=\mathbb{Z}/4\mathbb{Z}$ is the integers mod 4, with $G$ the cyclic group of order 2 acting by sending $x$ to $-x$, then $M:=N^G=\{0,2\}$ is $G$-stable but one checks that $G$ acts trivially on $P$ so the map $N^G\to P^G$ is no longer surjective.

Here is an attempt to “measure” failure of surjectivity. Say $p\in P$ is $G$-invariant. Lift randomly to $n\in N$. Then if $g\in G$ we see that $gn-n$ maps to $gp-p=0$ in $P$ so must be in $M$. Trying this example in the $\mathbb{Z}/4\mathbb{Z}$ case above you can convince yourself that you get a group isomorphism from $G$ to $M$ this way. But in general the map $G\to M$ sending $g$ to $gn-n$ is not a group homomorphism, and is not even “canonical”, as a mathematician would say — it depends on a choice of $n$ lifting $p$. Different choices differ by an element of $m$, and asking whether the function $g\mapsto gn-n$ is of the form $g\mapsto gm-m$ for some $m\in M$ is the same as asking whether our original element $p\in P^G$ lifts to an element of $N^G$.

These ideas synthesized into the following definitions. Say $G$ acts on $M$. The zero-th cohomology group $H^0(G,M)$ is just the subgrup $M^G$ of $G$-invariant elements.

A cocycle for a $G$-action on $M$ is a function $c:G\to M$ such that $c(gh)=c(g)+gc(h)$. A coboundary is a cocycle of the form $g\mapsto gm-m$ for some $m\in M$. The quotient of the cocycles by the coboundaries is called the first cohomology group $H^1(G,M)$.

The construction above shows that if $N$ is a $G$-module with a subgroup $M$ also preserved by $G$ and quotient $P=N/M$ (people write “a short exact sequence of $G$-modules”) then there is a map $H^0(G,P)\to H^1(G,M)$. This actually forms part of a seven term long exact sequence:

$0 \to H^0(G,M)\to H^0(G,N)\to H^0(G,P)\to H^1(G,M)\to H^1(G,N)\to H^1(G,P)$

and our goal in this workshop, at least if we had infinite time, would be to:

• define all the maps in that exact sequence
• prove the sequence is exact
• prove the inflation-restriction sequence is exact
• develop the concrete theory of H^2 via 2-cocycles and 2-coboundaries
• develop the abstract theory of H^n via n-cocycles and n-coboundaries.

My 2018 BSc project student Anca Ciobanu achived the first two of these goals and my 2019 MSc project student Shenyang Wu achieved the last one. So these goals are definitely possible! It will take rather longer than 2 hours though.

This has become a mini-project of mine, and my current thoughts can be seen in the ideas directory of the week 8 folder in the GitHub workshop repository. Ultimately I hope to get this stuff into mathlib (or perhaps to persuade someone else to get it into mathlib for me 😉 )

If you weren’t part of the workshop then you can still do it yourself, all you need is a working Lean and mathlib installation, which you can get following the instructions on the Leanprover community website.

Posted in Uncategorized | 5 Comments

## Formalising Mathematics : workshop 7 — quotients

At university I was told about how to quotient by an equivalence relation. But I was not told the right things about it. This workshop is an attempt to put things right.

## Models

Let me start by talking about things I learnt in my second and third year as an undergraduate. I went to a course called Further Topics in Algebra, lectured by Jim Roseblade, and in it I learnt how to take the tensor product $V\otimes_k W$ of two finite-dimensional $k$-vector spaces. Jim explained to us the universal property of the tensor product, and I saw for the first time in my life the abstract nonsense argument which explains that objects which satisfy the universal property for tensor products are unique up to unique isomorphism. He also explained that writing down the universal property did not count as a definition. The abstract nonsense argument shows that tensor products are unique if they exist. To prove that they exist, Jim wrote down an explicit model, involving taking a quotient of a gigantic free abelian group with basis the pairs $(v,w)$, modulo the relations saying that it is a vector space satisfying the universal property. I came out of these lectures with a good understanding of this technique. Later on in my undergraduate education I met things such as the localisation of a commutative ring at a multiplicative subset, and again I understood that these things were unique up to unique isomorphism if they existed, and that one could write down an explicit model to show they existed.

I came away with the impression that the key fact was the universal property, from which everything else could be proved, and that the model was basically irrelevant. To my surprise, I have learnt more recently that this is not exactly the whole truth. Here is an example, due to Patrick Massot. Let $R$ be a commutative ring, and let $S$ be a multiplicative subset. I claim that the kernel of the map $R\to R[1/S]$ is precisely the annihilator of $S$. Using the universal property and elementary arguments we can reduce to the statement that if $r\in R$ is in the kernel of every ring homomorphism $R\to T$ sending $S$ to units, then $r$ is annihilated by an element of $S$, but as far as I can see, to prove this we have to come up with a cunning $T$, and letting $T$ be the explicit model of $R[1/S]$ constructed as a quotient of $R\times S$ does the job. In particular the model appears again in the argument! Not equipped with the proof that it is the initial object in the category of $R$-algebras in which $S$ is invertible, but equipped with the proof that it is an $R$-algebra in which $S$ is invertible. My MSc student Amelia Livingston came up with other examples of this as part of her work on Koszul complexes in Lean. But I digress. Let’s get on to what we’ll talk about today.

## What is a quotient?

At university, in my first year, I was taught the following construction. If $X$ is a set and $\sim$ is an equivalence relation on $X$, then one can define the set $Q$ of equivalence classes for this equivalence relation. There is a natural map from $X$ to $Q$ sending $x\in X$ to its equivalence class. We write $Q=X/\sim$.

All the way through my undergraduate career, when taking quotients, I imagined that this was what was going on. For example when forming quotient groups, or quotient vector spaces, or later on in my life quotient rings and quotient modules, I imagined the elements of the quotient to be sets. I would occasionally look at elements of elements of a quotient set, something rarely done in other situations. I would define functions from a quotient set to somewhere else by choosing a random representative, saying where the representative went, and then proving that the construction was ultimately independent of the choice of representative and hence “well-defined”. This was always the point of view presented to me.

I have only relatively recently learnt that actually, this model of a quotient as a set of equivalence classes is nothing more than that — it’s just a model.

## The universal property of quotients

Here’s the universal property. Say $X$ is a set equipped with an equivalence relation. A quotient for this data is a pair consisting of a set $Q$ and a function $p:X\to Q$ which is constant on equivalence classes (i.e. $x_1\sim x_2\implies p(x_1)=p(x_2)$) and which is furthermore initial with respect to that property. In other words, if $T$ is any other set and $f:X\to T$ is any function which is constant on equivalence classes, there exists a unique $g:Q\to T$ such that $f=g\circ p$. The usual abstract nonsense argument shows that quotients are unique up to unique isomorphism.

Example 1) Let $X$ be a set equipped with an equivalence relation $\sim$, let $Q$ be the set of equivalence classes of $X$, equipped with the map $p:X\to Q$ sending an element of $X$ to its equivalence class. Then $(Q,p)$ is a quotient of $X$ by $\sim$.

Example 2) Let $X$ and $Y$ be any sets, and say $p:X\to Y$ is a surjection. Define an equivalence relation on $X$ by $x_1\sim x_2\iff p(x_1)=p(x_2)$. Then $(Y, p)$ is a quotient of $X$ by $\sim$.

Example 2 shows that this construction of quotients using equivalence classes is nothing more than a model, and that there are plenty of other sets which show up naturally and which are not sets of equivalence classes but which are quotients anyway. The important point is the universal property. In contrast to localisation of rings, I know of no theorem about quotients for which the “equivalence class” model helps in the proof. The only purposes I can see for this “equivalence class” model now are (1) it supplies a proof that quotients do actually exist and (2) a psychological one, providing a “model” for the quotient.

I have had to teach quotients before, and students find them hard. I think they find them hard because some just basically find it hard to handle this whole “set whose elements are sets” thing. Hence even though the psychological reason was ultimately useful for me, and eventually I “got the hang of” quotients, I do wonder what we should be doing about the people who never master them. An alternative approach in our teaching is to push the universal property angle. I have never tried this. It might turn out even worse!

## A worked example : maps from a quotient.

Here is the mantra which we hear as undergraduates and thus go on to feed our own undergraduates. The situation is this: we have some quotient object $Q$ (e.g. a quotient ring or a quotient group or whatever) and we want to define a map $g$ from this quotient to some other thing $Z$.

### “Set of sets” variant:

The argument goes something like this:

“Recall that $Q$ is a quotient, so its elements are really equivalence classes. We want to define a map from $Q$ to $Z$, so let’s choose $q\in Q$. Now remember that really $q$ is actually an equivalence class. Choose an element $x$ of this equivalence class, and now apply a construction which seems to depend on $x$, giving us an element of $Z$ [Note that this construction is just a function from $X$ to $Z$, so let’s call it $f$]. That is our map from $Q$ to $Z$. But now we need to check that this map is well-defined. This means the following: during this construction we did something “non-canonical”, namely choosing a random element $x$ of $q$. We need to check that our “function” from $Q$ to $Z$ is in fact independent of this choice. So say that instead we had chosen $y\in q$. Then $x$ and $y$ are in the same equivalence class, so they are equivalent. Now an explicit computation shows that $f(x)=f(y)$ and hence we’re OK — our function is well-defined.”

Is the student left wondering what the heck it means for a function to be “not well-defined”? How come nobody ever talks about any other kind of function before being “well-defined”? I thought the axiom of choice said that we can choose an element in each equivalence class all at the same time. How come we can’t just define $Q\to Z$ by taking $q$, using our axiom of choice element $x\in q$ and sending $q$ to $f(x)$? Is that “well-defined”?

### Universal property variant:

The argument now looks like this.

“Recall that $Q$ is a quotient, so it satisfies the universal property of quotients. Recall that this says that to give a map from $Q$ to another set $Z$, all we have to do is to give a function $f:X\to Z$ which is constant on equivalence classes; the universal property then gives us a unique $g:Q\to Z$ such that $f=g\circ p$. So let’s define $f$ like this [and now define $f$], and now let’s check it’s constant on equivalence classes [the same calculation as before]. The universal property thus gives us the function $g:Q\to Z$ which we require.”

Is that the way to teach this stuff to undergraduates? Is it more confusing, less confusing, or maybe just differently confusing?

## Today’s workshop

Lean has quotients of equivalence relations built in. This is not particularly necessary; it is an implementation decision which did not have to be done like this. One can certainly make quotients as types of equivalence classes (and indeed this has been done in mathlib, with the theory of partitions). However Lean also has an opaque quotient function, which creates another model of a quotient; we don’t know what the elements are, but we know the universal property and this is all we need.

Today we will learn about quotients by working through the explicit construction of the integers as the quotient of $\mathbb{N}^2$ by the equivalence relation $(a,b)\sim (c,d)\iff a+d=c+b$, and a proof that it is a commutative ring. We will go on to play around a bit with the universal property of quotients, and finish by using abstract nonsense to construct a bijection from our quotient to Lean’s $\mathbb{Z}$.

There is far far too much material to do in one 2-hour workshop, but I was on a roll. As ever, the material is here, in src/week_7.

## Appendix: why not just use equivalence classes?

Mathematicians use quotients everywhere, so it’s kind of interesting that they have their own type, but why not just use the type of equivalence classes? Lean can make that type, and it’s possible to prove all the API for it — I’ve done it. So why explicitly extend Lean’s type theory to add a new quotient type? The answer seems to be this. The universal property for quotients is that if $p:X\to Q$ is a quotient of $X$ by an equivalence relation $\sim$, then we have a bijection between the functions $g:Q\to T$ and the $\sim$-equivariant functions $f:X\to T$. To build $f$ from $g$ is easy — just compose with $p$. To go the other way Lean has a function called quotient.lift f h, which spits out g given f and a proof h that f is $\sim$-equivariant (i.e. constant on equivalence classes). The claim that these constructions are inverse bijections boils down to the assertion that f = (quotient.lift f h) ∘ p, and the proof of this, remarkably, is rfl — it’s true by definition. This is what the baked-in quotient construction buys you. I used to think that this was really important (and indeed in the past I have claimed that this is a key advantage which Lean has over Coq). Now I am not so sure. It probably makes proofs a bit slicker occasionally — but nowadays I am less convinced by the idea of definitional equality in general — I’m happy to rewrite.

## Formalising Mathematics : workshop 6 — limits

I talked about limits of sequences of real numbers in workshop 3, but this week I come back to them, because this week we’ll see that the usual ten-line proof that if $(a_n)_n\to\ell$ and $(b_n)_n\to m$ then $(a_nb_n)_n\to \ell m$ can be replaced by a two-line proof involving something called filter.tendsto_mul. So this week I’m going to talk about tendsto, but first I think it’s worth refreshing our memories about the useful mental picture of a filter as a generalised subset.

## Reminder: filters are generalised subsets

Let X be a type (you can call it a set if you like). The type of subsets of X has a bunch of nice structure — there’s a partial order $\subseteq$, there are unions and intersections (both finite and infinite) and they satisfy a bunch of axioms. Back in the days of Isaac Newton, one particularly well-studied type was the type of real numbers. However, people had not quite worked out whether infinitesimals existed — infinitely small non-zero real numbers called things like $dy$ and $dx$ — and some people like Leibniz wanted to divide one of them by the other because they were discovering some new-fangled thing called calculus. By the 20th century, the experts had made their decision: there were no infinitesimally small nonzero real numbers, and that’s what the textbooks say today (other than Robinson’s book on non-standard analysis, but nobody uses that for teaching undergraduate calculus). However it was equally clear that infinitesimals provided a good picture.

A filter on X is a kind of “generalised subset” of X. Each subset S of X gives you a filter, called the principal filter 𝓟 S, and there are other filters too corresponding to slightly weirder things. For example, if X = ℝ then there’s a filter called 𝓝 0, the neighbourhood filter of 0, which should be thought of as containing 0 and all the infinitesimally small numbers. Just like usual subsets, these generalised subsets have a partial order, which we’ll call $\leq$, extending the partial order on usual subsets. In reality these filters are defined completely rigorously as some collection of usual subsets satisfying some axioms but we won’t go into these this week, we’ll just stick with the picture.

## Pushing forward and pulling back subsets

Let’s stick with “usual” subsets for this section, but let’s throw in a second type Y and a function f : X → Y. The function gives us some kind of dynamics in the system — we can start using the function to move sets around. The most obvious way that a function can move a set around is via the image construction. Given a subset S of X, we can consider what a mathematician would call $f(S)$, the image of S in Y, defined as the set of y in Y such that there exists x in S with f x = y. This is an abuse of notation — the inputs to f are supposed to be elements of X, not subsets of X, so f X does not make sense, and in Lean we carefully differentiate between these ideas by writing f '' S for the image of S in Y. We call this “pushing forward a subset along f“.

Conversely, if T : set Y then there is a way of “pulling T back along f” to give us a subset $f^{-1}(T)$ of X, consisting of the x in X such that f x is in T. Again Lean has a weird notation for this, because ⁻¹ is taken by a general kind of inverse function on a group. So we write f⁻¹' T for this construction.

If set X denotes the type of subsets of X, then f : X → Y gives rise to functions f '' : set X → set Y and f⁻¹' : set Y → set X. Are these functions inverse to one another? No, not remotely! In general, doing one then the other won’t get you back to where you started. So what is the relationship between these two constructions? The fancy answer is that they form a Galois connection, and the even fancier answer is that they are a pair of adjoint functors. But let’s not go into this. Let’s talk about a fundamental predicate.

Let’s stay with the set-up: X and Y are types, and f : X → Y is a function. Say S is a subset of X and T is a subset of Y. Then we can ask ourselves the following true-false question: does f restrict to a function from S to T? In other words, is it true that x ∈ S implies f x ∈ T? Perhaps a good notation for this idea would be something like $S\leq_f T$. The reason this notation is appealing is that if $S'\subseteq S\leq_f T$ then $S'\leq_f T$, and if $S\leq_f T\subseteq T'$ then $S\leq_f T'$, and this feels like some kind of transitivity statement, but it isn’t literally transitivity of some relation on a type, because S and T don’t in general have the same type — they’re subsets of different sets. How can we restate $\leq_f$ using pushforwards or pullbacks?

If you think about it, it turns out that there is a way to state this relation using pushforwards, and an equivalent way using pullbacks. One can check easily that $S\leq_f T$ is equivalent to f '' S ⊆ T and also to S ⊆ f⁻¹' T. In particular f '' S ⊆ T and S ⊆ f⁻¹' T are equivalent to each other (and we have proved that the functors are adjoint, for those of you who know this category-theoretic language).

Our aim is to find analogous constructions for filters.

## Pushing forward and pulling back generalised subsets

Jordan Ellenberg on Twitter remarked that something I said last week reminded him of a fact about ideals. I really like this idea because if we’re going to think in pictures then it helps to understand analogies. When people were trying to understand factorization in integral domains such as the integers, they have to deal with the fact that $6=2\times3$ and also $6=(-2)\times(-3)$ but that really these are “the same factorization”. This leads us to an equivalence relation on elements of an integral domain — two elements are equivalent if one is a unit times the other, where the units are divisors of 1. Equivalence classes of elements are the same as principal ideals of the ring, and one might have hoped that this idea solves all your factorization problems. But rings like $\mathbb{Z}[\sqrt{-5}]$ teach us otherwise — now $6=2\times 3=(1+\sqrt{-5})\times(1-\sqrt{-5})$ and these factorizations are still not equivalent. The fix was to introduce some magic “ideal numbers”, or “ideals” for short, which are not really numbers, but some kind of generalised number, and now every non-zero generalised number in $\mathbb{Z}[\sqrt{-5}]$ factors uniquely into prime generalised numbers. The reason I am bringing this up is that it is not difficult to check that every ideal of a commutative ring is uniquely determined by the principal ideals which it contains (because it is uniquely determined by the elements it contains).

Filters, a.k.a. generalised subsets, have the reverse property: every filter is uniquely determined by the principal filters which contain it. This is an extensionality lemma for filters, and it is this idea which we need to keep in mind when we try to figure out how to push forward and pull back filters.

As ever, say X and Y are types, and f : X → Y is a function. Pushing forward filters along f (called map f : filter X → filter Y in Lean) is barely any harder than pushing forward subsets. Say F : filter X is a filter on X. Let’s figure out how to define its pushforward map f F, a filter on Y. By the remark above, it suffices to figure out which subsets T of Y, or more precisely which principal filters T, satisfy map f F ≤ T. If we want our intuition to be correct, this should be the case precisely when F ≤ f⁻¹' T, because this feels exactly like the situation $F \leq_f T$ studied above. Hence we will define the pushforward map f F of filter F along f by saying that map f F ≤ T if and only if F ≤ f⁻¹' T, and one can check that this definition (of (map f F).sets) satisfies the axioms of a filter. This is one of the things you’ll be proving today in workshop 6.

Pulling back filters, called comap f : filter Y → filter X, is harder, because if G : filter Y then this time we want to solve comap f G ≤ S and the adjoint functor nonsense above only tells us about S ≤ comap f G, which is not enough information: for example the only nonempty subset of ℝ contained in the infinitesimal neighbourhood filter 𝓝 0 is the subset {0}, but 𝓝 0 is strictly larger than the principal filter 𝓟 0 because it also contains Leibniz’s elements $dx$. The one question we can answer heuristically however, is a criterion for comap f G ≤ f⁻¹' T, because if our mental model of comap f is “generalised f⁻¹'” then G ≤ T should imply this. The problem with just restricting to these Ts is that if f is not injective then we can never distinguish between distinct elements of X mapping to the same element of Y, and yet if comap f G ≤ f⁻¹' T and f⁻¹' T ≤ S then we certainly want comap f G ≤ S. So this is what we go for: comap f G ≤ S if and only if there exists a subset T of Y such that G ≤ T and f⁻¹' T ≤ S. It turns out that there does exist a filter on X satisfying these inequalities, and this is our pullback filter.

One can check, in exact analogy to pushing forward and pulling back subsets, that map f F ≤ G ↔ F ≤ comap f G — indeed, this is the boss level of Part A of today’s workshop.

## filter.tendsto — what and why?

The picture behind filter.tendsto f is easy, indeed we have seen it before for sets. The idea is simply that if f : X → Y as usual, and F : filter X and G : filter Y then we can ask whether f restricts to a map from the generalised set F to the generalised set G. This true-false statement is called tendsto f F G in Lean. It is equivalent to map f F ≤ G and to F ≤ comap f G, and it seems to be pronounced “F tends to G along f“, although this notation does not often get pronounced because it seems to be rather rare to see it in the literature. It is used absolutely all over the place in Lean’s treatment of topological spaces and continuity, and it’s based on an analogous formalisation of topology in the older Isabelle/HOL proof system.

Why is this notion useful? Here is a beautiful example. Let us say now that X and Y are topological spaces, and f : X → Y is an arbitrary function. What does it mean to say that f is continuous at x, an element of X? The idea is that if x' is an element of X very close to x, then f x' should be very close to f x. How close? Let’s just say infinitesimally close. The idea is that f should send elements of X infinitesimally close to x to elements infinitesimally close to f x. In other words,

tendsto f (𝓝 x) (𝓝 (f x)).

It’s as simple as that. We want f to map an infinitesimal neighbourhood of x to an infinitesimal neighbourhood of f x. We say that f is continuous if it is continuous at x for all x in X. One can check that this is equivalent to the usual definition of continuity. The proof that a composite of continuous functions is continuous is just as easy using this filter language as it is in the usual open set language, and certain other proofs become easier to formalise using this point of view.

But the real power of the tendsto predicate is that it does not just give us a new way of thinking about continuity of a function at a point. It also gives us a new way of thinking about limits of functions, limits of sequences of real numbers, limits in metric space, and more generally essentially any kind of limit that comes up in an undergraduate degree — these concepts are all unified by the tendsto predicate.

The second Lean file in today’s workshop consists of an analysis of a two-line proof of the fact that the limit of the product of two real sequences is the product of the limits. Let me finish by talking about these two lines.

The first line observes that the definition is_limit a l of week 3 (the traditional $\forall \epsilon>0, \exists N, n\geq N\implies |a(n)-l|<\epsilon$) is equivalent to the rather slicker-looking tendsto a cofinite (𝓝 l), where cofinite is the cofinite filter (defined as the generalised subset C of ℕ such that C ≤ S if and only if S is cofinite). It is straightforward to see that these two predicates are equivalent: tendsto a cofinite (𝓝 l) is equivalent to saying that any subset of the reals whose interior contains l has pre-image under a a cofinite subset of the naturals, which is equivalent to saying that for any $\epsilon>0$, the number of $n$ such that $|a(n)-l|\geq\epsilon$ is finite.

The second line comes from the following standard fact about neighbourhood filters (which we will prove): if Y is a topological space equipped with a continuous multiplication, if A is any type, and if f and g are functions from A to Y and F is any filter on A, then tendsto f F (𝓝 y) and tendsto g F (𝓝 z) together imply tendsto (f * g) F (𝓝 (y * z)), where f * g denotes the map from A to Y sending a to f a * g a. We prove this result from first principles in the file, although of course it is in Lean’s maths library already, as is the fact that multiplication on the real number is continuous, which is why we can give a complete two-line proof of the fact that limit of product is product of limits in Lean.

## Formalising Mathematics : workshop 5 — filters

The word “filter” means different things to different people; just to clarify, this week we’ll be learning about the mathematical notion of a filter on a set. I’ve written about these filters before, but since then I’ve managed to pick up a much better understanding of how to think about filters, and I hope this shows here. When I wrote that post in 2018 I knew that filters were “something to do with limits”, but now I realise that this is wrong. They are used to talk about limits, but what a filter itself is, is simply a generalisation of a subset of a set.

## What is a filter?

Let X be a type, i.e. what most mathematicians call a set. Then X has subsets, and the collection of all subsets of X has some really nice properties — you can take arbitrary unions and intersections, for example, and if you order subsets of X by inclusion then these constructions can be thought of as sups and infs and satisfy a bunch of axioms which one might expect sups and infs to satisfy, for example if $X_i\subseteq Y$ for all $i$ in an index set then $\bigcup_i X_i \subseteq Y$. In short, the subsets of a set form what is known in order theory as a complete lattice.

A filter can be thought of as a kind of generalised subset of X. Every subset S of X gives rise to a filter on X, called the principal filter 𝓟 S associated to S, and we have 𝓟 S = 𝓟 T if and only if S = T. However if X is infinite then there are other, nonprincipal, filters F on X, which are slightly vaguer objects. However, filters still have an ordering on them, written F ≤ G, and it is true that S ⊆ T ↔ 𝓟 S ≤ 𝓟 T (indeed we’ll be proving this today). To give an example of a filter which is not principal, let’s let X be the real numbers. Then for a real number x there is a filter 𝓝 x, called the neighbourhood filter of x, with the property that if U is any open subset of $\mathbb{R}$ containing x then 𝓟 {x} < 𝓝 x < 𝓟 U. In other words, 𝓝 x is some kind of “infinitesimal neighbourhood of x“, strictly bigger than {x} but strictly smaller than every open neighbourhood of x. This is a concept which cannot be formalised using sets alone, but can be formalised using filters.

## The formal definition of a filter.

Let me motivate the definition before I give it. Say F is a filter. Let’s define F.sets to be the subsets of X which contain F, i.e., the S such that F ≤ 𝓟 S . Here is a property of filters which I have not yet mentioned: If two filters F and G satisfy F.sets = G.sets, then F = G; in other words, a filter is determined by the principal filters which contain it. This motivates the following definition: why not define a filter F to be the set of subsets of X which contain it? We will need some axioms — what are reasonable axioms? We don’t want a filter to be bigger than X itself, and we want to make sure that if S contains F then T contains F for any T ⊇ S; finally if both S and T contain F then we want S ∩ T to contain F. That’s the definition of a filter!

structure filter (α : Type*) :=
(sets                   : set (set α))
(univ_sets              : set.univ ∈ sets)
(sets_of_superset {x y} : x ∈ sets → x ⊆ y → y ∈ sets)
(inter_sets {x y}       : x ∈ sets → y ∈ sets → x ∩ y ∈ sets)


A filter on X, or, as Lean would like to call it, a term F : filter X of type filter X, is a collection F.sets of subsets of X satisfying the three axioms mentioned above. That’s it. Unravelling the definitions, we see that a sensible definition of F ≤ G is that G.sets ⊆ F.sets, because we want G ⊆ S to imply F ⊆ S (or, more precisely, we want G ≤ 𝓟 S to imply F ≤ 𝓟 S).

It’s probably finally worth mentioning that in Bourbaki, where this concept was first introduced, they have an extra axiom on their filters — they do not allow 𝓟 ∅ to be a filter — the empty set is not a generalised set. In this optic this looks like a very strange decision, and this extra axiom was dropped in Lean. Indeed, we bless 𝓟 ∅ with a special name — it is ⊥, the unique smallest filter under our ≤ ordering. The (small) advantage of the Bourbaki convention is that an ultrafilter can be defined to literally be a minimal element in the type of all filters, rather than a minimal element in the type of all filters other than ⊥. This would be analogous to not allowing a ring R to be an ideal of itself, so one can define maximal ideals of a ring to be the maximal elements in the set of all ideals of the ring. However this convention for ideals would hugely break the functoriality of ideals, for example the image of an ideal along a ring homomorphism might not be an ideal any more, the sum of two ideals might not be an ideal, and so on. Similarly, we allow ⊥ to be a filter in Lean, because it enables us to take the intersection of filters, pull filters back and so on — it gives a far more functorial definition.

## What’s in today’s workshop?

The material this week is in week_5 of the formalising-mathematics GitHub repo which you can download locally if you have leanproject installed or, if you have the patience of a saint and don’t mind missing some of the bells and whistles, you can try online (Part A, and Part B). NB all this infrastructure didn’t just appear by magic, I wrote the code in the repo but I had nothing to do with all these other tricks to make it easier for mathematicians to use — we have a lot to thank people like Patrick Massot and Bryan Gin-ge Chen for.

In Part A we start by defining principal filters and we make a basic API for them. I give a couple more examples of filters too, for example the cofinite filter C on X, which is all the subsets of X whose complement is finite. This filter is worth dwelling on. It corresponds to a generic “every element of X apart from perhaps finitely many” subset of X, perhaps analogous to a generic point in algebraic geometry. However, there exists no element a of X such that 𝓟 {a} ≤ C, because X - {a} is a cofinite subset not containing a. In particular, thinking of filters as generalised subsets again, we note that whilst a generalised set is determined by the sets containing it, it is definitely not determined by the sets it contains: indeed, C contains no nonempty sets at all.

In Part B we go on to do some topology. We define neighbourhood filters and cluster points, and then talk about a definition of compactness which doesn’t involve open sets at all, but instead involves filters. I am still trying to internalise this definition, which is the following:

def is_compact (S : set X) := ∀ ⦃F⦄ [ne_bot F], F ≤ 𝓟 S → ∃ a  ∈ S, cluster_pt a F


In words, a subset S of a topological space is compact if every generalised non-empty subset F of S has closure containing a point of S.

Let’s think about an example here. Let’s stick to S = X. Say S is an infinite discrete topological space. Then the cofinite filter is a filter on S which has no cluster points at all, meaning that an infinite discrete topological space is not compact. Similarly imagine S is the semi-open interval $(0,1]$. Then the filter of neighbourhoods of zero in $\mathbb{R}$, restricted to this subset (i.e. just intersect all the sets in the filter with $(0,1]$), again has no cluster points, so this space is not compact either. Finally let’s consider $\mathbb{R}$ itself. Then the at_top filter, which we will think about in Part A, consists of all subsets $T$ of $\mathbb{R}$ for which there exists some $r\in\mathbb{R}$ such that $(r,\infty)\subseteq T$. This “neighbourhood of $+\infty$” filter has no cluster points in $\mathbb{R}$ (note that $+\infty$ would be a cluster point, but it’s not a real number). Hence $\mathbb{R}$ is not compact either. We have certainly not proved here that this definition of compact is mathematically equivalent to the usual one, but it is, and if you’re interested, and you’ve learnt Lean’s language, you can just go and read the proof for yourself in Lean’s maths library.

The boss level this week is, again, that a closed subspace of a compact space is compact. But this time we prove it with filters. As last time, we prove something slightly more general: if X is any topological space, and if S is a compact subset and C is a closed subset, then S ∩ C is compact. Here’s the proof. Say F is a nonempty generalised subset (i.e. a filter) contained in S ∩ C. By compactness of S, F has a cluster point a in S. But F is contained in C, so all cluster points of F are cluster points of C, and the cluster points of C are just the closure of C, which is C again. Hence a is the element of S ∩ C which we seek. No covers, no finite subcovers.

| Tagged , | 5 Comments

## Formalising Mathematics : workshop 4 — topology

OK, an overview of this week: we’re doing topology. I was going to introduce filters but I decided to put them off for one more week, so this week it’s topology the way it is traditionally taught in mathematics departments. The two main goals are:

1. Continuous image of a compact space is compact;
2. Closed subspace of a compact space is compact.

As I say, we will be doing the traditional approach, so the thing we will use about a compact space is that every open cover has a finite subcover.

Here then are sketches of the proofs, which should be enough for a mathematician:

Theorem. If S is a compact space, if f is a continuous function defined on S, and if T=f(S), then T is compact.

Proof. Say we have an open cover of T by sets $U_i$; we seek a finite subcover. Let $V_i$ denote the preimage of $U_i$ under f. Then the $V_i$ are open and cover $S$. Choose a finite subcover of $V$‘s; then the corresponding $U$‘s cover f(S). QED.

Theorem. A closed subspace of a compact space is compact.

Proof. Say S is compact and C is a closed subset. Then S \ C is open. Cover C by opens $U_i$; we seek a finite subcover. Now the $U_i$ and S \ C are opens which cover S, so we can choose a finite subcover consisting of finitely many of the $U_i$ and S \ C which cover S and, a forteriori, cover C. But S \ C doesn’t contribute to this cover, so we have found our finite subcover of C. QED.

The main goal this week is to formalise these proofs. The formalisations are mostly straightforward, given what we know, but may well take a beginner some time. I don’t know if it’s helpful or disspiriting to hear that when writing up the solutions I just sat down and worked straight through these in 5-10 minutes.

Next week, we will be doing topology again, but in a completely different way, using filters. One thing about the filter proofs is that they are shorter to formalise than the traditional proofs. But we’ll talk about filters next week, let’s stick to this week for now.

In order to formalise these traditional proofs, it will help us to know some more API (i.e. theorems in the library, for example “a set is closed if and only if its complement is open”), and a couple more tactics. We will also need to know about the ways Lean thinks about arbitrary unions (i.e., unions of possibly infinitely many things). We have (very briefly) seen unions of two things, back in week 1, but here we need to consider arbitrary covers by open sets, so the union of two things isn’t enough. So let’s talk about how Lean does unions. And before we do that, let’s remind ourselves about how sets work.

## Sets in Lean.

Lean’s concept of an abstract “collection of stuff” is a type, rather than a set. When a mathematician writes “a group is a set equipped with a binary operation…”, what they really mean is “a group is a collection of things equipped with a binary operation…”. In type theory, a collection of things is known as a type, not a set, and the things themselves are known as terms of the type, not elements of the set. This is nothing more than a change in nomenclature really. So, if Lean uses type theory and not set theory, then why are we talking about sets at all?

Well, even though Lean doesn’t have the abstract concept of a set as a random collection of stuff, it does have the concept of a subset of a type. This means exactly what you think it means: for example, the natural numbers are a type, and the even numbers, prime numbers, counterexamples to the Goldbach conjecture etc are all examples of subsets of this type. If we have a collection of subsets of a type then we can take unions and intersections and so on, to make more subsets of that type. We can make things like the even numbers and the prime numbers into types, and then these would be called subtypes rather than subsets, but we would then lose the ability to take unions and intersections.

In some sense, set theory is more powerful than type theory, because in set theory you can just go crazy and take unions of random things. If $G$ is a group, then you can take the union of $G$, the real number $\pi$, and the Riemann zeta function, because all of these objects are sets. This is something which you cannot do in type theory. Whether you think this makes type theory less powerful than set theory or a lot saner than set theory is a matter for your own conscience, but we will not be discussing these foundational issues here. We will be discussing topological spaces, and if X is a type then a topological space structure on X and the axioms which come with it fortunately only talk about unions and intersections of subsets of X, so we will not need to take the union of X with other random mathematical things.

The type of subsets of X is called set X. Why not subset X? I guess what’s going on is that if S : set X (i.e. S is a term of type set X, otherwise known as a subset of X) then S is a set of elements of X. In fact here’s another reason. A topology on X is a set of subsets of X called the open sets, satisfying some axioms, and so in Lean the data of a topology is given by a term C : set (set X). Humans typically say “a set of subsets of X“. If we’d called it subset X then C : subset (subset X) would be a subset of the type of subsets of X, which is more of a mouthful. Anyway, set X it is; let’s continue.

A word of warning: in type theory, terms can have exactly one type. If x is a term and x : A and x : B (i.e. x has type A and type B) then necessarily A = B. So this presents us with a difficulty. If X : Type is a type and S : set X is a subset, and if x : X is a term of type X which is an element of the subset S, then surely we want x : S (representing the fact that x is an element of S), and yet I’ve just explained that this is not allowed unless S = X, which of course will not in general be true. But in fact even more is false — the assertion x : S does not even make sense, because S is not a type! It is a term, of type set X. And S = X does not make sense either! The subset of X corresponding to all the terms of X is called univ : set X. So how the heck are we going to express the true-false statement that x is an element of S? Well, S is a set, not a type, so we can just use set notation! x ∈ S is the assertion that the term x : X is an element of the subset S : set X.

## Three types of union.

There are three ways to do unions in Lean! Four if you count binary unions — if S T : set X then the usual S ∪ T notation is the union of S and T. But here we are concerned with arbitrary unions — the union of a collection of subsets of X.

### Union

Let’s start with set.Union. This is a function: let’s take a look at its type. We can do this by typing #check set.Union in Lean and then hovering over set.Union. We see

set.Union : (ι → set X) → set X


Here X and ι are types. So we see that set.Union takes in a function from ι to the subsets of X (let’s call this function F), and it outputs a subset of X; this subset is of course the union of the F i as i runs through the terms of ι, but we can’t see that from the type alone, we would have to use #print set.Union to see this, or we can just read the docstring, which is also visible when you hover over set.Union. If you #check @set.Union (note the @, which means “display all the inputs which Lean will figure out by itself”) you’ll see something like this:

set.Union : Π {X : Type v} {ι : Sort x}, (ι → set X) → set X


telling us that the first two inputs are in squiggly brackets so Lean will not be asking the user for them, it will be figuring out them itself by looking at the type (i.e., the domain and range) of the input F. Technical note: that Sort means that ι can, as well as a Type, also be a Prop. More on this later.

Now set.Union is a definition, so it needs an interface, or an API. What do we need to know about this definition in order to use it? Well, in some sense, we only need to know one theorem: we need to know when is an element in the union of a bunch of sets. This is

set.mem_Union : x ∈ set.Union F ↔ ∃ (i : ι), x ∈ F i


If, like me, you are sick of typing set in front of every function or theorem about sets, we can just open set at the beginning of our file, and then this theorem is magically just called mem_Union, and the union of the sets indexed by the function F is just called Union F. Actually we can do better — there is notation for Union. If F : (ι → set X) then Union F can be written as ⋃ (i : ι), F i. You will find that the most common usage of mem_Union is rewriting x ∈ ⋃ (i : ι), F i into ∃ (i : ι), x ∈ F i, which you can then make progress with using cases (if it’s a hypothesis) or use (if it’s a goal).

### sUnion

The next kind of union I’ll talk about (although we won’t use it this week so you can skip straight to bUnion if you like) is set.sUnion, or just sUnion if you have the set namespace open. Here our arbitrary collection of subsets of X is just a set of them, rather than a collection indexed by a type ι. We’ve seen this before — in Lean this looks like C : set (set X) — a set of sets of elements of X, or a set of subsets of X if you like. An example would be the open sets in a topology on X, or, as we’ll see next week, the sets which comprise a filter on X. We take the union of these sets using set.sUnion.

sUnion : set (set X) → set X


Again this a definition (it has an input {X} as well, but Lean will figure it out once it sees C), again it has notation (⋃₀ C is sUnion C) and again a definition needs an API, the key theorem of which is

mem_sUnion : x ∈ ⋃₀ C ↔ ∃ (S : set X) (H : S ∈ C), x ∈ S


In words — a term x : X is in the union of all the sets of C if and only there exists a subset S of X which is in C and for which x ∈ S.

### bUnion

There is a third kind of union, which is actually quite helpful in practice. This is when we have an index type ι and F : ι → set X, but we only want to take the union of F i for all i in a subset Z of ι. There seems to be no definition of set.bUnion itself, but we have notation ⋃ i ∈ Z, F i and of course the key theorem:

theorem mem_bUnion_iff {Z : set ι} {F : ι → set X} {x : X} :
x ∈ (⋃ i ∈ Z, F i) ↔ ∃ i ∈ Z, x ∈ F i


Note that for some reason it’s called mem_bUnion_iff rather than mem_bUnion, which is odd because Lean is usually extremely anally retentive about consistency like this. It turns out that mem_bUnion is just the right to left implication of this.

We finish with a technical note, which can be omitted on first reading. This bUnion stuff is actually just a union over a union, once you realise that in type theory true-false statements can be thought of as types, and proofs as their terms. This is why ι is allowed to be a Prop as well as a Type. Here are two proofs of the same result (one direction of mem_bUnion_iff). The first uses mem_bUnion_iff and is unsurprising. The second is a more peculiar proof (I give the full Lean code so you can try it at home):

import data.set.lattice

variables (X : Type) (ι : Type) {Z : set ι} {F : ι → set X} {x : X}

open set

-- proof using mem_bUnion_iff
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
rw mem_bUnion_iff at h,
exact h,
end

-- proof using mem_Union
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
-- h : x ∈ ⋃ (i : ι) (H : i ∈ Z), F i
rw mem_Union at h,
-- h : ∃ (i : ι), x ∈ ⋃ (H : i ∈ Z), F i
cases h with i hixF,
-- hixF : x ∈ ⋃ (H : i ∈ Z), F i
rw mem_Union at hixF,
-- ∃ (i_1 : i ∈ Z), x ∈ F i
cases hixF with hiZ hx,
-- hx : x ∈ F i
-- ⊢ ∃ (i : ι) (H : i ∈ Z), x ∈ F i
use [i, hiZ],
exact hx,
end


The first proof uses mem_bUnion_iff ; the second one emulates it with mem_Union. You can see in the second proof that ⋃ i ∈ Z, F i is unfolded to ⋃ (i : ι) (H : i ∈ Z), F i so it is really a union over two things. First a union over all i : ι, and second a union over all proofs that i ∈ Z ! This is a Prop but we’re taking a Union over it anyway. If i ∈ Z then this union is a union over one element, and if i ∉ Z then it’s a union over no elements, so things work out in the end.

In the “warm-up” part A for this week, we try some basic things with infinite unions.

## Theorems you will need this week

I can’t guess the exact direction you will go, but here are some of the theorems I used in my formalisation of the topology proofs:

• continuous.is_open_preimage says that the preimage of an open set is open.
• subset_def says S ⊆ T ↔ ∀ x, x ∈ S → x ∈ T
• compact_iff_finite_subcover' says that a space is compact iff every open cover has a finite subcover. Note the ' at the end of the statement! Without the ' you will get a different notion of finiteness.
• is_open_compl_iff says that a set has open complement iff it’s closed.
• finite.preimage says that the preimage of a finite set under an injective map is finite. Note that if you use the unprimed compact_iff_finite_subcover then you will end up with a finset instead, and that’s another story.

I often find these theorems by half-guessing the name and then pressing ctrl-space. Or I use library_search.

## Some tactics

Here are some tactics I’ve not mentioned before, but which I use in the model solutions.

• change : If the goal is ⊢ P, and Q is a proposition which is equal by definition to P, then change Q will change the goal to ⊢ Q. This can be helpful if you want to use rw on a goal but it’s not quite in the form you want.
• rwa h just means rw h, assumption — it’s an abbreviation. Recall the assumption tactic tries to close the goal by going through all the hypotheses X in the tactic state and trying exact X. That proof of one direction of mem_bUnion_iff could have been done in one line with rwa mem_bUnion_iff at h.
• contradiction : if there are two hypotheses in the tactic state, h1 : P and h2 : ¬ P then the contradiction tactic will close the goal immediately (by doing exfalso, apply h2, exact h1).
• rcases : This is cases on steroids — it will take things apart into more than two pieces. It is very useful for some of this topology stuff. For example if you have a hypothesis h : ∃ (i : ι) (H : i ∈ F), x ∈ V i then rcases h with ⟨i, hiF, hxV⟩ immediately extracts i, and names the proofs of i ∈ F and x ∈ V i both at once. With cases you would have to run the tactic twice.

## option

The last new thing I want to talk about in detail this week is the option X type. The tl;dr version (all you need to know) is explained in the worksheet, but here is a more in-depth discussion. Those of you familiar with Haskell or some other functional languages will know this as the maybe X type, and some languages call it optional X. Despite the fancy name, this type is very easy to describe: it is X and some extra element, called none in Lean (and sometimes called other things in other languages). It can be used in Lean to do things like a one point compactification of a topological space.

We don’t need to know the definition, but I’ll show it anyway:

inductive option (X : Type)
| some (x : X) : option
| none : option


The option X type has two constructors: if x : X then there’s some x, which is a term of type option X corresponding to x : X, and then there’s a second constructor none, which returns a term of type option X which has no corresponding term of type X — it’s the “extra term”. Note that if x : X then it is not true that x : option X. Distinct types are disjoint in Lean’s type theory. The term of type option X corresponding to x is some x. The function some : X → option X is invisible if you think of option X as “X plus another element”. If you think of some as a function then it is injective; you’re going to need this. Here’s a theorem which implies it:

option.some_inj : some a = some b ↔ a = b


Let me end by saying a little bit about what happens when Lean sees that definition of option X above. After processing the definition, Lean puts four new constants into the system. The first is option, a function which eats a type X and spits out a type option X. You might feel that we have just “defined” option above, but that’s not actually how it works: option has no definition, it is a function which cannot be evaluated in any meaningful sense. However it has an API. The second and third new constants in the system are some, a function from X to option X with no definition, and none, a term of type option X with no definition.

The fourth new constant is something called a recursor. A recursor is a way of defining functions from an inductive type. For example let’s say we want to build a function F from the naturals to, say, the reals, by induction (or more precisely, by recursion). We want as input data a “start” real number (which will be F 0) and a method which given a natural n and a real number r (which will be F n), returns another real number r' (which will be F (n + 1). The recursor for the natural numbers is a function which takes these things as input, and then returns the function F.

The recursor for option, called option.rec, takes as input a term (y0 : Y) and a function f : X → Y, and returns the function F : option X → Y which sends none to y0 and some x to f x. Again this recursor has no definition, but it does satisfy the theorems F none = y0 and F (some x) = f x, and the proofs are both refl — they are true by definition.

From these new constants and these two facts about the recursor, we have enough to be able to prove all the other theorems about option, for example the fact that some is injective. I was surprised about this — the recursor is a very powerful tool. I talk about it more in this blog post, but here I will just finish by explaining the proof of injectivity of some using the recursor. Say a b : X and we know some a = some b. Let’s use the recursor for option to define a function from option X to X. It sends none to a, and let’s use the identity function f : X → X for our function f above. The recursor then spits out a function F : option X → X satisfying F (some a) = a and F (some b) = b. But some a = some b, and hence a = b.

option is a very simple example of a monad. If you really want to go nuts, you can try proving this. But you might have more fun trying the topology theorems.