I’ve spent a few years now trying to figure out how best to teach maths undergraduates how to get up to speed quickly with Lean, a theorem prover which uses dependent type theory and has a large mathematics library. In theory students can do all of the problem sheets I give to them in Lean. In practice things aren’t nearly so simple. As well as a long list of tactics to internalise, it helps if students get a feeling for what’s in Lean’s maths library already, and perhaps they should also know the difference between definitional and non-definitional (i.e. propositional) equality. Unfortunately definitional equality is a non-mathematical concept, in the following sense: if you define addition on the naturals recursively by `n+0:=n` and `n+(succ m):=succ (n+m)`, then `n+0=n` is true by definition and `0+n=n` is not. This is an asymmetry which is of no relevance in real-world mathematics.
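In Lean 3, for example, the asymmetry is visible immediately (a sketch; `nat.zero_add` is the relevant core lemma):

```lean
-- `n + 0 = n` is true by definition, so the `rfl` proof works:
example (n : ℕ) : n + 0 = n := rfl

-- `0 + n = n` is NOT true by definition: `rfl` fails here,
-- and one needs a lemma proved by induction:
example (n : ℕ) : 0 + n = n := nat.zero_add n
```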

This year I’m still avoiding talking about definitional equality, and I’m also avoiding talking about the example sheet questions which I’m giving to my class. A typical example sheet question, even if easy from a mathematical point of view, may only yield to someone who is armed with a whole bunch of Lean tactics. So this year, instead of encouraging the students to work on the example sheets, I’m encouraging them to work on easier problems, so that we can build up to the example sheets later.

The 2021 Lean course home page is here, and I’m making an associated collection of short videos here. I’m going through basic tactics one by one, making what I hope is a more comprehensible introduction to doing mathematics in Lean. All the Lean problem sheets I’ve written so far can be tried online without having to install Lean, but installation instructions for those who want a slicker experience are here. First years are busy with their coursework right now, but when it’s over I hope to get some feedback from them and others about this new approach. At the time of writing, I have finished the logic sheets, and I’ve just started pushing the sheets on sets. Still to come: functions and binary relations.

Next term things are getting much more serious. I’m actually teaching an official Lean course as part of our undergraduate program. In contrast to what I’m doing this term (evangelising), next term (Jan to March 2022) I will actually be figuring out how to get students to engage with some more serious undergraduate mathematics. Students will be examined by a series of projects, in contrast to the usual approach here for final year courses (a closed book exam under timed conditions). I’m going to cover standard topics like basic analysis and topology, and also more esoteric ideas like filters and perhaps uniform spaces. Should be interesting! This will involve yet another repository, which I suspect will be to a certain extent based on this one. But more on that later.

The students behind four of the projects managed to get PRs accepted into mathlib, and for two of them this was their first mathlib PR (and possibly even their first contribution to open source). These students had to use git and GitHub (some for the first time), but these are skills which I personally now regard as important and worth teaching (it is commonplace to teach this sort of thing to computer scientists, but mathematicians seem to miss out here). In 2022 I will be teaching a formalisation course to undergraduates at Imperial and we will be using git and GitHub for this too.

I think the video titles are pretty self-explanatory, and perhaps now isn’t the time to go through the technicalities of exactly what the students achieved. However, for the most part we stuck to the mantra: learn the maths first, and once you think you understand it, then try to formalise it. That way, of course, you find out whether you *do* actually understand it.

We ran the entire thing on the Xena Project Discord server, a server for undergraduate mathematicians interested in formalisation of mathematics. This worked very well for me and, I think, for them. Students could share their screen if they had technical questions. Everything was done in voice or text channels, and in particular there are some students whom I supervised but whom I would not recognise if I met them in the street. Not that this bothered me in the slightest.

Timetabling: I basically promised to be online every Tuesday in July and August from 10am until about 6pm, so the entire thing functioned as a drop-in. Some students talked to me a lot, some students barely talked to me at all, and of course some students talked to other people. Creating a community was, I felt, a bit harder than in the in-person summer projects I’ve run in the past (where you can just go to lunch with a bunch of UGs and let them talk to each other), but of course these are extraordinary times and we have to make do. One big advantage of running things online was that students could be in different countries and still participate, and more generally students could move around (e.g. to and from London) without it disrupting their supervision. I live and work in London, and for some students it’s problematic to stay here without a serious source of funding; working online also solved that problem. Going forward, assuming things are far more normal next year, I might be tempted to run summer projects in a hybrid way.

Thank you to all students who participated. If you are a *mathematics undergraduate* who is interested in formalising some mathematics in Lean (probably Lean 4 next year I guess!) over the summer of 2022, then get in touch with me at some point and we’ll see what we can do.

Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen. While this challenge has not been completed yet, I am excited to announce that the Experiment has verified the entire part of the argument that I was unsure about. I find it absolutely insane that interactive proof assistants are now at the level that within a very reasonable time span they can formally verify difficult original research. Congratulations to everyone involved in the formalization!!

In this Q&A-style blog post, I want to reflect on my experience watching this experiment.

Answer: It was formalized in the Lean Proof Assistant, mostly written by Leonardo de Moura from Microsoft Research, and used the extensive mathematical library (mathlib) written by the Lean community over the last four years. Immediately after the blog post, the Lean prover/mathlib community discussed the feasibility of the experiment on the Lean Prover Zulip Chat. Reid Barton did some critical early work, but then Johan Commelin took the leading role. In outline, Johan made an attack along the path of steepest ascent towards the proof, and handed off all required self-contained lemmas to the community. In particular, to get the project started, by January 14 he had formalized the statement of Theorem 9.4 of [Analytic], whose proof became the first target and was completed on May 28, with the help of the Lean community, including (mathematicians) Riccardo Brasca, Kevin Buzzard, Heather Macbeth, Patrick Massot, Bhavik Mehta, Scott Morrison, Filippo Nuccio, Damiano Testa, Adam Topaz and many others, but also with occasional help from computer scientists like Mario Carneiro. Here is a link to the repository containing the formalised proof of Theorem 9.4, and you can also view its dependency graph, now fully green and completed.

Answer: I joined the Zulip chat to answer any mathematical questions that may arise, but also as an interested spectator.

Answer: Theorem 9.4 is an extremely technical statement, whose proof is however the heart of the challenge, and is the only result I was worried about. So with its formal verification, I have no remaining doubts about the correctness of the main proof. Thus, to me the experiment is already successful; but the challenge of my blog post has not been completed. It is probably fair to guess that the experiment is about half-way done. Note that Theorem 9.4 abstracts away from any actual condensed mathematics, so the remaining half will involve a lot of formalization of things like condensed abelian groups, Ext groups in abelian categories, and surrounding machinery. The basics for this have already been built, but much work remains to be done.

Answer: Initially, I imagined that the first step would be that a group of people study the whole proof in detail and write up a heavily digested version, broken up into many, many small lemmas, and only afterwards start the formalization of each individual lemma. This is not what happened. Instead, the formalization followed the original lecture notes quite closely, and directly attacked lemma after lemma there. The process seemed to be to type the proofs directly into Lean. Lean gives the user a very clear summary of what the current goal is, so one always needs a very clear sense of what the next few steps really are. Sometimes it was then realized that, even on paper, it was not clear how to proceed, and the issue was brought up in the chat, where it was usually quickly resolved. Only after a lemma was entirely formalized was the proof, now thoroughly digested, written up again in the Blueprint in human-readable form.

Answer: Right — it’s not the blueprint from which the Lean code was formed, but (largely) the other way around! The Lean Proof Assistant was really that: An assistant in navigating through the thick jungle that this proof is. Really, one key problem I had when I was trying to find this proof was that I was essentially unable to keep all the objects in my “RAM”, and I think the same problem occurs when trying to read the proof. Lean always gives you a clear formulation of the current goal, and Johan confirmed to me that when he formalized the proof of Theorem 9.4, he could — with the help of Lean — really only see one or two steps ahead, formalize those, and then proceed to the next step. So I think here we have witnessed an experiment where the proof assistant has actually assisted in understanding the proof.

Answer: Yes, up to some usual slight imprecisions.

Answer: One day I was sweating a little bit. Basically, the proof uses a variant of “exactness of complexes” that is on the one hand more precise as it involves a quantitative control of norms of elements, and on the other hand weaker as it is only some kind of pro-exactness of a pro-complex. It was implicitly used that this variant notion behaves sufficiently well, and in particular that many well-known results about exact complexes adapt to this context. There was one subtlety related to quotient norms — that the infimum need not be a minimum (this would likely have been overlooked in an informal verification) — that was causing some unexpected headaches. But the issues were quickly resolved, and required only very minor changes to the argument. Still, this was precisely the kind of oversight I was worried about when I asked for the formal verification.

Answer: There was another issue with the third hypothesis in Lemma 9.6 (and some imprecision around Proposition 8.17); it could quickly be corrected, but again was the kind of thing I was worried about. The proof walks a fine line, so if some argument needs constants that are quite a bit different from what I claimed, it might have collapsed.

Answer: I guess the computer does, as does Johan Commelin.

Answer: Yes! The first is a beautiful realization of Johan Commelin. Basically, the computation of the Ext-groups in the Liquid Tensor Experiment is done via a certain non-explicit resolution known as a Breen-Deligne resolution. Although constructed in the 1970s, it seems not to have been much used until a couple of computations in condensed mathematics. The Breen-Deligne resolution has certain beautiful structural properties, but is not explicit, and its existence relies on some facts from stable homotopy theory. In order to formalize Theorem 9.4, the Breen-Deligne resolution was axiomatized, formalizing only the key structural properties used for the proof. What Johan realized is that one can actually give a nice and completely explicit object satisfying those axioms, and this is good enough for all the intended applications. This makes the rest of the proof of the Liquid Tensor Experiment considerably more explicit and more elementary, removing any use of stable homotopy theory. I expect that Commelin’s complex may become a standard tool in the coming years.

Answer: What actually makes the proof work! When I wrote the blog post half a year ago, I did not understand why the argument worked, and why we had to move from the reals to a certain ring of arithmetic Laurent series. But during the formalization, a significant amount of convex geometry had to be formalized (in order to prove a well-known lemma known as Gordan’s lemma), and this made me realize that actually the key thing happening is a reduction from a non-convex problem over the reals to a convex problem over the integers. This led me to ask my MathOverflow question whether such a reduction was known before; unfortunately, it did not really receive a satisfactory answer yet.

Answer: Yes, it did: Question 9.9, on the growth of certain constants. There are now explicit recursive definitions of these constants that are formally verified to work, and using this one can verify that they do indeed grow roughly doubly-exponentially.

Answer: I learnt that it can now be possible to take a research paper and just start to explain lemma after lemma to a proof assistant, until you’ve formalized it all! I think this is a landmark achievement.

Answer: You know this old joke where a professor gets asked whether some step really is obvious, and then he sits down for half an hour, after which he says “Yes, it is obvious”. It turns out that computers can be like that, too! Sometimes the computer asks you to prove some identity, and the argument is “That’s obvious — it’s true by definition of the two sides.” And then the computer works for quite some time until it confirms. I found that really surprising.

Answer: The definitions and theorems are surprisingly readable, although I did not receive any training in Lean. But I cannot read the proofs at all — they are analogous to referring to theorems only via their LaTeX labels, together with a specification of the variables to which they get applied, plus the names of some random proof-finding routines. Still, I have the feeling that it should be possible to create a completely normal mathematical manuscript that is cross-linked with the Lean code, making it possible to navigate the Lean code seamlessly — I think the creation of such an interface has also become a goal of the experiment.

Answer: Definitely! Currently, the Lean code leading up to the proof of Theorem 9.4 is not well-documented, and some parts of the proof could definitely be streamlined. Moreover, large parts of it are basic material that should become part of mathlib. It should be noted that because mathlib is constantly evolving, any project that uses it has to continually make small changes so that it will still compile with the newest version of mathlib. So it is vital that the parts of the proof of general interest are moved into mathlib, where they will be maintained.

Answer: It depends on the method of calculation, but somewhere around 20. I think this is amazingly small! I had expected that the first step of taking the lecture notes and turning them into a properly digested human proof — which as I said didn’t actually happen — would already introduce a factor of ~5. But the blueprint is actually only a factor of ~2.5 longer than the relevant part of the lecture notes right now.

Answer: Good question! Usually the verification of a proof involves trying small variations of the argument and seeing whether they break or not, whether they lead to statements that are too strong etc., in order to get a sense of what is happening. Basically a proof is like a map of how to get up a mountain, say; it may be a nice, slightly winding path with a broad view, or it may lead through the jungle and up some steep wall, requiring climbing skills. Usually there’s not just one way up, and one may try whether taking a left turn the view is nicer, or taking a right turn one can take a shortcut.

In the case at hand, it feels like the main theorem is some high plateau with a wonderful view, but the way there leads through a large detour, to attack the mountain from the other side, where it is dark and slippery, and one has to climb up a certain steep wall; and no other pathways are seen left or right. Answering the questions in the Zulip chat felt like I would give instructions of the form “put your foot here, then your hand there, then pull yourself up this way” at the more difficult passages.

So I have gained the reassurance that it is possible to climb the mountain along this route, but I still have no sense of the terrain.

I got interested in trying to understand if this question even has a meaning. Here are some thoughts.

When we learn linear algebra at high school, we typically first learn the “concrete” theory, where vectors are columns of numbers, and we can multiply them by matrices and thus get a conceptual understanding of systems of linear equations. Then at university we go on to the “abstract” theory, where a real vector space is something defined by a list of axioms, and spaces like $\mathbb{R}^n$ are now *examples* of these abstract objects.

We then learn about the fundamental notion of a *basis* of a vector space. Say we have an abstract finite-dimensional vector space. By picking a basis, the vectors in our vector space suddenly transform back into columns of numbers. Not only that, but linear maps between vector-spaces-with-a-basis turn back into matrices. By the time we’ve learnt that every vector space has a basis, we can see that our new theory is in some sense “the same as” our old theory. It took me a long time to get on top of this principle as an undergraduate; perhaps the key concepts were not emphasized to me enough, or maybe I just got lost in all the new (to me, at the time) ideas. Nowadays I think about it like this: initially we learn that $\mathbb{R}^n$ is an *example* of a finite-dimensional vector space, but after learning about bases we can conclude that *every* finite-dimensional real vector space is isomorphic to $\mathbb{R}^n$ for some $n$, so in fact $\mathbb{R}^n$ can be thought of as a *model* for a finite-dimensional real vector space, just like the collection of equivalence classes can be thought of as a model for a quotient by an equivalence relation. Every vector space has a basis, so one can prove theorems about finite-dimensional vector spaces by checking them on models, i.e. by picking a basis.

After a couple of years at university the following idea had somehow sunk in: if possible, one “should not choose a basis”. The canonical example shows up when we learn about the *dual* of a vector space. The dual of a real vector space $V$ is just the space of linear maps from $V$ to $\mathbb{R}$; this space has a natural vector space structure and is called the *dual space* of $V$, with notation $V^*$. Confusing example: if $V = \mathbb{R}^n$, then an element of $V$ is represented by a column of $n$ numbers. Give $V$ its canonical basis $e_1, e_2, \ldots, e_n$. Then any linear map $V \to \mathbb{R}$ is uniquely determined by where it sends the $e_i$, and in particular an element of $V^*$ is also uniquely determined by $n$ numbers, so we can represent it as a vector of length $n$ and we have proved that the dual of $\mathbb{R}^n$ is $\mathbb{R}^n$ again. Great! Furthermore, every finite-dimensional vector space equals $\mathbb{R}^n$ (proof: pick a basis) so we’ve proved that duality is just the identity map!

Except that we haven’t, because we have been a bit too liberal with equality here (and as some computer scientists are very quick to point out, many mathematicians don’t understand equality properly, and hence they might accidentally end up teaching this point badly). This argument proves that if $V$ is a finite-dimensional vector space, then it is *isomorphic* to its dual $V^*$; however it is not in general *canonically isomorphic* to its dual, whatever that is supposed to mean. In this instance, it means that different bases in general produce different isomorphisms between $V$ (identified with $\mathbb{R}^n$) and $V^*$ (also identified with $\mathbb{R}^n$). This is a bit confusing because in the group theory course running at the same time as the linear algebra course, a student is simultaneously being taught that when we say “how many groups are there of order 4” we *obviously* mean “…up to isomorphism”, because isomorphic groups *are* equal for that question.

However, if we do this trick twice, we can identify $V$ with its double dual $V^{**}$, and it turns out that this identification *is* canonical, whatever that means. What it appears to mean in this case is that there is a really cool way of writing down the isomorphism from $V$ to $V^{**}$ which *doesn’t ever pick a basis*!

[Technical note/reminder: Here’s the explicit map from $V$ to its double dual $V^{**}$. Say $v \in V$. We need to write down a linear map $V^* \to \mathbb{R}$ associated to $v$. So take $\phi \in V^*$; we need to construct a real number somehow. Well, what about $\phi(v)$? That works a treat! One can check that this map sending $\phi$ to $\phi(v)$ is indeed linear, and induces a map from $V$ to $V^{**}$ which can be checked to be an injection and hence, by dimension counting, an isomorphism (NB: some magic just happened there). I suspect that this argument was my initiation into the mysterious word *canonical*, a word I now rail against, not least because in my opinion the Wikipedia page about canonical maps contains a definition which is full of fluff (“In general, it is the map which preserves the widest amount of structure, and it tends to be unique” — this is not a definition — this is waffle).]
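For the record, the whole construction fits on one line, and no basis appears anywhere:

```latex
\mathrm{ev} \colon V \to V^{**}, \qquad
\mathrm{ev}(v)(\phi) := \phi(v)
\quad\text{for } v \in V,\ \phi \in V^{*}.
% Linearity in v: ev(av + bw)(phi) = phi(av + bw) = a.phi(v) + b.phi(w).
```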

The moral: all the canonical kids are too cool to pick a basis.

PS here is a funny way of thinking about it: if we identify $V$ with $\mathbb{R}^n$ as column vectors, then perhaps we should identify the dual of $V$ with $\mathbb{R}^n$ as *row* vectors, because multiplication on the left by a row vector sends a column vector to a number, which is what we want a dual vector to do. So identifying $V$ with $V^{**}$ is some high-falutin’ version of *transpose*, and if you transpose once you don’t quite get the same thing, but if you transpose twice then you’re back where you started. Canonically.

OK so here’s a pretty cool theorem about traces (although any formaliser would tell you that it is actually a *definition*, not a theorem). If $M$ is an $n \times n$ square matrix then it has a *trace*, the sum of the elements on the leading diagonal. Now say $V$ is a finite-dimensional real vector space, and $\phi : V \to V$ is a linear map, crucially from $V$ to itself. If we choose a basis of $V$ and use the same basis for the source and the target, then $\phi$ becomes a matrix $M$, and we can take its trace. If we change our basis of $V$ then the matrix representing $\phi$ changes. But, miraculously, its trace does not! This can be proved by an explicit calculation: changing our basis for $V$ changes the matrix representing $\phi$ from $M$ to $P^{-1}MP$ for a certain invertible “change of basis” matrix $P$ (here was where it was key that the source and the target of the endomorphism were the same, otherwise we would have got $Q^{-1}MP$), and the traces of $M$ and $P^{-1}MP$ are equal because of the general fact that the traces of $AB$ and $BA$ are equal if $A$ and $B$ are square matrices of the same size (apply with $A = P^{-1}M$ and $B = P$).
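The invariance is one application of the $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ identity:

```latex
\operatorname{tr}(P^{-1}MP)
  = \operatorname{tr}\bigl((P^{-1}M)\,P\bigr)
  = \operatorname{tr}\bigl(P\,(P^{-1}M)\bigr)
  = \operatorname{tr}(M).
```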

As a consequence, this means that if $V$ is a finite-dimensional vector space then we can unambiguously talk about the *trace* of a linear map $\phi : V \to V$, in the following way. First do the slightly distasteful choose-a-basis thing, then take the trace of the corresponding matrix, and then prove that the calculation was independent of the basis you chose, so we can pretend you never did it. Similarly one can talk about the determinant and characteristic polynomial of $\phi$, because these are also invariants of matrices which are constant on conjugacy classes.

However, you *did* do it — you chose a basis. Something here is a little different from the map from $V$ to its double dual — that map really was defined without choosing a basis *at all*. Here we did something slightly different — we chose a basis, and then proved that the choice didn’t matter. Can we go one better, and define the trace of a linear map from $V$ to $V$ without choosing a basis *at all*?

So in the process of discussing this question on Twitter and on the Lean chat, we established firstly that this very much depends on what the question actually *means*, and secondly I managed to learn something new about Lean, and when I learn something new about Lean I tend to blog about it, so here we are. First I’ll talk about the failed attempts to define the trace of an endomorphism, which led to some other questions and clarifications, and then I’ll talk about `trunc`, something which looked to me like a completely pointless operation in Lean and which up until now I’ve just ignored, but which somehow might be at the bottom of this.

In characteristic $p$ there are some problems with the ideas below, but these are not relevant to what I want to talk about, so let’s just stick to vector spaces over the reals (or more generally any field of characteristic zero). The first observation is that computing the trace, determinant and characteristic polynomial all seem to be pretty much the same question: for example, if you can compute the char poly of $\phi$ then you can compute its trace and det, because you can read these things off from the coefficients. Conversely, if you can compute traces then, applying this to some exterior powers of $\phi$, you can read off the coefficients of the characteristic polynomial, including the det. So computing any of these invariants without choosing a basis somehow boils down to the same thing.
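Concretely, for an endomorphism $\phi$ of an $n$-dimensional space all three invariants sit inside one polynomial (a standard fact, recorded here for orientation):

```latex
\chi_\phi(t) \;=\; \det(t\,\mathrm{id} - \phi)
  \;=\; t^n - \operatorname{tr}(\phi)\,t^{n-1} + \cdots + (-1)^n \det(\phi),
% and more generally the coefficient of t^{n-k} is (-1)^k tr(Λ^k φ).
```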

Next let’s turn to Wikipedia, where we are informed of a basis-free way to compute the trace of an endomorphism! Here’s the trick. There’s an obvious bilinear map $V^* \times V \to \mathbb{R}$, sending $(\phi, v)$ to $\phi(v)$, and by the universal property of the tensor product this induces a linear map $V^* \otimes V \to \mathbb{R}$. There is also an obvious bilinear map $V^* \times V \to \operatorname{Hom}(V, V)$ sending $(\phi, v)$ to the linear map sending $w$ to $\phi(w)v$, and this induces a linear map $V^* \otimes V \to \operatorname{Hom}(V, V)$, which is easily checked to be an isomorphism if $V$ is finite-dimensional. Composing the inverse of this isomorphism with $V^* \otimes V \to \mathbb{R}$ gives us a linear map $\operatorname{Hom}(V, V) \to \mathbb{R}$ which we can check to be the trace (e.g. by picking a basis). So we’re done, right?
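In other words, the putative basis-free trace is the composite of the inverse isomorphism with evaluation:

```latex
\operatorname{Hom}(V,V)
  \;\xrightarrow{\;\cong^{-1}\;}\; V^* \otimes V
  \;\xrightarrow{\;\phi \otimes v \,\mapsto\, \phi(v)\;}\; \mathbb{R}.
```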

Well, the thing about this construction is that whilst the map $V^* \otimes V \to \operatorname{Hom}(V, V)$ is canonical (indeed, it even exists in the infinite-dimensional case), to prove that it’s surjective in the finite-dimensional case the natural thing to do is to pick a basis and to make the corresponding matrix by taking an appropriate linear combination of tensor products of elements of the basis and the dual basis. I would argue that we have made some progress here — we still picked a basis, but we used it to fill in a proof, rather than to construct data. However, we still picked a basis. Note also that the inverse of a computable bijection might not be computable, if you’re interested in that sort of thing, and I suspect that this might be a situation where this annoyance kicks in.

One might instead be tempted to argue that the map is surjective because it is an injective map between vector spaces of the same dimension (I’m not entirely sure how to prove it’s injective without picking a basis, but it might well be possible; however I do not know how to prove that the dimension of $V \otimes W$ is the product of the dimensions of $V$ and $W$ without picking a basis). Anyway, talking of dimensions, here is the other “basis-free” method I learnt to do these sorts of things: the canonical method to work out the determinant of an endomorphism without choosing a basis. If $\phi : V \to V$ is a linear map, and if $n$ is the dimension of $V$, then $\phi$ induces a linear map on top wedge powers $\Lambda^n V \to \Lambda^n V$, and an endomorphism of a 1-dimensional space is canonically a number (proof: pick a basis and check the number is independent of the choice), which can be checked to be the determinant of $\phi$; and if you can do determinants then you can do char polys and hence traces.
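Spelt out, the top wedge power construction reads (again a standard fact, recorded for reference):

```latex
\Lambda^n\phi \colon \Lambda^n V \to \Lambda^n V, \qquad
v_1 \wedge \cdots \wedge v_n \;\mapsto\;
  \phi(v_1) \wedge \cdots \wedge \phi(v_n)
  \;=\; \det(\phi)\,\bigl(v_1 \wedge \cdots \wedge v_n\bigr).
```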

The problem with this approach is that it relies on you knowing what $n$ is, the dimension of $V$, and if all you know is that $V$ is finite-dimensional, then how do you get $n$, the dimension? Well, obviously you pick a basis, count it, and then prove that your answer is independent of the choice of basis. So this “top exterior power” argument has moved us closer to the heart of the problem: forget defining the trace — how do we define the *dimension* of a finite-dimensional vector space without picking a basis? Note that the dimension is just the trace of the identity function, so we can add dimension to our list of things which we cannot “compute” without picking a basis. And now we’re getting somewhere — what does it *mean* to say a vector space is finite-dimensional? Did we pick a basis even to make that statement?

I am not a big fan of constructivism, as many of you will know. I think that the emphasis placed on it by the computer theorem proving community historically has held the area back; it puts off mathematicians (e.g. the 2012 version of me) who have been indoctrinated by a traditional mathematics course which assumes the law of the excluded middle from day one. One problem I have with constructivism, as someone who was classically trained, is that it turns out that sometimes there is more than one way of doing something constructively, and all these ways are the same classically. For example, very early on in my Lean career I was surprised to learn that there was `function.bijective`, the predicate that a function was a bijection, but also there is the concept of a function with a two-sided inverse. As far as I’m concerned these are the same thing, but constructively they’re not, because given a bijection there might not be a “formula” for its inverse. The existence of a two-sided inverse is a true/false statement — but actually *having* the two-sided inverse is **data**, and, in constructive mathematics, data can sometimes be hard to come by. The function which takes as input a set/type `X` and a proof that `X` is nonempty, and outputs an element of `X`, is a noncomputable function and its existence, if you think about it, is closely related to the axiom of choice, which is something that the constructivists are not big fans of.
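Lean 3 makes this distinction concrete (a sketch; `classical.choice` is the choice axiom in core Lean):

```lean
-- `function.bijective f` is a mere proposition: it carries no inverse data.
-- function.bijective : (α → β) → Prop

-- Producing an element of `X` from a bare nonemptiness proof requires choice,
-- and Lean insists the definition be marked `noncomputable`:
noncomputable def pick (X : Type) (h : nonempty X) : X := classical.choice h
```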

So it turns out that this whole “$V$ is finite-dimensional” thing, which this entire post has assumed all the way through, is a victim of this subtlety. What does it *mean* for a vector space to be finite-dimensional? The following answers are all the same classically (i.e. in “normal maths”), but constructively they’re all different:

- A *proof* that $V$ has a finite basis;
- An actual *choice* of a finite basis;
- An element of the *truncation* of the set of all pairs $(n, e)$, where $n$ is a natural number and $e : V \cong \mathbb{R}^n$ is an isomorphism. Here by the truncation of a set we mean the quotient of the set by the “always true” equivalence relation. Yes, you read that right.

OK so we know what the first two things are. The first statement is just a proof. If your proof is nonconstructive that’s fine, I don’t care. The second thing is data. For me this is a problematic definition of finite-dimensional, precisely because it’s *not* a proof of a true/false statement. If I am working with a finite-dimensional vector space $V$ in Lean then I might end up having to deal with the fact that if some argument changed $V$’s basis whilst leaving $V$ alone, I might not have $V = V$ (as finite-dimensional vector spaces) any more, because the data going on behind the scenes saying that $V$ is finite-dimensional might not match up. I have enough problems formalising mathematics in type theory without having to deal with this too.

This brings us to the third definition, involving `trunc`. OK so if `X` is a type or set or however you want to set up your mathematics, then, as I mentioned above, `trunc X` is the quotient of `X` by the equivalence relation which is always true. In particular if `X` is nonempty then `trunc X` has one element/term, and if `X` is empty then it has no elements/terms. If you’re happy about the concept that propositions can be types, with the idea that a true proposition has one term (its proof), and a false proposition has no terms, then `trunc X` seems to be basically the proposition that `X` is nonempty. However it is more than that, because it is *data*. It is the missing link in our story.

Let’s say that `V`

is a real vector space, and we have a “proof” that it is finite-dimensional in the sense that we have a term `t`

of type `trunc X(V)`

, where `X(V)`

is the space of all pairs (n, φ) with φ : ℝⁿ → V an isomorphism. Here’s how to define the trace of an endomorphism T : V → V. We’re going to define a map from `trunc X(V)`

to the reals, and the idea is that if you evaluate this map at `t`

then you’ll get the trace of our endomorphism. Now to define a map from a quotient, we use the universal property of quotients. First of all we have to define a map from the space we’re quotienting by to the reals. This is easy: given an isomorphism φ : ℝⁿ → V we get an induced endomorphism of ℝⁿ (conjugate by φ) and we just take its trace with respect to the standard basis. And secondly, we need to prove that this map descends to the quotient, which boils down to proving a theorem, and the theorem of course turns out to be precisely the statement that the trace of an endomorphism is independent of choice of basis.
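In Lean 3’s mathlib this universal property is packaged, if I remember the names right, as `trunc.lift`; here is a hedged sketch of its shape (`descend` is my name, for illustration):

```lean
-- `trunc.lift f h : trunc X → S` takes a function `f` together with a
-- proof `h` that `f` doesn't depend on its input -- which is exactly the
-- "well-defined on the quotient" condition, since the equivalence
-- relation on `X` is the always-true one.
def descend {X S : Type} (f : X → S) (h : ∀ a b, f a = f b) :
  trunc X → S :=
trunc.lift f h
```

For the trace, `X` would be the type of pairs (n, φ), `f` would be “trace computed via the basis coming from φ”, and the hypothesis `h` is precisely the independence-of-basis theorem.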

Similarly we can define a function from `trunc X(V)`

to the naturals, such that if `V`

is finite-dimensional in the sense that we have `t : trunc X(V)`

then its dimension is what you get by evaluating this function. And det and char poly etc.

Note that for *any* vector space `V`

, the type `trunc X(V)`

has at most one term — it’s a so-called *subsingleton* type. Lean’s `exact`

command will be fussy about subsingleton types because it’s a *theorem* that two terms of a subsingleton type are equal, rather than it being a definitional equality. Hence Lean’s `exact`

tactic still might not work if we’re carrying around `t : trunc X(V)`

as our definition of finite-dimensionality. However the `convert`

tactic will operate fine, because it looks out for subsingletons and applies the appropriate theorem.

We now seem to have got to the bottom of this. To do this kind of metamathematics — “did we pick a basis?” — we need to think really carefully about what we even *mean* by the assumption of `V`

being finite-dimensional. My preferred approach, and the one which makes mathematics easier to do in Lean, is to just use the propositional definition “there exists a basis”. This way one never runs into problems with equality, and if one needs a basis one just applies Lean’s version of the axiom of choice. Constructivists would complain that it breaks computation, but I want to prove theorems and I prefer to do computations using tactics rather than trying to persuade the kernel to reduce things. The other approach is to carry around some extra data, and this can lead to problems for beginners with equality being a bit broken, however the experts have learnt ways to deal with this. Ultimately the best choice in a theorem prover will depend on what you actually are *doing* with your objects, and given that I just want to prove theorems, for me, the bare existence definition of finite-dimensional is enough. However to move from the `Prop`

world to the `Type`

world, e.g. when defining the trace of an endomorphism, one has to at some point do some magic, and the moment where this happens is the moment you picked a basis.

I used to teach a course where I defined the notion of what it meant for two integers to be *congruent modulo N*. Here *N* is an integer, and two integers *a* and *b* are said to be congruent modulo *N* if their difference is a multiple of *N*. For example, 37 and 57 are congruent modulo 10.

I would go on to prove that congruence modulo *N* is an equivalence relation on the integers. Reminder: an equivalence relation on `X`

is a binary relation on `X`

which is reflexive, symmetric and transitive. The proof I gave never assumed that *N* was non-zero, and congruence modulo 0 is the same relation as equality, so you might like to deduce from this that equality on the integers is an equivalence relation. Well, tough darts. In the proof that congruence modulo *N* is an equivalence relation, it turns out that we *assume* that equality is an equivalence relation, as you can readily check if you type it into a theorem prover.

So how do we prove that equality is an equivalence relation? In my 1st year undergraduate lecture notes this is stated as an “obvious example” of an equivalence relation. Euclid was more cautious — he explicitly noted that he would be assuming transitivity and reflexivity of equality in his common notions 1 and 4, and symmetry followed from his use of language — he would say “this collection of things are equal” e.g. “all right angles are equal”: so equality of two things was a predicate on *unordered* pairs for him, and symmetry was hence built in.

I also looked at my 3rd year logic and set theory notes, to try and figure out the definition of = , but they were also no help. There is some waffle about “the interpretation in a model is the diagonal of ” (which I think might be circular) and how `x = y`

means that the two terms `x`

and `y`

are actually *the same term*, but don’t get me started about what mathematicians mean when they say two things are “the same”, e.g. “the same proof” or “canonical isomorphism is denoted by =” or all the other highly useful and yet irritatingly non-rigorous stuff we say. Anyway, just “defining” equality to mean another word or phrase which is synonymous with equality isn’t going to get us anywhere. Ultimately my impression is that you might just *assume* that equality is an equivalence relation, when setting up mathematics via first order logic and ZFC set theory. I’m no logician though — feel free to correct me in the comments. This post is about a different way to approach things, which part of me finds a whole lot more satisfactory.

Lean’s type theory contains a **definition** of equality! This is great because it means we know where we stand. Here’s the definition:

```
inductive eq {X : Type} : X → X → Prop
| refl (a : X) : eq a a
infix ` = `:50 := eq
```

What does all that baloney mean? Well, this is an inductive definition of a binary relation. Let `X`

be a type or a set or however you want to think of a collection of stuff. The binary relation `eq`

on `X`

takes as input two things in `X`

, call them `a`

and `b`

, and spits out a true-false statement `eq a b`

. We’ll get to the definition in a second, but that last line means “the notation `a = b`

is defined to mean `eq a b`

, with BIDMAS-power 50”. Let’s use the `=`

notation from now on instead of `eq`

, because it’s familiar-looking.

So, let’s get to the heart of the matter: how do we define `a = b`

? Well, Lean’s definition says the following: “there is only one tool we have to prove the equality of two things, and it’s the theorem `refl a`

, which is a proof of the statement `a = a`

. That’s it.” Thanks to Andrej Bauer who on Twitter pointed out that a neat way to think about this definition is: equality is the smallest binary relation which is reflexive. Andrej also tells me that this is Martin-Löf’s definition of equality.
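Concretely, armed with only the definition, the one kind of equality we can prove directly looks like this:

```lean
-- `eq.refl a` is the unique constructor: a proof of `a = a`.
example (X : Type) (a : X) : a = a := eq.refl a
```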

OK great. We have now apparently defined the true-false statement `a = b`

, and we can prove that `a = a`

is true, so this is a good start. But how the heck are we going to go from this to symmetry and transitivity? We’re going to use induction!

In my last post, I waffled on about inductive types. I was going to talk about equality in that post, but it had already got quite long, so I thought I’d deal with equality separately, and so here we are. The take home message from the last post was that if you define something as an inductive type in a type theory like Lean’s, then Lean automatically generates a new axiom in the system called the *recursor* for that type. This axiom is generated automatically by the rules of the calculus of inductive constructions. For example, if you define the natural numbers inductively using Peano’s axioms, then the recursor is just the statements that induction and recursion are true (propositions are types in Lean, so one function can be interpreted in two ways depending on whether you apply it in the `Prop`

universe or the `Type`

universe). The recursor attached to an inductive type is a way of saying “if you want to do something to every instance of the thing you’re defining, it suffices to do it for all the constructors”, and a special case of it is the “inductor”, which says, in our case, that if you want to deduce something from `a = b`

then all you have to do is make sure you can prove it in the special case `a = a`

. Formally, the inductor for `=`

is the following statement:

```
eq.inductor (X : Type) (R : X → X → Prop)
(h : ∀ x, R x x) (a b : X) : a = b → R a b
```

Note the fancy type theory `R : X → X → Prop` for what a mathematician would call “*R* is a binary relation on *X*” (reason: `R : X → X → Prop` means that `R` assigns a true-false statement to every pair of elements of `X`). So, in words, the inductor says that if you have any binary relation `R`

on `X`

such that `R x x`

is true for all `x`

, and if you know that `a = b`

, then `R a b`

is true. That’s the tool we have. Again let’s go back to Andrej’s observation: the definition of equality can be thought of as saying that it is the smallest (or the initial) binary relation which is reflexive, and the inductor then makes this rigorous by saying that any other reflexive binary relation must contain equality.

Summary: we are armed with two things, and two things only. (1) a proof of `∀ a, a = a`

and (2) a proof that if `R`

is a binary relation on `X`

and `∀ x, R x x`

is true, then `a = b`

implies `R a b`

.

The game now is to prove that `=`

is symmetric and transitive, without accidentally assuming it! And what better framework to play that game in, than Lean! (because trust me, if you play it on paper, you are so going to mess this one up).

Insert coin. That’s hard mode, i.e. spoiler-free. If you have Lean installed locally, copy and paste the code at the other end of that link into VS Code for a much speedier experience. The game is to prove `symm`

and `trans`

from `refl`

and `ind`

. I would work out the maths proofs first.

If you want no spoilers *at all*, stop reading now. I usually post some art generated by my children in my blog posts so perhaps now is a good time to do this. If you want to hear more about equality, and in particular have one or more hints and spoilers, read on.

`refine`

is a great tactic. It’s like `apply`

on steroids. You can do everything with `intros`

, `apply`

, `refine`

and `exact`

. You can read about what these tactics do here, in Lean’s documentation. All Lean proofs can be done in just two or three lines.

Life is easier if we have more tools than just `ind`

. The thing about `ind`

is it wants as input a binary relation. A simpler object than a binary relation on `X`

is a subset of `X`

, or equivalently its characteristic function `P : X → Prop`

(sending the elements of the subset to `true`

and the other elements to `false`

). Here’s something which you believe about equality: if `a = b`

and `P a`

is true, then `P b`

is true. This is a much cleaner statement than `ind`

— it’s just as powerful, and more flexible. Logicians call it the *substitution property* of equality. Applying it is the way Lean’s `rewrite`

or `rw`

tactic works. Why not try to prove it first? It makes the proof of `trans`

much easier — it’s basically a master sword. Here’s the statement. Just cut and paste it before `symm`

. Of course, you’ve got to prove it (using `ind`

and `refl`

).

```
theorem subst (hab : a ∼ b) (P : X → Prop) : P a → P b :=
begin
sorry
end
```
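If you’d rather see a solution than play the game, here is one possible sketch (final spoiler warning!). It assumes the game’s file provides `refl : ∀ a, a ∼ a` and `ind` with the argument order I’ve stated above, and that `X a b c` are ambient variables; the exact names and orders in the game’s file may differ.

```lean
-- Spoilers below. `subst'` comes from `ind` applied to the relation
-- `R x y := P x → P y`, which is reflexive for trivial reasons.
theorem subst' (hab : a ∼ b) (P : X → Prop) : P a → P b :=
ind (λ x y, P x → P y) (λ x h, h) a b hab

-- symmetry: substitute along `a ∼ b` in the predicate `λ x, x ∼ a`
theorem symm (hab : a ∼ b) : b ∼ a :=
subst' hab (λ x, x ∼ a) (refl a)

-- transitivity: substitute along `b ∼ c` in the predicate `λ x, a ∼ x`
theorem trans (hab : a ∼ b) (hbc : b ∼ c) : a ∼ c :=
subst' hbc (λ x, a ∼ x) hab
```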

The way Lean’s `inductive`

command works provides some rather surprising proofs of basic mathematical facts.
It is not unreasonable to think of what Lean calls a type as what a mathematician would usually call a set. For example a mathematician might say “let *S* be a set with three elements; call the elements *a*, *b* and *c*“. Here’s how to make that set, or type, in Lean:

```
inductive S : Type
| a : S
| b : S
| c : S
```

In fact the full names of the elements of *S*, or the terms of type `S`

as we call them in type theory, are `S.a`

, `S.b`

and `S.c`

, and we might want to open the `S`

namespace so that we can just refer to them as `a`

, `b`

and `c`

. An undergraduate might want to make this kind of definition when they are answering the following kind of question in Lean: “Let f : X → Y and g : Y → Z be functions. True or false: if the composition g ∘ f is injective, then g is injective”. This is false, and to prove it’s false one can either use the sets we mathematicians have lying around (such as the naturals, integers or reals), or one can just build some explicit sets of small size like `S`

above, and some explicit functions between those sets.

So here’s what we’re going to do. Let’s make a type `X`

with one term `p`

, a type `Y`

with two terms `q`

and `r`

, and a type `Z`

with one term `s`

. This is easy given what we’ve already seen:

```
inductive X : Type
| p : X
inductive Y : Type
| q : Y
| r : Y
inductive Z : Type
| s : Z
```

[By the way, if you want to play along but you haven’t got Lean installed on your computer, you can do all this within a web browser by clicking here (although you could instead click here to find out how to install Lean and the community tools, which give you a far slicker experience).]

Our counterexample is going to be the following: We define `f : X → Y`

by `f(p)=q`

and `g : Y → Z`

by `g(q)=g(r)=s`

. Let’s do this.

```
open X Y Z
def f : X → Y
| p := q
def g : Y → Z
| q := s
| r := s
```
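A quick sanity check that these definitions do what we claimed; each equation holds by `rfl`, i.e. by definition:

```lean
example : f p = q := rfl
example : g q = s := rfl
example : g r = s := rfl
```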

As a mathematician I find the use of the `|`

symbol quite intimidating (especially that we are now using it in a different way), but given that I’ve told you what we’re doing and now I’m telling you how we are doing it, you can probably guess what it all means. One can now go ahead and state the result:

```
open function
example : ¬ (∀ (X Y Z : Type) (f : X → Y) (g : Y → Z),
injective (g ∘ f) → injective g) :=
begin
sorry
end
```

and now if you fancy proving a mathematical theorem by playing a puzzle game, you can click here to get all the code at once, and have a go. Instead of talking about the proof though, I want to talk about the rather surprising (at least to me) fact that Lean is defining `f`

and `g`

by recursion.

What happens under the hood when Lean sees this code

```
inductive X : Type
| p : X
```

is quite surprising (at least to me as a mathematician). I’ve been arguing above that we should think of this code as saying “Let be a set with one element”. But here’s what’s really going on when Lean sees this code. Unsurprisingly, Lean defines a new type `X`

and a new term `p`

(or more precisely `X.p`

) of type `X`

. It also defines one more new thing, which expresses that `p`

is the only element of `X`

. But the way it does this is surprising: it defines the so-called *recursor* for `X`

, which is the following statement:

`X.rec : ∀ {C : X → Sort u}, C p → (∀ (x : X), C x)`

Whatever does that mean? Well first I think I’d better explain what this `Sort u`

thing is. I’ve written in the past an explanation of how sets and their elements, and theorems and their proofs, are unified in Lean’s type theory as types and their terms. The sets/elements story goes on in the `Type`

universe, and the theorems/proofs story goes on in the `Prop`

universe. When Lean says `Sort u`

it means “either of these universes”. So we can rewrite `X.rec`

as *two* statements:

```
X.recursor : ∀ {C : X → Type}, C p → (∀ (x : X), C x)
X.inductor : ∀ {C : X → Prop}, C p → (∀ (x : X), C x)
```

The first statement is the principle of recursion for `X`

. In set-theoretic language it says this. “Let’s say that for every element x of X we have a set C(x), and let’s say we have an element of C(p). Then we have a method of constructing an element of C(x) for all x.” This looks like a rather long-winded way of saying that p is the only element of X. In fact it is worth looking at the special case of `X.recursor`

where `C`

is the constant function sending every element of `X`

to the set `S`

:

`X.recursor_constant : ∀ S, S → (X → S)`

This says that if `S`

is any set, and we have an element `a`

of `S`

, then we can get a function from `X`

to `S`

. What is unsaid here, but true by definition, is that it’s the function that sends `p`

to `a`

, as can be checked thus:

```
-- give X.rec the constant function C sending everything to S
def X.recursor_constant : ∀ S, S → (X → S) := λ S, @X.rec (λ x, S)
example (S : Type) (a : S) : (X.recursor_constant S a) p = a :=
begin
-- true by definition
refl
end
```

Do you remember our definition of `f`

above?

```
def f : X → Y
| p := q
```

This function `f`

is defined using `X.recursor_constant`

, letting `S`

be `Y`

and letting the element of `S`

be `q`

. The notation Lean uses is short, but under the hood this is how `f`

is constructed.

So much for recursion. The second statement coming from `X.rec`

is `X.inductor`

, the principle of induction for `X`

. I should probably say that I made up the word “inductor”, but inductor is to induction as recursor is to recursion. In more mathematical language the inductor says this. “Let’s say that for every element *x* of *X* we have a true-false statement C(x), and let’s say that C(p) is true. Then C(x) is true for every element *x* of *X*.” So again it is just a rather long-winded way of saying that *p* is the only element of *X*.
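Here is the inductor in action (using `@X.rec` directly, since the `Prop` version is just a special case of `X.rec`):

```lean
-- To prove `C x` for every `x : X`, it suffices to prove `C p`.
example (C : X → Prop) (hp : C p) : ∀ x : X, C x :=
@X.rec C hp
```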

Why have computer scientists isolated these rather elaborate statements as the fundamental way to say that `p`

is the only element of `X`

? It’s actually because of a fundamental symmetry. We have defined a new type `X`

, in a functional programming language, and now the fundamental thing we need to do next is to explain how to define functions *into* `X`

, and how to define functions *out of* `X`

. To define functions into `X`

, we need to have access to terms of type `X`

, or in computer science lingo to *constructors* of `X`

. This is exactly what `X.p`

is — a way to construct a term of type `X`

. To define functions out of `X`

, we need access to *eliminators* for `X`

, that is, some kind of gadget whose output is a function from `X`

to somewhere else. Because `X`

only has one term, namely `p`

, we need a way of saying “to give a function out of `X`

, we only need to say what happens to `p`

“, and this is exactly what the recursor is doing. Between them, the constructor and recursor say in a formal way that the elements of `X`

are “at least `p`

, and at most `p`

, so are exactly `p`

.”

Lean *automatically* generates constructors and a recursor for every type defined with the `inductive`

command. There is a general rule for how to do this, but informally it’s pretty clear. We define inductive types using this `|`

symbol, and you get a constructor for each line with a `|`

in. The eliminator or recursor simply says that to define a function from the new type you’re defining, all you have to do is to make sure you’ve defined it on each constructor.

The rest of this post is the fun part. I will go through a bunch of inductive types defined in Lean, we can look at the definition, figure out the recursor attached to each of the types, and then see what this corresponds to mathematically. We will see some familiar things popping up in surprising ways.

Recall our inductive type `Y`

:

```
inductive Y : Type
| q : Y
| r : Y
```

The recursor for `Y`

tells us that if `S`

is a set, then to get a map from `Y`

to `S`

we have to give two elements of `S`

, one corresponding to where `q`

goes and one corresponding to where `r`

goes.

`def Y.recursor_constant : ∀ S, S → S → (Y → S) := λ S, @Y.rec (λ y, S)`

The full recursor can even be used (with non-constant `C`

) to define a function from `Y`

which sends `q`

into one type and `r`

into a different type, but when defining the function `g`

above we do not need this level of generality. If you want to see what `Y`

‘s recursor looks like, just type `#check @Y.rec`

in a Lean session after the definition of `Y`

, and remember that `Π`

is just computer science for `∀`

(in Lean 4 they will be using `∀`

instead of `Π`

in fact).
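Assuming `Y.recursor_constant` as defined above, the function `g` from earlier could equally have been built by just feeding in the two output values:

```lean
-- `g'` sends both q and r to s; it agrees with our earlier `g` by definition.
def g' : Y → Z := Y.recursor_constant Z s s
example : g' q = s := rfl
```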

Mathematicians who have seen the development of mathematics in ZFC set theory know that a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers, a rational number is defined to be a multiplicative localisation of the integers at the nonzero integers, an integer is defined to be an additive localisation of the naturals at the naturals, and the naturals are defined by the ZFC axiom of infinity. In Lean’s type theory a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers etc etc etc, and it’s all just the same up to the very end, because in Lean the naturals are defined by Peano’s axioms:

```
inductive nat : Type
| zero : nat
| succ (n : nat) : nat
```

This means that we have two ways to make natural numbers. First, `zero`

is a natural number. Second, if `n`

is a natural number, then `succ n`

(usually called `n+1`

by mathematicians) is a natural number. Now we need a way of expressing the idea that this is the only way to make natural numbers, and this is the recursor, which is automatically generated by Lean, and says a precise version of the following informal thought: “If you want to do something for all naturals, then you need to tell me how to do it for both constructors”. In other words, “…you need to tell me how to do it for zero, and then you have to tell me a way to do it for `n+1`

assuming we’ve already done it for `n`

“. Sounds familiar?

The recursor in general involves a map to `Sort u`

. Let’s just specialise to the two universes we’re interested in, and take a look at the constant recursor, and the inductor (and let’s use Lean’s notation `ℕ`

for `nat`

):

```
nat.recursor_constant : ∀ (S : Type), S → (∀ (n : ℕ), S → S) → (ℕ → S)
nat.inductor : ∀ (C : ℕ → Prop), C 0 → (∀ (n : ℕ), C n → C (succ n)) → ∀ (n : ℕ), C n
```

[The proof of `nat.recursor_constant`

is `λ S, @nat.rec (λ n, S)`

and the proof of `nat.inductor`

is just `@nat.rec`

. ]

The constant recursor says this: if `S`

is a set, and we want to make a function `f : โ โ S`

, here’s a way of doing it. First we need an element of `S`

(namely `f(0)`

) and second, for each natural number we need a map from `S`

to `S`

(telling us how to make `f(n+1)`

given that we know `f(n)`

).

The inductor says this. Say we have a family `C(n)`

of true-false statements, and that `C(0)`

is true, and that for all `n`

we have a proof that `C(n)`

implies `C(n+1)`

. Then we can deduce that `C(n)`

is true for all `n`

.
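For example, here is how one might define addition using the recursor, in the spirit of the natural number game (a sketch; mathlib’s actual definition differs in its details):

```lean
-- Recurse on the second argument: m + 0 := m, m + (n+1) := (m + n) + 1.
def my_add (m : ℕ) : ℕ → ℕ :=
@nat.rec (λ n, ℕ) m (λ n ih, nat.succ ih)

example : my_add 2 2 = 4 := rfl
```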

What I think is really cute about this example is that Peano’s definition of the natural numbers makes it immediately clear why the principle of mathematical induction works. In the natural number game we use the recursor in the background to define addition and multiplication on the naturals. We also use it to prove things which I call “axioms” in the natural number game — for example the proof that `0`

is not equal to `succ n`

for any natural number `n`

uses the recursor to define a function from ℕ to ℕ sending `0`

to `0`

and `succ n`

to 1, and using this function it’s easy to prove the “axiom” `zero_ne_succ`

by contradiction. If you want an exercise, try using `nat.recursor_constant`

to prove injectivity of the `succ`

function, something else I also claimed was an axiom in the natural number game (as Peano did) but which was actually proved using the recursor.

`false`

is a true-false statement, and you can probably guess which one it is. In Lean `false`

is defined as an inductive type! Here’s the full definition:

`inductive false : Prop`

This time there are no `|`

s at all! Every constructor of `false`

would be a proof of a false statement, so this design decision is not surprising. The recursor is

`false.rec : Π (C : Sort u), false → C`

In other words, to give a map from `false`

to `C`

you have to define it on all constructors, of which there are none. Let’s take a look at the inductor then, by changing `Sort u`

to `Prop`

:

`false.inductor : ∀ (P : Prop), false → P`

It says that if `P`

is any true-false statement, then `false`

implies `P`

. This logical tautology has been automatically generated by Lean, because Lean’s model of an implication *Q* → *P* is a function from proofs of *Q* to proofs of *P*, and `false`

has no terms, i.e., no proofs.
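In Lean this looks like the following (note that the motive `C` is explicit in `false.rec`, matching the signature displayed above):

```lean
-- From a proof of `false`, anything follows.
example (P : Prop) (h : false) : P := false.rec P h
```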

There is a similar story with `inductive empty : Type`

, Lean’s definition of the empty type. The recursor for `empty`

says that to give a map from the empty type to any type `S`

, you don’t have to do anything other than feed in `S`

.

The logical `or`

on propositions is defined as an inductive type in Lean!

```
inductive or (P Q : Prop) : Prop
| inl (hp : P) : or
| inr (hq : Q) : or
```

There are two constructors for `P ∨ Q`

, where now I’m using the usual logicians’ notation for `or`

. In other words, there are two ways to prove `P ∨ Q`

. First you can prove `P`

, and second you can prove `Q`

. Lean’s auto-generated inductor for this is

`or.inductor : ∀ (P Q R : Prop), (P → R) → (Q → R) → (P ∨ Q → R)`

In other words, if you can prove `P → R` and you can prove `Q → R`, then you can deduce `P ∨ Q → R`. Again no mathematician is surprised that this statement is true, but perhaps some are surprised by the fact that a computer scientist might claim that this is true *by induction on or*.

The `=`

symbol in Lean is defined as an inductive type! But I think I’m going to save the topic of what induction on equality is until the next post, where we will prove, by induction, that equality is an equivalence relation.

I was very surprised when I realised that every inductive type came with a principle of induction. In fact one can even define the reals as an inductive type, which means that there will be an inductor for reals meaning that you can do induction on the reals! But when I figured out what the induction principle said I was disappointed — it says “if you can prove it for every real which is an equivalence class of Cauchy sequences of rationals, you can prove it for every real”. Remember that the idea of the recursor is that it is a way of saying “every term of your type can be made using the constructors”, so if your only constructor for a real is an equivalence class of Cauchy sequences of rationals then this is what you get. However these other examples, and in particular these examples coming from logic, are quite funky. An example I didn’t talk about: `and`

is an inductive type and its inductor is `∀ (P Q R : Prop), (P → (Q → R)) → (P ∧ Q → R)`

, which is some propositional version of uncurrying (indeed the constant recursor for `prod`

, the product of two types, is uncurrying on the nose). The basic facts in propositional logic about `and`

and `or`

are proved constructively in Lean using recursors rather than by truth tables, because directly constructing the functions corresponding to the proofs is more appealing than a case split.
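For example, commutativity of `and` is proved by its recursor; we just build the function directly, no truth table in sight:

```lean
-- `and.rec` is uncurrying in the Prop world: feed it a two-argument
-- function and it eats a proof of `P ∧ Q`.
example (P Q : Prop) : P ∧ Q → Q ∧ P :=
λ h, and.rec (λ hp hq, and.intro hq hp) h
```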

Not everything is an inductive type in Lean — there are two other kinds of types. There are quotient types, which are there for some kind of computer science efficiency reasons and which could be constructed using inductive types, and then there are function types, which are a different kind of thing. I don’t think it’s of mathematical interest whether the type you’re looking at is an inductive type or a function type, but here’s an example of a function type: logical `not`

. In Lean, `¬ P`

is defined to be `P → false`

. On the other hand most of the structures used by mathematicians (groups, subgroups, rings, fields, perfectoid spaces and so on) are defined as inductive types (often however with one constructor, so their induction principle is boring). An inductive type with one constructor is known as a `structure`

in Lean. You can read more about inductive types and structures in the very wonderful Theorem Proving In Lean, in sections 7 to 9.

In my next post I’ll talk about induction on equality.

My daughter is busy with exam revision, so here’s some old digital art by one of my sons.

I promised I would do something more ambitious in week 8, and eventually I settled on group cohomology. I usually write these blog posts just before the workshop but instead this week I wrote a README and am only now writing this more detailed document after the event.

I don’t know the history of group cohomology, but I do know that it’s possible to invent the theory of 1-cocycles in lots of ways (for example when trying to understand what the top right hand corner of a group homomorphism from a group G into 2×2 upper triangular matrices “is”) and so they were bound to show up eventually. The basic question is this: Say we have an abelian group (group law `+`

) and a subgroup , and let be the quotient group. Say we have an action of a group (group law `*`

) on , and say is -stable, so gets an induced -action too. Now take -invariants of everything and denote this . The -invariants of are still a subgroup of , but , the -invariant elements of , might be bigger than the image of in . For example if is the integers mod 4, with the cyclic group of order 2 acting by sending to , then is -stable but one checks that acts trivially on so the map is no longer surjective.

Here is an attempt to “measure” failure of surjectivity. Say is -invariant. Lift randomly to . Then if we see that maps to in so must be in . Trying this example in the case above you can convince yourself that you get a group isomorphism from to this way. But in general the map sending to is not a group homomorphism, and is not even “canonical”, as a mathematician would say — it depends on a choice of lifting . Different choices differ by an element of , and asking whether the function is of the form for some is the same as asking whether our original element lifts to an element of .

These ideas synthesized into the following definitions. Say acts on . The zero-th cohomology group is just the subgroup of -invariant elements.

A *cocycle* for a -action on is a function such that . A *coboundary* is a cocycle of the form for some . The quotient of the cocycles by the coboundaries is called the first cohomology group .
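For the Lean-curious, here is a hedged sketch (my names and design, not the code in my workshop repository or mathlib) of what a 1-cocycle might look like as a structure, for a G-action on an additive abelian group M written `g • m`:

```lean
import group_theory.group_action.defs

-- A 1-cocycle is a function f : G → M with f(g*h) = f(g) + g • f(h).
-- Illustrative sketch only; typeclass choices here are mine.
structure one_cocycle (G M : Type) [monoid G] [add_comm_group M]
  [distrib_mul_action G M] :=
(f : G → M)
(hf : ∀ g h : G, f (g * h) = f g + g • f h)
```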

The construction above shows that if is a -module with a subgroup also preserved by and quotient (people write “a short exact sequence of -modules”) then there is a map . This actually forms part of a seven term long exact sequence:

and our goal in this workshop, at least if we had infinite time, would be to:

- define all the maps in that exact sequence
- prove the sequence is exact
- prove the inflation-restriction sequence is exact
- develop the concrete theory of H^2 via 2-cocycles and 2-coboundaries
- develop the abstract theory of H^n via n-cocycles and n-coboundaries.

My 2018 BSc project student Anca Ciobanu achieved the first two of these goals and my 2019 MSc project student Shenyang Wu achieved the last one. So these goals are definitely possible! It will take rather longer than 2 hours though.

This has become a mini-project of mine, and my current thoughts can be seen in the `ideas`

directory of the week 8 folder in the GitHub workshop repository. Ultimately I hope to get this stuff into mathlib (or perhaps to persuade someone else to get it into mathlib for me).

If you weren’t part of the workshop then you can still do it yourself, all you need is a working Lean and mathlib installation, which you can get following the instructions on the Leanprover community website.

Let me start by talking about things I learnt in my second and third year as an undergraduate. I went to a course called Further Topics in Algebra, lectured by Jim Roseblade, and in it I learnt how to take the tensor product of two finite-dimensional $k$-vector spaces $V$ and $W$. Jim explained to us the *universal property* of the tensor product, and I saw for the first time in my life the abstract nonsense argument which explains that objects which satisfy the universal property for tensor products are unique up to unique isomorphism. He also explained that writing down the universal property did not count as a *definition*. The abstract nonsense argument shows that tensor products are unique *if they exist*. To prove that they exist, Jim wrote down an explicit model, involving taking a quotient of a gigantic free abelian group with basis the pairs $(v,w)$, modulo the relations saying that it is a vector space satisfying the universal property. I came out of these lectures with a good understanding of this technique. Later on in my undergraduate education I met things such as the localisation of a commutative ring at a multiplicative subset, and again I understood that these things were unique up to unique isomorphism if they existed, and that one could write down an explicit model to show they existed.

I came away with the impression that the key fact was the universal property, from which everything else could be proved, and that the model was basically irrelevant. To my surprise, I have learnt more recently that this is not exactly the whole truth. Here is an example, due to Patrick Massot. Let $R$ be a commutative ring, and let $S$ be a multiplicative subset. I claim that the kernel of the natural map $R\to R[1/S]$ is precisely the set of elements annihilated by an element of $S$. Using the universal property and elementary arguments we can reduce to the statement that if $r$ is in the kernel of every ring homomorphism sending the elements of $S$ to units, then $r$ is annihilated by an element of $S$, but as far as I can see, to prove this we have to come up with a cunning ring, and letting it be the explicit model of $R[1/S]$ constructed as a quotient of $R\times S$ does the job. In particular the model appears again in the argument! Not equipped with the proof that it is the initial object in the category of $R$-algebras in which the elements of $S$ become invertible, but equipped with the proof that it is *an* $R$-algebra in which the elements of $S$ become invertible. My MSc student Amelia Livingston came up with other examples of this as part of her work on Koszul complexes in Lean. But I digress. Let’s get on to what we’ll talk about today.

At university, in my first year, I was taught the following construction. If $X$ is a set and $\sim$ is an equivalence relation on $X$, then one can define the set $Q$ of equivalence classes for this equivalence relation. There is a natural map from $X$ to $Q$ sending $x\in X$ to its equivalence class. We write $Q=X/\sim$.

All the way through my undergraduate career, when taking quotients, I imagined that this was what was going on. For example when forming quotient groups, or quotient vector spaces, or later on in my life quotient rings and quotient modules, I imagined the elements of the quotient to be sets. I would occasionally look at elements of elements of a quotient set, something rarely done in other situations. I would define functions from a quotient set to somewhere else by choosing a random representative, saying where the representative went, and then proving that the construction was ultimately independent of the choice of representative and hence “well-defined”. This was always the point of view presented to me.

I have only relatively recently learnt that actually, this model of a quotient as a set of equivalence classes is nothing more than that — it’s just a model.

Here’s the universal property. Say $X$ is a set equipped with an equivalence relation $\sim$. A *quotient* for this data is a pair consisting of a set $Q$ and a function $p:X\to Q$ which is constant on equivalence classes (i.e. $x\sim y\implies p(x)=p(y)$) and which is furthermore *initial* with respect to that property. In other words, if $T$ is any other set and $f:X\to T$ is any function which is constant on equivalence classes, there exists a unique $g:Q\to T$ such that $f=g\circ p$. The usual abstract nonsense argument shows that quotients are unique up to unique isomorphism.
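
In Lean 3 the universal property can be stated and proved in a few lines; this is a sketch of mine (not a named mathlib lemma), using the `quotient` of a `setoid`, where the projection `p` is `quotient.mk`, written `⟦x⟧`:

```lean
-- Existence and uniqueness of the induced map g : Q → T,
-- where Q := quotient s and p := quotient.mk.
variables {X T : Type*} [s : setoid X]

example (f : X → T) (h : ∀ a b : X, a ≈ b → f a = f b) :
  ∃! g : quotient s → T, ∀ x : X, g ⟦x⟧ = f x :=
⟨quotient.lift f h,            -- existence: Lean's `quotient.lift`
 λ x, rfl,                     -- it commutes with p, by definition
 λ g hg, funext $ λ q,         -- uniqueness: check on representatives
   quotient.induction_on q hg⟩
```

Note that the existence part and the commuting triangle are handed to us by Lean; only the uniqueness needs an argument, and even that is just “check on representatives”.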

Example 1) Let $X$ be a set equipped with an equivalence relation $\sim$, and let $Q$ be the set of equivalence classes of $X$, equipped with the map $p$ sending an element of $X$ to its equivalence class. Then $(Q,p)$ is a quotient of $X$ by $\sim$.

Example 2) Let $X$ and $Q$ be any sets, and say $p:X\to Q$ is a surjection. Define an equivalence relation on $X$ by $x\sim y\iff p(x)=p(y)$. Then $(Q,p)$ is a quotient of $X$ by $\sim$.

Example 2 shows that this construction of quotients using equivalence classes is nothing more than a model, and that there are plenty of other sets which show up naturally and which are not sets of equivalence classes but which are quotients anyway. The important point is the universal property. In contrast to localisation of rings, I know of no theorem about quotients for which the “equivalence class” model helps in the proof. The only purposes I can see for this “equivalence class” model now are (1) it supplies a proof that quotients do actually exist and (2) a psychological one, providing a “model” for the quotient.

I have had to teach quotients before, and students find them hard. I think they find them hard because some just basically find it hard to handle this whole “set whose elements are sets” thing. Hence even though the psychological reason was ultimately useful for me, and eventually I “got the hang of” quotients, I do wonder what we should be doing about the people who never master them. An alternative approach in our teaching is to push the universal property angle. I have never tried this. It might turn out even worse!

Here is the mantra which we hear as undergraduates and thus go on to feed our own undergraduates. The situation is this: we have some quotient object $Q$ (e.g. a quotient ring or a quotient group or whatever) and we want to define a map from this quotient to some other thing $T$.

The argument goes something like this:

“Recall that $Q$ is a quotient, so its elements are really equivalence classes. We want to define a map from $Q$ to $T$, so let’s choose $q\in Q$. Now remember that $q$ really is actually an equivalence class. Choose an element $x$ of this equivalence class, and now apply a construction which seems to depend on $x$, giving us an element of $T$ [Note that this construction is just a function from $X$ to $T$, so let’s call it $f$]. That is our map from $Q$ to $T$. But now we need to check that this map is *well-defined*. This means the following: during this construction we did something “non-canonical”, namely choosing a random element $x$ of $q$. We need to check that our “function” from $Q$ to $T$ is in fact independent of this choice. So say that instead we had chosen $x'\in q$. Then $x$ and $x'$ are in the same equivalence class, so they are equivalent. Now an explicit computation shows that $f(x)=f(x')$, and hence we’re OK: our function is well-defined.”

Is the student left wondering what the heck it means for a function to be “not well-defined”? How come nobody ever talks about any other kind of function needing to be “well-defined”? I thought the axiom of choice said that we can choose an element in each equivalence class all at the same time. How come we can’t just define our map by taking $q\in Q$, using our axiom-of-choice element $x\in q$, and sending $q$ to $f(x)$? Is that “well-defined”?

The argument now looks like this.

“Recall that $Q$ is a quotient, so it satisfies the *universal property of quotients*. Recall that this says that to give a map from $Q$ to another set $T$, all we have to do is to give a function $f:X\to T$ which is constant on equivalence classes; the universal property then gives us a unique $g:Q\to T$ such that $f=g\circ p$. So let’s define $f$ like this [and now define $f$], and now let’s check it’s constant on equivalence classes [the same calculation as before]. The universal property thus gives us the function $g$ which we require.”

Is that the way to teach this stuff to undergraduates? Is it more confusing, less confusing, or maybe just differently confusing?

Lean has quotients of equivalence relations built in. This is not particularly necessary; it is an implementation decision which did not have to be made like this. One can certainly make quotients as types of equivalence classes (and indeed this has been done in mathlib, with the theory of partitions). However Lean also has an opaque `quotient` construction, which creates another model of a quotient; we don’t know what the elements are, but we know the universal property and this is all we need.

Today we will learn about quotients by working through the explicit construction of the integers as the quotient of $\mathbb{N}^2$ by the equivalence relation $(a,b)\sim(c,d)\iff a+d=c+b$, and a proof that it is a commutative ring. We will go on to play around a bit with the universal property of quotients, and finish by using abstract nonsense to construct a bijection from our quotient to Lean’s `int`.
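
The starting point of the workshop can be sketched in Lean 3 like this (the names `N2.setoid` and `myint` are my own; the workshop’s actual names may differ). The pair `(a,b)` stands for the integer `a - b`:

```lean
import tactic

-- The relation (a,b) ~ (c,d) ↔ a + d = c + b on ℕ × ℕ, whose
-- quotient will be "the integers".
def N2.setoid : setoid (ℕ × ℕ) :=
{ r := λ x y, x.1 + y.2 = y.1 + x.2,
  iseqv := ⟨λ x, rfl,          -- reflexivity is definitional
            λ x y h, h.symm,   -- symmetry is symmetry of equality
            -- transitivity is linear arithmetic over ℕ
            λ x y z h1 h2, by linarith⟩ }

def myint := quotient N2.setoid
```

The interesting part is transitivity, which is exactly the kind of additive cancellation argument that tactics like `linarith` dispose of automatically.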

There is far far too much material to do in one 2-hour workshop, but I was on a roll. As ever, the material is here, in `src/week_7`.

Mathematicians use quotients *everywhere*, so it’s kind of interesting that they have their own type, but why not just use the type of equivalence classes? Lean can make that type, and it’s possible to prove all the API for it (I’ve done it). So why explicitly extend Lean’s type theory to add a new quotient type? The answer seems to be this. The universal property for quotients is that if $Q$ is a quotient of $X$ by an equivalence relation $\sim$, then we have a bijection between the functions $g:Q\to T$ and the $\sim$-equivariant functions $f:X\to T$. To build $f$ from $g$ is easy: just compose with $p$. To go the other way Lean has a function called `quotient.lift f h`, which spits out `g` given `f` and a proof `h` that `f` is $\sim$-equivariant (i.e. constant on equivalence classes). The claim that these constructions are inverse bijections boils down to the assertion that `f = (quotient.lift f h) ∘ p`, and the proof of this, remarkably, is `rfl`: it’s true by definition. This is what the baked-in quotient construction buys you. I used to think that this was really important (and indeed in the past I have claimed that this is a key advantage which Lean has over Coq). Now I am not so sure. It probably makes proofs a bit slicker occasionally, but nowadays I am less convinced by the idea of definitional equality in general; I’m happy to rewrite.
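
Here is that definitional equality in action, as a short Lean 3 sketch; the point is that the proof really is `rfl`:

```lean
variables {X T : Type*} [s : setoid X]

-- lifting f and then composing with the projection gives back f,
-- and this is true by definition
example (f : X → T) (h : ∀ a b : X, a ≈ b → f a = f b) :
  f = (quotient.lift f h) ∘ quotient.mk :=
rfl
```

The computation rule for `quotient.lift` says `quotient.lift f h ⟦x⟧` reduces to `f x`, so both sides are literally the same function to Lean’s kernel.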

`filter.tendsto_mul`. So this week I’m going to talk about `tendsto`, but first I think it’s worth refreshing our memories about the useful mental picture of a filter as a generalised subset.
Let `X` be a type (you can call it a set if you like). The type of subsets of `X` has a bunch of nice structure: there’s a partial order `⊆`, there are unions and intersections (both finite and infinite) and they satisfy a bunch of axioms. Back in the days of Isaac Newton, one particularly well-studied type was the type of real numbers. However, people had not quite worked out whether *infinitesimals* existed (infinitely small non-zero real numbers called things like $dx$ and $dy$), and some people like Leibniz wanted to divide one of them by the other because they were discovering some new-fangled thing called calculus. By the 20th century, the experts had made their decision: there were *no infinitesimally small nonzero real numbers*, and that’s what the textbooks say today (other than Robinson’s book on non-standard analysis, but nobody uses that for teaching undergraduate calculus). However it was equally clear that infinitesimals provided a *good picture*.

A filter on `X` is a kind of “generalised subset” of `X`. Each subset `S` of `X` gives you a filter, called the principal filter `𝓟 S`, and there are other filters too corresponding to slightly weirder things. For example, if `X = ℝ` then there’s a filter called `𝓝 0`, the neighbourhood filter of `0`, which should be thought of as containing `0` and all the infinitesimally small numbers. Just like usual subsets, these generalised subsets have a partial order, which we’ll call `≤`, extending the partial order `⊆` on usual subsets. In reality these filters are defined completely rigorously as collections of usual subsets satisfying some axioms, but we won’t go into the details this week; we’ll just stick with the picture.

Let’s stick with “usual” subsets for this section, but let’s throw in a second type `Y` and a function `f : X → Y`. The function gives us some kind of dynamics in the system: we can start using the function to move sets around. The most obvious way that a function can move a set around is via the `image` construction. Given a subset `S` of `X`, we can consider what a mathematician would call $f(S)$, the image of `S` in `Y`, defined as the set of `y` in `Y` such that there exists `x` in `S` with `f x = y`. This is an abuse of notation: the inputs to `f` are supposed to be elements of `X`, not subsets of `X`, so `f S` does not make sense, and in Lean we carefully differentiate between these ideas by writing `f '' S` for the image of `S` in `Y`. We call this “pushing forward a subset along `f`“.

Conversely, if `T : set Y` then there is a way of “pulling `T` back along `f`” to give us a subset of `X`, consisting of the `x` in `X` such that `f x` is in `T`. Again Lean has a slightly weird notation for this, because `⁻¹` is already taken by a general kind of inverse function on a group. So we write `f ⁻¹' T` for this construction.

If `set X` denotes the type of subsets of `X`, then `f : X → Y` gives rise to functions `f '' : set X → set Y` and `f ⁻¹' : set Y → set X`. Are these functions inverse to one another? No, not remotely! In general, doing one then the other won’t get you back to where you started. So what is the relationship between these two constructions? The fancy answer is that they form a Galois connection, and the even fancier answer is that they are a pair of adjoint functors. But let’s not go into this. Let’s talk about a fundamental predicate.
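
Both constructions unravel to definitional `mem` statements; a quick Lean 3 sketch (mathlib calls these facts `set.mem_image` and `set.mem_preimage`):

```lean
import data.set.basic

variables {X Y : Type*} (f : X → Y) (S : set X) (T : set Y)

-- membership in the image: there is a preimage in S
example (y : Y) : y ∈ f '' S ↔ ∃ x, x ∈ S ∧ f x = y :=
iff.rfl

-- membership in the preimage: just apply f
example (x : X) : x ∈ f ⁻¹' T ↔ f x ∈ T :=
iff.rfl
```

Both proofs are `iff.rfl` because `f '' S` and `f ⁻¹' T` are *defined* as the corresponding set-builder expressions.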

Let’s stay with the set-up: `X` and `Y` are types, and `f : X → Y` is a function. Say `S` is a subset of `X` and `T` is a subset of `Y`. Then we can ask ourselves the following true-false question: does `f` restrict to a function from `S` to `T`? In other words, is it true that `x ∈ S` implies `f x ∈ T`? Perhaps a good notation for this idea would be something like $S\to_f T$. The reason this notation is appealing is that if $S\to_f T$ and $T\to_g U$ then $S\to_{g\circ f}U$, and this feels like some kind of transitivity statement, but it isn’t literally transitivity of some relation on a type, because `S` and `T` don’t in general have the same type: they’re subsets of different sets. How can we restate $S\to_f T$ using pushforwards or pullbacks?

If you think about it, it turns out that there is a way to state this relation using pushforwards, and an equivalent way using pullbacks. One can check easily that $S\to_f T$ is equivalent to `f '' S ⊆ T` and also to `S ⊆ f ⁻¹' T`. In particular `f '' S ⊆ T` and `S ⊆ f ⁻¹' T` are equivalent to each other (and we have proved that the functors are adjoint, for those of you who know this category-theoretic language).
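
In mathlib this equivalence is `set.image_subset_iff`; as a Lean 3 sketch, both directions are a line each from the definitions:

```lean
import data.set.basic

variables {X Y : Type*} (f : X → Y) (S : set X) (T : set Y)

-- the "adjunction" for subsets: f '' S ⊆ T ↔ S ⊆ f ⁻¹' T
example : f '' S ⊆ T ↔ S ⊆ f ⁻¹' T :=
⟨λ h x hx, h ⟨x, hx, rfl⟩,        -- image small ⇒ S lands in preimage
 λ h y ⟨x, hx, hxy⟩, hxy ▸ h hx⟩  -- S in preimage ⇒ image lands in T
```

This is the statement that the filter constructions below are designed to generalise.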

Our aim is to find analogous constructions for filters.

Jordan Ellenberg on Twitter remarked that something I said last week reminded him of a fact about ideals. I really like this idea, because if we’re going to think in pictures then it helps to understand analogies. When people were trying to understand factorization in integral domains such as the integers, they had to deal with the fact that $6=2\times 3$ and also $6=(-2)\times(-3)$, but that really these are “the same factorization”. This leads us to an equivalence relation on elements of an integral domain: two elements are equivalent if one is a unit times the other, where the units are the divisors of 1. Equivalence classes of elements are the same as principal ideals of the ring, and one might have hoped that this idea solves all your factorization problems. But rings like $\mathbb{Z}[\sqrt{-5}]$ teach us otherwise: now $6=2\times 3=(1+\sqrt{-5})\times(1-\sqrt{-5})$, and these factorizations are still not equivalent. The fix was to introduce some magic “ideal numbers”, or “ideals” for short, which are not really numbers, but some kind of generalised number, and now every non-zero generalised number in $\mathbb{Z}[\sqrt{-5}]$ factors uniquely into prime generalised numbers. The reason I am bringing this up is that it is not difficult to check that every ideal of a commutative ring is uniquely determined by the principal ideals which it contains (because it is uniquely determined by the elements it contains).

Filters, a.k.a. generalised subsets, have the reverse property: every filter is uniquely determined by the principal filters which contain it. This is an extensionality lemma for filters, and it is this idea which we need to keep in mind when we try to figure out how to push forward and pull back filters.

As ever, say `X` and `Y` are types, and `f : X → Y` is a function. Pushing forward filters along `f` (called `map f : filter X → filter Y` in Lean) is barely any harder than pushing forward subsets. Say `F : filter X` is a filter on `X`. Let’s figure out how to define its pushforward `map f F`, a filter on `Y`. By the remark above, it suffices to figure out which subsets `T` of `Y`, or more precisely which principal filters `𝓟 T`, satisfy `map f F ≤ 𝓟 T`. If we want our intuition to be correct, this should be the case precisely when `F ≤ 𝓟 (f ⁻¹' T)`, because this feels exactly like the situation studied above. Hence we will *define* the pushforward `map f F` of the filter `F` along `f` by saying that `map f F ≤ 𝓟 T` if and only if `F ≤ 𝓟 (f ⁻¹' T)`, and one can check that this definition (of `(map f F).sets`) satisfies the axioms of a filter. This is one of the things you’ll be proving today in workshop 6.
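
This definition can be sketched directly in Lean 3 from the filter axioms; the construction below mirrors mathlib’s `filter.map` (the name `my_map` is mine):

```lean
import order.filter.basic

variables {X Y : Type*}

-- the pushforward: T is in `map f F` iff its preimage is in F
def my_map (f : X → Y) (F : filter X) : filter Y :=
{ sets             := {T | f ⁻¹' T ∈ F},
  -- f ⁻¹' univ is univ, definitionally
  univ_sets        := F.univ_sets,
  -- preimage preserves inclusions
  sets_of_superset := λ T U hT hTU,
    F.sets_of_superset hT (λ x hx, hTU hx),
  -- preimage commutes with intersection, definitionally
  inter_sets       := λ T U hT hU, F.inter_sets hT hU }
```

All three axioms are inherited from `F` because taking preimages interacts so well with `univ`, `⊆` and `∩`.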

Pulling back filters, called `comap f : filter Y → filter X`, is harder, because if `G : filter Y` then this time we want to solve `comap f G ≤ 𝓟 S`, and the adjoint functor nonsense above only tells us about inequalities the other way around, which is not enough information: for example the only nonempty subset of `ℝ` contained in the infinitesimal neighbourhood filter `𝓝 0` is the subset `{0}`, but `𝓝 0` is strictly larger than the principal filter `𝓟 {0}` because it also contains Leibniz’s infinitesimal elements. The one question we can answer heuristically, however, is a criterion for `comap f G ≤ 𝓟 (f ⁻¹' T)`, because if our mental model of `comap f` is “generalised `f ⁻¹'`” then `G ≤ 𝓟 T` should imply this. The problem with just restricting to these `T`s is that if `f` is not injective then we can never distinguish between distinct elements of `X` mapping to the same element of `Y`, and yet if `comap f G ≤ 𝓟 (f ⁻¹' T)` and `f ⁻¹' T ⊆ S` then we certainly want `comap f G ≤ 𝓟 S`. So this is what we go for: `comap f G ≤ 𝓟 S` if and only if there exists a subset `T` of `Y` such that `G ≤ 𝓟 T` and `f ⁻¹' T ⊆ S`. It turns out that there really is a filter on `X` satisfying exactly these inequalities, and this is our pullback filter.

One can check, in exact analogy to pushing forward and pulling back subsets, that `map f F ≤ G ↔ F ≤ comap f G`; indeed, this is the boss level of Part A of today’s workshop.
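
For the record, this adjunction is already in mathlib as `filter.map_le_iff_le_comap` (and `filter.gc_map_comap` packages it as a Galois connection); stating it is a one-liner:

```lean
import order.filter.basic
open filter

variables {X Y : Type*} (f : X → Y) (F : filter X) (G : filter Y)

-- pushforward and pullback of filters are adjoint
example : map f F ≤ G ↔ F ≤ comap f G :=
map_le_iff_le_comap
```

Of course the point of the workshop is to prove it yourself rather than quote it.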

The picture behind `filter.tendsto` is easy; indeed we have seen it before for sets. The idea is simply that if `f : X → Y` as usual, and `F : filter X` and `G : filter Y`, then we can ask whether `f` restricts to a map from the generalised set `F` to the generalised set `G`. This true-false statement is called `tendsto f F G` in Lean. It is equivalent to `map f F ≤ G` and to `F ≤ comap f G`, and it seems to be pronounced “`F` tends to `G` along `f`“, although this notation does not often get pronounced because it seems to be rather rare to see it in the literature. It is used *absolutely all over the place* in Lean’s treatment of topological spaces and continuity, and it’s based on an analogous formalisation of topology in the older Isabelle/HOL proof system.
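
Indeed, in mathlib `tendsto f F G` is *defined* to be `map f F ≤ G`, so one direction of the equivalence is true by definition; a sketch:

```lean
import order.filter.basic
open filter

variables {X Y : Type*} (f : X → Y) (F : filter X) (G : filter Y)

-- `tendsto` is definitionally the pushforward inequality
example : tendsto f F G ↔ map f F ≤ G := iff.rfl

-- and the pullback version follows from the adjunction
example : tendsto f F G ↔ F ≤ comap f G := map_le_iff_le_comap
```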

Why is this notion useful? Here is a beautiful example. Let us say now that `X` and `Y` are topological spaces, and `f : X → Y` is an arbitrary function. What does it mean to say that `f` is continuous at `x`, an element of `X`? The idea is that if `x'` is an element of `X` very close to `x`, then `f x'` should be very close to `f x`. How close? Let’s just say infinitesimally close. The idea is that `f` should send elements of `X` infinitesimally close to `x` to elements infinitesimally close to `f x`. In other words,

`tendsto f (𝓝 x) (𝓝 (f x))`.

It’s as simple as that. We want `f` to map an infinitesimal neighbourhood of `x` to an infinitesimal neighbourhood of `f x`. We say that `f` is *continuous* if it is continuous at `x` for all `x` in `X`. One can check that this is equivalent to the usual definition of continuity. The proof that a composite of continuous functions is continuous is just as easy using this filter language as it is in the usual open-set language, and certain other proofs become easier to formalise using this point of view.

But the real power of the `tendsto` predicate is that it does not just give us a new way of thinking about continuity of a function at a point. It also gives us a new way of thinking about limits of functions, limits of sequences of real numbers, limits in metric spaces, and more generally essentially any kind of limit that comes up in an undergraduate degree: these concepts are all *unified* by the `tendsto` predicate.

The second Lean file in today’s workshop consists of an analysis of a two-line proof of the fact that the limit of the product of two real sequences is the product of the limits. Let me finish by talking about these two lines.

The first line observes that the definition `is_limit a l` of week 3 (the traditional “$\forall\varepsilon>0,\ \exists N,\ \forall n\ge N,\ |a_n-l|<\varepsilon$”) is equivalent to the rather slicker-looking `tendsto a cofinite (𝓝 l)`, where `cofinite` is the cofinite filter (defined as the generalised subset `C` of `ℕ` such that `C ≤ 𝓟 S` if and only if `S` is cofinite). It is straightforward to see that these two predicates are equivalent: `tendsto a cofinite (𝓝 l)` is equivalent to saying that any subset of the reals whose interior contains `l` has pre-image under `a` a cofinite subset of the naturals, which is equivalent to saying that for any $\varepsilon>0$, the set of $n$ such that $|a_n-l|\ge\varepsilon$ is finite.

The second line comes from the following standard fact about neighbourhood filters (which we will prove): if `Y` is a topological space equipped with a continuous multiplication, if `A` is any type, and if `f` and `g` are functions from `A` to `Y` and `F` is any filter on `A`, then `tendsto f F (𝓝 y)` and `tendsto g F (𝓝 z)` together imply `tendsto (f * g) F (𝓝 (y * z))`, where `f * g` denotes the map from `A` to `Y` sending `a` to `f a * g a`. We prove this result from first principles in the file, although of course it is in Lean’s maths library already, as is the fact that multiplication on the real numbers is continuous, which is why we can give a complete two-line proof in Lean of the fact that the limit of a product is the product of the limits.

Let `X` be a type, i.e. what most mathematicians call a set. Then `X` has subsets, and the collection of all subsets of `X` has some really nice properties: you can take arbitrary unions and intersections, for example, and if you order subsets of `X` by inclusion then these constructions can be thought of as sups and infs, and they satisfy a bunch of axioms which one might expect sups and infs to satisfy; for example, if $S_i\subseteq T$ for all $i$ in an index set then $\bigcup_i S_i\subseteq T$. In short, the subsets of a set form what is known in order theory as a *complete lattice*.

A filter can be thought of as a kind of generalised subset of `X`. Every subset `S` of `X` gives rise to a filter on `X`, called the principal filter `𝓟 S` associated to `S`, and we have `𝓟 S = 𝓟 T` if and only if `S = T`. However if `X` is infinite then there are other, nonprincipal, filters `F` on `X`, which are slightly vaguer objects. However, filters still have an ordering on them, written `F ≤ G`, and it is true that `S ⊆ T ↔ 𝓟 S ≤ 𝓟 T` (indeed we’ll be proving this today). To give an example of a filter which is not principal, let’s let `X` be the real numbers. Then for a real number `x` there is a filter `𝓝 x`, called the neighbourhood filter of `x`, with the property that if `U` is any open subset of `ℝ` containing `x` then `𝓟 {x} < 𝓝 x < 𝓟 U`. In other words, `𝓝 x` is some kind of “infinitesimal neighbourhood of `x`“, strictly bigger than `{x}` but strictly smaller than every open neighbourhood of `x`. This is a concept which cannot be formalised using sets alone, but can be formalised using filters.

Let me motivate the definition before I give it. Say `F` is a filter on `X`. Let’s define `F.sets` to be the subsets of `X` which contain `F`, i.e., the `S` such that `F ≤ 𝓟 S`. Here is a property of filters which I have not yet mentioned: if two filters `F` and `G` satisfy `F.sets = G.sets`, then `F = G`; in other words, a filter is determined by the principal filters which contain it. This motivates the following definition: why not define a filter `F` to *be* the set of subsets of `X` which contain it? We will need some axioms. What are reasonable axioms? We don’t want a filter to be bigger than `X` itself, so `X` should contain `F`; we want to make sure that if `S` contains `F` then `T` contains `F` for any `T ⊇ S`; and finally, if both `S` and `T` contain `F` then we want `S ∩ T` to contain `F`. That’s the definition of a filter!

```
structure filter (α : Type*) :=
(sets : set (set α))
(univ_sets : set.univ ∈ sets)
(sets_of_superset {x y} : x ∈ sets → x ⊆ y → y ∈ sets)
(inter_sets {x y} : x ∈ sets → y ∈ sets → x ∩ y ∈ sets)
```

A filter on `X`, or, as Lean would like to call it, a term `F : filter X` of type `filter X`, is a collection `F.sets` of subsets of `X` satisfying the three axioms mentioned above. That’s it. Unravelling the definitions, we see that a sensible definition of `F ≤ G` is that `G.sets ⊆ F.sets`, because we want `G ⊆ S` to imply `F ⊆ S` (or, more precisely, we want `G ≤ 𝓟 S` to imply `F ≤ 𝓟 S`).
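
As a first example of the definition in action, here is the principal filter `𝓟 S` built directly from the three axioms; this mirrors mathlib’s `filter.principal` (the name `my_principal` is mine):

```lean
import order.filter.basic

variables {X : Type*}

-- the principal filter: "the generalised subset S" is the collection
-- of all subsets containing S
def my_principal (S : set X) : filter X :=
{ sets             := {T | S ⊆ T},
  univ_sets        := set.subset_univ S,
  sets_of_superset := λ T U hT hTU, set.subset.trans hT hTU,
  inter_sets       := λ T U hT hU, set.subset_inter hT hU }
```

Each axiom is a one-line fact about `⊆`, which is a good sanity check that the axioms are the right ones.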

It’s probably finally worth mentioning that in Bourbaki, where this concept was first introduced, they have an extra axiom on their filters: they do not allow `𝓟 ∅` to be a filter; for them, the empty set is not a generalised set. From this point of view it looks like a very strange decision, and this extra axiom was dropped in Lean. Indeed, we bless `𝓟 ∅` with a special name: it is `⊥`, the unique smallest filter under our `≤` ordering. The (small) advantage of the Bourbaki convention is that an ultrafilter can be defined to be literally a minimal element in the type of all filters, rather than a minimal element in the type of all filters other than `⊥`. This would be analogous to not allowing a ring `R` to be an ideal of itself, so that one could define the maximal ideals of a ring to be the maximal elements in the set of all ideals of the ring. However this convention for ideals would hugely break the functoriality of ideals; for example the image of an ideal along a ring homomorphism might not be an ideal any more, the sum of two ideals might not be an ideal, and so on. Similarly, we allow `⊥` to be a filter in Lean, because it enables us to take the intersection of filters, pull filters back and so on: it gives a far more functorial definition.

The material this week is in `week_5` of the formalising-mathematics GitHub repo, which you can download locally if you have `leanproject` installed or, if you have the patience of a saint and don’t mind missing some of the bells and whistles, you can try online (Part A, and Part B). NB all this infrastructure didn’t just appear by magic: I wrote the code in the repo, but I had nothing to do with all the other tricks which make it easier for mathematicians to use; we have a lot to thank people like Patrick Massot and Bryan Gin-ge Chen for.

In Part A we start by defining principal filters and we make a basic API for them. I give a couple more examples of filters too, for example the cofinite filter `C` on `X`, whose sets are the subsets of `X` whose complement is finite. This filter is worth dwelling on. It corresponds to a generic “every element of `X` apart from perhaps finitely many” subset of `X`, perhaps analogous to a generic point in algebraic geometry. However, there exists no element `a` of `X` such that `𝓟 {a} ≤ C`, because `X - {a}` is a cofinite subset not containing `a`. In particular, thinking of filters as generalised subsets again, we note that whilst a generalised set is determined by the sets containing it, it is definitely not determined by the sets it contains: indeed, `C` contains no nonempty sets at all.
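
For the record, mathlib has this filter too, under the name `filter.cofinite`, and its membership predicate is definitional:

```lean
import order.filter.cofinite
open filter

-- S is in the cofinite filter iff its complement is finite
example (S : set ℕ) : S ∈ (cofinite : filter ℕ) ↔ (Sᶜ).finite :=
iff.rfl
```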

In Part B we go on to do some topology. We define neighbourhood filters and cluster points, and then talk about a definition of compactness which doesn’t involve open sets at all, but instead involves filters. I am still trying to internalise this definition, which is the following:

```
def is_compact (S : set X) := ∀ ⦃F⦄ [ne_bot F], F ≤ 𝓟 S → ∃ a ∈ S, cluster_pt a F
```

In words, a subset `S` of a topological space is *compact* if every non-empty generalised subset `F` of `S` has closure containing a point of `S`.

Let’s think about an example here. Let’s stick to `S = X`. Say `S` is an infinite discrete topological space. Then the cofinite filter is a filter on `S` which has no cluster points at all, meaning that an infinite discrete topological space is not compact. Similarly, imagine `S` is the semi-open interval $(0,1]$. Then the filter of neighbourhoods of zero in $\mathbb{R}$, restricted to this subset (i.e. just intersect all the sets in the filter with $(0,1]$), again has no cluster points, so this space is not compact either. Finally let’s consider $\mathbb{R}$ itself. Then the `at_top` filter, which we will think about in Part A, consists of all subsets $S$ of $\mathbb{R}$ for which there exists some real number $M$ such that $(M,\infty)\subseteq S$. This “neighbourhood of $+\infty$” filter has no cluster points in $\mathbb{R}$ (note that $+\infty$ would be a cluster point, but it’s not a real number). Hence $\mathbb{R}$ is not compact either. We have certainly not proved here that this definition of compact is mathematically equivalent to the usual one, but it is, and if you’re interested, and you’ve learnt Lean’s language, you can just go and read the proof for yourself in Lean’s maths library.

The boss level this week is, again, that a closed subspace of a compact space is compact. But this time we prove it with filters. As last time, we prove something slightly more general: if `X` is any topological space, and if `S` is a compact subset and `C` is a closed subset, then `S ∩ C` is compact. Here’s the proof. Say `F` is a non-empty generalised subset (i.e. a filter) contained in `S ∩ C`. By compactness of `S`, `F` has a cluster point `a` in `S`. But `F` is contained in `C`, so all cluster points of `F` lie in the closure of `C`, which is `C` again because `C` is closed. Hence `a` is the element of `S ∩ C` which we seek. No covers, no finite subcovers.