Induction, and inductive types.

Here’s a conundrum: why does a schoolkid think recursion is a triviality (“the Fibonacci sequence is defined like this: 1,1,2,3,5,8,…, with each number in the sequence the sum of the two before it” — this is a definition very likely to be accepted by a schoolkid) but they might struggle with the principle of mathematical induction (a procedure for generating a sequence of proofs, each one a consequence of the one before it)? I was taught at university that induction and recursion were different things, but Lean has a common abstraction which makes them both instances of the same thing. In this post I’ll talk about induction, and recursion, and also about how something called the Calculus of Inductive Constructions (the way Lean’s inductive command works) provides some rather surprising proofs of basic mathematical facts.

Defining a type.

It is not unreasonable to think of what Lean calls a type as what a mathematician would usually call a set. For example a mathematician might say “let S be a set with three elements; call the elements a, b and c“. Here’s how to make that set, or type, in Lean:

inductive S : Type
| a : S
| b : S
| c : S

In fact the full names of the elements of S, or the terms of type S as we call them in type theory, are S.a, S.b and S.c, and we might want to open the S namespace so that we can just refer to them as a, b and c. An undergraduate might want to make this kind of definition when they are answering the following kind of question in Lean: “Let f : X \to Y and g : Y \to Z be functions. True or false: if the composition g \circ f is injective, then g is injective”. This is false, and to prove it’s false one can either use the sets we mathematicians have lying around (such as the naturals, integers or reals), or one can just build some explicit sets of small size like S above, and some explicit functions between those sets.

Defining functions between types.

So here’s what we’re going to do. Let’s make a type X with one term p, a type Y with two terms q and r, and a type Z with one term s. This is easy given what we’ve already seen:

inductive X : Type
| p : X

inductive Y : Type
| q : Y
| r : Y

inductive Z : Type
| s : Z

[By the way, if you want to play along but you haven’t got Lean installed on your computer, you can do all this within a web browser by clicking here (although you could instead click here to find out how to install Lean and the community tools, which give you a far slicker experience).]

Our counterexample is going to be the following: We define f : X → Y by f(p)=q and g : Y → Z by g(q)=g(r)=s. Let’s do this.

open X Y Z

def f : X → Y
| p := q

def g : Y → Z
| q := s
| r := s

As a mathematician I find the use of the | symbol quite intimidating (especially as we are now using it in a different way from the way we used it in the inductive definitions above), but given that I’ve told you what we’re doing and now I’m telling you how we are doing it, you can probably guess what it all means. One can now go ahead and state the result:

open function

example : ¬ (∀ (X Y Z : Type) (f : X → Y) (g : Y → Z),
  injective (g ∘ f) → injective g) :=
begin
  sorry
end

and now if you fancy proving a mathematical theorem by playing a puzzle game, you can click here to get all the code at once, and have a go. Instead of talking about the proof though, I want to talk about the rather surprising (at least to me) fact that Lean is defining f and g by recursion.

What happens when we define an inductive type?

What happens under the hood when Lean sees this code

inductive X : Type
| p : X

is quite surprising (at least to me as a mathematician). I’ve been arguing above that we should think of this code as saying “Let X := \{p\} be a set with one element”. But here’s what’s really going on when Lean sees this code. Unsurprisingly, Lean defines a new type X and a new term p (or more precisely X.p) of type X. It also defines one more new thing, which expresses that p is the only element of X. But the way it does this is surprising: it defines the so-called recursor for X, which is the following statement:

X.rec : ∀ {C : X → Sort u}, C p → (∀ (x : X), C x)

Whatever does that mean? Well first I think I’d better explain what this Sort u thing is. I’ve written in the past an explanation of how sets and their elements, and theorems and their proofs, are unified in Lean’s type theory as types and their terms. The sets/elements story goes on in the Type universe, and the theorems/proofs story goes on in the Prop universe. When Lean says Sort u it means “either of these universes”. So we can rewrite X.rec as two statements:

X.recursor : ∀ {C : X → Type}, C p → (∀ (x : X), C x)
X.inductor : ∀ {C : X → Prop}, C p → (∀ (x : X), C x)

The first statement is the principle of recursion for X. In set-theoretic language it says this. “Let’s say that for every element x of X we have a set C(x), and let’s say we have an element of C(p). Then we have a method of constructing an element of C(x) for all x \in X.” This looks like a rather long-winded way of saying that p is the only element of X. In fact it is worth looking at the special case of X.recursor where C is the constant function sending every element of X to the set S:

X.recursor_constant : ∀ S, S → (X → S)

This says that if S is any set, and we have an element a of S, then we can get a function from X to S. What is unsaid here, but true by definition, is that it’s the function that sends p to a, as can be checked thus:

-- give X.rec the constant function C sending everything to S
def X.recursor_constant : ∀ S, S → (X → S) := λ S, @X.rec (λ x, S) 

example (S : Type) (a : S) : (X.recursor_constant S a) p = a :=
begin
  -- true by definition
  refl
end

Do you remember our definition of f above?

def f : X → Y
| p := q

This function f is defined using X.recursor_constant, letting S be Y and letting the element of S be q. The notation Lean uses is short, but under the hood this is how f is constructed.

So much for recursion. The second statement coming from X.rec is X.inductor, the principle of induction for X. I should probably say that I made up the word “inductor”, but inductor is to induction as recursor is to recursion. In more mathematical language the inductor says this. “Let’s say that for every element x of X we have a true-false statement C(x), and let’s say that C(p) is true. Then C(x) is true for every element x of X.” So again it is just a rather long-winded way of saying that p is the only element of X.
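
As a quick sanity check (this example is mine, not part of the original development), here is the statement of X.inductor being proved in Lean, once directly from X.rec and once using the induction tactic, which is calling X.rec for us behind the scenes:

-- proving the inductor by applying X.rec directly
example (C : X → Prop) (hp : C p) (x : X) : C x := @X.rec C hp x

-- the same statement via the `induction` tactic
example (C : X → Prop) (hp : C p) : ∀ x, C x :=
begin
  intro x,
  induction x,
  exact hp,
end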

Why have computer scientists isolated these rather elaborate statements as the fundamental way to say that p is the only element of X? It’s actually because of a fundamental symmetry. We have defined a new type X, in a functional programming language, and now the fundamental thing we need to do next is to explain how to define functions into X, and how to define functions out of X. To define functions into X, we need to have access to terms of type X, or in computer science lingo to constructors of X. This is exactly what X.p is — a way to construct a term of type X. To define functions out of X, we need access to eliminators for X, that is, some kind of gadget whose output is a function from X to somewhere else. Because X only has one term, namely p, we need a way of saying “to give a function out of X, we only need to say what happens to p“, and this is exactly what the recursor is doing. Between them, the constructor and recursor say in a formal way that the elements of X are “at least p, and at most p, so are exactly p.”

Recursors for general inductive types.

Lean automatically generates constructors and a recursor for every type defined with the inductive command. There is a general rule for how to do this, but informally it’s pretty clear. We define inductive types using this | symbol, and you get a constructor for each line with a | in. The eliminator or recursor simply says that to define a function from the new type you’re defining, all you have to do is to make sure you’ve defined it on each constructor.

The rest of this post is the fun part. I will go through a bunch of inductive types defined in Lean, we can look at the definition, figure out the recursor attached to each of the types, and then see what this corresponds to mathematically. We will see some familiar things popping up in surprising ways.

Example: a set with two elements.

Recall our inductive type Y:

inductive Y : Type
| q : Y
| r : Y

The recursor for Y tells us that if S is a set, then to get a map from Y to S we have to give two elements of S, one corresponding to where q goes and one corresponding to where r goes.

def Y.recursor_constant : ∀ S, S → S → (Y → S) := λ S, @Y.rec (λ y, S)

The full recursor can even be used (with non-constant C) to define a function from Y which sends q into one type and r into a different type, but when defining the function g above we do not need this level of generality. If you want to see what Y’s recursor looks like, just type #check @Y.rec in a Lean session after the definition of Y, and remember that Π is just computer science for ∀ (in Lean 4 they will be using ∀ instead of Π in fact).
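
As a sanity check (my own code, with a made-up name g' so as not to clash with the g defined earlier), here is g being rebuilt from the constant recursor, together with a check that it does the right thing by definition:

def g' : Y → Z := Y.recursor_constant Z s s

example : g' q = s := rfl
example : g' r = s := rfl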

Example: the natural numbers.

Mathematicians who have seen the development of mathematics in ZFC set theory know that a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers, a rational number is defined to be a multiplicative localisation of the integers at the nonzero integers, an integer is defined to be an additive localisation of the naturals at the naturals, and the naturals are defined by the ZFC axiom of infinity. In Lean’s type theory a complex number is defined to be a pair of real numbers, a real number is defined to be an equivalence class of Cauchy sequences of rational numbers etc etc etc, and it’s all just the same up to the very end, because in Lean the naturals are defined by Peano’s axioms:

inductive nat : Type
| zero : nat
| succ (n : nat) : nat

This means that we have two ways to make natural numbers. First, zero is a natural number. Second, if n is a natural number, then succ n (usually called n+1 by mathematicians) is a natural number. Now we need a way of expressing the idea that this is the only way to make natural numbers, and this is the recursor, which is automatically generated by Lean, and says a precise version of the following informal thought: “If you want to do something for all naturals, then you need to tell me how to do it for both constructors”. In other words, “…you need to tell me how to do it for zero, and then you have to tell me a way to do it for n+1 assuming we’ve already done it for n“. Sounds familiar?

The recursor in general involves a map to Sort u. Let’s just specialise to the two universes we’re interested in, and take a look at the constant recursor, and the inductor (and let’s use Lean’s notation for nat):

nat.recursor_constant : ∀ (S : Type), S → (∀ (n : ℕ), S → S) → (ℕ → S)
nat.inductor : ∀ (C : ℕ → Prop), C 0 → (∀ (n : ℕ), C n → C (succ n)) → ∀ (n : ℕ), C n

[The proof of nat.recursor_constant is λ S, @nat.rec (λ n, S) and the proof of nat.inductor is just @nat.rec. ]

The constant recursor says this: if S is a set, and we want to make a function f : ℕ → S, here’s a way of doing it. First we need an element of S (namely f(0)) and second, for each natural number we need a map from S to S (telling us how to make f(n+1) given that we know f(n)).
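
Here is a sketch of this in action (my own code, with made-up names): we build the constant recursor exactly as in the bracketed remark above, and then use it to define a doubling function by recursion.

def nat.recursor_constant : ∀ (S : Type), S → (∀ (n : ℕ), S → S) → (ℕ → S) :=
λ S, @nat.rec (λ n, S)

-- double 0 = 0, and double (n+1) = double n + 2
def double : ℕ → ℕ := nat.recursor_constant ℕ 0 (λ n d, d + 2)

example : double 3 = 6 := rfl -- everything computes, so this is true by definition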

The inductor says this. Say we have a family C(n) of true-false statements, and that C(0) is true, and that for all n we have a proof that C(n) implies C(n+1). Then we can deduce that C(n) is true for all n.

What I think is really cute about this example is that Peano’s definition of the natural numbers makes it immediately clear why the principle of mathematical induction works. In the natural number game we use the recursor in the background to define addition and multiplication on the naturals. We also use it to prove things which I call “axioms” in the natural number game — for example the proof that 0 is not equal to succ n for any natural number n uses the recursor to define a function from ℕ to \{0,1\} sending 0 to 0 and succ n to 1, and using this function it’s easy to prove the “axiom” zero_ne_succ by contradiction. If you want an exercise, try using nat.recursor_constant to prove injectivity of the succ function, something else I also claimed was an axiom in the natural number game (as Peano did) but which was actually proved using the recursor.
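
Here is a sketch of the zero_ne_succ argument just described (my own code, assuming the nat.recursor_constant definition above; it is not the natural number game's actual proof, and I am using Prop rather than \{0,1\} as the target): recursion gives a predicate which is true at zero and false at every successor.

def is_zero : ℕ → Prop := nat.recursor_constant Prop true (λ n _, false)

-- if 0 were equal to succ n then `true` would imply `false`
example (n : ℕ) : (0 : ℕ) ≠ nat.succ n :=
λ h, cast (congr_arg is_zero h) trivial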

Example : false

false is a true-false statement, and you can probably guess which one it is. In Lean false is defined as an inductive type! Here’s the full definition:

inductive false : Prop

This time there are no | s at all! Every constructor of false would be a proof of a false statement, so this design decision is not surprising. The recursor is

false.rec : Π (C : Sort u), false → C

In other words, to give a map from false to C you have to define it on all constructors, of which there are none. Let’s take a look at the inductor then, by changing Sort u to Prop:

false.inductor : ∀ (P : Prop), false → P

It says that if P is any true-false statement, then false implies P. This logical tautology has been automatically generated by Lean, because Lean’s model of an implication Q\implies P is a function from proofs of Q to proofs of P, and false has no terms, i.e., no proofs.

There is a similar story with inductive empty : Type, Lean’s definition of the empty type. The recursor for empty says that to give a map from the empty type to any type S, you don’t have to do anything other than feed in S.
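
Here are both eliminators in action (small examples of my own):

-- to give a map out of `false` or `empty`, there is nothing to check
example (P : Prop) (h : false) : P := @false.rec P h
example (S : Type) (x : empty) : S := @empty.rec (λ _, S) x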

Example : or

The logical or on propositions is defined as an inductive type in Lean!

inductive or (P Q : Prop) : Prop
| inl (hp : P) : or
| inr (hq : Q) : or

There are two constructors for P ∨ Q, where now I’m using the usual logicians’ notation for or. In other words, there are two ways to prove P ∨ Q. First you can prove P, and second you can prove Q. Lean’s auto-generated inductor for this is

or.inductor : ∀ (P Q R : Prop), (P → R) → (Q → R) → (P ∨ Q → R)

In other words, if you can prove P\implies R and you can prove Q\implies R, then you can deduce (P\lor Q)\implies R. Again no mathematician is surprised that this statement is true, but perhaps some are surprised by the fact that a computer scientist might claim that this is true by induction on or.
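
Here is that statement being proved in Lean (my own example); the cases tactic is invoking or.rec behind the scenes, giving one goal per constructor:

example (P Q R : Prop) (hpr : P → R) (hqr : Q → R) : P ∨ Q → R :=
begin
  intro hpq,
  cases hpq with hp hq,
  { exact hpr hp },
  { exact hqr hq },
end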

Example : equality.

The = symbol in Lean is defined as an inductive type! But I think I’m going to save the topic of what induction on equality is until the next post, where we will prove, by induction, that equality is an equivalence relation.

Some final notes.

I was very surprised when I realised that every inductive type came with a principle of induction. In fact one can even define the reals as an inductive type, which means that there will be an inductor for reals meaning that you can do induction on the reals! But when I figured out what the induction principle said I was disappointed — it says "if you can prove it for every real which is an equivalence class of Cauchy sequences of rationals, you can prove it for every real". Remember that the idea of the recursor is that it is a way of saying "every term of your type can be made using the constructors", so if your only constructor for a real is an equivalence class of Cauchy sequences of rationals then this is what you get. However these other examples, and in particular these examples coming from logic, are quite funky. An example I didn't talk about: and is an inductive type and its inductor is ∀ (P Q R : Prop), (P → (Q → R)) → (P ∧ Q → R), which is some propositional version of uncurrying (indeed the constant recursor for prod, the product of two types, is uncurrying on the nose). The basic facts in propositional logic about and and or are proved constructively in Lean using recursors rather than by truth tables, because directly constructing the functions corresponding to the proofs is more appealing than a case split.
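
Here is the and example written out (my own code); this time cases is calling and.rec, and there is only one constructor, carrying two pieces of data:

example (P Q R : Prop) (h : P → Q → R) : P ∧ Q → R :=
begin
  intro hpq,
  cases hpq with hp hq,
  exact h hp hq,
end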

Not everything is an inductive type in Lean — there are two other kinds of types. There are quotient types, which are there for some kind of computer science efficiency reasons and which could be constructed using inductive types, and then there are function types, which are a different kind of thing. I don’t think it’s of mathematical interest whether the type you’re looking at is an inductive type or a function type, but here’s an example of a function type: logical not. In Lean, ¬ P is defined to be P → false. On the other hand most of the structures used by mathematicians (groups, subgroups, rings, fields, perfectoid spaces and so on) are defined as inductive types (often however with one constructor, so their induction principle is boring). An inductive type with one constructor is known as a structure in Lean. You can read more about inductive types and structures in the very wonderful Theorem Proving In Lean, in sections 7 to 9.

In my next post I’ll talk about induction on equality.

My daughter is busy with exam revision, so here’s some old digital art by one of my sons.


Formalising Mathematics : workshop 8 — group cohomology

This is notes on the last of the eight workshops I am running this term as part of an EPSRC Taught Course Centre course. Many thanks to the students for coming, it was very interesting working with you all.

I promised I would do something more ambitious in week 8, and eventually I settled on group cohomology. I usually write these blog posts just before the workshop but instead this week I wrote a README and am only now writing this more detailed document after the event.

What is group cohomology?

I don’t know the history of group cohomology, but I do know that it’s possible to invent the theory of 1-cocycles in lots of ways (for example when trying to understand what the top right hand corner of a group homomorphism from a group G into 2×2 upper triangular matrices “is”) and so they were bound to show up eventually. The basic question is this: Say we have an abelian group N (group law +) and a subgroup M, and let P:=N/M be the quotient group. Say we have an action of a group (group law *) G on N, and say M is G-stable, so P gets an induced G-action too. Now take G-invariants of everything and denote this N\mapsto N^G. The G-invariants M^G of M are still a subgroup of N^G, but P^G, the G-invariant elements of P, might be bigger than the image of N^G in P. For example if N=\mathbb{Z}/4\mathbb{Z} is the integers mod 4, with G the cyclic group of order 2 acting by sending x to -x, then M:=N^G=\{0,2\} is G-stable but one checks that G acts trivially on P so the map N^G\to P^G is no longer surjective.

Here is an attempt to "measure" failure of surjectivity. Say p\in P is G-invariant. Lift randomly to n\in N. Then if g\in G we see that gn-n maps to gp-p=0 in P so must be in M. Trying this example in the \mathbb{Z}/4\mathbb{Z} case above you can convince yourself that you get a group isomorphism from G to M this way. But in general the map G\to M sending g to gn-n is not a group homomorphism, and is not even "canonical", as a mathematician would say — it depends on a choice of n lifting p. Different choices differ by an element of M, and asking whether the function g\mapsto gn-n is of the form g\mapsto gm-m for some m\in M is the same as asking whether our original element p\in P^G lifts to an element of N^G.

These ideas were synthesised into the following definitions. Say G acts on M. The zero-th cohomology group H^0(G,M) is just the subgroup M^G of G-invariant elements.

A cocycle for a G-action on M is a function c:G\to M such that c(gh)=c(g)+gc(h). A coboundary is a cocycle of the form g\mapsto gm-m for some m\in M. The quotient of the cocycles by the coboundaries is called the first cohomology group H^1(G,M).
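
Here is a minimal sketch of how these definitions might look in Lean (my own made-up names, not the code in the workshop repository); to keep it self-contained I pass the G-action around as an explicit function σ rather than using mathlib's action typeclasses.

import algebra.group.basic

variables {G M : Type*} [group G] [add_comm_group M]

-- the 1-cocycle condition c(gh) = c(g) + g•c(h), with the action written σ
def is_cocycle (σ : G → M → M) (c : G → M) : Prop :=
∀ g h : G, c (g * h) = c g + σ g (c h)

-- a 1-coboundary is the cocycle g ↦ g•m - m attached to some m in M
def is_coboundary (σ : G → M → M) (c : G → M) : Prop :=
∃ m : M, ∀ g : G, c g = σ g m - m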

The construction above shows that if N is a G-module with a subgroup M also preserved by G and quotient P=N/M (people write “a short exact sequence of G-modules”) then there is a map H^0(G,P)\to H^1(G,M). This actually forms part of a seven term long exact sequence:

0 \to H^0(G,M)\to H^0(G,N)\to H^0(G,P)\to H^1(G,M)\to H^1(G,N)\to H^1(G,P)

and our goal in this workshop, at least if we had infinite time, would be to:

  • define all the maps in that exact sequence
  • prove the sequence is exact
  • prove the inflation-restriction sequence is exact
  • develop the concrete theory of H^2 via 2-cocycles and 2-coboundaries
  • develop the abstract theory of H^n via n-cocycles and n-coboundaries.

My 2018 BSc project student Anca Ciobanu achieved the first two of these goals and my 2019 MSc project student Shenyang Wu achieved the last one. So these goals are definitely possible! It will take rather longer than 2 hours though.

This has become a mini-project of mine, and my current thoughts can be seen in the ideas directory of the week 8 folder in the GitHub workshop repository. Ultimately I hope to get this stuff into mathlib (or perhaps to persuade someone else to get it into mathlib for me 😉 )

If you weren’t part of the workshop then you can still do it yourself, all you need is a working Lean and mathlib installation, which you can get following the instructions on the Leanprover community website.


Formalising Mathematics : workshop 7 — quotients

At university I was told about how to quotient by an equivalence relation. But I was not told the right things about it. This workshop is an attempt to put things right.

Models

Let me start by talking about things I learnt in my second and third year as an undergraduate. I went to a course called Further Topics in Algebra, lectured by Jim Roseblade, and in it I learnt how to take the tensor product V\otimes_k W of two finite-dimensional k-vector spaces. Jim explained to us the universal property of the tensor product, and I saw for the first time in my life the abstract nonsense argument which explains that objects which satisfy the universal property for tensor products are unique up to unique isomorphism. He also explained that writing down the universal property did not count as a definition. The abstract nonsense argument shows that tensor products are unique if they exist. To prove that they exist, Jim wrote down an explicit model, involving taking a quotient of a gigantic free abelian group with basis the pairs (v,w), modulo the relations saying that it is a vector space satisfying the universal property. I came out of these lectures with a good understanding of this technique. Later on in my undergraduate education I met things such as the localisation of a commutative ring at a multiplicative subset, and again I understood that these things were unique up to unique isomorphism if they existed, and that one could write down an explicit model to show they existed.

I came away with the impression that the key fact was the universal property, from which everything else could be proved, and that the model was basically irrelevant. To my surprise, I have learnt more recently that this is not exactly the whole truth. Here is an example, due to Patrick Massot. Let R be a commutative ring, and let S be a multiplicative subset. I claim that the kernel of the map R\to R[1/S] is precisely the set of elements of R annihilated by some element of S. Using the universal property and elementary arguments we can reduce to the statement that if r\in R is in the kernel of every ring homomorphism R\to T sending S to units, then r is annihilated by an element of S, but as far as I can see, to prove this we have to come up with a cunning T, and letting T be the explicit model of R[1/S] constructed as a quotient of R\times S does the job. In particular the model appears again in the argument! Not equipped with the proof that it is the initial object in the category of R-algebras in which S is invertible, but equipped with the proof that it is an R-algebra in which S is invertible. My MSc student Amelia Livingston came up with other examples of this as part of her work on Koszul complexes in Lean. But I digress. Let’s get on to what we’ll talk about today.

What is a quotient?

At university, in my first year, I was taught the following construction. If X is a set and \sim is an equivalence relation on X, then one can define the set Q of equivalence classes for this equivalence relation. There is a natural map from X to Q sending x\in X to its equivalence class. We write Q=X/\sim.

All the way through my undergraduate career, when taking quotients, I imagined that this was what was going on. For example when forming quotient groups, or quotient vector spaces, or later on in my life quotient rings and quotient modules, I imagined the elements of the quotient to be sets. I would occasionally look at elements of elements of a quotient set, something rarely done in other situations. I would define functions from a quotient set to somewhere else by choosing a random representative, saying where the representative went, and then proving that the construction was ultimately independent of the choice of representative and hence “well-defined”. This was always the point of view presented to me.

I have only relatively recently learnt that actually, this model of a quotient as a set of equivalence classes is nothing more than that — it’s just a model.

The universal property of quotients

Here’s the universal property. Say X is a set equipped with an equivalence relation. A quotient for this data is a pair consisting of a set Q and a function p:X\to Q which is constant on equivalence classes (i.e. x_1\sim x_2\implies p(x_1)=p(x_2)) and which is furthermore initial with respect to that property. In other words, if T is any other set and f:X\to T is any function which is constant on equivalence classes, there exists a unique g:Q\to T such that f=g\circ p. The usual abstract nonsense argument shows that quotients are unique up to unique isomorphism.

Example 1) Let X be a set equipped with an equivalence relation \sim, let Q be the set of equivalence classes of X, equipped with the map p:X\to Q sending an element of X to its equivalence class. Then (Q,p) is a quotient of X by \sim.

Example 2) Let X and Y be any sets, and say p:X\to Y is a surjection. Define an equivalence relation on X by x_1\sim x_2\iff p(x_1)=p(x_2). Then (Y, p) is a quotient of X by \sim.

Example 2 shows that this construction of quotients using equivalence classes is nothing more than a model, and that there are plenty of other sets which show up naturally and which are not sets of equivalence classes but which are quotients anyway. The important point is the universal property. In contrast to localisation of rings, I know of no theorem about quotients for which the “equivalence class” model helps in the proof. The only purposes I can see for this “equivalence class” model now are (1) it supplies a proof that quotients do actually exist and (2) a psychological one, providing a “model” for the quotient.

I have had to teach quotients before, and students find them hard. I think they find them hard because some just basically find it hard to handle this whole “set whose elements are sets” thing. Hence even though the psychological reason was ultimately useful for me, and eventually I “got the hang of” quotients, I do wonder what we should be doing about the people who never master them. An alternative approach in our teaching is to push the universal property angle. I have never tried this. It might turn out even worse!

A worked example : maps from a quotient.

Here is the mantra which we hear as undergraduates and thus go on to feed our own undergraduates. The situation is this: we have some quotient object Q (e.g. a quotient ring or a quotient group or whatever) and we want to define a map g from this quotient to some other thing Z.

“Set of sets” variant:

The argument goes something like this:

“Recall that Q is a quotient, so its elements are really equivalence classes. We want to define a map from Q to Z, so let’s choose q\in Q. Now remember that really q is actually an equivalence class. Choose an element x of this equivalence class, and now apply a construction which seems to depend on x, giving us an element of Z [Note that this construction is just a function from X to Z, so let’s call it f]. That is our map from Q to Z. But now we need to check that this map is well-defined. This means the following: during this construction we did something “non-canonical”, namely choosing a random element x of q. We need to check that our “function” from Q to Z is in fact independent of this choice. So say that instead we had chosen y\in q. Then x and y are in the same equivalence class, so they are equivalent. Now an explicit computation shows that f(x)=f(y) and hence we’re OK — our function is well-defined.”

Is the student left wondering what the heck it means for a function to be “not well-defined”? How come nobody ever talks about any other kind of function before being “well-defined”? I thought the axiom of choice said that we can choose an element in each equivalence class all at the same time. How come we can’t just define Q\to Z by taking q, using our axiom of choice element x\in q and sending q to f(x)? Is that “well-defined”?

Universal property variant:

The argument now looks like this.

“Recall that Q is a quotient, so it satisfies the universal property of quotients. Recall that this says that to give a map from Q to another set Z, all we have to do is to give a function f:X\to Z which is constant on equivalence classes; the universal property then gives us a unique g:Q\to Z such that f=g\circ p. So let’s define f like this [and now define f], and now let’s check it’s constant on equivalence classes [the same calculation as before]. The universal property thus gives us the function g:Q\to Z which we require.”

Is that the way to teach this stuff to undergraduates? Is it more confusing, less confusing, or maybe just differently confusing?

Today’s workshop

Lean has quotients of equivalence relations built in. This is not particularly necessary; it is an implementation decision which did not have to be done like this. One can certainly make quotients as types of equivalence classes (and indeed this has been done in mathlib, with the theory of partitions). However Lean also has an opaque quotient function, which creates another model of a quotient; we don’t know what the elements are, but we know the universal property and this is all we need.

Today we will learn about quotients by working through the explicit construction of the integers as the quotient of \mathbb{N}^2 by the equivalence relation (a,b)\sim (c,d)\iff a+d=c+b, and a proof that it is a commutative ring. We will go on to play around a bit with the universal property of quotients, and finish by using abstract nonsense to construct a bijection from our quotient to Lean’s \mathbb{Z}.

There is far far too much material to do in one 2-hour workshop, but I was on a roll. As ever, the material is here, in src/week_7.

Appendix: why not just use equivalence classes?

Mathematicians use quotients everywhere, so it’s kind of interesting that they have their own type, but why not just use the type of equivalence classes? Lean can make that type, and it’s possible to prove all the API for it — I’ve done it. So why explicitly extend Lean’s type theory to add a new quotient type? The answer seems to be this. The universal property for quotients is that if p:X\to Q is a quotient of X by an equivalence relation \sim, then we have a bijection between the functions g:Q\to T and the \sim-equivariant functions f:X\to T. To build f from g is easy — just compose with p. To go the other way Lean has a function called quotient.lift f h, which spits out g given f and a proof h that f is \sim-equivariant (i.e. constant on equivalence classes). The claim that these constructions are inverse bijections boils down to the assertion that f = (quotient.lift f h) ∘ p, and the proof of this, remarkably, is rfl — it’s true by definition. This is what the baked-in quotient construction buys you. I used to think that this was really important (and indeed in the past I have claimed that this is a key advantage which Lean has over Coq). Now I am not so sure. It probably makes proofs a bit slicker occasionally — but nowadays I am less convinced by the idea of definitional equality in general — I’m happy to rewrite.
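
Here is that claim as a self-contained Lean example (my own):

example {X T : Type} [s : setoid X] (f : X → T)
  (h : ∀ a b : X, a ≈ b → f a = f b) :
  f = (quotient.lift f h) ∘ quotient.mk :=
rfl -- true by definition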


Formalising Mathematics : workshop 6 — limits

I talked about limits of sequences of real numbers in workshop 3, but this week I come back to them, because this week we’ll see that the usual ten-line proof that if (a_n)_n\to\ell and (b_n)_n\to m then (a_nb_n)_n\to \ell m can be replaced by a two-line proof involving something called filter.tendsto_mul. So this week I’m going to talk about tendsto, but first I think it’s worth refreshing our memories about the useful mental picture of a filter as a generalised subset.

Reminder: filters are generalised subsets

Let X be a type (you can call it a set if you like). The type of subsets of X has a bunch of nice structure — there’s a partial order \subseteq, there are unions and intersections (both finite and infinite) and they satisfy a bunch of axioms. Back in the days of Isaac Newton, one particularly well-studied type was the type of real numbers. However, people had not quite worked out whether infinitesimals existed — infinitely small non-zero real numbers called things like dy and dx — and some people like Leibniz wanted to divide one of them by the other because they were discovering some new-fangled thing called calculus. By the 20th century, the experts had made their decision: there were no infinitesimally small nonzero real numbers, and that’s what the textbooks say today (other than Robinson’s book on non-standard analysis, but nobody uses that for teaching undergraduate calculus). However it was equally clear that infinitesimals provided a good picture.

A filter on X is a kind of “generalised subset” of X. Each subset S of X gives you a filter, called the principal filter 𝓟 S, and there are other filters too corresponding to slightly weirder things. For example, if X = ℝ then there’s a filter called 𝓝 0, the neighbourhood filter of 0, which should be thought of as containing 0 and all the infinitesimally small numbers. Just like usual subsets, these generalised subsets have a partial order, which we’ll call \leq, extending the partial order on usual subsets. In reality these filters are defined completely rigorously as some collection of usual subsets satisfying some axioms but we won’t go into these this week, we’ll just stick with the picture.

Pushing forward and pulling back subsets

Let’s stick with "usual" subsets for this section, but let’s throw in a second type Y and a function f : X → Y. The function gives us some kind of dynamics in the system — we can start using the function to move sets around. The most obvious way that a function can move a set around is via the image construction. Given a subset S of X, we can consider what a mathematician would call f(S), the image of S in Y, defined as the set of y in Y such that there exists x in S with f x = y. This is an abuse of notation — the inputs to f are supposed to be elements of X, not subsets of X, so f S does not make sense, and in Lean we carefully differentiate between these ideas by writing f '' S for the image of S in Y. We call this "pushing forward a subset along f".

Conversely, if T : set Y then there is a way of “pulling T back along f” to give us a subset f^{-1}(T) of X, consisting of the x in X such that f x is in T. Again Lean has a weird notation for this, because ⁻¹ is taken by a general kind of inverse function on a group. So we write f⁻¹' T for this construction.

If set X denotes the type of subsets of X, then f : X → Y gives rise to functions f '' : set X → set Y and f⁻¹' : set Y → set X. Are these functions inverse to one another? No, not remotely! In general, doing one then the other won’t get you back to where you started. So what is the relationship between these two constructions? The fancy answer is that they form a Galois connection, and the even fancier answer is that they are a pair of adjoint functors. But let’s not go into this. Let’s talk about a fundamental predicate.

Let’s stay with the set-up: X and Y are types, and f : X → Y is a function. Say S is a subset of X and T is a subset of Y. Then we can ask ourselves the following true-false question: does f restrict to a function from S to T? In other words, is it true that x ∈ S implies f x ∈ T? Perhaps a good notation for this idea would be something like S\leq_f T. The reason this notation is appealing is that if S'\subseteq S\leq_f T then S'\leq_f T, and if S\leq_f T\subseteq T' then S\leq_f T', and this feels like some kind of transitivity statement, but it isn’t literally transitivity of some relation on a type, because S and T don’t in general have the same type — they’re subsets of different sets. How can we restate \leq_f using pushforwards or pullbacks?

If you think about it, it turns out that there is a way to state this relation using pushforwards, and an equivalent way using pullbacks. One can check easily that S\leq_f T is equivalent to f '' S ⊆ T and also to S ⊆ f⁻¹' T. In particular f '' S ⊆ T and S ⊆ f⁻¹' T are equivalent to each other (and we have proved that the functors are adjoint, for those of you who know this category-theoretic language).
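
For what it's worth, this equivalence is already in Lean's maths library; I believe the relevant statement is set.image_subset_iff, so the following (my own example) should compile:

import data.set.basic

example {X Y : Type} (f : X → Y) (S : set X) (T : set Y) :
  f '' S ⊆ T ↔ S ⊆ f ⁻¹' T :=
set.image_subset_iff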

Our aim is to find analogous constructions for filters.

Pushing forward and pulling back generalised subsets

Jordan Ellenberg on Twitter remarked that something I said last week reminded him of a fact about ideals. I really like this idea because if we’re going to think in pictures then it helps to understand analogies. When people were trying to understand factorization in integral domains such as the integers, they have to deal with the fact that 6=2\times3 and also 6=(-2)\times(-3) but that really these are “the same factorization”. This leads us to an equivalence relation on elements of an integral domain — two elements are equivalent if one is a unit times the other, where the units are divisors of 1. Equivalence classes of elements are the same as principal ideals of the ring, and one might have hoped that this idea solves all your factorization problems. But rings like \mathbb{Z}[\sqrt{-5}] teach us otherwise — now 6=2\times 3=(1+\sqrt{-5})\times(1-\sqrt{-5}) and these factorizations are still not equivalent. The fix was to introduce some magic “ideal numbers”, or “ideals” for short, which are not really numbers, but some kind of generalised number, and now every non-zero generalised number in \mathbb{Z}[\sqrt{-5}] factors uniquely into prime generalised numbers. The reason I am bringing this up is that it is not difficult to check that every ideal of a commutative ring is uniquely determined by the principal ideals which it contains (because it is uniquely determined by the elements it contains).

Filters, a.k.a. generalised subsets, have the reverse property: every filter is uniquely determined by the principal filters which contain it. This is an extensionality lemma for filters, and it is this idea which we need to keep in mind when we try to figure out how to push forward and pull back filters.

As ever, say X and Y are types, and f : X → Y is a function. Pushing forward filters along f (called map f : filter X → filter Y in Lean) is barely any harder than pushing forward subsets. Say F : filter X is a filter on X. Let’s figure out how to define its pushforward map f F, a filter on Y. By the remark above, it suffices to figure out which subsets T of Y, or more precisely which principal filters 𝓟 T, satisfy map f F ≤ 𝓟 T. If we want our intuition to be correct, this should be the case precisely when F ≤ 𝓟 (f⁻¹' T), because this feels exactly like the situation F \leq_f T studied above. Hence we will define the pushforward map f F of the filter F along f by saying that map f F ≤ 𝓟 T if and only if F ≤ 𝓟 (f⁻¹' T), and one can check that this definition (of (map f F).sets) satisfies the axioms of a filter. This is one of the things you’ll be proving today in workshop 6.
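
Here is a sketch of how the construction goes (my own code; it is essentially mathlib's filter.map with the names changed, so look away now if you don't want a spoiler for one of today's exercises):

import order.filter.basic

def map' {X Y : Type} (f : X → Y) (F : filter X) : filter Y :=
{ sets := {T | f ⁻¹' T ∈ F},   -- T is in the pushforward iff its preimage is in F
  univ_sets := F.univ_sets,
  sets_of_superset := λ T U hT hTU, F.sets_of_superset hT (set.preimage_mono hTU),
  inter_sets := λ T U hT hU, F.inter_sets hT hU }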

Pulling back filters, called comap f : filter Y → filter X, is harder, because if G : filter Y then this time we want to solve comap f G ≤ 𝓟 S and the adjoint functor nonsense above only tells us about 𝓟 S ≤ comap f G, which is not enough information: for example the only nonempty subset of ℝ contained in the infinitesimal neighbourhood filter 𝓝 0 is the subset {0}, but 𝓝 0 is strictly larger than the principal filter 𝓟 {0} because it also contains Leibniz’s elements dx. The one question we can answer heuristically however, is a criterion for comap f G ≤ 𝓟 (f⁻¹' T), because if our mental model of comap f is "generalised f⁻¹'" then G ≤ 𝓟 T should imply this. The problem with just restricting to these Ts is that if f is not injective then we can never distinguish between distinct elements of X mapping to the same element of Y, and yet if comap f G ≤ 𝓟 (f⁻¹' T) and f⁻¹' T ⊆ S then we certainly want comap f G ≤ 𝓟 S. So this is what we go for: comap f G ≤ 𝓟 S if and only if there exists a subset T of Y such that G ≤ 𝓟 T and f⁻¹' T ⊆ S. It turns out that there does exist a filter on X satisfying these inequalities, and this is our pullback filter.

One can check, in exact analogy to pushing forward and pulling back subsets, that map f F ≤ G ↔ F ≤ comap f G — indeed, this is the boss level of Part A of today’s workshop.

filter.tendsto — what and why?

The picture behind filter.tendsto f is easy, indeed we have seen it before for sets. The idea is simply that if f : X → Y as usual, and F : filter X and G : filter Y then we can ask whether f restricts to a map from the generalised set F to the generalised set G. This true-false statement is called tendsto f F G in Lean. It is equivalent to map f F ≤ G and to F ≤ comap f G, and it seems to be pronounced “F tends to G along f“, although this notation does not often get pronounced because it seems to be rather rare to see it in the literature. It is used absolutely all over the place in Lean’s treatment of topological spaces and continuity, and it’s based on an analogous formalisation of topology in the older Isabelle/HOL proof system.

Why is this notion useful? Here is a beautiful example. Let us say now that X and Y are topological spaces, and f : X → Y is an arbitrary function. What does it mean to say that f is continuous at x, an element of X? The idea is that if x' is an element of X very close to x, then f x' should be very close to f x. How close? Let’s just say infinitesimally close. The idea is that f should send elements of X infinitesimally close to x to elements infinitesimally close to f x. In other words,

tendsto f (𝓝 x) (𝓝 (f x)).

It’s as simple as that. We want f to map an infinitesimal neighbourhood of x to an infinitesimal neighbourhood of f x. We say that f is continuous if it is continuous at x for all x in X. One can check that this is equivalent to the usual definition of continuity. The proof that a composite of continuous functions is continuous is just as easy using this filter language as it is in the usual open set language, and certain other proofs become easier to formalise using this point of view.
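
As a sanity check (my own example; the definition quoted is mathlib's, at least at the time of writing), continuous_at really is defined to be this tendsto statement, so the following is true by definition:

import topology.basic
open_locale topological_space

example {X Y : Type} [topological_space X] [topological_space Y]
  (f : X → Y) (x : X) :
  continuous_at f x ↔ filter.tendsto f (𝓝 x) (𝓝 (f x)) :=
iff.rfl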

But the real power of the tendsto predicate is that it does not just give us a new way of thinking about continuity of a function at a point. It also gives us a new way of thinking about limits of functions, limits of sequences of real numbers, limits in metric space, and more generally essentially any kind of limit that comes up in an undergraduate degree — these concepts are all unified by the tendsto predicate.

The second Lean file in today’s workshop consists of an analysis of a two-line proof of the fact that the limit of the product of two real sequences is the product of the limits. Let me finish by talking about these two lines.

The first line observes that the definition is_limit a l of week 3 (the traditional \forall \epsilon>0, \exists N, \forall n\geq N, |a(n)-l|<\epsilon) is equivalent to the rather slicker-looking tendsto a cofinite (𝓝 l), where cofinite is the cofinite filter (defined as the generalised subset C of ℕ such that C ≤ 𝓟 S if and only if S is cofinite). It is straightforward to see that these two predicates are equivalent: tendsto a cofinite (𝓝 l) is equivalent to saying that any subset of the reals whose interior contains l has pre-image under a a cofinite subset of the naturals, which is equivalent to saying that for any \epsilon>0, the number of n such that |a(n)-l|\geq\epsilon is finite.

The second line comes from the following standard fact about neighbourhood filters (which we will prove): if Y is a topological space equipped with a continuous multiplication, if A is any type, if f and g are functions from A to Y, if F is any filter on A, and if y and z are elements of Y, then tendsto f F (𝓝 y) and tendsto g F (𝓝 z) together imply tendsto (f * g) F (𝓝 (y * z)), where f * g denotes the map from A to Y sending a to f a * g a. We prove this result from first principles in the file, although of course it is in Lean’s maths library already, as is the fact that multiplication on the real numbers is continuous, which is why we can give a complete two-line proof of the fact that limit of product is product of limits in Lean.


Formalising Mathematics : workshop 5 — filters

The word “filter” means different things to different people; just to clarify, this week we’ll be learning about the mathematical notion of a filter on a set. I’ve written about these filters before, but since then I’ve managed to pick up a much better understanding of how to think about filters, and I hope this shows here. When I wrote that post in 2018 I knew that filters were “something to do with limits”, but now I realise that this is wrong. They are used to talk about limits, but what a filter itself is, is simply a generalisation of a subset of a set.

What is a filter?

Let X be a type, i.e. what most mathematicians call a set. Then X has subsets, and the collection of all subsets of X has some really nice properties — you can take arbitrary unions and intersections, for example, and if you order subsets of X by inclusion then these constructions can be thought of as sups and infs and satisfy a bunch of axioms which one might expect sups and infs to satisfy, for example if X_i\subseteq Y for all i in an index set then \bigcup_i X_i \subseteq Y. In short, the subsets of a set form what is known in order theory as a complete lattice.

A filter can be thought of as a kind of generalised subset of X. Every subset S of X gives rise to a filter on X, called the principal filter 𝓟 S associated to S, and we have 𝓟 S = 𝓟 T if and only if S = T. However if X is infinite then there are other, nonprincipal, filters F on X, which are slightly vaguer objects. However, filters still have an ordering on them, written F ≤ G, and it is true that S ⊆ T ↔ 𝓟 S ≤ 𝓟 T (indeed we’ll be proving this today). To give an example of a filter which is not principal, let’s let X be the real numbers. Then for a real number x there is a filter 𝓝 x, called the neighbourhood filter of x, with the property that if U is any open subset of \mathbb{R} containing x then 𝓟 {x} < 𝓝 x < 𝓟 U. In other words, 𝓝 x is some kind of “infinitesimal neighbourhood of x“, strictly bigger than {x} but strictly smaller than every open neighbourhood of x. This is a concept which cannot be formalised using sets alone, but can be formalised using filters.

The formal definition of a filter.

Let me motivate the definition before I give it. Say F is a filter. Let’s define F.sets to be the subsets of X which contain F, i.e., the S such that F ≤ 𝓟 S . Here is a property of filters which I have not yet mentioned: If two filters F and G satisfy F.sets = G.sets, then F = G; in other words, a filter is determined by the principal filters which contain it. This motivates the following definition: why not define a filter F to be the set of subsets of X which contain it? We will need some axioms — what are reasonable axioms? We don’t want a filter to be bigger than X itself, and we want to make sure that if S contains F then T contains F for any T ⊇ S; finally if both S and T contain F then we want S ∩ T to contain F. That’s the definition of a filter!

structure filter (α : Type*) :=
(sets                   : set (set α))
(univ_sets              : set.univ ∈ sets)
(sets_of_superset {x y} : x ∈ sets → x ⊆ y → y ∈ sets)
(inter_sets {x y}       : x ∈ sets → y ∈ sets → x ∩ y ∈ sets)

A filter on X, or, as Lean would like to call it, a term F : filter X of type filter X, is a collection F.sets of subsets of X satisfying the three axioms mentioned above. That’s it. Unravelling the definitions, we see that a sensible definition of F ≤ G is that G.sets ⊆ F.sets, because we want G ⊆ S to imply F ⊆ S (or, more precisely, we want G ≤ 𝓟 S to imply F ≤ 𝓟 S).
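
Here, as a sketch (my own code; mathlib calls this filter.principal and gives it the notation 𝓟, and Part A below starts by doing essentially this, so look away if you don't want a spoiler), is the principal filter built directly from the definition above:

import order.filter.basic

def principal' {X : Type} (S : set X) : filter X :=
{ sets := {T | S ⊆ T},   -- the subsets of X which contain S
  univ_sets := set.subset_univ S,
  sets_of_superset := λ T U hT hTU, set.subset.trans hT hTU,
  inter_sets := λ T U hT hU, set.subset_inter hT hU }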

It’s probably finally worth mentioning that in Bourbaki, where this concept was first introduced, they have an extra axiom on their filters — they do not allow 𝓟 ∅ to be a filter — the empty set is not a generalised set. From the point of view of filters as generalised subsets this looks like a very strange decision, and this extra axiom was dropped in Lean. Indeed, we bless 𝓟 ∅ with a special name — it is ⊥, the unique smallest filter under our ordering. The (small) advantage of the Bourbaki convention is that an ultrafilter can be defined to literally be a minimal element in the type of all filters, rather than a minimal element in the type of all filters other than ⊥. This would be analogous to not allowing a ring R to be an ideal of itself, so one can define maximal ideals of a ring to be the maximal elements in the set of all ideals of the ring. However this convention for ideals would hugely break the functoriality of ideals, for example the image of an ideal along a ring homomorphism might not be an ideal any more, the sum of two ideals might not be an ideal, and so on. Similarly, we allow ⊥ to be a filter in Lean, because it enables us to take the intersection of filters, pull filters back and so on — it gives a far more functorial definition.

What’s in today’s workshop?

The material this week is in week_5 of the formalising-mathematics GitHub repo which you can download locally if you have leanproject installed or, if you have the patience of a saint and don’t mind missing some of the bells and whistles, you can try online (Part A, and Part B). NB all this infrastructure didn’t just appear by magic, I wrote the code in the repo but I had nothing to do with all these other tricks to make it easier for mathematicians to use — we have a lot to thank people like Patrick Massot and Bryan Gin-ge Chen for.

In Part A we start by defining principal filters and we make a basic API for them. I give a couple more examples of filters too, for example the cofinite filter C on X, which is all the subsets of X whose complement is finite. This filter is worth dwelling on. It corresponds to a generic “every element of X apart from perhaps finitely many” subset of X, perhaps analogous to a generic point in algebraic geometry. However, there exists no element a of X such that 𝓟 {a} ≤ C, because X - {a} is a cofinite subset not containing a. In particular, thinking of filters as generalised subsets again, we note that whilst a generalised set is determined by the sets containing it, it is definitely not determined by the sets it contains: indeed, C contains no nonempty sets at all.

In Part B we go on to do some topology. We define neighbourhood filters and cluster points, and then talk about a definition of compactness which doesn’t involve open sets at all, but instead involves filters. I am still trying to internalise this definition, which is the following:

def is_compact (S : set X) := ∀ ⦃F⦄ [ne_bot F], F ≤ 𝓟 S → ∃ a  ∈ S, cluster_pt a F

In words, a subset S of a topological space is compact if every generalised non-empty subset F of S has closure containing a point of S.

Let’s think about an example here. Let’s stick to S = X. Say S is an infinite discrete topological space. Then the cofinite filter is a filter on S which has no cluster points at all, meaning that an infinite discrete topological space is not compact. Similarly imagine S is the semi-open interval (0,1]. Then the filter of neighbourhoods of zero in \mathbb{R}, restricted to this subset (i.e. just intersect all the sets in the filter with (0,1]), again has no cluster points, so this space is not compact either. Finally let’s consider \mathbb{R} itself. Then the at_top filter, which we will think about in Part A, consists of all subsets T of \mathbb{R} for which there exists some r\in\mathbb{R} such that (r,\infty)\subseteq T. This “neighbourhood of +\infty” filter has no cluster points in \mathbb{R} (note that +\infty would be a cluster point, but it’s not a real number). Hence \mathbb{R} is not compact either. We have certainly not proved here that this definition of compact is mathematically equivalent to the usual one, but it is, and if you’re interested, and you’ve learnt Lean’s language, you can just go and read the proof for yourself in Lean’s maths library.

The boss level this week is, again, that a closed subspace of a compact space is compact. But this time we prove it with filters. As last time, we prove something slightly more general: if X is any topological space, and if S is a compact subset and C is a closed subset, then S ∩ C is compact. Here’s the proof. Say F is a nonempty generalised subset (i.e. a filter) contained in S ∩ C. By compactness of S, F has a cluster point a in S. But F is contained in C, so all cluster points of F are cluster points of C, and the cluster points of C are just the closure of C, which is C again. Hence a is the element of S ∩ C which we seek. No covers, no finite subcovers.


Formalising Mathematics : workshop 4

OK, an overview of this week: we’re doing topology. I was going to introduce filters but I decided to put them off for one more week, so this week it’s topology the way it is traditionally taught in mathematics departments. The two main goals are:

  1. Continuous image of a compact space is compact;
  2. Closed subspace of a compact space is compact.

As I say, we will be doing the traditional approach, so the thing we will use about a compact space is that every open cover has a finite subcover.

Here then are sketches of the proofs, which should be enough for a mathematician:

Theorem. If S is a compact space, if f is a continuous function defined on S, and if T=f(S), then T is compact.

Proof. Say we have an open cover of T by sets U_i; we seek a finite subcover. Let V_i denote the preimage of U_i under f. Then the V_i are open and cover S. Choose a finite subcover of V‘s; then the corresponding U‘s cover f(S). QED.

Theorem. A closed subspace of a compact space is compact.

Proof. Say S is compact and C is a closed subset. Then S \ C is open. Cover C by opens U_i; we seek a finite subcover. Now the U_i and S \ C are opens which cover S, so we can choose a finite subcover consisting of finitely many of the U_i and S \ C which cover S and, a fortiori, cover C. But S \ C doesn’t contribute to this cover, so we have found our finite subcover of C. QED.

The main goal this week is to formalise these proofs. The formalisations are mostly straightforward, given what we know, but may well take a beginner some time. I don’t know if it’s helpful or dispiriting to hear that when writing up the solutions I just sat down and worked straight through these in 5-10 minutes.

Next week, we will be doing topology again, but in a completely different way, using filters. One thing about the filter proofs is that they are shorter to formalise than the traditional proofs. But we’ll talk about filters next week, let’s stick to this week for now.

In order to formalise these traditional proofs, it will help us to know some more API (i.e. theorems in the library, for example “a set is closed if and only if its complement is open”), and a couple more tactics. We will also need to know about the ways Lean thinks about arbitrary unions (i.e., unions of possibly infinitely many things). We have (very briefly) seen unions of two things, back in week 1, but here we need to consider arbitrary covers by open sets, so the union of two things isn’t enough. So let’s talk about how Lean does unions. And before we do that, let’s remind ourselves about how sets work.

Sets in Lean.

Lean’s concept of an abstract “collection of stuff” is a type, rather than a set. When a mathematician writes “a group is a set equipped with a binary operation…”, what they really mean is “a group is a collection of things equipped with a binary operation…”. In type theory, a collection of things is known as a type, not a set, and the things themselves are known as terms of the type, not elements of the set. This is nothing more than a change in nomenclature really. So, if Lean uses type theory and not set theory, then why are we talking about sets at all?

Well, even though Lean doesn’t have the abstract concept of a set as a random collection of stuff, it does have the concept of a subset of a type. This means exactly what you think it means: for example, the natural numbers are a type, and the even numbers, prime numbers, counterexamples to the Goldbach conjecture etc are all examples of subsets of this type. If we have a collection of subsets of a type then we can take unions and intersections and so on, to make more subsets of that type. We can make things like the even numbers and the prime numbers into types, and then these would be called subtypes rather than subsets, but we would then lose the ability to take unions and intersections.

In some sense, set theory is more powerful than type theory, because in set theory you can just go crazy and take unions of random things. If G is a group, then you can take the union of G, the real number \pi, and the Riemann zeta function, because all of these objects are sets. This is something which you cannot do in type theory. Whether you think this makes type theory less powerful than set theory or a lot saner than set theory is a matter for your own conscience, but we will not be discussing these foundational issues here. We will be discussing topological spaces, and if X is a type then a topological space structure on X and the axioms which come with it fortunately only talk about unions and intersections of subsets of X, so we will not need to take the union of X with other random mathematical things.

The type of subsets of X is called set X. Why not subset X? I guess what’s going on is that if S : set X (i.e. S is a term of type set X, otherwise known as a subset of X) then S is a set of elements of X. In fact here’s another reason. A topology on X is a set of subsets of X called the open sets, satisfying some axioms, and so in Lean the data of a topology is given by a term C : set (set X). Humans typically say “a set of subsets of X“. If we’d called it subset X then C : subset (subset X) would be a subset of the type of subsets of X, which is more of a mouthful. Anyway, set X it is; let’s continue.

A word of warning: in type theory, terms can have exactly one type. If x is a term and x : A and x : B (i.e. x has type A and type B) then necessarily A = B. So this presents us with a difficulty. If X : Type is a type and S : set X is a subset, and if x : X is a term of type X which is an element of the subset S, then surely we want x : S (representing the fact that x is an element of S), and yet I’ve just explained that this is not allowed unless S = X, which of course will not in general be true. But in fact even more is false — the assertion x : S does not even make sense, because S is not a type! It is a term, of type set X. And S = X does not make sense either! The subset of X corresponding to all the terms of X is called univ : set X. So how the heck are we going to express the true-false statement that x is an element of S? Well, S is a set, not a type, so we can just use set notation! x ∈ S is the assertion that the term x : X is an element of the subset S : set X.

Three types of union.

There are three ways to do unions in Lean! Four if you count binary unions — if S T : set X then the usual S ∪ T notation is the union of S and T. But here we are concerned with arbitrary unions — the union of a collection of subsets of X.

Union

Let’s start with set.Union. This is a function: let’s take a look at its type. We can do this by typing #check set.Union in Lean and then hovering over set.Union. We see

set.Union : (ι → set X) → set X

Here X and ι are types. So we see that set.Union takes in a function from ι to the subsets of X (let’s call this function F), and it outputs a subset of X; this subset is of course the union of the F i as i runs through the terms of ι, but we can’t see that from the type alone, we would have to use #print set.Union to see this, or we can just read the docstring, which is also visible when you hover over set.Union. If you #check @set.Union (note the @, which means “display all the inputs which Lean will figure out by itself”) you’ll see something like this:

set.Union : Π {X : Type v} {ι : Sort x}, (ι → set X) → set X

telling us that the first two inputs are in squiggly brackets so Lean will not be asking the user for them; it will figure them out itself by looking at the type (i.e., the domain and range) of the input F. Technical note: that Sort means that ι can be a Prop as well as a Type. More on this later.

Now set.Union is a definition, so it needs an interface, or an API. What do we need to know about this definition in order to use it? Well, in some sense, we only need to know one theorem: we need to know when an element is in the union of a bunch of sets. This is

set.mem_Union : x ∈ set.Union F ↔ ∃ (i : ι), x ∈ F i

If, like me, you are sick of typing set in front of every function or theorem about sets, we can just open set at the beginning of our file, and then this theorem is magically just called mem_Union, and the union of the sets indexed by the function F is just called Union F. Actually we can do better — there is notation for Union. If F : (ι → set X) then Union F can be written as ⋃ (i : ι), F i. You will find that the most common usage of mem_Union is rewriting x ∈ ⋃ (i : ι), F i into ∃ (i : ι), x ∈ F i, which you can then make progress with using cases (if it’s a hypothesis) or use (if it’s a goal).
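For example, here is the sort of rewrite you end up doing in practice (a minimal sketch, not from the course files; mathlib’s subset_Union proves this directly):

import data.set.lattice

open set

-- every set in the family is contained in the union
example (X ι : Type) (F : ι → set X) (j : ι) : F j ⊆ ⋃ (i : ι), F i :=
begin
  intros x hx,     -- say x ∈ F j
  rw mem_Union,    -- goal is now `∃ (i : ι), x ∈ F i`
  exact ⟨j, hx⟩,   -- take i = j
end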

sUnion

The next kind of union I’ll talk about (although we won’t use it this week so you can skip straight to bUnion if you like) is set.sUnion, or just sUnion if you have the set namespace open. Here our arbitrary collection of subsets of X is just a set of them, rather than a collection indexed by a type ι. We’ve seen this before — in Lean this looks like C : set (set X) — a set of sets of elements of X, or a set of subsets of X if you like. An example would be the open sets in a topology on X, or, as we’ll see next week, the sets which comprise a filter on X. We take the union of these sets using set.sUnion.

sUnion : set (set X) → set X

Again this is a definition (it has an input {X} as well, but Lean will figure it out once it sees C), again it has notation (⋃₀ C is sUnion C) and again a definition needs an API, the key theorem of which is

mem_sUnion : x ∈ ⋃₀ C ↔ ∃ (S : set X) (H : S ∈ C), x ∈ S

In words — a term x : X is in the union of all the sets of C if and only if there exists a subset S of X which is in C and for which x ∈ S.
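Here is mem_sUnion in action (again a minimal sketch, not from the course files):

import data.set.lattice

open set

-- every member of C is contained in ⋃₀ C
example (X : Type) (C : set (set X)) (S : set X) (hSC : S ∈ C) : S ⊆ ⋃₀ C :=
begin
  intros x hx,      -- say x ∈ S
  rw mem_sUnion,    -- goal: `∃ (T : set X) (H : T ∈ C), x ∈ T`
  exact ⟨S, hSC, hx⟩,
end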

bUnion

There is a third kind of union, which is actually quite helpful in practice. This is when we have an index type ι and F : ι → set X, but we only want to take the union of F i for all i in a subset Z of ι. There seems to be no definition of set.bUnion itself, but we have notation ⋃ i ∈ Z, F i and of course the key theorem:

theorem mem_bUnion_iff {Z : set ι} {F : ι → set X} {x : X} :
  x ∈ (⋃ i ∈ Z, F i) ↔ ∃ i ∈ Z, x ∈ F i

Note that for some reason it’s called mem_bUnion_iff rather than mem_bUnion, which is odd because Lean is usually extremely anally retentive about consistency like this. It turns out that mem_bUnion is just the right to left implication of this.

We finish with a technical note, which can be omitted on first reading. This bUnion stuff is actually just a union over a union, once you realise that in type theory true-false statements can be thought of as types, and proofs as their terms. This is why ι is allowed to be a Prop as well as a Type. Here are two proofs of the same result (one direction of mem_bUnion_iff). The first uses mem_bUnion_iff and is unsurprising. The second is a more peculiar proof (I give the full Lean code so you can try it at home):

import data.set.lattice

variables (X : Type) (ι : Type) {Z : set ι} {F : ι → set X} {x : X}

open set

-- proof using mem_bUnion_iff
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
  rw mem_bUnion_iff at h,
  exact h,
end

-- proof using mem_Union
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
  -- h : x ∈ ⋃ (i : ι) (H : i ∈ Z), F i
  rw mem_Union at h,
  -- h : ∃ (i : ι), x ∈ ⋃ (H : i ∈ Z), F i
  cases h with i hixF,
  -- hixF : x ∈ ⋃ (H : i ∈ Z), F i
  rw mem_Union at hixF,
  -- hixF : ∃ (i_1 : i ∈ Z), x ∈ F i
  cases hixF with hiZ hx,
  -- hx : x ∈ F i
  -- ⊢ ∃ (i : ι) (H : i ∈ Z), x ∈ F i
  use [i, hiZ],
  exact hx,
end

The first proof uses mem_bUnion_iff ; the second one emulates it with mem_Union. You can see in the second proof that ⋃ i ∈ Z, F i is unfolded to ⋃ (i : ι) (H : i ∈ Z), F i so it is really a union over two things. First a union over all i : ι, and second a union over all proofs that i ∈ Z ! This is a Prop but we’re taking a Union over it anyway. If i ∈ Z then this union is a union over one element, and if i ∉ Z then it’s a union over no elements, so things work out in the end.

In the “warm-up” part A for this week, we try some basic things with infinite unions.

Theorems you will need this week

I can’t guess the exact direction you will go, but here are some of the theorems I used in my formalisation of the topology proofs:

  • continuous.is_open_preimage says that the preimage of an open set is open.
  • subset_def says S ⊆ T ↔ ∀ x, x ∈ S → x ∈ T
  • compact_iff_finite_subcover' says that a space is compact iff every open cover has a finite subcover. Note the ' at the end of the statement! Without the ' you will get a different notion of finiteness.
  • is_open_compl_iff says that a set has open complement iff it’s closed.
  • finite.preimage says that the preimage of a finite set under an injective map is finite. Note that if you use the unprimed compact_iff_finite_subcover then you will end up with a finset instead, and that’s another story.

I often find these theorems by half-guessing the name and then pressing ctrl-space. Or I use library_search.

Some tactics

Here are some tactics I’ve not mentioned before, but which I use in the model solutions.

  • change : If the goal is ⊢ P, and Q is a proposition which is equal by definition to P, then change Q will change the goal to ⊢ Q. This can be helpful if you want to use rw on a goal but it’s not quite in the form you want.
  • rwa h just means rw h, assumption — it’s an abbreviation. Recall the assumption tactic tries to close the goal by going through all the hypotheses X in the tactic state and trying exact X. That proof of one direction of mem_bUnion_iff could have been done in one line with rwa mem_bUnion_iff at h.
  • contradiction : if there are two hypotheses in the tactic state, h1 : P and h2 : ¬ P then the contradiction tactic will close the goal immediately (by doing exfalso, apply h2, exact h1).
  • rcases : This is cases on steroids — it will take things apart into more than two pieces. It is very useful for some of this topology stuff. For example if you have a hypothesis h : ∃ (i : ι) (H : i ∈ F), x ∈ V i then rcases h with ⟨i, hiF, hxV⟩ immediately extracts i, and names the proofs of i ∈ F and x ∈ V i both at once. With cases you would have to run the tactic twice.
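Here is that last rcases example as a tiny standalone sketch (the names are all made up):

import tactic

example (X ι : Type) (F : set ι) (V : ι → set X) (x : X)
  (h : ∃ (i : ι) (H : i ∈ F), x ∈ V i) : ∃ j ∈ F, x ∈ V j :=
begin
  rcases h with ⟨i, hiF, hxV⟩,  -- one step, instead of running `cases` twice
  exact ⟨i, hiF, hxV⟩,
end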

option

The last new thing I want to talk about in detail this week is the option X type. The tl;dr version (all you need to know) is explained in the worksheet, but here is a more in-depth discussion. Those of you familiar with Haskell or some other functional languages will know this as the maybe X type, and some languages call it optional X. Despite the fancy name, this type is very easy to describe: it is X and some extra element, called none in Lean (and sometimes called other things in other languages). It can be used in Lean to do things like a one point compactification of a topological space.

We don’t need to know the definition, but I’ll show it anyway:

inductive option (X : Type)
| some (x : X) : option
| none : option

The option X type has two constructors: if x : X then there’s some x, which is a term of type option X corresponding to x : X, and then there’s a second constructor none, which returns a term of type option X which has no corresponding term of type X — it’s the “extra term”. Note that if x : X then it is not true that x : option X. Distinct types are disjoint in Lean’s type theory. The term of type option X corresponding to x is some x. The function some : X → option X is invisible if you think of option X as “X plus another element”. If you think of some as a function then it is injective; you’re going to need this. Here’s a theorem which implies it:

option.some_inj : some a = some b ↔ a = b

Let me end by saying a little bit about what happens when Lean sees that definition of option X above. After processing the definition, Lean puts four new constants into the system. The first is option, a function which eats a type X and spits out a type option X. You might feel that we have just “defined” option above, but that’s not actually how it works: option has no definition, it is a function which cannot be evaluated in any meaningful sense. However it has an API. The second and third new constants in the system are some, a function from X to option X with no definition, and none, a term of type option X with no definition.

The fourth new constant is something called a recursor. A recursor is a way of defining functions from an inductive type. For example let’s say we want to build a function F from the naturals to, say, the reals, by induction (or more precisely, by recursion). We want as input data a “start” real number (which will be F 0) and a method which, given a natural n and a real number r (which will be F n), returns another real number r' (which will be F (n + 1)). The recursor for the natural numbers is a function which takes these things as input, and then returns the function F.
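Here is a toy version of this in Lean, with ℕ as the target instead of ℝ so that the snippet is self-contained (the name F and the choice of start value and step are made up):

-- the recursor eats the value at 0 and the inductive step, and returns the function
def F : ℕ → ℕ :=
@nat.rec (λ _, ℕ) 37 (λ n r, r + n)   -- so F 0 = 37 and F (n + 1) = F n + n

example : F 2 = 38 := rfl                       -- true by definition
example (n : ℕ) : F (n + 1) = F n + n := rfl    -- also true by definition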

The recursor for option, called option.rec, takes as input a term (y0 : Y) and a function f : X → Y, and returns the function F : option X → Y which sends none to y0 and some x to f x. Again this recursor has no definition, but it does satisfy the theorems F none = y0 and F (some x) = f x, and the proofs are both refl — they are true by definition.

From these new constants and these two facts about the recursor, we have enough to be able to prove all the other theorems about option, for example the fact that some is injective. I was surprised about this — the recursor is a very powerful tool. I talk about it more in this blog post, but here I will just finish by explaining the proof of injectivity of some using the recursor. Say a b : X and we know some a = some b. Let’s use the recursor for option to define a function from option X to X. It sends none to a, and let’s use the identity function f : X → X for our function f above. The recursor then spits out a function F : option X → X satisfying F (some a) = a and F (some b) = b. But some a = some b, and hence a = b.
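Here is that argument written out in Lean. This is a hedged sketch: the function name unsome is mine, and it is not how the library actually proves option.some_inj.

-- `unsome X a` sends `none` to `a` and `some x` to `x`; it is built directly
-- from the recursor, with motive the constant family `λ _, X`
def unsome (X : Type) (a : X) : option X → X :=
@option.rec X (λ _, X) a (λ x, x)

example (X : Type) (a b : X) (h : option.some a = option.some b) : a = b :=
begin
  -- apply `unsome X a` to both sides of `h`
  have key : unsome X a (option.some a) = unsome X a (option.some b), { rw h },
  -- both sides compute: `unsome X a (some a) = a` and `unsome X a (some b) = b` by `refl`
  exact key,
end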

option is a very simple example of a monad. If you really want to go nuts, you can try proving this. But you might have more fun trying the topology theorems.


Formalising mathematics : workshop 3

This week we’re going to do limits of sequences, of the kind you see in a 1st year analysis course. These are great fun to do in Lean. Because of Rob Lewis’ linarith tactic (which does the “and now this inequality clearly follows” parts of the arguments), the arguments feel to me like they run very close to what we would write on paper as undergraduates, which is a sign that the system is mature enough for use (for example in teaching). The problem sheet is now in the repo or I guess you could even try it online (although it will take a while to start up — you’re better off installing Lean). This week there are 11 sorrys. I was initially planning on doing a bunch more stuff too; however, I realised when preparing this class that there are a whole bunch of tricks which I know, which make the arguments come out smoothly, but which I haven’t taught yet, so I am envisaging students finding these questions harder than I found them! I will start by going through the maths we’re doing, and will then explain a bunch of the tricks.

Limits of sequences

A sequence a_0, a_1, a_2,\ldots in Lean can simply be encoded as a function a : ℕ → ℝ. You can think of a : ℕ → ℝ as saying “a is a function from the naturals to the reals” (the idea being that a(37) in the function notation represents a_{37} in the sequence). Here’s what’s actually going on with this notation. In Lean, the type ℕ → ℝ is the type of all functions from the naturals to the reals! This point didn’t dawn on me for a long time so let me spell it out: when Lean says ℕ → ℝ it is talking about a type, and it’s the type which mathematicians would call Hom(\mathbb{N},\mathbb{R}), the set of all functions from the naturals to the reals. Then the notation a : ℕ → ℝ just means that a is a term of this type, i.e. an element of the hom set. Note that this is one of the few places in traditional mathematics where it is common to use a colon to denote element of a set, or term of a type, or however you want to think about it.

Lean must have the definition of the limit of a sequence, right? Sure! In fact it has definitions of the limit of a sequence, the limit of a function f(x) as x tends to a, as x tends to a from above, and also as x tends to +\infty and to -\infty. Indeed it has such a general notion of a limit that we’re going to need an entire workshop to understand it properly — it’s the predicate of a filter tending towards another filter along a map. But to understand filters, you have to understand unions and intersections of sets, and I have not covered these properly. So I propose doing sets and filters next week, and this week let’s just roll our own definition of a limit.

-- Don't ask me why this doesn't come as standard in Lean
notation `|` x `|` := abs x

/-- `l` is the limit of the sequence `a` of reals -/
definition is_limit (a : ℕ → ℝ) (l : ℝ) : Prop :=
∀ ε > 0, ∃ N, ∀ n ≥ N, | a n - l | < ε

I do not need to tell you what this definition says, you can read it yourselves. This is one of the advantages of Lean being compatible with unicode. Time and again I have heard computer scientists telling me that this whole “Lean does unicode” thing is an irrelevant gimmick. Time and again I have heard mathematicians telling me that it really makes a difference to them. It’s making a difference right now — I don’t have to tell you what that definition says. Lean even figures out the types of all the variables automatically. In order for everything to make sense, a n (the functional programming way of writing a(n)) must make sense, and because a takes a natural number as input, n must be a natural number. Because n ≥ N must make sense, N must be a natural number too. And a n - l is a real number, so | a n - l | must be a real number, so ε must be a real number.

In the worksheet we’ll work through the following proofs:

  1. Limit of a constant sequence is the constant.
  2. Limits are unique (if they exist).
  3. Sum of two limits is limit of sums.
  4. Product of two limits is limit of products.
  5. If a_n\leq b_n for all n then the same is true for the limits (assuming they exist)
  6. The sandwich theorem: if a_n \leq b_n \leq c_n and the limits of both a_n and c_n exist and equal \ell, then the limit of b_n is also \ell.

I will do the first two, so you can see the techniques. The remaining four, and also seven other things, are your job. The proofs are just the proofs which you saw as undergraduate mathematicians, and you will hopefully find the formalisation relatively straightforward once you’ve internalised the tricks, which I explain within the copious comments in the lean file. Let me go through some of them here.
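To give the flavour, here is a sketch of the first one. This is not the model solution from the repo; it just repeats the definition above so that the snippet compiles on its own.

import data.real.basic
import tactic

notation `|` x `|` := abs x

definition is_limit (a : ℕ → ℝ) (l : ℝ) : Prop :=
∀ ε > 0, ∃ N, ∀ n ≥ N, | a n - l | < ε

-- the constant sequence with value c tends to c
example (c : ℝ) : is_limit (λ n, c) c :=
begin
  intros ε hε,   -- let ε > 0 be arbitrary
  use 0,         -- N = 0 will do
  intros n hn,   -- let n ≥ N be arbitrary
  dsimp only,    -- beta-reduce `(λ n, c) n` to `c`
  rw [sub_self, abs_zero],
  exact hε,
end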

λ notation for functions

This sometimes intimidates mathematicians. Here’s how it works. We mathematicians say “Let f be the squaring function, sending x to x^2”. If we don’t want to name f explicitly we might say “consider the squaring function x\mapsto x^2”. A computer scientist might say instead “consider the squaring function λ x, x ^ 2”. That’s all the lambda is — it’s just “\mapsto” but written at the beginning instead of the middle.

The one thing you need to know is how to deal with things like (λ x, x ^ 2) 37. Here we have made this anonymously-named function, and then we’re evaluating it at 37. How do you get Lean to simplify this to 37 ^ 2? The answer is dsimp only, a tactic which does some basic definitional tidying up.
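For instance (a throwaway example, not from the problem sheet):

import tactic

example : (λ x : ℕ, x ^ 2) 37 = 1369 :=
begin
  dsimp only,  -- the goal `(λ x, x ^ 2) 37 = 1369` becomes `37 ^ 2 = 1369`
  norm_num,
end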

I drone on about this notation at great length in the comments in this week’s problem sheet.

Squiggly brackets {a} in function inputs

In Lean, a proof is a function. Let’s say we have managed to prove my_cool_theorem : ∀ (a : ℝ), 0 < a → 0 < 2 * a. This proof is a function. It takes two inputs, firstly a real number a and secondly a proof that 0 < a, and then it spits out a proof that 0 < 2 * a. If you’re prepared to believe that theorems can be thought of as sets, and proofs as their elements, or more precisely that theorems can be thought of as types, and proofs as their terms, then it makes sense to think about proofs involving implications as functions, and this is an example.

But let’s think for a minute: where does this function go from and to? It’s a function of two variables so it should be of the form X \times Y \to Z. Clearly X is the real numbers. And then Y is the set of proofs that 0 < a. But wait a minute, what is a here? It’s the element of X which we chose as our first input! So actually something slightly fishy is going on — Y (the type of our second input) actually depends on the element of the set X which we chose (or more precisely the term of type X). If you know some geometry or topology, you can see that actually the source of this function is not a product \mathbb{R}\times Y, it’s more like the total space of a bundle on \mathbb{R}, where the fibre above a is the type of proofs that 0 < a, and this fibre moves as we move around \mathbb{R}. If you don’t want to think about it in this fancy way, just understand that the source of the function is sort of X\times Y, but the type Y depends on the term of type X.

This is not a big deal foundationally: of course we can take the source space to be the disjoint union of the types 0 < a as a varies, and then think of our proof as a function on this space. But here’s another funny consequence. If we know the second input of the function, i.e. the element of Y, then by looking at its type (namely the true-false statement 0 < a) we can actually figure out the first input of the function (namely a). In particular, we are kind of wasting the user’s time asking them to give the first input, when we can just work it out from the second input. Here’s a concrete example. If h : 0 < 37 and we want to use my_cool_theorem to prove that 0 < 2 * 37 then right now we’re going to write my_cool_theorem 37 h. But that 37 input could have been worked out by the system because, given that the second input is h, the number 37 is the only possible first input that makes everything make sense — or, to use a technical term — makes everything typecheck. Lean’s type unification system, or unifier, is the system which checks all of this (it’s a big chunk of C++ code in core Lean which I have never looked at — I just know what it does rather than how it does it), and the trick we can do here is to make the unifier fill in that first input for us. So instead of my_cool_theorem we can define

my_cooler_theorem : ∀ {a : ℝ}, 0 < a → 0 < 2 * a

The squiggly {a} bracket input means “Ok so this a is an actual input to the function, but the unifier is going to supply it, so the user doesn’t have to”. And now if h : 0 < 37 then to prove 0 < 2 * 37 we can just use the term my_cooler_theorem h.
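Putting this together as a self-contained sketch (the names are the hypothetical ones from the discussion above, and the proofs are one-line linarith applications):

import data.real.basic
import tactic

theorem my_cool_theorem : ∀ (a : ℝ), 0 < a → 0 < 2 * a :=
λ a ha, by linarith

theorem my_cooler_theorem : ∀ {a : ℝ}, 0 < a → 0 < 2 * a :=
λ a ha, by linarith

example (h : (0 : ℝ) < 37) : (0 : ℝ) < 2 * 37 := my_cool_theorem 37 h   -- we supply the 37
example (h : (0 : ℝ) < 37) : (0 : ℝ) < 2 * 37 := my_cooler_theorem h    -- the unifier supplies it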

While we’re here, I’ll note that the square bracket inputs [] that you sometimes see mean “OK so this is an actual input to the function, but the type class inference system is going to supply it, so the user doesn’t have to”. The type class inference system is another system whose job it is to supply inputs to functions, but let’s not talk about this here (it’s another big chunk of C++ code and again I’ve never looked at it, I know nothing about C++). The thing you need to remember is that, when hovering over a function and trying to figure out what inputs it needs, your job as a user is to supply the ones in () round brackets, and Lean will supply the other inputs using logic that you don’t need to worry too much about.

Some tactics you’ll find helpful: specialize, linarith, ring, convert

specialize: A function f : X → Y → Z can be thought of as a function which takes two inputs (a term of type X and a term of type Y) and spits out a term of type Z. This is because X → Y → Z means X → (Y → Z). If you have a term x : X and are pretty sure that you only want to ever evaluate f at x then you may as well fix x as the first input and let f just denote the resulting function Y → Z. This can be done with specialize f x.

linarith: If you have hypotheses hab : a ≤ b and hbc : b < c and your goal is a < c then sure there will be some function in the library called something like lt_of_le_of_lt which does it for you, but you can just type linarith and it will all happen automatically. The linarith tactic should be able to deal with anything involving linear arithmetic (hence the name), but note that it will not do non-linear goals like proving a*b>0 from a>0 and b>0. For this you need the nonlinear version nlinarith.

ring: If your goal is something which can be proved from the axioms of a commutative ring (or semiring), e.g. like (x+y)^2=x^2+2*x*y+y^2, then the ring tactic will solve it. Note that the ring tactic does not look at any hypotheses — if you need them, you’re going to have to rewrite them manually first (or write a groebner_basis tactic).

convert: If you have a hypothesis which is pretty much equal to your goal, but there’s just some random small subterm which is a bit different, then convert might be a good way to proceed. For example if your goal is ⊢ a ≤ b ∧ b ^ 2 + 3 ≤ c and you have a hypothesis h : a ≤ b ∧ b * b + 3 ≤ c but you don’t know the name of the theorem saying b^2=b*b, so you can’t do a rewrite to change b^2 to b*b, then you can just convert h, and the goal will change to ⊢ b ^ 2 = b * b, which can be solved with ring.
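Here is that exact scenario as a small self-contained sketch:

import data.real.basic
import tactic

example (a b c : ℝ) (h : a ≤ b ∧ b * b + 3 ≤ c) : a ≤ b ∧ b ^ 2 + 3 ≤ c :=
begin
  convert h,   -- leaves an equality goal such as `⊢ b ^ 2 = b * b`
  ring,
end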

Random API which will be useful

abs is a definition, so has an API. Three lemmas in it which might be useful:

abs_pos : 0 < |a| ↔ a ≠ 0
abs_mul x y : |x * y| = |x| * |y|
abs_add x y : |x + y| ≤ |x| + |y|

And ring does not deal with division. A couple of things from the division API:

div_pos : 0 < a → 0 < b → 0 < a / b
lt_div_iff : 0 < c → (a < b / c ↔ a * c < b)
lt_div_iff' : 0 < c → (a < b / c ↔ c * a < b)

Oh, and while we’re here, the moment you start on division you have to start splitting into cases depending on whether the denominator is zero or not: Lean has made a design decision to allow x / 0 to make sense, but there are no theorems about it, so rather than division by zero giving an error it just gives a term you can’t use (think of it as “a random real about which we don’t know anything”). So knowing by_cases hc : c = 0 is a handy tactic trick — this splits into two cases depending on whether c=0 or c\not=0.
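For instance (a small sketch; div_self is the mathlib lemma saying that a ≠ 0 implies a / a = 1):

import data.real.basic
import tactic

example (c : ℝ) : c / c = 1 ∨ c = 0 :=
begin
  by_cases hc : c = 0,
  { right, exact hc },          -- the case c = 0
  { left, exact div_self hc },  -- the case c ≠ 0
end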

Appendix: do like Insta and use filters

Every definition in Lean comes with a cost. Last week we saw some of this cost. We didn’t use Lean’s inbuilt group definition, we rolled our own, and then we had to write a bunch of lemmas before it was usable.

This week we’ve done the same — we’ve rolled our own is_limit definition and have had to prove a bunch of theorems about it. However this week it’s possible to link our definition to Lean’s own far more high-powered definition of a limit, using its tendsto predicate, which is a predicate on two filters and a map. Here are a bunch of two-line proofs of things we’ve been doing today, using Lean’s filter API (so the content of the proofs has not magically disappeared, it is just being done by invoking general theorems from mathlib and using things like the fact that addition is a continuous function on the reals):

import week_3.Part_A_limits

import topology.instances.real

open filter

open_locale topological_space

namespace xena

-- `is_limit` is equivalent to a `filter.tendsto`
lemma is_limit_iff_tendsto (a : ℕ → ℝ) (l : ℝ) :
  is_limit a l ↔ tendsto a at_top (𝓝 l) :=
begin
  rw metric.tendsto_at_top,
  congr',
end

-- this is `is_limit_add`
example (a b : ℕ → ℝ) (l m : ℝ) : is_limit a l → is_limit b m → is_limit (a + b) (l + m) :=
begin
  repeat {rw is_limit_iff_tendsto},
  exact tendsto.add,
end

-- this is `is_limit_mul`
example (a b : ℕ → ℝ) (l m : ℝ) : is_limit a l → is_limit b m → is_limit (a * b) (l * m) :=
begin
  repeat {rw is_limit_iff_tendsto},
  exact tendsto.mul,
end

end xena

I will talk more about these filters at_top (neighbourhoods of infinity on the naturals) and 𝓝 l (neighbourhoods of l in the reals) next time. I write about filters here and will say more about them next week.


Formalising mathematics : Workshop 2

These are some notes on the second workshop in my Formalising Mathematics course, running as part of the EPSRC TCC. The Lean github repo is here.

Groups and subgroups

I start with an apology — there was far too much material last week. I’m still getting the hang of this. Hopefully this week there is a more manageable amount. There is probably still more than can be done in two hours.

In workshop 1 we filled in sorrys, and most of the time the definitions we worked with were things like injective, which is a straightforward “one-liner” definition. Here is the definition of injective from core Lean:

/-- A function `f : α → β` is called injective if `f x = f y` implies `x = y`. -/
@[reducible] def injective (f : α → β) : Prop := ∀ ⦃a₁ a₂⦄, f a₁ = f a₂ → a₁ = a₂

Apart from the @[reducible] tag and the slightly weird brackets, there are no surprises here: it is a one-line Lean definition, with a docstring (a string of text describing in human terms what the definition is about).

There was one more complex definition last week, which was the definition of a partition on a set, but I don’t think anyone got that far anyway, so let’s ignore it and talk about what we’re doing this week, which is to:

  1. define the concept of a group.
  2. make a basic API for groups.
  3. learn how to train Lean’s simplifier simp to prove simple theorems about groups.
  4. define the concept of a subgroup of a group.
  5. make a basic API for subgroups (specifically focussing on the subgroup generated by a subset).

Again I have no feeling for how far we will get. I’d like to think that most people will make it as far as 3 in the two hours we have. I had been using Lean for about a year before it dawned on me what the simplifier was for, so 3 is a nice target.

The material is in the src/week2 directory of the formalising-mathematics github repository.

Defining a group.

We start by discussing the material in src/week_2/part_A_groups.lean. The mathematical definition we will formalise is the following. A group structure on a type G (a type is just Lean’s word for a set) is two collections of things. Firstly, it is the following three pieces of data:

  1. A function mul taking two elements of G and returning an element of G, with notation g * h.
  2. A function inv from G to G, with notation g⁻¹.
  3. A constant one in G, with notation 1.

Secondly, it is the following three assumptions, or axioms, or proofs, or functions, or however you want to think about them:

  1. mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c)
  2. one_mul : ∀ (a : G), 1 * a = a
  3. mul_left_inv : ∀ (a : G), a⁻¹ * a = 1

Internally these are represented in Lean as functions. For example one_mul is a function which takes as input an element a of G (or, as Lean would call it, a term a of type G — it’s the same thing) and spits out a proof that 1 * a = a.

In Lean the full definition in the repo looks like this:

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

Now mathematicians themselves don’t really know what the definition of a group is. Some people say that there are two more axioms mul_one : ∀ (a : G), a * 1 = a and mul_right_inv : ∀ (a : G), a * a⁻¹ = 1. Others say that these last two “axioms” can be deduced from the three we have already, so they are not axioms, they are theorems. It’s clear that the issue is not a mathematical one, but an infrastructure one. If we put more axioms in, then it will be easier to prove things about groups, but harder to prove that various things are groups. Given that I am assuming my audience is mathematically mature (this is not supposed to be a first introduction to group theory!) I am going to go for a minimal axiom approach, and the first thing we shall do is to prove mul_one and mul_right_inv. To do this we will need to prove a couple more lemmas first — or, at least, this is the way I have set things up. Feel free to experiment if you think you know a slicker way of proving these things — all that matters is that we get the two “missing axioms”. Here is the route I have taken in the Lean files:

mul_left_cancel : ∀ (a b c : G), a * b = a * c → b = c
mul_eq_of_eq_inv_mul {a x y : G} : x = a⁻¹ * y → a * x = y
mul_one (a : G) : a * 1 = a
mul_right_inv (a : G) : a * a⁻¹ = 1

In some sense, this is mathematically the hardest part of setting up a basic theory of groups, because not being allowed to use mul_one and mul_right_inv until we’ve proved them is rather unlike the way mathematicians usually deal with groups. It rather reminds me of when the Tokay pinch all of Link’s belongings on Crescent Island in Oracle of Ages.
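To give the flavour, here is one possible proof of the first of these, written against a copy of the class above in a throwaway namespace. This is a hedged sketch — the route and the exact statements in the repo’s solutions may differ.

namespace mygroup  -- a made-up namespace, to avoid clashing with mathlib

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

namespace group

variables {G : Type} [group G]

lemma mul_left_cancel (a b c : G) (h : a * b = a * c) : b = c :=
calc b = 1 * b         : by rw one_mul
   ... = a⁻¹ * a * b   : by rw mul_left_inv
   ... = a⁻¹ * (a * b) : by rw mul_assoc
   ... = a⁻¹ * (a * c) : by rw h
   ... = a⁻¹ * a * c   : by rw mul_assoc
   ... = 1 * c         : by rw mul_left_inv
   ... = c             : by rw one_mul

end group

end mygroup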

Making an API

Once we have achieved the obvious goal of proving mul_one and mul_right_inv, the question is: what to do next. For reasons which will become apparent later, our next goal is to prove the following theorems (none of which are too hard at this point):

inv_mul_cancel_left : a⁻¹ * (a * b) = b
mul_inv_cancel_left : a * (a⁻¹ * b) = b
inv_mul : (a * b)⁻¹ = b⁻¹ * a⁻¹
one_inv : (1 : G)⁻¹ = 1
inv_inv : (a⁻¹)⁻¹ = a

You might want to take some time staring at the names of these results and noting that they are not just random computer science gobbledegook; there is method in the madness (in particular you should be able to figure out what mul_one and one_mul say without looking!). There are also a bunch of other random things which you can prove if you want, such as a⁻¹ = b⁻¹ ↔ a = b and so on — Lean’s maths library mathlib has many many basic statements about groups. The reason we have focussed on these five results above is that, if you put them together with mul_one, mul_right_inv and the axioms defining a group, then you get a confluent rewrite system! What is that, you ask?

The simplifier

Lean’s simplifier simp has a database of so-called “simp lemmas”, lemmas of the form A = B or A ↔ B. Given a random goal, simp will attempt to spot A’s in it, and will proceed to replace them with B’s. If you give it the wrong lemmas, this can be a disaster. For example if you put a = 1 * a in the database then the simplifier will start to replace elements g in goals with 1 * g, and then with 1 * (1 * g) and so on. However, putting 1 * a = a into the database is a really good idea, because cancelling a “1” is typically a sensible move in the game of mathematics. Tagging all of the ten lemmas above with @[simp] adds them to the simplifier’s database, and a theorem of Knuth and Bendix then tells us that the simplifier will put every element of every group into some kind of unique canonical form (if you know about the theory of reduced words in free groups, it’s this, but with added care about exactly where the brackets are). In practice this means that the simp tactic is now guaranteed to solve silly questions like showing (a * b) * 1⁻¹⁻¹ * b⁻¹ * (a⁻¹ * a⁻¹⁻¹⁻¹) * a = 1.
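If you want to see this in action without first building the course’s group from scratch, mathlib’s own group class and simp set do the same job (a quick sanity check, not part of the course files):

import algebra.group.basic

example (G : Type) [group G] (a b : G) :
  (a * b) * 1⁻¹⁻¹ * b⁻¹ * (a⁻¹ * a⁻¹⁻¹⁻¹) * a = 1 :=
by simp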

Subgroups

If G is a type with a group structure (i.e., a group), then there is a type subgroup G of subgroups of G. The Lean definition is in Part_B_subgroups.lean in src/week_2 and it looks like this:

/-- A subgroup of a group G is a subset containing 1
and closed under multiplication and inverse. -/
structure subgroup (G : Type) [group G] :=
(carrier : set G)
(one_mem' : (1 : G) ∈ carrier)
(mul_mem' {x y} : x ∈ carrier → y ∈ carrier → x * y ∈ carrier)
(inv_mem' {x} : x ∈ carrier → x⁻¹ ∈ carrier)

In words, to give a subgroup of G is to give four pieces of information: a subset of G (a.k.a. a term of type set G) called the carrier, and then three proofs that it is closed under the three pieces of data used to define a group — i.e., proofs that the subset contains one and is closed under mul and inv.

Optional paragraph: One slightly weird thing to keep in mind, if you’re into the whole foundational stuff, is that in contrast to the way things are usually set up in type theory, a subgroup of G is a term, not a type. This is a bit weird because usually the idea is that types are sets, and terms are elements of the sets. So how come a subgroup of G is a term? It’s because in type theory every term has exactly one type, so if we have a subset X ⊆ G then we can’t have the same 1 being of type X and of type G. The way we have set things up, we will have 1 : G (so 1 is still a term of type G) but 1 ∈ X (so we will use the set-theoretic notation; here ∈ is a relation which takes a term g of type G (an element of G) and a term X of type set G (a subset of G) and spits out a true-false statement, namely g ∈ X).

We start with some slightly tedious infrastructure (for example extending ∈ so that it applies to subgroups). I was tempted to skip this, but doing it gives you some idea of the “cost” of a definition. We prove random trivial stuff such as two subgroups are equal if and only if they have the same elements, and that subgroups are partially ordered by inclusion.

Closure of a subset

The main mathematical work we do in the file is to set up the theory of the subgroup generated by a subset. There are two ways to do this — a “bottom-up” way where we start with a set S and then throw in the identity and close up under multiplication and inverse. This is certainly possible in Lean, using inductive propositions, but I decided to do the “top-down” approach, where the subgroup generated by S is defined to be the intersection of the subgroups of G which contain S. We prove the following theorems (here ↑H is the underlying subset of G corresponding to the subgroup H of G):

subset_closure (S : set G) : S ⊆ closure S
closure_mono {S T : set G} (hST : S ⊆ T) : closure S ≤ closure T
closure_le (S : set G) (H : subgroup G) : closure S ≤ H ↔ S ⊆ ↑H
closure_closure (S : set G) : closure S = closure (closure S)
closure_self {H : subgroup G} : closure ↑H = H

These results are enough to prove that closure is a closure operator, that closure S is indeed the smallest subset of G containing S and closed under the group structure maps (1, * and ⁻¹), and furthermore that closure and the coercion ↑ form a Galois connection. From these few lemmas the “machine” can take over — for example we can deduce for free that subgroups form a complete lattice and hence the 100 or so theorems proved in Lean about complete lattices can all now be applied to subgroups for free.

Again I suspect that there is far far too much for a 2-hour workshop.


Formalising mathematics : workshop 1

Here’s information about workshop 1 of the 8 workshops in my formalising mathematics course.

Mathematical background

At Imperial I lecture part of a standard “introduction to proof” course, a compulsory course for every beginning maths undergraduate. In my lectures we cover the basics of logic, sets, functions and equivalence relations. The idea is that the students see some abstract definitions (such as injective functions) and then some simple theorems (such as the composite of two injective functions is injective) and then try some problem sheet questions where they practice their skills at putting basic proofs together on paper.

My formalising course is for more advanced students (on paper it’s for PhD students but next year there’s a rumour that I will be teaching it to final year undergraduates), so I can assume that they know how to prove that the composite of two injective functions is injective in the traditional way. The goal of this workshop is to get students proving statements at this kind of level of difficulty, in Lean.

What I will assume

  • You have installed Lean 3 and the community tools e.g. by following the instructions on the Lean community website.
  • You know the pen-and-paper proofs of the results we’re talking about (e.g. that equivalence relations on a set are the same as partitions).
  • (optional but helpful): you’ve had a go at the natural number game so you know the basics of what it’s all about.

The workshop

Start by installing the course repository by following the instructions in the README on that page. NB if you skipped the Lean 3 installation part, then you can still play along by using the links on the README, but you will have a degraded (slower) Lean experience and I cannot guarantee that the links will continue to compile correctly as Lean and mathlib evolve; on the other hand, the course repository will always compile.

[ Note for nerds: you can’t just clone the repo. You have to install it correctly using leanproject as explained in the README if you want access to Lean’s maths library (which you will need for some of the questions). ]

Open the repository using VS Code as explained in the README (you must use the “open folder” functionality of VS Code to open the entire repo, don’t just open a file or else again you won’t have access to the maths library), and find your way to the Lean files in src/week1.

[note for people who have not actually downloaded and installed Lean or anything — you can still play along with this part, you can click here and then wait for a while until Lean stops saying “Lean is busy…” and switches to “Lean is ready!”. You’ll have to allow cookies I think, and you won’t get the colours, but you gets what you pays for here]

Open the first of the Lean files — part_A_logic.lean. You will see some sorrys in the file — they’re in red (and there are corresponding little orange dots in the area where the scrollbar is). Let’s take a look at the first sorry in this file — it’s on line 68. In this puzzle, P is a proposition, which is a computer-science word for a general true-false statement (in particular, P might not be true: 2+2=4 and 2+2=5 are both propositions in Lean). The puzzle is to prove that P\implies P. The sorry indicates that the proof is incomplete — it tells Lean that this proof is omitted right now.

Let’s figure out how to prove this theorem in Lean.

Your first Lean proof.

Here is the maths proof that we’re going to formalise. We are trying to prove that if P is any true-false statement, then P implies P. One proof of this is a boring “truth table” proof, where we just check the two cases where P is true and P is false. You can do this in Lean using the tauto! tactic, which will in fact solve pretty much all of the problems in this first logic file. But let’s solve it “constructively”, as the computer scientists call it.

Here’s the mathematical proof we’re going to formalise. Our goal is to show that P implies P. So let’s start by assuming that P is true. Our goal now changes into a new goal: we now have to prove that P is true. But this is exactly one of our assumptions, so we are done.

Let’s step through how this looks in Lean. We have our file on the left, and another window called the “infoview” on the right, which displays the tactic state (the goal and the hypotheses — i.e., the state of Lean’s brain). Right now the tactic state should look like this:
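In text form (the original post shows a screenshot here), the infoview reads roughly:

P : Prop
⊢ P → P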

This means that P is a Proposition, and the goal (the statement after the sideways T) is to prove P\implies P. Now, back in the Lean file, under the -- Prove this using intro and exact comment, type intro hP, so it looks like this:
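That is, the tactic block in the file now reads something like this (the surrounding statement is already there for you):

begin
  intro hP,
  sorry
end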

Don’t forget the comma. Just after you’ve typed the comma, look at the infoview. It has changed — it now looks like this:
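In text form the infoview is now roughly:

P : Prop,
hP : P
⊢ P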

We now have a Proposition called P, and a new hypothesis hP which is the hypothesis that P holds, i.e., that P is true. The goal has changed too — our goal now is to prove that P is true.

Note that hP is just a name. We could instead have typed intro Nigel and then Lean’s infoview would display Nigel : P. Beginners can get confused about the difference between P and hP; I will say more about the difference later on, although I will make the following cryptic remark now: P is what is called a Type, and hP is a term of that type. We’ll come back to this. Let’s concentrate on finishing the proof right now.

Our goal is now to show that P is true. But this is exactly one of our assumptions, namely hP. So we can finish the proof by adding exact hP, under the intro hP tactic, like this:
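So the tactic block reads something like this (with the original sorry still sitting underneath):

begin
  intro hP,
  exact hP,
  sorry
end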

If we take a look at the infoview just after typing that comma, we see that a good thing has happened:
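The infoview reports that there is nothing left to prove; it reads something like:

goals accomplished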

You just proved a theorem in Lean! But actually there is still something wrong: you can see that the sorry on the left now has a red underline on it. Clicking on sorry shows you the text “tactic failed, there are no goals to be solved” in the infoview. We are apologising unnecessarily. Delete the sorry and the error goes away, and now the proof is complete. If you look down at the bottom left hand corner of VS Code you will now see something like this:

That means that there are no errors in your file, and 39 warnings. Each theorem whose proof still has one or more sorrys in it generates a warning, indicating that you cheated. You should always try and keep your files so that they have no errors, and your job in the workshop is to try and decrease the number of warnings, by filling in sorrys with proofs.

The difference between P and hP

If I were to say to you “What is Fermat’s Little Theorem?” then you would probably tell me that it’s the statement that a^p is congruent to a mod p, if p is a prime and a is an integer. If you were writing a proof and you were working mod p and had a number a^p, you could replace it with a and justify this in your work by saying you were using Fermat’s Little Theorem.

This is the way mathematicians speak. But something I noticed only after I started formalising is that here they are using the phrase “Fermat’s Little Theorem” in two different ways. When asked what the theorem is, the response above is a description of the statement of the theorem. But in the middle of a proof, if you want to invoke it, then you are really using the proof of the theorem. If I were to ask you what the Riemann Hypothesis was, then you might state it, but this doesn’t mean that you can just say “now by the Riemann Hypothesis” in the middle of a proof, at least not if you want your proof to be complete, because in the middle of a proof you are only allowed to use stuff which is proved in your arguments, not just stuff which is stated.

We saw hP : P above. One model of what is going on here is that P is the statement of the proposition we’re thinking about, and hP is the proof. I see beginners writing things like exact P in their proofs. But in the proof we have to use exact hP because we need to use the proof, not the statement. In fact what we are doing in this proof is constructing a function which sends proofs of P to proofs of P. Lean is a functional programming language, and under the hood the way Lean understands a proof of the form P\implies Q is as a function, or an algorithm, which takes as input a proof of P and returns as output a proof of Q. If you want to learn more about how Lean’s type theory models theorems and proofs, you can try this other blog post — it’s not too difficult — but you don’t really need to know about this stuff to get something out of this course.

A second proof.

Let me go through one more proof. Let’s prove that P and P\implies Q together imply Q. The sorryed proof is around line 98 or so in the file (it might have moved a bit if you typed in the earlier proof). It looks like this:
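In the file it will be something like this (P and Q are propositions already declared in the file):

example : P → (P → Q) → Q :=
begin
  sorry
end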

The first surprise is that there is no “and” here. The way I have formalised this statement in Lean is like this: P\implies ((P\implies Q)\implies Q). In words, P implies that (deep breath) P\implies Q implies Q. This is of course logically the same as saying that P and P\implies Q imply Q. Also worth noting is that the Lean formalisation has a disconcertingly small number of brackets in. This is because the → symbol in Lean is right associative, which is just a fancy way of saying that if you see P → Q → R it means P → (Q → R).

Our goal is hence of the form “P implies something” so we can start with intro hP, (or intro Nigel or whatever you want to call the assumption that P is true). The goal now changes to the something, which is “(P implies Q) implies something”, so we can continue with intro hPQ, and now our tactic state looks like this:
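In text form the infoview reads roughly:

P Q : Prop,
hP : P,
hPQ : P → Q
⊢ Q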

We have two propositions, P and Q, and hypotheses hP (saying that P is true) and hPQ (saying that P implies Q). So, exactly as advertised, we know P and P\implies Q, and we want to prove Q. Next we need to learn a new tactic, the apply tactic.

apply is a difficult word. Humans apply all sorts of facts to deduce other facts all over the place in mathematics. In Lean, apply has a very precise meaning, and we are in precisely the situation here where we can use it. If you try apply hPQ, for your next move in this puzzle game, the tactic state changes to the following:
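which in text form is roughly:

P Q : Prop,
hP : P,
hPQ : P → Q
⊢ P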

The difference: the goal has changed! Our goal used to be to come up with a proof of Q. But P implies Q, so if we apply this fact then we see that it suffices to come up with a proof of P instead. That is what the apply tactic does — it reduces our problem to a simpler one by applying an implication. You can only use the apply h tactic if h is an implication, or a series of implications, which ultimately imply your goal.

The rest is easy — exact hP, or even the assumption tactic (which closes any goal which happens to be one of your assumptions) will work.
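Putting the moves together, the whole proof might read like this (a sketch; the model solution may be worded differently):

example : P → (P → Q) → Q :=
begin
  intro hP,
  intro hPQ,
  apply hPQ,
  exact hP,
end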

Try the rest on your own.

The rest is up to you. I have put model solutions in the solutions directory. You will need to know a few more tactics than these three — I have explained them in the week 1 README. Thanks for coming to my course, and good luck!

Post workshop thoughts.

There was far too much material for a beginner to complete within 2 hours. Most people did not even finish Part A. In some sense, there was a lot of material designed to hammer home a few basic tactics again and again. I don’t think it matters if students don’t make it through the material. I’m going to press on next week and do basic group theory and subgroups.


Formalising mathematics: an introduction.

As part of the EPSRC Taught Course Centre I am giving a course on formalising mathematics. This is a course for mathematics PhD students enrolled at Imperial College London, Bristol, Bath, Oxford, or Warwick. No formalisation experience will be assumed. Details of the course are available at the second link above. I have been timetabled eight 2-hour lectures on Thursdays 4-6pm UK time, starting this coming Thursday, 21st Jan 2021.

My instinct in the first lecture would be to start by listing a bunch of reasons why learning how to formalise pure mathematics is interesting/useful/important/whatever, and perhaps also explaining how I got involved with it. But I could probably spend about 30 minutes on this, and I don’t want to waste valuable lecture time on it. In fact I won’t actually be giving lectures at all — the 2-hour slots will be mini Lean workshops, where beginners formalise mathematics they know, with me watching, and I cannot see the point of making the students listen to me waffle on about my opinions/history when, after all, they have chosen to come to the course anyway. So I’ve just decided to write the introduction here, and then students can choose to read it at their leisure (or not read it at all).

Note that I will also be posting notes for the eight lectures here on this blog, explaining the tactics and lemmas which students will need to do the problem sheets associated to each of the eight workshops. I am hoping that the course material itself can be used by other people if they want, e.g. by people wanting to teach Lean courses in other universities, or students wanting to self-study. But without more ado, let’s get on with the stuff which I have now decided not to say in lecture 1.

Formalising mathematics: Why?

Hello, and welcome to my course! I’m Kevin Buzzard, a lecturer at Imperial. For 20+ years I was an algebraic number theorist in the traditional sense, and my research work involved trying to prove new theorems in, broadly speaking, the Langlands Philosophy. Nowadays I work on something completely different — I am laying the foundations for teaching the Langlands Philosophy, and other aspects of serious modern mathematics, to computers. To make this switch, all I had to do was to learn a new language (a computer language) which was rich enough for me to be able to express my mathematical ideas in, and then it simply became a matter of explaining the mathematics I know and love to the computer; the computer checks the details of the proofs. In this course, we will be jumping right in. You will be actively learning this language, because you will be spending the vast majority of every lecture teaching basic undergraduate level mathematics (stuff you understand very well) to your own computer, and I or, hopefully, other students, will be helping you out when you get stuck. As the term progresses we might move onto harder mathematics — for example maybe we’ll do some MSc level algebra or algebraic geometry, maybe we’ll take a look at Patrick Massot’s sphere inversion project, or maybe we’ll explore the foothills of recent work of Clausen and Scholze. How far we get and what we’ll do in the last few lectures will depend very much on who is still attending the workshops after the first month.

A natural way to start such a course would be to spend half the first lecture explaining why I am investing my time giving such a course — what the point of it all is, and why I think this area is worth teaching. But such an explanation will eat into our precious formalising time, so I decided to write it up here instead. In this introduction, which I won’t be formally reading out in the first lecture, I will explain the following:

  • What happened to make me change area, and some mistakes I made at that time;
  • Why I think it’s important to teach young people (i.e. you lot) about formalisation of mathematics;

The nature of research mathematics

As I explained, I used to be a research algebraic number theorist, working in a technical area. More and more, my work would rely on results of other people, and it was becoming increasingly the case that I did not know the proofs of these results. This is the normal procedure in many modern areas of pure mathematics, as many of you will come to realise (if you didn’t realise already) during your time as PhD students.

Around 2017-18 I had some kind of a midlife crisis and decided that I did not want to proceed in this manner any more. I was seeing more and more evidence that some of these results that my colleagues and I were using were either not completely proved, or were completely proved but the complete proofs were not in the literature and may well never be in the literature. I was becoming increasingly concerned that the literature was incomplete and, in places, incorrect. Worse, I had begun to feel that there was a certain amount of failure to uphold what I thought were reasonable academic standards in number theory, and talking to experts in other areas I discovered that similar things were happening in several other branches of pure mathematics. By “failure to uphold academic standards” I certainly do not mean to imply that people are intentionally cheating the system. What I mean is that I felt, at that time, that the system was not scaling. Papers are getting longer. Areas are becoming highly technical. The fact that a journal is sitting on a technical paper with only a small pool of potential expert referees, and needs a report from a busy person — and that busy person has to make a judgement call on the correctness of a paper despite not having time to check the details correctly — and this person was sometimes me — means that some stuff gets through the net. I know papers, in my area, which have errors in. This is not a big deal, because I know, for the most part, how to work around the errors. Traditionally this is regarded as “to be expected”, but by 2017 I had become unhappy with the status quo. Did you know that there are two papers (this one and this one) in the Annals of Mathematics, the most prestigious maths journal, which contain results that directly contradict each other, for example? No erratum was ever published for either paper, and if you chase up the web pages of the authors involved you will see that (at least at the time of writing) both Annals papers are still being proudly displayed on authors’ publication lists. The reasons for this are complex. The incorrect paper has got some ground-breaking ideas in, even if the main theorem is wrong. Journals seem to be unhappy to waste their time/pages publishing errata, and authors sometimes seem to be unhappy to waste their time writing them. The experts know what’s going on anyway, so why bother fixing stuff? Contrast this with the world of computer science, where bugs in programs are discovered and fixed, and the fixes are pushed so that future users of the programs will not run into the same problems.

Mathematicians tend to be smart people. How come they’re making errors?

Mathematicians think in pictures

I have a picture of the real numbers in my head. It’s a straight line. This picture provides a great intuition as to how the real numbers work. I also have a picture of what the graph of a differentiable function looks like. It’s a wobbly line with no kinks in. This is by no means a perfect picture, but it will do in many cases. For example: If someone asked me to prove or disprove the existence of a strictly increasing infinitely differentiable function f:\mathbb{R}\to\mathbb{R} such that f'(37)=0 and f''(37)<0 then I would start by considering a picture of a graph of a strictly increasing function (monotonically increasing as we move from left to right), and a second picture of a function whose derivative at x=37 is zero and whose second derivative is negative (a function with a local maximum). I then note that there are features in these pictures which make them incompatible with each other. Working with these pictures in mind, I can now follow my intuition and write down on paper a picture-free proof that such a function cannot exist, and this proof would be acceptable as a model solution to an exam question. My perception is that other working mathematicians have the same pictures in their head when presented with the same problem, and would go through roughly the same process if they were asked to write down a sketch proof of this theorem.
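
To tie this back to the theme of the course: below is a sketch of how one might state this example-sheet fact in Lean, using mathlib’s strict_mono, differentiable and deriv. The proof is left as a sorry, and I have only assumed the two levels of differentiability that the argument actually uses, rather than full smoothness.

import analysis.calculus.deriv

-- Sketch of the statement only: a strictly increasing function, differentiable
-- and with differentiable derivative, cannot satisfy f'(37) = 0 and f''(37) < 0.
-- Hint for the sorry: the derivative of a differentiable strictly increasing
-- function is nonnegative everywhere, but f''(37) < 0 forces the derivative
-- to become negative just to the right of 37.
example (f : ℝ → ℝ) (hmono : strict_mono f)
  (hd1 : differentiable ℝ f) (hd2 : differentiable ℝ (deriv f))
  (h1 : deriv f 37 = 0) (h2 : deriv (deriv f) 37 < 0) : false :=
sorry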

I also have a picture in my head of an overconvergent modular form defined on a neighbourhood of the ordinary locus on a p-adic modular curve. This picture informed several papers I wrote earlier this century with Richard Taylor, Frank Calegari, and others. I was once privileged to be invited to speak in the number theory seminar at Orsay in Paris, and Jean-Pierre Serre was in the audience. I drew one of these pictures of mine on the board and Serre interrupted! He asked what the picture meant. I had drawn a picture of a compact Riemann surface of genus 3 and was drawing discs and annuli on the Riemann surface. The picture was however supposed to represent a 1-dimensional p-adic manifold (a rigid analytic space in the sense of Tate). It was a representation of the argument I was explaining, but because the object I was actually working with was p-adic, the drawing in some sense bore essentially no relation to the actual mathematical object I was working with. However, my Annals of Mathematics paper with Taylor and my follow-up Journal of the AMS single-author paper (which I was lecturing on at the time) were all evidence that my way of thinking about things, the pictures in my head, really could be translated down into rigorous mathematics, even though they were in some sense meaningless. They were effective guides. My picture came with caveats, which I had a mental note of (for example there are all sorts of subtleties with the “topology” on a rigid analytic space, issues which were solved initially by Tate in the 60s using Grothendieck topologies, and nowadays there are other solutions). These subtleties were not displayed in the picture I’d drawn on the board in Orsay, but I was aware of them. In short, I knew “how far one could push the picture” in some sense — which bits of it to take seriously.

I once found what I thought was an error when refereeing a paper (the author of which was an expert). I could see where the error was coming from. There was a certain object being studied where my picture of the object was more sophisticated than that of the expert writing the paper. The author claimed that something was obvious. I could not see why it was obvious, so I consulted another expert. This expert I consulted (whose picture, or perhaps I should say “understanding”, of the situation was the most sophisticated out of all three of us) said that the result which the author claimed to be obvious was almost certainly true; they knew of no reference, but they sketched an argument which they were confident could be turned into a proof with a fair amount of work. I relayed a brief summary of these ideas back to the author in my report (requesting revision) and in the second version of the paper they still claimed the result was obvious and attached a note saying that they could not see what the fuss was about. I then sent a much longer report explaining the problems in far more detail. In the next version of the paper which I was sent, the entire section containing the problematic argument had been rewritten and the difficult-but-probably-true result was no longer needed. This is a great example of the unreasonable resilience of mathematics. The author knew several proofs of the main result in that section; I had spotted an issue with one, so they simply supplied another one. In my experience, this is the way human-written mathematics routes its way round most errors in preprints and papers. But occasionally we’re not so lucky, and unfixable errors make it through the system.

The system is broken!

You might conclude from this, and other stories you’ll encounter throughout your PhD studies, that the system is broken. I did, and in 2017-18 I started to seriously consider the idea that perhaps using computer proof checkers might help to fix the problem. I have lost count of the number of times people told me to read Thurston’s On Proof and Progress in mathematics; Thurston makes some extremely coherent arguments which one might interpret as being against formalisation of mathematics, because one naive interpretation of what formalisation is doing is that it is “removing the pictures”. Of course, one also has to remember that Thurston’s pictures were very accurate. Thurston was responding to an article by Jaffe and Quinn called Theoretical Mathematics, and anyone who reads Thurston’s paper should also read Jaffe–Quinn, to see what inspired it. The Jaffe–Quinn paper also makes some very good points. They, like me at that time, were concerned. In 2017 I had independently started to learn about computer proof checkers because I had become interested in their potential use as a teaching tool, and I had some grandiose ideas about how everything could be put together. I went on a “UK tour” in 2018, giving lectures with titles like “Pure mathematics in crisis?” at several serious UK universities, and I made many provocative statements in an attempt to get a dialogue going about these issues. I also invested a lot of my time into becoming an expert in using one particular computer proof checker, called Lean (other computer proof checkers are available, e.g. Coq, Isabelle, Metamath, Mizar, Agda, HOL 4, HOL Light and many other systems). With a team of undergraduates at Imperial we taught Lean what a scheme was (the idea discovered by Grothendieck which revolutionised algebraic geometry in the 1960s), and I quickly realised that modern proof checking systems were now capable of handling serious modern research mathematics. Could these systems somehow be used to fix our broken system?

The system is not broken!

However, I now believe that the system is not broken at all. Rather, this is just the nature of mathematics as done by humans. What I did not realise in 2017 was that mathematics has always been like this. Humans are fallible. Philosophers and historians of mathematics have put me right on this point. What I also did not realise in 2017 was that I was going to have to come to terms with the fact that perhaps 20 years of administration and childcare meant that I was no longer a top expert in the Langlands philosophy, and the fact that I was now having problems with the literature certainly did not mean that the top experts were. The top experts know where the problems are, and why some things which seem like problems are not problems. Furthermore, they would probably be happy to explain things to you, if you raise specific issues of concern, and thus give them a reason to take you seriously. Thank you very much to those that did this for me.

So…why formalise?

So, if formalising mathematics in a computer proof checker is not going to save pure mathematics, and if indeed pure mathematics does not even need saving, then why should a pure mathematician bother with using computer proof checkers at all? Well, the thing is, in stark contrast to my naive 2017 self, I have now seen what these systems can do, and it is now manifestly clear to me that these systems can change mathematics for the better. These systems can digitise mathematics. Why should we want to digitise mathematics? Is it not obvious to you? It will enable us to use it in different ways. It has the potential to change mathematics completely. Let me give you an analogy.

In 1992 I, very reluctantly, bought my first CD player. At that time I had hundreds and hundreds of vinyl records (those of you at Imperial might well have seen a bunch of them in my office), and each one I regarded as a thing of beauty. CDs were ugly (the artwork looked much less good), breakable (we had been told during the marketing campaigns that you could spread jam on them and they wouldn’t get damaged, but I had CDs which jumped like crazy and one of them had even started to rot in some weird way), and some people were arguing that the switch to digital meant that they did not sound as good as vinyl (although I will be honest and say that my ears were never good enough to tell the difference, at least when it came to the kind of noisy guitar rock and early breakbeat/drum and bass which I was listening to at that time). What I (and many others, I suspect) had not realised at the time was that the crucial switch was not from vinyl to CD; it was from analogue to digital. Nowadays, essentially all of the music which I consume is digital, even though essentially none of it is on CDs. The switch to digital has made music more portable. It means that during lockdown I can access new music instantly, send YouTube links to my friends, and my son can create and manipulate music to make new kinds of sounds using his laptop. Digitising music was a game-changer. Essentially nobody really realised the extent of this in 1992; indeed, at the time it just seemed to me to be a ploy by The Establishment to make me re-purchase music I had already paid for.

Digitising mathematics is going to be a game-changer in the same way. Digitising mathematics changes the way that it is consumed. Digitising mathematics turns it into a new kind of picture — some kind of directed acyclic graph of terms and functions in a type theory. This kind of picture is of no direct use to humans. Patrick Massot made a picture of the graph corresponding to a perfectoid space, and it is just, to quote A A Milne, a “confused noise”, at least for humans. However it is a picture which computer proof systems can understand very well. Unless you are living under a rock, you will know that artificial intelligence and machine learning are changing the world. IBM made a computer program which could beat a grandmaster at chess. DeepMind made a computer program which could beat a grandmaster at go. Proving theorems in pure mathematics is a very natural next step. When will computers start to compete with humans in this game? Not for a while yet. But this is starting to happen. However, AI works best with a database — and those databases are not yet there. Those that we have are deeply flawed for a number of reasons. For example, one mathematical database we have is a formalised list of many, many theorems about the smallest finite non-solvable group of odd order — a substantial collection of results which a machine learning program could learn from. The last entry in that database is the theorem that there is no finite non-solvable group of odd order. This is the famous Feit–Thompson theorem, formalised by a team of 20 researchers over a six-year period. I ask then — what use is that database of theorems to an AI? Hundreds of theorems, some with quite technical proofs, about a group which does not exist. These theorems are very important for the proof — they are the proof — but are they important to an AI? The Gonthier et al work on the Feit–Thompson theorem is extremely important, because it is unquestionable evidence that the systems are ready to handle a 400 page proof, at least when it is a proof which only involves low-level objects like finite groups. But to an AI researcher it seems to me that this database has problems, as it spends a lot of time developing a theory for an object which doesn’t exist. This is by no means a criticism of the Gonthier et al work! I do not know of a single database of mathematical theorems which I would consider remotely adequate for machine learning. Where is our analogue of the thousands of grandmaster games which IBM’s chess computer Deep Blue trained on? My comments are, at least implicitly I guess, a criticism of the mathematical community itself. Isn’t it about time that we supplied these AI experts with a database of theorems and conjectures about schemes, manifolds, automorphic representations, Shimura varieties and so on? Things which human mathematicians are working on in 2021? The systems are ready; this is what people like Gonthier and Hales have shown us.

Digitising mathematics is not just useful for the AI researcher. Digitising mathematics guarantees a basic level of accuracy, which is extremely helpful for the learning experience. I have still not forgotten the example sheet question I spent five hours on as an undergraduate, which asked me to prove that if a topological space had various nice properties then it was metrizable. I finally gave up, went to my supervision, showed my supervisor the problem, and they instantly responded “Oh that’s not true at all! The Stone–Čech compactification of the natural numbers is a counterexample”. Having never heard of this object at the time, I felt rather cheated. I have been formalising my undergraduate problem sheets over the last couple of years, and “edge cases”, such as a claimed result becoming untrue or meaningless when some of the variables take extreme values such as zero, are now completely eliminated. I am not saying that this is a reason to learn how to formalise. I am however saying that once more mathematics is formalised, people will take what we have and begin to consider doing things like creating online interactive textbooks and problem sheets, where students will be able to solve problems perhaps in some “front end” language. These tools and utilities will begin to appear as more and more mathematics is digitised and the software becomes more and more normalised in mathematics departments.

Let me finish by going back to pictures. Thurston’s ideas of the importance of humans turning mathematics into pictures and then using their intuition to manipulate these pictures can be interpreted as an argument for the importance of fluidity in thought, and hence an argument against formalisation. But imagine arguing that coordinate geometry should not be used to study Platonic solids because making Platonic solids out of clay clearly teaches you more about them. For example, testing the hypothesis that unit tetrahedra can be used to tessellate 3-space is far more easily done with physical tetrahedra than with a pile of formulae satisfied by the coordinates of the vertices in 3-space. However, without 3-dimensional coordinates there would be no 3d computer modelling software such as Blender, and using Blender (if you know what you’re doing) it is also very easy to check that unit tetrahedra do not fit together in a neat way to fill 3-space. When Descartes came up with his coordinates, he was in no position to imagine Blender. Maybe we are in no position in 2021 to imagine the kind of ways which a computer can be used to show us how to visualise various mathematical objects. But if people like me start teaching people like you the basics of how this stuff works, perhaps you will start having your own insights about what can be achieved in the future.
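
Incidentally, here is the arithmetic behind that claim about tetrahedra, the kind of quick sanity check which coordinates (or Blender) make possible: the dihedral angle of a regular tetrahedron is \arccos(1/3) \approx 70.53^\circ, and 5 \times 70.53^\circ \approx 352.6^\circ falls short of 360^\circ while 6 \times 70.53^\circ \approx 423.2^\circ overshoots it, so no whole number of unit tetrahedra can fit exactly around a common edge; this is the obstruction you run into when you try to fit them together face to face.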

You can’t stop progress. Undergraduate and MSc mathematics is going to be formalised, and then who knows what will happen. Maybe the statements of the theorems in the Stacks project will be formalised — we’re working on it. Maybe the proofs of some of the theorems in the Clausen-Scholze work on a new variant of topological spaces will be formalised — we have made a start, and every day (at least at the time of writing) people are talking about formalising condensed mathematics on the Lean Zulip chat. These projects indicate that these systems are capable of understanding modern mathematics. What will they do with this understanding? I don’t think any of us are in a position to guess. But I want to find out sooner rather than later, and this is why I want to show young people like you what is currently possible — so you can prepare to dream.
