Formalising Mathematics : workshop 5 — filters

The word “filter” means different things to different people; just to clarify, this week we’ll be learning about the mathematical notion of a filter on a set. I’ve written about these filters before, but since then I’ve managed to pick up a much better understanding of how to think about filters, and I hope this shows here. When I wrote that post in 2018 I knew that filters were “something to do with limits”, but now I realise that this is wrong. They are used to talk about limits, but what a filter itself is, is simply a generalisation of a subset of a set.

What is a filter?

Let X be a type, i.e. what most mathematicians call a set. Then X has subsets, and the collection of all subsets of X has some really nice properties — you can take arbitrary unions and intersections, for example, and if you order subsets of X by inclusion then these constructions can be thought of as sups and infs and satisfy a bunch of axioms which one might expect sups and infs to satisfy, for example if X_i\subseteq Y for all i in an index set then \bigcup_i X_i \subseteq Y. In short, the subsets of a set form what is known in order theory as a complete lattice.

A filter can be thought of as a kind of generalised subset of X. Every subset S of X gives rise to a filter on X, called the principal filter 𝓟 S associated to S, and we have 𝓟 S = 𝓟 T if and only if S = T. However if X is infinite then there are other, nonprincipal, filters F on X, which are slightly vaguer objects. Filters still have an ordering on them, written F ≤ G, and it is true that S ⊆ T ↔ 𝓟 S ≤ 𝓟 T (indeed we’ll be proving this today). To give an example of a filter which is not principal, let’s let X be the real numbers. Then for a real number x there is a filter 𝓝 x, called the neighbourhood filter of x, with the property that if U is any open subset of \mathbb{R} containing x then 𝓟 {x} < 𝓝 x < 𝓟 U. In other words, 𝓝 x is some kind of “infinitesimal neighbourhood of x”, strictly bigger than {x} but strictly smaller than every open neighbourhood of x. This is a concept which cannot be formalised using sets alone, but can be formalised using filters.
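
For readers who like to see statements in Lean straight away, here is a small sketch of the two facts about principal filters quoted above. It assumes Lean 3 with mathlib, and that the library lemma is called `filter.principal_mono` (lemma names can differ between mathlib versions, so treat the name as an assumption rather than gospel).

```lean
import order.filter.basic

open filter
open_locale filter -- switches on the 𝓟 notation

variables {X : Type} (S T : set X)

-- the ordering on principal filters matches inclusion of subsets
example : 𝓟 S ≤ 𝓟 T ↔ S ⊆ T := principal_mono

-- so a subset can be recovered from its principal filter
example (h : 𝓟 S = 𝓟 T) : S = T :=
set.subset.antisymm
  (principal_mono.1 (le_of_eq h))
  (principal_mono.1 (le_of_eq h.symm))
```

The second example is just antisymmetry of ⊆ applied to the first, which is one way of seeing that 𝓟 is injective.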

The formal definition of a filter.

Let me motivate the definition before I give it. Say F is a filter. Let’s define F.sets to be the subsets of X which contain F, i.e., the S such that F ≤ 𝓟 S. Here is a property of filters which I have not yet mentioned: if two filters F and G satisfy F.sets = G.sets, then F = G; in other words, a filter is determined by the principal filters which contain it. This motivates the following definition: why not define a filter F to be the set of subsets of X which contain it? We will need some axioms; what are reasonable axioms? We don’t want a filter to be bigger than X itself, and we want to make sure that if S contains F then T contains F for any T ⊇ S; finally if both S and T contain F then we want S ∩ T to contain F. That’s the definition of a filter!

structure filter (α : Type*) :=
(sets                   : set (set α))
(univ_sets              : set.univ ∈ sets)
(sets_of_superset {x y} : x ∈ sets → x ⊆ y → y ∈ sets)
(inter_sets {x y}       : x ∈ sets → y ∈ sets → x ∩ y ∈ sets)

A filter on X, or, as Lean would like to call it, a term F : filter X of type filter X, is a collection F.sets of subsets of X satisfying the three axioms mentioned above. That’s it. Unravelling the definitions, we see that a sensible definition of F ≤ G is that G.sets ⊆ F.sets, because we want G ⊆ S to imply F ⊆ S (or, more precisely, we want G ≤ 𝓟 S to imply F ≤ 𝓟 S).
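
To make the three axioms concrete, here is a sketch which builds the filter of all sets containing a given S directly from the structure, and then checks that F ≤ G really does unfold to the inclusion of sets described above. It assumes Lean 3 with mathlib, with the field names exactly as in the structure quoted above; the examples are illustrations and not the worksheet's own code.

```lean
import order.filter.basic

variables {X : Type}

-- the filter of all sets containing S, built from the three axioms
example (S : set X) : filter X :=
{ sets             := {T | S ⊆ T},
  -- S is contained in the whole of X
  univ_sets        := set.subset_univ S,
  -- if S ⊆ A and A ⊆ B then S ⊆ B
  sets_of_superset := λ A B hA hAB, set.subset.trans hA hAB,
  -- if S ⊆ A and S ⊆ B then S ⊆ A ∩ B
  inter_sets       := λ A B hA hB, set.subset_inter hA hB }

-- F ≤ G unfolds to: every set in G is also in F
example (F G : filter X) : F ≤ G ↔ ∀ S ∈ G, S ∈ F := iff.rfl
```

Note the direction of the second example: the smaller filter has more sets in it, just as a smaller subset is contained in more sets.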

It’s probably finally worth mentioning that in Bourbaki, where this concept was first introduced, they have an extra axiom on their filters: they do not allow 𝓟 ∅ to be a filter, so the empty set is not a generalised set. From this point of view this looks like a very strange decision, and this extra axiom was dropped in Lean. Indeed, we bless 𝓟 ∅ with a special name: it is ⊥, the unique smallest filter under our ≤ ordering. The (small) advantage of the Bourbaki convention is that an ultrafilter can be defined to literally be a minimal element in the type of all filters, rather than a minimal element in the type of all filters other than ⊥. This would be analogous to not allowing a ring R to be an ideal of itself, so one can define maximal ideals of a ring to be the maximal elements in the set of all ideals of the ring. However this convention for ideals would hugely break the functoriality of ideals, for example the image of an ideal along a ring homomorphism might not be an ideal any more, the sum of two ideals might not be an ideal, and so on. Similarly, we allow ⊥ to be a filter in Lean, because it enables us to take the intersection of filters, pull filters back and so on; it gives a far more functorial definition.

What’s in today’s workshop?

The material this week is in week_5 of the formalising-mathematics GitHub repo which you can download locally if you have leanproject installed or, if you have the patience of a saint and don’t mind missing some of the bells and whistles, you can try online (Part A, and Part B). NB all this infrastructure didn’t just appear by magic, I wrote the code in the repo but I had nothing to do with all these other tricks to make it easier for mathematicians to use — we have a lot to thank people like Patrick Massot and Bryan Gin-ge Chen for.

In Part A we start by defining principal filters and we make a basic API for them. I give a couple more examples of filters too, for example the cofinite filter C on X, which is all the subsets of X whose complement is finite. This filter is worth dwelling on. It corresponds to a generic “every element of X apart from perhaps finitely many” subset of X, perhaps analogous to a generic point in algebraic geometry. However, there exists no element a of X such that 𝓟 {a} ≤ C, because X - {a} is a cofinite subset not containing a. In particular, thinking of filters as generalised subsets again, we note that whilst a generalised set is determined by the sets containing it, it is definitely not determined by the sets it contains: indeed, when X is infinite, C contains no nonempty sets at all.

In Part B we go on to do some topology. We define neighbourhood filters and cluster points, and then talk about a definition of compactness which doesn’t involve open sets at all, but instead involves filters. I am still trying to internalise this definition, which is the following:

def is_compact (S : set X) := ∀ ⦃F⦄ [ne_bot F], F ≤ 𝓟 S → ∃ a ∈ S, cluster_pt a F

In words, a subset S of a topological space is compact if every generalised non-empty subset F of S has closure containing a point of S.

Let’s think about an example here. Let’s stick to S = X. Say S is an infinite discrete topological space. Then the cofinite filter is a filter on S which has no cluster points at all, meaning that an infinite discrete topological space is not compact. Similarly imagine S is the semi-open interval (0,1]. Then the filter of neighbourhoods of zero in \mathbb{R}, restricted to this subset (i.e. just intersect all the sets in the filter with (0,1]), again has no cluster points, so this space is not compact either. Finally let’s consider \mathbb{R} itself. Then the at_top filter, which we will think about in Part A, consists of all subsets T of \mathbb{R} for which there exists some r\in\mathbb{R} such that (r,\infty)\subseteq T. This “neighbourhood of +\infty” filter has no cluster points in \mathbb{R} (note that +\infty would be a cluster point, but it’s not a real number). Hence \mathbb{R} is not compact either. We have certainly not proved here that this definition of compact is mathematically equivalent to the usual one, but it is, and if you’re interested, and you’ve learnt Lean’s language, you can just go and read the proof for yourself in Lean’s maths library.

The boss level this week is, again, that a closed subspace of a compact space is compact. But this time we prove it with filters. As last time, we prove something slightly more general: if X is any topological space, and if S is a compact subset and C is a closed subset, then S ∩ C is compact. Here’s the proof. Say F is a nonempty generalised subset (i.e. a filter) contained in S ∩ C. By compactness of S, F has a cluster point a in S. But F is contained in C, so all cluster points of F are cluster points of C, and the cluster points of C are just the closure of C, which is C again. Hence a is the element of S ∩ C which we seek. No covers, no finite subcovers.
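
If you want to see where this is heading, the statement of the boss level can be written down as follows. This is a sketch assuming Lean 3 with mathlib, where I believe the corresponding library lemma is `is_compact.inter_right`; in the workshop the point is of course to prove it yourself with filters rather than to quote the library.

```lean
import topology.subset_properties

variables {X : Type} [topological_space X]

-- a compact set intersected with a closed set is compact
example {S C : set X} (hS : is_compact S) (hC : is_closed C) :
  is_compact (S ∩ C) :=
hS.inter_right hC
```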

Formalising Mathematics : workshop 4 — topology

OK, an overview of this week: we’re doing topology. I was going to introduce filters but I decided to put them off for one more week, so this week it’s topology the way it is traditionally taught in mathematics departments. The two main goals are:

  1. Continuous image of a compact space is compact;
  2. Closed subspace of a compact space is compact.

As I say, we will be doing the traditional approach, so the thing we will use about a compact space is that every open cover has a finite subcover.

Here then are sketches of the proofs, which should be enough for a mathematician:

Theorem. If S is a compact space, if f is a continuous function defined on S, and if T=f(S), then T is compact.

Proof. Say we have an open cover of T by sets U_i; we seek a finite subcover. Let V_i denote the preimage of U_i under f. Then the V_i are open and cover S. Choose a finite subcover of V‘s; then the corresponding U‘s cover f(S). QED.

Theorem. A closed subspace of a compact space is compact.

Proof. Say S is compact and C is a closed subset. Then S \ C is open. Cover C by opens U_i; we seek a finite subcover. Now the U_i and S \ C are opens which cover S, so we can choose a finite subcover consisting of finitely many of the U_i and S \ C which cover S and, a fortiori, cover C. But S \ C doesn’t contribute to this cover, so we have found our finite subcover of C. QED.
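
Here is roughly what the two statements look like in Lean, phrased for subsets of a topological space. This is a sketch assuming Lean 3 with mathlib from around the time of this course; the lemma names `is_compact.image` and `compact_of_is_closed_subset` are the ones I believe the library used then, and they may have been renamed in later versions.

```lean
import topology.subset_properties

variables {X Y : Type} [topological_space X] [topological_space Y]

-- continuous image of a compact set is compact
example (f : X → Y) (hf : continuous f) (S : set X) (hS : is_compact S) :
  is_compact (f '' S) :=
hS.image hf

-- a closed subset of a compact set is compact
example (S C : set X) (hS : is_compact S) (hC : is_closed C) (hCS : C ⊆ S) :
  is_compact C :=
compact_of_is_closed_subset hS hC hCS
```

Here `f '' S` is Lean's notation for the image of S under f.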

The main goal this week is to formalise these proofs. The formalisations are mostly straightforward, given what we know, but may well take a beginner some time. I don’t know if it’s helpful or dispiriting to hear that when writing up the solutions I just sat down and worked straight through these in 5-10 minutes.

Next week, we will be doing topology again, but in a completely different way, using filters. One thing about the filter proofs is that they are shorter to formalise than the traditional proofs. But we’ll talk about filters next week, let’s stick to this week for now.

In order to formalise these traditional proofs, it will help us to know some more API (i.e. theorems in the library, for example “a set is closed if and only if its complement is open”), and a couple more tactics. We will also need to know about the ways Lean thinks about arbitrary unions (i.e., unions of possibly infinitely many things). We have (very briefly) seen unions of two things, back in week 1, but here we need to consider arbitrary covers by open sets, so the union of two things isn’t enough. So let’s talk about how Lean does unions. And before we do that, let’s remind ourselves about how sets work.

Sets in Lean.

Lean’s concept of an abstract “collection of stuff” is a type, rather than a set. When a mathematician writes “a group is a set equipped with a binary operation…”, what they really mean is “a group is a collection of things equipped with a binary operation…”. In type theory, a collection of things is known as a type, not a set, and the things themselves are known as terms of the type, not elements of the set. This is nothing more than a change in nomenclature really. So, if Lean uses type theory and not set theory, then why are we talking about sets at all?

Well, even though Lean doesn’t have the abstract concept of a set as a random collection of stuff, it does have the concept of a subset of a type. This means exactly what you think it means: for example, the natural numbers are a type, and the even numbers, prime numbers, counterexamples to the Goldbach conjecture etc are all examples of subsets of this type. If we have a collection of subsets of a type then we can take unions and intersections and so on, to make more subsets of that type. We can make things like the even numbers and the prime numbers into types, and then these would be called subtypes rather than subsets, but we would then lose the ability to take unions and intersections.

In some sense, set theory is more powerful than type theory, because in set theory you can just go crazy and take unions of random things. If G is a group, then you can take the union of G, the real number \pi, and the Riemann zeta function, because all of these objects are sets. This is something which you cannot do in type theory. Whether you think this makes type theory less powerful than set theory or a lot saner than set theory is a matter for your own conscience, but we will not be discussing these foundational issues here. We will be discussing topological spaces, and if X is a type then a topological space structure on X and the axioms which come with it fortunately only talk about unions and intersections of subsets of X, so we will not need to take the union of X with other random mathematical things.

The type of subsets of X is called set X. Why not subset X? I guess what’s going on is that if S : set X (i.e. S is a term of type set X, otherwise known as a subset of X) then S is a set of elements of X. In fact here’s another reason. A topology on X is a set of subsets of X called the open sets, satisfying some axioms, and so in Lean the data of a topology is given by a term C : set (set X). Humans typically say “a set of subsets of X“. If we’d called it subset X then C : subset (subset X) would be a subset of the type of subsets of X, which is more of a mouthful. Anyway, set X it is; let’s continue.

A word of warning: in type theory, terms can have exactly one type. If x is a term and x : A and x : B (i.e. x has type A and type B) then necessarily A = B. So this presents us with a difficulty. If X : Type is a type and S : set X is a subset, and if x : X is a term of type X which is an element of the subset S, then surely we want x : S (representing the fact that x is an element of S), and yet I’ve just explained that this is not allowed unless S = X, which of course will not in general be true. But in fact even more is false — the assertion x : S does not even make sense, because S is not a type! It is a term, of type set X. And S = X does not make sense either! The subset of X corresponding to all the terms of X is called univ : set X. So how the heck are we going to express the true-false statement that x is an element of S? Well, S is a set, not a type, so we can just use set notation! x ∈ S is the assertion that the term x : X is an element of the subset S : set X.
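
A tiny sketch of what this looks like in practice, assuming Lean 3 with mathlib (the variable names are just for illustration):

```lean
import data.set.basic

variables (X : Type) (S T : set X) (x : X)

-- `x ∈ S` is a true-false statement, so it can be a hypothesis or a goal
example (hS : x ∈ S) (hST : S ⊆ T) : x ∈ T := hST hS

-- every term of type X is an element of `univ : set X`
example : x ∈ (set.univ : set X) := set.mem_univ x
```

In the first example the proof `hST` of S ⊆ T is being used as a function: it eats a proof that x ∈ S and returns a proof that x ∈ T.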

Three types of union.

There are three ways to do unions in Lean! Four if you count binary unions: if S T : set X then the usual S ∪ T notation is the union of S and T. But here we are concerned with arbitrary unions, that is, the union of a collection of subsets of X.


Let’s start with set.Union. This is a function: let’s take a look at its type. We can do this by typing #check set.Union in Lean and then hovering over set.Union. We see

set.Union : (ι → set X) → set X

Here X and ι are types. So we see that set.Union takes in a function from ι to the subsets of X (let’s call this function F), and it outputs a subset of X; this subset is of course the union of the F i as i runs through the terms of ι, but we can’t see that from the type alone, we would have to use #print set.Union to see this, or we can just read the docstring, which is also visible when you hover over set.Union. If you #check @set.Union (note the @, which means “display all the inputs which Lean will figure out by itself”) you’ll see something like this:

set.Union : Π {X : Type v} {ι : Sort x}, (ι → set X) → set X

telling us that the first two inputs are in squiggly brackets so Lean will not be asking the user for them, it will be figuring them out itself by looking at the type (i.e., the domain and range) of the input F. Technical note: that Sort means that ι can, as well as a Type, also be a Prop. More on this later.

Now set.Union is a definition, so it needs an interface, or an API. What do we need to know about this definition in order to use it? Well, in some sense, we only need to know one theorem: we need to know when is an element in the union of a bunch of sets. This is

set.mem_Union : x ∈ set.Union F ↔ ∃ (i : ι), x ∈ F i

If, like me, you are sick of typing set in front of every function or theorem about sets, we can just open set at the beginning of our file, and then this theorem is magically just called mem_Union, and the union of the sets indexed by the function F is just called Union F. Actually we can do better: there is notation for Union. If F : (ι → set X) then Union F can be written as ⋃ (i : ι), F i. You will find that the most common usage of mem_Union is rewriting x ∈ ⋃ (i : ι), F i into ∃ (i : ι), x ∈ F i, which you can then make progress with using cases (if it’s a hypothesis) or use (if it’s a goal).
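
Here is a short sketch of that workflow, assuming Lean 3 with mathlib: we prove the fact mentioned in workshop 5 that if every F i is contained in S then so is their union. The example is mine, for illustration only.

```lean
import data.set.lattice

open set

example {X ι : Type} (F : ι → set X) (S : set X)
  (h : ∀ i, F i ⊆ S) : (⋃ i, F i) ⊆ S :=
begin
  intros x hx,
  -- hx : x ∈ ⋃ (i : ι), F i
  rw mem_Union at hx,
  -- hx : ∃ (i : ι), x ∈ F i
  cases hx with i hi,
  -- hi : x ∈ F i
  exact h i hi,
end
```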


The next kind of union I’ll talk about (although we won’t use it this week so you can skip straight to bUnion if you like) is set.sUnion, or just sUnion if you have the set namespace open. Here our arbitrary collection of subsets of X is just a set of them, rather than a collection indexed by a type ι. We’ve seen this before: in Lean this looks like C : set (set X), a set of sets of elements of X, or a set of subsets of X if you like. An example would be the open sets in a topology on X, or, as we’ll see next week, the sets which comprise a filter on X. We take the union of these sets using set.sUnion.

sUnion : set (set X) → set X

Again this is a definition (it has an input {X} as well, but Lean will figure it out once it sees C), again it has notation (⋃₀ C is sUnion C) and again a definition needs an API, the key theorem of which is

mem_sUnion : x ∈ ⋃₀ C ↔ ∃ (S : set X) (H : S ∈ C), x ∈ S

In words: a term x : X is in the union of all the sets of C if and only if there exists a subset S of X which is in C and for which x ∈ S.
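
As a quick illustration (a sketch assuming Lean 3 with mathlib, not something from the worksheet), here is mem_sUnion being used to show that every member of C is contained in the union of C:

```lean
import data.set.lattice

open set

example {X : Type} (C : set (set X)) (S : set X) (hS : S ∈ C) :
  S ⊆ ⋃₀ C :=
begin
  intros x hx,
  -- ⊢ x ∈ ⋃₀ C
  rw mem_sUnion,
  -- ⊢ ∃ (t : set X) (H : t ∈ C), x ∈ t
  exact ⟨S, hS, hx⟩,
end
```

The pointy brackets package up the set S, the proof that S ∈ C and the proof that x ∈ S all at once.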


There is a third kind of union, which is actually quite helpful in practice. This is when we have an index type ι and F : ι → set X, but we only want to take the union of F i for all i in a subset Z of ι. There seems to be no definition of set.bUnion itself, but we have notation ⋃ i ∈ Z, F i and of course the key theorem:

theorem mem_bUnion_iff {Z : set ι} {F : ι → set X} {x : X} :
  x ∈ (⋃ i ∈ Z, F i) ↔ ∃ i ∈ Z, x ∈ F i

Note that for some reason it’s called mem_bUnion_iff rather than mem_bUnion, which is odd because Lean is usually extremely anally retentive about consistency like this. It turns out that mem_bUnion is just the right to left implication of this.

We finish with a technical note, which can be omitted on first reading. This bUnion stuff is actually just a union over a union, once you realise that in type theory true-false statements can be thought of as types, and proofs as their terms. This is why ι is allowed to be a Prop as well as a Type. Here are two proofs of the same result (one direction of mem_bUnion_iff). The first uses mem_bUnion_iff and is unsurprising. The second is a more peculiar proof (I give the full Lean code so you can try it at home):

import data.set.lattice

variables (X : Type) (ι : Type) {Z : set ι} {F : ι → set X} {x : X}

open set

-- proof using mem_bUnion_iff
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
  rw mem_bUnion_iff at h,
  exact h,
end

-- proof using mem_Union
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
begin
  -- h : x ∈ ⋃ (i : ι) (H : i ∈ Z), F i
  rw mem_Union at h,
  -- h : ∃ (i : ι), x ∈ ⋃ (H : i ∈ Z), F i
  cases h with i hixF,
  -- hixF : x ∈ ⋃ (H : i ∈ Z), F i
  rw mem_Union at hixF,
  -- hixF : ∃ (i_1 : i ∈ Z), x ∈ F i
  cases hixF with hiZ hx,
  -- hx : x ∈ F i
  -- ⊢ ∃ (i : ι) (H : i ∈ Z), x ∈ F i
  use [i, hiZ],
  exact hx,
end

The first proof uses mem_bUnion_iff; the second one emulates it with mem_Union. You can see in the second proof that ⋃ i ∈ Z, F i is unfolded to ⋃ (i : ι) (H : i ∈ Z), F i so it is really a union over two things. First a union over all i : ι, and second a union over all proofs that i ∈ Z! This is a Prop but we’re taking a Union over it anyway. If i ∈ Z then this union is a union over one element, and if i ∉ Z then it’s a union over no elements, so things work out in the end.

In the “warm-up” part A for this week, we try some basic things with infinite unions.

Theorems you will need this week

I can’t guess the exact direction you will go, but here are some of the theorems I used in my formalisation of the topology proofs:

  • continuous.is_open_preimage says that the preimage of an open set is open.
  • subset_def says S ⊆ T ↔ ∀ x, x ∈ S → x ∈ T
  • compact_iff_finite_subcover' says that a space is compact iff every open cover has a finite subcover. Note the ' at the end of the statement! Without the ' you will get a different notion of finiteness.
  • is_open_compl_iff says that a set has open complement iff it’s closed.
  • finite.preimage says that the preimage of a finite set under an injective map is finite. Note that if you use the unprimed compact_iff_finite_subcover then you will end up with a finset instead, and that’s another story.

I often find these theorems by half-guessing the name and then pressing ctrl-space. Or I use library_search.

Some tactics

Here are some tactics I’ve not mentioned before, but which I use in the model solutions.

  • change : If the goal is ⊢ P, and Q is a proposition which is equal by definition to P, then change Q will change the goal to ⊢ Q. This can be helpful if you want to use rw on a goal but it’s not quite in the form you want.
  • rwa h just means rw h, assumption; it’s an abbreviation. Recall the assumption tactic tries to close the goal by going through all the hypotheses X in the tactic state and trying exact X. That proof of one direction of mem_bUnion_iff could have been done in one line with rwa mem_bUnion_iff at h.
  • contradiction : if there are two hypotheses in the tactic state, h1 : P and h2 : ¬ P then the contradiction tactic will close the goal immediately (by doing exfalso, apply h2, exact h1).
  • rcases : This is cases on steroids: it will take things apart into more than two pieces. It is very useful for some of this topology stuff. For example if you have a hypothesis h : ∃ (i : ι) (H : i ∈ F), x ∈ V i then rcases h with ⟨i, hiF, hxV⟩ immediately extracts i, and names the proofs of i ∈ F and x ∈ V i both at once. With cases you would have to run the tactic twice.
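
Here is a small sketch showing each of these four tactics in action, assuming Lean 3 with mathlib and the lemma name mem_bUnion_iff quoted earlier in this post (newer versions of the library may call it something else). The examples are mine, chosen to be as short as possible.

```lean
import tactic
import data.set.lattice

open set

variables {X ι : Type} {Z : set ι} {F : ι → set X} {x : X}

-- change: replace the goal by something definitionally equal
example (S T : set X) (h : ∀ y, y ∈ S → y ∈ T) : S ⊆ T :=
begin
  change ∀ y, y ∈ S → y ∈ T,
  exact h,
end

-- rwa: rewrite, then close the goal with an assumption
example (h : x ∈ (⋃ i ∈ Z, F i)) : ∃ i ∈ Z, x ∈ F i :=
by rwa mem_bUnion_iff at h

-- contradiction: P and ¬ P are both hypotheses, so any goal follows
example (P Q : Prop) (h1 : P) (h2 : ¬ P) : Q :=
by contradiction

-- rcases: take an ∃ apart into three pieces in one go
example (h : ∃ (i : ι) (H : i ∈ Z), x ∈ F i) : x ∈ (⋃ i ∈ Z, F i) :=
begin
  rcases h with ⟨i, hiZ, hxF⟩,
  exact mem_bUnion_iff.2 ⟨i, hiZ, hxF⟩,
end
```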


The last new thing I want to talk about in detail this week is the option X type. The tl;dr version (all you need to know) is explained in the worksheet, but here is a more in-depth discussion. Those of you familiar with Haskell or some other functional languages will know this as the maybe X type, and some languages call it optional X. Despite the fancy name, this type is very easy to describe: it is X and some extra element, called none in Lean (and sometimes called other things in other languages). It can be used in Lean to do things like a one point compactification of a topological space.

We don’t need to know the definition, but I’ll show it anyway:

inductive option (X : Type)
| some (x : X) : option
| none : option

The option X type has two constructors: if x : X then there’s some x, which is a term of type option X corresponding to x : X, and then there’s a second constructor none, which returns a term of type option X which has no corresponding term of type X; it’s the “extra term”. Note that if x : X then it is not true that x : option X. Distinct types are disjoint in Lean’s type theory. The term of type option X corresponding to x is some x. The function some : X → option X is invisible if you think of option X as “X plus another element”. If you think of some as a function then it is injective; you’re going to need this. Here’s a theorem which implies it:

option.some_inj : some a = some b ↔ a = b

Let me end by saying a little bit about what happens when Lean sees that definition of option X above. After processing the definition, Lean puts four new constants into the system. The first is option, a function which eats a type X and spits out a type option X. You might feel that we have just “defined” option above, but that’s not actually how it works: option has no definition, it is a function which cannot be evaluated in any meaningful sense. However it has an API. The second and third new constants in the system are some, a function from X to option X with no definition, and none, a term of type option X with no definition.

The fourth new constant is something called a recursor. A recursor is a way of defining functions from an inductive type. For example let’s say we want to build a function F from the naturals to, say, the reals, by induction (or more precisely, by recursion). We want as input data a “start” real number (which will be F 0) and a method which given a natural n and a real number r (which will be F n), returns another real number r' (which will be F (n + 1)). The recursor for the natural numbers is a function which takes these things as input, and then returns the function F.

The recursor for option, called option.rec, takes as input a term (y0 : Y) and a function f : X → Y, and returns the function F : option X → Y which sends none to y0 and some x to f x. Again this recursor has no definition, but it does satisfy the theorems F none = y0 and F (some x) = f x, and the proofs are both refl, because they are true by definition.

From these new constants and these two facts about the recursor, we have enough to be able to prove all the other theorems about option, for example the fact that some is injective. I was surprised about this: the recursor is a very powerful tool. I talk about it more in this blog post, but here I will just finish by explaining the proof of injectivity of some using the recursor. Say a b : X and we know some a = some b. Let’s use the recursor for option to define a function from option X to X. It sends none to a, and let’s use the identity function f : X → X for our function f above. The recursor then spits out a function F : option X → X satisfying F (some a) = a and F (some b) = b. But some a = some b, and hence a = b.
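
The argument above can be sketched in Lean 3 as follows. The one assumption to flag is that I give option.rec its motive explicitly (the `λ _, X`) and pass the none case before the some case, which is the order Lean's own option uses; everything else is just the proof in the paragraph above.

```lean
example {X : Type} (a b : X) (h : some a = some b) : a = b :=
begin
  -- F sends `none` to `a` and `some x` to `x`
  let F : option X → X := @option.rec X (λ _, X) a (λ x, x),
  -- the two middle steps are the computation rules for the recursor
  calc a = F (some a) : rfl
     ... = F (some b) : congr_arg F h
     ... = b          : rfl,
end
```

In practice one would simply use option.some_inj, but it is reassuring to see that the recursor really is enough.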

option is a very simple example of a monad. If you really want to go nuts, you can try proving this. But you might have more fun trying the topology theorems.

Formalising mathematics : workshop 3 — sequences and limits

This week we’re going to do limits of sequences, of the kind you see in a 1st year analysis course. These are great fun to do in Lean. Because of Rob Lewis’ linarith tactic (which does the “and now this inequality clearly follows” parts of the arguments), the arguments feel to me like they run very close to what we would write on paper as undergraduates, which is a sign that the system is mature enough for use (for example in teaching). The problem sheet is now in the repo or I guess you could even try it online (although it will take a while to start up — you’re better off installing Lean). This week there are 11 sorrys. I was initially planning on doing a bunch more stuff too, however I realised when preparing this class that there are a whole bunch of tricks which I know, which makes the arguments come out smoothly, but I didn’t teach them yet, so I am envisaging students finding these questions harder than I found them! I will start by going through the maths we’re doing, and will then explain a bunch of the tricks.

Limits of sequences

A sequence a_0, a_1, a_2,\ldots in Lean can simply be encoded as a function a : ℕ → ℝ. You can think of a : ℕ → ℝ as saying “a is a function from the naturals to the reals” (the idea being that a(37) in the function notation represents a_{37} in the sequence). Here’s what’s actually going on with this notation. In Lean, the type ℕ → ℝ is the type of all functions from the naturals to the reals! This point didn’t dawn on me for a long time so let me spell it out: when Lean says ℕ → ℝ it is talking about a type, and it’s the type which mathematicians would call Hom(\mathbb{N},\mathbb{R}), the set of all functions from the naturals to the reals. Then the notation a : ℕ → ℝ just means that a is a term of this type, i.e. an element of the hom set. Note that this is one of the few places in traditional mathematics where it is common to use a colon to denote element of a set, or term of a type, or however you want to think about it.

Lean must have the definition of the limit of a sequence, right? Sure! In fact it has the definitions of limits of a sequence, limit of a function f(x) as x tends to a, as x tends to a from above, and also as x tends to +\infty and to -\infty. In fact it has such a general notion of a limit that we’re going to need an entire workshop to understand it properly — it’s the predicate of a filter tending towards another filter along a map. But to understand filters, you have to understand unions and intersections of sets, and I have not covered these properly. So I propose doing sets and filters next week, and this week let’s just roll our own definition of a limit.

-- Don't ask me why this doesn't come as standard in Lean
notation `|` x `|` := abs x

/-- `l` is the limit of the sequence `a` of reals -/
definition is_limit (a : ℕ → ℝ) (l : ℝ) : Prop :=
∀ ε > 0, ∃ N, ∀ n ≥ N, | a n - l | < ε

I do not need to tell you what this definition says, you can read it yourselves. This is one of the advantages of Lean being compatible with unicode. Time and again I have heard computer scientists telling me that this whole “Lean does unicode” thing is an irrelevant gimmick. Time and again I have heard mathematicians telling me that it really makes a difference to them. It’s making a difference right now: I don’t have to tell you what that definition says. Lean even figures out the types of all the variables automatically. In order for everything to make sense, a n (the functional programming way of writing a(n)) must make sense, and because a takes a natural number as input, n must be a natural number. Because n ≥ N must make sense, N must be a natural number too. And a n - l is a real number, so | a n - l | must be a real number, so ε must be a real number.
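
To give a flavour of how a proof about this definition goes, here is a sketch that the constant sequence c, c, c, … tends to c. It assumes Lean 3 with mathlib; to keep the block self-contained I restate the definition with abs written out instead of the | | notation, and the proof is one possible route, not necessarily the one in the worksheet.

```lean
import data.real.basic
import tactic

/-- `l` is the limit of the sequence `a` of reals (restated, with `abs` spelled out) -/
def is_limit (a : ℕ → ℝ) (l : ℝ) : Prop :=
∀ ε > 0, ∃ N, ∀ n ≥ N, abs (a n - l) < ε

example (c : ℝ) : is_limit (λ n, c) c :=
begin
  intros ε hε,
  -- any N will do, so let's use 0
  use 0,
  intros n hn,
  -- tidy up the goal: the n-th term of the constant sequence is c
  show abs (c - c) < ε,
  rw [sub_self, abs_zero],
  exact hε,
end
```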

In the worksheet we’ll work through the following proofs:

  1. Limit of a constant sequence is the constant.
  2. Limits are unique (if they exist).
  3. Sum of two limits is limit of sums.
  4. Product of two limits is limit of products.
  5. If a_n\leq b_n for all n then the same is true for the limits (assuming they exist)
  6. The sandwich theorem: if a_n \leq b_n \leq c_n and the limits of both a_n and c_n exist and equal \ell, then the limit of b_n is also \ell.

I will do the first two, so you can see the techniques. The remaining four, and also seven other things, are your job. The proofs are just the proofs which you saw as undergraduate mathematicians, and you will hopefully find the formalisation relatively straightforward once you’ve internalised the tricks, which I explain within the copious comments in the lean file. Let me go through some of them here.

λ notation for functions

This sometimes intimidates mathematicians. Here’s how it works. We mathematicians say “Let f be the squaring function, sending x to x^2”. If we don’t want to name f explicitly we might say “consider the squaring function x\mapsto x^2”. A computer scientist might say instead “consider the squaring function λ x, x ^ 2”. That’s all the lambda is: it’s just “\mapsto” but written at the beginning instead of the middle.

The one thing you need to know is how to deal with things like (λ x, x ^ 2) 37. Here we have made this anonymous function, and then we’re evaluating it at 37. How do you get Lean to simplify this to 37 ^ 2? The answer is dsimp only, a tactic which does some basic definitional tidying up.
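
As a small illustration, assuming mathlib's data.real.basic is available, here is dsimp only evaluating a λ inside a hypothesis. The hypothesis h and the number 1369 are made up for the example.

```lean
import data.real.basic

example (h : (λ x : ℝ, x ^ 2) 37 = 1369) : (37 : ℝ) ^ 2 = 1369 :=
begin
  dsimp only at h, -- h is now `37 ^ 2 = 1369`
  exact h,
end
```

Here exact h would in fact have worked without the tidying step, because the two statements are definitionally equal, but rewriting tactics such as rw usually need the λ to be evaluated first.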

I drone on about this notation at great length in the comments in this week’s problem sheet.

Squiggly brackets {a} in function inputs

In Lean, a proof is a function. Let’s say we have managed to prove my_cool_theorem : ∀ (a : ℝ), 0 < a → 0 < 2 * a. This proof is a function. It takes two inputs, firstly a real number a and secondly a proof that 0 < a, and then it spits out a proof that 0 < 2 * a. If you’re prepared to believe that theorems can be thought of as sets, and proofs as their elements, or more precisely that theorems can be thought of as types, and proofs as their terms, then it makes sense to think about proofs involving implications as functions, and this is an example.

But let’s think for a minute: where does this function go from and to? It’s a function of two variables so it should be of the form X \times Y \to Z. Clearly X is the real numbers. And then Y is the set of proofs that 0 < a. But wait a minute, what is a here? It’s the element of X which we chose as our first input! So actually something slightly fishy is going on: Y (the type of our second input) actually depends on the element of the set X which we chose (or more precisely the term of type X). If you know some geometry or topology, you can see that actually the source of this function is not a product \mathbb{R}\times Y, it’s more like the total space of a bundle on \mathbb{R}, where the fibre above a is the type of proofs that 0 < a, and this fibre moves as we move around \mathbb{R}. If you don’t want to think about it in this fancy way, just understand that the source of the function is sort of X\times Y, but the type Y depends on the term of type X.

This is not a big deal foundationally, of course we can make the source space as the disjoint union of the types 0 < a as a varies and then think of our proof as a function on this space. But here’s another funny consequence. If we know the second input of the function, i.e. the element of Y, then by looking at its type (namely the true-false statement 0 < a) we can actually figure out the first input of the function (namely a). In particular, we are kind of wasting the user’s time asking them to give the first input, when we can just work it out from the second input. Here’s a concrete example. If h : 0 < 37 and we want to use my_cool_theorem to prove that 0 < 2 * 37 then right now we’re going to write my_cool_theorem 37 h. But that 37 input could have been worked out by the system because, given that the second input is h, the number 37 is the only possible first input that makes everything make sense — or, to use a technical term — makes everything typecheck. Lean’s type unification system, or unifier, is the system which checks all of this (it’s a big chunk of C++ code in core Lean which I have never looked at — I just know what it does rather than how it does it), and the trick we can do here is to make the unifier fill in that first input for us. So instead of my_cool_theorem we can define

my_cooler_theorem : ∀ {a : ℝ}, 0 < a → 0 < 2 * a

The squiggly {a} bracket input means “Ok so this a is an actual input to the function, but the unifier is going to supply it, so the user doesn’t have to”. And now if h : 0 < 37 then to prove 0 < 2 * 37 we can just use the term my_cooler_theorem h.
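
Here is a sketch of the two versions side by side. The statements are the ones above; the proofs by linarith are an assumption, since the text only supposes that the theorems have been proved somehow.

```lean
import data.real.basic
import tactic

-- Explicit version: the user supplies both `a` and the proof of `0 < a`.
theorem my_cool_theorem : ∀ (a : ℝ), 0 < a → 0 < 2 * a :=
begin
  intros a ha,
  linarith,
end

-- Implicit version: the unifier works out `a` from the proof of `0 < a`.
theorem my_cooler_theorem : ∀ {a : ℝ}, 0 < a → 0 < 2 * a :=
begin
  intros a ha,
  linarith,
end

example (h : (0 : ℝ) < 37) : (0 : ℝ) < 2 * 37 := my_cool_theorem 37 h
example (h : (0 : ℝ) < 37) : (0 : ℝ) < 2 * 37 := my_cooler_theorem h
-- Writing `@` makes the implicit input explicit again if you ever need it.
example (h : (0 : ℝ) < 37) : (0 : ℝ) < 2 * 37 := @my_cooler_theorem 37 h
```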

While we’re here, I’ll note that the square bracket inputs [] that you sometimes see mean “OK so this is an actual input to the function, but the type class inference system is going to supply it, so the user doesn’t have to”. The type class inference system is another system whose job it is to supply inputs to functions, but let’s not talk about this here (it’s another big chunk of C++ code and again I’ve never looked at it, I know nothing about C++). The thing you need to remember is that, when hovering over a function and trying to figure out what inputs it needs, your job as a user is to supply the ones in () round brackets, and Lean will supply the other inputs using logic that you don’t need to worry too much about.

Some tactics you’ll find helpful: specialize, linarith, ring, convert

specialize: A function f : X → Y → Z can be thought of as a function which takes two inputs (a term of type X and a term of type Y) and spits out a term of type Z. This is because X → Y → Z means X → (Y → Z). If you have a term x : X and are pretty sure that you only want to ever evaluate f at x then you may as well fix x as the first input and let f just denote the resulting function Y → Z. This can be done with specialize f x.
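
Two tiny illustrations, with made-up names: the first follows the f, x description above literally, and the second is the shape in which specialize more often shows up, with a hypothesis that starts with ∀.

```lean
example (X Y Z : Type) (f : X → Y → Z) (x : X) (y : Y) : Z :=
begin
  specialize f x, -- now f : Y → Z
  exact f y,
end

example (P : ℕ → Prop) (h : ∀ n, P n) : P 37 :=
begin
  specialize h 37, -- now h : P 37
  exact h,
end
```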

linarith: If you have hypotheses hab : a ≤ b and hbc : b < c and your goal is a < c then sure there will be some function in the library called something like lt_of_le_of_lt which does it for you, but you can just type linarith and it will all happen automatically. The linarith tactic should be able to deal with anything involving linear arithmetic (linear equalities and inequalities in the hypotheses and the goal), but note that it will not do non-linear goals like proving a*b>0 from a>0 and b>0. For this you need the nonlinear version nlinarith.
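
The following sketch shows both routes for the linear example, and the non-linear one, assuming the variables are real numbers.

```lean
import data.real.basic
import tactic

-- `linarith` finds the argument from the hypotheses in context.
example (a b c : ℝ) (hab : a ≤ b) (hbc : b < c) : a < c :=
by linarith

-- The same thing using the library lemma by name.
example (a b c : ℝ) (hab : a ≤ b) (hbc : b < c) : a < c :=
lt_of_le_of_lt hab hbc

-- A product of unknowns is not linear, so this one needs `nlinarith`.
example (a b : ℝ) (ha : 0 < a) (hb : 0 < b) : 0 < a * b :=
by nlinarith
```

For this particular non-linear goal the library lemma mul_pos ha hb would also do the job directly.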

ring: If your goal is something which can be proved from the axioms of a commutative ring (or semiring), e.g. like (x+y)^2=x^2+2*x*y+y^2, then the ring tactic will solve it. Note that the ring tactic does not look at any hypotheses — if you need them, you’re going to have to rewrite them manually first (or write a groebner_basis tactic).
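
A sketch of both situations, again over the reals: an identity which ring proves outright, and one where a hypothesis (made up for the example) has to be rewritten into the goal by hand first.

```lean
import data.real.basic
import tactic

example (x y : ℝ) : (x + y) ^ 2 = x ^ 2 + 2 * x * y + y ^ 2 :=
by ring

example (x y : ℝ) (h : y = x + 1) : y ^ 2 = x ^ 2 + 2 * x + 1 :=
begin
  rw h, -- `ring` ignores `h`, so substitute it manually
  ring,
end
```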

convert: If you have a hypothesis which is pretty much equal to your goal, but there’s just some random small subterm which is a bit different, then convert might be a good way to proceed. For example, suppose your goal is ⊢ a ≤ b ∧ b ^ 2 + 3 ≤ c and you have a hypothesis h : a ≤ b ∧ b * b + 3 ≤ c, but you don’t know the name of the theorem saying b^2=b*b, so you can’t use rewrite to change b^2 to b*b. Then you can just convert h, and the goal will change to ⊢ b ^ 2 = b * b, which can be solved with ring.
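
The example just described can be written out as follows, assuming a, b and c are real numbers.

```lean
import data.real.basic
import tactic

example (a b c : ℝ) (h : a ≤ b ∧ b * b + 3 ≤ c) : a ≤ b ∧ b ^ 2 + 3 ≤ c :=
begin
  convert h, -- the only mismatch left is `b ^ 2 = b * b`
  ring,
end
```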

Random API which will be useful

abs is a definition, so has an API. Three lemmas in it which might be useful:

abs_pos : 0 < |a| ↔ a ≠ 0
abs_mul x y : |x * y| = |x| * |y|
abs_add x y : |x + y| ≀ |x| + |y|
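
A sketch of how these three are invoked, written with abs rather than the | | notation so that it does not depend on how that notation is set up in the course files:

```lean
import data.real.basic

example (x y : ℝ) : abs (x + y) ≤ abs x + abs y := abs_add x y
example (x y : ℝ) : abs (x * y) = abs x * abs y := abs_mul x y
-- `abs_pos` is an `↔`, so `.2` picks out the right-to-left direction.
example (a : ℝ) (ha : a ≠ 0) : 0 < abs a := abs_pos.2 ha
```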

And ring does not deal with division. A couple of things from the division API:

div_pos : 0 < a → 0 < b → 0 < a / b
lt_div_iff : 0 < c → (a < b / c ↔ a * c < b)
lt_div_iff' : 0 < c → (a < b / c ↔ c * a < b)

Oh, and while we’re here, the moment you start on division you have to start splitting into cases depending on whether the denominator is zero or not: Lean has made a design decision to allow x / 0 to make sense, but there are no theorems about it, so rather than division by zero giving an error it just gives a term you can’t use (think of it as “a random real about which we don’t know anything”). So knowing by_cases hc : c = 0 is a handy tactic trick — this splits into two cases depending on whether c=0 or c\not=0.
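
Here is a sketch of the division lemmas and of by_cases in action. The statements are made up for illustration, and div_self is the mathlib lemma saying a / a = 1 when a ≠ 0.

```lean
import data.real.basic
import tactic

example (ε : ℝ) (hε : 0 < ε) : 0 < ε / 2 :=
div_pos hε (by norm_num)

example (a b c : ℝ) (hc : 0 < c) (h : a * c < b) : a < b / c :=
(lt_div_iff hc).2 h

example (c : ℝ) : c / c = 1 ∨ c = 0 :=
begin
  by_cases hc : c = 0,
  { right, exact hc },         -- case c = 0
  { left, exact div_self hc }, -- case c ≠ 0
end
```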

Appendix: do like Insta and use filters

Every definition in Lean comes with a cost. Last week we saw some of this cost. We didn’t use Lean’s inbuilt group definition, we rolled our own, and then we had to write a bunch of lemmas before it was usable.

This week we’ve done the same — we’ve rolled our own is_limit definition and have had to prove a bunch of theorems about it. However this week it’s possible to link our definition to Lean’s own far more high-powered definition of a limit, using its tendsto predicate, which is a predicate on two filters and a map. Here are a bunch of two-line proofs of things we’ve been doing today, using Lean’s filter API (so the content of the proofs has not magically disappeared, it is just being done by invoking general theorems from mathlib and using things like the fact that addition is a continuous function on the reals):

import week_3.Part_A_limits
import topology.instances.real

open filter

open_locale topological_space

namespace xena

-- `is_limit` is equivalent to a `filter.tendsto`
lemma is_limit_iff_tendsto (a : ℕ → ℝ) (l : ℝ) :
  is_limit a l ↔ tendsto a at_top (𝓝 l) :=
begin
  rw metric.tendsto_at_top,
  -- assumed closing step: after the rewrite, both sides should unfold
  -- to the same ε-N statement, so `refl` should finish the proof
  refl,
end

-- this is `is_limit_add`
example (a b : ℕ → ℝ) (l m : ℝ) : is_limit a l → is_limit b m → is_limit (a + b) (l + m) :=
begin
  repeat {rw is_limit_iff_tendsto},
  exact tendsto.add,
end

-- this is `is_limit_mul`
example (a b : ℕ → ℝ) (l m : ℝ) : is_limit a l → is_limit b m → is_limit (a * b) (l * m) :=
begin
  repeat {rw is_limit_iff_tendsto},
  exact tendsto.mul,
end

end xena

I write about filters here, and I will say more about the filters at_top (neighbourhoods of infinity on the naturals) and 𝓝 l (neighbourhoods of l in the reals) next week.


Formalising mathematics : Workshop 2 — groups and subgroups

These are some notes on the second workshop in my Formalising Mathematics course, running as part of the EPSRC TCC. The Lean github repo is here.

Groups and subgroups

I start with an apology — there was far too much material last week. I’m still getting the hang of this. Hopefully this week there is a more manageable amount. There is probably still more than can be done in two hours.

In workshop 1 we filled in sorrys, and most of the time the definitions we worked with were things like injective, which is a straightforward “one-liner” definition. Here is the definition of injective from core Lean:

/-- A function `f : α → β` is called injective if `f x = f y` implies `x = y`. -/
@[reducible] def injective (f : α → β) : Prop := ∀ ⦃a₁ a₂⦄, f a₁ = f a₂ → a₁ = a₂

Apart from the @[reducible] tag and the slightly weird brackets, there are no surprises here: it is a one-line Lean definition, with a docstring (a string of text describing in human terms what the definition is about).

There was one more complex definition last week, which was the definition of a partition on a set, but I don’t think anyone got that far anyway, so let’s ignore it and talk about what we’re doing this week, which is to:

  1. define the concept of a group.
  2. make a basic API for groups.
  3. learn how to train Lean’s simplifier simp to prove simple theorems about groups.
  4. define the concept of a subgroup of a group.
  5. make a basic API for subgroups (specifically focussing on the subgroup generated by a subset).

Again I have no feeling for how far we will get. I’d like to think that most people will make it as far as 3 in the two hours we have. I had been using Lean for about a year before it dawned on me what the simplifier was for, so 3 is a nice target.

The material is in the src/week_2 directory of the formalising-mathematics github repository.

Defining a group.

We start by discussing the material in src/week_2/part_A_groups.lean. The mathematical definition we will formalise is the following. A group structure on a type G (a type is just Lean’s word for a set) is two collections of things. Firstly, it is the following three pieces of data:

  1. A function mul taking two elements of G and returning an element of G, with notation g * h.
  2. A function inv from G to G, with notation g⁻¹.
  3. A constant one in G, with notation 1.

Secondly, it is the following three assumptions, or axioms, or proofs, or functions, or however you want to think about them:

  1. mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c)
  2. one_mul : ∀ (a : G), 1 * a = a
  3. mul_left_inv : ∀ (a : G), a⁻¹ * a = 1

Internally these are represented in Lean as functions. For example one_mul is a function which takes as input an element a of G (or, as Lean would call it, a term a of type G — it’s the same thing) and spits out a proof that 1 * a = a.

In Lean the full definition in the repo looks like this:

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

Now mathematicians themselves don’t really know what the definition of a group is. Some people say that there are two more axioms mul_one : ∀ (a : G), a * 1 = a and mul_right_inv : ∀ (a : G), a * a⁻¹ = 1. Others say that these last two “axioms” can be deduced from the three we have already, so they are not axioms, they are theorems. It’s clear that the issue is not a mathematical one, but an infrastructure one. If we put more axioms in, then it will be easier to prove things about groups, but harder to prove that various things are groups. Given that I am assuming my audience is mathematically mature (this is not supposed to be a first introduction to group theory!) I am going to go for a minimal axiom approach, and the first thing we shall do is to prove mul_one and mul_right_inv. To do this we will need to prove a couple more lemmas first (or, at least, this is the way I have set things up). Feel free to experiment if you think you know a slicker way of proving these things; all that matters is that we get the two “missing axioms”. Here is the route I have taken in the Lean files:

mul_left_cancel : ∀ (a b c : G), a * b = a * c → b = c
mul_eq_of_eq_inv_mul {a x y : G} : x = a⁻¹ * y → a * x = y
mul_one (a : G) : a * 1 = a
mul_right_inv (a : G) : a * a⁻¹ = 1

In some sense, this is mathematically the hardest part of setting up a basic theory of groups, because not being allowed to use mul_one and mul_right_inv until we’ve proved them is rather unlike the way mathematicians usually deal with groups. It rather reminds me of when the Tokay pinch all of Link’s belongings on Crescent Island in Oracle of Ages.
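
To show what working from only the three axioms feels like, here is one possible proof of the first lemma on the route, as a self-contained sketch. The class is the definition quoted above, wrapped in a namespace so that it cannot clash with anything else; the calc proof is an illustration and not necessarily the one in the solutions.

```lean
namespace xena

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

namespace group

variables {G : Type} [group G]

-- Only `one_mul`, `mul_left_inv` and `mul_assoc` are available here.
lemma mul_left_cancel (a b c : G) (Habac : a * b = a * c) : b = c :=
calc b = 1 * b         : by rw one_mul
   ... = (a⁻¹ * a) * b : by rw mul_left_inv
   ... = a⁻¹ * (a * b) : by rw mul_assoc
   ... = a⁻¹ * (a * c) : by rw Habac
   ... = (a⁻¹ * a) * c : by rw mul_assoc
   ... = 1 * c         : by rw mul_left_inv
   ... = c             : by rw one_mul

end group

end xena
```

Notice that every step multiplies or cancels on the left, because right-handed facts such as mul_one are not yet available.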

Making an API

Once we have achieved the obvious goal of proving mul_one and mul_right_inv, the question is: what to do next. For reasons which will become apparent later, our next goal is to prove the following theorems (none of which are too hard at this point):

inv_mul_cancel_left : a⁻¹ * (a * b) = b
mul_inv_cancel_left : a * (a⁻¹ * b) = b
inv_mul : (a * b)⁻¹ = b⁻¹ * a⁻¹
one_inv : (1 : G)⁻¹ = 1
inv_inv : (a⁻¹)⁻¹ = a

You might want to take some time staring at the names of these results and noting that they are not just random computer science gobbledegook, there is method in the madness (in particular you should be able to figure out what mul_one and one_mul say without looking!) There are also a bunch of other random things which you can prove if you want, such as a⁻¹ = b⁻¹ ↔ a = b and so on — Lean’s maths library mathlib has many many basic statements about groups. The reason we have focussed on these five results above is that, if you put them together with mul_one, mul_right_inv and the axioms defining a group, then you get a confluent rewrite system! What is that, you ask?

The simplifier

Lean’s simplifier simp has a database of so-called “simp lemmas”, lemmas of the form A = B or A ↔ B. Given a random goal, simp will attempt to spot A’s in it, and will proceed to replace them with B’s. Note that order matters! Equality is not symmetric in this syntactic game (this is computer science, not mathematics). If you put the wrong lemmas in the database, this can be a disaster. For example if you put a = 1 * a in the database then the simplifier will start to replace elements g in goals with 1 * g, and then with 1 * (1 * g) and so on. However, putting 1 * a = a into the database is a really good idea, because cancelling a “1” is typically a sensible move in the game of mathematics. Tagging all of the ten lemmas above with @[simp] adds them to the simplifier’s database, and a theorem of Knuth and Bendix then tells us that the simplifier will then put every element of every group into some kind of unique canonical form (if you know about the theory of reduced words in free groups, it’s this, but with added care about exactly where the brackets are). In practice this means that the simp tactic is now guaranteed to solve silly questions like showing (a * b) * 1⁻¹⁻¹ * b⁻¹ * (a⁻¹ * a⁻¹⁻¹⁻¹) * a = 1, so you don’t have to mindlessly do all the rewrites manually.
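
As a minimal sketch of the mechanism, here are just two of the ten lemmas tagged as simp lemmas, followed by a goal which simp can then close. This is only a fragment of the rewrite system described above, so it says nothing about confluence; the goal was chosen to need only these two rules.

```lean
namespace xena

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

-- Add two of the axioms to the simplifier's database.
-- They are used left to right: `1 * a` becomes `a`, never the reverse.
attribute [simp] group.one_mul group.mul_left_inv

example (G : Type) [group G] (a b : G) : 1 * (a⁻¹ * a) * b = b :=
by simp

end xena
```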


Subgroups

If G is a type with a group structure (i.e., a group), then there is a type subgroup G of subgroups of G. The Lean definition is in Part_B_subgroups.lean in src/week_2 and it looks like this:

/-- A subgroup of a group G is a subset containing 1
and closed under multiplication and inverse. -/
structure subgroup (G : Type) [group G] :=
(carrier : set G)
(one_mem' : (1 : G) ∈ carrier)
(mul_mem' {x y} : x ∈ carrier → y ∈ carrier → x * y ∈ carrier)
(inv_mem' {x} : x ∈ carrier → x⁻¹ ∈ carrier)

In words, to give a subgroup of G is to give four pieces of information: a subset of G (a.k.a. a term of type set G) called the carrier, and then three proofs that it is closed under the three pieces of data used to define a group — i.e., proofs that the subset contains one and is closed under mul and inv.
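
For example, the whole of G should be a subgroup of itself. Here is a self-contained sketch of how one might build it from the four pieces of information; the name whole is hypothetical, and the definitions of group and subgroup are repeated only so that the snippet stands on its own.

```lean
namespace xena

class group (G : Type) extends has_mul G, has_one G, has_inv G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)

structure subgroup (G : Type) [group G] :=
(carrier : set G)
(one_mem' : (1 : G) ∈ carrier)
(mul_mem' {x y} : x ∈ carrier → y ∈ carrier → x * y ∈ carrier)
(inv_mem' {x} : x ∈ carrier → x⁻¹ ∈ carrier)

/-- The whole group, viewed as a subgroup of itself (illustrative name). -/
def whole (G : Type) [group G] : subgroup G :=
{ carrier := set.univ,
  -- membership of `set.univ` unfolds to `true`, so `trivial` proves it
  one_mem' := trivial,
  mul_mem' := λ x y hx hy, trivial,
  inv_mem' := λ x hx, trivial }

end xena
```

The carrier is data and the other three fields are proofs about it, which is exactly the split described above.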

Optional paragraph: One slightly weird thing to keep in mind, if you’re into the whole foundational stuff, is that in contrast to the way things are usually set up in type theory, a subgroup of G is a term, not a type. This is a bit weird because usually the idea is that types are sets, and terms are elements of the sets. So how come a subgroup of G is a term? It’s because in type theory every term has exactly one type, so if we have a subset X ⊆ G then we can’t have the same 1 being of type X and of type G. The way we have set things up, we will have 1 : G (so 1 is still a term of type G) but 1 ∈ X (so we will use the set-theoretic notation: ∈ is a relation which takes a term g of type G (an element of G) and a term X of type set G (a subset of G) and spits out a true-false statement (the statement g ∈ X)).

We start with some slightly tedious infrastructure (for example extending ∈ so that it applies to subgroups). I was tempted to skip this, but doing it gives you some idea of the “cost” of a definition. We prove random trivial stuff such as two subgroups are equal if and only if they have the same elements, and that subgroups are partially ordered by inclusion.

Closure of a subset

The main mathematical work we do in the file is to set up the theory of the subgroup generated by a subset. There are two ways to do this — a “bottom-up” way where we start with a set S and then throw in the identity and close up under multiplication and inverse. This is certainly possible in Lean, using inductive propositions, but I decided to do the “top-down” approach, where the subgroup generated by S is defined to be the intersection of the subgroups of G which contain S. We prove the following theorems (here ↑H is the underlying subset of G corresponding to the subgroup H of G):

subset_closure (S : set G) : S ⊆ closure S
closure_mono {S T : set G} (hST : S ⊆ T) : closure S ≤ closure T
closure_le (S : set G) (H : subgroup G) : closure S ≤ H ↔ S ⊆ ↑H
closure_closure (S : set G) : closure S = closure (closure S)
closure_self {H : subgroup G} : closure ↑H = H

These results are enough to prove that closure is a closure operator, that closure S is indeed the smallest subgroup of G containing S (that is, the smallest subset containing S and closed under the group structure maps 1, * and ⁻¹), and furthermore that closure and ↑ form a Galois connection. From these few lemmas the “machine can take over”: for example we can deduce for free that subgroups form a complete lattice, and hence the 100 or so theorems proved in Lean about complete lattices can all now be applied to subgroups for free.

Again I suspect that there is far far too much for a 2-hour workshop.


Formalising mathematics : workshop 1 — logic, sets, functions, relations

Here’s information about workshop 1 of the 8 workshops in my formalising mathematics course.

Mathematical background

At Imperial I lecture part of a standard “introduction to proof” course, a compulsory course for every beginning maths undergraduate. In my lectures we cover the basics of logic, sets, functions and equivalence relations. The idea is that the students see some abstract definitions (such as injective functions) and then some simple theorems (such as the composite of two injective functions is injective) and then try some problem sheet questions where they practice their skills at putting basic proofs together on paper.

My formalising course is for more advanced students (on paper it’s for PhD students but next year there’s a rumour that I will be teaching it to final year undergraduates), so I can assume that they know how to prove that the composite of two injective functions is injective in the traditional way. The goal of this workshop is to get students proving statements at this kind of level of difficulty, in Lean.

What I will assume

  • You have installed Lean 3 and the community tools e.g. by following the instructions on the Lean community website.
  • You know the pen-and-paper proofs of the results we’re talking about (e.g. that equivalence relations on a set are the same as partitions).
  • (optional but helpful): you’ve had a go at the natural number game so you know the basics of what it’s all about.

The workshop

Start by installing the course repository by following the instructions in the README on that page. NB if you skipped the Lean 3 installation part, then you can still play along by using the links on the README, but you will have a degraded (slower) Lean experience and I cannot guarantee that the links will continue to compile correctly as Lean and mathlib evolve; on the other hand, the course repository will always compile.

[ Note for nerds: you can’t just clone the repo. You have to install it correctly using leanproject as explained in the README if you want access to Lean’s maths library (which you will need for some of the questions). ]

Open the repository using VS Code as explained in the README (you must use the “open folder” functionality of VS Code to open the entire repo, don’t just open a file or else again you won’t have access to the maths library), and find your way to the Lean files in src/week1.

[note for people who have not actually downloaded and installed Lean or anything — you can still play along with this part, you can click here and then wait for a while until Lean stops saying “Lean is busy…” and switches to “Lean is ready!”. You’ll have to allow cookies I think, and you won’t get the colours, but you gets what you pays for here]

Open the first of the Lean files — part_A_logic.lean. You will see some sorrys in the file — they’re in red (and there are corresponding little orange dots in the area where the scrollbar is). Let’s take a look at the first sorry in this file — it’s on line 68. In this puzzle, P is a proposition, which is a computer-science word for a general true-false statement (in particular, P might not be true: 2+2=4 and 2+2=5 are both propositions in Lean). The puzzle is to prove that P\implies P. The sorry indicates that the proof is incomplete — it tells Lean that this proof is omitted right now.

Let’s figure out how to prove this theorem in Lean.

Your first Lean proof.

Here is the maths proof that we’re going to formalise. We are trying to prove that if P is any true-false statement, then P implies P. One proof of this is a boring “truth table” proof, where we just check the two cases where P is true and P is false. You can do this in Lean using the tauto! tactic, which will in fact solve pretty much all of the problems in this first logic file. But let’s solve it “constructively”, as the computer scientists call it.

Here’s the mathematical proof we’re going to formalise. Our goal is to show that P implies P. So let’s start by assuming that P is true. Our goal now changes into a new goal: we now have to prove that P is true. But this is exactly one of our assumptions, so we are done.

Let’s step through how this looks in Lean. We have our file on the left, and another window called the “infoview” on the right, which displays the tactic state (the goal and the hypotheses — i.e., the state of Lean’s brain). Right now the tactic state should look like this:

This means that P is a Proposition, and the goal (the statement after the sideways T) is to prove P\implies P. Now, back in the Lean file, under the -- Prove this using intro and exact comment, type intro hP, so it looks like this:

Don’t forget the comma. Just after you’ve typed the comma, look at the infoview. It has changed — it now looks like this:

We now have a Proposition called P, and a new hypothesis hP which is the hypothesis that P holds, i.e., that P is true. The goal has changed too — our goal now is to prove that P is true.

Note that hP is just a name. We could instead have typed intro Nigel and then Lean’s infoview would display Nigel : P. Beginners can get confused about the difference between P and hP; I will say more about the difference later on, although I could make the following cryptic remark now that P is what is called a Type, and hP is a term of that type. We’ll come back to this. Let’s concentrate on finishing the proof right now.

Our goal is now to show that P is true. But this is exactly one of our assumptions, namely hP. So we can finish the proof by adding exact hP, under the intro hP tactic, like this:

If we take a look at the infoview just after typing that comma, we see that a good thing has happened:

You just proved a theorem in Lean! But actually there is still something wrong: you can see that the sorry on the left now has a red underline on it. Clicking on sorry shows you the text tactic failed, there are no goals to be solved. in the infoview. We are apologising unnecessarily. Delete the sorry, the error goes away, and now the proof is complete. If you look down at the bottom left hand corner of VS Code you will now see something like this:

That means that there are no errors in your file, and 39 warnings. Each theorem whose proof still has one or more sorrys in generates a warning, indicating that you cheated. You should always try and keep your files so that they have no errors, and your job in the workshop is to try and decrease the number of warnings, by filling in sorrys with proofs.
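
Putting the steps above together, the finished proof might look like the sketch below. It is stated as an anonymous example with P as an explicit input, which is an assumption; the theorem in the course file may be named and set up differently. The comments record the tactic state one would expect to see in the infoview at each point.

```lean
example (P : Prop) : P → P :=
begin
  -- P : Prop
  -- ⊢ P → P
  intro hP,
  -- P : Prop
  -- hP : P
  -- ⊢ P
  exact hP,
  -- goals accomplished
end
```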

The difference between P and hP

If I were to say to you “What is Fermat’s Little Theorem?” then you would probably tell me that it’s the statement that a^p is congruent to a mod p, if p is a prime and a is an integer. If you were writing a proof and you were working mod p and had a number a^p, you could replace it with a and justify this in your work by saying you were using Fermat’s Little Theorem.

This is the way mathematicians speak. But something I noticed only after I started formalising is that here they are using the phrase “Fermat’s Little Theorem” in two different ways. When asked what the theorem is, the response above is a description of the statement of the theorem. But in the middle of a proof, if you want to invoke it, then you are really using the proof of the theorem. If I were to ask you what the Riemann Hypothesis was, then you might state it, but this doesn’t mean that you can just say “now by the Riemann Hypothesis” in the middle of a proof, at least not if you want your proof to be complete, because in the middle of a proof you are only allowed to use stuff which is proved in your arguments, not just stuff which is stated.

We saw hP : P above. One model of what is going on here is that P is the statement of the proposition we’re thinking about, and hP is the proof. I see beginners writing things like exact P in their proofs. But in the proof we have to use exact hP because we need to use the proof, not the statement. In fact what we are doing in this proof is constructing a function which sends proofs of P to proofs of P. Lean is a functional programming language, and under the hood the way Lean understands a proof of the form P\implies Q is as a function, or an algorithm, which takes as input a proof of P and returns as output a proof of Q. If you want to learn more about how Lean’s type theory models theorems and proofs, you can try this other blog post (it’s not too difficult), but you don’t really need to know about this stuff to get something out of this course.
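
The function point of view can be made completely literal. As a sketch, here are two proofs written directly as terms, with no tactics at all:

```lean
-- The proof of P → P is the identity function on proofs of P.
example (P : Prop) : P → P :=
λ hP, hP

-- A proof of P → Q really is a function: apply it to a proof of P
-- and you get a proof of Q.
example (P Q : Prop) (hP : P) (hPQ : P → Q) : Q :=
hPQ hP
```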

A second proof.

Let me go through one more proof. Let’s prove that P and P\implies Q together imply Q. The sorryed proof is around line 98 or so in the file (it might have moved a bit if you typed in the earlier proof). It looks like this:

The first surprise is that there is no “and” here. The way I have formalised this statement in Lean is like this: P\implies ((P\implies Q)\implies Q). In words, P implies that (deep breath) P\implies Q implies Q. This is of course logically the same as saying that P and P\implies Q imply Q. Also worth noting is that the Lean formalisation has a disconcertingly small number of brackets in. This is because the → symbol in Lean is right associative, which is just a fancy way of saying that if you see P → Q → R it means P → (Q → R).

Our goal is hence of the form “P implies something” so we can start with intro hP, (or intro Nigel or whatever you want to call the assumption that P is true). The goal now changes to the something, which is “(P implies Q) implies something”, so we can continue with intro hPQ, and now our tactic state looks like this:

We have two propositions, P and Q, and hypotheses hP (saying that P is true) and hPQ (saying that P implies Q). So, exactly as advertised, we know P and P\implies Q, and we want to prove Q. Next we need to learn a new tactic, the apply tactic.

apply is a difficult word. Humans apply all sorts of facts to deduce other facts all over the place in mathematics. In Lean, apply has a very precise meaning, and we are in precisely the situation here where we can use it. If you try apply hPQ, for your next move in this puzzle game, the tactic state changes to the following:

The difference: the goal has changed! Our goal used to be to come up with a proof of Q. But P implies Q, so if we apply this fact then we see that it suffices to come up with a proof of P instead. That is what the apply tactic does — it reduces our problem to a simpler one by applying an implication. You can only use the apply h tactic if h is an implication, or a series of implications, which ultimately imply your goal.

The rest is easy — exact hP, or even the assumption tactic (which closes any goal which happens to be one of your assumptions) will work.
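
Assembled in one place, the whole argument might look like this. As before, stating it as an anonymous example with P and Q as explicit inputs is an assumption about the set-up, not a copy of the course file.

```lean
example (P Q : Prop) : P → (P → Q) → Q :=
begin
  intro hP,   -- hP : P
  intro hPQ,  -- hPQ : P → Q, and the goal is now Q
  apply hPQ,  -- the goal changes from Q to P
  exact hP,   -- `assumption` would also work here
end
```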

Try the rest on your own.

The rest is up to you. I have put model solutions in the solutions directory. You will need to know a few more tactics than these three — I have explained them in the week 1 README. Thanks for coming to my course, and good luck!

Post workshop thoughts.

There was far too much material for a beginner to complete within 2 hours. Most people did not even finish Part A. In some sense, there was a lot of material designed to hammer home a few basic tactics again and again. I don’t think it matters if students don’t make it through the material. I’m going to press on next week and do basic group theory and subgroups.


Formalising mathematics: an introduction.

As part of the EPSRC Taught Course Centre I am giving a course on formalising mathematics. This is a course for mathematics PhD students enrolled at Imperial College London, Bristol, Bath, Oxford, or Warwick. No formalisation experience will be assumed. Details of the course are available at the second link above. I have been timetabled eight 2-hour lectures on Thursdays 4-6pm UK time, starting this coming Thursday, 21st Jan 2021.

My instinct in the first lecture would be to start by listing a bunch of reasons why learning how to formalise pure mathematics is interesting/useful/important/whatever, and perhaps also explaining how I got involved with it. But I could probably spend about 30 minutes on this, and I don’t want to waste valuable lecture time on it. In fact I won’t actually be giving lectures at all — the 2-hour slots will be mini Lean workshops, where beginners formalise mathematics they know, with me watching, and I cannot see the point of making the students listen to me waffle on about my opinions/history when, after all, they have chosen to come to the course anyway. So I’ve just decided to write the introduction here, and then students can choose to read it at their leisure (or not read it at all).

Note that I will also be posting notes for the eight lectures here on this blog, explaining the tactics and lemmas which students will need to do the problem sheets associated to each of the eight workshops. I am hoping that the course material itself can be used by other people if they want, e.g. by people wanting to teach Lean courses in other universities, or students wanting to self-study. But without more ado, let’s get on with the stuff which I have now decided not to say in lecture 1.

Formalising mathematics: Why?

Hello, and welcome to my course! I’m Kevin Buzzard, a lecturer at Imperial. For 20+ years I was an algebraic number theorist in the traditional sense, and my research work involved trying to prove new theorems in, broadly speaking, the Langlands Philosophy. Nowadays I work on something completely different: I am laying the foundations for teaching the Langlands Philosophy, and other aspects of serious modern mathematics, to computers. To make this switch, all I had to do was to learn a new language (a computer language) which was rich enough for me to be able to express my mathematical ideas in, and then it simply became a matter of explaining the mathematics I know and love to the computer; the computer checks the details of the proofs. In this course, we will be jumping right in. You will be actively learning this language, because you will be spending the vast majority of every lecture teaching basic undergraduate level mathematics (stuff you understand very well) to your own computer, and I or, hopefully, other students, will be helping you out when you get stuck. As the term progresses we might move onto harder mathematics: for example maybe we’ll do some MSc level algebra or algebraic geometry, maybe we’ll take a look at Patrick Massot’s sphere eversion project, or maybe we’ll explore the foothills of recent work of Clausen and Scholze. How far we get and what we’ll do in the last few lectures will depend very much on who is still attending the workshops after the first month.

A natural way to start such a course would be to spend half the first lecture explaining why I am investing my time giving such a course: what the point of it all is, and why I think this area is worth teaching. But such an explanation will eat into our precious formalising time, so I decided to write it up here instead. In this introduction, which I won’t be formally reading out in the first lecture, I will explain the following:

  • What happened to make me change area, and some mistakes I made at that time;
  • Why I think it’s important to teach young people (i.e. you lot) about formalisation of mathematics.

The nature of research mathematics

As I explained, I used to be a research algebraic number theorist, working in a technical area. More and more, my work would rely on results of other people, and it was becoming increasingly the case that I did not know the proofs of these results. This is the normal procedure in many modern areas of pure mathematics, as many of you will come to realise (if you didn’t realise already) during your time as PhD students.

Around 2017-18 I had some kind of a midlife crisis and decided that I did not want to proceed in this manner any more. I was seeing more and more evidence that some of these results that my colleagues and I were using were either not completely proved, or were completely proved but the complete proofs were not in the literature and may well never be in the literature. I was becoming increasingly concerned that the literature was incomplete and, in places, incorrect. Worse, I had begun to feel that there was a certain amount of failure to uphold what I thought were reasonable academic standards in number theory, and talking to experts in other areas I discovered that similar things were happening in several other branches of pure mathematics. By “failure to uphold academic standards” I certainly do not mean to imply that people are intentionally cheating the system. What I mean is that I felt, at that time, that the system was not scaling. Papers are getting longer. Areas are becoming highly technical. The fact that a journal is sitting on a technical paper with only a small pool of potential expert referees, and needs a report from a busy person (and that busy person has to make a judgement call on the correctness of a paper despite not having time to check the details correctly, and this person was sometimes me) means that some stuff gets through the net. I know papers, in my area, which have errors in. This is not a big deal, because I know, for the most part, how to work around the errors. Traditionally this is regarded as “to be expected”, but by 2017 I had become unhappy with the status quo. Did you know that there are two papers (this one and this one) in the Annals of Mathematics, the most prestigious maths journal, which contain results that directly contradict each other, for example?
No erratum was ever published for either paper, and if you chase up the web pages of the authors involved you will see that (at least at the time of writing) both Annals papers are still being proudly displayed on authors’ publication lists. The reasons for this are complex. The incorrect paper has got some ground-breaking ideas in, even if the main theorem is wrong. Journals seem to be unhappy to waste their time/pages publishing errata, and authors sometimes seem to be unhappy to waste their time writing them. The experts know what’s going on anyway, so why bother fixing stuff? Contrast this with the world of computer science, where bugs in programs are discovered and fixed, and the fixes are pushed so that future users of the programs will not run into the same problems.

Mathematicians tend to be smart people. How come they’re making errors?

Mathematicians think in pictures

I have a picture of the real numbers in my head. It’s a straight line. This picture provides a great intuition as to how the real numbers work. I also have a picture of what the graph of a differentiable function looks like. It’s a wobbly line with no kinks in. This is by no means a perfect picture, but it will do in many cases. For example: If someone asked me to prove or disprove the existence of a strictly increasing infinitely differentiable function f:\mathbb{R}\to\mathbb{R} such that f'(37)=0 and f''(37)<0 then I would start by considering a picture of a graph of a strictly increasing function (monotonically increasing as we move from left to right), and a second picture of a function whose derivative at x=37 is zero and whose second derivative is negative (a function with a local maximum). I then note that there are features in these pictures which make them incompatible with each other. Working with these pictures in mind, I can now follow my intuition and write down on paper a picture-free proof that such a function cannot exist, and this proof would be acceptable as a model solution to an exam question. My perception is that other working mathematicians have the same pictures in their head when presented with the same problem, and would go through roughly the same process if they were asked to write down a sketch proof of this theorem.
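One possible picture-free write-up of this argument is sketched below. It is only one way to do it, and it uses nothing beyond the definition of the second derivative as a limit and the fact that an increasing differentiable function has non-negative derivative.

```latex
Suppose $f:\mathbb{R}\to\mathbb{R}$ is strictly increasing and infinitely
differentiable, with $f'(37)=0$ and $f''(37)<0$.

Since $f$ is increasing, every difference quotient of $f$ is non-negative, so
\[ f'(x) \ge 0 \quad \text{for all } x\in\mathbb{R}. \]

By the definition of the second derivative and $f'(37)=0$,
\[ f''(37) = \lim_{h\to 0}\frac{f'(37+h)-f'(37)}{h}
           = \lim_{h\to 0}\frac{f'(37+h)}{h} < 0 . \]

Hence there is $\delta>0$ such that $\frac{f'(37+h)}{h}<0$ whenever
$0<|h|<\delta$. Taking $h=\delta/2>0$ gives $f'(37+\delta/2)<0$,
contradicting $f'\ge 0$. So no such $f$ exists.
```

The two pictures correspond to the two displayed facts: "increasing" forces the derivative to be non-negative everywhere, while "local maximum at 37" forces it to be negative just to the right of 37.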

I also have a picture in my head of an overconvergent modular form defined on a neighbourhood of the ordinary locus on a p-adic modular curve. This picture informed several papers I wrote earlier this century with Richard Taylor, Frank Calegari, and others. I was once privileged to be invited to speak in the number theory seminar at Orsay in Paris, and Jean-Pierre Serre was in the audience. I drew one of these pictures of mine on the board and Serre interrupted! He asked what the picture meant. I had drawn a picture of a compact Riemann surface of genus 3 and was drawing discs and annuli on the Riemann surface. The picture was however supposed to represent a 1-dimensional p-adic manifold (a rigid analytic space in the sense of Tate). It was a representation of the argument I was explaining, but because the object I was actually working with was p-adic, the drawing in some sense bore essentially no relation to the actual mathematical object I was working with. However, my Annals of Mathematics paper with Taylor and my follow-up Journal of the AMS single-author paper (which I was lecturing on at the time) were all evidence that my way of thinking about things, the pictures in my head, really could be translated down into rigorous mathematics, even though they were in some sense meaningless. They were effective guides. My picture came with caveats, which I had a mental note of (for example there are all sorts of subtleties with the “topology” on a rigid analytic space, issues which were solved initially by Tate in the 60s using Grothendieck topologies, and nowadays there are other solutions). These subtleties were not displayed in the picture I’d drawn on the board in Orsay, but I was aware of them. In short, I knew “how far one could push the picture” in some sense — which bits of it to take seriously.

I once found what I thought was an error, when refereeing a paper (the author of which was an expert). I could see where the error was coming from. There was a certain object being studied where my picture of the object was more sophisticated than that of the expert writing the paper. The author claimed that something was obvious. I could not see why it was obvious, so I consulted another expert. This expert I consulted (whose picture, or perhaps I should say “understanding”, of the situation was the most sophisticated out of all three of us) said that the result which the author claimed to be obvious was almost certainly true, they said they knew of no reference, and sketched an argument which they were confident could be turned into a proof with a fair amount of work. I relayed a brief summary of these ideas back to the author in my report (requesting revision) and in the second version of the paper they still claimed the result was obvious and attached a note saying that they could not see what the fuss was about. I then sent a much longer report explaining the problems in far more detail. In the next version of the paper which I was sent, the entire section containing the problematic argument had been rewritten and the difficult-but-probably-true result was no longer needed. This is a great example of the unreasonable resilience of mathematics. The author knew several proofs of the main result in that section; I had spotted an issue with one, so they simply supplied another one. In my experience, this is the way human-written mathematics routes its way round most errors in preprints and papers. But occasionally we’re not so lucky, and unfixable errors make it through the system.

The system is broken!

You might conclude from this, and other stories you’ll encounter throughout your PhD studies, that the system is broken. I did, and in 2017-18 I started to seriously consider the idea that perhaps using computer proof checkers might help to fix the problem. I have lost count of the number of times people told me to read Thurston’s On Proof and Progress in Mathematics; Thurston makes some extremely coherent arguments which one might interpret as being against formalisation of mathematics, because one naive interpretation of what formalisation is doing is that it is “removing the pictures”. Of course, one also has to remember that Thurston’s pictures were very accurate. Thurston was responding to an article by Jaffe and Quinn called Theoretical Mathematics, and anyone who reads Thurston’s paper should also read Jaffe–Quinn, to see what inspired it. The Jaffe–Quinn paper also makes some very good points. They, like me at that time, were concerned. In 2017 I had independently started to learn about computer proof checkers because I had become interested in their potential use as a teaching tool, and I had some grandiose ideas about how everything could be put together. I went on a “UK tour” in 2018, giving lectures with titles like “Pure mathematics in crisis?” at several serious UK universities, and I made many provocative statements in an attempt to get a dialogue going about these issues. I also invested a lot of my time into becoming an expert in using one particular computer proof checker, called Lean (other computer proof checkers are available, e.g. Coq, Isabelle, Metamath, Mizar, Agda, HOL 4, HOL Light and many other systems). With a team of undergraduates at Imperial we taught Lean what a scheme was (the idea discovered by Grothendieck which revolutionised algebraic geometry in the 1960s), and I quickly realised that modern proof checking systems were now capable of handling serious modern research mathematics.
Could these systems somehow be used to fix our broken system?

The system is not broken!

However, I now believe that the system is not broken at all. Rather, this is just the nature of mathematics as done by humans. What I did not realise in 2017 was that mathematics has always been like this. Humans are fallible. Philosophers and historians of mathematics have put me right on this point. What I also did not realise in 2017 is that I was going to have to come to terms with the fact that perhaps 20 years of administration and childcare meant that I was no longer a top expert in the Langlands philosophy, and the fact that I now was having problems with the literature certainly did not mean that the top experts were. The top experts know where the problems are, and why some things which seem like problems are not problems. Furthermore, they would probably be happy to explain things to you, if you raise specific issues of concern, and thus give them a reason to take you seriously. Thank you very much to those that did this for me.

So…why formalise?

So, if formalising mathematics in a computer proof checker is not going to save pure mathematics, and if indeed pure mathematics does not even need saving, then why should a pure mathematician bother with using computer proof checkers at all? Well, the thing is, in stark contrast to my naive 2017 self, I have now seen what these systems can do, and it is now manifestly clear to me that these systems can change mathematics for the better. These systems can digitise mathematics. Why should we want to digitise mathematics? Is it not obvious to you? It will enable us to use it in different ways. It has the potential to change mathematics completely. Let me give you an analogy.

In 1992 I, very reluctantly, bought my first CD player. At that time I had hundreds and hundreds of vinyl records (those of you at Imperial might well have seen a bunch of them in my office), and each one I regarded as a thing of beauty. CDs were ugly (the artwork looked much less good), breakable (we had been told during the marketing campaigns that you could spread jam on them and they wouldn’t get damaged, but I had CDs which jumped like crazy and one of them had even started to rot in some weird way), and some people were arguing that the switch to digital meant that they did not sound as good as vinyl (although I will be honest and say that my ears were never good enough to tell the difference, at least when it came to the kind of noisy guitar rock and early breakbeat/drum and bass which I was listening to at that time). What I (and many others, I suspect) at the time had not realised was that the crucial switch was not from vinyl to CD, it was from analogue to digital. Nowadays, essentially all of the music which I consume is digital, even though essentially none of it is on CDs. The switch to digital has made music more portable. It means that during lockdown I can access new music instantly, send YouTube links to my friends, and my son can create and manipulate music to make new kinds of sounds using his laptop. Digitising music was a game-changer. Essentially nobody really realised the extent of this in 1992, indeed at the time it just seemed to me to be a ploy by The Establishment to make me re-purchase music I had already paid for.

Digitising mathematics is going to be a game-changer in the same way. Digitising mathematics changes the way that it is consumed. Digitising mathematics turns it into a new kind of picture: some kind of directed acyclic graph of terms and functions in a type theory. This kind of picture is of no direct use to humans. Patrick Massot made a picture of the graph corresponding to a perfectoid space, and it is just, to quote A A Milne, a “confused noise”, at least for humans. However it is a picture which computer proof systems can understand very well. Unless you are living under a rock, you will know that artificial intelligence and machine learning are changing the world. IBM made a computer program which could beat a grandmaster at chess. DeepMind made a computer program which could beat a grandmaster at Go. Proving theorems in pure mathematics is a very natural next step. When will computers start to compete with humans in this game? Not for a while yet. But this is starting to happen. However, AI works best with a database, and those databases are not yet there. Those that we have are deeply flawed for a number of reasons. For example, one mathematical database we have is a formalised list of many, many theorems about the smallest finite non-solvable group of odd order: a substantial collection of results which a machine learning program could learn from. The last entry in that database is the theorem that there is no finite non-solvable group of odd order. This is the famous Feit–Thompson theorem, formalised by a team of 20 researchers over a six year period. I ask then: what use is that database of theorems to an AI? Hundreds of theorems, some with quite technical proofs, about a group which does not exist. These theorems are very important for the proof (they are the proof), but are they important to an AI?
The Gonthier et al work on the Feit–Thompson theorem is extremely important, because it is unquestionable evidence that the systems are ready to handle a 400 page proof, at least when it is a proof which only involves low-level objects like finite groups. But to an AI researcher it seems to me that this database has problems, as it spends a lot of time developing a theory for an object which doesn’t exist. This is by no means a criticism of the Gonthier et al work! I do not know of a single database of mathematical theorems which I would consider remotely adequate for machine learning. Where is our analogue of the thousands of grandmaster games which IBM’s chess computer Deep Blue trained on? My comments are, at least implicitly I guess, a criticism of the mathematical community itself. Isn’t it about time that we supplied these AI experts with a database of theorems and conjectures about schemes, manifolds, automorphic representations, Shimura varieties and so on? Things which human mathematicians are working on in 2021? The systems are ready; this is what people like Gonthier and Hales have shown us.

Digitising mathematics is not just useful for the AI researcher. Digitising mathematics guarantees a basic level of accuracy, which is extremely helpful for the learning experience. I have still not forgotten the example sheet question I spent five hours on as an undergraduate, which asked me to prove that if a topological space had various nice properties then it was metrizable. I finally gave up, went to my supervision, showed my supervisor the problem, and they instantly responded “Oh that’s not true at all! The Stone–Čech compactification of the natural numbers is a counterexample”. Having never heard of this object at the time, I felt rather cheated. I have been formalising my undergraduate problem sheets over the last couple of years, and “edge cases” such as how a claimed result might become untrue/meaningless when some of the variables take extreme values such as zero are now completely eliminated. I am not saying that this is a reason to learn how to formalise. I am however saying that once more mathematics is formalised, people will take what we have and begin to consider doing things like creating online interactive textbooks and problem sheets, where students will be able to solve problems perhaps in some “front end” language. These tools and utilities will begin to appear as more and more mathematics is digitised and the software becomes more and more normalised in mathematics departments.

Let me finish by going back to pictures. Thurston’s ideas of the importance of humans turning mathematics into pictures and then using their intuition to manipulate these pictures can be interpreted as an argument for the importance of fluidity in thought, and hence an argument against formalisation. But imagine arguing that coordinate geometry should not be used to study Platonic solids because making Platonic solids out of clay clearly teaches you more about them. For example, testing the hypothesis that unit tetrahedra can be used to tessellate 3-space is far more easily done with physical tetrahedra than with a pile of formulae satisfied by the coordinates of the vertices in 3-space. However, without 3-dimensional coordinates there would be no 3d computer modelling software Blender, and using Blender (if you know what you’re doing) it is also very easy to check that unit tetrahedra do not fit together in a neat way to fill 3-space. When Descartes came up with his coordinates, he was in no position to imagine Blender. Maybe we are in no position in 2021 to imagine the kind of ways which a computer can be used to show us how to visualise various mathematical objects. But if people like me start teaching people like you the basics of how this stuff works, perhaps you will start having your own insights about what can be achieved in the future.

You can’t stop progress. Undergraduate and MSc mathematics is going to be formalised, and then who knows what will happen. Maybe the statements of the theorems in the Stacks project will be formalised — we’re working on it. Maybe the proofs of some of the theorems in the Clausen-Scholze work on a new variant of topological spaces will be formalised — we have made a start, and every day (at least at the time of writing) people are talking about formalising condensed mathematics on the Lean Zulip chat. These projects indicate that these systems are capable of understanding modern mathematics. What will they do with this understanding? I don’t think any of us are in a position to guess. But I want to find out sooner rather than later, and this is why I want to show young people like you what is currently possible — so you can prepare to dream.


Lean Together 2021

Thank you so much to Rob Lewis and Patrick Massot for organising Lean Together 2021! This annual Lean conference, like mathlib, seems to double in size every year. This year over 200 people registered, and even though it was all online and there were a lot of talks spanning everything from research level mathematics to hard-core computer science (so nobody would understand all of them), there still seemed to be between 50 and 100 people attending every talk. I attended all of them, but certainly didn’t follow all of them; I’ll discuss some of the talks that I feel qualified to talk about. Anyone jealous that they missed out can catch up with essentially all of the talks on the leanprover-community YouTube channel.

Most or all of the talks mentioned below have slides and/or video available. Check out the schedule for links.

Overview of some of the talks.

Do your theorems use all their assumptions? Is your theorem about fields actually true more generally for division rings? Or even rings? Or even semirings? Lean now has a linter which will check these things for you, written by Alex Best and showcased in his talk. This is important for Lean’s maths library because, like Bourbaki, we strive to prove theorems in the correct generality, and now we have a tool which checks that this is happening.

Floris van Doorn has formalised the theory of the Haar measure in Lean. This is the sort of thing which really excites me. Haar measure is a measure on a locally compact Hausdorff topological group which is translation-invariant. I read the proof of existence and uniqueness of Haar measure when I was teaching an old course on Tate’s thesis. The proof is long, and tricky in places, but fairly low-level. It’s just the kind of thing which one might imagine “sure this could be checked by a computer — in theory”. Floris has shown that with Lean it can be checked by a computer in practice. The systems work and this is why we mathematicians should be using them.

Heather Macbeth did something equally great even though to a non-expert it might look less impressive. She gave the n-sphere the structure of a manifold. The reason this is important is that it’s essentially the first non-trivial example of a manifold that we have. Manifolds are really complicated! Sorting out all the theory of charts took a while, and the fact that Heather can make a sphere into a manifold is, for me, strong evidence that the Lean analysts (Heather included) have got the set-up right. Future work: use the inverse function theorem to make zeros of functions in affine space into manifolds. This stuff is hard work. But it’s happening.

I remember when the world wide web was born. Academics made web pages, which consisted of text, pictures, and hyperlinks to more web pages. People could see the potential, but it took many more years before the web became accessible to the masses, and web pages became powerful and flexible tools. Ed Ayers talked about widgets, his addition to core Lean which enables pictures and links to be embedded in Lean output in VS Code, and enables the user to inspect the types of terms and subterms in Lean output, jump to definitions and more. This has got huge potential, and it will be interesting to see it realised. Students on the Xena Discord have already used them to make graphical logic games. There is more to come here.

Chris Hughes talked about his group theory tactic. It is now fast, and solves problems in group theory of the form “show this finite set of equations implies this one”. In fact he has written several tactics which do this; time will tell which techniques are most useful in practice.

The blue-eyed islanders puzzle is something we’ve talked about on the blog before. One has to think hard about how to formalise the question mathematically. Paula Neeley talked about her work formalising a framework for dynamic epistemic logic in Lean, which will enable us to work on questions such as this in a much less ad-hoc way.

Kenny Lau talked about his work formalising the Deligne/Fontaine–Wintenberger/Scholze theory of tilting, part of his Imperial MSc project. Working with Kenny has been an interesting experience. When Massot, Commelin and I were working on perfectoid spaces we formalised a very general definition of a valuation, taking values in an arbitrary totally ordered abelian group with zero; this is what is needed to formalise the theory of adic spaces. For the theory of tilting, Kenny and I have realised that actually we need an even more general theory — in Deligne’s paper on fields of characteristic p which are “limits” of fields of characteristic zero, he uses valuations taking values in certain monoids with zero. I’ve been working on refactoring the definition and we are nearly there.

Yasmine Sharoda talked about her work on building algebraic hierarchies. This is a very “computer science” problem, and quite hard to explain to mathematicians. Basically mathematicians carry around in their head a big web of trivial facts: a scheme is by definition a locally ringed space, which is by definition a ringed space, which is by definition a topological space. A ring is by definition a semiring, a monoid, an additive group, an additive semigroup, and so on. Computer programs like Lean need some kind of organisational principle to mean that mathematicians can move freely between these ideas without noticing that they’re doing so; either that, or libraries suddenly experience an irrelevant growth factor when the theorem that 0+x=x has to be proved for additive monoids, groups, semirings, rings, division rings, integral domains and fields. This is a problem which I do not understand the details of, but I have now seen sufficiently many talks on it to understand that computer scientists are concerned about this issue. I think they are in some sense planning for the future though: Lean has no trouble working with schemes or fields right now.
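To make the 0+x=x point concrete, here is a small Lean 3 sketch, assuming mathlib is available via import tactic. The lemma zero_add is stated once in mathlib for a general additive structure, and the typeclass hierarchy lets it apply unchanged to a field. The type names M and K are purely illustrative.

```lean
import tactic

-- One lemma, proved once at the general level...
example {M : Type*} [add_monoid M] (x : M) : 0 + x = x := zero_add x

-- ...and reused without change for a field, because the typeclass
-- hierarchy knows that every field is in particular an additive monoid.
example {K : Type*} [field K] (x : K) : 0 + x = x := zero_add x
```

Without such a hierarchy, the second example would need its own separately proved copy of the lemma, and so would every structure in between.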

Adam Topaz talked about his work formalising theorems in anabelian geometry. There is a problem here — beyond a certain point one needs the theorems of local and global class field theory, and the proofs of these things are long and difficult. My proposed solution: let’s just formalise the statements, and worry about the proofs later. The sooner we start doing interesting maths like what Adam is attempting, the better.

Peter Nelson talked about formalising matroids. This is hard, for the wrong reasons. It seems to me that type theory is ideal if you are formalising things which can be expressed in the language of category theory (i.e., if you have objects and morphisms). Matroids are combinatorial objects and a fundamental construction stressed by Peter was that of a minor of a matroid. This is some kind of subobject, but there seems to me to be no natural general notion of a morphism of matroids. One needs to talk about minors of minors freely, and this can start to get convoluted if one is not careful. One wonders whether this kind of stuff would be easier to do in a set-theory based system. One approach would be the “Coq odd order theorem” approach, where you work in a large matroid and everything is some kind of submatroid of this (groups in the Coq odd order work were all subgroups of a large ambient group which had no mathematical relevance — it was just a way to hold things together). But I really don’t want to go down this route. Kenny Lau seems to have solved the problem of subfields of subfields using an elegant is_scalar_tower structure; I wonder if a similar idea can be used here. I’ve realised that matroids are a great example to bear in mind when one is thinking about solving these type-theoretic issues practically.

Marie Kerjean talked about her work on the foundations of complex analysis in Coq. We have essentially no complex analysis in Lean, although there is a ton of the stuff in Isabelle/HOL. I’m sure that the Lean community will be able to learn from her efforts when we turn our attention to these matters.

Damiano Testa is well on the way to proving that the image of a constructible set under a finite type morphism of schemes is constructible. What we are missing right now is the definition of a product of schemes, so he has been working in the affine case for now and has essentially finished that part. I remember ploughing through the construction of a product of schemes in Hartshorne when I was a PhD student. It will be fun to do it in Lean — this is the kind of stuff which should turn out nicely.

Amelia Livingston talked about the nature of universal objects. This talk had some ideas which I’m still wrestling with. Here’s a funny observation, which goes back to Patrick Massot. Say R is a commutative ring and S is a multiplicative subset. How does one prove that the kernel of the canonical map R\to R[1/S] is the set of elements of R annihilated by some element of S? This can’t be hard. I don’t know how to deduce it directly from the universal property of R[1/S] though. The only proof I know involves having to make the usual explicit construction of R[1/S] as R\times S/\sim and to prove it for this specific model. Is this a surprise? Well, if I told you that you could assume that the real numbers were a complete ordered archimedean field (the universal property) and then I asked you to prove a fact about the real numbers, and it turned out that to prove it you had to use the construction of the reals as, say, Cauchy sequences, I think you would be surprised. The universal property is supposed to characterise the object up to unique isomorphism and save us the trouble of having to build it every time we need it. Amelia in her talk explained that she had discovered something even weirder. Despite the fact that mathlib has both a construction of the tensor algebra T(M) of an R-module M and a proof of its universal property, the statement that M^{\otimes n}\to T(M) is injective seemed to follow from neither the universal property nor the construction in mathlib, so she had to make a second construction. This is like coming up with an assertion about the real numbers which you can’t prove from the axioms, or from the construction as Cauchy sequences, so you have to make them again as Dedekind cuts before the result becomes clear. What is going on here? Amelia is working on Koszul complexes in Lean as part of her MSc project and this is where the issue came up.
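For concreteness, here is how the kernel computation goes in the explicit model. This is the standard textbook argument, written with the usual equivalence relation on R\times S; the auxiliary element t and the assumption that 1 lies in S are the conventional ones, not something taken from the talk.

```latex
Define $(r,s)\sim(r',s')$ on $R\times S$ if and only if there exists $t\in S$
with $t\,(r s' - r' s)=0$, and let $R[1/S]=(R\times S)/\sim$. The canonical
map sends $r$ to the class of $(r,1)$, and zero is the class of $(0,1)$. So
\[
  r\in\ker\bigl(R\to R[1/S]\bigr)
  \iff (r,1)\sim(0,1)
  \iff \exists\, t\in S:\ t\,(r\cdot 1-0\cdot 1)=0
  \iff \exists\, t\in S:\ t r=0 .
\]
```

Every step here unfolds the definition of the model, which is exactly why the argument does not obviously transfer to a proof from the universal property alone.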

Alena Gusakov, one of the moderators of the Xena Discord, talked about her work with Bhavik Mehta (another moderator) and Kyle Miller on formalising all the forms of Hall’s Marriage Theorem that you’ve ever heard of. Graphs are another one of those objects which don’t naturally fall into the arms of category theory (what is a morphism of graphs? I did an entire graph theory course as an undergraduate and saw some deep and tricky theorems, but this concept was not mentioned once). Like matroids, graphs get manipulated (edges get removed or contracted) rather than morphismed, and for some reason this makes them more difficult to use in type theory. What is going on here? This deserves a better explanation.

Thomas Browning and Patrick Lutz are PhD students at UC Berkeley, and, building on work of undergraduates at Imperial, they have formalised the fundamental theorem of Galois theory in Lean and are on their way to proving insolvability of the quintic, problem 16 of the Formalizing 100 Theorems challenge. To my astonishment, this is one of the problems which has not been formalised in any theorem prover, although Cyril Cohen tells me that Sophie Bernard, Pierre-Yves Strub, Assia Mahboubi and he are working on it in Coq. [Update: the Coq team have finished!]

There were other talks, but on things I understand less well. Yury Kudryashov talked about dynamics of the circle. Mario Carneiro talked about Metamath 1. Vaibhav Karve talked about axiomatising various theories he’s been thinking about such as model theory. Jannis Limperg talked about tactic-writing. Stan Polu and Jason Rute talked about Lean and AI (including demonstrations of an AI program solving levels of the natural number game!), and Koundinya Vajjha, Joe Hendrix and Logan Murphy all talked about formalisation projects which are way beyond my area of expertise.

Oh, and Leo de Moura and Sebastian Ullrich released Lean 4, and gave several talks demonstrating what it did. A lot of people were excited about this, but for me Lean 4 is currently unusable (there is no maths in it). I’m not going back to proving (x+y)^3=x^3+3x^2y+3xy^2+y^3 directly from the axioms of a ring, and unfortunately I am not capable of doing any of the things which are required in order to make Lean 4 have a ring tactic. However, there were a bunch of people at the conference who clearly are capable of doing these things, including Daniel Selsam, who is thinking about how one can import compiled Lean 3 files into Lean 4 (Lean 3 and Lean 4 share the same type theory). Once we have this, I will be extremely interested. And hopefully it won’t be too long.


I counted no fewer than four occurrences of the phrase “it’s fun!” in various slides in the talks, and there could have been more. This was within the context of people asking “why formalise X?”. Typically it would be one of several answers. I did not speak at the conference, but had I done so I suspect there would have been a fifth occurrence. Imagine people putting “it’s fun!” in the slides in a traditional number theory talk! Sure it’s interesting to do research into modern number theory — this was something I did with great verve and enthusiasm for nearly 25 years, and I suspect I probably was having fun some of the time. But “fun” is not a word I ever remember running into in a presentation. Why are people highlighting it here? Perhaps we are more conscious of the fact. Is it guilt that it’s supposed to be work but actually we’re really enjoying it? Sometimes I wake up in the middle of the night and want to get out of bed and formalise. Maybe “it’s addictive” is closer to the truth?

Whatever the real reason is behind it, Lean’s maths library continues to grow. In 2020 several more professional number theorists started formalising in Lean (Riccardo Brasca, Damiano Testa, Marc Masdeu, Filippo Nuccio) and stuff like the theory of cyclotomic polynomials has just appeared organically in mathlib as a result of this. Johan Commelin continues to work tirelessly on this stuff too; my PhD student Ashvni needed Bernoulli numbers for a project she’s working on, and it turned out that Johan had formalised them a few months earlier. As long as things like this keep happening, mathlib will more and more start to look like a respectable basis for a lot of MSc level mathematics. Perhaps my forthcoming Lean course for PhD students (starting in 10 days; more info to come) will get us a few more converts.

Thanks once again to Patrick and Rob for organising the conference.

My daughter says she hasn’t got any art ready, so here’s something one of my sons did a few years ago when he was learning Blender.

The end of the summer.

So, what’s been happening? Well, the Lean 4 developers told us in mid-June that Lean 4 would be released at “the end of the summer”. And on Monday Lean 4 got released! The corollary is that it’s the end of the summer now, so it must finally be time to talk about the summer projects, and some of the things which happened since then.

Summer projects

In stark contrast to the 2018 Lean summer projects, the 2020 projects all happened online, on Discord. I had far less control of what was going on, and (something which I rather liked) I ended up working with a bunch of people, some of whose real name, race or gender I didn’t know. I say I ended up working with them; what happened in practice was that they worked with each other. A community was born, people did Lean stuff. People did other stuff too. The community still exists. I didn’t really expect that. Undergraduate mathematicians welcome. So, what got done?

Harun Khan proved in Lean that 144 is the largest square in the Fibonacci sequence, following an ingenious argument of Cohn from the ’60s. As far as I know this result has never been formalised before.
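A brute-force search is of course no substitute for Cohn’s argument, but it is a quick way to see the statement in action. The sketch below (my own helper, nothing to do with Harun’s Lean code) lists the perfect squares among the first few hundred Fibonacci numbers.

```python
from math import isqrt


def fibonacci_squares(count):
    """Return the distinct perfect squares among F_0, ..., F_{count-1}."""
    squares = set()
    a, b = 0, 1  # F_0 and F_1
    for _ in range(count):
        if isqrt(a) ** 2 == a:  # exact integer test, no floating point
            squares.add(a)
        a, b = b, a + b
    return sorted(squares)


print(fibonacci_squares(300))
```

The search only ever finds 0, 1 and 144; the theorem says that this remains true however far you go.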

Xiang Li formalised a crazy construction of the reals directly from the integers, called the Eudoxus reals. The idea is that the reals are equivalence classes of almost-linear maps from the integers to the integers (a function f: \mathbb{Z}\to\mathbb{Z} is almost linear if f(a+b)-f(a)-f(b) is bounded). Once the mathlib PR characterising the reals as the unique complete archimedean field hits, he’ll be able to get an isomorphism from his reals to mathlib’s reals for free.
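For a feel of what “almost linear” means, take f(n)=\lfloor n\sqrt{2}\rfloor, which should be the Eudoxus real corresponding to \sqrt{2}. The sketch below is my own illustration (not Xiang’s code): it computes f with exact integer arithmetic and collects the values of f(a+b)-f(a)-f(b) over a box of inputs.

```python
from math import isqrt


def f(n):
    """floor(n * sqrt(2)), computed exactly with integers."""
    r = isqrt(2 * n * n)  # floor(|n| * sqrt(2))
    # For n < 0 the number n*sqrt(2) is not an integer, so floor(-x) = -floor(x) - 1.
    return r if n >= 0 else -r - 1


def defects(bound):
    """All values of f(a+b) - f(a) - f(b) for |a|, |b| <= bound."""
    span = range(-bound, bound + 1)
    return {f(a + b) - f(a) - f(b) for a in span for b in span}


print(defects(50))
```

The defect only ever takes the values 0 and 1, so f is almost linear even though it is not additive.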

Elizabeth Sun figured out which natural numbers were the sum of two squares. It’s great to see some basic number theory being done in Lean. I am a number theorist but for some reason a lot of my Lean work is in commutative algebra rather than the elementary stuff. Thank you Elizabeth and Harun for flying the number theory flag 🙂
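The classical answer, which I am assuming is the statement in question since it is not spelt out above, is that n is a sum of two squares exactly when every prime congruent to 3 mod 4 divides n to an even power. Here is a hedged sketch comparing that criterion with brute force; both function names are made up for the illustration.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = a^2 + b^2 for some integers a, b >= 0?"""
    for a in range(isqrt(n) + 1):
        rest = n - a * a
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def criterion(n):
    """Every prime p = 3 (mod 4) divides n to an even power (0 counts as a sum)."""
    if n == 0:
        return True
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if p % 4 == 3 and exponent % 2 == 1:
            return False
        p += 1
    # What is left is 1 or a single prime occurring to the first power.
    return n % 4 != 3


print([n for n in range(30) if is_sum_of_two_squares(n)])
```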

James Arthur proved some theorems about some kind of twisted variants of sine and cosine. James is an undergraduate at Exeter and is currently in charge of marketing down there 😉 Thanks James.

People learnt about foundations. James Palmer, an undergraduate at Warwick, ended up writing an essay on it. Foundations are not something I know much about, but James is a maths/philosophy student so had a particular interest in it.

People made games. Kendall Frey made a Rubik’s cube (although he’s not an undergraduate); Markus Himmel made formally verified Sudoku, and more recently Angela Li has made a Tower of Hanoi. All these things are possible because of Ed Ayers’ widgets, part of Ed’s Cambridge University PhD thesis. Angela is now working on a 15 puzzle and thinking about “loopy” and other games from Simon Tatham’s puzzle game collection.

The summer has dragged on. People keep making stuff. Alena Gusakov and Bhavik Mehta worked hard on graph theory over the summer, initially formalising some of the problem sheets from the graph theory course Alena had attended, and now this work has culminated in a formal proof of Hall’s Marriage Theorem that Alena has put together with Bhavik Mehta and Kyle Miller! Alena gave a talk on that at Lean Together 2021 — more of that in the next post.

MSc projects

There was some sort of blurring between work done over the summer and work which is now becoming MSc projects. Chris Hughes made a group theory tactic! Given a collection of hypotheses, the tactic will try and prove another one. For example if a,b are elements of a group, the tactic will prove abab^2=1\implies ab=ba (try it online, or on paper if you’re still living in the past). What’s even better, after some discussions with Kyle Miller he made another tactic, implementing a different algorithm to solve the same problem. This work started over the summer but has now turned into his MSci project (he’s an MSci student at Imperial). Similarly Kenny Lau has defined the tilt of a perfectoid field, and Amelia Livingston has got a working API for tensor and exterior algebras and is on the way to Koszul complexes. All of these will become Imperial MSci projects.
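For those still living in the past, here is one possible paper proof of the group theory example (spoiler alert). It is my own argument and says nothing about how either tactic actually works.

```latex
\begin{align*}
abab^2 = 1 &\implies (ab)^2\, b = 1 \implies b = (ab)^{-2} \\
           &\implies b\,(ab) = (ab)^{-1} = (ab)\,b
             \quad\text{(a power of } ab \text{ commutes with } ab\text{)} \\
           &\implies bab = ab^2 \implies ba = ab
             \quad\text{(cancel } b \text{ on the right).}
\end{align*}
```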

All this has been a really positive change to my work life. For 20 years I’ve supervised MSc/MSci students in the maths department and many of the projects have been of the form “here’s a Springer GTM; now rephrase the relevant chapters”. I have grown tired of such projects. My colleague Toby Gee has started to give students research papers to read instead, which is brave, but still ultimately the results are a student giving an exposition of something which is there already. I now have students who are not only learning the material but also formalising it in Lean, and some of this stuff ultimately makes it into mathlib. My previous students have formalised schemes, group cohomology, and transcendence of e. The schemes work has quite a history, and several people were involved. Our account of what happened is now on ArXiv. It’s a joy to be writing papers with undergraduates and MSc students, and something I’m rather proud of. Before I moved into this area, such an idea was essentially inconceivable.

Undergraduate/PG teaching

I spent September trying to learn how to teach online. I made lectures for my undergraduate course, some of them cheekily using Lean. But Lean is not a compulsory part of my course; I do not think I really have the resources to teach almost 300 students how to use some complicated software; I have to rely on the fact that students will get interested and become engaged. In the introductory course, I teach the students about sets, functions and equivalence relations, and then Marie-Amélie Lawn teaches them about how to build the naturals and the reals from scratch in an axiomatic way. A lot of this material is very Lean-friendly. Welcome to the new Imperial members of the Xena community: Aksel, Jack, Deepro, Deniz, Jia, Archie, and all the other new 1st years who came along and whose names I’ve now forgotten. These kids show up on Thursday evenings on the Discord and we work through undergraduate problem sheets together. Is this good for them? Is it teaching them mathematics in a new way? Is it making them learn better, or understand the material better? I don’t know. Maybe! Is it fun? Yes. Does it change the way these undergraduates think about mathematics? Yes. I am convinced it makes them think more clearly. Students who engage with Lean seem to be more careful about their logic, and more careful about pointing out special cases. This can’t be a bad thing. But is this because they’re using Lean, or are they drawn to Lean because they are already the kind of people who think like that? I don’t know.

Should I make Lean a compulsory part of my 1st year undergraduate course? I am not convinced. People did not come to Imperial to do mathematics in a computer proof system. Some undergraduates have no interest in using computers at all. Athina Thoma, an education expert, told me that perhaps it is difficult for a student to learn new mathematics and to learn Lean at the same time. I think this is a very pertinent comment. So why do I even put time into this optional component of my course? Because I think it is time that this area begins to grow, possibly in ways I’ve not thought of yet, and the best way to make it grow is to make sure that a bunch of smart young people know about it. This is the motivation behind the Xena project. Once mathematicians can use this software, they’ll figure out interesting things to do with it. The killer apps will come.

As a reaction to Athina’s comment, I thought it might be interesting to teach Lean to people who did know the mathematics already. And so this term I am teaching a graduate course which will deal with undergraduate mathematics! As part of the Imperial/Oxford/Bath/Bristol/Warwick EPSRC Taught Course Centre, I am teaching a course on formalising mathematics this coming term! The course will comprise 8 two-hour workshops, where I get PhD students to think about stuff they understand well and to see if they understand it well enough to convince Lean that they do. Introductory lectures will be on things like equivalence relations and basic group theory; we will later move on to harder stuff (possibly guided by interests of the audience). Current ideas for what we will do in the last couple of workshops: some algebraic geometry, condensed sets, commutative algebra? We’ll see.

Oh, and talking of EPSRC: they gave me a New Horizons grant! Job ad for digitising the Langlands program is out next week 😀

Liquid tensor experiment

This is a guest post, written by Peter Scholze, explaining a liquid real vector space mathematical formalisation challenge. For a pdf version of the challenge, see here. For comments about formalisation, see section 6. Now over to Peter.

1. The challenge

I want to propose a challenge: Formalize the proof of the following theorem.

Theorem 1.1 (Clausen-S.) Let 0<p'<p\leq 1 be real numbers, let S be a profinite set, and let V be a p-Banach space. Let \mathcal M_{p'}(S) be the space of p'-measures on S. Then

\mathrm{Ext}^i_{\mathrm{Cond}(\mathrm{Ab})}(\mathcal M_{p'}(S),V)=0

for i\geq 1.

(This is a special case of Theorem 9.1 in, and is the essence of the proof of Theorem 6.5 there.)

Below, I will explain what all the terms in the theorem are, and why I care. I apologize in advance that the background story is a little longer, but I think it’s a fascinating story and I tried to be as brief as possible.

2. Getting condensed

The first thing to explain is the category of condensed abelian groups \mathrm{Cond}(\mathrm{Ab}), in which this computation takes place. This is a variant of the category of topological abelian groups, but with much better properties. It is part of the “condensed mathematics” that Dustin Clausen and myself started to develop two years ago. I gave two courses in Bonn about it, and recently we gave a Masterclass in Copenhagen about it.

Condensed mathematics claims that topological spaces are the wrong definition, and that one should replace them with the slightly different notion of condensed sets. Before giving their definition, let me state a few properties:

— in practice, most topological spaces of interest are “compactly generated weak Hausdorff”. In fact, this is the class of topological spaces customarily used within algebraic topology; also, all usual topological vector spaces (e.g., all metrizable ones, or just first-countable ones) have this property. Now compactly generated weak Hausdorff spaces also embed fully faithfully into condensed sets, so in this large class of examples, the transition is completely inconsequential.

— condensed sets have very nice categorical properties, in particular all limits and colimits (like topological spaces). But they also have internal mapping objects: if X, Y are condensed sets, there is a condensed set \mathrm{Hom}(X,Y) such that for any other condensed set T, maps T\to \mathrm{Hom}(X,Y) are functorially identified with maps T \times X \to Y. In fact, the same is true relatively, for maps over some base condensed set S. In category-speak, “condensed sets are locally cartesian closed”.

— even better, up to mild set-theoretic problems, condensed sets form a topos. (More precisely, they satisfy Giraud’s axioms except for the existence of a set of generators. One can build a variant of the category of condensed sets, known as the category of pyknotic sets of Barwick–Haine, that is a topos, essentially by restricting to the subcategory generated by some set of generators.)

Let me now explain what condensed sets are. Clausen and I came to it from different trajectories (and Barwick and Haine yet from a different one); there are some precursors in the literature, notably Spanier’s quasi-topological spaces (in algebraic topology) and Waelbroeck’s compactological spaces (in functional analysis). For me, it started (unsurprisingly…) in relation to my work on perfectoid geometry, but feel free to skip this paragraph. The starting point of perfectoid geometry is the idea to study something like a p-adic annulus \{T\in \mathbb C_p, |T|=1\} by extracting all p-power roots of T, leading to an infinite tower of covers, and then pass to the limit object, which is a perfectoid space. One wants to do this as, surprisingly, perfectoid spaces are in some ways nicer than the usual annulus; in particular, there is the tilting procedure relating them to characteristic p (and again, surprisingly, characteristic p is in some ways easier than characteristic 0). Technically, one defines the pro-étale site of the annulus (or any rigid-analytic variety), and shows that locally in the pro-étale site, the space is perfectoid. But let us forget all about perfectoid geometry, and let us simply look at what the analogue of this story would be for a point: In the fibre over a point, say T=1, in each finite stage one covers it by finitely many points, and in the limit by a limit of finite sets, i.e. a profinite set. (A profinite set is by definition an object of the category Pro-(finite sets). Their category is equivalent to the category of totally disconnected compact Hausdorff spaces, and I will often tacitly identify the two concepts.)

This led Bhatt and myself to introduce the pro-étale site of a point (or a general scheme). This is the category of profinite sets S, where covers are given by finite collections of jointly surjective maps.

Definition 2.1. Condensed sets are sheaves on the pro-étale site of a point.
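Unwinding the definition, and ignoring the set-theoretic issues, the sheaf condition for a functor X from profinite sets (with arrows reversed) to sets can be written out as three concrete conditions. The formulation below is the standard way of spelling this out rather than anything stated in the text above; p_1 and p_2 denote the two projections S'\times_S S'\to S'.

```latex
\begin{align*}
&\text{(1)}\quad X(\emptyset) = \ast, \\
&\text{(2)}\quad X(S_1 \sqcup S_2) \xrightarrow{\ \sim\ } X(S_1) \times X(S_2), \\
&\text{(3)}\quad X(S) \xrightarrow{\ \sim\ }
   \{\, x \in X(S') \mid p_1^*(x) = p_2^*(x) \in X(S' \times_S S') \,\}
   \quad \text{for every surjection } S' \to S.
\end{align*}
```

Conditions (1) and (2) handle finite disjoint unions, and (3) handles a single surjection; a finite jointly surjective family reduces to these by passing to the disjoint union of its sources.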

This makes it sound like condensed sets are a topos, but above I mentioned set-theoretic problems. Indeed, the category of profinite sets is not small, so one has to resolve this in some way. Let me gloss over this point here; it is not essential for any of the following discussion.

This definition (without the name) is already in my 2013 paper with Bhatt. But I never tried to thoroughly understand this notion. I knew that there is a functor from topological spaces to condensed sets (up to set-theoretic problems): Given a topological space T, one can send any profinite set S to the continuous maps from S to T. What I did not understand is that this functor is fully faithful on a very large class of topological spaces, and that condensed sets are actually an improvement over topological spaces. This is what Clausen quickly convinced me of when he arrived in Bonn in 2018 as a PostDoc.

Before going on, let me describe what a condensed set X “is”: For each profinite set S, it gives a set X(S), which should be thought of as the “maps from S to X”, so it is measuring how profinite sets map into X. The sheaf axiom guarantees some coherence among these values. Taking S=\ast a point, there is an “underlying set” X(\ast). Beware however that there are condensed sets X with X(\ast)=\ast a point, but with X(S) big for general S. We will see an example below. It is important to allow such examples in order to have a nice general theory. In practice, they do not produce much trouble either. It’s a free world.

Example 2.2. Let T be a compact Hausdorff space. Then a classical and somewhat weird fact is that T admits a surjection S\to T from a profinite set S. One construction is to let S be the Stone-Čech compactification of T^\delta, where T^\delta is T considered as a discrete set. This lets one recover T as the quotient of S by the equivalence relation R = S\times_T S\subset S\times S. Thus, compact Hausdorff spaces can be thought of as quotients of profinite sets by profinite equivalence relations. Of course, this seems like a horrible way to look at compact Hausdorff spaces, but this is what happens in the condensed perspective, which only records maps from profinite sets. Part of the reason for the challenge is the question: Can this possibly be a good perspective?

Example 2.3. Let T=[0,1] be the interval. In high school, we learn to think of real numbers in terms of decimal expansions. I claim that this produces a surjection from a profinite set onto T! Why? Any point of [0,1] can be written as 0.a_1a_2a_3\ldots with coefficients a_i\in \{0,1,\ldots,9\}, however one needs to make some identifications, like 0.099999\ldots = 0.100000\ldots. Now decimal expansions per se naturally form a profinite set, namely \prod_{n\geq 1} \{0,1,\ldots,9\}. It is only when doing these extra identifications that one produces the continuum [0,1]! So secretly, in high school we learn to think of the interval as a quotient of a profinite set.
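The identification in Example 2.3 can be watched happening at finite level. In the following sketch (an illustration with made-up helper names), the two digit sequences 0,9,9,9,\ldots and 1,0,0,0,\ldots are different points of the product, but their n-digit truncations differ by exactly 10^{-n}, so they have the same image in [0,1].

```python
from fractions import Fraction


def truncation(digits):
    """Exact value of 0.d1 d2 ... dn for a finite list of decimal digits."""
    return sum(Fraction(d, 10 ** (k + 1)) for k, d in enumerate(digits))


def gap(n):
    """Difference between the n-digit truncations of 0.1000... and 0.0999..."""
    lower = [0] + [9] * (n - 1)
    upper = [1] + [0] * (n - 1)
    return truncation(upper) - truncation(lower)


print([gap(n) for n in range(1, 5)])
```

The gaps are 1/10, 1/100, 1/1000, and so on, tending to 0.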

3. Condensed abelian groups

Just like topological abelian groups, one can consider condensed abelian groups, that is abelian group objects in the category of condensed sets. Equivalently, these are sheaves of abelian groups on the pro-étale site of a point. In fact, one nice feature of the condensed formalism is that for any notion of mathematical objects, one can consider their condensed version: Condensed groups, condensed rings, condensed categories, …, just by talking about sheaves of such on the pro-étale site of a point, making it possible to “put a topology on top of something” without trouble. (This is the spell of “working internally in a topos”…) However, unlike topological abelian groups, condensed abelian groups have excellent formal properties:

Proposition 3.1. The category \mathrm{Cond}(\mathrm{Ab}) of condensed abelian groups is an abelian category with all limits and colimits, and a class of compact projective generators. In particular, it satisfies Grothendieck’s axioms (AB3–6) and (AB*3–4) (i.e., the same that the category of abelian groups satisfies), in particular filtered colimits and infinite products are exact.

This makes \mathrm{Cond}(\mathrm{Ab}) exceptionally well-behaved, even more well-behaved than the category of abelian sheaves on a usual topos: Usually, infinite products fail to be exact.

Example 3.2. In topological abelian groups, a standard problem is that one can have continuous maps f: M \to N of topological abelian groups that are isomorphisms of underlying abelian groups, but the map is not a homeomorphism. For example, let N=\mathbb R be the real numbers and M=\mathbb R^\delta be the real numbers equipped with the discrete topology. Then the kernel and cokernel of f are trivial, while f is not an isomorphism. This is corrected in condensed abelian groups as follows: There is an exact sequence

0\to \mathbb R^\delta \to \mathbb R \to Q \to 0

where Q is a condensed abelian group with underlying abelian group Q(\ast)=0, but for a general profinite set S, one has Q(S) = \mathrm{Cont}(S,\mathbb R)/\mathrm{LocConst}(S,\mathbb R), which is in general not zero.
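To see the formula for Q in two cases: over a point, every function is both continuous and locally constant, so the quotient is zero. For S=\mathbb N\cup\{\infty\} (the one-point compactification of the positive integers, my choice of test case), continuous functions are convergent sequences and locally constant functions are eventually constant sequences.

```latex
\begin{align*}
Q(\ast) &= \mathrm{Cont}(\ast,\mathbb R)/\mathrm{LocConst}(\ast,\mathbb R)
         = \mathbb R/\mathbb R = 0, \\
Q(\mathbb N\cup\{\infty\}) &=
  \{\text{convergent sequences } (a_n)_{n\geq 1}\} \,/\,
  \{\text{eventually constant sequences}\} \neq 0.
\end{align*}
```

For instance the class of a_n = 1/n is nonzero in the second quotient, because the sequence converges but is not eventually constant.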

Another nice feature of the condensed world is that it’s a free world. Namely, for any condensed set X, there is a free condensed abelian group \mathbb Z[X] on X (given by the sheafification of the presheaf taking S to \mathbb Z[X(S)]). If X is a compact Hausdorff space, one can describe \mathbb Z[X] explicitly: It is a union of compact Hausdorff spaces \mathbb Z[X]_{\leq n}\subset \mathbb Z[X], with underlying set \{\sum_{x\in X} n_x[x]\mid \sum_{x\in X} |n_x|\leq n\}. This defines in fact also a (compactly generated weak Hausdorff) topological abelian group; in that context, this construction goes back to old work of Markov.

What are compact projective generators of \mathrm{Cond}(\mathrm{Ab})? One can take \mathbb Z[S] for certain special profinite sets S, namely so-called “extremally disconnected” profinite S. This includes the Stone-Čech compactifications of discrete sets, and all other examples are retracts. In fact, extremally disconnected profinite sets are exactly the projective objects of the category of compact Hausdorff spaces, by an old theorem of Gleason. Extremally disconnected profinite sets are pretty rare: The smallest infinite one has cardinality 2^{2^{\aleph_0}}. All convergent sequences in extremally disconnected sets are eventually constant. So all your favourite examples of profinite sets (Cantor sets, \mathbb Z_p, etc.) are not extremally disconnected. In fact, the existence of extremally disconnected ones is tied extremely closely to the axiom of choice (in the form of the existence of non-principal ultrafilters): The basic examples are Stone-Čech compactifications of discrete sets, which are discrete sets together with some ghostly dust: You will never be able to name any point in the boundary!

The proposition ensures that one can do homological algebra in \mathrm{Cond}(\mathrm{Ab}), but it means that for a computation, one has to resolve by free condensed abelian groups on extremally disconnected sets. These are some “clouds of dust”, and projective resolutions in \mathrm{Cond}(\mathrm{Ab}) amount to resolving nice objects by “clouds of dust”. It is basically impossible to do this completely explicitly, but one can sometimes be reasonably explicit. Clausen had developed some techniques to carry out nontrivial computations in \mathrm{Cond}(\mathrm{Ab}). In particular:

Theorem 3.3. Let A, B be locally compact abelian groups, considered as objects in \mathrm{Cond}(\mathrm{Ab}). Then \mathrm{Ext}^i_{\mathrm{Cond}(\mathrm{Ab})}(A,B)=0 for i\geq 2, and \mathrm{Ext}^1_{\mathrm{Cond}(\mathrm{Ab})}(A,B) agrees with the Yoneda-Ext in the category of locally compact abelian groups, while \mathrm{Hom}_{\mathrm{Cond}(\mathrm{Ab})}(A,B) are the usual continuous maps.

Thus, starting from locally compact abelian groups, there are no “strange” condensed abelian groups arising as extensions (or “higher” extensions): Even if \mathrm{Cond}(\mathrm{Ab}) chops things up into clouds of dust, the only ways to build the dust back together are the ways you would have thought about. The proof of the theorem is quite nontrivial; it makes use of the Breen–Deligne resolution of abelian groups. However, given the right approach the proof is quite tidy.

4. Completeness

These results show that \mathrm{Cond}(\mathrm{Ab}) behaves quite well, and if one starts with reasonable examples, then their \mathrm{Hom} and \mathrm{Ext}-groups generally stay well-behaved. However, this is not true for tensor products. Namely, if one takes say a tensor product of \mathbb R and \mathbb Z_p, one gets a condensed abelian group \mathbb R\otimes \mathbb Z_p whose underlying abelian group is the usual algebraic tensor product of \mathbb R and \mathbb Z_p, which is a bit of a mess as the real and p-adic topologies are incompatible. One would like to have a notion of “complete” condensed abelian groups, and a resulting notion of completed tensor product, such that the completed tensor product of \mathbb R and \mathbb Z_p vanishes. Similarly, the completed tensor product of \mathbb Z_p with itself should just be \mathbb Z_p.

After toying around, we found a notion of completeness that works perfectly away from the real numbers.

Definition 4.1. A condensed abelian group M is solid if for any profinite set S, any map f: S\to M extends uniquely to a map \mathbb Z[S]^\blacksquare \to M, where \mathbb Z[S]^\blacksquare = \varprojlim_i \mathbb Z[S_i] when S=\varprojlim_i S_i is written as a limit of finite sets.

One can regard \mathbb Z[S]^\blacksquare also as \mathrm{Hom}(\mathrm{Cont}(S,\mathbb Z),\mathbb Z), i.e. as the \mathbb Z-valued measures on S. In this sense, this says that any map f: S\to M gives a unique map \mathbb Z[S]^\blacksquare\to M sending a measure \mu to \int f\mu. Note that there is a map S\to \mathbb Z[S]^\blacksquare given by “Dirac measures”.

Theorem 4.2. The category of solid abelian groups is stable under all limits and colimits and extensions, and is an abelian subcategory of condensed abelian groups. The inclusion into condensed abelian groups admits a left adjoint M\mapsto M^\blacksquare “solidification”, which is the unique colimit-preserving extension of \mathbb Z[S]\mapsto \mathbb Z[S]^\blacksquare. There is a unique tensor product on solid abelian groups making M\mapsto M^\blacksquare symmetric monoidal, i.e. compatible with the tensor product. The category of solid abelian groups has compact projective generators, given by infinite products of copies of \mathbb Z.

Example 4.3. The discrete group \mathbb Z is solid, and then all objects built via limits and colimits (and internal Hom’s) from \mathbb Z are still solid. This includes \mathbb Z_p, \mathbb Q_p, \prod_I \mathbb Q_p, etc., in fact everything that comes up in usual algebra. The only way to leave the category is to take tensor products; these have to be resolidified. But then \mathbb Z_p \otimes^\blacksquare \mathbb Z_p = \mathbb Z_p, \mathbb Z_p \otimes^\blacksquare \mathbb Z_\ell = 0 for p\neq \ell, etc. In fact, considering solid \mathbb Q_p-vector spaces, one gets a perfect framework for doing p-adic functional analysis. The solid tensor product recovers the usual completed tensor product of Banach spaces, and even of Fréchet spaces.

5. Road to reality

Let us recall where we stand. The condensed formalism was largely motivated by questions in p-adic geometry, is based on profinite sets, and it can beautifully handle p-adic functional analysis. All of these are very much nonarchimedean. On the other hand, we would like to claim that condensed sets are better than topological spaces, even over \mathbb R. So what about the real numbers?

The bad news is that the real numbers are definitely not solid; their solidification is equal to 0. The issue is that \mathbb Z[S]^\blacksquare was the space of \mathbb Z-valued measures on S, without any archimedean bounds.

Working over \mathbb R, one is thus led to consider for profinite sets S=\varprojlim_i S_i the space \mathcal M(S) of (signed Radon) measures on S; as a condensed abelian group,

\mathcal M(S) = \mathrm{Hom}_{\mathbb R}(\mathrm{Cont}(S,\mathbb R),\mathbb R).

This is a union of compact Hausdorff spaces \mathcal M(S)_{\leq c} (equipped with the weak-\ast-topology). One can write \mathcal M(S)_{\leq c} = \varprojlim_i \mathbb R[S_i]_{\ell^1\leq c}, where \mathbb R[S_i] is the free vector space on the finite set S_i, and \ell^1\leq c refers to the part of \ell^1-norm at most c. Originally, we were hoping that the following definition was reasonable.

Definition 5.1. A condensed \mathbb R-vector space V is \mathcal M-complete if for any profinite set S, any map f: S\to V extends uniquely to a map \mathcal M(S)\to V.

Any complete locally convex vector space gives rise to an \mathcal M-complete V. Conversely, the condition of \mathcal M-completeness is closely related to local convexity. One can translate much of functional analysis to this setting. In particular, (up to also asking for quasiseparatedness), one can define a completed tensor product in this setting, and in some sense it recovers both the injective and the projective tensor product of Banach spaces. Unfortunately, it turns out that this category is not abelian, not stable under extensions, etc. The problem is the existence of the Ribe extension, that is an extension of Banach spaces that is not itself locally convex; it arises from the entropy functional. This forces us to include non-locally convex spaces in the formalism. Traditionally, this leads to the notion of p-Banach spaces for 0<p\leq 1, which includes \ell^p-spaces: a p-Banach space is a topological \mathbb R-vector space V that is complete for a p-norm ||\cdot||: V\to \mathbb R_{\geq 0}. The only axiom that is changed is the scaling axiom: ||av||= |a|^p ||v|| for a\in \mathbb R, v\in V. In particular, ||\cdot|| still satisfies the usual triangle inequality. [There is some unfortunate clash of notation here between p always denoting a prime number for me, and it being the standard letter to denote \ell^p-norms. Denoting primes by \ell doesn’t even help much!]
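The axioms of a p-norm can be checked numerically on the model case of a finite-dimensional \ell^p-space. The sketch below assumes the normalisation ||v|| = \sum_i |v_i|^p with no p-th root taken, which is the one compatible with the scaling axiom above; the function names and the tolerance are my own choices.

```python
import math


def p_norm(v, p):
    """The l^p 'p-norm' sum_i |v_i|^p for 0 < p <= 1 (no p-th root is taken)."""
    return sum(abs(x) ** p for x in v)


def scaling_holds(v, a, p):
    """Check ||a v|| = |a|^p ||v|| up to floating-point error."""
    scaled = [a * x for x in v]
    return math.isclose(p_norm(scaled, p), abs(a) ** p * p_norm(v, p))


def triangle_holds(v, w, p):
    """Check ||v + w|| <= ||v|| + ||w|| up to a small tolerance."""
    total = [x + y for x, y in zip(v, w)]
    return p_norm(total, p) <= p_norm(v, p) + p_norm(w, p) + 1e-12


print(scaling_holds([3.0, -4.0], -2.0, 0.5), triangle_holds([1.0, -2.0], [0.5, 2.0], 0.5))
```

With p = 1/2, doubling a vector multiplies its p-norm by \sqrt{2} rather than 2, while the triangle inequality still holds because |x+y|^p \leq |x|^p + |y|^p for 0<p\leq 1.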

There is a theorem of Kalton that any extension of p-Banach spaces is p'-Banach for all p'<p. This leads to the following definition.

Definition 5.2. For a profinite set S=\varprojlim_i S_i, let the space of p-measures be \mathcal M_p(S) = \bigcup_{c>0} \mathcal M_p(S)_{\leq c}, where

\mathcal M_p(S)_{\leq c} = \varprojlim_i \mathbb R[S_i]_{\ell^p\leq c}.

Definition 5.3. A condensed \mathbb R-vector space V is p-liquid if for any profinite set S and any p'<p, any map f: S\to V extends uniquely to a map \mathcal M_{p'}(S)\to V.

After a lot of struggles, we believe we were able to prove the following analogue of our theorem on solid abelian groups.

Theorem 5.4. The class of p-liquid \mathbb R-vector spaces is stable under all limits and colimits and extensions (even in condensed abelian groups), and forms an abelian subcategory of condensed abelian groups. The inclusion into all condensed abelian groups admits a left adjoint “p-liquidification”, which is the unique colimit-preserving extension of \mathbb Z[S]\mapsto \mathcal M_{<p}(S) = \varinjlim_{p'<p} \mathcal M_{p'}(S). There is a unique tensor product of p-liquid \mathbb R-vector spaces making p-liquidification symmetric monoidal. The category of p-liquid \mathbb R-vector spaces has compact projective generators, given by \mathcal M_{<p}(S) for S extremally disconnected.

On nuclear spaces, the p-liquid tensor product agrees with the usual completed tensor product, for any choice of 0<p\leq 1. (Arguably, this is the best one can hope for, as for say Banach spaces, there is not one but several natural choices for a tensor product as defined by Grothendieck. The p-liquid one is still different, and does not usually produce a Banach space, but in the nuclear case these subtleties disappear.)

This is the theorem that I would like to see formalized. As stated in the beginning, it comes down (by a not completely obvious, but very safe reduction) to Theorem 1.1, the vanishing of \mathrm{Ext}-groups between \mathcal M_{p'}(S) and p-Banach spaces.

Remark 5.5. The class of p-liquid \mathbb R-vector spaces depends on the choice of 0<p\leq 1; for p=1, the condition is the strongest (closest to local convexity) and gets weaker as p approaches 0. For applications, usually any choice of p is fine. It is quite mysterious that there is not only one theory over \mathbb R now, but a whole family of them! In fact, it turns out that there are similar theories also over \mathbb Q_\ell. In that case, they exist for all 0<p<\infty, and in some sense the limiting theory for p=\infty is the theory of solid \mathbb Q_\ell-vector spaces. One can thus think that for increasing p, the objects get more and more “viscous” until they become solid for p=\infty. (On the other hand, the limit for p=0 is in some sense the class of all condensed \mathbb Q_\ell-vector spaces, which one might refer to as “gaseous”.) Over \mathbb R, one needs to take p\leq 1 as otherwise the transition maps in the limit \varprojlim_i \mathbb R[S_i]_{\ell^p\leq c} are not well-defined, as they can increase the \ell^p-norm.
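The last point of the remark can be seen in the smallest possible example. The sketch below assumes that the transition map \mathbb R[S_j]\to \mathbb R[S_i] induced by a map of finite sets S_j\to S_i adds up coefficients over each fibre, and it measures size by the sum of |a_s|^p; the names `pushforward` and `lp_size` and the two-point set are hypothetical.

```python
def pushforward(coeffs, pi):
    """Push a formal sum on S_j forward along pi: S_j -> S_i,
    adding the coefficients over each fibre of pi."""
    out = {}
    for s, a in coeffs.items():
        out[pi[s]] = out.get(pi[s], 0.0) + a
    return out

def lp_size(coeffs, p):
    """Sum of |a|^p over all coefficients."""
    return sum(abs(a) ** p for a in coeffs.values())

# Two points s1, s2 of S_j collapse to a single point t of S_i.
pi = {"s1": "t", "s2": "t"}
mu = {"s1": 1.0, "s2": 1.0}
image = pushforward(mu, pi)

before_half, after_half = lp_size(mu, 0.5), lp_size(image, 0.5)
before_two, after_two = lp_size(mu, 2), lp_size(image, 2)

print(after_half <= before_half)  # p = 1/2: the size does not increase
print(after_two > before_two)     # p = 2: the size increases
```

For p\leq 1 the inequality |a+b|^p\leq |a|^p+|b|^p is what keeps the image inside the ball of radius c; for p>1 it fails already for a=b=1, as the second comparison shows.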

6. Sympathy for the devil

Why do I want a formalization?

— I want to make the strong claim that in the foundations of mathematics, one should replace topological spaces with condensed sets (except when they are meant to be topoi — topoi form a separate variant of topological spaces that is useful, and somewhat incomparable to condensed sets). This claim is only tenable if condensed sets can also serve their purpose within real functional analysis.

— with this theorem, the hope that the condensed formalism can be fruitfully applied to real functional analysis stands or falls. I think the theorem is of utmost foundational importance, so being 99.9% sure is not enough.

— if it stands, the theorem gives a powerful framework for real functional analysis, making it into an essentially algebraic theory. For example, in the Masterclass, Clausen sketched how to prove basic results on compact Riemann surfaces or general compact complex manifolds (finiteness of cohomology, Serre duality), and one can black box all the functional analysis into this theorem. Generally, whenever one is trying to mix real functional analysis with the formalism of derived categories, this would be a powerful black box. As it will be used as a black box, a mistake in this proof could remain uncaught.

— I spent much of 2019 obsessed with the proof of this theorem, almost getting crazy over it. In the end, we were able to get an argument pinned down on paper, but I think nobody else has dared to look at the details of this, and so I still have some small lingering doubts.

— as I explain below, the proof of the theorem has some very unexpected features. In particular, it is very much of arithmetic nature. It is the kind of argument that needs to be closely inspected.

— while I was very happy to see many study groups on condensed mathematics throughout the world, to my knowledge all of them have stopped short of this proof. (Yes, this proof is not much fun…)

— I have occasionally been able to be very persuasive even with wrong arguments. (Fun fact: In the selection exams for the international math olympiad, twice I got full points for a wrong solution. Later, I once had a full proof of the weight-monodromy conjecture that passed the judgment of some top mathematicians, but then it turned out to contain a fatal mistake.)

— the Lean community has already shown some interest in formalizing parts of condensed mathematics, so the theorem seems like a good goalpost.

— from what I hear, it sounds like the goal is not completely out of reach. (Besides some general topos theory and homological algebra (and, for one point, a bit of stable homotopy theory(!)), the argument mostly uses undergraduate mathematics.) If achieved, it would be a strong signal that a computer verification of current research in very abstract mathematics has become possible. I’ll certainly be excited to watch any progress.

— I think this may be my most important theorem to date. (It does not really have any applications so far, but I’m sure this will change.) Better be sure it’s correct…

7. Arithmetic reality

Finally, let me say a few words about the proof of Theorem 1.1; I tried to give a summary of the idea in Lectures 17 — 18 of the Copenhagen Masterclass, you can find them on YouTube (here and here). In brief, one is battling with two enemies:

a) The real numbers are not locally profinite. So in some philosophical sense there’s a mismatch between the objects \mathcal M_{p'}(S), V (real vector spaces) and the category \mathrm{Cond}(\mathrm{Ab}) (built from profinite sets). Our task is to show that the two can still be friends, even though \mathrm{Cond}(\mathrm{Ab}) wants to chop up all real vector spaces into locally profinite clouds of dust.

b) Putting bounds on the real numbers leads to subsets that are not stable under addition anymore. That’s hardly news to a normal human being — of course 1+1 is larger than 1 — but it is very bad news for a person that has spent all of their life in the p-adic world (where 1+1 is not larger than 1). When combining bounds with homological algebra, it means that within all our complexes we have to carefully keep track of norms. This quickly gets nasty. Did you ever try to chase norms in the snake lemma? In a spectral sequence?
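The contrast in b) between the real and the p-adic world can be checked in a few lines. This is a small sketch, with a hypothetical helper `padic_abs` for the p-adic absolute value on integers; it tests the ultrametric inequality |x+y|\leq \max(|x|,|y|) on a finite range only.

```python
from fractions import Fraction

def padic_abs(n, p):
    """p-adic absolute value of an integer n: p^(-v), where p^v exactly divides n."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

# Over the reals, the unit ball is not stable under addition: |1 + 1| = 2 > 1.
real_size = abs(1 + 1)

# 2-adically, |1 + 1| = 1/2, which is not larger than |1| = 1.
two_adic_size = padic_abs(1 + 1, 2)

# The ultrametric inequality, checked for the primes 2, 3, 5 on a finite range.
ultrametric_ok = all(
    padic_abs(x + y, p) <= max(padic_abs(x, p), padic_abs(y, p))
    for p in (2, 3, 5)
    for x in range(-20, 21)
    for y in range(-20, 21)
)
print(real_size, two_adic_size, ultrametric_ok)
```

Because p-adic balls are closed under addition, bounded pieces of a complex stay bounded under the usual homological constructions; over the reals every addition can push a norm past the bound, which is why the norms have to be tracked by hand.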

Let me first say a few words about how we battled a), which I believe is a most profound point. In some sense we come back to decimal expansions of the real numbers. More precisely, let 0<r<1 and consider the ring \mathbb Z((T))_{>r} of those Laurent series \sum_{n\gg -\infty} a_n T^n with integer coefficients a_n\in \mathbb Z that converge on a complex punctured disc \{0<|T|< r'\} for some r' > r. Assuming r\geq \tfrac 1{10}, we get a surjection

\mathbb Z((T))_{>r}\to \mathbb R: \sum a_n T^n\mapsto \sum \frac{a_n}{10^n}

and in fact \mathbb R\cong \mathbb Z((T))_{>r}/(10T-1). (Needless to say, the choice of \tfrac 1{10} here is completely arbitrary; you can also use \tfrac 1{\pi}. Except that it’s less obvious that the kernel is a principal ideal. A theorem of Harbater states that \mathbb Z((T))_{>r} is a principal ideal domain!) Now \mathbb Z((T))_{>r} is naturally a countable union of profinite subsets — essentially, putting bounds on each a_n is cutting them down to finite sets. This means that problem a) disappears if we work with \mathbb Z((T))_{>r}. The story of liquid modules also works over \mathbb Z((T))_{>r}, and the whole proof happens here. The variant of Theorem 1.1 for \mathbb Z((T))_{>r} is Theorem 9.1 in, and its proof is given in Lecture 9.
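Here is a toy version of the surjection and its kernel, as a hedged sketch: it works only with finite Laurent polynomials (so it ignores the convergence condition defining \mathbb Z((T))_{>r}), stores a series as a dictionary from exponents to integer coefficients, and uses exact rational arithmetic. The helper names `evaluate` and `multiply` are hypothetical.

```python
from fractions import Fraction

T_VALUE = Fraction(1, 10)  # the point T = 1/10 at which we evaluate

def evaluate(series):
    """Send sum a_n T^n to sum a_n / 10^n, for a finite dict {n: a_n}."""
    return sum((a * T_VALUE ** n for n, a in series.items()), Fraction(0))

def multiply(f, g):
    """Product of two finite Laurent polynomials with integer coefficients."""
    out = {}
    for m, a in f.items():
        for n, b in g.items():
            out[m + n] = out.get(m + n, 0) + a * b
    return out

# The decimal expansion 31.4 corresponds to 3 T^{-1} + 1 + 4 T.
digits = {-1: 3, 0: 1, 1: 4}
# A different preimage of the same real number: 31 + 4 T.
other = {0: 31, 1: 4}
# The generator 10 T - 1 of the kernel.
generator = {0: -1, 1: 10}

value = evaluate(digits)

# The two preimages differ by a multiple of 10 T - 1, namely -3 T^{-1} (10 T - 1).
difference = {n: digits.get(n, 0) - other.get(n, 0) for n in set(digits) | set(other)}
difference = {n: c for n, c in difference.items() if c != 0}
multiple = multiply({-1: -3}, generator)

print(value, evaluate(other), evaluate(generator), difference == multiple)
```

The coefficients are not required to be digits between 0 and 9, which is why a real number has many preimages; "carrying" from one decimal place to the next is exactly adding a multiple of 10T-1.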

Let me not say much about the proof in Lecture 9 — you can look for yourself — but to note that as indicated in b), it is a very delicate argument fighting with estimates against homological algebra. There is a hierarchy of implicit constants that have to be chosen, and you have to do it in exactly the right order. In the end, one formulates Theorem 9.5 which can be proved by induction; it is a statement of the form \forall \exists \forall \exists \forall \exists, and there’s no messing around with the order of the quantifiers. It may well be the most logically involved statement I have ever proved. (On the other hand, if I want to claim that the theorem on liquid vector spaces makes it possible to black box functional analysis, hard estimates have to be somewhere. Better be sure the estimates actually work…!)

Note that I’m saying that to prove a theorem about real vector spaces (in fact, to set up real functional analysis) we have to work with the arithmetic ring \mathbb Z((T))_{>r}! As a number theorist, this seems like a fair state of affairs to me, but I wonder whether I let my prejudices guide me. So here is a challenge for human mathematicians:

Question 7.1. Can one prove Theorem 1.1 directly over the reals?

Remark 7.2. In Theorem 1.1 the \mathrm{Ext}-groups are computed in \mathrm{Cond}(\mathrm{Ab}). One could also compute them in condensed \mathbb R-vector spaces. The result of Theorem 1.1 is slightly stronger (it implies the vanishing of \mathrm{Ext}-groups in condensed \mathbb R-vector spaces via a mostly formal argument), and implies that p-liquid \mathbb R-vector spaces embed fully faithfully into condensed abelian groups, i.e. the \mathbb R-linear structure is necessarily unique. One could also consider the variant of Theorem 1.1 taking \mathrm{Ext}-groups in condensed \mathbb R-vector spaces; I’d already be happy to see a direct proof of this over the reals. However, I actually believe that this is not any easier.

Finally, here is a somewhat vague question.

Question 7.3. Is the proof of Theorem 1.1 in any way related to some known arguments in functional analysis?

For example, is it related to “multi-scale” arguments? The different scales here being the powers 1,T,T^2,\ldots of the fixed element T (taken as \tfrac 1{10} above).

Peter Scholze, 5th December 2020


Thoughts on the Pythagorean theorem

I’m sure I’m saying nothing new here. I’m just explaining another example of how thinking about how to formalise things has taught me stuff about what mathematics is.

What is the Pythagorean theorem?

The Pythagorean theorem, a.k.a. Pythagoras’ theorem, comes in two parts. Firstly there is the theorem statement, which says that in a right angled triangle (like the dark blue one below), the square of the hypotenuse equals the sum of the squares of the other two sides. And then there is the proof, which originally is either due to Pythagoras or not depending on who you believe. Let’s start with the statement.

What does the statement actually mean?

Let’s take a look at the picture of the squares on the hypotenuse and the other two sides.

Some squares

The dark blue triangle is right-angled. The claim is that the square C is equal to the sum of the squares A and B. On the face of it, this is nonsense. If you take squares A and B, you have a picture containing two squares; but square C is just one square. How can one square equal two squares? But of course the claim is not that the pictures are equal; the claim is that the areas are equal.

But what is area? To find out, let’s go back to Euclid.

Euclid’s take on the theorem

Euclid’s Elements contains a proof of the Pythagorean theorem, right at the end of book 1. The proof involves drawing some triangles and arguing that various things are “equal”. This approach is valid because Euclid has explicitly stated as his Common Notion 1 that equality, whatever it is, is transitive.

One can chase this concept of equality back to Proposition 35, which claims that two parallelograms with the same base and the same height are “equal”. In fact this seems to be the first time that the word “equal” is used to mean “having equal area” in the Elements. Halving the parallelograms we deduce the more familiar Proposition 37, that two triangles with the same base and the same height are also “equal”. So what goes into the proof of Proposition 35, that two parallelograms with the same base and height are “equal” in the sense of having equal area?

The key ideas in the proof are Euclid’s second and third common notions: that “equals added to equals are equal”, and “equals subtracted from equals are equal”. In high-level terms, these common notions imply that equality is not just an equivalence relation, but a congruence relation. But let’s see how Euclid uses these notions in his proofs.

Equals added to equals are equal.

The two orange regions have equal areas, because they are both “equals added to equals”: the small triangles and the big triangles are both congruent.

Equals subtracted from equals are equal

Here, the two larger triangles are congruent, so the two orange areas are equal, because they are equals (the dark blue triangle) subtracted from equals (the larger triangles). For Euclid, the equality of the areas of the two orange regions in these examples is axiomatic. Take a look at the proof of Proposition 35 to see how these facts are used to prove that two parallelograms with the same base and height are “equal”.

Area in Euclid book 1

So, what does Euclid mean by the “area” of a shape? Well, this is the funny thing: he never says, throughout book 1! He only says what it means for two shapes to have “equal area”!

This is exactly what an equivalence relation is. An equivalence relation on a type is a concept of equality on terms of that type. It can be thought of as focussing on a particular attribute of the terms you are considering (for example the area of a shape, or the value of an integer modulo 10) and saying that two terms are equivalent if the values of those attributes are equal. Euclid is putting an equivalence relation on shapes. His definition of the relation involves cutting and pasting in geometry, and at the end of the day the proof of the Pythagorean theorem in Euclid is essentially a jigsaw puzzle. Here is an even simpler jigsaw puzzle proof:

A proof of the Pythagorean theorem
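The description above of an equivalence relation as "equality of a chosen attribute" can be written down in a few lines of Lean. This is only a sketch, assuming Lean 4 with its core library; `SameAttr` is a made-up name rather than a library definition, and the example uses natural numbers modulo 10 in place of integers to keep the computation trivial.

```lean
-- Lean 4 sketch, core library only (no Mathlib import assumed).
-- `SameAttr` is a hypothetical name, not a library definition.

/-- Two terms are related when the chosen attribute `f` takes equal values on them. -/
def SameAttr {α β : Type} (f : α → β) (a b : α) : Prop :=
  f a = f b

/-- Equality of an attribute is always an equivalence relation. -/
theorem sameAttr_equivalence {α β : Type} (f : α → β) :
    Equivalence (SameAttr f) where
  refl := fun _ => rfl
  symm := fun h => Eq.symm h
  trans := fun h₁ h₂ => Eq.trans h₁ h₂

/-- 3 and 13 have the same last digit, so they are related by "value modulo 10". -/
example : SameAttr (fun n : Nat => n % 10) 3 13 :=
  show (3 : Nat) % 10 = 13 % 10 from rfl
```

Note that Euclid goes the other way round: he never has an area function `f` to hand, and instead defines the relation directly by cutting and pasting. The sketch only illustrates why thinking of "equal area" as an equivalence relation is consistent with later having a numerical notion of area.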

Euclid and type theory

When Euclid did mathematics, he was doing type theory. For Euclid, points were terms of the Point type, lines were terms of the Line type, and planes were terms of the Plane type. Euclid wrote down the axioms his types satisfied (for example there was a unique line between any two distinct points) and proceeded to work from there. He has a definition of a 2-dimensional shape, and assuming that a plane exists, his shapes exist too. He defined an equivalence relation on 2D shapes, and proved that the 2D shape corresponding to the square on the hypotenuse was related to the 2D shape corresponding to the union of the squares on the other two sides, using properties of this relation which he has earlier axiomatised.

The proof of Pythagoras’ theorem in Euclid is what is known as a synthetic proof. We assume that a Euclidean Plane exists and satisfies a list of axioms, which Euclid attempted to write down and which most of us never even contemplate. We then formulate the theorem, and prove it using the axioms.

Numbers from geometry?

Note that Euclid is in some kind of a position to define real numbers at this point, or at least the positive real numbers. For example, Euclid knows what it means for two line segments to have equal length — it means that you can translate and rotate one line until it coincides with the other. He could hence define the positive reals to be equivalence classes of line segments, under the equivalence relation of being the same length. However one runs into problems when it comes to completeness, something Euclid’s axioms were not really designed for.

Geometry from numbers: Enter Descartes.

Descartes suggested doing things the other way around, using numbers to do geometry rather than using geometry to define numbers. Descartes observed that one could label a point in the plane with an x and y coordinate. This changed everything. All of a sudden “the plane” (a term whose existence is never talked about in Euclid) becomes modelled by \mathbb{R}^2. Euclid’s definitions, common notions, and axioms now need to be revisited. We need to check that this more refined model satisfies the rules of Euclid’s game (a bit like checking that Einstein’s theory turns into Newton’s in the limit).

We model a point as an ordered pair of real numbers, and we can define lines as solution sets of linear equations, because the reals are a field and so we have that language available. We can prove the parallel postulate no problem. The theory of integration gives us a way to measure lines (length), angles (measure), curves (length) and 2-dimensional shapes (area), using the natural (Euclidean) Riemannian metric on the plane. We can now completely rephrase Pythagoras’ theorem: it is now an equality of numbers. We can re-interpret the “jigsaw puzzle” proof in Euclid as a consequence of finite additivity of Lebesgue measure on the plane. We can also give a completely new proof, using the theorem that the distance from (a,b) to (c,d) is \sqrt{(a-c)^2+(b-d)^2}, as one can check using a line integral (modulo the theorem that the shortest distance between two points is a straight line, which needs proving in this context).
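In coordinates, the "equality of numbers" can be checked directly. The sketch below takes the distance formula as given and works with squared distances, so that everything stays in exact integer arithmetic; the triangle and the helper names `squared_distance` and `dot` are chosen only for illustration.

```python
def squared_distance(p, q):
    """Square of the Euclidean distance between two points of the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def dot(u, v):
    """Dot product of two plane vectors."""
    return u[0] * v[0] + u[1] * v[1]

# A triangle with its right angle at A, deliberately not aligned with the axes.
A, B, C = (1, 2), (4, 6), (-7, 8)

# The two legs AB and AC are perpendicular: their dot product vanishes.
leg_ab = (B[0] - A[0], B[1] - A[1])
leg_ac = (C[0] - A[0], C[1] - A[1])
right_angle = dot(leg_ab, leg_ac) == 0

# Pythagoras as an equality of numbers: |AB|^2 + |AC|^2 = |BC|^2.
ab2 = squared_distance(A, B)
ac2 = squared_distance(A, C)
bc2 = squared_distance(B, C)
print(right_angle, ab2 + ac2 == bc2)
```

In general, writing u and v for the two legs, expanding |u-v|^2 = |u|^2 + |v|^2 - 2\,u\cdot v shows that the equality holds exactly when u\cdot v = 0, which is the algebraic content of this proof.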

I saw measure theory developed as an undergraduate, and probably a few years ago I would have argued that this is now the “proper” proof — but now I realise that this proof still has some synthetic element to it: namely, the real numbers. We have a non-synthetic plane, but it is made from synthetic numbers.

What are the real numbers?

I was told as an undergraduate that it was an axiom that the reals existed, and that they were a complete ordered field. All of the analysis I learnt as an undergraduate was built upon this assumption, or rather, this structure (addition, multiplication, inequality) and these axioms (associativity and commutativity of addition and multiplication etc) on this type (the reals). In some sense it is no different to Euclid, who also had types (e.g. points), structures (e.g. triangles) and axioms (e.g. the common notions, or the parallel postulate), but who was modelling the Pythagorean theorem in a different, arguably more primitive, way.

Enter the analysts, bearing sequences

Descartes solved the problem of how to represent points in a plane with real numbers, but for Descartes, the reals were a type. Many years later, Cauchy and Dedekind gave two ways to represent the reals using simpler objects. Indeed, Cauchy sequences and Dedekind cuts are (different) ways of building the reals from the rationals. Similarly, the rationals can be built from the integers, the integers from the naturals, and the naturals are…well, they’re just some synthetic thing satisfying Peano’s axioms, right? At this point one could argue that Pythagoras’ theorem has become a statement about sequences of pairs of natural numbers (or however else one is modelling positive rationals), and the natural numbers have no definition — they are synthetic. But we can go further.

Enter the logicians, bearing sets.

One thing that ZFC set theory (the usual set-theoretic foundations of 20th century mathematics) has going for it, is that it gives a very unambiguous answer to the question: “Why are the natural numbers a set?”. The answer is “It’s one of the axioms”. One thing against it is that in ZFC set theory, everything is a set, even stuff which you don’t want to be a set. For example, real numbers (like the area of the square on the hypotenuse) are now sets, and the Pythagorean Theorem is now a theorem about the equality of two sets, although we don’t know exactly what the sets are, because we can never be sure whether the real numbers in the Platonic universe (or whichever universe we’re operating in) use Cauchy sequences or Dedekind cuts. [Pro tip: if your universe offers good quotienting facilities, use Cauchy sequences: they’re cleaner.]

The fact that we don’t know whether the reals being used in Pythagoras’ theorem are Cauchy sequences or Dedekind cuts is an indication that we have unfolded things too far, as far as Pythagoras’ theorem goes. Most mathematicians regard the real numbers as a type. A real number is not a set — Gauss or Riemann could certainly have informed you of that.

It is interesting that we can keep unfolding this way — but we can never get to the bottom of things. We can’t define everything, we always have to start somewhere — a synthetic beginning. Euclid started with points, lines and planes. Descartes started with the reals. Cauchy and Dedekind observed that you could start with the naturals. Set theorists start with sets. There is no definition of a set in ZFC set theory — a set is just a term in a model of ZFC set theory. The model can be thought of as the type, and its elements (or whatever you want to call them — they are not elements of a set in the internal logic of the model) as the terms. The behaviour of sets is developed using the axioms.

Pythagoras’ theorem : refinements of equality

So what is Pythagoras’ theorem? Is it that two shapes are “equal”, two numbers are equal, or two sets are equal? In some sense, it’s all of these things. In some sense this story reminds me of chemistry at school. The joke in the chemistry class was that they were always telling you lies — models of atoms which were very naive, and then more careful models which were more accurate, culminating (for me) with an undergraduate quantum mechanics course which told me that an electron was actually some kind of a complex-valued function. It feels very similar here. Euclid had a perfectly adequate notion of geometry, he axiomatised what he needed and argued from there. Later on we found a way to model the plane from more fundamental objects such as the real numbers. After that we found a way of modelling the real numbers using the natural numbers, and in some sense this is where we stop; using either type theory or set theory, the natural numbers have no definition — their existence is asserted, and we use them via their API (addition, multiplication, the principle of induction). Maybe someone will come along in future with a theory of mathematical quarks, which can be used together with the quark axioms to build the natural numbers in a natural way, and then our understanding of what Pythagoras’ theorem is might change again.
