Equality part 3: canonical isomorphism.

I hadn’t planned on writing a third equality post, but my talk at Big Proof 2019 (see slides here) ended up having a huge section about “mathematician’s equality”, and I realised I had more to say about it. Here are some more detailed thoughts about the issue.

Examples of canonical isomorphisms.

Canonical isomorphism is a magic kind of equality, apparently accessible to mathematicians who are “thinking about things in the right way”. It has several definitions; it has no definition. However, we know it when we see it, we are almost universally in agreement on the notion, and hence we don’t seem to really challenge each other on it. The isomorphism between a finite dimensional vector space and its double dual is canonical — we all know that. The class field theory isomorphism is canonical (at least up to sign) — we know that too. Even the Langlands correspondence is supposed to be canonical — it is for that reason that we believe in the Langlands group. On a more mundane note, R[1/fg] is canonically isomorphic to R[1/f][1/g]. Do we even stop to think what these things mean? Formalisation has forced me to think about one of these, and it led me to a surprising conclusion.

Wikipedia defines a canonical map as — well, why don’t you just take a look. There is no formal definition there at all. The page is tagged with the “mathematical terminology” tag but one could argue that this is stretching the point, unless you regard “the map which preserves the widest amount of structure” or “conventionally agreed upon to be the most useful for further analysis” to have some sort of mathematical meaning.

But what am I saying here? Is there no space in mathematics for guiding principles, and not-completely-formed ideas about how things should fit together? Can we not experiment with concepts and worry about dotting the i’s and crossing the t’s later? Of course we can do these things! That is exactly the art of mathematics! In fact, it’s more than just allowed — it is essential that mathematicians are allowed to be creative, and come up with brilliant ideas of maps between apparently unrelated objects such as Galois representations and automorphic representations, and then suggest that these constructions are in some way canonical and begin to find out exactly what other observations such an idea might lead us to. This way of thinking is essential for mathematics to proceed.

One thing I understood recently is that there are two independent subtleties here, so in this post I will separate them out. The first is well-known, the second I think less so.

Issue 1: Mission creep.

The first issue is the many contexts where the word “canonical” is used. Sometimes people use it in a situation where it can be easily rigorously formalised. In category theory there is a perfectly rigorous definition of a natural isomorphism. For example, in the category of finite dimensional complex vector spaces, we could consider the functor sending a vector space V to itself, and we could consider a second functor sending a vector space V to its double dual. There is a natural isomorphism between these two functors, sending a vector space to its double dual. One can look up the definition of natural isomorphism on Wikipedia; it is the statement that a precise collection of diagrams all commute. People say that the obvious map from a vector space to its double dual is “canonical” but they could instead say that it is a natural transformation. Another example would be where two objects both satisfy a certain universal property. Then a piece of abstract nonsense proves that there are unique isomorphisms in both directions between these two objects, again both making a bunch of diagrams commute. Again a mathematician might say that isomorphisms arising in this way are “canonical”, and again there is an equivalent category-theoretic language which they could use instead to say precisely what they mean.
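To spell out the double dual example in the usual notation: the natural isomorphism is assembled from the evaluation maps \mathrm{ev}_V : V\to V^{**} defined by \mathrm{ev}_V(v)(\varphi)=\varphi(v), and naturality is precisely the assertion that for every linear map f : V\to W of finite dimensional complex vector spaces we have

f^{**}\circ \mathrm{ev}_V = \mathrm{ev}_W\circ f,

in other words that the evident square relating V, W, V^{**} and W^{**} commutes.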

However there are times in mathematics where the word is being used beyond this remit. Some number theorists would say that the isomorphisms of global class field theory are “canonical”. This is, I believe, a different usage of the word. There might not be some formal category-theoretic framework which one can hide behind here (although people attempting to categorify the Langlands program might well believe that such a framework exists). What is going on here is that Artin made some construction, an isomorphism between two abelian groups attached to a number field (one using adeles and one using Galois groups), and this map he constructed obeyed a nice long list of properties which have been explicitly written down (for example the behaviour of the map when you change from one field to a finite extension), and it is not hard to show that there can be at most one construction with these properties. Such a construction does exist (a hard theorem), and this construction then gets the name “canonical”, which here seems to be a specialised and different use of the word. Note however that Artin’s construction gives an isomorphism between two abelian groups, and one can compose this isomorphism with the map sending an element of an abelian group to its inverse, giving us a second canonical isomorphism between these two groups, which satisfies a similar-but-not-exactly-the-same nice long list of properties, arguably indistinguishable from the first list in the sense that it is not clear (at least to me) which list is “better”. However the two lists are distinguishable in the sense that they are different, and can be told apart. In this situation it is hence difficult to say which of the constructions is the “most canonical”. The constructions have got different names, and both are used in the literature. One is called “class field theory normalised the classical way, sending uniformisers to arithmetic Frobenii” and the other is called “class field theory normalised the Deligne/Langlands/Carayol way, sending uniformisers to geometric Frobenii”. A geometric Frobenius is simply the inverse of an arithmetic Frobenius. This example is quite jarring, read within the context of Wikipedia’s definition of a canonical map at least, because both maps are useful (the classical normalisation when doing concrete things like Heegner points, the more modern normalisation when doing the cohomology of Shimura varieties), and there is no particular agreement within the number theory community as to which choice is “best” — it simply depends on what you are doing.

André Weil once jokingly wrote “I can assure you, at any rate, that my intentions are honourable and my results invariant, probably canonical, perhaps even functorial”, perhaps parodying the usage of the word in mathematics at that time (the 1950s). I think this sums up rather well the way the word is sometimes used nowadays.

Issue 2: canonicality beyond the universal property.

There is however a second issue, which I began thinking about when preparing my Big Proof talk, and this is when mathematicians see an isomorphism which is “canonical” in a sense which can definitely be made completely precise, and then decide that the isomorphism is so completely canonically canonical that it will not cause any problem to simply use the notation = for it. J. S. Milne, in his book “Étale Cohomology”, even explains in his notational conventions that canonical isomorphisms will be denoted by =. [What, all of them? Even the two distinct canonical isomorphisms of class field theory? It is not hard to come up with a very plausible-looking proof that x=x^{-1} for x in the abelianisation of the absolute Galois group of the rationals using this convention; of course, this is not what Milne means to say (although it would be interesting to hear his definition of “canonical” in this context, if indeed he had one)]. Milne is of course a professional mathematician who knows what he is talking about, and so uses this convention sensibly. He is not the only one. Even Grothendieck adopts this convention in EGA (EGA I, section 1.3.3), although he does not seem to be so explicit about it. The context here is that if R is a ring, and f and g are two elements of R with the property that a prime ideal of R contains f if and only if it contains g (for example we could have g=f^2 or g^3=f^2), then one can check that there are unique R-algebra isomorphisms R[1/f]\to R[1/g] and R[1/g]\to R[1/f] whose composites in both directions are the identity. Grothendieck himself uses the French word “canoniquement” in the link above when talking about this identification, and five lines later is using = as its notation, when claiming M_f = M_g, a claim of the same nature as the claim R[1/f] = R[1/g]. If R is an integral domain then one can build R[1/f] and R[1/g] as explicit subsets of the field of fractions of R and then they really are equal subsets. However in the general case one does not have this luxury. What does it even mean to say R[1/f]=R[1/g]? Grothendieck explicitly wrote M_f = M_g. What does this even mean? They are not equal in the formal sense at all, as we are about to see — but they are most definitely canonically isomorphic — any mathematician who knows the area would tell you that.

This is a serious issue for formalisers. This notion of equality here is most definitely not definitional equality or syntactic equality. In dependent type theory one could attempt to interpret it as an equality of two types, however in this context it is not really reasonable to argue that the types are equal. Anyone who has seen the usual construction of R[1/f] knows that it is equivalence classes of pairs (r,n) with r\in R and n\in\mathbb{Z}_{\geq0}, where (r,n) is initially interpreted informally as r/f^n, even though at this point these symbols make no sense. The equivalence relation is then defined, informed via this interpretation, and then the informal interpretation is forgotten about, and a ring structure is put on the equivalence classes. The ring R[1/f] is proved to have a universal property, namely that any ring homomorphism R\to A sending f to a unit extends uniquely to an R-algebra homomorphism R[1/f]\to A (did I mention that all rings are commutative and have a 1, and all morphisms send 1 to 1?). This is the universal property. From this point of view, the two equivalence relations on R\times\mathbb{Z}_{\geq0} really are not equal in general, and it’s quite messy writing down the canonical map on equivalence classes here; it’s much easier to prove a theorem once and for all saying that g is invertible in R[1/f] and vice versa, and then we can use the universal property to define the maps which give us the canonical isomorphisms.
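(For the record, the usual equivalence relation on R\times\mathbb{Z}_{\geq0}, in the notation above, is

(r,n)\sim(s,m) \iff \exists k\geq 0,\ f^k(f^m r - f^n s)=0,

and replacing f by g here genuinely changes the relation, which is exactly why the two sets of equivalence classes are not literally the same set.)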

Here is my objection to this usage. The moment two mathematical objects are equal, it is apparently possible to replace one by the other in any argument whatsoever, without giving any justification. Here is an explicit example, which bit me hard a year or so ago. It is a standard argument in basic graduate level algebraic geometry which I have now read many treatments of, and I am not really satisfied with any of them. For simplicity, I will talk about the treatment in the stacks project, because it is the easiest to cite. In tag 01HR, it says (a little before definition 25.5.3) “Thus we may apply Algebra, Lemma 10.23.1 to the module M_f over R_f and the elements g_1, g_2, \ldots, g_n”. But actually this is not strictly speaking possible, from a formalist perspective. Bear with me here, this is not a picky pedantic point, there is a little more to it than that. The lemma being cited is in tag 00EK, which is some statement about an exact sequence of modules. There are unfortunately two modules M here, one in 01HR and one in 00EK, and they’re not the same. For simplicity let’s assume that M=R in 01HR, which then means that M=R[1/f] in 00EK. To make things even simpler, let’s assume that the elements f_i\in R[1/f] are actually elements of R, and let’s call them g_i to coincide with the notation in 01HR, so D(g_i)\subseteq D(f) using the usual notation for open sets attached to elements of R. Even with these simplifications, there is still a subtle, almost invisible, problem, which formalisation is about to show us.

The problem is now that 00EK is a statement about an exact sequence relating R[1/f], R[1/f][1/g_i] and R[1/f][1/g_ig_j]. We now want to apply this to prove a result relating R[1/f], R[1/g_i] and R[1/g_ig_j]. Now because of the assumptions we know that R[1/f][1/g_i] and R[1/g_i] are canonically isomorphic. If we now just assume that they are equal, and we furthermore just assume that the maps R[1/g_i]\to R[1/g_ig_j] are equal to the maps R[1/f][1/g_i]\to R[1/f][1/g_ig_j] (i.e. the diagrams all commute), then there is no problem at all! This is precisely what the stacks project is doing, it is what Grothendieck does in EGA, it is what Hartshorne does, it is what everyone does.
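To fix ideas, in the simplified setting above the statement of 00EK being applied is (roughly) that if the g_i generate the unit ideal of R[1/f] then the sequence

0 \to R[1/f] \xrightarrow{\alpha} \bigoplus_i R[1/f][1/g_i] \xrightarrow{\beta} \bigoplus_{i,j} R[1/f][1/g_ig_j]

is exact, where \alpha is built from the localisation maps and \beta sends (m_i)_i to the element whose (i,j) component is the image of m_i minus the image of m_j. The statement we actually want is the same one with R[1/f][1/g_i] replaced by R[1/g_i] and R[1/f][1/g_ig_j] replaced by R[1/g_ig_j].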

The authors here are smart people, so let’s step back and just check that their intuition is correct. Of course, any one of these authors, when challenged, will say that they know that R[1/f][1/g_i] is not literally equal to R[1/g_i], however both maps satisfy the same universal property, and so they are canonically isomorphic. So now we have a diagram — a sequence we’ve shown to be exact, and a sequence of canonically isomorphic objects which we want to show is exact. So now we draw in all the canonical isomorphisms and now we just have to prove that all the resulting squares commute. But all the isomorphisms are natural, coming from a universal property, so all the diagrams will commute because of this, and everything will work out fine. Right?

But guess what — the map \beta in 00EK is not a morphism of rings. It cannot be defined via the universal property directly. The universal property really does not tell us immediately that the diagram commutes! We are kind of cheating here; there is a diagram, we can’t see it because of this trick of using = to mean an isomorphism, the isomorphism is natural, but the naturality does not immediately imply that the diagram we need to commute, commutes. We have slightly over-stepped the mark. We need two applications of the universal property and then we need to take their difference and observe something of the following form: if two squares involving four abelian groups commute, and the vertical maps are the same in each square, then subtracting the horizontal maps gives rise to a third diagram which also commutes. There is some little missing argument here. I never spotted this argument until I tried to formalise it. Once I’d realised that from a formalist point of view 00EK was actually too weak to use in 01HR, I had to then prove a bunch of lemmas showing that 00EK (whose proof was a lot of work, done by Chris Hughes) plus a horrible diagram chase whose proof is omitted here but which could not be omitted in Lean, was enough to deduce 01HR. It concerned me that at some point I had to go “beyond the universal property”. Not only that, it made this part of the code base much harder to follow. I had got us into this mess by telling Chris Hughes to prove 00EK without checking beforehand that it would suffice in applications — I naturally assumed that the literature didn’t have subtle holes in it. Ramon Fernandez Mir has rewritten the entire code base from scratch, and did not prove 00EK by hand but instead proved a more invariant version, using rings isomorphic to R[1/g_i] etc instead of the actual rings themselves. Instead of using the universal property (which might have caused issues with universes) he used a predicate introduced by Strickland which characterises precisely when a ring homomorphism R\to A makes A into an R-algebra isomorphic to R[1/S] as an R-algebra. Everything works much more nicely — but we do not prove 00EK as it stands; we prove something slightly more general. We prove the thing that we actually need.
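For what it’s worth, the simplest version of that missing argument looks something like this in Lean (a minimal sketch, not the actual code from the schemes repository; all names here are made up):

-- a made-up lemma of the required shape
lemma squares_sub {A B C D : Type}
  [add_comm_group A] [add_comm_group B] [add_comm_group C] [add_comm_group D]
  (α β : A → B) (γ δ : C → D) (φ : A → C) (ψ : B → D)
  (hψ : ∀ x y : B, ψ (x - y) = ψ x - ψ y)    -- ψ respects subtraction
  (h₁ : ∀ a, γ (φ a) = ψ (α a))              -- the first square commutes
  (h₂ : ∀ a, δ (φ a) = ψ (β a)) :            -- the second square commutes
  ∀ a, γ (φ a) - δ (φ a) = ψ (α a - β a) :=  -- then so does the "difference" square
λ a, by rw [hψ, h₁, h₂]

It is a triviality, but it is a triviality which has to be stated and proved before Lean will let you move on.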

Conclusions.

What I would hence like to argue is that this lazy usage of = to mean “canonical isomorphism” actually blinds some mathematicians. Equality is such a powerful thing. If two canonically isomorphic objects are now deemed to be equal because they both obey some universal property, then all of a sudden we can manipulate them as if they are equal regardless of whether we are using the universal property or not. This is a very useful trick, because it enables us to cheat effectively without our peers noticing. By “cheat” here I mean that it enables us to leave out tedious arguments. No mathematician cares about these tedious arguments, as long as they do actually exist. In the case of schemes, they do exist. Of course they exist! If there was some fundamental problem with structure sheaves, people would have obviously spotted it decades ago. But what about all the other uses of = to mean canonical isomorphism? Are we really checking them? Would we definitely know if some subtlety came up?

Imagine the following situation. Say there are two objects X and Y in your argument. Say that it is well-known that they are canonically isomorphic. Now maybe there are some constructions that one can make easily using X. And maybe there are some other constructions that one can make easily using Y. The constructions do involve the hands-on construction of X and Y and in particular go “beyond” the functoriality or universality of the isomorphism X\to Y, however they “have the same feel to them”, so they are clearly “equal” — at least, they feel as “equal” as X and Y are “equal”. Wouldn’t it be easy just to pass off the assertion that these constructions are also “equal”, even though this cannot be formally deduced from the functorial properties that one knows which give rise to the isomorphism X\to Y? Could that happen in the literature or is this just abstract navel-gazing?

It absolutely happens! And here is a very important time where a mathematician very nearly got away with it! In Dick Gross’ seminal 1990 paper on companion forms (“A tameness criterion for Galois representations associated to modular forms (mod p)”, Duke Math Journal Vol 61 no 2, Oct 1990) Gross canonically identifies several cohomology theories of modular curves, and then claims without any justification whatsoever that the Hecke operators defined on these cohomology groups are also identified. The cohomology groups were known to be “canonically isomorphic”, however this was one of those “canonical isomorphisms” which was of the form “look at this neat map; now everything works, except it takes many pages to justify this”. The canonical isomorphism was not even known to be functorial or natural, it was just “canonical” in this vague sense. In particular it was not known to commute with correspondences and hence with Hecke operators, in 1990. The anonymous referee picked up on this subtlety, but rather than rejecting the paper they just demanded that Gross explicitly flag that there was a gap in the argument. The paper was still published, with a comment about this in the introduction. Four years later, with the gap still not filled, Wiles and Taylor-Wiles used the paper in their proof of Fermat’s Last Theorem. Nobody batted an eyelid. I asked Taylor about this in 1995 and he assured me that he had checked that in the application of Gross’ results to the FLT proof, one could get around this issue. Most other people did not care because the results were “obviously true anyway” — at least that was my impression. There are a few people out there who care though; Taylor is one, and another one of them is Brian Conrad. It was not until 2007, seventeen years after Gross’ paper was published, that Bryden Cais, a student of Brian’s, resolved these issues in his PhD thesis. Nobody ever worried about this issue — because everyone knew that the results were going to be true. But with Gross thinking about canonical isomorphism as equality, we almost missed that there was even an issue to be checked.

How many other times has this happened in mathematics, and how many times have we, the community, not noticed? The referee of Gross’ paper was rumoured to be Serre, an extremely careful mathematician. Not all of us are as careful as him. Sloppy use of = can cause trouble.


Equality part 2: syntactic equality

In equality part 1 I talked about the difference between equality-because-of-a-theorem and equality-by-definition, with the uncomfortable example of n+0=n and 0+n=n; when you look at the definition of addition, you find that the former is true by definition and the latter needs a proof (by induction on n).

But there is an even more picky version of equality, which is syntactic equality. Two terms are syntactically equal if, basically, you have to press the same buttons on your keyboard in the same order to make them. As a non-example, n + 0 and n, even though they’re equal by definition, are not syntactically equal. In fact a great example of a term syntactically equal to n + 0 is n + 0, and there aren’t too many more examples.

Actually, syntactic equality doesn’t have to be literally the same keypress. There are two things you can get away with. The first is notation, because notation is just dealt with by Lean’s parser and prettyprinter — it’s some extra layer between Lean and humans, made to make our lives a bit easier. So if n : ℕ then the easy-on-the-eye n + 0 will be syntactically equivalent to n + has_zero.zero ℕ which is what the notation means. And also, Lean is not too fussed about the names of bound variables — for example ∃ n : ℕ, n < 2 is syntactically equivalent to ∃ w : ℕ, w < 2. But that’s it. In fact, thinking about equality made me appreciate the difference between notation and definitions — for example the zero for the natural numbers, has_zero.zero ℕ, is defined to be nat.zero, but has_zero.zero ℕ and nat.zero are not syntactically equal, just definitionally equal.

Once I’d started getting the hang of definitional equality, and realised that in many cases Lean simply didn’t care if you changed a term to a definitionally equal term, I started to get the impression that definitional equality was “all that mattered”. If you were trying to prove an equality, you applied theorems until the terms were definitionally equal, and then proved the theorem by rfl. But in practice things were not so easy. The first thing that I learnt was that rw will not work with terms that are merely definitionally equal. If you’re in tactic mode with a hypothesis h : X = Y and your goal is X' = Z where X' is definitionally equal to X, then rw h will fail. You first have to use show X = Z to change the goal into a form in which X appears syntactically, and then the rewrite will succeed.
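Here is a toy example of the phenomenon (purely illustrative, not code from the library):

example (n m : ℕ) (h : n + 0 = m) : n = m :=
begin
  -- `rw h` fails here: the goal `n = m` is definitionally equal to `n + 0 = m`,
  -- but the pattern `n + 0` does not occur in it syntactically.
  show n + 0 = m,  -- restate the goal in a form where the pattern does occur
  rw h,            -- now the rewrite succeeds (and closes the goal)
end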

I had known about that rw behaviour for a while, but it was only when reading through data/complex/basic.lean in Lean’s maths library this week that it dawned on me that the same is true for simp. I have never really used simp much in my Lean code because I always felt that I hadn’t really got the hang of it. What made the penny drop was that I was developing the theory of complex numbers myself, given the real numbers, and proving that the complex numbers were a ring. How might this code look? Well, first you import the reals, and then define the complexes as being a pair of reals.

-- import the reals...
import data.real.basic

-- ...and now we can build the complex numbers as an ordered pair of reals.
structure complex : Type :=
(re : ℝ) (im : ℝ)

notation `ℂ` := complex

Now let’s think about what needs to be done. To prove the ring axioms for the complexes, e.g. a(b+c)=ab+ac, we need to prove that two complex numbers are equal. We can prove this by showing that their real parts and imaginary parts are equal. Checking that the real parts are equal just involves checking that some real number made by sums and products of real and imaginary parts of a,b,c equals some other number, and the ring tactic does this. So first we need to define things like addition, multiplication, negation, zero and 1, something like this:

definition add (z w : ℂ) : ℂ := ⟨z.re + w.re, z.im + w.im⟩

and so on, and then to check stuff like a(b+c)=ab+ac we will want Lean to prove (a * (b + c)).re = (a * b + a * c).re. Now the left hand side is definitionally equal to a.re * (b.re + c.re) + -(a.im * (b.im + c.im)) and the right hand side is definitionally equal to a.re * b.re + (a.re * c.re + (-(a.im * b.im) + -(a.im * c.im))), and if we can get the left and right hand sides into this expanded form then the ring tactic will immediately solve our goal.

But ring by itself will not prove (a * (b + c)).re = (a * b + a * c).re . Definitional equality is not enough. It is not ring‘s job to get our terms into exactly the right state! We need to use the fact that (z * w).re = z.re * w.re - z.im * w.im and so on; each of these lemmas is proved by rfl because the two sides are definitionally equal, however these lemmas need to be explicitly applied, and all in the right order, in order to turn our terms into the form that ring needs.

Now we could apply all the lemmas by hand with a big string of rewrites, but it would be easier to use simp here. The simp tactic is very good at figuring out what we want to do to our terms here; it will apply all the rewrites in a sane order. So we need to add results like this:

@[simp] lemma mul_re (a b : ℂ) : 
(a * b).re = a.re * b.re - a.im * b.im := rfl

tagged with the [simp] tag, proved with rfl, and telling Lean that every time simp sees the real part of a product, we want it to expand it out.
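There is one such lemma for each of re and im applied to each operation; for example we also want things like the following (these mirror the lemmas in mathlib, and with the obvious definitions of addition and multiplication they are all proved by rfl):

@[simp] lemma add_re (a b : ℂ) : (a + b).re = a.re + b.re := rfl
@[simp] lemma add_im (a b : ℂ) : (a + b).im = a.im + b.im := rfl
@[simp] lemma mul_im (a b : ℂ) : (a * b).im = a.re * b.im + a.im * b.re := rfl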

The first time I tried to get all this to work, I had some trouble, because when defining the ring structure on the complexes, I wrote neg := complex.neg. This innocuous line of code caused me all sorts of problems, and it was only after I cheated and looked at mathlib’s complex.lean that I realised I should have written neg := has_neg.neg. This was a bit subtle for me, so I’ll finish this post by trying to explain why this was an issue.

The point I’m trying to make is that sometimes syntactic equality really matters. simp and rw want to match terms using syntactic equality, or perhaps they will allow some slight leeway involving unfolding reducible definitions, but syntactic equality is the thing to focus on. This means that we need to think about a “canonical form” for everything we want to say. Should we write re a or a.re? Well, it doesn’t matter, I guess, but what does matter is that we are consistent. We need to make a choice and then stick to it. I was randomly mixing 0 and complex.zero and then finding that I needed two simp lemmas 0.re = 0 and zero.re = 0. We could use complex.add or has_add.add but really we should stick to exactly one (and has_add.add is definitely the best one!). I was using has_add.add but complex.neg because as far as I could see there was no difference at all between the two choices. Sure, they’re definitionally equal, but that doesn’t mean they’re the same.
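For example, if a separate complex.zero is floating around as well as the has_zero instance (say complex.zero is defined as ⟨0, 0⟩ and the instance as ⟨complex.zero⟩; these definitions are my guesses, not code from this post), then you find yourself wanting both of these definitionally trivial simp lemmas, which is a sure sign that a canonical form has not been fixed:

-- both statements are true by `rfl`, but simp needs both of them
@[simp] lemma zero_re : (0 : ℂ).re = 0 := rfl
@[simp] lemma zero_re' : complex.zero.re = 0 := rfl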

I got it all working really nicely in the end. My code for proving that the complexes are a ring, which is basically literally a subset of mathlib’s data/complex/basic.lean but with some added comments, is here in the xena project on github. The proof ultimately is very short:

-- and now let's prove all of the axioms of a commutative ring using
-- the same technique.
instance : comm_ring ℂ :=
by refine { zero := 0, add := (+), neg := has_neg.neg, one := 1, mul := (*), ..};
{ intros, apply ext; simp; ring }

ext is the assertion that two complex numbers are equal if their real and imaginary parts are equal; simp puts the real and imaginary parts into the canonical form we need, and ring then proves the equality we need.


Equality part 1: definitional equality.

I keep meaning to write something about equality in Lean, and on the tube home today I just managed to understand it a little bit better, so perhaps now is the time (although I won’t get to today’s revelation until part 2).

My post on building the natural numbers from first principles, written when I had been learning Lean for only a few months, is a popular post. I explain how the natural numbers are defined in Lean, and also how addition of natural numbers is defined. As mathematicians we do not think twice about the facts that 0 + n = n and n + 0 = n, but in Lean these two equalities have a different status, because of the definition of addition.

The definition of addition in Lean is “by induction on the second variable” — or more precisely by recursion on the second variable. Every natural number is either 0, or the “successor” S(n) of a previously-defined natural number. Lean defines x+0 to be x, and if we have defined x+n already, then Lean defines x+S(n) to be S(x+n). If you like, we can now prove by induction that x+y has been defined for all x and y, by induction on y. Lean’s notation for S(n) is succ n or more precisely nat.succ n .

I highlight “defines” in the preceding paragraph because we now see some examples of so-called “definitional equality”. The equality x+0=x for x a natural number is a definitional equality — it is true by definition. However the equality 0+x=x is certainly true, but it is not true by definition — it is true because of a theorem. Indeed, one proves 0 + x = x by induction on x, as is done fairly early on in the post linked to above.
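In Lean the two statements look almost identical, but the proofs do not. Here is the contrast, using the core lemma nat.add_succ (the definitional fact that n + succ m = succ (n + m)):

-- true by definition, so `rfl` closes it
example (n : ℕ) : n + 0 = n := rfl

-- true because of a theorem: `rfl` does not close it, but induction does
example (n : ℕ) : 0 + n = n :=
begin
  induction n with d hd,
  { refl },                    -- base case: 0 + 0 = 0 is true by definition
  { rw [nat.add_succ, hd] },   -- 0 + succ d = succ (0 + d) = succ d
end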

As a mathematician I never really had to distinguish between these two kinds of equality. However, when writing Lean code, it is a good idea to keep track of which equalities should be true by definition and which ones are true because of some extra content. One very simple reason for this is that if your goal is an equality which is true by definition, however complicated the definitions involved may be, there is every chance that Lean will be able to prove it using the refl tactic (or, in term mode, the rfl term). This might sound silly but it is absolutely not — several times in the past I found myself stuck trying to prove something which felt obvious to me but for which I could not find the right lemmas in Lean’s library to do the job, and then it was pointed out to me that refl would solve the goal.

Over the last few months I worked intensely on the perfectoid project and during this time I had to work with equivalence classes. I have a traditional model of equivalence classes in my head — if S is a set and \sim is an equivalence relation on S then the set Q of equivalence classes for \sim is, by definition, a set of subsets of S, equipped with the canonical map S\to Q sending s\in S to its equivalence class [s]. It was Patrick Massot who pointed out to me that in some sense this construction is misleading — one does not really need Q to be precisely this set, in fact the main thing one needs is the universal property for Q, namely that if f:S\to T is a function such that s_1\sim s_2\implies f(s_1)=f(s_2) then f descends to a function \overline{f}: Q\to T satisfying \overline{f}([s])=f(s) for all s\in S. Now all this is in Lean of course, but its implementation was something which I always found mysterious until recently, for the reasons I shall now explain.

The first, completely dumb, thing is that the descent \overline{f} of f is called the lift of f in Lean. This huge change of notation disoriented me for a long time. A related piece of notation: the universal property \overline{f}([s])=f(s) mentioned above has a name — it’s called “the computation principle” in Lean — and I had never heard it being called this before, which was an obstruction to my understanding when people were talking about this sort of thing on the Lean Zulip chat.

The second reason was that whilst Lean is completely capable of forming subsets of S and making them into a new type Q, Lean did not use this approach at all, using instead a new quotient axiom to make Q which is explicitly added to the system, and then a certain amount of fuss is made about it right at the end of Theorem Proving In Lean. For a long time I could not see the point of this quotient axiom at all; the only thing it seemed to do was to provide a second method for making quotient types which was not the standard way taught in undergraduate mathematics classes.

However, when actually working with equivalence relations and equivalence classes in a technical setting, the penny finally dropped: the computation principle is a definitional equality. In other words, the proof that \overline{f}([s])=f(s) is refl. This really does change your workflow in practice; indeed I was very excited on a few occasions that what looked like some very complicated goal involving quotients turned out to be provable with refl. To appreciate the power of this, imagine how one would go about defining \overline{f} using the traditional approach. We have some equivalence class C, which is non-empty by definition. We want to define \overline{f}(C). How do we do this? Well we apply the computer scientists’ axiom of choice, which gives us an element c\in C, and then we just return f(c). Now it is certainly a theorem that \overline{f}([s])=f(s), however the chances are that the c\in [s] that we chose was not s, and the fact that f(c)=f(s) is true because it is a theorem that f is constant on equivalence classes! In particular, \overline{f}([s])=f(s) would not be true by definition with this approach.
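Here is a toy worked example of the point (the relation and all the names are made up for illustration; the import is just to get the ℤ notation):

import data.int.basic

-- a toy equivalence relation: two integers are related iff they have the same parity
instance mod2_setoid : setoid ℤ :=
{ r := λ a b, a % 2 = b % 2,
  iseqv := ⟨λ a, rfl, λ a b h, h.symm, λ a b c h₁ h₂, h₁.trans h₂⟩ }

def Q : Type := quotient mod2_setoid

-- "descent" is called `lift`; the second argument is the proof that the function
-- being descended is constant on equivalence classes
def parity : Q → ℤ := quotient.lift (λ a : ℤ, a % 2) (λ a b h, h)

-- the computation principle is a definitional equality, so `rfl` proves it
example (a : ℤ) : parity ⟦a⟧ = a % 2 := rfl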

Definitional equality is a powerful tool. If x=y is true by definition, and h is a function, then h(x)=h(y) is also true by definition. If however x and y are only equal because of a theorem, then one is going to have to use Lean’s rewrite tactic with the theorem to prove that h(x)=h(y) and in particular one is going to have to do some work. Seeing a gigantic goal disappear completely with the proof being refl is very satisfying, and when I understood the difference between definitional equality and equality-because-of-a-theorem I realised that refl is a powerful tool for finishing proofs. In fact I have even claimed in the past that refl is my favourite tactic, although I guess on occasion I have to confess that ring can be pretty useful too…


Perfectoid spaces!

Patrick Massot, Johan Commelin and I finished our definition of a perfectoid space in Lean! Patrick and Johan are professional mathematicians like me, not Imperial undergraduates, but I am sufficiently excited about the project that I couldn’t stop myself blogging about it. The home page of our project is here (check out Patrick’s graph, representing how Lean thinks about perfectoid spaces!) and the source code is here. We ultimately wrote around 12,000 lines of code in the repository itself, and Patrick wrote much more stuff on topological rings (formalising a bunch of Bourbaki’s Topologie Générale) which is already in Lean’s maths library.

What does this all mean? Well, one way of thinking about it, following the theme of this blog, might be: my virtual undergraduate Xena, who did not even know the definition of the complex numbers 18 months ago, now knows some really complicated things! On the other hand, that’s not because my real undergraduates taught her about this. Perfectoid spaces were discovered/invented by Peter Scholze in 2011, and used by him to prove new cases of the weight-monodromy conjecture and Langlands’ conjectures. Scholze was awarded the Fields Medal, one of the highest honours in mathematics, for this work.

But in my mind, the important thing is this. We all know, and indeed computer scientists have known for decades, that computer proof assistants like Lean are capable of working with undergraduate-level mathematics; indeed in my previous post I link to Abhimanyu Pallavi Sudhir’s solutions to last May’s M1F exam (the 1st year “introduction to proof” course which I teach at Imperial), and over the last 18 months I have blogged about many other achievements of Imperial’s undergraduates, and in particular the work of Chris Hughes and Kenny Lau formalising many proofs of results on our undergraduate curriculum and beyond. Ramon Mir’s MSc thesis formalised schemes in Lean in a robust way, and these are MSc level objects. But what I did not really feel I knew, until now, was whether Lean was capable of handling genuinely complex objects which are of actual interest to research mathematicians. Now we know that it is. In particular, I now see no theoretical obstruction stopping Tom Hales’ Formal Abstracts project from succeeding. In the long term, I believe that projects such as Hales’ have a real chance of changing the way humans do mathematics, and some chance of ultimately changing the way computers do mathematics.

It would still require many person-years of work to formally verify, for example, Scholze’s tilting correspondence for perfectoid spaces in Lean. But the crucial thing is that it would require far less work to formally state the correspondence. Here in 2019, formalising deep human proofs in computer proof verification systems is still a hugely non-trivial problem, which computers cannot do and which takes humans a very long time to do. But now I am sure that we are in a position where we can start telling computers the statements of precisely what we humans believe that we have achieved in pure mathematics. What computers can do with this information — well, that’s a question for the computer scientists. Given that so much effort has been put into search algorithms over the last 10 years, I should think that this would be a good place to start. I remember as a PhD student paging through a book on algebraic spaces desperately hoping to spot facts which would enable me to progress with my thesis. Can this process be automated in such a way as to be practical and useful to the 21st century PhD student?

The success of Hales’ project will involve a culture change amongst mathematicians. Ultimately it will rely on mathematicians formalising statements of their theorems, once their papers are accepted, and uploading the formalised statements to Hales’ database. We are not ready for this now, for several reasons. Firstly, we do not have enough mathematical objects formalised in Lean (we still don’t have manifolds [written May 2019]! But we will do soon, they are certainly on the way). And secondly, most mathematicians do not know the appropriate formal language which Lean understands. If my project to educate undergraduates to speak this language succeeds, and if Imperial students and others continue to get on board and formalise more and more mathematical objects, then I believe that progress will be inevitable. Imperial’s new mathematics curriculum launches in October and I will be pushing hard to digitise all the pure mathematics courses — definitions, theorems, proofs, example sheets, everything. After 25 years of teaching mathematics to undergraduates in the standard way, I am both surprised and excited to now be doing things in a completely different and new way.


M1F, Imperial undergraduates, and Lean

M1F is the “introduction to proof” course which I have taught for the last few years at Imperial College London. When I started this blog just over one and a half years ago, one of my goals was to see whether Lean could handle the solutions to an M1F exam. At the time it was pretty clear that it wasn’t ready — a random sample of M1F exams showed that most of them mentioned either the complex numbers, or functions such as the exponential, sine and cosine function. None of these were in Lean at the time.

I wrote some basic complex number code myself, and PR’ed it to the maths library; it was my first serious PR. A team led by Chris Hughes (at the time a first year) did sines and cosines and the exponential function. And now the inevitable has occurred: Abhimanyu Pallavi Sudhir, a current first year, has formalised the solutions to last year’s M1F final exam! Many thanks to Abhi — I was pretty sure this sort of thing should be accessible, but this is the proof. Apologies to Abhi for taking so long to get around to looking at his work. I believe that having concrete goals like this is a good way of driving Lean’s maths library in the direction which I want it to go in — formalising mathematics which is actually used by mathematicians. One reason I took so long to look at it was that I have been very focussed on the perfectoid project recently — but more of that in another post.

Funnily enough, one of the things in the exam which was the trickiest to do in Lean was the proof that 0.71 has no 8’s in its decimal expansion! Lean still does not have a theory of decimal expansions, although Calle Sönne and I will eventually get around to writing one (after exams). In the mean time, I wrote a definition from first principles which does the trick. Another irritating issue is the question about a set with two elements, which a mathematician can start with “WLOG the set is \{0,1\}“. I believe you — but this actually takes a few lines to justify in Lean. Something is missing here — a transport tactic of some kind. We’ll get there one day — the computer scientists are on the case.

20 months ago, formalising the solutions to an M1F exam seemed like a dream away. I’ve already mentioned three undergraduates who were involved in one way or another in realising this dream. All this made me start wondering about which Imperial undergraduates have contributed to Lean’s maths library, and what they have done. At the top we have Chris Hughes and Kenny Lau, who have made many many contributions. Kenny made a robust theory of localisations of commutative rings, essential for our work on schemes, and Chris proved the law of quadratic reciprocity and the fundamental theorem of algebra, amongst many other achievements. Both Chris and Kenny wrote their first year projects in Lean, and Chris is now a maintainer for Lean’s maths library! But here are a whole bunch of other Imperial undergraduates who have managed to get namechecked in Lean’s maths library as well:

  • Ellen Arlt (matrices)
  • Sangwoo Jo (number theory)
  • Guy Leroy (number theory)
  • Jean Lo (analysis, irrationals, operator norms)
  • Rohan Mitta (topology)
  • Blair Shi (matrices)
  • Abhimanyu Pallavi Sudhir (irrationals, analysis, hyperreals)
  • Calle Sönne (analysis, irrationals)
  • Andreas Swerdlow (algebra)

I must also mention Guillermo Barajas Ayuso, Luca Gerolla, Clara List, Amelia Livingston, Elliott Macneil, Ramon Mir and Ali Sever, all of whom wrote a substantial amount of Lean code which they did not PR; Clara on primes of the form x^2+ny^2 (completely solving the problem for n\leq 3), Luca on fundamental groups, Guillermo on analysis, Elliott on group theory and number theory, Amelia on group theory and Ali on Tarski’s formulation of 2d geometry. Ramon Fernandez Mir has written his MSci project in Lean! And last but not least, Keji Neri, who did a bunch of algebra over the summer with the Imperial summer students, even though he is a Cambridge undergraduate — the only one to have contributed to Lean’s maths library as far as I know. Oxford, Warwick, UCL, Bristol — where are you? Lean’s maths library is ready to digitise a huge chunk of your curriculum. Let’s do it — it will scare the grown-ups.

Finally, some news on Athina Thoma. Athina is the post-doc who has been observing the Imperial undergraduates and trying to figure out whether Lean helps them to learn proofs better. First great piece of news: she has had her contract renewed for another six months! Second great piece of news: slowly she and Paola Iannone are beginning to make sense of all the data they collected. I await with interest the paper they are writing, and the conclusions they will draw.


Happy pi day

Yesterday was pi day, and discussion of Lean’s support of pi came up on the chat. The discussion of pi is I think a great way of illustrating what Lean can and cannot do.

To start with, “Lean has pi”. Here it is in the source code.

lemma exists_cos_eq_zero : ∃ x, 1 ≤ x ∧ x ≤ 2 ∧ cos x = 0 :=
real.intermediate_value'
  (λ x _ _, continuous_iff_continuous_at.1 continuous_cos _)
  (le_of_lt cos_one_pos)
  (le_of_lt cos_two_neg)
  (by norm_num)

noncomputable def pi : ℝ := 2 * classical.some exists_cos_eq_zero

If you have ever thought about how pi might be defined formally, then maybe you can see what is going on here. The lemma cos_one_pos is the lemma that cos(1)>0 and the lemma cos_two_neg is the lemma that cos(2)<0. Cosine is defined as a power series and these are grotty power series computations. Now using the fact that cosine is continuous, and the intermediate value theorem, we can prove that cos(x) has a zero between x=1 and x=2. We define \pi to be two times an arbitrary zero of the cosine function between 1 and 2 (we choose a random one, using the computer science version of the axiom of choice, which says that you can choose an element from a non-empty set — just don’t get me started on this). The library then goes on to prove that in fact the cosine function has a unique zero in this range (because the cosine function is decreasing in this range) and now we have an unambiguous definition of \pi in Lean.

One has to contrast this approach with an approach taken by, say, a hand calculator, or a programming language like python or C++. In these languages, a value for pi will either be hard coded, or there will be an algorithm implemented which will generate pi to an arbitrary precision in as quick a time as possible. So, for example, a calculator can typically produce the value of \pi to ten decimal places (and a computer to far more if necessary), and then one could compute the value of \sin(\pi) to ten decimal places (or to far more) and see that it is 0.000000\dots.

But Lean is different. Lean is not a calculator. Lean is a theorem prover. It is true by definition that \cos(\pi/2)=0 in Lean, and it is a theorem in Lean that \sin(2\theta)=2\sin(\theta)\cos(\theta) and hence it is a theorem in Lean that \sin(\pi)=0. Note however that Lean has not done any numerical computation to prove this, it has deduced it from theorems and definitions. For this kind of question, one could regard Lean as performing better than the traditional “computational” model of a computer that we have — a computer can check that \sin(\pi) is zero to 10 or 1000000 decimal places, and the more places you want, the longer it would take. But it cannot prove the answer is zero. Lean can prove that \sin(\pi)=0 with no numerical computation at all.

But what if one did want to do a numerical computation of \pi? Well, this is the flip side of the story. Lean is not designed for this sort of question at all! It is a computer program which “knows \pi“, but actually what does it know about the value of \pi? Well, by definition, \pi is twice a number between 1 and 2, so we get

Theorem. 2<\pi<4.

So as you can see, in this regard Lean is lagging a bit behind a hand calculator. It seemed like pi day was a good day to make things better.

And by the end of it, things had got about an order of magnitude better! Thanks to the efforts of Floris van Doorn, with some help from others, we now have

Theorem. 3.14<\pi<3.15.

Ongoing work by Kenny Lau might give us even more decimal places within the next day or so. But how might one go about proving such a theorem, given only a bunch of standard theorems about sine and cosine, and the facts that \cos(\pi/2)=0 and 2<\pi<4? Floris used the following trick:

\cos(x)^2=\frac{1+\cos(2x)}{2}.

Using this standard fact, and the fact that \cos(\pi/2)=0, he could deduce exact values for \cos(\pi/4),\ \cos(\pi/8),\ \cos(\pi/16),\ldots. Because \sin(x)^2+\cos(x)^2=1 he could deduce exact values for \sin(\pi/4),\ \sin(\pi/8),\ \sin(\pi/16),\ldots. He could also prove that if x>0 then \sin(x)<x (by using the power series for x\le 1 and the fact that \sin(x)\le 1 when x>1) and as a consequence could prove lower bounds for \pi! By using two terms of the power series for sine we can also prove that \sin(x)>x-x^3/6 for 0<x<1 and this gives us upper bounds.
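Putting this together: if s_k denotes the exact algebraic expression obtained for \sin(\pi/2^k), then the two power series estimates sandwich \pi roughly as follows (this is just the shape of the argument, not a quote from the code):

2^k s_k < \pi < 2^k s_k + \frac{\pi^3}{6\cdot 4^k} < 2^k s_k + \frac{32}{3\cdot 4^k},

where the last inequality uses the crude bound \pi<4; so each extra halving of the angle roughly quarters the error term.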

Taking a random rational number and then squaring it, doubling it and subtracting one (which is what one has to do to get from \cos(x) to \cos(2x)) can get a bit hairy once one has done it a few times. Lean had to prove intermediate goals such as

(2 : ℝ) ≤ ((((2 - (157 / 50 / 2 ^ (4 + 1)) ^ 2) ^ 2 - 2) ^ 2 - 2) ^ 2 - 2) ^ 2

and these are real numbers, which in Lean are big complicated equivalence classes of Cauchy sequences, so this looked a bit daunting at first — another reminder that Lean is designed to do proofs, not numerical computations. The norm_num algorithm strained a bit with these checks, however Mario Carneiro pointed out that with judicious approximations one can make life easier: the trick is to constantly replace fractions with large numerators and denominators with approximately equal fractions with smaller numerators and denominators before you square again. These dirty hacks got us through in the end.

The festivities are documented on the pi day thread at the Zulip chat (login required) or the archive of the Zulip chat (no login required), and Floris’ code is here (permalink) or get it at https://github.com/fpvandoorn/mathlib/blob/pi/src/data/real/pi.lean .

Using the decimal expansions library currently being developed by Calle Sönne and myself at Imperial, we can now prove that the first two digits of \pi after the decimal point are 1 and 4. We still have some way to go before we can catch up with Emma Haruka Iwao, but what if her code contains bugs? We have proofs 🙂

In fact in 2017, Bertot, Rideau and Théry did much better — they used Coq to get a million decimal places of pi, formally verified. Doing this work today made me understand far better what an achievement that was.


2+2=4

I am interested in teaching and outreach, and often get asked to give talks to schoolkids. I gave one today, to a room full of year 10 kids (aged 14-15), and I showed them Lean. This was the youngest group of students I’ve ever attempted to show Lean to. I decided to show them a proof that 2 + 2 = 4, starting from definitions of 2 and +. However by year 10 the students have not seen induction yet, so I started by showing them parts of a YouTube video with stuff in a line falling over (e.g. dominoes, humans). I then defined the naturals (I called them “numbers”) as an inductive type in Lean, showed how to define recursively the function which adds two to a number, and finally we proved 2 + 2 = 4 by unwrapping the definitions of 2 and + and so on.

I really didn’t know how it was going to go. But, to my relief, when we were doing the big calculation, I asked the audience to fill in the next line for all seven lines, and these kids managed to do it! I was very pleased.

Here are the pdf slides. In the actual talk the text appeared slowly (there were beamer pauses) and there were times when I spent a minute or two talking to the kids about what was going on.

On page 12 of the pdf I say “Let’s go and make some numbers using Lean” and this is what I coded up live. I finish by trying to formalise the statement that 2+2=4 but there’s a problem because although I’ve defined two and four I’ve not yet defined addition. We then go back to the pdf, and on page 15 I return to Lean and define addition like this. Then comes the moment of truth: the calculation on line 16 (which appears line by line in the actual presentation) and which I got them to do. Having made it through, I then showed how Lean could prove it with rfl, and claimed that this was because even though we’d only just taught it numbers it was already an expert in numbers. I then asked the audience what other kinds of maths did they think it was possible to teach Lean and immediately one kid said “all of it”. I agreed. I then made some provocative statements about why people like my daughter should not be forced to learn about cosines, and then asked if there were any questions. The first one was “How do I learn more about this stuff?”. But I did feel that I’d lost plenty of people in the room.
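For anyone curious, the calculation goes along these lines (a sketch with made-up names, not the exact code from the talk):

-- "numbers" defined from scratch, as in the talk (names invented here)
inductive number : Type
| zero : number
| succ : number → number

def add : number → number → number
| x number.zero     := x                      -- x + 0 is defined to be x
| x (number.succ n) := number.succ (add x n)  -- x + S(n) is defined to be S(x + n)

def two  : number := number.succ (number.succ number.zero)
def four : number := number.succ (number.succ two)

-- the calculation the audience worked through, one definitional step at a time
example : add two two = four :=
calc add two two
        = number.succ (add two (number.succ number.zero)) : rfl
    ... = number.succ (number.succ (add two number.zero)) : rfl
    ... = number.succ (number.succ two)                   : rfl
    ... = four                                            : rfl

-- and, as in the talk, the whole thing is also just `rfl`
example : add two two = four := rfl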

I think that age 14-15 is a little too young for this talk. These kids had self-selected — they were at Imperial for an open day (during a UK school holiday) so were probably interested in science. But I wonder if the material presented here would be a little more suited to kids a little older. I’m giving a couple more school talks in the future, so maybe I’ll find out soon enough.
