I hadn’t planned on writing a third equality post, but my talk at Big Proof 2019 (see slides here) ended up having a huge section about “mathematician’s equality”, and I realised I had more to say about it. Here are some more detailed thoughts about the issue.
Examples of canonical isomorphisms.
Canonical isomorphism is a magic kind of equality, apparently accessible to mathematicians who are “thinking about things in the right way”. It has several definitions; it has no definition. However, we know it when we see it, we are almost universally in agreement on the notion, and hence we don’t seem to really challenge each other on it. The isomorphism between a finite dimensional vector space and its double dual is canonical — we all know that. The class field theory isomorphism is canonical (at least up to sign) — we know that too. Even the Langlands correspondence is supposed to be canonical — it is for that reason that we believe in the Langlands group. On a more mundane note, is canonically isomorphic to . Do we even stop to think what these things mean? Formalisation has forced me to think about one of these, and it led me to a surprising conclusion.
Wikipedia defines a canonical map as — well, why don’t you just take a look. There is no formal definition there at all. The page is tagged with the “mathematical terminology” tag but one could argue that this is stretching the point, unless you regard “the map which preserves the widest amount of structure” or “conventionally agreed upon to be the most useful for further analysis” to have some sort of mathematical meaning.
But what am I saying here? Is there no space in mathematics for guiding principles, and not-completely-formed ideas about how things should fit together? Can we not experiment with concepts and worry about dotting the i’s and crossing the t’s later? Of course we can do these things! That is exactly the art of mathematics! In fact, it’s more than just allowed — it is essential that mathematicians are allowed to be creative, and come up with brilliant ideas of maps between apparently unrelated objects such as Galois representations and automorphic representations, and then suggest that these constructions are in some way canonical and begin to find out exactly what other observations such an idea might lead us to. This way of thinking is essential for mathematics to proceed.
One thing I understood recently is that there are two independent subtleties here, so in this post I will separate them out. The first is well-known, the second I think less so.
Issue 1: Mission creep.
The first issue is the many contexts where the word “canonical” is used. Sometimes people use it in a situation where it can be easily rigorously formalised. In category theory there is a perfectly rigorous definition of a natural isomorphism. For example, in the category of finite dimensional complex vector spaces, we could consider the functor sending a vector space to itself, and we could consider a second functor sending a vector space to its double dual. There is a natural isomorphism between these two functors, sending a vector space to its double dual. One can look up the definition of natural isomorphism on Wikipedia; it is the statement that a precise collection of diagrams all commute. People say that the obvious map from a vector space to its double dual is “canonical” but they could instead say that it is a natural transformation. Another example would be where two objects both satisfy a certain universal property. Then a piece of abstract nonsense proves that there are are unique isomorphisms in both directions between these two objects, again both making a bunch of diagrams commute. Again a mathematician might say that isomorphisms arising in this way are “canonical”, and again there is an equivalent category-theoretic language which they could use instead to say precisely what they mean.
However there are times in mathematics where the word is being used beyond this remit. Some number theorists would say that the isomorphisms of global class field theory are “canonical”. This is, I believe, a different usage of the word. There might not be some formal category-theoretic framework which one can hide behind here (although people attempting to categorify the Langlands program might well believe that such a framework exists). What is going on here is that Artin made some construction, an isomorphism between two abelian groups attached to a number field (one using adeles and one using Galois groups), and this map he constructed obeyed a nice long list of properties which have been explicitly written down (for example the behaviour of the map when you change from one field to a finite extension), and it is not hard to show that that there can be at most one construction with these properties. Such a construction does exist (a hard theorem), and this construction then gets the name “canonical”, which here seems to be a specialised and different use of the word. Note however that Artin’s construction gives an isomorphism between two abelian groups, and one can compose this isomorphism with the map sending an element of an abelian group to its inverse, giving us a second canonical isomorphism between these two groups, which satisfies a similar-but-not-exactly-the-same nice long list of properties, arguably indistinguishable from the first list in the sense that it is not clear (at least to me) which list is “better”. However the two lists are distinguishable in the sense that they are different, and can be told apart. In this situation it is hence difficult to say which of the constructions is the “most canonical”. The constructions have got different names, and both are used in the literature. One is called “class field theory normalised the classical way, sending uniformisers to arithmetic Frobenii” and the other is called “class field theory normalised the Deligne/Langlands/Carayol way, sending uniformisers to geometric Frobenii”. A geometric Frobenius is simply the inverse of an arithmetic Frobenius. This example is quite jarring, read within the context of Wikipedia’s definition of a canonical map at least, because both maps are useful (the classical normalisation when doing concrete things like Heegner points, the more modern normalisation when doing the cohomology of Shimura varieties), and there is no particular agreement within the number theory community as to which choice is “best” — it simply depends on what you are doing.
Andre Weil once jokingly wrote “I can assure you, at any rate, that my intentions are honourable and my results invariant, probably canonical, perhaps even functorial”, perhaps parodying the usage of the word in mathematics at that time (the 1950s). I think this sums up rather well the way the word is sometimes used nowadays.
Issue 2: canonicality beyond the universal property.
There is however a second issue, which I began thinking about when preparing my Big Proof talk, and this is when mathematicians see an isomorphism which is “canonical” in a sense which can definitely be made completely precise, and then decides that the isomorphism is so completely canonically canonical that it will not cause any problem to simply use the notation for it. J. S. Milne, in his book “Etale cohomology“, even explains in his notational conventions that canonical isomorphisms will be denoted by . [What, all of them? Even the two distinct canonical isomorphisms of class field theory? It is not hard to come up with a very plausible-looking proof that for in the abelianisation of the absolute Galois group of the rationals using this convention; of course, this is not what Milne means to say (although it would be interesting to hear his definition of “canonical” in this context, if indeed he had one)]. Milne is of course a professional mathematician who knows what he is talking about, and so uses this convention sensibly. He is not the only one. Even Grothendieck adopts this convention in EGA (EGA I, section 1.3.3), although he does not seem to be so explicit about it. The context here is that if is a ring, and and are two elements of with the property that a prime ideal of contains if and only if it contains (for example we could have or ), then one can check that there are unique -algebra isomorphisms and whose composite in both directions is the identity. Grothendieck himself uses the French word “canoniquement” in the link above when talking about this identification, and five lines later is using as its notation, when claiming , a claim of the same nature as the claim . If is an integral domain then one can build and as explicit subsets of the field of fractions of and then they really are equal subsets. However in the general case one does not have this luxury. What does it even mean to say ? Grothendieck explicitly wrote . What does this even mean? They are not equal in the formal sense at all, as we are about to see — but they are most definitely canonically isomorphic — any mathematician who knows the area would tell you that.
This is a serious issue for formalisers. This notion of equality here is most definitely not definitional equality or syntactic equality. In dependent type theory one could attempt to interpret it an equality of two types, however in this context it is not really reasonable to argue that the types are equal. Anyone who has seen the usual construction of knows that it is equivalence classes of pairs with and , where is initially interpreted informally as , even though at this point these symbols make no sense. The equivalence relation is then defined, informed via this interpretation, and then the informal interpretation is forgotten about, and a ring structure is put on the equivalence classes. The ring is proved to have a universal property, namely that any ring homomorphism sending to a unit extends uniquely to an -algebra homomorphism (did I mention that all rings are commutative and have a 1, and all morphisms send 1 to 1?). This is the universal property. In this optic, the two equivalence relations on really are not equal in general, and it’s quite messy writing down the canonical map on equivalence classes here; it’s much easier to prove a theorem once and for all saying that is invertible in and vice versa, and then we can use the universal property to define the maps which give us the canonical isomorphisms.
Here is my objection to this usage. The moment two mathematical objects are equal, it is apparently possible to replace one by the other in any argument whatsoever, without giving any justification. Here is an explicit example, which bit me hard a year or so ago. It is a standard argument in basic graduate level algebraic geometry which I have now read many treatments of, and I am not really satisfied with any of them. For simplicity, I will talk about the treatment in the stacks project, because it is the easiest to cite. In tag 01HR, it says (a little before definition 25.5.3) “Thus we may apply Algebra, Lemma 10.23.1 to the module over and the elements ”. But actually this is not strictly speaking possible, from a formalist perspective. Bear with me here, this is not a picky pedantic point, there is a little more to it than that. The lemma being cited is in tag 00EK, which is some statement about an exact sequence of modules. There are unfortunately two modules here, one in 01HR and one in 00EK, and they’re not the same. For simplicity let’s assume that in 01HR, which then means that in 00EK. To make things even simpler, let’s assume that the elements are actually elements of , and let’s call them to coincide with the notation in 01HR, so using the usual notation for open sets attached to elements of . Even with these simplifications, there is still a subtle, almost invisible, problem, which formalisation is about to show us.
The problem is now that 00EK is a statement about an exact sequence relating , and . We now want to apply this to prove a result relating , and . Now because of the assumptions we know that and are canonically isomorphic. If we now just assume that they are equal, and we furthermore just assume that the maps are equal to the maps (i.e. the diagrams all commute), then there is no problem at all! This is precisely what the stacks project is doing, it is what Grothendieck does in EGA, it is what Hartshorne does, it is what everyone does.
The authors here are smart people, so let’s step back and just check that their intuition is correct. Of course, any one of these authors, when challenged, will say that they know that is not literally equal to , however both maps satisfy the same universal property, and so they are canonically isomorphic. So now we have a diagram — a sequence we’ve shown to be exact, and a sequence of canonically isomorphic objects which we want to show is exact. So now we draw in all the canonical isomorphisms and now we just have to prove that all the resulting squares commute. But all the isomorphisms are natural, coming from a universal property, so all the diagrams will commute because of this, and everything will work out fine. Right?
But guess what — the map in 00EK is not a morphism of rings. It cannot be defined via the universal property directly. The universal property really does not tell us immediately that the diagram commutes! We are kind of cheating here; there is a diagram, we can’t see it because of this trick of using to mean an isomorphism, the isomorphism is natural, but the naturality does not immediately imply that the diagram we need to commute, commutes. We have slightly over-stepped the mark. We need two applications of the universal property and then we need to take their difference and observe something of the following form: if two squares involving four abelian groups commute, and the vertical maps are the same in each square, then subtracting the horizontal maps gives rise to a a third diagram which also commutes. There is some little missing argument here. I never spotted this argument until I tried to formalise it. Once I’d realised that from a formalist point of view 00EK was actually too weak to use in 01HR, I had to then prove a bunch of lemmas showing that 00EK (whose proof was a lot of work, done by Chris Hughes) plus a horrible diagram chase whose proof is omitted here but which could not be omitted in Lean, was enough to deduce 01HR. It concerned me that at some point I had to go “beyond the universal property”. Not only that, it made this part of the code base much harder to follow. I had got us into this mess by telling Chris Hughes to prove 00EK without checking beforehand that it would suffice in applications — I naturally assumed that the literature didn’t have subtle holes in. Ramon Fernandez Mir has rewritten the entire code base from scratch, and did not prove 00EK by hand but instead proved a more invariant version, using rings isomorphic to etc instead of the actual rings themselves. Instead of using the universal property (which might have caused issues with universes) he used a predicate introduced by Strickland which characterises precisely when a ring homomorphism makes into an -algebra isomorphic to as an -algebra. Everything works much more nicely — but we do not prove 00EK as it stands; we prove something slightly more general. We prove the thing that we actually need.
What I would hence like to argue is that this lazy usage of to mean “canonical isomorphism” actually blinds some mathematicians. Equality is such a powerful thing. If two canonically isomorphic objects are now deemed to be equal because they both obey some universal property, then all of a sudden we can manipulate them as if they are equal regardless of whether we are using the universal property or not. This is a very useful trick, because it enables us to cheat effectively without our peers noticing. By “cheat” here I mean that it enables us to leave out tedious arguments. No mathematician cares about these tedious arguments, as long as they do actually exist. In the case of schemes, they do exist. Of course they exist! If there was some fundamental problem with structure sheaves, people would have obviously spotted it decades ago. But what about all the other uses of to mean canonical isomorphism? Are we really checking them? Would we definitely know if some subtlety came up?
Imagine the following situation. Say there are two objects and in your argument. Say that it is well-known that they are canonically isomorphic. Now maybe there are some constructions that one can make easily using . And maybe there are some other constructions that one can make easily using . The constructions do involve the hands-on construction of and and in particular go “beyond” the functoriality or universality of the isomorphism , however they “have the same feel to them”, so they are clearly “equal” — at least, they feel as “equal” as and are “equal”. Wouldn’t it be easy just to pass off the assertion that these constructions are also “equal”, even though this cannot be formally deduced from the functorial properties that one knows which give rise to the isomorphism ? Could that happen in the literature or is this just abstract navel-gazing?
It absolutely happens! And here is a very important time where a mathematician very nearly got away with it! In Dick Gross’ seminal 1990 paper on companion forms (“A tameness criterion for Galois representations associated to modular forms (mod p)”, Duke Math Journal Vol 61 no 2, Oct 1990) Gross canonically identifies several cohomology theories of modular curves, and then claims without any justification whatsoever that the Hecke operators defined on these cohomology groups are also identified. The cohomology groups were known to be “canonically isomorphic”, however this was one of those “canonical isomorphisms” which was of the form “look at this neat map; now everything works, except it takes many pages to justify this”. The canonical isomorphism was not even known to be functorial or natural, it was just “canonical” in this vague sense. In particular it was not known to commute with correspondences and hence with Hecke operators, in 1990. The anonymous referee picked up on this subtlety, but rather than rejecting the paper they just demanded that Gross explicitly flag that there was a gap in the argument. The paper was still published, with a comment about this in the introduction. Four years later, with the gap still not filled, Wiles and Taylor-Wiles used the paper in their proof of Fermat’s Last Theorem. Nobody batted an eyelid. I asked Taylor about this in 1995 and he assured me that he had checked that in the application of Gross’ results to the FLT proof, one could get around this issue. Most other people did not care becuase the results were “obviously true anyway” — at least that was my impression. There are a few people out there who care though; Taylor is one, and another one of them is Brian Conrad. It was not until 2007, seventeen years after Gross’ paper was published, that Bryden Cais, a student of Brian’s, resolved these issues in his PhD thesis. Nobody ever worried about this issue — because everyone knew that the results were going to be true. But with Gross thinking about canonical isomorphism as equality, we almost missed that there was even an issue to be checked.
How many other times has this happened in mathematics, and how many times have we, the community, not noticed? The referee of Gross’ paper was rumoured to be Serre, an extremely careful mathematician. Not all of us are as careful as him. Sloppy use of can cause trouble.