IMO 2019 Q1

Someone sent me a link to this year’s IMO questions this afternoon at work, and the first one caught my eye because it looked like a relatively straightforward functional equation question: find all f : \mathbb{Z}\to\mathbb{Z} such that f(2a)+2f(b)=f(f(a+b)) for all integers a and b. I’ve seen enough of these in my time to know how to go about them, and it didn’t take me long to find a solution on paper. I then had to go to a meeting, but halfway through, when I didn’t have anything else to contribute, I idly started formalising my solution in Lean.
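Before going on, here is a quick sanity check, added for this write-up rather than taken from my solution: one of the families of functions which turns out to matter, f(x)=2x+c, really does satisfy the equation, and the ring tactic mentioned below dispatches the verification in one line.

import tactic

-- A sanity check (mine, added for this post, not part of the linked solution):
-- if f(x) = 2x + c then f(2a) + 2f(b) = f(f(a+b)), with both sides written out.
example (c a b : ℤ) :
  (2 * (2 * a) + c) + 2 * (2 * b + c) = 2 * (2 * (a + b) + c) + c :=
by ring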

Now formalisation is hard, so I knew I would end up getting bogged down in details, and things which looked straightforward on paper would end up messy. At some point someone divides by 2, which looks a bit problematic in the integers, and then there’s some integer induction — I kept waiting to get to the place where it was all going to go horribly pear-shaped.

But that place never came. Powerful tactics like convert and ring, the conv mode, and a sensible induction principle for the integers meant that the Lean proof flowed just as the paper proof had done. Here it is! It probably took more than half an hour, but under an hour, to finish. Of course the question is not too hard — but that’s not the point.

The point is this. Before I got into the formalisation business, it was “obvious” to me that formalising questions like this should be easy. When I actually started, it became very clear to me very quickly that formalising IMO questions would be very hard, because formalising anything with content is hard. But the nice surprise is that actually, once one has a grip on formalisation, if one is working in an area which is “sufficiently close to the axioms” like elementary number theory or algebra, maybe nowadays in Lean it really is becoming as easy as I always “knew” it would be.


Lean does not (yet) have real manifolds.

There are loads of computer theorem provers. There is Lean, there is Coq, there is Agda, Idris, Isabelle-HOL, HOL-Light, UniMath, Mizar and many others. Which one is the best one for formalising a proof of Fermat’s Last Theorem?

I’ve spent the last few months having a good look at all of these systems and I think that out of all the ones which are currently on offer to us (the mathematical community) the one which offers most hope is Lean. I checked Lean’s suitability by defining perfectoid spaces in Lean 3 with Patrick Massot and Johan Commelin, two other mathematicians — and it worked fine. We ran into a lot of problems along the way, but in essentially every situation the problem we were in was of our own making, and we were rescued by computer scientists who know nothing of perfectoid spaces but who are on the other hand very well equipped to answer extremely specific technical questions and give us extremely specialised advice about how best to turn mathematics into functional programming in practice.

But we still don’t have smooth real manifolds. I was initially perplexed by why things were going so slowly; I saw a definition from first principles as an undergraduate. But now I understand the reason, and it’s a bit bizarre. The reason is that a general principle in Lean is that one should set things up in as much generality as possible, for certain efficiency reasons which are of no interest to mathematicians. A more precise explanation is of course the extremely specialised advice offered to us by the computer scientists in this situation. This has the unfortunate consequence that you can’t just wait for an Imperial undergraduate to do it 😦 One apparently really needs to think about differentiating maps between some quite general real topological vector spaces, at least some of which can be infinite-dimensional. This kind of stinks, but that’s the way it is with dependent type theory.

Important: I am not saying that real manifolds can’t or won’t be done in Lean. I am just saying that we know what to do but it is taking a long time and it is more technical than one might guess at first. We’ll get there.

Thank you to (in alphabetical order) Jeremy Avigad, Sebastien Gouezel, Patrick Massot, Zhouhang Zhou and all the others who are making steady progress. We have recently reached a milestone — we have defined the Bochner Integral. I don’t know what that is, although when thinking about it in Lean I ended up so confused that I had to confess on mathoverflow that I didn’t even really understand the real numbers.


A computer-generated proof that nobody understands

Very early on when I was reading about this formal proof verification game, I learnt about the Robbins Conjecture. Let me start this post with an overview of what this conjecture is, and how it was proved by a computer, and how the proof is easy for humans to understand. Then let me tell you about Bruck Loops, where the situation is quite different, and finally let me tell you about fancier mathematical objects and why things aren’t so easy in that case.

Robbins algebras

There is something in maths called a Boolean Algebra — it’s just a definition, like a group or a topological space. Feel free to look at the formal definition on the Wikipedia page for Boolean algebras. In the 1930s Herbert Robbins defined some other mathematical structure now called a Robbins Algebra, and conjectured that every Robbins Algebra was a Boolean Algebra. Proving this conjecture is like solving a big logic puzzle; we have some set with some structure and a load of axioms (a Robbins algebra), and we want to prove some other statements about this structure (i.e., the axioms for a Boolean algebra).

The conjecture remained open for around 60 years, possibly because it was tricky and possibly because there were not too many people working on it. Then in 1996 William McCune proved the conjecture using “automated theorem provers” called EQP and Otter. To give a rather brutal description of what an automated theorem prover is — it is something which takes as input a question such as the Robbins Conjecture, and then just throws all the axioms around generating lemma after lemma about Robbins algebras with the hope of ultimately managing to prove all the statements which define Boolean algebras. As you can imagine, this “brute force and ignorance” approach is not particularly appealing to the purist, but, let’s face it, it used to be the way that computers played chess, and anyone who has had their king and rook forked by a computer’s knight a fraction of a second after they have made a careless move knows that this approach can sometimes pay off.

Now one interesting thing about McCune’s proof was that actually, at the end of the day, it was pretty short. Within a year Bernd Dahn had got yet another system, ILF, to go through the Otter proof and simplify it (spotting common themes which were used more than once in the proof and pulling them off as useful lemmas); Dahn ultimately wrote a “normal” seven page maths paper, published in the Journal of Algebra in 1998, containing a human-readable account of the proof. The proof had now been completely “de-computerised”.

We learn from this story the unsurprising fact that computers are good at logic puzzles.

Bruck loops

I didn’t tell you what a Robbins algebra was, but I will attempt to tell you what a Bruck loop is. If you really don’t care then just skip down to the statement of the theorem below.

A loop is “a group without the associativity axiom”. More precisely a loop is a set Q equipped with a binary operation \cdot : Q \times Q \to Q and an identity element 1\in Q satisfying 1 \cdot x = x \cdot 1 = x for all x\in Q, plus the following “inverse axiom”: for all q\in Q, left multiplication by q and right multiplication by q are both bijections from Q to Q. In particular for any q\in Q the equations qr=1 and rq=1 have unique solutions for r, and one could call these solutions the left and right inverses of q. These two inverses might not coincide though.
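For those who like to see such definitions written down formally, here is one way the axioms might be set up in Lean; this is a sketch of mine (the field names are made up, it is not taken from any library), with the bijectivity of left and right multiplication phrased as unique solvability of equations.

-- A sketch (my own field names, not from any library) of the loop axioms.
structure loop (Q : Type) :=
(mul : Q → Q → Q)
(one : Q)
(one_mul : ∀ x : Q, mul one x = x)
(mul_one : ∀ x : Q, mul x one = x)
(left_solve : ∀ q y : Q, ∃! x : Q, mul q x = y)   -- left multiplication by q is a bijection
(right_solve : ∀ q y : Q, ∃! x : Q, mul x q = y)  -- right multiplication by q is a bijection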

Groups of course give examples of loops, but there are non-associative examples too. Most definitions in group theory have some sort of analogue for loop theory, although you sometimes have to think about things the right way. For example one can consider the group of permutations Q\to Q generated by left and right multiplications by elements of Q, and the inner mapping group Inn(Q) of a loop Q is the subgroup consisting of elements sending 1 to 1; if Q is a group then this is readily checked to be the inner automorphisms of Q.

Sometimes the lack of associativity makes these analogues really painful though. For example the centre of a loop Q is the subloop Z(Q) of Q consisting of the elements z\in Q such that z\cdot q=q\cdot z for all q\in Q and z\cdot(x\cdot y)=(z\cdot x)\cdot y for all x,y\in Q and two more equations like this with z in the middle and on the right. One checks that the centre of a loop is a normal subloop and one can form a quotient loop Q/Z(Q).

Just like groups, we say a loop Q has nilpotency class at most 1 if Q=Z(Q), and Q has nilpotency class at most 2 if Q/Z(Q) has nilpotency class at most 1.

For groups, we have Inn(G)\cong G/Z(G) and in particular if Inn(G) is abelian then G has nilpotency class at most 2. These results use associativity though, and are not true in general for loops. However they are true for Bruck Loops:

Theorem (a computer). If Q is a Bruck loop and Inn(Q) is abelian, then Q has nilpotency class at most 2.

A Bruck loop is a loop satisfying two extra axioms: firstly x\cdot (y\cdot (x\cdot z)) = (x \cdot (y \cdot x)) \cdot z (which implies that left inverses equal right inverses), and secondly that (xy)^{-1}=x^{-1}y^{-1} (another “commutativity” condition).

The humans involved in the proof were J. D. Phillips and David Stanovský, who wrote a paper published in 2012 in Communications in Algebra. The humans reduced the question to proving that a certain equation was always satisfied for Bruck loops with abelian inner mapping groups, and then they threw a bunch of automated theorem provers at it. The prover Waldmeister managed to come up with a proof. Here it is — all 30000+ lines of it. This helpful page on Stanovsky’s website offers other formats of the proof, for example a 1068 page pdf file containing a complete “human-readable” proof. What human would ever read this proof though?

I had no idea that computers were doing that sort of thing nowadays. I learnt about this work from Michael Kinyon at AITP 2019.

Shimura varieties

In my mind, this work raises an interesting question. Note that the work also raises a lot of boring questions. Of course “boring” is a very subjective term. But let’s try and put this work on Bruck loops into some sort of context. I work in a mathematics department full of people thinking about mirror symmetry, perfectoid spaces, canonical rings of algebraic varieties in characteristic p, etale cohomology of Shimura varieties defined over number fields, and the Langlands philosophy, amongst other things. Nobody in my department cares about Bruck loops. People care about objects which it takes an entire course to define, not a paragraph. The questions raised by this work which people in my department might find “boring” are: here are other classes of loops; can one prove analogous theorems about these classes? Or: here is another elementary object which takes a paragraph to define but which is not taught to the undergraduates at Imperial College London. Can one prove the following conjecture about these objects? The kind of mathematicians I know are, for the most part, not remotely interested in such questions. This is all to do with subtleties such as fashions in mathematics, and which areas are regarded as important (that is, worth funding).

So what are the mathematicians I know interested in? Well, let’s take the research staff in my department at Imperial College. They are working on results about objects which in some cases take hundreds of axioms to define, or are even more complicated: sometimes even the definitions of the objects we study can only be formalised once one has proved hard theorems. For example the definition of the canonical model of a Shimura variety over a number field can only be made once one has proved most of the theorems in Deligne’s paper on canonical models, which in turn rely on the theory of CM abelian varieties, which in turn relies on the theorems of global class field theory. Those are the kinds of definitions which mathematicians in my department get excited about — not Bruck Loops. I once went to an entire 24 lecture course by John Coates which assumed local class field theory and deduced the theorems of global class field theory. I have read enough of the book by Shimura and Taniyama on CM abelian varieties to know what’s going on there. I have been to a study group on Deligne’s paper on canonical models. So after perhaps 100 hours of study absorbing the prerequisites, I was ready for the definition of a Shimura variety over a number field. And then there is still the small matter of the definition of etale cohomology.

So why are computers proving theorems about Bruck loops but not proving theorems about the cohomology of Shimura varieties?

Here is a conjectural answer. Bruck loops are easy, so it’s easy to prove theorems about them. Shimura varieties are hard, so it’s never going to work.

Yeah, but hang on a minute. If 20 years ago we had raised the possibility that a computer could come up with a 1000 page proof of a statement which humans had formulated, however “easy” the objects in question were, I think that it would not have been hard to find people who would have been very skeptical. Think about it. The definition of a Bruck Loop in the 1000 page pdf is 19 axioms. Now make a naive guess about how long it would take to find a 30,000 line proof of a statement about Bruck Loops: a brute-force search which simply picks one of the 19 axioms to apply at each of the 30,000 steps has something like 19^{30000} possibilities to wade through, and that number does not fit into our universe in any meaningful way. The computer scientists must have done something clever here. In particular I think that although many mathematicians would instinctively rule out the possibility of computers proving hard theorems about complicated objects because of issues of scale, this work on Bruck loops throws some sort of spanner into those works.

Here is another answer, which is correct for trivial reasons. One reason why computers are not proving theorems about the etale cohomology of Shimura varieties over number fields is that the human race have not yet taken the time to tell a proof verification system what the phrase “etale cohomology of a Shimura variety over a number field” means. Why is this? Well, I am currently finding it very hard to think of people who know the definition of the canonical model of a Shimura variety and also know how to use proof verification systems. I also think that it is much easier to train mathematicians to use proof verification systems than it is to train computer scientists to read papers by Deligne, and this is one of the reasons why the Xena project exists.

But is this the only reason? Well, unfortunately, I suspect that it is not. If you were to ask a computer scientist what they would need in order to start proving things about the cohomology of Shimura varieties defined over number fields, I suspect that they would ask for something like

  • the definitions;
  • sample proofs of hundreds (or thousands?) of theorems to train their AI;
  • some conjectures, of approximately the same level of difficulty as the theorems.

The reason that, in contrast to the Bruck loop work, they will need some proofs is that the natural framework for defining objects like Shimura varieties is some higher-order logical structure; even for the definition of a topological space one has to quantify over subsets rather than just elements. On the other hand, after AITP 2019, Josef Urban showed me some work where the E theorem prover had managed to prove basic theorems about topological spaces given enough “training data” theorems.

If instead of Shimura varieties we think about getting computers to prove theorems about perfectoid spaces, then Commelin, Massot and I have achieved the first bullet point above. As you can see, we still have a vast amount of work to do before we can catch up with Peter Scholze. Another milestone in this game would be to prove the tilting correspondence, because proving that would involve proving hundreds of other theorems along the way (which could also be used to train). However before that milestone, we still have the milestone of stating the tilting correspondence. Even stating it will involve having to prove some theorems. There is still a long way to go.


What is homotopy type theory? An amateur speaks.

Over the past few months I have been attempting to get some kind of an idea about what homotopy type theory is. Lean uses dependent type theory, which is another kind of type theory; homotopy type theory is what Voevodsky studied and what many other people still study. But Reid Barton made some comments on the Lean chat which basically opened my eyes and I’ll just write them down now before I forget them.

Last summer, a bunch of Imperial students formalised a bunch of mathematics in Lean. Most of them stuck to stuff on our undergraduate syllabus, but Ali Sever raised the possibility of “formalising Euclid” and even though at the time I had no real idea about how that would work, I said “sure”. It turns out that Ali was not the first person to think of this idea in the 2000+ years since the Elements was written, and things were very well-understood. But not by me. So here’s an explanation.

I used to have to deal with geometry problems when I was younger and found myself doing various Maths Olympiad problems. My gut feeling with the plane geometry problems was that “one should be able to do them all in theory just by working out the coordinates of everything”. However of course there were often very cute proofs which just involved invoking standard theorems in plane geometry (of which there seem to be very many, some of which are of course proved in Euclid’s books). Are these methods actually fundamentally different? They certainly look fundamentally different. In fact they must be fundamentally different, because Descartes didn’t invent Cartesian coordinates until over 1500 years after Euclid had died.

So synthetic geometry is what Euclid was doing; he never chose coordinates. He had an abstract notion of a point, and an abstract notion of a line, and axiomatised what he needed from them: between two distinct points one could draw a line, and so on. He never attempted to define “point” or “line” — for Euclid, these notions were fundamental and needed no definition [note added later on: David Corfield points out that actually Euclid did offer “definitions” of some sort for point and line, but arguably never then uses them].

On the other hand, analytic geometry is what you get if you use coordinates. We need the real numbers \mathbb{R} (which we could have built using the axioms of our system) and then we can define the plane to be \mathbb{R}^2 and we can define a point to be an element of \mathbb{R}^2 (that is, a pair of real numbers), and we can define a line to be the subset of \mathbb{R}^2 defined by a degree 1 equation, and so on.
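If one wanted to spell the analytic set-up out in Lean, it might look something like the following sketch of mine (the definitions are made up for this post; they are not mathlib’s).

import data.real.basic

-- A toy sketch (my own definitions, not mathlib's): a point is an element of ℝ × ℝ,
-- and a line is the solution set of a nontrivial degree one equation a*x + b*y = c.
def is_line (L : set (ℝ × ℝ)) : Prop :=
∃ a b c : ℝ, (a ≠ 0 ∨ b ≠ 0) ∧ L = {p : ℝ × ℝ | a * p.1 + b * p.2 = c}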

You can see that these are somehow two different theories. A powerful way of seeing this is that Euclid develops some geometry without assuming the parallel postulate, and then he assumes it and proves more. There was some sort of industry pre 1800 of trying to prove the parallel postulate from Euclid’s other axioms, but nobody was successful. Nowadays we understand why — all of Euclid’s axioms other than the parallel postulate hold for geometry done on a sphere (and the parallel postulate fails). As a consequence, all the results in planar geometry which Euclid proved without assuming the parallel postulate also hold for spherical geometry.

Tarski’s synthetic approach to geometry has an axiom (schema) of continuity, corresponding essentially to the completeness of the real numbers. It would not surprise me if the results that Tarski proved which did not assume completeness of the reals were also true in models of planar geometry such as k^2 where k is an appropriate subfield of the reals (for example the real algebraic numbers, which are not complete).

What we begin to see is that with the synthetic approach, we are inspired by the particular picture we’re trying to model, but there might be other models too, and we might have to add other axioms later on for convenience if we want to prove more results about the particular model we have in mind (and indeed we will have to add these axioms if there are other models of our theory in which these results are false).

In mathematics, we can define topological spaces, and talk about maps between them, and what it means for maps to be homotopic, and then invent homotopy theory, the basics of which are taught to second year maths undergraduates at Imperial College if they choose this option. One could regard this as “analytic homotopy theory”.

What homotopy type theory is, I now finally realise, is a synthetic version of this. There are types X : Type and then there are terms a : X . The model is that X is supposed to be a topological space, and a is supposed to be an element of X. But here’s the weird part. The equation a = b is not supposed to be interpreted as meaning that the points a and b are equal. A proof of a = b (or, to give it its more formal name, a term of type a = b) is supposed to be interpreted as a path from a to b. Symmetry of equality is then interpreted as saying that if we have a path from a to b then we have a path from b to a, and transitivity is just path composition. If we have two terms p1 and p2 of type a = b then a term of type p1 = p2 is a path from p1 to p2, or, in other words, a homotopy between p1 and p2. I had read this interpretation several times, and have known full well for over a year that this is how I was supposed to “interpret equality” — but the penny had never dropped as to what that actually even meant. But now I get it, or at least the basic idea:

Homotopy type theory is to the theory of topological spaces and homotopy classes of maps, as Euclid’s synthetic geometry is to Descartes’ coordinate geometry.
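Lean itself is not a homotopy type theory (its propositional equality is proof-irrelevant), but the basic ingredients above already exist as terms in Lean, so here is a tiny snippet, added by way of illustration, showing the two constructions which the homotopy interpretation re-reads as reversing and composing paths.

-- Ordinary Lean 3 (not HoTT): under the homotopy interpretation these terms become
-- "reverse the path p" and "compose the paths p and q".
example {X : Type} {a b : X} (p : a = b) : b = a := p.symm
example {X : Type} {a b c : X} (p : a = b) (q : b = c) : a = c := p.trans q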

I read in Floris van Doorn’s thesis that it is an open problem to define Grassmannians in homotopy type theory. When I first read this, I was completely confused. One can build mathematics in homotopy type theory, so one can build topological spaces, the real numbers, and build Grassmannians the way they are usually built in a maths course. It was Reid Barton who pointed out to me that this was not what the question meant. Of course one can do this — this would be an analytic construction. The open problem is to give a synthetic construction. It is analogous to me saying that one could just prove an olympiad geometry question by a coordinate chase. I am clearly an unsynthetic kind of guy. Of course, one issue could be that there are perhaps other models of the axioms of homotopy type theory where Grassmannians really don’t exist…

Thanks to Ali Sever, whose questions and research last summer turned out to be a necessary prerequisite for the penny to drop. Ali’s work formalising Tarski’s synthetic geometry in Lean is here.


Proofs are not programs

The Curry-Howard correspondence can be pithily described as saying “proofs = programs”. The statement is short, alliterative, and rather beautiful. As a professional mathematician I know exactly what a proof is; as an amateur programmer I have a pretty clear idea about what a program is, and the concept that they might be in some sense the same thing is both enigmatic and appealing. A proof is a logical sequence of statements deducing a theorem from the rules of maths; a program is a logical sequence of commands producing valid code from the rules of the programming language being used. When learning Lean I saw the correspondence in action many times, and so it actually took quite some time for the penny to drop: the Curry-Howard correspondence is not actually true.

Of course something is true, but what is not true is any statement of this form where “proof” is interpreted as “what the pure mathematicians in my department do”. What is true is that if you stick to constructive maths or intuitionistic logic or one of the flavours of maths where you’re not allowed to use the axiom of choice or the law of the excluded middle — in other words a version of maths which is unrecognisable to over 95% of pure mathematicians working in mathematics departments — then you might be in good shape. But that is not what I see happening in my mathematics department or the other mathematics departments which I’ve had experience of. Some proofs are not programs.

Here’s an example of a proof which is not a program. The original proof that the class numbers of imaginary quadratic fields tended to infinity was the following rather extraordinary argument. First Landau showed that the theorem was true assuming that a certain generalisation of the Riemann Hypothesis was true, and then Heilbronn managed to show that the theorem was true assuming that this generalisation was false! See for example Keith Conrad’s comment on this mathoverflow question, and for more details see p359 of the book “A Classical Introduction to Modern Number Theory” by Ireland and Rosen (I have a copy of this book in my office if any Imperial students are interested in seeing more details).

The analogous situation for programs would be something like the following. Your boss asks you to write some computer code to do a certain task so your company can release their new product. A week later you enter their office with two USB sticks, explaining that one of them does the task if the Riemann hypothesis is true, and the other one does the task if the Riemann hypothesis is false. What you seem to have done, on the face of it, is proved that the code your boss wanted exists. You might also have lost your programming job 😉 In fact it’s not even 100% clear that you have proved that the code exists — what if we replace the Riemann Hypothesis in this story by one of these weird unprovable logic statements like the continuum hypothesis, which is true in some models of maths and false in others? Then where’s the program?

But actually it’s even worse than that. In my last blog post, I gave an example of a computable bijective function which did not have a computable inverse. A similar sort of argument shows that the following proof:

Theorem. If f: X \to Y is a surjective function, then there exists a map g: Y \to X such that f\circ g: Y \to Y is the identity function.

Proof. Define g thus: if y\in Y then choose x\in X with f(x)=y (possible by surjectivity of f) and define g(y) to be this x. The fact that f(g(y))=y is true by definition. QED.

does not correspond to a program. The theorem is just the axiom of choice in disguise, and the axiom of choice is about as non-constructive as you can get. Programs are algorithms. The axiom of choice says that we can do something when there is no algorithm.

What is a Lean file then?

So this is a bit confusing, because I can prove that theorem above in Lean:

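-- Use the axiom of choice, via classical.some, to pick for each y some x with f x = y.
-- There is no algorithm behind this choice, which is why the definition is noncomputable.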
noncomputable def g {X Y : Type} {f : X → Y} (hf : function.surjective f)
  (y : Y) : X := classical.some (hf y)

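-- classical.some_spec is the proof that the element chosen above really is a preimage of y.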
lemma g_prop {X Y : Type} {f : X → Y} (hf : function.surjective f) 
  (y : Y) : f (g hf y) = y := classical.some_spec (hf y)

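-- Package the choice function and its defining property into the existence statement.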
theorem has_right_inverse_of_surjective (X Y : Type) (f : X → Y)
  (hf : function.surjective f) :
  ∃ g : Y → X, ∀ y : Y, f (g y) = y := ⟨g hf, g_prop hf⟩

and that looks like code to me. But not all computer files are computer programs. As an example, consider a LaTeX file. The LaTeX program is the program, the LaTeX file is just the input to the program, and the output is typically a pdf file.

Lean itself is a program, but the Lean files which people make are not all programs. Some of them are though! Some Lean files (which use constructive logic and only computable functions) are turned into bytecode when run through Lean, and bytecode is a program. But no bytecode is generated for the definition of the function g above because it is noncomputable, and that stops the proof has_right_inverse_of_surjective from ever being a program. Lean itself can verify that the proof is correct, but the proof itself is not a program.

I am still getting to grips with all this computer science malarkey. I asked a bunch of questions about this stuff when writing this blog post, on this thread in the Lean chat (login required), and thanks as ever to Mario Carneiro for answering most of them.


The inverse of a bijection.

In my “introduction to proof” course I teach the undergraduates about equivalence relations. What the students do not know is that earlier on in the course I have secretly inserted examples of equivalence relations without ever letting on. For example, in the number theory section I prove that if N is a fixed positive integer, and a, b, c are integers, then

a \equiv a \pmod N;

a \equiv b \pmod N \implies b \equiv a \pmod N;

a \equiv b \pmod N and b \equiv c\pmod N \implies a \equiv c\pmod N.

And when we do equivalence relations I can then say “oh look! Congruence mod N is an equivalence relation!”

I do the same trick with bijections. I prove that the identity function is a bijection, the inverse of a bijection is a bijection, and that the composite of two bijections is a bijection. Then later on I can say “oh look! ‘I biject with you’ is an equivalence relation!”

Coding up congruences mod N in Lean is a joy to do, and one of my example files on this stuff even made it into mathlib!
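To give an idea of what such a file looks like, here is a fresh sketch written for this post (it is not the mathlib file), with congruence mod N phrased as “N divides the difference” and the divisibility spelled out with an explicit witness.

import tactic

-- A sketch (not the mathlib file): a ≡ b mod N means b - a is a multiple of N.
def cong (N a b : ℤ) : Prop := ∃ k : ℤ, b - a = N * k

lemma cong_refl (N a : ℤ) : cong N a a :=
⟨0, by ring⟩

lemma cong_symm {N a b : ℤ} (h : cong N a b) : cong N b a :=
begin
  obtain ⟨k, hk⟩ := h,
  refine ⟨-k, _⟩,
  -- a - b = -(b - a); substitute and let ring tidy up
  have h2 : a - b = -(b - a) := by ring,
  rw [h2, hk],
  ring,
end

lemma cong_trans {N a b c : ℤ} (h1 : cong N a b) (h2 : cong N b c) : cong N a c :=
begin
  obtain ⟨k, hk⟩ := h1,
  obtain ⟨l, hl⟩ := h2,
  refine ⟨k + l, _⟩,
  -- c - a = (b - a) + (c - b); substitute and let ring tidy up
  have h3 : c - a = (b - a) + (c - b) := by ring,
  rw [h3, hk, hl],
  ring,
end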

But when coding up the bijection equivalence relation, one runs into an issue which is hard to explain to mathematicians. The problem is with symmetry — proving that the inverse of a bijection is a bijection. Say f: X \to Y is a bijection, and y\in Y. We’re trying to define the inverse function to f and we want to figure out its value on y. Lean has no trouble proving that there exists a unique element of X such that applying f to it gives y. But when you try to access this element of X while defining the inverse of f, you get an intimidating error:

induction tactic failed, recursor 'Exists.dcases_on'
can only eliminate into Prop

Now Lean 3 does not have particularly friendly error messages, but as a Lean beginner I remember finding this one completely baffling. As far as type theory is concerned, there is a universe issue; the user just tried to build something landing in Type, but the eliminator Lean wanted to use can only build things landing in Prop. A mathematician using Lean might know about Lean’s two kinds of universes — Prop is the universe of true-false statements, where things like the statement of Fermat’s Last Theorem live, and Type is the universe of data, where things like the real numbers live. So what is going on? The problem is that \exists x \in X, f(x)=y is in Prop, but the actual x which exists is in Type, and Lean is complaining that it does not have an algorithm to get hold of x, and hence to move from the weak Prop universe to the stronger Type one.

Now of course to a mathematician this is just all ridiculous — the algorithm is “there is a unique x — take that one”. The Lean beginner will learn that instead of applying the cases tactic to ∃ (x : X), f x = y to get x, they can apply the choose tactic, and then one gets a new error

definition 'g' is noncomputable, it depends on 'classical.some'

which one fixes by writing noncomputable theory at the top of the file. One is left with a feeling that Lean is kind of bewildering, and why was there so much fuss about applying the obvious algorithm “take the unique x, duh”?

Recently on the Lean Zulip chat there has been talk amongst computationally-minded people about fancy things such as domain theory, topologies, the difference between option and roption, and difference lists. All of this has been slowly sinking into my head in some sort of way, and it culminated this morning in my actually seeing an explicit example (needs Zulip login) of a bijection between two sets where even a mathematician can see that there is some sort of issue with defining the inverse bijection. Thanks to Mario Carneiro for coming up with the example. The rest of this post explains it.

The example.

The sets in question are as follows. Let X=\{-1,0,1,2,3,\ldots\}=\mathbb{Z}_{\geq-1}. That’s easy enough.

Let Y be the set of infinite sequences P(0), P(1), P(2), \ldots where each P(i)\in\{T,F\} (true or false), and with the property that for all m\geq0 we have P(m)\implies P(m+1). In words — if at some point in the sequence we see T, then everything after that point is also T.

Examples of elements of Y:

FFFFTTTTTTTTTTTTTTT...
FFFFFFFFFTTTTTTTTTT...
FFFFFFFFFFFFFFFFFFF...
TTTTTTTTTTTTTTTTTTT...

It is not hard to convince yourself that sequences in Y come in two flavours: either the sequence has some finite number n\geq0 of Fs and then everything afterwards is T, or the sequence can be all Fs.

The bijection f: X\to Y sends -1 to the “all F” sequence, and n\geq0 to the sequence which contains n Fs and then reverts to T forevermore. This is easily checked to be a bijection, and one even feels that it’s possible to write down the inverse. But how easy is it in practice?
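Before answering that, here is roughly how the set-up might be coded in Lean; the encoding below is my own quick version for illustration (Mario’s actual example on the Zulip thread may well be phrased differently), with the sequences taking values in Prop.

import tactic

-- My own encoding (not Mario Carneiro's original): Y is the set of monotone sequences
-- of truth values, and f sends any negative integer (for simplicity it is defined on
-- all of ℤ) to the all-false sequence, and n ≥ 0 to the sequence which is true from
-- the n-th place onwards.
def in_Y (P : ℕ → Prop) : Prop := ∀ m : ℕ, P m → P (m + 1)

def f (n : ℤ) : ℕ → Prop := λ i, 0 ≤ n ∧ n ≤ (i : ℤ)

lemma f_mem_Y (n : ℤ) : in_Y (f n) :=
begin
  rintro m ⟨h0, hm⟩,
  refine ⟨h0, hm.trans _⟩,
  -- (m : ℤ) ≤ m + 1, after pushing the coercion through
  push_cast,
  linarith,
end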

Here’s the issue. Say I now give you a “black box” sequence P in Y. It’s a function, I can guarantee that it’s in Y (I even have a proof!), and if you give me any i\in\mathbb{Z}_{\geq0} I will return P(i). We know that there is a unique n\in X with f(n)=P and it’s now your job to find it. What is your algorithm?

Initially you can start off by evaluating the function at some values. It turns out that P(3) is F, that P(100) is F and then that P(10^{18}) is F. You realise that the n you’re looking for is either a very large integer, or -1. But which one? You have what looks like complete access to P\in Y, but you cannot work out which n\in X it corresponds to — indeed you cannot even say whether n\geq0 or not!

After a while, you get sick of this and say that actually you haven’t got complete access to P and I am suppressing information. You complain that you don’t just want to evaluate P at lots of numbers, you want to see the actual definition. So I reveal the definition: P(i) is defined to be T if there is a counterexample to the Goldbach conjecture which is at most i, and F otherwise. To put it another way — if the Goldbach conjecture is true then the sequence is all Fs and n=-1, but if it fails at some point then n is the positive integer where it fails. I even show you the one-line proof that P is an element of Y: if there’s a counterexample which is at most m then clearly there’s a counterexample which is at most m+1. Even with the definition, you are none the wiser!

Conclusion.

When a function is marked as noncomputable in Lean it means that Lean has not been told a method for working it out for every input. The innocuous-looking statement “there exists a unique x\in X with f(x)=y, so choose that x” is clearly something which can be done abstractly, but there is no method for finding x, even if we can prove that it must exist. There can’t be a method for finding it in general — any such method would translate into a method for proving Goldbach’s conjecture, the 3n+1 problem, Fermat’s Last Theorem, the twin primes conjecture and so on; these are hard problems. Indeed, one can prove using a diagonalisation argument that no algorithm can exist to invert this bijection. The inverse of our computable function f: X\to Y certainly exists, but it is a noncomputable function. Lean can handle such functions, no problem, but it has to be told explicitly that it should be ignoring computability issues (e.g. with the line noncomputable theory at the top of a file).


Equality part 3: canonical isomorphism.

I hadn’t planned on writing a third equality post, but my talk at Big Proof 2019 (see slides here) ended up having a huge section about “mathematician’s equality”, and I realised I had more to say about it. Here are some more detailed thoughts about the issue.

Examples of canonical isomorphisms.

Canonical isomorphism is a magic kind of equality, apparently accessible to mathematicians who are “thinking about things in the right way”. It has several definitions; it has no definition. However, we know it when we see it, we are almost universally in agreement on the notion, and hence we don’t seem to really challenge each other on it. The isomorphism between a finite dimensional vector space and its double dual is canonical — we all know that. The class field theory isomorphism is canonical (at least up to sign) — we know that too. Even the Langlands correspondence is supposed to be canonical — it is for that reason that we believe in the Langlands group. On a more mundane note, R[1/fg] is canonically isomorphic to R[1/f][1/g]. Do we even stop to think what these things mean? Formalisation has forced me to think about one of these, and it led me to a surprising conclusion.

Wikipedia defines a canonical map as — well, why don’t you just take a look. There is no formal definition there at all. The page is tagged with the “mathematical terminology” tag but one could argue that this is stretching the point, unless you regard “the map which preserves the widest amount of structure” or “conventionally agreed upon to be the most useful for further analysis” to have some sort of mathematical meaning.

But what am I saying here? Is there no space in mathematics for guiding principles, and not-completely-formed ideas about how things should fit together? Can we not experiment with concepts and worry about dotting the i’s and crossing the t’s later? Of course we can do these things! That is exactly the art of mathematics! In fact, it’s more than just allowed — it is essential that mathematicians are allowed to be creative, and come up with brilliant ideas of maps between apparently unrelated objects such as Galois representations and automorphic representations, and then suggest that these constructions are in some way canonical and begin to find out exactly what other observations such an idea might lead us to. This way of thinking is essential for mathematics to proceed.

One thing I understood recently is that there are two independent subtleties here, so in this post I will separate them out. The first is well-known, the second I think less so.

Issue 1: Mission creep.

The first issue is the many contexts where the word “canonical” is used. Sometimes people use it in a situation where it can be easily rigorously formalised. In category theory there is a perfectly rigorous definition of a natural isomorphism. For example, in the category of finite dimensional complex vector spaces, we could consider the functor sending a vector space V to itself, and we could consider a second functor sending a vector space V to its double dual. There is a natural isomorphism between these two functors, sending a vector space to its double dual. One can look up the definition of natural isomorphism on Wikipedia; it is the statement that a precise collection of diagrams all commute. People say that the obvious map from a vector space to its double dual is “canonical” but they could instead say that it is a natural transformation. Another example would be where two objects both satisfy a certain universal property. Then a piece of abstract nonsense proves that there are unique isomorphisms in both directions between these two objects, again both making a bunch of diagrams commute. Again a mathematician might say that isomorphisms arising in this way are “canonical”, and again there is an equivalent category-theoretic language which they could use instead to say precisely what they mean.
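To make the double dual example concrete: the naturality here says that for every linear map \varphi : V\to W, the evaluation maps \mathrm{ev}_V : V\to V^{**} (defined by \mathrm{ev}_V(v)(\xi)=\xi(v)) fit into a commuting square, i.e. \mathrm{ev}_W\circ\varphi = \varphi^{**}\circ \mathrm{ev}_V. It is this family of commuting squares, one for each \varphi, that the word “natural” is packaging up.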

However there are times in mathematics where the word is being used beyond this remit. Some number theorists would say that the isomorphisms of global class field theory are “canonical”. This is, I believe, a different usage of the word. There might not be some formal category-theoretic framework which one can hide behind here (although people attempting to categorify the Langlands program might well believe that such a framework exists). What is going on here is that Artin made some construction, an isomorphism between two abelian groups attached to a number field (one using adeles and one using Galois groups), and this map he constructed obeyed a nice long list of properties which have been explicitly written down (for example the behaviour of the map when you change from one field to a finite extension), and it is not hard to show that there can be at most one construction with these properties. Such a construction does exist (a hard theorem), and this construction then gets the name “canonical”, which here seems to be a specialised and different use of the word. Note however that Artin’s construction gives an isomorphism between two abelian groups, and one can compose this isomorphism with the map sending an element of an abelian group to its inverse, giving us a second canonical isomorphism between these two groups, which satisfies a similar-but-not-exactly-the-same nice long list of properties, arguably indistinguishable from the first list in the sense that it is not clear (at least to me) which list is “better”. However the two lists are distinguishable in the sense that they are different, and can be told apart. In this situation it is hence difficult to say which of the constructions is the “most canonical”. The constructions have got different names, and both are used in the literature. One is called “class field theory normalised the classical way, sending uniformisers to arithmetic Frobenii” and the other is called “class field theory normalised the Deligne/Langlands/Carayol way, sending uniformisers to geometric Frobenii”. A geometric Frobenius is simply the inverse of an arithmetic Frobenius. This example is quite jarring, read within the context of Wikipedia’s definition of a canonical map at least, because both maps are useful (the classical normalisation when doing concrete things like Heegner points, the more modern normalisation when doing the cohomology of Shimura varieties), and there is no particular agreement within the number theory community as to which choice is “best” — it simply depends on what you are doing.

Andre Weil once jokingly wrote “I can assure you, at any rate, that my intentions are honourable and my results invariant, probably canonical, perhaps even functorial”, perhaps parodying the usage of the word in mathematics at that time (the 1950s). I think this sums up rather well the way the word is sometimes used nowadays.

Issue 2: canonicality beyond the universal property.

There is however a second issue, which I began thinking about when preparing my Big Proof talk, and this is when mathematicians see an isomorphism which is “canonical” in a sense which can definitely be made completely precise, and then decide that the isomorphism is so completely canonically canonical that it will not cause any problem to simply use the notation = for it. J. S. Milne, in his book “Etale cohomology”, even explains in his notational conventions that canonical isomorphisms will be denoted by =. [What, all of them? Even the two distinct canonical isomorphisms of class field theory? It is not hard to come up with a very plausible-looking proof that x=x^{-1} for x in the abelianisation of the absolute Galois group of the rationals using this convention; of course, this is not what Milne means to say (although it would be interesting to hear his definition of “canonical” in this context, if indeed he had one)]. Milne is of course a professional mathematician who knows what he is talking about, and so uses this convention sensibly. He is not the only one. Even Grothendieck adopts this convention in EGA (EGA I, section 1.3.3), although he does not seem to be so explicit about it. The context here is that if R is a ring, and f and g are two elements of R with the property that a prime ideal of R contains f if and only if it contains g (for example we could have g=f^2 or g^3=f^2), then one can check that there are unique R-algebra isomorphisms R[1/f]\to R[1/g] and R[1/g]\to R[1/f] whose composites in both directions are the identity. Grothendieck himself uses the French word “canoniquement” in the link above when talking about this identification, and five lines later is using = as its notation, when claiming M_f = M_g, a claim of the same nature as the claim R[1/f] = R[1/g]. If R is an integral domain then one can build R[1/f] and R[1/g] as explicit subsets of the field of fractions of R and then they really are equal subsets. However in the general case one does not have this luxury. What does it even mean to say R[1/f]=R[1/g]? Grothendieck explicitly wrote M_f = M_g. What does this even mean? They are not equal in the formal sense at all, as we are about to see — but they are most definitely canonically isomorphic — any mathematician who knows the area would tell you that.

This is a serious issue for formalisers. The notion of equality here is most definitely not definitional equality or syntactic equality. In dependent type theory one could attempt to interpret it as an equality of two types, however in this context it is not really reasonable to argue that the types are equal. Anyone who has seen the usual construction of R[1/f] knows that it is equivalence classes of pairs (r,n) with r\in R and n\in\mathbb{Z}_{\geq0}, where (r,n) is initially interpreted informally as r/f^n, even though at this point these symbols make no sense. The equivalence relation is then defined, informed by this interpretation, and then the informal interpretation is forgotten about, and a ring structure is put on the equivalence classes. The ring R[1/f] is proved to have a universal property, namely that any ring homomorphism R\to A sending f to a unit extends uniquely to an R-algebra homomorphism R[1/f]\to A (did I mention that all rings are commutative and have a 1, and all morphisms send 1 to 1?). This is the universal property. In this optic, the two equivalence relations on R\times\mathbb{Z}_{\geq0} really are not equal in general, and it’s quite messy writing down the canonical map on equivalence classes here; it’s much easier to prove a theorem once and for all saying that g is invertible in R[1/f] and vice versa, and then we can use the universal property to define the maps which give us the canonical isomorphisms.
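To see how far from “equal” the two constructions are, it may help to write the relation down; the following is my own sketch of the textbook construction in Lean (it is not code from any actual development).

import tactic

-- My own sketch of the textbook construction of R[1/f]: pairs (r, n), read informally
-- as r/f^n, with the extra power f^k in the relation needed because R may have
-- zero-divisors. R[1/f] is then the quotient of R × ℕ by this relation.
def loc_rel {R : Type} [comm_ring R] (f : R) : R × ℕ → R × ℕ → Prop :=
λ p q, ∃ k : ℕ, f ^ k * (f ^ q.2 * p.1 - f ^ p.2 * q.1) = 0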

Here is my objection to this usage. The moment two mathematical objects are equal, it is apparently possible to replace one by the other in any argument whatsoever, without giving any justification. Here is an explicit example, which bit me hard a year or so ago. It is a standard argument in basic graduate level algebraic geometry which I have now read many treatments of, and I am not really satisfied with any of them. For simplicity, I will talk about the treatment in the stacks project, because it is the easiest to cite. In tag 01HR, it says (a little before definition 25.5.3) “Thus we may apply Algebra, Lemma 10.23.1 to the module M_f over R_f and the elements g_1, g_2, \ldots, g_n”. But actually this is not strictly speaking possible, from a formalist perspective. Bear with me here, this is not a picky pedantic point, there is a little more to it than that. The lemma being cited is in tag 00EK, which is some statement about an exact sequence of modules. There are unfortunately two modules M here, one in 01HR and one in 00EK, and they’re not the same. For simplicity let’s assume that M=R in 01HR, which then means that M=R[1/f] in 00EK. To make things even simpler, let’s assume that the elements f_i\in R[1/f] are actually elements of R, and let’s call them g_i to coincide with the notation in 01HR, so D(g_i)\subseteq D(f) using the usual notation for open sets attached to elements of R. Even with these simplifications, there is still a subtle, almost invisible, problem, which formalisation is about to show us.

The problem is now that 00EK is a statement about an exact sequence relating R[1/f], R[1/f][1/g_i] and R[1/f][1/g_ig_j]. We now want to apply this to prove a result relating R[1/f], R[1/g_i] and R[1/g_ig_j]. Now because of the assumptions we know that R[1/f][1/g_i] and R[1/g_i] are canonically isomorphic. If we now just assume that they are equal, and we furthermore just assume that the maps R[1/g_i]\to R[1/g_ig_j] are equal to the maps R[1/f][1/g_i]\to R[1/f][1/g_ig_j] (i.e. the diagrams all commute), then there is no problem at all! This is precisely what the stacks project is doing, it is what Grothendieck does in EGA, it is what Hartshorne does, it is what everyone does.

The authors here are smart people, so let’s step back and just check that their intuition is correct. Of course, any one of these authors, when challenged, will say that they know that R[1/f][1/g_i] is not literally equal to R[1/g_i], however both maps satisfy the same universal property, and so they are canonically isomorphic. So now we have a diagram — a sequence we’ve shown to be exact, and a sequence of canonically isomorphic objects which we want to show is exact. So now we draw in all the canonical isomorphisms and now we just have to prove that all the resulting squares commute. But all the isomorphisms are natural, coming from a universal property, so all the diagrams will commute because of this, and everything will work out fine. Right?

But guess what — the map \beta in 00EK is not a morphism of rings. It cannot be defined via the universal property directly. The universal property really does not tell us immediately that the diagram commutes! We are kind of cheating here; there is a diagram, we can’t see it because of this trick of using = to mean an isomorphism, the isomorphism is natural, but the naturality does not immediately imply that the diagram we need to commute, commutes. We have slightly over-stepped the mark. We need two applications of the universal property and then we need to take their difference and observe something of the following form: if two squares involving four abelian groups commute, and the vertical maps are the same in each square, then subtracting the horizontal maps gives rise to a third diagram which also commutes. There is some little missing argument here. I never spotted this argument until I tried to formalise it. Once I’d realised that from a formalist point of view 00EK was actually too weak to use in 01HR, I had to then prove a bunch of lemmas showing that 00EK (whose proof was a lot of work, done by Chris Hughes) plus a horrible diagram chase whose proof is omitted here but which could not be omitted in Lean, was enough to deduce 01HR. It concerned me that at some point I had to go “beyond the universal property”. Not only that, it made this part of the code base much harder to follow. I had got us into this mess by telling Chris Hughes to prove 00EK without checking beforehand that it would suffice in applications — I naturally assumed that the literature didn’t have subtle holes in it. Ramon Fernandez Mir has rewritten the entire code base from scratch, and did not prove 00EK by hand but instead proved a more invariant version, using rings isomorphic to R[1/g_i] etc instead of the actual rings themselves. Instead of using the universal property (which might have caused issues with universes) he used a predicate introduced by Strickland which characterises precisely when a ring homomorphism R\to A makes A into an R-algebra isomorphic to R[1/S] as an R-algebra. Everything works much more nicely — but we do not prove 00EK as it stands; we prove something slightly more general. We prove the thing that we actually need.

Conclusions.

What I would hence like to argue is that this lazy usage of = to mean “canonical isomorphism” actually blinds some mathematicians. Equality is such a powerful thing. If two canonically isomorphic objects are now deemed to be equal because they both obey some universal property, then all of a sudden we can manipulate them as if they are equal regardless of whether we are using the universal property or not. This is a very useful trick, because it enables us to cheat effectively without our peers noticing. By “cheat” here I mean that it enables us to leave out tedious arguments. No mathematician cares about these tedious arguments, as long as they do actually exist. In the case of schemes, they do exist. Of course they exist! If there was some fundamental problem with structure sheaves, people would have obviously spotted it decades ago. But what about all the other uses of = to mean canonical isomorphism? Are we really checking them? Would we definitely know if some subtlety came up?

Imagine the following situation. Say there are two objects X and Y in your argument. Say that it is well-known that they are canonically isomorphic. Now maybe there are some constructions that one can make easily using X. And maybe there are some other constructions that one can make easily using Y. The constructions do involve the hands-on construction of X and Y and in particular go “beyond” the functoriality or universality of the isomorphism X\to Y, however they “have the same feel to them”, so they are clearly “equal” — at least, they feel as “equal” as X and Y are “equal”. Wouldn’t it be easy just to pass off the assertion that these constructions are also “equal”, even though this cannot be formally deduced from the functorial properties that one knows which give rise to the isomorphism X\to Y? Could that happen in the literature or is this just abstract navel-gazing?

It absolutely happens! And here is a very important time where a mathematician very nearly got away with it! In Dick Gross’ seminal 1990 paper on companion forms (“A tameness criterion for Galois representations associated to modular forms (mod p)”, Duke Math Journal Vol 61 no 2, Oct 1990) Gross canonically identifies several cohomology theories of modular curves, and then claims without any justification whatsoever that the Hecke operators defined on these cohomology groups are also identified. The cohomology groups were known to be “canonically isomorphic”, however this was one of those “canonical isomorphisms” which was of the form “look at this neat map; now everything works, except it takes many pages to justify this”. The canonical isomorphism was not even known to be functorial or natural, it was just “canonical” in this vague sense. In particular it was not known to commute with correspondences and hence with Hecke operators, in 1990. The anonymous referee picked up on this subtlety, but rather than rejecting the paper they just demanded that Gross explicitly flag that there was a gap in the argument. The paper was still published, with a comment about this in the introduction. Four years later, with the gap still not filled, Wiles and Taylor-Wiles used the paper in their proof of Fermat’s Last Theorem. Nobody batted an eyelid. I asked Taylor about this in 1995 and he assured me that he had checked that in the application of Gross’ results to the FLT proof, one could get around this issue. Most other people did not care because the results were “obviously true anyway” — at least that was my impression. There are a few people out there who care though; Taylor is one, and another one of them is Brian Conrad. It was not until 2007, seventeen years after Gross’ paper was published, that Bryden Cais, a student of Brian’s, resolved these issues in his PhD thesis. Nobody ever worried about this issue — because everyone knew that the results were going to be true. But with Gross thinking about canonical isomorphism as equality, we almost missed that there was even an issue to be checked.

How many other times has this happened in mathematics, and how many times have we, the community, not noticed? The referee of Gross’ paper was rumoured to be Serre, an extremely careful mathematician. Not all of us are as careful as him. Sloppy use of = can cause trouble.
