Does anyone “know” a proof of Fermat’s Last Theorem?

So there’s a piece in Vice which quotes me as saying that I fear “all published maths is wrong”. Of course this is nonsense: indeed, as every mathematician knows, some published maths is wrong, but the vast majority of it is fine (indeed some of it has even been checked in Lean 🙂 ). The question of how much of the literature is wrong/incomplete/whatever came up in the context of explaining various viewpoints which Patrick Massot, Michael Harris and I had been exploring during our PhilMath Intersem last June, but for the record, “all published maths is wrong” is not something I fear, because I know it’s not true. There is another quote attributed to me: “I think there is a non-zero chance that some of our great castles are built on sand”, which I did say, and which I think is a far more accurate representation of my views (but rather less clickbaity). I guess that this viewpoint — that maybe some important stuff is not ok — is not believed by all mathematicians. Some seem to think that all the important stuff is fine, and I’m less sure.

But I also spoke about Fermat’s Last Theorem in the article, and made a claim which again not everyone in my community seems to believe — indeed Persiflage expressed an opposing view on Twitter. The claim I made is that no human knows a complete proof of Fermat’s Last Theorem. Speaking as someone who was doing a PhD in precisely the area used by Wiles and Taylor at the time it all blew up, I am lucky enough to have an overview of the argument — and certainly Persiflage does too. But does one human know all the proof? Last year I asked Wiles if he knew all the details of his own proof (30 minutes in), and he does not say “yes”. Indeed he says that he would not want to sit an exam on the proof of Langlands–Tunnell.

So what does it mean to “completely understand a proof of Fermat’s Last Theorem”? Vice quotes me as saying that nobody “completely understands” it, but it also claims that I suggest “(nobody) knows whether it’s true”, which again is a bit of a misrepresentation. There are people that I trust, who have checked the parts which I have not checked, so I think we can argue that I know (to the extent that a human can know anything) that FLT is true. But what about finding that one person who “completely understands” it? Let’s take Wiles’ interpretation — could we find someone who is guaranteed to pass an exam whose syllabus was all of the mathematics involved in the proof? I think we should make it an open book exam, so the human being examined can look at the literature — but they cannot consult other humans.

And here I’m going to stick to my guns. To completely understand a proof of FLT (let’s say, for now, the proof explained in the 1995 Darmon-Diamond-Taylor paper) you will need to be a master of the techniques used by Langlands in his proof of cyclic base change (and I know people who are), and a master of Mazur’s work on the Eisenstein ideal (and I know people who are). But what about the far less sexy technical stuff? To move from the complex analytic theory of modular forms to the algebraic theory you will need to know the delicate interplay between the analytic and algebraic theory of moduli spaces of elliptic curves (and I know people who know this — but I went through some of this stuff once and it’s far more delicate than I had imagined, and there are parts of it where the only useful reference seems to be Brian Conrad’s brain). This last one is perhaps a good example of a tedious technical issue which it’s very easy to forget about, because the results are intuitive and the proofs can be technical. There are many other subtleties which one would have to fully understand because they’re on the syllabus. Is there really one human being who would feel confident answering questions on all of this material? I am really not sure at all.

And I think the reason I am not sure is because Lean has taught me that checking the details can sometimes be hard. If someone had asked me two years ago if I knew the definition and basic properties of adic spaces, I would have said “sure — I even wrote a paper about them”. But when formalising them I realised that there were all sorts of subtleties which I had just “taken for granted” because they were tedious technical issues which “of course I could work through in theory”. Actually working through some of them, when formalising them, made me understand that some of these issues were far bigger and more complex than I had realised. In the perfectoid project there is still an issue regarding the topology on an adic space, which I had assumed would be all over within about a page or two of pdf, but whose actual proof goes on for much longer, and at one point even invokes Hochster’s thesis (which I’ve certainly not read), although Conrad assures me that it is not necessary. We’ve formalised the definition, but there are still some very basic properties of adic spaces which we have not formalised. Maths is hard. My belief is that, over time, mathematicians begin to forget this, and they start to conflate the idea of a result being “standard” with the idea of a result that they completely understand the proof of. Our notion of proof is so different to the formal notion that we skip huge steps knowing that the techniques will work, and after a while we may forget what is involved in these techniques (if indeed we ever knew). This observation is not really of mathematical relevance — if I need the Langlands–Tunnell theorem in my work (and I did, once, in my work on Artin) then I will invoke it and not lose any sleep over it. But on the other hand, do I really completely understand the proof of that Artin work — my own theorem? I wouldn’t say I did.


IMO 2019 Q1

Someone sent me a link to this year’s IMO questions this afternoon at work, and the first one caught my eye because it looked like a relatively straightforward functional equation question: find all f : \mathbb{Z}\to\mathbb{Z} such that f(2a)+2f(b)=f(f(a+b)) for all integers a and b. I’ve seen enough of these in my time to know how to go about them, and it didn’t take me long to find a solution on paper. I then had to go to a meeting, but halfway through, when I didn’t have anything else to contribute, I idly started formalising my solution in Lean.
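
For the record, here is a sketch of one standard paper argument (not necessarily the exact route I took). Putting a=0 and writing s for a+b, the equation gives f(0)+2f(s)=f(f(s)), and hence f(2a)+2f(b)=f(0)+2f(a+b) for all a and b. Taking b=0 gives f(2a)=2f(a)-f(0), and substituting this back in gives f(a+b)=f(a)+f(b)-f(0). In other words g(x):=f(x)-f(0) is additive on \mathbb{Z}, so f(x)=mx+c with m=f(1)-f(0) and c=f(0). Plugging this back into the original equation forces m^2=2m and (m+1)c=3c, so either f is identically zero or f(x)=2x+c for some constant c, and both families are easily checked to satisfy the equation.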

Now formalisation is hard, so I knew I would end up getting bogged down in details, and things which looked straightforward on paper would end up messy. At some point someone divides by 2, which looks a bit problematic over the integers, and then there’s some integer induction — I kept waiting to get to the place where it was all going to go horribly pear-shaped.

But that place never came. Powerful tactics like convert and ring, the conv mode, and a sensible induction procedure for the integers meant that the Lean proof flowed just as the paper proof had done. Here it is! It probably took over half an hour, but under an hour, to finish it. Of course the question is not too hard — but that’s not the point.

The point is this. Before I got into the formalisation business, it was “obvious” to me that formalising questions like this should be easy. When I actually started, it became very clear to me very quickly that formalising IMO questions would be very hard, because formalising anything with content is hard. But the nice surprise is that actually, once one has a grip on formalisation, if one is working in an area which is “sufficiently close to the axioms” like elementary number theory or algebra, maybe nowadays in Lean it really is becoming as easy as I always “knew” it would be.


Lean does not (yet) have real manifolds.

There are loads of computer theorem provers. There is Lean, there is Coq, there is Agda, Idris, Isabelle-HOL, HOL-Light, UniMath, Mizar and many others. Which one is the best one for formalising a proof of Fermat’s Last Theorem?

I’ve spent the last few months having a good look at all of these systems and I think that out of all the ones which are currently on offer to us (the mathematical community) the one which offers the most hope is Lean. I checked Lean’s suitability by defining perfectoid spaces in Lean 3 with Patrick Massot and Johan Commelin, two other mathematicians — and it worked fine. We ran into a lot of problems along the way, but in essentially every case the problem was of our own making, and we were rescued by computer scientists who know nothing of perfectoid spaces but who are on the other hand very well equipped to answer extremely specific technical questions and give us extremely specialised advice about how best to turn mathematics into functional programming in practice.

But we still don’t have smooth real manifolds. I was initially perplexed by why things were going so slowly; I saw a definition from first principles as an undergraduate. But now I understand the reason, and it’s a bit bizarre. The reason is that a general principle in Lean is that one should set things up in as much generality as possible, for certain efficiency reasons which are of no interest to mathematicians. The more precise explanation is, of course, exactly the kind of extremely specialised advice that the computer scientists offered us in this situation. This has the unfortunate consequence that you can’t just wait for an Imperial undergraduate to do it 😦 One apparently really needs to think about differentiating maps between some quite general real topological vector spaces, at least some of which can be infinite-dimensional. This kind of stinks, but that’s the way it is with dependent type theory. Important: I am not saying that real manifolds can’t or won’t be done in Lean. I am just saying that we know what to do but it is taking a long time and it is more technical than one might guess at first. We’ll get there. Thank you to (in alphabetical order) Jeremy Avigad, Sebastien Gouezel, Patrick Massot, Zhouhang Zhou and all the others who are making steady progress. We have recently reached a milestone — we have defined the Bochner Integral. I don’t know what that is. But when thinking about it in Lean I ended up so confused that I had to confess on mathoverflow that I didn’t even really understand the real numbers.


A computer-generated proof that nobody understands

Very early on when I was reading about this formal proof verification game, I learnt about the Robbins Conjecture. Let me start this post with an overview of what this conjecture is, and how it was proved by a computer, and how the proof is easy for humans to understand. Then let me tell you about Bruck Loops, where the situation is quite different, and finally let me tell you about fancier mathematical objects and why things aren’t so easy in that case.

Robbins algebras

There is something in maths called a Boolean Algebra — it’s just a definition, like a group or a topological space. Feel free to look at the formal definition on the Wikipedia page for Boolean algebras. In the 1930s Herbert Robbins defined some other mathematical structure now called a Robbins Algebra, and conjectured that every Robbins Algebra was a Boolean Algebra. Proving this conjecture is like solving a big logic puzzle; we have some set with some structure and a load of axioms (a Robbins algebra), and we want to prove some other statements about this structure (i.e., the axioms for a Boolean algebra).

The conjecture remained open for around 60 years, possibly because it was tricky and possibly because there were not too many people working on it. Then in 1996 William McCune proved the conjecture using “automated theorem provers” called EQP and Otter. To give a rather brutal description of what an automated theorem prover is — it is something which takes as input a question such as the Robbins Conjecture, and then just throws all the axioms around generating lemma after lemma about Robbins algebras with the hope of ultimately managing to prove all the statements which define Boolean algebras. As you can imagine, this “brute force and ignorance” approach is not particularly appealing to the purist, but, let’s face it, it used to be the way that computers played chess, and anyone who has had their king and rook forked by a computer’s knight a fraction of a second after they have made a careless move knows that this approach can sometimes pay off.

Now one interesting thing about McCune’s proof was that actually, at the end of the day, it was pretty short. Within a year Bernd Dahn had got yet another system, ILF, to go through the Otter proof and simplify it (spotting common themes which were used more than once in the proof and pulling them off as useful lemmas); Dahn ultimately wrote a “normal” seven page maths paper, published in the Journal of Algebra in 1998, containing a human-readable account of the proof. The proof had now been completely “de-computerised”.

We learn from this story the unsurprising fact that computers are good at logic puzzles.

Bruck loops

I didn’t tell you what a Robbins algebra was, but I will attempt to tell you what a Bruck loop is. If you really don’t care then just skip down to the statement of the theorem below.

A loop is “a group without the associativity axiom”. More precisely a loop is a set Q equipped with a binary operation \cdot : Q \times Q \to Q and an identity element 1\in Q satisfying 1 \cdot x = x \cdot 1 = x for all x\in Q, plus the following “inverse axiom”: for all q\in Q, left multiplication by q and right multiplication by q are both bijections from Q to Q. In particular for any q\in Q the equations qr=1 and rq=1 have unique solutions for r, and one could call these solutions the left and right inverses of q. These two inverses might not coincide though.
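
For concreteness, here is a back-of-the-envelope Lean rendering of that definition (my own names and phrasing, not anything from mathlib):

structure loop (Q : Type) :=
(mul : Q → Q → Q)
(one : Q)
(one_mul : ∀ x, mul one x = x)
(mul_one : ∀ x, mul x one = x)
-- the "inverse axiom": left and right multiplication by any q are bijections
(left_mul_bij : ∀ q, function.bijective (mul q))
(right_mul_bij : ∀ q, function.bijective (λ x, mul x q))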

Groups of course give examples of loops, but there are non-associative examples too. Most definitions in group theory have some sort of analogue for loop theory, although you sometimes have to think about things the right way. For example one can consider the group of permutations Q\to Q generated by left and right multiplications by elements of Q, and the inner mapping group Inn(Q) of a loop Q is the subgroup consisting of elements sending 1 to 1; if Q is a group then this is readily checked to be the inner automorphisms of Q.

Sometimes the lack of associativity makes these analogues really painful though. For example the centre of a loop Q is the subloop Z(Q) of Q consisting of the elements z\in Q such that z\cdot q=q\cdot z for all q\in Q and z\cdot(x\cdot y)=(z\cdot x)\cdot y for all x,y\in Q and two more equations like this with z in the middle and on the right. One checks that the centre of a loop is a normal subloop and one can form a quotient loop Q/Z(Q).

Just like groups, we say a loop Q has nilpotency class at most 1 if Q=Z(Q), and Q has nilpotency class at most 2 if Q/Z(Q) has nilpotency class at most 1.

For groups, we have Inn(G)\cong G/Z(G) and in particular if Inn(G) is abelian then G has nilpotency class at most 2. These results use associativity though, and are not true in general for loops. However they are true for Bruck Loops:

Theorem (a computer). If Q is a Bruck loop and Inn(Q) is abelian, then Q has nilpotency class at most 2.

A Bruck loop is a loop satisfying two extra axioms: firstly x\cdot (y\cdot (x\cdot z)) = (x \cdot (y \cdot x)) \cdot z (which implies that left inverses equal right inverses), and secondly that (xy)^{-1}=x^{-1}y^{-1} (another “commutativity” condition).

The humans involved in the proof were J. D. Phillips and David Stanovský, who wrote a paper published in 2012 in Communications in Algebra. The humans reduced the question to proving that a certain equation was always satisfied for Bruck loops with abelian inner mapping groups, and then they threw a bunch of automated theorem provers at it. The prover Waldmeister managed to come up with a proof. Here it is — all 30000+ lines of it. This helpful page on Stanovsky’s website offers other formats of the proof, for example a 1068 page pdf file containing a complete “human-readable” proof. What human would ever read this proof though?

I had no idea that computers were doing that sort of thing nowadays. I learnt about this work from Michael Kinyon at AITP 2019.

Shimura varieties

In my mind, this work raises an interesting question. Note that the work also raises a lot of boring questions. Of course “boring” is a very subjective term. But let’s try and put this work on Bruck loops into some sort of context. I work in a mathematics department full of people thinking about mirror symmetry, perfectoid spaces, canonical rings of algebraic varieties in characteristic p, etale cohomology of Shimura varieties defined over number fields, and the Langlands philosophy, amongst other things. Nobody in my department cares about Bruck loops. People care about objects which it takes an entire course to define, not a paragraph. The questions raised by this work which people in my department might find “boring” are: here are other classes of loops; can one prove analogous theorems about these classes? Or: here is another elementary object which takes a paragraph to define but which is not taught to the undergraduates at Imperial College London. Can one prove the following conjecture about these objects? The kind of mathematicians I know are, for the most part, not remotely interested in such questions. This is all to do with subtleties such as fashions in mathematics, and which areas are regarded as important (that is, worth funding).

So what are the mathematicians I know interested in? Well, let’s take the research staff in my department at Imperial College. They are working on results about objects which in some cases take hundreds of axioms to define, or are even more complicated: sometimes even the definitions of the objects we study can only be formalised once one has proved hard theorems. For example the definition of the canonical model of a Shimura variety over a number field can only be made once one has proved most of the theorems in Deligne’s paper on canonical models, which in turn rely on the theory of CM abelian varieties, which in turn relies on the theorems of global class field theory. That’s the kind of definition which mathematicians in my department get excited about — not Bruck Loops. I once went to an entire 24 lecture course by John Coates which assumed local class field theory and deduced the theorems of global class field theory. I have read enough of the book by Shimura and Taniyama on CM abelian varieties to know what’s going on there. I have been to a study group on Deligne’s paper on canonical models. So after perhaps 100 hours of study absorbing the prerequisites, I was ready for the definition of a Shimura variety over a number field. And then there is still the small matter of the definition of etale cohomology.

So why are computers proving theorems about Bruck loops but not proving theorems about the cohomology of Shimura varieties?

Here is a conjectural answer. Bruck loops are easy, so it’s easy to prove theorems about them. Shimura varieties are hard, so it’s never going to work.

Yeah, but hang on a minute. If 20 years ago we had raised the possibility that a computer could come up with a 1000 page proof of a statement which humans had formulated, however “easy” the objects in question were, I think that it would not have been hard to find people who would have been very skeptical. Think about it. The definition of a Bruck Loop in the 1000 page pdf is 19 axioms. Now make a naive guess about how long it would take to create a 30,000 line proof of a statement about Bruck Loops: a brute-force search which at each of the 30,000 steps tried each of the 19 axioms would already be looking at something like 19^{30000} possibilities, a number which does not fit into our universe in any meaningful way. The computer scientists must have done something clever here. In particular I think that although many mathematicians would instinctively rule out the possibility of computers proving hard theorems about complicated objects because of issues of scale, this work on Bruck loops throws some sort of spanner into those works.

Here is another answer, which is correct for trivial reasons. One reason why computers are not proving theorems about the etale cohomology of Shimura varieties over number fields is that the human race have not yet taken the time to tell a proof verification system what the phrase “etale cohomology of a Shimura variety over a number field” means. Why is this? Well, I am currently finding it very hard to think of people who know the definition of the canonical model of a Shimura variety and also know how to use proof verification systems. I also think that it is much easier to train mathematicians to use proof verification systems than it is to train computer scientists to read papers by Deligne, and this is one of the reasons why the Xena project exists.

But is this the only reason? Well, unfortunately, I suspect that it is not. If you were to ask a computer scientist what they would need in order to start proving things about the cohomology of Shimura varieties defined over number fields, I suspect that they would ask for something like

  • the definitions;
  • sample proofs of hundreds (or thousands?) of theorems to train their AI;
  • some conjectures, of approximately the same level of difficulty as the theorems.

The reason that, in contrast to the Bruck loop work, they will need some proofs is that the natural framework to define objects like Shimura varieties is within some higher order logical structure; even for the definition of a topological space one has to quantify over subsets rather than just elements. On the other hand, after AITP 2019, Josef Urban showed me some work where the E theorem prover had managed to prove basic theorems about topological spaces given enough “training data” theorems.

If instead of Shimura varieties we think about getting computers to prove theorems about perfectoid spaces, then Commelin, Massot and I have achieved the first bullet point above. As you can see, we still have a vast amount of work to do before we can catch up with Peter Scholze. Another milestone in this game would be to prove the tilting correspondence, because proving that would involve proving hundreds of other theorems along the way (which could also be used to train). However before that milestone, we still have the milestone of stating the tilting correspondence. Even stating it will involve having to prove some theorems. There is still a long way to go.


What is homotopy type theory? An amateur speaks.

Over the past few months I have been attempting to get some kind of an idea about what homotopy type theory is. Lean uses dependent type theory, which is another kind of type theory; homotopy type theory is what Voevodsky studied and what many other people still study. But Reid Barton made some comments on the Lean chat which basically opened my eyes and I’ll just write them down now before I forget them.

Last summer, a bunch of Imperial students formalised a bunch of mathematics in Lean. Most of them stuck to stuff on our undergraduate syllabus, but Ali Sever raised the possibility of “formalising Euclid” and even though at the time I had no real idea about how that would work, I said “sure”. It turns out that Ali was not the first person to think of this idea in the 2000+ years since the Elements was written, and things were very well-understood. But not by me. So here’s an explanation.

I used to have to deal with geometry problems when I was younger and found myself doing various Maths Olympiad problems. My gut feeling with the plane geometry problems was that “one should be able to do them all in theory just by working out the coordinates of everything”. However of course there were often very cute proofs which just involved invoking standard theorems in plane geometry (of which there seem to be very many, some of which are of course proved in Euclid’s books). Are these methods actually fundamentally different? They certainly look fundamentally different. In fact they must be fundamentally different, because Descartes didn’t invent Cartesian coordinates until over 1500 years after Euclid had died.

So synthetic geometry is what Euclid was doing; he never chose coordinates. He had an abstract notion of a point, and an abstract notion of a line, and axiomatised what he needed from them: between two distinct points one could draw a line, and so on. He never attempted to define “point” or “line” — for Euclid, these notions were fundamental and needed no definition [note added later on: David Corfield points out that actually Euclid did offer “definitions” of some sort for point and line, but arguably never then uses them].

On the other hand, analytic geometry is what you get if you use coordinates. We need the real numbers \mathbb{R} (which we could have built using the axioms of our system) and then we can define the plane to be \mathbb{R}^2 and we can define a point to be an element of \mathbb{R}^2 (that is, a pair of real numbers), and we can define a line to be a subset of \mathbb{R}^2 defined by a degree 1 equation ax+by=c (with a and b not both zero), and so on.

You can see that these are somehow two different theories. A powerful way of seeing this is that Euclid develops some geometry without assuming the parallel postulate, and then he assumes it and proves more. There was some sort of industry pre-1800 of trying to prove the parallel postulate from Euclid’s other axioms, but nobody was successful. Nowadays we understand why — there are models of geometry, such as the hyperbolic plane, in which all of Euclid’s axioms other than the parallel postulate hold and the parallel postulate fails. As a consequence, all the results in planar geometry which Euclid proved without assuming the parallel postulate also hold in hyperbolic geometry.

Tarski’s synthetic approach to geometry has an axiom (schema) of continuity, corresponding essentially to the completeness of the real numbers. It would not surprise me if the results that Tarski proved which did not assume completeness of the reals were also true in models of planar geometry such as k^2 where k is an appropriate subfield of the reals (for example the real algebraic numbers, which are not complete).

What we begin to see is that with the synthetic approach, we are inspired by the particular picture we’re trying to model, but there might be other models too, and we might have to add other axioms later on for convenience if we want to prove more results about the particular model we have in mind (and indeed we will have to add these axioms if there are other models of our theory in which these results are false).

In mathematics, we can define topological spaces, and talk about maps between them, and what it means for maps to be homotopic, and then invent homotopy theory, the basics of which are taught to second year maths undergraduates at Imperial College if they choose this option. One could regard this as “analytic homotopy theory”.

What homotopy type theory is, I now finally realise, is a synthetic version of this. There are types X : Type and then there are terms a : X . The model is that X is supposed to be a topological space, and a is supposed to be an element of X. But here’s the weird part. The equation a = b is not supposed to be interpreted as meaning that the points a and b are equal. A proof of a = b (or, to give it its more formal name, a term of type a = b) is supposed to be interpreted as a path from a to b. Symmetry of equality is then interpreted as saying that if we have a path from a to b then we have a path from b to a, and transitivity is just path composition. If we have two terms p1 and p2 of type a = b then a term of type p1 = p2 is a path from p1 to p2, or, in other words, a homotopy between p1 and p2. I had read this interpretation several times, and have known full well for over a year that this is how I was supposed to “interpret equality” — but the penny had never dropped as to what that actually even meant. But now I get it, or at least the basic idea:

Homotopy type theory is to the theory of topological spaces and homotopy classes of maps, as Euclid’s synthetic geometry is to Descartes’ coordinate geometry.
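
As an aside, Lean’s own equality lives in Prop and is proof-irrelevant, so in Lean this interpretation degenerates; but the formal dictionary is already visible in the primitives. A tiny illustration, with symmetry as reversing a path and transitivity as composing paths:

-- reversing a path
example {X : Type} {a b : X} (p : a = b) : b = a := p.symm
-- composing paths
example {X : Type} {a b c : X} (p : a = b) (q : b = c) : a = c := p.trans q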

I read in Floris van Doorn’s thesis that it is an open problem to define Grassmannians in homotopy type theory. When I first read this, I was completely confused. One can build mathematics in homotopy type theory, so one can build topological spaces, the real numbers, and build Grassmannians the way they are usually built in a maths course. It was Reid Barton who pointed out to me that this was not what the question meant. Of course one can do this — this would be an analytic construction. The open problem is to give a synthetic construction. It is analogous to me saying that one could just prove an olympiad geometry question by a coordinate chase. I am clearly an unsynthetic kind of guy. Of course, one issue could be that there are perhaps other models of the axioms of homotopy type theory where Grassmannians really don’t exist…

Thanks to Ali Sever, whose questions and research last summer turned out to be a necessary prerequisite for the penny to drop. Ali’s work formalising Tarski’s synthetic geometry in Lean is here.


Proofs are not programs

The Curry-Howard correspondence can be pithily described as saying “proofs = programs”. The statement is short, alliterative, and rather beautiful. As a professional mathematician I know exactly what a proof is; as an amateur programmer I have a pretty clear idea about what a program is, and the concept that they might be in some sense the same thing is both enigmatic and appealing. A proof is a logical sequence of statements deducing a theorem from the rules of maths; a program is a logical sequence of commands producing valid code from the rules of the programming language being used. When learning Lean I saw the correspondence in action many times, and so it actually took quite some time for the penny to drop: the Curry-Howard correspondence is not actually true.

Of course something is true, but what is not true is any statement of this form where “proof” is interpreted as “what the pure mathematicians in my department do”. What is true is that if you stick to constructive maths or intuitionistic logic or one of the flavours of maths where you’re not allowed to use the axiom of choice or the law of the excluded middle — in other words a version of maths which is unrecognisable to over 95% of pure mathematicians working in mathematics departments — then you might be in good shape. But that is not what I see happening in my mathematics department or the other mathematics departments which I’ve had experience of. Some proofs are not programs.

Here’s an example of a proof which is not a program. The original proof that the class numbers of imaginary quadratic fields tended to infinity was the following rather extraordinary argument. First Landau showed that the theorem was true assuming that a certain generalisation of the Riemann Hypothesis was true, and then Heilbronn managed to show that the theorem was true assuming that this generalisation was false! See for example Keith Conrad’s comment on this mathoverflow question, and for more details see p359 of the book “A Classical Introduction to Modern Number Theory” by Ireland and Rosen (I have a copy of this book in my office if any Imperial students are interested in seeing more details).

The analogous situation for programs would be something like the following. Your boss asks you to write some computer code to do a certain task so your company can release their new product. A week later you enter their office with two USB sticks, explaining that one of them does the task if the Riemann hypothesis is true, and the other one does the task if the Riemann hypothesis is false. What you seem to have done, on the face of it, is proved that the code your boss wanted exists. You might also have lost your programming job 😉 In fact it’s not even 100% clear that you have proved that the code exists — what if we replace the Riemann Hypothesis in this story by one of these weird unprovable logic statements like the continuum hypothesis, which is true in some models of maths and false in others? Then where’s the program?

But actually it’s even worse than that. In my last blog post, I gave an example of a computable bijective function which did not have a computable inverse. A similar sort of argument shows that the following proof:

Theorem. If f: X \to Y is a surjective function, then there exists a map g: Y \to X such that f\circ g: Y \to Y is the identity function.

Proof. Define g thus: if y\in Y then choose x\in X with f(x)=y (possible by surjectivity of f) and define g(y) to be this x. The fact that f(g(y))=y is true by definition. QED.

does not correspond to a program. The theorem is just the axiom of choice in disguise, and the axiom of choice is about as non-constructive as you can get. Programs are algorithms. The axiom of choice says that we can do something when there is no algorithm.

What is a Lean file then?

So this is a bit confusing, because I can prove the theorem above in Lean:

-- g y is "some x with f x = y", produced by the axiom of choice (classical.some);
-- Lean is given no algorithm for finding x, hence the noncomputable keyword
noncomputable def g {X Y : Type} {f : X → Y} (hf : function.surjective f)
  (y : Y) : X := classical.some (hf y)

-- the defining property of the chosen x: f really does send g y back to y
lemma g_prop {X Y : Type} {f : X → Y} (hf : function.surjective f)
  (y : Y) : f (g hf y) = y := classical.some_spec (hf y)

-- the theorem itself: every surjection has a right inverse
theorem has_right_inverse_of_surjective (X Y : Type) (f : X → Y)
  (hf : function.surjective f) :
  ∃ g : Y → X, ∀ y : Y, f (g y) = y := ⟨g hf, g_prop hf⟩

and that looks like code to me. But not all computer files are computer programs. As an example, consider a LaTeX file. The LaTeX program is the program, the LaTeX file is just the input to the program, and the output is typically a pdf file.

Lean itself is a program, but the Lean files which people make are not all programs. Some of them are though! Some Lean files (which use constructive logic and only computable functions) are turned into bytecode when run through Lean, and bytecode is a program. But no bytecode is generated for the definition of the function g above because it is noncomputable, and that stops the proof has_right_inverse_of_surjective from ever being a program. Lean itself can verify that the proof is correct, but the proof itself is not a program.

I am still getting to grips with all this computer science malarkey. I asked a bunch of questions about this stuff when writing this blog post, on this thread in the Lean chat (login required), and thanks as ever to Mario Carneiro for answering most of them.


The inverse of a bijection.

In my “introduction to proof” course I teach the undergraduates about equivalence relations. What the students do not know is that earlier on in the course I have secretly inserted examples of equivalence relations without ever letting on. For example, in the number theory section I prove that if N is a fixed positive integer, and a, b, c are integers, then

a \equiv a \pmod N;

a \equiv b \pmod N \implies b \equiv a \pmod N;

a \equiv b \pmod N and b \equiv c\pmod N \implies a \equiv c\pmod N.

And when we do equivalence relations I can then say “oh look! Congruence mod N is an equivalence relation!”

I do the same trick with bijections. I prove that the identity function is a bijection, the inverse of a bijection is a bijection, and that the composite of two bijections is a bijection. Then later on I can say “oh look! ‘I biject with you’ is an equivalence relation!”

Coding up congruences mod N in Lean is a joy to do, and one of my example files on this stuff even made it into mathlib!
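
To give the flavour, here is a toy version (a definition made up for this post, not the one from that mathlib file):

import data.int.basic tactic.ring

-- a ≡ b mod N, phrased as an explicit existence statement
def cong_mod (N a b : ℤ) : Prop := ∃ k : ℤ, b - a = N * k

-- reflexivity: a ≡ a mod N, witnessed by k = 0
example (N a : ℤ) : cong_mod N a a := ⟨0, by ring⟩

Symmetry and transitivity are not much harder.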

But coding up the bijection equivalence relation into mathlib, one runs into an issue which is hard to explain to mathematicians. The problem is with symmetry — proving that the inverse of a bijection is a bijection. Say f: X \to Y is a bijection, and y\in Y. We’re trying to define the inverse function to f and we want to figure out its value on y. Lean has no trouble proving that there exists a unique element of X such that applying f to it gives y. But when you try to access this element of X while defining the inverse of f, you get an intimidating error:

induction tactic failed, recursor 'Exists.dcases_on'
can only eliminate into Prop

Now Lean 3 does not have particularly friendly error messages, but as a Lean beginner I remember finding this one completely baffling. As far as type theory is concerned, there is a universe issue; the user is trying to build something in Type, but the eliminator Lean reached for (the recursor for ∃) can only build things in Prop. A mathematician using Lean might know about Lean’s two kinds of universes — Prop is the universe of true-false statements, where things like the statement of Fermat’s Last Theorem live, and Type is the universe of data, where things like the real numbers live. So what is going on? The problem is that \exists x \in X, f(x)=y is in Prop, but the actual x which exists is in Type, and Lean is complaining that it does not have an algorithm to get hold of x, and hence to move from the weak Prop universe to the stronger Type one.

Now of course to a mathematician this is just all ridiculous — the algorithm is “there is a unique x — take that one”. The Lean beginner will learn that instead of applying the cases tactic to ∃ (x : X), f x = y to get x, they can apply the choose tactic, and then one gets a new error

definition 'g' is noncomputable, it depends on 'classical.some'

which one fixes by writing noncomputable theory at the top of the file. One is left with the feeling that Lean is kind of bewildering: why was there so much fuss about applying the obvious algorithm “take the unique x, duh”?
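
For concreteness, here is roughly what one ends up writing (a minimal sketch with names of my own choosing, using classical.some directly rather than the choose tactic):

-- the inverse of a bijection f, built using the axiom of choice; hf.2 is surjectivity
noncomputable def g {X Y : Type} {f : X → Y} (hf : function.bijective f) (y : Y) : X :=
classical.some (hf.2 y)

-- the defining property of the choice: f (g y) = y
lemma g_spec {X Y : Type} {f : X → Y} (hf : function.bijective f) (y : Y) :
  f (g hf y) = y :=
classical.some_spec (hf.2 y)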

Recently on the Lean Zulip chat there has been talk amongst computationally-minded people about fancy things such as domain theory, topologies, the difference between option and roption, and difference lists. All of this has been slowly sinking into my head in some sort of way, and it culminated this morning in my actually seeing an explicit example (needs Zulip login) of a bijection between two sets where even a mathematician can see that there is some sort of issue with defining the inverse bijection. Thanks to Mario Carneiro for coming up with the example. The rest of this post explains it.

The example.

The sets in question are as follows. Let X=\{-1,0,1,2,3,\ldots\}=\mathbb{Z}_{\geq-1}. That’s easy enough.

Let Y be the set of infinite sequences P(0), P(1), P(2), \ldots where each P(i)\in\{T,F\} (true or false), and with the property that for all m\geq0 we have P(m)\implies P(m+1). In words — if at some point in the sequence we see T, then everything after that point is also T.

Examples of elements of Y:

FFFFTTTTTTTTTTTTTTT...
FFFFFFFFFTTTTTTTTTT...
FFFFFFFFFFFFFFFFFFF...
TTTTTTTTTTTTTTTTTTT...

It is not hard to convince yourself that sequences in Y come in two flavours: either the sequence has some finite number n\geq0 of Fs and then everything afterwards is T, or the sequence can be all Fs.

The bijection f: X\to Y sends -1 to the “all F” sequence, and n\geq0 to the sequence which contains n Fs and then reverts to T forevermore. This is easily checked to be a bijection, and one even feels that it’s possible to write down the inverse. But how easy is it in practice?

Here’s the issue. Say I now give you a “black box” sequence P in Y. It’s a function, I can guarantee that it’s in Y (I even have a proof!), and if you give me any i\in\mathbb{Z}_{\geq0} I will return P(i). We know that there is a unique n\in X with f(n)=P and it’s now your job to find it. What is your algorithm?

Initially you can start off by evaluating the function at some values. It turns out that P(3) is F, that P(100) is F and then that P(10^{18}) is F. You realise that the n you’re looking for is either a very large integer, or -1. But which one? You have what looks like complete access to P\in Y, but you cannot work out which n\in X it corresponds to — indeed you cannot even say whether n\geq0 or not!

After a while, you get sick of this and say that actually you haven’t got complete access to P and I am suppressing information. You complain that you don’t just want to evaluate P at lots of numbers; you want to see the actual definition. So I reveal the definition: P(i) is defined to be T if there is a counterexample to the Goldbach conjecture which is at most i, and F otherwise. To put it another way — if the Goldbach conjecture is true then the sequence is all Fs and n=-1, but if it fails then n is the smallest counterexample. I even show you the one-line proof that P is an element of Y: if there’s a counterexample which is at most m then clearly there’s a counterexample which is at most m+1. Even with the definition, you are none the wiser!
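
Just to make this concrete, here is a hypothetical Lean rendering of such a P (my own names and formulation, not Mario’s original code), with T and F rendered as true and false propositions:

import data.nat.prime

-- n is an even number, at least 4, which is not the sum of two primes
def goldbach_fails_at (n : ℕ) : Prop :=
2 ∣ n ∧ 4 ≤ n ∧ ¬ ∃ p q : ℕ, p.prime ∧ q.prime ∧ p + q = n

-- P i says: there is a counterexample to Goldbach which is at most i
def P (i : ℕ) : Prop := ∃ n ≤ i, goldbach_fails_at n

-- the one-line proof that P is an element of Y
lemma P_mono (m : ℕ) : P m → P (m + 1) :=
λ ⟨n, hn, h⟩, ⟨n, le_trans hn (nat.le_succ m), h⟩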

Conclusion.

When a function is marked as noncomputable in Lean it means that Lean has not been told a method for working it out for every input. The innocuous-looking statement “there exists a unique x\in X with f(x)=y, so choose that x” is clearly something which can be done abstractly, but there is no method for finding x, even if we can prove that it must exist. There can’t be a method for finding it in general — any such method would translate into a method for deciding Goldbach’s conjecture, the 3n+1 problem, Fermat’s Last Theorem, the twin primes conjecture and so on; these are hard problems. Indeed, one can prove using a diagonalisation argument that no algorithm can exist to invert this bijection. The inverse of our computable function f: X\to Y certainly exists, but it is a noncomputable function. Lean can handle such functions, no problem, but it has to be told explicitly that it should be ignoring computability issues (e.g. with the line noncomputable theory at the top of a file).
