Blue-eyed islanders (guest post)

Thanks very much indeed to Chris Hughes, who has written up some thoughts on the blue-eyed islanders problem. Before we start, here’s the history. I (KMB) heard about this problem a few years ago, and thought it was cute, but then forgot about it. Richard Thomas then ran into it in an old post on Terry Tao’s blog and suggested I put it onto an M1F example sheet. I did this, and knocked up a solution. Rohan Mitta then suggested that my solution was incomplete — my inductive argument did not prove that the blue-eyed islanders left on day 100, but only that they left on day at most 100. I agreed with him, and then realised that actually I found this sort of question hard to reason about (because I knew nothing about modal logic, it turned out). Chris then independently invented enough modal logic himself to put the question into Lean and gave what I now regard as a definitive solution to the problem.

Chris writes about a bunch of Lean code in his post; the code for his first proof is here and the code for the decidability proof (shorter code, but it takes longer to run) is here. To maximise the chances of understanding, it’s probably best to look at this code in Lean whilst reading the post.

Here is Chris’ post.

*****

A few people have asked me about how my Lean proof of the blue-eyed islanders question on Kevin Buzzard’s M1F problem sheet works
[Question 6, on http://wwwf.imperial.ac.uk/~buzzard/maths/teaching/18Aut/M1F/exsht05.pdf]. I didn’t have a readable proof of this until now – I first wrote the proof when I was much less proficient at Lean, so in this blog post I present a more readable version. This is actually a proof of a slightly simplified version of the problem, since the brown-eyed islanders do not have to leave the island in this version. I also strengthen the result by proving that the blue-eyed islanders do not leave until day 100. This version still demonstrates the most interesting parts of the question.

Statement

The hardest part of this proof to understand is the statement of the problem. I defined the function island_rules to represent what the blue-eyed islanders know. Given a day number d, the actual number of blue-eyed islanders b, and a hypothesized number of blue-eyed islanders B, it returns whether or not B is a possible number of blue-eyed islanders from the blue-eyed islanders’ perspective. If the only consistent value for B is b, then the blue-eyed islanders have deduced the number of blue-eyed islanders, and thus worked out their own eye colour, and must leave the island. Given that the islanders are perfect logicians, they know exactly whether this function returns true or false for any given values (there is a philosophical question here about how perfect the logicians are, and whether they can solve undecidable propositions; for the purposes of this proof they can, although this problem is in fact constructively true).


def island_rules : ℕ → ℕ → ℕ → Prop
| 0     b B := (B = b ∨ B = b - 1) ∧ 0 < B
| (d+1) b B := island_rules d b B ∧
((∀ C, island_rules d b C → C = b) ↔ ∀ C, island_rules d B C → C = B)

On day 0 there are two possibilities for the number of blue-eyed islanders, b and b - 1, unless b = 1, in which case there is only one possibility, since it is also known that the actual number is greater than zero.

On subsequent days, a hypothesized value `B` is possible if and only if it was possible the previous day and it is consistent with whether or not the islanders left the previous day. The statement ((∀ C, island_rules d b C → C = b) ↔ ∀ C, island_rules d B C → C = B) is the hardest part to understand. The left hand side is true if and only if the blue-eyed islanders deduced their own eye colour the previous day. The right hand side is true if and only if the islanders would have left the previous day if the actual number of blue-eyed islanders was B. Therefore the entire iff statement is true if and only if the actual events of the previous day are consistent with what would have happened the previous day if B were the actual number of blue-eyed islanders.

To illustrate how this works try substituting b = 2 into this function.

On day zero, island_rules 0 2 B unfolds to (B = 2 ∨ B = 1) ∧ 0 < B, so 1 and 2 are the only possibilities; since there are two possibilities, the islanders do not leave.

On day one it returns island_rules 0 2 B ∧ ((∀ C, island_rules 0 2 C → C = 2) ↔ ∀ C, island_rules 0 B C → C = B). We know island_rules 0 2 B is (B = 2 ∨ B = 1) ∧ 0 < B, which simplifies to B = 2 ∨ B = 1, so for now the only candidates are 1 and 2. Now we can check which of these values is consistent with the second part of the proposition. Obviously B = 2 is consistent, since substituting B = 2 leaves both sides of the iff statement the same.

Let’s try B = 1. Then the statement becomes (∀ C, island_rules 0 2 C → C = 2) ↔ ∀ C, island_rules 0 1 C → C = 1. We know island_rules 0 2 C is equivalent to the statement C = 2 ∨ C = 1, so the left hand side becomes ∀ C, C = 2 ∨ C = 1 → C = 2. This is clearly false: C = 1 is a counterexample.

Now look at the right hand side. By definition island_rules 0 1 C is equal to (C = 1 ∨ C = 0) ∧ 0 < C. Clearly this is equivalent to C = 1. So the right hand side becomes ∀ C, C = 1 → C = 1, which is true. This is because if the actual number of blue-eyed islanders was 1, they would have left the previous day. So for B = 1 the iff statement is equivalent to false ↔ true, so B = 1 is not a possibility. Therefore the only possibility on day 1 (note the out-by-one error due to starting at day 0) is 2, so the blue-eyed islanders will leave.

Now that we have defined the “rules”, the statement of the theorem we prove is as follows. The left hand side indicates whether or not the blue-eyed islanders leave.


lemma blue_eyed_islander {d b} (hb0 : 0 < b) : (∀ B, island_rules d b B → B = b) ↔ b ≤ d + 1

Proof

First we prove a trivial lemma saying that the day-zero conditions continue to hold on every subsequent day.


lemma island_rules_zero_of_island_rules : ∀ {d b B : ℕ}, island_rules d b B → (B = b ∨ B = b - 1) ∧ 0 < B
| 0     b B h := h
| (d+1) b B h := island_rules_zero_of_island_rules h.left

Next we prove the first direction of the iff, that they leave by day b - 1. This proof is easier to read in a Lean editor.


theorem blue_eyed_islander_mp : ∀ {d b}, 0 < b → b ≤ d + 1 → ∀ B, island_rules d b B → B = b
| 0 b hb0 hdb B hir := begin
    have : b = 1, from le_antisymm hdb (succ_le_of_lt hb0),
    unfold island_rules at hir,
    cases hir.left with h h,
    { exact h },
    { simp [this, *, lt_irrefl] at * }
  end
| (d+1) b hb hdbB B hir :=
  have (B = b ∨ B = b - 1) ∧ 0 < B, from island_rules_zero_of_island_rules hir,
  begin
    unfold island_rules at hir,
    cases this.left with h h,
    { exact h },
    { /- the case B = b - 1 -/
      have hb0 : 0 < b - 1, from h ▸ this.right,
      have hb1 : 1 ≤ b, from le_trans (succ_le_of_lt hb0) (nat.sub_le_self _ _),
      have hdbB' : b - 1 ≤ d + 1, from nat.sub_le_right_iff_le_add.2 hdbB,
      /- by our induction hypothesis, they would have left on day d if the actual number was b - 1 -/
      have ih : ∀ C : ℕ, island_rules d B C → C = B, from h.symm ▸ blue_eyed_islander_mp hb0 hdbB',
      /- From the definition of island_rules, this means they left yesterday -/
      rw ← hir.right at ih,
      /- Slightly strange proof, even though we know B = b - 1 and we're trying to prove
        B = b, we don't find a contradiction, we just prove B = b directly -/
      exact ih B hir.left }
  end

Next we prove a trivial lemma about naturals that kept coming up.


theorem nat.sub_one_ne_self_of_pos {a : ℕ} (ha : 0 < a) : a - 1 ≠ a :=
ne_of_lt (nat.sub_lt_self ha dec_trivial)

Next we prove the other direction of the if and only if. This is the harder direction, and it has been stated in a slightly different but equivalent way, to make it easier to prove.


lemma blue_eyed_islander_mpr : ∀ {d b}, 0 < b → d + 1 < b → ∀ B, island_rules d b B ↔ (B = b ∨ B = b - 1)
| 0     b hb0 hdb B := begin
  unfold island_rules,
  split,
  { exact λ h, h.left },
  { assume hbB,
    split,
    { exact hbB },
    { cases hbB with hbB hbB,
      { exact hbB.symm ▸ hb0 },
      { exact hbB.symm ▸ nat.sub_pos_of_lt hdb } } }
  end
| (succ d) b hb0 hdb B :=
begin
  split,
  { exact λ h, (island_rules_zero_of_island_rules h).left },
  { assume hbB,
    have hb10 : 0 < b - 1, from nat.sub_pos_of_lt (lt_trans dec_trivial hdb),
    have hdb1 : d + 1 < b - 1, from nat.lt_sub_right_of_add_lt hdb,
    /- Use our induction hypothesis twice. For both possible values of B, the islanders
      did not leave the previous day. This means we have no new information -/
    have ihb : ∀ B : ℕ, island_rules d b B ↔ B = b ∨ B = b - 1,
      from blue_eyed_islander_mpr hb0 (lt_trans (lt_succ_self _) hdb),
    have ihb1 : ∀ B : ℕ, island_rules d (b - 1) B ↔ B = b - 1 ∨ B = b - 1 - 1,
      from blue_eyed_islander_mpr hb10 hdb1,
    unfold island_rules,
    split,
    { rw ihb,
      exact hbB },
    /- the interesting part of the proof starts here, we have to prove that either value of B is
      possible -/
    { cases hbB with hbB hbB,
      { /- case B = b is easy -/
        rw hbB },
      { /- case B = b - 1 is harder -/
        /- By our induction hypotheses it was impossible to deduce the value of `b` yesterday in both
          the real world, and for our hypothesized value of `B`, which is `b - 1`. This means both sides
          of the iff statement are false -/
        simp only [ihb, ihb1, hbB],
        /- After applying the induction hypothesis, it is now obvious that both sides are false,
          and the proof is easy from now on -/
        apply iff_of_false,
        { assume h,
          have : b - 1 = b, from h (b - 1) (or.inr rfl),
          exact nat.sub_one_ne_self_of_pos hb0 this },
        { assume h,
          have : b - 1 - 1 = b - 1, from h (b - 1 - 1) (or.inr rfl),
          exact nat.sub_one_ne_self_of_pos hb10 this } } } }
end

Proving the final lemma from the iff requires a small amount of work.


lemma blue_eyed_islander {d b} (hb0 : 0 < b) : (∀ B, island_rules d b B → B = b) ↔ b ≤ d + 1 :=
begin
  split,
  { assume h,
    refine le_of_not_gt (λ hbd : d + 1 < b, _),
    have := blue_eyed_islander_mpr hb0 hbd,
    have : b - 1 = b, from h (b - 1) ((this (b - 1)).mpr (or.inr rfl)),
    exact nat.sub_one_ne_self_of_pos hb0 this },
  { exact blue_eyed_islander_mp hb0 }
end

Alternative proof

In the proof above we proved the result for any number of blue-eyed islanders. However, the problem sheet only asked us to prove it in the case of 100 blue-eyed islanders, and there is a simple way to prove it in this case.

If we look at our function island_rules, it has type ℕ → ℕ → ℕ → Prop, which is actually the same as ℕ → ℕ → set ℕ, since set ℕ = (ℕ → Prop) by definition. The set it returns is always a finite set, since it is always a subset of {b, b - 1}. There is a type in Lean called finset, for finite sets, which is not ℕ → Prop but actually stores a list of all the elements, meaning you can check whether a number is in the finset or not, and even print a list of all the elements. We can rewrite our function island_rules to return a finset ℕ instead of a set ℕ.


def island_rules2 : ℕ → ℕ → finset ℕ
| 0     b := ({b - 1, b} : finset ℕ).filter (> 0)
| (d+1) b := (island_rules2 d b).filter
  (λ B, (∀ C ∈ island_rules2 d b, C = b) ↔ (∀ C ∈ island_rules2 d B, C = B))

filter is just a function that filters a finset according to a given decidable predicate. Decidable means that, given a natural number, there is a Lean program for determining whether the predicate holds for that natural number or not. > 0 is obviously decidable.
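As a minimal illustration of filter in action (my example, not from the original post):

#eval ({1, 2, 3, 4} : finset ℕ).filter (> 2) -- {3, 4}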

The second predicate (λ B, (∀ C ∈ island_rules2 d b, C = b) ↔ (∀ C ∈ island_rules2 d B, C = B)) is also decidable. An iff statement is decidable if both sides of the iff are decidable, and (∀ C ∈ island_rules2 d b, C = b) is decidable since island_rules2 d b is a finite set: Lean can determine this by checking it for all the elements of the finite set.

This means the function is computable, and we can evaluate the set of possibilities for any day and any number of blue-eyed islanders.


#eval island_rules2 5 7 -- {6, 7}
#eval island_rules2 6 7 -- {7}
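We can also replay the b = 2 worked example from the start of the post; assuming the definition above is transcribed faithfully, the outputs match the hand computation:

#eval island_rules2 0 2 -- {1, 2}
#eval island_rules2 1 2 -- {2}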

To prove the lemma we simply have to evaluate this function on every day before day 99, and on day 99 itself (the hundredth day, because of the out-by-one from starting at day 0). After providing a few more decidable instances, we can prove the lemma with dec_trivial. This proof takes a long time for Lean to check.


lemma island_rules_iff (d : ℕ) : ∀ b B, island_rules d b B ↔ B ∈ island_rules2 d b :=
by induction d; simp [island_rules, island_rules2, *]; finish

instance (d b B) : decidable (island_rules d b B) :=
decidable_of_iff _ (island_rules_iff _ _ _).symm

lemma island_rules_iff' (d : ℕ) : ∀ b, (∀ B, island_rules d b B → B = b) ↔ (∀ B ∈ island_rules2 d b, B = b) :=
by simp [island_rules_iff]

instance dec2 : decidable_rel (λ d b, ∀ B, island_rules d b B → B = b) :=
λ d b, decidable_of_iff _ (island_rules_iff' d b).symm

lemma blue_eyed_islander : ∀ d < 100, (∀ B, island_rules d 100 B → B = 100) ↔ d = 99 :=
dec_trivial
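As a quick sanity check (my example, not part of the original development), the same dec_trivial approach settles the b = 2 case worked through at the start of the post: with two blue-eyed islanders, they leave on day 1 and not on day 0.

example : ∀ d < 2, (∀ B, island_rules d 2 B → B = 2) ↔ d = 1 := dec_trivial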
Posted in Learning Lean, M1F, undergrad maths

Maths undergraduate example sheets

I’m teaching a course called M1F this term — the basic “introduction to proof” course which all first year maths (and joint maths/computing) students have to take at Imperial College London. The course is supposed to be managed via our VLE (virtual learning environment), but in practice this means that no other staff members can see what’s going on, so I also maintain a web page, which is here. The problem sheets will slowly appear over the course of the term.

One thing which is kind of obvious if you know about this sort of thing, but somehow I still find striking, is how incredibly varied in difficulty the problems are if you try to solve them in Lean. There seems to be barely any correlation between how difficult the questions are for a human and how difficult they are in Lean. Because of the way type theory works, questions involving integers, rationals and reals can get quite messy in Lean. Trivial observations such as “this integer is greater than 2 and less than 4, so it’s 3” can be hard to do in Lean, especially if you don’t know how to begin.
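To make this concrete, here is a sketch (my example, not from the problem sheets) of the “greater than 2 and less than 4” observation. It is a one-liner only because of the omega tactic, a decision procedure for linear arithmetic over ℤ and ℕ which appeared in mathlib after this post was first written; without such a tactic one has to turn the strict bounds into 3 ≤ n and n ≤ 3 by hand.

import tactic.omega

-- "this integer is greater than 2 and less than 4, so it's 3"
example (n : ℤ) (h2 : 2 < n) (h4 : n < 4) : n = 3 := by omega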

On the other hand, one thing I’ve noticed this year as I work through the questions in Lean for the second time in my life, is how much easier they are than the first time I tried them (this time last year). This is for two reasons. Firstly it’s because, obviously, I have a lot more Lean experience this time around. But secondly it’s really noticeable how many more tools there are — some are there because Mario Carneiro and others reacted to my frustration as a mathematician trying to use Lean and wrote specific tools in order to make my life easier, and some are there because members of the community are just writing tools anyway. In particular Rob Lewis wrote a tactic called linarith which tries to solve basic linear inequalities and I just noticed yesterday how much easier this can make my life. Here’s an example (it’s sheet 2 question 2).

A “largest element” of a set S\subseteq\mathbb{R} is an element s\in S which is greater than or equal to every element of S. One of the questions I ask the students is to prove that the open interval (0,1) has no largest element (exercise for students: why isn’t 0.999999\dots the largest element?).

The mathematical idea behind the solution is this: if m\in(0,1) is an element, then it’s not the largest element because the average \frac{1+m}{2} of m and 1 is larger. Looking back at the proof I wrote last year, it was 25 lines long and used exotic inbuilt theorems such as div_lt_of_mul_lt_of_pos, the statement that c > 0 and b < ac implies b / c < a (this is not surprising — one has to prove \frac{1+m}{2}<1 somehow…).

This year I’m putting the Lean solutions up in a GitHub repository, and the proof now looks like this:


import data.real.basic
import tactic.linarith

def open_zero_one := {x : ℝ | 0 < x ∧ x < 1}

-- open_zero_one has no maximal element
theorem Q2 : ¬ (∃ m : ℝ, m ∈ open_zero_one ∧ 
                  ∀ x : ℝ, x ∈ open_zero_one → x ≤ m) :=
begin
  rintro ⟨m,⟨H1,H2⟩,Hx⟩,
  -- now apply max hypothesis to (m + 1) / 2.
  have H := Hx ((m + 1) / 2),
  -- Can only use the hypothesis if we know (m + 1) / 2 is in the interval.
  have Hav : (m + 1) / 2 ∈ open_zero_one,
    split,
      -- now 0 < m is a hypothesis and 0 < (m + 1) / 2 is the goal
      linarith, -- and this tactic does it
    -- similarly hypothesis m < 1 implies goal (m + 1) / 2 < 1
    linarith,
  -- now use that m is supposed to be the max.
  have Hwrong := H Hav, 
  -- hypothesis Hwrong (m + 1) / 2 <= m contradicts hypothesis m < 1.
  -- Contradictory hypos means we can even prove a false goal.
  linarith,
end

The proof is eight lines long (not counting the comment lines). Mario’s rintro tactic does all the introductory stuff and Rob’s linarith does all the trivial inequality argument; pretty much the only thing we have to do “manually” is have the idea that \frac{m+1}{2} is relevant.

Posted in Imperial, Learning Lean, M1F, tactics, undergrad maths

What is the Xena Project?

The current aims and ambitions of the Xena Project are twofold.

1) Digitise the curriculum. I would like to see the main definitions, theorems and proofs, and example sheet questions and solutions, in all of the undergraduate pure mathematics courses at Imperial College London, typed up into formal proof verification software.

2) Teach undergraduates the new way. I would like to teach mathematics undergraduates how to use formal proof verification software.

Many people have asked me why I want to do these things. Here are four reasons, in no particular order.

Reason 1: I conjecture that offering undergraduates the opportunity to type basic proofs into a computer will actually make them learn better. I am actively involved in an educational project with Athina Thoma and Paola Iannone where we aim to investigate this. Most mathematics departments in the UK teach students to use basic “calculational” software such as Python, Matlab or R. I am suggesting that we should extend this to “verificational” software such as Lean, Coq or Isabelle — whichever turns out to be the most appropriate and/or useful. I had to choose one — I chose Lean. In March 2019 we hope to have our first results and I will post something at this blog.

Reason 2: Computers are now capable of understanding and doing undergraduate level mathematics. In my mind this is a huge deal. They currently don’t do it very well. But they understand it extremely well. And I believe that it is only a matter of time before they start doing it very well. As a benchmark (or challenge), I would be interested in when we will be able to get a computer program to pass the end of year exam for my first year undergraduate course. One must of course say exactly what one means by this, and there are many interpretations, some far more difficult to achieve than others. But I believe that for some interpretations of this challenge (involving, for example, a human translating the exam into a format readable by the computer), this may be just about accessible within a few years — and note that I say “pass”, not “get 100 percent”. My belief is that the media would be interested in claims that computers can pass undergraduate maths exams. More importantly, I am hoping that funding agencies will be interested. This goal is still at least several years away and might be harder than I think. However the goal of teaching a computer our entire pure mathematics curriculum is most definitely accessible, of this I am 100 percent convinced. I would like to have completely achieved digitisation of the main definitions, theorems and proofs of our curriculum, with the help of my undergraduate co-workers, within the next five to ten years.

Reason 3: It is simply blue sky research. The people who developed the CD or mp3 as a method of digitising music — if you asked them why they were doing it, do you think they would have replied “so someone can invent the iPod and Spotify”? I don’t think so. Their answers would have been vaguer, perhaps of the form that digitised music is easier to move around, costs less to store, and can be analysed in new ways. If we have a database of theorems, we’ll be able to do search, we’ll be able to suggest possible theorems to people using the software, we’ll be able to run AI on the database. What will come of any of this? I don’t know. But let’s find out. There is no time frame on this.

Reason 4: I know, and indeed all researchers in pure mathematics know, that the modern pure mathematics literature contains gaps, and some of it is wrong. I believe that most of the gaps can be filled, however I believe that some of these fillable gaps can only be filled in by a handful of experts. Similarly, the experts know which parts are wrong. I think that all of this is unsatisfactory. It acts as a barrier to learners, it creates elite clubs of people “in the know”, and there is a chance that it will lead to disaster. I would like to change the culture of mathematics, so that more and more people begin to consider formalising mathematical arguments that they learn or come up with. It will be decades before computers catch up, if they ever do. It will probably cost hundreds of millions of pounds worth of person-hours to digitise any one of the known proofs of Fermat’s Last Theorem, and once digitised, these proofs will need maintaining, like any software, which will cost more money. The idea of using computers to help us fill in the gaps in serious mathematical literature is currently to a large extent a naive dream.

My idea of how to proceed is by educating undergraduates, so that they can see that computer proof verification exists, and they can see that at undergraduate level it is nowadays fairly straightforward, and fun. I believe that the current generation have left pure mathematics in somewhat of a mess, and I want to train the next generation in the tools which I believe will somehow, one day, start helping to tidy it up. I want them to drain the swamp that we have created. The time frame on this: it will still not be solved when I die. But that is merely an indication of how bad the mess is. Much of pure mathematics (e.g. current work on the p-adic Langlands program) is extremely beautiful, but at the end of the day much of it is not useful beyond our discipline and may never be useful beyond our discipline. If it is not right either — why are we wasting our time doing it? Is “mostly right” really good enough for us?

Posted in General, Imperial, undergrad maths

Formalising mathematics — a mathematician’s personal viewpoint.

Here is an overview of my current thoughts on formal proof verification systems in modern mathematics.

3d computer modelling software such as Blender can make virtual versions of abstract 3d mathematical figures such as truncated dodecahedra and so on. This has random cool consequences. For example, it means that 16-year-olds can make cool videos about fractals.

Lean is a system (one of many) where we can make virtual versions of far more complex mathematical objects. Mathematicians no longer have to imagine them in their minds — they can create them and manipulate them in a virtual world. This virtual world is really bad at allowing mathematicians to manipulate objects in the intuitive and fluid way that mathematicians think about concepts. However, it is really good at checking that at the end of the day, any precise ideas which come out of the fluid experiments and manipulations which our brains seem to be so good at, are well-defined, clear, and correct. The virtual world is an exceptionally good place for mathematicians to be able to clearly discuss whether or not a given proof is correct, for example, and to very precisely pinpoint places where more details need to be supplied. The virtual world is black and white. It is a computer model of the formalist’s world of mathematics, where if something compiles, it is correct, and if it does not compile, it is not correct. There is no longer any debate. It is pure mathematics, the way pure mathematics is meant to be done, and needs to be done. What are we, as pure mathematicians, without rigour?

The exciting thing about Lean in particular is that not only is this theoretically possible, it is actually happening. 15 months ago I had never used a theorem prover. But I made schemes with the help of some friendly mathematicians and computer scientists, that I met at the Lean Zulip chat, and we are now making perfectoid spaces. The speed at which new things are appearing is terrifying. We got modular forms yesterday. Group cohomology is easily within our grasp, we just need to find the manpower to do it. We don’t even have real manifolds yet, but we’ve got the upper half plane, and I bet you we’ll have real manifolds by this time next year. What area do you work in? You should be interested in knowing where we are up to. Come to the Zulip chat and ask us what needs doing before you can state your favourite theorem in your area in Lean. There is a non-zero chance that we can state it right now. There is a non-zero chance that your conversation with us will end up with a new feasible target for us to aim at. There is a non-zero chance that we will be able to state it in Lean by this time next year.

But why? Why bother formalising definitions of complex mathematical objects and statements (theorems, conjectures, questions) about these objects?

In some sense, my answer is — because it’s fun. And it’s going to happen whether or not you care, and the sooner that we as a community of pure mathematicians care, the sooner we will be able to have great new ideas about what to do with these virtual objects. Moreover, there is funding.

I think the correct question is “What shall we try and do with these objects, now they are appearing?”. Here are some answers to that.

a) Maybe one day we will be able to make cool search.

b) Maybe one day we will be able to get AI to look at what we’ve done and suggest more stuff.

c) Maybe one day we can start trying to check proofs, or at least start thinking about how certain theorems imply other theorems.

d) Maybe right now we will be able to use these objects to make our teaching of undergraduate mathematics better. Undergraduate mathematics students at Imperial College London pay over £9000, and in some cases far far more, for the privilege of spending one year at this institution, and they deserve an exceptional service for this. I am integrating Lean into my teaching and every year is a new experiment. I am going to keep experimenting until I get it right.

e) Maybe right now we should try to make formalising statements of research theorems the new normal. Mathematicians changed from typewriters to LaTeX. They can change again. Formalising statements is quick, if all the right definitions are there, and if the mathematician has been trained. Let’s formalise definitions and then let’s train mathematicians to formalise statements of their theorems.

Why right now? Because now is good. Here is my personal story. As I already said, 15 months ago I had never heard of the Lean Theorem Prover, and I had never used anything like a theorem prover in my life. Then in July 2017 I watched a live stream (thank you Newton Institute!) of Tom Hales’ talk in Cambridge, and in particular I saw his answer to Tobias Nipkow’s question 48 minutes in. And here we are now, just over a year later, with me half way through perfectoid spaces, integrating Lean into my first year undergraduate teaching, and two of my starting second year Imperial College undergraduate students, Chris Hughes and Kenny Lau, are both much better than me at it. The links are to their first year undergraduate projects, one a complete formal proof of Sylow’s theorems and the other an almost finished formalization of the local Langlands conjectures for abelian algebraic groups over a p-adic field. It’s all open source, we are writing the new Bourbaki and I cannot see it stopping. I know many people don’t care about Bourbaki, and I know it’s not a perfect analogy, but I do care about Bourbaki.

Mathematics is built on rigorous definitions. Computers are now capable of understanding many more mathematical definitions than they have ever been told, and I believe that this is mostly because the mathematical community, myself included, just didn’t ever realise or care that it was happening. If you’re a mathematician, I challenge you to formalise your best theorem in Lean and send it to Tom Hales! If you need hints about how to do that, come and ask us at the Lean Zulip chat. And if it turns out that you can’t do it because you are missing some definitions, find a computer scientist with some experience in using dependent type theory, explain the definition to them, and direct them to the Lean chat where they can find help on how to write the code which you need in order to state your theorem. Believe that this new technology will have a role to play in future pure mathematics research, even if we do not currently quite understand what it will be. It is here. It needs to be taught. Mathematicians — come and teach it what you are interested in and let’s see what we can make it do.

Posted in General, Uncategorized

Happy birthday Xena!

It was a year ago today that I emailed Inkeri Hibbins, a wonderful administrator at Imperial College London, and asked her if she could email all the Imperial maths and joint maths/computing undergraduates and tell them about my new Thursday evening club called the Xena Project. At that time I was still struggling to prove that the real numbers 1 and 2 were distinct. But I knew that I somehow wanted to formalise a whole chunk of the undergraduate syllabus.

Fast forward exactly one year, and somehow Lean’s mathematical library contains, as of today, a whole bunch of Imperial’s first year pure mathematics curriculum, and some of the second and third year pure curriculum too. Today the developers, who include Imperial undergraduates, announced definitions of the exponential function, and sine and cosine. These definitions represent a filling-in of one of the very last few missing pieces of the project to formalise my undergraduate course, M1F. My course is based on Martin Liebeck’s book “A Concise Introduction to Pure Mathematics”. One of the chapters in that book talks about Euler’s formula for a convex polyhedron. I think that’s the only chapter of M1F left. So much progress has been made. Thank you to everyone involved. M1F is running for the last time ever, this year, but something new and exciting will rise from its ashes and the Xena project will be ready for it. What is more, the Lean maths library also contains a bunch of stuff from M2PM2 like characteristic polynomials and determinants of matrices. A whole bunch of foundational work for this stuff was done by Imperial undergraduates (and Keji 😉 ) and I would like to thank all of the people who came along to the Xena project last year, particularly Chris Hughes and Kenny Lau.

And last but not least I would like to thank all of the Xena Project summer students. I made many funding decisions using random algorithms, because Lean is for everyone who is interested. I am sorry I couldn’t fund all of you. You have convinced me that this entire endeavour is worth continuing with. Here’s to a great term and hopefully I’ll see some of you on the evening of Thursday 4th at Xena.

If you’re not at Imperial but want to know how to use this material, come and ask at the Lean maths chat (Zulip : registration required, real name preferred). I’d be particularly interested to hear from other people involved in undergraduate teaching at this sort of level, or students about to start a mathematics degree.

Kevin

Posted in Imperial, M1F, undergrad maths

What is a uniform space? How some computer scientists think about completions.

This is post 2 of 2; both posts are attempts to explain some basic mathematical definitions which I have never seen covered in an undergraduate mathematics course. These definitions play a role in Lean because it’s important to set things up in the maximal generality for which they make sense, to avoid having to duplicate work. This post is about uniform spaces, and it mentions filters, which are the subject of the first post, so you might want to read that one first.

As a first year undergraduate I learnt that the reals were complete: every Cauchy sequence converges. As a second year undergraduate I learnt what it meant more generally for a metric space to be complete: every Cauchy sequence converges. However, my second year lecturer stressed that completeness was not a topological property. The example they gave was that the real numbers were homeomorphic to the open interval (-1,1) (the function \tanh being an example of a homeomorphism), and the reals were complete, yet (-1,1) was not. At the time, I thought this was the end of the story.

Around a decade later I became interested in non-archimedean analysis, and here I saw some constructions which confused me. In fact in some sense I can trace my confusion back to the construction of the p-adic integers. One can complete the integers using the p-adic metric to get \mathbb{Z}_p, the p-adic integers, and a very slick way to do this is to define \mathbb{Z}_p=\lim_n\mathbb{Z}/p^n\mathbb{Z}, the projective limit. Whilst this can be dressed up as the completion of a metric space with respect to the p-adic metric, it looks like something more fundamental than that is going on. Later on I learnt about the profinite completion of a discrete group G as \lim_U G/U as U runs over the finite index subgroups. Where is the metric now? In what sense is the resulting space complete if it’s not even a metric space? Do I have to put some sort of artificial metric on G somehow?

Skip forward another decade and I became interested in Huber’s adic spaces as a foundation for non-archimedean geometry. I learnt a vast amount from Torsten Wedhorn’s unpublished notes, which are the notes that we are using in our perfectoid spaces project. These notes contain really clear and precise definitions of pretty much everything they use. By definition 5.31 it was clear to me that I was missing something. Wedhorn defines what it means for an abelian topological group to be complete with no mention of a metric at all! But my 2nd year topology lecturer told me that completeness was not a topological property! What is going on?

Uniform spaces — an overview

It turns out that there’s a definition which had completely passed me by — that of a uniform space. The ever-reliable Wikipedia has an article on them of course. Before I say what they are, let me tell you some facts about them.

1) Every metric space is a uniform space; every uniform space is a topological space. Uniform spaces sit in between these two standard notions.

2) It makes sense to ask if a uniform space is complete, and indeed every uniform space has a Hausdorff completion, which is unique up to unique isomorphism and satisfies a universal property. This generalises the notion of a complete metric space.

3) Every topological group can be given the structure of a uniform space, and every compact Hausdorff topological space can be given the structure of a uniform space.

These three facts together explain everything that I was confused about above. The definition of a uniform space was due to Weil in 1937! Why don’t they teach it to undergraduates? It looks at the very least like a super sequence of example sheet questions, or a little first year mini-project or something. There’s more! You can talk about uniformly continuous maps between uniform spaces, and uniformly convergent sequences of such maps. In fact this gives us a clue as to exactly what we lose when going from a metric space to a topological space, and what we need to get back.

What does it mean for a map f : X \to Y between metric spaces to be uniformly continuous? Let’s start by recalling the definition of continuity for a map between metric spaces. Continuity at x\in X means that for all \epsilon>0 there’s a \delta>0 such that d(x',x)<\delta implies d(f(x'),f(x))<\epsilon. When teaching this we usually stress that \delta depends on \epsilon, but what we need to note here is that \delta also in general depends on x. The function f is uniformly continuous if our rule for getting a \delta from an \epsilon can be made independent of x, that is, if for all \epsilon>0 there exists \delta>0 such that for any x,x'\in X we have d(x',x)<\delta implies d(f(x'),f(x))<\epsilon.
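Written with every quantifier displayed, the only difference between the two notions is the position of the \forall x:

Continuity: \forall x\in X,\ \forall \epsilon>0,\ \exists \delta>0,\ \forall x'\in X,\ d(x',x)<\delta\implies d(f(x'),f(x))<\epsilon.

Uniform continuity: \forall \epsilon>0,\ \exists \delta>0,\ \forall x,x'\in X,\ d(x',x)<\delta\implies d(f(x'),f(x))<\epsilon.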

It's clear why this notion doesn't generalise to topological spaces. If x_1\in X is arbitrary then we can consider all the open sets in the topology which contain x_1; but if x_2 is a second point then there's in general no way of comparing the open neighbourhoods of x_1 with those of x_2: there is no way of formalising the notion that "x'_1 is nearer to x_1 than x'_2 is to x_2", so we can't write down a definition of uniform continuity for maps between topological spaces. A uniform structure is a way of putting this information back without going as far as adding a metric.

Uniform spaces — the definition

The definition of a uniform space in Lean is rather intimidating; it’s in mathlib at analysis/topology/uniform_space.lean. Let’s not worry about it just yet, let’s instead look at the definition on Wikipedia, which says that a uniformity (a.k.a. a uniform structure) on a set X is a filter on X\times X such that every set in the filter contains the diagonal \{(x,x)\,|\,x\in X\} (that’s the first three axioms) and then some more axioms; but I think it’s worth explaining what this notion is trying to model before talking about the last two axioms. If X is a metric space, the idea is that the uniformity is defined by: a subset U\subseteq X\times X is in the uniformity iff there exists \delta>0 such that d(x_1,x_2)<\delta implies (x_1,x_2)\in U. The fact that every set in the uniformity contains all (x,x) is the uniformity's version of d(x,x)=0; the 5th axiom on Wikipedia, that the "transpose" of an element of the uniformity is in the uniformity, is some version of the symmetry axiom for a metric, and the fourth axiom, saying that for all U in the uniformity there's some V in the uniformity with V \circ V\subseteq U, is some analogue of the statement that if d(x_1,x)<\delta/2 and d(x,x_2)<\delta/2 then d(x_1,x_2)<\delta.
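In symbols, the uniformity attached to the metric d is

\mathcal{U}_d=\{U\subseteq X\times X\,|\,\exists \delta>0,\ \{(x_1,x_2)\,|\,d(x_1,x_2)<\delta\}\subseteq U\},

and the axioms just described are the shadows of d(x,x)=0, of symmetry, and of the triangle inequality respectively.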

The philosophy is that an element U in the uniformity is a replacement for a positive real \delta; two points x,y\in X are said to be "U-close" if (x,y)\in U, the thought being that U is our replacement for \{(x,y)\in X\times X\,|\,d(x,y)<\delta\}.

Given a uniformity on a set X, X inherits a topology: a subset V of X is open iff for all x\in V there's some U in the uniformity such that (x,y)\in U\implies y\in V. The idea of course is that for fixed x\in X, the things in X which are U-close to x are a neighbourhood of x.

Conversely given an abelian topological group (with group law +), there’s an associated uniformity: for any neighbourhood V of zero, consider the set U=\{(x,y)\,|\,x-y\in V\}; these sets generate a filter which is a uniformity. What’s going on here is that we model the distance from x to y as the distance from x-y to zero. If the group is not abelian then there are two uniformities, the left uniformity and the right uniformity, which may in general be different, but both of which generate the group’s topology.
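Explicitly, if the group law is written multiplicatively and V runs over the neighbourhoods of the identity, the left uniformity is generated by the sets \{(x,y)\,|\,x^{-1}y\in V\} and the right uniformity by the sets \{(x,y)\,|\,xy^{-1}\in V\}; when the group is abelian the two families coincide.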

Definition of a uniform space in Lean

Let’s try and take it apart.


/-- This core description of a uniform space is outside of the type class hierarchy. It is useful
  for constructions of uniform spaces, when the topology is derived from the uniform space. -/
structure uniform_space.core (α : Type u) :=
(uniformity : filter (α × α))
(refl       : principal id_rel ≤ uniformity)
(symm       : tendsto prod.swap uniformity uniformity)
(comp       : uniformity.lift' (λs, comp_rel s s) ≤ uniformity)

-- [snip]

/-- A uniform space is a generalization of the "uniform" topological aspects of a
  metric space. It consists of a filter on `α × α` called the "uniformity", which
  satisfies properties analogous to the reflexivity, symmetry, and triangle properties
  of a metric.

  A metric space has a natural uniformity, and a uniform space has a natural topology.
  A topological group also has a natural uniformity, even when it is not metrizable. -/
class uniform_space (α : Type u) extends topological_space α, uniform_space.core α :=
(is_open_uniformity : ∀s, is_open s ↔ (∀x∈s, { p : α × α | p.1 = x → p.2 ∈ s } ∈ uniformity.sets))

We see that uniform_space extends topological_space (which is what we think it is) and also uniform_space.core, which is the uniformity. The statement that if U is in the uniformity then the set \{(y,x)\,|\,(x,y)\in U\} is too, Wikipedia’s axiom 5, is very coolly represented by the assertion that the swap function makes the uniformity filter tend to itself, or however one is supposed to talk about the tendsto predicate. The comp_rel function in Lean composes two relations, and the comp axiom is Wikipedia’s axiom 4. Of course one should not worry too much about how uniform spaces are implemented in Lean: the idea is that any standard definition or theorem about uniform spaces should be accessible via the interface (that is, the definitions and theorems that compose the majority of the uniform_space.lean file) and one should not have to worry about proving them.

Uniform continuity, and completeness

Say f:X\to Y is a map between uniform spaces. Recall that the definition of uniform continuity for metric spaces said that for all \epsilon>0 there exists \delta>0 such that \delta-closeness on X implies \epsilon-closeness on Y. For uniform spaces we ask that for all U in Y‘s uniformity there exists V in X‘s uniformity such that the pre-image of U contains V. In other words, we demand that the pre-image of the uniformity on Y is contained in the uniformity on X. This is nothing more than saying that the induced map f \times f : X\times X\to Y\times Y satisfies tendsto (f × f) uniformity_X uniformity_Y, which is exactly the definition of uniform_continuous.
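For reference, here is roughly how mathlib said this at the time of writing (the exact wording in analysis/topology/uniform_space.lean may differ slightly between versions):

def uniform_continuous [uniform_space α] [uniform_space β] (f : α → β) : Prop :=
tendsto (λ x : α × α, (f x.1, f x.2)) uniformity uniformity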

The last definition I want to mention is completeness. A uniform space is complete if every Cauchy filter converges! Let’s face it, that definition does look familiar, apart from the filter bit. So what does this mean? A filter on a uniform space X is Cauchy if it contains “arbitrarily small sets”; that is, for every U in the uniformity (remember U is our replacement for \epsilon>0) there exists V in the filter with V\times V\subseteq U. A filter converges to x\in X if for every neighbourhood W of x there’s an element V of the filter with V\subseteq W. The idea is that if (x_n) is a sequence in X then there’s a corresponding filter generated by the sets \{x_n\,|\,n\geq N\} for N=1,2,\ldots. Note that, as expected, the notion of convergence just depends on the topology, but the notion of Cauchy depends on the uniformity.
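Again for reference, here is roughly how the mathlib of the time expressed Cauchy filters and completeness (details may vary between versions):

def cauchy [uniform_space α] (f : filter α) : Prop :=
f ≠ ⊥ ∧ filter.prod f f ≤ uniformity

class complete_space (α : Type*) [uniform_space α] : Prop :=
(complete : ∀ {f : filter α}, cauchy f → ∃ x, f ≤ nhds x)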

I guess one could now talk about the completion of a uniform topological space, but perhaps I’ve said enough; I am not sure I want to give a comprehensive account of these things, I just want mathematicians to be less scared of them. The definition of the completion of a uniform space is covered in the Wikipedia article. But the philosophy, for someone who wants to write Lean code, should be that any standard construction or definition or result about uniform spaces (for example any standard construction or definition or result about metric spaces) should already be in mathlib, and as an end user your job is not to prove it, but simply to find it.

Posted in undergrad maths

What is a filter? How some computer scientists think about limits.

This is post 1 of 2; both posts are attempts to explain some basic mathematical definitions which I have never seen covered in a mathematics course. This one is on filters, and the next one will be on uniform spaces. I will be brief — I just want to say enough so that people who are interested in understanding how Lean does limits (this post) and completions (the next post) can actually attempt to engage with the Lean code in mathlib about these ideas, without being completely intimidated.

OK so let’s start with limits. Guillermo Barajas Ayuso was trying to figure out where exactly in mathlib it said that a sequence of real numbers was Cauchy iff it converged. This is a pretty fundamental fact about the real numbers, but if you dig far enough then you discover that the definition of tendsto is in mathlib’s order/filter.lean and it says


/-- `tendsto` is the generic "limit of a function" predicate.
  `tendsto f l₁ l₂` asserts that for every `l₂` neighborhood `a`,
  the `f`-preimage of `a` is an `l₁` neighborhood. -/
def tendsto (f : α → β) (l₁ : filter α) (l₂ : filter β) := l₁.map f ≤ l₂

Gaargh. Whatever does that mean? Why doesn’t it say that for all \epsilon there exists an N such that something or other?

What is happening here is that when you are building maths from nothing, it’s important to isolate key concepts and set them up in the correct generality, which is the maximal generality for which they make sense. We are trying to understand limits here. As a maths undergraduate I was introduced to the notion of a sequence tending to a limit three times. Firstly I was told what it meant for a sequence of real or complex numbers to tend to a limit. Then I was told what it meant for a sequence of elements of a metric space to tend to a limit. And finally I was told what it meant for a sequence of elements of a topological space to tend to a limit. Each of these definitions is more general than the one before. The real numbers are a metric space, and every metric space is a topological space. And of course, each new definition specialises to the one before. Note also that each time we generalised, we introduced a new mathematical concept — first metric spaces, and then topological spaces.

Well, it turns out that there are even more general definitions of what it means for something to tend to something, and before I can explain the next level of generality I need to introduce a new concept — that of a filter on a set (or, if you’re doing this in Lean, on a type).

What is a filter?

The relevant bit of Wikipedia is here:

Filter on a set

In Lean it looks like this:

structure filter (α : Type u) :=
(sets            : set (set α))
(exists_mem_sets : ∃x, x ∈ sets)
(upwards_sets    : ∀{x y}, x ∈ sets → x ⊆ y → y ∈ sets)
(directed_sets   : directed_on (⊆) sets)

So, like a topology on a set, a filter on a set X is a collection of subsets of X with some properties. The easiest way to think of the properties is to think of the sets in the filter as the “big” sets. The axioms for a filter then say: (1) there’s a big set; (2) any set containing a big set is big; and (3) the intersection of two big sets is big (I don’t quite know why mathlib chooses to express this third fact in such a fancy way, but there you go).

Note that (1) and (2) together imply that X is big. I guess the smallest filter on X would be the filter just containing X, and the largest would be the one containing every subset of X, but I don’t think either of these filters is particularly interesting. Here are some more interesting ones.

1) Principal filters. Choose x\in X and let F be the filter defined by U\in F (i.e. U is big) iff x\in U. In other words, we regard x as “the big element of X” and everything else as having size zero. Easy exercise: check this is a filter.

2) The cofinite filter. Let X be an infinite set, and say that a subset U \subseteq X is big iff it only misses finitely many elements of X, i.e. X\backslash U is finite. Such a U is called cofinite (“complement is finite”) and it’s again an easy exercise to check that this is a filter.
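For what it’s worth, mathlib’s name for (a generalisation of) example (1) is principal: it takes a whole set rather than a single point, and the filter in example (1) is the principal filter on the singleton \{x\}. A minimal sketch, with the caveat that names and import paths vary between mathlib versions:

import order.filter
open filter

-- example (1) with X = ℕ and x = 5: a set U is big iff {5} ⊆ U, i.e. iff 5 ∈ U
example : filter ℕ := principal ({5} : set ℕ)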

Filters in action Part I

I remember when I was a first year undergraduate being slightly annoyed by the following definitions.
Let x_0,x_1,x_2,\ldots,x_n,\ldots be a sequence of real numbers.

Definition 1. Say \ell\in\mathbb{R}. We say (x_n)\to \ell (pronounced (x_n) tends to \ell) if for all \epsilon>0 there exists N=N(\epsilon) such that n\geq N\implies |x_n-\ell|<\epsilon.

Definition 2. We say (x_n)\to+\infty if for all B\in\mathbb{R} there exists N=N(B) such that n\geq N\implies x_n>B.

It used to irk me a bit that there was no one definition which could specialise to these two cases, given that we were using the same notation and everything. Well, Mr my old first year analysis lecturer, you could have put me out of my misery by using filters!

Definition (neighbourhood filter of \ell). If \ell\in\mathbb{R} then define the filter of neighbourhoods of \ell by saying that a subset U\subseteq\mathbb{R} is big iff there exists \epsilon>0 such that (\ell-\epsilon,\ell+\epsilon)\subseteq U. If you’re happy with the lingo, you could say U is big iff \ell is in its interior. Easy check: this is a filter.

Definition (neighbourhood filter of +\infty). Define the filter of neighbourhoods of +\infty by saying that a subset U\subseteq\mathbb{R} is big iff there exists B\in\mathbb{R} such that (B,\infty)\subseteq U. If you know how to compactify the reals by adding \pm\infty you’ll see that we’re just looking at the filter of neighbourhoods of +\infty in this compactified space and then removing \pm\infty again.

Definition (unifying two definitions in first year analysis). Let \ell be either a real number or +\infty. We say a sequence (x_n) of reals tends to \ell iff for every U in the neighbourhood filter of \ell there exists some N=N(U) such that n\geq N implies x_n\in U.

The world is now a tidier place — the definitions have been unified using this notion of a filter.

If you’re happy with the lingo, you can see that if X is a general metric or topological space then we can define the neighbourhood filter of x\in X to be all the subsets U such that x is in the interior of U, and the same definition works: (x_n)\to x in the traditional sense taught to 2nd year undergraduates iff for every U in the filter, there exists N=N(U) such that n\geq N\implies x_n\in U.

If you’ve not seen this before, you might think this is already pretty neat. But it gets even neater!

Filters in action Part II

Remember the cofinite filter on an infinite set? Let’s look at the cofinite filter on the naturals. Here a subset is big iff it is only missing finitely many naturals. Put it another way — a subset U is big iff there exists N=N(U) such that the only elements missing are less than N. Put another way: U is big iff there exists N=N(U) such that n\geq N\implies n\in U. This kind of looks familiar — in fact I wrote that phrase about four times above. We should be able to use this somehow. Say (x_n) is a sequence of real numbers. We can think of the sequence as a map f : \mathbb{N}\to\mathbb{R} sending n to x_n.

Definition Say \ell\in\mathbb{R} or \ell=+\infty. We say that (x_n)\to\ell if for all U in the neighbourhood filter of \ell, f^{-1}(U):=\{n\in\mathbb{N}\ |\ f(n)\in U\} is in the cofinite filter on \mathbb{N}.

This is just the same definition as before, rephrased using filters on both the source and the target of f. It says that the pullback of the neighbourhood filter of \ell is contained in the cofinite filter on \mathbb{N}. Now compare with this code in mathlib’s order/filter.lean:


lemma tendsto_def {f : α → β} {l₁ : filter α} {l₂ : filter β} :
  tendsto f l₁ l₂ ↔ ∀ s ∈ l₂.sets, f ⁻¹' s ∈ l₁.sets := iff.rfl

This lemma says that whilst the definition of tendsto above was a bit obscure, involving map and a perhaps slightly subtle use of \leq, at the end of the day, for a function f : X\to Y and filters L_1 on X and L_2 on Y, we have tendsto f L₁ L₂ if and only if f^{-1}(U_2)\in L_1 for all U_2\in L_2. Thinking about the case where L_2 is a neighbourhood filter and L_1 the cofinite filter keeps us grounded in sanity, but this is an extremely general definition of something tending to something, and when you’re involved in building something like mathlib, the more general you can go, the better. The philosophy is that each definition or lemma should be made or proved in the maximum generality for which it makes sense or is true, and we are seeing that here.
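To tie this back to first year analysis: in mathlib, the statement that a sequence of reals x tends to \ell is spelt tendsto x at_top (nhds \ell), where at_top is the filter of sets containing everything sufficiently large (on \mathbb{N} this is exactly the cofinite filter discussed above) and nhds \ell is the neighbourhood filter of \ell. A minimal sketch, with the caveat that import paths vary between mathlib versions:

import analysis.real
open filter

-- the *statement* (a term of type Prop, not a proof) that the sequence x tends to l
example (x : ℕ → ℝ) (l : ℝ) : Prop := tendsto x at_top (nhds l)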

Posted in undergrad maths