Maths undergraduate example sheets

I’m teaching a course called M1F this term — the basic “introduction to proof” course which all first year maths (and joint maths/computing) students have to take at Imperial College London. The course is supposed to be managed via our VLE, but in practice this means that no other staff members can see what’s going on, so I also maintain a web page, which is here. The problem sheets will slowly appear over the course of the term.

One thing which is kind of obvious if you know about this sort of thing, but somehow I still find striking, is how incredibly varied in difficulty the problems are if you try to solve them in Lean. There seems to be barely any correlation between how difficult the questions are for a human and how difficult they are in Lean. Because of the way type theory works, questions involving integers, rationals and reals can get quite messy in Lean. Trivial observations such as “this integer is greater than 2 and less than 4, so it’s 3” can be hard to do in Lean, especially if you don’t know how to begin.
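For example, here is one way that trivial observation might be stated in Lean. This is a minimal sketch of mine, not a question from the sheets, and it assumes mathlib’s omega tactic (a decision procedure for linear arithmetic over ℤ and ℕ) is available:

import tactic.omega

-- an integer which is greater than 2 and less than 4 must be 3
example (n : ℤ) (h1 : 2 < n) (h2 : n < 4) : n = 3 := by omega

Without a tactic like this, one is left hunting for the right order lemmas by hand, which is exactly where the pain starts.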

On the other hand, one thing I’ve noticed this year as I work through the questions in Lean for the second time in my life, is how much easier they are than the first time I tried them (this time last year). This is for two reasons. Firstly it’s because, obviously, I have a lot more Lean experience this time around. But secondly it’s really noticeable how many more tools there are — some are there because Mario Carneiro and others reacted to my frustration as a mathematician trying to use Lean and wrote specific tools in order to make my life easier, and some are there because members of the community are just writing tools anyway. In particular Rob Lewis wrote a tactic called linarith which tries to solve basic linear inequalities and I just noticed yesterday how much easier this can make my life. Here’s an example (it’s sheet 2 question 2).

A “largest element” of a set S\subseteq\mathbb{R} is an element s\in S which is greater than or equal to every element of S. One of the questions I ask the students is to prove that the open interval (0,1) has no largest element (exercise for students: why isn’t 0.999999\dots the largest element?).

The mathematical idea behind the solution is this: if m\in(0,1) is an element, then it’s not the largest element because the average \frac{1+m}{2} of m and 1 is larger. Looking back at the proof I wrote last year, it was 25 lines long and used exotic inbuilt theorems such as div_lt_of_mul_lt_of_pos, the statement that c > 0 and b < ac implies b / c < a (this is not surprising — one has to prove \frac{1+m}{2}<1 somehow…).

This year I’m putting up the Lean solutions in a GitHub repository, and now the proof looks like this:


import data.real.basic
import tactic.linarith

def open_zero_one := {x : ℝ | 0 < x ∧ x < 1}

-- open_zero_one has no maximal element
theorem Q2 : ¬ (∃ m : ℝ, m ∈ open_zero_one ∧ 
                  ∀ x : ℝ, x ∈ open_zero_one → x ≤ m) :=
begin
  rintro ⟨m,⟨H1,H2⟩,Hx⟩,
  -- now apply max hypothesis to (m + 1) / 2.
  have H := Hx ((m + 1) / 2),
  -- Can only use the hypothesis if we know (m + 1) / 2 is in the interval.
  have Hav : (m + 1) / 2 ∈ open_zero_one,
    split,
      -- now 0 < m is a hypothesis and 0 < (m + 1) / 2 is the goal
      linarith, -- and this tactic does it
    -- similarly hypothesis m < 1 implies goal (m + 1) / 2 < 1
    linarith,
  -- now use that m is supposed to be the max.
  have Hwrong := H Hav, 
  -- hypothesis Hwrong (m + 1) / 2 <= m contradicts hypothesis m < 1.
  -- Contradictory hypos means we can even prove a false goal.
  linarith,
end

The proof is eight lines long (not counting the comment lines). Mario’s rintro tactic does all the introductory stuff and Rob’s linarith does all the trivial inequality argument; pretty much the only thing we have to do “manually” is have the idea that \frac{m+1}{2} is relevant.
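To see the division of labour, here are the two interval-membership inequalities from the proof above as standalone one-liners — a sketch of mine, not part of the official solutions:

import data.real.basic
import tactic.linarith

-- last year this needed div_lt_of_mul_lt_of_pos
example (m : ℝ) (hm : m < 1) : (m + 1) / 2 < 1 := by linarith

example (m : ℝ) (hm : 0 < m) : 0 < (m + 1) / 2 := by linarith

The division by 2 is not a problem here — multiplying through by 2 keeps everything linear, and linarith deals with it.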


What is the Xena Project?

The current aims and ambitions of the Xena Project are twofold.

1) Digitise the curriculum. I would like to see the main definitions, theorems and proofs, and example sheet questions and solutions, in all of the undergraduate pure mathematics courses at Imperial College London, typed up into formal proof verification software.

2) Teach undergraduates the new way. I would like to teach mathematics undergraduates how to use formal proof verification software.

Many people have asked me why I want to do these things. Here are four reasons, in no particular order.

Reason 1: I conjecture that offering undergraduates the opportunity to type basic proofs into a computer will actually make them learn better. I am actively involved in an educational project with Athina Thoma and Paola Iannone where we aim to investigate this. Most mathematics departments in the UK teach students to use basic “calculational” software such as Python, Matlab or R. I am suggesting that we should extend this to “verificational” software such as Lean, Coq or Isabelle — whichever turns out to be the most appropriate and/or useful. I had to choose one — I chose Lean. In March 2019 we hope to have our first results and I will post something on this blog.

Reason 2: Computers are now capable of understanding and doing undergraduate level mathematics. In my mind this is a huge deal. They currently don’t do it very well. But they understand it extremely well. And I believe that it is only a matter of time before they start doing it very well. As a benchmark (or challenge), I would be interested in when we will be able to get a computer program to pass the end of year exam for my first year undergraduate course. One must of course say exactly what one means by this, and there are many interpretations, some far more difficult to achieve than others. But I believe that for some interpretations of this challenge (involving, for example, a human translating the exam into a format readable by the computer), this may be just about accessible within a few years — and note that I say “pass”, not “get 100 percent”. My belief is that the media would be interested in claims that computers can pass undergraduate maths exams. More importantly, I am hoping that funding agencies will be interested. This goal is still at least several years away and might be harder than I think. However the goal of teaching a computer our entire pure mathematics curriculum is most definitely accessible, of this I am 100 percent convinced. I would like to have completely achieved digitisation of the main definitions, theorems and proofs of our curriculum, with the help of my undergraduate co-workers, within the next five to ten years.

Reason 3: It is simply blue sky research. The people who developed the CD or mp3 as a method of digitising music — if you asked them why they were doing it, do you think they would have replied “so someone can invent the iPod and Spotify”? I don’t think so. Their answers would have been vaguer, perhaps of the form that digitised music is easier to move around, costs less to store, and can be analysed in new ways. If we have a database of theorems, we’ll be able to do search, we’ll be able to suggest possible theorems to people using the software, we’ll be able to run AI on the database. What will come of any of this? I don’t know. But let’s find out. There is no time frame on this.

Reason 4: I know, and indeed all researchers in pure mathematics know, that the modern pure mathematics literature contains gaps, and some of it is wrong. I believe that most of the gaps can be filled; however, some of these fillable gaps can only be filled in by a handful of experts. Similarly, the experts know which parts are wrong. I think that all of this is unsatisfactory. It acts as a barrier to learners, it creates elite clubs of people “in the know”, and there is a chance that it will lead to disaster. I would like to change the culture of mathematics, so that more and more people begin to consider formalising mathematical arguments that they learn or come up with. It will be decades before computers catch up, if they ever do. It will probably cost hundreds of millions of pounds’ worth of person-hours to digitise any one of the known proofs of Fermat’s Last Theorem, and once digitised, these proofs will need maintaining, like any software, which will cost more money. The idea of using computers to help us fill in the gaps in serious mathematical literature is currently to a large extent a naive dream.

My idea of how to proceed is by educating undergraduates, so that they can see that computer proof verification exists, and they can see that at undergraduate level it is nowadays fairly straightforward, and fun. I believe that the current generation have left pure mathematics in something of a mess, and I want to train the next generation in the tools which I believe will somehow, one day, start helping to tidy it up. I want them to drain the swamp that we have created. The time frame on this: it will still not be solved when I die. But that is merely an indication of how bad the mess is. Much of pure mathematics (e.g. current work on the p-adic Langlands program) is extremely beautiful, but at the end of the day much of it is not useful beyond our discipline and may never be useful beyond our discipline. If it is not right either — why are we wasting our time doing it? Is “mostly right” really good enough for us?


Formalising mathematics — a mathematician’s personal viewpoint.

Here is an overview of my current thoughts on formal proof verification systems in modern mathematics.

3d computer modelling software such as Blender can make virtual versions of abstract 3d mathematical figures such as truncated dodecahedra. This has random cool consequences. For example, it means that 16-year-olds can make cool videos about fractals.

Lean is a system (one of many) where we can make virtual versions of far more complex mathematical objects. Mathematicians no longer have to imagine them in their minds — they can create them and manipulate them in a virtual world. This virtual world is really bad at allowing mathematicians to manipulate objects in the intuitive and fluid way that mathematicians think about concepts. However it is really good at checking that, at the end of the day, any precise ideas which come out of the fluid experiments and manipulations which our brains seem to be so good at are well-defined, clear, and correct. The virtual world is an exceptionally good place for mathematicians to be able to clearly discuss whether or not a given proof is correct, for example, and to very precisely pinpoint places where more details need to be supplied. The virtual world is black and white. It is a computer model of the formalist’s world of mathematics, where if something compiles, it is correct, and if it does not compile, it is not correct. There is no longer any debate. It is pure mathematics, the way pure mathematics is meant to be done, and needs to be done. What are we, as pure mathematicians, without rigour?

The exciting thing about Lean in particular is that this is not only theoretically possible, it is actually happening. 15 months ago I had never used a theorem prover. But I made schemes with the help of some friendly mathematicians and computer scientists whom I met on the Lean Zulip chat, and we are now making perfectoid spaces. The speed at which new things are appearing is terrifying. We got modular forms yesterday. Group cohomology is easily within our grasp, we just need to find the manpower to do it. We don’t even have real manifolds yet, but we’ve got the upper half plane, and I bet you we’ll have real manifolds by this time next year. What area do you work in? You should be interested in knowing where we are up to. Come to the Zulip chat and ask us what needs doing before you can state your favourite theorem in your area in Lean. There is a non-zero chance that we can state it right now. There is a non-zero chance that your conversation with us will end up with a new feasible target for us to aim at. There is a non-zero chance that we will be able to state it in Lean by this time next year.

But why? Why bother formalising definitions of complex mathematical objects and statements (theorems, conjectures, questions) about these objects?

In some sense, my answer is — because it’s fun. And it’s going to happen whether or not you care, and the sooner that we as a community of pure mathematicians care, the sooner we will be able to have great new ideas about what to do with these virtual objects. Moreover, there is funding.

I think the correct question is: “What shall we try and do with these objects, now they are appearing?”. Here are some answers to that.

a) Maybe one day we will be able to make cool search.

b) Maybe one day we will be able to get AI to look at what we’ve done and suggest more stuff.

c) Maybe one day we can start trying to check proofs, or at least start thinking about how certain theorems imply other theorems.

d) Maybe right now we will be able to use these objects to make our teaching of undergraduate mathematics better. Undergraduate mathematics students at Imperial College London pay over £9000, and in some cases far far more, for the privilege of spending one year at this institution, and they deserve an exceptional service for this. I am integrating Lean into my teaching and every year is a new experiment. I am going to keep experimenting until I get it right.

e) Maybe right now we should try to make formalising statements of research theorems the new normal. Mathematicians changed from typewriters to LaTeX. They can change again. Formalising statements is quick, if all the right definitions are there, and if the mathematician has been trained. Let’s formalise definitions and then let’s train mathematicians to formalise statements of their theorems.

Why right now? Because now is good. Here is my personal story. As I already said, 15 months ago I had never heard of the Lean Theorem Prover, and I had never used anything like a theorem prover in my life. Then in July 2017 I watched a live stream (thank you Newton Institute!) of Tom Hales’ talk in Cambridge, and in particular I saw his answer to Tobias Nipkow’s question 48 minutes in. And here we are now, just over a year later, with me half way through perfectoid spaces, integrating Lean into my first year undergraduate teaching, and two of my starting second year Imperial College undergraduate students, Chris Hughes and Kenny Lau, both much better than me at it. The links are to their first year undergraduate projects, one a complete formal proof of Sylow’s theorems and the other an almost finished formalization of the local Langlands conjectures for abelian algebraic groups over a p-adic field. It’s all open source; we are writing the new Bourbaki and I cannot see it stopping. I know many people don’t care about Bourbaki, and I know it’s not a perfect analogy, but I do care about Bourbaki.

Mathematics is built on rigorous definitions. Computers are now capable of understanding many more mathematical definitions than they have ever been told, and I believe that this is mostly because the mathematical community, myself included, just didn’t ever realise or care that it was happening. If you’re a mathematician, I challenge you to formalise your best theorem in Lean and send it to Tom Hales! If you need hints about how to do that, come and ask us at the Lean Zulip chat. And if it turns out that you can’t do it because you are missing some definitions, find a computer scientist with some experience in using dependent type theory, explain the definition to them, and direct them to the Lean chat where they can find help on how to write the code which you need in order to state your theorem. Believe that this new technology will have a role to play in future pure mathematics research, even if we do not currently quite understand what it will be. It is here. It needs to be taught. Mathematicians — come and teach it what you are interested in and let’s see what we can make it do.


Happy birthday Xena!

It was a year ago today that I emailed Inkeri Hibbins, a wonderful administrator at Imperial College London, and asked her if she could email all the Imperial maths and joint maths/computing undergraduates and tell them about my new Thursday evening club called the Xena Project. At that time I was still struggling to prove that the real numbers 1 and 2 were distinct. But I knew that I somehow wanted to formalise a whole chunk of the undergraduate syllabus.

Fast forward exactly one year, and somehow Lean’s mathematical library contains, as of today, a whole bunch of Imperial’s first year pure mathematics curriculum, and some of the second and third year pure curriculum too. Today the developers, who include Imperial undergraduates, announced definitions of the exponential function, and sine and cosine. These definitions represent a filling-in of one of the very last few missing pieces of the project to formalise my undergraduate course, M1F. My course is based on Martin Liebeck’s book “A Concise Introduction to Pure Mathematics”. One of the chapters in that book talks about Euler’s formula for a convex polyhedron. I think that’s the only chapter of M1F left. So much progress has been made. Thank you to everyone involved. M1F is running for the last time ever, this year, but something new and exciting will rise from its ashes and the Xena project will be ready for it. What is more, the Lean maths library also contains a bunch of stuff from M2PM2 like characteristic polynomials and determinants of matrices. A whole bunch of foundational work for this stuff was done by Imperial undergraduates (and Keji 😉 ) and I would like to thank all of the people who came along to the Xena project last year, particularly Chris Hughes and Kenny Lau.

And last but not least I would like to thank all of the Xena Project summer students. I made many funding decisions using random algorithms, because Lean is for everyone who is interested. I am sorry I couldn’t fund all of you. You have convinced me that this entire endeavour is worth continuing with. Here’s to a great term and hopefully I’ll see some of you on the evening of Thursday 4th at Xena.

If you’re not at Imperial but want to know how to use this material, come and ask at the Lean maths chat (Zulip; registration required, real name preferred). I’d be particularly interested to hear from other people involved in undergraduate teaching at this sort of level, or from students about to start a mathematics degree.

Kevin


What is a uniform space? How some computer scientists think about completions.

This is the second of two posts, both of which are attempts to explain some basic mathematical definitions which I have never seen covered in an undergraduate mathematics course. These definitions play a role in Lean because it’s important to set things up in the maximal generality for which they make sense, to avoid having to duplicate work. This post is about uniform spaces, and it mentions filters, which are the subject of the first post, so you might want to read that one first.

As a first year undergraduate I learnt that the reals were complete: every Cauchy sequence converges. As a second year undergraduate I learnt what it meant more generally for a metric space to be complete: every Cauchy sequence converges. However, my second year lecturer stressed that completeness was not a topological property. The example they gave was that the real numbers were homeomorphic to the open interval (-1,1) (the function \tanh being an example of a homeomorphism), and the reals were complete, yet (-1,1) was not. At the time, I thought this was the end of the story.

Around a decade later I became interested in non-archimedean analysis, and here I saw some constructions which confused me. In fact in some sense I can trace my confusion back to the construction of the p-adic integers. One can complete the integers using the p-adic metric to get \mathbb{Z}_p, the p-adic integers, and a very slick way to do this is to define \mathbb{Z}_p=\lim_n\mathbb{Z}/p^n\mathbb{Z}, the projective limit. Whilst this can be dressed up as the completion of a metric space with respect to the p-adic metric, it looks like something more fundamental than that is going on. Later on I learnt about the profinite completion of a discrete group G as \lim_U G/U as U runs over the finite index subgroups. Where is the metric now? In what sense is the resulting space complete if it’s not even a metric space? Do I have to put some sort of artificial metric on G somehow?
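To spell out the projective limit for readers who haven’t seen it: an element of \mathbb{Z}_p is a compatible sequence of residues,

\mathbb{Z}_p=\lim_n\mathbb{Z}/p^n\mathbb{Z}=\{(a_n)_{n\geq 1}\,|\,a_n\in\mathbb{Z}/p^n\mathbb{Z},\ a_{n+1}\equiv a_n\bmod p^n\},

and the profinite completion of G has exactly the same shape, with compatible families of cosets (g_U)_U. No metric appears anywhere in this description.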

Skip forward another decade and I became interested in Huber’s adic spaces as a foundation for non-archimedean geometry. I learnt a vast amount from Torsten Wedhorn’s unpublished notes, which are the notes that we are using in our perfectoid spaces project. These notes contain really clear and precise definitions of pretty much everything they use. By definition 5.31 it was clear to me that I was missing something. Wedhorn defines what it means for an abelian topological group to be complete with no mention of a metric at all! But my 2nd year topology lecturer told me that completeness was not a topological property! What is going on?

Uniform spaces — an overview

It turns out that there’s a definition which had completely passed me by — that of a uniform space. The ever-reliable Wikipedia has an article on them of course. Before I say what they are, let me tell you some facts about them.

1) Every metric space is a uniform space; every uniform space is a topological space. Uniform spaces sit in between these two standard notions.

2) It makes sense to ask if a uniform space is complete, and indeed every uniform space has a Hausdorff completion, which is unique up to unique isomorphism and satisfies a universal property. This generalises the notion of a complete metric space.

3) Every topological group can be given the structure of a uniform space, and every compact Hausdorff topological space can be given the structure of a uniform space.

These three facts together explain everything that I was confused about above. The definition of a uniform space was due to Weil in 1937! Why don’t they teach it to undergraduates? It looks at the very least like a super sequence of example sheet questions, or a little first year mini-project or something. There’s more! You can talk about uniformly continuous maps between uniform spaces, and uniformly convergent sequences of such maps. In fact this gives us a clue as to exactly what we lose when going from a metric space to a topological space, and what we need to get back.

What does it mean for a map f : X \to Y between metric spaces to be uniformly continuous? Let’s start by recalling the definition of continuity for a map between metric spaces. Continuity at x\in X means that for all \epsilon>0 there’s a \delta>0 such that d(x',x)<\delta implies d(f(x'),f(x))<\epsilon. When teaching this we usually stress that \delta depends on \epsilon, but what we need to note here is that \delta also in general depends on x. The function f is uniformly continuous if our rule for getting \delta from \epsilon can be made to be independent of x, that is, if for all \epsilon>0 there exists \delta>0 such that for any x,x'\in X we have d(x',x)<\delta implies d(f(x'),f(x))<\epsilon.
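Writing the two definitions side by side makes the difference plain — it is just the position of the quantifier over x:

Continuity: \forall x\ \forall\epsilon>0\ \exists\delta>0\ \forall x',\ d(x',x)<\delta\implies d(f(x'),f(x))<\epsilon.

Uniform continuity: \forall\epsilon>0\ \exists\delta>0\ \forall x\ \forall x',\ d(x',x)<\delta\implies d(f(x'),f(x))<\epsilon.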

It's clear why this notion doesn't generalise to topological spaces. If x_1\in X is arbitrary then we can consider all the open sets in the topology which contain x_1; but if x_2 is a second point then there's in general no way of comparing the open neighbourhoods of x_1 with those of x_2; there is no way of formalising the notion that "x'_1 is nearer to x_1 than x'_2 is to x_2", so we can't write down a definition of uniform continuity for maps between topological spaces; the neighbourhoods of x_1 and x_2 are in general incomparable. A uniform structure is a way of putting this information back without going as far as adding a metric.

Uniform spaces — the definition.

The definition of a uniform space in Lean is rather intimidating; it’s in mathlib at analysis/topology/uniform_space.lean. Let’s not worry about it just yet, let’s instead look at the definition on Wikipedia, which says that a uniformity (a.k.a. a uniform structure) on a set X is a filter on X\times X such that every set in the filter contains the diagonal \{(x,x)\,|\,x\in X\} (that’s the first three axioms) and then some more axioms; but I think it’s worth explaining what this notion is trying to model before talking about the last two axioms. If X is a metric space, the idea is that the uniformity is defined by: a subset U\subseteq X\times X is in the uniformity iff there exists \delta>0 such that d(x_1,x_2)<\delta implies (x_1,x_2)\in U. The fact that every set in the uniformity contains all (x,x) is the uniformity's version of d(x,x)=0; the 5th axiom on Wikipedia, that the "transpose" of an element of the uniformity is in the uniformity, is some version of the symmetry axiom for a metric, and the fourth axiom, saying that for all U in the uniformity there's some V in the uniformity with V \circ V\subseteq U, is some analogue of the statement that if d(x_1,x)<\delta/2 and d(x,x_2)<\delta/2 then d(x_1,x_2)<\delta.
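In other words, the uniformity attached to a metric space (X,d) can be displayed in one line:

\mathcal{U}_d=\{U\subseteq X\times X\,|\,\exists\delta>0,\ \{(x_1,x_2)\,|\,d(x_1,x_2)<\delta\}\subseteq U\},

and one can check directly that this is a filter satisfying all five axioms.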

The philosophy is that an element U in the uniformity is a replacement for a positive real \delta; two points x,y\in X are said to be "U-close" if (x,y)\in U, the thought being that U is our replacement for \{(x,y)\in X\times X\,|\,d(x,y)<\delta\}.

Given a uniformity on a set X, X inherits a topology: a subset V of X is open iff for all x\in V there's some U in the uniformity such that (x,y)\in U\implies y\in V. The idea of course is that for fixed x\in X, the things in X which are U-close to x are a neighbourhood of x.
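In symbols, if we write U[x]=\{y\in X\,|\,(x,y)\in U\} for the set of points which are U-close to x, then

V is open \iff \forall x\in V,\ \exists U in the uniformity with U[x]\subseteq V,

and the sets U[x], as U runs through the uniformity, form a basis of neighbourhoods of x in this topology.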

Conversely given an abelian topological group (with group law +), there’s an associated uniformity: for any neighbourhood V of zero, consider the set U=\{(x,y)\,|\,x-y\in V\}; these sets generate a filter which is a uniformity. What’s going on here is that we model the distance from x to y as the distance from x-y to zero. If the group is not abelian then there are two uniformities, the left uniformity and the right uniformity, which may in general be different, but both of which generate the group’s topology.

Definition of a uniform space in Lean

Now let’s try and take it apart.


/-- This core description of a uniform space is outside of the type class hierarchy. It is useful
  for constructions of uniform spaces, when the topology is derived from the uniform space. -/
structure uniform_space.core (α : Type u) :=
(uniformity : filter (α × α))
(refl       : principal id_rel ≤ uniformity)
(symm       : tendsto prod.swap uniformity uniformity)
(comp       : uniformity.lift' (λs, comp_rel s s) ≤ uniformity)

-- [snip]

/-- A uniform space is a generalization of the "uniform" topological aspects of a
  metric space. It consists of a filter on `α × α` called the "uniformity", which
  satisfies properties analogous to the reflexivity, symmetry, and triangle properties
  of a metric.

  A metric space has a natural uniformity, and a uniform space has a natural topology.
  A topological group also has a natural uniformity, even when it is not metrizable. -/
class uniform_space (α : Type u) extends topological_space α, uniform_space.core α :=
(is_open_uniformity : ∀s, is_open s ↔ (∀x∈s, { p : α × α | p.1 = x → p.2 ∈ s } ∈ uniformity.sets))

We see that uniform_space extends topological_space (which is what we think it is) and also uniform_space.core, which is the uniformity. The statement that if U is in the uniformity then the set \{(y,x)\,|\,(x,y)\in U\} is too, Wikipedia’s axiom 5, is very coolly represented by the assertion that the swap function makes the uniformity filter tend to itself, or however one is supposed to talk about the tendsto predicate. The comp_rel function in Lean composes two relations, and the comp axiom is Wikipedia’s axiom 4. Of course one should not worry too much about how uniform spaces are implemented in Lean: the idea is that any standard definition or theorem about uniform spaces should be accessible via the interface (that is, the definitions and theorems that make up the majority of the uniform_space.lean file) and one should not have to worry about proving them.

Uniform continuity, and completeness

Say f:X\to Y is a map between uniform spaces. Recall that the definition of uniform continuity for metric spaces said that for all \epsilon>0 there exists \delta>0 such that \delta-closeness on X implies \epsilon-closeness on Y. For uniform spaces we ask that for all U in Y‘s uniformity there exists V in X‘s uniformity such that the pre-image of U contains V. In other words, we demand that the pre-image of the uniformity on Y is contained in the uniformity on X. This is nothing more than saying that the induced map f \times f : X\times X\to Y\times Y satisfies tendsto (f × f) uniformity_X uniformity_Y, which is exactly the definition of uniform_continuous.
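For the record, the mathlib definition really is essentially a one-liner. The following is a paraphrase of the code in analysis/topology/uniform_space.lean rather than a verbatim quote, so the exact form in the source may differ slightly:

variables {α β : Type*}

-- the induced map on pairs pushes the uniformity on α into the uniformity on β
def uniform_continuous [uniform_space α] [uniform_space β] (f : α → β) : Prop :=
tendsto (λ p : α × α, (f p.1, f p.2)) uniformity uniformity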

The last definition I want to mention is completeness. A uniform space is complete if every Cauchy filter converges! Let’s face it, that definition does look familiar, apart from the filter bit. So what does this mean? A filter on a uniform space X is Cauchy if it contains “arbitrarily small sets”; that is, for every U in the uniformity (remember U is our replacement for \epsilon>0) there exists V in the filter with V\times V\subseteq U. A filter converges to x\in X if for every neighbourhood W of x there’s an element V of the filter with V\subseteq W. The idea is that if (x_n) is a sequence in X then there’s a corresponding filter generated by the sets \{x_n\,|\,n\geq N\} for N=1,2,\ldots. Note that, as expected, the notion of convergence just depends on the topology, but the notion of Cauchy depends on the uniformity.
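Both definitions are short in Lean; again, the following is my paraphrase of the mathlib source of the era, not a verbatim quote:

-- a filter is Cauchy if it is proper and every set in the uniformity
-- belongs to the product filter, i.e. the filter contains arbitrarily small sets
def cauchy {α : Type*} [uniform_space α] (f : filter α) : Prop :=
f ≠ ⊥ ∧ filter.prod f f ≤ uniformity

-- f ≤ nhds x says every neighbourhood of x is in f, i.e. "f converges to x"
class complete_space (α : Type*) [uniform_space α] : Prop :=
(complete : ∀ {f : filter α}, cauchy f → ∃ x, f ≤ nhds x)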

I guess one could now talk about the completion of a uniform space, but perhaps I’ve said enough; I am not sure I want to give a comprehensive account of these things — I just want mathematicians to be less scared of them. The definition of the completion of a uniform space is covered in the Wikipedia article. But the philosophy, for someone who wants to write Lean code, should be that any standard construction or definition or result about uniform spaces (for example any standard construction or definition or result about metric spaces) should already be in mathlib, and as an end user your job is not to prove it, but simply to find it.


What is a filter? How some computer scientists think about limits.

This is the first of two posts, both of which are attempts to explain some basic mathematical definitions which I have never seen covered in a mathematics course. This one is on filters, and the next one will be on uniform spaces. I will be brief — I just want to say enough so that people who are interested in understanding how Lean does limits (this post) and completions (the next post) can actually attempt to engage with the Lean code in mathlib about these ideas, without being completely intimidated.

OK so let’s start with limits. Guillermo Barajas Ayuso was trying to figure out where exactly in mathlib it said that a sequence of real numbers was Cauchy iff it converged. This is a pretty fundamental fact about the real numbers, but if you dig far enough then you discover that the definition of tendsto is in mathlib’s order/filter.lean and it says


/-- `tendsto` is the generic "limit of a function" predicate.
  `tendsto f l₁ l₂` asserts that for every `l₂` neighborhood `a`,
  the `f`-preimage of `a` is an `l₁` neighborhood. -/
def tendsto (f : α → β) (l₁ : filter α) (l₂ : filter β) := l₁.map f ≤ l₂

Gaargh. Whatever does that mean? Why doesn’t it say that for all \epsilon there exists an N such that something or other?

What is happening here is that when you are building maths from nothing, it’s important to isolate key concepts and set them up in the correct generality, which is the maximal generality for which they make sense. We are trying to understand limits here. As a maths undergraduate I was introduced to the notion of a sequence tending to a limit three times. Firstly I was told what it meant for a sequence of real or complex numbers to tend to a limit. Then I was told what it meant for a sequence of elements of a metric space to tend to a limit. And finally I was told what it meant for a sequence of elements of a topological space to tend to a limit. Each of these definitions is more general than the one before. The real numbers are a metric space, and every metric space is a topological space. And of course, each new definition specialises to the one before. Note also that each time we generalised, we introduced a new mathematical concept — first metric spaces, and then topological spaces.

Well, it turns out that there are even more general definitions of what it means for something to tend to something, and before I can explain the next level of generality I need to introduce a new concept — that of a filter on a set (or, if you’re doing this in Lean, on a type).

What is a filter?

The relevant bit of Wikipedia is the article “Filter on a set”.

In Lean it looks like this:

structure filter (α : Type u) :=
(sets            : set (set α))
(exists_mem_sets : ∃x, x ∈ sets)
(upwards_sets    : ∀{x y}, x ∈ sets → x ⊆ y → y ∈ sets)
(directed_sets   : directed_on (⊆) sets)

So, like a topology on a set, a filter on a set X is a collection of subsets of X with some properties. The easiest way to think of the properties is to think of the sets in the filter as the “big” sets. The axioms for a filter then say: (1) there’s a big set; (2) any set containing a big set is big; and (3) the intersection of two big sets is big (I don’t quite know why mathlib chooses to express this third fact in such a fancy way, but there you go).

Note that (1) and (2) together imply that X is big. I guess the smallest filter on X would be the filter just containing X and the largest would be the one containing every subset of X, but I don’t think that either of these filters is particularly interesting. Here are some more interesting ones.

1) Principal filters. Choose x\in X and let F be the filter defined by U\in F (i.e. U is big) iff x\in U. In other words, we regard x as “the big element of X” and everything else as having size zero. Easy exercise: check this is a filter.

2) The cofinite filter. Let X be an infinite set, and say that a subset U \subseteq X is big iff it only misses finitely many elements of X, i.e. X\backslash U is finite. Such a U is called cofinite (“complement is finite”) and it’s again an easy exercise to check that this is a filter.

Filters in action part I

I remember when I was a first year undergraduate being slightly annoyed by the following definitions.
Let x_0,x_1,x_2,\ldots,x_n,\ldots be a sequence of real numbers.

Definition 1. Say \ell\in\mathbb{R}. We say (x_n)\to \ell (pronounced (x_n) tends to \ell) if for all \epsilon>0 there exists N=N(\epsilon) such that n\geq N\implies |x_n-\ell|<\epsilon.

Definition 2. We say (x_n)\to+\infty if for all B\in\mathbb{R} there exists N=N(B) such that n\geq N\implies x_n>B.

It used to irk me a bit that there was no one definition which could specialise to these two cases, given that we were using the same notation and everything. Well, Mr my old first year analysis lecturer, you could have put me out of my misery by using filters!

Definition (neighbourhood filter of \ell). If \ell\in\mathbb{R} then define the filter of neighbourhoods of \ell by saying that a subset U\subseteq\mathbb{R} is big iff there exists \epsilon>0 such that (\ell-\epsilon,\ell+\epsilon)\subseteq U. If you’re happy with the lingo, you could say U is big iff \ell is in its interior. Easy check: this is a filter.

Definition (neighbourhood filter of +\infty). Define the filter of neighbourhoods of +\infty by saying that a subset U\subseteq\mathbb{R} is big iff there exists B\in\mathbb{R} such that (B,\infty)\subseteq U. If you know how to compactify the reals by adding \pm\infty you’ll see that we’re just looking at the filter of neighbourhoods of +\infty in this compactified space and then removing \pm\infty again.

Definition (unifying two definitions in first year analysis). Let \ell be either a real number or +\infty. We say a sequence (x_n) of reals tends to \ell iff for every U in the neighbourhood filter of \ell there exists some N=N(U) such that n\geq N implies x_n\in U.

The world is now a tidier place — the definitions have been unified using this notion of a filter.

If you’re happy with the lingo, you can see that if X is a general metric or topological space then we can define the neighbourhood filter of x\in X to be all the subsets U such that x is in the interior of U, and the same definition works: (x_n)\to x in the traditional sense taught to 2nd year undergraduates iff for every U in the filter, there exists N=N(U) such that n\geq N\implies x_n\in U.

If you’ve not seen this before, you might think this is already pretty neat. But it gets even neater!

Filters in action Part II

Remember the cofinite filter on an infinite set? Let’s look at the cofinite filter on the naturals. Here a subset is big iff it is only missing finitely many naturals. Put it another way — a subset U is big iff there exists N=N(U) such that the only elements missing are less than N. Put another way: U is big iff there exists N=N(U) such that n\geq N\implies n\in U. This kind of looks familiar — in fact I wrote that phrase about four times above. We should be able to use this somehow. Say (x_n) is a sequence of real numbers. We can think of the sequence as a map f : \mathbb{N}\to\mathbb{R} sending n to x_n.

Definition Say \ell\in\mathbb{R} or \ell=+\infty. We say that (x_n)\to\ell if for all U in the neighbourhood filter of \ell, f^{-1}(U):=\{n\in\mathbb{N}\ |\ f(n)\in U\} is in the cofinite filter on \mathbb{N}.

This is just the same definition as before, rephrased using filters on both the source and the target of f. It says that the pullback of the neighbourhood filter of \ell is contained in the cofinite filter on \mathbb{N}. Now compare with this code in mathlib’s order/filter.lean:


lemma tendsto_def {f : α → β} {l₁ : filter α} {l₂ : filter β} :
  tendsto f l₁ l₂ ↔ ∀ s ∈ l₂.sets, f ⁻¹' s ∈ l₁.sets := iff.rfl

This lemma says that whilst the definition of tendsto above was a bit obscure, involving map and a perhaps slightly subtle use of \leq, at the end of the day, for f : X\to Y a function, and filters L_1 on X and L_2 on Y, we have tendsto f L₁ L₂ if and only if f^{-1}(U_2)\in L_1 for all U_2\in L_2. Thinking about the case where L_2 is a neighbourhood filter and L_1 a cofinite filter keeps us grounded in sanity, but this is an extremely general definition of something tending to something, and when you’re involved in building something like mathlib, the more general you can go, the better. The philosophy is that each definition or lemma should be made or proved in the maximum generality for which it makes sense or is true, and we are seeing that here.
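To bring things full circle, here is how one would state “the sequence x tends to l” for reals using this machinery. This is a sketch of mine in a later mathlib’s terms (the import path has moved around over the years); at_top is the filter on ℕ whose sets are those containing all sufficiently large naturals, which on ℕ is precisely the cofinite filter described above:

import topology.instances.real

open filter

-- unfolding via tendsto_def: for every neighbourhood U of l,
-- the preimage x ⁻¹' U contains all sufficiently large n
example (x : ℕ → ℝ) (l : ℝ) : Prop :=
tendsto x at_top (nhds l)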


617 is prime

More number theory today. Clara List and Sangwoo Jo were trying to do a question from the third year number theory example sheet: to work out whether 605 was a square modulo 617. They decided to assume the law of quadratic reciprocity (which we hopefully will have formalised by the end of the summer) and to go on from there. But the next problem is to prove that 617 is prime, so that the law can be applied! People who know about the Jacobi symbol might think that this problem can be avoided, but the problem with using the Jacobi symbol is that if it evaluates to +1 then this does not guarantee that the top is a square modulo the bottom.

Now Lean was not designed to do calculations, and whilst it knows that primality is decidable it does not have an efficient algorithm to actually do the deciding (or at least, it did not seem to at the time of writing). In particular, the code


import data.nat.prime

open nat

example : prime 617 := dec_trivial

fails! It should compile in theory, but it times out in practice. Lean was not designed to do computations. Lean was designed to do things like verify the proof that there are infinitely many primes (which involves theoretically multiplying lots of primes together and adding one, without ever actually calculating the product).

I was sufficiently interested in this to dig further. Lean’s maths library does have the theorem that an integer p \geq 2 is prime iff it has no factors m with 2 \leq m \leq \sqrt{p}, and it can deduce with little trouble that to prove that 617 is prime, it suffices to show that it has no divisors m with 2 \leq m \leq 24. However, proving, for example, that 3 does not divide 617, is a lot of work for Lean’s default algorithm, which I assume is doing something like subtracting 3 from 617 hundreds of times and storing the result in ever more complex proof terms or some such thing. The problem is that the natural number 617 is stored internally as succ (succ (succ ( ... (succ zero) ... ))), which is very unwieldy to work with in practice. What to do?

Remember that the thing Lean is good at is theorems. For example it knows the following theorem:

add_mul_mod_self_right : ∀ (x y z : ℕ), (x + y * z) % z = x % z

[here % denotes the “remainder after division” operator] so if it somehow knew that 617 = 2 + 205 * 3 then it would be able to use this theorem to prove that 617 has remainder 2 when divided by 3, and hence that 3 does not divide 617; checking non-divisibility by all m with 2 \leq m \leq 24 in this way, we can deduce that 617 is prime.
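Here is that single step carried out in isolation — my sketch, not code from the repository:

import tactic.norm_num

open nat

-- 617 = 2 + 205 * 3, so 617 % 3 = 2, and in particular 3 does not divide 617
example : 617 % 3 = 2 :=
begin
  have h : 617 = 2 + 205 * 3 := by norm_num,
  rw [h, add_mul_mod_self_right],
  exact dec_trivial  -- the goal is now 2 % 3 = 2, small enough to decide
end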

Now checking that 617 = 2 + 205 * 3 is something Lean can manage! It has a powerful norm_num tactic which works with numbers using a sensible binary format, in which it can do calculations efficiently; it can then translate its arguments into a formal proof that the calculations are correct. So all we need to do now is to figure out how to write 617 as r + q * m, with 0 < r < m, for each m with 2\leq m\leq 24.

To do this I just wrote some python code whose output was the Lean code I wanted.


import tactic.norm_num
import data.nat.prime

open nat

theorem sqrt617 : sqrt 617 = 24 :=
begin
  symmetry,
  rw eq_sqrt,
  norm_num,
end

theorem prime617iff : prime 617 ↔ ∀ m, 2 ≤ m → m ≤ 24 → ¬ (m ∣ 617) :=
begin
  rw prime_def_le_sqrt,
  rw sqrt617,
  apply and_iff_right,
  exact dec_trivial
end


theorem prime617 : prime 617 :=
begin
  rw prime617iff,
  show ∀ (m : ℕ), 2 ≤ m → m ≤ 24 → ¬m ∣ 617,
  intros m hm1 hm2,
  cases m,cases hm1,
  cases m,cases hm1,cases hm1_a, -- m = 1
-- the rest of this code was written by python!
  cases m, show ¬ (2 ∣ 617), have h : 617 = 1 + 308 * 2 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (3 ∣ 617), have h : 617 = 2 + 205 * 3 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (4 ∣ 617), have h : 617 = 1 + 154 * 4 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (5 ∣ 617), have h : 617 = 2 + 123 * 5 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (6 ∣ 617), have h : 617 = 5 + 102 * 6 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (7 ∣ 617), have h : 617 = 1 + 88 * 7 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (8 ∣ 617), have h : 617 = 1 + 77 * 8 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (9 ∣ 617), have h : 617 = 5 + 68 * 9 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (10 ∣ 617), have h : 617 = 7 + 61 * 10 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (11 ∣ 617), have h : 617 = 1 + 56 * 11 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (12 ∣ 617), have h : 617 = 5 + 51 * 12 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (13 ∣ 617), have h : 617 = 6 + 47 * 13 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (14 ∣ 617), have h : 617 = 1 + 44 * 14 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (15 ∣ 617), have h : 617 = 2 + 41 * 15 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (16 ∣ 617), have h : 617 = 9 + 38 * 16 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (17 ∣ 617), have h : 617 = 5 + 36 * 17 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (18 ∣ 617), have h : 617 = 5 + 34 * 18 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (19 ∣ 617), have h : 617 = 9 + 32 * 19 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (20 ∣ 617), have h : 617 = 17 + 30 * 20 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (21 ∣ 617), have h : 617 = 8 + 29 * 21 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (22 ∣ 617), have h : 617 = 1 + 28 * 22 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (23 ∣ 617), have h : 617 = 19 + 26 * 23 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  cases m, show ¬ (24 ∣ 617), have h : 617 = 17 + 25 * 24 := by norm_num, rw h, intro h2,
    have h3 := mod_eq_zero_of_dvd h2, rw add_mul_mod_self_right at h3, cases h3,
  revert hm2, exact dec_trivial,
end

Now maybe the python code I wrote to generate that Lean code has bugs! Maybe if I change 617 to some other prime it wouldn’t work for some silly reason. But who cares! The Lean code that my python code generated compiled fine, in well under a second, and now we have a proof that 617 is prime.

This is exactly what tactics are. Lean tactics are on the whole written in Lean’s metaprogramming language, and are not checked by the kernel. Tactics might have bugs — they might not always do what they say they will do. However this does not affect our proofs. If my python code had failed to produce valid equations then the Lean code would simply not have compiled, and we would have learnt nothing. But my python code did work, the Lean code compiled, and then it didn’t matter whether the python code was buggy or not. Similarly tactics can fail — either because the user applies them in a situation where they are not designed to succeed, or because the programmer made some error. However they can never prove something false, because after the tactic produces the code, the kernel checks it, and if the kernel says the code proves the theorem, then who cares where it came from.

This algorithm can manifestly be generalised and one might imagine writing such a tactic in Lean, using unsafe calculations to do the division and remainder, and then checking them in the kernel.

***

Note added 2 days later: Mario beefed up the norm_num tactic and now theorem prime_617 : nat.prime 617 := by norm_num works out of the box! Life is great!
