What is a uniform space? How some computer scientists think about completions.

This is post 2 of 2 posts, both of which are attempts to explain some basic mathematical definitions which I have never seen covered in an undergraduate mathematics course. These definitions play a role in Lean because it’s important to set things up in the maximal generality for which they make sense, to avoid having to duplicate work. This post is about uniform spaces, and it mentions filters, which are the subject of the first post, so you might want to read that one first.

As a first year undergraduate I learnt that the reals were complete: every Cauchy sequence converges. As a second year undergraduate I learnt what it meant more generally for a metric space to be complete: every Cauchy sequence converges. However, my second year lecturer stressed that completeness was not a topological property. The example they gave was that the real numbers were homeomorphic to the open interval (-1,1) (the function \tanh being an example of a homeomorphism), and the reals were complete, yet (-1,1) was not. At the time, I thought this was the end of the story.

Around a decade later I became interested in non-archimedean analysis, and here I saw some constructions which confused me. In fact in some sense I can trace my confusion back to the construction of the p-adic integers. One can complete the integers using the p-adic metric to get \mathbb{Z}_p, the p-adic integers, and a very slick way to do this is to define \mathbb{Z}_p=\varprojlim_n\mathbb{Z}/p^n\mathbb{Z}, the projective limit. Whilst this can be dressed up as the completion of a metric space with respect to the p-adic metric, it looks like something more fundamental than that is going on. Later on I learnt about the profinite completion of a discrete group G as \varprojlim_U G/U as U runs over the finite index subgroups. Where is the metric now? In what sense is the resulting space complete if it’s not even a metric space? Do I have to put some sort of artificial metric on G somehow?

Skip forward another decade and I became interested in Huber’s adic spaces as a foundation for non-archimedean geometry. I learnt a vast amount from Torsten Wedhorn’s unpublished notes, which are the notes that we are using in our perfectoid spaces project. These notes contain really clear and precise definitions of pretty much everything they use. By the time I reached Definition 5.31 it was clear to me that I was missing something. Wedhorn defines what it means for an abelian topological group to be complete with no mention of a metric at all! But my 2nd year topology lecturer told me that completeness was not a topological property! What is going on?

Uniform spaces — an overview

It turns out that there’s a definition which had completely passed me by — that of a uniform space. The ever-reliable Wikipedia has an article on them of course. Before I say what they are, let me tell you some facts about them.

1) Every metric space is a uniform space; every uniform space is a topological space. Uniform spaces sit in between these two standard notions.

2) It makes sense to ask if a uniform space is complete, and indeed every uniform space has a Hausdorff completion, which is unique up to unique isomorphism and satisfies a universal property. This generalises the notion of a complete metric space.

3) Every topological group can be given the structure of a uniform space, and every compact Hausdorff topological space can be given the structure of a uniform space.

These three facts together explain everything that I was confused about above. The definition of a uniform space was due to Weil in 1937! Why don’t they teach it to undergraduates? It looks at the very least like a super sequence of example sheet questions, or a little first year mini-project or something. There’s more! You can talk about uniformly continuous maps between uniform spaces, and uniformly convergent sequences of such maps. In fact this gives us a clue as to exactly what we lose when going from a metric space to a topological space, and what we need to get back.

What does it mean for a map f : X \to Y between metric spaces to be uniformly continuous? Let’s start by recalling the definition of continuity for a map between metric spaces. Continuity at x\in X means that for all \epsilon>0 there’s a \delta>0 such that d(x',x)<\delta implies d(f(x'),f(x))<\epsilon. When teaching this we usually stress that \delta depends on \epsilon, but what we need to note here is that \delta also in general depends on x. The function f is uniformly continuous if our rule for getting \deltas from \epsilons can be made to be independent of x, that is, if for all \epsilon>0 there exists \delta>0 such that for any x,x'\in X we have d(x',x)<\delta implies d(f(x'),f(x))<\epsilon.

It's clear why this notion doesn't generalise to topological spaces. If x_1\in X is arbitrary then we can consider all the open sets in the topology which contain x_1; but if x_2 is a second point then there's in general no way of comparing the open neighbourhoods of x_1 with those of x_2; there is no way of formalising the notion that "x'_1 is nearer to x_1 than x'_2 is to x_2", so we can't write down a definition of uniform continuity for maps between topological spaces; the neighbourhoods of x_1 and x_2 are in general incomparable. A uniform structure is a way of putting this information back without going as far as adding a metric.

Uniform spaces — the definition

The definition of a uniform space in Lean is rather intimidating; it’s in mathlib at analysis/topology/uniform_space.lean. Let’s not worry about it just yet, let’s instead look at the definition on Wikipedia, which says that a uniformity (a.k.a. a uniform structure) on a set X is a filter on X\times X such that every set in the filter contains the diagonal \{(x,x)\,|\,x\in X\} (that’s the first three axioms) and then some more axioms; but I think it’s worth explaining what this notion is trying to model before talking about the last two axioms. If X is a metric space, the idea is that the uniformity is defined by: a subset U\subseteq X\times X is in the uniformity iff there exists \delta>0 such that d(x_1,x_2)<\delta implies (x_1,x_2)\in U. The fact that every set in the uniformity contains all (x,x) is the uniformity's version of d(x,x)=0; the 5th axiom on Wikipedia, that the "transpose" of an element of the uniformity is in the uniformity, is some version of the symmetry axiom for a metric, and the fourth axiom, saying that for all U in the uniformity there's some V in the uniformity with V \circ V\subseteq U, is some analogue of the statement that if d(x_1,x)<\delta/2 and d(x,x_2)<\delta/2 then d(x_1,x_2)<\delta.

The philosophy is that an element U in the uniformity is a replacement for a positive real \delta; two points x,y\in X are said to be "U-close" if (x,y)\in U, the thought being that U is our replacement for \{(x,y)\in X\times X\,|\,d(x,y)<\delta\}.

Given a uniformity on a set X, X inherits a topology: a subset V of X is open iff for all x\in V there's some U in the uniformity such that (x,y)\in U\implies y\in V. The idea of course is that for fixed x\in X, the things in X which are U-close to x are a neighbourhood of x.

Conversely given an abelian topological group (with group law +), there’s an associated uniformity: for any neighbourhood V of zero, consider the set U=\{(x,y)\,|\,x-y\in V\}; these sets generate a filter which is a uniformity. What’s going on here is that we model the distance from x to y as the distance from x-y to zero. If the group is not abelian then there are two uniformities, the left uniformity and the right uniformity, which may in general be different, but both of which generate the group’s topology.
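To spell this out: if G is a topological group and V runs over the neighbourhoods of the identity, the two uniformities are generated by the sets

\{(x,y)\,|\,x^{-1}y\in V\}\quad\text{and}\quad\{(x,y)\,|\,yx^{-1}\in V\}

respectively. In the abelian case both recipes give the uniformity just described, because V\mapsto -V permutes the neighbourhoods of zero.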

Definition of a uniform space in Lean

Here is the definition, as it appears in mathlib at the time of writing; let’s try and take it apart.


/-- This core description of a uniform space is outside of the type class hierarchy. It is useful
  for constructions of uniform spaces, when the topology is derived from the uniform space. -/
structure uniform_space.core (α : Type u) :=
(uniformity : filter (α × α))
(refl       : principal id_rel ≤ uniformity)
(symm       : tendsto prod.swap uniformity uniformity)
(comp       : uniformity.lift' (λs, comp_rel s s) ≤ uniformity)

-- [snip]

/-- A uniform space is a generalization of the "uniform" topological aspects of a
  metric space. It consists of a filter on `α × α` called the "uniformity", which
  satisfies properties analogous to the reflexivity, symmetry, and triangle properties
  of a metric.

  A metric space has a natural uniformity, and a uniform space has a natural topology.
  A topological group also has a natural uniformity, even when it is not metrizable. -/
class uniform_space (α : Type u) extends topological_space α, uniform_space.core α :=
(is_open_uniformity : ∀s, is_open s ↔ (∀x∈s, { p : α × α | p.1 = x → p.2 ∈ s } ∈ uniformity.sets))

We see that uniform_space extends topological_space (which is what we think it is) and also uniform_space.core, which is the uniformity. The statement that if U is in the uniformity then the set \{(y,x)\,|\,(x,y)\in U\} is too, Wikipedia’s axiom 5, is very coolly represented by the assertion that the swap function makes the uniformity filter tend to itself, or however one is supposed to talk about the tendsto predicate. The comp_rel function in Lean composes two relations, and the comp axiom is Wikipedia’s axiom 4. Of course one should not worry too much about how uniform spaces are implemented in Lean: the idea is that any standard definition or theorem about uniform spaces should be accessible via the interface (that is, the definitions and theorems that compose the majority of the uniform_space.lean file) and one should not have to worry about proving them.
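For the record, here is comp_rel from the same file at the time of writing (modulo the surrounding variable and universe declarations):


/-- The composite of two relations on α: (x,y) ∈ comp_rel r₁ r₂ iff there is
  some z with (x,z) ∈ r₁ and (z,y) ∈ r₂. -/
def comp_rel {α : Type*} (r₁ r₂ : set (α × α)) :=
{p : α × α | ∃ z : α, (p.1, z) ∈ r₁ ∧ (z, p.2) ∈ r₂}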

Uniform continuity, and completeness

Say f:X\to Y is a map between uniform spaces. Recall that the definition of uniform continuity for metric spaces said that for all \epsilon>0 there exists \delta>0 such that \delta-closeness on X implies \epsilon-closeness on Y. For uniform spaces we ask that for all U in Y‘s uniformity there exists V in X‘s uniformity such that the pre-image of U under f\times f contains V; since uniformities are closed under supersets, this is the same as demanding that the pre-image of every element of Y‘s uniformity is an element of X‘s uniformity. This is nothing more than saying that the induced map f \times f : X\times X\to Y\times Y satisfies tendsto (f x f) uniformity_X uniformity_Y, which is exactly the definition of uniform_continuous.
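Indeed, here is mathlib’s definition at the time of writing (modulo the surrounding variable declarations):


def uniform_continuous {α β : Type*} [uniform_space α] [uniform_space β] (f : α → β) :=
tendsto (λ x : α × α, (f x.1, f x.2)) uniformity uniformity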

The last definition I want to mention is completeness. A uniform space is complete if every Cauchy filter converges! Let’s face it, that definition does look familiar, apart from the filter bit. So what does this mean? A filter on a uniform space X is Cauchy if it contains “arbitrarily small sets”; that is, for every U in the uniformity (remember U is our replacement for \epsilon>0) there exists V in the filter with V\times V\subseteq U. A filter converges to x\in X if for every neighbourhood W of x there’s an element V of the filter with V\subseteq W. The idea is that if (x_n) is a sequence in X then there’s a corresponding filter generated by the sets \{x_n\,|\,n\geq N\} for N=1,2,\ldots. Note that, as expected, the notion of convergence just depends on the topology, but the notion of Cauchy depends on the uniformity.
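Here, again modulo the surrounding variable declarations, is what this looks like in mathlib at the time of writing:


/-- A filter f is Cauchy iff for every entourage r there exists s ∈ f with
  s × s ⊆ r; the f ≠ ⊥ condition rules out the degenerate filter of all subsets. -/
def cauchy (f : filter α) := f ≠ ⊥ ∧ filter.prod f f ≤ uniformity

/-- A uniform space is complete iff every Cauchy filter converges. -/
class complete_space (α : Type u) [uniform_space α] : Prop :=
(complete : ∀ {f : filter α}, cauchy f → ∃ x, f ≤ nhds x)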

I guess one could now talk about the completion of a uniform space, but perhaps I’ve said enough; I am not sure I want to give a comprehensive account of these things, I just want mathematicians to be less scared of them. The construction of the completion of a uniform space is covered in the Wikipedia article. But the philosophy, for someone who wants to write Lean code, should be that any standard construction or definition or result about uniform spaces (for example any standard construction or definition or result about metric spaces) should already be in mathlib, and as an end user your job is not to prove it, but simply to find it.


What is a filter? How some computer scientists think about limits.

This is post 1 of 2 posts both of which are attempts to explain some basic mathematical definitions which I have never seen covered in a mathematics course. This one is on filters, and the next one will be on uniform spaces. I will be brief — I just want to say enough so that people who are interested in understanding how Lean does limits (this post) and completions (the next post) can actually attempt to engage with the Lean code in mathlib about these ideas, without being completely intimidated.

OK so let’s start with limits. Guillermo Barajas Ayuso was trying to figure out where exactly in mathlib it said that a sequence of real numbers was Cauchy iff it converged. This is a pretty fundamental fact about the real numbers, but if you dig far enough then you discover that the definition of tendsto is in mathlib’s order/filter.lean and it says


/-- `tendsto` is the generic "limit of a function" predicate.
  `tendsto f l₁ l₂` asserts that for every `l₂` neighborhood `a`,
  the `f`-preimage of `a` is an `l₁` neighborhood. -/
def tendsto (f : α → β) (l₁ : filter α) (l₂ : filter β) := l₁.map f ≤ l₂

Gaargh. Whatever does that mean? Why doesn’t it say that for all \epsilon there exists an N such that something or other?

What is happening here is that when you are building maths from nothing, it’s important to isolate key concepts and set them up in the correct generality, which is the maximal generality for which they make sense. We are trying to understand limits here. As a maths undergraduate I was introduced to the notion of a sequence tending to a limit three times. Firstly I was told what it meant for a sequence of real or complex numbers to tend to a limit. Then I was told what it meant for a sequence of elements of a metric space to tend to a limit. And finally I was told what it meant for a sequence of elements of a topological space to tend to a limit. Each of these definitions is more general than the one before. The real numbers are a metric space, and every metric space is a topological space. And of course, each new definition specialises to the one before. Note also that each time we generalised, we introduced a new mathematical concept — first metric spaces, and then topological spaces.

Well, it turns out that there are even more general definitions of what it means for something to tend to something, and before I can explain the next level of generality I need to introduce a new concept — that of a filter on a set (or, if you’re doing this in Lean, on a type).

What is a filter?

The relevant bit of Wikipedia is the article “Filter on a set”.

In Lean it looks like this:

structure filter (α : Type u) :=
(sets            : set (set α))
(exists_mem_sets : ∃x, x ∈ sets)
(upwards_sets    : ∀{x y}, x ∈ sets → x ⊆ y → y ∈ sets)
(directed_sets   : directed_on (⊇) sets)

So, like a topology on a set, a filter on a set X is a collection of subsets of X with some properties. The easiest way to think of the properties is to think of the sets in the filter as the “big” sets. The axioms for a filter then say: (1) there’s a big set; (2) any set containing a big set is big; and (3) the intersection of two big sets is big. (Mathlib chooses to express this third fact in a fancy way: directed_on (⊇) sets says that any two sets in the filter both contain some third set in the filter, which given axiom (2) amounts to the same thing.)

Note that (1) and (2) together imply that X is big. I guess the smallest filter on X would be the one just containing X, and the largest would be the one containing every subset of X, but I don’t think that either of these filters is particularly interesting. Here are some more interesting ones.

1) Principal filters. Choose x\in X and let F be the filter defined by U\in F (i.e. U is big) iff x\in U. In other words, we regard x as “the big element of X” and everything else as having size zero. Easy exercise: check this is a filter. More generally, any subset S\subseteq X gives a principal filter, where U is big iff S\subseteq U.

2) The cofinite filter. Let X be an infinite set, and say that a subset U \subseteq X is big iff it only misses finitely many elements of X, i.e. X\backslash U is finite. Such a U is called cofinite (“complement is finite”) and it’s again an easy exercise to check that this is a filter.
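To connect these examples with the Lean structure above: here is roughly how the principal filter of a general subset s ⊆ X (big sets = sets containing s, so example (1) is the case s = {x}) gets built. This is a sketch, and the exact lemma names in mathlib may differ:


def principal {α : Type*} (s : set α) : filter α :=
{ sets            := {t | s ⊆ t},              -- the big sets are the supersets of s
  exists_mem_sets := ⟨s, set.subset.refl s⟩,   -- s itself is big
  upwards_sets    := λ x y hsx hxy, set.subset.trans hsx hxy,
  directed_sets   := λ x hsx y hsy,            -- x ∩ y is big and lies in both x and y
    ⟨x ∩ y, set.subset_inter hsx hsy,
      set.inter_subset_left x y, set.inter_subset_right x y⟩ }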

Filters in action Part I

I remember when I was a first year undergraduate being slightly annoyed by the following definitions.
Let x_0,x_1,x_2,\ldots,x_n,\ldots be a sequence of real numbers.

Definition 1. Say \ell\in\mathbb{R}. We say (x_n)\to \ell (pronounced (x_n) tends to \ell) if for all \epsilon>0 there exists N=N(\epsilon) such that n\geq N\implies |x_n-\ell|<\epsilon.

Definition 2. We say (x_n)\to+\infty if for all B\in\mathbb{R} there exists N=N(B) such that n\geq N\implies x_n>B.

It used to irk me a bit that there was no one definition which could specialise to these two cases, given that we were using the same notation and everything. Well, Mr my old first year analysis lecturer, you could have put me out of my misery by using filters!

Definition (neighbourhood filter of \ell). If \ell\in\mathbb{R} then define the filter of neighbourhoods of \ell by saying that a subset U\subseteq\mathbb{R} is big iff there exists \epsilon>0 such that (\ell-\epsilon,\ell+\epsilon)\subseteq U. If you’re happy with the lingo, you could say U is big iff \ell is in its interior. Easy check: this is a filter.

Definition (neighbourhood filter of +\infty). Define the filter of neighbourhoods of +\infty by saying that a subset U\subseteq\mathbb{R} is big iff there exists B\in\mathbb{R} such that (B,\infty)\subseteq U. If you know how to compactify the reals by adding \pm\infty you’ll see that we’re just looking at the filter of neighbourhoods of +\infty in this compactified space and then removing \pm\infty again.

Definition (unifying two definitions in first year analysis). Let \ell be either a real number or +\infty. We say a sequence (x_n) of reals tends to \ell iff for every U in the neighbourhood filter of \ell there exists some N=N(U) such that n\geq N implies x_n\in U.

The world is now a tidier place — the definitions have been unified using this notion of a filter.

If you’re happy with the lingo, you can see that if X is a general metric or topological space then we can define the neighbourhood filter of x\in X to be all the subsets U such that x is in the interior of U, and the same definition works: (x_n)\to x in the traditional sense taught to 2nd year undergraduates iff for every U in the filter, there exists N=N(U) such that n\geq N\implies x_n\in U.
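In mathlib the neighbourhood filter is called nhds, and at the time of writing it is defined for a general topological space as an infimum, in the filter ordering, of principal filters of open sets containing the point (again modulo the surrounding variable declarations); unravelling the definition gives exactly the description above.


def nhds (a : α) : filter α := (⨅ s ∈ {s : set α | a ∈ s ∧ is_open s}, principal s)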

If you’ve not seen this before, you might think this is already pretty neat. But it gets even neater!

Filters in action Part II

Remember the cofinite filter on an infinite set? Let’s look at the cofinite filter on the naturals. Here a subset is big iff it is only missing finitely many naturals. Put it another way — a subset U is big iff there exists N=N(U) such that the only elements missing are less than N. Put another way: U is big iff there exists N=N(U) such that n\geq N\implies n\in U. This kind of looks familiar — in fact I wrote that phrase about four times above. We should be able to use this somehow. Say (x_n) is a sequence of real numbers. We can think of the sequence as a map f : \mathbb{N}\to\mathbb{R} sending n to x_n.

Definition. Say \ell\in\mathbb{R} or \ell=+\infty. We say that (x_n)\to\ell if for all U in the neighbourhood filter of \ell, f^{-1}(U):=\{n\in\mathbb{N}\ |\ f(n)\in U\} is in the cofinite filter on \mathbb{N}.

This is just the same definition as before, rephrased using filters on both the source and the target of f. It says that the pullback of the neighbourhood filter of \ell is contained in the cofinite filter on \mathbb{N}. Now compare with this code in mathlib’s order/filter.lean:


lemma tendsto_def {f : α → β} {l₁ : filter α} {l₂ : filter β} :
  tendsto f l₁ l₂ ↔ ∀ s ∈ l₂.sets, f ⁻¹' s ∈ l₁.sets := iff.rfl

This lemma says that whilst the definition of tendsto above was a bit obscure involving map and perhaps a slightly subtle use of \leq, at the end of the day, for f : X\to Y a function, and filters L_1 on X and L_2 on Y, we have tendsto f L₁ L₂ if and only if f^{-1}(U_2)\in L_1 for all U_2\in L_2. Thinking about the case where L_2 is a neighbourhood filter and L_1 a cofinite filter keeps us grounded in sanity, but this is an extremely general definition of something tending to something and when you’re involved in building something like mathlib, the more general you can go, the better. The philosophy is that each definition or lemma should be made or proved in the maximum generality for which it makes sense or is true, and we are seeing that here.
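One footnote: for limits of sequences mathlib does not actually use the cofinite filter; it uses the at_top filter of “neighbourhoods of infinity” in an ordered type, which at the time of writing is defined by something like


def at_top {α : Type*} [preorder α] : filter α := ⨅ a, principal {b | a ≤ b}


On ℕ this contains exactly the same sets as the cofinite filter (a set contains a tail \{n\,|\,n\geq N\} iff its complement is finite), so nothing above changes.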


617 is prime

More number theory today. Clara List and Sangwoo Jo were trying to do a question from the third year number theory example sheet; to work out whether 605 was a square modulo 617. They decided to assume the law of quadratic reciprocity (which we hopefully will have formalised by the end of the summer) and to go on from there. But the next problem is to prove that 617 is prime, so the law can be applied! People who know about the Jacobi symbol might think that this problem can be avoided, but the problem with using the Jacobi symbol is that if it evaluates to +1 then this does not guarantee that the top is a square modulo the bottom.

Now Lean was not designed to do calculations, and whilst it knows that primality is decidable it does not have an efficient algorithm to actually do the deciding (or at least, it did not seem to at the time of writing). In particular, the code


import data.nat.prime

open nat

example : prime 617 := dec_trivial

fails! It should compile in theory, but it times out in practice. Lean was not designed to do computations. Lean was designed to do things like verify the proof that there are infinitely many primes (which involves theoretically multiplying lots of primes together and adding one, without ever actually calculating the product).

I was sufficiently interested in this to dig further. Lean’s maths library does have the theorem that an integer p \geq 2 is prime iff it has no factors m with 2 \leq m \leq \sqrt{p}, and it can deduce with little trouble that to prove that 617 is prime, it suffices to show that it has no divisors m with 2 \leq m \leq 24. However, proving, for example, that 3 does not divide 617, is a lot of work for Lean’s default algorithm, which I assume is doing something like subtracting 3 from 617 hundreds of times and storing the result in ever more complex proof terms or some such thing. The problem is that the natural number 617 is stored internally as succ (succ (succ (succ ... (succ zero))))))) ... ), which is very unwieldy to work with in practice. What to do?

Remember that the thing Lean is good at is theorems. For example it knows the following theorem:

add_mul_mod_self_right : ∀ (x y z : ℕ), (x + y * z) % z = x % z

[here % denotes the “remainder after division” operator] so if it somehow knew that 617 = 2 + 205 * 3 then it would be able to use this theorem to prove that 617 had remainder 2 when divided by 3, and hence that 3 does not divide 617. Doing this for every m with 2 ≤ m ≤ 24 is then enough to prove that 617 is prime.

Now checking that 617 = 2 + 205 * 3 is something Lean can manage! It has a powerful norm_num tactic which works with numbers using a sensible binary format and where it can do calculations efficiently; it can then translate its arguments into a formal proof that the calculations are correct. So all we need to do now is to figure out how to write 617 as r + q * m, with 0 < r < m, for each m with 2\leq m\leq 24.

To do this I just wrote some python code whose output was the Lean code I wanted.


import tactic.norm_num
import data.nat.prime

open nat

theorem sqrt617 : sqrt 617 = 24 :=
begin
  symmetry,
  rw eq_sqrt,
  norm_num,
end

theorem prime617iff : prime 617 ↔ ∀ m, 2 ≤ m → m ≤ 24 → ¬ (m ∣ 617) :=
begin
  rw prime_def_le_sqrt,
  rw sqrt617,
  apply and_iff_right,
  exact dec_trivial
end


theorem prime617 : prime 617 :=
begin
  rw prime617iff,
  show ∀ (m : ℕ), 2 ≤ m → m ≤ 24 → ¬m ∣ 617,
  intros m hm1 hm2,
  cases m,cases hm1,
  cases m,cases hm1,cases hm1_a, -- m = 1
-- the rest of this code was written by python!
  cases m,show ¬ (2 ∣ 617), have h : 617 = 1 + 308 * 2 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m,show ¬ (3 ∣ 617), have h : 617 = 2 + 205 * 3 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (4 ∣ 617), have h : 617 = 1 + 154 * 4 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (5 ∣ 617), have h : 617 = 2 + 123 * 5 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (6 ∣ 617), have h : 617 = 5 + 102 * 6 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (7 ∣ 617), have h : 617 = 1 + 88 * 7 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (8 ∣ 617), have h : 617 = 1 + 77 * 8 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (9 ∣ 617), have h : 617 = 5 + 68 * 9 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (10 ∣ 617), have h : 617 = 7 + 61 * 10 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (11 ∣ 617), have h : 617 = 1 + 56 * 11 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (12 ∣ 617), have h : 617 = 5 + 51 * 12 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (13 ∣ 617), have h : 617 = 6 + 47 * 13 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (14 ∣ 617), have h : 617 = 1 + 44 * 14 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (15 ∣ 617), have h : 617 = 2 + 41 * 15 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (16 ∣ 617), have h : 617 = 9 + 38 * 16 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (17 ∣ 617), have h : 617 = 5 + 36 * 17 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (18 ∣ 617), have h : 617 = 5 + 34 * 18 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (19 ∣ 617), have h : 617 = 9 + 32 * 19 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (20 ∣ 617), have h : 617 = 17 + 30 * 20 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (21 ∣ 617), have h : 617 = 8 + 29 * 21 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (22 ∣ 617), have h : 617 = 1 + 28 * 22 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (23 ∣ 617), have h : 617 = 19 + 26 * 23 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  cases m, show ¬ (24 ∣ 617), have h : 617 = 17 + 25 * 24 := by norm_num,rw h,intro h2,
    have h3 := mod_eq_zero_of_dvd h2,rw add_mul_mod_self_right at h3,cases h3,
  revert hm2,exact dec_trivial,
end 

Now maybe the python code I wrote to generate that Lean code has bugs! Maybe if I change 617 to some other prime it wouldn’t work for some silly reason. But who cares! The Lean code that my python code generated compiled fine, in well under a second, and now we have a proof that 617 is prime.

This is exactly what tactics are. Lean tactics are on the whole written in Lean’s metaprogramming language, and are not checked by the kernel. Tactics might have bugs — they might not always do what they say they will do. However this does not affect our proofs. If my python code had failed to produce valid equations then the Lean code would simply not have compiled, and we would have learnt nothing. But my python code did work, and then the Lean code compiled, and then it didn’t matter whether the python code was buggy or not. Similarly tactics can fail — either because the user applies them in a situation where they are not designed to succeed, or because the programmer made some error. However they can never prove something false, because after the tactic produces the code, the kernel checks it, and if the kernel says the code proves the theorem, then who cares where it came from.

This algorithm can manifestly be generalised and one might imagine writing such a tactic in Lean, using unsafe calculations to do the division and remainder, and then checking them in the kernel.

***

Note added 2 days later: Mario beefed up the norm_num tactic and now theorem prime_617 : nat.prime 617 := by norm_num works out of the box! Life is great!


Quadratic Reciprocity and (p^2-1)/8

The law of quadratic reciprocity. The jewel in the crown of mathematics! Still not proved in Lean!

Clara List and Sangwoo Jo were thinking about this today. They formalised the statement. There are two “supplements” to the law, describing when -1 and 2 are squares mod p. For the supplement describing when 2 is a square mod p they used the dreaded version

\left(\frac{2}{p}\right)=(-1)^{\frac{p^2-1}{8}}

which they obtained from a reliable source. Who even came up with that way of expressing what’s going on? Does it ever help? What’s wrong with just writing that it’s +1 if p\equiv 1 or 7 mod 8, and -1 otherwise? When I’ve lectured the basic number theory course at Imperial, it’s this latter statement that comes out in the wash.

So the ring tactic came to the rescue:


import tactic.ring 

lemma quad_res_two_case_three (n : ℕ) : n % 8 = 3 → (n ^ 2 - 1) / 8 % 2 = 1 :=
begin
  intro H,
  have H2 := nat.mod_add_div n 8,
  rw H at H2,
  rw ←H2,
  suffices : ∀ t : ℕ, ((3 + 8 * t) ^ 2 - 1) / 8 % 2 = 1,
    exact this (n/8),
  intro t,
  have H3 : (1 + 2 * (4*t^2 + 3*t)) % 2 = 1,
    rw nat.add_mul_mod_self_left,exact dec_trivial,
  suffices : ((3 + 8 * t) ^ 2 - 1) / 8 = (1 + 2 * (4 * t ^ 2 + 3 * t)),
    rwa this,
  suffices : ((3 + 8 * t) ^ 2 - 1) = 8 * (1 + 2 * (4 * t ^ 2 + 3 * t)),
    rw this,
  rw nat.mul_div_cancel_left _ (show 8 > 0, from dec_trivial),
  ring
end 

I had to write three more of these but I leave them as exercises (or you can find them in our UROP project repo here in the “M3P14” directory).

So what’s going on in this code? Well, we’d like the ring tactic to dispatch this easily; unfortunately it’s an assertion about naturals, and it involves both subtraction and division, neither of which behaves on the naturals the way mathematicians might expect (both are truncated), so we have to be careful. In particular we can’t just apply ring and hope it will work.

Unsurprisingly, the first step is to write n as 8t+3, reducing the goal to

((3 + 8 * t) ^ 2 - 1) / 8 % 2 = 1

The next trick is to observe that (1 + 2 * (4*t^2 + 3*t)) % 2 = 1 because the thing we’re reducing mod 2 is 1 plus something even. This reduces us to proving

((3 + 8 * t) ^ 2 - 1) / 8 = (1 + 2 * (4 * t ^ 2 + 3 * t))

but ring still won’t do this because we have an integer division here. However nat.mul_div_cancel_left says that if n>0 then n * m / n = m, meaning that it suffices to prove

(3 + 8 * t) ^ 2 - 1 = 8 * (1 + 2 * (4 * t ^ 2 + 3 * t))

and ring is smart enough to know that the last subtraction is not a problem, and closes our goal.
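For the record, the algebra that ring is checking at the end is just

(3+8t)^2-1=64t^2+48t+8=8\left(1+2(4t^2+3t)\right)

and the factor of 8 is exactly what nat.mul_div_cancel_left cancelled.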


Summer project

I am running a summer project with some students. I’ve sort-of lost count of how many; I think that around 25 people expressed an interest, but some didn’t start yet. In practice we seem to get around 10-15 people coming every day. I show up on Mondays, Wednesdays and Thursdays, and I think some of the students show up on Tuesdays and Fridays too.

Students can just do anything they want, in Lean. Some students seem to be learning courses which they’re taking next year, some are going through courses they learnt this year, and so on. People tend to be working through example sheets. Some people are getting good quickly. Others are finding it harder. I know from experience (watching people learn) that even if you find it tough initially, things might fall into place later. I seem to spend my entire day answering questions. I absolutely love it.

The project has been running for two and a half weeks. Students have done elementary number theory, elementary complex analysis, group theory, finite-dimensional vector spaces, determinants, some homotopy theory, and some wackier stuff — some of the theory of dots and boxes, and a formalisation of geometry a la Euclid (or more precisely a la Tarski). Formalising problem sheet questions is a popular option. You can see what people are doing here on github. Random people come along — not all Imperial undergraduates. I’ve had two schoolkids coming along, and a Cambridge undergraduate is a regular attendee. Chris Hughes has been immensely helpful, supporting many of the other students, and I’d like to thank him personally here. Others are beginning to become helpful too. I am surprised at how quickly some people are learning. I was scared that two weeks in I’d still be telling people how to install Lean and mathlib, but we have moved well beyond that now.

Some of the mathematics above might sound easy, but there is a whole bunch of basic mathematics still missing from mathlib (theory of finite-dimensional vector spaces and matrices and basic theory of differentiable functions being two notable examples) and I wonder whether by the end of this we will be in a position to offer some stuff to mathlib. We nearly have the fundamental group of a topological space, and I am pretty sure that we’ll have a bunch more topological stuff soon.

Another cool thing that’s happening is that a bunch of Imperial’s undergraduate example sheets are being formalised. When I was an undergraduate we were just at the point where many lecturers gave out problem sheets which were hand-written or had been typed on a typewriter and then photocopied — but a few were giving out example sheets which had been written in TeX. I wonder whether a similar sort of thing is happening now. The funny thing is that it was the lecturers writing the TeX sheets, whereas this time around it’s the students formalising the example sheets.

I don’t really know where all this is going but somehow I feel like I’m seeing more and more evidence that mathematics undergraduates are able to learn how to use this software. I think the fact that we meet together as a team is really important. I spend a lot of my time answering questions (and occasionally asking them). I had to learn Lean by myself with just the Lean chat to help me — it’s far more efficient to learn as a team. I should make sure that I have a good strategy for teaching the new first years come October. I really should get back to writing my book on Lean for mathematicians, but there are so many other things which need doing. If the students can teach each other then the need to write it becomes, temporarily at least, less pressing.


(a+b)^3

The problem

This post is about the equation (a+b)^3=a^3+3a^2b+3ab^2+b^3. This special case of the binomial theorem is the sort of thing that we would expect incoming students at my university to know, or at least to be able to compute very quickly. But what actually goes into an axiomatic proof? Whether you think of a and b as real numbers or as elements of an arbitrary commutative ring, we humans are very good at multiplying out the brackets on the left hand side and then collecting up terms and putting them together to form the right hand side.

One reason humans are so good at this is that we have some really good notation available to us. This becomes apparent when we try and verify (a+b)^3=a^3+3a^2b+3ab^2+b^3 using Lean. The ring axiom left_distrib says a * (b + c) = a * b + a * c, and this axiom has a second, more user-friendly, name of mul_add. There is also add_mul for expanding out (a + b) * c. Putting these together we can expand out all the brackets on the left hand side. Well, first we have to expand out the ^3 notation, and because Lean defines ^ recursively with x^0=1 and x^(n+1)=x*x^n we see that x^3 expands out to x*(x*(x*1)). Fortunately mul_one is the theorem saying x*1=x so let’s use mul_one and mul_add and add_mul to expand out our brackets.


import algebra.group_power -- for power notation in rings

example (R : Type) [comm_ring R] (a b : R) :
  (a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
begin
  unfold pow monoid.pow, -- LHS now (a + b) * ((a + b) * ((a + b) * 1))
  repeat {rw add_mul},
  repeat {rw mul_add},
  repeat {rw mul_one},

  admit,
end 

What do we have after repeatedly expanding out brackets? We must be close to proving the goal, right? Well, not really. Our goal is now


⊢ a * (a * a) + a * (a * b) + (a * (b * a) + a * (b * b)) +
      (b * (a * a) + b * (a * b) + (b * (b * a) + b * (b * b))) =
    a * (a * a) + 3 * (a * a) * b + 3 * a * (b * b) + b * (b * b)

This mess indicates exactly what humans are good at. We have terms a*(a*b) and a*(b*a) and b*(a*a) on the left hand side, and (a*a)*b on the right hand side: humans instantly simplify all of these to a^2b but to Lean these variants are all different, and we’ll have to invoke theorems (associativity and commutativity of multiplication) to prove that they’re equal. Moreover, we have random other brackets all over the place on the left hand side because the terms are being added up in quite a random order. Humans would not even write any of these brackets, they would happily invoke the fact that p+q+r+s+t+u+v+w=u+(t+p)+(r+(w+q)+(v+s)) without stopping to think how many times one would have to apply associativity and commutativity rules — these things are “obvious to humans”. Humans would take one look at the terms on the left and right hand sides, immediately remove all the brackets, tidy up, and observe that the result was trivial. But asking general purpose automation tools like Lean’s simp to do this is a bit of a tall order. Perhaps one way to start is to remind Lean that 3=1+1+1 and then expand out those new brackets:


example (R : Type) [comm_ring R] (a b : R) :
  (a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
begin
  unfold pow monoid.pow,
  repeat {rw add_mul},
  repeat {rw mul_add},
  repeat {rw mul_one},
  -- now expand out 3=1+1+1
  show _ = _ + (1+1+1) * _ * b + (1+1+1) * a * _ + _, 
  -- and now more expanding out brackets
  repeat {rw add_mul},
  repeat {rw one_mul},

  admit
end

The goal is now


⊢ a * (a * a) + a * (a * b) + (a * (b * a) + a * (b * b)) +
      (b * (a * a) + b * (a * b) + (b * (b * a) + b * (b * b))) =
    a * (a * a) + (a * a * b + a * a * b + a * a * b) +
      (a * (b * b) + a * (b * b) + a * (b * b)) + b * (b * b)

which is trivial modulo a gazillion applications of associativity and commutativity of addition and multiplication. But even at this stage, simp [mul_comm,mul_assoc] doesn’t finish the job. What to do?

This problem became a real issue for Chris Hughes, an Imperial undergraduate, who was doing a bunch of algebra in Lean and would constantly run into such goals, maybe even with four unknown variables, and would constantly have to break his workflow to deal with these “trivial to human” proofs.

The solution

Of course, computer scientists have thought about this problem, and there are algorithms present in theorem provers which can deal with these kinds of goals no problem. The most recent work I know of is the 2005 paper by Grégoire and Mahboubi, Proving Equalities in a Commutative Ring Done Right in Coq, where they explained how to write an efficient tactic which would solve these sorts of problems. Their algorithm was implemented in Lean by Mario Carneiro as the tactic ring in Lean’s maths library, so we can now write


import tactic.ring 

example (R : Type) [comm_ring R] (a b : R) :
  (a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
by ring

and our goal is solved instantly! Experiments show that the tactic seems to prove any random true ring theory lemma you throw at it. You can look at Mario’s code here (permalink). This is tactic code, so it’s full of intimidating stuff like `meta` and `do` and `expr`, things which an “end-user” such as myself never uses — these keywords are for tactic-writers, not tactic-users.

I don’t know too much about tactic writing, and this is for the most part because I have never tried to write a tactic of my own. I’ve looked at Programming In Lean — people point out that this document is unfinished, but all of the important stuff is there: the “unfinished” part is basically all covered in Theorem Proving In Lean. However, as is usual in this game currently, the examples in Programming In Lean are most definitely of a more computer-sciencey flavour. So this leads us to the question: how should a mathematician learn how to write tactics? One method, which I’ve used to learn several other things in my life, is “supervise a Masters student for their project, and give them the job of writing some tactics for mathematicians”, and indeed I will be trying this method in October. The reason this method is so effective is that it forces me to learn the material myself so I don’t let the student down. In short, I had better learn something about how to write tactics. So how does `tactic.ring` work?

The ring tactic

Looking at the code, I realised that it didn’t look too intimidating. I had looked at Grégoire–Mahboubi before, several months ago, and at that time (when I knew far far less about Lean) I could not see the wood for the trees. But when I revisited the paper this week it made a whole lot more sense to me. The main point of the Grégoire–Mahboubi paper is to explain how to write an efficient ring tactic, but clearly I am not at that stage yet — I just want to know how to write any ring tactic! I decided to start small, and to try and write a tactic that could prove basic stuff like (d+1)^2=d^2+2d+1 for d\in\mathbb{Z} automatically, or at least to try and understand what went into writing such a tactic. And actually, it turns out to be relatively easy, especially if someone like Mario shows up on Zulip and writes some meta code for you.

Here is the basic idea. If d were actually an explicit fixed int, then proving (d+1)^2=d^2+2*d+1 would be easy — you just evaluate both sides and check that they’re equal. The problem is that d is an unknown integer, so we cannot evaluate both sides explicitly. However, we are lucky in that the reason the equality above is true is because it can be deduced from the axioms of a ring, even if d is considered to be an unknown variable. More formally, I am saying that in the polynomial ring \mathbb{Z}[X] we have the identity (X+1)^2=X^2+2X+1, and if we can prove this, then we can evaluate both sides at X=d and deduce the result we need.

This gives us our strategy for proving (d+1)^2=d^2+2*d+1 using a tactic:

1. Make a type representing a polynomial ring \mathbb{Z}[X] in one variable in Lean.

2. Prove that the “evaluation at d” map ev(d) : \mathbb{Z}[X] \to \mathbb{Z} is a ring homomorphism.

3. Now given the input (d+1)^2=d^2+2*d+1, construct the equation (X+1)^2=X^2+2*X+1 as an equality in \mathbb{Z}[X], prove this, and deduce the result by evaluating at X=d.

This is the basic idea. There are all sorts of subtleties in turning this into an efficient implementation and I will not go into these now; I just want to talk about making this sort of thing work at all. So what about these three points above? The first two are the sort of thing I now find pretty straightforward in Lean. On the other hand, the third point is, I believe, perhaps impossible to do in Lean without using tactics. The int (d+1)^2 was built using addition and the power map, but we now want to write some sort of function which says “if the last thing you did to get this int was to square something, then square some corresponding polynomial” and so on — one needs more than the int (d+1)^2 — one actually needs access to the underlying expr. There, I said it. And expr is a weird type — if you try definition expr_id : expr → expr := id, Lean produces the error invalid definition, it uses untrusted declaration 'expr'. We have to go into meta-land to access the expr — the land of untrusted code.

Writing a baby ring tactic

Two thirds of this is now within our grasp (the first two points above) and we will borrow Mario for the third one. The approach we use is called “reflection”. Every operator comes with a cost, so (out of laziness) I will stick to + and * and not implement ^ or -. It’s also far easier to stick to one unknown variable, so instead of tackling (a+b)^3 I will stick to (x+1)^3 as our test case. The full code is here (permalink).

I should also heartily thank Mario Carneiro, who wrote all the meta code.

We start by creating a concrete polynomial type, represented as a list of coefficients. We use znum (which is just a more efficient version of int when it comes to calculations rather than proofs) for our coefficients, because it’s best practice.


def poly := list znum

For example [1,2,3] represents 1+2X+3X^2. We define addition


def poly.add : poly → poly → poly
| [] g := g
| f [] := f
| (a :: f') (b :: g') := (a + b) :: poly.add f' g'

and scalar multiplication, multiplication, constants, and our variable X.
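These are not spelt out in the post, so here is a sketch of how they might go; the actual definitions in the linked repo may differ in the details:


def poly.smul (z : znum) : poly → poly := list.map (λ a, z * a)

-- (a + X * f') * g = a * g + X * (f' * g)
def poly.mul : poly → poly → poly
| []        g := []
| (a :: f') g := poly.add (poly.smul a g) (0 :: poly.mul f' g)

def poly.const (z : znum) : poly := [z]

def poly.X : poly := [0, 1]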

We have the wrong notion of equality: the polynomials [1,2,3] and [1,2,3,0] represent the same polynomial. Rather than defining an equivalence relation, we just define poly.is_eq, which says "the lists are equal once you throw away the zeros on the end"; when I say two polynomials "are equal" below, I mean they're equal modulo this equivalence relation.
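One possible implementation (illustrative only; the repo's actual code may differ) strips the trailing zeros and then compares on the nose:


-- remove trailing zero coefficients
def poly.trim : poly → poly
| []        := []
| (a :: f') :=
  match poly.trim f' with
  | [] := if a = 0 then [] else [a]
  | g  := a :: g
  end

def poly.is_eq (f g : poly) : bool := to_bool (poly.trim f = poly.trim g)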

We define evaluation of a polynomial at an element of a ring:


def poly.eval {α} [comm_ring α] (X : α) : poly → α
| [] := 0
| (n::l) := n + X * poly.eval l

and then have to prove all the lemmas saying that polynomial evaluation respects equality (in the sense above) and addition and multiplication. For example


@[simp] theorem poly.eval_mul {α} [comm_ring α] (X : α) : ∀ p₁ p₂ : poly,
  (p₁.mul p₂).eval X = p₁.eval X * p₂.eval X := ...

We are now 3/4 of the way through the code, everything has been relatively straightforward, and there hasn't even been any meta stuff. Now it starts. We make an "abstract" polynomial ring (with a hint that we're going to be using reflection later on in the attribute).


@[derive has_reflect]
inductive ring_expr : Type
| add : ring_expr → ring_expr → ring_expr
| mul : ring_expr → ring_expr → ring_expr
| const : znum → ring_expr
| X : ring_expr

The problem with this abstract ring is that (X+1)*(X+1) and X*X+2*X+1 are different terms, whereas they are both the same list [1,2,1] in the concrete polynomial ring. We define an evaluation from the abstract to the concrete polynomial ring:


def to_poly : ring_expr → poly
| (ring_expr.add e₁ e₂) := (to_poly e₁).add (to_poly e₂)
| (ring_expr.mul e₁ e₂) := (to_poly e₁).mul (to_poly e₂)
| (ring_expr.const z) := poly.const z
| ring_expr.X := poly.X

and then define evaluation of a polynomial in the abstract ring and prove it agrees with evaluation on the concrete ring. This is all relatively straightforward (and probably a good exercise for learners!).
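Concretely, evaluation on the abstract ring and the lemma comparing it with the concrete evaluation might look like this (a sketch, with the proof elided just as in the snippet above):


def ring_expr.eval {α} [comm_ring α] (X : α) : ring_expr → α
| (ring_expr.add e₁ e₂) := ring_expr.eval e₁ + ring_expr.eval e₂
| (ring_expr.mul e₁ e₂) := ring_expr.eval e₁ * ring_expr.eval e₂
| (ring_expr.const z)   := (z : α)
| ring_expr.X           := X

theorem to_poly_eval {α} [comm_ring α] (X : α) (e : ring_expr) :
  (to_poly e).eval X = e.eval X := ...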

The big theorem: if the concrete polynomials are equal, then the abstract ones, evaluated at the same value, return the same value.


theorem main_thm {α} [comm_ring α] (X : α) (e₁ e₂) {x₁ x₂}
  (H : poly.is_eq (to_poly e₁) (to_poly e₂)) (R1 : e₁.eval X = x₁) (R2 : e₂.eval X = x₂) : x₁ = x₂ :=
by rw [← R1, ← R2, ← to_poly_eval,poly.eval_is_eq X H, to_poly_eval]

We are nearly there, and the only intimidating line of code so far was the @[derive has_reflect] line before the abstract definition. But now we go up one more level of abstraction: we define a reflection map, sending an expr (like (d+1)^2) to an abstract polynomial (like (X+1)^2). This is precisely the part we cannot do unless we're in meta-land. Note that this code will fail if the expr does not represent a polynomial, but that's OK because it's a meta definition.


meta def reflect_expr (X : expr) : expr → option ring_expr
| `(%%e₁ + %%e₂) := do
  p₁ ← reflect_expr e₁,
  p₂ ← reflect_expr e₂,
  return (ring_expr.add p₁ p₂)
| `(%%e₁ * %%e₂) := do
  p₁ ← reflect_expr e₁,
  p₂ ← reflect_expr e₂,
  return (ring_expr.mul p₁ p₂)
| e := if e = X then return ring_expr.X else
  do n ← expr.to_int e,
     return (ring_expr.const (znum.of_int' n))

After those few lines of meta, we are actually ready to define the tactic!


meta def ring_tac (X : pexpr) : tactic unit := do
  X ← to_expr X,
  `(%%x₁ = %%x₂) ← target,
  r₁ ← reflect_expr X x₁,
  r₂ ← reflect_expr X x₂,
  let e₁ : expr := reflect r₁,
  let e₂ : expr := reflect r₂,
  `[refine main_thm %%X %%e₁ %%e₂ rfl _ _],
  all_goals `[simp only [ring_expr.eval,
    znum.cast_pos, znum.cast_neg, znum.cast_zero',
    pos_num.cast_bit0, pos_num.cast_bit1,
    pos_num.cast_one']]

The code takes in a variable x, then assumes that the goal is of the form f(x)=g(x), attempts to build two terms f(X) and g(X) in the abstract ring, proves the corresponding lists of coefficients in the concrete ring are equal using rfl, and deduces that the abstract polynomials evaluated at x are equal. If something doesn't work, it fails, and in practice this just means the tactic fails.

And it works!


example (x : ℤ) : (x + 1) * (x + 1) = x*x+2*x+1 := by do ring_tac ```(x)

example (x : ℤ) : (x + 1) * (x + 1) * (x + 1) = x*x*x+3*x*x+3*x+1 := by do ring_tac ```(x) 

example (x : ℤ) : (x + 1) + ((-1)*x + 1) = 2 := by do ring_tac ```(x) 

The last example would fail if we hadn't used the correct version of equality for the lists, because [2,0] and [2] are different lists -- but the corresponding polynomials are equal.

And that's how to write a ring tactic! Many thanks to Mario Carneiro for doing much of the planning and also writing the meta code.


Tonbridge

Going down to Tonbridge to speak to the UK IMO squad. I will give one talk about why 144 is the largest square in the Fibonacci sequence, and then one talk about why 144 is the largest square in the Fibonacci sequence.

For those attending the talk, here are some quick links.

A link to the Fibonacci project is above.

The project is written in a language called Lean. You can try Lean live within your browser at this link — this is the Lean Web Editor and it’s really slow, but it’s good enough to start experimenting with.

Here’s a link to Lean’s home page. You can find documentation on how to install Lean on a computer there, as well as the great instruction manual Theorem Proving In Lean, which taught me loads.

There are also instructions about how to install Lean and mathlib on a computer here.

Here’s a link to Xena, the project run by myself and the undergraduates at Imperial College London.

Here’s a link to mathlib, the maths library we’re building.

And last but by no means least, here’s the zulip chat room where experts ask and answer questions about mathematics in Lean.
