## The problem

This post is about the equation $(a+b)^3=a^3+3a^2b+3ab^2+b^3$. This special case of the binomial theorem is the sort of thing that we would expect incoming students at my university to know, or at least to be able to compute very quickly. But what actually goes into an axiomatic proof? Whether you think of $a$ and $b$ as real numbers or as elements of an arbitrary commutative ring, we humans are very good at multiplying out the brackets on the left hand side and then collecting up terms and putting them together to form the right hand side.

One reason humans are so good at this is that we have some really good notation available to us. This becomes apparent when we try and verify $(a+b)^3=a^3+3a^2b+3ab^2+b^3$ using Lean. The ring axiom left_distrib says a * (b + c) = a * b + a * c, and this axiom has a second, more user-friendly, name of mul_add. There is also add_mul for expanding out (a + b) * c. Putting these together we can expand out all the brackets on the left hand side. Well, first we have to expand out the ^3 notation, and because Lean defines ^ recursively with x^0=1 and x^(n+1)=x*x^n we see that x^3 expands out to x*(x*(x*1)). Fortunately mul_one is the theorem saying x*1=x so let’s use mul_one and mul_add and add_mul to expand out our brackets.
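To see that unfolding concretely before we run it in Lean, here is a toy illustration in Python (a sketch only: it mimics the recursion x^0 = 1, x^(n+1) = x * x^n as strings; `unfold_pow` is an invented name, not Lean code):

```python
# Toy model of Lean's recursive power notation: x^0 = 1, x^(n+1) = x * x^n.
# (Illustration only -- this mimics how Lean unfolds `^`, it is not Lean code.)
def unfold_pow(x: str, n: int) -> str:
    if n == 0:
        return "1"
    rest = unfold_pow(x, n - 1)
    # parenthesise the recursive call unless it is the single token "1"
    return f"{x} * {rest}" if rest == "1" else f"{x} * ({rest})"

print(unfold_pow("x", 3))  # x * (x * (x * 1))
```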


import algebra.group_power -- for power notation in rings

example (R : Type) [comm_ring R] (a b : R) :
(a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
begin
unfold pow monoid.pow, -- LHS now (a + b) * ((a + b) * ((a + b) * 1))
repeat {rw mul_one},

end


What do we have after repeatedly expanding out brackets? We must be close to proving the goal, right? Well, not really. Our goal is now


⊢ a * (a * a) + a * (a * b) + (a * (b * a) + a * (b * b)) +
(b * (a * a) + b * (a * b) + (b * (b * a) + b * (b * b))) =
a * (a * a) + 3 * (a * a) * b + 3 * a * (b * b) + b * (b * b)


This mess indicates exactly what humans are good at. We have terms a*(a*b) and a*(b*a) and b*(a*a) on the left hand side, and (a*a)*b on the right hand side: humans instantly simplify all of these to $a^2b$ but to Lean these variants are all different, and we’ll have to invoke theorems (associativity and commutativity of multiplication) to prove that they’re equal. Moreover, we have random other brackets all over the place on the left hand side because the terms are being added up in quite a random order. Humans would not even write any of these brackets, they would happily invoke the fact that p+q+r+s+t+u+v+w=u+(t+p)+(r+(w+q)+(v+s)) without stopping to think how many times one would have to apply associativity and commutativity rules — these things are “obvious to humans”. Humans would take one look at the terms on the left and right hand sides, immediately remove all the brackets, tidy up, and observe that the result was trivial. But asking general purpose automation tools like Lean’s simp to do this is a bit of a tall order. Perhaps one way to start is to remind Lean that 3=1+1+1 and then expand out those new brackets:


example (R : Type) [comm_ring R] (a b : R) :
(a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
begin
unfold pow monoid.pow,
repeat {rw mul_one},
-- now expand out 3=1+1+1
show _ = _ + (1+1+1) * _ * b + (1+1+1) * a * _ + _,
-- and now more expanding out brackets
repeat {rw one_mul},

end


The goal is now ⊢ a * (a * a) + a * (a * b) + (a * (b * a) + a * (b * b)) + (b * (a * a) + b * (a * b) + (b * (b * a) + b * (b * b))) = a * (a * a) + (a * a * b + a * a * b + a * a * b) + (a * (b * b) + a * (b * b) + a * (b * b)) + b * (b * b), which is trivial modulo a gazillion applications of associativity and commutativity of addition and multiplication. But even at this stage, simp [mul_comm,mul_assoc] doesn’t finish the job. What to do?

This problem became a real issue for Chris Hughes, an Imperial undergraduate, who was doing a bunch of algebra in Lean and would constantly run into such goals, maybe even with four unknown variables, and would constantly have to break his workflow to deal with these “trivial to human” proofs.

## The solution

Of course, computer scientists have thought about this problem, and there are algorithms present in theorem provers which can deal with these kinds of goals no problem. The most recent work I know of is the 2005 paper by Grégoire and Mahboubi, Proving Equalities in a Commutative Ring Done Right in Coq, where they explained how to write an efficient tactic which would solve these sorts of problems. Their algorithm was implemented in Lean by Mario Carneiro as the tactic ring in Lean’s maths library, so we can now write


import tactic.ring

example (R : Type) [comm_ring R] (a b : R) :
(a+b)^3 = a^3+3*a^2*b+3*a*b^2+b^3 :=
by ring


and our goal is solved instantly! Experiments show that the tactic seems to prove any random true ring theory lemma you throw at it. You can look at Mario’s code here (permalink). This is tactic code, so it’s full of intimidating stuff like meta and do and expr, things which an “end-user” such as myself never uses — these keywords are for tactic-writers, not tactic-users.

I don’t know too much about tactic writing, and this is for the most part because I have never tried to write a tactic of my own. I’ve looked at Programming In Lean — people point out that this document is unfinished, but all of the important stuff is there: the “unfinished” part is basically all covered in Theorem Proving In Lean. However, as is usual in this game currently, the examples in Programming In Lean are most definitely of a more computer-sciencey flavour. So this leads us to the question: how should a mathematician learn how to write tactics? One method, which I’ve used to learn several other things in my life, is “supervise a Masters student for their project, and give them the job of writing some tactics for mathematicians”, and indeed I will be trying this method in October. The reason this method is so effective is that it forces me to learn the material myself so I don’t let the student down. In short, I had better learn something about how to write tactics. So how does tactic.ring work?

## The ring tactic

Looking at the code, I realised that it didn’t look too intimidating. I had looked at Grégoire–Mahboubi before, several months ago, and at that time (when I knew far far less about Lean) I could not see the wood for the trees. But when I revisited the paper this week it made a whole lot more sense to me. The main point of the Grégoire–Mahboubi paper is to explain how to write an efficient ring tactic, but clearly I am not at that stage yet — I just want to know how to write any ring tactic! I decided to start small, and to try and write a tactic that could prove basic stuff like $(d+1)^2=d^2+2d+1$ for $d\in\mathbb{Z}$ automatically, or at least to try and understand what went into writing such a tactic. And actually, it turns out to be relatively easy, especially if someone like Mario shows up on Zulip and writes some meta code for you.

Here is the basic idea. If d were actually an explicit fixed int, then proving (d+1)^2=d^2+2*d+1 would be easy — you just evaluate both sides and check that they’re equal. The problem is that d is an unknown integer, so we cannot evaluate both sides explicitly. However, we are lucky in that the reason the equality above is true is because it can be deduced from the axioms of a ring, even if d is considered to be an unknown variable. More formally, I am saying that in the polynomial ring $\mathbb{Z}[X]$ we have the identity $(X+1)^2=X^2+2X+1$, and if we can prove this, then we can evaluate both sides at $X=d$ and deduce the result we need.

This gives us our strategy for proving (d+1)^2=d^2+2*d+1 using a tactic:

1. Make a type representing a polynomial ring $\mathbb{Z}[X]$ in one variable in Lean.

2. Prove that the “evaluation at $d$” map $ev(d) : \mathbb{Z}[X] \to \mathbb{Z}$ is a ring homomorphism.

3. Now given the input (d+1)^2=d^2+2*d+1, construct the equation (X+1)^2=X^2+2*X+1 as an equality in int[X], prove this, and deduce the result by evaluating at d.

This is the basic idea. There are all sorts of subtleties in turning this into an efficient implementation and I will not go into these now; I just want to talk about making this sort of thing work at all. So what about these three points above? The first two are the sort of thing I now find pretty straightforward in Lean. On the other hand, the third point is, I believe, perhaps impossible to do in Lean without using tactics. The int (d+1)^2 was built using addition and the power map, but we now want to write some sort of function which says “if the last thing you did to get this int was to square something, then square some corresponding polynomial” and so on — one needs more than the int (d+1)^2 — one actually needs access to the underlying expr. There, I said it. And expr is a weird type — if you try definition expr_id : expr → expr := id, Lean produces the error invalid definition, it uses untrusted declaration 'expr'. We have to go into meta-land to access the expr — the land of untrusted code.

## Writing a baby ring tactic

Two thirds of this is now within our grasp (the first two points above) and we will borrow Mario for the third one. The approach we use is called “reflection”. Every operator comes with a cost, so (out of laziness) I will stick to + and * and not implement ^ or -. It’s also far easier to stick to one unknown variable, so instead of tackling $(a+b)^3$ I will stick to $(x+1)^3$ as our test case. The full code is here (permalink).

I should also heartily thank Mario Carneiro, who wrote all the meta code.

We start by creating a concrete polynomial type, represented as a list of coefficients. We use znum (which is just a more efficient version of int when it comes to calculations rather than proofs) for our coefficients, because the tactic is going to have to do genuine computations with them.


def poly := list znum

 

For example [1,2,3] represents $1+2X+3X^2$. We define addition


def poly.add : poly → poly → poly
| [] g := g
| f [] := f
| (a :: f') (b :: g') := (a + b) :: poly.add f' g'

 

and scalar multiplication, multiplication, constants, and our variable X.

We have the wrong notion of equality: the polynomials [1,2,3] and [1,2,3,0] represent the same polynomial. Rather than defining an equivalence relation, we just define poly.is_eq, defined as "the lists are equal when you throw away the zeros on the end", and when I say two polynomials "are equal" I mean they're equal modulo this equivalence relation.

We define evaluation of a polynomial at an element of a ring:


def poly.eval {α} [comm_ring α] (X : α) : poly → α
| [] := 0
| (n::l) := n + X * poly.eval l


and then have to prove all the lemmas saying that polynomial evaluation respects equality (in the sense above) and addition and multiplication. For example


@[simp] theorem poly.eval_mul {α} [comm_ring α] (X : α) : ∀ p₁ p₂ : poly,
(p₁.mul p₂).eval X = p₁.eval X * p₂.eval X := ...
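Before the meta part starts, it may help to see the concrete polynomial arithmetic prototyped outside Lean. Here is a hedged Python sketch (the names `padd`, `pmul`, `peval` are mine, not from the Lean file) mirroring poly.add, multiplication, and poly.eval, together with a numerical spot-check of the eval_mul property:

```python
# Polynomials as lists of coefficients: [1, 2, 3] represents 1 + 2X + 3X^2.
# (Python sketch of the Lean definitions; helper names are invented here.)

def padd(f, g):
    # pad the shorter list with zeros, then add coefficientwise
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def pmul(f, g):
    # convolution of coefficient lists
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def peval(p, x):
    # mirrors poly.eval: [] -> 0, (n :: l) -> n + x * eval l
    if not p:
        return 0
    return p[0] + x * peval(p[1:], x)

p, q = [1, 1], [1, 2, 3]                 # 1 + X and 1 + 2X + 3X^2
assert pmul(p, p) == [1, 2, 1]           # (1 + X)^2 = 1 + 2X + X^2
assert peval(pmul(p, q), 5) == peval(p, 5) * peval(q, 5)  # eval respects mul
```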


We are now 3/4 of the way through the code, everything has been relatively straightforward, and there hasn't even been any meta stuff. Now it starts. We make an "abstract" polynomial ring (the attribute is a hint that we're going to be using reflection later on).


@[derive has_reflect]
inductive ring_expr : Type
| add : ring_expr → ring_expr → ring_expr
| mul : ring_expr → ring_expr → ring_expr
| const : znum → ring_expr
| X : ring_expr


The problem with this abstract ring is that $(X+1)*(X+1)$ and $X*X+2*X+1$ are different terms, whereas they are both the same list [1,2,1] in the concrete polynomial ring. We define an evaluation from the abstract to the concrete polynomial ring:


def to_poly : ring_expr → poly
| (ring_expr.add e₁ e₂) := (to_poly e₁).add (to_poly e₂)
| (ring_expr.mul e₁ e₂) := (to_poly e₁).mul (to_poly e₂)
| (ring_expr.const z) := poly.const z
| ring_expr.X := poly.X


and then define evaluation of a polynomial in the abstract ring and prove it agrees with evaluation on the concrete ring. This is all relatively straightforward (and probably a good exercise for learners!).

The big theorem: if the concrete polynomials attached to two abstract expressions are equal, then the two abstract expressions, evaluated at the same point, return the same value.


theorem main_thm {α} [comm_ring α] (X : α) (e₁ e₂) {x₁ x₂}
(H : poly.is_eq (to_poly e₁) (to_poly e₂)) (R1 : e₁.eval X = x₁) (R2 : e₂.eval X = x₂) : x₁ = x₂ :=
by rw [← R1, ← R2, ← to_poly_eval,poly.eval_is_eq X H, to_poly_eval]



We are nearly there, and the only intimidating line of code so far was the @[derive has_reflect] line before the abstract definition. But now we go one stage more abstract: we now define a reflection map, sending an expr (like (d+1)^2) to an abstract polynomial (like (X+1)^2). This is precisely the part we cannot do unless we're in meta-land. Note that this code will fail if the expr does not represent a polynomial, but that's OK because it's a meta definition.


meta def reflect_expr (X : expr) : expr → option ring_expr
| `(%%e₁ + %%e₂) := do
p₁ ← reflect_expr e₁,
p₂ ← reflect_expr e₂,
return (ring_expr.add p₁ p₂)
| `(%%e₁ * %%e₂) := do
p₁ ← reflect_expr e₁,
p₂ ← reflect_expr e₂,
return (ring_expr.mul p₁ p₂)
| e := if e = X then return ring_expr.X else
do n ← expr.to_int e,
return (ring_expr.const (znum.of_int' n))


After those few lines of meta, we are actually ready to define the tactic!


meta def ring_tac (X : pexpr) : tactic unit := do
X ← to_expr X,
`(%%x₁ = %%x₂) ← target,
r₁ ← reflect_expr X x₁,
r₂ ← reflect_expr X x₂,
let e₁ : expr := reflect r₁,
let e₂ : expr := reflect r₂,
`[refine main_thm %%X %%e₁ %%e₂ rfl _ _],
all_goals `[simp only [ring_expr.eval,
znum.cast_pos, znum.cast_neg, znum.cast_zero',
pos_num.cast_bit0, pos_num.cast_bit1,
pos_num.cast_one']]


The code takes in a variable x, then assumes that the goal is of the form f(x)=g(x), attempts to build two terms f(X) and g(X) in the abstract ring, proves the corresponding lists of coefficients in the concrete ring are equal using rfl, and deduces that the abstract polynomials evaluated at x are equal. If something doesn't work, it fails, and in practice this just means the tactic fails.

And it works!


example (x : ℤ) : (x + 1) * (x + 1) = x*x+2*x+1 := by do ring_tac ``(x)

example (x : ℤ) : (x + 1) * (x + 1) * (x + 1) = x*x*x+3*x*x+3*x+1 := by do ring_tac ``(x)

example (x : ℤ) : (x + 1) + ((-1)*x + 1) = 2 := by do ring_tac ``(x)


The last example would fail if we hadn't used the correct version of equality for the lists, because [2,0] and [2] are different lists -- but the corresponding polynomials are equal.
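The whole pipeline, including that last subtlety about trailing zeros, can be mimicked in miniature. Here is a hedged Python sketch (all names invented; nested tuples stand in for ring_expr, coefficient lists for poly): build an expression tree for each side, normalize both to coefficient lists, and compare modulo trailing zeros.

```python
# Toy version of the reflection pipeline.
# An expression is ("add", e1, e2) | ("mul", e1, e2) | ("const", n) | "X".

def to_poly(e):
    # normalize an expression tree to a coefficient list
    if e == "X":
        return [0, 1]
    op = e[0]
    if op == "const":
        return [e[1]]
    p1, p2 = to_poly(e[1]), to_poly(e[2])
    if op == "add":
        n = max(len(p1), len(p2))
        p1, p2 = p1 + [0] * (n - len(p1)), p2 + [0] * (n - len(p2))
        return [a + b for a, b in zip(p1, p2)]
    # op == "mul": convolution of coefficient lists
    out = [0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out

def is_eq(p1, p2):
    # equality of polynomials: ignore trailing zero coefficients
    strip = lambda p: p[:max((i + 1 for i, c in enumerate(p) if c != 0), default=0)]
    return strip(p1) == strip(p2)

X, one, two = "X", ("const", 1), ("const", 2)
lhs = ("mul", ("add", X, one), ("add", X, one))               # (x+1)*(x+1)
rhs = ("add", ("add", ("mul", X, X), ("mul", two, X)), one)   # x*x + 2*x + 1
assert to_poly(lhs) == [1, 2, 1] and is_eq(to_poly(lhs), to_poly(rhs))
assert is_eq([2, 0], [2])   # the trailing-zero subtlety from the last example
```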

And that's how to write a ring tactic! Many thanks to Mario Carneiro for doing much of the planning and also writing the meta code.

## Tonbridge

Going down to Tonbridge to speak to the UK IMO squad. I will give one talk about why 144 is the largest square in the Fibonacci sequence, and then one talk about why 144 is the largest square in the Fibonacci sequence.

For those attending the talk, here are some quick links.

A link to the Fibonacci project is above.

The project is written in a language called Lean. You can try Lean live within your browser at this link — this is the Lean Web Editor and it’s really slow, but it’s good enough to start experimenting with.

You can find documentation on how to install Lean on a computer there, as well as the great instruction manual Theorem Proving In Lean, which taught me loads.

There are also instructions about how to install Lean and mathlib on a computer here.

Here’s a link to Xena, the project run by myself and the undergraduates at Imperial College London.

Here’s a link to mathlib, the maths library we’re building.

And last but by no means least, here’s the zulip chat room where experts ask and answer questions about mathematics in Lean.

## Function composition

Here’s some musing about functions, finishing with a cool trick which Sebastian Ullrich showed me. If you don’t know what ((∘) ∘ (∘)) means, read on!

## Lambdas

The “functional programming” way of thinking about functions is not usually taught to mathematicians, but it’s easy to understand. In mathematics we sometimes write $x\mapsto x^2$ to denote the function sending $x$ to $x^2$. Computer scientists prefer “lambda notation”, namely $\lambda x, x^2$, or λ x, x^2. Computer scientists will use lambda notation for multivariable functions too — whereas a mathematician might write $(x,y)\mapsto x+y$ to denote the addition function, in lambda notation this is just λ x y, x + y.

## Currying

Another trick that computer scientists seem to like is the process of “currying” functions. A mathematician might say that addition is a function $\mathbb{R}^2\to\mathbb{R}$, because it takes as input a pair $(x,y)$ of real numbers and outputs one real number, namely their sum. Computer scientists like to think of this function in a slightly different way — they would say that addition was a function ℝ → (ℝ → ℝ). What they mean by this is that the function takes as input some real number $x$ and returns a function from the reals to the reals, namely the function sending $y$ to $x+y$. It’s not difficult to convince yourself that these two ways of looking at the situation are completely equivalent. You could think of that first arrow as the function sending the real number $x$ to the function $y\mapsto x+y$. For example it sends the real number $3$ to the function $y\mapsto 3+y$. Computer scientists even have a cool convention about bracketing of arrows, where they can even write ℝ → ℝ → ℝ, and this means ℝ → (ℝ → ℝ) by definition (even though it looks like a monstrosity to a mathematician).
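Currying is easy to play with in any language with first-class functions. A quick Python illustration (Python does not curry automatically, so we do it by hand):

```python
# Uncurried: one function of a pair, like ℝ² → ℝ.
def add_pair(xy):
    x, y = xy
    return x + y

# Curried: a function returning a function, like ℝ → (ℝ → ℝ).
def add(x):
    return lambda y: x + y

add3 = add(3)   # the function y ↦ 3 + y
assert add3(4) == 7
assert add(3)(4) == add_pair((3, 4))   # the two views agree
```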

## Infix and prefix

Historically the symbols for functions like addition were written in between the numbers you were adding. This is called “infix notation”. When defining new functions, this is usually not the notation chosen. For example if I define a function $f$ sending a pair $(x,y)$ of real numbers to $x^2+y^2$ I would write $f(x,y)=x^2+y^2$, and here the notation $f$ for the function goes before the $x$ and $y$ rather than in the middle. There’s a trick, used in Haskell and, I discovered this week, also in Lean, to change an “infix notation” function like $+$ to a “prefix notation” function like $f$ — you simply put brackets around the operator. So for example (+) x y just means x + y.

Now here’s a cool thing about lambda notation. Let’s go back to the function $x\mapsto 3 + x$, sending our input variable $x$ to $3 + x$. Using lambda notation I could just write

λ x, 3 + x

Using the infix/prefix trick, I could even write

λ x, (+) 3 x.

Writing it this way, you see that this function takes x as an input and then it returns f x, where f is the slightly weird-looking (to a mathematician) function (+) 3. So in fact we could just write this function as

(+) 3

and indeed you can see in Lean that this works:


def f := (+) 3

#eval f 2


prints 5 as the result of the evaluation. Click here to check this using the Lean Web Editor.

## Composition

Function composition is pretty easy to understand. If $X,Y,Z$ are sets and we have functions $f: X \to Y$ and $g : Y \to Z$, then their composition is the function from $X$ to $Z$ sending $x$ to $g(f(x))$. There is notation for this in mathematics — the “\circ” command in LaTeX. We write $h = g \circ f$ and then $h(x)$ is defined to be $g(f(x))$. In Lean one would write


variables {X Y Z : Type} {f : X → Y} {g : Y → Z}

def h : X → Z := g ∘ f

example (x : X) : h x = g (f x) := rfl


The rfl just means “this is true by definition”, and the code compiles, meaning that indeed h x is by definition g (f x). Note that those brackets are necessary — if we wrote g f x then Lean would try to evaluate g at f and would give an error, because f is not an element of X.

We see that ∘ is another example of an infix operator, but in this case the associated prefix operator (∘) has an actual name, namely function.comp, as one can see in Lean by typing #print notation ∘ (to get a ∘ you can type \circ just like in LaTeX).

## The fun part

Removing lambdas from code is kind of cool so I try to use it when I can. Any code which looks like h := λ x, blah_blah_blah x can often be shortened to just h := blah_blah_blah. But for this trick to work the x really must be the last thing written, with no brackets. For example, if I wanted to do this trick with h := λ x, g (f x) I can’t just remove the lambda and the xs, because of that close bracket — I have to use function composition notation and turn the code into h := λ x, (g ∘ f) x before I can do the trick and rewrite as h := (g ∘ f) or even h := g ∘ f.

Recently when writing some Lean code I ran into a version of this situation with more than one variable though, and pulling off the trick was harder. I had sets W X Y Z and functions f : W → X → Y (remember this is just how a computer scientist writes $f : W \times X \to Y$) and g : Y → Z and I wanted to compose them to get a function $(g \circ f) : W \times X \to Z$, except I wanted the curried version W → X → Z, defined in lambda notation as λ w x, g (f w x). Because we’re using the curried version, I couldn’t use function composition! You see, g ∘ f only works if the target of f is the source of g, and here the target of f is the functions X → Y whereas the source of g is Y. How could I remove the lambda?

Sebastian Ullrich pointed out the following extremely devious trick to me though on the Lean chat — the function I want is just ((∘) ∘ (∘)) g f! And indeed he was right:


variables {W X Y Z : Type} {f : W → X → Y} {g : Y → Z}

def h : W → X → Z := ((∘) ∘ (∘)) g f

example (w : W) (x : X) : h w x = g (f w x) := rfl


Try it here using the Lean Web Editor if you, like me, couldn’t initially believe that this monstrosity even made sense.

Can you prove that ((∘) ∘ (∘)) g f does what Lean says it does? If you type #check ((∘) ∘ (∘)) into Lean you’ll see Lean’s take on what it thinks of this funny construction.

Recall that (∘) A B C is the same as (A ∘ B) C, which is the same as A (B C). Recall also that Lean evaluates things from left to right, so (f g) h means the same as f g h. Now see if you can evaluate ((∘) ∘ (∘)) g f w x! Spoiler alert — I give the first move below.
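If you want to experiment outside Lean, the same trick works in Python with hand-curried functions (a sketch; `comp` is my name for the curried (∘)):

```python
# Curried composition: comp(g)(f) = g ∘ f, i.e. x ↦ g(f(x)).
comp = lambda g: lambda f: lambda x: g(f(x))

# The mystery function ((∘) ∘ (∘)) is comp(comp)(comp) in curried Python.
comp2 = comp(comp)(comp)

g = lambda y: y + 1
f = lambda w: lambda x: w * x   # a curried two-variable function

# comp2(g)(f)(w)(x) should be g(f(w)(x)):
assert comp2(g)(f)(3)(4) == g(f(3)(4)) == 13
```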

## A hint for the solution

Here’s the first move — ((∘) ∘ (∘)) g f w x = (∘) ((∘) g) f w x.

## Group Theory revision

To all Imperial students — welcome back and good luck in your exams.

I was looking at the axioms for a group in Lean today. If you fire up Lean and type #print group (e.g. by clicking here and then waiting for a few seconds or more until it stops saying “(running…)” at the top), you can see what Lean thinks a group is. There’s the usual notation for a group (it has multiplication a * b, and identity 1 and inverses a⁻¹), and then some axioms:


mul_assoc    : ∀ (a b c : G), a * b * c = a * (b * c)
one_mul      : ∀ (a : G), 1 * a = a
mul_one      : ∀ (a : G), a * 1 = a
mul_left_inv : ∀ (a : G), a⁻¹ * a = 1


Now I was lucky enough to be taught group theory by the great J G Thompson, and one of the things he taught me was that you don’t need four axioms to do groups, you can get away with three. I thought this would be fun to check in Lean! Before you read the next couple of paragraphs you might want to spend some time thinking about which of the four axioms you think is implied by the others, if you don’t know already. Feel free to delete one axiom and then try to reprove it from the others. I should perhaps also add that Lean’s group has four axioms because of an artificial consequence of the way it was built from other structures, and it doesn’t matter at all that this fourth axiom is there.

Before I go on, let me tell you an interesting example of something that isn’t a group, which Thompson told us about in his lectures. It’s a set $G$ with more than one element, and equipped with the function $* : G \times G \to G$ defined by $g*h=g$. We can check (either using pen and paper, or with Lean) that this law is not a group law, but if we define $a^{-1}=1$ it does satisfy all but one of the axioms. The axiom it fails to satisfy is one_mul, which must hence be an essential axiom.
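Thompson's non-example is easy to check mechanically. Here is a brute-force Python sketch over a two-element set (the encoding is mine): it confirms that mul_assoc, mul_one and mul_left_inv all hold, while one_mul fails.

```python
# Thompson's non-example: a set with g * h = g.  Check which group axioms hold.
G = [0, 1]
mul = lambda g, h: g    # the law g * h = g
e = 0                   # the element playing the role of "1"
inv = lambda g: e       # define g⁻¹ = 1

mul_assoc    = all(mul(mul(a, b), c) == mul(a, mul(b, c))
                   for a in G for b in G for c in G)
one_mul      = all(mul(e, a) == a for a in G)
mul_one      = all(mul(a, e) == a for a in G)
mul_left_inv = all(mul(inv(a), a) == e for a in G)

assert mul_assoc and mul_one and mul_left_inv
assert not one_mul   # the only axiom that fails
```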

The axiom which is implied by the others is actually mul_one. Let me formally explain what I mean by this. Lean already has groups defined and the class of groups is called group. So let me in Lean define a new class group' (note the dash) by removing mul_one and just using the other axioms:


class group' (G : Type) extends has_group_notation G :=
(mul_assoc : ∀ (a b c : G), a * b * c = a * (b * c))
(one_mul : ∀ (a : G), 1 * a = a)
(mul_left_inv : ∀ (a : G), a⁻¹ * a = 1)


and now we can try and prove this:

theorem mul_one : ∀ (a : G), a * 1 = a := sorry

The way I was taught to do this was first by proving mul_left_cancel, the fact that you can cancel on the left when doing multiplication:


lemma mul_left_cancel : ∀ (a b c : G), a * b = a * c → b = c :=
λ (a b c : G) (Habac : a * b = a * c), -- got to deduce b = c.
calc b = 1 * b         : by rw one_mul
... = (a⁻¹ * a) * b : by rw mul_left_inv
... = a⁻¹ * (a * b) : by rw mul_assoc
... = a⁻¹ * (a * c) : by rw Habac
... = (a⁻¹ * a) * c : by rw mul_assoc
... = 1 * c         : by rw mul_left_inv
... = c : by rw one_mul


Using this you can now probably finish the proof yourself:


theorem mul_one : ∀ (a : G), a * 1 = a :=
begin
intro a, -- goal is a * 1 = a
apply mul_left_cancel a⁻¹, -- goal now a⁻¹ * (a * 1) = a⁻¹ * a
exact calc a⁻¹ * (a * 1) = ?       : by rw ?
...               = ?       : by rw ...
...
...               = a⁻¹ * a : by rw ?
end


Just click here to run a version of Lean in your browser where you can try and finish the job yourself. Alternatively download the source code (and the solutions) at Xena’s github site and run it on your own computer — it’s faster.

Are there any example sheet or exam group theory problems which you are interested in seeing being solved in Lean? Let me know in the comments.

## Proofs by induction

Term is over and I have spent the entire day thinking about induction. I was mainly concerned about the fact that induction somehow seemed hard for mathematicians to do in Lean. Even proving that the sum of the first n odd positive integers was n^2 seemed to be complicated. But I now know several ways of doing it, and I want to see if I can guide you to coming up with one yourself. Note that I think these Lean exercises are hard for mathematicians, because unfortunately mathematicians (correctly in my opinion) would regard the statement that the sum of the first $n$ odd positive integers was $n^2$ as “trivial by induction”. Let me try and show you several ways of doing induction in Lean. Not because I want to know several ways to do it, but, unfortunately, because I fear that in practice people really might occasionally find themselves in a situation where they need a new kind of proof by induction, and if they read this then maybe afterwards they’ll understand the exact statements that they need to prove to get induction working.

This entire exercise is available for the Lean Web Editor by the way. It’s also available in the Xena project (in the blog directory) if you want to download it to run in a local copy of Lean (which is highly recommended).

OK so let’s start with the two functions in our induction proof:


def odd : ℕ → ℕ := λ i, 2 * i + 1
def square : ℕ → ℕ := λ i, i ^ 2


You can play around with them, check $odd(2)=5$ by typing #reduce odd 2
and you can check that $1000^2=1000000$ by typing #eval square 1000 (for extra credit: what happens if you type #reduce square 1000? Why?). It’s well-known that the sum of the first $n$ positive odd integers is $n^2$ and it’s well-known that one can prove it by induction, with the key inductive step being that $d^2 + (2d+1)=(d+1)^2$. We can formalize this neatly in Lean:


theorem odd_square_inductive_step (d : ℕ) : square d + odd d = square (succ d)


and one can prove it relatively easily, using the ring tactic, as one can see in the downloadable file.
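If you just want to convince yourself numerically before formalizing anything, here is a quick Python sanity check of both the statement and the inductive step (not part of the Lean development):

```python
odd = lambda i: 2 * i + 1
square = lambda i: i ** 2

# the sum of the first n odd numbers, odd(0) + ... + odd(n-1), equals n^2
for n in range(100):
    assert sum(odd(i) for i in range(n)) == square(n)

# the key inductive step: square d + odd d = square (d + 1)
for d in range(100):
    assert square(d) + odd(d) == square(d + 1)
```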

What is left is the question of how we are going to write the function which takes $n$ as input and writes down the sum $odd(0)+odd(1)+\cdots+odd(n-1)$. Then everything should be straightforward. But there are several ways to define this function!

Exercise 1.

Define a Lean function which takes as input a function f : ℕ → ℕ and returns the function which sends a natural number n to $f(0)+f(1)+f(2)+...+f(n-1)$.


definition my_sum_to_n (summand : ℕ → ℕ) : ℕ → ℕ :=
sorry -- delete this and then fill in your definition here.


Exercise 2.

The function you defined comes with a law of induction, and you now need to prove the zero and successor results for your function.


theorem my_zero_theorem (summand : ℕ → ℕ) : my_sum_to_n summand 0 = 0 := sorry

theorem my_successor_theorem (summand : ℕ → ℕ) (n : ℕ) :
my_sum_to_n summand (nat.succ n) = my_sum_to_n summand n + summand n := sorry


Exercise 3

Now can you prove my_odd_square_theorem, the fact that the sum of the first $n$ odd positive integers is $n^2$?


theorem my_odd_square_theorem (n : ℕ) : my_sum_to_n odd n = square n := sorry


Exercise 4

Finally, find someone who has given a completely different definition of my_sum_to_n (try looking in the comments below, for example) and fill in their definition here:


definition other_sum_to_n (summand : ℕ → ℕ) : ℕ → ℕ := sorry


and now see if you can prove that their definition is the same as yours.


theorem consistency : my_sum_to_n = other_sum_to_n := sorry


[You might want to start
begin apply funext,intro f,apply funext,intro n,…]

I will give some possible definitions for my_sum_to_n in the comments, if nobody else gets there first.


## No confusion over no_confusion

I’ve always been a bit confused about no_confusion. Functions with names like this are occasionally applied in Theorem Proving In Lean but with little explanation (for example in section 10.1 of TPIL they use it twice to prove 2 ≠ 1). Of course you can always try and find out your own explanations, but in the past I’ve struggled trying to get to the bottom of no_confusion_type with no docs and no hints as to what its purpose is. But Chris Hughes and I sat down last night and tried to figure out what was going on, and I think we cracked it. So here is a hopefully unconfused explanation of at least some of the aspects of no_confusion. Before I start on it though, here are some facts about inductive types.

## Free gifts with every inductive type.

I found the explanation of inductive types in TPIL very helpful. But let me flag a few things which didn’t sink in the first time.

Let’s create a really simple inductive type — an enumerated type with two constructors.


inductive mytype
| AA : mytype
| ZZ : mytype


[try it here — lean web editor]

What actually happens when this code is processed? Lean adds a new constant mytype to the environment, which is a fancy way of saying that it now makes sense to talk about things of type mytype. It also adds some other stuff. It adds the constructors mytype.AA and mytype.ZZ — these are the two ways of creating new things of type mytype. And it adds the eliminator mytype.rec. Let’s take a closer look at this eliminator. The output of #check @mytype.rec is


mytype.rec : Π {C : mytype → Sort u_1}, C mytype.AA → C mytype.ZZ → Π (n : mytype), C n

 

which reads as follows: "Say we have a function C from mytype to somewhere. If we know C AA and we know C ZZ, then for all n of type mytype, we know C n."

The point that I want to emphasize here is that these things -- the new type, its constructors and eliminator -- are added into the environment as axioms, or constants or whatever you want to call them. You can't "look at the proof" of mytype.rec in Lean -- it just is. If you try #print mytype.rec  you're just informed that mytype.rec is an eliminator. Take a look at the output, as well as the output of #print mytype and #print mytype.AA to see more examples of new things which have just appeared in your Lean environment.

However there is a whole bunch of other stuff which has also just appeared in the environment. If you type #print prefix mytype you will see, as well as the constructors and the eliminator, about 14 other new functions which have magically appeared, including the dreaded (or formerly dreaded, in my case) mytype.no_confusion, which appears to be a function whose output is of type mytype.no_confusion_type. One thing which perhaps needs to be stressed here though, is that these 14 other functions are not just magic new axioms like the constructors and eliminator -- they have actually been proved or constructed using our new axioms. You can take a look at the proof of mytype.no_confusion with #print mytype.no_confusion. Doing this might well leave you confused, and it's the purpose of this post to try and unconfuse you.

## What does no_confusion do?

The idea behind inductive types is that two instances of an inductive type should be equal if and only if they were ultimately constructed in the same way. What does this actually mean? The constructors and eliminator do not seem to say this explicitly, so what is going on? Certainly (at least in constructive logic) every expression whose type is mytype will, at the end of the day, either equal AA or ZZ. But can we prove that AA ≠ ZZ? This sounds like the sort of thing which should somehow be an axiom, or part of the system in some way. But we've just seen how the system has changed when we created our new type -- we have mytype and AA and ZZ and the eliminator, and nothing else. If we want to prove AA ≠ ZZ then we are going to have to do it from these axioms -- we are going to have to prove it by induction somehow.

Here is quite a crazy proof that AA ≠ ZZ. I do know a simpler one, but the simpler proof doesn't generalise, and this slightly crazy proof I present here will turn out to be an instructive one (or, at least, I found it instructive).

First let's define a function which takes as input two expressions s and t of type mytype, and outputs a Proposition. The Proposition will be true if s and t are equal, and false otherwise. We can easily do this, using the cases tactic, which is in this setting just the tactic version of mytype.rec. Here's the function.


def mytype_equal : mytype → mytype → Prop :=
begin
intros s t,
cases s,
{ -- s = AA
cases t,
{ -- s = AA, t = AA
exact true },
{ -- s = AA, t = ZZ
exact false }
},
{ -- s = ZZ
cases t,
{ -- s = ZZ, t = AA
exact false },
{ -- s = ZZ, t = ZZ
exact true }
}
end


[ One could even write the same code as a function, without using tactic mode, and check that it equals the tactic version:


definition mytype_equal' : mytype → mytype → Prop :=
λ s t, mytype.rec (mytype.rec true false t) (mytype.rec false true t) s

example : mytype_equal = mytype_equal' := rfl


]

Now here's the trick! Let's write a function which takes two terms s and t of type mytype, plus a proof that s = t, and returns a proof of mytype_equal s t. This sounds feasible -- after all, if s = t then we can just rewrite mytype_equal s t into mytype_equal t t which will always evaluate to true, and recall that trivial is a proof of true. So the code looks something like this:


definition mytype_no_confusion (s : mytype) (t : mytype) : s = t → mytype_equal s t :=
begin
intro H, -- H : s = t
rw H, -- goal now mytype_equal t t
cases t, -- goal 1 now mytype_equal AA AA
{ trivial },
-- goal 2 now mytype_equal ZZ ZZ
{ trivial }
end


If we apply this function with s and t taken to be different constructors of mytype, then since mytype_equal s t evaluates to false, we have proved s = t → false, which is exactly s ≠ t!


open mytype -- so we can write AA instead of mytype.AA

theorem unconfused : AA ≠ ZZ := mytype_no_confusion AA ZZ


That's how no_confusion works in this simplest case.
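For comparison, here is a sketch of the same inequality proved with the no_confusion function that Lean itself generated (why this typechecks will become clearer later in the post):


example : mytype.AA ≠ mytype.ZZ := λ h, mytype.no_confusion h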

## The other thing no_confusion does

This is not quite the end of the story, however. For enumerated types this is all we need, but what about types like the natural numbers? Let's roll our own and see what else we need to do to give some rigour to the idea that "two natural numbers are the same if and only if they are constructed in the same way".


inductive xnat
| zero : xnat
| succ : xnat → xnat

 

The reason we think of these as the natural numbers is that it's not hard to convince oneself that the only ways to build instances of this type are zero, succ zero, succ (succ zero), succ (succ (succ zero)) and so on. Again, for these to be a good model of the natural numbers it's crucial that all of these things are distinct.

Exercise. Mimic the mytype proof and write code for xnat_equal, xnat_no_confusion and then prove ∀ n : xnat, zero ≠ succ n.

But there's something else. We also need to prove  ∀ m n : xnat, succ m = succ n → m = n and our current set-up doesn't let us do this. The problem is that xnat_equal (succ m) (succ n) is true and we've lost any information about m and n by this point. You may have noticed when writing xnat_equal that we threw away s and t when defining xnat_equal (succ s) (succ t). This was our error -- we need instead to return a Prop which depends on s and t. So how about this?



def xnat_no_confusion_prop (s t : xnat) : Prop :=
begin
cases s with m,
cases t with n,
exact true, -- s = t = zero
exact false, -- s = zero, t = succ n
cases t with n,
exact false, -- s = succ m, t = zero
exact (m = n), -- new idea when s = succ m, t = succ n!
end


Now we can still prove


def xnat_no_confusion' (s t : xnat) (H : s = t) : xnat_no_confusion_prop s t :=
begin
rw H,
cases t with n,
exact trivial, -- t = 0
show (n=n), -- t = succ n
refl,
end


but this time we have enough left in the tank to get the injectivity of succ:


example (m n : xnat) : succ m = succ n → m = n := λ H, xnat_no_confusion' _ _ H


[this is just term mode for intro H, exact xnat_no_confusion' _ _ H.]
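As a sketch, the no_confusion function which Lean generated gives injectivity too: taking P to be m = n, the hypothesis (m = n) → P that no_confusion demands is just the identity function, so we can feed it id:


example (m n : xnat) : succ m = succ n → m = n := λ H, xnat.no_confusion H id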

## The whole truth

We used Props like m = n to pull off this trick. As you can imagine, in more complicated situations we might need some more general types. In fact the actual no_confusion_type and no_confusion functions built by Lean when the type is initialised look like the below. They take as input an arbitrary P : Sort* and return types instead of the Props I used above:


variables m n : xnat
variable P : Sort*

#reduce xnat.no_confusion_type P zero zero -- P → P
#reduce xnat.no_confusion_type P (succ m) zero -- P
#reduce xnat.no_confusion_type P zero (succ n) -- P
#reduce xnat.no_confusion_type P (succ m) (succ n) -- ((m = n) → P) → P


I would imagine that this gives more flexibility in some cases; at this point I do not have a clear idea of why this greater flexibility is needed.
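One concrete payoff, at least, is that both disjointness and injectivity drop out of the one function. For example (a sketch), taking P to be false turns xnat.no_confusion_type false zero (succ n) into false, so applying no_confusion to a proof of zero = succ n immediately gives a contradiction:


example (n : xnat) : zero ≠ succ n := λ h, xnat.no_confusion h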

[Finally let me comment that an easier proof of injectivity of nat.succ is just to write a one-sided inverse pred. This was the idea behind the simpler proof I was alluding to earlier. But this idea doesn't work in general, and it's an interesting exercise to figure out why!]

## Mini-tutorial : Currying.

A couple of mathematics undergraduates have asked me about this recently. Currying is something which is important in computer science, but is based on a fact which is trivial mathematically.

As it’s not hard to see, → is not associative. In other words, if P, Q and R are propositions, then the truth values of P → (Q → R) and (P → Q) → R might not be the same (for example, if all of P, Q and R were false). But computer scientists like writing P → Q → R because it looks terrifically clever and exciting, so we have to decide which one it means.

Convention: P → Q → R means P → (Q → R).

If we know P → (Q → R), how might we prove R? Well we could first prove P, which enables us to deduce Q → R, and then we could prove Q as well, which lets us deduce R. So in fact maybe P → (Q → R) is the same as (P ∧ Q) → R, which we can write as P ∧ Q → R because of some version of BIDMAS on steroids which computer scientists invented.

Let’s test our hypothesis in Lean.


example (P Q R : Prop) : (P ∧ Q → R) ↔ (P → (Q → R)) :=
begin
cases classical.em P with hP hP;
cases classical.em Q with hQ hQ;
cases classical.em R with hR hR;
simp [hP,hQ,hR]
end


This rather cheeky piece of code says “split into two cases P true and P false, then (because of the semicolon) split each of those two cases into two cases Q true and Q false, then split again on R, and then just simplify everything in each case”. There is also a constructive way to prove it — why not try? You might need to know the tactics split, intros, cases, exact and assumption. Here is a direct link to the lean web editor where you can try.

If you do this exercise, you might begin to understand why someone working in a functional language would choose to write P → Q → R rather than P ∧ Q → R — it saves you from having to do cases on the and — you can just use intro.
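To see the convenience in term mode, compare the following two sketches: with the curried hypothesis, using it is just function application, whereas the uncurried version makes you package up a proof of the and first.


example (P Q R : Prop) (h : P → Q → R) (hP : P) (hQ : Q) : R := h hP hQ

example (P Q R : Prop) (h : P ∧ Q → R) (hP : P) (hQ : Q) : R := h ⟨hP, hQ⟩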