Lean has two copies of basic number types such as the naturals or the integers. The reason is that Lean users might want to do two completely different things with numbers like these. Some people might want to prove theorems about numbers, such as associativity of addition, or Fermat’s Last Theorem. Other people might want to do actual computations with numbers, for example proving that 617 is prime or figuring out 10000 * 10000. One thing I learnt from Seul Baek at Lean Together 2019 is that these two concepts fight against each other! Computer scientists will understand this well, but as a mathematician I am only just coming to terms with it.
As a mathematician I really like Lean’s definition of the natural numbers:
inductive nat
| zero : nat
| succ (n : nat) : nat
[And I should say in passing that, as someone who has now looked at a lot of web pages containing Lean code, I wish these code blocks looked better in WordPress.]
The advantage of this definition is that if you are trying to prove a theorem about natural numbers, then applying the built-in induction tactic to the variable in question reduces the question to exactly the goals you expect: the base case and the inductive step. Actually the real advantage is that you can use the equation compiler to make definitions by recursion, and it just works.
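For instance, here is a sketch of both points at once, with primed names to avoid clashing with core Lean's own lemmas:

def add' : nat → nat → nat
| m nat.zero     := m
| m (nat.succ n) := nat.succ (add' m n)

-- induction gives exactly the base case and the inductive step
theorem zero_add' (n : nat) : add' nat.zero n = n :=
begin
  induction n with d hd,
  { refl },                 -- base case: add' zero zero = zero
  { simp only [add', hd] }  -- inductive step, using the hypothesis hd
end

The equation compiler turns the pattern-matching definition of add' into something Lean's kernel accepts, and also generates the equations that simp uses in the inductive step.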
But apparently computer scientists are not particularly fond of this data type:
#reduce (10000 : ℕ) * 10000 -- deep recursion error
Apparently, making a term that looks like succ (succ (succ ( ... (succ zero) ... ))) with a hundred million succs can take up a lot of space.
Computer scientists hence prefer storing naturals in binary format. This code is essentially from mathlib’s data/num/basic.lean:
inductive pos_num : Type
| one : pos_num
| bit0 : pos_num → pos_num
| bit1 : pos_num → pos_num
inductive num : Type
| zero : num
| pos : pos_num → num
A pos_num is a finite string of binary digits starting with a 1 (e.g. 6 is 110 in binary, which is bit0 (bit1 one) in Lean), and a num is either a pos_num or zero.
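Concretely (a quick sketch; the numeral 6 on the left is resolved through mathlib's arithmetic on pos_num):

import data.num.basic
open pos_num

-- 6 is 110 in binary; the outermost constructor is the least significant bit
example : (6 : pos_num) = bit0 (bit1 one) := rfl

-- a num is zero or a positive binary string
example : num := num.zero
example : num := num.pos (bit0 (bit1 one))  -- 6 again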
Computer scientists can now implement computationally efficient versions of addition and multiplication on num. For example, after defining succ on pos_num in the natural recursive way, they can make definitions such as
def add : pos_num → pos_num → pos_num
| 1 b := succ b
| a 1 := succ a
| (bit0 a) (bit0 b) := bit0 (add a b)
| (bit1 a) (bit1 b) := bit0 (succ (add a b))
| (bit0 a) (bit1 b) := bit1 (add a b)
| (bit1 a) (bit0 b) := bit1 (add a b)
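The succ on pos_num alluded to above looks something like this (essentially the definition in data/num/basic.lean), together with a tiny sanity check that add behaves:

def succ : pos_num → pos_num
| one      := bit0 one       -- 1 + 1 = 10₂
| (bit0 n) := bit1 n         -- even: flip the last bit
| (bit1 n) := bit0 (succ n)  -- odd: carry

-- sanity check: 2 + 3 = 5, i.e. 10₂ + 11₂ = 101₂
example : pos_num.add (bit0 one) (bit1 one) = bit1 (bit0 one) := rfl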
No surprises here — addition is defined by recursion on length of binary expansion, and in the obvious way. Multiplication is also defined, basically as long multiplication:
protected def mul (a : pos_num) : pos_num → pos_num
| 1 := a
| (bit0 b) := bit0 (mul b)
| (bit1 b) := bit0 (mul b) + a
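Again a quick sanity check, assuming data.num.basic is imported: 3 * 2, where the bit0 case doubles the partial product.

-- 3 * 2 = 6, i.e. 11₂ * 10₂ = 110₂
example : pos_num.mul (bit1 one) (bit0 one) = bit0 (bit1 one) := rfl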
OK so great, now we can do ten thousand squared:
import data.num.basic
#reduce (10000 : pos_num) * 10000 -- ugly output corresponding to 10^8 in binary
But actually we have to be careful here: what if I have made a typo in my definition of add on pos_num? I had better check that my definitions of add on nat and on pos_num coincide! To put it another way, we need to prove that column addition is actually a valid algorithm for adding two numbers. Without this, Lean will not be able to prove anything about nat using the computationally efficient num.
This boils down to constructing maps between nat and num and verifying that these maps commute with addition.
To define a map from nat to num one has to “put a nat structure on num”, i.e. define zero and succ for num. This is not hard. To define the map the other way, one has to define bit0 and bit1 on nat, and this isn’t hard either.
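A sketch of what these maps can look like, hand-rolled with names of my own choosing (mathlib's actual conversions are set up differently):

import data.num.basic
open pos_num num

-- num → nat: interpret the binary digits; doubling plays the role of
-- "bit0 on nat" and doubling-plus-one the role of "bit1 on nat"
def pval : pos_num → nat
| one      := 1
| (bit0 a) := pval a + pval a
| (bit1 a) := pval a + pval a + 1

def nval : num → nat
| zero    := 0
| (pos a) := pval a

-- nat → num: repeatedly apply the succ already defined on num
def unval : nat → num
| 0       := zero
| (n + 1) := num.succ (unval n)

#eval nval (unval 37)  -- round trip: should give 37 back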
But to prove that these maps are bijections is already some work. It suffices to prove that composing them in either order gives the identity. I tried this myself in Lean, and the harder direction was to show that if you start with a num, go to nat and come back, you end up where you started. The problem with num is that the induction principle you get is induction on the length of the binary representation, which is not so well-behaved; you first have to prove lemmas saying that the maps between num and nat commute with succ, bit0 and bit1.
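For instance, here is a sketch of one such lemma for a hand-rolled interpretation map (names are mine; ring is legitimate here because the goal lives in Lean's own nat). Only the bit1 case, where a carry happens, needs the inductive hypothesis:

import data.num.basic tactic
open pos_num

-- interpret the binary digits as a nat (a hypothetical helper for this sketch)
def val : pos_num → nat
| one      := 1
| (bit0 a) := val a + val a
| (bit1 a) := val a + val a + 1

-- val commutes with succ, by induction on the binary representation
lemma val_succ : ∀ a : pos_num, val (succ a) = val a + 1
| one      := rfl
| (bit0 a) := rfl
| (bit1 a) := by { simp only [succ, val, val_succ a], ring }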
After this, one wants to prove that the computer science definition of + agrees with the mathematics one. I jumped straight in, and soon ran into a goal of the form (a+1)+1=a+(1+1) on num, so I figured I’d prove associativity for pos_num in general (after all, I’d already proved it for nat):
theorem challenge (a b c : pos_num) : (a + b) + c = a + (b + c) := sorry
But this turns out to be quite subtle! I would be interested in hearing from anyone who manages to do this directly. The inductive hypotheses you generate from induction on pos_num initially seem to be useless. Here is a link to the Lean web editor where you can try it yourself (or just cut and paste everything into VS Code if you have Lean installed locally).
I talked to computery-minded people at Lean Together and they were unanimous: don’t prove this directly! Prove associativity on nat, prove that the additions on nat and num agree, and deduce associativity on num from associativity on nat.

I had actually rolled my own nat for this exercise, and all of a sudden I had to prove things like (a+b+1)+(a+b+1)=(a+a+1)+(b+b+1) on my nat, with no high-powered tactics like ring available to me 🙂 Lucky I had a journey from Amsterdam back to London to do all this stuff on, and I already knew how to prove commutativity and associativity of addition on nat. Even more luckily, associativity of addition on num was not needed to show that addition on nat and num coincided. Whilst I do not really understand the details, I seemed to need associativity for num when proving that addition commuted with the map from nat to num; but when I gave up on this approach and instead tried to prove that addition commuted with the map from num to nat, everything came together. I had already proved that these maps were bijections, so I could then deduce that addition commuted with the map from nat to num from previous results.
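The final transfer of associativity can be sketched like this, with the two facts proved along the way taken as hypotheses (the names to_nat, of_nat and so on are mine):

import data.num.basic

example (to_nat : num → nat) (of_nat : nat → num)
  (of_to_nat : ∀ a : num, of_nat (to_nat a) = a)
  (to_nat_add : ∀ a b : num, to_nat (a + b) = to_nat a + to_nat b)
  (a b c : num) : (a + b) + c = a + (b + c) :=
calc (a + b) + c
      = of_nat (to_nat ((a + b) + c))             : (of_to_nat _).symm
  ... = of_nat (to_nat a + to_nat b + to_nat c)   : by rw [to_nat_add, to_nat_add]
  ... = of_nat (to_nat a + (to_nat b + to_nat c)) : by rw nat.add_assoc
  ... = of_nat (to_nat (a + (b + c)))             : by rw [to_nat_add, to_nat_add]
  ... = a + (b + c)                               : of_to_nat _

In other words: because of_nat is a left inverse of to_nat, and to_nat commutes with addition, the associativity question on num collapses to nat.add_assoc.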
The moral
When your schoolteacher taught you column addition back when you were a kid, you were wise not to ask them why column addition was associative, because the direct proof seems tricky!