What is maths? I think it can basically be classified into four types of thing. There are definitions, true/false statements, proofs, and ideas.
Definitions (for example the real numbers, or ) and true/false statements (for example the statement of Fermat’s Last Theorem or the statement of the Riemann Hypothesis) are part of the science of mathematics: these are black and white things which have a completely rigorous meaning within some foundational system.
Proofs are in some sense the currency of mathematics: proofs win prizes. Constructing them is an art, checking them is a science. This explains, very simply, why computer proof verification systems such as Lean, Coq, Isabelle/HOL, Agda… are much better at checking proofs than constructing them.
And ideas are the purely artistic part of mathematics. That “lightbulb” moment, the insight which enables you to solve a problem — this is the elusive mathematical idea.
Ideas are the part of mathematics that I understand the least, in a formal sense. Here are two questions:
- What is a group?
- How do you think about groups?
The first one is a precise “scientific” question. A group is a set equipped with some extra structure, and which satisfies some axioms. The formal answer is on Wikipedia’s page on groups. A group is a definition. But the second question is a different kind of question. Different people think about groups in different ways. Say is a group generated by an element satisfying . What can you say about ? If you are a mathematics undergraduate who has just seen the formal definition of a group, you can probably say nothing. If you have a more mature understanding of group theory, you instantly know that this group is trivial, because you have a far more sophisticated model of what is going on. Ideas are complicated, and human-dependent. A computer’s idea of what a group is, is literally a copy of the definition in Wikipedia, and this is one of the reasons that computers are currently bad at proving new theorems by themselves. You can develop a computer’s intuition by teaching it theorems about groups, or teaching it examples of groups, or trying to write AI’s which figure out group theory theorems or examples of groups automatically. But intuition is a very subtle thing, and I do not understand it at all well, so I will say no more about these ideas here. I think that the concept of a map being “canonical” is an idea rather than a definition — I think different mathematicians have different ways of thinking about this weasel word. In this post I’m going to talk about how the three other concepts are implemented in type theory, in the Lean theorem prover.
Definitions, true/false statements, and proofs
In contrast to ideas, the other parts of mathematics (the definitions, theorems/conjectures, and proofs) can be formalised in a foundational system, and hence can be created and stored on a computer in a precise way. By this, I don’t mean a pdf file! Pdf files are exactly what I want to move away from! I mean that people have designed computer programming languages which understand one of the various foundations of mathematics (set theory, type theory, category theory) and then mathematicians can write code in this language which represents the definition, true/false statement or proof in question.
I am certainly not qualified to explain how all this works in category theory. In set theory, let me just make one observation. A definition in set theory, for example the definition of the real numbers, or , is a set. And a proof is a sequence of steps in logic. A definition and a proof seem to me to be two completely different things in set theory. A group is a mixture of these things — a group is an ordered quadruple satisfying some axioms, so it’s a set with some logic attached.
In type theory however, things are surprisingly different. All three things — definitions, true/false statements, and proofs — are all the same kind of thing! They are all terms. A group, a proof, the real numbers — they are all terms. This unification of definitions and proofs — of sets and logic — are what seems to make type theory a practical foundational system for teaching all undergraduate level mathematics to computers.
Universes, types, and terms.
In type theory, everything is a term. But some terms are types. Not every term is a type, but every term has a type. A colon is used to express the type of a term in Lean — the notation
x : T means that
x is a term of type
T. For example, the real number (pi) is a term in Lean, and the real numbers
ℝ is a type, and we have
: ℝ , that is, is a term of type
ℝ. In set theory one writes , in type theory we write
: ℝ. They both express the same mathematical concept, namely “ is a real number”.
Now is a term but it’s not a type. In Lean,
x : makes no sense. In set theory, does happen to make sense, but this is a weird coincidence because everything is a set. Furthermore, the actual elements of will depend on how the real numbers are implemented (as Dedekind cuts or Cauchy sequences, for example), and hence in set theory , whilst being syntactically valid in theory, has no mathematical meaning; it happens to make sense, but this is a quirk of the system. If you’re adamant that should make sense then I say you’ve been brainwashed by set theory. Gauss and Euler will put you right: they were proving theorems about the real numbers before Cauchy and Dedekind came along with their sequences and cuts. There is no reason that needs to have elements, this is a quirk of the set-theoretic foundations of mathematics, and this quirk is eliminated in a type theoretic foundation.
I claimed above that every term has a type. So what is the type of
ℝ? It turns out that
ℝ : Type. The real numbers are a term of a “universe” type called
Type — the type theory analogue of the class of all sets.
Many of the mathematical objects which mathematicians think of as definitions either have type
Type, or have type
T : Type. As a vague rule of thumb, the stuff which has elements (groups, rings, fields etc) has type
Type, and the stuff which doesn’t have elements ( or an element of a general group) has type
T is some type. As another vague rule of thumb, things we write using capital letters (a group, a ring,…) or fancy letters (the reals, the rationals) tend to have type
Type, and things we write using small letters (an element
g of a group, a real number
r or an integer
n) tend to have type
T is what we think of as the set which contains these things. For example
2 : ℕ and
ℕ : Type, or if is an element of the group then in Lean we have
g : G and
G : Type. You can see that there is a three-layer hiearchy here — terms at the bottom, types above them, and the universe at the top.
- Universe :
- Examples of types :
X(something a set theorist would call a set), a Banach space, etc. Formally, we say
ℝ : Type.
- Examples of terms:
π(a term of type
g(an element of the group
G, so a term of type
x(an element of
X, so a term of type
X). Formally, we say
g : G.
This hierarchy is more expressive than the hierarchy in set theory, where there are only two levels: classes (e.g. the class of all sets), and sets.
There is a standard use of the colon in mathematics — it’s in the notation for functions. If
Y are sets (if you’re doing set theory) or types (if you’re doing type theory), then the notation for a function from
f : X → Y. This is actually consistent with Lean’s usage of the colon; Lean’s notation for the collection of functions from
X → Y , which is a type (i.e.
X → Y : Type, corresponding to the fact that set theorists think of as a set), and
f : X → Y means that
f is a term of type
X → Y, the type-theoretic version of , and the way to say that
f is a function from
Y in type theory.
(Not for exam) Strictly speaking, universes are types, and types are terms, but this is a linguistic issue: often when people speak of types, they mean types which are not universes, and when people speak of terms they mean terms which are not types. But not always. This confused me when I was a beginner.
Theorems and proofs
This is where the fun starts. So far, it just looks like a type is what a type theorist calls a set, and a term is what they call an element. But let’s now look at another universe in Lean’s type theory, the universe
Prop of true/false statements, where our traditional mental model of what’s going on is quite different. We will see how theorem statements and proofs can be modelled in the same way as types and terms.
So, how does this all work? As well as the universe
Type, there is a second universe in Lean’s type theory called
Prop. The terms of type
Prop are true/false statements. There is an unfortunate notation clash here. In mathematics, the word proposition is often used to mean a baby theorem, and theorems are true (or else they would be conjectures or counterexamples or something). Here we are using the the word Proposition in the same way as the logicians do — a Proposition is a generic true/false statement, whose truth value is of no relevance.
This will all be clearer with examples.
2 + 2 = 4 is a Proposition, so we can write
2 + 2 = 4 : Prop. But
2 + 2 = 5 is also a Proposition, so
2 + 2 = 5 : Prop as well. I’ll say it again — Propositions do not have to be true! Propositions are true/false statements. Let’s see some more complex examples.
The true/false statement that for all natural numbers is a Proposition: in Lean this can be expressed as
(∀ x : ℕ, x + 0 = x) : Prop . A Proposition is a term of type
Prop (just like the types we saw earlier were terms of type
RH denote the statement of the Riemann Hypothesis. Then
RH : Prop. We don’t care if it’s true, false, independent of the axioms of mathematics, undecidable, whatever. A Proposition is a true/false statement. Let’s look at the part of Lean’s type theory hierarchy which lives in the
- Examples of types :
2 + 2 = 4,
2 + 2 = 5, the statement of Fermat’s Last Theorem, the statement of the Riemann Hypothesis.
- Examples of terms: ??
So what are the terms in this three-layer
Prop hierarchy? They are the proofs!
Propositions are types, proofs are terms.
This is where the world of type theory seriously diverges from the way things are set up in set theory, and also the way things were set up in my brain up until three years ago. In trying to understand what was going on here, I even realised that mathematicians take some liberties with their language here. Before we start, consider this. The Bolzano-Weierstrass theorem is some statement in analysis about a bounded sequence having a convergent subsequence. I want to talk a little bit about how mathematicians use the phrase “Bolzano-Weierstrass theorem” in practice. A mathematician would say that the Bolzano-Weierstrass theorem is this statement about sequences having convergent subsequences. But if they are in the middle of a proof and need to apply it in order to continue with their proof, they say “by the Bolzano-Weierstrass theorem we deduce that there’s a convergent subsequence”. Nothing seems at all funny about any of this. But what I want to point out is that mathematicians are using the phrase “the Bolzano-Weierstrass theorem” in two different ways. When they say what it is, they are referring to the statement of the theorem. But when they say they’re using the Bolzano Weierstrass theorem, what they are actually using is its proof. The Birch and Swinnerton-Dyer conjecture is a perfectly well-formed true/false statement, you can certainly say what it is. But you can’t use the Birch and Swinnerton-Dyer conjecture in the middle of a proof of something else if you want your proof to be complete, because at the time of writing the conjecture is an unsolved problem. Making a clear distinction between the statement of a theorem, and the proof of a theorem, is important here. A mathematician might use the phrase “the Bolzano-Weierstrass theorem” to mean either concept. This informal abuse of notation can confuse beginners, because in the below it’s really important to be able to distinguish between a theorem statement, and a theorem proof; they are two very different things.
In the natural number game, I use this abuse of notation because I am trying to communicate to mathematicians. The statement
∀ x : ℕ, x + 0 = x is a true statement, and I say things like “this is called
add_zero in Lean”. In the natural number game I write statements such as
add_zero : ∀ x : ℕ, x + 0 = x. But what this means is that the term called
add_zero in Lean is a proof of
∀ x : ℕ, x + 0 = x! The colon is being used in the type theory way. I am intentionally vague about this concept in the natural number game. I let mathematicians believe that
add_zero is somehow equal to the “idea” that for all . But what is going on under the hood is that
∀ x : ℕ, x + 0 = x is a Proposition, which is a type, and
add_zero is its proof, which is a term. Making a clear distinction between the statement of a theorem, and its proof, is important here. The statements are the types, the proofs are the terms.
- Examples of types:
2 + 2 = 4,
2 + 2 = 37, the statement of Fermat’s Last Theorem —
∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0.
- Examples of terms: the proof that
2 + 2 = 4(a term of type
2 + 2 = 4), the proof of Fermat’s Last Theorem (a term of type
∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0)
Elements of a theorem
So our mental model of the statement
: ℝ is that
ℝ, the type, is “a collection of stuff”, and , the term, is a member of that collection. If we continue with this analogy, it says that the statement
2 + 2 = 4 is some kind of collection, and a proof of
2 + 2 = 4 is a member of that collection. In other words, Lean is suggesting that we model the true/false statement
2 + 2 = 4 as being some sort of a set, and a proof of
2 + 2 = 4 is an element of that set. Now in Lean, it is an inbuilt axiom that all proofs of a proposition are equal. So if
a : 2 + 2 = 4 and
b : 2 + 2 = 4 then
a = b. This is because we’re working in the
Prop universe — this is how Propositions behave in Lean. In the
Type universe the analogue is not remotely true — we have
: ℝ and
37 : ℝ and certainly . This special quirk of the
Prop universe is called “proof irrelevance”. Formally we could say that if
P : Prop, if
a : P and if
b : P then
a = b. Of course if a Proposition is false, then it has no proofs at all! It’s like the empty set. So Lean’s model of Propositions is that the true ones are like sets with 1 element, and the false ones are like sets with 0 elements.
Recall that if
f : X → Y then this means that
f is a function from
Y. Now say and are Propositions, and let’s say that we know . What does this mean? It means that implies . It means that if is true, then is true. It means that if we have a proof of we can make a proof of . It is a function from the proofs of to the proofs of . It is a function sending an element of to an element of . It is a term of type
P → Q. Again: a proof of is a term
h : P → Q. This is why in the natural number game we use the
→ symbol to denote implication.
false denote a generic false statement (thought of as a set with 0 elements), and let
true denote a generic true statement (thought of as a set with 1 element). Can we construct a term of type
false → false or a term of type
true → true? Sure — just use the identity function. In fact, in both cases there is a unique function — the hom sets have size 1. Can we construct a term of type
false → true? Sure, there is a function from the set with 0 elements to a set with 1 element, and again this function is unique. But can we construct a term of type
true → false? No we can’t, because where do we send a proof of
true? There are no proofs of
false to send it to. So
true → false is a set of size 0. This corresponds to the standard truth table for
→, where the first three statements we analysed were true and the last was false.
The proof of Fermat’s Last Theorem is a function
So what does a proof of
∀ x y z : ℕ, n > 2 ∧ x^n + y^n = z^n → x*y = 0 look like? Well, there is an arrow involved in that Proposition, so the statement of Fermat’s Last Theorem is some kind of set of the form , which means that in Lean, a proof of Fermat’s Last Theorem is actually a function! And here is what that function does. It has four inputs. The first three inputs are natural numbers
z. The fourth input is a proof: it is a proof of the Proposition
n > 2 ∧ x^n + y^n = z^n. And the output of this function is a proof of the Proposition
x*y = 0. This is quite an unconventional way to think about what the proof of Fermat’s Last Theorem is, and let me stress that it does not help at all with actually trying to understand the proof — but it is a completely consistent mental model for how mathematics works. Unifying the concept of a number and a proof — thinking of them both as terms — enables you to think of proofs as functions. Lean is a functional programming language, and in particular it is designed with functions at its heart. This, I believe, is why theorem provers such as Lean, Coq and Isabelle/HOL, which use type theory, are now moving ahead of provers such as Metamath and Mizar, which use set theory.
Prove a theorem! Write a function!
|Type examples||ℝ||2 + 2 = 5, |