Formalising Mathematics : workshop 6 — limits

I talked about limits of sequences of real numbers in workshop 3, but this week I come back to them, because this week we’ll see that the usual ten-line proof that if $(a_n)_n\to\ell$ and $(b_n)_n\to m$ then $(a_nb_n)_n\to \ell m$ can be replaced by a two-line proof involving something called filter.tendsto_mul. So this week I’m going to talk about tendsto, but first I think it’s worth refreshing our memories about the useful mental picture of a filter as a generalised subset.

Reminder: filters are generalised subsets

Let X be a type (you can call it a set if you like). The type of subsets of X has a bunch of nice structure — there’s a partial order $\subseteq$ , there are unions and intersections (both finite and infinite) and they satisfy a bunch of axioms. Back in the days of Isaac Newton, one particularly well-studied type was the type of real numbers. However, people had not quite worked out whether infinitesimals existed — infinitely small non-zero real numbers called things like $dy$ and $dx$ — and some people like Leibniz wanted to divide one of them by the other because they were discovering some new-fangled thing called calculus. By the 20th century, the experts had made their decision: there were no infinitesimally small nonzero real numbers, and that’s what the textbooks say today (other than Robinson’s book on non-standard analysis, but nobody uses that for teaching undergraduate calculus). However it was equally clear that infinitesimals provided a good picture.

A filter on X is a kind of “generalised subset” of X. Each subset S of X gives you a filter, called the principal filter 𝓟 S, and there are other filters too corresponding to slightly weirder things. For example, if X = ℝ then there’s a filter called 𝓝 0, the neighbourhood filter of 0, which should be thought of as containing 0 and all the infinitesimally small numbers. Just like usual subsets, these generalised subsets have a partial order, which we’ll call $\leq$ , extending the partial order on usual subsets. In reality these filters are defined completely rigorously as some collection of usual subsets satisfying some axioms but we won’t go into these this week, we’ll just stick with the picture.

Pushing forward and pulling back subsets

Let’s stick with “usual” subsets for this section, but let’s throw in a second type Y and a function f : X → Y. The function gives us some kind of dynamics in the system — we can start using the function to move sets around. The most obvious way that a function can move a set around is via the image construction. Given a subset S of X, we can consider what a mathematician would call $f(S)$ , the image of S in Y, defined as the set of y in Y such that there exists x in S with f x = y. This is an abuse of notation — the inputs to f are supposed to be elements of X, not subsets of X, so f X does not make sense, and in Lean we carefully differentiate between these ideas by writing f '' S for the image of S in Y. We call this “pushing forward a subset along f“.

Conversely, if T : set Y then there is a way of “pulling T back along f” to give us a subset $f^{-1}(T)$ of X, consisting of the x in X such that f x is in T. Again Lean has a weird notation for this, because ⁻¹ is taken by a general kind of inverse function on a group. So we write f⁻¹' T for this construction.

If set X denotes the type of subsets of X, then f : X → Y gives rise to functions f '' : set X → set Y and f⁻¹' : set Y → set X. Are these functions inverse to one another? No, not remotely! In general, doing one then the other won’t get you back to where you started. So what is the relationship between these two constructions? The fancy answer is that they form a Galois connection, and the even fancier answer is that they are a pair of adjoint functors. But let’s not go into this. Let’s talk about a fundamental predicate.

Let’s stay with the set-up: X and Y are types, and f : X → Y is a function. Say S is a subset of X and T is a subset of Y. Then we can ask ourselves the following true-false question: does f restrict to a function from S to T? In other words, is it true that x ∈ S implies f x ∈ T? Perhaps a good notation for this idea would be something like $S\leq_f T$ . The reason this notation is appealing is that if $S'\subseteq S\leq_f T$ then $S'\leq_f T$ , and if $S\leq_f T\subseteq T'$ then $S\leq_f T'$ , and this feels like some kind of transitivity statement, but it isn’t literally transitivity of some relation on a type, because S and T don’t in general have the same type — they’re subsets of different sets. How can we restate $\leq_f$ using pushforwards or pullbacks?

If you think about it, it turns out that there is a way to state this relation using pushforwards, and an equivalent way using pullbacks. One can check easily that $S\leq_f T$ is equivalent to f '' S ⊆ T and also to S ⊆ f⁻¹' T. In particular f '' S ⊆ T and S ⊆ f⁻¹' T are equivalent to each other (and we have proved that the functors are adjoint, for those of you who know this category-theoretic language).

Our aim is to find analogous constructions for filters.

Pushing forward and pulling back generalised subsets

Jordan Ellenberg on Twitter remarked that something I said last week reminded him of a fact about ideals. I really like this idea because if we’re going to think in pictures then it helps to understand analogies. When people were trying to understand factorization in integral domains such as the integers, they have to deal with the fact that $6=2\times3$ and also $6=(-2)\times(-3)$ but that really these are “the same factorization”. This leads us to an equivalence relation on elements of an integral domain — two elements are equivalent if one is a unit times the other, where the units are divisors of 1. Equivalence classes of elements are the same as principal ideals of the ring, and one might have hoped that this idea solves all your factorization problems. But rings like $\mathbb{Z}[\sqrt{-5}]$ teach us otherwise — now $6=2\times 3=(1+\sqrt{-5})\times(1-\sqrt{-5})$ and these factorizations are still not equivalent. The fix was to introduce some magic “ideal numbers”, or “ideals” for short, which are not really numbers, but some kind of generalised number, and now every non-zero generalised number in $\mathbb{Z}[\sqrt{-5}]$ factors uniquely into prime generalised numbers. The reason I am bringing this up is that it is not difficult to check that every ideal of a commutative ring is uniquely determined by the principal ideals which it contains (because it is uniquely determined by the elements it contains).

Filters, a.k.a. generalised subsets, have the reverse property: every filter is uniquely determined by the principal filters which contain it. This is an extensionality lemma for filters, and it is this idea which we need to keep in mind when we try to figure out how to push forward and pull back filters.

As ever, say X and Y are types, and f : X → Y is a function. Pushing forward filters along f (called map f : filter X → filter Y in Lean) is barely any harder than pushing forward subsets. Say F : filter X is a filter on X. Let’s figure out how to define its pushforward map f F, a filter on Y. By the remark above, it suffices to figure out which subsets T of Y, or more precisely which principal filters T, satisfy map f F ≤ T. If we want our intuition to be correct, this should be the case precisely when F ≤ f⁻¹' T, because this feels exactly like the situation $F \leq_f T$ studied above. Hence we will define the pushforward map f F of filter F along f by saying that map f F ≤ T if and only if F ≤ f⁻¹' T, and one can check that this definition (of (map f F).sets) satisfies the axioms of a filter. This is one of the things you’ll be proving today in workshop 6.

Pulling back filters, called comap f : filter Y → filter X, is harder, because if G : filter Y then this time we want to solve comap f G ≤ S and the adjoint functor nonsense above only tells us about S ≤ comap f G, which is not enough information: for example the only nonempty subset of ℝ contained in the infinitesimal neighbourhood filter 𝓝 0 is the subset {0}, but 𝓝 0 is strictly larger than the principal filter 𝓟 0 because it also contains Leibniz’s elements $dx$ . The one question we can answer heuristically however, is a criterion for comap f G ≤ f⁻¹' T, because if our mental model of comap f is “generalised f⁻¹'” then G ≤ T should imply this. The problem with just restricting to these Ts is that if f is not injective then we can never distinguish between distinct elements of X mapping to the same element of Y, and yet if comap f G ≤ f⁻¹' T and f⁻¹' T ≤ S then we certainly want comap f G ≤ S. So this is what we go for: comap f G ≤ S if and only if there exists a subset T of Y such that G ≤ T and f⁻¹' T ≤ S. It turns out that there does exist a filter on X satisfying these inequalities, and this is our pullback filter.

One can check, in exact analogy to pushing forward and pulling back subsets, that map f F ≤ G ↔ F ≤ comap f G — indeed, this is the boss level of Part A of today’s workshop.

filter.tendsto — what and why?

The picture behind filter.tendsto f is easy, indeed we have seen it before for sets. The idea is simply that if f : X → Y as usual, and F : filter X and G : filter Y then we can ask whether f restricts to a map from the generalised set F to the generalised set G. This true-false statement is called tendsto f F G in Lean. It is equivalent to map f F ≤ G and to F ≤ comap f G, and it seems to be pronounced “F tends to G along f“, although this notation does not often get pronounced because it seems to be rather rare to see it in the literature. It is used absolutely all over the place in Lean’s treatment of topological spaces and continuity, and it’s based on an analogous formalisation of topology in the older Isabelle/HOL proof system.

Why is this notion useful? Here is a beautiful example. Let us say now that X and Y are topological spaces, and f : X → Y is an arbitrary function. What does it mean to say that f is continuous at x, an element of X? The idea is that if x' is an element of X very close to x, then f x' should be very close to f x. How close? Let’s just say infinitesimally close. The idea is that f should send elements of X infinitesimally close to x to elements infinitesimally close to f x. In other words,

tendsto f (𝓝 x) (𝓝 (f x)).

It’s as simple as that. We want f to map an infinitesimal neighbourhood of x to an infinitesimal neighbourhood of f x. We say that f is continuous if it is continuous at x for all x in X. One can check that this is equivalent to the usual definition of continuity. The proof that a composite of continuous functions is continuous is just as easy using this filter language as it is in the usual open set language, and certain other proofs become easier to formalise using this point of view.

But the real power of the tendsto predicate is that it does not just give us a new way of thinking about continuity of a function at a point. It also gives us a new way of thinking about limits of functions, limits of sequences of real numbers, limits in metric space, and more generally essentially any kind of limit that comes up in an undergraduate degree — these concepts are all unified by the tendsto predicate.

The second Lean file in today’s workshop consists of an analysis of a two-line proof of the fact that the limit of the product of two real sequences is the product of the limits. Let me finish by talking about these two lines.

The first line observes that the definition is_limit a l of week 3 (the traditional $\forall \epsilon>0, \exists N, n\geq N\implies |a(n)-l|<\epsilon$ ) is equivalent to the rather slicker-looking tendsto a cofinite (𝓝 l), where cofinite is the cofinite filter (defined as the generalised subset C of ℕ such that C ≤ S if and only if S is cofinite). It is straightforward to see that these two predicates are equivalent: tendsto a cofinite (𝓝 l) is equivalent to saying that any subset of the reals whose interior contains l has pre-image under a a cofinite subset of the naturals, which is equivalent to saying that for any $\epsilon>0$ , the number of $n$ such that $|a(n)-l|\geq\epsilon$ is finite.

The second line comes from the following standard fact about neighbourhood filters (which we will prove): if Y is a topological space equipped with a continuous multiplication, if A is any type, and if f and g are functions from A to Y and F is any filter on A, then tendsto f F (𝓝 y) and tendsto g F (𝓝 z) together imply tendsto (f * g) F (𝓝 (y * z)), where f * g denotes the map from A to Y sending a to f a * g a. We prove this result from first principles in the file, although of course it is in Lean’s maths library already, as is the fact that multiplication on the real number is continuous, which is why we can give a complete two-line proof of the fact that limit of product is product of limits in Lean.