What mathematics can we do with vector spaces, without having to pick a basis? For example, if V is a finite-dimensional vector space and T : V → V is a linear map, can we define its trace “without picking a basis”?

I got interested in trying to understand if this question even has a meaning. Here are some thoughts.

## Models for vector spaces

When we learn linear algebra at high school, we typically first learn the “concrete” theory, where vectors are columns of numbers, and we can multiply them by matrices and thus get a conceptual understanding of systems of linear equations. Then at university we go on to the “abstract” theory, where a real vector space is something defined by a list of axioms, and spaces like ℝⁿ are now *examples* of these abstract objects.

We then learn about the fundamental notion of a *basis* of a vector space. Say we have an abstract finite-dimensional vector space V. By picking a basis, the vectors in our vector space suddenly transform back into columns of numbers. Not only that, but linear maps between vector-spaces-with-a-basis turn back into matrices. By the time we’ve learnt that every vector space has a basis, we can see that our new theory is in some sense “the same as” our old theory. It took me a long time to get on top of this principle as an undergraduate; perhaps the key concepts were not emphasized to me enough, or maybe I just got lost in all the new (to me, at the time) ideas. Nowadays I think about it like this: initially we learn that ℝⁿ is an *example* of a finite-dimensional vector space, but after learning about bases we can conclude that *every* finite-dimensional real vector space is isomorphic to ℝⁿ for some n, so in fact ℝⁿ can be thought of as a *model* for a finite-dimensional real vector space, just like the collection of equivalence classes can be thought of as a model for a quotient by an equivalence relation. Every vector space has a basis, so one can prove theorems about finite dimensional vector spaces by checking them on models, i.e. by picking a basis.

## Don’t pick a basis (unless you have to)

After a couple of years at university the following idea had somehow sunk in: if possible, one “should not choose a basis”. The canonical example shows up when we learn about the *dual* of a vector space. The dual of a real vector space V is just the space of linear maps from V to ℝ; this space has a natural vector space structure and is called the *dual space* of V, with notation V*. Confusing example: if V = ℝⁿ, then an element of V is represented by a column of n numbers. Give V its canonical basis e₁, e₂, …, eₙ. Then any linear map V → ℝ is uniquely determined by where it sends the eᵢ, and in particular an element of V* is also uniquely determined by n numbers, so we can represent it as a vector of length n and we have proved that the dual of ℝⁿ is ℝⁿ again. Great! Furthermore, every finite-dimensional vector space equals ℝⁿ (proof: pick a basis) so we’ve proved that duality is just the identity map!

Except that we haven’t, because we have been a bit too liberal with equality here (and as some computer scientists are very quick to point out, many mathematicians don’t understand equality properly, and hence they might accidentally end up teaching this point badly). This argument proves that if V is a finite-dimensional vector space, then it is *isomorphic* to its dual V*; however it is not in general *canonically isomorphic* to its dual, whatever that is supposed to mean. In this instance, it means that different bases of V in general produce different isomorphisms between V (identified with ℝⁿ) and V* (also identified with ℝⁿ). This is a bit confusing because in the group theory course running at the same time as the linear algebra course, a student is simultaneously being taught that when we say “how many groups are there of order 4” we *obviously* mean “…up to isomorphism”, because isomorphic groups *are* equal for that question.

However, if we do this trick twice, we can identify V with its double-dual V** and it turns out that this identification *is* canonical, whatever that means. What it appears to mean in this case is that there is a really cool way of writing down the isomorphism from V to V** which *doesn’t ever pick a basis*!

[Technical note/reminder: Here’s the explicit map from V to its double dual V**. Say v ∈ V. We need to write down a linear map V* → ℝ associated to v. So take φ ∈ V* and we need to construct a real number somehow. Well, what about φ(v)? That works a treat! One can check that this map sending φ to φ(v) is indeed linear, and induces a map from V to V** which can be checked to be an injection and hence, by dimension counting, an isomorphism (NB: some magic just happened there). I suspect that this argument was my initiation into the mysterious word *canonical*, a word I now rail against, not least because in my opinion the Wikipedia page about canonical maps contains a definition which is full of fluff (“In general, it is the map which preserves the widest amount of structure, and it tends to be unique” — this is not a definition — this is waffle).]
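For the record, here is that map written out in symbols, with no basis in sight:

```latex
\iota \colon V \to V^{**}, \qquad \iota(v)(\phi) := \phi(v) \quad \text{for all } \phi \in V^*.
```

Linearity of ι(v) in φ, and of ι itself in v, are both immediate from the definitions: for instance ι(v)(aφ + ψ) = (aφ + ψ)(v) = aφ(v) + ψ(v).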

The moral: all the canonical kids are too cool to pick a basis.

PS here is a funny way of thinking about it: if we identify V with ℝⁿ as column vectors, then perhaps we should identify the dual of V with ℝⁿ as *row* vectors, because multiplication on the left by a row vector sends a column vector to a number, which is what we want a dual vector to do. So identifying V with its dual V* is some high-falutin’ version of *transpose*, and if you transpose once you don’t quite get the same thing, but if you transpose twice then you’re back where you started. Canonically.

## Picking a basis and then pretending you didn’t

OK so here’s a pretty cool theorem about traces (although any formaliser would tell you that it is actually a *definition*, not a theorem). If M is an n × n square matrix then it has a *trace*, the sum of the elements on the leading diagonal. Now say V is a finite-dimensional real vector space, and T : V → V is a linear map, crucially from V to itself. If we choose a basis of V and use the same basis for the source and the target, then T becomes a matrix M, and we can take its trace. If we change our basis of V then the matrix representing T changes. But, miraculously, its trace does not! This can be proved by an explicit calculation: changing our basis for V changes the matrix representing T to PMP⁻¹ for a certain invertible “change of basis” matrix P (here was where it was key that the source and the target of the endomorphism were the same, otherwise we would have got PMQ⁻¹), and the traces of M and PMP⁻¹ are equal because of the general fact that the traces of AB and BA are equal if A and B are square matrices of the same size (apply this with A = PM and B = P⁻¹).
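The calculation underlying this is short enough to display in full:

```latex
\operatorname{tr}(AB) = \sum_i \sum_j A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = \operatorname{tr}(BA),
\quad\text{so}\quad
\operatorname{tr}(PMP^{-1}) = \operatorname{tr}\bigl((PM)P^{-1}\bigr) = \operatorname{tr}\bigl(P^{-1}(PM)\bigr) = \operatorname{tr}(M).
```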

As a consequence, this means that if V is a finite-dimensional vector space then we can unambiguously talk about the *trace* of a linear map T : V → V, in the following way. First do the slightly distasteful choose-a-basis thing, then take the trace of the corresponding matrix, and then prove that the calculation was independent of the basis you chose, so we can pretend you never did it. Similarly one can talk about the determinant and characteristic polynomial of T, because these are also invariants of matrices which are constant on conjugacy classes.

However, you *did* do it — you chose a basis. Something here is a little different from the situation with the map from V to its double dual — that map really was defined without choosing a basis *at all*. Here we did something slightly different — we chose a basis, and then proved that it didn’t matter. Can we go one better, and define the trace of a linear map from V to V without choosing a basis *at all*?

So in the process of discussing this question on Twitter and on the Lean chat, we established firstly that this very much depends on what the question actually *means*, and secondly I managed to learn something new about Lean, and when I learn something new about Lean I tend to blog about it, so here we are. First I’ll talk about the failed attempts to define the trace of an endomorphism, which led to some other questions and clarifications, and then I’ll talk about `trunc`, something which looked to me like a completely pointless operation in Lean and which up until now I’ve just ignored, but which somehow might be at the bottom of this.

## A failed attempt, and some progress

In characteristic p there are some problems with the ideas below, but these are not relevant to what I want to talk about, so let’s just stick to vector spaces over the reals (or more generally any field of characteristic zero). The first observation is that computing the trace, determinant and characteristic polynomial all seem to be pretty much the same question: for example, if you can compute the char poly of T then you can compute its trace and det because you can read these things off from the coefficients. Conversely, if you can compute traces then applying this to some exterior powers of T you can read off the coefficients of the characteristic polynomial, including the det. So computing any of these invariants without choosing a basis somehow boils down to the same thing.
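Concretely, in characteristic zero the dictionary between these invariants looks like this, writing χ_T for the characteristic polynomial of T on an n-dimensional space:

```latex
\chi_T(x) = \det(xI - T) = x^n - \operatorname{tr}(T)\,x^{n-1} + \cdots + (-1)^n \det T,
\qquad
\operatorname{tr}\bigl(\textstyle\bigwedge^k T\bigr) = (-1)^k \cdot \bigl[\text{coefficient of } x^{n-k} \text{ in } \chi_T\bigr].
```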

Next let’s turn to Wikipedia, where we are informed of a basis-free way to compute the trace of an endomorphism! Here’s the trick. There’s an obvious bilinear map V × V* → ℝ, sending (v, φ) to φ(v), and by the universal property of the tensor product this induces a linear map V ⊗ V* → ℝ. There is also an obvious bilinear map V × V* → Hom(V, V) sending (v, φ) to the linear map sending w to φ(w)v, and this induces a linear map V ⊗ V* → Hom(V, V), which is easily checked to be an isomorphism if V is finite-dimensional. Composing the inverse of this isomorphism with the map V ⊗ V* → ℝ gives us a linear map Hom(V, V) → ℝ which we can check to be the trace (e.g. by picking a basis). So we’re done, right?
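Spelled out, the two maps in play are:

```latex
\varepsilon \colon V \otimes V^* \to \mathbb{R}, \quad v \otimes \phi \mapsto \phi(v),
\qquad
\Phi \colon V \otimes V^* \to \operatorname{Hom}(V, V), \quad v \otimes \phi \mapsto \bigl(w \mapsto \phi(w)\,v\bigr),
```

and the claimed trace is ε ∘ Φ⁻¹. As a sanity check, Φ(v ⊗ φ) is the rank-one operator w ↦ φ(w)v, whose trace is indeed φ(v).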

Well, the thing about this construction is that whilst the map V ⊗ V* → Hom(V, V) is canonical (indeed, it even exists in the infinite-dimensional case), to prove that it’s surjective in the finite-dimensional case the natural thing to do is to pick a basis and to make the corresponding matrix by taking an appropriate linear combination of tensor products of elements of the basis and the dual basis. I would argue that we have made some progress here — we still picked a basis, but we used it to fill in a proof, rather than to construct data. However, we still picked a basis. Note also that the inverse of a computable bijection might not be computable, if you’re interested in that sort of thing, and I suspect that this might be a situation where this annoyance kicks in.

## What is the dimension of a finite-dimensional vector space?

One might instead be tempted to argue that the map V ⊗ V* → Hom(V, V) is surjective because it is an injective map between vector spaces of the same dimension (I’m not entirely sure how to prove it’s injective without picking a basis, but it might well be possible; however I do not know how to prove that the dimension of V ⊗ W is the product of the dimensions of V and W without picking a basis). Anyway, talking of dimensions, here is the other “basis-free” method I learnt to do these sorts of things: the canonical method to work out the determinant of an endomorphism without choosing a basis. If T : V → V is a linear map, and if n is the dimension of V, then T induces a linear map on top wedge powers ⋀ⁿV → ⋀ⁿV, and an endomorphism of a 1-dimensional space is canonically a number (proof: pick a basis and check it’s independent of the choice) which can be checked to be the determinant of T, and if you can do determinants then you can do char polys and hence traces.
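In symbols, T induces

```latex
\textstyle\bigwedge^n T \colon \bigwedge^n V \to \bigwedge^n V, \qquad
v_1 \wedge \cdots \wedge v_n \mapsto T v_1 \wedge \cdots \wedge T v_n = (\det T)\,(v_1 \wedge \cdots \wedge v_n),
```

where the last equality takes place on the 1-dimensional space ⋀ⁿV, so “is multiplication by a number” makes sense.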

The problem with this approach is that it relies on you knowing what n is, the dimension of V, and if all you know is that V is finite-dimensional, then how do you get n, the dimension? Well obviously you pick a basis, count it, and then prove that your answer is independent of the choice of basis. So this “top exterior power” argument has moved us closer to the heart of the problem: forget defining the trace — how do we define the *dimension* of a finite-dimensional vector space without picking a basis? Note that the dimension is just the trace of the identity function, so we can add dimension to our list of things which we cannot “compute” without picking a basis. And now we’re getting somewhere — what does it *mean* to say a vector space is finite-dimensional? Did we pick a basis even to make that statement?

## This is something to do with constructivism.

I am not a big fan of constructivism, as many of you will know. I think that the emphasis placed on it by the computer theorem proving community historically has held the area back; it puts off mathematicians (e.g. the 2012 version of me) who have been indoctrinated by a traditional mathematics course which assumes the law of the excluded middle from day one. One problem I have with constructivism, as someone who was classically trained, is that it turns out that sometimes there is more than one way of doing something constructively, and all these ways are the same classically. For example very early on in my Lean career I was surprised to learn that there was `function.bijective`, the predicate that a function was a bijection, but also there is the concept of a function with a two-sided inverse. As far as I’m concerned these are the same thing, but constructively they’re not, because given a bijection there might not be a “formula” for its inverse. The existence of a two-sided inverse is a true/false statement — but actually *having* the two-sided inverse is **data**, and, in constructive mathematics, data can sometimes be hard to come by. The function which takes as input a set/type `X` and a proof that `X` is nonempty, and outputs an element of `X`, is a noncomputable function and its existence, if you think about it, is closely related to the axiom of choice, which is something that the constructivists are not big fans of.

So it turns out that this whole “V is finite-dimensional” thing, which this entire post has assumed all the way through, is a victim of this subtlety. What does it *mean* for a vector space V to be finite-dimensional? The following answers are all the same classically (i.e. in “normal maths”), but constructively they’re all different:

- A *proof* that V has a finite basis;
- An actual *choice* of a finite basis;
- An element of the *truncation* of the set of all pairs (n, f), where n ∈ ℕ and f : V ≅ ℝⁿ is an isomorphism. Here by the truncation of a set we mean the quotient of the set by the “always true” equivalence relation. Yes, you read that right.

OK so we know what the first two things are. The first statement is just a proof. If your proof is nonconstructive that’s fine, I don’t care. The second thing is data. For me this is a problematic definition of finite-dimensional, precisely because it’s *not* a proof of a true-false statement. If I am working with a finite-dimensional vector space V in Lean then I might end up having to deal with the fact that if some argument changed V’s basis whilst leaving V alone, I might not have V = V (as finite-dimensional vector spaces) any more, because the data going on behind the scenes saying that V is finite-dimensional might not match up. I have enough problems formalising mathematics in type theory without having to deal with this too.
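Here is a sketch of the three candidate definitions in Lean 3 style (the names `fin_dim₁` etc. are made up for this post, and mathlib’s actual definitions differ in detail):

```lean
import data.real.basic linear_algebra.basic

variables (V : Type*) [add_comm_group V] [module ℝ V]

-- 1. A mere proposition: a proof that a suitable n and isomorphism exist.
def fin_dim₁ : Prop := ∃ n : ℕ, nonempty (V ≃ₗ[ℝ] (fin n → ℝ))

-- 2. Data: an actual chosen n together with an actual chosen isomorphism.
def fin_dim₂ : Type* := Σ n : ℕ, V ≃ₗ[ℝ] (fin n → ℝ)

-- 3. The truncation of that data: `trunc` forgets which choice was made,
--    but the result still lives in `Type`, not `Prop`.
def fin_dim₃ : Type* := trunc (fin_dim₂ V)
```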

This brings us to the third definition, involving `trunc`. OK so if `X` is a type or set or however you want to set up your mathematics, then, as I mentioned above, `trunc X` is the quotient of `X` by the equivalence relation which is always true. In particular if `X` is nonempty then `trunc X` has one element/term, and if `X` is empty then it has no elements/terms. If you’re happy about the concept that propositions can be types, with the idea that a true proposition has one term (its proof), and a false proposition has no terms, then `trunc X` seems to be basically the proposition that `X` is nonempty. However it is more than that, because it is *data*. It is the missing link in our story.
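In Lean 3, `trunc` really is just this quotient; the following is essentially the mathlib definition:

```lean
universe u

-- The quotient of α by the always-true relation.
def trunc (α : Sort u) : Sort u := @quot α (λ _ _, true)

-- Any element of α gives a term of `trunc α`...
def trunc.mk {α : Sort u} (a : α) : trunc α := quot.mk _ a

-- ...and any two such terms are equal, since the relation always holds.
theorem trunc.eq {α : Sort u} (a b : trunc α) : a = b :=
quot.induction_on₂ a b (λ x y, quot.sound trivial)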

Let’s say that `V` is a real vector space, and we have a “proof” that it is finite-dimensional in the sense that we have a term `t` of type `trunc X(V)`, where `X(V)` is the space of all pairs (n, f) with f : V ≅ ℝⁿ an isomorphism. Here’s how to define the trace of an endomorphism T : V → V. We’re going to define a map from `trunc X(V)` to the reals, and the idea is that if you evaluate this map at `t` then you’ll get the trace of T. Now to define a map from a quotient, we use the universal property of quotients. First of all we have to define a map from the space we’re quotienting by to the reals. This is easy: given an isomorphism f : V ≅ ℝⁿ we get an induced map ℝⁿ → ℝⁿ and we just take its trace with respect to the standard basis. And secondly, we need to prove that this map descends to the quotient, which boils down to proving a theorem, and the theorem of course turns out to be precisely the statement that the trace of an endomorphism is independent of choice of basis.
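The universal property being used here is `trunc.lift`: to get a function out of `trunc X` you supply a function out of `X` together with a proof that it is constant. A generic version (with mathlib imported for `trunc.lift` and ℝ):

```lean
import data.real.basic data.quot

-- A map out of `X` which is provably constant descends to `trunc X`.
def define_on_trunc {X : Type*} (f : X → ℝ)
  (h : ∀ a b : X, f a = f b) : trunc X → ℝ :=
trunc.lift f h
```

For the trace, `X` is the space of pairs (n, f), the function is “trace computed via f”, and the hypothesis `h` is exactly basis-independence of the trace.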

Similarly we can define a function from `trunc X(V)` to the naturals, such that if `V` is finite-dimensional in the sense that we have `t : trunc X(V)` then its dimension is what you get by evaluating this function at `t`. And det and char poly etc.

Note that for *any* vector space `V`, the type `trunc X(V)` has at most one term — it’s a so-called *subsingleton* type. Lean’s `exact` command will be fussy about subsingleton types because it’s a *theorem* that two terms of a subsingleton type are equal, rather than it being a definitional equality. Hence Lean’s `exact` tactic still might not work if we’re carrying around `t : trunc X(V)` as our definition of finite-dimensionality. However the `convert` tactic will operate fine, because it looks out for subsingletons and applies the appropriate theorem.

## Type-theoretic conclusion

We now seem to have got to the bottom of this. To do this kind of metamathematics — “did we pick a basis?” — we need to think really carefully about what we even *mean* by the assumption of `V` being finite-dimensional. My preferred approach, and the one which makes mathematics easier to do in Lean, is to just use the propositional definition “there exists a basis”. This way one never runs into problems with equality, and if one needs a basis one just applies Lean’s version of the axiom of choice. Constructivists would complain that it breaks computation, but I want to prove theorems and I prefer to do computations using tactics rather than trying to persuade the kernel to reduce things. The other approach is to carry around some extra data, and this can lead to problems for beginners with equality being a bit broken; however the experts have learnt ways to deal with this. Ultimately the best choice in a theorem prover will depend on what you actually are *doing* with your objects, and given that I just want to prove theorems, for me, the bare existence definition of finite-dimensional is enough. However to move from the `Prop` world to the `Type` world, e.g. when defining the trace of an endomorphism, one has to at some point do some magic, and the moment where this happens is the moment you picked a basis.
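The “magic” has a name: in Lean 3 it is `classical.choice` (and its consequence `classical.some`), both of which jump from the `Prop` world to the `Type` world and are marked `noncomputable` for their trouble:

```lean
-- classical.choice : nonempty α → α
noncomputable def pick {α : Type*} (h : nonempty α) : α :=
classical.choice h

-- classical.some : (∃ x, p x) → α, with classical.some_spec
-- supplying the proof that the chosen element satisfies p.
noncomputable def pick_some {α : Type*} {p : α → Prop} (h : ∃ x, p x) : α :=
classical.some h
```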

## Comments

Hey Kevin – Interesting post. Just thinking out loud here, what if you defined the trace of T : V → V over a field K by first establishing the Jordan Canonical Form Decomposition and then defining the trace in terms of that? This sounds ridiculous, of course, but at first glance it would appear to have some potential to avoid the “choose a basis” problem. If you look at the derivation of the JCF which I present in my blog post https://mattbaker.blog/2015/07/31/the-jordan-canonical-form/ there is no explicit mention of choosing a basis except at the end of the proof of Theorem 3… but I don’t know that one really needs to choose a basis to make that step work(?). Of course, there are lots of potential subtleties here and I don’t pretend to have thought about them all but I thought I’d pass this thought along before getting on another Zoom call (which I’m now 8 minutes late for). Cheers, Matt


I kind of convinced myself that if you define “V finite-dimensional” as “there exists some natural number n such that V is isomorphic to k^n” or “V is a Noetherian k-module” or “V is an Artinian k-module” then you can never actually get your hands on the dimension without picking a basis, so you can’t even do induction on dimension in some sense. The whole thing is silly. It started off as an innocent-enough question but I think the answer to all of this depends on precisely what you mean by finite-dimensional. Lean uses type theory not set theory, which is a crazy way of unifying two totally different things: propositions and their proofs, and sets and their elements, both become types and their terms. This is kind of cool in the sense that proofs become functions, e.g. a proof of FLT is a function which takes as inputs four positive integers x,y,z,n and a proof that x^n+y^n=z^n, and outputs a proof that n<=2. But it's still the case that the proposition/proof universe is a different universe to the type/term universe, and moving from the proof world to the data world can only be done in Lean using the type theory version of the axiom of choice, which is a function which takes as input a proof that there exists something, and outputs the something. It's at that moment where you pick the basis.
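For example, the statement of Fermat’s Last Theorem as a function type looks like this in Lean 3:

```lean
-- A proof of FLT is a function eating x, y, z, n, three positivity
-- proofs and a proof of the equation, and spitting out a proof n ≤ 2.
def FLT : Prop :=
∀ (x y z n : ℕ), 0 < x → 0 < y → 0 < z → x^n + y^n = z^n → n ≤ 2
```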


In fact there is something more general I don’t understand here. As you might know, I built up the theory of the natural numbers from Peano’s axioms and you have to use recursion to define addition and multiplication, but then you can prove that the naturals are a semiring (ring without minus) just using induction. But when you develop the theory of equalities you need to be able to prove stuff like 0 != succ(n) for all n, and to prove this theorem you seem to need recursion: you can use recursion to define a function from naturals to {T,F} sending 0 to T and everything else to F, and deduce 0 != succ(n) from this. Similarly to prove injectivity of succ you need to use recursion to define pred: Nat -> Nat sending 0 to 37 (say) and n+1 to n, and then pred(succ(n))=n implies succ is injective. But you have to move into the data world to prove these propositions.
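A sketch of these two recursion-powered proofs over a hand-rolled natural number type (Lean 3 style; Lean may want slightly different incantations in places):

```lean
inductive mynat
| zero : mynat
| succ : mynat → mynat

namespace mynat

-- Recursion builds the data...
def is_zero : mynat → bool
| zero     := tt
| (succ _) := ff

def pred : mynat → mynat
| zero     := zero   -- the value here (37, zero, ...) is irrelevant
| (succ n) := n

-- ...which then proves the propositions.
theorem zero_ne_succ (n : mynat) : zero ≠ succ n :=
λ h, bool.no_confusion (congr_arg is_zero h : tt = ff)

theorem succ_inj {m n : mynat} (h : succ m = succ n) : m = n :=
congr_arg pred h

end mynat
```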


This might have come up in your Twitter discussions, but to an “algebraically inclined functional analyst”, the things that “naturally” have traces are not endomorphisms of V but elements of V tensor V*. Here’s one vague reason for my view: if V is a Banach space, then Grothendieck introduced/codified the notion of a nuclear operator from V to itself: these are precisely those that can be written as absolutely convergent sums (in the operator norm of B(V)) of rank-one operators. So you’d like to define the trace of a nuclear operator T as being \sum_i \phi_i(x_i) where T is represented as \sum_i x_i tensor \phi_i … but how does one show this is well-defined?

Answer: in general, it isn’t! The set of nuclear operators on V is the image of a certain map V hat-tensor V* -> B(V) where hat-tensor is a particular normed completion of the usual tensor product of vector spaces. (Or: hat-tensor defines the monoidal structure on Ban that makes Ban into a symmetric monoidal closed category, if I have to talk to categorists or alg-topologists.) But there exist Banach spaces V for which this comparison map is not injective, and on such spaces there is no way to get a unique series representation of a nuclear operator unless one makes some arbitrary or externally-dictated choices, or has some form of spectral theorem which amounts to “finding an eigenbasis”.

So in the context of your original question/problem, it feels to me that the key issue is that the computable bijection from V tensor V* to B(V) in finite dimensions might not have “computable inverse”.


I think you’re right that the inverse is not computable without making some choice. I’m sure I don’t need to tell you this but more evidence that the correct object to define a trace on is V tensor V* is the observation that you can forget topology: this object has a natural trace map on it, and its image in End(V) is precisely the finite rank operators, i.e. the ones which unambiguously have traces. The reason it’s “a bit weird” to write down an inverse of the map from the tensor to the End space is that the inverse only exists in the finite dimensional case so you need to assume something.


Of course the “ultimate nonsense” generality for all this is that endomorphisms of dualizable objects in symmetric monoidal categories have naturally associated traces. In Vect_k, the dualizable objects are just the finite-dimensional vector spaces. How do you prove this? Well, you pick a basis…


This was hashed over on some old MathOverflow answers/comments IIRC. Personally I have always found this somewhat tautologous (at least if the SMC is closed SMC) but I admit this might just be prejudice on my part, and a case of sour grapes from someone deeply invested in the bidual functor on the category of Banach spaces (or operator spaces, or Cstar algebras, or … )


Interesting, this didn’t go the way I expected! I was expecting the problem to be resolved by requiring the dimension to be part of a (finite dimensional) vector space’s definition data but in a lower powered way than the truncation you describe above.

For example, this could proceed by defining a (finite dimensional) vector space to be a set V with the usual operations and properties, a positive integer n, and a pair of functions: f : V^{n+1} -> R^{n+1} and g : V^{n-1} -> V witnessing that every set of size n+1 is linearly dependent and no set of size n-1 spans.

Can such f and g be useful? There’s no linearity condition. That seems like it would limit their usefulness. Thinking about what f and g we could choose, and how to add linearity into the mix, led me in an interesting direction. Suppose V=R^3. What could g be? The familiar cross product R^3 -> R^3 -> R^3 looks like a good candidate! It chooses a vector orthogonal to its two inputs. f also has some exterior algebra flavour. This observation suggests an alternative definition of (finite dimensional) vector space: a set V with the usual operations and properties, a positive integer n, and a proof that the nth exterior power is one dimensional.

This reduces the property of “n dimensional” to the property of “one dimensional”. How do we characterise “one dimensional” without choosing a basis? Perhaps that the second exterior power is zero yet there is a non-zero element. This is uncomfortably close to “choosing a basis” but I don’t think we can get away with exhibiting a non-zero element to prove a vector space is non-trivial any more than we can get away with exhibiting an element to prove a set inhabited.

Relatedly, I once asked a MathOverflow question asking what it really means for a vector space to be naturally isomorphic to its double dual: https://mathoverflow.net/questions/345136/is-a-vector-space-naturally-isomorphic-to-its-dual# Unfortunately I think I presented it in a way that was mistaken for trolling so I didn’t get a satisfactory answer. As far as I can tell there is no formalisation of what it means for that isomorphism to be canonical (or natural). I once thought that the notion of “natural isomorphism” sufficed but as explained in that question it does not.


Your n+1 and n-1 maps involve a lot of choices; why not just choose a map V->R^n? In fact g is enough to get a basis; feed in (0,0,0..,0) to get v1, then feed in (v1,0,0,0,..,0) to get v2 etc. Cross product doesn’t work because it sends (0,0) to 0, which is in the span of 0. Re the MO question: I think you’re wrong about V being naturally isomorphic to V^* — I don’t think that even makes sense because duality is contravariant so you can’t even write down the diagrams which you want to commute. As far as there being no formalisation of what it means for an isomorphism to be canonical, you’re dead right. Another map, perhaps slightly less “canonical” (whatever that means), but still natural, would be the isomorphism V -> V** sending v to where we usually send -v.


Very interesting post!

Here are a few thoughts:

What would go wrong if one defined the dimension of a vector space $V$ to be the supremum over all n such that $\bigwedge^n V \neq 0$?

Why not define the trace as the sum of the eigenvalues?

At a first superficial glance both approaches seem to be possible without choosing a basis.


It’s not the sum of the eigenvalues in general, for several reasons: not enough eigenvalues if field isn’t alg closed, and the phenomenon of generalised eigenvalues even if it is. You need to get to the char poly to see this and this involves picking a basis at some point


It would be better, in describing the explicit double dual map, if you used an explicit v and φ.


In his Linear Algebra Done Right, Axler gives the definition of a trace of a linear operator on a finite dimensional complex/real vector space as the sum of eigenvalues of it/its complexification. An eigenvalue is defined without picking a basis. The multiplicity of an eigenvalue is defined as the dimension of the generalized eigenspace corresponding to the eigenvalue. Does this count as not picking a basis?


I like it! Of course now we have to pick an algebraic closure 😛 (I know of two — the concrete one $\mathbb{C}$, and the abstract splitting field of $x^2+1$ which has two square roots of -1 which you can’t tell apart). Again the issue will be that it’s impossible to define “dimension” without picking a basis. Going from the proposition “is finite-dimensional” to the data “the dimension” involves some magic.


Think of it this way: given a function (f : A -> B -> C) and a proof (h : forall (a : A) (b1 b2 : B), f a b1 = f a b2), define a function (g : A -> C) without invoking B. It turns out that for such a construction to be soundly possible, one must assume B is inhabited/non-empty (*weasel words*), otherwise you would be able to prove true -> false from true -> false -> false, RIP consistency. Now what does inhabited/non-empty mean? Lean has two deceptively similar typeclasses “inhabited” and “nonempty”, welcome to the rabbit hole…
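In Lean 3 the construction in this comment looks something like the following (a sketch; the name `g` is from the comment, everything else is standard):

```lean
-- Given f : A → B → C which provably doesn't depend on its B argument,
-- build g : A → C. We must assume B is nonempty, and we must go
-- noncomputable: classical.choice conjures an element out of the proof.
noncomputable def g {A B C : Type*} [hB : nonempty B]
  (f : A → B → C)
  (h : ∀ (a : A) (b₁ b₂ : B), f a b₁ = f a b₂) : A → C :=
λ a, f a (classical.choice hB)
```

The hypothesis `h` is what guarantees the answer doesn’t depend on which element `classical.choice` happened to produce.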


Yes exactly! `inhabited X` and `nonempty X` both say “X has an element”, and they say it in ways which are indistinguishable to a mathematician (to prove either of them, you have to show Lean an element!). But from a CS point of view, one of them promptly forgets the element and just remembers that some element definitely exists (no idea what it was though); the other one keeps the element around, but this means that the statement is no longer a “theorem” but in fact more like a “definition”…
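The two typeclasses, roughly as they appear in core Lean 3 (primed names here to avoid clashing with the real ones); note that one lives in `Type` (data) and the other in `Prop`:

```lean
universe u

-- Data: a structure that remembers a specific element.
class inhabited' (α : Sort u) := (default : α)

-- Proposition: an inductive Prop that merely asserts an element exists.
class inductive nonempty' (α : Sort u) : Prop
| intro (val : α) : nonempty'

example : inhabited' ℕ := ⟨37⟩  -- the 37 is kept around
example : nonempty' ℕ := ⟨37⟩   -- the 37 is immediately forgotten
```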
