Thanks to this comment, I have finally decided to try to understand in what sense Gaussian elimination and the Steinitz exchange lemma are “basically the same thing”. It’s not at all hard to spot similarities, but it seems to be a little trickier to come up with a purely mechanical process for translating proofs in one language into proofs in the other.
It might be of some interest to know how I approached this post. Rather than working everything out in advance, I started with an incomplete understanding of the connection. I had thought about it enough to convince myself that I could get to the end, but found that as I proceeded there were a few surprises, and the eventual connection was not quite as close as I was expecting. (Actually, this paragraph is slightly misleading. I am writing it while in the middle of writing the rest of the post. I’ve had a few surprises, and though I am fairly sure I’ll get to the end I am not quite sure what the end will look like. [Note added after I'd got to the end: it was nothing like what I expected.])
I am choosing as my example of a statement to be proved the one I mentioned in this comment, namely the fact that vectors in
cannot be linearly independent. Let me briefly sketch the two proofs I want to consider.
First, let us do it by Gaussian elimination. We write the vectors as rows of an matrix and then apply Gaussian elimination to those rows. We could either get the matrix into upper triangular form, which makes the last row entirely zero, and then observe that this row of zeros is, by the way we have produced it, a linear combination of the original
rows. Alternatively, we could put the matrix into reduced row-echelon form and draw the same conclusion. At this stage we can’t really be sure which is more likely to correspond to the use of the Steinitz exchange lemma. So it is probably best to stick with the general thought that there are many ways of using Gaussian elimination to get a row of zeros, and we hope that at least one such way is what the proof of the Steinitz exchange lemma will give if we examine it carefully.
And what is the Steinitz exchange lemma proof? We prove the slightly more abstract fact that if a vector space can be spanned by
vectors
, then no
vectors
in
can be linearly independent. The basic idea of the proof is to replace one of the
by
, then another one by
, and so on, all the while making sure that the set of vectors we now have is still a spanning set. Eventually, we have replaced
by
. Therefore
span
, which means that they contain
in their linear span, which contradicts the independence of the
.
Of course, it remains to prove that we really can do this exchanging process. So let’s suppose, without loss of generality, that we’ve replaced by
and are now trying to replace some other
by
. Our inductive hypothesis tells us that the vectors
span
, so we can write
in the form
. Since the
are linearly independent, there must exist
such that
. But that allows us to write
in terms of
and the
for which
and
. Therefore, if we replace
by
we still have a spanning set. This completes the proof.
Now one way to try to relate this to the previous proof is to convert it into an algorithm. It would be quite a surprise if the algorithm did not closely resemble Gaussian elimination, so once we have produced the algorithm it should be a smallish task to modify it and/or Gaussian elimination until they become the same.
A preliminary question to sort out is what the vectors are to which we will apply the algorithm. We are trying to show that row vectors in
must have some linear dependence, so we would like to replace
of them by some vectors from a spanning set. So it’s obvious what we take as our set that will turn out not to be linearly independent. But what about the spanning set? There are two ways of arriving at the right answer to this question. One is to take the simplest spanning set we can think of, namely the standard basis of
, which, if we write it as a matrix of row vectors, gives us the identity matrix. Alternatively, we could note that Gaussian elimination is a process for replacing a matrix by something closely related to the identity matrix, so this seems a good choice. In fact, looking back at the exchange-lemma proof, we see that we can ignore
in the exchanging process. It’s enough just to show that we can replace
by
. So now, if we’re assuming that the
are linearly independent, Gaussian elimination really will give us the identity matrix.
A small worry we might feel is that with these vectors the exchange lemma is actually replacing the identity matrix by the matrix of the rather than the other way round. But let’s go ahead and do it anyway and worry about that later.
So, the situation we’re in is that we have an matrix
with linearly independent rows and we would like to prove that these rows span
. Our method will be to start with the identity matrix, where we know that the rows span, and replace its rows by the rows of
, maintaining a spanning set at each stage. To harmonize with the previous notation, let’s write
for the
th row of
and
for the
th row of the identity matrix.
The first step is to find one of the that we can replace by
. Following the proof above, we begin by writing
in terms of the
. Of course, since the
are just the standard basis of
, this is not very hard: one might even say that
is already written in terms of the
.
The next step is to find some such that the coefficient of
in the expansion of
, which we called
before, is not zero. Here,
is just the
th entry of
, so all we are doing is finding a non-zero entry in the first row of the matrix
.
Having found this, we comment that this can be written in terms of
and the other
. To prove this, we write
and then rearrange by subtracting all the
apart from
from both sides and dividing through by
. We now know that if we replace
by
then we can still span
. Obviously, we can also span all the other
since they are still in the set, so the set is still a spanning set.
The problem is that merely replacing a row of the identity matrix by the first row of doesn’t seem to correspond to part of the Gaussian elimination process. And that is particularly the case as we continue the process: we replace rows one by one, but the calculations don’t do the replacement for us, so much as provide some assurance that we still have a spanning set at each stage.
In order to keep the calculations on board, so to speak, we should keep track not just of the fact that our set is a spanning set, but also of how it is a spanning set. That could mean various things, but a natural choice presents itself. We justified the replacement of by
by showing how to span
by the other
and
. This we did by writing
as a linear combination of these other vectors. So as we continue the operation, we could keep track of how each replaced
is a linear combination of the new set of vectors. If we can do this right to the end, then we will have shown how each
is a linear combination of the vectors
. Equivalently, we will have worked out the inverse of the matrix
. Since Gaussian elimination is ideal for working out inverses (as you convert
into the identity using row operations, the same row operations applied to the identity will produce
), there seems to be a good chance of seeing Gaussian elimination in action.
With this thought in mind, let’s see if we can describe what we know after the first step of applying the exchange process to the identity matrix. We obtain a new matrix, with replaced by
, and we also know what the inverse of this matrix is. Why do we know the inverse? Well, if
is any square matrix, then the
th row of
consists of coefficients that are needed to express
in terms of the rows of
. (To see this, consider the product
rather than
.) For us, if
it is obvious how to write
in terms of
. As for
, it can be calculated as follows. You start with
, then subtract
for every
, and finally divide by
.
Now a little difficulty seems to be emerging, which is that as this process continues it will be less and less obvious what to do. It was very easy to write in terms of
, but if we’ve already removed a few
s and replaced them by
then it will no longer be obvious how to write
in terms of the current spanning set. One way we might do it is to write
for the matrix corresponding to the spanning set at the
th stage of the process, and to multiply
by the inverse of that matrix before continuing with the exchanging process. In other words, at each stage we would change basis to the spanning set we currently had, and rewrite
in terms of that basis. Then, perhaps, we would find that the calculations we did were Gaussian elimination in reverse (as we hope, since we are turning the identity matrix into a complicated matrix, rather than the other way round).
That I cannot check without going away and doing it with a sheet of paper in front of me. So maybe I’ll do that and report back. For now, here are a couple of other thoughts. First, we could have considered a different exchange process in order for things to work forwards rather than backwards. That is, suppose that is a spanning set and
is an independent set. Now let us try to replace the
by
while preserving independence.
It turns out to be convenient to assume that is an independent set too. This is reasonable, since we can simply pass to a minimal spanning subset. Now we start by trying to find
that we can replace by
. To do this, we first try to write
as a linear combination of the
. If we can’t, then the set
is linearly independent, so we can remove
from it and we have done the replacement we want. If we can, then we have
, and we choose
such that
. It is now straightforward to check that the vectors
are linearly independent. At later stages of this process, one makes the additional remark that, since the
are independent, there must always be a non-zero
that is a coefficient of one of the remaining
.
This again proves the theorem, since once we have replaced of
by
we have a linearly independent set, and we also know that
span. Therefore,
. This argument did not at any point assume that the vectors belonged to
, so it also proves that you can’t have
linearly independent vectors in
(just replace
by
and
by
and get a contradiction).
But a disturbing feature of the above argument is that it is not very algorithmic. Indeed, look at the very first stage: we argue by contradiction. Either the vectors do not span
, in which case we are done, or they do, in which case we look for a non-zero coefficient. But all we are told here is that if we are not done then there must exist a linear combination of the
that gives
. We are not told a way of finding it. Indeed, to find it involves solving
simultaneous equations in
unknowns, and we are simply not bothering with that task. This is quite unlike Gaussian elimination, which very definitely tells us how to find such linear combinations.
So it begins to look as though the two proofs are significantly different, contrary to what somebody once told me (which I have taken on trust ever since). As one final attempt to bridge the gap between the two, let’s try to turn Gaussian elimination into a more theoretical argument. I’d like to argue that if is an
matrix with linearly independent rows, then these rows span
(which implies that
vectors in
cannot be linearly independent). If I can invert
then this will be proved, since I’ll then know how to produce all the vectors in the standard basis. Once again, I’ll write
for the rows of
and
for the rows of the identity matrix.
Gaussian elimination starts by finding a non-zero entry in the first column. In terms of the vectors, we are assuming that is a basis, and looking for
such that its first coefficient with respect to that basis is non-zero. (This seems to be sort of similar to what we have done before, but not identical.) Usually, for convenience, we then permute the
so that
. This is not a significant difference since we could have done that with the exchange process too. Let’s do it here to make the matrix easier to visualize. The next step is to “make all the other entries in the first column zero.” More precisely, let
be the first coefficient of the expansion of
in terms of
(which equals the
-entry of the matrix
). Then we replace
by
and for
we replace
by
. Let us call these new vectors
. It is not hard to check that they are still linearly independent, and that
belong to the linear span of
. By induction, we can do some more replacements of this kind (but involving linear combinations rather than straight exchanges) in such a way that we end up with an independent set
with
in the linear span of
for every $i$. (In matrix terms, we’ve made
upper triangular. Also, the way we did it we gave ourselves
s down the diagonal.)
Now I claim that the span
. To do this, I observe that
and prove inductively that
belongs to the span of
. Indeed, if I’ve proved it for
, then I argue as follows. I know that
belongs to the span of
. I also know that the coefficient of
is 1 and that I can write
in terms of
. Therefore,
equals
plus a linear combination of
, which proves that
is a linear combination of
.
Of course, all this is much easier to see in its original matrix form. But again, it seems not to be the same proof as the exchange lemma since it involves linear combinations.
As I have said and written many times, I think mathematicians should be more explicit about their thought processes. So although this post is a bit of a failure in one sense, it’s going up anyway. Perhaps somebody else will see something that I have missed. For the time being I could summarize my thoughts as follows.
1. If you turn Gaussian elimination into a “theoretical” argument in the most obvious way, then it doesn’t look much like an exchange-lemma proof, even if there are some superficial similarities.
2. At least one exchange-lemma argument appears to be strongly non-algorithmic.
3. The best hope of showing that the two are “essentially the same” seems to be to use the exchange lemma to convert the identity into a more complicated matrix with independent rows, but to keep changing basis as one does so, so that the calculations remain simple. It might be that the result of this process is to do calculations very similar to those of Gaussian elimination, but backwards. I hope to investigate this possibility before too long.
Tags: linear algebra
October 3, 2007 at 6:40 pm |
Dear Tim,
It seems to me that the key fact that underlies exchange lemma is in fact the adjoint of the key fact that underlies Gaussian elimination; the difficulty you are having in equating the two is thus the same difficulty one encounters when trying to identify a vector space with its dual.
To explain, let me use
to denote the real vector space indexed by a finite set A. The exchange lemma algorithm is based on iterating the following:
Lemma 1. If
is non-zero, then there exists
such that
is the direct sum of
(which is isomorphic to
) and
(i.e. these spaces are complementary).
(Proof: expand
in coordinates. There must exist i such that the
coordinate of T(1) is non-zero. The claim then easily follows for this choice of i.)
This lemma is what lets you exchange
for one of the
. (To exchange
for one of the
, one can first quotient out by the span of
and then argue as before.)
In contrast, the ability to use Gaussian elimination to reduce any matrix to row-echelon form is based on iterating the following:
Lemma 2. If
is non-zero, then there exists
such that
is the direct sum of
(which is isomorphic to
) and
.
(Proof: there must exist a basis element
such that
is non-zero. One can then express all the other
as a linear combination of
and something in the kernel of T.)
To see how row-echelon form follows from this, consider n vectors
in a standard vector space
, and let
be defined by setting
to be the first coordinate of
. If T is non-zero, then by Lemma 2 we can use Gaussian elimination to extract one row
with a non-vanishing first coordinate, and n-1 other vectors with vanishing first coordinate; if instead T is zero, then all rows already have vanishing first coordinate. Throwing away
and the first column and iterating, one eventually gets row echelon form.
It is not hard to see that Lemma 1 and Lemma 2 are essentially adjoints of each other.
This strongly suggests that the exchange lemma should in fact be related to the use of column operations to place a matrix in column-echelon form. If we let
denote the rows of an
matrix, column operations correspond to “passive” transformations which change the basis for the vector space
without affecting the vectors
themselves (in contrast, the “active” row transformations modify those vectors but leave the coordinate basis unchanged). To formulate the exchange lemma in this language, I think what one needs to do is express the r independent vectors
, together with the n spanning vectors
, as linear combinations of the n spanning vectors
, leading to a
matrix which has an
identity matrix at the bottom. If one applies adjoint Gaussian elimination to this matrix to place it in column-echelon form, I believe that one obtains all of the
rows and n-r of the
rows as pivot rows (because of the linear independence of the
), and that these pivot rows span the whole space, thus yielding the exchange lemma.
Incidentally, over a finite field F, it is trivial to show by a counting argument that a vector space V spanned by n vectors cannot contain n+1 independent vectors, since the former hypothesis easily implies
and the latter hypothesis would imply
. One can adapt this counting argument to vector spaces over the reals by various discretisation tricks (e.g. using metric entropy, or Minkowski dimension, or taking a numerical perspective and rounding off to some finite degree of accuracy). Alternatively, one can argue more algebraically (or model-theoretically) by invoking a “Lefschetz principle” or “compactness principle” to equate linear algebra assertions over the reals with linear algebra assertions over fields of sufficiently large characteristic.
It seems difficult (though perhaps not entirely impossible) to recast these arguments in algorithmic form.
October 3, 2007 at 10:09 pm |
Dear Terry,
That’s very interesting. I had vaguely wondered about similar ideas, but I was thinking more about inverses than adjoints. I still haven’t got my head completely round what you’ve written, but will at some stage try to convert it into a very explicit demonstration of how the exchange lemma can yield an algorithm that is more or less the same as Gaussian elimination (but probably applied to columns rather than rows).
The last thing you mentioned reminds me of a fascinating fact that I learned as a result of editing an article on model theory for the Princeton Companion. It’s the most gorgeous proof (which, from what you write, you probably know of) of the following fact: if
is an injective polynomial map from
to
then it must be surjective. The rough idea of the proof is the same as what you say above. By model-theoretic considerations one can show that the statement is true in
if and only if it is true in a finite field
of sufficiently high characteristic. But there the statement is trivial. It’s a great example of a non-obvious statement that doesn’t seem to have anything to do with logic, but which is proved by logical means. (The proof was discovered by Ax, apparently. For more details, see the PCM when it comes out.)
Another thing that’s interesting about your remark is that it is connected with a question that I like and that I intend to pursue more on this blog, namely what it means for two proofs to be “essentially the same.” It’s an intriguing phenomenon, because it so often happens that two rather different-looking proofs eventually turn out to be less different than they at first appear, as you make very clear in your article, “What is good mathematics?” Given that, it’s all the more interesting to find two proofs that really do seem to be genuinely different. And the model-theoretic argument that
vectors in
cannot be independent appears to fall into that category (which is not to say that it wouldn’t be worth trying to “expand it out,” just to check).
October 4, 2007 at 4:44 am |
Here are some blog posts related to “homotopies between proofs”. Maybe you’ll find them interesting. The idea of somehow using ideas from topology to study proofs doesn’t seem all that crazy …
Week 227 of This Week’s Finds
Neighborhood of Infinity
Orange Juice Files
October 4, 2007 at 10:12 am |
[...] is, methods for transforming a proof into another that is not interestingly different. See this comment for some interesting links, though here I am not so much looking for a formal theory right down at [...]
October 4, 2007 at 2:41 pm |
I’m still trying to parse everything here, but since it was my comment I wanted to thank you for posting this!
October 4, 2007 at 2:52 pm |
[...] The exchange lemma and Gaussian elimination Thanks to this comment, I have finally decided to try to understand in what sense Gaussian elimination and the Steinitz […] [...]
October 4, 2007 at 10:33 pm |
Easier than using Steinitz seems to be to do things matricially: use row and colummn operations (rather than just row operations) to show easily that any matrix is PAQ equivalent (P and Q being square invertible matrices) to canonical form (having least number of nonzero entries). To prove facts like linear dependence of the columns of any matrix which has less rows than columns, first check the statement is invariant by PAQ equivalence (P and Q being square invertible matrices) and then that it´s obviously true for canonical form. From this the corresponding vectors property (i.e. the linear dependence of n+1 vectors, each of which is a linear combination of n vectors) is easily deduced.
The interest of Steinitz seems to be that it pops up in other contexts…for example, in the proof that Kruskal´s algorithm works (i.e. provides a minimal spanning tree for a weighted graph).
October 5, 2007 at 8:08 pm |
1. This is all fascinating and quite above me (at least for the moment!)
2. Terry, my linear algebra is growing by leaps and bounds, and I am seeing why your comment “As regards the identification between matrices and linear transformations … it’s important to understand both perspectives, and how to swap back and forth.” makes much more sense than my first naive impression.
October 12, 2007 at 12:22 pm |
This is what mathematics ought to be, not the sterile freeze-dried stuff you see in published papers. Mathematics is about intuition, surprise, and delight: why else do we do it?
I’m having a go at conveying these things in writing about a relatively infantile problem in number theory. It’s a challenge – I keep on slipping into mathematical-paper mode – but rewarding when it works.
October 30, 2007 at 4:09 pm |
Dear Timothy,
I have a comment/question concerning the history of the subject. In your blog you speak of the “Steinitz” exchange theorem. Terence Tao uses the same name in his blog, so it seems that this is the common name for this result. Do you know how this name got attached to this result? Who used this name first?
The result of Ernst Steinitz (1871-1928) was published 1913 in Crelles Journal. There he writes that (rough translation follows) “the basics of n-dimensional geometry could have been assumed as well known, but I prefer to rederive them here.” Apparently, the result was already known to Steinitz. Indeed, the result — in almost the exact form as in your blog — already appears in Hermann Grassmann’s Ausdehnungslehre of 1862, a book that was completed in the summer of 1861, i.e. 10 years before Steinitz was born. Steinitz does not cite Grassmann, which is fortunate for him, as now the result is known under his name.
October 30, 2007 at 6:30 pm |
I’m afraid I just tend to use the standard names for things and don’t look into the history. But it’s always interesting to hear from people who do know about it. To make up (partially at any rate) for my laziness in this respect, when I give lectures I tell people that a good rule of thumb is that if somebody’s name is attached to a result (at least if the result is sufficiently simple) then they probably weren’t the first person to prove it. I’ve always wondered how Vandermonde’s theorem came to be known as that: if only all theorems were as easy to obtain …
October 30, 2007 at 10:03 pm |
Regarding names, I’ve always found it curious that almost every mention of Burnside’s lemma seems to be accompanied by the comment that it wasn’t first proved by Burnside. Given that the same is true of most results with a person’s name attached, I don’t understand why Burnside is singled out for such abuse.