Thanks to this comment, I have finally decided to try to understand in what sense Gaussian elimination and the Steinitz exchange lemma are “basically the same thing”. It’s not at all hard to spot similarities, but it seems to be a little trickier to come up with a purely mechanical process for translating proofs in one language into proofs in the other.
It might be of some interest to know how I approached this post. Rather than working everything out in advance, I started with an incomplete understanding of the connection. I had thought about it enough to convince myself that I could get to the end, but found that as I proceeded there were a few surprises, and the eventual connection was not quite as close as I was expecting. (Actually, this paragraph is slightly misleading. I am writing it while in the middle of writing the rest of the post. I’ve had a few surprises, and though I am fairly sure I’ll get to the end I am not quite sure what the end will look like. [Note added after I'd got to the end: it was nothing like what I expected.])
I am choosing as my example of a statement to be proved the one I mentioned in this comment, namely the fact that $n+1$ vectors in $\mathbb{R}^n$ cannot be linearly independent. Let me briefly sketch the two proofs I want to consider.
First, let us do it by Gaussian elimination. We write the $n+1$ vectors as rows of an $(n+1)\times n$ matrix and then apply Gaussian elimination to those rows. We could get the matrix into upper triangular form, which makes the last row entirely zero, and then observe that this row of zeros is, by the way we have produced it, a linear combination of the original rows. Alternatively, we could put the matrix into reduced row-echelon form and draw the same conclusion. At this stage we can't really be sure which is more likely to correspond to the use of the Steinitz exchange lemma. So it is probably best to stick with the general thought that there are many ways of using Gaussian elimination to get a row of zeros, and we hope that at least one such way is what the proof of the Steinitz exchange lemma will give if we examine it carefully.
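To make this concrete, here is a minimal sketch in Python (my own illustration; the function name is invented). Each of the $n+1$ rows is augmented with a standard basis vector of $\mathbb{R}^{n+1}$, so that after forward elimination the trailing columns of the final zero row record exactly which linear combination of the original rows produced it. Exact rational arithmetic is used throughout.

```python
from fractions import Fraction

def zero_row_combination(rows):
    """Row-reduce an (n+1) x n matrix by forward elimination and return
    coefficients expressing a row of zeros as a linear combination of the
    original rows.  Each row is augmented with a standard basis vector of
    R^{n+1}, so the trailing columns track how rows were combined."""
    m, n = len(rows), len(rows[0])
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
           for i, row in enumerate(rows)]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if aug[i][c] != 0), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for i in range(r + 1, m):
            f = aug[i][c] / aug[r][c]
            aug[i] = [a - f * b for a, b in zip(aug[i], aug[r])]
        r += 1
    # With n + 1 rows in R^n the last row must now be zero in its first
    # n entries; its trailing entries are the promised combination.
    last = aug[m - 1]
    assert all(x == 0 for x in last[:n])
    return last[n:]
```

The returned coefficients are never all zero, because the bookkeeping rows stay linearly independent under invertible row operations.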
And what is the Steinitz exchange lemma proof? We prove the slightly more abstract fact that if a vector space $V$ can be spanned by $n$ vectors $u_1,\dots,u_n$, then no $n+1$ vectors $v_1,\dots,v_{n+1}$ in $V$ can be linearly independent. The basic idea of the proof is to replace one of the $u_i$ by $v_1$, then another one by $v_2$, and so on, all the while making sure that the set of vectors we now have is still a spanning set. Eventually, we have replaced $u_1,\dots,u_n$ by $v_1,\dots,v_n$. Therefore $v_1,\dots,v_n$ span $V$, which means that they contain $v_{n+1}$ in their linear span, which contradicts the independence of the $v_i$.
Of course, it remains to prove that we really can do this exchanging process. So let's suppose, without loss of generality, that we've replaced $u_1,\dots,u_{k-1}$ by $v_1,\dots,v_{k-1}$ and are now trying to replace some other $u_i$ by $v_k$. Our inductive hypothesis tells us that the vectors $v_1,\dots,v_{k-1},u_k,\dots,u_n$ span $V$, so we can write $v_k$ in the form $\sum_{j<k}\lambda_jv_j+\sum_{j\geq k}\mu_ju_j$. Since the $v_j$ are linearly independent, there must exist $i\geq k$ such that $\mu_i\neq 0$. But that allows us to write $u_i$ in terms of $v_1,\dots,v_k$ and the $u_j$ for which $j\geq k$ and $j\neq i$. Therefore, if we replace $u_i$ by $v_k$ we still have a spanning set. This completes the proof.
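The exchanging process just described can be written out as an algorithm. The sketch below is my own (both function names are invented); note that the helper `express`, which writes each new vector in terms of the current spanning set, quietly solves a system of simultaneous equations by elimination — a point the post returns to later.

```python
from fractions import Fraction

def express(spanning, target):
    """Find coefficients x with sum_i x[i] * spanning[i] == target, or None
    if target is not in the span.  (This step secretly solves a system of
    simultaneous equations by elimination.)"""
    m, n = len(spanning), len(target)
    # One row per coordinate: columns are the spanning vectors, then target.
    aug = [[Fraction(spanning[i][j]) for i in range(m)] + [Fraction(target[j])]
           for j in range(n)]
    pivots, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, n) if aug[i][c] != 0), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        for i in range(n):
            if i != r and aug[i][c] != 0:
                f = aug[i][c] / aug[r][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(all(row[c] == 0 for c in range(m)) and row[m] != 0 for row in aug):
        return None  # inconsistent: target is outside the span
    x = [Fraction(0)] * m
    for i, c in enumerate(pivots):
        x[c] = aug[i][m] / aug[i][c]
    return x

def steinitz_exchange(spanning, independent):
    """Replace members of `spanning` one by one with the vectors of
    `independent`, keeping a spanning set throughout, as in the proof
    above (len(independent) <= len(spanning) is assumed)."""
    current = [[Fraction(x) for x in u] for u in spanning]
    replaced = set()  # positions already holding one of the independent v's
    for v in independent:
        coeffs = express(current, v)  # exists because `current` spans
        # Independence guarantees a non-zero coefficient on a surviving u:
        i = next(i for i, c in enumerate(coeffs) if c != 0 and i not in replaced)
        current[i] = [Fraction(x) for x in v]
        replaced.add(i)
    return current
```

For instance, exchanging the standard basis of the plane against two independent vectors ends with those two vectors as the spanning set.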
Now one way to try to relate this to the previous proof is to convert it into an algorithm. It would be quite a surprise if the algorithm did not closely resemble Gaussian elimination, so once we have produced the algorithm it should be a smallish task to modify it and/or Gaussian elimination until they become the same.
A preliminary question to sort out is what the vectors are to which we will apply the algorithm. We are trying to show that $n+1$ row vectors in $\mathbb{R}^n$ must have some linear dependence, so we would like to replace $n$ of them by some vectors from a spanning set. So it's obvious what we take as our set $v_1,\dots,v_{n+1}$ that will turn out not to be linearly independent. But what about the spanning set? There are two ways of arriving at the right answer to this question. One is to take the simplest spanning set we can think of, namely the standard basis of $\mathbb{R}^n$, which, if we write it as a matrix of row vectors, gives us the identity matrix. Alternatively, we could note that Gaussian elimination is a process for replacing a matrix by something closely related to the identity matrix, so this seems a good choice. In fact, looking back at the exchange-lemma proof, we see that we can ignore $v_{n+1}$ in the exchanging process. It's enough just to show that we can replace $u_1,\dots,u_n$ by $v_1,\dots,v_n$. So now, if we're assuming that the $v_i$ are linearly independent, Gaussian elimination really will give us the identity matrix.
A small worry we might feel is that with these vectors the exchange lemma is actually replacing the identity matrix by the matrix of the $v_i$ rather than the other way round. But let's go ahead and do it anyway and worry about that later.
So, the situation we're in is that we have an $n\times n$ matrix $A$ with linearly independent rows and we would like to prove that these rows span $\mathbb{R}^n$. Our method will be to start with the identity matrix, where we know that the rows span, and replace its rows by the rows of $A$, maintaining a spanning set at each stage. To harmonize with the previous notation, let's write $v_i$ for the $i$th row of $A$ and $u_i$ for the $i$th row of the identity matrix.
The first step is to find one of the $u_i$ that we can replace by $v_1$. Following the proof above, we begin by writing $v_1$ in terms of the $u_i$. Of course, since the $u_i$ are just the standard basis of $\mathbb{R}^n$, this is not very hard: one might even say that $v_1$ is already written in terms of the $u_i$.
The next step is to find some $i$ such that the coefficient of $u_i$ in the expansion of $v_1$, which we called $\mu_i$ before, is not zero. Here, $\mu_i$ is just the $i$th entry of $v_1$, so all we are doing is finding a non-zero entry in the first row of the matrix $A$.
Having found this non-zero $\mu_i$, we observe that $u_i$ can be written in terms of $v_1$ and the other $u_j$. To prove this, we write $v_1=\sum_j\mu_ju_j$ and then rearrange by subtracting all the $\mu_ju_j$ apart from $\mu_iu_i$ from both sides and dividing through by $\mu_i$. We now know that if we replace $u_i$ by $v_1$ then we can still span $u_i$. Obviously, we can also span all the other $u_j$, since they are still in the set, so the set is still a spanning set.
The problem is that merely replacing a row of the identity matrix by the first row of $A$ doesn't seem to correspond to part of the Gaussian elimination process. And that is particularly the case as we continue the process: we replace rows one by one, but the calculations don't do the replacement for us, so much as provide some assurance that we still have a spanning set at each stage.
In order to keep the calculations on board, so to speak, we should keep track not just of the fact that our set is a spanning set, but also of how it is a spanning set. That could mean various things, but a natural choice presents itself. We justified the replacement of $u_i$ by $v_1$ by showing how to span $u_i$ by the other $u_j$ and $v_1$. This we did by writing $u_i$ as a linear combination of these other vectors. So as we continue the operation, we could keep track of how each replaced $u_i$ is a linear combination of the new set of vectors. If we can do this right to the end, then we will have shown how each $u_i$ is a linear combination of the vectors $v_1,\dots,v_n$. Equivalently, we will have worked out the inverse of the matrix $A$. Since Gaussian elimination is ideal for working out inverses (as you convert $A$ into the identity using row operations, the same row operations applied to the identity will produce $A^{-1}$), there seems to be a good chance of seeing Gaussian elimination in action.
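The parenthetical remark is the standard augmented-matrix algorithm, which can be sketched as follows (a minimal Gauss–Jordan implementation of my own; it assumes the matrix is invertible and uses exact rational arithmetic). The row operations that turn the matrix into the identity are applied simultaneously to an identity matrix sitting alongside it, which therefore ends up as the inverse.

```python
from fractions import Fraction

def inverse_by_row_ops(A):
    """Invert A by applying the row operations that reduce A to the
    identity to an identity matrix alongside it (Gauss-Jordan).
    Assumes A is invertible."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]       # scale pivot to 1
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]
```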
With this thought in mind, let's see if we can describe what we know after the first step of applying the exchange process to the identity matrix. We obtain a new matrix, with $u_i$ replaced by $v_1$, and we also know what the inverse of this matrix is. Why do we know the inverse? Well, if $B$ is any invertible square matrix, then the $i$th row of $B^{-1}$ consists of the coefficients that are needed to express $u_i$ in terms of the rows of $B$. (To see this, consider the product $B^{-1}B$ rather than $BB^{-1}$.) For us, if $j\neq i$ it is obvious how to write $u_j$ in terms of the rows of the new matrix, since $u_j$ is still one of them. As for $u_i$, it can be calculated as follows. You start with $v_1$, then subtract $\mu_ju_j$ for every $j\neq i$, and finally divide by $\mu_i$.
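As a quick sanity check of that remark about rows of the inverse, here is a tiny example of my own (the matrix and its inverse were computed by hand): each row of the inverse really does list the coefficients expressing the corresponding standard basis vector in terms of the rows of the matrix.

```python
from fractions import Fraction as F

# Row i of the inverse of B lists the coefficients needed to express the
# i-th standard basis vector in terms of the rows of B.
B = [[F(2), F(0)], [F(1), F(1)]]
B_inv = [[F(1, 2), F(0)], [F(-1, 2), F(1)]]  # computed by hand
for i in range(2):
    combo = [sum(B_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
    assert combo == [F(int(i == j)) for j in range(2)]  # combo is u_i
```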
Now a little difficulty seems to be emerging, which is that as this process continues it will be less and less obvious what to do. It was very easy to write $v_1$ in terms of $u_1,\dots,u_n$, but if we've already removed a few $u_i$s and replaced them by $v_i$s then it will no longer be obvious how to write the next $v_k$ in terms of the current spanning set. One way we might do it is to write $A_k$ for the matrix corresponding to the spanning set at the $k$th stage of the process, and to multiply $A$ by the inverse of that matrix before continuing with the exchanging process. In other words, at each stage we would change basis to the spanning set we currently had, and rewrite $A$ in terms of that basis. Then, perhaps, we would find that the calculations we did were Gaussian elimination in reverse (as we hope, since we are turning the identity matrix into a complicated matrix, rather than the other way round).
That I cannot check without going away and doing it with a sheet of paper in front of me. So maybe I'll do that and report back. For now, here are a couple of other thoughts. First, we could have considered a different exchange process in order for things to work forwards rather than backwards. That is, suppose that $u_1,\dots,u_n$ is a spanning set and $v_1,\dots,v_{n+1}$ is an independent set. Now let us try to replace the $v_i$ by the $u_i$ while preserving independence.
It turns out to be convenient to assume that $u_1,\dots,u_n$ is an independent set too. This is reasonable, since we can simply pass to a minimal spanning subset. Now we start by trying to find a $v_i$ that we can replace by $u_1$. To do this, we first try to write $u_1$ as a linear combination of the $v_i$. If we can't, then the set $\{u_1,v_1,\dots,v_{n+1}\}$ is linearly independent, so we can remove any $v_i$ from it and we have done the replacement we want. If we can, then we have $u_1=\sum_i\lambda_iv_i$, and we choose $i$ such that $\lambda_i\neq 0$. It is now straightforward to check that $u_1$ and the vectors $v_j$ with $j\neq i$ are linearly independent. At later stages of this process, one makes the additional remark that, since the $u_i$ are independent, there must always be a non-zero coefficient attached to one of the remaining $v_j$.
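Here is the first replacement step of this dual process carried out on toy numbers of my own choosing: we write the first spanning vector in terms of two independent vectors in the plane, pick an index whose coefficient is non-zero, and check that the exchanged pair is still independent.

```python
from fractions import Fraction
F = Fraction

# An independent pair v1, v2 in the plane, and a spanning vector u1 = e1.
v1, v2 = [F(1), F(1)], [F(1), F(2)]
u1 = [F(1), F(0)]

# Write u1 = a*v1 + b*v2 by solving the 2x2 system via Cramer's rule.
det = v1[0] * v2[1] - v2[0] * v1[1]
a = (u1[0] * v2[1] - v2[0] * u1[1]) / det
b = (v1[0] * u1[1] - u1[0] * v1[1]) / det
assert [a * x + b * y for x, y in zip(v1, v2)] == u1

assert a != 0  # so v1 may be exchanged for u1
# The new set {u1, v2} is still independent (non-zero determinant):
assert u1[0] * v2[1] - v2[0] * u1[1] != 0
```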
This again proves the theorem, since once we have replaced $n$ of $v_1,\dots,v_{n+1}$ by $u_1,\dots,u_n$ we have a linearly independent set, and we also know that $u_1,\dots,u_n$ span. Therefore, the remaining $v_j$ lies in their span, which is a contradiction. This argument did not at any point assume that the vectors belonged to $\mathbb{R}^n$, so it also proves that you can't have $m$ linearly independent vectors in a space spanned by $n$ vectors when $m>n$ (just replace $n+1$ by $m$ and $\mathbb{R}^n$ by the space in question and get a contradiction).
But a disturbing feature of the above argument is that it is not very algorithmic. Indeed, look at the very first stage: we argue by contradiction. Either the vectors $v_1,\dots,v_{n+1}$ do not contain $u_1$ in their linear span, in which case we are done, or they do, in which case we look for a non-zero coefficient. But all we are told here is that if we are not done then there must exist a linear combination of the $v_i$ that gives $u_1$. We are not told a way of finding it. Indeed, to find it involves solving $n$ simultaneous equations in $n+1$ unknowns, and we are simply not bothering with that task. This is quite unlike Gaussian elimination, which very definitely tells us how to find such linear combinations.
So it begins to look as though the two proofs are significantly different, contrary to what somebody once told me (which I have taken on trust ever since). As one final attempt to bridge the gap between the two, let's try to turn Gaussian elimination into a more theoretical argument. I'd like to argue that if $A$ is an $n\times n$ matrix with linearly independent rows, then these rows span $\mathbb{R}^n$ (which implies that $n+1$ vectors in $\mathbb{R}^n$ cannot be linearly independent). If I can invert $A$ then this will be proved, since I'll then know how to produce all the vectors in the standard basis. Once again, I'll write $v_i$ for the rows of $A$ and $u_i$ for the rows of the identity matrix.
Gaussian elimination starts by finding a non-zero entry in the first column. In terms of the vectors, we are assuming that $u_1,\dots,u_n$ is a basis, and looking for a $v_i$ such that its first coefficient with respect to that basis is non-zero. (This seems to be sort of similar to what we have done before, but not identical.) Usually, for convenience, we then permute the $v_i$ so that this happens for $i=1$. This is not a significant difference, since we could have done that with the exchange process too. Let's do it here to make the matrix easier to visualize. The next step is to “make all the other entries in the first column zero.” More precisely, let $\mu_i$ be the first coefficient of the expansion of $v_i$ in terms of $u_1,\dots,u_n$ (which equals the $i1$-entry of the matrix $A$). Then we replace $v_1$ by $\mu_1^{-1}v_1$, and for $i>1$ we replace $v_i$ by $v_i-\mu_i\mu_1^{-1}v_1$. Let us call these new vectors $v_1',\dots,v_n'$. It is not hard to check that they are still linearly independent, and that $v_2',\dots,v_n'$ belong to the linear span of $u_2,\dots,u_n$. By induction, we can do some more replacements of this kind (but involving linear combinations rather than straight exchanges) in such a way that we end up with an independent set $w_1,\dots,w_n$ with $w_i$ in the linear span of $u_i,\dots,u_n$ for every $i$. (In matrix terms, we've made $A$ upper triangular. Also, the way we did it, we gave ourselves 1s down the diagonal.)
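The elimination step just described, with each pivot row scaled to put a 1 on the diagonal, looks like this in code (a sketch of mine that, for simplicity, assumes each pivot encountered is non-zero, so no permutations are needed):

```python
from fractions import Fraction

def unit_upper_triangular(A):
    """Forward elimination as described above: scale each pivot row so the
    diagonal entry becomes 1, then clear the entries below it.  Assumes
    every pivot encountered is non-zero (no row swaps, for simplicity)."""
    n = len(A)
    w = [[Fraction(x) for x in row] for row in A]
    for c in range(n):
        w[c] = [x / w[c][c] for x in w[c]]               # 1 on the diagonal
        for i in range(c + 1, n):
            w[i] = [a - w[i][c] * b for a, b in zip(w[i], w[c])]
    return w
```

Applied to a matrix with independent rows (and the stated pivot assumption), the result is a unit upper triangular matrix whose rows are still independent.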
Now I claim that the $w_i$ span $\mathbb{R}^n$. To do this, I observe that $w_n=u_n$ and prove inductively that $u_i$ belongs to the span of $w_i,\dots,w_n$. Indeed, if I've proved it for $i+1,\dots,n$, then I argue as follows. I know that $w_i$ belongs to the span of $u_i,\dots,u_n$. I also know that the coefficient of $u_i$ is 1 and that I can write $u_{i+1},\dots,u_n$ in terms of $w_{i+1},\dots,w_n$. Therefore, $w_i$ equals $u_i$ plus a linear combination of $w_{i+1},\dots,w_n$, which proves that $u_i$ is a linear combination of $w_i,\dots,w_n$.
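This downward induction is ordinary back-substitution, and can be transcribed directly (a sketch of mine; the input is a unit upper triangular matrix, each row being one of the new independent vectors written in the standard basis):

```python
from fractions import Fraction

def express_basis_in_rows(W):
    """Given a unit upper triangular matrix W (rows written in the standard
    basis), return C with u_i = sum_j C[i][j] * W's row j, following the
    downward induction above.  C is precisely the inverse of W."""
    n = len(W)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        # Row i of W equals u_i + sum_{j>i} W[i][j] u_j, so
        # u_i = (row i) - sum_{j>i} W[i][j] u_j, each u_j already expressed.
        C[i][i] = Fraction(1)
        for j in range(i + 1, n):
            coef = Fraction(W[i][j])
            for k in range(n):
                C[i][k] -= coef * C[j][k]
    return C
```

For example, for the matrix with rows $(1,2)$ and $(0,1)$ this returns coefficients saying that the first basis vector is the first row minus twice the second.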
Of course, all this is much easier to see in its original matrix form. But again, it seems not to be the same proof as the exchange lemma since it involves linear combinations.
As I have said and written many times, I think mathematicians should be more explicit about their thought processes. So although this post is a bit of a failure in one sense, it’s going up anyway. Perhaps somebody else will see something that I have missed. For the time being I could summarize my thoughts as follows.
1. If you turn Gaussian elimination into a “theoretical” argument in the most obvious way, then it doesn’t look much like an exchange-lemma proof, even if there are some superficial similarities.
2. At least one exchange-lemma argument appears to be strongly non-algorithmic.
3. The best hope of showing that the two are “essentially the same” seems to be to use the exchange lemma to convert the identity into a more complicated matrix with independent rows, but to keep changing basis as one does so, so that the calculations remain simple. It might be that the result of this process is to do calculations very similar to those of Gaussian elimination, but backwards. I hope to investigate this possibility before too long.
Tags: linear algebra