Cauchy’s theorem is the assertion that the path integral of a complex-differentiable function around a closed curve is zero (as long as there aren’t any holes inside the curve where the function has singularities or isn’t defined). This theorem, which is fundamental to complex analysis, can be vastly generalized and seen from many different points of view. This post is about a little idea that occurred to me once when I was teaching complex analysis. I meant to include it as a Mathematical Discussion on my web page but didn’t get round to it. Now, while the novelty of having a blog still hasn’t worn off, I find I have the energy to put it here.
Let’s start with a simpler fact: that if $f$ is a function from $\mathbb{R}$ to $\mathbb{R}$ and the derivative of $f$ is everywhere zero, then $f$ is constant. What is the natural generalization of this fact to functions defined on the plane?
The most obvious generalization, which, as we shall see, is not the same thing as the most natural one, is just to interpret the individual words of the sentence in the new context. Given a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$, we know what it means for the derivative of this function to be zero everywhere, at least if we’ve seen a bit of multidimensional calculus. For those who haven’t, it has a very similar meaning to the meaning for functions from $\mathbb{R}$ to $\mathbb{R}$: for each $x$ in $\mathbb{R}^2$, the quantity $|f(x+h)-f(x)|/\|h\|$ tends to zero as $h$ tends to zero, where $\|h\|$ is any sensible measure of the size of $h$. (Note that $h$ is a vector in $\mathbb{R}^2$. Perhaps the most common definition of $\|h\|$ is its distance from the origin, as calculated by Pythagoras’s theorem.)
It is fairly easy to prove that if $f$ has zero derivative everywhere then it is constant. To do so, one chooses two points $u$ and $v$ in $\mathbb{R}^2$ and defines an auxiliary function $g(t)=f((1-t)u+tv)$. The idea is that $(1-t)u+tv$ goes in a path (in fact, a straight line) from $u$ to $v$ as $t$ goes from 0 to 1. It is then fairly easy to show that $g$ has derivative 0 everywhere, so it is constant, by the one-dimensional theorem. Therefore, since $g(0)=f(u)$ and $g(1)=f(v)$, we must have $f(u)=f(v)$. Finally, as $u$ and $v$ were arbitrary, we find that $f$ is constant.
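In symbols, the argument runs as follows (I am writing $\nabla f$ for the derivative of $f$, which is notation not used elsewhere in this post): for points $u,v\in\mathbb{R}^2$ set

```latex
g(t) = f\bigl((1-t)u + tv\bigr), \qquad
g'(t) = \nabla f\bigl((1-t)u + tv\bigr)\cdot(v-u) = 0
\quad \text{for every } t \in [0,1],
```

so by the one-dimensional fact $g$ is constant, and $f(u)=g(0)=g(1)=f(v)$.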
Now I would like to claim that that generalization is not a natural one. Or rather, since there clearly is a sense in which it is natural, I want to argue that although it is psychologically natural, it is not mathematically natural. (At this point I had hoped to give a link to Gromov’s article called Spaces and Questions, which contains a discussion of what the word “natural” means in connection with a conjecture. He feels very strongly that a natural conjecture is not just a conjecture that it is natural to think of. For him, a conjecture such as the Riemann hypothesis is a supreme example of what a natural conjecture should be like: not the sort of thing that would idly occur to you, and one that only gradually reveals how very fundamental it is. Unfortunately, I can’t find the article online.)
What reasons might there be to be dissatisfied with the generalization I have just talked about? I took the obvious two-dimensional generalizations of the concepts involved, and, lo and behold, the result was still true. Is that not an ideal outcome?
At this point I must confess to finding it hard to make precise what it is that I object to, but here is an attempt anyway. To begin with, the proof is ugly and non-canonical. It is never pleasant to define a parametrized line segment between two points (though it is often useful of course), especially when any old path would do. It is difficult to regard a proof as fully natural if it involves a choice quite as arbitrary as that one, and yet the statement seems to demand a choice of that kind. A related objection is that the proof shows that the two-dimensional nature of the statement is a bit of an illusion: in order to prove it, we just pick out a one-dimensional part of the function and prove it for that. (This is related to the disappointment that many mathematicians must have felt when first shown the “mean-value inequality” for multidimensional functions. It’s ugly and just doesn’t seem a worthy generalization of the mean-value theorem.)
What might a “genuinely two-dimensional” generalization of the simple fact about zero derivatives look like? To answer this question, let us give a rather loose description of the one-dimensional result and see if it helps. To say that a function $f$ has zero derivative everywhere is to say, very roughly, that $f(x+h)-f(x)$ is much smaller than $|h|$ when $h$ is small. (Incidentally, why am I doing this? The reason is rather similar to the reason that I got rid of the square root in my post about cubics. In both cases I wanted to generalize something that was a bit too complicated for it to be obvious how all the various parts should be modified. So I needed to find a simpler formulation from which I could derive the more complicated one. The technique works if you manage to generalize the simpler formulation and can carry out an analogous derivation of the complicated formulation in the new context. The difference here is that my method of simplification is to make the idea to be generalized less precise, but that is not a profound difference, because I know how to make it precise — that’s the “derivation” in this case.) If that condition holds, then $f$ is constant. And “$f$ is constant” means that, for every $x$ and every $h$, $f(x+h)$ is actually equal to $f(x)$. To summarize, the rough one-dimensional statement is that if $f(x+h)-f(x)$ is very small compared with $|h|$ when $h$ is small, then $f(x+h)-f(x)$ is in fact equal to zero everywhere.
Now let’s ask what the equivalent is, in two dimensions, of picking two points $u$ and $v$ in $\mathbb{R}$ and calculating the difference $f(v)-f(u)$. We’ve tried picking two points in $\mathbb{R}^2$, but now, with our rough formulation, it is clear that that was not our only option. We have increased the dimension we are working in from 1 to 2. When we picked two points, we were picking zero-dimensional objects. So what integer relates to 2 in the way that 0 relates to 1? There are two obvious candidates: 0 and 1. We’ve tried 0 and didn’t like the result, so let’s try 1.
What this line of thought is telling us is that we should generalize the statements about points in $\mathbb{R}$ to statements about 1-dimensional subsets of $\mathbb{R}^2$. But what subsets?
Let’s give ourselves a bit more information. When we calculate $f(v)-f(u)$, we are doing two things. First we evaluate $f$ at $u$ and $v$ and then we subtract the first value from the second. So we are giving a different status to $u$ and $v$, associating a minus sign with the first and a plus sign with the second. That is, we have not just two points but two signed points, one positive and one negative.
There are many natural ways of generalizing this notion to a one-dimensional subset of $\mathbb{R}^2$. But they all tend to start with the thought that there is a one-to-one correspondence between such pairs and directed line segments: with the pair $u$ (negative) and $v$ (positive) one associates the line segment from $u$ to $v$. It is natural to think of the points $u$ and $v$ as forming the boundary of this line segment, and of the signs associated with $u$ and $v$ as a sort of zero-dimensional “orientation”, corresponding to the “orientation” (or direction) of the segment. Perhaps it isn’t that natural if you’ve only ever lived in a one-dimensional world, but if you are already used to “generalizing backwards” (for example, saying that the next term in the sequence “unit sphere”, “unit circle”, … is “the set $\{-1,1\}$”), then the notion of a sign being a very primitive orientation should not be all that strange.
If we pursue this line of thought, bearing in mind that we want to increase the dimension of everything by 1, then we will find ourselves looking for an oriented one-dimensional set that arises as the boundary of an oriented two-dimensional set. This suggests many different possibilities, depending on whether we want to be specific or general. For instance, we could think about a familiar sequence of shapes, such as circle, sphere, 3-sphere, … or square, cube, hypercube,… in which case we would choose as our generalization something like a directed circle or square. Alternatively, since these shapes seem equally good, we could just go for any old directed closed curve.
Let’s leave that question to one side, and see where we have got to in our attempt to find a better generalization of the fact that functions with zero derivative are constant. For now I’ll use the word “curve” to mean “directed closed curve, either specific or general”. Before we worry about which curves to take, it would be a good idea to decide how we are going to generalize the concept from pairs of signed points to curves. That is, given a function $f$ and a curve $C$, what meaning can we attach to the analogue of the difference $f(v)-f(u)$?
To answer this, recall that we thought of the signs at $u$ and $v$ as something like orientations or directions. If we had wanted to be more formal, we could have associated the number $-1$ with $u$ and the number $+1$ with $v$ and said that we were considering the sum $(-1)f(u)+(+1)f(v)$. And, given that we are thinking about directions and hoping to generalize to two dimensions, it doesn’t seem such a bad idea to think of $-1$ and $1$ as vectors, in the one-dimensional space $\mathbb{R}$.
Now let’s return to the two-dimensional set-up. What would be a natural way of associating a vector to each point of a directed curve? The obvious idea is to take a tangent vector in the direction of the curve. But what should its magnitude be? The most obvious idea is to make the magnitude 1 everywhere, but somehow this feels a bit arbitrary (though in fact it turns out to be OK). A different approach is to define a curve to be a function $\gamma$ from $[0,1]$ to $\mathbb{R}^2$ such that $\gamma(0)=\gamma(1)$. This is a very standard move when one is dealing with paths, and it then allows us to define the tangent vector at $\gamma(t)$ to be the derivative $\gamma'(t)$.
So let’s go for that: a curve means a function $\gamma$ of the kind above, and associated with each point $\gamma(t)$ in the curve is its tangent vector $\gamma'(t)$ (so it might be an idea to insist that $\gamma$ is differentiable).
Now let us return to the problem of defining the analogue of $f(v)-f(u)$. In the one-dimensional case we took a weighted sum of values of $f$, with the weights given by the vectors at the points $u$ and $v$. We can’t take a sum over all the points of the curve, but we could take an integral. The weights are the values $\gamma'(t)$, so the suggestion seems to be that we calculate the integral $\int_0^1 f(\gamma(t))\gamma'(t)\,dt$. That is, we integrate the values of $f$ round the curve, weighting each one with the tangent vector at the corresponding point.
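To make the definition concrete, here is a minimal numerical sketch in Python (the helper names `path_integral`, `gamma` and `dgamma` are mine, purely for illustration). It approximates $\int_0^1 f(\gamma(t))\gamma'(t)\,dt$ by a Riemann sum, and checks the simplest sanity property: for a constant $f$ on a closed curve the weighted tangent vectors cancel, just as $f(v)-f(u)=0$ for a constant $f$ in one dimension.

```python
import math

def path_integral(f, gamma, dgamma, n=100000):
    """Approximate the integral over [0,1] of f(gamma(t)) * gamma'(t) dt
    by a midpoint Riemann sum.  f is real-valued and gamma takes values
    in the plane, so the result is a 2-vector."""
    sx = sy = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        w = f(*gamma(t)) / n
        dx, dy = dgamma(t)
        sx += w * dx
        sy += w * dy
    return sx, sy

# A closed curve: the unit circle, traversed once anticlockwise.
def gamma(t):
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def dgamma(t):
    return (-2 * math.pi * math.sin(2 * math.pi * t),
            2 * math.pi * math.cos(2 * math.pi * t))

# Constant f: the integral should vanish, like f(v) - f(u) for constant f.
ix, iy = path_integral(lambda x, y: 3.0, gamma, dgamma)
# ix and iy are both ~0
```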
How interesting: the correct generalization of the difference $f(v)-f(u)$ is a path integral! It’s not the same as the path integral in Cauchy’s theorem, since $f$ is real-valued. (We could have made it complex-valued, but that would have been a rabbit out of a hat, so it is not allowed. It will of course be allowed if the idea arises of its own accord.)
Now let’s think about how to generalize the idea of having derivative zero everywhere in $\mathbb{R}$. The answer seems obvious—surely it is having derivative zero everywhere in $\mathbb{R}^2$. But to think that is to forget our rough restatement from earlier on. We described a function with zero derivative as one where $f(x+h)-f(x)$ was much smaller than $|h|$ whenever $h$ was small. We’ve gone to a lot of trouble to generalize the notion of $f(v)-f(u)$. It has become the path integral of $f$ around a curve $C$. Since $C$ is what generalizes the notion of the pair of signed points $u$ and $v$, the obvious generalization of the rough one-dimensional definition of having a zero derivative everywhere is this: if $C$ is small then the path integral of $f$ around $C$ is much smaller.
In the one-dimensional case, we called a pair $(u,v)$ small if $|u-v|$ was small. What would be the appropriate definition of smallness for $C$? We’d probably like something geometrical, since $C$ is more naturally thought of as a geometrical object than the function $\gamma$ is. So we could interpret $|u-v|$ as the diameter of the pair $\{u,v\}$ and define $C$ to be small if it has small diameter in $\mathbb{R}^2$. If this turns out to be a bad definition later on, we can always modify it.
To keep the notation concise, let’s write $\int_C f$, as we sort of did earlier, for the path integral of $f$ around $C$ (which is itself a parametrized closed curve given by a function $\gamma$, as discussed above). Now we need to think about when it is appropriate to say that $\int_C f$ is much smaller than the diameter of $C$.
Again we can take our inspiration from the one-dimensional case. There the typical behaviour of a differentiable function is that $f(x+h)-f(x)$ should have order of magnitude $|h|$: it has smaller order of magnitude only in the fortunate situation where the derivative is zero, and indeed that is what it means to have derivative zero. So in the two-dimensional case we should think about what the order of magnitude of $\int_C f$ is likely to be. Now our way of measuring the size of $C$ starts to look a little unsatisfactory: better to choose the length of $C$. This can be defined satisfactorily as the integral over $t$ of the magnitude of $\gamma'(t)$, or if you want to keep life simple you can decide at this point to look just at circles or squares or something. The trivial upper bound for the size of $\int_C f$ is the maximum absolute value of $f$ multiplied by the length of $C$. If $C$ is shrinking to zero in the neighbourhood of some point $x$ and $f$ is continuous, then after a while we can think of $f$ as roughly constant (since we are restricted to a small neighbourhood of $x$), so this has order of magnitude the length of $C$. So it looks like a good idea to generalize the zero-derivative property as follows: if $C$ has small length then the path integral $\int_C f$ is significantly smaller.
However, we have been a little careless. To see why, let’s imagine that $C$ is a little square, centred at $x$, and that the function $\gamma$ takes us anticlockwise once round this square at constant speed. Then the tangent at each point of $C$ is a small vector pointing in the anticlockwise direction. The path integral is then the sum of four parts. Since $C$ is small and $f$ is continuous (this we are assuming the whole time), $f$ is approximately equal to $f(x)$ everywhere on $C$. But then the four contributions to the path integral are all of roughly the same magnitude, and they point north, west, south and east. But that means that they more or less cancel: in particular, the path integral is a lot smaller than the length of $C$.
If we now take a bit more trouble and work out the path integral for a typical differentiable function (just as we did in the one-dimensional case), which we may as well assume is linear (since it is, roughly), then we find that the typical magnitude of $\int_C f$ is proportional to the square of the length of $C$ and not the length itself. Roughly, the reason is this: one power comes from the fact that the tangent vectors have size proportional to the length of $C$ and the other comes from the fact that the amount $f$ can change from one side of the square to the other (thereby defeating the cancellation) is proportional to the length of $C$ as well. So now let us correct our generalized property to the following: if the length of $C$ is small, then the path integral $\int_C f$ is much smaller than the square of this length.
What does this property tell us about the function $f$? Well, continuing with the little squares, let’s take one with corners $(u-s,v-s)$, $(u+s,v-s)$, $(u+s,v+s)$ and $(u-s,v+s)$. Let us continue to assume that $f$ is linear, and given by the formula $f(x,y)=ax+by+c$. Then the path integral along the bottom side, from $(u-s,v-s)$ to $(u+s,v-s)$, is the average value of $f$ on this segment, multiplied by the derivative of $\gamma$ and by the time spent on the segment. If $\gamma$ goes round the square at constant speed then the latter product turns out to be $(2s,0)$. So this part of the path integral works out at $(2s(au+b(v-s)+c),0)$. Doing similar calculations for all four parts and adding everything together, the only thing that doesn’t cancel works out at $4s^2(-b,a)$ (if I haven’t made a mistake).
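A calculation of this sort is easy to check numerically. The sketch below (in Python; `square_integral` is a name I have made up) computes the path integral of the linear function $f(x,y)=ax+by+c$ around the square with corners $(u\pm s, v\pm s)$, traversed anticlockwise, and compares it with the prediction $4s^2(-b,a)$.

```python
def square_integral(a, b, c, u, v, s, n=4000):
    """Path integral of f(x,y) = a*x + b*y + c around the square with
    corners (u-s, v-s), (u+s, v-s), (u+s, v+s), (u-s, v+s), traversed
    anticlockwise, approximated side by side with a midpoint rule
    (which is exact for linear integrands, up to rounding)."""
    corners = [(u - s, v - s), (u + s, v - s), (u + s, v + s), (u - s, v + s)]
    ix = iy = 0.0
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        for k in range(n):
            t = (k + 0.5) / n
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            w = (a * x + b * y + c) / n
            ix += w * (x1 - x0)   # value of f times the tangent direction
            iy += w * (y1 - y0)
    return ix, iy

ix, iy = square_integral(a=2.0, b=5.0, c=1.0, u=0.3, v=-0.7, s=0.01)
# prediction: 4*s**2*(-b, a) = (-0.002, 0.0008)
```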
When will this be small compared with the square of the perimeter of the square, or equivalently, small compared with $s^2$? It seems that we need both $a$ and $b$ to be zero. But the purpose of $a$ and $b$ was to give us the best linear approximation to $f$ near $(u,v)$. So what we seem to have ended up with is the condition that the derivative of $f$ should be zero.
This is embarrassing: we are back to the same condition as before, but now the conclusion we draw from it is weaker (that path integrals round closed curves are zero, which follows easily from $f$ being constant, while the converse is far from true).
We could just give up at this point, but not if we have sensible mathematical instincts, one of which is, “Don’t give up until you have gone round and round in circles at least 100 times.” We haven’t even done so once. Here is one way (not the only one) of rescuing ourselves. There are a couple of places earlier where we made choices that were not forced on us. One was when we generalized $f$ to a function defined on $\mathbb{R}^2$. We decided to keep $f$ as a real-valued function, but it would have been just as natural (in the non-Gromov sense) to make it take values in $\mathbb{R}^2$. So perhaps we should try that.
An immediate discouraging thought is that functions to $\mathbb{R}^2$ can often be thought of merely as pairs of functions to $\mathbb{R}$. Let’s keep that in mind as a challenge: at some point, if this is going to help, we must do something that isn’t just considering two functions at once. A second place where we made an arbitrary choice was when we defined the path integral. We decided to use the forward tangent vector, but it would have been equally natural, as far as we could tell at the time, to use the outward normal.
With those thoughts in mind, let us generalize the notion of path integral from $\mathbb{R}$-valued functions to $\mathbb{R}^2$-valued functions. If we want to keep the definition as close as possible to what we had before, then the main thing we have to decide is how to “multiply” $f(\gamma(t))$ by $\gamma'(t)$, given that both objects now belong to $\mathbb{R}^2$. At this point we can either use our prior knowledge that by far the best product on $\mathbb{R}^2$ is obtained by identifying it with $\mathbb{C}$, or we can think of $f$ as two real-valued functions and remark (influenced by the thoughts of the previous paragraph) that we should do different things to the two components. So why not multiply one by the tangent and the other by the normal? That gives you multiplication of complex numbers (up to the symmetry between $i$ and $-i$).
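To spell out that last claim: write $f=(f_1,f_2)$, let the tangent vector be $(p,q)$, and take $(-q,p)$ as the normal (the tangent rotated anticlockwise through a right angle). Multiplying one component by the tangent and the other by the normal gives

```latex
f_1\,(p,q) + f_2\,(-q,p) = (f_1 p - f_2 q,\; f_1 q + f_2 p),
```

which is exactly the pair of real and imaginary parts of $(f_1+if_2)(p+iq)$. The other choice of normal, $(q,-p)$, produces the real and imaginary parts of $(f_1-if_2)(p+iq)$ instead, which is the symmetry between $i$ and $-i$.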
So now everything is as before except that the function $f$ is complex-valued and the product in the path integral is a product of complex numbers.
At this point I don’t want to say much more. If you go back to the estimate of the path integral for linear functions (regarding them as functions from $\mathbb{C}$ to $\mathbb{C}$) you now find that the only functions that work are ones of the form $f(z)=\lambda z+\mu$, that is, the complex-linear ones. This tells you that $f$ satisfies the Cauchy-Riemann equations. In other words, the condition that $\int_C f$ is much smaller than the square of the length of $C$ is telling us that $f$ is analytic. And therefore, after a long journey, we end up with the main assertion of this post: that Cauchy’s theorem is the natural two-dimensional generalization of the statement that a function from $\mathbb{R}$ to $\mathbb{R}$ with zero derivative is constant.
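To close with a numerical illustration of my own (not part of the argument above): with complex multiplication in place, the path integral of an analytic function such as $z\mapsto z^2$ around a closed curve vanishes, while for the non-analytic $z\mapsto\bar z$ it comes out as $2i$ times the enclosed area.

```python
import cmath
import math

def complex_path_integral(f, n=100000):
    """Midpoint Riemann-sum approximation to the integral of f(z) dz
    once anticlockwise around the unit circle; the weighting by the
    tangent vector is now complex multiplication."""
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        z = cmath.exp(2j * math.pi * t)  # point on the curve
        dz = 2j * math.pi * z            # tangent vector gamma'(t)
        total += f(z) * dz / n
    return total

analytic = complex_path_integral(lambda z: z * z)              # Cauchy-Riemann holds
non_analytic = complex_path_integral(lambda z: z.conjugate())  # it fails
# analytic is ~0; non_analytic is ~2*pi*1j, i.e. 2i times the area of the unit disc
```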
One possible objection to the above is that I am assuming a slightly stronger hypothesis in the two-dimensional case: not just that $\int_C f$ is much smaller than the square of the length of $C$, but also that $f$ is differentiable as a function from $\mathbb{R}^2$ to $\mathbb{R}^2$. I haven’t looked into whether the first implies the second.
Another idea that I have not explored is that this account could probably be extended until it ended up with a statement of Stokes’s theorem.
A final remark, which I may perhaps amplify on in a future post, is that it is possible to prove the one-dimensional statement in a way that generalizes very naturally to the standard proof of Cauchy’s theorem for triangles. But this post is now quite long enough.