Cauchy’s theorem is the assertion that the path integral of a complex-differentiable function around a closed curve is zero (as long as there aren’t any holes inside the curve where the function has singularities or isn’t defined). This theorem, which is fundamental to complex analysis, can be vastly generalized and seen from many different points of view. This post is about a little idea that occurred to me once when I was teaching complex analysis. I meant to include it as a Mathematical Discussion on my web page but didn’t get round to it. Now, while the novelty of having a blog still hasn’t worn off, I find I have the energy to put it here.

Let’s start with a simpler fact: that if $f$ is a function from $\mathbb{R}$ to $\mathbb{R}$ and the derivative of $f$ is everywhere zero, then $f$ is constant. What is the natural generalization of this fact to functions defined on the plane?

The most *obvious* generalization, which is not the same thing, as we shall see, is just to interpret the individual words of the sentence in the new context. Given a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$, we know what it means for the derivative of this function to be zero everywhere, at least if we’ve seen a bit of multidimensional calculus. For those who haven’t, it has a very similar meaning to the meaning for functions from $\mathbb{R}$ to $\mathbb{R}$: for each $x$ in $\mathbb{R}^2$, the quantity $|f(x+h)-f(x)|/\|h\|$ tends to zero as $h$ tends to zero, where $\|h\|$ is any sensible measure of the size of $h$. (Note that $h$ is a vector in $\mathbb{R}^2$. Perhaps the most common definition of $\|h\|$ is its distance from the origin, as calculated by Pythagoras’s theorem.)

It is fairly easy to prove that if $f$ has zero derivative everywhere then it is constant. To do so, one chooses two points $x$ and $y$ in $\mathbb{R}^2$ and defines an auxiliary function $g(t)=f((1-t)x+ty)$. The idea is that $(1-t)x+ty$ goes in a path (in fact, a straight line) from $x$ to $y$ as $t$ goes from 0 to 1. It is then fairly easy to show that $g$ has derivative 0 everywhere, so it is constant, by the one-dimensional theorem. Therefore, since $g(0)=f(x)$ and $g(1)=f(y)$, we must have $f(x)=f(y)$. Finally, as $x$ and $y$ were arbitrary, we find that $f$ is constant.
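The step “$g$ has derivative 0 everywhere” is a single application of the chain rule. A sketch, assuming $f$ is differentiable at every point with derivative $Df(z)$, a linear map from $\mathbb{R}^2$ to $\mathbb{R}$ (this notation is introduced here, not in the argument above):

```latex
g(t) = f\big((1-t)x + ty\big), \qquad
g'(t) = Df\big((1-t)x + ty\big)\,(y - x) = 0 \quad \text{for every } t \in [0,1],
\qquad \text{so} \qquad f(y) = g(1) = g(0) = f(x).
```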

Now I would like to claim that that generalization is *not* a natural one. Or rather, since there clearly is a sense in which it is natural, I want to argue that although it is *psychologically* natural, it is not *mathematically* natural. (At this point I had hoped to give a link to Gromov’s article called Spaces and Questions, which contains a discussion of what the word “natural” means in connection with a conjecture. He feels very strongly that a natural conjecture is *not* just a conjecture that it is natural to think of. For him, a conjecture such as the Riemann hypothesis is a supreme example of what a natural conjecture should be like: not the sort of thing that would idly occur to you, and only gradually revealing how very fundamental it is. Unfortunately, I can’t find the article online.)

What reasons might there be to be dissatisfied with the generalization I have just talked about? I took the obvious two-dimensional generalizations of the concepts involved, and, lo and behold, the result was still true. Is that not an ideal outcome?

At this point I must confess to finding it hard to make precise what it is that I object to, but here is an attempt anyway. To begin with, the proof is ugly and non-canonical. It is never pleasant to define a parametrized line segment between two points (though it is often useful of course), especially when any old path would do. It is difficult to regard a proof as fully natural if it involves a choice quite as arbitrary as that one, and yet the statement seems to demand a choice of that kind. A related objection is that the proof shows that the two-dimensional nature of the statement is a bit of an illusion: in order to prove it, we just pick out a one-dimensional part of the function and prove it for that. (This is related to the disappointment that many mathematicians must have felt when first shown the “mean-value inequality” for multidimensional functions. It’s ugly and just doesn’t seem a worthy generalization of the mean-value theorem.)

What might a “genuinely two-dimensional” generalization of the simple fact about zero derivatives look like? To answer this question, let us give a rather loose description of the one-dimensional result and see if it helps. To say that a function $f$ has zero derivative everywhere is to say, very roughly, that $|f(y)-f(x)|$ is much smaller than $|y-x|$ when $|y-x|$ is small. (Incidentally, why am I doing this? The reason is rather similar to the reason that I got rid of a square root in my post about cubics. In both cases I wanted to generalize something that was a bit too complicated for it to be obvious how all the various parts should be modified. So I needed to find a simpler formulation from which I could derive the more complicated one. The technique works if you manage to generalize the simpler formulation and can carry out an analogous derivation of the complicated formulation in the new context. The difference here is that my method of simplification is to make the idea to be generalized less precise, but that is not a profound difference, because I know how to make it precise: that’s the “derivation” in this case.) If that condition holds, then $f$ is constant. And “$f$ is constant” means that, for every $x$ and every $y$, $f(y)$ is actually *equal* to $f(x)$. To summarize, the rough one-dimensional statement is that if $|f(y)-f(x)|$ is very small when $|y-x|$ is small, then $f(y)-f(x)$ is in fact equal to zero everywhere.

Now let’s ask what the equivalent is, in two dimensions, of picking two points $x$ and $y$ in $\mathbb{R}$ and calculating the difference $f(y)-f(x)$. We’ve tried picking two points in $\mathbb{R}^2$, but now, with our rough formulation, it is clear that that was not our only option. We have increased the dimension we are working in from 1 to 2. When we picked two points, we were picking zero-dimensional objects. So what integer relates to 2 in the way that 0 relates to 1? There are two obvious candidates: 0 and 1. We’ve tried 0 and didn’t like the result, so let’s try 1.

What this line of thought is telling us is that we should generalize the statements about points in $\mathbb{R}$ to statements about 1-dimensional subsets of $\mathbb{R}^2$. But what subsets?

Let’s give ourselves a bit more information. When we calculate $f(y)-f(x)$, we are doing two things. First we evaluate $f$ at $x$ and $y$ and then we subtract the first value from the second. So we are giving a different status to $x$ and $y$, associating a minus sign with the first and a plus sign with the second. That is, we have not just two points but two *signed* points, one positive and one negative.

There are many natural ways of generalizing this notion to a one-dimensional subset of $\mathbb{R}^2$. But they all tend to start with the thought that there is a one-to-one correspondence between such pairs and directed line segments: with the pair $x$ (negative) and $y$ (positive) one associates the line segment from $x$ to $y$. It is natural to think of the points $x$ and $y$ as forming the *boundary* of this line segment, and of the signs associated with $x$ and $y$ as a sort of “orientation” associated with the “orientation” (or direction) of the segment. Perhaps it isn’t that natural if you’ve only ever lived in a one-dimensional world, but if you are already used to “generalizing backwards” (for example, saying things like that the next term in the sequence “unit sphere”, “unit circle”, … is “the set $\{-1,1\}$”), then the notion of a sign being a very primitive orientation should not be all that strange.

If we pursue this line of thought, bearing in mind that we want to increase the dimension of everything by 1, then we will find ourselves looking for an *oriented* one-dimensional set that arises as the boundary of an oriented two-dimensional set. This suggests many different possibilities, depending on whether we want to be specific or general. For instance, we could think about a familiar sequence of shapes, such as circle, sphere, 3-sphere, … or square, cube, hypercube,… in which case we would choose as our generalization something like a directed circle or square. Alternatively, since these shapes seem equally good, we could just go for any old directed closed curve.

Let’s leave that question to one side, and see where we have got to in our attempt to find a better generalization of the fact that functions with zero derivative are constant. For now I’ll use the word “curve” to mean “directed closed curve, either specific or general”. Before we worry about which curves to take, it would be a good idea to decide how we are going to generalize the concept from pairs of signed points to curves. That is, given a function $f:\mathbb{R}^2\to\mathbb{R}$ and a curve $C$, what meaning can we attach to $f(C)$?

To answer this, recall that we thought of the signs at $x$ and $y$ as something like orientations or directions. If we had wanted to be more formal, we could have associated the number $-1$ with $x$ and the number $+1$ with $y$ and said that we were considering the sum $(-1)f(x)+(+1)f(y)$. And, given that we are thinking about directions and hoping to generalize to two dimensions, it doesn’t seem such a bad idea to think of $-1$ and $1$ as *vectors*, in the one-dimensional space $\mathbb{R}$.

Now let’s return to the two-dimensional setup. What would be a natural way of associating a vector to each point of a directed curve? The obvious idea is to take a tangent vector in the direction of the curve. But what should its magnitude be? The most obvious idea is to make the magnitude 1 everywhere, but somehow this feels a bit arbitrary (though in fact it turns out to be OK). A different approach is to define a curve to be a function $\gamma$ from $[0,1]$ to $\mathbb{R}^2$ such that $\gamma(0)=\gamma(1)$. This is a very standard move when one is dealing with paths, and it then allows us to define the tangent vector at $\gamma(t)$ to be the derivative $\gamma'(t)$.

So let’s go for that: a curve means a function $\gamma$ of the kind above, and associated with each point $\gamma(t)$ in the curve is its tangent vector $\gamma'(t)$ (so it might be an idea to insist that $\gamma$ is differentiable).

Now let us return to the problem of defining $f(C)$. In the one-dimensional case we took a weighted sum of values of $f$, with the weights given by the vectors $-1$ and $+1$ at the points $x$ and $y$. We can’t take a sum over all the points of the curve, but we could take an *integral*. The weights are the values $\gamma'(t)$, so the suggestion seems to be that we calculate the integral $\int_0^1 f(\gamma(t))\,\gamma'(t)\,dt$. That is, we integrate the values of $f$ round the curve, weighting each one with the tangent vector at the corresponding point.
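A minimal numerical sketch of this definition, assuming the curve is given by a parametrization $\gamma$ on $[0,1]$ together with its derivative. The helper name `path_integral` and the midpoint rule are illustrative choices, not anything from the post. Because each weight $\gamma'(t)$ is a vector, the result is a vector; for constant $f$ it is $f$ times $\gamma(1)-\gamma(0)=0$, the analogue of $f(y)-f(x)=0$.

```python
import math

def path_integral(f, gamma, dgamma, n=20000):
    """Midpoint-rule approximation of f(C) = integral over [0,1] of f(gamma(t)) * gamma'(t) dt.

    f maps a point (x, y) to a real number; gamma and dgamma map t to 2-tuples.
    The result is a 2-vector because each weight gamma'(t) is a vector.
    """
    sx = sy = 0.0
    for k in range(n):
        t = (k + 0.5) / n           # midpoint of the k-th subinterval
        x, y = gamma(t)
        dx, dy = dgamma(t)
        w = f(x, y)
        sx += w * dx
        sy += w * dy
    return (sx / n, sy / n)

# Unit circle traversed once anticlockwise, so gamma(0) = gamma(1).
circle = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
dcircle = lambda t: (-2 * math.pi * math.sin(2 * math.pi * t),
                     2 * math.pi * math.cos(2 * math.pi * t))

print(path_integral(lambda x, y: 3.0, circle, dcircle))  # constant f: approximately (0, 0)
print(path_integral(lambda x, y: x, circle, dcircle))    # f(x, y) = x: approximately (0, pi)
```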

How interesting: the correct generalization of $f(y)-f(x)$ is a path integral! It’s not the same as the path integral in Cauchy’s theorem, since $f$ is real-valued. (We could have made it complex-valued, but that would have been a rabbit out of a hat, so it is not allowed. It will of course be allowed if the idea arises of its own accord.)

Now let’s think about how to generalize the idea of having derivative zero everywhere in $\mathbb{R}$. The answer seems obvious: surely it is having derivative zero everywhere in $\mathbb{R}^2$. But to think that is to forget our rough restatement from earlier on. We described a function with zero derivative as one where $|f(y)-f(x)|$ was much smaller than $|y-x|$ whenever $|y-x|$ was small. We’ve gone to a lot of trouble to generalize the notion of $f(y)-f(x)$. It has become the path integral of $f$ around a curve $C$. Since $C$ is what generalizes the notion of the pair of signed points $x$ and $y$, the obvious generalization of the rough one-dimensional definition of having a zero derivative everywhere is this: if $C$ is small then the path integral of $f$ around $C$ is much smaller.

In the one-dimensional case, we called a pair $(x,y)$ small if $|y-x|$ was small. What would be the appropriate definition of smallness for $C$? We’d probably like something geometrical, since $C$ is most naturally thought of as a geometrical object. So we could interpret $|y-x|$ as the diameter of the pair $\{x,y\}$ and define $C$ to be small if it has small diameter in $\mathbb{R}^2$. If this turns out to be a bad definition later on, we can always modify it.

To keep the notation concise, let’s write $f(C)$, as we sort of did earlier, for the path integral of $f$ around $C$ (which is itself a parametrized closed curve given by a function $\gamma$, as discussed above). Now we need to think about when it is appropriate to say that $f(C)$ is much smaller than the diameter of $C$.

Again we can take our inspiration from the one-dimensional case. There the typical behaviour of a differentiable function is that $f(y)-f(x)$ should have order of magnitude $|y-x|$: it has smaller order of magnitude only in the fortunate situation where the derivative is zero, and indeed that is what it means to have derivative zero. So in the two-dimensional case we should think about what the order of magnitude of $f(C)$ is likely to be. Now our way of measuring the size of $C$ starts to look a little unsatisfactory: better to choose the *length* of $C$. This can be defined satisfactorily as the integral over $[0,1]$ of the magnitude of $\gamma'(t)$, or if you want to keep life simple you can decide at this point to look just at circles or squares or something. The trivial upper bound for the size of $f(C)$ is the maximum absolute value of $f$ on $C$ multiplied by the length of $C$. If $C$ is shrinking to zero in the neighbourhood of some point $p$ and $f$ is continuous, then after a while we can think of $f$ as roughly constant (since we are restricted to a small neighbourhood of $p$), so this bound has the same order of magnitude as the length of $C$. So it looks like a good idea to generalize the zero-derivative property as follows: if $C$ has small length then the path integral $f(C)$ is significantly smaller.
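The length of $C$ and the trivial bound can be checked numerically. This is a sketch under assumptions of my own: circles of radius $r$ around a hypothetical point $p=(1,2)$, an arbitrary smooth test function $f(x,y)=e^x\cos y$, and a maximum of $|f|$ on $C$ estimated by sampling. The helper names are illustrative.

```python
import math

def path_integral(f, gamma, dgamma, n=20000):
    """Midpoint-rule approximation of f(C) = integral over [0,1] of f(gamma(t)) * gamma'(t) dt."""
    sx = sy = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        x, y = gamma(t)
        dx, dy = dgamma(t)
        sx += f(x, y) * dx
        sy += f(x, y) * dy
    return (sx / n, sy / n)

def length(dgamma, n=20000):
    """Length of C: midpoint-rule approximation of the integral over [0,1] of |gamma'(t)|."""
    return sum(math.hypot(*dgamma((k + 0.5) / n)) for k in range(n)) / n

def circle(r, p=(1.0, 2.0)):
    """Circle of radius r centred at p, traversed once anticlockwise as t runs over [0, 1]."""
    w = 2 * math.pi
    gamma = lambda t: (p[0] + r * math.cos(w * t), p[1] + r * math.sin(w * t))
    dgamma = lambda t: (-w * r * math.sin(w * t), w * r * math.cos(w * t))
    return gamma, dgamma

f = lambda x, y: math.exp(x) * math.cos(y)  # an arbitrary smooth test function

for r in (0.1, 0.05, 0.025):
    gamma, dgamma = circle(r)
    L = length(dgamma)                            # should equal 2*pi*r
    size = math.hypot(*path_integral(f, gamma, dgamma))
    # Trivial bound: max |f| on C (estimated by sampling 1000 points) times the length of C.
    bound = max(abs(f(*gamma(k / 1000))) for k in range(1000)) * L
    print(r, L, size, bound)
```

The bound shrinks in proportion to $r$; how much smaller $|f(C)|$ really is becomes the subject of the next paragraphs.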

However, we have been a little careless. To see why, let’s imagine that $C$ is a little square, centred at $p$, and that the function $\gamma$ takes us anticlockwise once round this square at constant speed. Then the tangent at each point of $C$ is a small vector pointing in the anticlockwise direction. The path integral is then the sum of four parts, one for each side. Since $C$ is small and $f$ is continuous (this we are assuming the whole time), $f$ is approximately equal to $f(p)$ everywhere on $C$. But then the four contributions to the path integral are all of roughly the same magnitude, and they point north, west, south and east. But that means that they more or less cancel: in particular, the path integral is a lot smaller than the length of $C$.

If we now take a bit more trouble and work out the path integral for a typical differentiable function $f$ (just as we did in the one-dimensional case), which we may as well assume is linear (since it is, roughly), then we find that the typical magnitude of $f(C)$ is proportional to the *square* of the length of $C$ and not the length itself. Roughly, the reason is this: one power comes from the fact that the tangent vectors have size proportional to the length of $C$ and the other comes from the fact that the amount $f$ can change from one side of the square to the other (thereby defeating the cancellation) is proportional to the length of $C$ as well. So now let us correct our generalized property to the following: if the length of $C$ is small, then the path integral $f(C)$ is much smaller than the square of this length.
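To see the square-of-the-length behaviour concretely, here is a sketch assuming $f$ is affine, $f(x,y)=c+\lambda x+\mu y$ with hypothetical coefficients, and $C$ a square of side $h$ with lower-left corner at a hypothetical point $(0.5, 0.25)$, traversed anticlockwise at constant speed. Halving $h$ should divide $f(C)$ by four, not by two.

```python
def square(a, b, h):
    """Anticlockwise square with corners (a,b), (a+h,b), (a+h,b+h), (a,b+h),
    traversed at constant speed as t runs over [0, 1] (a quarter of the time per side)."""
    corners = [(a, b), (a + h, b), (a + h, b + h), (a, b + h), (a, b)]

    def side(t):
        k = min(int(4 * t), 3)          # index of the current side
        return k, corners[k], corners[k + 1]

    def gamma(t):
        k, (x0, y0), (x1, y1) = side(t)
        s = 4 * t - k                    # fraction of the way along that side
        return (x0 + s * (x1 - x0), y0 + s * (y1 - y0))

    def dgamma(t):
        _, (x0, y0), (x1, y1) = side(t)
        return (4 * (x1 - x0), 4 * (y1 - y0))

    return gamma, dgamma

def path_integral(f, gamma, dgamma, n=4000):
    """Midpoint rule for the integral over [0,1] of f(gamma(t)) * gamma'(t) dt.
    With n divisible by 4 and f affine, it is exact on each side up to rounding."""
    sx = sy = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        x, y = gamma(t)
        dx, dy = dgamma(t)
        sx += f(x, y) * dx
        sy += f(x, y) * dy
    return (sx / n, sy / n)

c, lam, mu = 1.0, 2.0, -3.0                 # hypothetical coefficients
f = lambda x, y: c + lam * x + mu * y       # an affine ("linear") function

for h in (0.1, 0.05, 0.025):
    I = path_integral(f, *square(0.5, 0.25, h))
    print(h, I, (I[0] / h**2, I[1] / h**2))  # last pair stays near (-mu, lam) = (3, 2)
```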

What does this property tell us about the function $f$? Well, continuing with the little squares, let’s take one with corners $(a,b)$, $(a+h,b)$, $(a+h,b+h)$ and $(a,b+h)$. Let us continue to assume that $f$ is linear, and given by the formula $f(x,y)=c+\lambda x+\mu y$. Then the part of the path integral from $(a,b)$ to $(a+h,b)$ is the average value of $f$ on this segment, multiplied by the derivative of $\gamma$ there and by the time $1/4$ that $\gamma$ spends on that side. If $\gamma$ goes round the square at constant speed then the derivative on this side turns out to be $(4h,0)$. So this part of the path integral works out at $h\,\big(c+\lambda(a+h/2)+\mu b\big)\,(1,0)$. Doing similar calculations for all four parts and adding everything together, the only thing that doesn’t cancel works out at $h^2(-\mu,\lambda)$ (if I haven’t made a mistake).

When will this be small compared with the square of the perimeter of the square, or equivalently, small compared with $h^2$? It seems that we need both $\lambda$ and $\mu$ to be zero. But the purpose of $\lambda$ and $\mu$ was to give us the best linear approximation to $f$ near the square: they are its partial derivatives in the $x$ and $y$ directions. So what we seem to have ended up with is the condition that the derivative of $f$ should be zero.
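For the record, here are the four side contributions written out. Notation assumed for this sketch: corners $(a,b)$, $(a+h,b)$, $(a+h,b+h)$, $(a,b+h)$; $f(x,y)=c+\lambda x+\mu y$; each side traversed in time $1/4$ at speed $4h$, so each contribution is $\tfrac14 \cdot 4h = h$ times the average of $f$ on that side times a unit direction.

```latex
\begin{aligned}
\text{bottom: }& \;\; h\big(c+\lambda(a+\tfrac h2)+\mu b\big)\,(1,0)\\
\text{right: }& \;\; h\big(c+\lambda(a+h)+\mu(b+\tfrac h2)\big)\,(0,1)\\
\text{top: }& -h\big(c+\lambda(a+\tfrac h2)+\mu(b+h)\big)\,(1,0)\\
\text{left: }& -h\big(c+\lambda a+\mu(b+\tfrac h2)\big)\,(0,1)\\
\text{sum: }& \;\; h(-\mu h)\,(1,0) + h(\lambda h)\,(0,1) = h^2(-\mu,\lambda).
\end{aligned}
```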

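This is embarrassing: we are back to the same condition as before, and the conclusion we draw from it (that path integrals round closed curves are zero) is no improvement: it follows easily from $f$ being constant, and for continuous $f$ the converse holds too, since applying it to thin rectangles shows that $f$ cannot vary in either coordinate direction. We have simply restated the old result.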

We could just give up at this point, but not if we have sensible mathematical instincts, one of which is, “Don’t give up until you have gone round and round in circles at least 100 times.” We haven’t even done so once. Here is one way (not the only one) of rescuing ourselves. There are a couple of places earlier where we made choices that were not forced on us. One was when we generalized $f$ to a function defined on $\mathbb{R}^2$. We decided to keep $f$ as a real-valued function, but it would have been just as natural (in the non-Gromov sense) to make it take values in $\mathbb{R}^2$. So perhaps we should try that.

An immediate discouraging thought is that functions to $\mathbb{R}^2$ can often be thought of merely as pairs of functions to $\mathbb{R}$. Let’s keep that in mind as a challenge: at some point, if this is going to help, we must do something that isn’t just considering two functions at once. A second place where we made an arbitrary choice was when we defined the path integral. We decided to use the forward tangent vector, but it would have been equally natural, as far as we could tell at the time, to use the outward normal.

With those thoughts in mind, let us generalize the notion of path integral from $\mathbb{R}$-valued functions to $\mathbb{R}^2$-valued functions. If we want to keep the definition as close as possible to what we had before, then the main thing we have to decide is how to “multiply” $f(\gamma(t))$ by $\gamma'(t)$, given that both objects now belong to $\mathbb{R}^2$. At this point we can either use our prior knowledge that by far the best product on $\mathbb{R}^2$ is obtained by identifying it with $\mathbb{C}$, or we can think of $f$ as two real-valued functions and remark (influenced by the thoughts of the previous paragraph) that we should do different things to the two components. So why not multiply one by the tangent and the other by the normal? That gives you multiplication of complex numbers (up to the symmetry between $i$ and $-i$).
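A quick check of the tangent-and-normal recipe, assuming the normal is obtained by rotating the tangent $T=(a,b)$ anticlockwise through a right angle, giving $(-b,a)$. The function name and the sample numbers are hypothetical.

```python
def tangent_normal_product(u, v, T):
    """Multiply the first component u by the tangent T = (a, b) and the second component v
    by the normal (-b, a) obtained by rotating T anticlockwise through a right angle, then add."""
    a, b = T
    return (u * a - v * b, u * b + v * a)

u, v, T = 2.0, -1.5, (0.3, 0.7)
p = tangent_normal_product(u, v, T)
z = complex(u, v) * complex(*T)      # the same numbers, multiplied as complex numbers
print(p, (z.real, z.imag))           # the two pairs agree up to rounding
```

Using the other normal $(b,-a)$ instead gives $(ua+vb,\,ub-va)$, the real and imaginary parts of $\overline{(u+iv)}\,(a+ib)$; this is the symmetry between $i$ and $-i$.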

So now everything is as before except that the function is *complex*-valued and the product in the path integral is a product of complex numbers.

At this point I don’t want to say much more. If you go back to the estimate of the path integral for linear functions (regarding them as functions from $\mathbb{R}^2$ to $\mathbb{R}^2$) you now find that the only functions that work are ones of the form $f(x,y)=c+\alpha(x+iy)$, that is, $f(z)=c+\alpha z$ with $c,\alpha\in\mathbb{C}$. This tells you that $f$ satisfies the Cauchy-Riemann equations. In other words, the condition that $f(C)$ is much smaller than the square of the length of $C$ is telling us that $f$ is analytic. And therefore, after a long journey, we end up with the main assertion of this post: that Cauchy’s theorem is the natural two-dimensional generalization of the statement that a function from $\mathbb{R}$ to $\mathbb{R}$ with zero derivative is constant.
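A numerical sketch of the complex version, assuming an affine map $f(x,y)=c+\alpha x+\beta y$ with hypothetical complex coefficients and a square of side $h$ at the origin. Repeating the four-sides calculation with complex coefficients gives $\oint f\,dz = h^2(i\alpha-\beta)$, which vanishes exactly when $\beta=i\alpha$, i.e. $f=c+\alpha z$. Writing $\alpha=p+iq$ and $f=u+iv$, that means $u_x=v_y=p$ and $u_y=-v_x=-q$: the Cauchy-Riemann equations.

```python
def contour_integral(f, a, b, h, n=4000):
    """Midpoint rule for the integral of f(z) dz anticlockwise round the square with corners
    a+ib, (a+h)+ib, (a+h)+i(b+h), a+i(b+h), each side traversed in time 1/4 at constant speed."""
    corners = [complex(a, b), complex(a + h, b), complex(a + h, b + h), complex(a, b + h)]
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        side = min(int(4 * t), 3)
        z0, z1 = corners[side], corners[(side + 1) % 4]
        z = z0 + (4 * t - side) * (z1 - z0)   # the point gamma(t)
        total += f(z) * 4 * (z1 - z0)         # f(gamma(t)) times gamma'(t), a complex product
    return total / n

def affine(c, alpha, beta):
    """Hypothetical affine map R^2 -> C: f(x, y) = c + alpha*x + beta*y with complex coefficients."""
    return lambda z: c + alpha * z.real + beta * z.imag

h, alpha = 0.1, 2 + 1j
generic = contour_integral(affine(0.3, alpha, 0.5 - 3j), 0.0, 0.0, h)
analytic = contour_integral(affine(0.3, alpha, 1j * alpha), 0.0, 0.0, h)  # beta = i*alpha
print(generic / h**2)   # i*alpha - beta, approximately -1.5+5j
print(abs(analytic))    # essentially zero
```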

One possible objection to the above is that I am assuming a slightly stronger hypothesis in the two-dimensional case: not just that $f(C)$ is much smaller than the square of the length of $C$, but also that $f$ is differentiable as a function from $\mathbb{R}^2$ to $\mathbb{R}^2$. I haven’t looked into whether the first implies the second.

Another idea that I have not explored is that this account could probably be extended until it ended up with a statement of Stokes’s theorem.

A final remark, which I may perhaps amplify on in a future post, is that it is possible to *prove* the one-dimensional statement in a way that generalizes very naturally to the standard proof of Cauchy’s theorem for triangles. But this post is now quite long enough.

September 19, 2007 at 2:38 pm |

[…] One way of looking at Cauchy’s theorem […]

September 19, 2007 at 5:37 pm |

Dear Tim,

Both “$f'=0$ implies $f$ is constant” and Cauchy’s theorem are “local-to-global” results, in which the hypothesis asserts the vanishing of some infinitesimal expression and the conclusion is the vanishing of some global expression. But I am not sure that they are direct analogues of each other; indeed, both statements make sense, and are distinct, both on $\mathbb{R}$ and on $\mathbb{C}$.

Another way to view Cauchy’s theorem is an assertion that every (continuously) differentiable function has an antiderivative. This is true infinitesimally (because a linear function $a+bz$ has the antiderivative $az+bz^2/2$), and it propagates to be true locally (by summing up and estimating the errors), and then (assuming no topological obstructions, such as poles) it is true globally. In one dimension, the corresponding statement is that every continuous function has an antiderivative (i.e. the fundamental theorem of calculus). In two dimensions, one needs an extra order of control on the function (continuous differentiability rather than just continuity) because one needs one better order of control on the error term to compensate for the extra dimension (dividing a non-infinitesimal two-dimensional region into infinitesimal ones requires many more pieces than for a one-dimensional region, thus allowing many more errors to accumulate.)

September 19, 2007 at 6:54 pm |

OK perhaps it’s a slight exaggeration to say that Cauchy’s theorem is *the* natural generalization of $f'=0$ implying $f$ constant. However, my main point is that you can reach a statement of Cauchy’s theorem by starting with the latter fact and trying to generalize it, following very standard mathematical instincts at every stage of the process. So in that sense I’m claiming that Cauchy’s theorem is *a* natural generalization of the statement about derivatives. Roughly speaking, it’s the generalization you get when you decide to increase the dimension of everything you talk about by 1, so pairs of points become closed curves, pairs of very close points become little squares (or anything else that tiles nicely), and so on. What I like about that is that it is slightly unexpected because other more obvious ways of generalizing don’t relate the two.

September 19, 2007 at 7:49 pm |

Ah, I get it now, thanks. (I was viewing the transition from R to C by increasing the dimension n of the ambient space by one, without increasing the dimension k of the object one is integrating on in the conclusion.) So: the “f’=0 -> f const” statement corresponds to n=1,k=0 in the real case and n=2,k=0 in the complex case, Cauchy’s theorem corresponds to n=2,k=1, and the fundamental theorem of calculus on the real line corresponds to n=1,k=1.

[I am reminded of a talk I once heard Ed Witten give, about M-theory. He began with saying that quantum field theory is the theory of 0-dimensional objects, string theory is the theory of 1-dimensional objects, and M-theory is the theory of 2-dimensional objects. He then said that as everyone was familiar with the first two, he’d jump straight to the third… and I soon got lost in the talk. Though, on another occasion, I heard Witten give a beautiful and accessible lecture on the geometric Langlands conjecture, so perhaps it is just the fact that I find physics conceptually more difficult than mathematics sometimes.]

Incidentally, I wasn’t able to find the Gromov article on-line either, but I can at least give the link to the Mathscinet review, which is rather interesting in itself:

http://www.ams.org/mathscinet-getitem?mr=1826251

September 20, 2007 at 1:17 pm |

There is an analogue of this much farther along in complex analysis. One defines a holomorphic vector bundle on a complex manifold to be a vector bundle where the transition charts are locally defined by matrices of analytic functions. One then proves that such a bundle has a flat d-bar connection, meaning a connection \Delta_{X} that only works when the tangent vector field X is of the form d/d \bar{z}.

This never made sense to me until I realized that it was the complex analogue of the fact from real geometry that a bundle with a flat connection can be written so that the transition charts are given by constant functions.

September 20, 2007 at 2:29 pm |

Thanks for the hint to Gromov’s article!

September 20, 2007 at 2:59 pm |

[…] of two mathematical theorems that proceed by analogy from a lower-dimensional case. The first is on how to discover the formula for solutions to cubic equations by analogy with the formula for […]

September 20, 2007 at 5:27 pm |

David, I’m curious about your assertion but I don’t quite understand what it means for a connection Delta_X to “only work” when X is a d-bar vector field in a local chart. Do you mean to say that Delta_X annihilates all holomorphic sections whenever X = d/d\bar{z} in a local chart? That would make sense, it says that the Cauchy-Riemann equations on a holomorphic vector bundle can be modeled by a global connection.

Also, if I understand correctly, the real geometry fact is the analogue of the *converse* of the complex geometry one, right? I would imagine that the real geometry fact would be the statement that a system of charts with constant transition functions defines a flat connection.

September 21, 2007 at 3:47 pm |

Sorry for the unclarity. The connection Delta_X is only defined when X is a vector field of the form sum f_i(z) d/d \bar{z_i}. Here the f_i can be any smooth functions; the point is that we don’t have any d/dz terms. (And then it is a connection, that is to say, it is linear in X and obeys the Leibniz rule: Delta_X(fv)=X(f) v+f Delta_X(v).) The sections s with Delta_X(s)=0 are exactly the holomorphic sections.

In coordinates, the definition is simple enough. In any coordinate chart, set

Delta_{d/d \bar{z}} (v_1, …, v_n)=(d v_1/d \bar{z}, …, d v_n/d \bar{z})

Since the transition from one coordinate chart to another is given by a matrix g_{ij} of holomorphic functions, which have d g_{ij} / d \bar{z}=0, this formula glues correctly between coordinate charts.

September 21, 2007 at 5:34 pm |

OK, I get it now, thanks. (Strangely, for some reason I have never encountered before a connection that was only defined for vector fields in a sub-bundle of the tangent bundle, but there is of course no reason to prohibit consideration of such objects.)

September 21, 2007 at 8:08 pm |

Dear Terry,

Some physical theories come and go, e.g., ether. Also, a safe bet is to have a Nobel prize endorsement – applied (experimental) physicists try to be scrupulous when giving them out. There was an interesting article in NAMS Vol. 54 Issue 8 (2007) about this: The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next, by Lee Smolin, reviewed by Brent Deschamp from California State Polytechnic University.

September 28, 2007 at 9:15 pm |

Re Cauchy theorem: I don’t know if I’m missing the point here but maybe the right analogue for many variables of “derivative of f(x) = 0 implies f locally constant” is that a differentiable vector field defined (for example) on an open convex set U of euclidean space and which is conservative (dVj/dxk = dVk/dxj) has a potential F. This can be proved à la Goursat: namely by repeated subdivision show the path integral of V around any triangle in U must be zero. Then define the potential F(X) to be the (straight) line integral of V from some fixed point A in U to X and use Goursat to deduce it’s differentiable with gradient V. In the 2D complex variable case, given u(z)+iv(z) satisfying CR, the potentials for (u,-v) and (v,u) fit together to give an antiderivative for u+iv.
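A minimal numerical sketch of the potential construction just described, assuming A is the origin, U is the whole plane, and a hypothetical conservative field V(x,y) = (2xy, x^2 + 3y^2). It only illustrates the definition of F by straight-line integrals (with a midpoint rule); it does not reproduce the Goursat subdivision argument.

```python
def potential(V, X, A=(0.0, 0.0), n=2000):
    """F(X) = line integral of V along the straight segment from A to X (midpoint rule):
    the integral over [0,1] of V(A + t(X - A)) . (X - A) dt."""
    dx, dy = X[0] - A[0], X[1] - A[1]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        vx, vy = V(A[0] + t * dx, A[1] + t * dy)
        total += vx * dx + vy * dy
    return total / n

# Hypothetical conservative field: dV1/dy = 2x = dV2/dx. An exact potential is x^2*y + y^3.
V = lambda x, y: (2 * x * y, x * x + 3 * y * y)

X = (1.0, 2.0)
print(potential(V, X), X[0] ** 2 * X[1] + X[1] ** 3)   # both close to 10

# Central differences of F should recover V at X.
e = 1e-4
gx = (potential(V, (X[0] + e, X[1])) - potential(V, (X[0] - e, X[1]))) / (2 * e)
gy = (potential(V, (X[0], X[1] + e)) - potential(V, (X[0], X[1] - e))) / (2 * e)
print((gx, gy), V(*X))   # both close to (4, 13)
```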

December 5, 2007 at 8:05 pm |

You can find Gromov’s paper in his homepage: http://www.ihes.fr/~gromov/index.html

March 19, 2008 at 7:14 pm |

there are a couple of ‘formulae that don’t parse’, in paras 16 and 23 of the first entry by gowers…doesn’t look very pretty!

October 5, 2008 at 8:02 pm |

[…] the way, Gowers wrote a post, one way of looking at Cauchy theorem, which is very interesting and tells us that the Cauchy’s theorem is the natural […]

April 15, 2009 at 10:14 am |

Dear Tim,

I’m a physics student. Let me ask you a question. I hope it makes sense.

You say:

> Another idea that I have not explored is that this account could probably be extended until it ended up with a statement of Stokes’s theorem.

I am very curious about this, and in particular about the relationship between Stokes’s theorem in the real plane and holomorphic functions on the complex plane.

As I understand it, Stokes’s theorem in a sense tells you that the information that an exact form encodes in the “bulk” of a compact region of the plane is all contained in the boundary of that region, at least when we talk of integrals and not of specific values.

Similarly, the information about the values of a holomorphic function in a compact domain is all contained in the values the function assumes on a closed path surrounding the domain.

Is there a relationship between these facts?

For example, if on the real plane I know the values (x,y) of a 1-form w on the boundary dB, can I interpret it as a complex-valued function and extend it holomorphically in the bulk B? Would this holomorphic extension have anything to do with the exterior derivative dw, up to gauge transformations w -> w + dg?

August 10, 2009 at 2:32 am |

@ tomate:

Yes, these are very much related. I would say that Cauchy’s Theorem is a special case of Stokes’s Theorem on the plane (aka Green’s Theorem).

For simplicity, let’s assume that all functions/forms are smooth (infinitely differentiable in the real sense), although weaker hypotheses are enough for Stokes’s Theorem, and weaker hypotheses imply this for complex-differentiable functions. Also, I’ll identify C with R² in its guise as the domain of a holomorphic function, so a holomorphic function is a certain kind of smooth C-valued function f on R².

What kind of smooth C-valued function on R²? A complex-differentiable one; this means that its exterior derivative df is of the form f’ dz, where f’ is some other smooth C-valued function on R² and dz is the exterior derivative of the identity function z. (Of course, we’re thinking of z now as a function from R² to C, so it’s not exactly the identity function, but rather the function that maps (a,b) to a + ib.)

When you form a line/contour integral, you are integrating the 1-form ω = f dz. If f is complex-differentiable, then we have dω = df ^ dz = f’ dz ^ dz = 0, so ω is exact. Then Stokes’s Theorem tells us that the integral of ω around a closed curve is 0, which is Cauchy’s Theorem.

(Hopefully you’re getting follow-ups by email, or otherwise will actually read this!)

August 10, 2009 at 2:37 am |

I said that ω is exact, but really what I got is that ω is closed. At some point, you have to check that there are no holes (no singularities or otherwise undefined values for f in the interior of the curve), so that you can either conclude that ω is exact or else conclude that the curve is a boundary (which is the more direct way).

October 23, 2009 at 3:46 pm |

Where would be a good place to start if I’m interested in this advanced math?

March 17, 2016 at 2:57 am |

[…] the way, Gowers wrote a post, one way of looking at Cauchy theorem, which is very interesting and tells us that the Cauchy’s theorem is the natural […]