The title of this post is meant to serve various purposes. First and foremost, it is a cheap trick designed to attract attention. Secondly, and relatedly, it is a nod to the amusing events of the last week or so. [Added later: they were from the last week or so when I wrote that sentence.] But there is a third reason that slightly excuses the first two, which is that the current state of play with the EDP project has a very PNP-ish feel to it. Indeed, that has been the case for a while: we are trying to find a clever decomposition of a diagonal matrix, which is a difficult search problem, even though we can be fairly confident that if somebody came up with a good candidate for a decomposition, then checking that it worked would be straightforward. And just in case that is not true, let’s make it trivially true by saying that we are searching for a decomposition that can easily be checked to work. If it can easily be checked to work, then it can easily be checked that it can easily be checked to work (the algorithm being to try to check it and see whether you succeed). But now I want to air a suggestion that reduces the search problem to another one that has similar properties but may be easier.
A brief word also on why I am posting again on EDP despite the fact that we are nowhere near 100 comments on the previous post. The main reason is that, now that the rate of commenting has slowed to a trickle, it is far from clear that the same rules should apply. I think the 100-comment rule was a good sufficient condition for a new post, but now I think I want to add a couple more: if there is something to say and quite a long time has elapsed since the previous post, or if there is something to say that takes a while to explain and is not a direct continuation of the current discussion, then it seems a good idea to have a new post. And both these conditions apply.
[Added later: this is a rather strange post written over a few weeks during which my thoughts on the problem were constantly changing. So everything that I say, particularly early on, should be taken with a pinch of salt as I may contradict it later. One approach to reading the post might be to skim it, read the very end a bit more carefully, and then refer back to the earlier parts if you want to know where various ideas came from.]
A simple question about functions on $\mathbb{N}$.
Let me cut to the chase and ask a question that I find quite nice and that can be understood completely independently of all the discussion of EDP so far. I will then try to motivate the question and sketch a proof that a positive answer to this question would imply EDP.
The question, in isolation, is this. Does there exist, for every constant a sequence with the following properties?
2. for every
Without the third condition, the answer is easily seen to be yes, since one can take and all the other to be zero.
After writing that question, I had to stop for a day or so, during which I realized that the answer was a trivial no. Indeed, if and then so
which rules out condition 3.
How could I have asked such a dull question? Well, the question may be dull but I think there remains some interest in the motivation for it, especially as it leads to a related question that may be harder to answer but have equal potential to be useful. If you want to cut to the new chase, then skip to the bottom of this post, where I shall ask a different question that again involves finding a function that satisfies various conditions (though this time the function is of two variables).
Where the question came from.
But if you would prefer some motivation, then here is how the first question arose. We would like to find a decomposition of the identity on the rationals. (By this I mean the matrix where and range over ) We want this decomposition to take the form where each and each is the characteristic function of a HAP, and the sum of the is small. (What precisely this smallness condition is I’ll discuss later.)
Now let me consider briefly a very general question related to the facetious title of this post: how is it that mathematicians often manage to solve difficult problems with an NP flavour? That is, how is it that when they are faced with searching for an X that does Y, and most Xs clearly don’t do Y, they nevertheless often succeed in finding an X that does do Y? Obviously the answer is not a brute-force search, but what other kinds of searches are there?
One very useful method indeed is to narrow down the search space. If I am looking for an X that does Y, then one line of attack is to look for an X that does Y and Z. If I am lucky, there will exist Xs that do both Y and Z, and the extra condition Z will narrow down the search space enough to make it feasible to search.
One of the best kinds of properties Z to add is symmetry properties, since if we insist that X is symmetric in certain ways then we have fewer independent parameters. For example, if we want to find a solution to a PDE, then it may be that by imposing an appropriate symmetry condition one can actually ensure that there is a unique solution, and finding something that is unique is often easier than finding something that is far from unique (because somehow your moves are more forced).
Let us write for the characteristic function of the HAP Then we are looking for a decomposition as a linear combination of functions of the form which we have been writing as What symmetry conditions can we impose?
An obvious one to start with is that we could insist that and That is, we could attempt to find a decomposition of the form (Note that ranges over the rationals and over the positive integers.) Actually, I think that’s more of a simplicity condition than a symmetry condition, but the same cannot be said of the next condition, which is to insist that should be independent of This tells us that the decomposition “looks the same everywhere” and takes the form
Now what is the smallness property we need of the coefficients ? I won’t fully justify this here, because I have already discussed how to deal with the fact that the rationals form an infinite set. Let me simply say that if the for which are bounded, then we are done if can be made arbitrarily small. (Roughly speaking, we are interested in the ratio of the average diagonal entry to the average per Since the average diagonal entry of the identity is 1 and depends only on we are interested in )
Let be the function Then we would like to express the identity as a linear combination of the with coefficients with a small absolute sum. A good start might be to calculate It is the number of such that and for some positive integers If and are minimal integers such that then the only that “divide” and are numbers of the form where and We then get and so we need to be at most and also at most But the maximum of and is so we need to be at most Since any positive integer satisfying this condition works, we find that
One nice thing to observe about this is that it depends only on and not on and themselves. This turns our question into a one-dimensional one (but unfortunately it is the question above that has a boring answer).
But let’s see why that is. We are now in a position where we want to write the identity as a linear combination of the functions But now we know that if we write what we are trying to do is write the function as a linear combination of the (since we want if and 0 otherwise).
How are we to deal with the functions given that they have unpleasant integer parts involved? There turns out to be a nice answer to this: look at the differences We see that if is a multiple of and 0 otherwise. Now where So the condition translates to the condition (interpreting to be 0). And which we want to be 1 when and 0 otherwise, which is where conditions 1 and 2 come from.
If we could get conditions 1-3 then I’m pretty sure we’d have a solution to EDP. But we can’t, so where does that leave us? In particular, are we forced to give up the very appealing idea of having a decomposition that is “the same everywhere”?
Actually no. It shows that we cannot combine the following two features of a decomposition: being the same everywhere, and using only “square” products — that is, products of the form rather than more general “rectangular” products of the form What happens if we allow ourselves a bit of extra generality? We would like to do this in a minimal way, so for the time being let us assume that and have the same common difference.
At first it may seem as though we gain nothing, because there is a sort of polarization identity operating. Observe first that if we decompose as a sum of matrices then we can replace each by its symmetrized form and we will have a decomposition of into symmetric parts. So without loss of generality we can assume that the coefficient of in any decomposition is the same as the coefficient of If is longer than then we can now use the fact that
to replace the rectangular products by a comparably small combination of square products. However, there is an important difference, which is that we are now allowing ourselves (square) products of HAPs of the form Perhaps surprisingly, this gives us a large amount of extra flexibility.
To demonstrate this, I need to go through calculations similar to the ones above, but slightly more complicated (though with a “magic” simplification later that makes things nice again). First, it turns out to be convenient to define to be the characteristic function of the HAP (One indication of this is that it has length or to be more accurate a piece of pedantry that turns out to be unexpectedly important later.) Now let us define to be We can calculate in a similar way to the way we calculated earlier. Let be minimal such that and let be the largest rational that divides both and which tells us once again that and For to contribute to the sum, we again need to be of the form for some positive integer but now we need and to lie in the interval Without loss of generality Then our requirements are that and Since is an integer, this is equivalent to requiring that and
The number of that satisfy this is obviously Again, the max with zero is extremely important to us, and not just because it makes the expression correct. So now we have our formula:
As in the previous one-dimensional case, we have ugly integer parts to contend with, not to mention the max with zero. But once again we can simplify matters by looking at telescoping sums and the “derivative” of our function, except that this time we want two-dimensional analogues. What we shall do is this (I think it is fairly obviously the natural thing to do so I won’t try to justify it in detail). Define a function by
We would now like to express an infinite linear combination as a linear combination of the (Note that is identically zero if Hence the range of summation.) Since we don’t actually know what the are, let us just imagine that we have some coefficients that satisfy
But turns out to be something we can understand very well, because the functions are unexpectedly simple. Recall that
where and are minimal such that So in principle we can calculate This looks unpleasant until we make a couple of simple observations. First of all, note that is constant on rectangles of sidelengths and (Here I am taking and fixed and varying and ) More precisely, there is a tiling of into such rectangles, and is constant on each of these rectangles. As you go horizontally from one rectangle to another to its right, increases by 1, and as you go vertically upwards it decreases by 1. If you put all these facts together, you find that on all multiples of and is zero everywhere else.
What does this tell us about the coefficients ? Well, we want to be 1 if and otherwise. It follows that whenever and are coprime, with then we want to be 1 if and otherwise. This is very similar to our one-dimensional criterion, but with the huge difference that the sets along which we are summing the function are disjoint. (They are lines through the origin, or rather intersections of those lines with )
We also have a smallness criterion. Recall that in the one-dimensional case we needed to be arbitrarily small (interpreting as 0). Here we need the very similar property that
is arbitrarily small. Let me briefly remark that there is nothing to stop us defining however we like for values of with and it will be convenient to exploit this.
So there is a rather simple looking question: can we find a function such that the “$\ell_1$-norm of the mixed partial derivatives” is arbitrarily small, the sum along the diagonal is 1, and the sum over all other lines of positive rational gradient is 0?
Somehow this question doesn’t feel as though it should be that hard, since the constraints don’t seem to interact all that much. So it would be natural to think that either there is a simple reason for no such function existing, just as there was in the one-dimensional case, or there is a fairly simple construction of such an example.
Solving the two-dimensional question.
If there is a construction, how might one expect it to work? There are two reasonable possibilities: either one writes something quite clever straight down (perhaps involving trigonometric functions somehow to get the sums to be zero along the lines of gradient not equal to 1), or one goes for a just-do-it proof. My feeling (with the benefit of some hindsight) is that there are few enough constraints for a just-do-it approach to be appropriate, and in any case attempting a just-do-it proof can hardly fail to increase one’s understanding of the task, whether or not the attempt succeeds.
I should interrupt what I’m writing to say that I’m almost certain that a just-do-it approach works. By that I mean that I’m sitting here about to write down what I think is a correct proof, but that proof has been sitting in my head for the last day or so and has not been written down. And I’m pretty sure that my arguments that this would solve EDP can be tightened up and turned into a rigorous proof too, though that will need to be checked very carefully. So right now I don’t see why we don’t have a complete proof of EDP.
I should also at this point make a general remark about Polymath. I’ve been spending quite a lot of time thinking about the problem on my own, which is contrary to the spirit of Polymath and therefore demands an apology — but also an explanation. The explanation is that I have been trying to write this post for a long time — I started it nearly a week ago — but my efforts have been interrupted by plane journeys and ICM activities, including blogging the ICM. However, plane journeys, long waits for opening ceremonies, lonely moments in my hotel room, etc., are very conducive to mathematical thought, and since most of the argument I am writing here is rather simple (it was just a question of finding the right idea to get it started), I have found that the length of what I need to put in the post has been increasing faster than the length of the post itself. And one other mitigating factor is that there has been very little reaction to my recent comments on the previous post, where I first set out some of this approach (see in particular the two most recent comments), so I feel more entitled to go it alone for a while than I would have if the project had been roaring ahead at the sort of speed it was going at earlier in the year. Needless to say, the proof, if it turns out to work, is still very much a joint effort — this most recent argument depends crucially on the insights that have emerged from the more active discussion phase — and will be another paper of D. H. J. Polymath.
I must repeat that this post is being written over the course of many days. Several hours have passed since I finished the last paragraph, and during them I thought a bit harder about the step I was most worried about, which was dealing with the fact that the rationals are infinite and that at some point one must do some kind of truncation. I now think that that step will be more problematic than I had thought, and it seems to me to be about 50-50 whether it will be possible to get it to work. Having said that, I feel as though something non-trivial has emerged from the argument I shall complete in a moment, and I will be quite surprised (not to mention disappointed) if it leads nowhere. What I’m really saying is that I have an argument that does exactly what seems to be needed, except that the sums involved are a bit too formal. Now it could be that establishing convergence in some appropriate sense is where the real difficulty of the problem lies. In that case, this approach will be much less useful than I thought. But I am hoping that it is more of a technical problem. No time to think about it right now as I am going to a performance of A Disappearing Number, a play about which I shall undoubtedly have more to write later.
WELL OVER A WEEK LATER
The first thing to say is that EDP definitely isn’t yet solved. I’ve also reformulated the problem again, in a way that I find clearer. I could give a long description of how the new perspective evolved from the one above, but I think instead I’ll just sketch what happened.
Solving the two-dimensional question, continued
The first thing to observe is that if we take the characteristic function of a rectangle, then the sum of the absolute values of its mixed differences is 4, since the only contributions come from the four corners.
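This corner computation is easy to verify numerically. Here is a minimal sketch, assuming the mixed difference operator has the form $(\partial\partial f)(x,y)=f(x,y)-f(x-1,y)-f(x,y-1)+f(x-1,y-1)$; the function and helper names are mine:

```python
# A sketch, not part of the argument: check that the mixed difference of a
# rectangle's characteristic function is supported on four "corner" points,
# whose absolute values sum to 4.

def rect(a, b, c, d):
    """Characteristic function of the integer rectangle [a, b] x [c, d]."""
    return lambda x, y: 1 if a <= x <= b and c <= y <= d else 0

def mixed_diff(f):
    """(DDf)(x, y) = f(x, y) - f(x-1, y) - f(x, y-1) + f(x-1, y-1)."""
    return lambda x, y: f(x, y) - f(x - 1, y) - f(x, y - 1) + f(x - 1, y - 1)

f = rect(3, 7, 2, 9)
Df = mixed_diff(f)

grid = [(x, y) for x in range(12) for y in range(14)]
support = [(x, y) for x, y in grid if Df(x, y) != 0]

print(sum(abs(Df(x, y)) for x, y in grid))  # 4
print(support)                              # [(3, 2), (3, 10), (8, 2), (8, 10)]
print(sum(Df(x, y) for x, y in grid))       # 0: the signs cancel
```

The signed sum is 0, which is why the relevant quantity is the absolute sum, the $\ell_1$-type smallness criterion discussed earlier.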
Therefore, all we have to do is this. We shall build up an infinite matrix as a linear combination of rectangles — that is, functions of the form where and are arithmetic progressions with common difference 1. We start by choosing a large integer We then put entries at every point That ensures that the sum along the main diagonal is 1, provided that we do not do anything else to that diagonal. However, it also causes the sums along several other lines to be non-zero, so there is work to do. Note that the sum of coefficients so far is since there is just one rectangle and its coefficient is
To do the rest of the work, let us enumerate all rationals greater than 1 as and deal with each gradient in turn. What “deal with” means is that we shall add in some very small multiple of a rectangle in such a way that the sum along the line of the gradient we are dealing with is now zero, and will agree that from then on we will not change any entry in that line. Can we do this? Well, let’s suppose that we are dealing with the gradient So far, the sum of the entries of the matrix along the line of gradient is say. So we must find a rectangle that contains many integer points along the line of gradient . And we also need it not to intersect any line of gradient with But if we go far enough up the line with gradient we will find ourselves at a very great distance from the lines we are trying to avoid, so we can find a very large rectangle that is disjoint from those lines and contains many integer points of the form If it contains such points, then we make the coefficient of that rectangle equal to Since we can get to be as big as we like, we can get this coefficient to be as small as we like. So we have “dealt with” the gradient and the problem is solved.
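The geometric heart of this step is that far enough up the line of the current gradient, a large rectangle can meet that line in many integer points while missing all previously handled lines. This can be illustrated with concrete numbers; the gradients and the rectangle below are my own choices, so this is only a sketch, not the general argument:

```python
from fractions import Fraction

def line_points(gradient, xs, ys):
    """Integer points (m, n) with n = gradient * m lying inside the box xs x ys."""
    pts = []
    for m in xs:
        n = gradient * m
        if n.denominator == 1 and int(n) in ys:
            pts.append((m, int(n)))
    return pts

target = Fraction(3, 2)                                            # gradient being dealt with
handled = [Fraction(1), Fraction(2), Fraction(3), Fraction(1, 2)]  # lines we must not touch

# A rectangle placed far out along the line y = (3/2)x:
xs, ys = range(1000, 1101), range(1500, 1651)

hits = line_points(target, xs, ys)
print(len(hits))                                       # 51 points of gradient 3/2
print([len(line_points(g, xs, ys)) for g in handled])  # [0, 0, 0, 0]
```

With 51 points available, a coefficient of $-t/51$ on this rectangle kills a line sum of $t$, and placing the rectangle still further out makes the coefficient as small as we please.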
Why doesn’t that solve EDP?
Let me start by saying why I thought it might solve EDP. I thought that if the sum of coefficients associated with each was arbitrarily small, then that would be saying that in a certain sense the average weight of the coefficients was arbitrarily small. Since the average size of the diagonal entries is 1, this ought somehow to give us what we want.
To understand why this doesn’t work, and probably can’t be made to work, I find it helpful to reformulate the problem again. To do so, let us think about the operators once again. (These first came in fairly near the beginning of the section entitled “Rectangular products.”) In particular, let us think what one of these operators does to the function that is 1 at and 0 everywhere else.
In fact, let us be slightly more general, as it will be useful to us later. What is the value of at For it to be non-zero we need and and then it is 1. So if we sum over all then we get the number of with that property. Now each with that property is of the form for some so when we sum over we are counting the number of ways of writing as for some and some
That is, if we define $f_{P,Q}(x)$ to be the number of ways of writing $x$ as $p/q$ with $p\in P$ and $q\in Q$, then the function we have just computed is $f_{P,Q}*\delta_a$, where $*$ is a multiplicative convolution.
If you don’t like following calculations but want to know what the moral of all that is, then here you are. Let us define the multiplicative convolution of two functions $f$ and $g$ on the positive rationals to be the function $f*g$ given by the formula $(f*g)(x)=\sum_{yz=x}f(y)g(z)$.
Then the operator takes a function $u$ to the multiplicative convolution of $u$ with the function $f_{P,Q}$, which is itself the multiplicative convolution of the characteristic functions of the sets $P$ and $Q^{-1}=\{1/q:q\in Q\}$. (That is, as I’ve already said, $f_{P,Q}(x)$ is the number of ways of writing $x$ as a fraction with numerator in $P$ and denominator in $Q$.)
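These definitions are easy to experiment with. Below is a minimal sketch (the representation and helper names are my own) of multiplicative convolution for finitely supported functions on the positive rationals. It checks that the function counting representations of $x$ as $p/q$ with $p\in P$ and $q\in Q$ is the convolution of the characteristic functions of $P$ and of the reciprocals of $Q$, and that the delta function at 1 is a convolution identity:

```python
from fractions import Fraction
from collections import defaultdict

def mconv(f, g):
    """Multiplicative convolution: (f*g)(x) = sum over yz = x of f(y)g(z)."""
    h = defaultdict(int)
    for y, fy in f.items():
        for z, gz in g.items():
            h[y * z] += fy * gz
    return dict(h)

def f_PQ(P, Q):
    """f_{P,Q}(x) = number of ways of writing x as p/q with p in P, q in Q."""
    counts = defaultdict(int)
    for p in P:
        for q in Q:
            counts[Fraction(p, q)] += 1
    return dict(counts)

P, Q = range(1, 5), range(2, 4)
ind_P = {Fraction(p): 1 for p in P}         # characteristic function of P
ind_Q_inv = {Fraction(1, q): 1 for q in Q}  # characteristic function of {1/q : q in Q}

assert f_PQ(P, Q) == mconv(ind_P, ind_Q_inv)     # f_{P,Q} as a convolution

delta_1 = {Fraction(1): 1}
assert mconv(delta_1, f_PQ(P, Q)) == f_PQ(P, Q)  # delta_1 is a convolution identity
```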
Now we’re trying to express the identity as a small linear combination of these convolution operators, so we can turn the problem into a one-dimensional problem: to write the function $\delta_1$ (which is the identity for the multiplicative convolution) as a small linear combination of the functions $f_{P,Q}$.
I am about 99% sure that if we could do that with a finite linear combination then we would have a proof of EDP. I used to think that “whatever works for finite linear combinations should also work for infinite ones with absolutely summable coefficients,” but it seems that this is wrong because even if the coefficients can be made very small, the functions themselves may be getting quite large in the norms we care about, so the convergence may be much too weak.
That does indeed seem to be the case here. If you convert the just-do-it argument above into an argument to express in terms of the functions then you get pointwise (absolute) convergence, but that is not strong enough.
Let me just elaborate on the remark that this problem can be solved directly by means of a just-do-it argument. I won’t give the full argument, but I’ll just point out that for each rational you have the option of doing the following. You choose a very large integer and a very much larger integer You then set and Since is much larger than is very close to for every and Moreover, the number of pairs such that is (assuming that and are coprime, that is). So if we want to get rid of a value at the point then we can subtract the function which has a small coefficient and is supported on only a very small neighbourhood of
What is not so good about it is that the norm of is roughly which is not small at all.
Thinking slightly further about this, I see that what I said above about pointwise convergence was a bit too pessimistic: for example, we can easily arrange for the convergence to be uniform. And indeed, if we think of as being, in its broad shape, roughly like a set of size close … actually, what is it like?
Rational numbers sort of repel each other, so if we’ve devised and such that all points in are close to then all points that are not actually equal to will tend to have rather large denominators, and therefore not to occur very frequently. So let’s hazard a guess that for the purposes of thinking about norms, we can think of as having a spike at where the value is and other values that are all around 1. That would mean that the pth power of the norm was around Thinking of and as constants, it looks as though we might expect some kind of good behaviour to kick in when passes 2. Remembering that we’re actually interested in times the norm, we get something like But if the value we were trying to get rid of was small, we can multiply that by a small factor Ah, but the problem is that there are an awful lot of those s to worry about, and they add up in an -ish way.
What that boils down to is that this kind of greedy keep-away-from-everything-else approach doesn’t give us any kind of useful convergence, which we need if we are to get the averaging argument to work. I realize that I haven’t been too clear about what kind of convergence would work, but there is one kind that is definitely OK, and that is if we have a finite linear combination. So let me now ask a completely precise question.
A sufficient (I’m almost sure) condition for EDP
Let $P$ and $Q$ be subsets of $\mathbb{N}$. We define $f_{P,Q}$ to be the function whose value at a positive rational $x$ is the number of ways of writing $x$ as $p/q$ with $p\in P$ and $q\in Q$. I claim that EDP follows if for every $\epsilon>0$ there exists a way of writing the function $\delta_1$ (which is defined on the positive rationals and takes the value 1 at 1 and 0 everywhere else) as a finite linear combination $\sum_i\lambda_i f_{P_i,Q_i}$, where the $P_i$ and $Q_i$ are intervals of positive integers and $\sum_i|\lambda_i|\leq\epsilon$.
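Whichever way the existence question goes, the “easy to check” half of the title’s analogy is concrete here: verifying a proposed finite combination is a routine computation. Below is a sketch of such a checker; the data representation is my own choice, and the only valid combination exhibited is the trivial one, whose coefficient sum is 1 and therefore not small:

```python
from fractions import Fraction
from collections import defaultdict

def f_PQ(P, Q):
    """f_{P,Q}(x) = number of ways of writing x as p/q with p in P, q in Q."""
    counts = defaultdict(int)
    for p in P:
        for q in Q:
            counts[Fraction(p, q)] += 1
    return counts

def combination(terms):
    """Evaluate the sum of lam * f_{P,Q} over terms = [(lam, P, Q), ...]."""
    total = defaultdict(Fraction)
    for lam, P, Q in terms:
        for x, c in f_PQ(P, Q).items():
            total[x] += lam * c
    return {x: v for x, v in total.items() if v != 0}

def represents_delta_1(terms):
    return combination(terms) == {Fraction(1): 1}

# The trivial representation (coefficient sum 1, so not "small"):
assert represents_delta_1([(Fraction(1), [1], [1])])

# An incorrect candidate is rejected just as easily:
assert not represents_delta_1([(Fraction(1, 2), range(1, 3), range(1, 3))])
```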
In this post I have been looking at the representation-of-diagonals approach over the rationals with the extra symmetry constraint that the HAP products are all of the form where and that the coefficient of is independent of This symmetry condition forces the diagonal map we are trying to represent to be the identity. (As I write this it occurs to me that we might be able to take an arbitrary efficient representation of a diagonal matrix and average it over all multiples in order to obtain an efficient representation of the identity. So perhaps this “new” approach is equivalent to the old one — which would be quite encouraging as it would increase the chances that the new one can be made to work.)
Thinking through what this means, we find that we can reformulate the problem and obtain the question asked in the previous section.
Where next for EDP?
To finish, I’d like to ask if anyone has views about how they would like the EDP project to proceed. We are in uncharted waters at the moment: we have a project that started with a burst of activity — in fact it was sustained enough to count as more than just a burst — but now it has slowed right down and I’m not sure it will ever regain anything like its earlier intensity. And yet I myself am still enthusiastic about it, and perhaps I’m not alone in that.
My question is what should happen if it turns out that there are only two or three people, or even just one, who feel like working on the problem. In that case, there are various options. We could simply continue to post comments on this blog and not worry if there are very few participants or if the rate of posting is small. Or we could move the project to a different phase, more like a normal collaboration, with longer periods of private thought, but from time to time write comments or posts to make sure that any important developments are kept public (so that, in particular, anyone who wanted to join in would be free to do so). Or we could turn the project into a completely conventional collaboration, always with the understanding that the author of the eventual paper would be D. H. J. Polymath. And even that is not quite the most conventional direction the project could take. The final possibility would be to close the project altogether, write a paper summarizing all the partial results, and then “free up” the problem for anyone who wants to think about it privately — if they used any of Polymath’s ideas they could simply refer to the paper that contained those ideas.
I think I favour keeping the project going as a Polymath project but with longer periods of private thought, but I don’t favour this strongly enough that I couldn’t be persuaded to change my mind if others feel differently. If that is the way we go, then my final question is, are there people out there who would like to be part of a small group of reasonably dedicated participants, or am I on my own now?