EDP18 — apparently P does not equal NP

The title of this post is meant to serve various purposes. First and foremost, it is a cheap trick designed to attract attention. Secondly, and relatedly, it is a nod to the amusing events of the last week or so. [Added later: they were from the last week or so when I wrote that sentence.] But there is a third reason that slightly excuses the first two, which is that the current state of play with the EDP project has a very P\neNP-ish feel to it. Indeed, that has been the case for a while: we are trying to find a clever decomposition of a diagonal matrix, which is a difficult search problem, even though we can be fairly confident that if somebody came up with a good candidate for a decomposition, then checking that it worked would be straightforward. And just in case that is not true, let’s make it trivially true by saying that we are searching for a decomposition that can easily be checked to work. If it can easily be checked to work, then it can easily be checked that it can easily be checked to work (the algorithm being to try to check it and see whether you succeed). But now I want to air a suggestion that reduces the search problem to another one that has similar properties but may be easier.

A brief word also on why I am posting again on EDP despite the fact that we are nowhere near 100 comments on the previous post. The main reason is that, now that the rate of commenting has slowed to a trickle, it is far from clear that the same rules should apply. I think the 100-comment rule was a good sufficient condition for a new post, but now I think I want to add a couple more: if there is something to say and quite a long time has elapsed since the previous post, or if there is something to say that takes a while to explain and is not a direct continuation of the current discussion, then it seems a good idea to have a new post. And both these conditions apply.

[Added later: this is a rather strange post written over a few weeks during which my thoughts on the problem were constantly changing. So everything that I say, particularly early on, should be taken with a pinch of salt as I may contradict it later. One approach to reading the post might be to skim it, read the very end a bit more carefully, and then refer back to the earlier parts if you want to know where various ideas came from.]

A simple question about functions on \mathbb{N}.

Let me cut to the chase and ask a question that I find quite nice and that can be understood completely independently of all the discussion of EDP so far. I will then try to motivate the question and sketch a proof that a positive answer to this question would imply EDP.

The question, in isolation, is this. Does there exist, for every constant c>0, a sequence (a_n) with the following properties?

1. \sum_na_n=1.

2. \sum_na_{rn}=0 for every r\geq 2.

3. |a_1|+\sum_{n=1}^\infty|a_{n+1}-a_n|\leq c.

Without the third condition, the answer is easily seen to be yes, since one can take a_1=1 and all the other a_n to be zero.

After writing that question, I had to stop for a day or so, during which I realized that the answer was a trivial no. Indeed, if \sum_na_n=1 and \sum_na_{2n}=0, then \sum_na_{2n-1}=1, so

\sum_n|a_{n+1}-a_n|\geq\sum_n|a_{2n}-a_{2n-1}|\geq 1,

which rules out condition 3.

How could I have asked such a dull question? Well, the question may be dull but I think there remains some interest in the motivation for it, especially as it leads to a related question that may be harder to answer but have equal potential to be useful. If you want to cut to the new chase, then skip to the bottom of this post, where I shall ask a different question that again involves finding a function that satisfies various conditions (though this time the function is of two variables).

Where the question came from.

But if you would prefer some motivation, then here is how the first question arose. We would like to find a decomposition of the identity on the rationals. (By this I mean the matrix \delta_{rs}, where r and s range over \mathbb{Q}.) We want this decomposition to take the form \sum_i\lambda_iP_i\otimes Q_i, where each P_i and each Q_i is the characteristic function of a HAP, and the sum of the |\lambda_i| is small. (What precisely this smallness condition is I’ll discuss later.)

Now let me consider briefly a very general question related to the facetious title of this post: how is that mathematicians often manage to solve difficult problems with an NP flavour? That is, how is it that when they are faced with searching for an X that does Y, and most Xs clearly don’t do Y, they nevertheless often succeed in finding an X that does do Y. Obviously the answer is not a brute-force search, but what other kinds of searches are there?

One very useful method indeed is to narrow down the search space. If I am looking for an X that does Y, then one line of attack is to look for an X that does Y and Z. If I am lucky, there will exist Xs that do both Y and Z, and the extra condition Z will narrow down the search space enough to make it feasible to search.

One of the best kinds of properties Z to add is symmetry properties, since if we insist that X is symmetric in certain ways then we have fewer independent parameters. For example, if we want to find a solution to a PDE, then it may be that by imposing an appropriate symmetry condition one can actually ensure that there is a unique solution, and finding something that is unique is often easier than finding something that is far from unique (because somehow your moves are more forced).

Let us write P_{d,m} for the characteristic function of the HAP \{d,2d,\dots,md\}. Then we are looking for a decomposition as a linear combination of functions of the form (x,y)\mapsto P_{d,m}(x)P_{d',m'}(y), which we have been writing as P_{d,m}\otimes P_{d',m'}. What symmetry conditions can we impose?

An obvious one to start with is that we could insist that d=d' and m=m'. That is, we could attempt to find a decomposition of the form \sum_{d,m}\lambda_{d,m}P_{d,m}\otimes P_{d,m}. (Note that d ranges over the rationals and m over the positive integers.) Actually, I think that’s more of a simplicity condition than a symmetry condition, but the same cannot be said of the next condition, which is to insist that \lambda_{d,m} should be independent of d. This tells us that the decomposition “looks the same everywhere” and takes the form \sum_{d,m}\lambda_mP_{d,m}\otimes P_{d,m}.

Now what is the smallness property we need of the coefficients \lambda_m? I won’t fully justify this here, because I have already discussed how to deal with the fact that the rationals form an infinite set. Let me simply say that if the m for which \lambda_m\ne 0 are bounded, then we are done if \sum_m|\lambda_m| can be made arbitrarily small. (Roughly speaking, we are interested in the ratio of the average diagonal entry to the average \sum_m|\lambda_{m,d}| per d. Since the average diagonal entry of the identity is 1 and \lambda_{m,d} depends only on m, we are interested in \sum_m\lambda_m.)

Let f_m be the function \sum_dP_{d,m}\otimes P_{d,m}. Then we would like to express the identity as a linear combination of the f_m with coefficients with a small absolute sum. A good start might be to calculate f_m(x,y). It is the number of d\in\mathbb{Q} such that x=rd and y=sd for some positive integers 1\leq r,s \leq m. If a and b are minimal integers such that ax=by, then the only d that “divide” x and y are numbers of the form z/t, where x=bz and y=az. We then get r=bt and s=at, so we need t to be at most m/b and also at most m/a. But the maximum of a and b is d(x,y), so we need t to be at most m/d(x,y). Since any positive integer satisfying this condition works, we ind that f_m(x,y)=\lfloor m/d(x,y)\rfloor.

One nice thing to observe about this is that it depends only on d(x,y) and not on x and y themselves. This turns our question into a one-dimensional one (but unfortunately it is the question above that has a boring answer).

But let’s see why that is. We are now in a position where we want to write the identity as a linear combination of the functions f_m. But now we know that if we write g_m(n)=\lfloor m/n\rfloor, what we are trying to do is write the function (1,0,0,0,\dots) as a linear combination of the g_m (since we want \sum_m\lambda_mf_m(x,y)=1 if d(x,y)=1 and 0 otherwise.

How are we to deal with the functions g_m, given that they have unpleasant integer parts involved? There turns out to be a nice answer to this: look at the differences g_m-g_{m-1}. We see that g_m(n)-g_{m-1}(n)=1 if m is a multiple of n and 0 otherwise. Now \sum_m\lambda_mg_m(n)=\sum_ma(m)(g_m(n)-g_{m+1}(n)), where a(m)=\lambda_1+\dots+\lambda_m. So the condition \sum_m|\lambda_m|\leq c translates to the condition \sum_m|a_m-a_{m-1}|\leq c (interpreting a_0 to be 0). And \sum_ma(m)(g_m(n)-g_{m-1}(n)=\sum_ra(rn), which we want to be 1 when n=1 and 0 otherwise, which is where conditions 1 and 2 come from.

If we could get conditions 1-3 then I’m pretty sure we’d have a solution to EDP. But we can’t, so where does that leave us? In particular, are we forced to give up the very appealing idea of having a decomposition that is “the same everywhere”?

Rectangular products.

Actually no. It shows that we cannot combine the following two features of a decomposition: being the same everywhere, and using only “square” products — that is, products of the form P\otimes P rather than more general “rectangular” products of the form P_1\otimes P_2. What happens if we allow ourselves a bit of extra generality? We would like to do this in a minimal way, so for the time being let us assume that P_1 and P_2 have the same common difference.

At first it may seem as though we gain nothing, because there is a sort of polarization identity operating. Observe first that if we decompose I as a sum of matrices A_i, then we can replace each A_i by its symmetrized form (A_i+A_i^T)/2 and we will have a decomposition of I into symmetric parts. So without loss of generality we can assume that the coefficient of P\otimes Q in any decomposition is the same as the coefficient of Q\otimes P. If P is longer than Q, then we can now use the fact that

P\otimes Q+Q\otimes P=P\otimes P-Q\otimes Q-(P\setminus Q)\otimes(P\setminus Q)

to replace the rectangular products by a comparably small combination of square products. However, there is an important difference, which is that we are now allowing ourselves (square) products of HAPs of the form \{rd,(r+1)d,\dots,sd\}. Perhaps surprisingly, this gives us a large amount of extra flexibility.

To demonstrate this, I need to go through calculations similar to the ones above, but slightly more complicated (though with a “magic” simplification later that makes things nice again). First, it turns out to be convenient to define P_{d,r,s} to be the characteristic function of the HAP \{rd,(r+1)d,\dots,(s-1)d\}. (One indication of this is that it has length s-r, or to be more accurate (s-r)\vee 0, a piece of pedantry that turns out to be unexpectedly important later.) Now let us define g_{r,s}(x,y) to be \sum_dP_{d,r,s}(x)P_{d,r,s}(y). We can calculate g_{r,s}(x,y) in a similar way to the way we calculated f_m(x,y) earlier. Let (a,b) be minimal such that ax=by, and let z be the largest rational that divides both x and y, which tells us once again that x=bz and y=az. For d to contribute to the sum, we again need d to be of the form z/t for some positive integer t, but now we need at and bt to lie in the interval [r,s). Without loss of generality a\leq b. Then our requirements are that at\geq r and bt<s. Since t is an integer, this is equivalent to requiring that t\geq \lceil r/a\rceil and t<\lceil s/b\rceil.

The number of t that satisfy this is obviously (\lceil r/a \rceil - \lceil s/b\rceil)\vee 0. Again, the max with zero is extremely important to us, and not just because it makes the expression correct. So now we have our formula:

g_{r,s}(x,y)=(\lceil r/a\rceil - \lceil s/b\rceil)\vee 0.

As in the previous one-dimensional case, we have ugly integer parts to contend with, not to mention the max with zero. But once again we can simplify matters by looking at telescoping sums and the “derivative” of our function, except that this time we want two-dimensional analogues. What we shall do is this (I think it is fairly obviously the natural thing to do so I won’t try to justify it in detail). Define a function h_{r,s} by


We would now like to express an infinite linear combination \sum_{r<s}\mu_{r,s}g_{r,s} as a linear combination of the h_{r,s}. (Note that g_{r,s} is identically zero if r\geq s. Hence the range of summation.) Since we don’t actually know what the \mu_{r,s} are, let us just imagine that we have some coefficients \lambda_{r,s} that satisfy





But \sum_{r,s}\lambda_{r,s}h_{r,s}(x,y) turns out to be something we can understand very well, because the functions h_{r,s} are unexpectedly simple. Recall that

g_{r,s}(x,y)=(\lceil r/a\rceil - \lceil s/b\rceil)\vee 0,

where a and b are minimal such that ax=by. So in principle we can calculate h_{r,s}(x,y). This looks unpleasant until we make a couple of simple observations. First of all, note that g_{r,s} is constant on rectangles of sidelengths a and b. (Here I am taking a and b fixed and varying r and s.) More precisely, there is a tiling of \mathbb{N}^2 into such rectangles, and g_{r,s} is constant on each of these rectangles. As you go horizontally from one rectangle to another to its right, g_{r,s} increases by 1, and as you go vertically upwards it decreases by 1. If you put all these facts together, you find that h_{r,s}(x,y)=-1 on all multiples of (a,b) and is zero everywhere else.

What does this tell us about the coefficients \lambda_{r,s}? Well, we want \sum_{r<s}\lambda_{r,s}h_{r,s}(x,y) to be 1 if x=y and 0 otherwise. It follows that whenever a and b are coprime, with a\leq b, then we want \sum_u\lambda_{au,bu} to be 1 if a=b and 0 otherwise. This is very similar to our one-dimensional criterion, but with the huge difference that the sets along which we are summing the function \lambda_{r,s} are disjoint. (They are lines through the origin, or rather intersections of those lines with \mathbb{N}^2.)

We also have a smallness criterion. Recall that in the one-dimensional case we needed \sum_n|a_n-a_{n-1}| to be arbitrarily small (interpreting a_0 as 0). Here we need the very similar property that


is arbitrarily small. Let me briefly remark that there is nothing to stop us defining \lambda_{r,s} however we like for values of (r,s) with r\geq s, and it will be convenient to exploit this.

So there is a rather simple looking question: can we find a function \lambda:\mathbb{N}^2\rightarrow\mathbb{R} such that the “\ell_1-norm of the mixed partial derivatives” is arbitrarily small, the sum along the diagonal is 1, and the sum over all other lines of positive rational gradient is 0?

Somehow this question doesn’t feel as though it should be that hard, since the constraints don’t seem to interact all that much. So it would be natural to think that either there is a simple reason for no such \lambda existing, just as there was in the one-dimensional case, or there is a fairly simple construction of such an example.

Solving the two-dimensional question.

If there is a construction, how might one expect it to work? There are two reasonable possibilities: either one writes something quite clever straight down (perhaps involving trigonometric functions somehow to get the sums to be zero along the lines of gradient not equal to 1), or one goes for a just-do-it proof. My feeling (with the benefit of some hindsight) is that there are few enough constraints for a just-do-it approach to be appropriate, and in any case attempting a just-do-it proof can hardly fail to increase one’s understanding of the task, whether or not the attempt succeeds.

I should interrupt what I’m writing to say that I’m almost certain that a just-do-it approach works. By that I mean that I’m sitting here about to write down what I think is a correct proof, but that proof has been sitting in my head for the last day or so and has not been written down. And I’m pretty sure that my arguments that this would solve EDP can be tightened up and turned into a rigorous proof too, though that will need to be checked very carefully. So right now I don’t see why we don’t have a complete proof of EDP.

I should also at this point make a general remark about Polymath. I’ve been spending quite a lot of time thinking about the problem on my own, which is contrary to the spirit of Polymath and therefore demands an apology — but also an explanation. The explanation is that I have been trying to write this post for a long time — I started it nearly a week ago — but my efforts have been interrupted by plane journeys and ICM activities, including blogging the ICM. However, plane journeys, long waits for opening ceremonies, lonely moments in my hotel room, etc., are very conducive to mathematical thought, and since most of the argument I am writing here is rather simple (it was just a question of finding the right idea to get it started), I have found that the length of what I need to put in the post has been increasing faster than the length of the post itself. And one other mitigating factor is that there has been very little reaction to my recent comments on the previous post, where I first set out some of this approach (see in particular the two most recent comments), so I feel more entitled to go it alone for a while than I would have if the project had been roaring ahead at the sort of speed it was going at earlier in the year. Needless to say, the proof, if it turns out to work, is still very much a joint effort — this most recent argument depends crucially on the insights that have emerged from the more active discussion phase — and will be another paper of D. H. J. Polymath.

I must repeat that this post is being written over the course of many days. Several hours have passed since I finished the last paragraph, and during them I thought a bit harder about the step I was most worried about, which was dealing with the fact that the rationals are infinite and that at some point one must do some kind of truncation. I now think that that step will be more problematic than I had thought, and it seems to me to be about 50-50 whether it will be possible to get it to work. Having said that, I feel as though something non-trivial has emerged from the argument I shall complete in a moment, and I will be quite surprised (not to mention disappointed) if it leads nowhere. What I’m really saying is that I have an argument that does exactly what seems to be needed, except that the sums involved are a bit too formal. Now it could be that establishing convergence in some appropriate sense is where the real difficulty of the problem lies. In that case, this approach will be much less useful than I thought. But I am hoping that it is more of a technical problem. No time to think about it right now as I am going to a performance of A Disappearing Number, a play about which I shall undoubtedly have more to write later.


The first thing to say is that EDP definitely isn’t yet solved. I’ve also reformulated the problem again, in a way that I find clearer. I could give a long description of how the new perspective evolved from the one above, but I think instead I’ll just sketch what happened.

Solving the two-dimensional question, continued

The first thing to observe is that if we take the characteristic function of a rectangle, then the sum of the |\lambda_{r,s}-\lambda_{r,s+1}-\lambda_{r+1,s}+\lambda_{r+1,s+1}| is 4, since the only contributions come from the four corners.

Therefore, all we have to do is this. We shall build up an infinite matrix as a linear combination of rectangles — that is, functions of the form P\otimes Q, where P and Q are arithmetic progressions with common difference 1. We start by choosing a large integer k. We then put entries k^{-1} at every point (x,y)\in \{1,2,\dots,k\}^2. That ensures that the sum along the main diagonal is 1, provided that we do not do anything else to that diagonal. However, it also causes the sums along several other lines to be non-zero, so there is work to do. Note that the sum of coefficients so far is k^{-1} since there is just one rectangle and its coefficient is k^{-1}.

To do the rest of the work, let us enumerate all rationals greater than 1 as q_1,q_2,\dots and deal with each gradient in turn. What “deal with” means is that we shall add in some very small multiple of a rectangle in such a way that the sum along the line of the gradient we are dealing with is now zero, and will agree that from then on we will not change any entry in that line. Can we do this? Well, let’s suppose that we are dealing with the gradient q_i. So far, the sum of the entries of the matrix along the line of gradient q_i is t, say. So we must find a rectangle that contains many integer points along the line of gradient q_i. And we also need it not to intersect any line of gradient q_j with j<i. But if we go far enough up the line with gradient q_i, we will find ourselves at a very great distance from the lines we are trying to avoid, so we can find a very large rectangle that is disjoint from those lines and contains many integer points of the form (x,q_ix). If it contains h such points, then we make the coefficient of that rectangle equal to -h^{-1}t. Since we can get h to be as big as we like, we can get this coefficient to be as small as we like. So we have “dealt with” the gradient q_i, and the problem is solved.

Why doesn’t that solve EDP?

Let me start by saying why I thought it might solve EDP. I thought that if the sum of coefficients associated with each d was arbitrarily small, then that would be saying that in a certain sense the average weight of the coefficients was arbitrarily small. Since the average size of the diagonal entries is 1, this ought somehow to give us what we want.

To understand why this doesn’t work, and probably can’t be made to work, I find it helpful to reformulate the problem again. To do so, let us think about the operators \sum_dP_{d,r,s}\otimes P_{d,r,s} once again. (These first came in fairly near the beginning of the section entitled “Rectangular products.”) In particular, let us think what one of these operators does to the function \delta_x that is 1 at x and 0 everywhere else.

In fact, let us be slightly more general, as it will be useful to us later. What is the value of (P_{d,r,s}\otimes P_{d,u,v})(\delta_x) at y? For it to be non-zero we need x\in\{du,\dots,d(v-1)\} and y\in\{dr,\dots,d(s-1)\}, and then it is 1. So if we sum over all d, then we get the number of d with that property. Now each d with that property is of the form x/w for some w\in\{u,\dots,v-1\}, so when we sum over d we are counting the number of ways of writing y as tx/w for some w\in\{u,\dots,v-1\} and some t\in\{r,\dots,(s-1)\}.

That is, if we define g_{r,s,u,v}(z) to be the number of ways of writing z as t/w with r\leq t<s and u\leq w<v, then (P_{d,r,s}\otimes P_{d,u,v})(\delta_x)(y)=g_{r,s,u,v}(y/x). That is another way of saying that (P_{d,r,s}\otimes P_{d,u,v})(\delta_x)=g_{r,s,u,v}*\delta_x, where * is a multiplicative convolution.

If you don’t like following calculations but want to know what the moral of all that is, then here you are. Let us define the multiplicative convolution of two functions f and g to be the function h given by the formula


Then the operator \sum_d P_{d,r,s}\otimes P_{d,u,v} takes a function f to the multiplicative convolution of f with the function g_{r,s,u,v}, which is itself the multiplicative convolution of the characteristic functions of the sets \{r,r+1,\dots,s-1\} and \{u^{-1},(u+1)^{-1},\dots,(v-1)^{-1}\}. (That is, as I’ve already said, g_{r,s,u,v}(z) is the number of ways of writing z as a fraction with numerator in \{r,r+1,\dots,s-1\} and denominator in \{u,u+1,\dots,v-1\}.)

Now we’re trying to express the identity as a small linear combination of these convolution operators, so we can turn the problem into a one-dimensional problem: to write the function \delta_1 (which is the identity for the multiplicative convolution) as a small linear combination of functions g_{r,s,u,v}.

I am about 99% sure that if we could do that with a finite linear combination then we would have a proof of EDP. I used to think that “whatever works for finite should also work for infinite but with absolutely summable coefficients,” but it seems that this is wrong because even if the coefficients can be made very small, the functions themselves may be getting quite large in the norms we care about, so the convergence may be much too weak.

That does indeed seem to be the case here. If you convert the just-do-it argument above into an argument to express \delta_1 in terms of the functions g_{r,s,u,v}, then you get pointwise (absolute) convergence, but that is not strong enough.

Let me just elaborate on the remark that this problem can be solved directly by means of a just-do-it argument. I won’t give the full argument, but I’ll just point out that for each rational q=a/b you have the option of doing the following. You choose a very large integer m and a very much larger integer n. You then set P=\{an,an+1,an+2\dots,a(n+m-1)\} and Q=\{bn,bn+1,\dots,b(n+m-1)\}. Since n is much larger than m, r/s is very close to a/b for every r\in P and s\in Q. Moreover, the number of pairs (r,s)\in P\times Q such that r/s=a/b is m (assuming that a and b are coprime, that is). So if we want to get rid of a value \alpha at the point q, then we can subtract the function \alpha m^{-1}P*Q^{-1}, which has a small coefficient and is supported on only a very small neighbourhood of q.

What is not so good about it is that the \ell_1 norm of m^{-1}P*Q^{-1} is roughly abm, which is not small at all.

Thinking slightly further about this, I see that what I said above about pointwise convergence was a bit too pessimistic: for example, we can easily organize for the convergence to be uniform. And indeed, if we think of m^{-1}P*Q^{-1} as being, in its broad shape, roughly like a set of size close … actually, what is it like?

Rational numbers sort of repel each other, so if we’ve devised P and Q such that all points in PQ^{-1} are close to a/b, then all points that are not actually equal to a/b will tend to have rather large denominators, and therefore not to occur very frequently. So let’s hazard a guess that for the purposes of thinking about \ell_p norms, we can think of P*Q^{-1} as having a spike at a/b where the value is m and abm^2 other values that are all around 1. That would mean that the pth power of the \ell_p norm was around m^p+abm^2. Thinking of a and b as constants, it looks as though we might expect some kind of good behaviour to kick in when p passes 2. Remembering that we’re actually interested in m^{-1} times the \ell_p norm, we get something like 1+m^{2/p-1}. But if the value \alpha we were trying to get rid of was small, we can multiply that by a small factor \alpha. Ah, but the problem is that there are an awful lot of those \alphas to worry about, and they add up in an \ell_1-ish way.

What that boils down to is that this kind of greedy keep-away-from-everything-else approach doesn’t give us any kind of useful convergence, which we need if we are to get the averaging argument to work. I realize that I haven’t been too clear about what kind of convergence would work, but there is one kind that is definitely OK, and that is if we have a finite linear combination. So let me now ask a completely precise question.

A sufficient (I’m almost sure) condition for EDP

Let P and Q be subsets of \mathbb{N}. We define P*Q^{-1} to be the function whose value at x is the number of ways of writing x as r/s with r\in P and s\in Q. I claim that EDP follows if for every c>0 there exists a way of writing the function \delta_1 (which is defined on \mathbb{Q} and takes the value 1 at 1 and 0 everywhere else) as a finite linear combination \sum_i\lambda_iP_i*Q_i^{-1} where the P_i and Q_i are intervals of positive integers and \sum_i|\lambda_i|\leq c.


In this post I have been looking at the representation-of-diagonals approach over the rationals with the extra symmetry constraint that the HAP products are all of the form P_{d,r,s}\otimes P_{d,u,v}, where P_{d,r,s}=\{dr,d(r+1),\dots,d(s-1)\} and that the coefficient of P_{d,r,s}\otimes P_{d,u,v}, is independent of d. This symmetry condition forces the diagonal map we are trying to represent to be the identity. (As I write this it occurs to me that we might be able to take an arbitrary efficient representation of a diagonal matrix and average it over all multiples in order to obtain an efficient representation of the identity. So perhaps this “new” approach is equivalent to the old one — which would be quite encouraging as it would increase the chances that the new one can be made to work.)

Thinking through what this means, we find that we can reformulate the problem and obtain the question asked in the previous section.

Where next for EDP?

To finish, I’d like to ask if anyone has views about how they would like the EDP project to proceed. We are in uncharted waters at the moment: we have a project that started with a burst of activity — in fact it was sustained enough to count as more than just a burst — but now it has slowed right down and I’m not sure it will ever regain anything like its earlier intensity. And yet I myself am still enthusiastic about it, and perhaps I’m not alone in that.

My question is what should happen if it turns out that there are only two or three people, or even just one, who feel like working on the problem. In that case, there are various options. We could simply continue to post comments on this blog and not worry if there are very few participants or if the rate of posting is small. Or we could move the project to a different phase, more like a normal collaboration, with longer periods of private thought, but from time to time write comments or posts to make sure that any important developments are kept public (so that, in particular, anyone who wanted to join in would be free to do so). Or we could turn the project into a completely conventional collaboration, always with the understanding that the author of the eventual paper would be D. H. J. Polymath. And that’s not quite the most conventional it could go. The final possibility would be to close the project altogether, write a paper summarizing all the partial results, and then “free up” the problem for anyone who wants to think about it privately — if they used any of Polymath’s ideas they could simply refer to the paper that contained those ideas.

I think I favour keeping the project going as a Polymath project but with longer periods of private thought, but I don’t favour this strongly enough that I couldn’t be persuaded to change my mind if others feel differently. If that is the way we go, then my final question is, are there people out there who would like to be part of a small group of reasonably dedicated participants, or am I on my own now?

101 Responses to “EDP18 — apparently P does not equal NP”

  1. Jason Dyer Says:

    I have made some progress on the graph theoretic approach but I have not written anything up because of lack of time. If your offer still stands to make a post outlining how a new approach would work I’d be willing to give it a go, but I’m not sure if that would inject any new energy.

  2. Kristal Cantwell Says:

    I think keeping this as a polymath project with longer periods of thought is a reasonable way to continue. This project interests me but I don’t know how much I can contribute to it. I don’t know what other people’s level of contribution will be.

  3. Polymath5 « Euclidean Ramsey Theory Says:

    […] Polymath5 By kristalcantwell There is a new thread for Polymath5 here. […]

  4. Thomas Sauvaget Says:

    Since now it is the reverse from the beginning (when the rate of comments was driving the rate of new posts), there’s the new possibility of setting a fairly regular pace for new posts, like a real weekly workgroup. This might then have a beneficial effect on the commitment of the participants (allowing it to fit in one’s regular schedule).

    So the transition to a polymath with longer periods of thought seems very natural. Also, keeping this as a polymath would allow students to still learn from your thought process, and any non-working ideas to be recorded. (I’m probably not able to contribute anything apart from a little numerics and plots, but would be very happy to learn how the problem can be solved).

  5. gowers Says:

    I don’t know whether everyone who wants to express an opinion about this has done so, but I’m not getting the impression that people are falling over themselves to participate fully, and I’m also getting the impression that if that is the case, then people wouldn’t mind if I worked privately but wrote regular posts to say how I was getting on and give details about any new thoughts I might have had — always on the understanding that anybody who wanted to contribute could at any stage join in, and that if suddenly four or five people became enthusiastic then the collaboration would revert to the old style of working.

    Incidentally, I find that one of the good things about Polymath that is quite independent of the number of people participating is that it forces you to make your thoughts clear enough to be explained to other people. So even a monomath project has some value.

    One other thing I wanted to say is that I sometimes worry that when I write a post such as the one above that contains some calculations, people are a little bit put off by the calculations and feel that to contribute they would need to invest a large amount of effort just to understand what on earth I was saying. I tried to write the last few sections so that one could understand those and ignore the rest of the post, but I realize that there is more that I could have said. So I’d like to ask whether this is partly the case. Should I be trying to write something like a paper, with precise results (along the lines of “If we could do A then we could do B”), precise conjectures, and plenty of chat in between? I’m tempted to do that anyway, because I think it could easily be modified to form part of whatever paper results from the project. (I’m assuming that there will be some kind of paper, whether or not the problem is solved.)

    • Christian Says:

      Yes, I think it would be very interesting and useful to have “something like a paper” about where exactly you currently are and what the next steps ought to be.

    • Klas Markström Says:

      I’m still interested in the project but I haven’t had much time to devote to it lately.

    • gowers Says:

      I’m now preparing a post in which I hope to give a rigorous proof of why the decomposition discussed above would imply EDP if it existed for arbitrarily small c. I think we now have a genuinely new avenue to explore, and that just like most of the other avenues it lends itself to a mixture of theoretical and numerical investigations.

  6. Alec Edgington Says:

    Do we have such a finite decomposition of \delta_1 for any c < 1?

    • gowers Says:

      That’s a good question, to which I do not know the answer. It fits well with something I was thinking a little earlier, which is that there might be more hope of finding interesting decompositions with this one-dimensional problem than there was with the earlier two-dimensional problem.

      The general problem is to represent one vector v as a linear combination \sum_i\lambda_iw_i with \sum_i|\lambda_i| as small as possible. I haven’t quite worked out what that is — is it simply a linear programming problem? Or is it a dual linear programming problem? In particular, is it a problem for which there are very efficient packages out there waiting to be used? If so, then perhaps there is a hope of using a mixture of theory and experiment to prove a lower bound for discrepancy that’s better than our current record of 3. But even if we didn’t manage that, I think any decomposition with c<1 would be interesting. (I don’t rule out, however, that there is some fairly simple decomposition that gives something like c=1/2 and that the real interest starts below that.)

      One other thing I’m not quite clear about is the extent to which examples such as the sequences of length 1124 translate into constraints on the decompositions one can hope to find. The reason for that is that passing to the rationals means that there isn’t a completely instant quantitative translation from one problem to the other. In principle, that might be good news as it might mean that in order to obtain a decomposition with, say, c=1/5, one would not have to look at functions with huge supports.

    • Alec Edgington Says:

      It looks like a convex optimization problem with linear constraints, which should be solvable quite efficiently – although the number of variables is large (of order n^4 if we consider all subintervals of \{1, \ldots, n\}). I’ll have a go at it with some small n and see what happens …

    • gowers Says:

      I think one can get away with just n^2 variables by considering functions just of the form g_{r,s,r,s}. The reason is that we have a sort of polarization identity here. (I think I made the same point but in a different context somewhere in the post above.) First of all, note that the value of g_{r,s,u,v} at x is the same as the value of g_{u,v,r,s} at x^{-1}. Therefore, given any decomposition of \delta_1 we can replace each g_{r,s,u,v} by (g_{r,s,u,v}+g_{u,v,r,s})/2 and obtain a symmetric decomposition: that is, one that always gives the same coefficient to g_{r,s,u,v} that it gives to g_{u,v,r,s}.

      Next, notice that, equating sets with their characteristic functions, [r,s]\times[u,v]+[u,v]\times[r,s] is equal to


      To convert these functions into functions defined on \mathbb{Q} we just think of each point of \mathbb{N}^2 as a rational number and sum up. (By “sum up” I mean that for a rational like 2/3 we would add up the values at (2,3), (4,6), (6,9) etc.) Therefore, g_{r,s,u,v}+g_{u,v,r,s} is equal to


      Thus, if we can decompose \delta_1 into functions of the form g_{r,s,u,v} with sum of coefficients at most c, then we can decompose into functions of the form g_{r,s,r,s} with sum of coefficients at most 4c.

      It’s not clear to me whether for small n the gain you get from having only n^2 variables is more or less important than that factor 4. But it certainly allows you to look at much larger n, so seems worth trying — after all, if it is possible to decompose \delta_1 into functions of the form g_{r,s,r,s} with a reasonably small constant c then one has a stronger result.

    • Alec Edgington Says:

      Ah yes, that looks useful.

      In fact the problem can be expressed as a linear optimization problem, at the expense of doubling the number of variables and adding some constraints: one just has to write \lambda_i = x_i^+ - x_i^-, with x_i^+, x_i^- \geq 0, and minimize \sum_i (x_i^+ + x_i^-).

      I’ve written a program in Python using the ‘cvxopt’ package to solve this. For a first attempt I allowed all possible pairs of intervals. It turns out that \sum_i \lvert \lambda_i \rvert < 1 is achievable, using subsets of \{1, \ldots, 6\}. The program came up with the solution: P_1 = \{1,2,3\}, P_2 = \{5,6\}, P_3 = \{2,3,4,5\}, P_4 = \{6\}, and \delta_1 = \frac15 (P_0/P_0 + P_1/P_1 - P_3/P_4 - P_4/P_3) (to abbreviate your notation slightly). It’s easy to check that this works.

      I’ll do some more investigation …

    • Alec Edgington Says:

      Sorry, I meant to write:

      \delta_1 = \frac15 (P_1/P_1 + P_2/P_2 - P_3/P_4 - P_4/P_3)

    • gowers Says:

      Simple as it is, I find that example rather encouraging because it genuinely relies on the new symmetric set-up, in the sense that the function P_1^2+P_2^2-P_3\times P_4-P_4\times P_3 is not supported on the diagonal of \mathbb{N}^2.

      I suppose one should check whether a tensor-power trick can be used to convert this example into one that gives you a constant (4/5)^k for arbitrary k. It would be a bit of a miracle if it did.

      Hmm … I can’t really do this in my head, but it feels as though the answer is that it doesn’t, because a product of HAPs is a multidimensional HAP. But I may check this just in case something can be got to work.

      Hmm again … can one do anything by taking the kth multiplicative convolution power of both sides of your identity? The right-hand side would not immediately be of the right form, but it’s not instantly clear to me that it can’t be decomposed into something of the right form. We would have 4^k terms, each multiplied by 5^{-k}, so we would need to find a way of decomposing each term into functions of the desired form with coefficients adding up to significantly less than (5/4)^k.

      I’m putting that idea up in an old-style Polymath spirit — I’m not sure when I’ll be able to get round to thinking about it properly.

    • Alec Edgington Says:

      A few more data. Optimizing over all possible pairs of intervals, new records for \sum \lvert \lambda_i \rvert are set at: n = 6 (\frac{4}{5}), n = 9 (\frac{7}{9}), n = 12 (\frac{3}{4}), n = 15 (\frac{17}{23}), n = 16 (\frac{5}{7}).

      Restricting to symmetric rectangles P/P, we first attain \sum \lvert \lambda_i \rvert < 1 at n = 12. The solution here is: P_1 = \{2, 3\}, P_2 = \{3, 4\}, P_3 = \{5, 6\}, P_4 = \{8, \ldots, 11\}, P_5 = \{8, \ldots, 12\}, P_6 = \{11, 12\}, and

      \delta_1 = \frac17 (\overline{P_1} + \overline{P_2} + \overline{P_3} + \overline{P_4} - \overline{P_5} + \overline{P_6})

      where \overline{P} is shorthand for P/P.

      This attains \frac67. The next record-breaker for the symmetric case is n = 42, where we attain \frac{11}{13}, with a combination of 11 intervals each with coefficient \pm \frac{1}{13}.

    • Alec Edgington Says:

      … And the next is n = 52, where we attain \frac{4}{5}, with a combination of 12 intervals each with coefficient \pm \frac{1}{15}.

      And the next is n = 60, where we attain \frac{8}{11}, with a combination of 8 intervals each with coefficient \pm \frac{1}{11}. For the record:

      \delta_1 = \frac{1}{11} (\overline{[9,10]} + \overline{[11,12]} + \overline{[14,15]} + \overline{[19,20]} + \overline{[29,30]} + \overline{[54,59]} - \overline{[54,60]} + \overline{[59,60]})

    • Alec Edgington Says:

      Hmm, there’s a pattern in that last decomposition. It’s:

      \delta_1 = \frac{1}{11} ( \sum_{r=1}^6 \overline{[\frac{N}{r}-1, \frac{N}{r}]} + \overline{[\frac{9}{10} N, N-1]} - \overline{[\frac{9}{10} N, N]})

      No time to think more about this now, but it seems quite encouraging …

    • gowers Says:

      I find the mere existence of these decompositions, whether patterned or not (though obviously patterns dangle in front of us the hope of generalization), pretty encouraging, though at this stage it’s not clear how encouraging they are. One thing I’m also finding is that, despite the appeal of having a 1D formulation, it’s perhaps easier to go back to the (equivalent) 2D set-up with lines in \mathbb{N}^2. What I mean is that if we want to check that a decomposition \sum_i\lambda_i\overline{P_i} works, then the easiest way of doing so (by hand, that is) is to look at the function \sum_i\lambda_iP_i\times P_i and check that the sum along the main diagonal is 1 and the sum along all other lines through the origin is 0.

      What I like about this is that in a certain sense it is strictly easier than what we were trying to do before. That is, before, we were trying to represent a diagonal as a linear combination of HAP products. Now all we need is that if you stand at the origin and do a sort of scan, it looks as though you have a diagonal matrix, even though in fact you may not. Another way of putting that is that you have lines going through the origin as test functions and if the inner product of \sum_i\lambda_iP_i\times P_i with any one of those lines is what it would be if you had a diagonal matrix, then you count the decomposition as being OK. And that’s a lot weaker than requiring the decomposition to give you a diagonal matrix.

      What I’ll do when I get the chance, unless you’ve already done so by then, is examine that pattern and see if it can be generalized. I think it is no coincidence that the record occurred at a nice smooth number like 60: if you have pairs (p,q) where p and q are large primes, then they are hard to deal with, but the interval [54,60] is full of smooth numbers. (OK there’s a bit of a problem with 59, but the contribution from [59,60] deals with that.)

  7. Christian Says:

    Alec, as you’ve probably worked out in the meantime for yourself, this idea brings us down to c=1/2+\epsilon. Why? Take any large natural number m, set n=m!, and note that (2m-2)\delta_1=\sum_{r=1}^{r=m}\overline{P_r}+\overline{Q}-\overline{R}, where P_r=[n/r-1, n/r] for r=1, 2, \ldots, m, Q=[n-m, n-1] and R=[n-m, n]. This yields a value of (m+2)/(2m-2) for c, which, of course, tends to 1/2 as m goes to infinity.

    • gowers Says:

      Christian, I’ve just found this comment of yours amongst my spam. I’ve no idea why it went there as it didn’t seem to have any obvious spam-like features, so apologies for that.

      One other small remark is that if you want the TeX to compile you have to write the word “latex” immediately after the dollar sign. For instance, to produce \int_0^1f(x)dx I would write “$***** \int_0^1f(x)dx$” except that instead of “*****” I would write “latex”. I have edited your comment by inserting the “latex”s.

    • Christian Says:

      Tim, many thanks for editing the formulae, and for the explanation of how to do so myself in the future. The following equation is just meant as “experiment” without relevance to EDP: \nu(C_n\oplus C_n)=2n-2

  8. Thomas Sauvaget Says:

    Small remark: Alec’s pattern is interesting, but it’s not clear how to generalize it so as to decrease c. Since a small c=\frac{u}{v} means u \ll v, and since u governs the number of intervals while v is the number of times we hit the diagonal (if I understood correctly), then somehow we should have “not many” intervals. So I’m not sure what to do with the sum of many small intervals in Alec’s example (at least it shouldn’t grow too fast as c decreases).

    • gowers Says:

      I’m not sure I fully follow what you are saying, but I agree with you to the extent that it looks very unlikely that one could solve the problem with one big interval that is cancelled out, except on the diagonal, by lots of small ones.

    • gowers Says:

      Let me try to put what you say in my own words, because I’m not sure I know what you mean by u and v. Suppose we have intervals P_i and coefficients \lambda_i (i=1,2,\dots,k) such that \sum_i\lambda_iP_i sums to zero along every line with gradient not equal to 1. What value of c does it give us? Well, the sum along the main diagonal is \sum_i\lambda_i|P_i|, and the sum of the absolute values of the coefficients is \sum_i|\lambda_i|. So we get \sum_i|\lambda_i|/\sum_i\lambda_i|P_i|. In the special case where each \lambda_i is \pm 1, we get k/\sum_i\lambda_i|P_i|. To make that more transparent, if the sum is \overline{P_1}+\dots+\overline{P_r}-\overline{Q_1}-\dots-\overline{Q_s}, then we get (r+s)/(\sum |P_i|-\sum|Q_j|). So we want a significant imbalance in the sizes of the P_i and Q_j.

      Another necessary condition is that the sum of \sum_i\lambda_iP_i\otimes P_i off the diagonal is zero. (To see this, just sum the sums over all non-diagonal lines.) In the case of the P_i and Q_j, this tells us that \sum_i|P_i|^2-\sum_i|P_i|=\sum_j|Q_j|^2-\sum_j|Q_j|. So, roughly speaking, we want the squares of the sizes of the negatively counted intervals and the positively counted intervals to be well balanced, but the sizes themselves not to be well balanced.

      A small thought is that this puts a number-theoretic constraint on the sizes of the P_i and Q_i that could perhaps be used to narrow down the search, especially if one just searches for \pm 1 sums.

      Going back to Thomas’s point, if we have one big interval, of size m, say, and try to cancel it out with lots of little intervals of bounded size, then it won’t work because we’ll need \theta m^2 little intervals in order to get the off-diagonal sum equal to zero, and then we have k=\theta m^2 and the imbalance in the sums of the sizes of the intervals equal to Cm^2 - m=Cm^2, so we will never do better than some absolute constant for c. But if our little intervals have size \log m, say, then we have roughly (m/\log m)^2 of them, and an imbalance of more like m^2/\log m. So that gives us c=1/\log m.

      The point I’m making here is that for all we know it might be possible to prove EDP by finding one big interval and cancelling it out using lots of small intervals, provided only that the sizes of the small intervals tend to infinity. (I call them small because I don’t care if they have size \log\log\log m, where m is the size of the big interval.)

    • Thomas Sauvaget Says:

      Sorry for the sloppy notation and poor wording, and thank you for these details, and the other constraints. Yes that’s what I had in mind: the interplay between the number of intervals r+s=u and the behaviour \sum_i|P_i|-\sum_i|Q_j|=v on the diagonal.

    • Alec Edgington Says:

      Yes, the pattern only generalizes in an obvious way to the extent of giving us c arbitraily close to \frac12. Indeed, for any k, we can write

      \delta_1 = \frac{1}{2k-1} (\sum_{r=1}^k \overline{[\frac{k!}{r}-1, \frac{k!}{r}]} + \overline{[k!-k, k!-1]} - \overline{[k!-k, k!]})

      since each of the off-diagonal terms in the 2 \times 2 blocks is cancelled by a term in the difference between the two large blocks.

      Perhaps this merely confirms Tim’s suggestion that the problem might only become interesting for c < \frac12 or something. But it seems worth asking whether we can generalize the pattern further to give us c \sim 1/m by taking a sum of blocks of size m plus a small number of larger blocks – although there may be a simple reason why that can’t work.

  9. Gil Kalai Says:

    At some point I also thought of writing up a post going back to some earlier avenues and questions. The Decomposition approach and related SDP and LP is very nice and it has several features that makes it the most promising, but sometime down the road it will be useful to think or write down some older approaches and problems. In any case, keeping this project open for several additional months seems preferable to me.

    • gowers Says:

      Any efforts in that direction would be greatly appreciated. For instance, one thing that I would be sorry not to return to is the behaviour of multiplicative functions produced by various greedy and not so greedy algorithms. There seemed to be some very interesting phenomena there that we failed to understand even heuristically.

      I haven’t pinned this down 100% and have therefore not said too much about it, but on a couple of occasions I have had thoughts related to decompositions that seem to lead either to the very nice observation that Terry made concerning reducing the problem to one about multiplicative functions or to something very like it. So it may be that there will be some kind of convergence of the different approaches. Another idea that it would be good to resurrect is the one that I know interests you, and interest me too — of trying to show that EDP is true for wide classes of sequences, such as sequences that do not correlate with character-like functions. A partial result like that, especially if it could be proved with a better bound than can be expected for EDP in general, could be very enlightening.

  10. gowers Says:

    Just to summarize what Alec obtains in this comment and the build-up to it, we now have a method that’s in retrospect fairly simple for getting a decomposition that will get us down to c=1/2. The idea is this. You choose N that is divisible by all of 1,2,\dots,k. You then take the difference between the square [N-k,N]^2 and the square [N-k,N-1]^2. This gives you all points of the form (N-j,N) or (N,N-j) with 0\leq j\leq k. We now want to cancel out all of those except the point (N,N). To cancel out the points (N-j,N) and (N,N-j) you use the square [N/j-1,N/j]^2. The reason that works is that the line through the point (N/j-1,N/j) also goes through the point (N-j,N) and it goes through none of the other points that we have left. And similarly for the reflected line on the other side of the diagonal. So we end up doing all the calculations we wanted to do.

    As a little experiment, let’s think what would happen if we took the square [N/j-2,N]^2 instead. Now we have a few more lines going through points in the square. The points above the main diagonal are (N/j-2,N/j-1), (N/j-2,N/j) and (N/j-1,N/j). Multiplying all these through by j gives us the points (N-2j,N-j), (N-2j,N) and (N-j,N).

    Ah, that itself isn’t ideal, but what if k is even and we take the square [N/2-k/2,N/2]^2 and subtract the square [N/2-k/2,N/2-1]^2? We can use that to knock out all points of the form (N,N-j) with j even, I think. If we get rid of all the others in a trivial way, does that improve c? And if it does, can we continue along similar lines and get c smaller still? I have to stop for the day, so I can’t think about this just yet.

  11. gowers Says:

    I’d like to mention an idea that I think is unlikely to work, but it feels like the next obvious thing to try after Alec’s construction. His construction works by taking what I like to think of as a sort of rotated-L-shaped “fly trap” [N-k,N]^2-[N-k,N-1]^2 and making sure that exactly one fly gets caught in each place. A “fly” is a point in one of the other squares that flies out along a straight line that starts at the origin.

    The “trivial” method of achieving this is to have exactly one fly (or rather, symmetric-about-diagonal pair of flies) per little square, and to make N so big and smooth that we can find such a thing for every point of the form (N-j,N) or (N,N-j). But that will never get us c better than 1/2 because we have so many squares of width 2. So we want a construction where most of the squares are significantly bigger.

    What I’m wondering is whether we can keep the fly trap idea, but do it much more economically by trying to hit almost all points in the trap rather than all of them. The basic idea would be that we could take squares of unbounded size and use those to create enough flies to hit a proportion 1-o(1) of the points in the trap, and then deal with the remaining points in a completely trivial way, knocking them out one by one. The trivial part would be uneconomical, but since the proportion of points belonging to the trivial part would be small, this wouldn’t matter.

    To give a slightly more precise idea of how a fly trap might catch the flies from a larger square, let us consider a square [r,s]^2 where s-r is of unbounded size (but we can think of it as a very large constant) and much smaller than r.

    Now let’s suppose that N is divisible by every number from r to s. Then every point (a,b)\in[r,s]^2 with a\leq b has some multiple of the form (u,N) with u\leq N, and every point (a,b)\in[r,s]^2 with a\geq b has some multiple of the form (N,u) with u\leq N. These points belong to a “fly trap”, but the difference now is that the width of the fly trap is much bigger (as a function of N) than it was in Alec’s construction: in order for N to be divisible by every number between r and s, it must be very much bigger than r or s; but the width of the trap is at least (s-r)N/r, or thereabouts.

    What this demonstrates is merely that it is possible for all the points in a rectangle of unbounded size to get caught in a fly trap. What it does not show is that we can use up more than a handful of points in the fly trap this way. My guess is that we can’t — that’s what I plan to think about next.

    (General remark: I’m temporarily back in normal Polymath mode, since there is some interaction going on.)

  12. Alec Edgington Says:

    I agree that finding a decomposition with c < \frac12 would be very interesting. Getting 3 \times 3 blocks to cancel seems to be much harder than with 2 \times 2 blocks!

    Here’s just a tiny thought in that direction. Closely related to the decomposition

    \sum_{r=1}^k [\frac{N}{r} - 1, \frac{N}{r}]^2 + [N-k, N-1]^2 - [N-k, N]^2

    is the decomposition

    \sum_{r=1}^k [\frac{N}{r}, \frac{N}{r} + 1]^2 + [N+1, N+k]^2 - [N, N+k]^2.

    If we add these two decompositions together, the pairs of terms [\frac{N}{r} - 1, \frac{N}{r}]^2 + [\frac{N}{r}, \frac{N}{r} + 1]^2 fill all but two of the points of a 3 \times 3 square, [\frac{N}{r} - 1, \frac{N}{r} + 1]^2, with the middle one counted twice.

    What if we try to form a decomposition with the sum

    \sum_{r=1}^{k} [\frac{N}{r} - 1, \frac{N}{r} + 1]^2 + [N-k, N-1]^2 - [N-k, N]^2 + [N+1, N+k]^2 - [N, N+k]^2?

    We’re left with the terms (\frac{N}{r} - 1, \frac{N}{r} + 1) and (\frac{N}{r} + 1, \frac{N}{r} - 1). But I can’t see how to construct a fly-trap to cancel these.

  13. gowers Says:

    One thought I had about fly traps is that we should expect them to be difficult to construct. To see what I mean by this, let’s imagine that we tried to create a fly trap out of a square [r,s]^2. If we just choose a fairly random pair of integers r and s (perhaps with s not much bigger than r) then we can expect [r,s]^2 to contain many pairs (x,y) of coprime integers, since there is a positive probability that two random integers are coprime. OK, they’re not quite random here because they are fairly close to each other, but I don’t think that makes a substantial difference. Indeed, if |x-y|=1 then they are guaranteed to be coprime, if |x-y|=2 and they are both odd then they are guaranteed to be coprime, and in general if |x-y|=d and x is coprime to d then they are guaranteed to be coprime. So it seems to me that if s-r isn’t tiny then we get a rich supply of coprime pairs in the square [s-r]^2.

    The problem with coprime pairs is that there is nothing for them to trap, since each one is the first integer point along the line from the origin.

    So how might we create a nice big and fairly simple rectilinear region that does not contain lots of coprime pairs? So far we have used the following simple technique: find a very smooth N and simply take an inverted L shape with its corner at (N,N). Each point in that shape will have a coordinate equal to N, and since N has lots of small factors, it is quite hard to be coprime to N.

    Now suppose we wanted to thicken up that inverted L shape a bit. What we would ideally like is an interval [N,N+r] such that most of the points in it are smooth enough to have common factors with almost all positive integers. But that gets us into worrying ABC territory — we can expect it to be difficult to cram lots of highly smooth integers into a small interval. (I’ve now looked up the ABC conjecture and I’m not sure after all that it is relevant. But I still don’t see how to get an interval that’s full of integers each with lots of distinct small prime factors, which is what would be needed for this to work.)

    A thickened inverted L can be thought of as a whole lot of inverted Ls all pushed together. It has the advantage that it can be represented very efficiently in terms of rectangles, but it’s not clear that we need such extreme efficiency, so perhaps a better idea is not to insist that they are all pushed together. That avoids the number-theoretic problems just discussed. So the plan would be something like this. First choose some largish positive integer m. Next, find a number of large integers N, all around the same size, and all of them products of some of the first m primes (not necessarily avoiding multiplicity). Since the sum of the reciprocals of the primes diverges, we can choose m large enough that \prod_{i=1}^m(1-p_i^{-1}) is very small, and if we do that then even if A is a random subset of \{1,2,\dots,m\} then \prod_{i\in A}(1-p_i^{-1}) will be small. So in that way we have a rich supply of integers that are of size somewhere around (p_1\dots p_m)^{1/2} and that are coprime to only a very small proportion of other integers.

    We could use these to create a sort of reception committee of fly traps. (The phrase “reception committee” is chosen for the benefit of Cambridge backgammon fans.)

    What I would be hoping for here is something like this. Suppose we now take a smallish square [u,v]^2. We would like it to cancel out some of the points in all those fly traps. Let’s consider a typical point (a,b)\in[u,v]^2 and let us suppose that u and v are of roughly the same size. Then as we take multiples of (a,b) we will expect them to hit any given fly trap with probability roughly u^{-1}. So we might hope that if we have about u fly traps, then we will typically expect to hit one and only one of them. But even better might be to have Cu fly traps for some large constant C so that concentration of measure could kick in and tell us that with high probability we will hit about C of the fly traps. What I’m hoping for here is that we could use probabilistic methods: we choose a bunch of random small squares [u,v]^2 and argue that the points in the fly traps mostly get hit about the same number of times. (I am very far from sure that this is correct, since it ought to depend on “how non-coprime” the coordinates of the points in the flytraps are. But perhaps some Erdős-Kac type phenomenon would tell us that most points were about the same in this respect.)

    The general idea here would be to hit most of the points in the fly traps about the same number of times, and then divide through the small squares by that number and subtract. This would leave us with an approximate cancellation, and the remaining error could be mopped up with a trivial decomposition.

    This is all a bit of a fantasy at the moment, but I think it’s the kind of thing we need to try.

  14. gowers Says:

    Here are a few more thoughts about the reception committee of fly traps idea.

    I won’t give the full details, but at first I thought I had an argument to show that it couldn’t work, but then I realized that it might just be possible, though I am at the moment ignoring certain aspects of the problem that are probably important. To be precise, I am ignoring the question of whether we can find sufficiently many smooth large integers that are all reasonably close to each other.

    The parameters we need to choose are the number of inverted Ls and their lengths. (To be clear, let us define the length of the set of integer points along the lines from (N,N) to (N,N-M) and (N-M,N) to be M. And let’s call that set L_{N,M}. After a bit of experimenting, I realized that we need really rather a lot of layers to the fly trap. Let me try to explain roughly why. Given a point in L_{N,M} that is not on the main diagonal, it is within M of the point (N,N), so there cannot be any positive integer points on the line joining that point to the origin with x coordinate less than N/M (since then it would be within less than 1 of the main diagonal). But that means that the probability of a multiple of a point in one of the small squares hitting a given inverted L can be at most M/N, which means that for the argument to work we will need at least N/M layers.

    I won’t say more, but those are the kinds of calculations that underlie the details of the following suggestion. I would like to choose roughly N/M inverted Ls of length roughly M between N and 2N. I would like them all to be of the form L_{N_i,M_i} with each N_i having enough small prime factors for almost all positive integers to have common factors with N_i. Note that each inverted L can be created with a contribution of 2 to the sum of the absolute values of the coefficients, so so far we have got up to 2N/M for this.

    Now for the cancelling part. This we shall create as a whole lot of fairly randomly chosen squares [r_i,s_i]^2 where each r_i has order of magnitude N and each s_i-r_i has order of magnitude M^{1/2}. That means that each square contains roughly M points off the diagonal, so in order to do the cancellation, we need about N/M of them. (In practice, I would expect to take more like kN/M of them and divide them all by k, just to smooth things out a bit.) If the probabilistic heuristic works, then each point will … oh dear, I’ve made a mistake.

    My mistake is that if we have N/M squares of width M^{1/2} then we’ll be forced to go all the way up to N/M^{1/2}, which means that for the probabilistic heuristic to work we’ll need N/M^{1/2} layers. Let me go back and do a few more calculations.

  15. gowers Says:

    This deserves a new comment. Let’s suppose that we have T inverted Ls of width M between N and 2N. Then the number of points we have to cancel is TM. (I’m stating everything up to an absolute constant.) For the probabilistic heuristic to work, we need the points we use to do the cancelling to have coordinates bounded above by T (or else the probability that they hit one of the layers of the reception committee is significantly less than 1 and we end up with lots of uncancelled points). But the number of points with coordinates bounded above by T that are close enough to the diagonal to cancel any of the points in the reception committee is at most T(TM/N)=T^2M/N. So we need T^2M/N to be at least TM, which implies that we need T to be at least N. But that’s a problem because we now have so many points we want to cancel that we are forced to have the little squares centred at points that are higher up than (N,N).

    It’s just conceivable that we can get round this by showing that if we choose our inverted Ls to be at fairly smooth heights, then the probability of hitting them is significantly larger. (There is some plausibility to this: for instance, if you go to the opposite extreme and take them at prime height then you cannot hit them.) However, the question in my mind then is whether this increase in probability is enough to compensate for the fact that there are fewer layers.

  16. gowers Says:

    There are all sorts of difficulties emerging with this basic idea, though I still don’t have a strong feeling about whether they are completely insuperable. Another thought is that if we are using a fairly random selection of small squares, then we will want the points (x,y) in the layers to have roughly the same size highest common factor. Why? Well, if the highest common factor of x and y is d, then the number of positive integer points on the line from (0,0) to (x,y) is d (the number of multiples of (x/d,y/d) up to (x,y)). And if we are choosing the squares randomly, then we’ll expect to hit each positive integer point with about the same probability.

    Can we find integers N such that for most small m the highest common factor of N and n is about the same? It doesn’t sound very likely, but let’s think about it by considering the number N=p_1\dots p_k and asking how we expect the highest common factor of N with a random integer n to be distributed.

    The log of the highest common factor will be \sum_{i=1}^k\epsilon_i\log p_i, where \epsilon_i=1 with probability 1/p_i. The expectation of this tends to infinity with k. If I’m not much mistaken, the variance is roughly the same size as the expectation, so the standard deviation, as a fraction of the mean, should tend to zero. So it looks as though there is some hope that the function is reasonably concentrated for nice smooth N.

    Except that I’m forgetting that I started by taking logs. So in order for the hcf itself to be reasonably concentrated, we would need the log of the hcf to be highly concentrated. In fact, I don’t believe that the hcf is concentrated at all: for example, if N is even, then I’d expect that conditioning on n being even would significantly increase the expected hcf with N.

    I’m not sure where that leaves us. I’m having a lot of difficulty thinking of even a vaguely plausible way of building a decomposition using lots of inverted Ls. But I don’t have much of an idea of a theoretical statement that expresses this difficulty, or of what other construction methods one might imagine using.

  17. gowers Says:

    I have a more general idea, but I’m far from sure that it works. It’s again motivated by the thought that we might be able to get somewhere by a clever probabilistic approach that gives a good approximation to a decomposition followed by a crude determinisitc approach to sweep up the remaining error.

    Here’s the thought. If we’re going to choose things randomly, then we will tend to hit lines with gradients that are rationals with small numerator and denominator more frequently than lines with gradients that are rationals with large numerator and denominator. Indeed, we will tend to hit a line with a frequency that’s proportional to the density of integer points along that line. Let’s quickly work out what that density is, on the assumption that the gradient of the line is fairly close to 1. If the gradient is b/a, where this fraction is in its lowest terms, then the integer points are (a,b), (2a,2b), (3a,3b), and so on. So they occur with frequency proportional to 1/a.

    If we have an arbitrary point (x,y) and y/x=b/a when written in its lowest terms, that tells us that a randomish set of density \gamma will hit the line containing (x,y) with frequency about \gamma/a.

    This suggests that we might be able to do something by constructing two randomish sets of density \gamma in different ways, one that hits the diagonal a lot and the other that does not hit it nearly so much. Then we could subtract the two and get a big trace and not all that big a sum of coefficients.

    For instance, one set could be made out of quite a lot of fairly long inverted Ls, arranged reasonably randomly, but based at points that are smooth enough that there aren’t too many coprime pairs. Then the other set could be made out of smallish squares that are on average somewhat nearer the origin.

    This sounds a bit like what I was suggestion before, but there are differences. One is that we would no longer be trying, hopelessly, to make all the hcfs about the same — instead, we would just go with the natural distribution that assigns more weight to points that are on dense lines. Another is that we wouldn’t be too hung up about whether the squares and the inverted Ls overlapped. We’d just be trying to produce the same natural distribution in two different ways.

    • gowers Says:

      Roughly speaking, what I think we’d be trying to show is that a random fraction (N+m)/N for smallish m and pretty smooth N is distributed very similarly to a random fraction a/b where a and b are close to each other and quite a lot smaller than N.

  18. gowers Says:

    Let me try to think in real time, so to speak, about whether there is any hope of getting a probability distribution in two different ways.

    Suppose that (unnecessarily specifically) N is a product of a random subset of the primes p_1,\dots,p_m. And suppose also that r is a random integer between 1 and M, where M is pretty large but quite a lot smaller than N. Then what is the distribution of (N-M)/N, or equivalently of M/N?

    If M is pretty large and random, then it will be divisible by p_i with probability p_i^{-1}, so we’ll expect it to be divisible by about (\log\log m)/2 of the p_i. So we should end up with a fraction R/S where S is something like a random product of m/2-(\log\log m)/2 of the primes p_i and R is coprime to S.

    That’s not quite what I want. Let’s think about it the other way. Suppose I have a point (a,b) such that a and b have no common factors. What is the probability that some multiple of (a,b) is of the form (N,N-m) (assuming that a>b). We need N to be a multiple of a, and then we are guaranteed to be OK provided only that a-b is not too big.

    But if we are choosing our N not to be a random integer but a random pretty smooth integer, then the probability that it is a multiple of a depends a lot on the prime factorization of a.

    I have to stop for now, but it’s not a bad moment to do so as things are starting to look difficult again.

  19. gowers Says:

    In this comment I’m going to try to explain the idea behind my previous comment a bit more coherently.

    Suppose we have a small “wedge” about the main diagonal. By that I mean that we take some small \delta and look at all positive integer points (x,y) such that y/x lies between (1+\delta)^{-1} and 1+\delta and both x and y are at most N. Almost certainly we will want \delta to tend to zero with N — at any rate, we definitely allow this. Let us call this wedge W(\delta,N).

    With each point (x,y) in \mathbb{N}^2 we associate the rational number y/x, which is the gradient of the line from (0,0) to that point. Also, with any subset A\subset\mathbb{N}^2 we associate the function f_A, defined on the positive rationals, where f_A(r) is the number of ways of writing r as y/x with (x,y)\in A. (We can think of this as a projection on to the line x=1, where we project each point along the line that joins it to the origin. We could also imagine that we are standing at the origin with a kind of scanner that can tell us, for any direction we point the machine in, how many points of A there are in that direction.)

    More generally, we can apply the above projection to multisets and functions. This will sometimes be useful. For instance, if I take a sum of sets of the form [r,s]^2 that are not necessarily disjoint, then I can simply sum up their projections. Just to be explicit, suppose I have a function \alpha defined on \mathbb{N}^2. Then I’ll define f_\alpha(r) to be \sum_{y/x=r}\alpha(x,y).

    What I’d like to do now, though I don’t have a strong feeling that it’s actually possible, is create some projected function in two ways, one using a function \alpha made out of a sum of small squares and one using a function \beta made out of big squares.
    I’m prepared to tolerate a certain amount of inaccuracy — precisely how much I haven’t worked out.

    Here is the kind of way that one might hope to do this. The first way is to take all intervals [r,s] such that s=\lfloor(1+\delta)r\rfloor and s\leq S, where S=N/C for some large constant C. (Our eventual aim is to prove that the discrepancy of every \pm 1 sequence is at least as big as any constant you like to choose. So the bigger the discrepancy you want to prove, the bigger this C will have to be.) If we now sum up the functions [r,s] to obtain a function \alpha (defined on W(\delta,S)) and project, we will obtain a function f_\alpha such that f_\alpha(a/b) is closely related to 1/b. More precisely, there are two factors at work here: the further the line is from the main diagonal, the less it will be hit by the squares that are working their way up, and the more points there are on the line, the more that line will be hit. I haven’t worked out carefully what function that ought to lead to.

    Now let us think how we might hope to create roughly the same projected function using larger squares. The basic idea would be to use quite a lot of inverted Ls, but put them (on average) further up the diagonal. Before I say more about how that might be done, let me explain why it would be good if we could do it.

    Suppose, just to idealize a bit, that we managed to get exactly the same distribution, once using a sum of squares [r,s]^2 with s-r typically of size about M, and once using a sum of inverted Ls of length typically around CM. Suppose also that the number of squares used is T. Then the number of off-diagonal elements of the squares would be around TM^2, so the number of inverted Ls needed would be about TM/C. That means that the sum of coefficients would be proportional to $latexs TM/C$ (since TM/C would be bigger than T). The sum along the diagonal would be TM-TM/C, or about TM. The ratio of the two would be about C, so we would obtain a c of about C^{-1} and a discrepancy bound of about C^{1/2}.

    But is there any hope of doing this? The most obvious obstacle is that if you take the inverted L L_{N,R} (defined to be the set of all points (x,y) such that one of x and y equals N and the other is between N-R and N) then for every point (N,N-u) such that N and u are coprime, you get a line that will not intersect any of the squares. So it is very important that we do not have many of these coprime pairs.

    But if we make sure that N has a reasonable number of small prime factors (large enough for the reciprocals of those factors to have a sum that’s distinctly bigger than 1) then this will happen for only a small fraction of u. (I’ve just noticed that I’ve overused N — once for the size of the wedge and once for the position of an inverted L. Let’s say that we are living inside the wedge W_{\delta,N_0} rather than the wedge W_{\delta,N}.) So then a typical point in L_{N,R} will belong to a line that contains other points significantly nearer to the origin.

    The real question is whether we can choose a bunch of different N in such a way that these other points will be distributed in roughly the same way as if we choose a random square that just fits into W(\delta,S) and then a random point in that square. I don’t have an answer to that for now, but the main purpose of this comment was to ask the question as precisely as I could.

  20. Alec Edgington Says:

    I’m not sure how to search directly for decompositions with c < \frac12, so in an attempt to discover some new patterns I tried forcing the decomposition (still restricted to symmetric rectangles P \times P) not to have too many 2 \times 2 blocks in it. I’m not sure how useful the results are, but will post them here in case they give inspiration to anyone else.

    First, I searched for the best solution with n = 60 that contained no blocks [a, a+1]^2 with a \leq 30. The best c one can obtain with this constraint is just over 1, with the decomposition illustrated here.

    Then, I searched for the best solution with n = 60 and the sum of the absolute values of the coefficients of the 2 \times 2 blocks no greater than \frac13. This time one can get c = \frac56, with the decomposition illustrated here.

    The patterns are quite similar, with 3 \times 3 blocks tending to be situated around divisors of n. They still rely on the basic fly-trap construction. Each fly trap is ‘smoothed’ with a 2 \times 2 square in the middle, meaning that it has a ‘cost’ of 3 (in terms of coefficients) for a ‘benefit’ of 1 (the spare point of the 2 \times 2 square). Of course, the rest of the benefit has to come from the cancellations in the arms.

    • Klas Markström Says:

      Alec, if you can modify your program to print the linear program to a fil in e.g. the MPS file format I can try to solve a few larger cases as well. That could be interesting for both the basic linear program and the restricted versions.

    • gowers Says:

      One quite interesting feature of the decomposition you obtain is that in a sense it does what I was trying to do theoretically: it finds a decomposition that works pretty well and then gets rid of a bit of error in a crude way by adding those 2\times 2 blocks (to deal with points like (59,60)). It’s weak evidence, but maybe it suggests that that kind of approach is on the right track.

    • Alec Edgington Says:

      Klas, I should be able to do that — I’ll have a go later today.

    • Alec Edgington Says:

      I think everybody should be able to access this:


      Let me know if not.

      It is a Python file. I’ve tried to explain the usage in the comments at the top and within the functions. You will need the ‘cvxopt’ module to run it (this is freely downloadable — I won’t post the link because that would make this message land in Tim’s moderation queue — it’s easy to find).

      Anyone with Python and cvxopt installed can use this program to solve decomposition problems restricted to any set of rectangles of the form P \times P. More complex constraint options could be added.

      You can also use it to output MPS files. Unfortunately, I haven’t myself managed to read from the resulting file using cvxopt’s tools, so I’m not very confident that this is working (the files it produces look reasonable, though). Klas, perhaps you can try it and let me know!

    • Alec Edgington Says:

      I’ve just tried opening the MPS file produced by the Python program using the LPSolve IDE in Windows. It does open it, and solve it — impresively fast.

    • Klas Markström Says:

      I am aso trying the program out now. I’m first runnign the python program with cvxopt and later I’ll try out the mps-files with one or two of my other solvers.

      Any opinions regarding what the most interesting cases to try here would be?

    • gowers Says:

      Unfortunately I’m not computer literate enough to try the program out. But I am in the middle of writing a post that I hope will suggest promising ways of narrowing down the search for examples. It should be ready in the next hour or two.

    • Alec Edgington Says:

      I think to start with it would be interesting to do a search with n = \mathrm{lcm}(1, \ldots, k) restricted to intervals of size up to k+1, and see if anything better than our existing construction comes up. I’ve done this for k = 8 (for which our existing construction is optimal), but you may be able to go further.

      By the way, I have a rather less memory-hungry version of the Python program, which I’ll upload in a bit.

    • Klas Markström Says:

      I’m running a loop which solves the program for increasing values of N. At N=420 there is a new record. I’ll post it, and the other solutions, after dinner.

    • gowers Says:

      When you say “new record” do you mean that we now have c<1/2? If so, I’d be extremely interested to study the example (but can wait for you to have your dinner).

    • Klas Markström Says:

      Not as exciting as that I’m afraid. I meant record in the sense Alec, used earlier, which in this case means that it beats 8/11. Somewhere between 800 and 900 the bound is improved a second time.

      Here are the best solutions the program has found so far, when restricted to using intervals of length at most 10.

      The file “data-file-10” increases N by 1 up to 439
      The file “data-file-10-2” increases N by 100 from 500 up to 1000

      When Alec has optimized the memory use I should be able to go to larger sizes as well.

    • Alec Edgington Says:

      New version here:


      (It’s rather more memory-efficient in the make_lp function. The files it produces will be just as big, though.)

      From your data so far it does look as if we are hitting a new record of (k+2)/(2k-1) at \mathrm{lcm}(1, \ldots, k).

    • gowers Says:

      Can I just ask what precisely the constraints are that you are operating under here? I can’t find the reference to 8/11.

    • Klas Markström Says:

      The second improvement is at 840, fitting the lcm pattern. The next lcm is 2520 and is probably out of range for the small server where I have been running this.
      I could put this on a parallel machine which could probably do several more values of k, but I’d rather not do that before we have a good reason to do so. That machine is quite busy already.

      I could run the program with intervals longer than 10 allowed, or one of the restricted versions

    • Klas Markström Says:

      The 8/11 comes from this comment

      The restriction, as far as I understands Alec’s program and my parameters to it, is that no intervals of length more than 10 are allowed in the definition of the squares.

      I can increase the bound from 10 to something higher, but from my test runs I know that up to N=163 taking intervals of length up to 30 does not improve the best value of c. It does however change the solution, which probably means that there are many ways of reaching the optimal value.

    • Alec Edgington Says:

      Klas, just on the off-chance, in light of Mark’s comment on EDP 20, do you think it might be worth trying n = 1125 (with squares up to size 10)?

    • Klas Markström Says:

      Alec, I’ve started a loop on a short interval around n=1125. I could probably increase a bit from size 10 later.

    • Alec Edgington Says:

      I’ve uploaded a new version, which has an additonal parameter to make_lp that lets you constrain the sum of the absolute values of the coefficients of the 2 \times 2 blocks to a limit of your choice:


      The idea is to force solutions away from the pattern we know about. It might be interesting to experiment with this. It occurs to me that a plotting function would also be useful, to help us visualize the solutions …

    • gowers Says:

      Ah, I now understand about the records. I would definitely like to be able to visualize some of these optimal configurations.

    • Klas Markström Says:

      Alec, I have added a solution file for n=1124…1126, and size 10

      I have added a solutions file for 2×2 maximum total weigth 0.5, with square size at most 12, and I am making one for maximum total weight 0.25

    • gowers Says:

      Klas, some of these solutions look interesting, but it’s a lot of work to read them. A couple of things that would I think make it easier would be (i) to multiply all coefficients by the least integer that makes them all integers (after very small approximations, which I imagine would be merely getting rid of tiny inaccuracies in the answers generated by the program) so that we could think of the coefficients as small integers rather than long decimals, and (ii) to present the intervals not in order of the sizes of their coefficients but in order of, say, the start points of the intervals. If you or Alec had time to do that, or even just to apply this simple transformation to the existing data, it would be very nice.

      One of the reasons I’m interested is that I am curious to know whether a recent comment of mine (the fourth comment on EDP20) gives a reasonable description of what is going on in the examples where you restrict the weight of the 2-by-2 squares. Briefly, the suggestion is that one could take a lot of pretty smooth numbers, put fly traps there, take a union of the basic 2-by-2 constructions, and then hope that there are enough overlaps in the 2-by-2 squares for it to be possible to make efficiency gains in a positive fraction of places.

    • Alec Edgington Says:

      I’ve written a function to produce and save a plot of the solution. It’s in this new version:


    • Alec Edgington Says:

      Here is an example of a plot produced from Klas’ example with n = 151 and the total weight of the coefficients of the 2 \times 2 blocks constrained to be at most \frac14:

      (I suggest saving it to your computer and viewing it in some sort of image viewer.) Blues are positive (darker more so) and reds are negative (darker more so). The colour mapping could be improved, but in this example it gives some idea of what’s going on.

    • Alec Edgington Says:

      I should have explained how to read the plot. It shows the main diagonal and points above it, rotated so that the main diagonal is the bottom horizontal. In other words, the point (p,q) in the solution, where q \geq p, is represented as the point (p+q, q-p) in the image.

    • Klas Markström Says:

      Alec, can you implement Tims suggestions regarding how to present the solutions?
      I think it might be a good idea to make a separate program which can read a solutions file of the kind have posted and transform it into a second updated file. That way we don’t have to redo any of the computations.

    • Klas Markström Says:

      I have added two solution files for even lower total weight of the 2×2 squares.

    • Alec Edgington Says:

      The following function will output the result with the intervals sorted in increasing order:

      def print_solution(lam):
          for I in sorted(lam.iterkeys()):
              print I, ":", lam[I]

      So you could do something like:

      result = interpret(x, I_list)

      I’m not sure how to do the conversion to integers algorithmically. We’d need some function that takes a vector \mathbf{x} \in \mathbb{R}^k, and presumably some ‘tolerance’ parameter, and outputs an integer f(\mathbf{x}) \in \mathbb{Z} such that f(\mathbf{x}) \mathbf{x} is ‘near enough’ to \mathbb{Z}^k. I suppose one could simply try positive integers m in turn until \max_i \Vert mx_i \Vert < \epsilon, where \Vert x \Vert is the distance from x to the nearest integer. I’ll see if something like that works …

    • gowers Says:

      A much more efficient method, which I tried by hand on one of the examples (or rather, with an on-screen calculator) is to use continued fractions. That is, you keep taking reciprocals and fractional parts until you suddenly hit something that’s surprisingly close to an integer, and then you work out what that means your original number must have been close to. I managed to discover that a certain decimal was extremely close to 167/145 this way, for instance.

    • Alec Edgington Says:

      Yes, that method works for small examples, but not for some of the larger ones. I tried it on the last entry in ‘data-file-12-max2=0.125’; the first number turned out to be well approximated by 28/1389, but mutiplying through by 1389 didn’t make the others into near-integers.

      As efficiency isn’t much of an issue in this case, I’ve gone with my naive method, with the proviso that we stop trying after a certain point.

      I’ve uploaded a new version of the code here:


      It includes the following functions:

      def print_solution(lam):
          for I in sorted(lam.iterkeys()):
              print I, ":", lam[I]
      # nearest integer
      nint = lambda x : -nint(-x) if x < 0 else (int(x) + (0 if x%1 < 0.5 else 1))
      def to_integer(lam, m_max = 1000, eps = 0.01):
          """ Convert solution to one with integer coefficients, if possible.
              m_max is the largest integer multiplier we're willing to consider.
              eps is how close the scaled-up values have to be to integers.
              Return the integer solution if possible, or None otherwise.
          l_vals = lam.values()
          n_vals = len(l_vals)
          m = 0
          m_ok = False
          while (m <= m_max) & (not m_ok):
              m += 1
              i = 0
              i_ok = True
              while (i < n_vals) & i_ok:
                  x = m * l_vals[i]
                  i_ok = (abs(x - nint(x)) < eps)
                  i += 1
              m_ok = i_ok
          # multiply though by m
          return dict((I, nint(m*lam[I])) for I in lam) if m_ok else None

      Tim, if you have Python installed (as I think you said you did), you should be able to use this on the solutions that Klas has already published. It doesn’t need any extra modules.

      Copy and paste the code above into a file, and save it as (for example) ‘convert.py’.
      At the bottom of the file, add the line:

      print_solutions(to_integer([stuff in curly braces pasted from Klas' output]))

      For example:

      print_solutions(to_integer({(1, 2): 0.33333333333333276, (3, 4): 0.33333333333333376, (2, 3): 0.33333333333333343, (2, 4): -0.33333333333333304}))

      Run the file:

      python convert.py

      It should print out (for this example):

      (1, 2) : 1
      (2, 3) : 1
      (2, 4) : -1
      (3, 4) : 1

      If you want to change the defaults for how far to go and how close to an integer the multiples must be, change the last line to (for example):

      print_solutions(to_integer([stuff pasted], m_max=2000, eps=0.001))

      (the defauts are 1000 and 0.01).

    • gowers Says:

      Thanks for that — after considerable effort I have managed to dig out that it was XCode that I used to create Python files, and then followed that up by dealing with several mysterious error messages until I finally got your code to work. (A few of the things that I was doing wrong — I had to sort out whether it was the contents of the curly brackets that needed inserting or those plus the curly brackets, and whether the square brackets should be removed, and the fact that it should be “print_solution” instead of “print_solutions”, and the fact that if the numbers weren’t rational enough I needed to try a smaller example. But I know that this kind of thing is absolutely par for the course, especially for someone with my levels of inexperience.) Anyhow, just to prove I managed it, here is one of Klas’s examples:

      (1, 3) : 8
      (2, 5) : 8
      (2, 6) : -8
      (3, 5) : 4
      (4, 6) : 20
      (5, 7) : 16
      (6, 8 ) : 8
      (6, 10) : -4
      (7, 10) : 4
      (9, 10) : 47
      (10, 12) : 19
      (10, 13) : 16
      (10, 15) : -16
      (11, 15) : 16
      (12, 14) : 7
      (12, 16) : -8
      (13, 15) : 16
      (13, 16) : 8
      (14, 15) : 19
      (14, 16) : 4
      (15, 17) : 15
      (19, 21) : 23
      (20, 23) : 12
      (20, 24) : -12
      (22, 25) : 12
      (22, 26) : -12
      (23, 26) : 12
      (24, 25) : 3
      (24, 29) : 16
      (24, 30) : -16
      (29, 30) : 39
      (30, 31) : 19
      (30, 36) : -19
      (31, 36) : 19
      (38, 41) : 23
      (38, 42) : -23
      (41, 42) : 23
      (44, 51) : 4
      (44, 52) : -4
      (45, 49) : 8
      (45, 50) : -12
      (45, 52) : 4
      (46, 49) : 4
      (46, 51) : -4
      (47, 49) : -12
      (47, 50) : 12
      (48, 49) : 15
      (48, 52) : -15
      (49, 52) : 15
      (54, 59) : 23
      (54, 60) : -23
      (59, 60) : 23

    • Klas Markström Says:

      I have to correct a mistake I made. The last two files were not for size at most 12, but rather at most 7. I forgot to change the size when I moved to program. I have renamed the two solution files accordingly.

      I am also producing a solution file for total weight of 2×2 at most 0.5 and size at most 30.

    • Alec Edgington Says:

      Oops, sorry about the mis-spelling Tim. Glad you figured it out!

    • gowers Says:

      No problem: I had a moment of inspiration, so that typo held me up much less than my general incompetence at doing anything at all along these lines.

    • Alec Edgington Says:

      The solutions where 2 \times 2 blocks are completely forbidden are interesting. One can do a bit better by increasing the maximum size of the interval. For example, with n=186 and intervals up to size 11, one can get c = 1.379553.

      A plot of this example is here:

      You need to zoom in to see the detail, but there are some interesting features, like the expensively-constructed ‘oval’ formation hovering above 180.

    • Klas Markström Says:

      I have added another solutions file, for maximum weight 0.5 on 2×2 and size at most 30.
      Tonight the program will run maximum weight 0 with size at most 30.

      Alec, if it is easy to do I think adding coordinates to the pictures you make would be helpful.

    • gowers Says:

      That is indeed interesting. I wonder whether the intuitive explanation for the oval is that it is a sort of edge effect — because the example is forced to stop at 186 it has to do ugly things when it finds a nice smooth number near there.

      Indeed, having asked that question, I wonder whether there might be some kind of “simulated annealing” approach to finding examples, where you would initially aim to get quite a lot of cancellation without worrying about every last detail, and would then have a kind of trade-off between changing what you’ve done so far (which you are reluctant to do) and getting more cancellation (which you want to do). In this example, one might do a crude version such as, say, looking for bigger examples that agree with this one up to 100.

    • gowers Says:

      Klas, I find your latest solutions file interesting. Here, just for easy reference, is a list of places where the record for c is broken by what seems to be a real amount rather than an artefact of the numerical algorithm. I’ll list the integer where the new record occurs and what the record is. I’ll also give rational approximations (which I think are in fact the “real” values rather than the decimals that precede them). Note that the process starts at N=6.

      N=6, c=1.00000000704, c\approx 1

      N=12, c=0.888888934189, c\approx 8/9

      N=42, c=0.846153865585, c\approx 11/13

      N=52, c=0.800000028296, c\approx 4/5

      N=60, c=0.750000275914, c\approx 3/4

      N=66, c=0.744186105894, c\approx 32/43

      N=126, c=0.741379361613, c\approx 43/58

      It’s hard to decide from that evidence alone what these numbers will do eventually, but whatever it is they’re doing they seem to set records at smooth numbers, and they also quite like moderate-sized primes such as 7, 11 and 13. I haven’t stared at the record-setting examples to see how they are exploiting these numbers.

    • Klas Markström Says:

      The loop which I am running now also seems to improve at smooth numbers, it is at N=118 right now, and I think that was the case for several of the earlier files too. But I haven’t examined the solutions. I’ll post the new solutions files tomorrow morning.

      One thing which could be interesting to do, but I don’t know if the cvxop package has any routines for it, would be to find all optimal solutions for some combination of N, max size and max weight. I have observed that sometimes the solutions, but not the value of c, changes when the maximum size of a square is increased.
      If there are several optimal solutions we might not be seeing the most structured ones now.

    • Klas Markström Says:

      I have now added the solutions file for maximum weight 0 and size at most 30.
      I have been making files for N up to 200 but this can be extended without too much effort.

    • Alec Edgington Says:

      On the existence of several optima: if the solution contains

      \sum_{r=1}^k [N/r-1, N/r]^2 - [N,N-k]^2 + [N,N-k+1]^2

      then this group of terms can always be replaced by

      \sum_{r=1}^k [N/r, N/r+1]^2 - [N,N+k]^2 + [N,N+k-1]^2

      or any convex combination of the two. Have you noticed any pairs of optima that differ in other ways? Perhaps one could force the solution into a ‘canonical’ form by adding a small adjustment to the quantity being minimized, in a consistent way.

      On the oval as an edge effect: it’s worth mentioning that that pattern remains optimal up to at least n = 201.

      A small, perhaps obvious, observation is that the weight of the solutions does tend to be concentrated at the lower end of the range. So perhaps we should be looking for patterns there.

    • Klas Markström Says:

      Alec I have not had time to examine the solutions closely enough to see if that is the only difference. I believe you can produce some examples to compare by just solving for some small N and taking maximum size 10 first and then 30.

      I am now running a loop which solves for increasing N, with no weight restriction, and maximum allowed size equal to N. I’ll probably have solutions up N around 140 by tomorrow.

    • Klas Markström Says:

      i have added the datafiles from last nights computer work.

      The file data-file-N contains the best solutions for N up to 142, with the interval size at most N and no weight restrictions.

      The file data-file-30-max2=0-2 continas a few more solutions in the same sequence as data-file-30-max2=0

    • Klas Markström Says:

      Alec, do you still have the general program which allows general rectangles rather than only squares in the decomposition?

    • Alec Edgington Says:

      Klas: yes, I think I do. I don’t have access to it at the moment, but I can upload it when I get home this evening.

    • Alec Edgington Says:


      See the function ‘optim’ in this new version:


      It works similarly to the other function except that it’s all in one. I think it’s clear enough how to separate out the bits that construct the program, solve it, and interpret it.

    • Klas Markström Says:

      Ok, I have it up and running now. I didn’t have pylab installed so I removed the plot routine. I’ll post a solutions file tomorrow.

  21. Christian Says:


    I’ve been wondering whether one could investigate the question as to whether we can get c< 1/2 experimentally, e.g. by just running your program longer, or by artificially making it to somehow "prefer" intervals having greater length than 2, or by "explicitly" making it to look for regions with many smooth numbers and trying to find the cancellations envisaged in Tim's previous comments by exhaustive search. Given that we've learned much by your 1/2+\varepsilon-example, I suppose that further experimantal results could be very illuminating.

    • gowers Says:

      In case anyone wonders why Christian wrote this after Alec’s previous comment, the explanation is that Alec’s comment went, for reasons I don’t understand, to the moderation queue.

    • Alec Edgington Says:

      Tim, I suspect that your privacy settings specify that any message with more than one external hyperlink goes to your moderation queue.

    • gowers Says:

      They do — I didn’t spot that your message had two external links …

  22. gowers Says:

    I absolutely agree with Christian that an experimental discovery of a configuration that gives c<1/2 would be great. Even if it turns out not to be illuminating (which would be the case if, e.g., it was rather complicated and not obviously generalizable), it would be very encouraging, and might give clues about what to expect decompositions to look like. And making a few guesses in advance and imposing them as constraints could indeed be a good way of speeding up the search process.

    Going back to the theoretical approach, I had a few more thoughts about the details of what I was suggesting in my previous comment, though I still don’t have any real confidence that that approach will work.

    The rough idea I had was this. Suppose you pick a random N. Then the probability that a divides N is 1/a. Now suppose you choose a random N that’s divisible by a prime p. Then the probability that a divides N is 1/a if p is not a factor of a, and p/a if p is a factor of a. More generally, if N is a random number divisible by d, then the probability that a divides N is (a,d)/a.

    Now we would like to choose N from a probability distribution that means that almost every a divides N with probability proportional to, but bigger than, 1/a. One way of doing that is to choose N with probability proportional to the sum of the reciprocals of the primes that divide N. Equivalently, if we ignore the constant of proportionality, we choose a random even N with weight 1/2, then a random N that’s divisible by 3 with weight 1/3, then a random N that’s divisible by 5 with weight 1/5, and so on, and for each a we define p(a) to be the expected weight of times that a divides N. (Sorry — that may not have been very clear. I hope it is understandable.) If the prime factors of a are p_1,\dots,p_k, then we will get a contribution of \sum_ip_i/ap_i=k/a. It is known that the number of distinct prime factors of a is well concentrated about its mean (as is the number of prime factors with multiplicity, which we could get by using prime powers instead of primes), which is \log\log a. So it looks as though this may achieve what we want with a factor \log\log N.

    I haven’t really explained properly the relevance of the calculation I’ve just done, but I have to stop for today. I’ll try to say things better tomorrow.

  23. gowers Says:

    Actually, there’s one more thing I should say right now. It’s that what I am talking about will not be of any interest unless the following is true, which may be obviously false. (Perhaps I’ll suddenly see it’s false when I go to bed.)

    Suppose you pick a random large integer N with probability proportional to the sum of the reciprocals of the primes that divide N. So you very slightly favour smooth numbers. And suppose you choose a random positive integer m between 1 and N. Does the probability that N and m are coprime tend to zero as N tends to infinity? I would need the answer to be yes. If it does tend to zero, it will do so pretty slowly I think.

    • gowers Says:

      To make the idea of “random large integer N” precise. one could e.g. pick a very large N_0 and choose a random N between N_0 and 2N_0.

    • gowers Says:

      It’s now the next day and it does indeed seem that a slight favouring of smooth numbers is not strong enough to make almost all pairs of numbers have common factors. I have a train journey later today during which I’ll try to work out whether this kills the approach or whether there is some way of varying the construction to deal with it.

  24. Sune Kristian Jakobsen Says:

    I’m still following the project, but unfortunately I don’t have time to think much about it. But I have a question: Does the 1124 example and the 246 multiplicative example of discrepancy 2 sequences give a lower bound on how complicated a decomposition with c<1/4 will be?

    • gowers Says:

      I’ve thought briefly about this question but don’t have a satisfactory answer. The transformation to the rationals and the averaging over lots of dilates makes it hard to draw a completely direct connection, but there must be things one can say. Also, it ought to be much easier to draw consequences from the character-like examples. I need to check, but I think it is fairly straightforward to show that to obtain a given value of c the decomposition must be “exponentially complicated” in c^{-1}.

  25. Thomas Sauvaget Says:

    I think the following decomposition has c=\frac{7x-2y+1}{5x+3y+2} where x,y are integers (so for example c=\frac{1}{33} for x=2, y=7), but it would definitely require independent confirmation :

    \delta_1=\frac{1}{5x+3y+2} ((x+1)\overline{[1,2,3]}+y\overline{[5,6]} +(-2x-1)[2,3,4,5]/[6]+(-2x-1)[6]/[2,3,4,5] +(2x-y+1)\overline{[10,11,12]}+(-2x+y-1)\overline{[10,11]}+(-2x+y-1)\overline{[11,12]} +x\overline{[1,2]}+x\overline{[2,3]}+x[1]/[3]+x[3]/[1])

    so that the mentionned example would be (perhaps please check this first):

    \delta_1=\frac{1}{33} (3\overline{[1,2,3]}+7\overline{[5,6]} -5[2,3,4,5]/[6]-5[6]/[2,3,4,5] -2\overline{[10,11,12]}+2\overline{[10,11]}+2\overline{[11,12]} +2\overline{[1,2]}+2\overline{[2,3]}+2[1]/[3]+2[3]/[1]).

  26. EDP20 — squares and fly traps « Gowers's Weblog Says:

    […] general construction that gives us arbitrarily close to 2. Rather than repeat it here, let me give a link to the relevant comment of Alec’s (if you think of the bar as squaring the interval, it will be consistent with what I am saying in […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: