EDP20 — squares and fly traps

I think this will be a bit long for a comment, so I’ll make it a post instead. I want to try to say as clearly as I can (which is not 100% clearly) what we know about a certain way of constructing a decomposition of the identity on \mathbb{Q}. Recall from the last post or two that what we want to do is this. Define a square in \mathbb{N}\times\mathbb{N} to be a set of the form [r,s]^2, where by [r,s] I mean the set of all positive integers n such that r\leq n\leq s. Let us identify sets with their characteristic functions. We are trying to find, for any constant C, a collection of squares S_1,\dots,S_k and some coefficients \lambda_1,\dots,\lambda_k with the following properties.

  • C\sum_{i=1}^k|\lambda_i|\leq\sum_{i=1}^k\lambda_it_i, where S_i=[r_i,s_i]^2 and t_i=(s_i-r_i+1) is the number of points in the interval that defines S_i, or, more relevantly, the number of points in the intersection of S_i with the main diagonal of \mathbb{N}\times\mathbb{N}.
  • Let f(x,y)=\sum_i\lambda_iS_i(x,y). Then for any pair of coprime positive integers a,b we have \sum_{n=1}^\infty f(na,nb)=0.

The second condition tells us that the off-diagonal elements of the matrix you get when you convert the decomposition into a matrix indexed by \mathbb{Q}_+ are all zero, and the first condition tells us that we have an efficient decomposition in the sense that we care about. In my previous post I showed why obtaining a collection of squares for a constant C implies that the discrepancy of an arbitrary \pm 1 sequence is at least C^{1/2}. In this post I want to discuss some ideas for constructing such a system of squares and coefficients. I’ll look partly at ideas that don’t work, so that we can get a sense of what constraints are operating, and partly at ideas that might have a chance of working. I do not guarantee that the latter class of ideas will withstand even five minutes of serious thought: I have already found many approaches promising, only to dismiss them for almost trivial reasons. [Added later: the attempt to write up even the half promising ideas seems to have killed them off. So in the end this post consists entirely of half-baked ideas that I’m pretty sure don’t work. I hope this will lead either to some new and better ideas or to a convincing argument that the approach I am trying to use to create a decomposition cannot work.]

Using squares and fly traps.

A general idea that I have not managed to rule out is to build a decomposition out of “squares and fly traps”. I’ve already said what a square is. If you take the two squares [r,s]^2 and [r,s-1]^2, then their difference is the set of all points (s,t) or (t,s) such that r\leq t\leq s. It has the shape of two adjacent edges of a square. It is this kind of shape that I am calling a fly trap.

The idea then is to take a collection of fly traps with negative coefficients and a collection of squares with positive coefficients. In order for the second condition to hold, we need the following to hold: as you go along any line from the origin other than the main diagonal x=y, if you sum up the coefficients associated with the squares you visit, then the result should be cancelled out by the sum of the coefficients associated with the fly traps. In particular, if all squares have coefficients equal to 1 and all fly traps have coefficients equal to -1, then the number of times the line hits a square should be the same as the number of times it hits a fly trap. (I think of the squares as sending out “flies” that are then caught by the fly traps, which have some nasty sticky substance at their points.)

It’s not really necessary for the fly traps all to point in the same direction, and there are other small adjustments one can make, but the basic square/fly-trap idea seems to be what the computer is telling us works best in very small cases. (It is far from clear that this is a good guide to what happens in much larger cases, but it seems sensible at least to consider the possibility.)

For a nice illustration of a square/fly-trap construction, see this picture that Alec produced. Alec also has a general construction that gives us C arbitrarily close to 2. Rather than repeat it here, let me give a link to the relevant comment of Alec’s (if you think of the bar as squaring the interval, it will be consistent with what I am saying in this post), and a link to a similar comment of Christian’s.

This example (or rather family of examples) uses a single fly trap of width k and squares of width 2 (unlike the example in Alec’s picture, which I therefore find more interesting, despite the fact that it gives a worse constant). It is instructive to see why this gives us a bound of C=2. If the fly trap has width k, then it has 2(k-1) off-diagonal points. So we need 2(k-1) flies. Each square of width 2 contributes two flies, so we need k-1 such squares. This means that \sum_i|\lambda_i|=k+1 (since the fly trap needs two squares to make it) and that \sum_i\lambda_it_i=2(k-1)-1=2k-3. The ratio of these two numbers tends to 2 as k tends to infinity.

It is not hard to see that if we could use squares of width 3 instead, then we would be able to get a constant arbitrarily close to 3. However, significant difficulties arise almost immediately. However again, this could be good news, because if we can find some way of getting C beyond 2, we may by that stage have found a pattern that can be generalized. And I think there is some hope of pushing C beyond 2, as I shall now try to explain.

One fly trap is not enough.

First, let us see why there is absolutely no hope of achieving this with just one fly trap. The argument is simple. Let L_{N,k} be the flytrap [N-k,N]^2-[N-k,N-1]^2. If all the off-diagonal points in the square [r,r+2]^2 are caught by this fly trap, what can we say about r? One necessary condition is that both r+1 and r+2 are factors of N. But this implies that N is at least r^2, which in turn implies that k is at least \sqrt{N}. Since we need almost all points in the fly trap to catch flies, we need at least \sqrt{N} distinct flies, which is more than r. So, roughly speaking, we need a constant fraction of the numbers s of order of magnitude \sqrt{N} to be such that both s and s+1 are factors of N. This just isn’t going to happen.

Note that if we make r smaller, to give numbers near r a better chance of dividing N, then we are forced to increase k (or else the flies miss the fly trap). And that makes things even worse — we now have fewer possible flies and a bigger fly trap.

I’m sure it would be easy to state and prove something rigorous here, but for now I’d prefer to leave those thoughts as a convincing enough demonstration that a single fly trap will not do the job. But if that’s the case, what can we do? Well, the obvious next thing to try is several fly traps.

Pure randomness as a way of catching flies.

How can we make use of multiple fly traps? A first thought is that if we take the square [r,r+2]^2 and send out some flies, we could create two traps, one to catch flies with maximum coordinate r+1 and the other to catch flies with maximum coordinate r+2. But the trouble with this is that it seems to be far too tailored to one particular square: it is hard to believe that such a trap could catch the flies from several different squares. We would be asking for two integers N_1 and N_2 such that there are many integers s such that one of s and s+1 divides N_1 and the other divides N_2.

Actually, on writing that I realize that I have given no thought to it at all, so perhaps it is worth trying to show that it cannot be done (just in case, contrary to expectations, it can be done).

Since s and s+1 are always coprime, there seems no point in giving N_1 and N_2 any common factors, so let’s take N_1 and N_2 to be highly smooth numbers that are coprime to each other. And let’s try to find 3-by-3 squares that send out flies that are caught by one of the fly traps L_{N_1,k} or L_{N_2,k}. (I’m assuming that N_1 and N_2 are roughly the same size. If it is convenient to take different ks then I don’t mind doing it, but I don’t expect it to help.)

If k is fixed and N_1 and N_2 are large, then … I think we’re completely dead. We have 2(k-1) fly trap points below the diagonal to get rid of and three flies below the diagonal per 3-by-3 square, so we need about 2k/3 squares. If [s,s+2]^2 is one of those squares, then for the fly at (s+2,s) not to miss the fly traps, we need s to be at least 2N/k, where N is the rough size of N_1 and N_2. So we need 2k/3 pairs \{s+1,s+2\} such that we can find an integer j\leq k/2 with j(s+1)=N_1 and s+2|N_2. But that more or less fixes the ratio of N_1 to N_2, and anyway 2k/3 is bigger than k/2.

From this I am happy to conclude that we need to change our attitude and go for many fly traps. The idea would be that the reason a fly hits a fly trap is not that the fly starts out at a very carefully chosen point (roughly speaking, a “factor” of the fly trap) but that there are enough fly traps for it to be almost inevitable that the fly will hit at least one of them.

What I mean by “pure randomness” is that we use the following mechanism for ensuring that the fly at (s,t) hits a trap. If s\geq t, then we simply make sure that there are at least s traps, fairly randomly placed, or rather As traps for some large constant A. Then the probability that the fly misses all traps is small: roughly speaking, the expected number of traps it hits is A, and if we can get enough independence then we can hope that all flies will hit roughly A of the traps. (This model turns out to be much too crude, since the probability of a fly hitting a trap depends very much on divisibility properties of the coordinates of the fly and the trap. But let us work with it for now.)

Some back-of-envelope calculations.

Let us try to check the feasibility of this idea. An initial observation is that most fly traps L_{N,k} are useless for our purposes. If you choose a random large integer N, then the fraction of integers coprime to N will be around 6/\pi^2. (If m is the other integer, then the probability that p divides both m and N is 1/p^2, so the probability that they are coprime is roughly \prod_p(1-1/p^2)=1/\zeta(2).) But if j is coprime to N, then the point (N,N-j) cannot catch any flies. If we have a large set of such points, then we are in trouble.

To deal with this, it seems that the only option we have is to insist that our fly traps L_{N,k} occur at highly composite values of N, so that almost all other integers have quite high common factors with N and therefore give rise to points that can catch many flies. It will be convenient to call N and k the height and width of the fly trap L_{N,k}. In that language, we want fly traps with highly composite heights. (Note that the height refers to the altitude of where the fly trap is placed, whereas the width measures the size of the trap itself. Indeed, “altitude” is probably a better word than “height” here, but I prefer an Anglo-Saxon word if there is one.)

Now let us suppose that we have T 3-by-3 squares, and a reception committee of fly traps with highly composite heights between N_0 and 2N_0. If the widths of the fly traps are all k (or perhaps all between k and 2k or something like that), then we’ll need 3T/(k-1) fly traps if we want a one-to-one correspondence between flies and trap points, and a bit more than that if we want each fly to hit A traps. Let us take 3AT/(k-1) fly traps.

Now consider a fly at (s,s+1), say. If its chances of hitting a given trap are 1/(s+1), then we’ll also need there to be about As fly traps. That is, we’ll want s to be about 3T/k. And for that fly not to miss the traps altogether (because its angle from the main diagonal is too large), we’ll need kN_0/3T to be at most k. So we’ll need T to be bigger than N_0/3. That looks pretty problematic, because we now need a very large number of fly traps, and it will not be possible to put them all at highly smooth heights between N_0 and 2N_0: there just aren’t that many highly smooth numbers.

Just to make that more conceptual, the problem we have is that there are two conflicting pressures on the flies. If they are not high enough, then the angle they make with the main diagonal is forced to be large and they therefore miss all the fly traps. But if they are too high, then they are very unlikely to hit any given fly trap, which forces the fly traps to be extremely numerous, which forces there to be several fly traps at non-smooth heights, and therefore several points in the traps that cannot catch flies.

Smooth traps and smooth squares.

Is there anything we can do to get round this problem? I think there may be. There was one questionable assumption in the discussion above, which was that the probability of a fly (s,s+1) hitting any given fly trap was about 1/s. The condition we need for this fly to hit the trap L_{N,k} is that s+1 should divide N and that s should be at least N/k. Now if we choose N randomly, then of course the probability that s+1|N is 1/(s+1). But if we choose it as a random number with lots of small prime factors, and if s+1 also has quite a lot of small prime factors, then we hugely increase the chances that s+1|N. For instance, if N is a random multiple of 6, and s+1 also happens to be divisible by 6, then the chances that s+1 divides N are now 6/(s+1).

Let us now go back to the attempt above. Again let us suppose that we have T 3-by-3 squares. Again, if we are taking fly traps L_{N,k} with N between N_0 and 2N_0, and if we want each fly to hit A traps, then we will need about 3TA/k or so traps. But now let us suppose that all the traps have very smooth heights. More precisely, let us suppose that all the heights N are such that all but a small proportion of integers have a fairly high common factor with N. Simplifying absurdly, let us suppose that this gains us an extra factor of D when we think about the probability that a fly (s,s+1) or (s,s+2) is caught by a given fly trap: now the probability is more like D/s rather than s. What does that do for us?

It means that now, if we want each fly to hit A traps, we’ll need not As traps (where s is the height of the fly) but more like As/D traps. We already know we need about 3TA/k traps, so equating the two we find that s needs to be about 3TD/k. And if we want a fly at that kind of height not to be too far from the diagonal to hit the traps, we need Nk/3TD to be at most k, which tells us that T should have approximate size N_0/D, which is rather better than the earlier estimate of N_0 (up to a constant).

But at this point we have an important question: are there enough highly smooth numbers N between N_0 and 2N_0?

To answer that, we need to think about what the typical probability gain is for a given number. Suppose, for instance, that N is divisible by \prod_{i\in X}p_i, where X is some set of small integers. For what D can we say that a random integer m has a good chance of having a highest common factor with N of at least D?

The expected number of i\in X such that p_i|m is \sum_{i\in X}p_i^{-1}, and we can expect this to be reasonably concentrated if the expectation is not too small. Writing M for \prod_{i\in X}p_i and assuming that X is a fairly dense set of primes (something like a random set of r of the first 2r primes, say) then the expectation will be around \log\log r, so the value we get for D, assuming (not quite correctly) that the primes p_i that divide m are fairly evenly distributed, ought to be around ((2r)!)^{\log\log r/2r}, or around r^{\log\log r}. (We could get there by saying that the typical size of a p_i is fairly close to r and we are choosing \log\log r of these primes.)

This is fairly worrying, because in order to gain a factor D, we have to make the set of N we are allowed to choose very much sparser. It seems as though we lose a lot more than we gain by doing this.

The “smooth squares” in the title of this section refer to the possibility that we might try to choose s so that both s+1 and s+2 have quite a lot of small prime factors. But such numbers are hard to come by, so again it seems that any gain one might obtain is more than compensated for by the loss in their frequency.

Special configurations of squares and fly traps.

Can we achieve what we want by making very careful selections of our Ns? It’s clear that something that helps us is to have pairs of heights N_1,N_2 such that N_2/N_1 is, when written in its lowest terms, of the form (a+1)/a or (a+2)/a.

It’s quite easy to find N_1,\dots,N_t such that all of N_i/N_1 are of this form: just make N_1 extremely smooth and make all the differences small. Then the differences will divide N_1 and we are done. But what if we try to ensure that N_2/N_1 and N_3/N_2 are of the required form? Then we need large positive integers a and b such that (1+1/a)(1+1/b) is of the form 1+1/c for a positive integer c. That is, we need the reciprocal of 1/a+1/b+1/ab to be an integer. Rearranging, we want ab/(a+b+1) to be an integer. It’s moderately reassuring to observe that this can be done: for instance, if a=3 and b=8 we get 24/12=2. But how about if a and b are very large? Or perhaps they don’t have to be very large, just as long as we can find a set a_1,\dots,a_r such that many of the ratios a_ia_j/(a_i+a_j+1) are integers.

Let’s think about this slightly further. Suppose we have such a collection of integers. Then choose N with enough factors for all the numbers I mention to be integers, and for each i let N_i=N(1+1/a_i). What we want is for N_i-N_j to divide N_i. That is, we want 1/a_i-1/a_j to divide 1+1/a_i. So we need a_j(a_i+1)/(a_j-a_i) to be an integer. (This doesn’t look very symmetric, but it is true if and only if a_i(a_j+1)/(a_j-a_i) is an integer.)

Suddenly this looks a bit easier. It looks as though we’ll be OK if we make the a_i all fairly smooth and make their differences small. Hmm … except that that doesn’t look easy after all, since if a_j is smooth and a_j-a_i is small, then a_i will not be all that smooth.

I won’t think about this for the time being, but I think it may be possible to construct, in a not quite trivial way, an arbitrarily long sequence of integers a_1,\dots,a_r such that a_j(a_i+1)/(a_j-a_i) is always an integer.

Let’s suppose we managed that. Would it help? What we could do is this. We let N be some huge factorial so that it’s divisible by whatever we need it to be divisible by. We then define the numbers N_1,\dots,N_r as above: that is, N_i=N(1+1/a_i). Since whenever i<j we have N_j/N_i of the form 1+1/u for some positive integer u, we can find an integer s such that us=N_i and (u+1)s=N_j.

Therefore, potentially at least, we could consider using the square [u-1,u+1]^2 to knock out some points in the fly traps at heights N_i and N_j.

However, for this to have a chance of working, we want u to be big, since otherwise our flies will be out wide again, which will force the traps to be big and we’ll get into all sorts of problems.

But it’s problematic either way. If we want traps of width at most some fixed k, then we need u to be at least N/k. For that we need the integers a_1,\dots,a_k to be of size at least \sqrt{N/k} (since u=a_i(a_j+1)/(a_i-a_j)), and more than that unless they are close together.

But we also need the a_i to divide N, so we can’t just choose the a_i and then make N huge. Rather, what we seem to want is a number N that has so many factors of size somewhat bigger than \sqrt{N/k} that we can find interesting clusters of them such that many of the numbers a_i(a_j+1)/(a_i-a_j) are integers.

I should think a bit more about how many of these numbers actually need to be integers. Perhaps we don’t need them all to be integers — if not, then we would have a much greater chance of success.

If the fly traps have width k, then we have rk points below the diagonal that need to be hit. Each good pair i,j leads to three flies that can do the hitting. So it looks as though 3\binom r2 needs to be bigger than rk.

I think I must have made a mistake here, since there are basically only two chances to hit the point (N_i,N_i-t): we must do so either at (N_i/t,N_i/t - 1) or at (2N_i/t,2N_i/t-2). So we need an extraordinary miracle to occur: it must be possible to partition (almost all of) the numbers N_i/t and 2N_i/t into pairs of consecutive integers. This does not feel possible to me.

I’m going to stop at this point. I’ll end with the obvious question: is it possible to create an example out of squares and fly traps? Part of me thinks that the square/fly-trap idea is almost forced, since we need the bigger points to avoid coprime pairs. I think also that I have not devoted enough space to discussing bigger fly traps — ones where the width is proportional to the height, say. This requires bigger squares, but it may be possible to do something. In fact, I’ll think about that (not for the first time) and if anything interesting comes out of it then I’ll extend this post.

63 Responses to “EDP20 — squares and fly traps”

  1. Mark Bennet Says:

    Just to observe that finding smooth numbers close enough together in bunches is at the heart of the problem. The discrepancy 2 sequence looked as though it might possibly go on for ever, and was certainly longer than most initial expectations. It failed because smooth numbers which are close together (in a sense to be made precise) belong to enough different HAPs to cause a blockage – which seems very close to the ‘let’s design a flytrap’ problem.

    Since we are now dealing with completely multiplicative sequences on the rationals, it is possible that there is a particularly effective flytrap associated in some way with the maximal completely multiplicative sequence with discrepancy 2. [maximal sequences which are not completely multiplicative might also play a part in this].

  2. gowers Says:

    I have an observation that is similar in spirit to yours. (In case Mark’s comment gets replied to, I’m referring to the first comment on this post.) In a sense the thing that is causing me most difficulty is that there are lots of points (x,y) such that x and y are coprime. This too seems to be closely related to the “true” reason that the problem is hard: that there are many pairs of numbers such that choosing the value at one of them has almost no bearing on the value at the other.

    A small comment is that we are not just restricting attention to completely multiplicative functions, though I suspect that we more or less can (especially if we allow them to be complex).

  3. gowers Says:

    I’ve just had a thought that may be quite helpful on the theoretical side. A problem I’ve been having up to now is that small squares do not contain many flies and therefore do not hit many points in the fly traps. But there is another way that we could consider hitting them, which is with thickened flytraps.

    Let me try to explain what I mean. Suppose that instead of a square of width 3, which gives us a contribution of 3 to the trace and 1 to the sum of coefficients, we were to take a function of the form [r,s]^2-[r,s-6]^2. This would contribute 6 to the trace and 2 to the sum of (absolute values of) coefficients, so would be as good in that sense, but it would also send out far more flies. That would allow us to choose far fewer functions of this kind, so we might be able to place them in very special places where all of s-5,\dots,s are highly composite.

    I don’t know whether this deals with my difficulties, but it gives us an extra parameter to play with, which can’t be a bad thing.

  4. gowers Says:

    One other small thought, which might conceivably lead to something that could be searched for experimentally. Perhaps we could pick a whole lot of multiples of k! for some small k, add together the corresponding Alec examples, and then hope that some of the 2-by-2 squares overlap or otherwise lead to potential gains in efficiency. For example, if we are lucky enough to have amongst our three 2-by-2 squares the squares [2s-2,2s-1]^2, [2s-1,2s]^2 and [s-1,s]^2, then we can replace them by a single 3-by-3 square [2s-2,2s]^2. This will give us the same set of flies, but the efficiency will have gone up from 2 to 3 (since the sum of coefficients is now 1 instead of 3 and the contribution to the trace is now 3 instead of 6).

    It’s just conceivable that we could get enough coincidences like this to push the value of c down below 1/2. But I think we might need a computer to search for them.

    • Thomas Sauvaget Says:

      This is a great general idea, but unless mistaken I think the particular replacement you mention will not appear when adding Alec examples (since all the little 2-by-2 squares there are of the form [2x-1,2x] by construction, so one shall never have a [2s-2;2s-1]).

      Perhaps one might try some reverse engineering, that is contruct three examples (even without a low c to start with) where one contains a [s-1;s], a second a [2s-1;2s] and a last a [2s-2;2s-1], in such a way that taking multiples allows to reproduce the trick again and again.

    • gowers Says:

      Oh yes. I think though that this can be dealt with by having some of the fly traps as inverted Ls and others as Ls. That is, some are of the form [r,s]^2-[r,s-1]^2 and others are of the form [r,s]^2-[r+1,s]^2.

      I’m in general keen on the idea of trying to get product constructions to work, so I’d be interested to understand better what you mean by your reverse engineering suggestion.

    • Thomas Sauvaget Says:

      What I meant is to design an example using your trick as a cornerstone. I think I finally managed to get it to work, the idea is to use something a bit reminiscent of telescoping series combined with your trick (resulting in an amplification of a small disturbance, like the tricki idea “create an epsilon of room”).

      Namely, at step s we build a decomposition which has both the pair [2s-2;2s-1]^2, [2s-1;2s]^2 and the third one but shifted by 1, [s+1;s+2]^2. So when we add the step s+1 we get the cancellation between the pair of step s+1 and the lone one of step s, i.e. we do the replacement you proposed and thus c decreases a little bit. When done and we add step s+2 and do the new replacement to decrease c still a bit more, and so on.

      Here is a drawing of an example which seems to work (it’s a picture of step s, the flies are cercled in green, and the traps in red). For instance at step 1 we have c=\frac{7+21}{0+2}=14, then adding step 2 to it and replacing we get c=\frac{(7+6)+(21+20)}{(0+2)+(2+2)}=9. So this way one would get c arbitratily small. Of course this very much requires independent confirmation, I may well have made an error again.

    • Thomas Sauvaget Says:

      Typo in previous message: it should read “shifted by 1, [s;s+1]^2“.

    • gowers Says:

      Thomas, I’m curious to understand your construction but I don’t. Would it be possible to say precisely what all the rectangles and coefficients are after, say, three steps of the construction?

    • Alec Edgington Says:

      I find the general idea interesting. It seems that Tim is proposing a kind of transformation rule whereby we can replace any decomposition that includes [s-1, s]^2, [2s-2, 2s-1]^2 and [2s, 2s-1]^2 with the same decomposition with those three terms replaced by [2s-2, 2s]^2, and still have a valid decomposition with smaller c. For example, we can replace [1,2]^2 + [2,3]^2 + [3,4]^2 + \ldots by [2,4]^2 + \ldots.

      Now there may be other such transformations. And we can always form a convex combination of decompositions that achieve c (or better), to obtain a new decomposition that achieves c (or better).

      So, I wonder whether given a sufficiently rich supply of decompositions (not necessarily particularly ‘good’ ones), of unbounded support, and of transformations like Tim’s, we could apply the transformations repeatedly, combined with taking convex combinations, to get better and better c.

      I’d be interested to know whether there are other similar transformations, or whether this is a ‘one-off’ …

    • Thomas Sauvaget Says:

      All pieces have the same coefficient except for signs, so that computing c is done with the formula you derived earlier c = \frac{\sum |\lambda_i|}{\sum \lambda_i |P_i|}=\frac{r+s}{\sum |R_i|-\sum |S_i|}, that is only the number of squares and rectangles and their sign and imbalance matter.

      Here is a new picture with much better explanations (please disregard the previous messy one as I’ve changed color conventions): it shows what happens in three steps: first we have piece s which is made of two parts, then we add to it piece s+1, and finally we apply your trick. I’ve corrected some errors from my previous message: in fact piece s has c=13, and the addition of piece s and piece s+1 after the trick becomes c=25/3=8.333....

      (you can click on the image to see it full size in your browser)

    • Thomas Sauvaget Says:

      In fact the general formula after adding n+1 pieces and doing the trick each time is c=\frac{25+25n}{2+4n}>\frac{25/4}=6.25, so it doesn’t decrease arbitrarily as I thought, and my construction fails blattantly.

      So to exploit properly your replacement idea, one should find a construction which manages at each step to use it more often than one adds new squares & rectangles, so as to obtain something like c=\frac{a+bn}{c+dn} with b<d.

  5. gowers Says:

    There seems to be a basic snag with what I was suggesting. Alec’s construction relies on our taking a multiple of k! so that we can pick off all the points in a fly trap of width k. (It would be sufficient, as he has noted elsewhere, to take the lowest common multiple of 1,2,…,k.) Then for each such multiple N, he takes 2-by-2 squares near the numbers N,N/2,N/3,\dots,N/k. Now we know that mod k! the number N/j is congruent to k!/j. It follows that these numbers are all well-separated mod k. And from that it follows that the kinds of coincidences I was hoping for, where we have pairs of numbers N_1,N_2 and numbers j_1,j_2\leq k such that N_1/j_1 is very close to N_2/j_2, simply don’t exist.

    I’m not sure about this, but I think it may be possible to prove that a construction with fly traps and with squares that are mostly of size 3 cannot be made to work. The rough idea of the proof (I have more of an idea of the proof than of the statement that it proves) is this. If you want fly traps of width k and you want all your flies to be at distance either 1 or 2 from the diagonal (defining the distance of (x,y) to the diagonal to be |x-y|), then almost every N used for a fly trap must be at a number such that 2N is divisible by almost every number from 1 to k. But that makes every N divisible by half the l.c.m. of \{1,2,\dots,k\}. Call that number M. This means that if we have N_1 and N_2, then the distance between N_1/j_1 and N_2/j_2 is at least M|1/j_1-1/j_2| if j_1\ne j_2, and at least M/j if j_1=j_2=j and N_1\ne N_2. So in all cases it is large.

    What I think this proves is that the flies that are very close to the diagonal and that can be caught in highly smooth traps are well separated, and therefore cannot be covered efficiently by squares. And if we try to deal with this by making the traps less smooth, then we will introduce lots of points that cannot catch any flies that are very close to the diagonal.

    I might have thought that we were in serious trouble at this point, but I think the thickened-L idea from a couple of comments back could be helpful.

  6. Jonathan Vos Post Says:

    “So in the end this post consists entirely of half-baked ideas that I’m pretty sure don’t work.”

    Depending on the Topology of the Mathematical Ideas subspace of the “Ideocosm” [the space of all possible ideas], sometimes a 1/2 baked idea + a 1/4 baked idea a 1/8 baked idea a 1/16 baked idea … in the limit becomes a complete and correct idea.

  7. Jonathan Vos Post Says:

    Must have missed the “+ key from lack of coffee. Okay, if you permit me to try again:

    Depending on the Topology of the Mathematical Ideas subspace of the “Ideocosm” [the space of all possible ideas], sometimes a 1/2 baked idea + a 1/4 baked idea + a 1/8 baked idea + a 1/16 baked idea … in the limit becomes a complete and correct idea.

    Or should I be defining a Selberg zeta function over all closed geodesics on dynamical systems on the Ideocosm for surfaces of constant curvature -1 in 1-to-1 correspondence with the periodic orbits of the associated geodesic flow which takes place on the unit tangent bundle of the manifold?

  8. gowers Says:

    Here’s an idea for a general method of construction. But before I say what it is, I want to make an important general point, which is that there isn’t a hugely strong reason to believe that the decompositions we are currently searching for actually exist. That is because in order to simplify the search we are placing an artificial constraint on ourselves: to consider only those HAP products P\otimes Q for which P and Q have the same common difference. Note that if this is the case, then we can divide them both through by the common difference, which is why we have been considering progressions with common difference 1. Maybe now that we have looked pretty hard it would be an appropriate moment to allow ourselves the occasional more general product if it seems to help. The only rule is that P and Q must be segments of homogeneous progressions, and WLOG their common differences are at least coprime. So for example, if we wanted to use \{2s,2s+2\}\times\{2s-1,2s,2s+1\} we would be free to do so. Of course, if we allow ourselves extra flexibility, then we’ll have to experiment a bit before we get a feel for what the smallest “easy” value of c is.

    But for the rest of this comment I want to stick with progressions of common difference 1. The basic idea can be summarized as follows: why not try to find a construction made out of fly traps and fly traps? That is, we could simply aim to take some linear combination of fly traps that cancels off the diagonal. (We know this is possible in non-trivial ways, since a square is itself a union of consecutive fly traps.)

    Now at first this is hopeless, since the efficiency of a fly trap is at most 1/2 (it adds 2 to the sum of coefficients and makes a difference of 1 to the trace), meaning that the best c we can hope for looks like being 2. And if the positive and negative coefficients are fairly well balanced, then the efficiency will be smaller still. So there are two improvements that would need to follow. First, we would want the fly traps with negative coefficients to be quite a bit bigger than the ones with positive coefficients. That way, we would use fewer of them, so the trace would be proportional to the total number of fly traps, and perhaps even (if the negative fly traps are much longer on average) almost equal to it. But that will get the efficiency to 1/2-\epsilon at best. To go any further we need something else to happen: we want a significant fraction of the positive fly traps to be partitionable into intervals of length greater than 1. If they also drop to the same point below the diagonal (which could be a rather awkward constraint to insist on — this is a potential problem with the suggestion), then they form thickened Ls and we can reduce the sum of coefficients.

    This focuses attention on the following class of functions defined on rationals in [0,1]. Let f_{n,k} be the characteristic function of the set \{1,1-1/n,\dots,1-k/n\}. Can we find a linear combination of these functions that cancels, or almost cancels (leaving us to mop up the error) in an interesting way? And can we do it with plenty of clusters of consecutive ns?

    I find this a clearer formulation of the problem — not too far from being equivalent to what we were trying to do anyway — since it focuses attention very firmly on the number theoretic questions that have to be solved. Also, one can simplify it a bit further by looking at g_{n,k}(x)=f_{n,k}(1-x), which is the characteristic function of \{0,1/n,\dots,k/n\}.

    Bearing in mind the first part of this comment, we would be just as happy if we had a cluster of ns that formed a HAP segment. And I think we also wouldn’t mind choosing characteristic functions of sets such as \{0,r/n,2r/n,\dots,kr/n\}.

    One other remark I’ll make briefly but I’ll try to work out the details and post them in another comment later. I think if we always take the same ratio (or almost the same ratio) k/n=\alpha for our functions f_{n,k}, then we’ll probably be doomed to failure, since then the functions will all have small inner product with the characteristic function of rationals less than 1-\alpha/2 minus the characteristic function of rationals greater than or equal to 1-\alpha/2, whereas this is not true of the main diagonal. This is the essentially the same point that was made before about its not being possible to prove that there is high discrepancy on a “full” HAP, meaning that we fix N and insist that the HAP consists of all multiples of some k up to k\lfloor N/k\rfloor.

  9. gowers Says:

    This comment is an attempt to understand when the existence of certain functions on \mathbb{Q}_+ gives us a proof that decompositions of a certain kind cannot work.

    Suppose, then, that we think we have a linear combination \sum_i\lambda_iS_i, where each S_i is a square P_i\times P_i, \sum_i|\lambda_i|\leq c\sum_i\lambda_i|P_i|, and the sum along off-diagonal lines is zero. It follows that for any function f defined on \mathbb{N}^2 that is constant along rays from the origin we have

    \sum_i\lambda_i\langle S_i,f\rangle = f(1,1)\sum_i\lambda_i|P_i|.

    Now \langle S_i,f\rangle=\sum_{x,y\in S_i}f(x,y), so if we can come up with a function f that has discrepancy at most C on any square S_i that we use in our decomposition, then the left hand side is at most C\sum_i|\lambda_i|. If in addition f(1,1)=1, then that tells us that C\sum_i|\lambda_i|\geq\sum_i\lambda_i|P_i|, which tells us that we cannot make c any better than 1/C with this collection of squares.

    In particular, if we could find a completely multiplicative \pm 1-valued function g with HAP discrepancy at most K, then we could define f(x,y)=g(x)g(y), and we would have the property f(ax,ay)=f(x,y), together with the property \sum_{(x,y)\in S_i}f(x,y)=(\sum_{x\in P_i}g(x))^2, so we would be able to take C=K^2.

    We shouldn’t be too discouraged by this of course: it is trivial that if there is a completely multiplicative function of bounded discrepancy then our approach is doomed to failure. Where this kind of observation might be useful is in demonstrating that certain even more restricted classes of decomposition cannot work.

    For example, here is a slightly strange restriction that we cannot ignore. Let \lambda be the Liouville function (that is, -1 to the power the number of prime factors). The above argument shows that to obtain a decomposition powerful enough to prove EDP we will have to use intervals P_i on which the sum of \lambda is unbounded. This means that if we have the idea of some clever construction using a particular huge and smooth N, or something like that, we have to have a reason that the intervals that arise are ones on which \lambda sometimes has a large sum. Since \lambda behaves in a fairly random way, especially on small scales, this looks like quite a hard thing to do explicitly. I think it more or less forces us to use more global arguments where there are so many intervals around that on average we are pretty sure that \lambda (or any other multiplicative function) will have big discrepancy on some of them.

    Ultimately, we won’t be trying to verify that multiplicative functions have large discrepancy on our collection of intervals. Rather, it will go the other way round — the existence of a clever decomposition will prove that this is the case. But it provides a plausibility check: if there aren’t very many intervals about and they are all quite small, and if there is enough freedom in the construction that we can regard where they are as somewhat random, then there is no reason for them not all to be intervals on which \lambda has small discrepancy, so the attempt is almost certainly doomed to failure.

    Now let me see whether the remark at the end of my previous comment is correct. Suppose we try to find a decomposition that is built out of squares [r,s]^2 that all have the property that s=\lfloor\alpha r\rfloor for some fixed \alpha>1. (Since a fly trap is a difference of two squares, this applies to fly traps too.) An obvious function that might have small discrepancy on all of those squares is one where you choose \beta such that exactly half of the square [1,\alpha]^2 (now considered as a square in \mathbb{R}^2) consists of points (x,y) with \beta^{-1}x\leq y\leq\beta x. Then one would expect the discrepancy on the integer approximations to these squares to be small too. What I don’t know at the time of writing is whether this discrepancy would be bounded or whether it would grow linearly. Hmm … or perhaps I do. It looks as though there would be a linear error arising from when the squares suddenly increase in width by 1 (in the integer case). So perhaps we can get away with squares of this form.

    Incidentally, I forgot a condition: we want to insist that f is symmetric in x and y, since we are insisting on the same for the decomposition.

  10. gowers Says:

    As an immediate application of the criterion in the previous comment, we can say something about the approach suggested in the comment before that. I suggested that it might be a problem if we always tried to take k=\lfloor\alpha n\rfloor for our functions f_{n,k}. And that is indeed the case, since if we define \phi(x,y) to be 1 if y\leq\alpha x/2 or x\leq\alpha y/2, and -1 otherwise, then \phi will be 1 on approximately half of each fly trap and -1 on approximately half. This will make the discrepancy bounded on each fly trap (because we’ve fixed \alpha) and will cause the approach to fail.

    What this shows is that if we build a decomposition entirely out of fly traps, then we will be forced to use many different shapes of fly trap (where I am defining the shape to be the approximate ratio of width to height). I would dearly like to understand how differing shapes can be of any help. Perhaps it’s just for the simple reason discussed earlier: that we want the bottoms of the fly traps to be at the same level so that we don’t waste coefficients dealing with the small errors that would otherwise appear there.

  11. gowers Says:

    In this comment I’m going to try to find some interesting linear dependences amongst sets of the form \{1/n,2/n,\dots,k/n\}.

    Alec’s example corresponds to this. We take n=k!, and then each n/j is an integer, so we can add up the singletons \{j/n\} and subtract the set \{1/n,\dots,k/n\}.

    As an experiment, let me try taking n=126 and k=10. Writing the fractions in their lowest terms we get


    If we subtract off the set \{1/63,2/63,3/63,4/63,5/63\} we get the set


    Apart from the 5/126 we can write this as a sum of singletons. And we can represent \{5/126\} as the difference between the multiples up to 5 and the multiples up to 4.

    Translating back into 2×2 squares and fly traps this gives a decomposition with c=3/2. Let me explain how I worked that out. Each AP of length more than 1 corresponds to a fly trap so it costs 2. For instance, the initial one consisting of the first ten multiples of 1/126 corresponds to the fly trap L_{126,10}. A singleton of the form 1/r can be realized as the 2×2 square [r-1,r]^2. And a singleton that is not of this form can be represented as a difference of two APs, so has a cost of 4. We have two APs, four reciprocals and one stray rational 5/126, so the total cost comes to 2+2+4+4=12. The sum along the diagonal is 2 for each 2×2 square, 0 for the difference of two fly traps that gives us 5/126, and the main two fly traps have coefficients of 1 and -1, so the total comes to 8.

  12. gowers Says:

    I seem to keep running into rather similar difficulties when I try to arrange for plenty of coincidences and efficiency gains. I want to see if I can understand these difficulties in a more precise way, by which I mean show that there are various requirements that hugely restrict the possibilities for any decomposition. At best, this would lead to more efficient ways of searching for decompositions.

    If we stick with the idea of making everything out of fly traps and then sticking some of the fly traps together to save on the cost of coefficients, then in broad terms what we need is this. In order for \sum_i\lambda_i|P_i| to be big compared with \sum_i\lambda_i, it is important that the sizes of the P_i for which \lambda_i is negative should be generally somewhat larger than the sizes of the P_i for which \lambda_i is positive. It is also vital that there should be very few points with coprime coordinates in the second half of the decomposition, since it is impossible to cancel these out efficiently. This seems to tell us that the heights of the higher fly traps should be very smooth.

    Let me try to quantify that last remark. For simplicity, let’s suppose that our combination of fly traps takes the form \sum_iL_i-\sum_jM_j, where all of the L_i and M_j are fly traps. And suppose that we have managed to do this in such a way that the M_j are on average bigger than the L_i, sufficiently so for the trace to be proportional to the number of L_i. And let’s suppose that the M_j contain t points with coprime coordinates. Each of these points has to be cancelled out in a very crude way, at a cost of 4, so 4t needs to be small compared with the number of L_i.

    Another constraint, which is quite hard to reconcile with the smoothness, is that we want there to be many consecutive heights. Combining that with the cancellation we need, that tells us that we want many heights to have factors that are very close to each other.

    I can feel this comment getting less and less precise, so I’m going to abort it and try again when I’m clearer in my mind about what it is I want to say.

  13. gowers Says:

    I’ve thought of a different way of trying to understand why the problem we are now thinking about is a difficult one. I more or less said it above, but now I think I have a clearer understanding of it. Actually, looking back, it seems that in this comment I came close to saying what I’m about to say here, but didn’t quite say it.

    Anyhow, I just want to make the simple observation that if the current approach succeeds, then it implies a rather strong looking discrepancy result, though quite how strong is something I’m still not clear about. The observation is this. Suppose we can find a \pm 1-valued function \phi on \mathbb{N}^2 that is constant along rays (that is, \phi(ax,ay) is always equal to \phi(x,y)) and that has discrepancy bounded above by C on all squares [r,s]^2. Then we cannot get a decomposition with c<1/C. I gave the proof in the previous comment, but I then concentrated on functions of the form \phi(x,y)=g(x)g(y) where g is completely multiplicative. However, it may be helpful to think about more general functions.

    For example, it occurred to me that if we could find a function \phi that is constant along rays and is such that for every r and s we have


    then our approach would fail, since such a function has discrepancy at most 4 on any square. (This can be checked.)

    However, it is quite easy to prove that no such function exists. The most conceptual way to see it is to observe that the entire function is determined by \phi(1,1). I’ll illustrate that with an example: \phi(5,8)=-\phi(6,8)=-\phi(3,4)=\phi(4,4)=\phi(1,1). One can show that \phi(x,y) is a sum of a Morse-like function in x and a Morse-like function in y. And such a function has no reason to be constant along rays — and indeed isn’t. The first place a problem shows up is this:



    But we need \phi(6,9)=-\phi(6,10).

    However, we could play a similar game with 4-by-4 squares, looking for a function \phi that sums to zero along every row and every column within each one of these squares. That allows a discrepancy of up to 16. Now we no longer have the situation where choosing the value at one place in a square forces all the other values — there are a number of different ways of getting all the row and column sums to be zero. So now it suddenly looks much much harder to prove that we cannot find such a function.

    An additional very important moral of this is that if we use a restricted class of squares, such as squares that have a very small side length compared with how far up they are, then this discrepancy result has to hold for that class of squares. Indeed, if you’ve got a collection of squares and want to prove a lower bound for the c that you can use it to obtain, then a good way is to choose a suitable function \phi. For instance, for Alec’s example one can choose \phi to be 1 when x=y and 0 otherwise. This has discrepancy 2 on all 2-by-2 rectangles and 1 on all fly traps, so we see immediately that it cannot do better than c=1/2 (though of course this is also easy to see directly).

    I’m fairly sure that this line of thought will make it very clear how the different squares must interact with each other if there is to be any hope of their beating the c=1/2 bound. Basically, it is essential that there are lots of rays that intersect lots of squares.

    • Alec Edgington Says:

      That’s a nice clarification of the problem.

      Do we gain anything by insisting that \phi be a \pm 1-valued function? It looks to me as if any real-valued function will do. (Indeed the example you give at the end is a \{0, 1\}-valued function.)

      In your earlier comment, you wrote:

      \langle S_i,f\rangle=(\sum_{x,y\in P_i}f(x,y))^2.

      Did you mean

      \langle S_i,f\rangle=\sum_{x,y\in P_i}f(x,y)?

    • gowers Says:

      I concentrated on \pm 1 functions just because they’re quite nice to think about, but if we wanted to investigate the problem computationally, then it would probably be easier to look at more general functions, since then we’d have a linear program rather than an integer program. I allowed myself 0 to analyse your example because it didn’t seem to be that easy to do so without it, which does confirm that the extra flexibility could be useful. I’ve corrected the earlier comment, which was indeed wrong (though the correction was slightly different because I needed to sum over S_i).

      If we do stick with \pm 1-valued functions, then we can formulate quite a nice combinatorial problem. Suppose we have a bunch of squares and we want to prove that no combination of them will give us a small value of c. Then we can do it by finding a \pm 1-valued function that’s constant on rays and that has small discrepancy on every square.

      The examples we’ve been looking at have tended to have rather small squares, so let us make the additional assumption that, apart from on the main diagonal, no square contains more than one point on any ray.

      Now let’s form a set system. Its vertices are the rays, and the sets are given by the squares. That is, if S is one of our squares, we associate with S the set of all rays that pass through S (or equivalently the function S/S, which because of our assumption takes values in \{0,1\} except at 1 where it takes the value |S|). We now want our set system to be sufficiently “dense” that it has a high discrepancy.

      This is a necessary condition, but it’s a pretty strong necessary condition that should hugely narrow down the search for a construction. Indeed, it could even narrow it down so much that we were able to prove that there was no such set system, which would force us to consider HAP products with different common differences.

      Now let’s narrow things down further and suppose that each ray that hits any squares at all hits exactly two squares (one with a coefficient of 1 and the other with a coefficient of -1). This tells us that in our set system each point is contained in precisely two sets. So an obvious preliminary question is whether we can think of any set system at all with the property that each point is contained in two sets and the discrepancy is at least, say, 10.

      If each point is contained in two sets, we can form a graph: its vertices are the sets and its edges are the points, so if x belongs to S and T then we regard S and T as joined and x as the name of the edge joining S to T. Now we are trying to find a \pm 1-valued function defined on the edges that is well-balanced at every vertex. Or rather, we want a graph where that cannot happen.

      So now we have a problem that I rather like: is it true that for every constant C there is a finite graph such that, however you 2-colour the edges, there must be a vertex such that the number of blue edges coming out of that vertex differs from the number of red edges coming out of that vertex by at least C?

      I’ll have a little think about this, but I’ll also post it now in case anyone else wants to think about it. It doesn’t seem as though it should be hard, but I don’t immediately see how to do it. I hope the answer will be that there does exist such a graph and that we can make it pretty sparse. If it turns out to be hard, then I’ll also post a question on Mathoverflow, but I won’t do that for the time being.

    • gowers Says:

      A small remark about that last problem: if we can partition the edges of the graph into cycles of even length, then we can just colour the edges alternately red and blue along each cycle and we will have a discrepancy of zero on the edges at each vertex. In particular, this happens if the graph is bipartite and every vertex has even degree.

      Also, if we have something like an expander of degree d, then we can probably partition it into a bounded number of cycles. (I don’t know any theorem that actually says that we can, however.) Then we could do something similar, colouring edges alternately along the cycles. If a cycle was odd, we would have to have two consecutive edges of the same colour, but if it was long, then we would have a huge amount of choice about where to put the one bad vertex. So presumably we could have at most one bit of badness at each vertex, which would give us a discrepancy of at most 2.

      So it seems that we want a graph that cannot be partitioned into even cycles and a few long odd cycles. It isn’t obvious to me how to construct such a graph.

      But actually I’m starting to think that this problem is not as relevant to EDP as I thought. After all, if we have a system of squares where each ray hits exactly two squares, then the resulting graph must be bipartite for there to be any chance of a decomposition. (Proof: the coefficient of a square must be minus the coefficient of any neighbouring square, so the signs of the coefficients give you your bipartition.)

      And now I think I have a trivial solution to the problem anyway. Consider the following algorithm for partitioning the edge set. We’ll construct a trail (that is, path that’s allowed to visit vertices more than once, but we are not allowed to revisit edges) and make it maximal in both directions. We do this by simply extending the trail however we like until we get stuck. How do we get stuck? The only way is if we arrive at a vertex and find that there are no unused edges leaving that vertex.

      Having constructed a maximal trail, we colour its edges alternately. This contributes zero to the discrepancy of all vertices except those at the two ends of the trail, where it contributes 1 (or possibly 0 or 2 if those two ends happen to be at the same vertex). But a very important point is that if we cannot extend the trail any further, then there are no edges left at the vertex that has a discrepancy of up to 2, so if we now build some new trails, we will not contribute any further to its discrepancy.

      So to continue, we simply remove all edges from the first maximal trail and start again. In that way we find a colouring of the edges such that at each vertex the discrepancy is at most 2.

      The moral of this, which is quite interesting actually, is that to find a decomposition of the kind we want, we must have at least one ray that meets more than two squares, from which it is reasonable to conclude that we will actually want several rays that meet quite a bit more than two squares.

    • Alec Edgington Says:

      It occurs to me that the problem of constructing a function \phi : \mathbb{N}^2 \to \pm 1, constant on rays, with low discrepancy on squares of the form P^2 where P is an interval feels considerably easier than the problem of constructing multiplicative \pm 1-valued functions on \mathbb{N} with low discrepancy on intervals (which we think is more or less equivalent to EDP). The reason is that for the latter problem, if we want to construct our function by assigning values to integers in order, we only have freedom to choose at primes, which are vanishingly rare as we go up. But for the two-dimensional problem, if we want to construct our function by assigning values in some moderately natural order, we have freedom to choose at all coprime pairs, which are relatively abundant – indeed, I think more than half of pairs will tend to be coprime. Given such a degree of freedom, it wouldn’t surprise me if it were quite easy (at least for a computer) to just write down a suitable function \phi with discrepancy 2 on squared intervals.

    • gowers Says:

      That’s certainly a tempting conclusion, and it may be correct. It would be quite interesting to get some idea of how easy the computer finds it, and perhaps we could prove quite quickly that any example with c<1/2 would need N to be at least 1000 or something.

      But it also seems to me conceivable that we would find that there were little clusters of smoothness — that is, rectangles that contain very few coprime pairs — that made the task difficult. It might even be that if we ran a program to find a function with small discrepancy on squared intervals, we could examine where it found itself doing a lot of backtracking, use that to identify a collection of points, and then search for a decomposition based on small squares that contain those points. This seems to me to be very much worth trying, as it could be an efficient method of searching for decompositions with significantly larger N.

      I was about to write “squares that contain very few coprime pairs” above, when it occurred to me that the fly trap idea is more or less forced on us because a square that sits on the diagonal (that is, one of the form [r,s]^2, contains s-r points of the form (x,x+1) (not to mention points of the form (2x-1,2x+1), (3x-1,3x+2), (3x-2,3x+1), etc.). So we have to go for long and thin.

      I’m a bit anxious about the fact that we need a collection of squares such that many rays meet them more than twice. Even the one-dimensional consequences of this seems hard: we appear to need something like a collection of intervals such that almost every HAP that intersects any one of the intervals intersects at least two of them, and several HAPs intersect more than two of them. And we want the average lengths of these intervals to be significantly greater than 1. That could be another way of narrowing down the search when N is large: first search for some good x coordinates, and then search for decompositions that are largely based on those x coordinates.

    • Klas Markström Says:

      The graph colouring problem is actually a variant of something which I have used in a paper on hard SAT-problem.

      Given a connected graph wit hall degrees even we can find an eulerian walk on the graph, and colour the edges alternatingly along that path. If there is an even number of edges this gives discrepancy 0 and if their number is odd we get discrepancy 1. An old theorem of Kotzig shows that this construction is optimal on all eulerian graphs.
      If there are vertices of odd degree we can add a matching among the odd degree vertices and use the eulerian walk construction on the new graph. By deleting the new edges we raise the discrepancy by at most 1. If there were vertices of even degree in the original graph we can avoid raising the discrepancy above 1 by starting the walk in a vertex of even degree.

    • Alec Edgington Says:

      I’ve done a simple experiment to search for functions \phi : \mathbb{N}^2 \to \pm 1, constant on rays, with low discrepancy on rectangles of the form [1,m] \times [1,n]. (This is enough to give us four times the discrepancy on all rectangles, and is somewhat easier to work with.) The experiment has shown that it is at least not totally trivial. That is, if we step along diagonals of constant sum, in order ((1,1), (2,1), (3,1), (2,2), (4,1), (3,2), (5,1), (4,2), (3,3), (6,1), …), and we assume that the function is symmetric, and whenever we reach a coprime pair (a,b) we assign it a value that minimizes \lvert \sum_{i \leq a} \sum_{j \leq b} \phi(i,j) \rvert (choosing positive values when the existing terms sum to zero), then the discrepancy goes up to at least 7 (we hit 7 at the point (143,13)).

  14. Polymath5 « Euclidean Ramsey Theory Says:

    […] Polymath5 By kristalcantwell There is a new thread for Polymath5 here. Let me update this there is another thread here. And yet another here. […]

  15. gowers Says:

    I’m going to think briefly about the 3-uniform hypergraph version of the graph problem I mentioned above that turned out to be easy and to have the opposite answer to the one I had hoped for.

    The question becomes this. Does there exist C such that, given any collection of sets of size 3, is it always possible to colour them red and blue in such a way that for every point the number of blue sets containing that point and the number of red sets containing that point differ by at most C?

    Let us call the points vertices and the 3-sets edges. (This is standard hypergraph terminology.) And let us define the discrepancy of a vertex to be the difference between the red degree of that vertex (meaning the number of red edges containing it) and the blue degree. Finally, let us define the edge discrepancy of the hypergraph to be the smallest we can make the maximum discrepancy at a vertex. (I’m calling it edge discrepancy because it is the edges that are coloured. The discrepancy of a hypergraph standardly refers to vertex colourings and minimizing the discrepancy over all edges.)

    So now we are asking whether the edge-discrepancy of a 3-uniform hypergraph is bounded by some constant C. This is a completely different problem from the graph case because the greedy algorithm we used there fails completely. Indeed, if you colour an edge red and then try to guarantee that you haven’t done too much damage, then you will typically have to colour three “neighbouring” edges blue, and then each of those will have two further neighbouring edges that need to be coloured red, and so on, expanding out into a 3-regular graph. Unlike a path, where there is trouble only at the end points, a 3-regular graph can grow at an exponential rate, which means that a positive proportion of its vertices could in theory be troublesome ones.

    That leads me to suspect that the answer is going to be that there are 3-uniform hypergraphs with unbounded discrepancy. (Whether we could realize such a hypergraph in terms of squares and flytraps and rays that pass through them is quite another matter of course.) The obvious thing to try would be a fairly sparse random 3-uniform hypergraph. That would give us the kind of exponential growth that we expect to cause trouble.

    It occurs to me that one might try to prove a lower bound on the discrepancy by the sorts of methods we are already using for EDP. That is, one might try to find a decomposition of the identity. Or perhaps one could work the other way round and try to build a hypergraph with a decomposition of the identity in mind.

    Let me say in more detail what I mean by that. As a matter of fact, it’s quite simple now that I come to think of it. Let us call the set of edges that contain a vertex a star (because that is what we would call it in the graphs case). We are trying to find a hypergraph such that the set of stars has unbounded discrepancy. Now each edge is contained in exactly three stars, so this is equivalent to finding a set system with unbounded discrepancy such that each point is in exactly three sets. So we’d like to find an efficient decomposition of the identity as a linear combination of products of sets with the property that no point is in more than three of the sets we use.

    I don’t immediately see how to do that, but I’d be quite surprised if it wasn’t possible.

    • Klas Markström Says:

      A quick comment before closing down for the evening.
      If the hypergraph is too sparse then the local lemma should provide a good colouring.

    • gowers Says:

      Ah, that’s a very helpful remark.

    • gowers Says:

      Actually, having thought a little bit further I’m starting to wonder whether that is actually the case. If you randomly colour, then even the individual events you are trying to make happen will have quite low probability: if you want the discrepancy on a set of size k to be at most 5, say, then you’ll get a probability proportional to k^{-1/2}. But I thought that for LLL you needed that the sum of the failure probabilities of an event and all the other events on which it depends to be less than 1.

    • Klas Markström Says:

      Yes the meaning of “too sparse” will definitely depend on the value of discrepancy we are considering. Getting things perfectly balanced will be difficult unless the graph is very sparse.
      A straightforward application of the LLL should give a bound on the discrepancy proportional to k^{1/2}, since, roughly, we need $e(2k+1)e^{-k^2}$ to be smaller than 1.

    • Klas Markström Says:

      Sorry, the “e^{-k^2}” should be the probability that the binomial random variable deviates more than k^{1/2} from k/2.

  16. Klas Markström Says:

    I have now added a solutions file, called data-file-rectangular, for the case where general rectangles, rather than only squares, are allowed.
    A quick inspection shows that the structure of the solutions has changed into a collection of small squares and narrow “bands”.

    Alec, could you modify the program to use only rectangles with area at most K?
    I think I can make the modification but I*m not comfortable with modifying someone elses code without having read enough of it to know that there are no unexpected side effects of the modifications.

    • gowers Says:

      A quick remark about this. From the point of view of proving EDP, rectangles are just as good as squares, since the inequality (\sum_{x\in P}f(x))(\sum_{y\in Q}f(y))\geq C implies a discrepancy of C^{1/2} just as well if P\ne Q as if P=Q. Since we are trying to optimize the discrepancy for what in the context of the problem are very small values of N, this probably means that the rectangle discrepancy gives us a better idea of what is going on. The reason we looked at square discrepancy was to try to reduce the number of variables. If looking for rectangles becomes prohibitively slow, then there is a compromise we might try, which is to look for arrangements of squares but to evaluate them differently, so that, for instance, a fly trap would have a cost of 1 rather than 2. In fact, that’s not quite correct. If we had a fly trap we would probably replace it by two rectangles of width 1 that overlapped on the diagonal, so we’d actually get a trace of 2 for a cost of 1 rather than a trace of 1 for a cost of 2.

      It would be quite interesting to see whether the “trivial” bound changes when one adopts this point of view. I think perhaps it doesn’t, as 2-by-2 squares have a trace of 2 for a cost of 1, and the extra flexibility afforded by rectangles doesn’t seem to improve that. But if that’s the case, then it is good news as it increases the chances of finding an arrangement that beats the c=1/2 bound.

    • Klas Markström Says:

      Earlier today I made the modifications needed in order to only use rectangles with a bounded are. Here area is really area, so a “2×2” square has area 1.

      I have added two solutions files data-file-rectangular-area-9 and data-file-rectangular-area-25

      The solutions reached at the end of the area 9 run stand out in that, up to signs, they really have only two different coefficients, 1/36 and 1/18, and are still made up of 2×2 squares and rectangles of width 1.

    • Klas Markström Says:

      I have made one more solutions file, now with bounded area and no $1xK$ strips for K greater than 20. The restrictions are needed to keep the memory use down.

      The file is “data-file-rectangular-area-4-length20” in http://abel.math.umu.se/~klasm/Data/EDP/DECOMP/

      The best solution in this class for N=66 is better than the best symmetric solution for N=840, and has only coefficient 1/12, with positive 2×2-squares and a single negative 1xK strip. However unless I am missing something a solution with that structure can never take us below C=1/2

  17. Alastair Irving Says:

    I’ve just written some code to solve the linear programming problem which corresponds to finding a real-valued function \phi on [1,N]^2 which is constant along rays, has \phi(1,1)=1 and which has maximum discrepancy over squares as small as possible.

    So far, the results correspond precisely with the results for the problem of finding the decomposition with the smallest sum of coefficients. Namely, for N<12 we get discrepancy 1 , which then goes up to 7/6 and then at N=42 this goes up to 13/11.

    I suspect the two problems may be identical. For the decomposition problem we wish to minimise \|x\|_1 constrained by Mx=(1,0,0,\ldots). For the discrepancy problem we wish to minimise \|M^Tx\|_\infty subject to x(1)=1. Considering the numerical evidence I think there might be some sort of duality argument to relate the two problems.

    This is my first post so I appologise if any of the LaTeX doesn't come out right.

    • gowers Says:

      They are indeed the same problem — that was the motivation behind the discrepancy version. Do you have a sense of which is easier computationally? From a theoretical point of view it seems better to look for decompositions, because the discrepancy result would be strictly stronger than the statement that completely multiplicative functions have unbounded discrepancy, which we don’t know how to prove. But I am interested in the possibility of using the linear programming problem as a way of searching for good decompositions, especially if that turns out to be more efficient than searching for them directly.

  18. gowers Says:

    Here’s a slightly strange observation, that’s either wrong or quite encouraging.

    Suppose we are trying to find \pm 1 values along the rays in such a way that the discrepancy on all rectangles is bounded. Then in particular this applies to rectangles of width 1. Now consider the rectangle of width m that consists of all points (m!,j), where j runs from 1 to m. Then the sum of the values on that rectangle must be bounded. Next, consider the rectangle that consists of all points (m!/2,j) where j runs from 1 to \lfloor m/2\rfloor. The values along here must be bounded too. Actually, I should say more: the partial sums as you go up the rectangle are bounded. But these are equal to the partial sums for the even j in the first rectangle. More generally, we find that the sums along all HAPs of js in the first rectangle have to be bounded. So if EDP is true then we can’t find a \pm 1 function that’s constant along rays and of bounded discrepancy on all rectangles.

    This doesn’t quite prove that we can find a decomposition because we have to allow more general functions. I haven’t yet thought about …

    Let’s just see what happens if we define f(x,y)=0 if either x or y is equal to 0 mod 3, 1 if x\equiv y mod 3, and -1 if x\not\equiv 1 mod 3. Is this constant on rays? Yes. Oh dear, it seems to have bounded discrepancy on all rectangles.

    Phew, that got me worried, but I’ve just realized that it’s NOT constant along rays. So after that moment of madness I’ll continue the interrupted sentence.

    … what happens if you do this.

    • Alec Edgington Says:

      Ah, that’s a good observation. My gut feeling was completely wrong (assuming EDP, of course)!

      I suppose the next thing to think about is what happens if we allow \pm 1 and zero.

      Alastair, could you post one or two of your solutions? It would be interesting to see how sparse the nonzero terms are.

    • gowers Says:

      I find this a nice problem. At first it seems as though having 1s down the diagonal is an extremely weak condition, but then one realizes that if the discrepancy on every rectangle has to be bounded, which is equivalent (if we assume symmetry) to the discrepancy on every square [r,s]^2 being bounded, then there must be a lot of -1s near the diagonal. And those imply a lot of -1s further away from the diagonal, which in turn force more +1s, and so on. At the moment, I don’t have a clear feeling about the theoretical version of exactly the question you ask: how sparse can the non-zero terms possibly be? It might be interesting to see whether if they have zero density (meaning the limit of the density inside [1,N]^2 as N tends to infinity) then the discrepancy must be unbounded. Or rather, it might be interesting to see whether that is any easier to prove than the general statement.

  19. Alastair Irving Says:


    My solutions are not sparse at all. For example for N=12, the solution is non-zero at all of the 91 coprime pairs involved. Some of the values are nice rationals, but others don’t appear to be, (although that’s maybe just a feature of how I’m converting to rationals). I can make the code, (which is for Matlab), or the solutions available if people want them but I don’t know how useful they’d be.


    Could you possibly clarify, or reference a comment clarifying, why the problems are identical. I understand your explanation in a previous comment that the existance of a function with discrepancy C forces the l_1 norm of the coefficients in a decomposition to be >1/C, but I can’t see why the bound is atained. Is there a general way we can convert from a function with minimum discrepancy to the decomposition with best possible sum?

    Computationally, solving the linear problem for discrepancy and that for decompositions seems fairly similar.

    • gowers Says:

      I should have done that in my previous comment. The proof (or at least the proof that I like) uses the finite-dimensional version of the Hahn-Banach theorem. If you can’t express a function f as c times a convex combination of functions g_i that belong to some class G, then f lies outside the convex hull of cG, so by the Hahn-Banach separation theorem there is a linear functional that separates f from cG. That is, there is a linear functional \phi such that \langle f,\phi\rangle=1 and \langle g,\phi\rangle <c^{-1} for every g\in G.

      In our case, f is the function \delta_1, that is, the function defined on positive rationals that’s 1 at 0 and 0 everywhere else. G consists of functions obtained by taking a square [r,s]^2 and counting for each rational how many pairs in the square have ratio equal to that rational. The property \langle f,\phi\rangle is telling us that \phi(1)=1, and the property that \langle \phi,g\rangle <c^{-1} for every g\in G is telling us that the discrepancy of the function \psi(x,y)=\phi(y/x) is at most c^{-1} on every square.

    • gowers Says:

      I think I would be quite interested in staring at your solutions for a bit, just to see whether anything can be read out of them. For example, I’d be interested to see whether there is a difference in behaviour at coprime pairs where the values are small, or where at least one of the numbers is reasonably smooth (obviously 12 is a bit small to tell that, but perhaps a point like (8,9) is different from a point like (5,7)), etc.

    • Alastair Irving Says:

      The solution for N=12 can be downloaded from http://dl.dropbox.com/u/3132222/12.txt

      Its a text file with 3 columns, the first two giving the coprime pair and the third the value of the function at that pair. I haven’t included rational approximations to these as some of them seem very spurious. My code doesn’t assume that the function is symmetric in interchanging the two coordinates, hence we have values for both (m,n) and (n,m). It looks like they’re all the same though, which isn’t surprising, but I haven’t checked it.

      I’ll modify the code to assume reflective symmetry tomorrow and thus be able to produce some bigger solutions.

    • gowers Says:

      If I’m not much mistaken, we can always replace a solution f(x,y) by (f(x,y)+f(y,x))/2, so assuming symmetry should be fine.

    • Alastair Irving Says:

      I’ve modified the code to assume symetry. The solution for N=12 looks the same, but I’ve replaced the old version with it anyway. I’ve also got a solution for N=42 which is at http://dl.dropbox.com/u/3132222/42.txt

    • Alec Edgington Says:

      Here’s a plot of the solution for N = 42:

      I suggest saving the image file to your computer and zooming in. Light pixels correspond to large positive values, dark ones to large negative values.

  20. gowers Says:

    A quick observation that’s so trivial it’s embarrassing, but I overlooked it for a while. It’s that while looking at rectangles of width 1 is sufficient to show that you can’t get a \pm 1-valued function that’s constant along rays and has bounded discrepancy on squares (if EDP is true), they are not enough to tell us about more general functions. The example is the function that’s 1 on the main diagonal and 0 everywhere else. This has discrepancy at most 1 on any rectangle of width 1.

    Of course, we know that rectangles of width 1 are not good enough to make efficient decompositions, so this doesn’t come as a huge surprise. It seems to indicate that the \pm 1 assumption is quite a strong one, though I’m not quite sure about that. It would be nice if the \{-1,0,1\} version had some one-dimensional consequence: I think I’ll think about that next.

  21. gowers Says:

    This is to report on something I tried that has not yet worked. I thought (and still think) that it would be interesting to see if we could obtain a non-trivial estimate for the best bound that can be obtained using squares of side length at most 3 and fly traps. The way I wanted to do it was to construct a function that had small discrepancy on all such sets. The trivial bound is obtained by means of the function that is 1 when x=y and 0 otherwise. This has discrepancy at most 3 on all the sets that interest us. So the question is whether we can improve on 3. Note that if we just go for 2-by-2 squares, then Alec’s family of decompositions shows that this function is the best possible, since 2 really is the best bound. So this question amounts to asking whether, if we allow ourselves 3-by-3 squares as well, we can obtain a constant arbitrarily close to 1/3.

    I set out on this hoping to find a clever function that would have discrepancy at most some constant less than 3 on all 2-by-2 squares, all 3-by-3 squares and all fly traps. (Of course, I’m always insisting that the function should be 1 on the diagonal and constant on rays.) I thought it was going to be easy because to deal with the squares I cared only about what happens for pairs (r,s) with |r-s| at most 2. And indeed, it is easy to get a non-trivial bound for the squares: for instance, if we define \phi(r,s) to be 1 if r=s and -5/8 if |r-s|=1 or 2, then the sum on all 2-by-2 squares is 3/4 and the sum on all 3-by-3 squares is -3/4. This starts to suggest a bound of 4/3, but we know that can’t be achieved, since the inclusion of 2-by-2 squares means that 2 is the best bound we can hope for. The reason this isn’t a contradiction is that we haven’t dealt with the fly traps.

    And that is where things start to get difficult. Once you start putting in lots of values off the diagonal, as we have now done, you commit yourself to many more. From this point of view, the choice of -5/8 all the way down the diagonals |r-s|=1 is disastrous for us, since it will force values of -5/8 everywhere on the fly traps of width k at k! (that is, the ones Alec used). So it will in fact give rise to unbounded discrepancy.

    Thus, there is quite a nice problem I don’t yet know how to answer. First and foremost, can one improve on the trivial bound of 3 for this problem? That is, can one find C<3 and a function that’s 1 on the main diagonal, constant on rays, and that has discrepancy at most C on all squares [r,s]^2 with |r-s|\leq 2 and on all fly traps? So far, I can’t even answer the following weaker question: can one get the discrepancy to be at most C on the squares and bounded on the fly traps?

    • gowers Says:

      A simple observation that has some bearing on the relationship between the two questions is that the weaker version of the question for 2-by-2 squares is simple. Indeed, suppose we manage to get the discrepancy to be at most 2(1-1/k) on all 2-by-2 squares [r,r+1]^2. Then the value of the function (assuming symmetry) has to be at most -1/k at all points (r,r+1). It follows that the value at (m!,m!+t) is at most -1/k for all t\leq m, so the discrepancy is unbounded on fly traps.

      I think this gives another perspective on why finding a decomposition using 3-by-3 squares is much harder than finding one using 2-by-2 squares: there is much more flexibility when it comes to devising functions with low discrepancy. For instance, we might try to do it as follows. First we define f(r,r+1) to be -1/k for every r. Then we do a bit of adjustment: we look at numbers m with lots of small factors, where there will now be fly traps with large discrepancy, and we make some adjustments to the values at (r,r+1) when either r or r+1 is of the form m/j for some small factor j of m. That will involve creating some 2-by-2 squares where the discrepancy is now slightly bigger than 2, but we could keep it down to 2+1/k perhaps.

      Actually, I’ll stop there because I’ve got confused about what my aim is. Is it to keep the discrepancy down to almost 2, or is it merely to keep it below 3-\epsilon? I’m interested in both problems, but they seem fairly different.

      Instead, I’ll just make the general point that if the only numbers where we make adjustments are of the form m/j, where m has many small factors and j is one of those small factors, then it’s not clear to me whether we have to make adjustments at big clusters of consecutive numbers. I think this may be at the heart of the problem — perhaps even at the whole of EDP — though I don’t yet have a precise formulation of what it is I’m asking.

  22. gowers Says:

    I want to think aloud for a bit about the problem mentioned in my previous comment. Suppose that we start with a first approximation to what we want, by setting f(r,r\pm 1) and f(r,r\pm 2) to be -1/2 for every r. Now we know that this causes problems for fly traps with lots of small factors, so the next step is to do an adjustment. For now I’ll concentrate on the “weak” problem, so all I want to do is make the discrepancy on fly traps bounded. Note that so far the discrepancy on the squares is 1 for 2-by-2 and 0 for 3-by-3, so we’ve got a bit of elbow room.

    Let’s suppose that we want to keep the discrepancy on fly traps below something like 1000. Then we have a problem at M only if there exists m such that the number of factors of M that are less than m outnumbers the number of non-factors of M that are less than m by at least 1000. Let’s suppose that M and M' are two such numbers, and that j and j' are factors of M and M' that are less than m. Then M and M' have at least 1000 factors in common, which … well, ordinarily it might suggest that M and M' differ by the lowest common multiple of some pretty huge number, so that M/j and M'/j' are not close to each other.

    But as I write that, I see that it’s not obviously true, which is good news. So let’s try to think in more detail about how it could be false.

    I’ll take an extreme example. First we insist that both M and M' are divisible by 1000!. Sorry, this example isn’t working. Back to my previous line of thought.

    Let’s just look at M for a bit. Let p_1,\dots,p_r be some primes that do not divide M. Then no number that is divisible by any of the p_i can be a factor of M. It follows (provided r is not too big) that the probability that a number less than m is a factor of M is at most (1-1/p_1)\dots(1-1/p_r) or so. From that it follows that the sum of the reciprocals of the p_i cannot be too large. Assuming that t is not too small, that tells us that almost all primes up to t are prime factors of M.

    Now I want to know whether it is possible to find some s such that there exist u and v less than m such that su is one number like that and (s+1)v is another. The difficulty is that s and s+1 are coprime, but maybe we can deal with that by multiplying them by u and v to get the extra factors we need. Except that that doesn’t seem very easy: u and v are much much smaller than s, so they don’t seem to have enough smoothness to create all the extra divisibility that we need.

    Let me try to say that more clearly. Here’s a number-theoretic question I don’t know the answer to. Fix a large positive integer t. For how big a k can we find sets A and B of integers between 1 and t, such that |A|=|B|=k and every number in A is coprime to every number in B? I suspect that k has to be quite a lot smaller than t. One possibility is to partition the primes and take …

    OK, that problem is easy. The best you can do is partition the primes into two and let A be all numbers you can make out of one set of primes and B be all numbers you can make out of the other. I feel as though that should make the product of the sizes of A and B be less than t, but I don’t yet see why I’m saying that. Yes I do. If we make A and B maximal like that, then every number up to t can be written uniquely as a product of something in A and something in B. (Just take the prime factorization and split it up in the way you have to.) But that just gives a lower bound on the product of the sizes of A and B. I’m not sure how to get an upper bound. Actually, it’s false, since just by taking primes we can get A and B to have size t/2\log t.

    OK, I can at least prove, but won’t bother with the details, that one or other of A and B must have size o(t).

    I need to stop this rambling comment for a bit. But it’s looking quite hard to demonstrate that any major problems would arise if one decided to adjust the values on the very smooth fly traps to be zero, and made any other changes that were implied by that. Of course, there would still be many more changes that had to be made.

  23. gowers Says:

    I’m finding it very difficult to say anything precise, or to decide whether certain things are likely to be possible. The question I’ve been struggling with above — can we have a function that is 1 when x=y, that has discrepancy at most 2.999 on all 2-by-2 and 3-by-3 squares, and that has bounded discrepancy on all fly traps — still seems to be hard, even though it is so weak that an answer either way would tell us little about EDP. But I still can’t solve it, so I want to make the question weaker still. Here, then, is a question that it really ought to be possible to answer.

    The question is this. It is easy to create functions that are 1 on the main diagonal and that have discrepancy at most 2.999 on all squares [r,s]^2 when |r-s|=1 or 2. However, the obvious ones have the property that they give rise directly to unbounded discrepancy on fly traps. What I want to do is say precisely what I mean by “give rise directly to” and then ask whether what I have observed is necessary, or whether there exist cleverer examples that do not give rise directly to unbounded discrepancy on fly traps.

    Here is the definition of “give rise directly to”. I just mean that once you’ve decided on the values of f(x,y) when |x-y| is 1 or 2, you then put in all other values that follow from the condition f(ax,ay)=f(x,y). You then see if for every C there exists a fly trap such that it is impossible to fill in the remaining values on that fly trap so as to keep the discrepancy below C. To be more precise, for each N, you fill in all the values f(N,j) that are forced by the constant-on-rays condition. Then you see whether there is an interval of js such that the values are all fixed and add up to more than 2C in modulus.

  24. gowers Says:

    An obvious way of weakening that question yet further is to insist that the interval in question starts at 1. So now the question is this. Suppose f is a function defined on the set of all (x,y) such that |x-y|\leq 2, and suppose that it is 1 when x=y and that the discrepancy on 2-by-2 and 3-by-3 squares of the form [r,s]^2 never exceeds 2.999. (To make the question yet weaker we could go for 2.1 instead, so our hypothesis is stronger. In fact, I think that is probably a good idea. Note also that I am assuming that f(r,r+1)=f(2r,2r+2) for every r.) Does it follow that for every C there exist positive integers k and M such that k!|2M and |\sum_{j\leq k}f(2M/j,2)|>C?

    I think I may already have established that this is not the case. If k!|2M, then k!/2|M. What can we say about M/j mod k! if it is an integer, and 2M/j otherwise? Answer: we know that it is a multiple of k!/2j or k!/j. Now if we have two distinct such numbers, they must differ by a lot. Either they will be two distinct multiples of k!/2j or k!/j, or they will use different js but will in any case be distinct multiples of k!/2jj' for some j,j'\leq k.

    So now all one has to do is define f as follows. If x=y then f(x,y)=1. If either x or y is divisible by k!/2, then f(x,y)=0. If |x-y|=1 or 2 and we have not already chosen the value of some f(ax,ay) to be 0, then we set the value of f(x,y) to be -1/2.

    This guarantees immediately that the discrepancy on any 2-by-2 square is either 2 or 1 (depending on whether we put in 0 or -1/2 off the diagonal). As for a 3-by-3 square [r-1,r+1]^2, I think the argument above shows that all its off-diagonal entries will be -1/2 unless one of r-1, r or r+1 is M/j or 2M/j for some M such that 2M is a multiple of k!. Now the only thing that can stop the discrepancy on a 3-by-3 square being at most 2 is if all the off-diagonal entries are 0. Since the numbers M/j and 2M/j are well-separated, the only way even the entries f(r,r\pm 1) can both be zero is if r=M/j or 2M/j. But even then, for the discrepancy to be more than 2, we need f(r-1,r+1) to be zero as well. So we need r-1 or r+1 to be of the form 2M/j. And that can’t happen.

    If that argument is correct, then what it shows is this. We can’t find a proof that the discrepancy has to exceed 2 (that is, improve on c=1/2) by taking a bunch of 3-by-3 squares and one fly trap and arguing that if the discrepancy is at most 2 on those squares then it forces us to choose values on some fly trap that add up to something large.

    That wasn’t very well put, but I think I’ve now thought of a nice way of putting it. I think it is possible to find a function with the following properties.

    1. f(x,y)=1 whenever x=y.

    2. f(ax,ay)=f(x,y) whenever |x-y|\leq 2.

    3. The discrepancy of f is at most 2 on all 2-by-2 or 3-by-3 squares.

    4. The discrepancy of f is bounded on all fly traps.

    In other words, any proof that the discrepancy is unbounded on fly traps if it is bounded on small squares would have to use the fact that f is constant not just on rays that go through points of the form (r,r\pm 1) or (r,r\pm 2) but on more general rays, even if we are talking only about 2-by-2 and 3-by-3 squares. I think the bound in 4 could be made pretty small too, but that needs checking.

  25. gowers Says:

    Let me dualize that last problem in an attempt to understand it better. I think that the existence of such a function is telling us that we cannot find a decomposition g=\sum_i\lambda_iS_i+\sum_j\mu_jF_j with the following properties.

    1. Each S_i is a 2-by-2 or 3-by-3 square and each F_j is a fly trap.

    2. g sums to 1 along the line x=y and to 0 along all lines that pass through a point of the form (r,r\pm 1) or (r,r\pm 2).

    3. For all (x,y) that do not belong to one of the above rays, g(x,y)=0.

    4. 2\sum_i|\lambda_i|+C\sum_j|\mu_j|< 1.

    Let me check that that does follow from the existence of a function f with the discrepancy property mentioned in the previous comment. I’ll do that by thinking about the sum \sum_{x,y}f(x,y)g(x,y). Let us split this sum up according to rays. The sum along the ray x=y is 1, since f is constantly 1 and g sums to 1. Along any ray that goes through a point of the form (r,r\pm 1) or (r,r\pm 2) the sum is zero, since f is constant and g sums to 0. Along any other ray, the sum is zero, since g is identically zero. So the sum is 1.

    Now let’s get a contradiction by summing instead over the squares and fly traps used to decompose g. We have that the sum is at most

    \sum_i|\lambda_i||\langle f,S_i\rangle|+\sum_j|\mu_j||\langle f,F_j\rangle|.

    But by hypothesis |\langle f,S_i\rangle|\leq 2 for every i and |\langle f,F_j\rangle|\leq C. So the sum is at most


    which, by hypothesis 4, is less than 1, the desired contradiction. As I say, I haven’t carefully checked, but I basically know that Hahn-Banach will show that this is necessary.

    I think the existence of f therefore implies that there is no way of obtaining a bound better than c=1/2 if we just use 2-by-2 squares, 3-by-3 squares and fly traps. (I don’t claim that I’ve definitively proved that — there are some important details that need checking, such as whether I can reduce C and what happens with intervals that do not start at 1. In fact, that second point may turn out to be particularly important.)

  26. Alec Edgington Says:

    I’ve been searching for symmetric functions taking values in \{\pm 1, 0\} with discrepancy 1 on rectangles of the form [1,m] \times [1,n]. (This implies a discrepancy of 4 on all rectangles). To narrow down the search, and because it seems to be possible so far, I’m restricting the non-zero values (a,b) to \max(a,b) \leq 2 \min(a,b).

    A depth-first search (favouring zero values) running in Python has so far reached 49:

    It seems to have got rather stuck at 50 (not surprisingly, since it seemed to go a bit mad at 25).

  27. gowers Says:

    That’s a great picture. For my own part, given that I can’t by theoretical means find a decomposition that gets a bound better than c=1/2, I’ve decided to concentrate on an easier task, which is to look for low-discrepancy functions that prove that decompositions with certain additional restrictions cannot exist. For instance, I think I may be able to manage a proof that it can’t be done with 3-by-3 squares and fly traps if all the fly traps have coefficients of the same sign, and perhaps even in general. I’m not actually claiming that as a result yet, but it seems a realistic target to determine whether it can be done. (I suppose I’d be happier if it could be done, but at the moment that is not what I believe.) I hope to have something a bit more definite to report over the next couple of days.

    If that works, then what I would hope to do is generalize it as much as possible, the aim being to narrow down considerably what a decomposition could look like. That could have two possible benefits: making the computational problem of finding a decomposition much easier, and making the theoretical problem somewhat easier.

    I’m interested in revisiting some of the experimental evidence from a few months ago from this point of view and seeing whether we can “dualize” it somehow. By that I mean that we could look at where various searches backtrack and try to build decompositions based on those points and relationships. For instance, perhaps the program could tell us that certain HAP-constraints were important ones and others were not. That would then give us a huge clue about how to find a decomposition.

    One snag is that the decompositions tend to be equivalent to vector-valued problems, so it may be that the evidence we have collected so far is not in fact helpful for finding decompositions. And it’s not completely obvious to me how one should search for best possible bounds in the vector-valued case.

  28. gowers Says:

    Another question I want to think about is the hypergraphs question I was discussing earlier. We have a collection of sets (squares or rectangles) and we want to find a function with small discrepancy on those sets, subject to the condition that certain values that that function takes have to be equal. To do this, we want to find a finite collection of the sets that we can use to form an efficient decomposition. But that will imply a discrepancy result for the following hypergraph. The vertices are the rays, and the hyperedges are sets of rays that go through a set in the collection.

    An important difference between this question and most discrepancy questions is that we are not assuming that the function takes values \pm 1, but instead are assuming that there is one vertex that belongs multiply to many hyperedges. It might be reasonable to assume that the ray x=y is the only ray that intersects any square in our collection more than once. In that case, we are trying to show that the effect is to give a positive kick to the sets in our collection that we cannot manage to cancel out.

    What I’d like to investigate is the number-theoretic question of how to find squares that create a hypergraph that has even the remotest chance of achieving this. For instance, if any square contains a point that is the only point in its ray that intersects a square in our collection, then we can give that point any value we like and cancel out the discrepancy on that square. So we may as well get rid of that square.

    I think I’ll write a new post soon and try to formulate various questions as precisely as I can.

  29. charikar Says:

    I haven’t followed the discussion in a while. I tried to catch up on the most recent comments in the last couple of days, but I am not up to speed on everything, so I may be asking naive questions and repeating things that you already know. It seems to me that the question being considered currently is very similar to some versions of the diagonal representation question we considered way back in EDP12 and EDP13. Is it the same as Problem 1 mentioned in Tim’s EDP13 post with the multiplicative constraint that Alec suggested in this comment ? The squares and fly-traps construction also looks similar to a construction we had earlier that was inspired by looking at LP solutions here .

    It may be worthwhile to look at LP solutions for inspiration in constructing functions f(x,y) that have low discrepancy on squares. In fact, if the squares and fly-traps construction does arise as an optimal solution to the LP, then the dual solution is a low discrepancy function with maximum discrepancy equal to the bound established by squares and fly-traps. One would hope that there should be considerable structure in this dual solution. One problem with looking for structure in these LP solutions is that you need large values of n since n=lcm(2,3,\ldots,k). A trick to handle this issue is to start at 0 instead of 1, so 0 plays the role of the highly divisible number. But I’m not sure if starting at 0 makes sense for the current discussion. In any case, it may be an easier variant to think about because there was a very definite trend in the optimal values of the LP for the problem over \{0,1,\ldots,n\} – in fact the optimal value seemed to be exactly 2-5/(n+2).

    I’m a little rusty, but I recall from looking at these LP solutions previously that it appeared that the optimal dual solution was probably not unique and there were some extra degrees of freedom. Even so, in principle, one should be able to determine the form of these dual solutions if indeed the squares and fly-traps construction arises as an optimal solution to the LP. What this tells us is that the squares used in the linear combination for the squares and fly-traps are the squares where the maximum discrepancy is achieved – all other discrepancy constraints are not tight. The tight constraints (provided we also know the sign for each of them) should determine a family of dual solutions and there must exist some choice of free variables such that the dual solution obtained satisfies all the discrepancy constraints.

    • gowers Says:

      How amusing that we are revisiting an earlier discussion without (at least in my case) realizing it. But I think it is the right thing to do, and that we shouldn’t have forgotten about it the first time round.

      What you say in the last paragraph is what I was saying, but more vaguely, in my last comment but one. I think it might be interesting to look at some of the experimental evidence coming out of the LP problem in order to get a feel for what the dual solutions are like — what I would be hoping is that it would be easier to spot patterns in the dual solutions.

      I suppose another idea, though it might be rather too ambitious, would be to try to think of a strengthening of the discrepancy statement, insisting on lower discrepancy for some squares than for others, so that all the constraints become tight. Equivalently, one would attach different weights to the squares in a decomposition, so that some of them were more expensive than others, in the hope that the best decomposition then used the squares much more uniformly.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: