## EDP22 — first guest post by Gil Kalai

### The general form of a discrepancy problem

Let $H$ be a hypergraph, i.e., a collection of subsets of a ground set $A$. The discrepancy of $H$, denoted by $disc(H)$ is the minimum over all functions $f:A \to \{-1,1\}$ of the maximum over all $S \in H$ of

$|\sum \{f(s):s\in S\}|$.

We will mention one additional definition, that of hereditary discrepancy. When $H$ is a hypergraph and $B\subset A$, the restriction of $H$ to $B$ is the hypergraph with vertex set $B$ whose edges are all sets of the form $S \cap B$ for edges $S$ of $H$. The hereditary discrepancy $herddisc(H)$ of $H$ is the maximum over all $B \subset A$ of the discrepancy of $H$ restricted to $B$.
Here is a link for a recent post discussing discrepancy and the famous Beck-Fiala theorem. The Beck-Fiala theorem assert that if every element in $A$ is included in at most $t$ sets in $H$ then $disc (H) \le 2t$. (Of course, the theorem applies also to the hereditary discrepancy.)

## The Erdős’ Discrepancy Problem reviewed

### The problem and links to polymath5 material

Erdős Discrepancy Problem (EDP). Is it possible to find a $\pm 1$-valued sequence $x_1,x_2,x_3\dots$ and a constant $C$ such that $|\sum_{i=1}^nx_{id}| \le C$ for every $n$ and every $d$?
A HAP (Homogeneous Arithmetic Progression) is an arithmetic prograssion of the form {k,2k,…,rk}. EDP asks about the sum of a $\pm1$ sequence on HAPs.
Given a $\pm1$ sequence $x_1,x_2,\dots$ define $E(n)$ to be the maximum of sums of subsequences over HAPs which are subsets of {1,2,3,…,n}. EDP asks if we can find a sequence for which $E(n)$ is uniformly bounded and we will be interested in finding sequences where $E(n)$ grows slowly.
EDP was extensively discussed and studied in polymath5. Here is the link of the first post. Here are links to all polymath5 posts. Here is the link to polymath5 wiki.
A sequence is called completely multiplicative if $x_{nm}=x_nx_m$ for every $n$ and $m$. EDP is of great interest even if we restrict our attention to completely multiplicative sequences. For those we have only to consider partial sums $\sum_{i=1}^r x_i$.

### An example where the discrepancy grows like log n.

The function that takes n to 1 if the last non-zero digit of n in its ternary representation is 1 and -1 if the last non-zero digit is 2 is completely multiplicative and the partial sum up to n is easily shown to be at most $\log_3 n+1$. Therefore, the rate at which the worst discrepancy grows, as a function of the length of the homogeneous progression, can be as slow as logarithmic.

### What a random assignment gives us.

A random sequence (or a random completely multiplicative sequence) gives us discrepancy close to $\sqrt n$. (There is apparently an additional $\sqrt{\log n}$ factor but I am not sure of the precise asymptotic behavior.)

## Two greedy algorithms

From now on we write $f(n)$ instead of $x_n$.
Greedy algorithm 1 (for multiplicative functions): Assign the value $f(p_n)$ as to minimize the maximum discrepancy in every partial sum whose terms are now determined.
Greedy algorithm 1 (for general functions): Assign the value $f(n)$ so as to minimize the maximum discrepancy in every HAP whose terms are now determined.
Problem 1: How does Greedy algorithm 1 perform?
Empirical observation: (This was claimed in some polymath 5 remarks but I am not sure if there was definite evidence. I would appreciate clarifications.) The discrepancy for sequences based on the greedy algorithm 1 (for multiplicative functions and for general functions) is $\Omega (\sqrt n)$.
Interpretation: Greedy algorithm 1 optimizes an “irrelevant task”.
We would like to suggest here
Greedy algorithm 2 (for multiplicative sequences): Assign the value $f(p_n)$ so as to minimize the maximum discrepancy in every partial sum where unassigned entries get the value zero.
Greedy algorithm 2 (for general sequences ): Assign the value f(n) as to minimize the maximum discrepancy in every partial sum in every HAP where unassigned entries get the value zero.
Problem 2: How does Greedy algorithm 2 perform?
Omri Floman ran Greedy 2 on inputs such as N=10000 and got a discrepancy of around 20 (there is a bit of randomness involved in cases of ties). For N=100000 he got about 45. It is unclear what the behavior is.
Problem 3: Can we find an optimal compromise between Greedy 1 and Greedy 2?
Problem 4: Can randomization help?
Of course, since we know that a sequence of length n and discrepancy log n exists if we draw a random $\pm 1$ sequence there is a probability larger than $2^{-n}$ of reaching such a low discrepancy sequence. What we want to ask is if randomization can lead to a method of getting a low-discrepancy sequence with larger probability, or even better with provable larger probability. (Or even better yet, to a sequence of provable low discrepancy via an effective algorithm.)
Problem 5: How do Greedy algorithms 1 and 2 perform if we apply them for a random ordering of {1,2,…,n}.

## Extensions of Erdős’ Discrepancy problems

Here are some variations of the Erdős’ Discrepancy Problem along with some guesses for the answers. I will explain where these guesses came from next time.
E0: Erdős’ Discrepancy Problem

Guess: $\sqrt{\log n}$

E1: Allow f(n) to attain values which are complex numbers of norm 1.

Guess, same answer $\sqrt{\log n}$

(Perhaps we can even consider norm-1 vectors in some Euclidean space or Banach space. If this is too premissive ($\ell_1^{\infty}$) we may go down to a constant.)

### Square-free integers

E2: The EDP for square-free integers
Here we simply consider sequences where $x_m=0$ if $m$ is not square-free. For this variation the multiplicative version of the problem is not a special case of the general question. (Here multiplicative means that $x_{nm}=x_nx_m$ if $n,m$ are coprime.)

Guess: for both versions, same answer $\sqrt{\log n}$.

### Random subsets of integers

E3: Instead of square-free integers consider a random dense subset of integers and assume that the sequence vanishes for indices not in the subset.
Guess: same answer $\sqrt{\log n}$.

### Random sets of primes

Here we consider multiplicative functions which are non zero only on a random dense sert of primes.
E4: The EDP for a random dense subset of primes.

Problem 6: Find a sequence for problems E2, E3, and E4 with discrepancy $o(\sqrt n)$, or better $n^{1/2-\epsilon}$ or better $polylog (n)$. Update: see below. It can be shown that sequences with discrepancy $n^{1/\log \log n}$ exist for all these variations.

### Prime power differences

E5: The EDP for HAP with prime power differences.

Guess: $\sqrt{\log \log n}$.

### Beurling primes and integers

Beurling primes were defined by Arne Beurling in 1937 and he also proved a prime number theorem for them. The most general definition is very simple: Consider a sequence of real numbers $1 regarded as “primes” and consider the (ordered) sequence of their products (multiplicities allowed) $1 as the “integers”. (We will assume that all products are distinct although for the original purpose of defining a zeta function multiplicities may play a role.) Beurling primes played a role in the polymath4 discussions.
E6: The EDP for Beurling primes and integers

Careless Guess: at most $\sqrt{\log n}$.

### Sune Kristian Jakobsen’s systems of pseudointegers

One way to think about Beurling primes is to identify $y_n$ with $n$ and to reorder the integers according to the ordering of the $y_n$s. Actually, given the ordering we can recover uniquely the Beurling primes. A much more general notion of “pseudointegers” was suggested over polymath5 by Sune Kristian Jakobsen. See also the overview over Polymath5’s wiki.
An ordering of the natural numbers is a SKJ-ordering if it fulfills the following two conditions:
1) If $a \ge b$ and $c \ge d$ then $ac\ge bd$.
and
2) for any $a$ the set $\{b: b is finite.
Remark: 1) Sune Kristian considered orderings on sequences of integers (the exponents in the prime factorization). This is equivalent to (but perhaps less provocative than) the formulation here. 2) We can expect that Beurling-orderings are a tiny tiny subset of SKJ-orderings.
Given an SKJ-ordering of the natural numbers we can ask the EDP for that ordering.
E7: The EDP for Sune Kristian Jakobsen systems of integers.

Careless Guess: at most polylog n

Problem 7: How does Greedy algorithm 2 perform for variations E2 -E7.
The answers for E2, E3 and E4 is especially interesting because these are examples where the best upper bounds I know behave like $\sqrt n$ and are obtained from a random assignment. (See problem 5.)
Problem 8: Prove the assertion of EDP (namely that the discrepancy is unbounded) for an SKJ-pseudointegers of your choice.

## Related Discrepancy Problems

### Discrepancy and hereditary discrepancy

Recall that for a hypergraph $H$ defined on a ser $A$ the discrepancy of $H$, denoted by $disc(H)$ is the minimum over all functions $f:A \to \{-1,1\}$ of the maximum over all $S \in H$ of

$|\sum \{f(s):s\in S\}|$.

When $H$ is a hypergraph and $B\subset A$, the restriction of $H$ to $B$ is the hypergraph with vertex set $B$ whose edges are all sets of the form $S \cap B$ for edges $S$ of $H$. The hereditary discrepancy $herddisc(H)$ of $H$ is the maximum over all $B \subset A$ of the discrepancy of $H$ restricted to $B$.

### An operation of hypergraphs

Let $H$ be a hypergraph on a vertex set $A$ and assume that $A$ is ordered, e.g., $A=\{1,2,\dots,n\}$. Consider the hypergraph $G$ obtained from $H$ by adding for every set $S \in H$ all its initial subsets w.r.t. the ordering. We will consider the operation of moving from $H$ to $G$. Note that for EDP all our variations E1-E8 the underlying hypergraph is obtained by this construction $H \to G$. (From a certain natural hypergraphs $H$). I further guess that for all the variations E1-E8 if we consider the hypergraph $H$ (before taking initial segments) then the discrepancy is bounded.

### Probabilistic versions

Let $A$ be a ground set and let $1\ge p_1\ge p_2\ge\dots\ge p_m>0$ be a set of reals. First consider the discrepancy problem for a random hypergraph $H$ with $m$ edges, whose ith edge is a random set $S_i$ so that every element of $A$ belongs to $S_i$ with probability $p_i$ (and all these events are statistically independent). Next we can move from $H$ to $G$ as described above.
The case of taking $G$ for the probabilities $p_i=1/i$ can be seen as a probabilistic analog of EDP. My guess for the discrepancy of this case is also $\sqrt{\log n}$. I also guess that the discrepancy of $H$ itself is bounded.
Problem 9: How do Greedy algorithms 1 and 2 perform for the random hypergraph $G$.

### Roth’s theorem about the discrepancy of arithmetic progressions

For EDP the ground set is all natural numbers, or just the set {1,2,…,n} and the hypergraph is the collection of all HAPs(homogeneous arithmetic progressions). Roth considered the hypergraphs of all arithmetic progressions. Roth proved that the discrepancy in this case is at $\Omega (n^{1/4})$. The existence of $f$ with discrepancy of $O(n^{1/4} \log^{5/2} n)$ was proved by Beck and the $\log^{5/2} n$ factor was removed by Matousek and Spencer. These works by Beck, Matousek and Spencer may be very relevant to prove the existence of low-discrepancy sequences for EDP and some of its extensions.

## Some results with Noga Alon

Here are some results proved in collaboration with Noga Alon which are based on the Beck-Fiala theorem and a general argument on moving from $H$ to $G$. (It is likely that some of them are known and we will be happy to know.)
Proposition 1: For $p_i=1/i$ the discrepancy of $G$ described above is at most $(\log^3n)$.
This follows from the following general proposaition:
Proposition 2: Let $H$ be a hypergraph on an $n$ element ordered set $X$ and let $d$ be the maximum degree of a point of $X$, then $disc(G)\le 2d log^2n$.
Proof: Let $X=X_1 \cup X_2$ be a partition of $X$ into two nearly equal intervals, $X_1=X_{11}\cup X_{12}$ a partition of $X_1$ into 2 nearly equal intervals and similarly $X_2= X_{21}\cup X_{22}$, etc ($\log n$ levels). Now define a new hypergraph $H'$ obtained from $H$ by replacing each set $A$ in $H$ by the following $1+2+4+...+n$ sets (possibly some empty):

$A\cap X(=A)$, $A\cap X_1$, $A\cap X_2$,$\dots,$ $A \cap X_{ij}$ (for all $i,j$), $\dots$.

Note that the maximum degree of a point in $H'$ is $d log_2 n$, hence the discrepancy of $H'$ is at most $2d\log n$ by Beck-Fiala’s theorem. Also, each initial segment of $A$ is a union of at most $\log n$ pairwise disjoint members of $H'$. This gives Proposition 1, and since for the case described in Proposition 2 the degree of vertices for $H$ behaves like $\log n$ we obtain Proposition 2 as well.
Proposition 3: for versions E1-E8 there are examples where the discrepancy is $n^{O(1/\log\log n)}$.
Proof: In all these examples we consider a hypergraph $H$ on the ground set A={1,2,…,n}, or on a subest of A with positive density (or for E7 and E8 on another non conventionally ordered set of integers). Then move to the hypergraph $G$ of initial subsets of edges of $H$. To apply Proposition 2 and Beck-Fiala’s theorem we need to find an upper bound on the degree of a vertex in the hypergraph $\cal H$. Consider the case of EDP. (The other cases are similar.) The maximum degree is obtained by an integer that is the product of the first k distinct primes. In this case the degree $2^k$ is smaller than $n^{1/\log \log n}$. Note that the proof applies even to hereditary discrepancy.
Problem 10: Use similar ideas to prove better (even polylog (n)) constructions for EDP and its variations E1E8.
I extend all my earlier guesses when we move from discrepancy to hereditary discrepancy.
Proposition 4: The hereditary discrepancy for the hypergraph of HAP on {1,2,…,n} is $\Omega (\sqrt{\log n/\log \log n})$.
The proof is obtained as follows: Consider the first $m$ primes, and a hypergraph $H$ on {1,2,…,m} of discrepancy $\sqrt m$. (For example, a Hadamard matrix of order $m$ describes such an hypergraph. ) Next, for every edge $S \in H$ consider the integer that correspond to products of primes whose indices are in $S$. Then restrict your attention only to these integers.
Propositions 3 and 4 show that the hereditary discrepancy of the hypergraphs of HAPs in {1,2,…,n} is between $\sqrt {\log n/\log\log n}$ and $n^{1/\log \log n}$.

## A Quick Outlook at Polymath 5: EDP 1-21

Before our quick review of polymath5, let me mention a major difficulty which seems relevant to all approaches:

If we allow zero entries in our sequence, even a small density of zero entries, then the discrepancy can be bounded.

For example consider the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, …
Experimentations: Computer experiments played a large role in polymath5. One of the most striking discoveries was a sequence of length 1124 of discrepancy 2. Later a second sequence of length 1124 and discrepancy 2 was found but not a larger sequence. Several people made great contributions with computer experiments.
Additive Fourier analysis ideas: Using Fourier analysis on our sequence and somehow reaching a contradiction when we assume that the discrepancy is bounded was a suggestion that was dominant in the first few threads and we came back to from time to time.
Terry Tao’s reduction: Terry Tao found a reduction from the general question to the variation of multiplicative functions with complex norm-1 values (E1). The beautiful proof relies on “multiplicative” Fourier analysis and it is striking how “little” the proof uses.
Problem 11 (Sune Kristian Jakobsen): Does Tao’s reduction apply to arbitrary SKJ-pseudointegers.
Semi-definite programming: A major turning point was a suggestion by Moses Charikar to use a natural semi-definite relaxation for the problem. This promising avenue was explored over several threads and here too computer experimentation was done. One reason to regard Charikar’s approach (and the related linear programming approach described next) as hopeful is precisely because it offers a convincing way to get around the difficulty demonstrated by the sequence 1, -1, 0, 1, -1, 0, …
A generalization and linear programming: The last few threads were centered around a different related relaxation proposed by Tim Gowers. The problem was generalized from a single-sequence problem to a pair-of-sequences problem. (This is a common motif in extremal combinatorics although here the motivation came from functional analysis.) Then relaxation of the problem led to a very appealing linear programming question.
Mathoverflow questions: Mathoverflow was used to ask several questions related to the project.
Participation: Polymath 5 attracted much interest and wide participation. Overall, it did not attract researchers with major prior interest in discrepancy theory.

## Discrepancy

Discrepancy theory is a huge an exciting area. Let me just give references to some books.
Jozsef Beck, William W. L. Chen: Irregularities of Distribution, Cambridge University Press, 1987. And, Jozsef Beck: Irregularities of Distribution (Cambridge Tracts in Mathematics Cambridge Tracts in Mathemat Volume 89)(Paperback)
Bernard Chazele: The Discrepancy Method: Randomness and Complexity
Jiri Matousek: Geometric Discrepancy: An Illustrated Guide (Algorithms and Combinatorics)
J. Beck: Combinatorial Games: Tic-Tac-Toe Theory, Cambridge University Press, 2008.
J. Beck: A forthcoming book.
Reading these books will prepare you better to deal with EDP and will enrich your life tremendously.

### 30 Responses to “EDP22 — first guest post by Gil Kalai”

1. Some Updates | Combinatorics and more Says:

[…] Erdos discrepancy problem (polymath5): Tim Gowers and I have a plan to try soon to revive the EDP project at least for a short while. We plan 3 posts about it in the second half of August over Tim’s blog. Update: The first post EDP22 just appeared. […]

2. Sune Kristian Jakobsen Says:

You have the formula $\sum \{f(s):s\in S\}$ twice in the post. I think there should be absolute value signs around it in both cases.

Corrected now — Tim.

• Gil Kalai Says:

Dear Sune, Right, my mistake. BTW did you explore further your notion of pseudointegers?

• Sune Kristian Jakobsen Says:

Not anymore than what i have posted

• Gil Kalai Says:

Sune, I found your pseudointegers very interesting. I just remind the reasers that they can be regarded as total order relations on the positive integers, where a<b implies ac<bc and there are only finitely many integers smaller than any number n. For example you can ask: if the number of integers smaller than a prime p is roughly p
does this implies the statement of the prime number theorem? (Something of this kind is known for Beurling primes.) Two further questions: 1) I dont remember if you had examples for your pseudoprimes where EDP was violated? 2) Did you check if Tao's reduction works in this generality?

• Sune Kristian Jakobsen Says:

1): Yes, here: https://gowers.wordpress.com/2010/02/08/edp7-emergency-post/#comment-6120 (as a reply to the comment you linked to. I couldn’t remember it before I read it).

We can also construct examples where the multiplicative EDP is true, by having arbitrarily long runs of consecutive squares.

2) No. I remember that I didn’t understood Tao’s proof, at least not well enough to check if it worked for the pseudointegers. I had hoped Tao could tell if it works!

• Sune Kristian Jakobsen Says:

“We can also construct examples where the multiplicative EDP is true, by having arbitrarily long runs of consecutive squares.” That does not hold: We can’t have two consecutive pseudointeger that are square in any set of pseudointegers. Assume for contradiction that x and y are consecutive and squares. We can write them as $x=a^2$ and $y=b^2$. But then $a^2, contradiction.

So I still haven't found a set of pseudointeger where I can prove EDP.

• Gil Kalai Says:

Dear Sune, it somehow looks to me that your pseudo-integers are the same as Beurling’s integers. Maybe we can relax your definition and insist that c and d are relatively prime to a and b to get a more general notion.

• Sune Kristian Jakobsen Says:

Sune, if you can tell me what the statement following “such that” was supposed to be, then I will edit the comment. Tim

Dear GIl, yes it seems to be almost the same. Two Beurling’s integers are also allowed to correspond to the same real number, and that corresponds to the statement that we can have pseudo-integers a and b such that ab^{n}$for all n. The only difference seems to be that pseudointegers have a complete ordering. The ordering of the Beurling’s integers come from a (possible not injective) map to the reals, so it might not be complete. But that is just a little extra structure so in the following I will use “Buerling’s integers” to mean what I used to call “pseudo integers” I think that if you have a set of Beurling’s integers then for all n, the set of the first n of these Beurling’s integers will be isomorphic to a subset of $\mathbb{N}$, and I think you lose this property if you generalise further. 3. Sune Kristian Jakobsen Says: For $p>0$ define $Z(p)$ to be the smallest possible discrepancy of a sequence over $\{-1,0,1\}$ where the density of 0’s is at most p. We know that this is finite for all p>0: If $p\geq 3^{-2n}$ we can choose the sequence that is 0 on all multiples of $3^{2n}$ and sends $a\cdot 3^k$ to $(-1)^{a+k}$ for all $a\in\{1,2\}, k<2n$. This should have discrepancy n. EDP is equivalent to $lim_{p\to 0}Z(p)=\infty$. We could hope that the behavior of Z is simpler than that of E, since in the definition of E we are postponing problems, while in the definition of Z we are spreading them out. • gowers Says: Could you elaborate on your last sentence: I find it intriguing but I don’t see exactly what you mean by it? • Sune Kristian Jakobsen Says: First, $a\cdot 3^k$ in the above should have been $a\cdot 3^k+b3^{k+1}$ (I hope). It seems that I was a bit too sloppy both when I read the definition of E and when I posted the above comment. I though the definition of E(n) was “the smallest possible discrepancy of a sequence of length n”. What I meant was: One way to investigate EDP is to fix some integer d and try to find the longest possible sequence with discrepancy below d. The sequences we get are then optimized to keep the discrepancy at most d for as long as possible, and it might be that for all extensions of the sequence the discrepancy grows to, say, 3/2d, within the next few terms. This is what I mean by “postponing the problems”. Instead we might try to keep the discrepancy below d while defining the sequence on a subset of $\mathbb{N}$ with positive density (and setting all other terms to 0). If the positions of these 0’s are periodic, as in the low-discrepancy examples we know, then it will be possible to fill out more terms without increasing the discrepancy by much. • Sune Kristian Jakobsen Says: The downside of this approach is that the definition of Z involves infinite sequences. It is not even clear if there is a way to compute Z(p) for a given p. • Alec Edgington Says: You could combine the two ideas, by considering $E(k,n)$, the minimal HAP-discrepancy over sequences $f : \mathbb{Z}^+ \to \{\pm 1, 0\}$ where $f(r) = 0$ for at most $k$ values of $r$ in the range $[1,n]$ (and there’s no restriction for values greater than $n$). Then your $E(n)$ is just $E(0,n)$, and $Z(p) = \lim_{n \to \infty} E(pn, n)$. It could be interesting to think specifically about where the zero values should be placed in order to minimize the discrepancy in this context. 4. Sasho Nikolov Says: Very interesting post! Some notes on the operation on hypergraphs. The short version of this comment is that Prop 2 can be tightened, and also generalized to show that the operation on hypergraphs increases hereditary discrepancy by at most a polylog factor. 1) Prop 2 holds with the bound $O(\sqrt{d}(\log n)^{3/2}) (assuming the number of sets of$latex H$ is $n^{O(1)}$). This follows from the following theorem by Banaszczyk:

> For any convex set $K\subseteq \mathbb{R}^m$ of gaussian measure at least 1/2 and any set of vectors $v_1, \ldots, v_n \in \mathbb{R}^m$ each of euclidean norm at most 1/10, one can find a $x \in \{-1, +1\}^n$ such that $\sum{x_i v_i \in K$.

Let $M$ be the incidence matrix of $H'$ (as you define it in your proof). Scale down the columns of $M$ by $1/(C\sqrt{d\log n})$ for a large enough constant $C$ so that each column has norm at most 1/10 and let the scaled columns be $v_1, \ldots, v_n$.

Let $L$ be the incidence matrix of $G$ and $N$ be such that $L = NM$. As you argue, each row of $N$ can be taken to be a 0-1 vector with at most $\log n$ 1’s, so each row of $N$ can be taken to have euclidean norm $\sqrt{\log n}$. Define $K = \{y: \|Ny\|_\infty \leq C' \log n\}$. Clearly $K$ is convex and for large enough $C'$ has gaussian measure at least 1/2. By Banaszczyk’s theorem, we can find $x$ such that $\sum{x_i v_i} \in K$, and scaling $(v_i)$ back up we have found $x$ such that $\|Lx\|_\infty = \|NMx\|_infty = O(\sqrt{d}(\log n)^{3/2})$.

2) If $G$ is the result of applying the transformation on $H$ and $H$ has $n^{O(1)}$ sets, $\mathsf{disc}(G) \leq \mathsf{herdisc}(H) (\log n)^{3}$.

The proof is similar to yours. Take $G_1$ be the result of replacing each set $A \in H$ with $A \cap X_1$ and $A \cap X_2$; $G_2$: the result of replacing $A$ with $A \cap X_{ij}$ for all $i, j$ and so forth. This gives us $G_1, \ldots G_k$ for $k = O(\log n)$. Since each $G_i$ is the union of *disjoint* restrictions of $H$, for all $i$ we have $\mathsf{disc}(G_i) \leq \mathsf{herdisc}(H)$. Then $H' = G_1 \cup \ldots \cup G_k$ and by a recent result of Matousek, $\mathsf{disc}(H') \leq \mathsf{herdisc}(H) O(\log^2 n)$. Finally, as you argue, $\mathsf{disc}(H) \leq \mathsf{disc}(H')\log n$.

5. Sasho Nikolov Says:

Sorry, the above didn’t parse right, I am not very good with this wordpress-latex thing..

Some notes on the operation on hypergraphs. The short version of this comment is that Prop 2 can be tightened, and also generalized to show that the operation on hypergraphs increases hereditary discrepancy by at most a polylog factor.

1) Prop 2 holds with the bound $O(\sqrt{d}(\log n)^{3/2})$ (assuming the number of sets of $H$ is $n^{O(1)}$). This follows from the following theorem by Banaszczyk:

For any convex set $K\subseteq \mathbb{R}^m$ of gaussian measure at least 1/2 and any set of vectors $v_1, \ldots, v_n \in \mathbb{R}^m$ each of euclidean norm at most 1/10, one can find a $x \in \{-1, +1\}^n$ such that $\sum{x_i v_i} \in K$.

Let $M$ be the incidence matrix of $H'$ (as you define it in your proof). Scale down the columns of $M$ by $1/(C\sqrt{d\log n})$ for a large enough constant $C$ so that each column has norm at most 1/10 and let the scaled columns be $v_1, \ldots, v_n$.

Let $L$ be the incidence matrix of $G$ and $N$ be such that $L = NM$. As you argue, each row of $N$ can be taken to be a 0-1 vector with at most $\log n$ 1’s, so each row of $N$ can be taken to have euclidean norm $\sqrt{\log n}$. Define $K = \{y: \|Ny\|_\infty \leq C' \log n\}$. Clearly $K$ is convex and for large enough $C'$ has gaussian measure at least 1/2. By Banaszczyk’s theorem, we can find $x$ such that $\sum{x_i v_i} \in K$, and scaling $(v_i)$ back up we have found $x$ such that $\|Lx\|_\infty = \|NMx\|_\infty = O(\sqrt{d}(\log n)^{3/2})$.

2) If $G$ is the result of applying the transformation on $H$ and $H$ has $n^{O(1)}$ sets, $\mathsf{disc}(G) \leq \mathsf{herdisc}(H) (\log n)^{3}$.

The proof is similar to yours. Take $G_1$ be the result of replacing each set $A \in H$ with $A \cap X_1$ and $A \cap X_2$; $G_2$: the result of replacing $A$ with $A \cap X_{ij}$ for all $i, j$ and so forth. This gives us $G_1, \ldots G_k$ for $k = O(\log n)$. Since each $G_i$ is the union of *disjoint* restrictions of $H$, for all $i$ we have $\mathsf{disc}(G_i) \leq \mathsf{herdisc}(H)$. Then $H' = G_1 \cup \ldots \cup G_k$ and by a recent result of Matousek, $\mathsf{disc}(H') \leq \mathsf{herdisc}(H) O(\log^2 n)$. Finally, as you argue, $\mathsf{disc}(H) \leq \mathsf{disc}(H')\log n$.

6. Gil Kalai Says:

Some information regarding greedy algorithm 2 can be found in
Robert Lemke Oliver’s answer to this MathOverflow question
http://mathoverflow.net/questions/105383/the-behavior-of-a-certain-greedy-algorithm-for-erds-discrepancy-problem

7. Alec Edgington Says:

Terry Tao’s reduction of the problem to one about multiplicative sequences (described here) also invites a class of negative attacks that may conceivably be easier than the direct search for a counterexample. Namely, can one find a method of generating a random multiplicative sequence $g : \mathbb{Z}^+ \to S^1$ such that (with respect to the resulting random distributiom) the expectation of $\lvert g(1) + \ldots + g(n) \rvert ^ 2$ is bounded as a function of $n$? Such a method would (as shown at the bottom of that wiki page) imply a counterexample to the Hilbert-space version of EDP.

8. Looking Again at Erdős’ Discrepancy Problem | Combinatorics and more Says:

[…] Autumn I prepared three posts on the problems and we decided to launch them now. The first post is here. Here is a related MathOverflow question. Discrepancy theory is a wonderful theory and while I am […]

9. Gil Kalai Says:

Perhaps it is worth adding also

Proposition 5: For every $\epsilon>0$ there is a set of the natural numbers of density $1-\epsilon$ such that restricted to this set the discrepancy of HAPs in {1,2,…,n} is at most polylog (n).
Proof: Simply consider natural numbers with at most $K \log \log n$ prime factors for sifficiently large $K$, and apply Proposition 2

.

10. Sasho Nikolov Says:

Last week Kunal Talwar and I thought a bit about the hereditary discrepancy of HAPs. We can show that the hereditary discrepancy is not polylogarithmic; in fact it is $n^{1/O(\log \log n)}$, which, as the upper bound in Gil’s post shows is tight up to the constant in the exponent.

Below I am writing a sketch of how the proof goes. Since the proof itself got a bit long, we wrote a note: you can find it here.

(1) Consider the following set system of boolean subcubes, let’s call it $\mathcal{S}^d$. Each element of the ground set is identified with a boolean vector $u \in \{0, 1\}^d$. We get each set from fixing some of the coordinates and leaving the other ones free. That is, each set $S_v$ is associated with a vector $v \in \{0, 1, *\}$ and consists of those $u \in \{0, 1\}^d$ for which $u_i = v_i$ whenever $v_i \neq *$.

(2) We construct a set of positive integers $B$ by associating an integer to each boolean vector $u \in \{0, 1\}^d$ so that each $S_v \in \mathcal{S}^d$ corresponds to a HAP restricted to $B$. This is not hard and is similar to the construction in the post. We pick the first $2d$ primes $p_{1, 0}, p_{1, 1}, \ldots, p_{d, 0}, p_{d, 1}$ and associate $\prod{p_{i, u_i}}$ to each $u \in \{0, 1\}^d$. The set $S_v$ corresponds to the HAP with difference $\prod_{v_i \neq *}{p_{i, v_i}}$.

(3) We show that $\mathsf{herdisc}(\mathcal{S}^d) = 2^{\Omega(d)}$

This proceeds in two steps:

3.1 Use the determinant lower bound of Lovasz, Spencer, and Vesztergombi to show that the matrix of Fourier characters of $\mathbb{F}_2^d$ of weight $\alpha d$ ($\alpha <1$ a constant) has hereditary discrepancy ${d \choose \alpha d}$. Call this matrix $G$

3.2. Show that $\mathsf{herdisc}(G) \leq 2^{\alpha d} \mathsf{herdisc}(\mathcal{S}^d)$. This goes by writing a character of weight $\alpha d$ as a linear combination of $2^{\alpha d}$ indicator vectors of subcubes. This is possible since a character of weight $\alpha d$ is a function of $\alpha d$ components.

We can choose $\alpha$ so that 3.1 and 3.2 imply (3).

11. From Discrepancy to Privacy, and back « Windows On Theory Says:

[…] the EDP, it is natural to ask how large the hereditary discrepancy is. Alon and Kalai show that it is and at most . They also showed that for constant , it is possible to delete an […]

12. The Quantum Debate is Over! (and other Updates) | Combinatorics and more Says:

[…]  Kunal Talwar. See this post From discrepancy to privacy and back and the paper. Noga Alon and I showed that it is  and at most , and to my surprise Alexander and Kunal showed that the upper bound is […]

13. A Few Mathematical Snapshots from India (ICM2010) | Combinatorics and more Says:

[…] and Talwar of the hereditary discrepancy analog for Erdos’ discrepancy problem. See also this post in Gowers’s […]

14. Open Collaborative Mathematics over the Internet – Three Examples | Combinatorics and more Says:

[…] Doron’s introduction; 00:43-4:00 On the screen: Erdős discrepancy problems: showing this post (EDP22) from Gowers’s blog. I talked  about the chaotic nature of mathematics on the Internet. Then I explained what are […]

15. Practically P=NP? | Gödel's Lost Letter and P=NP Says:

[…] in Logic and Computation. They have recently made important progress on the Erdős Discrepancy Problem using computer tools from logic. Unlike the approach of a PolyMath project on the problem, their […]