EDP23 — second guest post by Gil Kalai

The Large Deviation Heuristic: an example – triangle-free graphs

Here is a very general probabilistic-based heuristic that seems to give good predictions for questions related to EDP. I will refer to this heuristic as “LDH”. (In my polymath5 comments I referred to it as PH – probabilistic heuristic)). I am thankful to Noga Alon and to Yuval Peres for some helpful help.
Here is an example: Suppose we want to study the following extremal problem.

What is the largest number of edges in a graph on n vertices with no triangle.

If we use the probabilistic method we can ask what is the probability that a random graph in G(n,m) contains no triangle. As long as this probability is positive we know that a triangle-free graph with n vertices and m edges exists. (Being a little careful we can consider G(n,p) instead of G(n,m) where m=p{{n}\choose {2}}. Looking at random graphs gives us a perfectly correct proof of the assertion that there are triangle-free graphs with n vertices and Cn edges for every C.


1) Estimate naively the probability that a random graph in G(n,m) contains no triangle.

2) Choose m so that this estimated probability behaves like 1 over the number of graphs with n vertices and m edges.

So let’s implement this plan. The probability that a random graph in G(n,p) does not contain a specific triangle is 1-p^3. Naively assuming that these probabilities are independent we estimate the probability of not having any triangle as (1-p^3)^{{n}\choose {3}}. We want to find p so that this probability is roughly 2^{-H(p){{n}\choose{2}}}. This is the case when roughly p=n^{-1/2}.

LDH prediction for Mantel-Turán problem: The maximum number of edges in a triangle-free graph behaves like n^{3/2}.

In other words, the LDH gives two predictions that we will refer to as the “firm” prediction and the “weak” prediction.

A) (The firm prediction.) There exist triangle-free graphs with n vertices and n^{3/2} edges


B) (The weak prediction.) There are no (substantially) larger triangle-free graphs.

Prediction A is correct. There are indeed triangle-free graphs with n vertices and n^{3/2} edges. (But the LDH does not prove their existence.) Prediction B is miserably false: Actually there are graphs with n^2/4 edges without a triangle. The LDH heuristic ignores the fact that not containing one triangle is not independent of not containing another, and is completely blind to large bipartite graphs.

Excercise: What is the LDH prediction for the question: How large can a subset of the integers {1,2,…,n} be if it contains no 3-term arithmetic progression?

We would like to propose the following points regarding the large deviation heuristic:

  • LDH predictions about the existence of some combinatorial objects are quite often true. (We refer to such predictions as the firm predictions.)
  • LDH weak predictions are blind to various structured examples. Sometimes, if we understand the relevant structures we can update the large deviation predictions. (I will come back to this at the end of the post.)
  • LDH predictions appear to be quite good for the Erdős Discrepancy Problem (EDP) and for variations of EDP. This is what we are going to discuss now.

Since LDH is based on a heuristic method to compute probabilities it is quite possible that different heuristics will give different answers but, overall, we did not encounter this.

Problem 1: Find natural examples where the LDH firm prediction, namely the prediction for the existence of certain combinatorial objects, fails.

Problem 2: Find ways to improve the LDH when it fails. (Especially when the answer is known by other methods).

What does LDH predict for EDP?

Direct Computation

We consider the multiplicative version of Erdős Discrepancy Problem . The number of multiplicative \pm 1-sequences of length n is close to 2^{-n/\log n}. (These sequences are determined by their values on prime indices and the Prime Number Theorem tells us that the number of primes smaller than n behaves like n/\log n.) We expect that the LDH computations for the general question will give the same answer.

What we need to compute is:

What is the value of K=K(n) so that the probability p that all partial sums \sum_{i=1}^r\epsilon_i of a random \pm 1 sequence of length n belong to the interval [-K,K] satisfies p=2^{-n/\log n}?

This question can be formulated in terms of a simple random walk of length n on the line. We start at the origin and at each step we go a unit length to the left or to the right with probability 1/2 for each direction. We want to know the probability that the random walk will be confined to the interval [-K,K] and the value of K for which this probability is 2^{-n/\log n}.

Problem 3 (the answer is known): What is this value K=K(n)?

Towards answering problem 2, I asked over Mathoverflow “What is the probability that a random walk of length n will be confined to the interval [-K,K]” and Douglas Zare provided a very detailed answer. Yet, I did not complete the work needed to answer Problem 2. So a few weeks ago I asked Yuval Peres

What is the probability that the simple random walk of n steps will be confined to the interval [-K,K], and what is the value of K=K(n) for which this probability is 2^{-n/\log n}?

And a few hours later I received the following reply from Yuval:

Gil, the confinement probability in [-K,K] decays up to a constant like \exp(-cn/K^2) where c is known: it is \pi/2. This is classical and you can find it e.g. in Feller volume 2 or in Spitzer’s book. This holds for all K =o(\sqrt{n}). So the answer to your query is that K=C\sqrt{\log n} for a suitable C.

So, we get:

The LDH prediction for the EDP is that the maximum discrepancy of a multiplicative sequence of \pm 1 behaves like \sqrt {\log n}. (The same prediction applies to general sequences.)

General sequences and positive correlations.

If we consider general \pm sequences we get the same answer. We need to compute the probability that for a random sequence of length n, all HAP are confined to K. These HAP are random sequences of lengths n, n/2, n/3,\dots . The probability that the initial sums of a random sequence of length n/r is confined to [-K,K] is \exp(-cn/rK^2). If we assume that these probabilities are independent we get an estimate of \exp(-c n\log n/K^2) which behaves like 2^{-n} when K=\sqrt {\log n}. We get the same outcome. Of course the confinement of different HAPs to [-K,K] are not independent. In fact they seem positively correlated and this strengthens the case for the firm prediction. But I don’t know how such positive correlation can be used to prove that the firm prediction is correct. (Of course, the feeling that small discrepancy on different HAP are positively correlated is rather tentative. We know that for every individual HAP we have with positive probability discrepancy bounded by 1. Yet the probability that this happens for all HAPs is zero.)

Problem 4: Fix two positive integers r,s. For a function f:\{1,2,\dots n\}\to\{-1,1\} consider the two events

1) For a maximal HAP of gap r the initial sums of the function are confined to [-K,K].

2) For a maximal HAP of gap s the initial sums of the function are confined to [-K,K].

Are these two events positively correlated when n is large enough?

Indirect computations

At an earlier time, I had a version of the LDH based on gaps between vanishing partial sums. Let me discuss it here. We start with the following question: What is a probability that a sequence of \pm 1 of length t will not have a vanishing partial sum \sum_{i=1}^r\epsilon_i=0 where t/2\le r\le t? Another way to ask this question is:

What is the probability that a simple random walk of length t will not reach zero in the interval [t/2,t]?

For our purposes all we need to know is that this probability tends to some constant strictly between 0 and 1. The precise value is related to a classic question and let me cite another email by Yuval Peres about it:

The probability that a simple random walk will not meet 0 in the time interval [s,t], where s=xt, tends as t \to \infty to (2/\pi)\arcsin(\sqrt{x}). This is one of the two classical arcsine laws for random walks that you can find in many sources, including e.g. Durrett’s book or proposition 5.7 page 137 in My Brownian book . There you will see this law applies to all random walks with increments of mean zero and finite variance. More combinatorial arguments for the special case of SRW can be found in Feller vol I, as well as in these slides.

Given a random walk on the line we will try to estimate the probability that we can find sequence of indices i_1,i_2,\dots for which the random walk reaches the origin so that the differences between consecutive indices is between t/2 and t and compute the value of t for which this probability is 2^{-c_1n/\log n}. Since the probability for a random walk of t steps to reach the origin between the t/2th and tth steps is a certain constant we have the following LDH predictions:

(Firm prediction) There exists a multiplicative sequence of length n such that the gap between two consecutive vanishing partial sums is (up to a multiplicative constant) at most \log n.

(Weak prediction) This is best possible.

Some additional gymnastics allow us to move from the prediction regarding multiplicative sequences of length n where all the gaps between consecutive zeros behave like \log n to sequences where, in addition, the maximum discrepancy behaves like \sqrt {\log n}. The probability that between every two consecutive zeros the maximum discrepancy will be smaller than K\sqrt{\log n} is indeed small but if K is large this too behaves like 2^{-c_2 n/\log n} so the indirect applications of the LDH based on gaps between consecutive zeroes gives the same prediction as the direct prediction above.

Here too the computation for general sequences gives the same outcome and indicates positive correlation between events which in the heuristic are pretended to be independent.

(Let me remark that over the polymath5 threads there were several remarks in the direction of trying to show that there are no \pm 1 sequences of length n where the differences between consecutive vanishing partial sums are bounded for all HAP. This is weaker than what is required to show that the discrepancy is unbounded.)

Variations of EDP and related discrepancy problems:


Let me mention briefly several variations of EDP and what LDH says about them. The LDH is responsible for the guesses we propose in the previous post for variations E1-E8 of EDP.

The weak LDH predictions will not change if we consider \pm 1,0 sequences where the zero entries occupy the indices divisible by 3. But the prediction fails in this case (the discrepancy can be bounded).

If we consider only HAPs with prime power differences the LDH prediction is the square root of \log \log n. I would guess that this is the true behavior. Note that if we consider HAPs with prime differences, we have the same LDH prediction, but since we can have bounded discrepancy the weak LDH prediction is false.

The firm prediction of the LDH predicts a polylog(n) discrepancy (in fact even \sqrt{\log n} discrepancy) when we restrict our attention to square free integers and to random subsets of integers.

Probabilistic analogs for EDP

Problem 5: Does the LDH give good predictions for discrepancy problems for random hypergraphs H and G $on {1,2,…,n}?

Let \cal H be the hypergraph in which we consider precisely one edge E_r which contains every element with probability 1/r. Let \cal G be the hypergraph obtianed fron \cal H by adding as edges to all initial segments of edges in \cal H. The LDH predicts that the discrapancy of $\cal H$ is bounded. When we move from {\cal H} to {\cal G} then we come back to the \sqrt (\log n) prediction. Such probabilistic versions take away the number-theoretic aspect from EDP. Still the probabilities of low discrepancy for different edges are not independent and the problem still looks hard.

LDH and the six standard deviation theorem

Of course, it is of interest to understand the LDH predictions (both for \cal H and \cal G) for the general case when we have m probabilities p_1,p_2,\dots , p_m and the edges are m random subsets based on these probabilities. The most famous case is when we consider all p_is to be 1/2 and m=n. Joel Spencer’s Six Standard Deviation Theorem asserts that the discrepancy for n random subsets of \{1,2,\dots,n\} is at most 6 \sqrt n. Nore generally it was proved that the discrepancy of a random hypergraph with m edges behaves like \sqrt m when m \le n and like \sqrt{n \log(m/n)} for m \ge n.

Roth’s theorem and how the LDH predicts the answer

Recall that Roth’s theorem is about the discrepancy of the hypergraph whose edges correspond to all arithmetic progressions in {1,2,…,n}.

The LDH predicts the correct answer, at least roughly. The probability that a maximal AP of gap r will be confined to [-K,K] is \exp(-cn/K^2). When K=c\cdot n^{1/4} we have to multiply the rth power of \exp(-cn^{1/2}/r), for r=1,2,\dots, \alpha\sqrt n, where \alpha is small and this will give us c^{-n} for c, 1<c<2. The contribution coming from APs of larger gaps will be of a smaller order of magnitide.

Problem 6: Are the methods of proving upper bounds for the disrepancy problem for APs relevant for proving better upper bounds for EDP, its extensions, and its variations?

Large deviations for extremal graph properties revisited

Graph limits and the work of Chatterjee and Varadhan

We started this post by trying to answer a classical problem in extremal graph theory using a probabilistic heuristic which is based on large deviation estimates. It would not be irresponsible to say that the heuristic estimates proposed a very poor prediction to the extremal problem we considered.

Let H be a fixed graph and let T(n,H) be the maximum number of edges for a graph G on n vertices that does not contain H as a subgraph. Turán’s 1941 theorem determined the value of T(n,H) when H is a complete graph with r vertices. Turán’s theorem is the starting point of the wide and deep area of extremal graph theory. The case of triangle-free graphs goes back to Mantel in 1907.

One of the important discoveries in graph theory which is related both to additive number theory and to probability theory is the notion of limits of graphs. This notion is connected to the famous Szemeredi lemma. The theory of limits of graphs sheds new light on extremal graph theory; in a sense it tells us what the relevant structures for T(n,H) are when H is not bipartite.

A recent paper entitled The large deviation principle for the Erdős-Rényi random graph by Sourav Chatterjee and S. R. S. Varadhan revealed a connection between large deviation for properties of Erdős-Renyi graphs and graphons – limits of graphs. (It complements a large body of related results obtained by various other methods.) Here is the abstract.

Abstract: What does an Erdős-Renyi graph look like when a rare event happens? This paper answers this question when p is fixed and n tends to infinity by establishing a large deviation principle under an appropriate topology. The formulation and proof of the main result uses the recent development of the theory of graph limits by Lovasz and coauthors and Szemeredi’s regularity lemma from graph theory. As a basic application of the general principle, we work out large deviations for the number of triangles in G(n,p). Surprisingly, even this simple example yields an interesting double phase transition.

So we can “morally” understand why the weak prediction of LDH fails for the property of “including a triangle”. To obtain a better prediction we also need to condition on various relevant limit structures of the random graph.

LDH for the triangle-free process

Consider the following graph process. We start with the empty graph on n vertices and add random edges one after the other conditioned on not forming a triangle. Bohman proved that such a process for n=ct^2/\log t will lead with substantial probability to a triangle-free graph not containing an independent set of size t. This proved a conjecture of Joel Spencer and gave a new proof to a famous result by Jeong Han Kim (with a remarkable history that I won’t describe here). Can we apply the LDH to the triangle-free processes to give a heuristic argument why triangle-free graphs on n vertices with \Omega (n^2) edges exist?

Extremal problems for bipartite graphs

Problem 7: Let H be a fixed bipartite graph. Does the LDH give good predictions for T(n,H)?

It turns out that for extremal problems on graphs the LDH gives quite similar predictions to those obtained by the rigorous well known edge-deletion method based on the following proposition:

Proposition: If, for a random graph with n vertices and 2m edges, the expected number of copies of H is smaller than half the number of edges, then T(n,H)>m.

For Turán-type problems, the LDH’s predictions are quite similar to those obtained by the edge-deletion method. (I am not aware of a similar trick for discrepancy problems.)The LDH predicts the existence of H-free graphs with roughly \log n times more edges than what the edge-deletion method gives. Achieving an improvement of similar kind is a difficult and well-known problem in extremal combinatorics. See the paper: T. Bohman and P. Keevash. The early evolution of the H-free process. Inventiones Mathematicae, 181, 291–336, 2010.

The LDH prediction are still far from the correct answer. Erdős conjectured that for any bipartite H with degeneracy r the Turán number is at most O(n^{2-1/r}), and that the Turan number is O(n^{3/2}) iff the graph is bipartite and 2-degenerate. For more on that see, e.g., N. Alon, M. Krivelevich and B. Sudakov, Turan numbers of bipartite graphs and related Ramsey-type questions, Combinatorics, Probability and Computing 12 (2003), 477-494.

10 Responses to “EDP23 — second guest post by Gil Kalai”

  1. Gil Kalai Says:

    “Few weeks ago” (my correspondence with Yuval) refers to July/August 2011..

  2. Sasho Nikolov Says:

    Hi Gil,

    Unless I compute wrong, LDH gives prediction O(1) both for the discrepancy of 2 and the discrepancy of 3 permutations, right? (Correct in the case of 2 and wrong in the case of 3.) I think Spencer had a similar intuition why those discrepancies should be in fact constant.

    But if you look at the permutations problem and how the LDH estimate is computed, LDH assumes 3 independent random walks, while in fact in choosing the permutations we have quite a lot of freedom in correlating the walks. A lot more freedom than we do with HAPs.

  3. Gil Kalai Says:

    Dear Sasho,
    To the readers: Here is a link with a description of the 3-permutation problem http://gilkalai.wordpress.com/2011/08/29/alantha-newman-and-alexandar-nikolov-disprove-becks-3-permutations-conjecture/

    What you say is very interesting. I will try to check the computations (but like other heuristics LDH can be creatively adjusted at times) In a sense this is a question with one higher level of complication since we have a question on the maximum over a family of hypergraphs.

    Is the situation for random permutations known (or even obvious)?

    • Alantha Says:

      To my knowledge, it is not known what is the discrepancy of a set system based on 3 (or more) random permutations. (We can let the first permutation be the identity and then pick two other random permutations.)

      The pattern from the three permutations with high discrepancy might appear as an induced pattern if we choose long enough random permutations, but it is not clear if the other elements will cancel out the discrepancy from the pattern.

    • Sasho Nikolov Says:

      My wild guess is that LDH may be closer to the truth for O(1) random permutations. I.e., maybe three randomly permuted simple random walks still have their prefix sums mostly uncorrelated and the discrepancy is actually constant. But I do not know how to formalize anything like this.

  4. Gil Kalai Says:

    Another major recent breakthrough in discrepancy theory is the paper Constructive Algorithms for Discrepancy Minimization by Nikhil Bansal http://arxiv.org/abs/1002.2259 . A related important more recent paper is “The determinant bound for discrepancy is almost tight” by Jirka Matousek http://arxiv.org/pdf/1101.0767v1 .
    The linear-algebraic notions of discrepancy discussed in these papers may be relevant to various issues of the EDP project.

  5. Gil Kalai Says:

    Regarding the greedy algorithm considered in the previous post. The very nice answer by rlo suggests that this greedy algorithm gives discrepancy close to n^{1/3} or so, and rlo expect it also for the square free variation. Can an upper bound of n^{1/2-\epsilon} be proved? What about a lower bound of n^{\epsilon} .

    Another interesting question is if you can improve the greedy algorithm to get lower discrepancy. Our greedy ignore 0’s in intervals. A greedy algorithm that ignore intervals with 0’s was considered in earlier polymath5 threads and to the best of my memory achieve discrepancy n^{1/2} . Maybe a clever interpolation between these two variants will do a better job than both?

  6. pavl Says:

    diaskedastik0.blogspot.gr help me out with any new ideas

  7. EDP Reflections and Celebrations | Combinatorics and more Says:

    […] D(n) tends to infinity but how fast? It is reasonable to think that D(n) behaves asymptotically like  . (This is suggested, among other reasons, by certain probabilistic heuristics from EDP23.) […]

  8. Paul Balister, Béla Bollobás, Robert Morris, Julian Sahasrabudhe, and Marius Tiba: Flat polynomials exist! | Combinatorics and more Says:

    […] and also on related results in Discrepancy’s theory, see this post and this one and also this one on Gowers’s […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: