## A conversation about complexity lower bounds, IV

This is the fourth part of a dialogue between three characters 🙂 (a mathematical optimist who likes to suggest approaches to problems), 8) (a more sceptical mathematician who knows a small amount of theoretical computer science), and 😐 (an onlooker who occasionally asks for more detail, and occasionally makes comments about what 🙂 and 8) say). They are discussing the possibility of proving lower bounds for computational complexity. 🙂 wanted at first to prove that P$\ne$NP, but has now been persuaded that proving a lower bound for formula size is a more realistic target, at least for the time being, and is still a major unsolved problem. In part II of the conversation, 🙂 proposed trying to show that the dual of the $U^k$ norm of a low-complexity function cannot be too large, at least when $k$ has order of magnitude $\log n$. Numerous problems have been identified with this approach, but the idea has not yet been completely killed off. We return to the conversation as a new topic is introduced. This instalment is shorter than average, so I’ll probably post the next instalment sooner than average.

*****************************************************

8) You know, something has been bothering me about our discussion so far.

🙂 You’ve made that pretty clear.

8) I’m not talking about the fact that you don’t have any heuristic argument in favour of the $U^{C\log n}$ dual norm being a useful complexity measure, but about something else that would be equally fundamental to any lower bound proof.

🙂 What’s that?

8) Well, you haven’t said how you would go about proving that some function in NP has a large norm.

🙂 Oh yes, I’m sorry about that. I think I did briefly allude to that question but I never got round to telling you about my thoughts on it. Let me quickly do that and then we can go back to trying to decide whether this proposed complexity measure stands any chance of being a good one.

Clearly a major decision is what NP function to take, and I’ve got a candidate for that. It’s not the only function of its kind, but it’s the one I like best. It’s based on the following very nice question in complexity theory. It is known, at least if P$\ne$NP, that finding cliques in graphs is a hard problem. In fact, it is a very hard problem: there is not even a polynomial-time algorithm that will output $1$ if the graph contains a clique of size $n^{999/1000}$, $0$ if it does not contain a clique of size $n^{1/1000}$, and anything if the maximum clique size lies between these two bounds. In other words, even finding a very crude approximation to the maximum clique size is hard. (This result is proved using the famous PCP theorem of Arora, Lund, Motwani, Sudan, and Szegedy, which built on earlier work of several other authors.) But there’s a more specific problem that seems to be wide open. Take a random graph and either add a clique of size $m$ to it, or don’t. Is there a polynomial-time algorithm that can tell whether you have done so?

If $m$ is bigger than $\sqrt{n\log n}$ then it is an easy exercise to find the clique. (For instance, one can look for vertices of abnormally high degree and work from there.) If $m$ is bigger than $\sqrt n$ (the number of vertices in the graph), then more sophisticated spectral methods have been used to find it. (See this paper by Alon, Krivelevich and Sudakov. See also the interesting discussion of the problem in the introduction to a paper of Feige and Krauthgamer.) But if $m$ is, say, $n^{1/3}$, then nobody has the faintest idea how to find it, and if $m$ is something like $3\log_2n$, then the problem looks pretty hopeless.

😐 How big do you expect the largest clique to be in a random graph?

🙂 Oh, sorry. Well, all that matters is that it is at most $2\log_2n$. That’s quite easy to show. But in fact it is known to be close to this with high probability.

Why does it look hopelessly hard to detect a clique of size $m$ when $m$ is substantially smaller than $\sqrt{n}$? Well, a random graph with a small clique added to it looks so like a random graph that it is hard to imagine an algorithm (other than the obvious inefficient brute-force algorithm) that would tell the difference.

So it is reasonable to hypothesize that there is no such algorithm, and to try to exploit the apparent randomness of a random graph with an added small clique.

In some sense, one would like to prove that the set of all random graphs with an added clique is pseudorandom: that is, that one cannot tell the difference between it and the set of all random graphs, which, when you think about it, just means the set of all graphs. But for a very simple reason that is not true in the sense in which I defined pseudorandom sets earlier. Recall that I defined a subset $X$ of $\{0,1\}^n$ to be pseudorandom if for every polynomially computable Boolean function $f$, the probability that $f(x)=1$ is almost exactly equal to the probability that $f(x)=1$ given that $x\in X$. And “almost exactly equal to” means “closer to than $1/p(n)$ for any polynomial $p$“. But here we can take $f(x)$ to be $1$ if the number of edges in the graph is at least $\frac 12\binom n2$ and $-1$ otherwise. Then if $\binom n2$ is odd (which we could assume for simplicity), the probability that $f(x)=1$ is exactly $1/2$. But if we’ve added a random clique of size $m$ to the graph, then the probability that $f(x)=1$ is bigger than $1/2$. The difference isn’t all that great, but it’s certainly a power-type difference. (Even adding a single edge will affect the probability by at least $1/n^2$.)

What has gone wrong? Well, it is important not to confuse the following two problems.

1. You are given a graph and told that either it has been generated randomly or it has been generated randomly and then a clique of size $m$ has been added to it. Is there a polynomial-time algorithm that will tell you, with probability at least $3/4$, which is the case?

2. You are given a sequence of graphs and told that either they are all generated randomly or they are generated randomly and then enriched by the addition of cliques of size $m$. Can you tell in polynomial time which of the two possibilities holds?

The answer to 2 is trivially yes: you just count the edges, and a small bias will show up if the cliques are added. But what I am interested in is question 1. And counting edges gets you nowhere in that question, since when $m$ is substantially smaller than $\sqrt{n}$ the extra edges you get when you add a clique of size $m$ can easily be outweighed by random fluctuations in the number of edges of the random graph.

All right, now that that’s sorted out, let’s consider the Boolean function $f$ that is defined on all graphs with $n$ vertices. Actually, let me change my notation. I’ll call the number of vertices $r$ and I’ll set $n=\binom r2$. Then I can identify Boolean functions on $\{0,1\}^n$ with Boolean functions defined on the set of all graphs with $r$ labelled vertices.

😐 How do you do that?

🙂 The details aren’t too important. You just define some natural bijection $\phi$ between the numbers $\{1,2,\dots,n\}$ and the set of all subsets of $\{1,2,\dots,r\}$ of size 2. Then, given any $01$ sequence $x$ of length $n$, you define $G(x)$ to be the graph whose edges are all pairs $\phi(i)$ such that $x_i=1$. This is obviously a bijection. Then for any Boolean function $f$ on $\{0,1\}$ you define a function $f'$ on all graphs $G$ by taking $f'(G(x))=1$ if $f(x)=1$ and $f'(G(x))=-1$ if $f(x)=-1$.

😐 Sorry — I now see that it was indeed obvious.

🙂 That’s quite OK. Now let’s define $f(G)$ to be $1$ if $G$ contains a clique of size $m$ and $-1$ otherwise. That’s a classic example of a function in NP.

8) Indeed it is. But it’s not remotely quasirandom. For one thing, it takes the value $-1$ almost all the time. So even its $U^1$ norm is large!

😐 What’s the $U^1$ norm?

🙂 It’s not hard to see that the appropriate definition is $\|f\|=|\mathbb{E}_xf(x)|$. It is not in fact a norm but a seminorm (meaning that it can sometimes be zero but otherwise satisfies the usual properties of a norm). It is not too hard to show that $\|f\|_{U^k}$ increases as $k$ increases, so the larger $k$ is, the stronger a quasirandomness statement it is to say that $\|f\|_{U^k}$ is small. So if even the $U^1$ norm is large, $f$ is not in any sense quasirandom.

So why am I still smiling? Because this is precisely what I meant when I said that dual norms had other advantages. Even though this function is not quasirandom, we can still try to show that it correlates with a quasirandom function.

For instance, if you’re worried about the negative bias, then you can get rid of it in a very standard way: you replace $f(x)$ by the function $g(x)=f(x)-\mathbb{E}f$. This function has $U^1$ norm equal to zero, but $\langle f,g\rangle=\langle f,f\rangle-(\mathbb{E}f)^2=1-\mathbb{E}f^2,$ which is non-zero. Therefore, $\|f\|_{U^1}^*=\infty$.

That’s not a huge result, but it shows, in a trivial way, how proving that dual norms are big can be more feasible than trying to construct functions for which the original norms are small.

😐 Are you saying that once you’ve added a constant to your function $f$, you’ve got something quasirandom that correlates with $f$?

🙂 No, unfortunately not. The reason is that adding a clique of size $m$ messes up more than just the expected number of edges of $G$. For instance, it increases the expected number of triangles, and moreover it does so by more than can be accounted for just from the fact that the expected number of edges has increased. That’s because the edges you add to a random graph are bunched up into a clique.

😐 Why is that a problem?

🙂 Well, it tells us, for instance, that the average of $g$ in the set $\{G:12,23,13\in G\}$ is bigger than zero by some power of $n$. That will give a slight kick to the $U^2$ norm, since a function with tiny $U^2$ norm must have tiny average on any subspace of small codimension.

😐 What can you hope to do about that?

🙂 Well, there are all sorts of tricks one could imagine. I was hoping to make tiny adjustments to $g$ so that it became balanced again. For instance, in order to deal with the $U^2$ norm it might be possible just to tinker with the Fourier coefficients that correspond to small subsets of $\{1,2,\dots,n\}.$ Of course, for higher $U^k$ norms, we don’t have an adequate Fourier analysis (especially when $k$ is unbounded!), so we would have to argue more directly.

In the $U^2$ case, one could also give a pretty direct Fourier argument. The idea would be simply to compute the Fourier coefficients of $f$. Recall that if $A$ is a subset of $\{1,2,\dots,n\}$, then $\hat{f}(A)=\mathbb{E}f(x)(-1)^{\sum_{i\in A}x_i}.$ So $\hat{f}(A)$ measures the effect it has on the expectation of $f(x)$ if you know that the parity of the restriction of $x$ to $A$ is even. Note that in this case $A$ is a graph on $r$ vertices, so what we are interested in is the effect it has on the probability that a graph $G$ contains a clique of size $m$ if you condition on the parity of $G\cap A$. If $A$ is large, then one would expect this effect to be tiny, and even more so if the edges of $A$ are “scattered”. But what matters is how tiny, and at the moment I don’t see how to estimate that. But if my hunch is right that the hidden-clique function has a large dual norm, then one would find that it lots of small but non-zero Fourier coefficients, just as for a random function, so that the $U^2$ dual norm is huge.

The point of doing such a computation is that it would be something of a reality check: if the $U^2$ dual norm turned out not to be huge, then that would be bad news for this approach.

If one did manage to show that it was huge, then the next step would be to try to construct a function $g$ with very small $U^2$ norm that correlated with $f$, and to do it without using Fourier analysis. That could be pretty challenging, but it also seems to be fairly important, because although there are generalizations of Fourier analysis that work for higher $U^k$ norms, the statements are quite weak for our purposes, especially when we are working in a field of bounded characteristic, and in any case they would not apply to unbounded $k$.

😐 I’m slightly worried about something here.

🙂 Oh yes? What’s that?

😐 Well, you seem to be wanting to solve an NP problem, and moreover one that probably isn’t in P.

🙂 You mean the problem of finding a function $g$ such that $\|g\|_{U^k}$ is small and $\langle f,g\rangle$ is large?

😐 Yes.

🙂 Yes, but the point is that I’m trying to solve just one instance of the problem. I’m not trying to find some all-purpose algorithm for constructing an appropriate $g$ given an arbitrary $f$. I’m hoping to make use of the particular nature of the clique-detecting function when I build $g$.

8) In fairness to :|, even though you are right that, strictly speaking, you are not trying to solve a difficult problem in NP, the observation that you are trying to solve an instance of such a problem does place some serious restrictions on what your argument could look like. For instance, if you ever seem to be coming up with a technique for creating $g$ that would apply in general, then you’re in deep trouble, at least if it could be done by means of a polynomial-time algorithm, since that would show that estimating the $U^k$ dual norm was in P, which would naturalize your proof technique.

🙂 I see what you mean. But that doesn’t completely rule out some general algorithmic approach. For example, perhaps one could come up with an approach for creating a good $g$ that made a superpolynomial number of adjustments to $f$. That might even be quite natural when $k$ is unbounded, for reasons similar to the reasons that I gave in my heuristic argument that the $U^k$ norm might not be computable in polynomial time, even by a randomized algorithm.

8) OK, another question. Do you have some philosophical reason to expect the clique function to have a large dual norm?

🙂 Yes. I gave it earlier. It’s because a random graph with an added clique looks pretty much like a random graph. So I think the clique function is pseudorandom in a certain sense. And that ought to be reflected in a large $U^k$ dual norm.

8) That’s a very vague argument and in any case I have a problem with it. It feels as though you’re confusing the pseudorandomness of the graphs that you get by adding a small clique with the pseudorandomness of the function that takes the value 1 when a graph contains an added clique. For example, if I define $f(G)$ to be $1$ for every graph, then the function I get is highly non-random — I think you’ll have to admit that.

🙂 Either that or as random as it’s possible to get, because it’s an exactly uniform distribution.

8) Well, you can say that if you like, but you have to admit that it does not have a large $U^k$ dual norm. And yet, if you use this $f$ to choose a random graph, then the graph you get will be very random — that’s a tautologous statement. So getting random-looking graphs as your output doesn’t guarantee you a large $U^k$ norm.

🙂 OK, I need to think about this a bit harder. I think the point is going to be that the example you have just given is essentially the only one. That is, if you have a pseudorandom subset $X$ of $\{0,1\}^n$ that is not the whole of $\{0,1\}^n$, then it must have a large $U^k$ dual norm. At least when $k=2$ that follows from Parseval’s identity: when $X=\{0,1\}^n$, the Fourier transform of the characteristic function of $X$ is concentrated at one point, the empty set, but if $X$ is a set of small density, then this large Fourier coefficient does not make up all the $L_2$ mass of $X$ so you have to have a lot of smaller, random Fourier coefficients, the squares of which add up to what you want, which makes the $4/3$ powers add up to something huge.

8) OK, that’s slightly better, but I still have a niggling worry.

🙂 What’s that?

8) It’s that I don’t see a good reason for the $U^2$ Fourier calculation working out the way you want it to. After all, the function is defined in a rather concrete way. Why shouldn’t that cause its Fourier coefficients to decay more rapidly than they would for a random function?

🙂 I have to admit that at this stage I don’t have a good answer to that question. I’m just hoping.

But let me try to give you some idea of how I thought one might be able to construct a function $g$ with small $U^k$ norm and good correlation with $f$.

Recall that our method of hiding a clique in a graph is rather crude: we just take a random graph and add a random clique of size $m$ (by randomly choosing $m$ vertices and joining all the pairs that are not yet joined). The crudeness of this approach means that, as I pointed out earlier, it does not result in a pseudorandom way of producing graphs. However, I am hoping that it has a property that one might call weak pseudorandomness, which is $\epsilon$-pseudorandomness for some $\epsilon$ that tends to zero, but not at a superpolynomial rate. This weak pseudorandomness means that if you are just given one graph with a clique added, you cannot tell that it isn’t random, but if you are given a sequence of such graphs, then after a while you can tell.

This is not completely relevant to us, because we do not know how to prove the weak pseudorandomness. If we did, we’d have a proof that P$\ne$NP. So let me get back to the main point.

I propose trying to define, using combinatorial ideas, a new clique-hiding technique that is less crude and feels as though it might give rise to a strong pseudorandom set.

Here’s a very simple idea of the kind of thing I mean. As we’ve seen, just brutally adding a clique to a random graph will increase the expected number of edges. So we could at the same time remove a few edges at random, just to hide that effect. This would give us a slight adjustment to the distribution, but it ought to correlate pretty well with it. Unfortunately, a graph produced in this slightly more sophisticated way will still have too many triangles (since adding the clique adds quite a few triangles, but removing random edges doesn’t remove quite so many). Perhaps a cleverer idea would be to add a random clique of size $m$ but at the same time add a random independent set of size $m$ (by removing a few edges elsewhere). Oh, but that doesn’t quite work, because increasing the number of empty triangles is not quite the same as decreasing the number of full triangles. So instead, perhaps one could do the following. Take a random graph. Add a random clique of size $m$. Weight each edge according to how many triangles it belongs to. Remove random edges, with probabilities depending on these weights, organizing things so that the expected number of edges and triangles you remove is the same as the expected number of edges and triangles you added when you put in the random clique.

This just scratches the surface of the kind of thing one could do. The aim would be to come up with a super-clever model of random graphs with added cliques, where one tries much harder to “hide” the cliques, and then to prove that the resulting probability distribution (which can be thought of as a function on the set of all graphs) has high $U^k$ dual norm. Note that this new model would not even be a Boolean function, so one cannot easily talk about its belonging to any complexity class. This demonstrates one of the advantages of duality that I was talking about earlier: I can try to devise a function that looks highly random, and I do not have the strong constraint that it should be a Boolean function in NP. It just has to correlate with one.

Another quick remark that I’d like to make is that it might be an advantage to make the cliques we hide in a random graph smaller. Perhaps if they are only just bigger than $2\log_2n$, the size you expect in the random case, then the set of all graphs with cliques hidden inside them would have positive density. That might avoid certain technical problems.

8) OK, I think I’m getting some idea of what you’re on about here. It looks like a very tough programme to carry out, and I still think you should check whether it is even true for $U^2$, but at the moment I can’t think of a way of showing that it is bound not to work.

🙂 I see the fact that it’s a tough programme to carry out as a positive feature: if it looked as though it was going to be easy, then I’d be very suspicious. And I haven’t forgotten your other objection, when you accused me of just drawing the $U^k$ dual norm out of a hat without any reason to suppose that it is small for functions of low complexity. So I’m not getting excited. I just think that the approach should be thought about further.

### 5 Responses to “A conversation about complexity lower bounds, IV”

1. Ryan O'Donnell Says:

Amin Coja-Oghlan’s recentish paper —

http://www.matheon.com/preprints/2713_jfind4.pdf

— also has a nice survey of the state of the art, including his own work showing easiness results (for $m \approx \sqrt{n}$) and hardness results (for $m < \ln n$) even in the semirandom model, where an adversary can delete any edges it wants from a random graph + planted $m-clique$ (except for edges from the planted clique, of course).

On a more philosophical note… You can of course find a clique of size $m$ in $n^m$ time, which is only quasipolynomial in $n$ if $m$ is logarithmic (or polylogarithmic). Indeed, as Coja-Oghlan points out, Alon-Krivelevich-Sudakov can find a planted $m$-clique in a random graph in time $n^{O(\log n)}$ for any $m = o(\sqrt{n})$.

For the purposes of lower bounds, one often (but not always) does not make a big distinction between polynomial and quasipolynomial time/size. (We don't believe SAT is solvable even in $2^{n^{1-\epsilon}}$ time, let alone $2^{\log^{O(1)} n}$ time.)

Are the characters in the dialogue suggesting proving a lower bound for problems which are solvable by quasipolynomial size circuits? Or would the optimist 🙂 hope to separate NP from quasipolynomial-size circuits as well?

• gowers Says:

That’s a good question. Ideally, 🙂 would want to prove that P$\ne$NP, but by this stage of the conversation anything new and exciting would do. However, there seems to be little prospect of that (even by the end of the tenth instalment).

I still think there might be a more realistic Polymath project in there somewhere. Perhaps when I get to the end I’ll collect together all the questions that seem interesting and replace the old proposal 5 by a set of proposals 5a to 5d (or whatever letter I get up to).

And thanks for drawing my attention to that paper — I look forward to checking it out.

• gowers Says:

Sorry, I didn’t really answer your question properly. What I mean is that 🙂 would be ecstatic to find a quasipolynomial function that didn’t have a polynomial-time algorithm, even if finding a superquasipolynomial lower bound for an NP function would be more intellectually satisfying.

I presume the $n^{O(\log n)}$ algorithm you refer to is based on showing that if you know there’s a clique of size $m$ and you find all the cliques of size $100\log n,$ then almost certainly their union (or something pretty close to their union) is the clique of size $m.$ Is that roughly the idea? If you changed the probability so that the expected size of the biggest clique was $n^{1/4},$ say, would that make it much harder to find a clique of size $100n^{1/4}$?

• Gil Kalai Says:

Slightly related (But in the AC_0 shallower water): I think it is a very interesting problem to show that the Boolean function f describing the property: “a graph on n vertices has a clique of size 2 log n” cannot be approximated by a function in AC_0 (a bounded depth polynomial size Boolean circuit). It is known that f itself is not in AC_0 (Paul Beame was the first to show it I think). Of course depth 2 suffices if you allow quasi polynomial circuits. (Actually I am not aware of any function computable in quasi-polynomial size bounded depth circuits where it is known that the function cannot be approximated by a function in AC_0.)

2. Troy Lee Says:

I wanted to mention an approach using dual norms to show lower
bounds in the model of communication complexity. Maybe the analogy is helpful.

I am thinking the multiparty number-on-the-forehead
model of communication complexity. In this model k-players wish to evaluate
a Boolean function $f(x_1, \ldots, x_k)$ where each $x_i$ is a n-bit string. In what follows
think of the output of f as being $+1$ or $-1$. Player
$i$ receives as
input all of the $x_j$ except $x_i$ (which is “written on his forehead”). The question is
how many bits the players need to communicate in order to evaluate the function.
It is usually assumed that the communication is broadcast and known to all players.
So the required communication will be a number between 0 and n.

The norm that has been used in this setting is quite similar in spirit to the $U_k$ norm. The “k”
version of the norm is applied to a k-argument function f and is defined as follows.

$\displaystyle \|f\|_k^{2^k}= \Ex_{x_1^0, x_1^1, \ldots, x_k^0,x_k^1} \prod_{\ell \in \{0,1\}^k} f(x_1^{\ell_1},\ldots, x_k^{\ell_k})$

Babai, Nisan, and Szegedy showed that if a function has small such norm, then it requires a lot
of communication. To show that f is hard, it also suffices to find some g such that g correlates with
f and g has small norm—in other words, one can lower bound the dual norm $\| \cdot \|_k^*$.

More precisely, the way this is applied to the communication problem is that
$\displaystyle CC_k(f) \ge \log (\|f\|_k^* / 2^{kn})$
where $2^{kn}$ is a normalization factor I have hopefully gotten correct, and $CC_k$ is the k-party communication complexity.
The “set intersection” function, determining if there is a position where
all the $x_i$ share a 1, is an example of a function where the
norm is relatively large, yet one can still show a strong lower bound via the dual norm approach.

The main drawback with this technique is that it gives lower bounds of
size at most $n/2^k$, and so becomes trivial for $k=\log n$ many players. This is where
things start to get most interesting for applications to circuit complexity. Combining a result of Hastad-Goldmann with a circuit simulation result of Beigel-Tarui (building on Yao and Allender), any function which can be
computed by a polynomial size, constant depth circuit with AND, NOT and mod m gates
has a number-on-the-forehead protocol with polylog many players and polylog communication.
Needless to say, we currently don’t know of any explicit function hard for this class of circuits (even
just with mod 6 gates).