This is the fourth part of a dialogue between three characters
(a mathematical optimist who likes to suggest approaches to problems), 8) (a more sceptical mathematician who knows a small amount of theoretical computer science), and
(an onlooker who occasionally asks for more detail, and occasionally makes comments about what
and 8) say). They are discussing the possibility of proving lower bounds for computational complexity.
wanted at first to prove that PNP, but has now been persuaded that proving a lower bound for formula size is a more realistic target, at least for the time being, and is still a major unsolved problem. In part II of the conversation,
proposed trying to show that the dual of the norm of a low-complexity function cannot be too large, at least when
has order of magnitude
. Numerous problems have been identified with this approach, but the idea has not yet been completely killed off. We return to the conversation as a new topic is introduced. This instalment is shorter than average, so I’ll probably post the next instalment sooner than average.
*****************************************************
8) You know, something has been bothering me about our discussion so far.
You’ve made that pretty clear.
8) I’m not talking about the fact that you don’t have any heuristic argument in favour of the dual norm being a useful complexity measure, but about something else that would be equally fundamental to any lower bound proof.
What’s that?
8) Well, you haven’t said how you would go about proving that some function in NP has a large norm.
Oh yes, I’m sorry about that. I think I did briefly allude to that question but I never got round to telling you about my thoughts on it. Let me quickly do that and then we can go back to trying to decide whether this proposed complexity measure stands any chance of being a good one.
Clearly a major decision is what NP function to take, and I’ve got a candidate for that. It’s not the only function of its kind, but it’s the one I like best. It’s based on the following very nice question in complexity theory. It is known, at least if PNP, that finding cliques in graphs is a hard problem. In fact, it is a very hard problem: there is not even a polynomial-time algorithm that will output
if the graph contains a clique of size
,
if it does not contain a clique of size
, and anything if the maximum clique size lies between these two bounds. In other words, even finding a very crude approximation to the maximum clique size is hard. (This result is proved using the famous PCP theorem of Arora, Lund, Motwani, Sudan, and Szegedy, which built on earlier work of several other authors.) But there’s a more specific problem that seems to be wide open. Take a random graph and either add a clique of size
to it, or don’t. Is there a polynomial-time algorithm that can tell whether you have done so?
If is bigger than
then it is an easy exercise to find the clique. (For instance, one can look for vertices of abnormally high degree and work from there.) If
is bigger than
(the number of vertices in the graph), then more sophisticated spectral methods have been used to find it. (See this paper by Alon, Krivelevich and Sudakov. See also the interesting discussion of the problem in the introduction to a paper of Feige and Krauthgamer.) But if
is, say,
, then nobody has the faintest idea how to find it, and if
is something like
, then the problem looks pretty hopeless.
How big do you expect the largest clique to be in a random graph?
Oh, sorry. Well, all that matters is that it is at most . That’s quite easy to show. But in fact it is known to be close to this with high probability.
Why does it look hopelessly hard to detect a clique of size when
is substantially smaller than
? Well, a random graph with a small clique added to it looks so like a random graph that it is hard to imagine an algorithm (other than the obvious inefficient brute-force algorithm) that would tell the difference.
So it is reasonable to hypothesize that there is no such algorithm, and to try to exploit the apparent randomness of a random graph with an added small clique.
In some sense, one would like to prove that the set of all random graphs with an added clique is pseudorandom: that is, that one cannot tell the difference between it and the set of all random graphs, which, when you think about it, just means the set of all graphs. But for a very simple reason that is not true in the sense in which I defined pseudorandom sets earlier. Recall that I defined a subset of
to be pseudorandom if for every polynomially computable Boolean function
, the probability that
is almost exactly equal to the probability that
given that
. And “almost exactly equal to” means “closer to than
for any polynomial
“. But here we can take
to be
if the number of edges in the graph is at least
and
otherwise. Then if
is odd (which we could assume for simplicity), the probability that
is exactly
. But if we’ve added a random clique of size
to the graph, then the probability that
is bigger than
. The difference isn’t all that great, but it’s certainly a power-type difference. (Even adding a single edge will affect the probability by at least
.)
What has gone wrong? Well, it is important not to confuse the following two problems.
1. You are given a graph and told that either it has been generated randomly or it has been generated randomly and then a clique of size has been added to it. Is there a polynomial-time algorithm that will tell you, with probability at least
, which is the case?
2. You are given a sequence of graphs and told that either they are all generated randomly or they are generated randomly and then enriched by the addition of cliques of size . Can you tell in polynomial time which of the two possibilities holds?
The answer to 2 is trivially yes: you just count the edges, and a small bias will show up if the cliques are added. But what I am interested in is question 1. And counting edges gets you nowhere in that question, since when is substantially smaller than
the extra edges you get when you add a clique of size
can easily be outweighed by random fluctuations in the number of edges of the random graph.
All right, now that that’s sorted out, let’s consider the Boolean function that is defined on all graphs with
vertices. Actually, let me change my notation. I’ll call the number of vertices
and I’ll set
. Then I can identify Boolean functions on
with Boolean functions defined on the set of all graphs with
labelled vertices.
How do you do that?
The details aren’t too important. You just define some natural bijection between the numbers
and the set of all subsets of
of size 2. Then, given any
sequence
of length
, you define
to be the graph whose edges are all pairs
such that
. This is obviously a bijection. Then for any Boolean function
on
you define a function
on all graphs
by taking
if
and
if
.
Sorry — I now see that it was indeed obvious.
That’s quite OK. Now let’s define to be
if
contains a clique of size
and
otherwise. That’s a classic example of a function in NP.
8) Indeed it is. But it’s not remotely quasirandom. For one thing, it takes the value almost all the time. So even its
norm is large!
What’s the norm?
It’s not hard to see that the appropriate definition is . It is not in fact a norm but a seminorm (meaning that it can sometimes be zero but otherwise satisfies the usual properties of a norm). It is not too hard to show that
increases as
increases, so the larger
is, the stronger a quasirandomness statement it is to say that
is small. So if even the
norm is large,
is not in any sense quasirandom.
So why am I still smiling? Because this is precisely what I meant when I said that dual norms had other advantages. Even though this function is not quasirandom, we can still try to show that it correlates with a quasirandom function.
For instance, if you’re worried about the negative bias, then you can get rid of it in a very standard way: you replace by the function
. This function has
norm equal to zero, but
which is non-zero. Therefore,
.
That’s not a huge result, but it shows, in a trivial way, how proving that dual norms are big can be more feasible than trying to construct functions for which the original norms are small.
Are you saying that once you’ve added a constant to your function , you’ve got something quasirandom that correlates with
?
No, unfortunately not. The reason is that adding a clique of size messes up more than just the expected number of edges of
. For instance, it increases the expected number of triangles, and moreover it does so by more than can be accounted for just from the fact that the expected number of edges has increased. That’s because the edges you add to a random graph are bunched up into a clique.
Why is that a problem?
Well, it tells us, for instance, that the average of in the set
is bigger than zero by some power of
. That will give a slight kick to the
norm, since a function with tiny
norm must have tiny average on any subspace of small codimension.
What can you hope to do about that?
Well, there are all sorts of tricks one could imagine. I was hoping to make tiny adjustments to so that it became balanced again. For instance, in order to deal with the
norm it might be possible just to tinker with the Fourier coefficients that correspond to small subsets of
Of course, for higher
norms, we don’t have an adequate Fourier analysis (especially when
is unbounded!), so we would have to argue more directly.
In the case, one could also give a pretty direct Fourier argument. The idea would be simply to compute the Fourier coefficients of
. Recall that if
is a subset of
, then
So
measures the effect it has on the expectation of
if you know that the parity of the restriction of
to
is even. Note that in this case
is a graph on
vertices, so what we are interested in is the effect it has on the probability that a graph
contains a clique of size
if you condition on the parity of
. If
is large, then one would expect this effect to be tiny, and even more so if the edges of
are “scattered”. But what matters is how tiny, and at the moment I don’t see how to estimate that. But if my hunch is right that the hidden-clique function has a large dual norm, then one would find that it lots of small but non-zero Fourier coefficients, just as for a random function, so that the
dual norm is huge.
The point of doing such a computation is that it would be something of a reality check: if the dual norm turned out not to be huge, then that would be bad news for this approach.
If one did manage to show that it was huge, then the next step would be to try to construct a function with very small
norm that correlated with
, and to do it without using Fourier analysis. That could be pretty challenging, but it also seems to be fairly important, because although there are generalizations of Fourier analysis that work for higher
norms, the statements are quite weak for our purposes, especially when we are working in a field of bounded characteristic, and in any case they would not apply to unbounded
.
I’m slightly worried about something here.
Oh yes? What’s that?
Well, you seem to be wanting to solve an NP problem, and moreover one that probably isn’t in P.
You mean the problem of finding a function such that
is small and
is large?
Yes.
Yes, but the point is that I’m trying to solve just one instance of the problem. I’m not trying to find some all-purpose algorithm for constructing an appropriate given an arbitrary
. I’m hoping to make use of the particular nature of the clique-detecting function when I build
.
8) In fairness to
, even though you are right that, strictly speaking, you are not trying to solve a difficult problem in NP, the observation that you are trying to solve an instance of such a problem does place some serious restrictions on what your argument could look like. For instance, if you ever seem to be coming up with a technique for creating that would apply in general, then you’re in deep trouble, at least if it could be done by means of a polynomial-time algorithm, since that would show that estimating the
dual norm was in P, which would naturalize your proof technique.
I see what you mean. But that doesn’t completely rule out some general algorithmic approach. For example, perhaps one could come up with an approach for creating a good that made a superpolynomial number of adjustments to
. That might even be quite natural when
is unbounded, for reasons similar to the reasons that I gave in my heuristic argument that the
norm might not be computable in polynomial time, even by a randomized algorithm.
8) OK, another question. Do you have some philosophical reason to expect the clique function to have a large dual norm?
Yes. I gave it earlier. It’s because a random graph with an added clique looks pretty much like a random graph. So I think the clique function is pseudorandom in a certain sense. And that ought to be reflected in a large dual norm.
8) That’s a very vague argument and in any case I have a problem with it. It feels as though you’re confusing the pseudorandomness of the graphs that you get by adding a small clique with the pseudorandomness of the function that takes the value 1 when a graph contains an added clique. For example, if I define to be
for every graph, then the function I get is highly non-random — I think you’ll have to admit that.
Either that or as random as it’s possible to get, because it’s an exactly uniform distribution.
8) Well, you can say that if you like, but you have to admit that it does not have a large dual norm. And yet, if you use this
to choose a random graph, then the graph you get will be very random — that’s a tautologous statement. So getting random-looking graphs as your output doesn’t guarantee you a large
norm.
OK, I need to think about this a bit harder. I think the point is going to be that the example you have just given is essentially the only one. That is, if you have a pseudorandom subset of
that is not the whole of
, then it must have a large
dual norm. At least when
that follows from Parseval’s identity: when
, the Fourier transform of the characteristic function of
is concentrated at one point, the empty set, but if
is a set of small density, then this large Fourier coefficient does not make up all the
mass of
so you have to have a lot of smaller, random Fourier coefficients, the squares of which add up to what you want, which makes the
powers add up to something huge.
8) OK, that’s slightly better, but I still have a niggling worry.
What’s that?
8) It’s that I don’t see a good reason for the Fourier calculation working out the way you want it to. After all, the function is defined in a rather concrete way. Why shouldn’t that cause its Fourier coefficients to decay more rapidly than they would for a random function?
I have to admit that at this stage I don’t have a good answer to that question. I’m just hoping.
But let me try to give you some idea of how I thought one might be able to construct a function with small
norm and good correlation with
.
Recall that our method of hiding a clique in a graph is rather crude: we just take a random graph and add a random clique of size (by randomly choosing
vertices and joining all the pairs that are not yet joined). The crudeness of this approach means that, as I pointed out earlier, it does not result in a pseudorandom way of producing graphs. However, I am hoping that it has a property that one might call weak pseudorandomness, which is
-pseudorandomness for some
that tends to zero, but not at a superpolynomial rate. This weak pseudorandomness means that if you are just given one graph with a clique added, you cannot tell that it isn’t random, but if you are given a sequence of such graphs, then after a while you can tell.
This is not completely relevant to us, because we do not know how to prove the weak pseudorandomness. If we did, we’d have a proof that PNP. So let me get back to the main point.
I propose trying to define, using combinatorial ideas, a new clique-hiding technique that is less crude and feels as though it might give rise to a strong pseudorandom set.
Here’s a very simple idea of the kind of thing I mean. As we’ve seen, just brutally adding a clique to a random graph will increase the expected number of edges. So we could at the same time remove a few edges at random, just to hide that effect. This would give us a slight adjustment to the distribution, but it ought to correlate pretty well with it. Unfortunately, a graph produced in this slightly more sophisticated way will still have too many triangles (since adding the clique adds quite a few triangles, but removing random edges doesn’t remove quite so many). Perhaps a cleverer idea would be to add a random clique of size but at the same time add a random independent set of size
(by removing a few edges elsewhere). Oh, but that doesn’t quite work, because increasing the number of empty triangles is not quite the same as decreasing the number of full triangles. So instead, perhaps one could do the following. Take a random graph. Add a random clique of size
. Weight each edge according to how many triangles it belongs to. Remove random edges, with probabilities depending on these weights, organizing things so that the expected number of edges and triangles you remove is the same as the expected number of edges and triangles you added when you put in the random clique.
This just scratches the surface of the kind of thing one could do. The aim would be to come up with a super-clever model of random graphs with added cliques, where one tries much harder to “hide” the cliques, and then to prove that the resulting probability distribution (which can be thought of as a function on the set of all graphs) has high dual norm. Note that this new model would not even be a Boolean function, so one cannot easily talk about its belonging to any complexity class. This demonstrates one of the advantages of duality that I was talking about earlier: I can try to devise a function that looks highly random, and I do not have the strong constraint that it should be a Boolean function in NP. It just has to correlate with one.
Another quick remark that I’d like to make is that it might be an advantage to make the cliques we hide in a random graph smaller. Perhaps if they are only just bigger than , the size you expect in the random case, then the set of all graphs with cliques hidden inside them would have positive density. That might avoid certain technical problems.
8) OK, I think I’m getting some idea of what you’re on about here. It looks like a very tough programme to carry out, and I still think you should check whether it is even true for , but at the moment I can’t think of a way of showing that it is bound not to work.
I see the fact that it’s a tough programme to carry out as a positive feature: if it looked as though it was going to be easy, then I’d be very suspicious. And I haven’t forgotten your other objection, when you accused me of just drawing the dual norm out of a hat without any reason to suppose that it is small for functions of low complexity. So I’m not getting excited. I just think that the approach should be thought about further.
October 7, 2009 at 3:57 pm |
Amin Coja-Oghlan’s recentish paper –
http://www.matheon.com/preprints/2713_jfind4.pdf
– also has a nice survey of the state of the art, including his own work showing easiness results (for
) and hardness results (for
) even in the semirandom model, where an adversary can delete any edges it wants from a random graph + planted
(except for edges from the planted clique, of course).
On a more philosophical note… You can of course find a clique of size
in
time, which is only quasipolynomial in
if
is logarithmic (or polylogarithmic). Indeed, as Coja-Oghlan points out, Alon-Krivelevich-Sudakov can find a planted
-clique in a random graph in time
for any
.
For the purposes of lower bounds, one often (but not always) does not make a big distinction between polynomial and quasipolynomial time/size. (We don't believe SAT is solvable even in
time, let alone
time.)
Are the characters in the dialogue suggesting proving a lower bound for problems which are solvable by quasipolynomial size circuits? Or would the optimist
hope to separate NP from quasipolynomial-size circuits as well?
October 7, 2009 at 5:18 pm
That’s a good question. Ideally,
would want to prove that P
NP, but by this stage of the conversation anything new and exciting would do. However, there seems to be little prospect of that (even by the end of the tenth instalment).
I still think there might be a more realistic Polymath project in there somewhere. Perhaps when I get to the end I’ll collect together all the questions that seem interesting and replace the old proposal 5 by a set of proposals 5a to 5d (or whatever letter I get up to).
And thanks for drawing my attention to that paper — I look forward to checking it out.
October 7, 2009 at 10:08 pm
Sorry, I didn’t really answer your question properly. What I mean is that
would be ecstatic to find a quasipolynomial function that didn’t have a polynomial-time algorithm, even if finding a superquasipolynomial lower bound for an NP function would be more intellectually satisfying.
I presume the
algorithm you refer to is based on showing that if you know there’s a clique of size
and you find all the cliques of size
then almost certainly their union (or something pretty close to their union) is the clique of size
Is that roughly the idea? If you changed the probability so that the expected size of the biggest clique was
say, would that make it much harder to find a clique of size
?
October 8, 2009 at 3:49 am
Slightly related (But in the AC_0 shallower water): I think it is a very interesting problem to show that the Boolean function f describing the property: “a graph on n vertices has a clique of size 2 log n” cannot be approximated by a function in AC_0 (a bounded depth polynomial size Boolean circuit). It is known that f itself is not in AC_0 (Paul Beame was the first to show it I think). Of course depth 2 suffices if you allow quasi polynomial circuits. (Actually I am not aware of any function computable in quasi-polynomial size bounded depth circuits where it is known that the function cannot be approximated by a function in AC_0.)
October 8, 2009 at 12:57 pm |
I wanted to mention an approach using dual norms to show lower
bounds in the model of communication complexity. Maybe the analogy is helpful.
I am thinking the multiparty number-on-the-forehead
where each
is a n-bit string. In what follows
or
. Player
receives as
except
(which is “written on his forehead”). The question is
model of communication complexity. In this model k-players wish to evaluate
a Boolean function
think of the output of f as being
input all of the
how many bits the players need to communicate in order to evaluate the function.
It is usually assumed that the communication is broadcast and known to all players.
So the required communication will be a number between 0 and n.
The norm that has been used in this setting is quite similar in spirit to the
norm. The “k”
version of the norm is applied to a k-argument function f and is defined as follows.
Babai, Nisan, and Szegedy showed that if a function has small such norm, then it requires a lot
.
of communication. To show that f is hard, it also suffices to find some g such that g correlates with
f and g has small norm—in other words, one can lower bound the dual norm
More precisely, the way this is applied to the communication problem is that

is a normalization factor I have hopefully gotten correct, and
is the k-party communication complexity.
share a 1, is an example of a function where the
where
The “set intersection” function, determining if there is a position where
all the
norm is relatively large, yet one can still show a strong lower bound via the dual norm approach.
The main drawback with this technique is that it gives lower bounds of
, and so becomes trivial for
many players. This is where
size at most
things start to get most interesting for applications to circuit complexity. Combining a result of Hastad-Goldmann with a circuit simulation result of Beigel-Tarui (building on Yao and Allender), any function which can be
computed by a polynomial size, constant depth circuit with AND, NOT and mod m gates
has a number-on-the-forehead protocol with polylog many players and polylog communication.
Needless to say, we currently don’t know of any explicit function hard for this class of circuits (even
just with mod 6 gates).