This is the fourth part of a dialogue between three characters (a mathematical optimist who likes to suggest approaches to problems), 8) (a more sceptical mathematician who knows a small amount of theoretical computer science), and (an onlooker who occasionally asks for more detail, and occasionally makes comments about what and 8) say). They are discussing the possibility of proving lower bounds for computational complexity. wanted at first to prove that PNP, but has now been persuaded that proving a lower bound for formula size is a more realistic target, at least for the time being, and is still a major unsolved problem. In part II of the conversation, proposed trying to show that the dual of the norm of a low-complexity function cannot be too large, at least when has order of magnitude . Numerous problems have been identified with this approach, but the idea has not yet been completely killed off. We return to the conversation as a new topic is introduced. This instalment is shorter than average, so I’ll probably post the next instalment sooner than average.
8) You know, something has been bothering me about our discussion so far.
You’ve made that pretty clear.
8) I’m not talking about the fact that you don’t have any heuristic argument in favour of the dual norm being a useful complexity measure, but about something else that would be equally fundamental to any lower bound proof.
8) Well, you haven’t said how you would go about proving that some function in NP has a large norm.
Oh yes, I’m sorry about that. I think I did briefly allude to that question but I never got round to telling you about my thoughts on it. Let me quickly do that and then we can go back to trying to decide whether this proposed complexity measure stands any chance of being a good one.
Clearly a major decision is what NP function to take, and I’ve got a candidate for that. It’s not the only function of its kind, but it’s the one I like best. It’s based on the following very nice question in complexity theory. It is known, at least if PNP, that finding cliques in graphs is a hard problem. In fact, it is a very hard problem: there is not even a polynomial-time algorithm that will output if the graph contains a clique of size , if it does not contain a clique of size , and anything if the maximum clique size lies between these two bounds. In other words, even finding a very crude approximation to the maximum clique size is hard. (This result is proved using the famous PCP theorem of Arora, Lund, Motwani, Sudan, and Szegedy, which built on earlier work of several other authors.) But there’s a more specific problem that seems to be wide open. Take a random graph and either add a clique of size to it, or don’t. Is there a polynomial-time algorithm that can tell whether you have done so?
If is bigger than then it is an easy exercise to find the clique. (For instance, one can look for vertices of abnormally high degree and work from there.) If is bigger than (the number of vertices in the graph), then more sophisticated spectral methods have been used to find it. (See this paper by Alon, Krivelevich and Sudakov. See also the interesting discussion of the problem in the introduction to a paper of Feige and Krauthgamer.) But if is, say, , then nobody has the faintest idea how to find it, and if is something like , then the problem looks pretty hopeless.
How big do you expect the largest clique to be in a random graph?
Oh, sorry. Well, all that matters is that it is at most . That’s quite easy to show. But in fact it is known to be close to this with high probability.
Why does it look hopelessly hard to detect a clique of size when is substantially smaller than ? Well, a random graph with a small clique added to it looks so like a random graph that it is hard to imagine an algorithm (other than the obvious inefficient brute-force algorithm) that would tell the difference.
So it is reasonable to hypothesize that there is no such algorithm, and to try to exploit the apparent randomness of a random graph with an added small clique.
In some sense, one would like to prove that the set of all random graphs with an added clique is pseudorandom: that is, that one cannot tell the difference between it and the set of all random graphs, which, when you think about it, just means the set of all graphs. But for a very simple reason that is not true in the sense in which I defined pseudorandom sets earlier. Recall that I defined a subset of to be pseudorandom if for every polynomially computable Boolean function , the probability that is almost exactly equal to the probability that given that . And “almost exactly equal to” means “closer to than for any polynomial “. But here we can take to be if the number of edges in the graph is at least and otherwise. Then if is odd (which we could assume for simplicity), the probability that is exactly . But if we’ve added a random clique of size to the graph, then the probability that is bigger than . The difference isn’t all that great, but it’s certainly a power-type difference. (Even adding a single edge will affect the probability by at least .)
What has gone wrong? Well, it is important not to confuse the following two problems.
1. You are given a graph and told that either it has been generated randomly or it has been generated randomly and then a clique of size has been added to it. Is there a polynomial-time algorithm that will tell you, with probability at least , which is the case?
2. You are given a sequence of graphs and told that either they are all generated randomly or they are generated randomly and then enriched by the addition of cliques of size . Can you tell in polynomial time which of the two possibilities holds?
The answer to 2 is trivially yes: you just count the edges, and a small bias will show up if the cliques are added. But what I am interested in is question 1. And counting edges gets you nowhere in that question, since when is substantially smaller than the extra edges you get when you add a clique of size can easily be outweighed by random fluctuations in the number of edges of the random graph.
All right, now that that’s sorted out, let’s consider the Boolean function that is defined on all graphs with vertices. Actually, let me change my notation. I’ll call the number of vertices and I’ll set . Then I can identify Boolean functions on with Boolean functions defined on the set of all graphs with labelled vertices.
How do you do that?
The details aren’t too important. You just define some natural bijection between the numbers and the set of all subsets of of size 2. Then, given any sequence of length , you define to be the graph whose edges are all pairs such that . This is obviously a bijection. Then for any Boolean function on you define a function on all graphs by taking if and if .
Sorry — I now see that it was indeed obvious.
That’s quite OK. Now let’s define to be if contains a clique of size and otherwise. That’s a classic example of a function in NP.
8) Indeed it is. But it’s not remotely quasirandom. For one thing, it takes the value almost all the time. So even its norm is large!
What’s the norm?
It’s not hard to see that the appropriate definition is . It is not in fact a norm but a seminorm (meaning that it can sometimes be zero but otherwise satisfies the usual properties of a norm). It is not too hard to show that increases as increases, so the larger is, the stronger a quasirandomness statement it is to say that is small. So if even the norm is large, is not in any sense quasirandom.
So why am I still smiling? Because this is precisely what I meant when I said that dual norms had other advantages. Even though this function is not quasirandom, we can still try to show that it correlates with a quasirandom function.
For instance, if you’re worried about the negative bias, then you can get rid of it in a very standard way: you replace by the function . This function has norm equal to zero, but which is non-zero. Therefore, .
That’s not a huge result, but it shows, in a trivial way, how proving that dual norms are big can be more feasible than trying to construct functions for which the original norms are small.
Are you saying that once you’ve added a constant to your function , you’ve got something quasirandom that correlates with ?
No, unfortunately not. The reason is that adding a clique of size messes up more than just the expected number of edges of . For instance, it increases the expected number of triangles, and moreover it does so by more than can be accounted for just from the fact that the expected number of edges has increased. That’s because the edges you add to a random graph are bunched up into a clique.
Why is that a problem?
Well, it tells us, for instance, that the average of in the set is bigger than zero by some power of . That will give a slight kick to the norm, since a function with tiny norm must have tiny average on any subspace of small codimension.
What can you hope to do about that?
Well, there are all sorts of tricks one could imagine. I was hoping to make tiny adjustments to so that it became balanced again. For instance, in order to deal with the norm it might be possible just to tinker with the Fourier coefficients that correspond to small subsets of Of course, for higher norms, we don’t have an adequate Fourier analysis (especially when is unbounded!), so we would have to argue more directly.
In the case, one could also give a pretty direct Fourier argument. The idea would be simply to compute the Fourier coefficients of . Recall that if is a subset of , then So measures the effect it has on the expectation of if you know that the parity of the restriction of to is even. Note that in this case is a graph on vertices, so what we are interested in is the effect it has on the probability that a graph contains a clique of size if you condition on the parity of . If is large, then one would expect this effect to be tiny, and even more so if the edges of are “scattered”. But what matters is how tiny, and at the moment I don’t see how to estimate that. But if my hunch is right that the hidden-clique function has a large dual norm, then one would find that it lots of small but non-zero Fourier coefficients, just as for a random function, so that the dual norm is huge.
The point of doing such a computation is that it would be something of a reality check: if the dual norm turned out not to be huge, then that would be bad news for this approach.
If one did manage to show that it was huge, then the next step would be to try to construct a function with very small norm that correlated with , and to do it without using Fourier analysis. That could be pretty challenging, but it also seems to be fairly important, because although there are generalizations of Fourier analysis that work for higher norms, the statements are quite weak for our purposes, especially when we are working in a field of bounded characteristic, and in any case they would not apply to unbounded .
I’m slightly worried about something here.
Oh yes? What’s that?
Well, you seem to be wanting to solve an NP problem, and moreover one that probably isn’t in P.
You mean the problem of finding a function such that is small and is large?
Yes, but the point is that I’m trying to solve just one instance of the problem. I’m not trying to find some all-purpose algorithm for constructing an appropriate given an arbitrary . I’m hoping to make use of the particular nature of the clique-detecting function when I build .
8) In fairness to :|, even though you are right that, strictly speaking, you are not trying to solve a difficult problem in NP, the observation that you are trying to solve an instance of such a problem does place some serious restrictions on what your argument could look like. For instance, if you ever seem to be coming up with a technique for creating that would apply in general, then you’re in deep trouble, at least if it could be done by means of a polynomial-time algorithm, since that would show that estimating the dual norm was in P, which would naturalize your proof technique.
I see what you mean. But that doesn’t completely rule out some general algorithmic approach. For example, perhaps one could come up with an approach for creating a good that made a superpolynomial number of adjustments to . That might even be quite natural when is unbounded, for reasons similar to the reasons that I gave in my heuristic argument that the norm might not be computable in polynomial time, even by a randomized algorithm.
8) OK, another question. Do you have some philosophical reason to expect the clique function to have a large dual norm?
Yes. I gave it earlier. It’s because a random graph with an added clique looks pretty much like a random graph. So I think the clique function is pseudorandom in a certain sense. And that ought to be reflected in a large dual norm.
8) That’s a very vague argument and in any case I have a problem with it. It feels as though you’re confusing the pseudorandomness of the graphs that you get by adding a small clique with the pseudorandomness of the function that takes the value 1 when a graph contains an added clique. For example, if I define to be for every graph, then the function I get is highly non-random — I think you’ll have to admit that.
Either that or as random as it’s possible to get, because it’s an exactly uniform distribution.
8) Well, you can say that if you like, but you have to admit that it does not have a large dual norm. And yet, if you use this to choose a random graph, then the graph you get will be very random — that’s a tautologous statement. So getting random-looking graphs as your output doesn’t guarantee you a large norm.
OK, I need to think about this a bit harder. I think the point is going to be that the example you have just given is essentially the only one. That is, if you have a pseudorandom subset of that is not the whole of , then it must have a large dual norm. At least when that follows from Parseval’s identity: when , the Fourier transform of the characteristic function of is concentrated at one point, the empty set, but if is a set of small density, then this large Fourier coefficient does not make up all the mass of so you have to have a lot of smaller, random Fourier coefficients, the squares of which add up to what you want, which makes the powers add up to something huge.
8) OK, that’s slightly better, but I still have a niggling worry.
8) It’s that I don’t see a good reason for the Fourier calculation working out the way you want it to. After all, the function is defined in a rather concrete way. Why shouldn’t that cause its Fourier coefficients to decay more rapidly than they would for a random function?
I have to admit that at this stage I don’t have a good answer to that question. I’m just hoping.
But let me try to give you some idea of how I thought one might be able to construct a function with small norm and good correlation with .
Recall that our method of hiding a clique in a graph is rather crude: we just take a random graph and add a random clique of size (by randomly choosing vertices and joining all the pairs that are not yet joined). The crudeness of this approach means that, as I pointed out earlier, it does not result in a pseudorandom way of producing graphs. However, I am hoping that it has a property that one might call weak pseudorandomness, which is -pseudorandomness for some that tends to zero, but not at a superpolynomial rate. This weak pseudorandomness means that if you are just given one graph with a clique added, you cannot tell that it isn’t random, but if you are given a sequence of such graphs, then after a while you can tell.
This is not completely relevant to us, because we do not know how to prove the weak pseudorandomness. If we did, we’d have a proof that PNP. So let me get back to the main point.
I propose trying to define, using combinatorial ideas, a new clique-hiding technique that is less crude and feels as though it might give rise to a strong pseudorandom set.
Here’s a very simple idea of the kind of thing I mean. As we’ve seen, just brutally adding a clique to a random graph will increase the expected number of edges. So we could at the same time remove a few edges at random, just to hide that effect. This would give us a slight adjustment to the distribution, but it ought to correlate pretty well with it. Unfortunately, a graph produced in this slightly more sophisticated way will still have too many triangles (since adding the clique adds quite a few triangles, but removing random edges doesn’t remove quite so many). Perhaps a cleverer idea would be to add a random clique of size but at the same time add a random independent set of size (by removing a few edges elsewhere). Oh, but that doesn’t quite work, because increasing the number of empty triangles is not quite the same as decreasing the number of full triangles. So instead, perhaps one could do the following. Take a random graph. Add a random clique of size . Weight each edge according to how many triangles it belongs to. Remove random edges, with probabilities depending on these weights, organizing things so that the expected number of edges and triangles you remove is the same as the expected number of edges and triangles you added when you put in the random clique.
This just scratches the surface of the kind of thing one could do. The aim would be to come up with a super-clever model of random graphs with added cliques, where one tries much harder to “hide” the cliques, and then to prove that the resulting probability distribution (which can be thought of as a function on the set of all graphs) has high dual norm. Note that this new model would not even be a Boolean function, so one cannot easily talk about its belonging to any complexity class. This demonstrates one of the advantages of duality that I was talking about earlier: I can try to devise a function that looks highly random, and I do not have the strong constraint that it should be a Boolean function in NP. It just has to correlate with one.
Another quick remark that I’d like to make is that it might be an advantage to make the cliques we hide in a random graph smaller. Perhaps if they are only just bigger than , the size you expect in the random case, then the set of all graphs with cliques hidden inside them would have positive density. That might avoid certain technical problems.
8) OK, I think I’m getting some idea of what you’re on about here. It looks like a very tough programme to carry out, and I still think you should check whether it is even true for , but at the moment I can’t think of a way of showing that it is bound not to work.
I see the fact that it’s a tough programme to carry out as a positive feature: if it looked as though it was going to be easy, then I’d be very suspicious. And I haven’t forgotten your other objection, when you accused me of just drawing the dual norm out of a hat without any reason to suppose that it is small for functions of low complexity. So I’m not getting excited. I just think that the approach should be thought about further.