In my previous post, I suggested several possible Polymath projects. One of them, which initially seemed to be the favourite (and was my own favourite) was to have a look at an idea I had had for getting round the Razborov-Rudich obstacle to proving lower bounds in computational complexity. (This is closely related to the P versus NP problem, though my idea did not obviously apply to that problem itself.) However, now I am fairly sure I have understood why it cannot work: somehow, even if you come up with an idea for a proof that is not directly ruled out by their result, you find that slightly more complicated arguments, based on their idea, do eventually show that it doesn’t work.
While thinking about my approach, I wrote what was intended to be a series of five initial posts for the Polymath project if it happened. I thought I would experiment, so instead of explaining the basic concepts in the usual way I wrote a dialogue, which I hoped would be easier to read. It is between three characters, who are denoted by smileys (or is that smilies? no, perhaps not). The main character,🙂, is a mathematical optimist, continually becoming enthusiastic about ideas that have not been fully thought through. The next character, 8) , is much more sceptical, and also knows more about computational complexity (a lot less than an expert in the area, but enough to merit a “cool” smiley). Finally,😐 is a neutral observer who occasionally asks for fuller explanations when🙂 or 8) seem to be omitting them, and sometimes makes easy but useful mathematical observations.
To some extent these characters, particularly the first two, represent different sides of me as I think about the problem. However, I do not always agree with what they say (sometimes this is obvious, since they do not always agree with each other), and I sometimes allow them to get things wrong, or to think that they have had an idea when in fact that idea is already known. For example, towards the end of this first part of the conversation,🙂 has the idea of applying the insights of Razborov and Rudich to other computational models. Not surprisingly,🙂 was not the first to think of doing that, as an internet search later revealed.
Since I wrote the post about Polymath projects, the conversation has expanded and is now in seven parts of about 5000 words each. Be warned in advance that pretty well every idea that🙂 has is eventually doomed to fail. But I hope that non-experts will find the failure instructive even if experts just find it wearily familiar.
I also hope that some of the later parts of the conversation might lead to fruitful exchanges of comments even if they don’t lead to a proper Polymath research project. For instance, at least at the time of writing, I don’t have completely satisfactory general explanations for why some of🙂’s approaches fail. And some of the heuristic arguments (of which there are many) presented here are open to criticism. (Another warning: some of the early arguments are definitely wrong, even when the three characters appear to take them seriously, and their wrongness is pointed out only quite a bit later. It may be a little unconventional to explain mathematics by putting forward incorrect ideas and correcting them only after their consequences have been explored in detail, but there are plenty of conventional accounts of computational complexity for those who want them.)
For these kinds of reasons, it seems to me that I might as well post the dialogue rather than letting it go to waste. It’s even conceivable that somebody will be prompted by the ideas here to have better ideas, in which case maybe something serious could come out of it after all. I plan to post the seven parts (or more if I continue to add to them) of the dialogue at a rate of about one a week, to keep this blog ticking over until I am ready to start a Polymath project on something else. The first part is very much an introduction.
🙂 Guess what I’ve been thinking about recently?
8) The Riemann hypothesis?
🙂 Very funny. But actually you’re not far off. I’ve been wondering about the P versus NP problem.
8) Wonderful. Join the club. But I should warn you that it drives people mad.
🙂 I can believe that, as it’s driving me a bit mad actually.
8) Well, it’s possible that I may be able to put you out of your misery by showing that whatever you are trying cannot work. That would replace madness by mere disappointment.
🙂 I suppose if I end up sadder but wiser, then at least I’m wiser.
8) Wow, you’re definitely a half-full type of person aren’t you?
🙂 I am actually, not just in real life but in mathematical research as well.
😐 Would someone mind explaining what the P versus NP problem is?
🙂 Oh, sorry. Well, P is the class of computational problems that can be solved in polynomial time. For example, multiplying two -digit numbers can be done in roughly steps (or fewer if you use clever methods) and is therefore a problem in P. It is a widely accepted model for the class of problems that we can solve on a computer without taking unacceptably long. As for NP, it is the class of problems with the property that a solution can be verified in polynomial time. For example, the problem of determining whether a graph with vertices contains a clique of size is in NP because if somebody claims that a particular set of vertices is a clique, then this can easily be checked in polynomial time (simply by looking at every pair of vertices in and checking that they form an edge).
If you want to know more, I suggest looking at the Wikipedia article on the topic, which is OK. I don’t want to say too much more myself, other than that it is widely considered to be one of the most important problems in mathematics. That’s because what I’ve really been thinking about is a purely mathematical problem that is “morally equivalent” to the P versus NP problem.
8) Which formulation are you talking about?
🙂 I’m talking about finding superpolynomial lower bounds for the circuit complexity of an NP problem. In particular, I’m talking about the following formulation. Let’s define a Boolean function to be a function from to . (Usually the range is , but not always, and for the ideas I shall be looking at it is more convenient to take .) Let us say that a Boolean function is basic if it is of the form for some . (The basic functions are often called the Rademacher functions.) And let us define the Boolean operations to be the operations , , and , where and take the pointwise maximum and minimum, respectively, of and .
A straight-line computation of a Boolean function is then a sequence such that each member of the sequence is either a basic function or is created from one or two earlier functions using a Boolean operation. A Boolean function is said to have circuit complexity at most if there is a straight-line computation of of length at most .
It is not too hard to show that if there exists a Boolean function in NP (which means that you encode an NP problem in an obvious way and set to be 1 if the answer is yes for and otherwise) that has superpolynomial circuit complexity, then P does not equal NP.
8) Sounds OK to me.
😐 You may know all about that, but I find it an interesting formulation, since it makes no mention of non-mathematical concepts such as “algorithm”. Somehow, the problem feels more approachable when you put it like this.
8) It does until you try to solve the reformulated problem …
🙂 Well, let me tell you my idea for solving the reformulated problem. It’s an obvious idea really.
8) I’m suspicious already. If it’s an obvious idea, then it’s almost certainly known not to work.
🙂 Well can I try it out anyway? The thought was that one could try to define a function (the letter standing for “complexity”) with the following properties. (i) for every basic function . (ii) for every . (iii) If and are small, then so are and . (iv) is very large for some function that belongs to NP. I admit that I have just been a bit vague about property (iii), but the idea would be to use it to give an inductive argument that is small for every function of polynomial circuit complexity, and hence that does not have polynomial circuit complexity.
8) Have you actually got any candidates for such a function ?
🙂 Well, I haven’t yet thought of one that works, but the interesting thing is that my attempts so far have failed for two very different reasons.
8) This sounds familiar, but let’s hear what these attempts were.
🙂 Well, a thought that occurred to me was that maybe functions of low circuit complexity have some kind of structural property that shows that they are not truly random, whereas some function in NP might lack that property. So one would like to have some measure of pseudorandomness and show that the measure was small for functions of low circuit complexity but large for at least one function in NP. In additive combinatorics, there are many measures of pseudorandomness, but let me mention just one: we could define to be the reciprocal of the modulus of the largest Fourier coefficient of , since having no large Fourier coefficients is a well-known measure of quasirandomness (as it would be called in this context).
But this doesn’t work. By Parseval’s identity, the norm of a function is equal to the norm of its Fourier transform. A Boolean function has norm 1 (since its modulus is 1 everywhere), so the sum of the squares of the moduli of the Fourier coefficients is 1. It follows that at least one Fourier coefficient has modulus at least . But there are polynomially computable Boolean functions for which this bound is attained: I’m fairly sure an example is the function , where . Thus, there is no hope of using this particular measure of quasirandomness to distinguish between functions in P and functions in NP.
8) You said you ran into another difficulty. Can you explain what that was?
🙂 Well, many of my attempts failed for similar reasons to the attempt above. So then I started to wonder whether there existed a function with anything like the properties I wanted. And it occurred to me that there does, but for a trivial reason. I define to be the maximal function such that for every basic function , for every , and and are always at most .
We can say a little bit about this function. Suppose that has circuit complexity at most . Then an easy inductive argument shows that is at most …
8) Can I interrupt at this point?
8) You are defining something very similar to what is known as a formal complexity measure, and the correct version of what you are about to say is that is trivially at most the formula-size complexity of rather than its circuit complexity.
😐 What is formula-size complexity?
8) If I’m forced to explain it in🙂’s terminology, then I define it as the number of basic functions (with multiplicity) you need in a formula that builds out of basic functions using the Boolean operations. For example, writing for the function , the function has formula-size complexity at most 4, since I’ve just written it using four basic functions.
Anyhow, the formula-size complexity of a function can be bigger than its circuit complexity, and probably it can be much bigger. Therefore, using a formal complexity measure is not that great as a method for proving that PNP.
🙂 Oh no. That’s stupid of me. But it’s still an interesting problem to prove lower bounds for formula-size complexity isn’t it?
8) Very much so. The best known lower bound for the formula-size complexity of a function in NP is . Any improvement on that would be very interesting, and a superpolynomial lower bound would be a big big result.
🙂 Yes, but not a million-dollar result.
8) I wouldn’t worry about that if I were you. Isn’t lasting fame in the theoretical computer science community good enough for you?
🙂 Er … well, it would be good, but if I could have a million dollars as well then I wouldn’t say no. But actually it’s not really that. It’s that showing that PNP is the big one. I wouldn’t have to explain to colleagues in other subjects what I had done. The formula-size problem is more of interest to specialists.
8) That’s true I suppose. But it’s still quite an easy problem to explain to non-specialists. Can we stop talking about this and get back to the mathematics?
🙂 All right, I’ll talk about formula size for now, and leave for later the question of whether any of this would apply to circuit complexity.
Let’s quickly prove that if we set to be the formula-size complexity of , then we get the largest possible formal complexity measure. First, it is trivial to verify that is a formal complexity measure. Now suppose that is another one and let be a function of formula-size complexity . Then either is plus or minus a basic function, in which case and are both 1, or the smallest formula for writes as or for two functions and of formula size and with . But by induction and ; since is a formal complexity measure it follows that is at most .
So, if some specific function has superpolynomial formula size, then there is a formal complexity measure that is big for and small for all functions of small formula size. But that is a completely uninteresting result, since the formal complexity measure in question is the formula size.
So the problem I have is this. All the formal complexity measures I can think of fail for one of the following two reasons. Either is big even for some easily computable functions, so showing that is big doesn’t prove that is hard to compute, or is small for all low-complexity functions, but for trivial reasons, so showing that is big is trivially equivalent to showing that is hard to compute. There doesn’t seem to be anything in between.
8) Very succinctly put. Did you know that there are rigorous results that show that the kinds of difficulties you are encountering are not just due to your lack of imagination but are actually essential features of the problem?
🙂 I had heard something of the kind, but I’d love to hear more.
8) Well, the most famous result of this kind, and the one that best explains your particular difficulties (other people have had different difficulties that can be explained in other ways) is due to Razborov and Rudich. They defined a notion that they call a natural proof, and they showed that, subject to the truth of certain unproved conjectures that almost everybody believes, there cannot be a natural proof that PNP.
To explain their idea, let me say first what a pseudorandom subset of is, where is some finite set. Let and let be some class of functions from to
Since there are many levels of the set-theoretic hierarchy involved here, let me quickly clarify where appears. Each function can be thought of as some attempt to recognise elements of The very best one could hope for would be for to be the characteristic function of but we might well be happy if for every and with probability at most 1/2 if is chosen purely randomly. At any rate, if then is a function from to So is a property of Boolean functions that are defined on
Anyhow, we shall say that is –pseudorandom for if
for every function . Loosely speaking, one could say that functions in “cannot tell the difference” between random elements of and random elements of . A case of particular interest is when is the set of all functions of polynomial circuit complexity (or, since that doesn’t quite make sense, let’s say of circuit complexity at most , where is the cardinality of ), and when is some function that tends to zero faster than the reciprocal of any polynomial. Then we say to ourselves that no polynomial time algorithm can tell the difference between a random element of and a random element of .
😐 A small question. Couldn’t we just let be the set and let ? I’d find that less confusing somehow.
8) That’s a better question than you think. As it happens, I very deliberately didn’t do that, and I believe that by the end of my discussion you will be less confused than if I had.
In fact, let me tell you straight away why I went for a general set . It’s because I’m actually going to use this concept in the case where is the set , so is not but . In other words, there are even more levels of the set-theoretic hierarchy involved than I was letting on just now. If you want to get your head round this, it’s best not to think of as having anything to do with the number . Just think of it as some group with elements. It so happens that is a power of 2, that the group is , and that we don’t care all that much about its group structure, but put all those thoughts out of your mind and think about functions defined on a finite group that take the values .
Now the thought that underlies the work of Razborov and Rudich is one that you have probably had yourself: it seems to be extraordinarily hard to tell the difference between a random Boolean function of low complexity and a purely random Boolean function. In fact, so much does this seem to be the case, that one might even hazard a guess that some theorem along the following lines is true: no polynomial-time algorithm can tell the difference between a random Boolean function of low complexity and a purely random Boolean function.
What does this mean? In order to stick to our resolution not to think about , let us write for . Then a random Boolean function is a random function from to , while a random low-complexity Boolean function is a function from to that is built out of basic functions in a certain way. (More precisely, it comes at the end of a shortish sequence of functions, each of which is obtained by applying a random Boolean operation to some random earlier functions in the sequence, according to some suitable probability distribution — let’s not worry about the details for now.)
Now it could be that the reason that the random low-complexity functions are hard to distinguish from purely random functions is that the random low-complexity functions form a pseudorandom subset of the set of all Boolean functions on .
Here, it helps to visualize the elements of in a line, so a Boolean function on is just a string of length . We then pick out of these functions and call them basic, and a low-complexity Boolean function is a function that can be built out of basic functions in a particularly simple way. What does it mean to say that the low-complexity Boolean functions form a pseudorandom subset? It means that if is any function from to that can be computed in polynomial time (as a function of of course — that’s the parameter we care about here), then the probability that given that has low complexity is almost exactly equal to the probability that .
😐 One small remark. The function isn’t a Boolean function according to the definition we have been using. But of course we can interpret it as one by identifying with in an obvious way. That makes sense of the statement that can be computed in polynomial time.
8) You’re right, and that’s an argument in favour of making all our Boolean functions take values in .
🙂 Granted, but for the kinds of thoughts I was having it was more natural to have functions that typically averaged . For example, when discussing Fourier analysis or more general quasirandomness norms it’s more convenient to use -valued functions.
8) Let’s just all agree that we can switch from one notation to the other as we see fit. Anyhow, the point is that if you buy the assertion that a random low-complexity function is polynomially indistinguishable from a purely random Boolean function, then you can kiss goodbye to any thoughts of an easy proof that PNP.
🙂 That sounds sort of plausible, but can you spell it out?
8) Certainly. Let me put the argument a bit loosely though. If you want to prove that PNP, then an obvious approach would be to find some “simplicity property” that all functions in P have, and some function in NP does not have. That property will be a property of Boolean functions on , or equivalently of elements of . Now let’s suppose that the property is something that can be calculated in polynomial time. Then every random low-complexity function will have the property (by hypothesis), and since the property does not distinguish between random low-complexity functions and purely random functions, it follows that almost every function has the property. To put it loosely, random functions have the property too.
Now that’s a bit strange, since the property was supposed to be one that applied to “simple” functions, and for any sensible definition of “simple” one would want a random function to be simple with only very tiny probability. But this argument shows that you can’t have that. If you define some notion of simplicity in the hope of proving that functions in P are simple and some function in NP is not simple, then either your notion of simple will not be computable in polynomial time (in the size of the group on which your functions are defined) or it will apply to almost all functions.
🙂 Wow. I can see that that rules out a lot of the ideas I had been having. For example, the Fourier transform of a function on the group can be calculated in time that is polynomial in , and almost all functions have no large Fourier coefficient, so it was guaranteed in advance that the property “has no large Fourier coefficient” would apply not just to some low-complexity functions but in fact to almost all low-complexity functions.
8) Actually, I must admit that I oversimplified a bit. What Razborov and Rudich actually showed was not that the low-complexity functions form a pseudorandom subset of the set of all Boolean functions, but rather that it is possible to find a probability distribution that is concentrated on the low-complexity functions such that a random function chosen from this probability distribution cannot be distinguished by a polynomial-time algorithm from a purely random Boolean function. To oversimplify again, but much less, they found a clever subset of the set of all low-complexity functions rather than taking all of them. Also, and importantly, in order to show this they relied on an unproved hypothesis that asserts that certain kinds of pseudorandom generators exist. But this hypothesis, which is actually stronger than the hypothesis that PNP, is widely believed, and if it is true then the argument above still goes through.
🙂 I’m going to have to go away and think about this.
😐 Meanwhile, can this paper be found online?
😐 Thanks very much. I’ll go and have a look at those. By the way, where does the term “natural proof” come from?
8) Oh sorry, I forgot to say. Well, Razborov and Rudich call a proof that PNP natural if it defines some property of simplicity with the following features:
(i) all low-complexity functions are simple;
(ii) a random function is, with high probability (or even with non-tiny probability), not simple;
(iii) whether or not a function is simple can be determined in polynomial time (in );
(iv) some function in NP fails to be simple.
The discussion above can be rephrased as saying that there is no property that satisfies (i) to (iii). In particular, if pseudorandom generators exist (which they do if, for instance, factorizing is hard), then there is no natural proof that PNP.
🙂 Hmm. I’m not sure how to react to this. Initially I thought that perhaps I would be OK because, as you’ve pointed out to me, I was effectively thinking about formula size rather than circuit complexity. And the Razborov-Rudich result doesn’t tell us that there is no natural proof that some function in NP has large formula size.
But then it occurred to me that we could use their basic insight to give at least a heuristic argument that there is no natural proof that a function has large formula size either. Can I try it out on you?
8) Sure, go ahead.
🙂 Well, let me try to convince you that there is no natural proof that some specific function in NP has formula size greater than . To do this, I’ll define a notion of a random formula inductively as follows. Let the basic functions be . I start by writing each basic function out times, and also minus each basic function times. I now have a list, with lots of repeats, of length . I now make a formula out of this in steps as follows. At the first step, I choose at random two functions and in my list, and I randomly decide either to take or to take . And then I remove and from the list and add the function . I then keep on repeating this process. At each stage, the number of functions in the list goes down by , so after steps I’m down to one function. That is my random function of formula size at most .
8) So what?
🙂 Well, unfortunately I don’t know how to prove anything about this random model. But it is at least plausible that a random function produced in this way is indistinguishable to any polynomial-time algorithm from a purely random Boolean function. And if that’s the case, then exactly the same argument goes through, and no natural proof could give lower bounds for formula size.
8) Yes, but what makes you say that that huge assumption of yours is even plausible, let alone true?
🙂 I don’t know. There just seems to be some kind of meta-principle that everything is either easy or impossible. So if there isn’t some obvious way to look at the values taken by a Boolean function defined by a smallish formula and work out what that formula is, then probably even telling the difference between a random function of low formula size and a purely random function is a problem that can’t be solved in polynomial time. And even if we merely suspect that that might be the case, then it should make us happier if our attempt to prove a lower bound for formula size was not an attempt to find a natural proof.
8) Hmm … that feels a bit circular to me. It’s a bit like saying, “If there’s no such thing as a natural proof then there’s no such thing as a natural proof.”
🙂 I don’t see it that way. The way I’d put it is, “My gut tells me that no polynomial-time algorithm can distinguish between a random low-formula-size function and a purely random function, so if I try to solve that problem by finding a suitable complexity measure, then I’m going to have to look for complexity measures that cannot be calculated in polynomial time.”
While I’m at it, I’d like to mention another problem that I find very appealing. It’s sometimes thought of as a toy version of the P versus NP problem, in that the difficulties that arise feel quite similar.
The question is this: find an explicit non-singular matrix over (where “explicit” means that it can be computed in polynomial time) such that the number of Gaussian row-operations needed to turn it into the identity is superlinear.
😐 Can you explain that in slightly more detail?
🙂 OK. Over any field, any non-singular matrix can be turned into the identity by a combination of three operations: interchanging two rows, adding one row to another, and multiplying a row by a non-zero scalar. A simple counting argument shows that the number of row operations you need to turn a random non-singular matrix over into the identity is at least . (Sketch: the number of non-singular matrices is not far off , and the number of sequences of at most row-operations is at most , since the number of distinct row-operations is at most . Putting these two facts together and using the fact that row-operations are invertible, one gets the result after an easy computation.) But it is very hard to prove that any specific matrix needs more than, say, operations, though it is easy to define matrices that at least appear to need more than this.
😐 Thanks, that was useful.
🙂 The point I wanted to make about this problem is that it too may be difficult for a formal reason. Suppose it is the case that there is no polynomial-time algorithm that can tell the difference between a purely random matrix and a matrix that is produced from the identity by applying to it a random sequence of row-operations. I don’t have any evidence in favour of that supposition, but neither can I think of some sophisticated algorithm for row-reducing a matrix that will always find a short sequence of operations if such a sequence exists. So I find the supposition at least moderately plausible.
If it is true, then there is no natural proof that some specific matrix cannot be row-reduced using a linear number of operations. Here, a natural proof would be a polynomially computable property of matrices that all products of linearly many elementary matrices had, but that some explicit matrix lacked.
8) OK, thanks for that. You seem to have become a sort of optimistic pessimist.
🙂 I see what you mean. But actually I’m a bit depressed by all this.
8) Well, there are some people, including Razborov and Rudich in their original paper, who maintain that one shouldn’t necessarily react that way. If somebody shows you that a certain type of argument cannot work, then it should narrow down the search space and make finding a proof easier. So one can see the Razborov-Rudich argument as a very helpful hint rather than as a brick wall.
🙂 Yes, I like that. The trouble is that right now I can’t see how to get into this nice area that lies between natural proofs and useless universal proofs. I think we’d better stop this conversation for now, but I’ll be back.
8) I’m looking forward to it.