Introduction
The purpose of this post is to add some rigour to what I wrote in the previous post, and in particular to the subsection entitled “Why should we believe that the set of easily computable functions is a ‘random-like’ set?” There I proved that if the Rubik’s-cube-like problem is as hard as it looks, then there can be no polynomial-time-computable property that distinguishes between a random composition of 3-bit scramblers and a purely random Boolean function. This implies that there can be no polynomial-time-computable “simplicity” property that is satisfied by all Boolean functions of circuit complexity at most some fixed polynomial in $n$ and that is not satisfied by almost all Boolean functions.
I personally find the assumption that the Rubik’s-cube-like problem is hard very plausible. However, if you disagree with me, then I don’t have much more I can say (though see Boaz Barak’s first comment on the previous post). What Razborov and Rudich did was to use a different set of random polynomial-time-computable functions that has a better theoretical backing. They build them out of a pseudorandom function generator, which in turn is built out of a pseudorandom generator, which is known to exist if the discrete logarithm problem is hard. And the discrete logarithm problem is hard if factorizing large integers is hard. Since many people have tried hard to find an algorithm for factorizing large integers, there is some quite strong empirical evidence for this problem’s being hard. It’s true that there are also people who think that it is not hard, but the existence of a pseudorandom generator does not depend on the hardness of factorizing. Perhaps a more significant advantage of the Razborov-Rudich argument is that any pseudorandom generator will do. So the correctness of their conclusion is based on a weaker hypothesis than the one I used earlier.
Pseudorandom generators
It’s time I said in more detail what a pseudorandom generator is. Suppose you have a function $G:\{0,1\}^k\to\{0,1\}^m$, with $m>k$. Then you have two obvious probability distributions on $\{0,1\}^m$. The first is just the uniform distribution, which we can think of as choosing a random 01-string of length $m$. The second is obtained by choosing an element uniformly at random from $\{0,1\}^k$ and applying the function $G$. This we can think of as a pseudorandom 01-string of length $m$. The idea is that if $G$ mixes things up sufficiently, then there is no efficient algorithm that will give significantly different results when fed a purely random 01-string and a pseudorandom 01-string.
We can be slightly more formal about this as follows. Suppose $f:\{0,1\}^m\to\{0,1\}$ is a Boolean function. Define $p(f)$ to be the probability that $f(G(x))=1$ when $x$ is chosen randomly from $\{0,1\}^k$. Define $q(f)$ to be the probability that $f(y)=1$ when $y$ is chosen randomly from $\{0,1\}^m$. We say that $G$ is an $h$-hard pseudorandom generator if $|p(f)-q(f)|\leq 1/h$ whenever $f$ can be computed in time $h$.
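Just to fix ideas, here is a small Python sketch that computes $p(f)$ and $q(f)$ exactly, by brute force, for a toy stretching map and a sample distinguisher. The map below is emphatically not a real pseudorandom generator (it is just some easily written function from $\{0,1\}^4$ to $\{0,1\}^8$), but it makes the two probabilities concrete.

```python
# A minimal sketch: computing p(f) and q(f) exactly, by brute force, for a toy
# "generator" G : {0,1}^k -> {0,1}^m with k = 4 and m = 8.  The toy G is NOT a
# real pseudorandom generator; it merely illustrates the definitions above.
from itertools import product

k, m = 4, 8

def G(x):
    """Toy stretching map: (x, x XOR cyclic-shift(x)), purely illustrative."""
    shifted = x[1:] + x[:1]
    return x + tuple(a ^ b for a, b in zip(x, shifted))

def f(y):
    """A sample distinguisher: does the string have more 1s than 0s?"""
    return 1 if sum(y) > len(y) // 2 else 0

# p(f): probability that f(G(x)) = 1 for a uniformly random seed x in {0,1}^k.
p = sum(f(G(x)) for x in product((0, 1), repeat=k)) / 2**k
# q(f): probability that f(y) = 1 for a uniformly random string y in {0,1}^m.
q = sum(f(y) for y in product((0, 1), repeat=m)) / 2**m

print(f"p(f) = {p}, q(f) = {q}, advantage |p - q| = {abs(p - q)}")
```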
It may look a little strange that $h$ appears twice in that definition. Shouldn’t one talk about an $(h,\epsilon)$-hard pseudorandom generator, where the number of steps is $h$ and the difference in the probabilities is at most $\epsilon$? The reason for setting $\epsilon$ equal to $1/h$ is that, up to a polynomial, it is the only interesting value, for the following reason. Suppose that the difference in the probabilities is $\epsilon$. Then if we run the algorithm $N$ times, the difference in the expected number of times we get 1 is $\epsilon N$. If that is significantly bigger than $\sqrt N$, then the probability that the difference in the actual number of times we get a 1 is not at least $\epsilon N/2$ will be small, so we can detect the difference between the two with high probability by counting how many 1s each one gives. This happens when $N$ is proportional to $\epsilon^{-2}$. Speaking a little roughly, if the probabilities differ by $\epsilon$, then you need at least $P_1(\epsilon^{-1})$ runs of the experiment and at most $P_2(\epsilon^{-1})$ runs to tell the difference between random and pseudorandom, where $P_1$ and $P_2$ are fixed polynomial functions. Since running the experiment $N$ times doesn’t affect the complexity of the detection process by more than a polynomial amount when $N$ depends polynomially on $\epsilon^{-1}$, we might as well set $\epsilon=1/h$: if you prefer a bigger $\epsilon$ you can get it by repeating the experiment, and there is nothing to be gained from a smaller $\epsilon$, since the difference between random and pseudorandom is already hard to detect.
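The $\epsilon^{-2}$ heuristic in the previous paragraph is easy to see experimentally. Here is a small simulation sketch: it compares a fair coin with a coin of heads probability $1/2+\epsilon$ and records how often counting 1s over $N$ runs tells them apart (the particular threshold used is just one simple choice).

```python
# A small simulation: detecting a bias of size eps by counting 1s in N samples.
# We declare "biased" when the count exceeds N/2 + eps*N/2 (a simple threshold).
import random

def detect(bias, eps, N, trials=500):
    """Fraction of trials in which the threshold test says 'biased'."""
    threshold = N / 2 + eps * N / 2
    hits = 0
    for _ in range(trials):
        count = sum(random.random() < 0.5 + bias for _ in range(N))
        hits += count > threshold
    return hits / trials

eps = 0.05
for N in (10, 100, int(1 / eps**2), int(10 / eps**2)):
    fair = detect(0.0, eps, N)      # false-positive rate on the unbiased coin
    biased = detect(eps, eps, N)    # detection rate on the eps-biased coin
    print(f"N = {N:5d}: says 'biased' {fair:.2f} of the time when fair, "
          f"{biased:.2f} of the time when biased")
```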
Intuitively, a pseudorandom generator is a function from a small Boolean cube to a big one whose output “looks random”. The formal definition is making precise what “looks random” means. I took it to mean “looks random to a computer program that runs in polynomial time” but one can of course use a similar definition for any model of computation, or indeed any class whatever of potential distinguishing functions. If no function in that class can distinguish between purely random strings and images of $G$ with reasonable probability, then $G$ is pseudorandom for that class.
Pseudorandom function generators
A pseudorandom generator produces a small subset of $\{0,1\}^m$ (technically it’s a multiset, but this isn’t too important) with the property that it is hard to distinguish between a random string from that subset and a purely random string. However, sometimes we want more than this. For example, sometimes we would like to find a function from $\{0,1\}^n$ to $\{0,1\}$ that “looks random”. We could of course think of such a function as a 01 string of length $2^n$ (that is, as a list of the values taken by the function at each point of $\{0,1\}^n$) and use a pseudorandom generator to generate it, but that will typically be very inefficient.
Here is a much better method, which was devised by Goldreich, Goldwasser and Micali. (Their paper is here.) Let $G:\{0,1\}^k\to\{0,1\}^{2k}$ be a pseudorandom generator. (Later I’ll be more precise about how hard it needs to be.) We can and will think of this as a pair of functions $f_0$ and $f_1$, each one from $\{0,1\}^k$ to $\{0,1\}^k$. If we are now given a string $y=(y_1,\dots,y_n)\in\{0,1\}^n$, we can use it and the functions $f_0$ and $f_1$ to define a function from $\{0,1\}^k$ to $\{0,1\}^k$. We simply take the composition of $f_0$s and $f_1$s that correspond to $y$. That is, we map $x$ to $f_{y_n}\circ f_{y_{n-1}}\circ\dots\circ f_{y_1}(x)$. To put that another way, you use the digits of $y$ to decide which of $f_0$ and $f_1$ to apply. For instance, if $y=(0,1,1,0,1)$, then you apply $f_0$, then $f_1$, then $f_1$, then $f_0$, then $f_1$.
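In code the construction is very short. Here is a sketch, with a toy stand-in for the generator $G$ (so nothing here is cryptographically meaningful); the point is just the shape of the computation: split $G$ into $f_0$ and $f_1$ and apply them in the order dictated by the digits of $y$.

```python
# Sketch of the Goldreich-Goldwasser-Micali construction with a toy generator.
# A seed x is a tuple of k bits; G stretches it to 2k bits, which we read as the
# pair (f_0(x), f_1(x)).  The digits of y then say which of f_0, f_1 to apply.
k = 8

def G(x):
    """Toy stand-in for a generator {0,1}^k -> {0,1}^2k (NOT a secure one)."""
    left = tuple(x[(3 * i + 1) % k] ^ x[i] for i in range(k))
    right = tuple(x[(5 * i + 2) % k] ^ x[i] ^ 1 for i in range(k))
    return left + right

def f(bit, x):
    """f_0(x) is the left half of G(x); f_1(x) is the right half."""
    out = G(x)
    return out[:k] if bit == 0 else out[k:]

def compose(x, y):
    """Apply f_{y_1}, then f_{y_2}, ..., then f_{y_n} to the seed x."""
    for bit in y:
        x = f(bit, x)
    return x

x = (0, 1, 1, 0, 1, 0, 0, 1)    # a seed of length k
y = (0, 1, 1, 0, 1)             # a string of length n = 5: apply f_0, f_1, f_1, f_0, f_1
print(compose(x, y))
```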
What we actually want is a function to $\{0,1\}$, but if we take the first digit then we’ve got one. We can think of it as a function of two variables: given $y$ and $x$, then $F(y,x)$ equals the first digit of $f_{y_n}\circ\dots\circ f_{y_1}(x)$. But a function of two variables can also be thought of as a collection of functions of one variable in two different ways. We’ve thought so far of $y$ as indexing functions and $x$ as being the argument, but now let’s switch round: we’ll take $x$ as the index and $y$ as the argument. That is, let’s write $g_x(y)$ for $F(y,x)$, which is itself just the first digit of $f_{y_n}\circ\dots\circ f_{y_1}(x)$.
We now have two probability distributions on the set of all functions from $\{0,1\}^n$ to $\{0,1\}$. One is just the uniform distribution — this is what we mean by a random function. The other is obtained by choosing a random string $x$ of length $k$ and taking the corresponding function $g_x$: that is, for each $y$ we apply the composition $f_{y_n}\circ\dots\circ f_{y_1}$ to $x$ and take the first digit — this is what we mean by a pseudorandom function (associated with this particular construction).
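Continuing the sketch above, here is how one might sample a truth table from each of the two distributions, for an $n$ small enough that the whole truth table can be written down. Only $k$ and the function compose from the previous sketch are reused.

```python
# Sampling truth tables of functions {0,1}^n -> {0,1} from the two distributions.
# Assumes k and compose from the previous sketch are in scope; g_x(y) is the
# first digit of the composition applied to x.
import random
from itertools import product

n = 5

def g(x, y):
    return compose(x, y)[0]

def random_truth_table():
    """A uniformly random function, listed as its 2^n values."""
    return [random.randint(0, 1) for _ in range(2**n)]

def pseudorandom_truth_table():
    """A random function of the form g_x: pick a random seed x and tabulate g_x."""
    x = tuple(random.randint(0, 1) for _ in range(k))
    return [g(x, y) for y in product((0, 1), repeat=n)]

print("random:      ", "".join(map(str, random_truth_table())))
print("pseudorandom:", "".join(map(str, pseudorandom_truth_table())))
```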
Note that we are now in a very similar situation to the one we were in with 3-bit scramblers earlier. We have a small bunch of efficiently computable functions — the functions $g_x$ — and we would like to say that it is hard to distinguish between those and entirely random functions. But now we shall be able to prove that it is hard, subject to widely believed hardness hypotheses. Also, even if you don’t believe those hypotheses, the reduction to them is interesting.
How easily can $g_x(y)$ be computed? Well, given $x$ and $y$, we have to calculate the result of applying to $x$ a composition of $n$ functions, each of which is a polynomial-time-computable function from $\{0,1\}^k$ to $\{0,1\}^k$. So the number of steps we need is $nR(k)$ for some polynomial $R$. This is polynomial in $n$ provided that $k$ is polynomial in $n$. Accordingly, we take $k=n^c$ for some large constant $c$.
The idea now is to show that if we can distinguish between a random function of the form $g_x$ and a genuinely random function, then the pseudorandom generator $G$ is not after all very hard: in fact, it will have hardness at most $2^{bn}$ for some constant $b$, which, because $n=k^{1/c}$, is substantially less than exponential in any fixed positive power of $k$. Since the pseudorandom generator was arbitrary, this will show that no pseudorandom generator of that hardness exists.
By the way, let me draw attention to the parts of this proof that have always caused me difficulty (though I should say again that it’s the kind of difficulty that can be overcome if one is sufficiently motivated to do so — I’ve just been lazy about it up to now). The first is the point about the roles of the two variables $y$ and $x$ above and the way those roles switch round. Another is a wrong argument that has somehow made me feel that what is going on must be subtler than it actually is. That argument is that a pseudorandom generator is a function defined on $\{0,1\}^k$, so its hardness is a reasonable function of $k$, while the kind of pseudorandomness we’re interested in takes place at the level of Boolean functions defined on $\{0,1\}^n$, which have domains of size $2^n$, so breaking those in time polynomial in $2^n$ will surely have no bearing on the far smaller function that makes the pseudorandom generator.
I didn’t express that wrong argument very well — necessarily, since it’s wrong — but the thing I’ve been missing is that $k$ is quite large compared with $n$, and we are making really quite a strong assumption about the hardness of the pseudorandom generator. Specifically, we’re not just assuming that the generator has superpolynomial (in $k$) hardness: we’re assuming that its hardness is at least $2^{k^\delta}$ for some small positive constant $\delta$. That way the hardness can easily be comparable to $2^n$. So there isn’t some clever way of “dropping down a level” from subsets of $\{0,1\}^n$ to subsets of $\{0,1\}^k$ or anything like that.
The third thing that got in the way of my understanding the proof is connected with levels. It’s surprising how often something easy can feel hard because it is talking about, say, sets of sets of sets. Here it is important to get clear about what we’re about to discuss, which is a sequence of probability distributions on Boolean functions from $\{0,1\}^n$ to $\{0,1\}$. This shouldn’t be too frightening, since we’ve already discussed two such probability distributions: the uniform distribution and the distribution where you pick a random $x\in\{0,1\}^k$ and take the function $g_x$. What we’re going to do now is create, in a natural way, a sequence of $2^n$ probability distributions that get gradually more and more random. That is, we’ll start with the distribution that’s uniform over all the $g_x$, and step by step we’ll introduce more randomness into the picture until after $2^n-1$ steps we’ll have the uniform distribution over all functions from $\{0,1\}^n$ to $\{0,1\}$.
I’m going to describe the sequence of distributions in a slightly different way from the way that Razborov and Rudich describe it. (In particular, their distributions start with the uniform distribution and get less and less random.) I don’t claim any particular advantage for my tiny reformulation — I just find it slightly easier. I also feel that if I’ve reworked something in even a minor way then it’s a sign that I understand it, so to a large extent I’m doing it for my own benefit.
First of all, let us take the set of binary sequences of length less than or equal to $n$. These $2^{n+1}-1$ sequences form a binary tree $T$ if we join each sequence to the two sequences you get by appending a 0 and a 1. Let us take a sequence of trees $T_0\subset T_1\subset\dots\subset T_{2^n-1}$, where $T_0$ consists just of the root of $T$ (that is, the empty sequence) and $T_{2^n-1}$ is the full tree $T$, and let us do so in such a way that each $T_{j+1}$ is obtained from $T_j$ by finding a leaf and adding its children: that is, picking a sequence $s$ in $T_j$ that is not a sequence of length $n$ and is not an initial segment of any other sequence in $T_j$, and adding the sequences $s0$ and $s1$. It is not hard to see that this can be done, and since we add two sequences at a time and get from the tree that consists just of the empty sequence to the tree of all $2^{n+1}-1$ binary sequences of length at most $n$, there are $2^n$ trees in this sequence of trees.
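Here is a short sketch of one way of producing such a chain of trees: repeatedly take a leaf of the current tree of length less than $n$ and add its two children. (Which leaf is chosen at each step does not matter for the argument.)

```python
# Build a chain of trees T_0 ⊂ T_1 ⊂ ... ⊂ T_{2^n - 1}, each tree being a set of
# binary strings closed under taking initial segments.  T_0 = {""}, the last tree
# contains all strings of length at most n, and each step adds both children of a leaf.
n = 3

trees = [{""}]
while True:
    current = trees[-1]
    # find a leaf of length < n: a string in the tree whose children are not yet there
    leaf = next((s for s in sorted(current)
                 if len(s) < n and s + "0" not in current), None)
    if leaf is None:
        break
    trees.append(current | {leaf + "0", leaf + "1"})

print(f"number of trees: {len(trees)} (should be 2^n = {2**n})")
print("last tree has", len(trees[-1]), "vertices (should be", 2**(n + 1) - 1, ")")
```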
Given a tree $T_j$, we create a probability distribution on the set of functions from $\{0,1\}^n$ to $\{0,1\}$ as follows. For every $y$ we let $r$ be maximal such that the initial segment $(y_1,\dots,y_r)$ belongs to $T_j$. We then take a random $x\in\{0,1\}^k$, apply the composition $f_{y_n}\circ\dots\circ f_{y_{r+1}}$, and take the first digit of the result. If $r=n$ then we interpret this to mean that we simply pick a random $x$ and take its first digit. An important point to be clear about here is that the random points $x$ are not chosen afresh for each $y$. So what I should really have said is that for each vertex $v$ of $T_j$ we pick a random $x_v\in\{0,1\}^k$. Then for each $y$ we find the maximal initial segment $(y_1,\dots,y_r)$ that belongs to $T_j$, apply to $x_{(y_1,\dots,y_r)}$ the composition $f_{y_n}\circ\dots\circ f_{y_{r+1}}$, and pass to the first digit. If $r=n$ we interpret the composition as the identity function, so we simply take the first digit of $x_y$.
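In the notation of the earlier sketches, sampling from the distribution attached to a tree looks like this (again only for small $n$, and again leaning on the toy $f_0,f_1$ and $k$ from the GGM sketch together with the chain of trees, and its small value of $n$, from the previous sketch):

```python
# Sample a truth table from the distribution attached to a tree T (a set of binary
# strings closed under initial segments).  Assumes k, f from the GGM sketch and
# n, trees from the tree-building sketch are in scope, with n small.
import random
from itertools import product

def sample_from_tree(T):
    # one independent random seed for each vertex of the tree
    seeds = {v: tuple(random.randint(0, 1) for _ in range(k)) for v in T}
    table = []
    for y in ("".join(t) for t in product("01", repeat=n)):
        # maximal initial segment of y lying in T
        r = max(i for i in range(n + 1) if y[:i] in T)
        x = seeds[y[:r]]
        for bit in y[r:]:          # apply f_{y_{r+1}}, ..., f_{y_n}
            x = f(int(bit), x)
        table.append(x[0])         # first digit of the result
    return table

print("T_0      :", "".join(map(str, sample_from_tree(trees[0]))))   # a random g_x
print("full tree:", "".join(map(str, sample_from_tree(trees[-1]))))  # a uniformly random function
```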
Note that if $T_j$ consists just of the root, then this is just applying the composition $f_{y_n}\circ\dots\circ f_{y_1}$ to a single random $x$ and taking the first digit, which is exactly what it means to take a random function of the form $g_x$. Note also that if $T_j$ is the full tree $T$, then for every $y$ all we do is take the first digit of $x_y$, which is another way of saying that whatever $y$ is, we choose a random point of $\{0,1\}^k$ (independently for each $y$) and take its first digit, which of course gives us a function from $\{0,1\}^n$ to $\{0,1\}$ chosen uniformly at random. So it really is the case that the first distribution in our sequence is uniform over all $g_x$ and the last distribution is uniform over all functions from $\{0,1\}^n$ to $\{0,1\}$.
Now let’s think about the difference between the distribution that comes from $T_j$ and the distribution that comes from $T_{j+1}$. Let $s0$ and $s1$ be the binary sequences that belong to $T_{j+1}$ but not to $T_j$, so that $s$ is a leaf of $T_j$ of length less than $n$. Let us also write $x_v$ for the random element of $\{0,1\}^k$ associated with any given 01-sequence $v$. Let $r$ be the length of $s$. Then the one thing we do differently when evaluating the random image of $y$ is this. If $y$ has $s$ as an initial segment, then instead of evaluating the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+1}}(x_s)$ we evaluate the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+2}}(x_{sy_{r+1}})$. If $y$ does not have $s$ as an initial segment, then nothing changes.
Note that the first of these evaluations can be described as the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+2}}(f_{y_{r+1}}(x_s))$. The basic idea now is that if we can distinguish between that and the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+2}}(x_{sy_{r+1}})$, then we can distinguish between the pair $(f_0(x_s),f_1(x_s))=G(x_s)$ and the pair $(x_{s0},x_{s1})$. But $(x_{s0},x_{s1})$ is a purely random sequence in $\{0,1\}^{2k}$ whereas $G(x_s)$ is a random output from the pseudorandom generator.
Let us now remind ourselves what we are trying to prove. Suppose that $C$ is a simplicity property that can be computed in time $2^{an}$ for some constant $a$ (which is another way of saying that it can be computed in time that is polynomial in $2^n$, the size of the truth table of the function it is applied to). By “simplicity property” I mean that $C(g)$ holds whenever $g$ has circuit complexity at most $M(n)$, where $M$ is a polynomial large enough that every function $g_x$ has circuit complexity at most $M(n)$ (such an $M$ exists, by the calculation described earlier), and that $C$ does not hold for almost all functions. Actually, we can be quite generous in our interpretation of the latter statement: we shall assume that if $g$ is a purely random function, then $C(g)$ holds with probability at most $1/2$.
If $C$ has those two properties, then $C(g_x)$ holds for every $x$, and therefore
$$\mathbb{P}_x(C(g_x)\ \text{holds})-\mathbb{P}_g(C(g)\ \text{holds})\geq 1/2.$$
I wrote $\mathbb{P}_x$ there to mean the probability when $x$ is chosen uniformly at random, and $\mathbb{P}_g$ to mean the probability when $g$ is chosen uniformly at random.
Let me now write $\mu_j$ for the probability distribution associated with $T_j$. From the above inequality and the fact that there are $2^n$ of these distributions it follows that there exists $j$ such that
$$\mathbb{P}_{g\sim\mu_j}(C(g)\ \text{holds})-\mathbb{P}_{g\sim\mu_{j+1}}(C(g)\ \text{holds})\geq 2^{-(n+1)}.$$
That is, the probability that $C$ holds if you choose a function randomly using the $j$th distribution is greater by at least $2^{-(n+1)}$ than the probability if you use the $(j+1)$st distribution.
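For completeness, here is the averaging step written out; it is just a telescoping sum, and the constant $2^{-(n+1)}$ is slightly wasteful but convenient.

```latex
% Telescoping/averaging step: the total gap of at least 1/2 is spread over the
% 2^n - 1 consecutive differences, so at least one of them is large.
\[
  \frac12 \;\le\; \mathbb{P}_{\mu_0}(C(g)) - \mathbb{P}_{\mu_{2^n-1}}(C(g))
  \;=\; \sum_{j=0}^{2^n-2}\Bigl(\mathbb{P}_{\mu_j}(C(g)) - \mathbb{P}_{\mu_{j+1}}(C(g))\Bigr),
\]
\[
  \text{so some } j \text{ satisfies }\;
  \mathbb{P}_{\mu_j}(C(g)) - \mathbb{P}_{\mu_{j+1}}(C(g)) \;\ge\; \frac{1/2}{2^n-1} \;\ge\; 2^{-(n+1)}.
\]
```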
What we would like to show now is that this implies that the hardness of the pseudorandom generator is at most $2^{bn}$ for a suitable constant $b$. To do that, we condition on the values of $x_v$ for all sequences $v$ other than $s$, $s0$ and $s1$. (Recall that $s$ was defined to be the unique sequence such that $s0$ and $s1$ belong to $T_{j+1}$ but not to $T_j$.) By averaging, there must be some choice of all those sequences $x_v$ such that, conditioned on that choice, we still have
$$\mathbb{P}_{g\sim\mu_j}(C(g)\ \text{holds})-\mathbb{P}_{g\sim\mu_{j+1}}(C(g)\ \text{holds})\geq 2^{-(n+1)}.$$
We now have a way of breaking the pseudorandom generator $G$. Suppose we are given a sequence $z\in\{0,1\}^{2k}$ and want to guess whether it is a random sequence of the form $G(x)$ (with $x$ chosen uniformly from $\{0,1\}^k$) or a purely random element of $\{0,1\}^{2k}$. We create a function from $\{0,1\}^n$ to $\{0,1\}$ as follows. For each $y$, let $u$ be the maximal initial segment of $y$ that belongs to $T_j$. If $u$ is not equal to $s$, then take the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+1}}(x_u)$, where $r$ is the length of $u$ and $x_u$ is the fixed sequence from the set of sequences on which we have conditioned. If $u$ is equal to $s$, which has length $r$, then apply the composition $f_{y_n}\circ\dots\circ f_{y_{r+2}}$ to the left half of $z$ if the next digit $y_{r+1}$ of $y$ is 0 and to the right half of $z$ if it is 1. Then take the first digit of the result.
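Here is the distinguisher as a sketch, to make the bookkeeping concrete. It reuses the toy generator and the trees from the earlier sketches, fixes the conditioned seeds arbitrarily, and makes no attempt to say how the hybrid index $j$ or the property $C$ would actually be found: the $C$ below is a placeholder, not a genuine simplicity property.

```python
# Sketch of the distinguisher built from a simplicity property C.  Given a challenge
# z in {0,1}^{2k}, build a truth table distributed like mu_j if z = G(x) for a random x,
# and like mu_{j+1} if z is uniform, then apply C to the table.
# Assumes k, G, f from the GGM sketch and n, trees from the tree sketch are in scope.
import random
from itertools import product

j = 2                                     # some hybrid index (found by averaging in the proof)
T_j, T_next = trees[j], trees[j + 1]
(s,) = {v[:-1] for v in T_next - T_j}     # the leaf of T_j whose children were added
conditioned = {v: tuple(random.randint(0, 1) for _ in range(k))
               for v in T_j if v != s}    # one fixed seed for every other vertex

def C(table):
    """Placeholder for the simplicity property (NOT a real one)."""
    return int(sum(table) < 2 ** (n - 1))

def distinguisher(z):
    table = []
    for y in ("".join(t) for t in product("01", repeat=n)):
        r = max(i for i in range(n + 1) if y[:i] in T_j)
        u = y[:r]
        if u != s:
            x, rest = conditioned[u], y[r:]
        else:
            x = z[:k] if y[r] == "0" else z[k:]   # left or right half of the challenge
            rest = y[r + 1:]
        for bit in rest:
            x = f(int(bit), x)
        table.append(x[0])
    return C(table)

# Example use: feed it G(x) for a random seed x, or a uniformly random string of length 2k.
x = tuple(random.randint(0, 1) for _ in range(k))
print(distinguisher(G(x)), distinguisher(tuple(random.randint(0, 1) for _ in range(2 * k))))
```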
If $z$ is a random sequence of the form $G(x)$, then its left and right halves are $f_0(x)$ and $f_1(x)$, so what we are doing is choosing a random $x_s$ and, whenever $y$ has $s$ as an initial segment, taking the first digit of $f_{y_n}\circ\dots\circ f_{y_{r+1}}(x_s)$. Therefore, we are choosing a random function $g$ according to the distribution $\mu_j$, conditioned on the choices of the sequences $x_v$ with $v$ other than $s$, $s0$ and $s1$. If on the other hand $z$ is a purely random sequence in $\{0,1\}^{2k}$, then its two halves are independent random sequences playing the roles of $x_{s0}$ and $x_{s1}$, so we are choosing a random function $g$ according to the distribution $\mu_{j+1}$ under the same conditioning. Since the probabilities that $C(g)$ holds differ by at least $2^{-(n+1)}$ and $C$ can (by hypothesis) be computed in time $2^{an}$, it follows that the hardness of $G$ is at most $2^{bn}$.
Since $n=k^{1/c}$ for an arbitrarily small constant $1/c$, it follows that if there is a polynomial-time-computable property (polynomial, that is, in the truth-table size $2^n$) that distinguishes between random and pseudorandom functions from $\{0,1\}^n$ to $\{0,1\}$, then no pseudorandom generator from $\{0,1\}^k$ to $\{0,1\}^{2k}$ can have hardness greater than $2^{k^\delta}$, for any fixed $\delta>0$.
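The parameter check behind that last sentence is worth recording once, since it is where the choice $k=n^c$ earns its keep.

```latex
% Why hardness at most 2^{bn} contradicts hardness 2^{k^delta} once c is large enough:
\[
  k=n^{c}\ \Longrightarrow\ n=k^{1/c}\ \Longrightarrow\
  2^{bn}=2^{bk^{1/c}} \;<\; 2^{k^{\delta}}
  \quad\text{for all sufficiently large }k,\ \text{provided } c>1/\delta .
\]
```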
A small remark to make at this point is that the hardness of the generator needs to be defined in terms of circuit complexity for this argument to work. Basically, this is because it is not $C$ itself that we are using to distinguish between random and pseudorandom sequences but a function that is created out of $C$ (using in particular the many fixed sequences $x_v$ on which we conditioned) in a not necessarily uniform way. So even if $C$ can be computed in polynomial time, it does not follow that there is an algorithm (as opposed to circuit) that will break the generator in time $2^{bn}$.
Can the “backwards” argument be given a similar foundation?
Recall that earlier I proposed a way of getting round the natural-proofs barrier and proceeded to argue that it almost certainly failed, for reasons very similar to the reasons for the natural-proofs barrier itself. The question I would like to consider here is whether that argument can be made to rely on the hardness of factorizing rather than on the hardness of a problem based on 3-bit scramblers that does not, as far as I know, connect to the main body of hardness assumptions that are traditionally made in theoretical computer science.
Here is an informal proposal for doing so. Let $m$ be a positive integer, let $n=\binom m2$, identify the points of $\{0,1\}^n$ with graphs on $m$ (labelled) vertices in some obvious way and let $h_0$ take the value 1 if the corresponding graph contains a clique of size $m/2$ and 0 otherwise. Also, for each $i\leq n$ let $h_i$ be the function that is 1 if the $i$th edge belongs to the graph and 0 otherwise. Those functions are chosen so that the function $H=(h_0,h_1,\dots,h_n):\{0,1\}^n\to\{0,1\}^{n+1}$ is (trivially) an injection.
Now there is a concept in theoretical computer science called a pseudorandom permutation from $\{0,1\}^{n+1}$ to $\{0,1\}^{n+1}$. I’ll define it properly in a moment, but for now it’s enough to say that if you try to invent the definition for yourself, you’ll probably get it right. Roughly, however, it’s a permutation of $\{0,1\}^{n+1}$ that depends on a randomly chosen string in $\{0,1\}^K$ for some suitable $K$ and is hard to distinguish from a purely random permutation of $\{0,1\}^{n+1}$. Importantly, pseudorandom permutations exist if pseudorandom functions exist (I’ll discuss this too), as was shown by Luby and Rackoff.
So let’s compose the function $H=(h_0,h_1,\dots,h_n)$ with a pseudorandom permutation $\pi$ of $\{0,1\}^{n+1}$, obtaining a function $\pi\circ H=(u_0,u_1,\dots,u_n)$ from $\{0,1\}^n$ to $\{0,1\}^{n+1}$.
Actually, one thing you might not guess if you try to define a pseudorandom permutation is that the permutation and its inverse should both be efficiently computable functions. Because of that, if we are provided with the values of $u_0(\Gamma),u_1(\Gamma),\dots,u_n(\Gamma)$ for a graph $\Gamma$, we can easily tell whether or not $\Gamma$ contains a clique of size $m/2$: we just compute $\pi^{-1}$, which gives us $h_0(\Gamma),h_1(\Gamma),\dots,h_n(\Gamma)$, and look at the first digit.
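As a sanity check on the bookkeeping, here is a sketch with a toy invertible scrambling of $\{0,1\}^{n+1}$ (an explicitly stored random bijection, which one could never afford at realistic sizes and which is certainly not a pseudorandom permutation in the technical sense): compose the clique-and-edges map with it, then undo it and read off the clique bit.

```python
# Toy check: encode a graph as (clique bit, edge bits), scramble with an invertible
# map of {0,1}^(n+1), and recover the clique bit by inverting.  The "permutation"
# here is an explicit random bijection, purely for illustration.
import random
from itertools import combinations, product

m = 5                                    # number of vertices
edges = list(combinations(range(m), 2))  # the n = C(m,2) possible edges
n = len(edges)
CLIQUE_SIZE = 3                          # stands in for the threshold in the text

def H(graph_bits):
    """(h_0, h_1, ..., h_n): clique indicator followed by the edge indicators."""
    present = {e for e, b in zip(edges, graph_bits) if b}
    has_clique = any(all(p in present for p in combinations(S, 2))
                     for S in combinations(range(m), CLIQUE_SIZE))
    return (int(has_clique),) + tuple(graph_bits)

# An explicit random bijection pi of {0,1}^(n+1) and its inverse.
points = list(product((0, 1), repeat=n + 1))
shuffled = points[:]
random.shuffle(shuffled)
pi = dict(zip(points, shuffled))
pi_inverse = dict(zip(shuffled, points))

graph = tuple(random.randint(0, 1) for _ in range(n))   # a random graph
scrambled = pi[H(graph)]                                 # the values u_0, ..., u_n
recovered_clique_bit = pi_inverse[scrambled][0]          # invert and read the first digit
print(recovered_clique_bit == H(graph)[0])               # True
```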
Now let’s suppose that we have a progress-towards-cliques property $P$ that takes the value 1 for a sequence of functions $u_0,u_1,\dots,u_n$ if, given the values $u_0(\Gamma),\dots,u_n(\Gamma)$, it is easy to determine whether $\Gamma$ contains a clique of size $m/2$, and suppose that $P$ does not apply to almost all sequences of functions. That is, let us suppose that if $u_0,u_1,\dots,u_n$ is a purely random sequence of functions (subject to the condition that the resulting function to $\{0,1\}^{n+1}$ is an injection) then the probability that $u_0,u_1,\dots,u_n$ satisfies $P$ is at most $1/2$.
Next, suppose that we have a permutation $\pi$ and we want to guess whether it has been chosen randomly or pseudorandomly. Composing it with the function $H=(h_0,h_1,\dots,h_n)$ and applying $P$ to the resulting sequence of functions, we get 1 if $\pi$ has been chosen pseudorandomly. Note that to do this calculation we need at most $2^{Bn}$ steps, for some constant $B$, to calculate the values of the composed functions and, if $P$ is polynomial-time computable, at most $2^{Bn}$ steps to determine whether $P$ holds. So if the pseudorandom permutation has hardness at least $2^{Bn}$, it follows that the probability that a purely random permutation, and hence a purely random injection from $\{0,1\}^n$ to $\{0,1\}^{n+1}$, yields functions $u_0,\dots,u_n$ that satisfy $P$ is at least $1-2^{-Bn}$. For sufficiently large $n$, this is a contradiction.
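Again the quantitative step is a single inequality, recorded here for clarity (with $2^{Bn}$ standing both for the cost of the test and for the assumed hardness, as above).

```latex
% If pi is pseudorandom the test accepts with probability 1; hardness forces the
% purely random case to be almost as large, contradicting the assumption that a
% purely random injection satisfies P with probability at most 1/2.
\[
  1-\Pr\bigl[P\ \text{holds}\mid \pi\ \text{purely random}\bigr]
  \;=\;\Pr\bigl[P\mid \pi\ \text{pseudorandom}\bigr]-\Pr\bigl[P\mid \pi\ \text{random}\bigr]
  \;\le\; 2^{-Bn},
\]
\[
  \text{so}\quad \Pr\bigl[P\ \text{holds for a purely random injection}\bigr]\;\ge\;1-2^{-Bn}\;>\;\tfrac12 .
\]
```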
I’m fairly confident that I can make the above argument precise and rigorous, but it may be a known result, or folklore, or uninteresting for some reason I haven’t noticed. If anyone who knows what they are talking about thinks it’s worth my turning it into a note and at least putting it on the arXiv, then I’m ready to do that, but perhaps it is better left in its current slightly informal state.
October 11, 2013 at 3:08 am |
still “puzzling” over your scrambler construction. again, great topic, glad you’re returning to it. a few ideas.
– isnt this staring us in the face? could it be that the scrambler composition is itself actually a complexity measuring function exactly in the sense of the razborov-rudich paper? ie the amount of 3-bit scrambling required to compute a function is a measure of its complexity?
– suppose that it was a complexity measuring function, what would a proof look like that it either fits into the natural proof framework and therefore cannot prove P!=NP, or does not, ie could potentially prove P!=NP?
– think the lessons of the paper are being misconstrued. could it possibly be that there are many nearly equivalent complexity-measuring functions, its just that its so hard to apply them to problems/proofs?
– my pov, razborov himself gave a complexity-measuring function mechanism/framework in his famous separation result demonstrating exponential size circuit lower bound for monotone clique function!
– where is the proof that it cannot be generalized? yes, have heard of his paper ruling that out, but who has carefully walked through and analyzed that proof? imho, its undeservedly obscure, a critical pivot, and likely does not really rule out a construction that could work for monotone circuits….
– could it actually be easy to create complexity-measuring functions that dont fall into the natural proofs framework? could such constructions already exist? but maybe the problem is that not a single one has been proven to be one, but as soon as a single one is discovered/proved, many will be found to be equivalent?
– have myself come up with what I suspect is a complexity-measuring function based very closely/directly on razborov’s exponential monotone lower bound proof [by approximations]…. the details of such a proof are tricky…. looking for volunteer/collaborators…. unfortunately, frankly, seriously, after quite a few months searching, there are likely only a few ppl in the world qualified, capable, or motivated [if any]….
outline for a NP vs P/poly proof based on monotone circuits, hypergraphs, factoring, and slice functions
October 11, 2013 at 3:54 pm |
another thought. the natural proofs paper looks at a “natural” property of functions that can separate P/NP. however, think a little about this. how can the property be measured? it must be essentially an algorithm taking some kind of input, ie a function. algorithms are TMs, so say it is a TM $M$. but again the function is not for a fixed input width, therefore likely it would have to be specified again in some kind of encoding; the only “natural” encoding for such an input would be again an encoded algorithm, eg a TM specification/encoding of a TM $N$, say $\langle N\rangle$. so then the problem is, does there exist a machine $M$ that can determine whether (iff) $N$ runs in P time. but that is undecidable by Rice’s thm!
therefore, maybe the way to go is not to look for a binary property determining separation between P/NP as the RR proof is focused on. maybe the only “complexity measuring functions” that can possibly succeed are continuous measurements, and even a “magic threshold” for such measurements must be impossible. such a complexity measuring function is, “how many gates are required to compute this function”. that measuring function seems to be nearly impossible to work/compute directly. therefore are there analogs of that which exist which can be computed? Razborov himself has shown the answer to this is yes in his own exponential monotone circuit proof (for the clique function)….
October 11, 2013 at 6:52 pm |
wild synchronicity, a bit.. spooky action at a distance? look what just showed up on arxiv: A quantum lower bound for distinguishing random functions from random permutations by Yuen…. formally, seems rather similar to what you’re researching [except for the QM angle twist!]…. so maybe some of the cutting edge work in this area is in qm computation.
would anyone else be interested in further cyber discussion/a stackexchange chat room study group on these subjects of foremost significance? the rooms work well for ongoing/longterm discussions …. plz reply on my blog if interested
October 14, 2013 at 3:06 pm |
Perhaps we need to consider the problem of separating the classes P and NP logically from a finitary arithmetical—rather than computational—perspective.
Specifically, one which is based on explicitly highlighting that the assignment of satisfaction and truth values to number-theoretic formulas under an interpretation can be constructively defined in two distinctly different ways: (a) in terms of algorithmic verifiability, and (b) in terms of algorithmic computability.
From this perspective, standard interpretations of the formal definitions of the classes P and NP can be seen to implicitly assume that every propositional formula which is algorithmically verifiable is necessarily algorithmically computable; which implies that the differentiation between the classes P and NP is only quantitative, and can therefore be adequately expressed in terms of computational complexity.
This could be why such interpretations seem dead-ended, since we can argue that there are classically defined arithmetic formulas which are algorithmically verifiable but not algorithmically computable; and so the differentiation between the classes P and NP is also qualitative, and cannot be adequately expressed in terms of only computational complexity.
Specifically, we can show how Gödel’s $\beta$-function uniquely corresponds some real numbers to algorithmically verifiable arithmetical formulas that are not algorithmically computable.
If we could locate one such formula, it would be a genuinely random, and not pseudo-random, number generator!
October 16, 2013 at 2:28 am |
another interesting, apparently-open question, somewhat in the direction of TChow’s work: ~2 decades after natural proofs, are there proofs in complexity theory that are distinctly not natural proofs? none seem to have been identified anywhere, but maybe its not that they are nonexistent but just that they’re hard to identify…. could there be some way to recast the “natural proof” framework that makes “un”natural proof structures easier to spot/recognize/analyze? personally I find the probabilistic statement a bit awkward and suggest there may be a somewhat more “natural” way to refer to the problem in terms of “errors” (mismatches) between two functions using the hamming distance concept.