## Razborov and Rudich’s natural proofs argument

### Introduction

The purpose of this post is to add some rigour to what I wrote in the previous post, and in particular to the subsection entitled “Why should we believe that the set of easily computable functions is a ‘random-like’ set?” There I proved that if the Rubik’s-cube-like problem is as hard as it looks, then there can be no polynomial-time-computable property that distinguishes between a random composition of $n^k$ 3-bit scramblers and a purely random Boolean function. This implies that there can be no polynomial-time-computable “simplicity” property that is satisfied by all Boolean functions of circuit complexity at most $n^k$ that is not satisfied by almost all Boolean functions.

I personally find the assumption that the Rubik’s-cube-like problem is hard very plausible. However, if you disagree with me, then I don’t have much more I can say (though see Boaz Barak’s first comment on the previous post). What Razborov and Rudich did was to use a different set of random polynomial-time-computable functions that has a better theoretical backing. They build them out of a pseudorandom function generator, which in turn is built out of a pseudorandom generator, which is known to exist if the discrete logarithm problem is hard. And the discrete logarithm problem is hard if factorizing large integers is hard. Since many people have tried hard to find an algorithm for factorizing large integers, there is some quite strong empirical evidence for this problem’s being hard. It’s true that there are also people who think that it is not hard, but the existence of a pseudorandom generator does not depend on the hardness of factorizing. Perhaps a more significant advantage of the Razborov-Rudich argument is that any pseudorandom generator will do. So the correctness of their conclusion is based on a weaker hypothesis than the one I used earlier.

#### Pseudorandom generators

It’s time I said in more detail what a pseudorandom generator is. Suppose you have a Boolean function $\phi:\{0,1\}^k\to\{0,1\}^n$, with $k. Then you have two obvious probability distributions on $\{0,1\}^n$. The first is just the uniform distribution, which we can think of as choosing a random 01-string of length $n$. The second is obtained by choosing an element uniformly at random from $\{0,1\}^k$ and applying the function $\phi$. This we can think of as a pseudorandom 01-string of length $n$. The idea is that if $\phi$ mixes things up sufficiently, then there is no efficient algorithm that will give significantly different results when fed a purely random 01-string and a pseudorandom 01-string.

We can be slightly more formal about this as follows. Suppose $f:\{0,1\}^n\to\{0,1\}$ is a Boolean function. Define $P(f)$ to be the probability that $f(x)=1$ when $x$ is chosen randomly from $\{0,1\}^n$. Define $P_\phi(f)$ to be the probability that $f(\phi(y))=1$ when $y$ is chosen randomly from $\{0,1\}^k$. We say that $\phi$ is an $M$-hard pseudorandom generator if $|P(f)-P_\phi(f)|\leq M^{-1}$ whenever $f$ can be computed in time $M$.

It may look a little strange that $M$ appears twice there. Shouldn’t one talk about a $(\delta,M)$-hard pseudorandom generator, where the number of steps is $M$ and the difference in the probabilities is at most $\delta$? The reason for setting $\delta$ equal to $M^{-1}$ is that, up to a polynomial, it is the only interesting value, for the following reason. Suppose that the difference in the probabilities is $\delta$. Then if we run the algorithm $t$ times, the difference in the expected number of times we get 1 is $t\delta$. If that is significantly bigger than $\sqrt{t}$, then the probability that the difference in the actual number of times we get a 1 is not at least $t\delta/2$ will be small, so we can detect the difference between the two with high probability by counting how many 1s each one gives. This happens when $t$ is proportional to $\delta^{-2}$. Speaking a little roughly, if the probabilities differ by $\delta$, then you need at least $P_1(\delta^{-1})$ runs of the experiment and at most $P_2(\delta^{-1})$ runs to tell the difference between random and pseudorandom, where $P_1$ and $P_2$ are fixed polynomial functions. Since running the experiment $t$ times doesn’t affect the complexity of the detection process by more than a polynomial amount when $t$ depends polynomially on $M$, we might as well set $\delta=M^{-1}$: if you prefer a bigger $\delta$ you can get it by repeating the experiment, and there is nothing to be gained from a smaller $\delta$ since the difference between random and pseudorandom is already hard to detect.

Intuitively, a pseudorandom generator $\phi:\{0,1\}^k\to\{0,1\}^n$ is a function from a small Boolean cube to a big one whose output “looks random”. The formal definition is making precise what “looks random” means. I took it to mean “looks random to a computer program that runs in polynomial time” but one can of course use a similar definition for any model of computation, or indeed any class whatever of potential distinguishing functions. If no function in that class can distinguish between random functions and images of $\phi$ with reasonable probability, then $\phi$ is pseudorandom for that class.

#### Pseudorandom function generators

A pseudorandom generator produces a small subset $X$ of $\{0,1\}^n$ (technically it’s a multiset, but this isn’t too important) with the property that it is hard to distinguish between a random string in $X$ and a purely random string. However, sometimes we want more than this. For example, sometimes we would like to find a function from $\{0,1\}^k$ to $\{0,1\}^k$ that “looks random”. We could of course think of such a function as a 01 string of length $k.2^k$ (that is, as a list of the values taken by the function at each point of $\{0,1\}^k$) and use a pseudorandom generator to generate it, but that will typically be very inefficient.

Here is a much better method, which was devised by Goldreich, Goldwasser and Micali. (Their paper is here.) Let $\phi:\{0,1\}^k\to\{0,1\}^{2k}$ be a pseudorandom generator. (Later I’ll be more precise about how hard it needs to be.) We can and will think of this as a pair of functions $(\phi_0,\phi_1)$, each one from $\{0,1\}^k$ to $\{0,1\}^k$. If we are now given a string $x\in\{0,1\}^n$, we can use it and the functions $\phi_0$ and $\phi_1$ to define a function $\phi_x:\{0,1\}^k\to\{0,1\}^k$. We simply take the composition of $\phi_0$s and $\phi_1$s that correspond to $x$. That is, $\phi_x=\phi_{x_n}\circ\dots\circ\phi_{x_1}$. To put that another way, you use the digits of $x$ to decide which of $\phi_0$ and $\phi_1$ to apply. For instance, if $x=(0,1,1,0,1)$, then you apply $\phi_0$, then $\phi_1$, then $\phi_1$, then $\phi_0$, then $\phi_1$.

What we actually want is a function to $\{0,1\}$, but if we take the first digit then we’ve got one. We can think of it as a function of two variables: given $x\in\{0,1\}^n$ and $y\in\{0,1\}^k$, then $\psi(x,y)$ equals the first digit of $\phi_x(y)$. But a function of two variables can also be thought of as a collection of functions of one variable in two different ways. We’ve thought so far of $x$ as indexing functions and $y$ as being the argument, but now let’s switch round: we’ll take $y$ as the index and $x$ as the argument. That is, let’s write $\psi_y(x)$ for $\psi(x,y)$, which is itself just the first digit of $\phi_x(y)$.

We now have two probability distributions on the set of all functions from $\{0,1\}^n$ to $\{0,1\}$. One is just the uniform distribution — this is what we mean by a random function. The other is obtained by choosing a random string $y$ of length $k$, applying the composition $\phi_x$ to it and taking the first digit — this is what we mean by a pseudorandom function (associated with this particular construction).

Note that we are now in a very similar situation to the one we were in with 3-bit scramblers earlier. We have a small bunch of efficiently computable functions — the functions $\psi_y$ — and it is hard to distinguish between those and entirely random functions. But now we shall be able to prove that it is hard, subject to widely believed hardness hypotheses. Also, even if you don’t believe those hypotheses, the reduction to them is interesting.

How easily can $\psi_y$ be computed? Well, given $x\in\{0,1\}^n$, we have to calculate the result of applying to $y$ a composition of $n$ functions, each of which is a polynomial-time-computable function from $\{0,1\}^k$ to $\{0,1\}^k$. So the number of steps we need is $nP(k)$ for some polynomial $P$. This is polynomial in $n$ provided that $k$ is polynomial in $n$. Accordingly, we take $k=n^\alpha$ for some large constant $\alpha$.

The idea now is to show that if we can distinguish between a random function of the form $\psi_y$ and a genuinely random function, then the pseudorandom generator $\phi$ is not after all very hard: in fact, it will have hardness at most $2^{O(n)}$, which is substantially less than exponential in $k$. Since the pseudorandom generator was arbitrary, this will show that no pseudorandom generator of that hardness exists.

By the way, let me draw attention to the parts of this proof that have always caused me difficulty (though I should say again that it’s the kind of difficulty that can be overcome if one is sufficiently motivated to do so — I’ve just been lazy about it up to now). The first is the point about the roles of the two variables $x$ and $y$ above and the way those roles switch round. Another is a wrong argument that has somehow made me feel that what is going on must be subtler than it actually is. That argument is that a pseudorandom generator is a function defined on $\{0,1\}^k$, so its hardness is a reasonable function of $k$, while the kind of pseudorandomness we’re interested in takes place at the level of Boolean functions defined on $\{0,1\}^n$, which have domains of size $2^n$, so breaking those in polynomial time in $2^n$ will surely have no bearing on the far smaller function that makes the pseudorandom generator.

I didn’t express that wrong argument very well — necessarily, since it’s wrong — but the thing I’ve been missing is that $k$ is quite large compared with $n$, and we are making really quite a strong assumption about the hardness of the pseudorandom generator. Specifically, we’re not just assuming that the generator has superpolynomial (in $k$) hardness: we’re assuming that its hardness is at least $2^{k^c}$ for some small positive constant $c$. That way the hardness can easily be comparable to $2^n$. So there isn’t some clever way of “dropping down a level” from subsets of $\{0,1\}^n$ to subsets of $\{1,2,\dots,n\}$ or anything like that.

The third thing that got in the way of my understanding the proof is connected with levels. It’s surprising how often something easy can feel hard because it is talking about, say, sets of sets of sets. Here it is important to get clear about what we’re about to discuss, which is a sequence of probability distributions of Boolean functions from $\{0,1\}^n$ to $\{0,1\}^n$. This shouldn’t be too frightening, since we’ve already discussed two such probability distributions: the uniform distribution and the distribution where you pick a random $y\in\{0,1\}^k$ and take the function $\psi_y$. What we’re going to do now is create, in a natural way, a sequence of $2^{n+1}-1$ probability distributions that get gradually more and more random. That is, we’ll start with the the distribution that’s uniform over all the $\psi_y$, and step by step we’ll introduce more randomness into the picture until after $2^{n+1}-1$ steps we’ll have the uniform distribution over all functions from $\{0,1\}^n$ to $\{0,1\}$.

I’m going to describe the sequence of distributions in a slightly different way from the way that Razborov and Rudich describe it. (In particular, their distributions start with the uniform distribution and get less and less random.) I don’t claim any particular advantage for my tiny reformulation — I just find it slightly easier. I also feel that if I’ve reworked something in even a minor way then it’s a sign that I understand it, so to a large extent I’m doing it for my own benefit.

First of all, let us take the set of binary sequences of length less than or equal to $n$. These $2^{n+1}-1$ sequences form a binary tree $T$ if we join each sequence to the two sequences you get by appending a 0 and a 1. Let us take a sequence of trees $T_1,\dots,T_{2^n-1}$, where $T_1$ consists just of the root of $T$ (that is, the empty sequence) and $T_{2^n-1}$ is the full tree $T$, and let us do so in such a way that each $T_i$ is obtained from $T_{i-1}$ by finding a leaf and adding its children: that is, picking a sequence $\sigma$ in $T_{i-1}$ that is not a sequence of length $n$ and is not contained in any other sequence in $T_{i-1}$ and adding the sequences $\sigma 0$ and $\sigma 1$. It is not hard to see that this can be done, and since we add two sequences at a time and get from the tree that consists just of the empty sequence to the tree of all $2^{n+1}-1$ binary sequences of length at most $n$, there are $2^n-1$ trees in this sequence of trees.

Given a tree $T_i$, we create a probability distribution on the set of functions from $\{0,1\}^n\to\{0,1\}$ as follows. For every $x\in\{0,1\}^n$ we let $m$ be maximal such that the subsequence $(x_1,\dots,x_m)$ belongs to $T_i$. We then take a random $y\in\{0,1\}^k$, apply the composition $\phi_{x_n}\circ\dots\circ\phi_{x_{m+1}}$, and take the first digit of the result. If $m=n$ then we interpret this to mean that we simply pick a random $y\in\{0,1\}^k$. An important point to be clear about here is that the random points $y$ do not depend on $x$. So what I should really have said is that for each vertex $\sigma$ of $T_i$ we pick a random $y(\sigma)\in\{0,1\}^k$. Then for each $x$ we find the maximal subsequence $\sigma=(x_1,\dots,x_m)$ that belongs to $T_i$, apply to $y(\sigma)$ the composition $\phi_{x_n}\circ\dots\circ\phi_{x_{m+1}}$, and pass to the first digit. If $m=n$ we interpret the composition as the identity function, so we simply take $y(\sigma)$.

Note that if $i=1$, then this is just applying the composition $\phi_{x_n}\circ\dots\circ\phi_{x_1}$ to $y(\emptyset)$ and taking the first digit, which is exactly what it means to take a random function of the form $\psi_y$. Note also that if $i=2^{n+1}-1$, then for every $x$ all we do is take $y(x)$, which is another way of saying that whatever $x$ is, we choose a random $y$ and take its first digit, which of course gives us a function from $\{0,1\}^n$ to $\{0,1\}$ chosen uniformly at random. So it really is the case that the first distribution in our sequence is uniform over all $\psi_y$ and the last distribution is uniform over all functions from $\{0,1\}^n$ to $\{0,1\}$.

Now let’s think about the difference between the distribution that comes from $T_i$ and the distribution that comes from $T_{i+1}$. Let $\sigma 0$ and $\sigma 1$ be the binary sequences that belong to $T_{i+1}$ but not to $T_i$. Let us also write $y(\rho)$ for the random element of $\{0,1\}^k$ associated with any given 01-sequence $\rho$. Let $m$ be the length of $\sigma$. Then the one thing we do differently when evaluating the random image of $x$ is this. If $x$ has $\sigma$ as an initial segment, then instead of evaluating the first digit of $\phi_{x_n}\circ\dots\circ\phi_{x_{m+1}}(y(\sigma))$ we evaluate the first digit of $\phi_{x_n}\circ\dots\circ\phi_{x_{m+2}}(y(\sigma x_{m+1}))$. If $x$ does not have $\sigma$ as an initial segment, then nothing changes.

Note that the first of these evaluations can be described as the first digit of $\phi_{x_n}\circ\dots\circ\phi_{x_{m+2}}(\phi_{x_{m+1}}(y(\sigma)))$. The basic idea now is that if we can distinguish between that and $\phi_{x_n}\circ\dots\circ\phi_{x_{m+2}}(y(\sigma x_{m+1}))$, then we can distinguish between $(\phi_0(y(\sigma)),\phi_1(y(\sigma)))$ and $(y(\sigma 0),y(\sigma 1))$. But $(y(\sigma 0),y(\sigma 1))$ is a purely random sequence in $\{0,1\}^k$ whereas $(\phi_0(y(\sigma)),\phi_1(y(\sigma)))$ is a random output from the pseudorandom generator.

Let us now remind ourselves what we are trying to prove. Suppose that $S$ is a simplicity property that can be computed in time $2^{O(n)}$ (which is another way of saying that it can be computed in time that is polynomial in $2^n$). By “simplicity property” I mean that $S(f)$ holds whenever $f:\{0,1\}^n\to\{0,1\}$ has circuit complexity at most $m$, where $m=nP(k)$ is the polynomial function described earlier, and $S(f)$ does not hold for almost all functions. Actually, we can be quite generous in our interpretation of the latter statement: we shall assume that if $f$ is a purely random function, then $S(f)$ holds with probability at most $1-2^{-O(n)}$.

If $S$ has those two properties, then $S(\phi_y)$ holds for every $y\in\{0,1\}^k$, and therefore

$\displaystyle P_y(S(\phi_y))-P_f(S(f))\geq 2^{-O(1)}.$

I wrote $P_y$ there to mean the probability when $y$ is chosen uniformly at random, and $P_f$ to mean the probability when $f$ is chosen uniformly at random.

Let me now write $P_i$ for the probability distribution associated with $T_i$. From the above inequality and the fact that there are $2^n-1$ of these distributions it follows that there exists $i$ such that

$\displaystyle P_{i+1}(S(f))-P_i(S(f))\geq 2^{-O(1)}.$

That is, the probability that $S$ holds if you choose a function randomly using the $i+1$st distribution is greater by $2^{-O(1)}$ than the probability if you use the $i$th distribution.

What we would like to show now is that this implies that the hardness of the pseudorandom generator $\phi$ is at most $2^{O(1)}$. To do that, we condition on the values of $y(\tau)$ for all sequences $\tau$ other than $\sigma$, $\sigma 0$ and $\sigma 1$. (Recall that $\sigma$ was defined to be the unique sequence such that $\sigma 0$ and $\sigma 1$ belong to $T_{i+1}$ but not to $T_i$.) By averaging, there must be some choice of all those sequences $y(\tau)$ such that, conditioned on that choice, we still have

$\displaystyle P_{i+1}(S(f))-P_i(S(f))\geq 2^{-O(1)}.$

We now have a way of breaking the pseudorandom generator $\phi$. Suppose we are given a sequence $z\in\{0,1\}^{2k}$ and want to guess whether it is a random sequence of the form $(\phi_0(y),\phi_1(y))$ (with $y$ chosen uniformly from $\{0,1\}^k$) or a purely random element of $\{0,1\}^{2k}$. We create a function from $\{0,1\}^n$ to $\{0,1\}$ as follows. For each $x$, let $\tau$ be the maximal initial segment of $x$ that belongs to $T_i$. If $\tau$ is not equal to $\sigma$, then take the first digit of $\phi_{x_n}\circ\dots\circ\phi_{x_{r+1}}(y(\tau))$, where $r$ is the length of $\tau$ and $y(\tau)$ is the fixed sequence from the set of sequences on which we have conditioned. If $\tau=\sigma$ and has length $m$, then apply $\phi_{x_n}\circ\dots\circ\phi_{x_{m+2}}$ to the left half of $z$ if the next digit of $x$ is 0 and to the right half of $z$ if it is 1. Then take the first digit of the result.

If $z$ is a random sequence of the form $(\phi_0(y),\phi_1(y))$, then what we are doing is choosing a random $y$ and taking the first digit of $\phi_{x_n}\circ\dots\circ\phi_{x_{m+1}}(y)$. Therefore, we are choosing a random function $f$ according to the distribution $P_i$, conditioned on the choices of $y(\tau)$. If on the other hand $z$ is a purely random sequence in $\{0,1\}^{2k}$, then we are choosing a random function $f$ according to the distribution $P_{i+1}$ under the same conditioning. Since the probabilities that $S$ holds differ by at least $2^{-O(1)}$ and $S$ can (by hypothesis) be computed in time $2^{O(n)}$, it follows that the hardness of $\phi$ is at most $2^{O(n)}$.

Since $n=k^c$ for an arbitrarily small constant $c$, it follows that if there is a polynomial-time-computable property that distinguishes between random and pseudorandom functions from $\{0,1\}^n$ to $\{0,1\}$, then no pseudorandom generator from $\{0,1\}^k$ to $\{0,1\}^{2k}$ can have hardness greater than $2^{k^\epsilon}$.

A small remark to make at this point is that the hardness of the generator needs to be defined in terms of circuit complexity for this argument to work. Basically, this is because it is not $S$ itself that we are using to distinguish between random and pseudorandom sequences but a function that is created out of $S$ (using in particular lots of restrictions of the random $y(\tau)$) in a not necessarily uniform way. So even if $S$ can be computed in polynomial time, it does not follow that there is an algorithm (as opposed to circuit) that will break the generator in time $2^{O(n)}$.

### Can the “backwards” argument be given a similar foundation?

Recall that earlier I proposed a way of getting round the natural-proofs barrier and proceeded to argue that it almost certainly failed, for reasons very similar to the reasons for the natural-proofs barrier itself. The question I would like to consider here is whether that argument can be made to rely on the hardness of factorizing rather than on the hardness of a problem based on 3-bit scramblers that does not, as far as I know, connect to the main body of hardness assumptions that are traditionally made in theoretical computer science.

Here is an informal proposal for doing so. Let $n=\binom m2$, let $r, identify the points of $\{0,1\}^n$ with graphs on $m$ (labelled) vertices in some obvious way and let $f:\{0,1\}^n\to\{0,1\}$ take the value 1 if the corresponding graph contains a clique of size $r$ and 0 otherwise. Also, let $e_i$ be the function that is 1 if the $i$th edge belongs to the graph and 0 otherwise. Those functions are chosen so that the function $(f,e_1,\dots,e_n):\{0,1\}^n\to\{0,1\}^{n+1}$ is (trivially) an injection.

Now there is a concept in theoretical computer science called a pseudorandom permutation from $\{0,1\}^s$ to $\{0,1\}^s$. I’ll define it properly in a moment, but for now it’s enough to say that if you try to invent the definition for yourself, you’ll probably get it right. Roughly, however, it’s a permutation of $\{0,1\}^s$ that depends on a randomly chosen string in $\{0,1\}^k$ for some suitable $k$ and is hard to distinguish from a purely random permutation of $\{0,1\}^s$. Importantly, pseudorandom permutations exist if pseudorandom functions exist (I’ll discuss this too), as was shown by Luby and Rackoff.

So let’s compose the function $(f,e_1,\dots,e_n):\{0,1\}^n\to\{0,1\}^{n+1}$ with a pseudorandom permutation $\pi$, obtaining a function $\theta:\{0,1\}^n\to\{0,1\}^{n+1}$.

Actually, one thing you might not guess if you try to define a pseudorandom permutation is that the permutation and its inverse should both be efficiently computable functions. Because of that, if we are provided the $n+1$ values of $\theta(G)$ for a graph $G$, we can easily tell whether or not $G$ contains a clique of size $r$: we just compute $\pi^{-1}(\theta(G))$, which gives us $(f(G),e_1(G),\dots,e_n(G))$, and look at the first digit.

Now let’s suppose that we have a progress-towards-cliques property $I$ that takes the value 1 for a sequence of functions $u_1,\dots,u_{n+1}$ if, given $u_1(G),\dots,u_{n+1}(G)$ it is easy to determine whether $G$ contains a clique of size $m$, and suppose that $I$ does not apply to almost all sequences of functions. That is, let us suppose that if $(v_1,\dots,v_{n+1})$ is a purely random sequence of functions (subject to the condition that the resulting function to $\{0,1\}^{n+1}$ is an injection) then the probability that $(v_1,\dots,v_{n+1})$ satisfies $I$ is at most $1-2^{-O(n)}$.

Next, suppose that we have a permutation $\pi:\{0,1\}^{n+1}\to\{0,1\}^{n+1}$ and we want to guess whether it has been chosen randomly or pseudorandomly. Composing it with the function $(f,e_1,\dots,e_n)$ and applying $I$ to the resulting function, we get 1 if $\pi$ has been chosen pseudorandomly. Note that to do this calculation we need at most $2^n$ steps to calculate $(f,e_1,\dots,e_n)$ and, if $I$ is polynomial-time computable, at most $2^{Cn}$ steps to determine whether $I$ holds. So if $\pi$ has hardness at least $2^{(C+1)n}$, it follows that the probability that a random injection $g:\{0,1\}^n\to\{0,1\}^{n+1}$ yields functions $g_1,\dots,g_{n+1}$ that satisfy $I$ is at least $1-2^{-(C+1)n}$. For sufficiently large $C$, this is a contradiction.

I’m fairly confident that I can make the above argument precise and rigorous, but it may be a known result, or folklore, or uninteresting for some reason I haven’t noticed. If anyone who knows what they are talking about thinks it’s worth my turning it into a note and at least putting it on the arXiv, then I’m ready to do that, but perhaps it is better left in its current slightly informal state.

### 10 Responses to “Razborov and Rudich’s natural proofs argument”

1. Razborov and Rudich’s natural proofs argument (about P vs NP) « Boardmad Says:

[…] source: Razborov and Rudich’s natural proofs argument (about P vs NP) […]

2. Razborov and Rudich’s natural proofs argument (about P vs NP) | Rocketboom Says:

[…] Razborov and Rudich’s natural proofs argument (about P vs NP) Source: http://gowers.wordpress.com/2013/10/07/razborov-and-rudichs-natural-proofs-argument/ 0    […]

3. Razborov and Rudich’s natural proofs argument (about P vs NP) | Enjoying The Moment Says:

[…] via Hacker News http://gowers.wordpress.com/2013/10/07/razborov-and-rudichs-natural-proofs-argument/ […]

4. vznvzn Says:

still “puzzling” over your scrambler construction. again, great topic, glad you’re returning to it. a few ideas.
- isnt this staring us in the face? could it be that the scramber composition is itself actually a complexity measuring function exactly in the sense of the razborov-rudich paper? ie the amount of 3-bit scrambling required to compute a function is a measure of its complexity?
- suppose that it was a complexity measuring function, what would a proof look like that it either fits into the natural proof framework and therefore cannot prove P!=NP, or does not, ie could potentially prove P!=NP?
- think the lessons of the paper are being misconstrued. could it possibly be that there are many nearly equivalent complexity-measuring functions, its just that its so hard to apply them to problems/proofs?
- my pov, razborov himself gave a complexity-measuring function mechanism/framework in his famous separation result demonstrating exponential size circuit lower bound for monotone clique function!
- where is the proof that it cannot be generalized? yes, have heard of his paper ruling that out, but who has carefully walked through and analyzed that proof? imho, its undeservedly obscure, a critical pivot, and likely does not really rule out a construction that could work for monotone circuits….
- could it actually be easy to create complexity-measuring functions that dont fall into the natural proofs framework? could such constructions already exist? but maybe the problem is that not a single one has been proven to be one, but as soon as a single one is discovered/proved, many will be found to be equivalent?
- have myself come up with what I suspect is a complexity-measuring function based very closely/directly on razborovs proof exponential monotone lower bound proof [by approximations]…. the details of such a proof are tricky…. looking for volunteer/collaborators…. unfortunately, frankly, seriously, after quite a few months searching, there are likely only a few ppl in the world qualified, capable, or motivated [if any]….
outline for a NP vs P/poly proof based on monotone circuits, hypergraphs, factoring, and slice functions

5. vznvzn Says:

another thought. the natural proofs paper looks at a “natural” property of functions that can separate P/NP. however, think a little about this. how can the property be measured? it must be essentially an algorithm taking some kind of input, ie a function. algorithms are TMs, so say it is $M(x)$. but again the function is not for a fixed input width, therefore likely it would have to be specified again in some kind of encoding; the only “natural” encoding for such an input would be again an encoded algorithm, eg a TM specification/encoding of a TM $m'(y)$, say $x_{m'}$. so then the problem is, does there exist a machine $M(x_{m'})$ that can determine whether (iff) $m'(y)$ runs in P time. but that is undecidable by Rices thm!

therefore, maybe the way to go is not to look for a binary property determining separation between P/NP as the RR proof is focused on. maybe the only “complexity measuring functions” that can possibly succeed are continuous measurements, and even a “magic threshhold” for such measurements must be impossible. such a complexity measuring function is, “how many gates are required to compute this function”. that measuring function seems to be nearly impossible to work/compute directly. therefore are there analogs of that which exist which can be computed? Razborov himself has shown the answer to this is yes in his own exponential monotone circuit proof (for the clique function)….

6. vznvzn Says:

wild synchronicity, a bit.. spooky action at a distance? look what just showed up on arxiv: A quantum lower bound for distinguishing random functions from
random permutations
by Yuen…. formally, seems rather similar to what you’re researching [except for the QM angle twist!]…. so maybe some of the cutting edge work in this area is in qm computation.
would anyone else be interested in further cyber discussion/a stackexchange chat room study group on these subjects of foremost significance? the rooms work well for ongoing/longterm discussions …. plz reply on my blog if interested

7. Bhupinder Singh Anand Says:

Perhaps we need to consider the problem of separating the classes P and NP logically from a finitary arithmetical—rather than computational—perspective.

Specifically, one which is based on explicitly highlighting that the assignment of satisfaction and truth values to number-theoretic formulas under an interpretation can be constructively defined in two distinctly different ways: (a) in terms of algorithmic verifiability, and (b) in terms of algorithmic computability.

From this perspective, standard interpretations of the formal definitions of the classes P and NP can be seen to implicitly assume that every propositional formula which is algorithmically verifiable is necessarily algorithmically computable; which implies that the differentiation between the classes P and NP is only quantitative, and can therefore be adequately expressed in terms of computational complexity.

This could be why such interpretations seem dead-ended, since we can argue that there are classically defined arithmetic formulas which are algorithmically verifiable but not algorithmically computable; and so the differentiation between the classes P and NP is also qualitative, and cannot be adequately expressed in terms of only computational complexity.

Specifically, we can show how Gödel’s $\beta$-function uniquely corresponds some real numbers to algorithmically verifiable arithmetical formulas that are not algorithmically computable.

If we could locate one such formula, it would be a genuinely random, and not pseudo-random, number generator!

8. Bhupinder Singh Anand Says:

Perhaps we need to consider the problem of separating the classes P and NP logically from a finitary arithmetical perspective, which is based on explicitly highlighting that the assignment of satisfaction and truth values to number-theoretic formulas under an interpretation can be constructively defined in two distinctly different ways: (a) in terms of algorithmic verifiability, and (b) in terms of algorithmic computability.

From this perspective, standard interpretations of the formal definitions of the classes P and NP can be seen to implicitly assume that every propositional formula which is algorithmically verifiable is necessarily algorithmically computable; whence the differentiation between the classes P and NP is only quantitative, and can therefore be adequately expressed in terms of computational complexity.

This could be why such interpretations seem dead-ended, since we can argue that there are classically defined arithmetic formulas which are algorithmically verifiable but not algorithmically computable; and so the differentiation between the classes P and NP is also qualitative, and cannot be adequately expressed in terms of only computational complexity.

Specifically, we can show how Gödel’s $\beta$-function uniquely corresponds some real numbers to algorithmically verifiable arithmetical formulas that are not algorithmically computable.

If we could locate one such, it would therefore be a genuinely random, and not pseudo-random, number generator!

9. vznvzn Says:

another interesting, apparently-open question, somewhat in the direction of TChows work: ~2 decades after natural proofs, are there proofs in complexity theory that are distinctly not natural proofs? none seem to have been identified anywhere, but maybe its not that they are nonexistent but just that they’re hard to identify…. could there be some way to recast the “natural proof” framework that makes “un”natural proof structures easier to spot/recognize/analyze? personally I find the probabilistic statement a bit awkward and suggest there may be a somewhat more “natural” way to refer to the problem in terms of “errors” (mismatches) between two functions using the hamming distance concept.

10. The PvNP Separation Problem | Foundations of Mathematics, Logic & Computability Says:

[…] to 0: I am not alone! See for instance this blogpost of Timothy Gower’s representing one extreme; and this plaintive appeal at the […]