## Injections, surjections and all that

My spies tell me that the second lecture of the Group Theory course contained a discussion of functions, and in particular bijections — for which it was necessary to prove a few results about injections and surjections. Since a good understanding of functions is essential throughout mathematics, perhaps a post on the topic would be in order. My aim in these posts is not to cover the material all over again, but rather to stress the points that you need to understand in order to be able to write proofs. So let me give a few thoughts about functions, in no particular order.

1. The range and domain matter.

Two functions are considered equal if they have the same domain and the same range, and if they do the same thing to everything in the domain. In other words, there is more to a function than what it does. For example, let’s write $\mathbb{R}$ for the set of real numbers and $\mathbb{R}_+$ for the set of non-negative real numbers, and let’s define two functions as follows.

• $f:\mathbb{R}\to\mathbb{R}$ and $f(x)=x^2$ for every $x\in\mathbb{R}$.
• $g:\mathbb{R}_+\to\mathbb{R}_+$ and $g(x)=x^2$ for every $x\in\mathbb{R}_+$.
• In a certain sense, these two functions are defined by the same mathematical process — that of taking a real number and squaring it. But the process is not a function. They are different functions because they have different domains and ranges.

Why do we insist on this point? One pretty convincing reason is that if we didn’t, then we wouldn’t be able to talk about injections, surjections and bijections. For example, the function $g$ I have just defined is a bijection, whereas the function $f$ that I defined before it was neither an injection nor a surjection.

2. Learn the definition of injection. Don’t try to reconstruct it for yourself.

I’ve lost count of the number of times I have seen attempted explanations like this for what it means to say that $f$ is an injection.

(i) $f(x)$ is always unique.

(ii) $f$ takes exactly one value for every $x$.

(iii) If $x=y$ then $f(x)=f(y)$.

Those three attempted definitions are all completely wrong. Here are four correct ones. For all three I’ll assume that $f$ is a function with domain $A$ and range $B$.

(i) Every $y\in B$ has at most one preimage. [A preimage of $y$ means an element $x\in A$ such that $f(x)=y$.]

(ii) For every $x,x'\in A$, if $f(x)=f(x')$, then $x=x'$.

(iii) For each $y\in B$ there is at most one $x\in A$ such that $f(x)=y$.

(iv) For every $x,x'\in A$, if $x\ne x'$, then $f(x)\ne f(x')$.

Of these, I recommend avoiding (iii). It’s correct, but it seems to be the definition that leads to the kind of nonsense of the first three definitions if you misremember it. Note that (ii) and (iv) are closely related: to convert (ii) into (iv) I simply replaced the statement

• $f(x)=f(x')\implies x=x'$
• inside the quantifier by its contrapositive

• $x\ne x'\implies f(x)\ne f(x')$.
• I recommend usually avoiding (iv) as well. Why? Because it is usually easier to work with the “positive” hypothesis that $f(x)=f(x')$ than it is to work with the “negative” hypothesis that $x\ne x'$.

3. How to prove that a function is an injection.

If you are asked to show that a function $f:A\to B$ is an injection, then my advice is to use definition (ii) above, by which I of course mean the correct definition (ii) rather than the incorrect one. Here’s the definition again.

• For every $x,x'\in A$, if $f(x)=f(x')$, then $x=x'$.
• How do we use that? Well, it involves a two-step process that you should program into your brain until you do it without thinking. If you ever find yourself asked to prove that a function $f:A\to B$ is an injection, you first bring the definition above into the front of your mind. In order to be able to do that, you have to have learnt the definition first. A major theme of these posts will be that you can program yourself to do a lot of things automatically, but in order to do so you have to invest some time learning definitions and basic results.

Once you have recalled the definition, the second step is very easy indeed, and is something that I covered in the post about handling variables. If you have to prove a statement of the form

• For every $x\in X\ \ P(x)$
• then you write down the words, “Let $x\in X$“. I call this the “let” trick. A great thing it does for you is remove one layer of quantification: if you find lots of quantifiers scary, then removing one can only be good news.

In most situations that crop up, the statement $P(x)$ is itself of the form $Q(x)\implies R(x)$, and then you can do a little more while still on autopilot. If you need to prove the statement

• For every $x\in X\ \ Q(x)\implies R(x)$
• then the first words you write should be

• Let $x\in X$ and suppose that $Q(x)$.
• or perhaps

• Let $x$ be an element of $X$ such that $Q(x)$.
• At any rate, you should declare the variable $x$ and then tell your reader that you are assuming $Q(x)$.

Let us see how that applies here. The statement we want to prove is as follows.

• For every $x,x'\in A$, if $f(x)=f(x')$, then $x=x'$.
• Applying the “let” trick, we begin as follows.

• Let $x$ and $x'$ be elements of $A$
• We then see that we want to show that one thing implies another. So we follow up with what one might call the “suppose” trick, and end up with this.

• Let $x$ and $x'$ be elements of $A$, and suppose that $f(x)=f(x')$.
• Up to this point, no thought whatsoever has been required. So if you have any difficulty before getting to this stage, it is an excellent example of what I called a “fake difficulty” in an earlier post. The only reasons you might have for finding it difficult are

(a) that you haven’t learnt the definition of an injection;

(b) that you haven’t programmed yourself to apply the “let” and “suppose” tricks automatically.

Neither of those counts as a legitimate difficulty.

Let me show this process in action. You were shown the proof that a composition of two injections is an injection. This is a classic example of a result with a proof that you should try not to learn. Of course, you need to know the proof, but instead of learning it, you should try to find it easy. If you don’t already find it easy, let me show you just how little you have to think of if you follow the advice I have just given.

First, we ought to write out more precisely the result we are trying to prove, making clear what the domains and ranges are, declaring all our variables, and so on.

Proposition. Let $A$, $B$ and $C$ be sets, let $f:A\to B$ and let $g:B\to C$. Suppose that $f$ and $g$ are injections. Then their composition $g\circ f$ is an injection.

OK, let’s get started with the proof. What is that we are trying to prove? Oh yes: that the function $g\circ f$ is an injection. And what does that mean? If you’ve got the definition at your fingertips you’ll remember instantly that we need to show that for every $x,x'\in A$, if $g\circ f(x)=g\circ f(x')$, then $x=x'$. Applying the “let” and “suppose” tricks, we write this:

• Let $x$ and $x'$ be elements of $A$ and suppose that $g\circ f(x)=g\circ f(x')$.
• Now there’s something else that is good to do in a situation like this, which is to write the expressions $g\circ f(x)$ and $g\circ f(x')$ in a more transparent way. We define the value of $g\circ f$ at $x$ to be $g(f(x))$, so why don’t we write it like that, so we can see more easily what is being said? In fact, let’s rewrite the sentence I’ve just written, so that it reads as follows.

• Let $x$ and $x'$ be elements of $A$ and suppose that $g(f(x))=g(f(x'))$.
• How did I know that it would be a good idea to rewrite the sentence like that? Well, I have to admit that it was a matter of experience, but there are a few situations where this kind of rewriting is almost always a good idea. This is one of them. Another is when you have a sentence of the form $x\in E$, where $E$ is a set that has been defined in terms of some property. For example, the closed interval $[a,b]$ is defined as the set of all real numbers $x$ such that $a\leq x\leq b$. If you see the sentence $x\in [a,b]$, you will usually make life easier if you replace it by the equivalent sentence $a\leq x\leq b$. As you go through the course, you should make a little collection of useful rewritings of this kind so that you do them without thinking. I’ll do what I can to get you started.

Right, where were we? We had written this.

• Let $x$ and $x'$ be elements of $A$ and suppose that $g(f(x))=g(f(x'))$.
• Our aim at this point is to prove that $x=x'$. (If you don’t find it obvious that that is our aim, then you should go back and reread this section.) Why should $x$ and $x'$ be equal? It’s not clear, but then why would it be clear? After all, we haven’t used the hypotheses yet!

Our hypotheses were that $f$ is an injection and that $g$ is an injection. If you have the definition of injection at your fingertips, then you will know that this tells you the following.

• Whenever $x,x'\in A$ and $f(x)=f(x')$, it must be the case that $x=x'$.
• Whenever $y,y'\in B$ and $g(y)=g(y')$, it must be the case that $y=y'$.
• The first of these statements looks very promising, since the conclusion we are aiming for is precisely that two elements of $A$ are equal. But to use this statement we need to establish that $f(x)=f(x')$, and all we have so far is that $g(f(x))=g(f(x'))$. So we’re not in a position to use the first hypothesis yet.

What about the second? for this we need to find two elements $y$ and $y'$ of $B$ such that $g(y)=g(y')$. We know that $g(f(x))=g(f(x'))$. Does that give us our $y$ and $y'$?

Yes it does. Since $f$ takes elements of $A$ to elements of $B$, we know that $f(x)$ and $f(x')$ are elements of $B$. So it would be insane not to see what happens if we set $y=f(x)$ and $y'=f(x')$. If we do so, then we find that $g(y)=g(y')$ and therefore that $y=y'$ (because $g$ is an injection). Remembering what $y$ and $y'$ were, we realize that we have established that $f(x)=f(x')$. But that was what we were looking for in order to use the fact that $f$ is an injection, so now we find that $x=x'$, as we wanted.

Now that was a rather long account of what might happen in your head, but the proof itself is very short. Here it is in full.

• Let $x$ and $x'$ be elements of $A$ and suppose that $g(f(x))=g(f(x'))$. Since $g$ is an injection, it follows that $f(x)=f(x')$. Since $f$ is an injection, it follows that $x=x'$. This proves that $g\circ f$ is an injection.
• Note that in that proof I did not write out the definition of an injection. I just had it in my head and I assumed that the reader did too. I also assumed that the reader was happy with the “let” and “suppose” tricks.

Note also that, as the above proof makes very clear, one can think of “$f$ is an injection” as a kind of cancellation law. It says that whenever you write something of the form $f(x)=f(x')$ you can “cancel” the $f$s on both sides to get $x=x'$. So the phrase “since $f$ is an injection” can be read as “I’m allowed to cancel the $f$s — honest.”

Here is a second example. If you have understood the above, then you might like to try it yourself before reading on: you should find it easy. Recall that a function $f:\mathbb{R}\to\mathbb{R}$ is strictly increasing if $f(x) whenever $x.

Exercise. Let $f:\mathbb{R}\to\mathbb{R}$ be a strictly increasing function. Prove that $f$ is an injection.

The point of this exercise is to write out the kind of short, crisp, utterly rigorous and non-wordy argument of the kind that I used to prove that a composition of injections is an injection, using the kinds of proof-generating techniques I have been talking about. Merely satisfying yourself that the statement is “clearly true” is not enough. I’d almost go as far as to say that if you have to stop and think why the result is true (rather than just proceeding on autopilot and never getting stuck), then you haven’t done it properly. While that’s a slight exaggeration, I am at least happy to say that thought is not needed, and even that there is a danger that too much thought will lead you astray.

OK, let’s try operating on autopilot. We are trying to show that our strictly increasing function $f$ is an injection. So we write out the following first line without thinking.

• Let $x$ and $y$ be real numbers such that $f(x)=f(y)$.
• Hmm … we seem to be in trouble, because there isn’t an obvious way of using the fact that $f$ is increasing. Or rather, there is, but the argument is more like one by contradiction: we want to say to ourselves that the fact that $f(x)=f(y)$ tells us that $x$ can’t be less than $y$ (or else $f(x)$ would be less than $f(y)$) and $y$ can’t be less than $x$ (or else $f(y)$ would be less than $f(x)$). So the only possibility left is that $x=y$.

But the fact that we’ve ended up arguing like that suggests that we would have been better off going for the contrapositive in the first place. That is, we could take definition (iv) as the one we are using and start the argument as follows.

• Let $x$ and $y$ be real numbers with $x\ne y$.
• How do we use the fact that $f$ is strictly increasing? Well, if $x\ne y$ then either $x or $y. That gives us our way in. So the next line of the proof is as follows.

• Then either $x or $y.
• Now recall what I said in the post on AND and OR: if you want to use a statement with an OR in the middle, then you have to split into cases and show that the conclusion holds in all cases. So that’s what we do next.

• Since $f$ is strictly increasing, if $x then $f(x), and if $y< x$ then $f(y). Either way, $f(x)\ne f(y)$. Therefore, $f$ is an injection.
• I think that is the most natural proof of this statement, but just out of interest, here’s a slightly different way of doing it. It’s not all that different really — it just shifts the taking of the contrapositive to an earlier stage and packages it up as a lemma. (A lemma, by the way, is a statement that you establish separately with a view to using it to prove other statements.)

Lemma. Let $f$ be a strictly increasing function and let $x$ and $y$ be real numbers. If $f(x)\leq f(y)$, then $x\leq y$.

Proof. This is simply the contrapositive of the statement that if $y then $f(y).

With that lemma to hand, we can now argue as follows.

Proposition. Let $f$ be a strictly increasing function from $\mathbb{R}$ to $\mathbb{R}$. Then $f$ is an injection.

Proof. Let $x$ and $y$ be real numbers with $f(x)=f(y)$. Then $f(x)\leq f(y)$ and $f(y)\leq f(x)$. By the lemma, it follows that $x\leq y$ and $y\leq x$. Therefore, $x=y$. $\square$

Notice that once I had packaged up the contrapositive-taking inside the lemma, I was able to use the “let” and “suppose” tricks with Definition (ii) rather than Definition (iv).

4. How to prove that a function is a surjection.

Suppose, then, that we are given a function $f:A\to B$ and are asked to prove that it is a surjection. How should the proof begin? Once again, we should bring the relevant definition to the front of our mind, which is as follows:

• A function $f:A\to B$ is a surjection if for every $y\in B$ there exists $x\in A$ such that $f(x)=y$.
• Since what we are trying to prove begins, “For every $y\in B$,” we can apply the “let” trick. Therefore, the first line of the proof should be, “Let $y\in B$.” Once we’ve written that, our task is to prove that there is some $x\in A$ such that $f(x)=y$. That is, we are trying to solve an equation.

Up to now I may have given the impression that I don’t believe that there is such a thing as genuine mathematical difficulty. But that is not at all what I think. There are many apparent difficulties that shouldn’t be difficulties at all, but as a general rule, finding things is where the going starts to get tough. Sometimes that isn’t true and those things are handed to you on a plate, but sometimes very sophisticated mathematics is needed. Here we have an arbitrary $y$ and we need to show that there is some $x$ such that $f(x)=y$. Let me give a few examples, some where it is easy and some where it is hard.

Example 1. Let $f:\mathbb{R}\to\mathbb{R}$ be given by the formula $f(x)=ax+b$, where $a$ and $b$ are real numbers and $a\ne 0$. Prove that $f$ is a surjection.

Solution: Let $y\in\mathbb{R}$. Then $f((y-b)/a)=y$. The result follows.

To come up with that proof I solved the equation $ax+b=y$ for $x$.

Example 2. Let $f:\mathbb{R}\to\mathbb{R}$ be given by the formula $f(x)=x^5+x$. Prove that $f$ is a surjection.

Solution: Let $y\in\mathbb{R}$. If $x>|y|$, then $x^5+x>|y|\geq y$, and if $x<-|y|$, then $x^5+x<-|y|\leq y$. Therefore, we can find $x_1 such that $f(x_1). Since $f$ is continuous, there must be some $u$ between $x_1$ and $x_2$ such that $f(u)=y$.

Here I used the concept of continuity and a result called the intermediate value theorem: you will cover these in Analysis I next term. I used that result because I don’t know how to solve the equation $x^5+x=y$ (and neither do you). So I contented myself with proving that there exists a solution.

Example 3. A composition of two surjections is a surjection.

Solution: [Obtaining this argument is an entirely thought-free process. First we set up the problem properly by giving names to things.] Let $A$, $B$ and $C$ be sets and let $f:A\to B$ and $g:B\to C$ be surjections. [Next, we apply the “let” trick to the statement we are trying to prove, which is that $g\circ f$ is a surjection, which is the statement that for every $z\in C$ there exists $x\in A$ such that $g\circ f(x)=z$ — which we rewrite as $g(f(x))=z$.] Let $z\in C$. [We can’t immediately see an $x$ that will do, so let’s turn to the hypotheses that $f$ is a surjection and that $g$ is a surjection. The first one doesn’t help us much but the second does.] Since $g$ is a surjection there exists $y\in B$ such that $g(y)=z$. [Now we’ve got a point in $B$ so we can use the hypothesis that $f$ is a surjection.] Since $f$ is a surjection, there exists $x$ such that $f(x)=y$. [Note that I went from “there exists $y$” to behaving as though I had fixed $y$, without actually saying that that was what I was doing. See the post on handling variables for a discussion of this. As for the proof we are trying to write, it’s basically over.] But then $g(f(x))=g(y)=z$. Therefore, $g\circ f$ is a surjection.

I mentioned that one way of thinking of the hypothesis that $f$ is a injection is that it tells you that you can cancel $f$ in any equation of the form $f(x)=f(x')$. You can think of the hypothesis that $f$ is a surjection as saying that the equation $f(x)=y$ always has at least one solution.

Example 4. The closed unit disc $D$ is the set of all points $(x,y)\in\mathbb{R}^2$ such that $x^2+y^2\leq 1$. Let $f$ be a continuous function from $D$ to $D$ such that $f(x,y)=(x,y)$ whenever $x^2+y^2=1.$ (That is, $f$ “fixes the boundary”.) Prove that $f$ is a surjection.

Solution: This is meant to illustrate that proving that a function is a surjection sometimes involves maths that is well beyond what you have covered so far. It relies on a result called Brouwer’s fixed point theorem. That result says that every continuous function from $D$ to $D$ has a fixed point. That is, if $g:D\to D$ is continuous, then there must be some $(x,y)\in D$ such that $g(x,y)=(x,y)$.

We make the decision to prove the result by contradiction and Brouwer’s theorem. That is, we shall assume that there exists a continuous function $f:D\to D$ that fixes the boundary and is not a surjection, and we shall attempt to prove, from that, that there is a continuous function from $D$ to $D$ with no fixed point, which will contradict Brouwer’s fixed point theorem. (Of course, $f$ isn’t such a function, since every point on the boundary of $D$ is a fixed point of $f$.)

Now, we are assuming that $f$ is not a surjection, so we need to flex our logical muscles and negate a statement with quantifiers. The statement is this.

• For every $y\in D$ there exists $x\in D$ such that $f(x)=y$.
• Note that I’ve switched to representing points by single letters rather than giving them in coordinate form. So in the above statement and from now on $x$ and $y$ are points in the circle rather than coordinates of points.

Following the mechanical rules set out in the post on negation, we obtain the negation, which is this.

• There exists $y\in D$ such that for every $x\in D$, $f(x)\ne y$.
• Now we want to define a continuous function with no fixed point. At this point we are trying to find or make something, so if we run into difficulties, we at least have the comfort of knowing that they are not fake difficulties. I happen to know how this one goes because at some point in the past I was shown it …

The idea is this. We shall do things in two stages. The first stage is to define something called a continuous retraction from $D$ to its boundary. That means a continuous function $h$ from $D$ to the boundary of $D$ that sends each boundary point to itself. (If you’re worried that such a function can’t exist, then don’t, because you’re right that it can’t. We’re writing a proof by contradiction, and that means that we’re in nonsense world.) For each $x\in D$ we need to say what boundary point $h(x)$ is. To do that, we take the point $y$ given above (by the negation of the statement that $f$ is a surjection) and draw a straight line segment from $y$ to $f(x)$. Note that our hypothesis tells us that $y$ and $f(x)$ are different points. We then continue along that line until we hit the boundary of $D$ and let $h(x)$ be the point where we hit.

If you know the right bits of maths (which you don’t), then it isn’t hard to show that $h$ is continuous. Also, if $x$ is on the boundary, then $f(x)=x$ (by hypothesis), from which it follows that $h(x)=x$ (by the way we defined $h(x)$). So we’ve got our continuous retraction.

To create a continuous function with no fixed point, we just rotate the continuous retraction. That is, we just define $g(x)$ to be what you get if you rotate $h(x)$ through, say, 90 degrees anticlockwise about the origin. Then the only way $g(x)$ could conceivably equal $x$ is if $x$ is itself on the boundary. But then $g(x)$ isn’t $x$, since it’s a rotation of $x$.

Two important contrasts.

Concrete problems versus abstract problems.

Amongst the surjection proofs I’ve just discussed, there are two very different kinds. One is where I have a specific function, such as $f(x)=x^5+x$ (from $\mathbb{R}$ to $\mathbb{R}$) and I show that it is a surjection. This I call a concrete problem. The other is where I have an unknown function about which I am given some information, and my task is to deduce from that information that it is a surjection. This I call an abstract problem.

It might be better to say that there is a spectrum with fully concrete at one end and fully abstract at the other. For example, when I proved that the function $f(x)=ax+b$ was a surjection, that felt pretty concrete, but I didn’t actually know what $a$ and $b$ were. I could have rephrased the problem to sound more abstract, by saying, “Let $f:\mathbb{R}\to\mathbb{R}$ be a linear function with gradient not equal to zero. Prove that $f$ is a surjection.”

Finding things versus proving that they exist.

I’ve already drawn attention to this. For the first problem above, I actually found $x$ such that $f(x)=y$. For all the other problems, I proved that such an $x$ exists without actually finding one.

Graphs of injections and surjections.

I have encouraged you not to think when starting to write proofs that involve injections and surjections. This may go against the grain a little: surely one should be trying to understand injections and surjections rather than mechanically writing proofs without having the faintest idea what they mean.

My actual view about this is, not surprisingly, that both understanding and mechanical fluency are important. I am stressing the mechanical fluency side of things because my experience supervising suggests that where people run into problems it is the mechanical side that needs to be improved: the picture in their brains isn’t too bad, but they have trouble turning observations about this picture into a correctly written proof.

Nevertheless, it is worth saying just a little bit about how to visualize injections and surjections, at least in the case of functions from $\mathbb{R}$ to $\mathbb{R}$. (Once you understand these, it should be reasonably easy to transfer your understanding to more general functions.)

The way to do this is to think about graphs. Let’s start with a very basic question: which subsets of the plane $\mathbb{R}^2$ are graphs of functions from $\mathbb{R}$ to $\mathbb{R}$? Well, the point of a graph is that for each $x$ coordinate there is exactly one $y$ coordinate such that $(x,y)$ belongs to the graph. If the function in question is $f$, then we say that $y=f(x)$. (Usually we think about this the other way round: we start with the function and draw the graph. Here I’m thinking about the graph and saying what the function is that it is a graph of.)

In visual terms, what this is saying is that a subset $X$ of $\mathbb{R}^2$ is the graph of a function if and only if it satisfies the following condition.

• Every vertical line intersects $X$ in exactly one point.
• Note that this is really saying two things: each vertical line must intersect $X$ in at least one point (or we wouldn’t have a value for $f(x)$) and also at most one point (or we would have more than one value — which isn’t allowed).

Now let’s think about injections. For the purposes of this discussion it will be convenient to use the following definition.

• A function $f:A\to B$ is an injection if for every $b\in B$ there is at most one $a\in A$ such that $f(a)=b$.
• If we’re given the graph of a function $f:\mathbb{R}\to\mathbb{R}$, how do we find all the solutions to the equation $f(x)=b$ for some fixed $b$? We draw the horizontal line $y=b$, and at each point where it cuts the graph we have a solution. Why? Because the line $y=b$ will cut the graph at a point of the form $(x,b)$, and since the graph is of the function $f$, that tells us that $f(x)=b$.

Therefore, if we want there to be at most one solution, that tells us that the line $y=b$ cuts the graph in at most one place. That gives us the following criterion for the graph of a function to be the graph of an injection.

• Let $X$ be the graph of a function $f:\mathbb{R}\to\mathbb{R}$. Then $f$ is an injection if and only if no horizontal line cuts $X$ in more than one place.
• A very similar line of reasoning leads to the following companion criterion for a function to be a surjection.

• Let $X$ be the graph of a function $f:\mathbb{R}\to\mathbb{R}$. Then $f$ is an surjection if and only if every horizontal line cuts $X$.
• Putting those two together, we get a criterion for a function to be a bijection.

• Let $X$ be the graph of a function $f:\mathbb{R}\to\mathbb{R}$. Then $f$ is a bijection if and only if every horizontal line cuts $X$ in exactly one place.
• A final remark is this. If a function $f:\mathbb{R}\to\mathbb{R}$ has an inverse, then the graph of the inverse is obtained from the graph of $f$ by reflecting it in the line $y=x$, as you have probably seen at A level. But if you reflect the graph of a function, you don’t necessarily get the graph of a function. When do you get the graph of a function? Well, after the reflection you need every vertical line to cut your set in exactly one place, so before the reflection you need every horizontal line to cut the set in exactly one place. We can split that up into two conditions: cutting in at least one place and cutting in at most one place. That generates the definitions of surjection and injection. So if you ever stop to wonder where those definitions come from and why they are important, the answer is that they come from a consideration of what we need if we want to invert a function.

How to prove that a function is a bijection.

Do I need a section here? Surely if I’ve discussed how to show that a function is an injection and I’ve discussed how to show that a function is a surjection, then that’s all that’s needed. If you want to show that a function is a bijection then you have to show that it is an injection and that it is a surjection. Right?

Wrong. Suppose you were asked to prove that the function $f(x)=3x+5$ is a bijection from $\mathbb{R}$ to $\mathbb{R}$. Would your argument look like this?

First let us show that $f$ is an injection. Suppose that $3x+5=3x'+5$. Subtracting 5 from both sides and then dividing both sides by 3 we find that $x=x'$. Now let us show that $f$ is a surjection. Let $y\in\mathbb{R}$. Then $f((y-5)/3)=y$. We have shown that $f$ is an injection and a surjection. It follows that $f$ is a bijection.

Perhaps it would. But then you have not yet picked up a very important mathematical principle, which is this.

• If you are ever told that P is equivalent to Q, then from that moment on, if you are asked to prove P you have the option of proving Q instead. Moreover, proving Q is often easier.
• Do we have any statement equivalent to “$f$ is a bijection”? Yes we do. A function is a bijection if and only if it has an inverse. Let’s see what happens if we try to use this. We’re trying to show that the function $f(x)=3x+5$ is a bijection. Is it easy to show that it has an inverse? Yes, the function $g(y)=(y-5)/3$ is the inverse. Therefore, $f$ is a bijection.

I’m not saying that you should always find an inverse if you want to show that a function is a bijection. What I am saying is that if it’s easy to find an inverse, then that will give you a better proof.

This tip holds not just for concrete problems (where you have to show that a specific function is a bijection) but also for more abstract ones (where you are given a function with certain properties and need to show that it is a bijection). And the general message that you shouldn’t always go back to the definition can be applied throughout mathematics. Indeed, so important is this principle that I plan to devote an entire post to it at some point.

There is one aspect of injections and surjections that I have not discussed here, which is how to show that a function is not an injection or a surjection. I hope to discuss that as part of a more general post on counterexamples, but let me end with a proof that the function $f(x)=x^2$ is not an injection or a surjection, when considered as a function from $\mathbb{R}$ to $\mathbb{R}$.

Proof that $f$ is not an injection: $f(1)=f(-1)$.

Proof that $f$ is not a surjection: there is no $x\in\mathbb{R}$ such that $x^2=-1$.

Note the total lack of waffle, and the fact that to prove my point I just needed one thing to go wrong in both cases.

### 45 Responses to “Injections, surjections and all that”

1. Terence Tao Says:

John H. Conway once observed that the common terminology “one-to-one” for injections is horribly misleading, as it suggests that the definition of an injection is “something that sends one input to one output” – which is instead the definition of a function, not an injective function. (The problem is that “one-to-one” is intended to be in contrast to “many-to-one”, but is often instead viewed by beginners as a contrast to “one-to-many”.) He proposed instead the more accurate terminology “two-to-two” (sending two distinct inputs to two distinct outputs). This hasn’t caught on, however (not surprising though, given the severe alliteration).

• gowers Says:

That’s interesting — it had never occurred to me that that might be a reason for some of the bogus definitions of “injection” that I have come across. I have always disliked the fact that one-to-one functions and one-to-one correspondences are completely different things. If you’re going to have three words for injections, surjections and bijections, a fairly minimal requirement, it seems to me, is not to choose two of them that sound almost the same. For a similar reason I find BCE and CE very much less satisfactory than the properly contrasting BC and AD. “Two-to-two” is great: I had to do a double take before seeing that it really was correct.

2. Terence Tao Says:

p.s. Thanks for popularising the quiz wiki pages. I’ve been meaning to draw attention to them at some point but never really got around to it. (And big thanks also to the anonymous maintainer and coder of the web site too!)

• plm Says:

Thank you for the quiz wiki, it is really great. I wish there were a community of a sort contributing new quizzes, on all kinds of mathematics topics.

I feel that one thing mathematics education lacks sorely is a supply of silly exercises. It is great to solve IMO or Putnam problems in a class or with friends, or to work through the most artful exercises in Bourbaki, Rudin, or Serre’s masterpieces. But we do not always have the required motivation/capacity for that, while we may be able to do easier things that are still useful.

In this respect AOPS’ Alcumus is great:
http://www.artofproblemsolving.com/Alcumus/Introduction.php

They update it with new problems but they lack resources.

As a pipe dream I can imagine people working on automated theorem-proving developing something like an exercise generator, with computer and human mathematicians asking each other questions, sharing capacities. Perhaps version 35 of Mathematica/Maple…

Actually it would be a great idea for the Wolfram Alpha people to make a quizz generator, with their curated databases they have an awesome groundwork. I guess that would bring them much traffic.

http://www.thatquiz.org/ is another place.

However I think the undergrad curriculum at least will perhaps most usefully be covered by relatively few handmade quizzes, like those Terence wrote.

• plm Says:

I think I was extremely partial when complaining about a lack of silly exercises. I had in mind rather advanced mathematics, end of undergraduate or begining graduate studies and beyond, and even with this restriction my remark is hard to justify.

I thought that quizzes like Terence’s could blend well, in the right form (like a web applet) and environment, with more abstract problems like in Rudin’s textbooks, and they could introduce and demistify the harder topics and serve as “comfort tasks” when motivation is low.

But implementing this, systematically interpolating between easy and hard questions looks very difficult, maybe because intermediate steps of useful results are relatively rarely useful in isolation. Perhaps the nature of hard/abstract problems lies in how they relate elementary facts between each other, not directly in the full meaning of those facts.

These are still not very satisfactory comments, sorry.

3. Hans Wehr Says:

Thank you very much for this excellent post! Here are three small mistakes that I found:

Just below the exercise of proving that all strictly increasing functions are injections you write’ “What If that’s a slight exaggeration, I am at least be happy to say that thought is not needed, and even that there is a danger that too much thought will lead you astray.” I am still trying to work out what this sentence means.

In the fifth paragraph of thought 4, on surjections – “There are many apparent difficulties that shouldn’t difficulties at all”, which should be “There are many apparent difficulties that shouldn’t be difficulties at all”

Under “Graphs of injections and surjections”:

Every vertical line intersect in exactly one point.

should be

Every vertical line intersects in exactly one point.

Thanks very much — sorted out now. As you may have guessed, that weird sentence arose from rewriting an earlier sentence and not finishing the job properly.

4. Anurag Bishnoi Says:

Thank you for these posts! They are really interesting and helpful 🙂

5. plm Says:

It seems to me that most rigorous definitions of functions encountered in undergraduate studies are set-theoretic: a relation with exactly one output for each input, a subset of the cartesian product.

I was going to say: “What you refer to in your section 1 is usually called codomain from my experience, not range, and it is rather a category theory concept. The range or image of f and g are the same, Im(f)=f(R)=g(R^+)=Im(g).”

But then I looked at wikipedia:
http://en.wikipedia.org/wiki/Range_(mathematics)
(The article is excellent.)

So ok, range may be codomain, but I like codomain more, because (speaking carelessly) in mathematical practice we never care about it except in category theory contexts. Personally, I use “image” and “codomain”.

To come back to my first remark codomain does not exist set-theoretically in any case (unless you define it like “the class of set which contain the image”, for what I see little use, or you define a function as a triple (“true function”, domain, codomain)), 2 functions are equal if they have the same elements, as sets. So when students think “mechanistically” they may have trouble using the concept of range.

• gowers Says:

I forgot to mention that the terminology surrounding functions isn’t standard, but I’ll be sure to do so when I talk about the confusions that can arise with notation like $f(A)$ and $f^{-1}(A)$.

I’m not sure I agree that we never care about codomains: if we ever refer to surjections or bijections, then we implicitly care what the codomain is.

A separate point is this: I am deliberately avoiding talking about the formal definition of a function. That’s because I don’t like the way formal definitions are presented as the essence of a concept. To put it another way, I don’t like the way we have to reify all our mathematical concepts as sets. I prefer to say something more like this: it doesn’t matter what a function is. What matters is that if $f:A\to B$ and $a$ is an element of $A$, then $f(a)$ is an element of $B$. I admit that this attitude is not completely standard, but I’ve been told that category theorists share my distaste for turning everything into sets.

• plm Says:

That is true, we usually or often care about codomain, I did not think enough before commenting.

I guess that what I had in mind was more caring about the difference between f:A->B and f:A->C, the same function but different arrows/edges/morphisms -in a category. What I wrote was silly, thanks for correcting me, and making an interesting point.

• gowers Says:

Something I should have thought of but didn’t was that we also care about codomains when we compose functions, since we want the codomain of the first function to match up with the domain of the second (where by that I mean the function you do first/second rather than the function you write first/second).

Of course, there are also circumstances where we don’t care nearly so much about the codomain. For example, if I say that the function $f(x)=x^2$ is continuous, I’m not too bothered whether the codomain is all of $\mathbb{R}$ or just the non-negative reals or perhaps something in between.

6. sisn Says:

What probably interesting to note is that bijectiv/surjectiv/injectiv functions do NOT always correspond to isomorphisms/epis/monos- this is often assumed, but wrong- for a counterexample construct a
bijection between to po-sets which is not isomorphic.
(something seems wrong in the above sentence but I can’t figure it out right now)
This correspondence is true only for categories with a “free object”.

So maybe it would be good to learn the definition of iso/epi/mono at the same time as surjective usw. because it would help avoid this error+ the definiton of epi/iso/mono are helpful for many proofs in algebra and the additional time required is probably very slow, and one learnes a powerful tool at the same time

• sisn Says:

Oh sorry for the many typos-
obviously I meant to write the additional time is very LITTEand not very slow… sorry for that.
Oh and great post- really looking forward to reading the second half.

7. Professor Gowers’s Blog – Mathematicians on Math | Girls' Angle Says:

[…] introduction, and then proceeds with a series of posts on basic logic. His latest post is about injections and surjections, which are fundamental concepts that describe […]

8. Lieven Says:

In the proof about strictly increasing functions, I was expecting “wlog x>y” instead of the repetition of two symmetrical cases. What do you think of that style?

• gowers Says:

I think it’s absolutely fine. I did things the more pedestrian way because I wanted to emphasize the fact that when you use an OR you need to split into cases.

9. Domains, codomains, ranges, images, preimages, inverse images « Gowers's Weblog Says:

[…] bijections are called one-to-one correspondences. This terminology is pretty confusing –see Terence Tao’s comment on one-to-one functions — but you probably have to learn it […]

10. theoremoftheweek Says:

I have a theory that some of the confusion about injections arises because people somehow say unique’ when they mean distinct’ or different’. I first noticed this last year; I don’t know whether my previous students hadn’t done this, or whether I just hadn’t diagnosed it. But I have seen a number of students say something like “f is an injection if it sends points in A to unique points in B”, and when I talk to them it becomes clear that they have the right idea in their heads, but have written unique’ when they shouldn’t have done.

I find this slightly confusing, because in my mind unique’ and distinct’ are clearly different things, but that might just be because I’ve been using these words in the way that mathematicians do for enough years that I can’t remember not doing so. So I have been trying to work out whether we might ever muddle these words in everyday life, and still can’t quite understand it, but it seems to be a difficulty for a number of students.

I have also seen this same difficulty arise in a context other than injective functions, although I can’t now quite remember exactly what that context was. It was something to do with a question that asked one to show that something was unique (possibly a function), under certain circumstances. I thought that the question was clear (not easy, but it was clear what the question was asking), but I can see that if one thought unique’ meant distinct’ then it didn’t make a lot of sense.

• gowers Says:

That’s very interesting, and amusingly related to Conway’s suggestion that one-to-one functions should be called two-to-two functions.

Another thought is that the word “distinctive” is quite like “unique” in meaning. In fact, if I say, “That plant has a distinct smell,” I am using the word “distinct” to mean something very close to “unique”. Perhaps that helps to explain the confusion. To put it another way, one could argue that “unique” means “distinct from everything else”, or something like that. (That’s NOT supposed to be a definition that will help mathematicians.)

11. theoremoftheweek Says:

Perhaps an assertion like Everyone has a unique fingerprint’ illustrates the potential difficulty. Does this mean Each person has just one fingerprint’, or does it mean `Different people have different fingerprints’?

12. kas Says:

I was referred to your site via my TA as I am currently in a mathematical reasoning class, and this post has helped me tremendously. The suggestion of memorizing the definitions of injection and surjection along with the “let and “suppose” tricks helped me to actually do some proofs whereas before I was struggling to understand why I was doing each step.

I wanted to take the time to thank you for the time you have so graciously given to write these entries! So thank you!

-kas 🙂

13. Two-place functions aren’t one-place functions, are they? | Logic Matters Says:

[…] early chapter of my Gödel book, and also in reading a couple of terrific blog posts by Tim Gowers (here and […]

14. An exercise from Halmos’s set theory book « riceissa Says:

[…] Our task is to show that It is probably a good idea to use the let trick. So let Our task now is to show that If we let be an arbitrary set, then since we also know […]

15. devatasol Says:

is the decomposition of function into injection and surjection a new idea? can u help us in pointing some references?

16. John Says:

Many of the correct definitions for surjection on this page have the form; [If A, then B]. This fails to rule out the possibility that [B and not A] is true. In the case of definition (ii) for instance, this is problematic. What we end up with, in particular, is the possibility that our function is of the following form; [f(a) does not equal f(b), and a=b], which implies a many valued function; a relation that is one-many. This is clearly not a surjection. I suppose my quesiton then, is why standard definitions of surjection do not employ the material biconditional e.g. [f(a) equals f(b) if and only if a=b]?

17. Greg Marks Says:

One very minor objection. Given a function $f$ from a set $A$ to a set $B$, you define a preimage of an element $b$ of $B$ as an element of $A$ mapped to $b$ by $f$. I think the standard definition of the preimage of an element $b$ of $B$ under $f$ is the set of elements of $A$ mapped to $b$ by $f$.

18. Optimization, Step 2 – geoff bourque's blog Says:

[…] in the domain. In other words, there is more to a function than what it does.” This is from Sir Tim Gowers’ blog again, but the examples are simple enough and appear on the Wikipedia article for codomain. So […]

19. How Not to Learn Cryptography – frinkcoin.tech Says:

[…] Injections, surjections, etc. […]

20. KUvR Says:

eLdj)((.(‘.))”

21. KUvR Says:

eLdj’) AND 5107=4837 AND (‘eqhU’=’eqhU

22. KUvR Says:

eLdj’) AND 2092=2092 AND (‘Xnwy’=’Xnwy

23. KUvR Says:

eLdj’) AND 8378=4343 AND (‘Xena’=’Xena

24. KUvR Says:

eLdj’ AND 9226=4686 AND ‘uKbr’=’uKbr

25. KUvR Says:

eLdj’ AND 2092=2092 AND ‘dKOd’=’dKOd

26. KUvR Says:

eLdj’ AND 7238=2218 AND ‘Dgwe’=’Dgwe

27. KUvR Says:

eLdj) AND 4809=9490 AND (1979=1979

28. KUvR Says:

eLdj) AND 2092=2092 AND (4871=4871

29. KUvR Says:

eLdj AND 5493=5602

30. KUvR Says:

eLdj AND 2092=2092

31. KUvR Says:

eLdj AND 3604=9066– xNpZ

32. KUvR Says:

eLdj AND 2092=2092– oMKB

33. KUvR Says:

(SELECT (CASE WHEN (8753=9026) THEN ‘eLdj’ ELSE (SELECT 9026 UNION SELECT 8092) END))

34. KUvR Says:

(SELECT (CASE WHEN (8411=8411) THEN ‘eLdj’ ELSE (SELECT 7885 UNION SELECT 7414) END))