If I were writing a textbook, I would have discussed the basics of functions before talking about injections and surjections, but this is not a textbook — it is a series of blog posts that provide a kind of commentary on some of the lecture courses. However, now that I have got on to the subject of functions, it probably makes sense to discuss them a bit more, especially as I hear that they have made an appearance in Numbers and Sets.
Let me start with the most basic question of all: what is a function? This is one of the first examples (of many, unfortunately) of a concept that you were probably reasonably happy with until your lecturer explained it to you. This is absolutely not a criticism of your lecturer (who is excellent, by the way). It’s more like a criticism of an entire mathematical tradition that goes back to the days when the foundations were laid for our subject in terms of set theory.
Let’s have some examples of functions. The ones you are likely to have come across are ones that take real numbers to real numbers: things like , or . If someone asked, “Yes, but what are and ?” then an appropriate response might be, “Well, is the result of doing something to that turns it into another real number.”
Notice that in that last sentence I slightly avoided the issue. I didn’t say what itself was — I just said what was. (It’s another real number.) Notice also that for a specific function we don’t feel quite as tempted to ask what it really is: we don’t say, “What is squared?” That’s because “squared” isn’t a noun, so it feels wrong to imagine that it must be a thing, just as you would expect strange looks if you went around asking, “What is and?” (That’s not to say that an answer isn’t possible: one could say that AND is a logical connective, and a logical connective is something that joins two statements to form a new statement, and it could, if you wanted, be regarded as a function from the set of all pairs of statements to the set of all statements, etc. etc.)
What really matters about a function is not so much its essence as the following fact.
Given a function , we can define a natural notion of the graph of that function. It is the set of all points such that . To put it another way, it is the set of all points such that . This set has the property (discussed in the previous post) that for every there is exactly one such that belongs to the graph of .
And now the officially correct thing to do is to turn everything on its head and make the following definition.
Then one goes on to say that it is traditional to write instead of . Equivalently, we define to be the unique such that .
Why does anyone bother with this strange definition? One reason is that we sometimes want to talk about the set of all functions from to , or, even more commonly, the set of all functions from to that satisfy certain conditions. It’s one thing to be able to recognise a function when you see one, but how do you say what counts as a function? We somehow want to capture the idea that any way of associating with each some counts as a function, even if that “way of associating” isn’t given by a rule of any kind.
It may seem as though the “take any old subset of as long as for each there’s exactly one such that belongs to that subset” is a pretty neat way of capturing this arbitrariness. And in a sense it is. I think psychologically we are happier with the idea of a completely arbitrary set than we are with the idea of a completely arbitrary “way of associating” elements of one set with elements of another. But just because we are happier with it, that doesn’t mean we should be happier with it. What is an “arbitrary” set of integers, say? Sets of integers defined by properties such as “the set of all such that is a prime greater than 1000″ are fine, but how can we capture the idea of a “completely arbitrary” set of integers that doesn’t have a definition of that kind? It’s more or less the same problem as that of capturing the idea of a completely arbitrary function that isn’t given by a rule. So, I respectfully submit, the subset-of-Cartesian-product definition of functions achieves virtually nothing.
That will probably provoke a lot of disagreement, so let me qualify it slightly. It is useful for some purposes to “reduce everything to sets”. If I am working on the foundations of mathematics and I show that every statement to do with functions can be translated into an equivalent statement to do with sets, then I have shown that if I can sort sets out then I don’t have to do any further work to sort out functions as well. But in what one might call “everyday mathematics” I think that the definition of functions in terms of subsets of Cartesian products is of no use whatsoever.
Hmm, I’m worried that I’m still exaggerating. Let me consider a statement that might make it seem important to answer the question, “What is a function?” It is the following.
You haven’t yet been told what this means, but for now just think of “uncountably many” as meaning “not just infinitely many but an extra-specially big infinitely many” or something like that. We obviously can’t prove a statement like that by listing a whole bunch of functions, so doesn’t that force us to have some general idea of what a function is?
Here are two arguments that it doesn’t. One is that I can prove this by reducing it to another problem. For each positive real number I can define (this means the smallest integer greater than ). It’s easy to check that these are all different functions. And then we can appeal to the fact that there are uncountably many positive real numbers.
The second argument is closely related. It is also true that there are uncountably many subsets of , but nobody feels that we have to say what a subset of is in order to make sense of this statement. We just need to know a few rules for dealing with sets: in particular, for defining new sets out of old ones.
Just before I move on, let me express particular distaste for any definition that begins, “A function from to is a relation such that …” I absolutely hate this. The reason I hate it is that functions and relations are, to any reasonable person, different kinds of things, except that I don’t want to call them things at all, so what I really mean is that they have a different grammar.
To illustrate what I mean by grammar, it’s rules like this.
1. If and denotes an element of , then denotes an element of .
2. If and denote statements, then denotes a statement.
3. If is a relation on , is an element of , and is an element of , then is a statement.
4. If is a property defined on a set and , then is a statement.
5. If is a property defined on a set , then is a subset of .
I’ll talk more about this kind of thing in a later post, but I hope these examples give you the idea. And 1 and 3 demonstrate that the grammar of functions is not the same as the grammar of relations. The fact that you can turn them into equivalent concepts that do have the same grammar is neither here nor there. You can do that with nouns and adjectives too. For example, I could decide that from now on I’m going to say, “There’s a red” where I used to say, “That’s red.” If I wanted to say, “My car is green,” I would say, “My car is a green,” just as I might say, “That dog is a Rottweiler.” It would be possible (basically, nouns and adjectives are both ways of picking out some subset of the set of all possible objects in the world) — but it would also be a bit weird.
But none of this what I really wanted to talk about. Rather, I wanted to discuss a few confusing bits of terminology.
To begin with, what are domains, ranges and codomains? What’s confusing about this is that there isn’t a standard terminology. The one I was taught as an undergraduate was this. If is a function, then is called the domain of and is called the range of . That’s the sense in which I’ve been using the words in these posts and I hope it agrees with what your lecturers have said.
However, some people use the word “codomain” for instead of “range”. Worse still (from the point of view of communication between mathematicians), people who call the codomain often use the word “range” to refer to the set of values taken by : that is, to the set . I call that the image of , and I think that is probably the Cambridge standard.
To give an example, let’s take the function defined by . In my terminology, the domain and range are both and the image is the set of non-negative real numbers. I’m also happy to call the codomain if you want — for me, “codomain” and “range” mean the same thing.
However, some people would say that the domain and codomain are while the range is the set of non-negative reals. I think they would regard “range” as synonymous with “image”.
So I suppose we would all get along fine if we just abolished the word “range”.
Unfortunately, that isn’t the end of the confusion, since some people use the word “domain” to mean “set of points where makes sense”. To give an example, they might say this: “Let us define by . Then the domain of is the set of all real numbers apart from and .” Apparently, this use of the word “domain” is quite common in schools. According to my terminology, the function just defined was not a function from to at all. Why not? Because it doesn’t assign a real number to 0 or to -2.
So if you want to be safe, then given a function you can call the domain, the codomain, and the image.
That is still not the end of the potential confusion. I said earlier that the one thing you need to know about a function is that if is an element of then is an element of . In a nice friendly world, you could deduce that whenever you see , then whatever is must be an element of . Unfortunately that’s not the case in the world we actually inhabit: we often write when is a subset of .
Now in a sense that’s just plain incorrect. Functions from to are little machines that turn elements of into elements of . So how can we write if is a subset of ? The answer is that the wrongness of doing that tells us one of two things:
(i) the writer has made a mistake;
(ii) the writer means something different.
You should have enough confidence in your lecturers and textbooks to assume that it is (ii) that holds and not (i). So what does mean? It means the set of all such that , or in symbols . For example, if is the function from to that takes to and is the set of all even numbers, then . The set is a subset of and it is called the image of .
The alarm bells should be ringing again. Earlier, I defined the image of to be the set , which we now see that we can write as . So is the set the image of (as I said earlier) or the image of (as would be consistent with the more recent definition)? You just have to be alert to the context. Functions have images, but if you’re talking about a given function that’s clear from the context, then you also talk about images of subsets. Oh, and while we’re at it, if is an element of , then is called the image of .
There is nothing for it but to get used to the fact that the same words and notation can be used for concepts with different — and confusable — meanings. If you see , then one of the first things you should do is look at what’s in those brackets and ask yourself what kind of object it is. If it’s an element of the domain (which will usually be indicated by a lower-case letter, which helps to reduce the confusion), then what you’ve got is an element of the codomain. If it’s a subset of the domain, then what you’ve got is a subset of the codomain.
Let me give a different example of this kind of use of “element notation applied to subsets”. If and are sets of integers, we sometimes write . You might object that this cannot be correct, on the grounds that and are sets and you don’t add sets together — you take things like unions and intersections. And that objection is right in the following sense: when we write we are not giving the usual meaning to the plus symbol. So what do we mean? We actually mean something fairly natural, which is the set of all numbers you can make by adding something in to something in . In symbols, .
A quick example: if and , then . Here, I’m not really adding sets: I’m adding the elements and forming a set out of all possible results. In a similar way, when I take the set , I’m not applying the function to the set : I’m applying to the elements of and forming a set out of all possible results. It’s a very important distinction.
I’ve said that if and is an element of , then is called the image of . Something similar happens the other way round. If , then is called a preimage of . Note a very important distinction between these two definitions: I talked about the image but a preimage. That’s because the definition of a function requires there to be exactly one image for each element , but if I pick it might not have any preimages, and it might have more than one preimage.
Finally, a rare situation where we don’t use the same word twice — however, we make up for this big time in our choice of symbolic notation. If we have a subset of , then the inverse image of , denoted , is defined to be the set of all preimages of elements of . Equivalently, it is the set of all such that . Equivalently again, it is the set .
Here’s a quick example. Let , let and define as follows: . Then the inverse image of the set is . Why? Because 1 and 2 are the elements of that have images that belong to the set .
I hope you will have noticed something very important about that example, which is that the function in question does not have an inverse. In fact, it doesn’t even come close: the fact that shows that it isn’t an injection (which means that if we tried to form an inverse we wouldn’t be able to decide between setting and ) and the fact that 4 has no preimage shows that it isn’t a surjection either (we would have no idea what value to give to ). And yet, I happily wrote . (Actually, I didn’t write it, but I’m writing it now. And I’m happy.)
This is a very frequent source of confusion. Generation after generation of Cambridge undergraduates see an expression like and conclude, wrongly but not entirely unreasonably, that has an inverse. Indeed, it looks as though it must mean the image of under the inverse function of . But it doesn’t (except that if does happen to have an inverse, then does happen to be the image of under that inverse).
The best I can do to help you with understanding inverse images of sets is this. If you ever see a sentence of the form , then you are at liberty to translate it into the equivalent but more transparent sentence . What’s more, I recommend doing so.
Suppose, for example, that you are asked to prove the following simple fact. (At least, it’s very simple once you are used to the definitions and to standard techniques for writing proofs.)
Fact. Let and be sets and let . Let be a subset of and be a subset of . Prove that if and only if .
If you’re asked to prove an if and only if, then you start by assuming one side and deducing the other, and then you prove the implication in the opposite direction. So let’s begin by assuming that . What do we need to prove? We want to show that . How do we show something like that? If you’ve learnt the definition of “is a subset of”, then you will call up to the front of your brain the following statement as what we want to prove.
And if you have taken on board advice in the post on injections and surjections, you will now immediately write “Let .” The task is now to prove that .
Aha! That is a sentence of exactly the form that allows us to get rid of that nasty and confusing , since it is equivalent to the statement . So we know that and we want to prove that . What were we given? Oh yes, that . But , so . But if and , it follows (directly from the definition of “is a subset of”) that , just as we wanted.
How about the other direction? This time we assume that and we want to prove that . So we want to prove that every element of is an element of . But every element of is of the form for some , so we can begin with, “Let ,” and know that our target is to prove that . (This is a slight elaboration of the “let” trick that is convenient for dealing with a situation where what we are sort of saying is, “For every with , such-and-such happens.”) We know that , and that and the hypothesis that tell us that .
Aha! We can get rid of that nasty and confusing : we now know that . Better still, that’s exactly what we were trying to prove.
Just in case I haven’t made it sufficiently clear, if , and , then it is incorrect to say that is an inverse image of (it is a preimage) and it is incorrect to write (the function might not have an inverse, and if it doesn’t, then makes sense only if is a subset of the codomain of ). Similarly, if is a subset of then is not a preimage, or even the preimage, of (it is the inverse image). And the fact that we write does not mean that has an inverse.
It only remains for me to apologize on behalf of the mathematical community for the historical accidents that have led to this jumble of overlapping terminology and notation. You have no choice but to learn it and be very careful about using it. The one thing I would say is that one gets used to it — so much so that it becomes hard to remember what it was like to find it confusing.
That reminds me that I haven’t finished discussing non-standardness of terminology to do with functions. You will often see the following alternative terminology for injections, surjections and bijections. Injections are called one-to-one functions, surjections are called onto functions, and bijections are called one-to-one correspondences. This terminology is pretty confusing –see Terence Tao’s comment on one-to-one functions — but you probably have to learn it too.
I’m mentioning this here because I want to recommend another of the quizzes, the one on functions. If you aren’t quite sure whether you have understood the material in this post and the previous one (and what your lecturers have said on similar topics), then trying out that quiz will soon tell you how you are doing. But it uses “one-to one function”, “onto function” and “one-to-one correspondence” instead of “injection”, “surjection” and “bijection”, so you’ll need to be ready to use that terminology.
Added later. I received an email from Imre Leader expressing disagreement with my assertion that the set-theoretic definition of functions achieves nothing. Since he had some valid points to make, and since we ended up agreeing with each other completely (I think), I’d like to report on our exchange.
Imre’s point is that, as I said above, people just are more comfortable with the idea of an arbitrary set not defined by a nice property than they are with the idea of an arbitrary function not defined by a nice rule. I maintain that this is irrational, but even if it is, I can’t deny that it is true. So if you give the set-theoretic definition of functions, then you make completely clear that functions can be just as arbitrary as sets.
My eventual response (after a certain amount of thought, and an email that Imre disagreed with in a number of places) was that I agreed that the set-theoretic definition of functions helps one to understand how arbitrary functions can be, but that that benefit can be achieved in a different and better way. Instead of saying that a function from to is a subset of such that for every there is exactly one such that , we should say that every function can be obtained from such a set. It turns out that Imre is entirely happy with that idea. (Also, he was at pains to stress that he is no fonder of turning everything into sets than I am.)
Here in detail is what I would say. Let (to stand for “graph”) be a subset of such that for every there is exactly one with . Then we can define a function in terms of by letting be the unique such that . Moreover, every function from to can be obtained this way, since if is such a function we can define to be the set of all such that .
What I like about this approach is that it doesn’t feel unnecessarily paradoxical. I’m not saying, “Actually a function isn’t a function at all — it’s a funny kind of set.” Rather, I’m saying that there’s a one-to-one correspondence between functions and funny kinds of sets. This captures the arbitrariness of functions (if you believe that sets can be very arbitrary, then you can carry this arbitrariness over to functions), but it also preserves their function-like nature (given a set with certain properties, I then tell you a rule for associating elements of with elements of ).