Something that happens very often in lecture courses is that you are presented with a definition, and soon after it you are told that a certain property is equivalent to that definition. This equivalence means that in principle one could have chosen the property as the “definition” and the definition as an equivalent property. To put that differently, suppose you are developing a piece of theory and have some word you want to define. To pick an imaginary example, suppose you have a notion of a set being “abundant”. Suppose that a set is defined to be abundant if it has property P, and that property P is equivalent to property Q. There may well not be much to choose between the following pair of alternatives. On the one hand you can say, “Definition: A set is abundant if it has property P,” and follow that with, “Proposition: A set is abundant if and only if it has property Q,” while on the other you can say, “Definition: A set is abundant if it has property Q,” and follow that with, “Proposition: A set is abundant if and only if it has property P.”
That is a simple observation, and one that you would have been likely to make for yourself, or to have had pointed out to you at some stage. But it has a very important practical consequence, which I can sum up as a slogan.
Rather than say in detail what I mean, I am going to discuss several examples. Some of them you will meet in Group Theory and Numbers and Sets. The others I will discuss in less detail, since you will not meet them until later. But they make good examples — perhaps you will find it useful to reread this post at some point in the future.
1. Bijections.
I have already touched on this example. A bijection is defined to be a function that is both an injection and a surjection. It may therefore seem obvious that if you are asked to prove that some function is a bijection, then you should set about proving first that it is an injection and then that it is a surjection. However, that is not always true. One of the basic results about bijections is this.
Proposition. A function is a bijection if and only if it has an inverse.
Once we’ve proved that proposition, we are allowed to treat “has an inverse” as an alternative definition of “is a bijection”. Let me give a very simple instance of a proof where “has an inverse” is a more convenient definition to use. Suppose you are given two sets $A$ and $B$ and asked to prove the easy fact that the function $f:A\times B\to B\times A$ that takes $(a,b)$ to $(b,a)$ is a bijection. If you work directly from the injection-surjection definition, your proof will look something like this.
Proof 1. First let us show that $f$ is an injection. If $f(a,b)=f(c,d)$ then (by the main property of ordered pairs) it follows that $b=d$ and $a=c$, and hence that $(a,b)=(c,d)$ (again by the main property of ordered pairs).
Now let us show that $f$ is a surjection. Let $(b,a)\in B\times A$. Then $f(a,b)=(b,a)$.
Since $f$ is an injection and a surjection, it follows that it is a bijection, as was wanted.
Contrast that with the following proof.
Proof 2. Define $g:B\times A\to A\times B$ by $g(b,a)=(a,b)$. Then $g$ is an inverse for $f$. Therefore, $f$ is a bijection.
Moral: if you need to prove that a function is a bijection, consider whether you can just write down an inverse.
More general moral: if you are dealing with bijections, bear in mind that “has an inverse” can be treated as a useful alternative definition, even if “officially” it is an equivalent property.
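To make the contrast between the two routes concrete, here is a minimal Python sketch on small finite sets. The sets `A` and `B` and the helper names (`swap`, `swap_back`, `is_injection`, `is_surjection`, `is_two_sided_inverse`) are illustrative choices of mine, and a finite check like this is only a sanity test, not a proof.

```python
from itertools import product

A = {1, 2, 3}
B = {"x", "y"}

domain = set(product(A, B))      # A x B
codomain = set(product(B, A))    # B x A


def swap(pair):
    """The function A x B -> B x A taking (a, b) to (b, a)."""
    a, b = pair
    return (b, a)


def swap_back(pair):
    """The candidate inverse B x A -> A x B taking (b, a) to (a, b)."""
    b, a = pair
    return (a, b)


def is_injection(f, dom):
    # No two elements of the domain share an image.
    images = [f(x) for x in dom]
    return len(images) == len(set(images))


def is_surjection(f, dom, cod):
    # The set of images is the whole codomain.
    return {f(x) for x in dom} == cod


def is_two_sided_inverse(f, g, dom, cod):
    # g undoes f on the domain and f undoes g on the codomain.
    return all(g(f(x)) == x for x in dom) and all(f(g(y)) == y for y in cod)


via_definition = is_injection(swap, domain) and is_surjection(swap, domain, codomain)
via_inverse = is_two_sided_inverse(swap, swap_back, domain, codomain)
print(via_definition, via_inverse)
```

Both checks agree, as the proposition says they must; the second needs only the one-line formula for the inverse.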
There is one exception to the second moral. If you are asked to prove that a function is a bijection if and only if it has an inverse, then it is not legitimate to interpret “is a bijection” as “has an inverse” and then argue that the statement is a tautology.
This touches on a question that many beginning mathematicians have asked: “What am I allowed to assume?” Many cases of this question are covered by the following principle.
A slightly more subtle situation occurs with the following example. Suppose you are asked to prove that the function $x\mapsto e^x$ is a bijection from the real numbers to the positive real numbers. Following my advice, you might say, “$\log$ is an inverse!” and leave it at that. That might be a legitimate argument, but only if your definition of $\log$ is not “the inverse of $e^x$”. And if you define $\log$ in a different way (for example as $\log x=\int_1^x\frac{dt}{t}$) then it may take you quite a bit of work to prove that what you have defined really does invert the function $x\mapsto e^x$. (It can be done, however. You might like to see whether you can use that integral definition of $\log$ to prove that $\log(xy)=\log x+\log y$, and then use that property to prove that $\log$ inverts some exponential function, and finally a bit of calculus to prove that the derivative of that exponential function at 0 is 1. However, it is easier, though less illuminating, to prove that $x\mapsto e^x$ is an injection and a surjection. The former is true because $e^x$ is a strictly increasing function, and the latter is a consequence of the intermediate value theorem, which is discussed later in the post.)
2. Invertible matrices.
An $n\times n$ matrix $A$ is defined to be invertible if there is an $n\times n$ matrix $B$ such that $AB=BA=I$, where $I$ is the $n\times n$ identity matrix. In a typical linear algebra course, the following statements are all shown to be equivalent to the statement that $A$ is invertible. (I’ll take the matrix to have real-number entries.)
(i) The only solution to the equation $Ax=0$ is $x=0$. (Here, $0$ is denoting the column vector of height $n$ that consists entirely of zeros.)
(ii) For every column vector $b$ of height $n$ the equation $Ax=b$ has a solution.
(iii) The row-rank of $A$ is $n$.
(iv) The column-rank of $A$ is $n$.
(v) The determinant of $A$ is non-zero.
A matrix that is not invertible is called singular. If you want to decide whether a matrix is invertible or singular, you can choose any one of the properties (i)-(v) and see whether it holds. And if you know that a matrix has one of the five properties (i)-(v), then you are free to use any of the other properties. For example, if in the particular problem you are working on it happens to be easy to show that the determinant of $A$ is zero, you could slip over to (i) and say, “Let us choose $x\ne 0$ such that $Ax=0$.”
In this case, it is probably best to think of invertibility and singularity as clusters of related properties rather than single properties. At any rate, you shouldn’t be wedded to any particular property as “the main” one.
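The step from "the determinant is zero" to "there is a non-zero vector that the matrix kills" can be illustrated in the $2\times 2$ case. The following Python sketch is only an illustration under that restriction; the function names `det2`, `apply` and `kernel_witness` and the two example matrices are hypothetical choices of mine.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, x):
    """Multiply the 2x2 matrix m by the column vector x = (x0, x1)."""
    (a, b), (c, d) = m
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])


def kernel_witness(m):
    """Return a non-zero x with m x = 0 if det(m) == 0, otherwise None."""
    if det2(m) != 0:
        return None
    (a, b), (c, d) = m
    # When ad - bc = 0, both (b, -a) and (d, -c) are sent to zero by m.
    for candidate in ((b, -a), (d, -c)):
        if candidate != (0, 0):
            return candidate
    return (1, 0)  # m is the zero matrix, so every vector works


singular = ((2, 4), (1, 2))
invertible = ((2, 1), (1, 1))

x = kernel_witness(singular)
print(det2(singular), x, apply(singular, x))
print(det2(invertible), kernel_witness(invertible))
```

For the invertible example no witness exists, which is property (i) seen from the other side.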
3. Highest common factors.
A factor of a positive integer $n$ is a positive integer $d$ such that $d\mid n$. A common factor of two positive integers $m$ and $n$ is a positive integer $d$ such that $d\mid m$ and $d\mid n$. The highest common factor of two positive integers $m$ and $n$ is … well … the biggest of all the common factors. The phrase “highest common factor” is so suggestive that the temptation to think of the definition I have just given as the primary one is extremely strong. And yet for many problems it is not at all the most convenient definition to use.
What other equivalent definition is there? Well, it won’t be presented to you as an equivalent definition: more likely, it will be presented as part of the proof of a very important fact about highest common factors, sometimes known as Bézout’s theorem. It is the following result.
Theorem. Let $m$ and $n$ be two positive integers, and let $d$ be the highest common factor of $m$ and $n$. Then there exist integers $h$ and $k$ such that $hm+kn=d$.
How is this number $d$ identified? It is the smallest positive integer that can be written in the form $hm+kn$. Once you have proved that the smallest positive integer that can be written in the form $hm+kn$ is indeed a factor of both $m$ and $n$, you immediately obtain as a consequence that every integer of the form $hm+kn$ is a multiple of $d$. You also immediately obtain as a consequence that every multiple of $d$ can be written in the form $hm+kn$.
That gives us two alternative definitions of the highest common factor of $m$ and $n$.
(i) The highest common factor of $m$ and $n$ is the smallest positive integer of the form $hm+kn$ such that $h$ and $k$ are integers.
(ii) The highest common factor of $m$ and $n$ is the positive integer $d$ such that the set of integers of the form $hm+kn$ coincides with the set of all multiples of $d$.
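As a sanity check on alternative definition (i), here is a small Python sketch that computes Bézout coefficients with the extended Euclidean algorithm and compares the result with a brute-force search for the smallest positive combination. The function names `bezout` and `smallest_positive_combination`, the search bound and the sample numbers are my own illustrative choices; the extended Euclidean algorithm is one standard way of finding the coefficients, not something the discussion above depends on.

```python
from math import gcd


def bezout(m, n):
    """Extended Euclidean algorithm: return (d, h, k) with h*m + k*n == d == hcf(m, n)."""
    old_r, r = m, n
    old_h, h = 1, 0
    old_k, k = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_h, h = h, old_h - q * h
        old_k, k = k, old_k - q * k
    return old_r, old_h, old_k


def smallest_positive_combination(m, n, bound):
    """Definition (i), searched by brute force over |h|, |k| <= bound."""
    values = [h * m + k * n
              for h in range(-bound, bound + 1)
              for k in range(-bound, bound + 1)]
    return min(v for v in values if v > 0)


m, n = 84, 30
d, h, k = bezout(m, n)
print(d, h, k, h * m + k * n)
print(d == gcd(m, n), smallest_positive_combination(m, n, bound=20))
```

The brute-force search is only trustworthy when the bound is large enough to contain a pair of coefficients that achieves the minimum, which it is for these numbers.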
Now let’s see how we can put these alternative definitions to use. Suppose that we are asked to prove the following simple fact. I’ll use the standard notation $(a,b)$ to stand for the highest common factor of $a$ and $b$.
Exercise. Let $a$, $b$ and $c$ be positive integers. Suppose that $(a,b)=1$ and that $a\mid bc$. Prove that $a\mid c$.
I think the majority of first-year undergraduates look at this statement and think something like this: “Every prime factor of $a$ goes into $bc$, but it can’t go into $b$ because $(a,b)=1$, so it must go into $c$.” As it stands, that argument isn’t quite correct, because it ignores the case where a prime goes into $a$ more than once. But it can be turned into an ugly proof if you want.
What is ugly about it? Two things. First, it uses the fundamental theorem of arithmetic (that every positive integer has a unique factorization into primes), when it could get away with a more basic tool, namely Bézout’s theorem. Secondly, writing out proofs of this kind properly involves writing expressions like $p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}$, which is a pain.
But if you agree with my aesthetic sensibilities (which probably a fair percentage of you won’t, but I hope you’ll eventually come to change your minds), then there remains the question of how exactly you use Bézout’s theorem to prove a statement like this. And here is where alternative definitions come into play. In this case, they don’t really give you anything that you don’t get from the general instruction to use Bézout’s theorem, but they do at least keep you focused on that goal.
The statement that $(a,b)=1$ becomes, if we are thinking about highest common factors in a Bézout’s-theorem kind of way, the statement that we can write 1 as $ha+kb$. We’re also given that $a\mid bc$, and we might like to note that that implies that $a$ goes into any number of the form $ua+vbc$. Since we want to show that $a\mid c$, it makes sense to try to write $c$ in the form $ua+vbc$. We know that $ha+kb=1$. It follows that $hac+kbc=c$, which is indeed of the desired form.
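The argument is constructive: from Bézout coefficients for $a$ and $b$ one can write down the quotient $c/a$ explicitly. Here is a minimal Python sketch of that, assuming the coefficients come from the extended Euclidean algorithm; the names `bezout` and `quotient_from_bezout` and the sample numbers are hypothetical.

```python
def bezout(m, n):
    """Extended Euclidean algorithm: return (d, h, k) with h*m + k*n == d == hcf(m, n)."""
    old_r, r = m, n
    old_h, h = 1, 0
    old_k, k = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_h, h = h, old_h - q * h
        old_k, k = k, old_k - q * k
    return old_r, old_h, old_k


def quotient_from_bezout(a, b, c):
    """Given hcf(a, b) = 1 and a | bc, return q with c == q * a."""
    d, h, k = bezout(a, b)
    if d != 1:
        raise ValueError("a and b must have highest common factor 1")
    if (b * c) % a != 0:
        raise ValueError("a must divide b*c")
    t = (b * c) // a          # b*c = a*t
    # Multiply h*a + k*b = 1 by c:  c = h*a*c + k*(b*c) = a*(h*c + k*t).
    return h * c + k * t


a, b, c = 9, 4, 45
q = quotient_from_bezout(a, b, c)
print(q, q * a == c)
```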
That proof is perhaps too similar to the proof that if a prime $p$ divides a product $ab$ then $p\mid a$ or $p\mid b$. So let’s try another one. It’s the statement that if $(m,n)=1$ and two numbers $x$ and $y$ are congruent mod $m$ and congruent mod $n$, then they must be congruent mod $mn$. (If you haven’t met it yet, two numbers are congruent mod $m$ if they differ by a multiple of $m$.)
Just before I start, let me point out that the condition $(m,n)=1$ is necessary. For example, if $m=6$ and $n=8$, then 5 and 29 are congruent mod $m$ and congruent mod $n$ but not congruent mod $mn$.
I am thinking of the statement $(m,n)=1$ as telling me that I can write $hm+kn=1$ for some integers $h$ and $k$. That is what it means to me. (Note that it is more helpful because it tells me that something exists, namely $h$ and $k$, rather than that something doesn’t exist, namely a non-trivial common factor.) What else am I given? I’m given that $x-y=am$ and $x-y=bn$ for some $a$ and $b$. I now want to prove that $x-y=cmn$ for some integer $c$.
Let us write down the three equations we have and the equation we want.
$hm+kn=1$
$x-y=am$
$x-y=bn$
$x-y=cmn$
We don’t really care about $x$ and $y$, so a fairly obvious thing to do is write $am=bn$ and change our target to that of proving that $am$ and $bn$ are both multiples of $mn$. But if we want to prove that $am=cmn$, we are trying to prove that $a=cn$. That is, we want to prove that $n\mid a$ (or we could if we wanted prove that $m\mid b$).
So after a tiny bit of rearranging, the problem is this.
Given that $hm+kn=1$ and $am=bn$, prove that $n\mid a$.
How do we use Bézout’s theorem to show that things divide other things? Well, in the previous proof we took a statement of the form $ha+kb=1$ and multiplied it by the number we wanted the other number to go into. Let’s do that here. We want $n$ to go into $a$, so let’s multiply the first equation by $a$ and hope for the best. We get this.
$ham+kan=a$
We’ll be done if we can show that $n$ goes into the left-hand side. Does it? Well, it certainly goes into the second term, but what about the first?
Hang on, we haven’t used all the information yet. What else did we know? Oh yes, that $am=bn$. That tells us that $ham=hbn$, so the first term is divisible by $n$ too.
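The same bookkeeping can be carried out numerically. The Python sketch below, with hypothetical names `bezout` and `combined_quotient` and sample numbers of my own, takes coprime $m$ and $n$ and a difference $z=x-y$ that is a multiple of both, and returns the integer $c$ with $z=cmn$ that the argument produces.

```python
def bezout(m, n):
    """Extended Euclidean algorithm: return (d, h, k) with h*m + k*n == d == hcf(m, n)."""
    old_r, r = m, n
    old_h, h = 1, 0
    old_k, k = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_h, h = h, old_h - q * h
        old_k, k = k, old_k - q * k
    return old_r, old_h, old_k


def combined_quotient(m, n, z):
    """Given hcf(m, n) = 1 and z a multiple of both m and n, return c with z == c*m*n."""
    d, h, k = bezout(m, n)
    if d != 1:
        raise ValueError("m and n must have highest common factor 1")
    if z % m != 0 or z % n != 0:
        raise ValueError("z must be a multiple of both m and n")
    a = z // m                # z = a*m
    b = z // n                # z = b*n
    # a = h*a*m + k*a*n = h*b*n + k*a*n = n*(h*b + k*a), so z = a*m = (h*b + k*a)*m*n.
    return h * b + k * a


m, n = 5, 7
x, y = 73, 3                  # congruent mod 5 and mod 7
c = combined_quotient(m, n, x - y)
print(c, c * m * n == x - y)
```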
Here’s an exercise that’s well worth trying. See if you can prove that the lowest common multiple of $m$ and $n$ is $mn/(m,n)$, and see if you can do it without the help of the fundamental theorem of arithmetic. More precisely, your task is to prove that $mn/(m,n)$ is a multiple of both $m$ and $n$, and that every number that’s a multiple of both $m$ and $n$ is a multiple of $mn/(m,n)$.
I think it is highly likely that many people reading this will be far from convinced that using Bézout’s theorem is better than simply writing out prime factorizations. Let me try to explain again why I prefer it (and am not alone in this view). It’s partly a distaste for using results that are “more advanced” than what you are trying to prove. An extreme example is the result that if $p$ is a prime that divides $ab$ then either $p\mid a$ or $p\mid b$. That result is (usually) used to prove the fundamental theorem of arithmetic, so you shouldn’t use the fundamental theorem of arithmetic to prove it. Since the other results are of a similar flavour to that one, it seems somehow more appropriate to use similar techniques.
Another potential advantage is that there are algebraic structures called rings that are somewhat like the integers (in that you can add and multiply elements together but you can’t necessarily divide them) in which the fundamental theorem of arithmetic does not hold. I’m not enough of an algebraist (or algebraic number theorist) to have examples at my fingertips, but I am almost sure that there are results that can be generalized from the integers to more general rings, but only if you use a Bézout-type proof. If anyone can supply me with an example, I will be very grateful. (The basic point, however, is that in some rings unique factorization doesn’t hold. In such rings, it is no longer clear that any two elements $a$ and $b$ must have a common factor that’s a multiple of all other common factors. But we can still look at the set of numbers of the form $ha+kb$, which forms an object called an ideal. An example of a ring in which unique factorization fails is the set of all numbers of the form $a+b\sqrt{-5}$ where $a$ and $b$ are integers. In this ring the number 6 can be factorized as $2\times 3$ and also as $(1+\sqrt{-5})(1-\sqrt{-5})$. In this ring, 6 and $2(1+\sqrt{-5})$ have both 2 and $1+\sqrt{-5}$ as common factors, but there isn’t, I’m pretty sure, a common factor that’s a multiple of both 2 and $1+\sqrt{-5}$.)
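The last claim can be checked by a short computation, assuming the ring in question is the set of numbers $a+b\sqrt{-5}$ with $a,b$ integers. The sketch below is not part of the argument above: it brings in the norm $N(a+b\sqrt{-5})=a^2+5b^2$, which is multiplicative, as an extra tool. The pair representation and the names `mul` and `norm` are my own choices.

```python
def mul(u, v):
    """Multiply a + b*sqrt(-5) and c + d*sqrt(-5), each stored as an integer pair."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)


def norm(u):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2; N(uv) = N(u)N(v)."""
    a, b = u
    return a * a + 5 * b * b


two, three = (2, 0), (3, 0)
w, w_bar = (1, 1), (1, -1)            # 1 + sqrt(-5) and 1 - sqrt(-5)

# Two factorizations of 6.
print(mul(two, three), mul(w, w_bar))

# A common factor d of 6 and 2(1 + sqrt(-5)) has N(d) dividing both
# N(6) = 36 and N(2(1 + sqrt(-5))) = 24, hence dividing 12.  If d is also a
# multiple of 2 and of 1 + sqrt(-5), then N(d) is divisible by 4 and by 6,
# hence by 12.  So N(d) would have to equal 12.
solutions = [(a, b) for a in range(-4, 5) for b in range(-2, 3)
             if norm((a, b)) == 12]
print(norm(mul(two, w)), solutions)
```

The search range covers every pair with $a^2+5b^2\le 12$, and the list of solutions comes out empty, so under these assumptions no such common factor exists.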
4. Normal subgroups.
If you haven’t met normal subgroups yet, you will do soon. Here is the usual definition.
Definition. Let $G$ be a group and let $H$ be a subgroup of $G$. We say that $H$ is normal if $ghg^{-1}\in H$ whenever $g\in G$ and $h\in H$.
If you are given a subgroup $H$ and asked to prove that it is normal, then the obvious thing to do is look at an arbitrary group element of the form $ghg^{-1}$ and show that it belongs to $H$. But that is often not the simplest way and even more often not the way that gives you the greatest insight into why $H$ is normal.
What is the alternative definition here? Well, a very important fact about normal subgroups (I would call it the most important fact myself) is this.
Proposition. A subgroup $H$ of a group $G$ is normal if and only if there exists a group $K$ and a homomorphism $\phi:G\to K$ such that $H$ is the kernel of $\phi$.
If you don’t know what homomorphisms and kernels are, then there are two things you could do at this point. One is to come back and read this when you’ve been shown them in lectures. Another is to look them up on Wikipedia — they aren’t that difficult. If you decide to skip this section and never come back to it, then please at least go away with this message in mind: it is often better to think of normal subgroups as kernels of homomorphisms rather than as subgroups that satisfy the condition in the original definition.
Let me give a simple example of a problem where thinking of normal subgroups this way is helpful. Recall that the dihedral group $D_{2n}$ is the symmetry group of a regular $n$-gon. (Some people write $D_n$ for this but I think $D_{2n}$ is the Cambridge standard.) This splits up into rotations and reflections. The rotations form a subgroup and that subgroup is normal. Why?
I won’t bother to give the argument that works directly from the definition. Instead, I want to show that the group of rotations is the kernel of some homomorphism. That is, I want to find a group $K$ and a homomorphism $\phi:D_{2n}\to K$ such that $\phi(g)$ is the identity of $K$ if and only if $g$ is a rotation.
There are many ways to think about this. Probably the best is to use the concept of a group action, but since you haven’t got to that yet I will avoid it. (However, later on I plan a post on group actions and I’ll come back to this point.) Instead, let me simply define a homomorphism from $D_{2n}$ to the 2-element group, which I’ll think of as $\{1,-1\}$ under multiplication, by sending all rotations to $1$ and all reflections to $-1$. Basically, a transformation maps to 1 if you don’t have to turn your polygon over and to -1 if you do. (If you want, you can think of this homomorphism as the determinant of the linear map that does the transformation.)
What this simple example illustrates is that normal subgroups are subgroups that “leave something alone”. In this case, what the rotations leave alone is the way up that the polygon is. If you can find a proof of this kind, then you “feel the normality” in a way that you don’t if all you’ve done is some calculation that happens to show that $ghg^{-1}$ belongs to $H$. And sometimes it is much simpler. For example, the set of real $n\times n$ matrices of determinant 1 is a normal subgroup of the set of all non-singular real $n\times n$ matrices. I have seen supervisees prove this directly using the multiplicative property of the determinant (that $\det(AB)=\det(A)\det(B)$) without noticing that that very same property shows that the determinant is a homomorphism to the non-zero real numbers and that the subgroup in question is the kernel of that homomorphism.
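Here is a small Python sketch of the dihedral example. It models the symmetries of a regular $n$-gon as maps $x\mapsto sx+k$ (mod $n$) on the vertex labels $0,\dots,n-1$, with $s=1$ for rotations and $s=-1$ for reflections. That model, the choice $n=5$, and the names `compose`, `inverse` and `phi` are assumptions of mine made for illustration.

```python
from itertools import product

n = 5
# Each symmetry is a pair (s, k) standing for the map x -> s*x + k (mod n).
group = [(s, k) for s in (1, -1) for k in range(n)]


def compose(g, h):
    """Return the symmetry 'g after h'."""
    s1, k1 = g
    s2, k2 = h
    return (s1 * s2, (s1 * k2 + k1) % n)


def inverse(g):
    """Inverse of x -> s*x + k is x -> s*x - s*k, because s*s = 1."""
    s, k = g
    return (s, (-s * k) % n)


def phi(g):
    """+1 if the polygon is not turned over (rotation), -1 if it is (reflection)."""
    return g[0]


is_homomorphism = all(phi(compose(g, h)) == phi(g) * phi(h)
                      for g, h in product(group, repeat=2))
kernel = [g for g in group if phi(g) == 1]
closed_under_conjugation = all(compose(compose(g, h), inverse(g)) in kernel
                               for g in group for h in kernel)
print(is_homomorphism, len(kernel), closed_under_conjugation)
```

In this model the homomorphism property is visible in the first coordinate of `compose`, which is the point of the example: once the kernel description is available, the conjugation check comes for free.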
Another nice example is the subgroup of the alternating group $A_4$ that consists of the identity and the three transposition pairs $(12)(34)$, $(13)(24)$ and $(14)(23)$. It isn’t too hard to check that these form a subgroup, but why is it normal?
One answer is that conjugating a permutation always gives you a permutation of the same cycle type. Since we have included all transposition pairs, this subgroup must be normal. But that answer doesn’t really give me any feel for what is special about the subgroup. Here is another argument. We can identify $A_4$ with the group of rotations of a regular tetrahedron. (For example, if we label the places where a vertex can go by the numbers 1, 2, 3 and 4, then the 3-cycle $(123)$ is the 120-degree rotation that fixes the vertex in place 4 and sends the vertex in place 1 to the vertex in place 2, the vertex in place 2 to the vertex in place 3, and the vertex in place 3 to the vertex in place 1.)
Now for each pair of opposite edges of the tetrahedron, we can draw a line joining the two midpoints. This gives us three lines that go through the centre. If we label the places where these lines can be with the numbers 1, 2 and 3, then with any rotation of the tetrahedron we can look at what it does to the lines in those places and define a corresponding permutation of the set $\{1,2,3\}$. If, for instance, it sends the line in place 1 to the line in place 2 and vice versa then we associate with it the permutation $(12)$.
It is not hard to check that this gives us a homomorphism from $A_4$ to $S_3$. The kernel of this homomorphism is the set of permutations in $A_4$ that correspond to rotations that send each of the three lines to itself (but possibly rotating it through 180 degrees about the centre of the tetrahedron). And if you think about it a bit, you will see that the rotations that do that are the identity and the ones that take one of those three lines and rotate about it by 180 degrees. And those three rotations correspond to transposition pairs.
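The tetrahedron argument can be checked by brute force. In the Python sketch below, each line through the midpoints of two opposite edges is represented by the corresponding splitting of the vertex places $\{1,2,3,4\}$ into two pairs, and the even permutations of four points act on those three splittings. The representation and the names `is_even`, `act_on_lines`, `compose` and `compose_lines` are my own illustrative choices.

```python
from itertools import permutations


def is_even(p):
    """p is a tuple with p[i-1] the image of i; even means an even number of inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0


A4 = [p for p in permutations((1, 2, 3, 4)) if is_even(p)]

# One entry per line: the two opposite edges whose midpoints it joins.
lines = [
    frozenset({frozenset({1, 2}), frozenset({3, 4})}),
    frozenset({frozenset({1, 3}), frozenset({2, 4})}),
    frozenset({frozenset({1, 4}), frozenset({2, 3})}),
]


def act_on_lines(p):
    """The permutation of the line-places 0, 1, 2 induced by p."""
    def image(line):
        return frozenset(frozenset(p[v - 1] for v in edge) for edge in line)
    return tuple(lines.index(image(line)) for line in lines)


def compose(p, q):
    """The permutation 'p after q' of the four vertex places."""
    return tuple(p[q[i] - 1] for i in range(len(q)))


def compose_lines(s, t):
    """The permutation 's after t' of the three line-places."""
    return tuple(s[t[i]] for i in range(len(t)))


is_homomorphism = all(
    act_on_lines(compose(p, q)) == compose_lines(act_on_lines(p), act_on_lines(q))
    for p in A4 for q in A4)
kernel = sorted(p for p in A4 if act_on_lines(p) == (0, 1, 2))
print(len(A4), is_homomorphism, kernel)
```

The kernel that comes out consists of the identity together with the tuples for $(12)(34)$, $(13)(24)$ and $(14)(23)$.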
This second argument is longer and — at least the way I have phrased it (trying to avoid talking about group actions) — harder to understand. But it explains the normality of that subgroup in a way that means something and that can be visualized, as opposed to the other argument, which just does a calculation that mysteriously works.
5. Continuous functions.
You don’t meet continuous functions until next term. However, I think that if you read this section and skim over what you don’t understand, you will still get the point of what I am saying. And when you have come across continuous functions, then you may feel like coming back and having another look at this section.
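The basic definition of a continuous function from $\mathbb{R}$ to $\mathbb{R}$ is this.
Definition. A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if for every $x\in\mathbb{R}$ and every $\epsilon>0$ there exists $\delta>0$ such that $|f(y)-f(x)|<\epsilon$ whenever $|y-x|<\delta$.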
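Fairly soon after that definition has been presented, it is customary to present the following result.
Proposition. A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if and only if, whenever a sequence $(x_n)$ converges to $x$, the sequence $(f(x_n))$ converges to $f(x)$.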
Since that is an equivalence, it can be used as an alternative definition. What are the signs that it is a more suitable definition? A rather simple and obvious one is that the proof so far should already have mentioned a convergent sequence, or, slightly less obviously, that you might like to bring in a convergent sequence later.
For example, there is a very useful tool in real analysis called the Bolzano-Weierstrass theorem. It says, “Every sequence in a closed bounded interval has a convergent subsequence.” If you’re planning to use that theorem, or have already used it, then the sequences definition of continuity is likely to be more convenient than the original definition.
Here is a second example. A well-known theorem of real analysis called the intermediate value theorem says that if $f:[a,b]\to\mathbb{R}$ is a continuous function and $f(a)<0$ and $f(b)>0$ then there is some $c$ between $a$ and $b$ such that $f(c)=0$. One proof of this result starts like this. We’ll define two sequences $(a_n)$ and $(b_n)$ as follows. [Already we have sequences, so already we should be thinking about using the sequence definition of continuity.] We start with $a_1=a$ and $b_1=b$. If $f((a_1+b_1)/2)\le 0$ then define $(a_1+b_1)/2$ to be $a_2$ and let $b_2=b_1$, while if $f((a_1+b_1)/2)>0$ then define $(a_1+b_1)/2$ to be $b_2$ and let $a_2=a_1$. Continue this process, each time replacing one of $a_n$ and $b_n$ by the average $(a_n+b_n)/2$ and leaving the other one unchanged, and doing it in such a way that at each stage $f(a_n)\le 0$ and $f(b_n)\ge 0$.
A basic axiom for the real numbers tells us that $(a_n)$ and $(b_n)$ both converge, and simple results can be used to show that they converge to the same limit (something that might seem obvious, but it needs a proof). That limit is our $c$, and it remains to prove that $f(c)=0$.
It is here that the sequence definition of continuity comes into play. Since $(a_n)$ converges to $c$, we know that $(f(a_n))$ converges to $f(c)$. Since $f(a_n)$ is always non-positive, so is $f(c)$. (This is another simple principle from real analysis.) Similarly, $f(b_n)$ is always non-negative, which implies that $f(c)$ is non-negative. The only number that is non-negative and non-positive is 0, so we are done.
It is perfectly possible to show that $f(c)=0$ using the original definition of continuity, but the proof is longer and repeats some of the steps that are used to prove that the sequences definition follows from the original definition. Once one has made the effort to prove the sequences definition, one might as well use it.
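The two sequences in that proof are exactly what the bisection method computes. Here is a minimal Python sketch, with the function name `bisection_sequences`, the test function $x^2-2$ on $[1,2]$ and the number of steps all chosen by me for illustration; floating-point arithmetic only approximates the real-number argument.

```python
def bisection_sequences(f, a, b, steps):
    """Build the two sequences from the proof, keeping f(a_n) <= 0 <= f(b_n) at every stage."""
    if not (f(a) <= 0 <= f(b)):
        raise ValueError("need f(a) <= 0 <= f(b)")
    a_terms, b_terms = [a], [b]
    for _ in range(steps):
        mid = (a_terms[-1] + b_terms[-1]) / 2
        if f(mid) <= 0:
            # The midpoint becomes the next a; b is left unchanged.
            a_terms.append(mid)
            b_terms.append(b_terms[-1])
        else:
            # The midpoint becomes the next b; a is left unchanged.
            a_terms.append(a_terms[-1])
            b_terms.append(mid)
    return a_terms, b_terms


def f(x):
    return x * x - 2          # continuous, with f(1) < 0 < f(2)


a_terms, b_terms = bisection_sequences(f, 1.0, 2.0, steps=40)
print(a_terms[-1], b_terms[-1])
```

Both final terms approximate $\sqrt{2}$, and the gap between them halves at every step, which is the numerical shadow of the statement that the two sequences share a limit.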
The sequences definition is by no means the end of the story. For example, another basic result about continuous functions (that applies in a much more general context) is this.
And that has an equivalent formulation that is sometimes more convenient to use.
And sometimes there are alternative definitions that work in particular contexts. For example, there is a mathematical concept known as a normed space. By far the most useful definition of continuity in the theory of normed spaces is this — but it applies only to certain sorts of functions.
Again, it doesn’t matter if you haven’t the faintest idea what that means. The point is that it looks different from the usual definition, but it is equivalent to it in this particular context and is much easier to use.
6. Differentiable functions.
I am not going to give a list of alternative definitions of differentiability. Instead, this section is here to give me an excuse to direct you towards a famous essay of William Thurston that includes, near the beginning, a discussion of the numerous ways that mathematicians have of thinking about differentiation.