What is the relationship between the following two statements?
1. If is 1 or a prime number, then is divisible by .
2. If is divisible by then is 1 or a prime number.
At first sight, this doesn’t look a very difficult question: the first statement is of the form and the second is of the form We say that the second statement is the converse of the first. (Note that the first statement is also the converse of the second.)
However, if you have read the previous posts in this series, I hope you will be slightly anxious about what I have just written, because I have not been clear about whether the above two statements are referring to some specific or whether they are meant as general statements about all positive integers . The latter is, I would suggest, the natural interpretation. And if you look at the kinds of converses that come up in a maths course, you will find that they are almost always not of specific statements, but rather of general statements.
This means that just as there are two notions of implication, there are also two notions of converse. One is the one I mentioned just a moment ago: the converse of is . The other involves a universal quantifier: we take a statement of the form and its converse is . (We don’t have to have just one variable here. For instance, the converse of is .)
This is a potential source of confusion, since the definition of converse is often given as the first one (that the converse of is ), while almost all the interesting converses that actually come up in mathematics are of the second kind (where we universally quantify over one or more variables).
With that point behind us, let me discuss two examples of converses.
I’ll start with a simple one. Suppose you want to prove that two sets and are equal. The following is the criterion for this to be the case: every element of is an element of and every element of is an element of For convenience, let’s suppose that we know that and are both subsets of a set . (For example, we might know that and consist of positive integers. In that case, would be .) Then the statement that would break down into two statements that we could write using quantifiers in the following way.
The first of these tells us that every element of is an element of , and the second that every element of is an element of . And those two statements are converses of each other.
Let and be two integers (not necessarily positive). Let us define the set of integers generated by and to be the set of all integers of the form , where and are themselves integers. We could call this the set of all “integer combinations” of and . An important question that comes up in the number theory that you will do this term is this: under what circumstances is the set generated by and equal to the set of all integers? That is, when can you make any integer you want out of and by adding appropriate multiples?
To get some traction on this problem, it might be worth looking at some examples. Suppose that and . Can we make all integers by adding multiples of and ? Quite clearly not, since every integer we can make will be a multiple of 10. Obviously this example can be generalized: if and have a common factor that’s bigger than 1, then they cannot generate the whole of (since everything that they generate will be a multiple of and not all integers are multiples of ).
Let us express this observation more formally.
Note that this statement is of the general form
As such, it has a converse, obtained by reversing the roles of the statements and . In words, the converse is as follows.
Now converses of true statements don’t have to be true. For example, the statement, “Every multiple of 10 is even,” is true, and its converse, “Every even number is a multiple of 10,” is false. However, the converse I have just written down happens to be true: it really is the case that if two integers don’t generate the whole of then they must have a common factor bigger than 1. If you are not familiar with this fact, then I recommend pausing for a few seconds just to think how you might prove it.
And unless you are very unusually quick, you will have come to the conclusion that it is not at all obvious how to prove it. This may seem slightly mysterious: the two properties, “do not generate all of ” and “have a common factor bigger than 1” are equivalent, so in a sense the same property, and yet the second feels stronger than the first, in the sense that the first is easy to deduce from the second, while the second is not so easy to deduce from the first. I discussed this phenomenon in a blog post a few years ago, which provoked a number of very interesting comments.
Let and be two statements. Then the statement is true (here I’m using the truth-value definition) if and only if it is not the case that is true and is false.
When is true? Applying the same rule, the only thing that could stop it being true is if is true and is false. But that’s the same as saying that is true and is false. In other words, the conditions for to be true are precisely the same as the conditions for to be true. The statement is called the contrapositive of the statement . A statement and its contrapositive are always equivalent, but the importance of the contrapositive is that it is sometimes easier to prove than the original statement.
As ever, the situation is more interesting if we quantify over a parameter involved in a statement. Consider a statement of the form
For each the inner part of this statement is equivalent to . Therefore, the entire statement is equivalent to
That was a bit abstract, so let me look at a simple example. A surprisingly useful way of proving that two real numbers and are equal is to show that their difference is less than all positive numbers. That is, we rely on the following principle.
The statement I have just written is of the general form
Indeed, is the statement,
and is the statement .
Now the easiest way to prove this useful principle is to say the following. Suppose that . Then is some number that isn’t zero (since if , then adding to both sides gives us that ). But then is a positive number, which shows that the statement
is false. (The statement claims that a certain inequality holds for every positive number . But when it doesn’t hold.)
In that argument, we started with the assumption (in other words, we assumed that ) and ended up proving (in other words, we proved that there is some positive number that is not greater than ).
What makes it easier to prove the contrapositive in the cases when it is indeed easier? It’s a bit tricky to say in general, so I’m not going to try too hard. Instead, let me offer some stylistic advice, which is that you should always start by trying to prove a statement in the obvious direct way, turning to the contrapositive if (i) you get stuck and (ii) you can see why you will be less stuck if you try to prove the contrapositive. Let’s see how that advice would have played out in the example above.
We would have started trying to prove the statement directly. So we’d have said, “Right, I’ve got two numbers and and I get to assume that their difference is smaller than any number I choose. Hmm … but what number do I choose?” At that point it wouldn’t be all that obvious. Note that one number we can’t choose is , since we don’t know that that number is positive. In fact, we’re trying to prove that it is not positive!
At that point, feeling stuck, we wonder whether starting with the assumption that might help. And we see that it does, since it provides us with a positive real number, namely the difference between and . So when we decide to try to prove the contrapositive instead, we are not doing so merely because we are stuck — there are plenty of times when we get stuck where turning to the contrapositive is clearly of no help whatsoever — but for the additional reason that gives us something that we needed in order to make progress.
You may have heard this method called “proof by contradiction”. That isn’t strictly speaking correct (though this is a point that I myself didn’t understand until recently). There are two sorts of proof that make use of the negation of the conclusion. Just to give names to things, let’s suppose we are trying to prove the statement
One type of proof is what I illustrated above: you simply prove the contrapositive instead. That is, you prove
Another type of proof is this. You assume that the result is false. That is, you assume
and from that you proceed to deduce a contradiction: that is, you deduce a statement and its negation. There is an interesting discussion of this difference at Mathoverflow, which also contains a link to a highly recommended blog post of Terence Tao.
More on converses.
Now that we have talked about contrapositives, I can say something further about converses. We know that a statement is equivalent to its contrapositive. What happens if we start with a statement and look at the contrapositive of its converse? It will of course be equivalent to the converse, but what will it actually look like?
Well, if the original statement is , then the converse is , and the contrapositive of that is . What this tells us is that is another way of formulating the converse of . Or at least, that’s the bare truth-value way of putting it. For the more typical cases of interest, I would want to say that we can formulate the converse of
The practical consequence of this is that if you are asked to prove a statement that looks like this:
and you start by showing that for some arbitrary (which means that you’ve proved it for all ), then you have two options. Either you can say, “Conversely, let us assume that ,” and proceed to try to deduce , or you can say, “Conversely, let us assume that ,” and proceed to try to deduce . The one thing not to do is say, “Conversely, let us assume that ,” and proceed to try to prove . If you do that, then you are proving the contrapositive of what you proved before — i.e., basically the same result — and not the converse. I have seen this mistake many times. I’ve even made it myself, though I think I’ve always noticed before I’ve got very far. I recommend that if you ever decide to prove the converse of a statement by considering negations of the component parts, you check very carefully that you’ve got things the right way round.
As an example of this, consider the pair of statements with which I began this post. Suppose that we want to prove the following equivalence.
We might start by trying to show that (i) implies (ii). This turns out not to be obvious: it is a theorem that will be covered in the Numbers and Sets course, known as Wilson’s theorem. But suppose we’ve finished that and we wonder about the converse. If we go for the option, then we’ll be using a weird condition — that is divisible by — and trying to deduce from that a rather negative property of — that it hasn’t got any non-trivial factors. If on the other hand we go for the option then we’re much better off. This time our hypothesis is that is composite, which gives us some factors to play around with, and instead of being forced to use a weird condition we merely have to disprove it, which is somehow an easier task.
Indeed, the whole deduction turns out to be easy this way round: if isn’t prime then we can write with both and less than . So if we write the numbers from 1 to out in a line, then and will be amongst them, from which it follows that is a multiple of , which equals . Since it’s not possible for both and to be multiples of , we have proved that the strange condition doesn’t hold, just as we wanted.
Or have we? Can you see a mistake in the above argument? If you can’t, then there is an important moral: don’t be fooled by your notation into making assumptions that you haven’t justified. In the above argument I assumed that and were different: if I hadn’t then it wouldn’t have followed that was a multiple of .
There are two ways of correcting the argument. One is to note that we can always write with and different unless is either a prime or the square of a prime. But if for some prime then both and are less than — except if — so is a multiple of . If then we just look and see that , which is not a multiple of 4.
The other way of correcting the argument is cleaner because it doesn’t involve splitting into cases. We simply observe that if is a factor of that’s less than , then is also a factor of . Therefore is not a factor of . (I’m also assuming of course that .) But is a factor of any multiple of , so cannot be a multiple of .
So here we have an example of an equivalence with two properties mentioned above: one direction is easier than the other, and to prove the converse of a statement it is easier to show that than it is to show that .