In my introductory post I talked about fake difficulties. It will take some time and several more posts before I can say what I really mean by that notion, but this post will get me a bit closer. So far I have mentioned that if you can’t solve a problem because you haven’t been bothered to look up what the words mean, then your difficulties are not genuine. A more interesting category of fake difficulties is a failure to grasp a few basic logical principles. I call this a fake difficulty even though for some people it is genuinely difficult; the reason I do so is that when mathematicians consider a problem to be hard, it is not for basic logical reasons. To put that another way, with a little bit of practice one can make basic logical deductions completely mechanically, and it is absolutely essential to learn how to do so. It is a simple skill (which is not to say that no work is needed), and it underlies all mathematical reasoning. Trying to understand university-level mathematics without a secure grasp of basic logic is like trying to learn long multiplication without knowing your tables — only a lot harder.
What am I talking about when I use the phrase “basic logic”? I am talking about having a good understanding of the following.
Logical connectives. The main ones are “and”, “or”, “not” and “implies”.
Quantifiers. The phrases “for all” and “there exists” come up a lot in mathematics and you have to be capable of dealing with sentences like this: for every there exists such that for every
Relationships between statements. Given a statement, you should have no trouble forming its negation, its converse and its contrapositive. Of course, for that you need to know what those three things are.
Each of the items above is a potential source of confusion. In particular, quantifiers take a lot of getting used to. However, when you are used to basic logical principles you will find that in many situations you can apply them in a purely mechanical way, sometimes without even needing to understand the statements to which you are applying them. In the next few posts I will try to help you develop that ability if you don’t already have it.
These posts, all with titles of the form “Basic logic — <subtitle>,” are not intended to be typical of the posts in this series, since they are introductory and not tied to any particular course. But basic logic is something that I have to get out of the way before we move on to the more interesting stuff. So let me now discuss a couple of logical connectives in more detail. I’ll save “not”, “implies”, quantifiers, and sentence relationships for later posts.
If you think your basic logic is not quite 100% secure, then I recommend that you read through all the posts that I shall put up on the topic. If put together, they are quite long, and therefore you probably won’t want to reread them too often, so I intend to write a brief summary of the main points, for easy reference later. But that summary will just give you the basic facts, without the justifications and explanations that will be presented in the posts.
AND and OR.
It may seem at first as though there is not much to say about the words “and” and “or”. After all, are they not just common English words that mathematicians happen to use as well? In a sense they are, but their meanings are subtly different from their usual everyday meanings, and it is important to understand the differences.
For the purposes of this discussion I am going to do something that you may not have seen before. You will be very familiar with the idea that letters can be substituted for numbers. But at university, letters (or other symbols) get substituted for absolutely everything. Here I am going to use letters to stand for statements. For some reason the traditional letters to use seem to be p and q, or sometimes P and Q.
How might we explain what “and” means? It seems difficult to do so without being circular and saying something like this: “P and Q” is true if and only if P is true and Q is true. I appear to have defined “and” using the word “and”. If I substitute actual sentences for P and Q I get something like this: “London is in England and Paris is in France” is true if and only if “London is in England” is true and “Paris is in France” is true, which is a bit of an odd observation to make.
And yet somehow it does tell me something. It tells me that if I want to establish the truth of the compound sentence “London is in England and Paris is in France” then it is enough to establish that London is in England and also to establish (possibly by a completely different method) that Paris is in France.
However, that observation is not enough to stop the example feeling odd. In my opinion, one gets a much better idea of the role of logical connectives in mathematics by considering statements with parameters. I’m talking here about statements like, “n is prime”, where all you know about n is that it is a positive integer. The statement “London is in England” is either true or false (and happens to be true). The statement “n is prime” involves the parameter n, and is true for some values of n, such as 13 and 17, and false for others, such as 14, 15 and 16.
The great advantage of thinking about statements with parameters is that their truth value is not fixed, but depends on the parameters. That this is an advantage will become particularly clear when we come to think about the word “implies”. But for now let us return to the word “and” and consider the statement, “n is prime and n is even.” For this to be true, we need n to be both a prime number and an even number, which tells us that n=2. For all other values of n it is false. For example, if n=3, then n is prime but n is not even, and because n is not even we cannot say “n is prime and n is even.” Similarly, if n=4 then the statement is false because n is not prime.
Here is another example that is perhaps even more typical of how mathematics uses the word “and”. One of the topics in the Numbers and Sets course is, needless to say, sets, and one of the definitions associated with sets is that of the intersection of two sets (which you may have met at school and represented in Venn diagrams). If A and B are two sets, then their intersection is defined to be the set of all objects x such that x belongs to A and x belongs to B. I am trying to avoid using symbols here, but for those who have seen them, the definition would normally be written like this:
Another way of expressing the same idea would be this: x belongs to if and only if x belongs to A and x belongs to B.
What does this tell us? Well, when we try to solve a problem, we usually have a statement that we are trying to prove, and some statements that for one reason or another we know to be true. Suppose that one of the statements we know to be true is that x belongs to the intersection or in symbols Then the meaning of the word “and” tells us that we can replace this information by the two bits of information that x belongs to A and that x belongs to B. That is, if we were to keep a list of statements that we knew to be true, and on that list we had
x belongs to
we would know that we could replace that list item by the two items
x belongs to .
x belongs to .
This might be a very useful thing to do: for example, perhaps the information that x belongs to A is used in one place in the argument and the information that x belongs to B is used in another place.
I still haven’t really demonstrated an interesting difference between the mathematical use of the word “and” and the everyday use. But here’s one. In everyday usage the following sentence makes good sense: “Jack and Jill went up the hill.” But in mathematics, at least when it is written formally, the word “and” connects statements (which is why it is called a logical connective) rather than connecting objects. So a mathematician would feel an urge to rewrite the well-known nursery rhyme in the following rather less catchy way: “Jack went up the hill and Jill went up the hill.” That’s not to say that you never see a sentence like “x and y are positive”, but you should be very aware that that is a different use of the word and should think of it as a useful shorthand for “x is positive and y is positive”.
The first thing to say about the word “or” as it is used in mathematics is that it is always the inclusive or rather than the exclusive or. In abstract terms, the statement “P or Q” is true if and only if P is true or Q is true or both. As with “and” one can illustrate this in a rather unilluminating way with examples such as the following.
London is in England or Paris is in France.
London is in England or Paris is in Pakistan.
London is in Scotland or Paris is in France.
London is in Scotland or Paris is in Pakistan.
Of those four sentences, the first three are true (because in each case at least one of the two constituent parts is true) and the fourth is false (because neither constituent part is true).
But again the real use of the word “or” is much better illustrated by statements with parameters. For instance, the statement “n is prime or n is of the form 4m+1″ is true if at least one of “n is prime” and “n is of the form 4m+1″ is true. (Here I’m imagining that n is a positive integer but m can be zero.) Let’s list the first few values of n for which this statement is true:
For all the other positive integers up to 60 it is false (unless I’ve made a slip). Had I been using the exclusive or (so called because the two statements “exclude” each other) then I would not have included 5, 13, 17, 29, 37, 41 or 53 because those number are both prime and of the form 4m+1. But I was following conventional mathematical practice and using the inclusive or, so I did include them.
Perhaps the single most important thing you need to know about “or” is how to respond to it if it comes up in proofs. For example, suppose you find that amongst the list of statements you know a statement of the form “P or Q”. In other words, you know that either P is true or Q is true (or possibly both). In most cases, P and Q will be statements with parameters: an example might be “n=2 or n is odd”, which would follow if you knew that n was prime.
If you know that “P and Q” is true, then you can add both P and Q to your list of things that you know and can use. But if all you know is that “P or Q” is true, then your task is more difficult. This time what you have to do is split your argument into cases. If you are given that “P or Q” is true, and you want to deduce R, then you must show that you can deduce R if you assume P (and whatever else you know) and that you can also deduce R if you assume Q (and whatever else you know). That way, since at least one of P and Q is guaranteed to be true, R must be true. So in a proof you might write something like this: “First let us assume that P. Then … blah blah … so R follows. Now let us assume that Q. Then … blah blah … and again R follows. Therefore the result is proved.”
What I mean when I say that you need to be on top of basic logic is that you should get to the stage where reacting to a statement of the form “P or Q” in the way I have just described is a pure reflex: you see that you know that either P or Q, so you just automatically go into “If P, then … and if Q, then …” mode.
What about if the statement “P or Q” is the statement you are trying to prove (as opposed to what I have just been discussing, when “P or Q” is one of the statements you are trying to use to prove something else)?
If P and Q are fixed statements, then you can prove “P or Q” by picking one of P and Q and proving that. For instance, if I want to prove the statement “There are infinitely many primes or every group of prime order is cyclic” then I could simply prove that there are infinitely many primes and have done with it. But that is not the situation that usually occurs. Usually when one wishes to prove “P or Q” it is a statement with parameters. For example, an important result in the Numbers and Sets course is this.
Theorem. Let p be a prime number and let a and b be positive integers. Suppose that p is a factor of ab. Then either p is a factor of a or p is a factor of b.
Before I say anything more about this, note that the result would be false if I were not using the inclusive or. For example, p could be 5, a could be 10 and b could be 15. Then p is a factor of ab (which equals 150) but it is a factor of both a and b. Part of the reason I’m mentioning this is that the word “either” suggests to many people that one is using the exclusive or, probably because in many ordinary-language contexts we use the word “either” to emphasize that the two alternatives we then go on to lay out are incompatible. An example: if I say, “I don’t know where I’m going to go on holiday next year. I think I’ll either go to the South of France or to Greece.” then it would be quite reasonable of you to interpret me as ruling out going to both places. But in mathematics the word either does not have this implication. If it sounds to you as though it does, then you need to get into the habit of ignoring the word whenever you see it.
Returning to the theorem, what is the statement we are trying to prove. It is “p is a factor of a or p is a factor of b.” Now because we don’t actually know anything about p, a or b, beyond the fact that p is a prime and a and b are positive integers, we can’t just say, “Right, I’m going to prove that p is a factor of a,” or “Right, I’m going to prove that p is a factor of b.” Instead what we have to do is more like this.
1. Assume that p is not a factor of a.
2. Draw some consequence of that assumption and use it to prove that p is a factor of b.
If we can do that, then the result is proved, since either p is a factor of a or the above argument can be used to show that p is a factor of b. The consequence one uses if p is not a factor of a is that the highest common factor of p and a is 1. I won’t say any more at this stage but in a future post I shall discuss this proof in detail.
Sometimes when one wishes to prove a statement of the form “P or Q” a slightly more complicated plan of attack works. It goes something like this.
1. Attempt to prove P, and try to identify what has to happen for your proof not to work.
2. Show that if that happens, then Q must be true.
Here is a simple example: the infinite pigeonhole principle. I will state the result and its proof.
Theorem. Suppose that the positive integers are each coloured either red or blue. Then either there are infinitely many red integers or there are infinitely many blue integers.
Proof. [First comes the attempt to prove one of the two statements.] Let us attempt to find infinitely many red integers. We’ll try to do it in the simplest way imaginable: we’ll just pick a red integer then a larger red integer then a larger red integer and so on. If we succeed, then we have produced an infinite sequence of distinct red integers and we have therefore obtained the first statement.
[Next comes the process of understanding what must be the case if the proof of the first statement doesn't work.] The only way we can fail is if for some k we find that we have identified red integers and cannot find any red integer that is bigger than (If we could find such an then we could have used it for so we wouldn’t have been stuck at )
[And now, having drawn a consequence of our failure to prove the first statement, we go ahead and use it to prove the second.] But in that case all the integers from onwards are blue, so in particular there are infinitely many blue integers.
The associativity of AND and OR.
There is a rule that I ought to mention briefly, even though you will either be aware of anyway or will happily use it without even thinking about it. It is the rule
and the companion rule
For “P or (Q or R)” to be true, we need at least one of P and “Q or R” to be true. And for “Q or R” to be true we need at least one of Q or R to be true. Putting all that together, we find that for “P or (Q or R)” to be true we need at least one of P, Q and R to be true. And clearly the same argument shows that the same condition needs to hold for the statement “(P or Q) or R” to be true. So we just drop the brackets and write “P or Q or R” and think of this as saying that at least one of the three statements is true.
Similarly, we don’t bother with brackets for repeated ANDs, writing “P and Q and R” for the statement that all three of P, Q and R are true. And similar considerations hold for larger collections of statements as well.
The distributivity of AND over OR and vice versa.
Consider the following sentence.
What does this mean? Does it mean this?
Or does it mean this?
The answer is that the sentence as written could mean either. Since mathematicians don’t like ambiguity, they do not allow themselves to write such sentences, or their symbolic equivalents, which are and
Before I talk about distributivity, let me briefly mention another context where there are expressions we are not allowed to write. Nobody writes because the numbers and are in general different. But hang on, you might say, the numbers and are in general different, but it is considered perfectly OK to write That is indeed true, and all I can say in response is that we have a very strongly established convention that when you’ve got a string of pluses and minuses, you bracket them in such a way that you do the calculations from the left. For instance, is understood to mean (More intuitively, it means “start with then take away then take away then add then take away then take away ) We could easily use an exactly analogous convention for division, so why don’t we? That’s a historical question, but I suspect the answer is closely related to the fact that we usually don’t use the division symbol , preferring to write things as fractions. With that notation, we can distinguish clearly between and . (Of course, we don’t write because that would go back to being ambiguous.)
Returning to AND and OR, that tells us that an expression like is invalid until it is given some brackets. So let’s do so and consider the expression And to do that, let us continue with analogies from arithmetic. (You might prefer the word “algebra” since I’ll use letters rather than numbers, but in university mathematics, the word “algebra” has rather different connotations and we’ll happily talk about arithmetic even if the numbers we’re adding, subtracting, multiplying and dividing are unknown and represented by letters.) The distributive law in arithmetic is the one that says that By a miracle, exactly the same rule applies to and That is,
Before I continue, let me say that you should always be suspicious of anything in mathematics that looks like a remarkable coincidence. Nearly always there is an explanation, and nearly always that explanation takes the form of some more general statement of which the two parts of the coincidence are separate manifestations. If you want something to think about, see whether you can come up with some general statement that contains the distributive law for times and plus and the distributive law for OR and AND as special cases. And if you’re interested in the question of whether there is such a thing as a genuine coincidence in mathematics, you might like to read a very nice article on the topic by Philip Davis.
Oh, and if you do try to find that general distributivity statement, then while you’re at it you might as well throw in the fact that AND is distributive over OR. That is, not only do we have the rule I’ve just stated, namely
but we also have the same thing with AND and OR reversed. That is,
That may come as a bit of a surprise, seeing that addition is not distributive over multiplication, by which I mean that it is not a correct rule that always equals This tells you that finding a general framework that encompasses times/plus and or/and can’t be too simple, since otherwise it wouldn’t explain why plus/times doesn’t work but and/or does.
I’ve just stated these distributivity rules, so before I stop, let me briefly discuss why they are true. Let’s go back to the sentence about primes, but this time put brackets in. Suppose we wanted to show that
Well, if we can show that is prime, then we’ll be in great shape, since that will make both brackets true, which is exactly what we need. But if we can’t show that is prime, then the only way of making both brackets true is to show that is prime and is prime. And now if you look carefully at what I’ve just written, you’ll see that I’m saying that it is both necessary and sufficient to show that
A similar justification can be given for the other distributivity law. That is, if you want to show that
then it is necessary and sufficient to show that
I recommend that you stare at these two sentences until you have convinced yourself that they are indeed saying precisely the same thing.
One last question to finish the post: why do you think that the conventional way of interpreting expressions like is what it is?