High School Mathematics Extensions/Discrete Probability


Introduction

Probability theory is one of the most widely applicable mathematical theories. It deals with uncertainty and teaches you how to manage it. It is simply one of the most useful theories you will ever learn.

Please do not misunderstand: we are not learning to predict things. Rather, we learn to take predicted chances and make use of them. We are therefore not concerned with the question "What is the probability that it will rain tomorrow?". Instead, given that the probability is 60%, we can make deductions, the simplest of which is that the probability that it will not rain tomorrow is 40%.

As suggested above, a probability is a percentage and it's between 0% and 100% (inclusive). Mathematicians like to express a probability as a proportion i.e. as a number between 0 and 1.

info - Why discrete?

Probability comes in two flavours, discrete and continuous. The continuous case is considered to be far more difficult to understand, and much less intuitive, than discrete probability and it requires knowledge of calculus. But we will touch on a little bit of the continuous case later on in the chapter.

Event and Probability

Roughly, an event is something we can assign a probability to. For example, in the statement "the probability it will rain tomorrow is 0.6", the event is "it will rain tomorrow" and the assigned probability is 0.6. We can write

P(it will rain tomorrow) = 0.6

As mathematicians like to do, we can use abstract letters to represent events. In this case we choose A to represent the event "it will rain tomorrow", so the above expression can be written as

P(A) = 0.6

As another example, a fair die will turn up 1, 2, 3, 4, 5 or 6 with equal probability each time it is tossed. Let B be the event that it turns up 1 on the next toss. We write

P(B) = 1/6

Misconception

Please note that the probability 1/6 does not mean that a 1 is guaranteed to turn up within six tries. Its precise meaning will be discussed later on in the chapter. Roughly, it just means that in the long run (i.e. when the die is tossed a large number of times), the proportion of 1's will be very close to 1/6.
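
The long-run reading can be illustrated with a short simulation. The following is a minimal sketch in Python and is not part of the original text; the function name proportion_of_ones, the seed and the numbers of tosses are illustrative assumptions.

```python
import random

def proportion_of_ones(num_tosses, seed=0):
    """Toss a simulated fair die num_tosses times and return the proportion of 1's."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    ones = sum(1 for _ in range(num_tosses) if rng.randint(1, 6) == 1)
    return ones / num_tosses

# The proportion tends to settle near 1/6 (about 0.167) as the number of tosses grows.
for n in (600, 6000, 60000):
    print(n, proportion_of_ones(n))
```

With only a few tosses the proportion can be some way from 1/6; it is only for a large number of tosses that it is expected to be close.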

Impossible and Certain events

Two types of events are special. One type are the impossible events (e.g., the sum of digits of a two-digit number is greater than 18); the other type are certain to happen (e.g., a roll of a die will turn up 1, 2, 3, 4, 5 or 6). The probability of an impossible event is 0, while that of a certain event is 1. We write

P(Impossible event) = 0
P(Certain event) = 1

The above reinforces a very important principle concerning probability. Namely, the range of probability is between 0 and 1. You can never have a probability of 2.5! So remember the following

0 ≤ P(E) ≤ 1

for all events E.

Complement of an event

A most useful concept is the complement of an event. We use B̄ to represent the event that the die will NOT turn up 1 in the next toss. Generally, putting a bar over a variable (that represents an event) means the opposite of that event. In the above case of a die:

P(B̄) = 5/6

This means that the event "the die will turn up 2, 3, 4, 5 or 6 in the next toss" has probability 5/6. Please note that

P(Ē) = 1 - P(E)

for any event E.

Combining independent probabilities

It is interesting how independent probabilities can be combined to yield probabilities for more complex events. I stress the word independent here, because the following demonstrations will not work without that requirement. The exact meaning of the word will be discussed a little later on in the chapter, and Exercise 10 of this section shows what goes wrong when probabilities are combined without checking how the events are related.

Adding probabilities

Probabilities are added together whenever an event can occur in multiple "ways." As this is a rather loose concept, the following example may be helpful. Consider rolling a single die; if we want to calculate the probability for, say, rolling an odd number, we must add up the probabilities for all the "ways" in which this can happen -- rolling a 1, 3, or 5. Consequently, we come to the following calculation:

P(rolling an odd number) = P(rolling a 1) + P(rolling a 3) + P(rolling a 5) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2 = 50%

Note that the addition of probabilities is often associated with the use of the word "or" -- whenever we say that some event E is equivalent to any one of the events X, Y, or Z occurring, and no two of X, Y and Z can occur together, we use addition to combine their probabilities.

A general rule is that the probability of an event and the probability of its complement must add up to 1. This makes sense, since we intuitively believe that events, when well-defined, must either happen or not happen.
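
The odd-number calculation and the complement rule can be checked with exact fractions. This is a small sketch, assuming a fair six-sided die; the names die and probability are chosen here for illustration only.

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def probability(event):
    """Add the probabilities of the faces that make up the event."""
    return sum(die[face] for face in event)

odd = {1, 3, 5}
p_odd = probability(odd)
p_not_odd = probability(set(die) - odd)  # the complement: rolling an even number

print(p_odd)              # 1/2
print(p_odd + p_not_odd)  # 1
```

Adding works here because the faces are separate "ways": a single roll cannot show two faces at once.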

Multiplying probabilities

Probabilities are multiplied together whenever an event occurs in multiple "stages" or "steps." For example, consider rolling a single die twice; the probability of rolling a 6 both times is calculated by multiplying the probabilities for the individual steps involved. Intuitively, the first step is simply the first roll, and the second step is the second roll. Therefore, the final probability for rolling a 6 twice is as follows:

P(rolling a 6 twice) = P(rolling a 6 the first time) × P(rolling a 6 the second time) = 1/6 × 1/6 = 1/36 ≈ 2.8%

Similarly, note that the multiplication of probabilities is often associated with the use of the word "and" -- whenever we say that some event E is equivalent to all of the events X, Y, and Z occurring, we use multiplication to combine their probabilities (if they are independent).

Also, it is important to recognize that the product of multiple probabilities must be less than or equal to each of the individual probabilities, since probabilities are restricted to the range 0 through 1. This agrees with our intuitive notion that relatively complex events are usually less likely to occur.
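
One way to convince yourself that multiplying is the right operation is to list every possible result of the two rolls and count. The sketch below assumes a fair die and independent rolls; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of rolling a die twice.
outcomes = list(product(range(1, 7), repeat=2))

double_six = [o for o in outcomes if o == (6, 6)]
p_by_counting = Fraction(len(double_six), len(outcomes))
p_by_multiplying = Fraction(1, 6) * Fraction(1, 6)

print(p_by_counting, p_by_multiplying)  # 1/36 1/36
```

The product 1/36 is also smaller than each of the two factors 1/6, as the paragraph above says it must be.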

Combining addition and multiplication

It is often necessary to use both of these operations simultaneously. Once again, consider one die being rolled twice in succession. In contrast with the previous case, we will now consider the event of rolling two numbers that add up to 3. In this case, there are clearly two steps involved, and therefore multiplication will be used, but there are also multiple ways in which the event under consideration can occur, meaning addition must be involved as well. The die could turn up 1 on the first roll and 2 on the second roll, or 2 on the first and 1 on the second. This leads to the following calculation:

P(rolling a sum of 3) = P(1 on 1st roll) × P(2 on 2nd roll) + P(2 on 1st roll) × P(1 on 2nd roll) = 1/6 × 1/6 + 1/6 × 1/6 = 1/18 ≈ 5.6%

This is only a simple example, and the addition and multiplication of probabilities can be used to calculate much more complex probabilities.
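
The same two-step reasoning (multiply along each way, then add over the ways) can be written as a short function. This is a sketch under the assumption of two independent rolls of a fair die; the name p_sum is hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_sum(total):
    """Probability that two rolls of a fair die add up to total."""
    p_each = Fraction(1, 6) * Fraction(1, 6)  # multiply: first roll AND second roll
    ways = [(a, b) for a, b in product(range(1, 7), repeat=2) if a + b == total]
    return len(ways) * p_each                 # add: one equal term per way

print(p_sum(2))  # 1/36
print(p_sum(3))  # 1/18
```

The function may also be useful for checking your answers to some of the exercises below.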

Exercises

Let A represent the number that turns up in a (fair) die roll, let C represent the number that turns up in a separate (fair) die roll, and let B represent a card randomly picked out of a deck:

1. A die is rolled. What is the probability of rolling a 3 i.e. calculate P(A = 3)?

2. A die is rolled. What is the probability of rolling a 2, 3, or 5 i.e. calculate P(A = 2) + P(A = 3) + P(A = 5)?

3. What is the probability of choosing a card of the suit Diamonds (in a 52-card deck)?

4. A die is rolled and a card is randomly picked from a deck of cards. What is the probability of rolling a 4 and picking the Ace of spades, i.e. calculate P(A = 4)×P(B = Ace of spades).

5. Two dice are rolled. What is the probability of getting a 1 followed by a 3?

6. Two dice are rolled. What is the probability of getting a 1 and a 3, regardless of order?

7. Calculate the probability of rolling two numbers that add up to 7.

8. (Optional) Show that the probability that C is equal to A is 1/6.

9. What is the probability that C is greater than A?

10. Gareth was told that in his class 50% of the pupils play football, 30% play video games and 30% study mathematics. So if he was to choose a student from the class randomly, he calculated the probability that the student plays football, video games or studies mathematics is 50% + 30% + 30% = 1/2 + 3/10 + 3/10 = 11/10. But all probabilities should be between 0 and 1. What mistake did Gareth make?

Solutions

1. P(A = 3) = 1/6

2. P(A = 2) + P(A = 3) + P(A = 5) = 1/6 + 1/6 + 1/6 = 1/2

3. P(B = Ace of Diamonds) + ... + P(B = King of Diamonds) = 13 × 1/52 = 1/4

4. P(A = 4) × P(B = Ace of Spades) = 1/6 × 1/52 = 1/312

5. P(A = 1) × P(C = 3) = 1/6 × 1/6 = 1/36

6. P(A = 1) × P(C = 3) + P(A = 3) × P(C = 1) = 1/36 + 1/36 = 1/18

7. Here are the possible combinations: 1 + 6 = 2 + 5 = 3 + 4 = 7. The probability of getting each of these combinations is 1/18, as in Q6. There are 3 such combinations, so the probability is 3 × 1/18 = 1/6.

9. Since both dice are fair, C > A is just as likely as C < A. So

P(C > A) = P(C < A)

and

P(C > A) + P(C < A) + P(A = C) = 1

But

P(A = C) = 1/6

so 2 × P(C > A) = 1 - 1/6 = 5/6, which gives P(C > A) = 5/12.

10. The three groups overlap: for example, some of the 50% who play football may also study mathematics. Such pupils are counted more than once, so we cannot simply add the percentages.
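
The answers to the two-dice questions can be double-checked by listing all 36 equally likely pairs. The following is a sketch only; the helper p and the use of lambda conditions are choices made here, not part of the exercises.

```python
from fractions import Fraction
from itertools import product

# A is the first die, C is the second; all 36 pairs (a, c) are equally likely.
pairs = list(product(range(1, 7), repeat=2))

def p(condition):
    """Probability of the pairs (a, c) for which condition(a, c) is true."""
    favourable = sum(1 for a, c in pairs if condition(a, c))
    return Fraction(favourable, len(pairs))

print(p(lambda a, c: a == 1 and c == 3))  # Q5: 1/36
print(p(lambda a, c: {a, c} == {1, 3}))   # Q6: 1/18
print(p(lambda a, c: a + c == 7))         # Q7: 1/6
print(p(lambda a, c: c == a))             # Q8: 1/6
print(p(lambda a, c: c > a))              # Q9: 5/12
```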

Random Variables

A random experiment, such as throwing a die or tossing a coin, is a process that produces some uncertain outcome. We also require that a random experiment can be repeated easily. In this section we shall start using a capital letter to represent the outcome of a random experiment. For example, let D be the outcome of a die roll. D could take the value 1, 2, 3, 4, 5 or 6, but which one is uncertain. We say D is a random variable. Suppose now I throw a die and it turns up 5; we then say the observed value of D is 5.

A random variable is simply the outcome of a certain random experiment. It is usually denoted by a CAPITAL letter, but its observed value is not. For example let

D1, D2, ..., Dn

denote the outcomes of n die throws; then we usually use

d1, d2, ..., dn

to denote the observed values of the Di's.

From here on, random variable may be abbreviated simply as rv (a common abbreviation in the probability literature).

The Bernoulli

This section is optional and it assumes knowledge of binomial expansion.

A Bernoulli experiment is basically a "coin-toss". If we toss a coin, we will expect to get a head or a tail equally probably. A Bernoulli experiment is slightly more versatile than that, in that the two possible outcomes need not have the same probability.

In a Bernoulli experiment you will either get a

success, denoted by 1, with probability p (where p is a number between 0 and 1)

or a

failure, denoted by 0, with probability 1 - p.

If the random variable B is the outcome of a Bernoulli experiment, and the probability of getting a 1 is p, we say B comes from a Bernoulli distribution with success probability p and we write:

B ~ Ber(p)

For example, if

C ~ Ber(0.65)

then

P(C = 1) = 0.65

and

P(C = 0) = 1 - 0.65 = 0.35
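
A Bernoulli experiment is easy to imitate on a computer. The sketch below is one possible way to do it, assuming Python's standard random module; the function name bernoulli, the seed and the sample size are illustrative assumptions.

```python
import random

def bernoulli(p, rng):
    """Return 1 (success) with probability p and 0 (failure) otherwise."""
    return 1 if rng.random() < p else 0

rng = random.Random(1)  # fixed seed (an arbitrary choice) so the run is repeatable
samples = [bernoulli(0.65, rng) for _ in range(10000)]

# The proportion of successes should be close to the success probability 0.65.
print(sum(samples) / len(samples))
```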

Binomial Distribution

Suppose we repeat the same Bernoulli experiment n times, with each repetition independent of the others; counting the successes gives a binomial distribution. For example, let

Ci ~ Ber(p)

for i = 1, 2, ... , n. That is, there are n variables C1, C2, ... , Cn and they all come from the same Bernoulli distribution. Consider

B = C1 + C2 + ... + Cn

Then B is simply the rv that counts the number of successes in n trials (experiments). Such a variable is called a binomial variable, and we write

B ~ Bin(n, p)

Example 1

Aditya, Gareth, and John are equally able. For each of them, scoring 100 in an exam is a Bernoulli experiment with success probability 0.9, independently of the others. What is the probability of

i) Exactly one of them getting 100?
ii) Exactly two of them getting 100?
iii) All 3 getting 100?
iv) None getting 100?

Solution

We are dealing with a binomial variable, which we will call B, with

B ~ Bin(3, 0.9)

i) We want to calculate

P(B=1)

The probability of one particular candidate getting 100 (success) and the other two getting below 100 (failure) is

0.9×0.1×0.1=0.009

but there are 3 possible candidates for getting 100 so

P(B=1)=3×0.009=0.027

ii) We want to calculate

P(B=2)

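The probability of two particular candidates getting 100 and the third getting below 100 is

0.9 × 0.9 × 0.1 = 0.081

but there are (3 choose 2) = 3 combinations of candidates for getting 100, so

P(B = 2) = (3 choose 2) × 0.081 = 3 × 0.081 = 0.243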

iii) To calculate

P(B=3)=0.9×0.9×0.9=0.729

iv) The probability of "None getting 100" is getting 0 success, so

P(B=0)=0.1×0.1×0.1=0.001

The above example strongly hints at the fact that the binomial distribution is connected with the binomial expansion. The following result regarding the binomial distribution is provided without proof; the reader is encouraged to check its correctness.

If

B ~ Bin(n, p)

then

P(B = k) = (n choose k) × p^k × (1 - p)^(n - k)

for k = 0, 1, ..., n. This is the term containing p^k in the binomial expansion of (p + q)^n, where q = 1 - p.
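
The formula can be checked against Example 1. The following is a minimal sketch, assuming Python 3.8 or later for math.comb; the function name binomial_pmf is chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(B = k) for B ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduce the four answers of Example 1 (n = 3, p = 0.9).
for k in range(4):
    print(k, round(binomial_pmf(k, 3, 0.9), 3))
# 0 0.001
# 1 0.027
# 2 0.243
# 3 0.729
```

The four probabilities add up to 1, as they should, since exactly one of the four cases must occur.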

Exercises ...

Distribution

...

Events

In the previous sections, we have slightly abused the use of the word event. An event should be thought of as a collection of random outcomes of a certain rv.

Let us introduce some notation first. Let A and B be two events. We define

A ∩ B

to be the event "A and B". We also define

A ∪ B

to be the event "A or B". As demonstrated in Exercise 10 above,

P(A ∪ B) ≠ P(A) + P(B)

in general.

Let's see some examples. Let A be the event of getting a number less than or equal to 4 when tossing a die, and let B be the event of getting an odd number. Now

P(A) = 2/3

and

P(B) = 1/2

but the probability of A or B does not equal the sum of the probabilities:

P(A ∪ B) ≠ P(A) + P(B) = 2/3 + 1/2 = 7/6

since 7/6 is greater than 1.

It is not difficult to see that the event of throwing a 1 or 3 is included in both A and B. So if we simply add P(A) and P(B), the probabilities of those outcomes are being added twice!

The Venn diagram below should clarify the situation a little more.

[Figure: Venn diagram of A or B]

Think of the blue square as the probability of B and the yellow square as the probability of A. These two probabilities overlap, and the overlap is the probability of A and B. Adding P(A) and P(B) counts the overlap twice, so it must be subtracted once. The probability of A or B is therefore:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

The above formula is called the Simple Inclusion Exclusion Formula.
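
For the die events A and B from the example above, the formula can be verified by treating each event as a set of faces. This is a sketch assuming a fair die; the function name P mirrors the notation of the text and is otherwise an illustrative choice.

```python
from fractions import Fraction

faces = set(range(1, 7))
A = {f for f in faces if f <= 4}      # a number less than or equal to 4
B = {f for f in faces if f % 2 == 1}  # an odd number

def P(event):
    """Probability of an event (a set of faces) for a fair die."""
    return Fraction(len(event), len(faces))

print(P(A), P(B))              # 2/3 1/2
print(P(A & B))                # 1/3  (A and B: the faces 1 and 3)
print(P(A | B))                # 5/6  (A or B: the faces 1, 2, 3, 4, 5)
print(P(A) + P(B) - P(A & B))  # 5/6
```

In Python, & and | on sets are intersection and union, which correspond to "and" and "or" for events.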

If for events A and B, we have

P(A ∩ B) = 0

we say A and B are disjoint. The word means separate, with nothing in common. If two events are disjoint we have the following Venn diagram representing them:

[Figure: Venn diagram of disjoint A and B]

info -- Venn Diagram

Traditionally, Venn Diagrams are used to illustrate sets graphically. A set is simply a collection of things, e.g. {1, 2, 3} is a set consisting of 1, 2 and 3. Note that Venn diagrams are usually drawn round. It is generally very difficult to draw Venn diagrams for more than 3 intersecting sets. E.g. below is a Venn diagram showing four intersecting sets:

[Figure: Venn diagram of 4 intersecting sets]

Expectation

The expectation of a random variable can be roughly thought of as the long term average of the outcome of a certain repeatable random experiment. By long term average we mean that we perform the underlying experiment many times and average the outcomes. For example, let D be the outcome of a die roll as above. The observed values of D (1, 2, ... or 6) are equally likely to occur, so if you were to toss the die a large number of times, you would expect each of the numbers to turn up roughly an equal number of times. So the expectation is

(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5

We denote the expectation of D by E(D), so

E(D)=3.5

We should now properly define the expectation.

Consider a random variable R, and suppose the possible values it can take are r1, r2, r3, ... , rn. We define the expectation to be

E(R) = r1 × P(R = r1) + r2 × P(R = r2) + ... + rn × P(R = rn)

Think about it: bearing in mind that the expectation is the long term average of the outcomes, can you explain why E(R) is defined the way it is?
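
The definition translates directly into a few lines of code. The sketch below assumes the distribution of a random variable is given as a table of values and probabilities; the function name expectation and the dictionary layout are illustrative choices.

```python
from fractions import Fraction

def expectation(distribution):
    """E(R) for a dict mapping each possible value r to P(R = r)."""
    return sum(r * p for r, p in distribution.items())

# The fair die D: each of the values 1 to 6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2
```

The result 7/2 agrees with E(D) = 3.5 found earlier by averaging the six faces.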

Example 1 In a fair coin toss, let 1 represent tossing a head and 0 a tail. The same coin is tossed 8 times. Let C be a random variable representing the number of heads in 8 tosses. What is the expectation of C, i.e. calculate E(C)?

Solution 1 ...

Solution 2 ...

Areas as probability

The uniform distributions. ... ........ ...

Order Statistics

Estimate the x in U[0, x]. ...

Addition of the Uniform distribution

Adding U[0,1]'s and introduce the CLT. ....

to be continued ...
