Here we look at some simple questions in probability: from tossing a coin, via Pascal's triangle, to the binomial distribution and the ubiquitous "bell curve".
"Let's flip for it! Heads or tails?" Tossing a coin is an age-old method for settling disputes, and for picking between a pair of outcomes. Even the Romans used to flip coins, although they would call navia aut caput ("ship or head").
The perceived fairness of the coin toss rests on the fact that either outcome -- head or tail -- is considered to be equally likely. Writing this mathematically,
p(head) = 0.5 = p(tail)
Here p(head) is the probability of the coin showing the head when it lands.
The probability of an event is a real number from 0 to 1, with 0 meaning "it never happens", and 1 meaning "it always happens". The probabilities of all events adds up to one, meaning that all possibilities are accounted for, i.e.
p(head) + p(tail) = 1
(Of course, the coin may land on its edge, or be swallowed by a seagull in mid-air, etc., although I've never seen this happen so, for simplicity, let's say the probability of all other eventualities is 0).
Now let's flip a coin twice in succession. What is the chance of getting two heads? Easy, it's 0.5 x 0.5 = 0.52 = 0.25. In other words, it should happen 1 time in 4.
To find the probability of two independent events occuring, we simply multiply together the probabilities associated with two individual events. Crucially, this works because the two events are considered to be independent: the result of the first event does not affect or influence the second event. In many real-life situations, when two rare events are wrongly assumed to be independent, multiplying their probabilities leads to underestimates for their combined probability, which can result in miscarriages of justice; or nuclear meltdowns!.
Likewise, the chance of getting two tails in succession is the same, 0.52 = 0.25, or 1-in-4.
What's the chance of getting one head and one tail? It is twice as likely. Why? Because there are two possible ways of this happening:
In this case, we add together the probabilities of the two possibilities. So the chance of getting one head and one tail is p = 0.25 + 0.25 = 0.5. That is, it happens every other time (1 in 2). Try it!
Now let's imagine tossing a coin three times. There are 2 x 2 x 2 = 23 = 8 possible outcomes, which can be written like this:
where (e.g.) HTT means "first we get a head, then a tail, then a tail".
Each of the eight possibilities is equally likely, so each has a probability of 1/8 = 0.125. To find the probabilities of getting "two heads and a tail", we simply add together the probabilities for HHT, HTH and THH. So:
Now let's imagine tossing a coin n times. We may write the probability like this:
p(number of heads, number of tosses)
Following the logic above, the formula for the probability of getting k heads in n tosses should be:
p(k, n) = (1/2)n C(k, n)
where C(k, n) is the number of different ways of arranging the characters in a string HHTHTTHTH... which features k H's and (n - k) T's.
The probabilities p(k, n) for all 0 <= k <= n describe a Binomial Distribution.
C(k, n) is known to mathematicians as the Binomial Coefficient, and is commonly denoted as . It is sometimes called "n choose k", expressing the idea that it is the number of different ways of choosing k items from n possibilities -- here, the k positions of the H's in a string of length n.
Pascal's triangle is made up of the binomial coefficients:
For instance, the fifth row is made up of the coefficients "4 choose k", reading across from left-to-right k=0, 1, 2, 3, 4.
The rows of Pascal's triangle -- and hence the binomial coefficients -- also make an appearance when we "multiply out brackets" in algebra, for example,
(a + b)4 = 1 a4 + 4 a3 b + 6 a2 b2 + 4 a b3 + 1 b4.
It should not be a surprise that binomial coefficients make an appearance here, since the coefficient of (e.g.) a2b2 is simply the number of ways of arranging two a's and two b's (e.g. aabb, abab, abba, baab, baba, bbaa).
Every number in Pascal's triangle is the sum of the two numbers immediately above it. This gives us a simple recursive method for calculating Pascal's triangle, and thus the set of all binomial coefficients. Alternatively, to compute an individual coefficient we can use the factorial formula given here.
This raises a question: roughly speaking, what is the minimum level of deviation from the average that should arouse our suspicion?
Before scientists at the LHC could announce the discovery of the Higgs boson, they first had to convince themselves that the signal was not arising merely "by chance", due to random fluctuations. In other words, they asked: was the measurement of the Higgs boson statistically significant? (Yes it was!)
Flipping a coin once is rather fun, but flipping it 1000 times is tedious! So to examine the statistics of multiple coin tosses, we can use a Python program, making use of the
First, we should import the random-number generator with
This gives the number of heads found in 1000 tosses of an (unbiased) coin, in 10 separate trials. You should find that the numbers are all in the vicinity of 500 -- and certainly nowhere near 700.
Let's try visualizing the results of these trials. One way is to use a frequency distribution, showing the number of heads on the x-axis, and the frequency of occurrence on the y-axis. Here's some Python code, that makes use of the pylab library for plotting a bar graph:
Here I have used 1000 trials, each consisting of 1000 coin tosses. A typical frequency diagram looks like this:
The distribution is centred around the "average" of 500, and it falls off on either side. The shape of distribution looks a bit like a "bell", but with some random fluctuations. The width of the bell, from shoulder to shoulder, seems to be around 40 (the majority of scores are in the range 480 to 520), and from foot-to-foot, around 100. All of the 1000 trials shown here have resulted in between ~455 and 555 heads.
Try reducing the number of tosses per trial (n), and the number of trials. What do you find?
The distribution is smoother; the fluctuations have been reduced. However, the width of "bell" has stayed approximately the same.
The average is ten times larger (obviously). However, the width of the bell has only increased by a factor of about 3 (from 40 to 120). Let's try to understand why.
To find the "mean" average, we should divide the sum of the number of heads in each trial by the number of trials.
Variance is a measure of the expected square of the width of the frequency distribution. It is defined as the mean average of the square of the deviation from the mean, in other words, the sum (taken over the number of trials) of the square of the deviation from the mean, divided by the number of trials.
The standard deviation is the square root of the variance. Roughly speaking, it should correspond to one-half of the width of the "bell" measured from shoulder-to-shoulder, or one-sixth measured from base-to-base. See the diagram of the bell curve below for an illustration.
Exercise: Write a Python program to compute the mean, the variance and the standard deviation of your simulation data.
Our observations above can be understood by considering the mean and variance of the Binomial Distribution.
The mean of the binomial distribution is n p. In our case, n is the number of tosses in each trial, and p=0.5 is the chance of a coin landing heads up. So the mean of 1000 tosses is 500, as expected.
The variance of a binomial distribution is n p (1 - p). So the standard deviation (in our case) is the square root of n, divided by 2. So with ten times more tosses, we expect the width of the distribution to increase by a factor of ~3.162... Thus the relative width decreases by a factor of 3.162...
By similar logic, the fluctuations in the distribution will have a standard deviation of sqrt(t) / 2, where t is the number of trials. Their relative magnitude will scale in proportion to 1/sqrt(t). So fluctuations diminish in relative magnitude as t is increased (just as we observed).
The bell curve, also known as the normal distribution is described by a formula involving the exponential function, and our familiar friend π:
It looks something like this:
Here μ is the mean, and σ is the standard deviation. We see that, from "shoulder to shoulder", the bell curve has a width of 2 σ. Roughly 68.2% of the area of the distribution is within one standard deviation of the mean.
We have already seen that the distribution of the number of heads somewhat resembles a bell curve. Why is this? As the number of tosses n goes towards infinity, the binomial distribution approaches the normal distribution! That is, the probability of getting k heads in n tosses, given by
p(k, n) = (1/2)n C(k, n)
(where C(k,n) is the binomial coefficient) tends towards the normal distribution formula, after inserting x = k, μ = n / 2, and σ2 = n / 4 in the formula above.
I've tried to illustrate this by plotting (i) simulation data [red bars], (ii) the binomial distribution [blue bars], and (iii) the normal distribution [green curve], on the same plot:
Exercise: use Python to generate a similar plot for yourself.