2 The Poisson distribution
The Poisson distribution is a probability model which can be used to find the probability of a single event occurring a given number of times in an interval of (usually) time. The occurrence of these events must be determined by chance alone which implies that information about the occurrence of any one event cannot be used to predict the occurrence of any other event. It is worth noting that only the occurrence of an event can be counted; the non-occurrence of an event cannot be counted. This contrasts with Bernoulli trials where we know the number of trials, the number of events occurring and therefore the number of events not occurring.
The Poisson distribution has widespread applications in areas such as analysing traffic flow, fault prediction in electric cables, defects occurring in manufactured objects such as castings, email messages arriving at a computer and in the prediction of randomly occurring events or accidents. One well-known series of accidental events concerns Prussian cavalry who were killed by horse kicks. Although not discussed here (death by horse kick is hardly an engineering application of statistics!) you will find accounts in many statistical texts. One example of the use of a Poisson distribution where the events are not necessarily time related is in the prediction of fault occurrence along a long weld: faults may occur anywhere along the length of the weld. A similar argument applies when scanning castings for faults: we are looking for faults occurring in a volume of material, not over an interval of time.
The following definition gives a theoretical underpinning to the Poisson distribution.
2.1 Definition of a Poisson process
Suppose that events occur at random throughout an interval. Suppose further that the interval can be divided into subintervals which are so small that:
- the probability of more than one event occurring in the subinterval is zero
- the probability of one event occurring in a subinterval is proportional to the length of the subinterval
- an event occurring in any given subinterval is independent of any other subinterval
then the random experiment is known as a Poisson process.
The word ‘process’ is used to suggest that the experiment takes place over time, which is the usual case. If the average number of events occurring in the interval (not subinterval) is $\lambda$ $(> 0)$ then the random variable $X$ representing the actual number of events occurring in the interval is said to have a Poisson distribution and it can be shown (we omit the derivation) that
$$P(X = r) = e^{-\lambda}\,\frac{\lambda^{r}}{r!}, \qquad r = 0, 1, 2, 3, \dots$$
The following Key Point provides a summary.
Key Point 7
The Poisson Probabilities
If $X$ is the random variable
‘number of occurrences in a given interval’
for which the average rate of occurrence is $\lambda$ then, according to the Poisson model, the probability of $r$ occurrences in that interval is given by
$$P(X = r) = e^{-\lambda}\,\frac{\lambda^{r}}{r!}, \qquad r = 0, 1, 2, 3, \dots$$
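The formula $P(X = r) = e^{-\lambda}\lambda^{r}/r!$ and the subinterval idea behind the Poisson process can be checked numerically. The sketch below is illustrative only: the function names `poisson_pmf` and `binomial_pmf` are our own, and it assumes the interval is split into $n$ equal subintervals, each containing one event with probability $\lambda/n$ independently of the others. As $n$ grows, the resulting binomial probability should approach the Poisson value.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**r / math.factorial(r)


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(exactly r successes in n independent trials, each with probability p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


lam = 2.0  # illustrative mean number of events in the whole interval
r = 3      # illustrative number of events

# Finer and finer subdivision of the interval into n subintervals
for n in (10, 100, 10_000):
    print(n, round(binomial_pmf(r, n, lam / n), 5), round(poisson_pmf(r, lam), 5))
```

The values $\lambda = 2$ and $r = 3$ are arbitrary choices made for the illustration; any small $r$ and moderate $\lambda$ show the same convergence.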
Task!
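Using the Poisson distribution $P(X = r) = e^{-\lambda}\,\dfrac{\lambda^{r}}{r!}$ write down the formulae for $P(X = 0)$, $P(X = 1)$, $P(X = 2)$ and $P(X = 6)$, noting that $0! = 1$.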
Task!
Calculate $P(X = 0)$ to $P(X = 5)$ when $\lambda = 2$, accurate to 4 d.p.
$r$ | 0 | 1 | 2 | 3 | 4 | 5
$P(X = r)$ | 0.1353 | 0.2707 | 0.2707 | 0.1804 | 0.0902 | 0.0361
Notice how the values for $P(X = r)$ in the above answer increase, stay the same and then decrease relatively rapidly (due to the significant increase in $r!$ with increasing $r$). Here two of the probabilities are equal and this will always be the case when $\lambda$ is an integer.
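One way to see why the probabilities rise, level off and then fall is to compare consecutive terms. The short derivation below uses only the Poisson formula $P(X = r) = e^{-\lambda}\lambda^{r}/r!$:

$$\frac{P(X = r)}{P(X = r-1)} = \frac{e^{-\lambda}\,\lambda^{r}/r!}{e^{-\lambda}\,\lambda^{r-1}/(r-1)!} = \frac{\lambda}{r}, \qquad r \ge 1.$$

The ratio exceeds 1 while $r < \lambda$, equals 1 when $r = \lambda$ (which is possible only if $\lambda$ is an integer, giving $P(X = \lambda) = P(X = \lambda - 1)$) and is less than 1 once $r > \lambda$. With $\lambda = 2$ this gives $P(X = 2) = P(X = 1) = 0.2707$, as in the table.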
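In this last Task we only went up to $P(X = 5)$ and calculated each entry separately. However, each probability need not be calculated directly. We can use the following relations (which can be checked from the formulae for $P(X = r)$) to get the next probability from the previous one:
$$P(X = 1) = \frac{\lambda}{1}\,P(X = 0), \qquad P(X = 2) = \frac{\lambda}{2}\,P(X = 1), \qquad P(X = 3) = \frac{\lambda}{3}\,P(X = 2), \quad \text{etc.}$$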
Key Point 8
Recurrence Relation for Poisson Probabilities
In general, for ease of calculation the recurrence relation below can be used
$$P(X = r) = \frac{\lambda}{r}\,P(X = r-1) \qquad \text{for } r \ge 1.$$
Example 15
Calculate the value for $P(X = 6)$ to extend the Table in the previous Task using the recurrence relation and the value for $P(X = 5)$.
Solution
The recurrence relation gives the formula
$$P(X = 6) = \frac{2}{6}\,P(X = 5) = \frac{1}{3} \times 0.0361 = 0.0120 \quad \text{(4 d.p.)}$$
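The recurrence relation $P(X = r) = \frac{\lambda}{r}P(X = r-1)$, started from $P(X = 0) = e^{-\lambda}$, lends itself to a short loop. The following is a minimal sketch (the function name `poisson_table` is our own) which rebuilds the table for $\lambda = 2$ and extends it to $r = 6$. Full precision is carried from one step to the next and values are rounded only for display.

```python
import math


def poisson_table(lam: float, r_max: int) -> list[float]:
    """Return [P(X=0), ..., P(X=r_max)] using P(X=r) = (lam / r) * P(X=r-1)."""
    probs = [math.exp(-lam)]  # starting value: P(X = 0) = e^{-lam}
    for r in range(1, r_max + 1):
        probs.append(probs[-1] * lam / r)
    return probs


table = poisson_table(2.0, 6)
for r, p in enumerate(table):
    print(f"P(X = {r}) = {p:.4f}")
```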
We now look further at the Poisson distribution by considering an example based on traffic flow.
Example 16
Suppose it has been observed that, on average, 180 cars per hour pass a specified point on a particular road in the morning ‘rush hour’. Due to impending roadworks it is estimated that congestion will occur closer to the city centre if more than 5 cars pass the point in any one minute. What is the probability of congestion occurring?
Solution
We note that we cannot use the binomial model since we have no values of $n$ and $p$. Essentially we are saying that there is no fixed number ($n$) of cars passing the specified point and that we have no way of estimating $p$. The only information available is the average rate at which cars pass the specified point.
Let $X$ be the random variable $X$ = number of cars arriving in any minute. We need to calculate the probability that more than 5 cars arrive in any one minute. Note that in order to do this we need to convert the information given on the average rate (cars arriving per hour) into a value for $\lambda$ (cars arriving per minute). This gives the value $\lambda = 180/60 = 3$.
Using $\lambda = 3$ to calculate the required probabilities gives:
$r$ | 0 | 1 | 2 | 3 | 4 | 5 | Sum
$P(X = r)$ | 0.04979 | 0.14936 | 0.22404 | 0.22404 | 0.16803 | 0.10082 | 0.91608
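To calculate the required probability we note that
$$P(\text{more than 5 cars arrive in one minute}) = 1 - P(\text{5 cars or fewer arrive in one minute})$$
Thus
$$P(X > 5) = 1 - P(X \le 5) = 1 - P(X = 0) - P(X = 1) - P(X = 2) - P(X = 3) - P(X = 4) - P(X = 5)$$
Then $P(\text{more than 5}) = 1 - 0.91608 = 0.08392 = 0.0839$ (4 d.p.).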
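The calculation in this example can be reproduced in a few lines. The sketch below (variable names are our own) converts the hourly rate to a per-minute mean, sums the Poisson probabilities for $r = 0, \dots, 5$ and takes the complement.

```python
import math

cars_per_hour = 180
lam = cars_per_hour / 60  # mean number of cars per minute (3.0)

# P(X <= 5): sum of e^{-lam} * lam^r / r! for r = 0, 1, ..., 5
p_at_most_5 = sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(6))

# Congestion occurs when more than 5 cars pass in one minute
p_congestion = 1 - p_at_most_5

print(f"P(X <= 5) = {p_at_most_5:.5f}")
print(f"P(X > 5)  = {p_congestion:.4f}")
```

Working with the complement is the natural design here: $P(X > 5)$ is an infinite sum, whereas $P(X \le 5)$ has only six terms.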
Example 17
The mean number of bacteria per millilitre of a liquid is known to be 6. Find the probability that in 1 ml of the liquid, there will be:
- 0,
- 1,
- 2,
- 3,
- less than 4,
- 6 bacteria.
Solution
Here we have an average rate of occurrences but no estimate of the probability so it looks as though we have a Poisson distribution with $\lambda = 6$. Using the formula in Key Point 7 we have:
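- $P(X = 0) = \dfrac{6^{0}}{0!}\,e^{-6} = 0.00248$.
That is, the probability of having no bacteria in 1 ml of liquid is 0.00248.
- $P(X = 1) = \dfrac{6^{1}}{1!}\,e^{-6} = 0.0149$.
That is, the probability of having 1 bacterium in 1 ml of liquid is 0.0149.
- $P(X = 2) = \dfrac{6^{2}}{2!}\,e^{-6} = 0.0446$.
That is, the probability of having 2 bacteria in 1 ml of liquid is 0.0446.
- $P(X = 3) = \dfrac{6^{3}}{3!}\,e^{-6} = 0.0892$.
That is, the probability of having 3 bacteria in 1 ml of liquid is 0.0892.
- $P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.1512$.
- $P(X = 6) = \dfrac{6^{6}}{6!}\,e^{-6} = 0.1606$.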
Note that in working out the first 6 answers, which link together, all the digits were kept in the calculator to ensure accuracy. Answers were rounded off only when written down.
Never copy down answers correct to, say, 4 decimal places and then use those rounded figures to calculate the next figure, as rounding-off errors will become greater at each stage. If you did so here you would get answers 0.0025, 0.0150, 0.0450, 0.0900 and $P(X < 4) = 0.1525$. The difference is not great but could be significant.
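The effect of rounding too early can be demonstrated directly. The sketch below assumes that each probability is obtained from the previous one through the recurrence $P(X = r) = \frac{\lambda}{r}P(X = r-1)$ with $\lambda = 6$; the helper name `first_four` is our own. It compares carrying full precision with rounding to 4 d.p. at every step.

```python
import math

lam = 6.0  # mean number of bacteria per ml


def first_four(round_each_step: bool) -> list[float]:
    """P(X=0..3) via the recurrence, optionally rounding to 4 d.p. at every step."""
    p = math.exp(-lam)
    probs = [round(p, 4) if round_each_step else p]
    for r in range(1, 4):
        p = probs[-1] * lam / r  # next value built from the stored previous one
        probs.append(round(p, 4) if round_each_step else p)
    return probs


exact = first_four(round_each_step=False)
rounded = first_four(round_each_step=True)

print("full precision:", [round(p, 4) for p in exact], "sum:", round(sum(exact), 4))
print("rounded early: ", rounded, "sum:", round(sum(rounded), 4))
```

The initial rounding of $e^{-6}$ from 0.00248 to 0.0025 is multiplied by 6, then 3, then 2 as the recurrence proceeds, which is why the discrepancy grows from one value to the next.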
Task!
A Council is considering whether to base a recovery vehicle on a stretch of road to help clear incidents as quickly as possible. The road concerned carries over 5000 vehicles during the peak rush hour period. Records show that, on average, the number of incidents during the morning rush hour is 5. The Council won’t base a vehicle on the road if the probability of having more than 5 incidents in any one morning is less than 30%. Based on this information should the Council provide a vehicle?
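We need to calculate the probability that more than 5 incidents occur i.e. $P(X > 5)$. To find this we use the fact that $P(X > 5) = 1 - P(X \le 5)$. Now, for this problem:
$$P(X = r) = e^{-5}\,\frac{5^{r}}{r!}$$
Writing answers to 5 d.p. gives:
$P(X = 0) = 0.00674$
$P(X = 1) = 0.03369$
$P(X = 2) = 0.08422$
$P(X = 3) = 0.14037$
$P(X = 4) = 0.17547$
$P(X = 5) = 0.17547$
$P(X \le 5) = P(X = 0) + P(X = 1) + \dots + P(X = 5) = 0.61596$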
The probability of more than 5 incidents is $1 - P(X \le 5) = 0.38404$, which is 38.4% (to 3 s.f.). Since this is not less than 30%, the Council should provide a vehicle.
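The Council's decision rule can be written out as a short check. This is a sketch only: the variable names are our own and the 30% threshold is taken from the problem statement.

```python
import math

lam = 5.0         # mean number of incidents per morning rush hour
threshold = 0.30  # Council's minimum probability for basing a vehicle on the road

# P(X > 5) = 1 - P(X <= 5), with P(X = r) = e^{-5} * 5^r / r!
p_more_than_5 = 1 - sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(6))

# The vehicle is NOT provided only if the probability is below the threshold
provide_vehicle = p_more_than_5 >= threshold

print(f"P(X > 5) = {p_more_than_5:.5f}; provide vehicle: {provide_vehicle}")
```

Note that the traffic volume (over 5000 vehicles) does not enter the calculation; only the mean number of incidents is needed for the Poisson model.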