1 The Poisson approximation to the binomial distribution

The probability of the outcome X = r of a set of Bernoulli trials can always be calculated by using the formula

$$P(X = r) = {}^{n}C_{r}\, q^{n-r} p^{r}$$

given above. Clearly, for very large values of n the calculation can be rather tedious, particularly when very small values of p are also present. When n is large, p is small and the product np is constant, we can take a different approach to calculating the probability that X = r. In the table below, the values of P(X = r) have been calculated for various combinations of n and p under the constraint that np = 1. You should try some of the calculations for yourself, using the formula given above, for some of the smaller values of n.

Probability of X successes

| n | p | X = 0 | X = 1 | X = 2 | X = 3 | X = 4 | X = 5 | X = 6 |
|---|---|---|---|---|---|---|---|---|
| 4 | 0.25 | 0.316 | 0.422 | 0.211 | 0.047 | 0.004 | | |
| 5 | 0.20 | 0.328 | 0.410 | 0.205 | 0.051 | 0.006 | 0.000 | |
| 10 | 0.10 | 0.349 | 0.387 | 0.194 | 0.057 | 0.011 | 0.001 | 0.000 |
| 20 | 0.05 | 0.359 | 0.377 | 0.189 | 0.060 | 0.013 | 0.002 | 0.000 |
| 100 | 0.01 | 0.366 | 0.370 | 0.185 | 0.061 | 0.015 | 0.003 | 0.000 |
| 1000 | 0.001 | 0.368 | 0.368 | 0.184 | 0.061 | 0.015 | 0.003 | 0.001 |
| 10000 | 0.0001 | 0.368 | 0.368 | 0.184 | 0.061 | 0.015 | 0.003 | 0.001 |

Each of the binomial distributions given has mean np = 1. Notice that the probabilities that X = 0, 1, 2, 3, 4, … approach the values 0.368, 0.368, 0.184, … as n increases.
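If you would rather check the table by machine than by hand, the short Python sketch below evaluates the binomial formula for each (n, p) pair in the table and rounds to 3 d.p.; it also prints $e^{-1}/r!$, the Poisson probabilities with λ = np = 1, for comparison. The helper names `binomial_pmf`, `poisson_pmf` and `table_row` are illustrative, not from any library; only Python's standard `math` module (Python 3.8 or later, for `math.comb`) is assumed.

```python
import math

def binomial_pmf(n: int, p: float, r: int) -> float:
    """P(X = r) for X ~ B(n, p): nCr * q^(n - r) * p^r with q = 1 - p."""
    return math.comb(n, r) * (1 - p) ** (n - r) * p ** r

def poisson_pmf(lam: float, r: int) -> float:
    """P(X = r) for a Poisson distribution with mean lam: e^(-lam) * lam^r / r!."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

def table_row(n: int, p: float, max_r: int = 6) -> list:
    """Binomial probabilities for X = 0, 1, ..., min(n, max_r), rounded to 3 d.p."""
    return [round(binomial_pmf(n, p, r), 3) for r in range(min(n, max_r) + 1)]

# (n, p) pairs from the table; every pair has n * p = 1.
ROWS = [(4, 0.25), (5, 0.20), (10, 0.10), (20, 0.05),
        (100, 0.01), (1000, 0.001), (10000, 0.0001)]

for n, p in ROWS:
    print(f"n = {n:>5}, p = {p:<6}: {table_row(n, p)}")

# The values each column settles towards: Poisson probabilities with lambda = 1.
print("limit e^(-1)/r!:", [round(poisson_pmf(1.0, r), 3) for r in range(7)])
```

Comparing the printed rows with the table is a quick way to check hand calculations; the last line shows the values the columns settle towards as n grows.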

If we have to determine probabilities when large values of n and small values of p are involved, it would be very convenient if we could do so without having to construct tables. In fact we can do such calculations by using the Poisson distribution which, under certain conditions, may be considered as an approximation to the binomial distribution.

By considering simplifications applied to the binomial distribution subject to the conditions

  1. n is large
  2. p is small
  3. np = λ (λ a constant)

we can derive the formula

$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$$

as an approximation to

$$P(X = r) = {}^{n}C_{r}\, q^{n-r} p^{r}.$$

This is the Poisson distribution given previously. We now show how the approximation arises. We know that the binomial probabilities are the successive terms of the expansion

$$(q + p)^{n} = q^{n} + nq^{n-1}p + \frac{n(n-1)}{2!}q^{n-2}p^{2} + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!}q^{n-r}p^{r} + \cdots + p^{n}$$

Condition (2) tells us that since p is small, $q = 1 - p$ is approximately equal to 1. Applying this to the terms of the binomial expansion above, we see that the right-hand side becomes

$$1 + np + \frac{n(n-1)}{2!}p^{2} + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!}p^{r} + \cdots + p^{n}$$

Applying condition (1) allows us to approximate factors such as (n − 1), (n − 2), … by n (mathematically, we are letting n → ∞), and the right-hand side of our expansion becomes

$$1 + np + \frac{n^{2}}{2!}p^{2} + \cdots + \frac{n^{r}}{r!}p^{r} + \cdots$$

Note that the term $p^{n} \to 0$ under these conditions and hence has been omitted.

We now have the series

$$1 + np + \frac{(np)^{2}}{2!} + \cdots + \frac{(np)^{r}}{r!} + \cdots$$

which, using condition (3), may be written as

$$1 + \lambda + \frac{\lambda^{2}}{2!} + \cdots + \frac{\lambda^{r}}{r!} + \cdots$$

You may recognise this as the expansion of $e^{\lambda}$.

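If we are to be able to claim that the terms of this expansion represent probabilities, we must be sure that the sum of the terms is 1. We divide by $e^{\lambda}$ to satisfy this condition. This gives the result

$$\frac{e^{\lambda}}{e^{\lambda}} = 1 = \frac{1}{e^{\lambda}}\left(1 + \lambda + \frac{\lambda^{2}}{2!} + \cdots + \frac{\lambda^{r}}{r!} + \cdots\right)$$

$$= e^{-\lambda} + e^{-\lambda}\lambda + e^{-\lambda}\frac{\lambda^{2}}{2!} + e^{-\lambda}\frac{\lambda^{3}}{3!} + \cdots + e^{-\lambda}\frac{\lambda^{r}}{r!} + \cdots$$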

The terms of this expansion are very good approximations to the corresponding terms of the binomial expansion under the conditions

  1. n is large
  2. p is small
  3. np = λ (λ constant)
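The argument above approximates q by 1 term by term and then restores a total of 1 by dividing by $e^{\lambda}$. A more direct sketch, assuming only the standard limit $\left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda}$ as $n \to \infty$, substitutes $p = \lambda/n$ into a single binomial term and lets n grow with r held fixed:

$$
\begin{aligned}
{}^{n}C_{r}\,q^{n-r}p^{r}
  &= \frac{n(n-1)\cdots(n-r+1)}{r!}\left(\frac{\lambda}{n}\right)^{r}\left(1 - \frac{\lambda}{n}\right)^{n-r} \\
  &= \frac{\lambda^{r}}{r!}\cdot\frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-r+1}{n}\cdot\left(1 - \frac{\lambda}{n}\right)^{n}\left(1 - \frac{\lambda}{n}\right)^{-r} \\
  &\to \frac{\lambda^{r}}{r!}\cdot 1 \cdot e^{-\lambda} \cdot 1 = \frac{e^{-\lambda}\lambda^{r}}{r!} \qquad (n \to \infty).
\end{aligned}
$$

Each of the r fractions $(n-k)/n$ tends to 1, and so does $\left(1 - \frac{\lambda}{n}\right)^{-r}$, because r is fixed while n grows. This route reaches the same Poisson terms without a separate normalising step.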

The Poisson approximation to the binomial distribution is summarized below.

Key Point 6

Poisson Approximation to the Binomial Distribution

Assuming that n is large, p is small and that np is constant, the terms

$$P(X = r) = {}^{n}C_{r}\,(1 - p)^{n-r}p^{r}$$

of a binomial distribution may be closely approximated by the terms

$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \quad \text{where } \lambda = np,$$

of the Poisson distribution for corresponding values of r.

Example 12

We introduced the binomial distribution by considering the following scenario. A worn machine is known to produce 10% defective components. If the random variable X is the number of defective components produced in a run of 3 components, find the probabilities that X takes the values 0 to 3.

Suppose now that a similar machine, which is known to produce 1% defective components, is used for a production run of 40 components. We wish to calculate the probability that two defective items are produced. Essentially we are assuming that $X \sim B(40, 0.01)$ and are asking for $P(X = 2)$. We use both the binomial distribution and its Poisson approximation for comparison.

Solution

Using the binomial distribution we have the solution

$$P(X = 2) = {}^{40}C_{2}\,(0.99)^{40-2}(0.01)^{2} = \frac{40 \times 39}{1 \times 2} \times 0.99^{38} \times 0.01^{2} = 0.0532$$

Note that the arithmetic involved is unwieldy. Using the Poisson approximation with $\lambda = np = 40 \times 0.01 = 0.4$, we have the solution

$$P(X = 2) = \frac{e^{-0.4}\,(0.4)^{2}}{2!} = 0.0536$$

Note that the arithmetic involved is simpler and the approximation is reasonable.
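Both calculations in Example 12 can be checked with a few lines of Python; this is a minimal sketch that assumes only the standard `math` module, and the variable names are illustrative.

```python
import math

# Example 12: X ~ B(40, 0.01), and we want P(X = 2).
n, p, r = 40, 0.01, 2
lam = n * p  # Poisson parameter lambda = np = 0.4

exact = math.comb(n, r) * (1 - p) ** (n - r) * p ** r      # binomial formula
approx = math.exp(-lam) * lam ** r / math.factorial(r)     # Poisson approximation

print(f"binomial: {exact:.4f}, Poisson: {approx:.4f}")
# prints: binomial: 0.0532, Poisson: 0.0536
```

The two values differ by about 0.0004, which is what "reasonable" amounts to here.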

1.1 Practical considerations

In practice, we can use the Poisson distribution with λ = np to approximate the binomial distribution very closely provided that

$$n \geq 100 \quad \text{and} \quad p \leq 0.05$$

Note that this is not a hard-and-fast rule; we simply say that

‘the larger n is the better, and the smaller p is the better, provided that np is a sensible size.’

The approximation remains good provided that np < 5, even for values of n as low as 20.
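The rule of thumb can be explored numerically. The sketch below is an illustration, not a formal criterion: it holds np = 2 (an arbitrary choice) and reports the largest difference between corresponding binomial and Poisson probabilities for a few values of n. The helper names are hypothetical, and the comparison stops at r = 30, which is harmless here because both distributions put negligible probability beyond that point when np = 2.

```python
import math

def binomial_pmf(n, p, r):
    """Exact binomial probability P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * (1 - p) ** (n - r) * p ** r

def poisson_pmf(lam, r):
    """Poisson probability P(X = r) with mean lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

def max_abs_error(n, p, r_max=30):
    """Largest |binomial - Poisson| over r = 0..min(n, r_max), with lambda = n * p.

    Stopping at r_max keeps math.factorial(r) small enough to convert to a float;
    for np = 2 both distributions are negligible beyond r = 30.
    """
    lam = n * p
    return max(abs(binomial_pmf(n, p, r) - poisson_pmf(lam, r))
               for r in range(min(n, r_max) + 1))

NP = 2.0  # illustrative fixed value of np
errors = {}
for n in (20, 100, 1000):
    p = NP / n
    errors[n] = max_abs_error(n, p)
    print(f"n = {n:4d}, p = {p:.3f}, largest difference = {errors[n]:.5f}")
```

As n grows with np fixed (so p shrinks), the largest difference falls, in line with ‘the larger n is the better and the smaller p is the better’.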

Task!

Mass-produced needles are packed in boxes of 1000. It is believed that, on average, 1 needle in 2000 is substandard. What is the probability that a box contains 2 or more defectives? The correct model is the binomial distribution with $n = 1000$, $p = \frac{1}{2000}$ (and $q = \frac{1999}{2000}$).

  1. Using the binomial distribution, calculate $P(X = 0)$, $P(X = 1)$ and hence $P(X \geq 2)$:

    $$P(X = 0) = \left(\frac{1999}{2000}\right)^{1000} = 0.60645$$

    $$P(X = 1) = 1000\left(\frac{1999}{2000}\right)^{999} \times \frac{1}{2000} = \frac{1}{2}\left(\frac{1999}{2000}\right)^{999} = 0.30338$$

    $$P(X = 0) + P(X = 1) = 0.60645 + 0.30338 = 0.90983 \approx 0.9098 \text{ (4 d.p.)}$$

    Hence $P(\text{2 or more defectives}) \approx 1 - 0.9098 = 0.0902$.

  2. Now choose a suitable value for λ in order to use a Poisson model to approximate the probabilities:

    $$\lambda = np = 1000 \times \frac{1}{2000} = \frac{1}{2}$$

    Now recalculate the probability that there are 2 or more defectives using the Poisson distribution with $\lambda = \frac{1}{2}$:

    $$P(X = 0) = e^{-1/2}, \qquad P(X = 1) = \frac{1}{2}e^{-1/2}$$

    $$P(X = 0) + P(X = 1) = \frac{3}{2}e^{-1/2} = 0.9098 \text{ (4 d.p.)}$$

    Hence $P(\text{2 or more defectives}) \approx 1 - 0.9098 = 0.0902$.
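Both parts of the Task can be confirmed numerically. This sketch, assuming only Python's standard `math` module and using illustrative variable names, evaluates the exact binomial expressions from part 1 and the Poisson expression $\frac{3}{2}e^{-1/2}$ from part 2.

```python
import math

# Task: boxes of n = 1000 needles, each substandard with probability p = 1/2000.
n, p = 1000, 1 / 2000
lam = n * p  # lambda = np = 0.5

# Part 1: exact binomial probabilities.
b0 = (1 - p) ** n                   # P(X = 0) = (1999/2000)^1000
b1 = n * p * (1 - p) ** (n - 1)     # P(X = 1) = 1000 * (1999/2000)^999 * (1/2000)
binom_tail = 1 - (b0 + b1)          # P(X >= 2)

# Part 2: Poisson approximation, P(X = 0) + P(X = 1) = (1 + lambda) e^(-lambda).
poisson_tail = 1 - (1 + lam) * math.exp(-lam)

print(f"binomial P(X >= 2) = {binom_tail:.4f}")
print(f"Poisson  P(X >= 2) = {poisson_tail:.4f}")
# both lines show 0.0902
```

Unrounded, the two tail probabilities differ by less than 0.0001.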

In the above Task we have obtained the same answer to 4 d.p. as the exact binomial calculation, essentially because p was so small. We shall not always be so lucky!

Example 13

In the manufacture of glassware, bubbles can occur in the glass, reducing the status of the glassware to that of a ‘second’. If, on average, one in every 1000 items produced has a bubble, calculate the probability that exactly six items in a batch of three thousand are seconds.

Solution

Let X be the number of items with bubbles; then $X \sim B(3000, 0.001)$.

Since $n = 3000 > 100$ and $p = 0.001 < 0.05$, we can use the Poisson distribution with $\lambda = np = 3000 \times 0.001 = 3$. The calculation is:

$$P(X = 6) = \frac{e^{-3}\,3^{6}}{6!} \approx 0.0498 \times 1.0125 \approx 0.05$$

The result means that we have about a 5% chance of finding exactly six seconds in a batch of three thousand items of glassware.
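A quick check of Example 13, assuming only Python's standard `math` module, computes the Poisson value and, for comparison, the exact binomial probability for B(3000, 0.001); the variable names are illustrative.

```python
import math

# Example 13: X ~ B(3000, 0.001); we want P(X = 6).
n, p, r = 3000, 0.001, 6
lam = n * p  # lambda = np = 3

poisson = math.exp(-lam) * lam ** r / math.factorial(r)      # e^(-3) 3^6 / 6!
binomial = math.comb(n, r) * p ** r * (1 - p) ** (n - r)     # exact, for comparison

print(f"Poisson: {poisson:.4f}, exact binomial: {binomial:.4f}")
```

Both values come out at about 0.0504, so the rounded answer of about 5% is unaffected by using the approximation.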

Example 14

A manufacturer produces light-bulbs that are packed into boxes of 100. If quality control studies indicate that 0.5% of the light-bulbs produced are defective, what percentage of the boxes will contain:

  1. no defectives?
  2. 2 or more defectives?

Solution

As n is large and p = P(defective bulb) is small, we use the Poisson approximation to the binomial probability distribution. If X is the number of defective bulbs in a box, then

$$X \sim P(\mu) \quad \text{where} \quad \mu = n \times p = 100 \times 0.005 = 0.5$$

  1. $P(X = 0) = \dfrac{e^{-0.5}(0.5)^{0}}{0!} = \dfrac{e^{-0.5}(1)}{1} = 0.6065 \approx 61\%$
  2. $P(X = 2 \text{ or more}) = P(X = 2) + P(X = 3) + P(X = 4) + \cdots$, but it is easier to consider:

    $$P(X \geq 2) = 1 - [P(X = 0) + P(X = 1)]$$

    $$P(X = 1) = \frac{e^{-0.5}(0.5)^{1}}{1!} = \frac{e^{-0.5}(0.5)}{1} = 0.3033$$

    i.e. $P(X \geq 2) = 1 - [0.6065 + 0.3033] = 0.0902 \approx 9\%$
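The figures in Example 14 can be reproduced with the sketch below, which assumes only Python's standard `math` module; the variable names are illustrative. It also computes the exact binomial tail for B(100, 0.005) so the size of the approximation error can be seen.

```python
import math

# Example 14: boxes of n = 100 bulbs, each defective with probability p = 0.005.
n, p = 100, 0.005
mu = n * p  # Poisson mean mu = np = 0.5

p0 = math.exp(-mu)           # P(X = 0) = e^(-0.5)
p1 = mu * math.exp(-mu)      # P(X = 1) = 0.5 e^(-0.5)
p_two_or_more = 1 - (p0 + p1)

print(f"no defectives:  {p0:.4f} (about {100 * p0:.0f}% of boxes)")
print(f"2 or more:      {p_two_or_more:.4f} (about {100 * p_two_or_more:.0f}% of boxes)")

# For comparison, the exact binomial value of P(X >= 2) for X ~ B(100, 0.005).
binom_tail = 1 - ((1 - p) ** n + n * p * (1 - p) ** (n - 1))
print(f"exact binomial P(X >= 2): {binom_tail:.4f}")
```

The Poisson figures match the hand calculation (0.6065 and 0.0902); the exact binomial tail is slightly smaller, at about 0.0898, which still rounds to 9%.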