1 The Hypergeometric distribution

Suppose we are sampling without replacement from a batch of items containing a variable number of defectives. We are essentially assuming that we know the probability p that a given item is defective but not the actual number of defective items contained in the batch. The number of defective items in the batch is a random variable in this case.

When we sample from the batch, we are left with:

  1. a smaller batch;
  2. a (possibly) smaller (but still variable) number of defective items. The number of defective items is still a random variable.

While the probability of finding a given number of defectives in a sample drawn from the second batch will (in general) be different from the probability of finding a given number of defectives in a sample drawn from the first batch, sampling from both batches may be described by the binomial distribution for which:

$$P(X = r) = {}^{n}C_{r}\,(1 - p)^{n-r}\,p^{r}$$

Sampling in this case varies the values of n and p in general but not the underlying distribution describing the sampling process.

Example 18

A batch of 100 piston rings is known to contain 10 defective rings. If two piston rings are drawn from the batch, write down the probabilities that:

  1. the first ring is defective;
  2. the second ring is defective given that the first one is defective.

Solution

  1. The probability that the first ring is defective is clearly $\frac{10}{100} = \frac{1}{10}$.
  2. Assuming that the first ring selected is defective and we do not replace it, the probability that the second ring is defective is equally clearly $\frac{9}{99} = \frac{1}{11}$.

The hypergeometric distribution may be thought of as arising from sampling from a batch of items where the number of defective items contained in the batch is known.

Essentially, the number of defectives contained in the batch is not a random variable; it is fixed. The calculations involved when using the hypergeometric distribution are usually more complex than their binomial counterparts.

If we sample without replacement we may proceed in general as follows:

Key Point 10

Hypergeometric Distribution

The distribution given by

$$P(X = r) = \frac{{}^{M}C_{r} \times {}^{N-M}C_{n-r}}{{}^{N}C_{n}}$$

which describes the probability of obtaining a sample of size $n$ containing $r$ defective items from a population of size $N$ known to contain $M$ defective items is known as the hypergeometric distribution.
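
As a minimal sketch, the formula in Key Point 10 can be evaluated with Python's `math.comb` (available from Python 3.8). The function name `hypergeom_pmf`, its argument order and the batch values used in the check are choices made here for illustration only.

```python
from math import comb


def hypergeom_pmf(r: int, N: int, M: int, n: int) -> float:
    """P(X = r): r defectives in a sample of n drawn without replacement
    from a batch of N items of which M are defective."""
    if r < 0 or r > n:
        return 0.0
    # math.comb(a, b) is 0 when b > a, so values of r that cannot occur
    # (for example r > M) automatically get probability 0.
    return comb(M, r) * comb(N - M, n - r) / comb(N, n)


# Illustrative values (not from the text): the probabilities over all
# possible r should add up to 1.
total = sum(hypergeom_pmf(r, N=20, M=5, n=4) for r in range(0, 5))
print(total)
```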

Example 19

A batch of 10 rocker cover gaskets contains 4 defective gaskets. If we draw a sample of size 3 without replacement from the batch of 10, find the probability that the sample contains 2 defective gaskets.

Solution

Using $P(X = r) = \dfrac{{}^{M}C_{r} \times {}^{N-M}C_{n-r}}{{}^{N}C_{n}}$ we know that $N = 10$, $M = 4$, $n = 3$ and $r = 2$.

Hence $P(X = 2) = \dfrac{{}^{4}C_{2} \times {}^{6}C_{1}}{{}^{10}C_{3}} = \dfrac{6 \times 6}{120} = 0.3$
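
The counting behind this result can also be checked by brute force. The sketch below lists every possible sample of 3 gaskets and counts those containing exactly 2 defectives. The labelling of the gaskets (0 to 9, with the first four taken as defective) is an arbitrary assumption made for illustration.

```python
from itertools import combinations

# Label the 10 gaskets 0..9 and treat gaskets 0-3 as the defective ones.
batch = range(10)
defective = {0, 1, 2, 3}

# All unordered samples of size 3 drawn without replacement.
samples = list(combinations(batch, 3))

# Samples that contain exactly 2 defective gaskets.
favourable = [s for s in samples if len(defective.intersection(s)) == 2]

print(len(samples), len(favourable))       # 120 36
print(len(favourable) / len(samples))      # 0.3
```

The two counts correspond to ${}^{10}C_{3} = 120$ and ${}^{4}C_{2} \times {}^{6}C_{1} = 36$ in the formula.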

It is possible to derive formulae for the mean and variance of the hypergeometric distribution. However, the calculations are more difficult than their binomial counterparts, so we will simply state the results.

Key Point 11

Expectation and Variance of the Hypergeometric Distribution

The expectation (mean) and variance of the hypergeometric random variable $X$ with

$$P(X = r) = \frac{{}^{M}C_{r} \times {}^{N-M}C_{n-r}}{{}^{N}C_{n}}$$

are given by

$$E(X) = \mu = np \quad \text{and} \quad V(X) = np(1 - p)\,\frac{N - n}{N - 1} \quad \text{where } p = \frac{M}{N}$$

Example 20

For the previous Example, concerning rocker cover gaskets, find the expectation and variance of the number of defective gaskets in samples of size 3.

Solution

We know that $N = 10$, $M = 4$ and $n = 3$, so that $p = \frac{M}{N} = \frac{4}{10}$.

Hence

$$E(X) = np = 3 \times \frac{4}{10} = 1.2$$

and

$$V(X) = np(1 - p)\,\frac{N - n}{N - 1} = 3 \times \frac{4}{10} \times \frac{6}{10} \times \frac{10 - 3}{10 - 1} = 0.56$$
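
The mean and variance can be checked numerically by computing them directly from the probability function. The sketch below does this for the gasket batch ($N = 10$, $M = 4$, $n = 3$) and compares the results with the standard closed forms $np$ and $np(1-p)\frac{N-n}{N-1}$. The variable names are illustrative.

```python
from math import comb

N, M, n = 10, 4, 3          # batch size, defectives in batch, sample size
p = M / N

# Probability function P(X = r) for r = 0, 1, ..., n
pmf = [comb(M, r) * comb(N - M, n - r) / comb(N, n) for r in range(n + 1)]

# Moments computed directly from their definitions
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

# Closed-form expressions for comparison
mean_formula = n * p
variance_formula = n * p * (1 - p) * (N - n) / (N - 1)

print(round(mean, 4), round(mean_formula, 4))            # 1.2 1.2
print(round(variance, 4), round(variance_formula, 4))    # 0.56 0.56
```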

Task!

In the manufacture of car tyres, a particular production process is known to yield 10 tyres with defective walls in every batch of 100 tyres produced. From a production batch of 100 tyres, a sample of 4 is selected for testing to destruction. Find:

  1. the probability that the sample contains 1 defective tyre
  2. the expectation of the number of defectives in samples of size 4
  3. the variance of the number of defectives in samples of size 4.

Solution

Sampling is clearly without replacement and we use the hypergeometric distribution with $N = 100$, $M = 10$, $n = 4$, $r = 1$ and $p = 0.1$. Hence:

  1. $P(X = r) = \dfrac{{}^{M}C_{r} \times {}^{N-M}C_{n-r}}{{}^{N}C_{n}}$ gives

    $$P(X = 1) = \frac{{}^{10}C_{1} \times {}^{100-10}C_{4-1}}{{}^{100}C_{4}} = \frac{10 \times 117480}{3921225} \approx 0.3$$

  2. The expectation is $E(X) = np = 4 \times 0.1 = 0.4$
  3. The variance is $V(X) = np(1 - p)\,\dfrac{N - n}{N - 1} = 0.4 \times 0.9 \times \dfrac{96}{99} \approx 0.35$
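
The probability in part 1 can also be estimated by simulation, which makes the "without replacement" aspect explicit. This is only a rough sketch: the function name, the number of trials and the fixed seed are assumptions chosen for illustration, and the estimate will vary slightly with them.

```python
import random


def simulate_one_defective(trials: int, seed: int = 0) -> float:
    """Estimate P(X = 1) for the tyre batch by repeated sampling."""
    rng = random.Random(seed)              # fixed seed for repeatability
    batch = [1] * 10 + [0] * 90            # 1 = defective tyre, 0 = sound tyre
    hits = 0
    for _ in range(trials):
        sample = rng.sample(batch, 4)      # 4 tyres drawn without replacement
        if sum(sample) == 1:               # exactly one defective in the sample
            hits += 1
    return hits / trials


estimate = simulate_one_defective(20_000)
print(estimate)   # should be close to the exact value of about 0.3
```
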
Task!

A company (the producer) supplies microprocessors to a manufacturer (the consumer) of electronic equipment. The microprocessors are supplied in batches of 50. The consumer regards a batch as acceptable provided that there are not more than 5 defective microprocessors in the batch. Rather than test all of the microprocessors in the batch, 10 are selected at random and tested.

  1. Find the probability that, out of a sample of 10, $d = 0, 1, 2, 3, 4, 5$ are defective when there are actually 5 defective microprocessors in the batch.
  2. Suppose that the consumer will accept the batch provided that not more than m defectives are found in the sample of 10.
    1. Find the probability that the batch is accepted when there are 5 defectives in the batch.
    2. Find the probability that the batch is rejected when there are 3 defectives in the batch.
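
Solution

  1. Let $X$ = the number of defectives in a sample. Then

    $$P(X = d) = \frac{{}^{45}C_{10-d} \times {}^{5}C_{d}}{{}^{50}C_{10}}$$

    Hence

    $$P(X = 0) = \frac{{}^{45}C_{10} \times {}^{5}C_{0}}{{}^{50}C_{10}} = 0.311 \qquad P(X = 1) = \frac{{}^{45}C_{9} \times {}^{5}C_{1}}{{}^{50}C_{10}} = 0.431$$

    $$P(X = 2) = \frac{{}^{45}C_{8} \times {}^{5}C_{2}}{{}^{50}C_{10}} = 0.210 \qquad P(X = 3) = \frac{{}^{45}C_{7} \times {}^{5}C_{3}}{{}^{50}C_{10}} = 0.044$$

    $$P(X = 4) = \frac{{}^{45}C_{6} \times {}^{5}C_{4}}{{}^{50}C_{10}} = 0.004 \qquad P(X = 5) = \frac{{}^{45}C_{5} \times {}^{5}C_{5}}{{}^{50}C_{10}} = 0.0001$$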

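    1. Case $D = 5$

      P(Accept batch with 5 defectives) is

      $$\sum_{d=0}^{m} P(X = d) = \sum_{d=0}^{m} \frac{{}^{45}C_{10-d} \times {}^{5}C_{d}}{{}^{50}C_{10}}, \qquad m \le 5$$

    2. Case $D = 3$

      P(Reject batch with 3 defectives) is

      $$1 - \sum_{d=0}^{m} P(X = d) = 1 - \sum_{d=0}^{m} \frac{{}^{47}C_{10-d} \times {}^{3}C_{d}}{{}^{50}C_{10}}, \qquad m \le 3$$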
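
To see how the choice of $m$ affects the two probabilities, the sums can be tabulated for several values of $m$. The sketch below assumes the batch size of 50 and sample size of 10 from the Task. The function names `pmf` and `p_accept` are illustrative.

```python
from math import comb


def pmf(d: int, batch_defectives: int, batch: int = 50, sample: int = 10) -> float:
    """P(X = d): d defectives in the sample, given the number in the batch."""
    good = batch - batch_defectives
    return comb(batch_defectives, d) * comb(good, sample - d) / comb(batch, sample)


def p_accept(m: int, batch_defectives: int) -> float:
    """Probability of finding at most m defectives in the sample."""
    return sum(pmf(d, batch_defectives) for d in range(m + 1))


for m in range(4):
    accept_5 = p_accept(m, batch_defectives=5)        # batch with D = 5 accepted
    reject_3 = 1 - p_accept(m, batch_defectives=3)    # batch with D = 3 rejected
    print(f"m={m}: P(accept | D=5)={accept_5:.3f}, P(reject | D=3)={reject_3:.3f}")
```

Increasing $m$ raises the chance of accepting a batch with 5 defectives and lowers the chance of rejecting a batch with 3 defectives.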

Exercise

A company buys batches of n components. Before a batch is accepted, m of the components are selected at random from the batch and tested. The batch is rejected if more than d components in the sample are found to be below standard.

  1. Find the probability that a batch which actually contains six below-standard components is rejected when $n = 20$, $m = 5$ and $d = 1$.
  2. Find the probability that a batch which actually contains nine below-standard components is rejected when $n = 30$, $m = 10$ and $d = 1$.
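
Solution

  1. Let the number of below-standard components in the sample be $X$. The probability of acceptance is

    $$\begin{aligned}
    P(X = 0) + P(X = 1) &= \frac{\binom{14}{5}\binom{6}{0}}{\binom{20}{5}} + \frac{\binom{14}{4}\binom{6}{1}}{\binom{20}{5}} \\
    &= \frac{\frac{14}{5} \times \frac{13}{4} \times \frac{12}{3} \times \frac{11}{2} \times \frac{10}{1} + \frac{14}{4} \times \frac{13}{3} \times \frac{12}{2} \times \frac{11}{1} \times \frac{6}{1}}{\frac{20}{5} \times \frac{19}{4} \times \frac{18}{3} \times \frac{17}{2} \times \frac{16}{1}} \\
    &= \frac{2002 + 6006}{15504} = 0.5165
    \end{aligned}$$

    Hence the probability of rejection is $1 - 0.5165 = 0.4835$.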

  2. Let the number of below-standard components in the sample be X . The probability of acceptance is

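    $$P(X = 0) + P(X = 1) = \frac{\binom{21}{10}\binom{9}{0}}{\binom{30}{10}} + \frac{\binom{21}{9}\binom{9}{1}}{\binom{30}{10}}$$

    Now

    $$\begin{aligned}
    \binom{21}{10}\binom{9}{0} &= \frac{21}{10} \times \frac{20}{9} \times \frac{19}{8} \times \frac{18}{7} \times \frac{17}{6} \times \frac{16}{5} \times \frac{15}{4} \times \frac{14}{3} \times \frac{13}{2} \times \frac{12}{1} = 352716 \\
    \binom{21}{9}\binom{9}{1} &= \frac{21}{9} \times \frac{20}{8} \times \frac{19}{7} \times \frac{18}{6} \times \frac{17}{5} \times \frac{16}{4} \times \frac{15}{3} \times \frac{14}{2} \times \frac{13}{1} \times \frac{9}{1} = 2645370 \\
    \binom{30}{10} &= \frac{30}{10} \times \frac{29}{9} \times \frac{28}{8} \times \frac{27}{7} \times \frac{26}{6} \times \frac{25}{5} \times \frac{24}{4} \times \frac{23}{3} \times \frac{22}{2} \times \frac{21}{1} = 30045015
    \end{aligned}$$

    So the probability of acceptance is

    $$\frac{352716 + 2645370}{30045015} = 0.0998$$

    Hence the probability of rejection is $1 - 0.0998 = 0.9002$.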
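
Both parts of the Exercise follow the same pattern, so they can be checked with one small function. The sketch below uses exact rational arithmetic from Python's `fractions` module to avoid rounding in the intermediate steps. The function name `p_reject` and the parameter name `bad` are illustrative, while `n`, `m` and `d` follow the notation of the Exercise.

```python
from fractions import Fraction
from math import comb


def p_reject(n: int, bad: int, m: int, d: int) -> Fraction:
    """Exact probability that more than d below-standard components appear
    in a sample of m taken from a batch of n containing `bad` such components."""
    accept = sum(
        Fraction(comb(bad, x) * comb(n - bad, m - x), comb(n, m))
        for x in range(d + 1)
    )
    return 1 - accept


part1 = p_reject(n=20, bad=6, m=5, d=1)
part2 = p_reject(n=30, bad=9, m=10, d=1)
print(part1, round(float(part1), 4))   # 937/1938 0.4835
print(round(float(part2), 4))          # 0.9002
```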