1 Sampling

1.1 Why sample?

Sampling from a distribution enables us to obtain information about a population when, for reasons of practicality, economy, or both, we cannot inspect the whole population. For example, it is impossible to check the complete output of some manufacturing processes. Items such as electric light bulbs, nuts, bolts, springs and light emitting diodes (LEDs) are produced in their millions, and the sheer cost of checking every item, together with the time such checking would take, makes it impossible. In addition, testing is sometimes destructive: one would not wish to destroy the whole production of a given component!

1.2 Populations and samples

If we choose $n$ items from a population, we say that the size of the sample is $n$. If we take many samples, the means of these samples will themselves have a distribution which may be different from the population from which the samples were chosen. Much of the practical application of sampling theory is based on the relationship between the ‘parent’ population from which samples are drawn and the summary statistics (mean and variance) of the ‘offspring’ population of sample means. Not surprisingly, in the case of a normal ‘parent’ population, the distribution of the population and the distribution of the sample means are closely related. What is surprising is that even in the case of a non-normal parent population, the ‘offspring’ population of sample means is approximately normally distributed provided that the samples taken are large enough. In practice the term ‘large’ is usually taken to mean about 30 or more. The behaviour of the distribution of sample means is based on the following result from mathematical statistics.

1.3 The central limit theorem

In what follows, we shall assume that the members of a sample are chosen at random from a population. This implies that the members of the sample are independent. We have already met the Central Limit Theorem. Here we will consider it in more detail and illustrate some of the properties resulting from it.

Much of the theory (and hence the practice) of sampling is based on the Central Limit Theorem. While we will not be looking at the proof of the theorem (it will be illustrated where practical), it is necessary that we understand what the theorem says and what it enables us to do. Essentially, the Central Limit Theorem says that if we take large samples of size $n$, with mean $\bar{X}$, from a population which has mean $\mu$ and standard deviation $\sigma$, then the distribution of sample means $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.

That is, the sampling distribution of the mean $\bar{X}$ follows the distribution

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Strictly speaking we require $\sigma^2 < \infty$, and it is important to note that no claim is made about the way in which the original distribution behaves: it need not be normal. This is why the Central Limit Theorem is so fundamental to statistical practice. One implication is that a random variable which takes the form of a sum of many components which are random but not necessarily normal will itself be approximately normal, provided that the sum is not dominated by a small number of components. This explains why many biological variables, such as human heights, are approximately normally distributed. In the case where the original distribution is normal, the relationship between the original distribution $X \sim N(\mu, \sigma)$ and the distribution of sample means $\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$ is shown below.

Figure 1


The distributions of $X$ and $\bar{X}$ have the same mean $\mu$, but $\bar{X}$ has the smaller standard deviation $\dfrac{\sigma}{\sqrt{n}}$.

The theorem says that we must take large samples. If we take small samples, the theorem only holds if the original population is normally distributed.
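A short simulation can make the theorem concrete. The sketch below is not from the original text; the parent distribution and sample sizes are chosen purely for illustration. It draws 10,000 samples of size 30 from a decidedly non-normal parent, an exponential distribution with mean 2 (and standard deviation 2), and checks that the sample means have mean close to $\mu$ and spread close to $\sigma/\sqrt{n}$:

```python
import random
import statistics

random.seed(1)  # make the run reproducible

mu, sigma, n = 2.0, 2.0, 30  # exponential parent: mean = st. dev. = 2

# Draw 10,000 samples of size n and record each sample mean.
sample_means = [
    statistics.mean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(10_000)
]

# The Central Limit Theorem predicts mean mu = 2 and standard
# deviation sigma / sqrt(n) = 2 / sqrt(30), about 0.365.
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```

A histogram of `sample_means` would look close to a bell curve even though the parent distribution is heavily skewed.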

1.4 Standard error of the mean

You will meet this term often if you read statistical texts. It is the name given to the standard deviation of the population of sample means. The name stems from the fact that there is some uncertainty in the process of predicting the original population mean from the mean of a sample or samples.

Key Point 1

For a sample of $n$ independent observations from a population with variance $\sigma^2$, the standard error of the mean is $\sigma_n = \dfrac{\sigma}{\sqrt{n}}$.

Remember that this quantity is simply the standard deviation of the distribution of sample means.
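In code the standard error is a one-liner; the function name below is illustrative rather than taken from the text:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
print(standard_error(2.0, 25))   # 0.4
print(standard_error(2.0, 100))  # 0.2
```

The square root in the denominator means that reducing uncertainty by a factor of 2 costs a factor of 4 in sample size.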

1.5 Finite populations

When we sample without replacement from a population which is not infinitely large, the observations are not independent. This means that we need to make an adjustment in the standard error of the mean. In this case the standard error of the sample mean is given by the related but more complicated formula

$$\sigma_{n,N} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

where $\sigma_{n,N}$ is the standard error of the sample mean, $N$ is the population size and $n$ is the sample size.

Note that, in cases where the size of the population $N$ is large in comparison with the sample size $n$, the quantity

$$\sqrt{\frac{N-n}{N-1}} \approx 1$$

so that the standard error of the mean is approximately $\dfrac{\sigma}{\sqrt{n}}$.
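Both the correction factor and the corrected standard error are easy to compute. This is a sketch with illustrative names, not code from the text:

```python
import math

def fpc(N: int, n: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_finite(sigma: float, n: int, N: int) -> float:
    """Standard error of the mean when sampling n items without
    replacement from a population of size N."""
    return sigma / math.sqrt(n) * fpc(N, n)

# As N grows relative to n the factor approaches 1, so the plain
# sigma / sqrt(n) becomes a good approximation.
print(round(fpc(100, 10), 4))   # 0.9535
print(round(fpc(1000, 10), 4))  # 0.9955
```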

Illustration: a distribution of sample means

It is possible to illustrate some of the above results by setting up a small population of numbers and looking at the properties of small samples drawn from it. Notice that setting up a small population, say of size 5, and taking samples of size 2 enables us to deal with the totality of samples: there are $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$ distinct samples possible. By contrast, if we take a population of 100 and draw samples of size 10, there are $\binom{100}{10} = \frac{100!}{10!\,90!} = 17{,}310{,}309{,}456{,}440$ possible distinct samples, and from a practical point of view we could not possibly list them all, let alone work with them!

Suppose we take a population consisting of the five numbers 1, 2, 3, 4 and 5 and draw samples of size 2 to work with. The complete set of possible samples is:

$(1,2),\ (1,3),\ (1,4),\ (1,5),\ (2,3),\ (2,4),\ (2,5),\ (3,4),\ (3,5),\ (4,5)$

For the parent population, since we know that the mean is $\mu = 3$, we can calculate the standard deviation by

$$\sigma = \sqrt{\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5}} = \sqrt{\frac{10}{5}} = 1.4142$$

For the population of sample means,

$$1.5,\ 2,\ 2.5,\ 3,\ 2.5,\ 3,\ 3.5,\ 3.5,\ 4,\ 4.5$$

their mean and standard deviation are given by the calculations:

$$\frac{1.5 + 2 + 2.5 + 3 + 2.5 + 3 + 3.5 + 3.5 + 4 + 4.5}{10} = 3$$

and

$$\sqrt{\frac{(1.5-3)^2 + (2-3)^2 + \cdots + (4-3)^2 + (4.5-3)^2}{10}} = \sqrt{\frac{7.5}{10}} = 0.8660$$

We can immediately conclude that the mean of the population of sample means is the same as the population mean $\mu$. Using the results given above, the value of $\sigma_{n,N}$ should be given by the formula

$$\sigma_{n,N} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

with $\sigma = 1.4142$, $N = 5$ and $n = 2$. Using these numbers gives:

$$\sigma_{2,5} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{1.4142}{\sqrt{2}} \sqrt{\frac{5-2}{5-1}} = \sqrt{\frac{3}{4}} = 0.8660 \quad \text{as predicted.}$$

Note that in this case the ‘correction factor’ $\sqrt{\dfrac{N-n}{N-1}} = \sqrt{\dfrac{3}{4}} \approx 0.8660$ and is significant. If we take samples of size 10 from a population of 100, the factor becomes

$$\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{90}{99}} \approx 0.9535$$

and for samples of size 10 taken from a population of 1000, the factor becomes

$$\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{990}{999}} \approx 0.9955.$$

Thus as $\sqrt{\dfrac{N-n}{N-1}} \to 1$, its effect on the value of $\dfrac{\sigma}{\sqrt{n}}$ reduces to insignificance.
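The whole illustration can be reproduced by brute force. The sketch below (with names chosen for this example) enumerates all ten samples, recovers the list of sample means, and checks the observed standard deviation against the finite-population formula:

```python
import math
import statistics
from itertools import combinations

population = [1, 2, 3, 4, 5]
N, n = len(population), 2

# All C(5, 2) = 10 samples drawn without replacement, and their means.
sample_means = [sum(s) / n for s in combinations(population, n)]
print(sample_means)  # [1.5, 2.0, 2.5, 3.0, 2.5, 3.0, 3.5, 3.5, 4.0, 4.5]

# Mean of the sample means equals the population mean mu = 3.
print(statistics.mean(sample_means))  # 3.0

# Observed standard deviation of the sample means versus the
# formula (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)).
sigma = statistics.pstdev(population)          # sqrt(2), about 1.4142
predicted = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
observed = statistics.pstdev(sample_means)     # sqrt(0.75), about 0.8660
print(round(predicted, 4), round(observed, 4))  # 0.866 0.866
```

Note the use of `pstdev` (population standard deviation, dividing by the number of values) rather than `stdev`, matching the calculations above.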

Task!

Two-centimetre number 10 woodscrews are manufactured in their millions but packed in boxes of 200 to be sold to the public or the trade. If the length of the screws is known to be normally distributed with mean 2 cm and variance 0.05 cm$^2$, find the mean and standard deviation of the sample mean of 200 boxed screws. What is the probability that the sample mean length of the screws in a box of 200 is greater than 2.02 cm?

Since the population is very large indeed, we are effectively sampling from an infinite population. The mean and standard deviation are given by

$$\mu = 2 \text{ cm} \qquad \text{and} \qquad \sigma_{200} = \sqrt{\frac{0.05}{200}} \approx 0.016 \text{ cm}$$

Since the parent population is normally distributed the means of samples of 200 will be normally distributed as well.

Hence $P(\text{sample mean length} > 2.02) = P\left(z > \dfrac{2.02 - 2}{0.016}\right) = P(z > 1.25) = 0.5 - 0.3944 = 0.1056$
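The same calculation can be checked with Python's `statistics.NormalDist`, using the unrounded standard error; the small difference from 0.1056 comes from the worked solution above rounding 0.0158 up to 0.016:

```python
import math
from statistics import NormalDist

mu, var, n = 2.0, 0.05, 200

se = math.sqrt(var / n)  # unrounded standard error of the mean
print(round(se, 4))      # 0.0158

# P(sample mean length > 2.02) for the normal sampling distribution.
p = 1 - NormalDist(mu, se).cdf(2.02)
print(round(p, 3))       # 0.103
```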