1 Sampling

1.1 Why sample?

Considering samples from a distribution enables us to obtain information about a population where we cannot, for reasons of practicality, economy, or both, inspect the whole of the population. For example, it is impossible to check the complete output of some manufacturing processes. Items such as electric light bulbs, nuts, bolts, springs and light emitting diodes (LEDs) are produced in their millions and the sheer cost of checking every item as well as the time implications of such a checking process render it impossible. In addition, testing is sometimes destructive - one would not wish to destroy the whole production of a given component!

1.2 Populations and samples

If we choose $n$ items from a population, we say that the size of the sample is $n$. If we take many samples, the means of these samples will themselves have a distribution which may be different from the population from which the samples were chosen. Much of the practical application of sampling theory is based on the relationship between the ‘parent’ population from which samples are drawn and the summary statistics (mean and variance) of the ‘offspring’ population of sample means. Not surprisingly, in the case of a normal ‘parent’ population, the distribution of the population and the distribution of the sample means are closely related. What is surprising is that even in the case of a non-normal parent population, the ‘offspring’ population of sample means is usually (but not always) normally distributed provided that the samples taken are large enough. In practice the term ‘large’ is usually taken to mean about 30 or more. The behaviour of the distribution of sample means is based on the following result from mathematical statistics.

1.3 The central limit theorem

In what follows, we shall assume that the members of a sample are chosen at random from a population. This implies that the members of the sample are independent. We have already met the Central Limit Theorem. Here we will consider it in more detail and illustrate some of the properties resulting from it.

Much of the theory (and hence the practice) of sampling is based on the Central Limit Theorem. While we will not be looking at the proof of the theorem (it will be illustrated where practical) it is necessary that we understand what the theorem says and what it enables us to do. Essentially, the Central Limit Theorem says that if we take large samples of size $n$ with mean $\bar{x}$ from a population which has a mean $\mu$ and standard deviation $\sigma$ then the distribution of sample means $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.

That is, the sampling distribution of the mean $\bar{X}$ follows the distribution

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

Strictly speaking we require $\sigma^2 < \infty$, and it is important to note that no claim is made about the way in which the original distribution behaves, and it need not be normal. This is why the Central Limit Theorem is so fundamental to statistical practice. One implication is that a random variable which takes the form of a sum of many components which are random but not necessarily normal will itself be normal provided that the sum is not dominated by a small number of components. This explains why many biological variables, such as human heights, are normally distributed. In the case where the original distribution is normal, the relationship between the original distribution of $X$ and the distribution of sample means $\bar{X}$ is shown below.
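The theorem is easy to see empirically. As an illustrative sketch (not part of the original text), the following simulation draws many samples of size 30 from a decidedly non-normal parent (an exponential distribution, for which $\mu = \sigma = 1$) and checks that the sample means cluster around $\mu$ with spread close to $\sigma/\sqrt{n}$:

```python
import random
import statistics

random.seed(42)
n = 30            # sample size ("large" in the sense used in the text)
num_samples = 10_000

# Parent population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# The Central Limit Theorem predicts a mean near mu = 1 and a
# standard deviation near sigma / sqrt(n) = 1 / sqrt(30) = 0.183 approx.
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3))
```

A histogram of `sample_means` would also look convincingly bell-shaped, even though the parent distribution is heavily skewed.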

Figure 1


The distributions of $X$ and $\bar{X}$ have the same mean $\mu$ but $\bar{X}$ has the smaller standard deviation $\dfrac{\sigma}{\sqrt{n}}$

The theorem says that we must take large samples. If we take small samples, the theorem only holds if the original population is normally distributed.

1.4 Standard error of the mean

You will meet this term often if you read statistical texts. It is the name given to the standard deviation of the population of sample means. The name stems from the fact that there is some uncertainty in the process of predicting the original population mean from the mean of a sample or samples.

Key Point 1

For a sample of $n$ independent observations from a population with variance $\sigma^2$, the standard error of the mean is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Remember that this quantity is simply the standard deviation of the distribution of sample means.
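As a minimal sketch (the function name is ours, not from the text), the standard error can be computed directly, and it shows the characteristic behaviour that quadrupling the sample size halves the standard error:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the distribution of sample means."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
print(standard_error(2.0, 25))   # 0.4
print(standard_error(2.0, 100))  # 0.2
```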

1.5 Finite populations

When we sample without replacement from a population which is not infinitely large, the observations are not independent. This means that we need to make an adjustment in the standard error of the mean. In this case the standard error of the sample mean is given by the related but more complicated formula

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]

where $\sigma_{\bar{x}}$ is the standard error of the sample mean, $N$ is the population size and $n$ is the sample size.

Note that, in cases where the size of the population $N$ is large in comparison to the sample size $n$, the quantity

\[ \frac{N-n}{N-1} \approx 1 \]

so that the standard error of the mean is approximately $\dfrac{\sigma}{\sqrt{n}}$.
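A small sketch (names ours) of the finite-population formula; when $N$ is much larger than $n$ the result is practically indistinguishable from $\sigma/\sqrt{n}$:

```python
import math

def standard_error_finite(sigma: float, n: int, N: int) -> float:
    """Standard error of the sample mean when sampling without
    replacement from a finite population of size N."""
    return (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

# When N >> n the correction factor is close to 1, so the
# finite-population standard error is close to sigma / sqrt(n).
print(standard_error_finite(2.0, 10, 1_000_000))
print(2.0 / math.sqrt(10))
```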

Illustration - a distribution of sample means

It is possible to illustrate some of the above results by setting up a small population of numbers and looking at the properties of small samples drawn from it. Notice that setting up a small population, say of size 5, and taking samples of size 2 enables us to deal with the totality of samples: there are $\binom{5}{2} = 10$ distinct samples possible, whereas if we take a population of 100 and draw samples of size 10, there are $\binom{100}{10} \approx 1.7 \times 10^{13}$ possible distinct samples and, from a practical point of view, we could not possibly list them all, let alone work with them!

Suppose we take a population consisting of the five numbers 1, 2, 3, 4 and 5 and draw samples of size 2 to work with. The complete set of possible samples is:

\[ (1,2),\; (1,3),\; (1,4),\; (1,5),\; (2,3),\; (2,4),\; (2,5),\; (3,4),\; (3,5),\; (4,5) \]

For the parent population, since we know that the mean $\mu = 3$, we can calculate the standard deviation by

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{5}} = \sqrt{\frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5}} = \sqrt{\frac{10}{5}} = \sqrt{2} \approx 1.414 \]

For the population of sample means,

\[ 1.5,\; 2,\; 2.5,\; 3,\; 2.5,\; 3,\; 3.5,\; 3.5,\; 4,\; 4.5 \]

their mean and standard deviation are given by the calculations:

\[ \mu_{\bar{x}} = \frac{1.5 + 2 + 2.5 + 3 + 2.5 + 3 + 3.5 + 3.5 + 4 + 4.5}{10} = \frac{30}{10} = 3 \]

and

\[ \sigma_{\bar{x}} = \sqrt{\frac{(1.5-3)^2 + (2-3)^2 + \cdots + (4.5-3)^2}{10}} = \sqrt{\frac{7.5}{10}} = \sqrt{0.75} \approx 0.866 \]

We can immediately conclude that the mean of the population of sample means is the same as the population mean $\mu$. Using the results given above, the value of $\sigma_{\bar{x}}$ should be given by the formula

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]

with $\sigma = \sqrt{2}$, $N = 5$ and $n = 2$. Using these numbers gives:

\[ \sigma_{\bar{x}} = \frac{\sqrt{2}}{\sqrt{2}} \sqrt{\frac{5-2}{5-1}} = \sqrt{\frac{3}{4}} = \frac{\sqrt{3}}{2} \approx 0.866 \quad \text{as predicted.} \]
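The whole calculation can be verified by brute force. This sketch (our own; it assumes, as the text does, unordered samples drawn without replacement) enumerates all ten samples and checks the summary statistics against the formula:

```python
import math
from itertools import combinations
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5]
n, N = 2, len(population)

# All 10 distinct samples of size 2, drawn without replacement.
samples = list(combinations(population, n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)       # 3
sigma = pstdev(population)   # sqrt(2)

print(len(samples))          # 10
print(fmean(sample_means))   # 3.0, equal to mu
print(pstdev(sample_means))  # ~0.866
# Agrees with (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)):
print(sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1)))
```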

Note that in this case the ‘correction factor’ $\sqrt{\dfrac{N-n}{N-1}} = \sqrt{\dfrac{3}{4}} \approx 0.866$ and is significant. If we take samples of size 10 from a population of 100, the factor becomes

\[ \sqrt{\frac{100-10}{100-1}} = \sqrt{\frac{90}{99}} \approx 0.953 \]

and for samples of size 10 taken from a population of 1000, the factor becomes

\[ \sqrt{\frac{1000-10}{1000-1}} = \sqrt{\frac{990}{999}} \approx 0.995. \]

Thus as $N \to \infty$, the effect of the correction factor on the value of $\sigma_{\bar{x}}$ reduces to insignificance.
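The shrinking effect of the correction factor can be tabulated directly; this sketch (ours) reproduces the three cases discussed above:

```python
import math

def correction_factor(n: int, N: int) -> float:
    """Finite-population correction sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

# (sample size, population size) -> factor; the factor creeps toward 1
# as the population grows relative to the sample.
for n, N in [(2, 5), (10, 100), (10, 1000)]:
    print(f"n = {n:2d}, N = {N:4d}: factor = {correction_factor(n, N):.3f}")
```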

Task!

Two-centimetre number 10 woodscrews are manufactured in their millions but packed in boxes of 200 to be sold to the public or trade. If the length of the screws is known to be normally distributed with a mean of 2 cm and variance [maths rendering] , find the mean and standard deviation of the sample mean of 200 boxed screws. What is the probability that the sample mean length of the screws in a box of 200 is greater than 2.02 cm?

Since the population is very large indeed, we are effectively sampling from an infinite population. The mean and standard deviation are given by

$\mu_{\bar{x}} = \mu = 2$ cm and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{200}}$ cm, where $\sigma^2$ is the given variance

Since the parent population is normally distributed, the means of samples of 200 will be normally distributed as well.

Hence $P(\bar{X} > 2.02) = P\left(Z > \dfrac{2.02 - 2}{\sigma_{\bar{x}}}\right)$ where $Z \sim N(0, 1)$, and the probability follows from standard normal tables once the given variance is substituted.
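The variance value has not survived in this copy of the text, so the sketch below uses a purely hypothetical $\sigma^2 = 0.05\ \text{cm}^2$ to show the mechanics of the calculation; substitute the true figure from the original task to recover the intended answer:

```python
from math import sqrt
from statistics import NormalDist

mu, n = 2.0, 200
sigma2 = 0.05  # HYPOTHETICAL variance in cm^2 -- the actual value is elided in the text

se = sqrt(sigma2 / n)                 # standard error of the sample mean
p = 1 - NormalDist(mu, se).cdf(2.02)  # P(sample mean > 2.02 cm)
print(f"standard error = {se:.4f} cm, P = {p:.4f}")
```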