2 Tests for population means

2.1 Tests concerning a single mean

Introduction

In cases where tests involving measurements are performed, it is often possible to statistically hypothesize about the results. Suppose that the boiling point of a particular coolant used in car engines is claimed by a manufacturer to be 110 °C. Further suppose that a series of accurate measurements made in a laboratory using 8 random samples of the coolant are recorded as:

110.2°, 110.3°, 110.1°, 109.8°, 109.9°, 110.0°, 110.4°, 110.1°

The mean of these results is 110.1 °C.

It is reasonable to ask whether, on the basis of the results obtained, we may claim that the boiling point of the coolant is greater than the assumed true boiling point of 110 °C. We will return to this problem later in this Workbook after looking at some general results.

General results

In general terms, we need to make predictions, based on calculation, about the parameters of the population from which the random sample is drawn. As illustrated above we calculate the sample mean $\bar{x}$. The statistical tests used to answer the above question depend on whether the variance of the population is known or not.

Case (i) - Population variance known

Firstly we form the null hypothesis that there is no difference between the true population mean $\mu$ and the theoretical value $\mu_0$. That is:

$$H_0 : \mu = \mu_0$$

Secondly we consider drawing samples of size $n$ from the population. If $n$ is large (say $n \geq 30$) then, because of the central limit theorem, we can often assume that the sample means approximately follow a normal distribution with mean $\mu$ and standard deviation (standard error of the mean) $\sigma_n$ given by

$$\sigma_n = \frac{\sigma}{\sqrt{n}}$$

It follows that

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

has a standard normal distribution when the null hypothesis is true. That is, when $\mu = \mu_0$, $Z \sim N(0, 1)$.

We may now set up an alternative hypothesis which can take one of the three forms:

$$H_1 : \mu \neq \mu_0 \qquad H_1 : \mu > \mu_0 \qquad H_1 : \mu < \mu_0$$

depending on the form of deviation from the null hypothesis for which we wish to test. Then we will reject the null hypothesis at the 5% level of significance if

$|Z| > 1.96$ for a two-tailed test
$Z > 1.645$ for a (right) one-tailed test
$Z < -1.645$ for a (left) one-tailed test

In each case we reject $H_0$ in favour of the alternative hypothesis when $Z$ lies in the remote tail of the standard normal distribution.
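The critical values 1.96 and 1.645 quoted above are quantiles of the standard normal distribution. As a minimal sketch, assuming Python 3.8 or later so that the standard-library class `statistics.NormalDist` is available, they can be recovered as follows. The variable names are our own and are not part of the Workbook.

```python
from statistics import NormalDist

alpha = 0.05                        # 5% level of significance
std_normal = NormalDist(0.0, 1.0)   # the N(0, 1) distribution

# Two-tailed test: alpha is split equally between the two tails.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)
# Right one-tailed test: all of alpha lies in the upper tail.
right_tailed_critical = std_normal.inv_cdf(1 - alpha)
# Left one-tailed test: all of alpha lies in the lower tail.
left_tailed_critical = std_normal.inv_cdf(alpha)

print(round(two_tailed_critical, 3))    # 1.96
print(round(right_tailed_critical, 3))  # 1.645
print(round(left_tailed_critical, 3))   # -1.645
```

Changing `alpha` gives the corresponding critical values for other levels of significance.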

Example 1

Dishwasher powder is poured into the cartons in which it is sold by an automatic dispensing machine which is set to dispense 3 kg of powder into each carton. In order to check that the dispensing machine is working to an acceptable standard (i.e. does not need adjustment), a production engineer takes a random sample of 40 cartons and weighs them. It is found that the mean weight of the sample is 3.005 kg. It is known that the dispensing machine operates with a variance of $0.015^2$ kg$^2$ and that the manufacturer of the powder is willing to rely on a 5% level of significance. Does the sample provide the engineer with sufficient evidence that the true mean is not 3.00 kg and so the machine requires adjustment?

Solution

Given that the dispensing machine can over-fill or under-fill the containers, the null and alternative hypotheses are:

$$H_0 : \mu = 3 \qquad H_1 : \mu \neq 3$$

Since the sample size is large ($n \geq 30$) and we can regard the population as infinite but with a known variance, we can calculate the relevant value of the test statistic $Z$ by using the formula:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Hence, in this case:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{3.005 - 3}{0.015 / \sqrt{40}} = 2.108$$

and since we are performing a two-tailed test at the 5% level of significance and have found that $|Z| > 1.96$, that is, $Z$ is outside the range $[-1.96, 1.96]$, we must reject the null hypothesis and conclude that the machine is not operating acceptably and needs adjustment.
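The arithmetic in this Example can be checked with a few lines of Python. This is only a sketch: the helper `z_statistic` and its parameter names are our own, and the figures are those given in the Example.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)), for a known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Example 1: mean 3.005 kg, target 3 kg, sigma 0.015 kg, 40 cartons.
z = z_statistic(sample_mean=3.005, mu0=3.0, sigma=0.015, n=40)

# Two-tailed test at the 5% level: reject H0 when |Z| > 1.96.
reject_h0 = abs(z) > 1.96

print(round(z, 3), reject_h0)  # 2.108 True
```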

Case (ii) - Population variance unknown

We have exactly the same situation as that described in Case (i) but do not know the value of the population variance $\sigma^2$. Therefore we estimate it using

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and calculate the test statistic

$$T = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}}.$$

However, because we are now dividing by an estimate, which is itself random, this test statistic does not have a standard normal distribution under the null hypothesis. Instead it has a distribution called Student's $t$-distribution on $n - 1$ degrees of freedom. The number of degrees of freedom is the same as that which we have already seen when we looked at the $\chi^2$ distribution in connection with sample variances in Workbook 40. So, for example, instead of comparing $Z$ with $\pm 1.96$ for a two-sided test at the 5% level, when $\sigma^2$ is known, we compare $T$ with a value from the $t$-distribution which depends on the sample size through the number of degrees of freedom.

The $t$-distribution is symmetric, centred at zero and, for all but very small numbers of degrees of freedom, has a shape similar to that of a standard normal distribution but with a larger variance. A table which gives the values which we need is provided at the back of this Workbook. For example, if we have a two-sided test at the 5% level of significance and a sample size $n = 15$, then the number of degrees of freedom is 14 and we compare $T$ with the upper 2.5% point which is 2.145.

Looking at the table and comparing it with the values for a standard normal distribution we can see that, as the number of degrees of freedom becomes large, the $t$-distribution gets closer to the standard normal distribution so that, for large samples, it makes little difference which we use. It is also true that, under most circumstances, even if we do not know that the distribution from which data are drawn is normal, a $t$-test provides a good approximation when the sample size is reasonably large. In other circumstances, for example when normality cannot be assumed and the sample is small, we need to use other procedures, often non-parametric tests.
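The claim that $T$ is not standard normal for small samples, but becomes nearly so for large ones, can be illustrated by simulation. The sketch below is our own illustration and not part of the Workbook: it draws repeated samples from a standard normal population (so that the null hypothesis $\mu = 0$ is true), computes $T$ for each, and records how often $|T|$ exceeds the normal critical value 1.96. The function names, the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt

def t_statistic(sample, mu0):
    """T = (sample mean - mu0) / sqrt(s^2 / n), with s^2 using the n - 1 divisor."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mu0) / sqrt(s2 / n)

def tail_fraction(n, trials=20000, cutoff=1.96, seed=1):
    """Fraction of simulated |T| values above cutoff when H0 is true."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, mu0=0.0)) > cutoff:
            exceed += 1
    return exceed / trials

small = tail_fraction(n=5)    # 4 degrees of freedom
large = tail_fraction(n=50)   # 49 degrees of freedom
print(small, large)
```

If $T$ were standard normal, both fractions would be close to 0.05. We would expect the value for $n = 5$ to be well above 0.05 and the value for $n = 50$ to be only slightly above it, which is why the larger critical values from the $t$-table are needed for small samples.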

In summary we have the following.

| Population distribution | Population variance | Sample size | Test |
|---|---|---|---|
| Normal | Known | Small | Normal ($Z$) |
| Normal | Known | Large | Normal ($Z$) |
| Normal | Unknown | Small | $t$ |
| Normal | Unknown | Large | $t$ but $Z$ approximates |
| Not Normal | Either | Small | Non-parametric |
| Not Normal | Known | Large | $Z$ approximates |
| Not Normal | Unknown | Large | $Z$ and $t$ approximate |

Non-parametric testing is covered in HELM Workbook 45.
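The summary can also be written as a small decision function. This is a sketch only: the function `choose_test` and its three Boolean arguments are hypothetical names of our own, and the returned strings simply repeat the entries of the summary.

```python
def choose_test(population_normal: bool, variance_known: bool, large_sample: bool) -> str:
    """Return the test suggested by the summary for one mean."""
    if population_normal:
        if variance_known:
            # Known variance: the Z test applies for any sample size.
            return "Normal (Z)"
        return "t but Z approximates" if large_sample else "t"
    if not large_sample:
        # Small sample, normality not assumed: variance knowledge does not help.
        return "Non-parametric"
    return "Z approximates" if variance_known else "Z and t approximate"

print(choose_test(population_normal=True, variance_known=False, large_sample=False))  # t
```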

Example 2

The average useful life of a random sample of 33 similar calculator batteries made on a production line is found to be 99.5 hours continuous use. The sample variance is 18.49 hours$^2$. Test the null hypothesis that the population mean lifetime is 100 hours against the alternative that it is less. Use the 5% level of significance.

Solution

The null and alternative hypotheses are:

$$H_0 : \mu = 100 \qquad H_1 : \mu < 100$$

Our test statistic is

$$T = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}}$$

In this case

$$T = \frac{99.5 - 100.0}{\sqrt{18.49 / 33}} = -0.668$$

and the number of degrees of freedom is $n - 1 = 33 - 1 = 32$. The table does not give values for 32 degrees of freedom but it does give values for 30 degrees of freedom and for 40 and the values for 32 must be in between. The lower 5% points for 30 and 40 degrees of freedom are $-1.697$ and $-1.684$ respectively. Clearly our observed value of $-0.668$ is not significant and we do not have sufficient evidence to reject the null hypothesis that $\mu = 100$.
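The bracketing argument used here, where the required critical value is not tabulated, can be sketched in Python as follows. The variable names are our own. The two critical values are the lower 5% points for 30 and 40 degrees of freedom quoted in the solution.

```python
from math import sqrt

sample_mean, mu0 = 99.5, 100.0
sample_variance, n = 18.49, 33

t = (sample_mean - mu0) / sqrt(sample_variance / n)
df = n - 1

# Lower 5% points for the tabulated degrees of freedom either side of 32.
critical_30, critical_40 = -1.697, -1.684

# The critical value for 32 degrees of freedom lies between these two, so the
# decision is clear-cut whenever t falls below both or above both of them.
reject_h0 = t < min(critical_30, critical_40)
clearly_not_significant = t > max(critical_30, critical_40)

print(round(t, 3), df, reject_h0, clearly_not_significant)  # -0.668 32 False True
```

Only when the observed value falls between the two bracketing critical values would a more detailed table, or interpolation, be needed.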

Task!

Solve the coolant boiling point problem given at the start of subsection 2 (page 11). Note the sample is small and you will have to estimate the population variance from the sample variance. Use the tabulated values of the $t$-distribution given at the end of this Workbook in conjunction with the appropriate number of degrees of freedom.

The null and alternative hypotheses are:

$$H_0 : \mu = 110 \qquad H_1 : \mu > 110$$

The value of the sample variance is given by the formula

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{0.28}{7} = 0.04$$

The test statistic $t$ is given by

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{110.1 - 110}{\sqrt{0.04} / \sqrt{8}} = \frac{0.1 \times \sqrt{8}}{0.2} = 1.414$$

At the 5% level of significance and using $8 - 1 = 7$ degrees of freedom, the value of $t_{\alpha,\nu}$ from tables is 1.895. Since $1.414 < 1.895$, we cannot reject the null hypothesis in favour of the alternative hypothesis. On the basis of the evidence available, we are not able to conclude that the boiling point of the coolant is greater than 110 °C.
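Starting from the eight recorded boiling points, the whole calculation can be reproduced with the standard-library `statistics` module, whose `variance` function uses the $n - 1$ divisor. This is a sketch with variable names of our own. The critical value 1.895 is taken from the text and is not computed.

```python
from math import sqrt
from statistics import mean, variance

boiling_points = [110.2, 110.3, 110.1, 109.8, 109.9, 110.0, 110.4, 110.1]
mu0 = 110.0   # boiling point claimed by the manufacturer

n = len(boiling_points)
x_bar = mean(boiling_points)
s2 = variance(boiling_points)        # sample variance, n - 1 divisor
t = (x_bar - mu0) / sqrt(s2 / n)

critical = 1.895                     # upper 5% point of t on 7 degrees of freedom
reject_h0 = t > critical             # right one-tailed test

print(round(x_bar, 2), round(s2, 3), round(t, 3), reject_h0)  # 110.1 0.04 1.414 False
```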

2.2 General comments about tests concerning a population mean

  1. The sample mean $\bar{x}$ is often used as a test statistic when testing a hypothesis concerning a population mean $\mu$.
  2. Even if the population distribution cannot be assumed to be normal, the distribution of sample means can often be assumed to be normal. This depends on the sample size.
  3. The tests described above sometimes require us to assume that the population variance is known. This is often unrealistic and we turn to the $t$-test to deal with cases where the population standard deviation is unknown and must be estimated from the data available.

2.3 General comments on the $t$-test

  1. The test only applies when the underlying distribution can be assumed to be normal.
  2. The test is used when the standard deviation of the parent population has to be estimated.
  3. As the sample size $n$ gets larger, the distribution approximates to the standard normal distribution.
  4. The distribution depends on the number of degrees of freedom. For a single sample or equal paired samples (see below), the number of degrees of freedom is always one less than the sample size.

2.4 Tests concerning paired data

Sometimes experimental data may be directly compared using an appropriate test. The following Example looks at experimental data concerning the throttle reaction times of two turbochargers fitted to an internal combustion engine.

Example 3

In order to test the hypothesis that two standard turbochargers A and B have the same throttle reaction times, a random sample of 7 cars were fitted with the turbochargers and the throttle reaction times measured. The results were as follows:

| Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Throttle reaction time for A; $R_1$ | 0.223 | 0.212 | 0.201 | 0.205 | 0.216 | 0.211 | 0.209 |
| Throttle reaction time for B; $R_2$ | 0.208 | 0.207 | 0.203 | 0.204 | 0.205 | 0.202 | 0.206 |
| $D = R_1 - R_2$ | 0.015 | 0.005 | $-0.002$ | 0.001 | 0.011 | 0.009 | 0.003 |

Solution

Let $D$ be the difference between the throttle reaction times of the two turbochargers. We assume that the distribution of $D$ is normal. Our null hypothesis is that $\mu_D$, the mean of the population of differences, is zero. We must decide between the two hypotheses

$$H_0 : \mu_D = 0 \qquad H_1 : \mu_D \neq 0$$

The alternative hypothesis here indicates that we perform a two-tailed test.

Let $\bar{d}$ be the sample mean of the seven observed differences. Then

$$\bar{d} = \frac{\sum d}{7} = \frac{0.042}{7} = 0.006$$

The sample variance of the differences is

$$s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{0.000214}{6} = 3.5667 \times 10^{-5}$$

The value of the test statistic is

$$t = \frac{\bar{d} - 0}{\sqrt{s_d^2 / n}} = \frac{0.006}{\sqrt{3.5667 \times 10^{-5} / 7}} = 2.658$$

The number of degrees of freedom is $7 - 1 = 6$ and the critical value from the table is 2.447. Since $2.658 > 2.447$ we reject $H_0$ at the 5% level and conclude that the evidence suggests that there is a difference in the throttle reaction times between the two turbochargers.
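A paired test is simply a single-mean $t$-test applied to the differences. The following sketch uses a helper `paired_t` of our own devising (it is not a library function) to reproduce the calculation from the two rows of reaction times. The critical value 2.447 is taken from the text.

```python
from math import sqrt
from statistics import mean, variance

def paired_t(first, second):
    """Return (t, degrees of freedom) for H0: mean of the differences is zero."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)   # variance uses the n - 1 divisor
    return t, n - 1

reaction_a = [0.223, 0.212, 0.201, 0.205, 0.216, 0.211, 0.209]
reaction_b = [0.208, 0.207, 0.203, 0.204, 0.205, 0.202, 0.206]

t, df = paired_t(reaction_a, reaction_b)
reject_h0 = abs(t) > 2.447   # two-tailed test at the 5% level, 6 degrees of freedom

print(round(t, 3), df, reject_h0)  # 2.658 6 True
```

The same helper can be applied to any two equal-length lists of paired measurements, with the critical value read from the $t$-table for the resulting number of degrees of freedom.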
Task!

Two different methods of analysis were used to determine the levels of impurity present in a particular aircraft quality aluminium alloy. Eight specimens were analysed using both methods. Does the available evidence suggest that both methods lead to the same results?

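| Alloy Specimen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Test 1 | 1.24 | 1.23 | 1.24 | 1.20 | 1.21 | 1.22 | 1.23 | 1.22 |
| Test 2 | 1.23 | 1.20 | 1.20 | 1.21 | 1.20 | 1.20 | 1.21 | 1.25 |
| $D$ = Test 1 $-$ Test 2 | 0.01 | 0.03 | 0.04 | $-0.01$ | 0.01 | 0.02 | 0.02 | $-0.03$ |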

Let $D$ be the difference between the two methods of analysis. We assume that the distribution of $D$ is normal. Our null hypothesis is that $\mu_D$, the mean of the population of differences, is zero. We must decide between the two hypotheses

$$H_0 : \mu_D = 0 \qquad H_1 : \mu_D \neq 0$$

The alternative hypothesis here indicates that we perform a two-tailed test.

Let $\bar{d}$ be the sample mean of the eight observed differences. Then

$$\bar{d} = \frac{\sum d}{8} = \frac{0.09}{8} = 0.01125$$

The sample variance of the differences is

$$s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{0.0034875}{7} = 0.0004982$$

The value of the test statistic is

$$t = \frac{\bar{d} - 0}{\sqrt{s_d^2 / n}} = \frac{0.01125}{\sqrt{0.0004982 / 8}} = 1.426$$

The number of degrees of freedom is $8 - 1 = 7$ and the critical value from the table is 2.365. Since $-2.365 < 1.426 < 2.365$ we do not reject $H_0$ at the 5% level and conclude that there is insufficient evidence to show that there is a difference between the two methods.