2 Tests for population means
2.1 Tests concerning a single mean
Introduction
In cases where tests involving measurements are performed, it is often possible to statistically hypothesize about the results. Suppose that the boiling point of a particular coolant used in car engines is claimed by a manufacturer to be $110^{\circ}$C. Further suppose that a series of accurate measurements made in a laboratory using 8 random samples of the coolant are recorded as:
$\qquad 110.2^{\circ},\quad 110.3^{\circ},\quad 110.1^{\circ},\quad 109.8^{\circ},\quad 109.9^{\circ},\quad 110.0^{\circ},\quad 110.4^{\circ},\quad 110.1^{\circ}$
The mean of these results is $110.1^{\circ}$C.
It is reasonable to ask whether, on the basis of the results obtained, we may claim that the boiling point of the coolant is greater than the assumed true boiling point of $110^{\circ}$C. We will return to this problem later in this Workbook after looking at some general results.
General results
In general terms, we need to make predictions, based on calculation, about the parameters of the population from which the random sample is drawn. As illustrated above we calculate the sample mean $\bar{x}$. The statistical tests used to answer the above question depend on whether the variance of the population is known or not.
Case (i) - Population variance known
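Firstly we form the null hypothesis that there is no difference between the true population mean $\mu$ and the theoretical value $\mu_{0}$. That is:
$\qquad H_{0}:\quad \mu =\mu_{0}$
It follows that, for a random sample of size $n$ drawn from a population with known variance $\sigma^{2}$, the test statistic
$\qquad Z=\frac{\bar{x}-\mu_{0}}{\sigma/\sqrt{n}}$
has a standard normal distribution when $H_{0}$ is true (exactly if the population is normal, approximately if $n$ is large).
We may now set up an alternative hypothesis which can take one of the three forms:
$\qquad H_{1}:\quad \mu \ne \mu_{0}\qquad H_{1}:\quad \mu >\mu_{0}\qquad H_{1}:\quad \mu <\mu_{0}$
At the 5% level of significance we reject $H_{0}$ if:
$\left|Z\right|>1.96$ | for a two-tailed test |
$Z>1.645$ | for a (right) one-tailed test |
$Z<-1.645$ | for a (left) one-tailed test |
In each case we reject $H_{0}$ in favour of the alternative hypothesis when $Z$ lies in the remote tail of the standard normal distribution.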
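The critical values quoted above can be recovered from the quantile function of the standard normal distribution. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and are not part of the Workbook.

```python
from statistics import NormalDist

alpha = 0.05               # 5% level of significance
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha/2 of the probability lies in each tail.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed test: all of alpha lies in a single tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

print(round(z_two_tailed, 3))  # 1.96
print(round(z_one_tailed, 3))  # 1.645
```

Changing `alpha` gives the corresponding critical values for other levels of significance.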
Example 1
Dishwasher powder is poured into the cartons in which it is sold by an automatic dispensing machine which is set to dispense 3 kg of powder into each carton. In order to check that the dispensing machine is working to an acceptable standard (i.e. does not need adjustment), a production engineer takes a random sample of 40 cartons and weighs them. It is found that the mean weight of the sample is 3.005 kg. It is known that the dispensing machine operates with a variance of $0.015^{2}\text{ kg}^{2}$ and that the manufacturer of the powder is willing to rely on a 5% level of significance. Does the sample provide the engineer with sufficient evidence that the true mean is not 3.00 kg and so the machine requires adjustment?
Solution
Given that the dispensing machine can over-fill or under-fill the containers, the null and alternative hypotheses are:
$\qquad H_{0}:\quad \mu =3\qquad H_{1}:\quad \mu \ne 3$
Since the sample size is large ($\ge 30$) and we can regard the population as infinite but with a known variance, we can calculate the relevant value of the test statistic $Z$ by using the formula:
$\qquad Z=\frac{\bar{x}-\mu_{0}}{\sigma/\sqrt{n}}$
Hence, in this case:
$\qquad Z=\frac{\bar{x}-\mu_{0}}{\sigma/\sqrt{n}}=\frac{3.005-3}{0.015/\sqrt{40}}=2.108$
and since we are performing a two-tailed test at the 5% level of significance and have found that $\left|Z\right|>1.96$ , that is, $Z$ is outside the range $\left[-1.96,1.96\right]$ , we must reject the null hypothesis and conclude that the machine is not operating acceptably and needs adjustment.
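The arithmetic of Example 1 can be checked numerically. The sketch below is only an illustration: the function name `z_statistic` is hypothetical, and the critical value is taken from the standard-library `statistics.NormalDist` rather than from printed tables.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu_0, sigma, n):
    """Return Z = (sample_mean - mu_0) / (sigma / sqrt(n))."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))


# Figures from Example 1: 40 cartons, mean 3.005 kg, sigma 0.015 kg.
z = z_statistic(sample_mean=3.005, mu_0=3.0, sigma=0.015, n=40)

# Two-tailed test at the 5% level: reject H0 when |Z| exceeds about 1.96.
critical = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > critical

print(round(z, 3), reject_h0)  # 2.108 True
```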
Case (ii) - Population variance unknown
We have exactly the same situation as that described in Case (i) but do not know the value of the population variance $\sigma^{2}$. Therefore we estimate it using the sample variance
$\qquad s^{2}=\frac{\sum \left(x-\bar{x}\right)^{2}}{n-1}$
and replace $Z$ by the test statistic
$\qquad T=\frac{\bar{x}-\mu_{0}}{s/\sqrt{n}}$
which, when the population is normal and $H_{0}$ is true, follows a $t$-distribution with $n-1$ degrees of freedom. Critical values are read from the table of the $t$-distribution.
Looking at the table of the $t$-distribution and comparing it with the values for a standard normal distribution we can see that, as the number of degrees of freedom becomes large, the $t$-distribution gets closer to the standard normal distribution so that, for large samples, it makes little difference which we use. It is also true that, under most circumstances, even if we do not know that the distribution from which data are drawn is normal, a $t$-test provides a good approximation when the sample size is reasonably large. In other circumstances, for example when normality cannot be assumed and the sample is small, we need to use other procedures, often non-parametric tests.
In summary we have the following.
Population | Variance | Sample size | Test |
Normal | Known | Small | Normal ( $Z$ ) |
Normal | Known | Large | Normal ( $Z$ ) |
Normal | Unknown | Small | $t$ |
Normal | Unknown | Large | $t$ but $Z$ approximates |
Not Normal | Either | Small | Non-parametric |
Not Normal | Known | Large | $Z$ approximates |
Not Normal | Unknown | Large | $Z$ and $t$ approximate |
Non-parametric testing is covered in HELM booklet 45.
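The summary table can also be read as a simple decision rule. The sketch below encodes the table row by row; the function name `choose_test` and its boolean arguments are hypothetical and serve only to illustrate the logic.

```python
def choose_test(population_normal, variance_known, large_sample):
    """Return the test suggested by the summary table.

    Each argument is a boolean describing the situation:
    population_normal -- the population can be assumed normal
    variance_known    -- the population variance is known
    large_sample      -- the sample size is large
    """
    if not large_sample:
        if not population_normal:
            return "Non-parametric"
        return "Z" if variance_known else "t"
    if population_normal:
        return "Z" if variance_known else "t but Z approximates"
    return "Z approximates" if variance_known else "Z and t approximate"


# Example 2 in this Section: variance unknown, sample of 33, normality not stated.
print(choose_test(population_normal=False, variance_known=False, large_sample=True))
```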
Example 2
The average useful life of a random sample of 33 similar calculator batteries made on a production line is found to be 99.5 hours of continuous use. The sample variance is $18.49\text{ hours}^{2}$. Test the null hypothesis that the population mean lifetime is 100 hours against the alternative that it is less. Use the 5% level of significance.
Solution
The null and alternative hypotheses are:
$\qquad H_{0}:\quad \mu =100\qquad H_{1}:\quad \mu <100$
Our test statistic is
$\qquad T=\frac{\bar{x}-\mu_{0}}{s/\sqrt{n}}$
In this case
$\qquad T=\frac{99.5-100.0}{\sqrt{18.49/33}}=-0.668$
and the number of degrees of freedom is $n-1=33-1=32$.
The table does not give values for 32 degrees of freedom but it does give values for 30 degrees of freedom and for 40, and the values for 32 must lie in between. The lower 5% points for 30 and 40 degrees of freedom are $-1.697$ and $-1.684$ respectively. Clearly our observed value of $-0.668$ is not significant and we do not have sufficient evidence to reject the null hypothesis that $\mu =100$.
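The calculation in Example 2 can be reproduced from the summary statistics alone. The sketch below is an illustration only: the function name `t_statistic` is hypothetical, and the two critical values are the tabulated lower 5% points quoted in the solution, not values computed by the code.

```python
from math import sqrt


def t_statistic(sample_mean, mu_0, sample_variance, n):
    """Return T = (sample_mean - mu_0) / sqrt(sample_variance / n)."""
    return (sample_mean - mu_0) / sqrt(sample_variance / n)


# Figures from Example 2: 33 batteries, mean 99.5 hours, variance 18.49 hours^2.
t = t_statistic(sample_mean=99.5, mu_0=100.0, sample_variance=18.49, n=33)

# Tabulated lower 5% points for 30 and 40 degrees of freedom (from the text).
t_crit_30, t_crit_40 = -1.697, -1.684

# The critical value for 32 degrees of freedom lies between these two, so an
# observed T above both of them cannot fall in the left-hand rejection region.
not_significant = t > max(t_crit_30, t_crit_40)

print(round(t, 3), not_significant)  # -0.668 True
```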
Task!
Solve the coolant boiling-point problem posed in the Introduction to subsection 2.1. Note that the sample is small and you will have to estimate the population variance from the sample variance. Use the tabulated values of the $t$-distribution given at the end of this Workbook in conjunction with the appropriate number of degrees of freedom.
The null and alternative hypotheses are:
$\qquad H_{0}:\quad \mu =110\qquad H_{1}:\quad \mu >110$
The value of the sample variance is given by the formula
$\qquad s^{2}=\frac{\sum \left(x-\bar{x}\right)^{2}}{n-1}=\frac{0.28}{7}=0.04$
The test statistic $t$ is given by
$\qquad t=\frac{\bar{x}-\mu_{0}}{s/\sqrt{n}}=\frac{110.1-110}{\sqrt{0.04}/\sqrt{8}}=\frac{0.1\times \sqrt{8}}{0.2}=1.414$
At the 5% level of significance and using $8-1=7$ degrees of freedom, the value of $t_{\alpha ,\nu}$ from tables is 1.895. Since $1.414<1.895$, we cannot reject the null hypothesis in favour of the alternative hypothesis. On the basis of the evidence available, we are not able to conclude that the boiling point of the coolant is greater than $110^{\circ}$C.
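The same result can be obtained directly from the eight raw measurements. The following sketch uses Python's standard-library `statistics` module, whose `variance` function uses the divisor $n-1$; the variable names are illustrative and the critical value 1.895 is the tabulated figure quoted above, not something the code derives.

```python
from math import sqrt
from statistics import mean, variance

# The eight boiling-point measurements (degrees C) from the Introduction.
boiling_points = [110.2, 110.3, 110.1, 109.8, 109.9, 110.0, 110.4, 110.1]
mu_0 = 110.0  # boiling point claimed by the manufacturer

n = len(boiling_points)
x_bar = mean(boiling_points)
s2 = variance(boiling_points)  # sample variance, divisor n - 1
t = (x_bar - mu_0) / sqrt(s2 / n)

t_critical = 1.895            # upper 5% point for 7 degrees of freedom (tables)
reject_h0 = t > t_critical    # right one-tailed test

print(round(x_bar, 1), round(s2, 2), round(t, 3), reject_h0)
# 110.1 0.04 1.414 False
```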
2.2 General comments about tests concerning a population mean
- The sample mean $\bar{x}$ is often used as a test statistic when testing a hypothesis concerning a population mean $\mu$.
- Even if the population distribution cannot be assumed to be normal, the distribution of sample means can often be assumed to be normal. This depends on the sample size.
- The tests described above sometimes require us to assume that the population variance is known. This is often unrealistic and we turn to the $t$ -test to deal with cases where the population standard deviation is unknown and must be estimated from the data available.
2.3 General comments on the $t$ -test
- The test only applies when the underlying distribution can be assumed to be normal.
- The test is used when the standard deviation of the parent population has to be estimated.
- As the sample size $n$ gets larger, the $t$-distribution approaches the standard normal distribution.
- The distribution depends on the number of degrees of freedom. For a single sample or equal paired samples (see below), the number of degrees of freedom is always one less than the sample size.
2.4 Tests concerning paired data
Sometimes experimental data may be directly compared using an appropriate test. The following Example looks at experimental data concerning the throttle reaction times of two turbochargers fitted to an internal combustion engine.
Example 3
In order to test the hypothesis that two standard turbochargers $A$ and $B$ have the same throttle reaction times, a random sample of 7 cars were fitted with the turbochargers and the throttle reaction times measured. The results were as follows:
Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Throttle reaction time for $A$: $R_{1}$ | 0.223 | 0.212 | 0.201 | 0.205 | 0.216 | 0.211 | 0.209 |
Throttle reaction time for $B$: $R_{2}$ | 0.208 | 0.207 | 0.203 | 0.204 | 0.205 | 0.202 | 0.206 |
$D=R_{1}-R_{2}$ | 0.015 | 0.005 | $-0.002$ | 0.001 | 0.011 | 0.009 | 0.003 |
Solution
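Let $D$ be the difference between the throttle reaction times of the two turbochargers. We assume that the distribution of $D$ is normal. Our null hypothesis is that $\mu_{D}$, the mean of the population of differences, is zero. We must decide between the two hypotheses
$\qquad H_{0}:\quad \mu_{D}=0\qquad H_{1}:\quad \mu_{D}\ne 0$
The alternative hypothesis here indicates that we perform a two-tailed test.
Let $\bar{d}$ be the sample mean of the seven observed differences. Then
$\qquad \bar{d}=\frac{\sum d}{n}=\frac{0.042}{7}=0.006$
The sample variance of the differences is
$\qquad s^{2}=\frac{\sum \left(d-\bar{d}\right)^{2}}{n-1}=\frac{0.000214}{6}=3.567\times 10^{-5}$
The value of the test statistic is
$\qquad t=\frac{\bar{d}-0}{s/\sqrt{n}}=\frac{0.006}{\sqrt{3.567\times 10^{-5}/7}}=2.658$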
The number of degrees of freedom is $7-1=6$ and the critical value from the table is 2.447. Since $2.658>2.447$ we reject ${H}_{0}$ at the 5% level and conclude that the evidence suggests that there is a difference in the throttle reaction times between the two turbochargers.
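The paired calculation in Example 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `paired_t` is hypothetical, only the standard-library `statistics` module is used, and the critical value 2.447 is the tabulated figure quoted above.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(first, second):
    """Return (mean difference, sample variance of differences, t statistic)
    for two equal-length lists of paired observations, testing mu_D = 0."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = mean(d)
    s2 = variance(d)  # divisor n - 1
    return d_bar, s2, d_bar / sqrt(s2 / n)


# Throttle reaction times for turbochargers A and B on the same seven cars.
r1 = [0.223, 0.212, 0.201, 0.205, 0.216, 0.211, 0.209]
r2 = [0.208, 0.207, 0.203, 0.204, 0.205, 0.202, 0.206]

d_bar, s2, t = paired_t(r1, r2)

t_critical = 2.447               # two-tailed 5% point, 6 degrees of freedom
reject_h0 = abs(t) > t_critical

print(round(d_bar, 3), round(t, 3), reject_h0)  # 0.006 2.658 True
```

Because the function works on the differences within each pair, it can be applied to any two equal-length lists of paired measurements, with the critical value taken from tables for $n-1$ degrees of freedom.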
Task!
Two different methods of analysis were used to determine the levels of impurity present in a particular aircraft quality aluminium alloy. Eight specimens were analysed using both methods. Does the available evidence suggest that both methods lead to the same results?
Alloy Specimen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Test 1 | 1.24 | 1.23 | 1.24 | 1.20 | 1.21 | 1.22 | 1.23 | 1.22 |
Test 2 | 1.23 | 1.20 | 1.20 | 1.21 | 1.20 | 1.20 | 1.21 | 1.25 |
$D=$ Test 1 $-$ Test 2 | 0.01 | 0.03 | 0.04 | $-0.01$ | 0.01 | 0.02 | 0.02 | $-0.03$ |
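Let $D$ be the difference between the two methods of analysis. We assume that the distribution of $D$ is normal. Our null hypothesis is that $\mu_{D}$, the mean of the population of differences, is zero. We must decide between the two hypotheses
$\qquad H_{0}:\quad \mu_{D}=0\qquad H_{1}:\quad \mu_{D}\ne 0$
The alternative hypothesis here indicates that we perform a two-tailed test.
Let $\bar{d}$ be the sample mean of the eight observed differences. Then
$\qquad \bar{d}=\frac{\sum d}{n}=\frac{0.09}{8}=0.01125$
The sample variance of the differences is
$\qquad s^{2}=\frac{\sum \left(d-\bar{d}\right)^{2}}{n-1}=\frac{0.0034875}{7}=4.982\times 10^{-4}$
The value of the test statistic is
$\qquad t=\frac{\bar{d}-0}{s/\sqrt{n}}=\frac{0.01125}{\sqrt{4.982\times 10^{-4}/8}}=1.426$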
The number of degrees of freedom is $8-1=7$ and the critical value from the table is 2.365. Since $-2.365<1.426<2.365$ we do not reject $H_{0}$ at the 5% level and conclude that there is insufficient evidence to show that there is a difference between the two methods.