Goodness-of-fit tests

The aim of a goodness-of-fit test is to determine the underlying nature of the probability distribution describing the population from which a random sample has been drawn. For example, we may wish to determine whether the population from which a sample has been drawn has a normal, binomial or Poisson distribution. While a variety of goodness-of-fit tests exist, the test described here depends on the $χ^{2}$ -distribution and is usually called the chi-squared test.

We assume that a random sample of size $n$ has been drawn from a population with an unknown probability distribution and that we wish to determine the nature of that distribution.

Firstly, if the data are continuous we organize the data into $k$ intervals (often equal but not necessarily so) in order that we can write down the observed frequency, say $O_{i}$ , of the $i$ th interval for $1 \leq i \leq k$ .
Secondly, we form a hypothesis about the nature of the unknown distribution. That is, we assume that it is normal, binomial, Poisson or some other appropriate probability distribution.
Thirdly, we calculate, on the basis of the hypothesis outlined above, the expected frequency, say $E_{i}$ , of the $i$ th interval for $1 \leq i \leq k$ . The values of $E_{i}$ are calculated using the formula
$E_{i} = n P_{i}$

where $P_{i}$ is the probability associated with the interval $i$ .
Fourthly, we calculate the goodness-of-fit statistic as defined in Key Point 1.

Key Point 1

The goodness-of-fit statistic is given by

$W = \sum_{i = 1}^{k} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}}$

It can be shown that, if the assumption made about the nature of the population (normal, binomial, Poisson etc.) is true then $W$ follows (approximately) a chi-squared distribution with $k - p - 1$ degrees of freedom. Note that $p$ represents the number of parameters needed to describe the probability distribution of the population which we have to estimate from the data. For example the normal distribution has two parameters $μ$ and $σ$ , the binomial distribution has two parameters $n$ and $p$ but we usually only need to estimate $p$ , while the Poisson distribution has one parameter, $μ$ .

Fifthly, we reject the hypothesis concerning the nature of the underlying probability distribution if the calculated value of $W$ exceeds the value of $χ_{α, k - p - 1}^{2}$ where $α$ is the area in the tail of the $χ^{2}$ -distribution, typically 5% or 1%.
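The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names are our own, the frequencies are made up purely to exercise the code, and the critical value is passed in by hand from a chi-squared table.

```python
def chi_squared_statistic(observed, expected):
    """Return W = sum((O_i - E_i)^2 / E_i) over the k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, p):
    """Return k - p - 1 for k classes and p parameters estimated from the data."""
    return k - p - 1


def reject(w, critical_value):
    """Reject the hypothesised distribution when W exceeds the tabulated value."""
    return w > critical_value


# Made-up frequencies, for illustration only: three classes, no estimated parameters.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
w = chi_squared_statistic(observed, expected)   # (4 + 4 + 0) / 20
df = degrees_of_freedom(len(observed), 0)
# 5.99 is the tabulated 5% critical value for 2 degrees of freedom.
print(w, df, reject(w, 5.99))
```

Keeping the critical value as an argument mirrors the use of printed tables in this Section; a statistics library could supply it instead.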

Notes

The larger the sample, the more reliable the result since the assertion that $W$ follows (approximately) a chi-squared distribution improves with increasing sample size.
The size of the expected frequencies should be monitored carefully. Different authors recommend minimum expected frequencies of 3, 4 or 5. Expected frequencies of at least 5 are reasonably safe, and 10 or more is certainly acceptable. A class whose expected frequency is too small is usually combined with a neighbouring class.
Some authors recommend that the $k$ intervals into which the data are organized are chosen so that the frequencies in each interval are roughly equal - remember that equal intervals are not necessary for the test to be performed.

We will now look at two examples of goodness-of-fit tests. The first uses a (discrete) Poisson distribution and the second uses a (continuous) normal distribution. Each worked Example is immediately followed by a Task for you to do.

Example 1

A manufacturer produces high-quality sheet aluminium for use in highly stressed aircraft wings. A random sample of 100 sheets is inspected and the number of faults per sheet recorded. The results are given in the table below.

Number of Faults per Sheet	Frequency of Occurrence
0	50
1	24
2	14
3	8
4	4

Suggest a possible probability distribution from which the sample may have been drawn and perform a chi-squared test to determine the validity of your suggestion.

Solution

The data are already given in 5 classes with observed frequencies as shown. We will assume that the underlying distribution is Poisson and calculate the expected frequencies accordingly using the Poisson formula $P (X = r) = \frac{e^{- μ} μ^{r}}{r!}$. To do so we need the value of the mean $μ$, which we estimate from the sample.

This is calculated as $μ = \frac{50 \times 0 + 24 \times 1 + 14 \times 2 + 8 \times 3 + 4 \times 4}{100} = 0.92$

Hence the Poisson probabilities and the corresponding expected frequencies are:

\begin{array}{ll}
p_{0} = P (X = 0) = \frac{e^{- μ} μ^{0}}{0!} = e^{- 0.92} = 0.399, & E_{0} = 39.9 \\
p_{1} = P (X = 1) = \frac{e^{- μ} μ^{1}}{1!} = e^{- 0.92} \times 0.92 = 0.367, & E_{1} = 36.7 \\
p_{2} = P (X = 2) = \frac{e^{- μ} μ^{2}}{2!} = \frac{e^{- 0.92} \times 0.92^{2}}{2} = 0.169, & E_{2} = 16.9 \\
p_{3} = P (X = 3) = \frac{e^{- μ} μ^{3}}{3!} = \frac{e^{- 0.92} \times 0.92^{3}}{6} = 0.052, & E_{3} = 5.2 \\
p_{4} = P (X \geq 4) = 1 - (0.399 + 0.367 + 0.169 + 0.052) = 0.013, & E_{4} = 1.3
\end{array}

Note that in calculating $p_{4}$ we have ensured that our probabilities sum to unity.

Since the last expected frequency is very small we will combine the last two classes and use 4 classes, so that $O_{3} = 8 + 4 = 12$ and $E_{3} = 5.2 + 1.3 = 6.5$ .

The test statistic is

$W = \sum_{i = 0}^{3} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} = \frac{{(50 - 39.9)}^{2}}{39.9} + \frac{{(24 - 36.7)}^{2}}{36.7} + \frac{{(14 - 16.9)}^{2}}{16.9} + \frac{{(12 - 6.5)}^{2}}{6.5} = 12.103$

and the number of degrees of freedom is $k - p - 1 = 4 - 1 - 1 = 2$ so that the critical value from Table 1 (at the end of the Workbook) is $χ_{0.05, 2}^{2} = 5.99$ . Clearly $12.103 > 5.99$ and we must reject the null hypothesis that the underlying distribution is Poisson.
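The arithmetic of Example 1 can be checked with a short script. The sketch below is one possible way to do it, not part of the original solution; the variable names are our own, and the probabilities are rounded to three decimal places only so that the figures match those printed above.

```python
import math


def poisson_pmf(r, mu):
    """P(X = r) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** r / math.factorial(r)


observed = [50, 24, 14, 8, 4]                      # sheets with 0, 1, 2, 3, 4 faults
n = sum(observed)
mu = sum(r * o for r, o in enumerate(observed)) / n   # sample mean, 0.92

# Round to 3 d.p. as in the worked solution; the last class is X >= 4,
# chosen so that the probabilities sum to one.
probs = [round(poisson_pmf(r, mu), 3) for r in range(4)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Combine the last two classes because E_4 is very small.
obs_c = observed[:3] + [observed[3] + observed[4]]
exp_c = expected[:3] + [expected[3] + expected[4]]

w = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 1 - 1                            # k - p - 1 with p = 1 (the mean)
print(f"W = {w:.3f}, degrees of freedom = {df}")
```

Carrying the probabilities at full precision instead of rounding changes $W$ slightly, which is worth remembering whenever a statistic lies close to the critical value.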

Task!

A manufacturer produces electronic components for use in computer controlled monitoring systems. A random sample of 100 components is inspected and the number of faults per component recorded. The results are given in the table below.

Number of Faults per Component	Frequency of Occurrence
0	45
1	35
2	16
3	4

Perform a chi-squared test to determine the validity of the assumption that the occurrence of faults in the components is Poisson.

The data are given in 4 classes with observed frequencies as shown. The expected frequencies using the Poisson formula with a mean

$μ = \frac{45 \times 0 + 35 \times 1 + 16 \times 2 + 4 \times 3}{100} = 0.79$

are

\begin{array}{ll}
p_{0} = P (X = 0) = \frac{e^{- μ} μ^{0}}{0!} = e^{- 0.79} = 0.454, & E_{0} = 45.4 \\
p_{1} = P (X = 1) = \frac{e^{- μ} μ^{1}}{1!} = e^{- 0.79} \times 0.79 = 0.359, & E_{1} = 35.9 \\
p_{2} = P (X = 2) = \frac{e^{- μ} μ^{2}}{2!} = \frac{e^{- 0.79} \times 0.79^{2}}{2} = 0.142, & E_{2} = 14.2 \\
p_{3} = P (X \geq 3) = 1 - (0.454 + 0.359 + 0.142) = 0.045, & E_{3} = 4.5
\end{array}

The last expected frequency is small but, since it is greater than 3, we will allow its use.

The test statistic is

$W = \sum_{i = 0}^{3} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} = \frac{{(45 - 45.4)}^{2}}{45.4} + \frac{{(35 - 35.9)}^{2}}{35.9} + \frac{{(16 - 14.2)}^{2}}{14.2} + \frac{{(4 - 4.5)}^{2}}{4.5} = 0.310$

and the number of degrees of freedom is $k - p - 1 = 4 - 1 - 1 = 2$ so that the critical value from tables is $χ_{0.05, 2}^{2} = 5.99$ . Clearly $0.310 < 5.99$ and we accept the null hypothesis that the underlying distribution is Poisson. Note that the decision to accept the value $E_{3} = 4.5$ is fairly marginal and that some personal judgement in such situations as to whether such values should be accepted or combined with another class is unavoidable.

Task!

Using the data of the previous Task but combining the expected frequencies of the last two classes, perform a chi-squared test to determine the validity of the assumption that the occurrence of faults in the components is Poisson.

The data are given in 4 classes with observed frequencies as shown. The expected frequencies using the Poisson formula with a mean

$μ = \frac{45 \times 0 + 35 \times 1 + 16 \times 2 + 4 \times 3}{100} = 0.79$

are

\begin{array}{ll}
p_{0} = P (X = 0) = \frac{e^{- μ} μ^{0}}{0!} = e^{- 0.79} = 0.454, & E_{0} = 45.4 \\
p_{1} = P (X = 1) = \frac{e^{- μ} μ^{1}}{1!} = e^{- 0.79} \times 0.79 = 0.359, & E_{1} = 35.9 \\
p_{2} = P (X = 2) = \frac{e^{- μ} μ^{2}}{2!} = \frac{e^{- 0.79} \times 0.79^{2}}{2} = 0.142, & E_{2} = 14.2 \\
p_{3} = P (X \geq 3) = 1 - (0.454 + 0.359 + 0.142) = 0.045, & E_{3} = 4.5
\end{array}

We will combine the last two classes and use 3 classes in total, with observed frequencies $O_{0} = 45, O_{1} = 35, O_{2} = 16 + 4 = 20$ and expected frequencies $E_{0} = 45.4, E_{1} = 35.9, E_{2} = 14.2 + 4.5 = 18.7$ .

The test statistic is

$W = \sum_{i = 0}^{2} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} = \frac{{(45 - 45.4)}^{2}}{45.4} + \frac{{(35 - 35.9)}^{2}}{35.9} + \frac{{(20 - 18.7)}^{2}}{18.7} = 0.116$

and, since there are now only 3 classes, the number of degrees of freedom is $k - p - 1 = 3 - 1 - 1 = 1$ so that the critical value from Table 1 (at the end of the Workbook) is $χ_{0.05, 1}^{2} = 3.84$ . Clearly $0.116 < 3.84$ and we accept the null hypothesis that the underlying distribution is Poisson. Note that the decision to combine the last two classes has not, in this case, affected the acceptance of the null hypothesis.
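Combining classes by hand is easy to get wrong, so a small helper can be useful. The sketch below is only one possible approach: the function name and the default threshold of 5 are our own assumptions (the Notes quote minimum values of 3, 4 or 5), and it only merges classes at the upper end of the table.

```python
def merge_small_tail(observed, expected, minimum=5.0):
    """Merge the last class into its neighbour while its expected frequency is below `minimum`.

    Returns new lists; the inputs are left unchanged.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


# Frequencies from the Task above (faults per component).
obs, exp = merge_small_tail([45, 35, 16, 4], [45.4, 35.9, 14.2, 4.5])
k = len(obs)                      # number of classes after merging
print(obs, [round(e, 1) for e in exp], k)
```

The value of $k$ used in $k - p - 1$ is the number of classes after merging, so the degrees of freedom should be recomputed from the merged table.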

Example 2

A quality control engineer is given the job of checking the voltage output characteristics of a circuit component in a CD player. After checking 100 randomly selected components and plotting a histogram of the results, the engineer concludes that the mean output of the 100 checked components is $\bar{x} = 6.12$ volts, that the standard deviation is $s = 0.1$ volts and that the voltage distribution is probably normal. Choose a suitable test to decide whether the assumption of normality is valid at the 5% level of significance.

Solution

The engineer decides to use a chi-squared test to test the assumption of normality and to follow the (common) practice of ensuring that the expected frequencies are equal. To do this, the data are put into eight classes of equal probability and the class boundaries are calculated as follows.

From the standard normal distribution the $Z$ values corresponding to class boundaries giving a probability of 0.125 (i.e. 1/8) may be read off from tables as $0, 0.32, 0.675, 1.15$ and $\infty$ for positive values and $0, - 0.32, - 0.675, - 1.15$ and $- \infty$ for negative values. Using

$Z = \frac{x - \bar{x}}{s} \to x = \bar{x} + Z . s$

the class boundaries are calculated to be: $6.005, 6.053, 6.088, 6.120, 6.152, 6.188, 6.235$ . This gives the eight classes, the observed frequencies found by the engineer (you are given this information here), and the expected frequencies as:

Classes	Observed Frequencies $O_{i}$	Expected Frequencies $E_{i}$
$x < 6.005$	8	12.5
$6.005 \leq x < 6.053$	11	12.5
$6.053 \leq x < 6.088$	16	12.5
$6.088 \leq x < 6.120$	19	12.5
$6.120 \leq x < 6.152$	18	12.5
$6.152 \leq x < 6.188$	13	12.5
$6.188 \leq x < 6.235$	9	12.5
$6.235 \leq x$	6	12.5

The hypotheses are: $H_{0}$ : distribution is normal, $H_{1} :$ distribution is not normal

The test statistic is

\begin{array}{rcl}
W & = & \sum_{i = 1}^{8} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} \\
& = & \frac{{(8 - 12.5)}^{2}}{12.5} + \frac{{(11 - 12.5)}^{2}}{12.5} + \frac{{(16 - 12.5)}^{2}}{12.5} + \frac{{(19 - 12.5)}^{2}}{12.5} + \frac{{(18 - 12.5)}^{2}}{12.5} \\
& & + \frac{{(13 - 12.5)}^{2}}{12.5} + \frac{{(9 - 12.5)}^{2}}{12.5} + \frac{{(6 - 12.5)}^{2}}{12.5} \\
& = & 1.62 + 0.18 + 0.98 + 3.38 + 2.42 + 0.02 + 0.98 + 3.38 = 12.96
\end{array}

and the number of degrees of freedom is $k - p - 1 = 8 - 2 - 1 = 5$ so that the critical value from Table 1 is $χ_{0.05, 5}^{2} = 11.07$ .

Since $11.07 < 12.96$ we have sufficient evidence to reject the null hypothesis and so the engineer should conclude that the distribution of voltages is not normal.
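The class boundaries and the test statistic of Example 2 can be reproduced with Python's standard library. This is a sketch under the assumption that `statistics.NormalDist` (Python 3.8 or later) is available; because it uses unrounded $Z$ values, the computed boundaries may differ from the tabulated ones in the third decimal place.

```python
from statistics import NormalDist

mean, s, k = 6.12, 0.1, 8

# Interior class boundaries: the 1/8, 2/8, ..., 7/8 quantiles of the fitted normal.
fitted = NormalDist(mu=mean, sigma=s)
boundaries = [fitted.inv_cdf(i / k) for i in range(1, k)]

observed = [8, 11, 16, 19, 18, 13, 9, 6]          # from the table above
expected = [sum(observed) / k] * k                # 12.5 in every class

w = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 2 - 1                                    # two estimated parameters

print([round(b, 3) for b in boundaries])
print(f"W = {w:.2f}, degrees of freedom = {df}")
```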

Task!

An electrical engineer working for a Health and Safety Executive measures the radiation emitted through the closed doors of 100 used microwave ovens. The measurements, in $\mathrm{mW \, cm^{- 2}}$ , are given in the table below.

0.19	0.16	0.14	0.20	0.17	0.21	0.18	0.22	0.26	0.23
0.13	0.17	0.16	0.21	0.18	0.22	0.20	0.23	0.16	0.26
0.19	0.16	0.14	0.20	0.18	0.21	0.19	0.22	0.27	0.24
0.12	0.17	0.15	0.20	0.18	0.22	0.19	0.23	0.29	0.25
0.06	0.16	0.14	0.20	0.17	0.21	0.18	0.22	0.26	0.23
0.13	0.17	0.16	0.20	0.18	0.22	0.19	0.23	0.30	0.25
0.19	0.17	0.14	0.20	0.18	0.21	0.19	0.22	0.27	0.24
0.11	0.17	0.15	0.20	0.18	0.21	0.19	0.23	0.27	0.24
0.13	0.17	0.16	0.21	0.18	0.22	0.19	0.23	0.33	0.25
0.13	0.17	0.16	0.21	0.18	0.22	0.19	0.23	0.36	0.26

The mean radiation of the checked ovens is $\bar{x} = 0.20 \ \mathrm{mW \, cm^{- 2}}$ , and the standard deviation is $s = 0.05 \ \mathrm{mW \, cm^{- 2}}$ . Verify that the table below giving the eight classes corresponding to the observed and expected frequencies shown is correct.

Classes	Observed Frequencies $O_{i}$	Expected Frequencies $E_{i}$
$x < 0.143$	11	12.5
$0.143 \leq x < 0.166$	10	12.5
$0.166 \leq x < 0.184$	19	12.5
$0.184 \leq x < 0.200$	10	12.5
$0.200 \leq x < 0.216$	16	12.5
$0.216 \leq x < 0.234$	17	12.5
$0.234 \leq x < 0.258$	6	12.5
$0.258 \leq x$	11	12.5

Use a chi-squared test to decide whether the radiation readings obtained from the ovens are normally distributed at the 5% level of significance.

Although the choice of class boundaries is arbitrary, for convenience we choose boundaries to make eight classes with equal probabilities of 0.125.

From the standard normal distribution the $Z$ values corresponding to class boundaries giving a probability of 0.125 may be read off from tables as $0, 0.32, 0.675, 1.15$ and $\infty$ for positive values and $0, - 0.32, - 0.675, - 1.15$ and $- \infty$ for negative values. Using

$Z = \frac{x - \bar{x}}{s} \to x = \bar{x} + Z . s$

the class boundaries are calculated to be:

$0.143, 0.166, 0.184, 0.200, 0.216, 0.234, 0.258$

This gives the eight classes, the observed frequencies found by the engineer and the expected frequencies as given in the table above.

The hypotheses are: $H_{0}$ : distribution is normal, $H_{1} :$ distribution is not normal.

The test statistic is

\begin{array}{rcl}
W & = & \sum_{i = 1}^{8} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} \\
& = & \frac{{(11 - 12.5)}^{2}}{12.5} + \frac{{(10 - 12.5)}^{2}}{12.5} + \frac{{(19 - 12.5)}^{2}}{12.5} + \frac{{(10 - 12.5)}^{2}}{12.5} + \frac{{(16 - 12.5)}^{2}}{12.5} \\
& & + \frac{{(17 - 12.5)}^{2}}{12.5} + \frac{{(6 - 12.5)}^{2}}{12.5} + \frac{{(11 - 12.5)}^{2}}{12.5} \\
& = & 0.18 + 0.5 + 3.38 + 0.5 + 0.98 + 1.62 + 3.38 + 0.18 = 10.72
\end{array}

and the number of degrees of freedom is $k - p - 1 = 8 - 2 - 1 = 5$ so that the critical value from Table 1 is $χ_{0.05, 5}^{2} = 11.07$ .

Since $10.72 < 11.07$ we do not have sufficient evidence to reject the null hypothesis and so the engineer may continue to treat the distribution of microwave radiation readings taken from the ovens as normal.
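Verifying the observed frequencies by eye is tedious for 100 readings, so the counting step can be scripted. The sketch below is one way to do it, using the tabulated class boundaries and the standard-library `bisect` module; the variable names are our own.

```python
from bisect import bisect_right

# The 100 readings from the table in the Task, row by row.
readings = [
    0.19, 0.16, 0.14, 0.20, 0.17, 0.21, 0.18, 0.22, 0.26, 0.23,
    0.13, 0.17, 0.16, 0.21, 0.18, 0.22, 0.20, 0.23, 0.16, 0.26,
    0.19, 0.16, 0.14, 0.20, 0.18, 0.21, 0.19, 0.22, 0.27, 0.24,
    0.12, 0.17, 0.15, 0.20, 0.18, 0.22, 0.19, 0.23, 0.29, 0.25,
    0.06, 0.16, 0.14, 0.20, 0.17, 0.21, 0.18, 0.22, 0.26, 0.23,
    0.13, 0.17, 0.16, 0.20, 0.18, 0.22, 0.19, 0.23, 0.30, 0.25,
    0.19, 0.17, 0.14, 0.20, 0.18, 0.21, 0.19, 0.22, 0.27, 0.24,
    0.11, 0.17, 0.15, 0.20, 0.18, 0.21, 0.19, 0.23, 0.27, 0.24,
    0.13, 0.17, 0.16, 0.21, 0.18, 0.22, 0.19, 0.23, 0.33, 0.25,
    0.13, 0.17, 0.16, 0.21, 0.18, 0.22, 0.19, 0.23, 0.36, 0.26,
]

# Tabulated interior class boundaries; class i holds boundaries[i-1] <= x < boundaries[i].
boundaries = [0.143, 0.166, 0.184, 0.200, 0.216, 0.234, 0.258]

observed = [0] * (len(boundaries) + 1)
for x in readings:
    observed[bisect_right(boundaries, x)] += 1

expected = [len(readings) / len(observed)] * len(observed)   # 12.5 per class
w = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(observed, f"W = {w:.2f}")
```

`bisect_right` places a reading that equals a boundary in the class above it, which matches the $\leq x <$ convention used in the table.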

Exercises

A factory produces portable CD players. Every week a sample of five players is selected and subjected to 100 hours of continuous use. At the end of this time the players are tested and the number not reaching a specified standard is recorded. The numbers recorded in 100 consecutive weeks are given below. Test the hypothesis that the data come from a binomial distribution. Use the 5% level of significance.

Number failing standard	0	1	2	3	4	5
Number of weeks	34	24	19	14	9	0

A highway engineer records the numbers of vehicles passing a point in a road in 120 consecutive one-minute intervals, as follows. Test the hypothesis that the data come from a Poisson distribution. Use the 5% level of significance.

Number of vehicles	0	1	2	3	4	5	6	7	8	9	10	11
Number of intervals	0	5	10	20	30	20	15	7	6	4	2	1

In a test of a device to generate electricity from wave power at sea, 60 observations are made of the root mean square bending moment $Y$ of a component (in newton metres). The data are summarised as follows. The sample mean is 5.08 and the sample variance is 3.29. Test the hypothesis that $Y$ has a normal distribution. Use the 5% level of significance.

Class	Frequency	Class	Frequency
$Y \leq 2$	1	$6 < Y \leq 7$	5
$2 < Y \leq 3$	4	$7 < Y \leq 8$	4
$3 < Y \leq 4$	12	$8 < Y \leq 9$	2
$4 < Y \leq 5$	18	$9 < Y \leq 10$	2
$5 < Y \leq 6$	11	$10 < Y$	1

Eighty aircraft components are tested until they fail. The failure times $T$ in hours are summarised as follows. The sample mean is 6434. Test the hypothesis that the distribution of $T$ is exponential. Use the 5% level of significance.

Class	Frequency	Class	Frequency
$0 < T \leq 2000$	11	$10000 < T \leq 12000$	3
$2000 < T \leq 4000$	21	$12000 < T \leq 14000$	5
$4000 < T \leq 6000$	19	$14000 < T \leq 16000$	1
$6000 < T \leq 8000$	9	$16000 < T \leq 18000$	3
$8000 < T \leq 10000$	4	$18000 < T$	4

Answers

Total number of failures: $0 \times 34 + 1 \times 24 + \dots + 4 \times 9 = 140$.

Mean number of failures per week: $140 / 100 = 1.4$.

Estimate of $p$: $1.4 / 5 = 0.28$.

Use the binomial $(5, 0.28)$ distribution:

$P (X = j) = \binom{5}{j} \, 0.28^{j} \, 0.72^{5 - j}$

No. failing	Probability	Expected frequency	Observed frequency
0	0.1935	19.35	34
1	0.3762	37.62	24
2	0.2926	29.26	19
3	0.1138	11.38	14
4	0.0221	2.21	9
5	0.0017	0.17	0

Some expected frequencies are too small so we combine neighbouring classes.

No. failing	Probability	Expected frequency	Observed frequency
0	0.1935	19.35	34
1	0.3762	37.62	24
2	0.2926	29.26	19
3,4,5	0.1376	13.76	23

Test statistic:

$W = \frac{{(34 - 19.35)}^{2}}{19.35} + \dots + \frac{{(23 - 13.76)}^{2}}{13.76} = 25.825 .$

Degrees of freedom: $4 - 1 - 1 = 2$ (4 classes, 1 estimated parameter).

Critical value: $χ_{2}^{2} (5 \%) = 5.991$.

The test statistic is significant at the 5% level. We reject the null hypothesis. We conclude that the data do not come from a binomial distribution. There seems to be an excess of large and small counts.
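The binomial calculation above can be checked as follows. This is a sketch with our own variable names, using `math.comb` from the standard library (Python 3.8 or later). It works with unrounded probabilities, so the statistic may differ from the printed value in the second or third decimal place.

```python
from math import comb

weeks = [34, 24, 19, 14, 9, 0]        # weeks with 0, 1, ..., 5 players failing
n_trials = 5                          # binomial index used in the answer above
n = sum(weeks)

# Estimate p from the mean number of failures per week.
p_hat = sum(j * f for j, f in enumerate(weeks)) / (n * n_trials)

probs = [comb(n_trials, j) * p_hat ** j * (1 - p_hat) ** (n_trials - j)
         for j in range(n_trials + 1)]
expected = [n * p for p in probs]

# Combine the classes 3, 4 and 5, whose expected frequencies are small.
obs_c = weeks[:3] + [sum(weeks[3:])]
exp_c = expected[:3] + [sum(expected[3:])]

w = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 1 - 1               # one estimated parameter
print(f"p = {p_hat:.2f}, W = {w:.2f}, degrees of freedom = {df}")
```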

Total number of vehicles: $0 \times 0 + 1 \times 5 + 2 \times 10 + \dots + 11 \times 1 = 559$.

Mean number of vehicles per minute: $559 / 120 = 4.658$.

Use the Poisson $(4.658)$ distribution:

$P (X = j) = \frac{e^{- 4.658} \, 4.658^{j}}{j!}$

No. vehicles	Probability	Expected frequency	Observed frequency
0	0.00949	1.14	0
1	0.04418	5.30	5
2	0.10290	12.35	10
3	0.15977	19.17	20
4	0.18606	22.33	30
5	0.17333	20.80	20
6	0.13456	16.15	15
7	0.08954	10.74	7
8	0.05214	6.26	6
9	0.02698	3.24	4
10	0.01257	1.51	2
$\geq 11$	0.00848	1.02	1

Some expected frequencies are too small so we combine neighbouring classes.

No. vehicles	Probability	Expected frequency	Observed frequency
0,1	0.05367	6.44	5
2	0.10290	12.35	10
3	0.15977	19.17	20
4	0.18606	22.33	30
5	0.17333	20.80	20
6	0.13456	16.15	15
7	0.08954	10.74	7
8	0.05214	6.26	6
$\geq 9$	0.04803	5.76	7

Test statistic:

$W = \frac{{(5 - 6.44)}^{2}}{6.44} + \dots + \frac{{(7 - 5.76)}^{2}}{5.76} = 5.132 .$

Degrees of freedom: $9 - 1 - 1 = 7$ (9 classes, 1 estimated parameter).

Critical value: $χ_{7}^{2} (5 \%) = 14.07$.

The test statistic is not significant at the 5% level. We do not reject the null hypothesis. There is insufficient evidence to conclude that the data do not come from a Poisson distribution.

Using a $N (5.08, 3.29)$ distribution we can calculate the probabilities for the various class intervals. For example,

\begin{array}{rcl}
P (5 < Y \leq 6) & = & Φ \left( \frac{6 - 5.08}{\sqrt{3.29}} \right) - Φ \left( \frac{5 - 5.08}{\sqrt{3.29}} \right) \\
& = & Φ (0.5072) - Φ (- 0.0441) \\
& = & Φ (0.5072) + Φ (0.0441) - 1 \\
& = & 0.694 + 0.518 - 1 = 0.694 - 0.482 = 0.212 .
\end{array}

Bending moment $Y$	Probability	Expected frequency	Observed frequency
$Y \leq 2$	0.045	2.70	1
$2 < Y \leq 3$	0.081	4.86	4
$3 < Y \leq 4$	0.150	9.00	12
$4 < Y \leq 5$	0.206	12.36	18
$5 < Y \leq 6$	0.212	12.72	11
$6 < Y \leq 7$	0.161	9.66	5
$7 < Y \leq 8$	0.091	5.46	4
$8 < Y \leq 9$	0.039	2.34	2
$9 < Y \leq 10$	0.012	0.72	2
$10 < Y$	0.003	0.18	1

Some expected frequencies are too small so we combine neighbouring classes.

Bending moment $Y$	Probability	Expected frequency	Observed frequency
$Y \leq 3$	0.126	7.56	5
$3 < Y \leq 4$	0.150	9.00	12
$4 < Y \leq 5$	0.206	12.36	18
$5 < Y \leq 6$	0.212	12.72	11
$6 < Y \leq 7$	0.161	9.66	5
$7 < Y \leq 8$	0.091	5.46	4
$8 < Y$	0.054	3.24	5

Test statistic:

$W = \frac{{(5 - 7.56)}^{2}}{7.56} + \dots + \frac{{(5 - 3.24)}^{2}}{3.24} = 8.267 .$

Degrees of freedom: $7 - 2 - 1 = 4$ (7 classes, 2 estimated parameters).

Critical value: $χ_{4}^{2} (5 \%) = 9.488$.

The test statistic is not significant at the 5% level. We do not reject the null hypothesis. There is insufficient evidence to conclude that the data do not come from a normal distribution.
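The class probabilities for the fitted normal distribution can be obtained from its cumulative distribution function rather than from tables. The sketch below assumes `statistics.NormalDist` from the standard library and uses our own variable names. It keeps full precision throughout, so the statistic differs slightly from the value obtained with three-decimal probabilities.

```python
from math import sqrt
from statistics import NormalDist

n = 60
fitted = NormalDist(mu=5.08, sigma=sqrt(3.29))    # sample mean and variance

# Upper edges of the first nine classes; the tenth class is Y > 10.
upper_edges = [2, 3, 4, 5, 6, 7, 8, 9, 10]
cdf = [0.0] + [fitted.cdf(e) for e in upper_edges] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]
observed = [1, 4, 12, 18, 11, 5, 4, 2, 2, 1]

# Combine the first two classes and the last three, as in the answer above.
obs_c = [sum(observed[:2])] + observed[2:7] + [sum(observed[7:])]
exp_c = [sum(expected[:2])] + expected[2:7] + [sum(expected[7:])]

w = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 2 - 1                           # two estimated parameters
print(f"P(5 < Y <= 6) = {probs[4]:.3f}, W = {w:.2f}, degrees of freedom = {df}")
```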

The sample mean is 6434. We estimate $λ$ using $1 / 6434 = 1.554 \times 10^{- 4}$. We use an exponential $(1.554 \times 10^{- 4})$ distribution. For example,

\begin{array}{rcl}
P (2000 < T \leq 4000) & = & \{ 1 - \exp (- 4000 \times 1.554 \times 10^{- 4}) \} - \{ 1 - \exp (- 2000 \times 1.554 \times 10^{- 4}) \} \\
& = & \exp (- 2000 / 6434) - \exp (- 4000 / 6434) \\
& = & \exp (- 0.3108) - \exp (- 0.6217) \\
& = & 0.733 - 0.537 = 0.196
\end{array}

Failure time $T$	Probability	Expected frequency	Observed frequency
$0 < T \leq 2000$	0.267	21.36	11
$2000 < T \leq 4000$	0.196	15.68	21
$4000 < T \leq 6000$	0.143	11.44	19
$6000 < T \leq 8000$	0.106	8.48	9
$8000 < T \leq 10000$	0.077	6.16	4
$10000 < T \leq 12000$	0.056	4.48	3
$12000 < T \leq 14000$	0.041	3.28	5
$14000 < T \leq 16000$	0.031	2.48	1
$16000 < T \leq 18000$	0.022	1.76	3
$18000 < T$	0.061	4.88	4

Some expected frequencies are too small so we combine neighbouring classes.

Failure time $T$	Probability	Expected frequency	Observed frequency
$0 < T \leq 2000$	0.267	21.36	11
$2000 < T \leq 4000$	0.196	15.68	21
$4000 < T \leq 6000$	0.143	11.44	19
$6000 < T \leq 8000$	0.106	8.48	9
$8000 < T \leq 10000$	0.077	6.16	4
$10000 < T \leq 12000$	0.056	4.48	3
$12000 < T \leq 14000$	0.041	3.28	5
$14000 < T \leq 18000$	0.053	4.24	4
$18000 < T$	0.061	4.88	4

Test statistic:

$W = \frac{{(11 - 21.36)}^{2}}{21.36} + \dots + \frac{{(4 - 4.88)}^{2}}{4.88} = 14.18 .$

Degrees of freedom: $9 - 1 - 1 = 7$ (9 classes, 1 estimated parameter).

Critical value: $χ_{7}^{2} (5 \%) = 14.07$.

The test statistic is significant at the 5% level. We reject the null hypothesis. We conclude that the data do not come from an exponential distribution. The observed frequency in the first class seems to be too small.
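The exponential class probabilities follow directly from the distribution function $1 - e^{- λ t}$, so they are simple to script. The sketch below uses our own variable names and keeps full precision. Because the statistic in this exercise lies very close to the critical value, small rounding differences in the probabilities matter more here than in the earlier examples.

```python
from math import exp

n = 80
mean_life = 6434.0                    # sample mean failure time, hours
rate = 1.0 / mean_life                # estimate of lambda


def cdf(t):
    """P(T <= t) for the fitted exponential distribution."""
    return 1.0 - exp(-rate * t)


edges = list(range(0, 18001, 2000))   # 0, 2000, ..., 18000
probs = [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]
probs.append(1.0 - cdf(edges[-1]))    # final class, T > 18000
expected = [n * p for p in probs]
observed = [11, 21, 19, 9, 4, 3, 5, 1, 3, 4]

# Combine the classes 14000-16000 and 16000-18000, as in the answer above.
obs_c = observed[:7] + [observed[7] + observed[8]] + observed[9:]
exp_c = expected[:7] + [expected[7] + expected[8]] + expected[9:]

w = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 1 - 1               # one estimated parameter
print(f"P(2000 < T <= 4000) = {probs[1]:.3f}, W = {w:.2f}, degrees of freedom = {df}")
```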
