The Wilcoxon signed-rank test

4 The Wilcoxon signed-rank test

As you will now appreciate, the sign test uses only the signs of the differences between the observed data and the median $θ$ (or, for a paired sample, the signs of the differences between the paired observations). In either case, no account is taken of the size of the differences. The statistician Frank Wilcoxon developed a procedure which takes into account both the sign and the magnitude of the differences. The resulting test is now widely known as the Wilcoxon signed-rank test. Note that the test applies to symmetric continuous distributions, and it is important that you justify this assumption before applying the procedure to a set of data. Under this condition the mean and the median of a distribution are equal, and we can use this fact to test the null hypothesis

$H_{0} : μ = μ_{0}$

against the alternatives

$\begin{matrix} H_{1} : μ \neq μ_{0} \\ H_{1} : μ > μ_{0} \\ H_{1} : μ < μ_{0} \end{matrix}$

While the theory underpinning this test is complex and is not considered here, the actual test procedure is straightforward and involves the use of special tables. A copy of the Wilcoxon signed-rank test table is given at the end of this Workbook (Table 1). The test procedure is as follows.

On the assumption that $x_{1}, x_{2}, x_{3}, \dots, x_{n}$ is a random sample taken from a continuous symmetric distribution with mean and median $μ = θ$, we test the null hypothesis $H_{0} : μ = μ_{0}$ against one of the alternatives given above.
Calculate the differences $x_{i} - μ_{0}, i = 1, \dots, n$.
Rank the absolute differences $|x_{i} - μ_{0}|, i = 1, \dots, n$ in ascending order.
Label each rank with the sign of its corresponding difference.
Sum the ranks corresponding to positive differences to obtain the value $S_{P}$.
Sum the ranks corresponding to negative differences to obtain the value $S_{N}$.
Let $S = \min (S_{P}, S_{N})$.
Use Table 1 at the end of this Workbook to reject (if appropriate) the null hypothesis as follows:


Case 1	$\begin{matrix} H_{0} : & μ = μ_{0} \\ H_{1} : & μ \neq μ_{0} \end{matrix}$	Reject $H_{0}$ if $S \leq$ tabulated value


Case 2	$\begin{matrix} H_{0} : & μ = μ_{0} \\ H_{1} : & μ > μ_{0} \end{matrix}$	Reject $H_{0}$ if $S_{N} \leq$ tabulated value


Case 3	$\begin{matrix} H_{0} : & μ = μ_{0} \\ H_{1} : & μ < μ_{0} \end{matrix}$	Reject $H_{0}$ if $S_{P} \leq$ tabulated value
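The procedure above can be sketched in Python. This is an illustration, not a replacement for Table 1: the function names `signed_rank_sums` and `wilcoxon_decision` are hypothetical, the critical value must still be read from Table 1 and passed in, and two conventions not stated in the steps above are assumed. Differences are rounded (to 6 decimal places by default) so that floating-point error cannot split genuine ties, and zero differences are dropped before ranking (Table 1 would then be entered with the reduced sample size).

```python
def signed_rank_sums(data, mu0, decimals=6):
    """Return (S_P, S_N) for testing H0: mu = mu0.

    Form x_i - mu0, rank |x_i - mu0| in ascending order (tied values get the
    average of the ranks they occupy), then add up the ranks of the positive
    and of the negative differences separately.
    Assumptions of this sketch: differences are rounded to `decimals` places
    so floating-point error cannot split ties; zero differences are dropped.
    """
    diffs = [round(x - mu0, decimals) for x in data]
    diffs = [d for d in diffs if d != 0]
    ordered = sorted(abs(d) for d in diffs)
    avg_rank = {}
    for v in set(ordered):
        positions = [i + 1 for i, w in enumerate(ordered) if w == v]  # 1-based ranks
        avg_rank[v] = sum(positions) / len(positions)
    s_p = sum(avg_rank[abs(d)] for d in diffs if d > 0)
    s_n = sum(avg_rank[abs(d)] for d in diffs if d < 0)
    return s_p, s_n


def wilcoxon_decision(s_p, s_n, critical, alternative="two-sided"):
    """Return True if H0 is rejected, following Cases 1-3.

    `critical` is the tabulated value read from Table 1 for the sample size,
    significance level and number of tails in use.
    """
    if alternative == "two-sided":   # Case 1, H1: mu != mu0, uses S = min(S_P, S_N)
        return min(s_p, s_n) <= critical
    if alternative == "greater":     # Case 2, H1: mu > mu0
        return s_n <= critical
    if alternative == "less":        # Case 3, H1: mu < mu0
        return s_p <= critical
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
```

For instance, `signed_rank_sums([38, 14, 16, 54, 36, -19, -24, 1, -18, 5, -14, -28], 0)` returns `(44.5, 33.5)`, the rank sums quoted in the answer to the weigh-bridge exercise.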

4.1 Note

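It is possible that the calculation will produce equal absolute differences, which must then share a rank. Ties in ranking are dealt with in the usual way: each tied value receives the mean of the ranks it would otherwise occupy. The short example below reminds you how to deal with equal ranking.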

Data	Incorrect ranks	Correct ranks
3.1	1	1
4.2	2	2.5
4.2	3	2.5
5.7	4	4.5
5.7	5	4.5
7	6	6
8.1	7	7
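The averaging rule for ties can be expressed as a short Python sketch. `average_ranks` is a hypothetical helper name; applied to the data above, it reproduces the "Correct ranks" column.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values the mean of
    the positions they occupy (positions are counted from 1)."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        positions = [i + 1 for i, w in enumerate(ordered) if w == v]
        ranks.append(sum(positions) / len(positions))
    return ranks


print(average_ranks([3.1, 4.2, 4.2, 5.7, 5.7, 7, 8.1]))
# [1.0, 2.5, 2.5, 4.5, 4.5, 6.0, 7.0]
```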

To illustrate the application of the Wilcoxon signed-rank test, we will revisit one of the examples used previously when considering the sign test. The example is repeated here for convenience.

Example 5

The compressive strength of insulating blocks used in the construction of new houses is tested by a civil engineer. The engineer needs to be certain at the 5% level of significance that the median compressive strength is at least 1000 psi. Twenty randomly selected blocks give the following results:

Observation	Compressive Strength
1	1128.7
2	679.1
3	1317.2
4	1001.3
5	1107.6
6	718.4
7	787.4
8	1562.3
9	1356.9
10	1153.2
11	1167.1
12	1387.5
13	679.9
14	1323.2
15	788.4
16	1153.6
17	1423.3
18	1122.6
19	1644.3
20	737.4

Use the Wilcoxon signed-rank test to decide (at the 5% level of significance) whether the hypothesis that the median compressive strength of the insulating blocks is at least 1000 psi is acceptable.

Solution

Assume that the data are taken from a symmetric continuous distribution, so the mean and median are identical. The hypotheses may be stated as:

$\begin{matrix} H_{0} : μ = 1000 \\ H_{1} : μ > 1000 \end{matrix}$

The differences are:

Observation	Compressive Strength (psi)	$x_{i} - 1000$	$|x_{i} - 1000|$	Sorted $|x_{i} - 1000|$	Signed Rank
1	1128.7	128.7	128.7	1.3	+1
2	679.1	$-320.9$	320.9	107.6	+2
3	1317.2	317.2	317.2	122.6	+3
4	1001.3	1.3	1.3	128.7	+4
5	1107.6	107.6	107.6	153.2	+5
6	718.4	$-281.6$	281.6	153.6	+6
7	787.4	$-212.6$	212.6	167.1	+7
8	1562.3	562.3	562.3	211.6	$-8$
9	1356.9	356.9	356.9	212.6	$-9$
10	1153.2	153.2	153.2	262.6	$-10$
11	1167.1	167.1	167.1	281.6	$-11$
12	1387.5	387.5	387.5	317.2	+12
13	679.9	$-320.1$	320.1	320.1	$-13$
14	1323.2	323.2	323.2	320.9	$-14$
15	788.4	$-211.6$	211.6	323.2	+15
16	1153.6	153.6	153.6	356.9	+16
17	1423.3	423.3	423.3	387.5	+17
18	1122.6	122.6	122.6	423.3	+18
19	1644.3	644.3	644.3	562.3	+19
20	737.4	$-262.6$	262.6	644.3	+20

We now calculate the sum $S_{N}$ in order to decide whether to reject the null hypothesis. Note that the form of the alternative hypothesis ($H_{1} : μ > 1000$, Case 2) dictates that we only need to calculate $S_{N}$:

$S_{N} = |- 8 - 9 - 10 - 11 - 13 - 14| = 65$

From Table 1, the critical value at the 5% level of significance for a one-tailed test performed with a sample of 20 values is 60. Since $S_{N} = 65$ is greater than 60, we cannot reject the null hypothesis: on the basis of the available evidence, the median compressive strength of the insulating blocks is not significantly greater than 1000 psi.
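The arithmetic of Example 5 can be checked with a short Python sketch. `signed_rank_sums` is a hypothetical helper name, the differences are rounded to one decimal place (the precision of the data) as an assumption to avoid floating-point noise, and the critical value 60 is the Table 1 value quoted above rather than something the code computes.

```python
def signed_rank_sums(diffs):
    """Return (S_P, S_N): tied |d| share the average rank; zero d are dropped
    (an assumed convention; Example 5 has no zero differences)."""
    d = [x for x in diffs if x != 0]
    ordered = sorted(abs(x) for x in d)
    avg = {v: sum(i + 1 for i, w in enumerate(ordered) if w == v) / ordered.count(v)
           for v in set(ordered)}
    s_p = sum(avg[abs(x)] for x in d if x > 0)
    s_n = sum(avg[abs(x)] for x in d if x < 0)
    return s_p, s_n


# Compressive strengths (psi) of the twenty blocks in Example 5
strengths = [1128.7, 679.1, 1317.2, 1001.3, 1107.6, 718.4, 787.4, 1562.3,
             1356.9, 1153.2, 1167.1, 1387.5, 679.9, 1323.2, 788.4, 1153.6,
             1423.3, 1122.6, 1644.3, 737.4]
mu0 = 1000
critical = 60  # Table 1: one-tailed test, 5% level, n = 20 (value quoted in the text)

diffs = [round(x - mu0, 1) for x in strengths]
s_p, s_n = signed_rank_sums(diffs)
print(s_p, s_n)                          # 145.0 65.0
print("reject H0:", s_n <= critical)     # reject H0: False  (Case 2 uses S_N)
```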

Now do the following Tasks.

Again, you have seen this problem previously (Task on page 7). This time you are required to use the Wilcoxon signed-rank test to decide whether to reject the null hypothesis.

Task!

A certain type of solid rocket fuel is manufactured by bonding an igniter with a propellant. In order that the fuel burns smoothly and neither suffers “flame-out” nor becomes unstable, it is essential that the material bonding the two components of the fuel has a shear strength of 2000 psi. The results arising from tests performed on 10 randomly selected samples of fuel are as follows.

Observation	Shear Strength	Observation	Shear Strength
1	2128.7	6	1718.4
2	1679.1	7	1787.4
3	2317.2	8	2562.3
4	2001.3	9	2356.9
5	2107.6	10	2153.2

Using the Wilcoxon signed-rank test and the 5% level of significance, test the null hypothesis that the median shear strength is 2000 psi.

Answer

Assume that the data are taken from a symmetric continuous distribution. The hypotheses are

$\begin{matrix} H_{0} : μ = 2000 \\ H_{1} : μ \neq 2000 \end{matrix}$

The Wilcoxon calculations are as shown below. We perform a two-tailed test.

Shear Strength	$x_{i} - 2000$	Sorted $|x_{i} - 2000|$	Signed Rank
$2128.7$	$128.7$	$1.3$	$+ 1$
$1679.1$	$- 320.9$	$107.6$	$+ 2$
$2317.2$	$317.2$	$128.7$	$+ 3$
$2001.3$	$1.3$	$153.2$	$+ 4$
$2107.6$	$107.6$	$212.6$	$- 5$
$1718.4$	$- 281.6$	$281.6$	$- 6$
$1787.4$	$- 212.6$	$317.2$	$+ 7$
$2562.3$	$562.3$	$320.9$	$- 8$
$2356.9$	$356.9$	$356.9$	$+ 9$
$2153.2$	$153.2$	$562.3$	$+ 10$

We now calculate the sums $S_{N}, S_{P}$ and $S$ in order to decide whether to reject the null hypothesis.

$\begin{array}{rcl} S_{N} & = & |- 5 - 6 - 8| = 19 \\ S_{P} & = & 1 + 2 + 3 + 4 + 7 + 9 + 10 = 36 \\ S & = & \min (S_{P}, S_{N}) = \min (36, 19) = 19 \end{array}$

From Table 1, the critical value at the 5% level of significance for a two-tailed test performed with a sample of 10 values is 8. Since $S = 19$ is greater than 8, we cannot reject the null hypothesis: on the basis of the available evidence, the median shear strength of the bonding material is not significantly different from 2000 psi.
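A short Python sketch reproduces the rank sums for the shear-strength data with $μ_{0} = 2000$ and applies the two-tailed rule of Case 1. `signed_rank_sums` is a hypothetical helper name, the differences are rounded to one decimal place as an assumption, and the critical value 8 is the Table 1 value quoted above.

```python
def signed_rank_sums(diffs):
    """Return (S_P, S_N): tied |d| share the average rank; zero d are dropped
    (an assumed convention; these data have no zero differences)."""
    d = [x for x in diffs if x != 0]
    ordered = sorted(abs(x) for x in d)
    avg = {v: sum(i + 1 for i, w in enumerate(ordered) if w == v) / ordered.count(v)
           for v in set(ordered)}
    s_p = sum(avg[abs(x)] for x in d if x > 0)
    s_n = sum(avg[abs(x)] for x in d if x < 0)
    return s_p, s_n


# Shear strengths (psi) of the ten fuel samples
shear = [2128.7, 1679.1, 2317.2, 2001.3, 2107.6,
         1718.4, 1787.4, 2562.3, 2356.9, 2153.2]
mu0 = 2000
critical = 8  # Table 1: two-tailed test, 5% level, n = 10 (value quoted in the text)

diffs = [round(x - mu0, 1) for x in shear]
s_p, s_n = signed_rank_sums(diffs)
s = min(s_p, s_n)                     # Case 1 statistic
print(s_p, s_n, s)                    # 36.0 19.0 19.0
print("reject H0:", s <= critical)    # reject H0: False
```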

Task!

An automotive development engineer is investigating the properties of two fuel injection systems in order to determine whether they exhibit any significant difference in the level of fuel economy measured on different cars. The systems are fitted to 12 cars and a test is run ensuring that each injection system is used on each car under conditions which are as uniform as possible. The fuel consumption figures (in miles per gallon) obtained are given in the table below. Use the Wilcoxon signed-rank test applied to the differences in the paired data to decide whether the median fuel consumption figures are significantly different at the 5% level of significance.

Car	System 1 (mpg)	System 2 (mpg)
1	27.6	26.3
2	29.4	31.0
3	29.5	28.2
4	27.2	26.1
5	25.8	27.6
6	26.9	25.8
7	26.7	28.2
8	28.9	27.6
9	27.3	26.9
10	29.2	30.3
11	27.8	26.9
12	29.2	28.3

Answer

We assume that the differences between the paired observations come from a continuous distribution that is symmetric about its median, so that its median and mean are identical. This is reasonable under the null hypothesis: if the two injection systems perform equally well on each car, a difference of a given size is as likely to be positive as negative. We are testing for any difference between the median miles per gallon figures for the two injection systems. The null and alternative hypotheses are:

$H_{0} : μ_{1} = μ_{2} \text{ or } H_{0} : μ_{\text{differences}} = 0$

$H_{1} : μ_{1} \neq μ_{2} \text{ or } H_{1} : μ_{\text{differences}} \neq 0$

We perform a two-tailed test.

The signed ranks are obtained as shown in the table below:

Car	System 1 (mpg)	System 2 (mpg)	Difference	Sorted |difference|	Signed rank
1	27.6	26.3	1.3	0.4	+1
2	29.4	31.0	$-1.6$	0.9	+2.5
3	29.5	28.2	1.3	0.9	+2.5
4	27.2	26.1	1.1	1.1	+5
5	25.8	27.6	$-1.8$	1.1	+5
6	26.9	25.8	1.1	1.1	$-5$
7	26.7	28.2	$-1.5$	1.3	+8
8	28.9	27.6	1.3	1.3	+8
9	27.3	26.9	0.4	1.3	+8
10	29.2	30.3	$-1.1$	1.5	$-10$
11	27.8	26.9	0.9	1.6	$-11$
12	29.2	28.3	0.9	1.8	$-12$

We now calculate the sums $S_{N}, S_{P}$ and $S$ in order to decide whether to reject the null hypothesis.

$\begin{array}{rcl} S_{N} & = & |- 5 - 10 - 11 - 12| = 38 \\ S_{P} & = & 1 + 2.5 + 2.5 + 5 + 5 + 8 + 8 + 8 = 40 \\ S & = & \min (S_{P}, S_{N}) = \min (40, 38) = 38 \end{array}$

From Table 1, the critical value at the 5% level of significance for a two-tailed test performed with a sample of 12 values is 13.

Since $13 < 38$ we conclude that we cannot reject the null hypothesis and that on the basis of the available evidence, the two injection systems do not differ significantly in respect of the fuel economy they offer.
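For paired data the test is applied to the differences System 1 minus System 2. A minimal Python sketch, using the hypothetical helper `signed_rank_sums`, reproduces the average ranks and the sums above. Rounding the differences to one decimal place is an assumption needed so that, for example, the three differences of 1.3 are recognised as ties in floating-point arithmetic.

```python
def signed_rank_sums(diffs):
    """Return (S_P, S_N): tied |d| share the average rank; zero d are dropped
    (an assumed convention; these data have no zero differences)."""
    d = [x for x in diffs if x != 0]
    ordered = sorted(abs(x) for x in d)
    avg = {v: sum(i + 1 for i, w in enumerate(ordered) if w == v) / ordered.count(v)
           for v in set(ordered)}
    s_p = sum(avg[abs(x)] for x in d if x > 0)
    s_n = sum(avg[abs(x)] for x in d if x < 0)
    return s_p, s_n


# Fuel consumption (mpg) for cars 1-12
system1 = [27.6, 29.4, 29.5, 27.2, 25.8, 26.9, 26.7, 28.9, 27.3, 29.2, 27.8, 29.2]
system2 = [26.3, 31.0, 28.2, 26.1, 27.6, 25.8, 28.2, 27.6, 26.9, 30.3, 26.9, 28.3]
critical = 13  # Table 1: two-tailed test, 5% level, n = 12 (value quoted in the text)

# Paired differences, rounded so floating-point error cannot split ties
diffs = [round(a - b, 1) for a, b in zip(system1, system2)]
s_p, s_n = signed_rank_sums(diffs)
s = min(s_p, s_n)
print(s_p, s_n, s)                    # 40.0 38.0 38.0
print("reject H0:", s <= critical)    # reject H0: False
```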

4.2 General comments about the Wilcoxon signed-rank test

For underlying normal populations, either the $t$-test or the Wilcoxon signed-rank test may be used to test a null hypothesis such as $H_{0} : μ = μ_{0}$, concerning the mean of the distribution, against the usual alternatives. Comparisons between the two tests are difficult since it is hard to obtain the probability of a Type II error for the Wilcoxon signed-rank test, and hard to obtain it for the $t$-test in the case of non-normal populations. In non-normal populations the actual Type I error rate of the $t$-test may also differ from the nominal significance level.
Investigations have shown that the Wilcoxon signed-rank test is never much worse than the $t$-test and in the case of non-normal populations it may be rather better. The Wilcoxon signed-rank test may be seen as a useful alternative to the $t$-test, especially when doubt is cast on the normality of the underlying distribution.

Exercises

1. Springs used in the lids of portable CD players are subjected to testing by repeated flexing until they fail. The times to failure, in hours, of forty springs are given below. Those times marked * indicate cases where the experiment was stopped before the spring failed.

*48.0	41.2	1.2	*48.0	*48.0	0.7	0.2	12.2
0.7	19.0	1.9	0.0	42.6	*48.0	15.7	*48.0
4.3	24.2	*48.0	47.5	33.3	17.8	15.9	8.2
4.6	2.7	25.3	3.2	15.7	10.5	2.4	37.1
4.1	30.0	*48.0	19.9	39.3	*48.0	17.5	*48.0

Use a sign test to test the null hypothesis that the median time to failure is 15 hours against the alternative that it is greater than 15 hours. Use the 5% level of significance.

2. In dual-pivot bicycle brakes the control cable enters on one side and there is potential for greater wear in the brake pads on one side than the other. Thirty trials were conducted with a test rig in which a brake was fitted to a wheel connected to a flywheel which was repeatedly set in motion and then brought to rest by the brake with a fixed force applied. The abrasion loss of each brake pad was measured (mg).

Run	Left	Right	Run	Left	Right
1	114	105	16	150	132
2	149	141	17	160	161
3	116	144	18	50	56
4	69	130	19	128	192
5	134	185	20	147	121
6	117	108	21	72	74
7	78	111	22	120	131
8	146	170	23	103	92
9	88	107	24	145	120
10	105	96	25	96	112
11	117	139	26	63	73
12	102	140	27	85	103
13	68	137	28	137	133
14	105	111	29	107	141
15	65	123	30	67	83

Use a sign test to test the null hypothesis that the median difference between left-pad wear and right-pad wear is zero against the two-sided alternative. Use the 5% level of significance.

3. Loaded lorries leaving a quarry are weighed on a weigh bridge. To test the weigh bridge, each of a sample of twelve lorries is driven to a second weigh bridge and weighed again. The differences (kg) between the two weights (first $-$ second) are given below.
$\begin{matrix} 38 & 14 & 16 & 54 & 36 & - 19 & - 24 & 1 & - 18 & 5 & - 14 & - 28 \end{matrix}$

Use a Wilcoxon signed-rank test to test the null hypothesis that there is no systematic difference in the weights given by the two weigh bridges. Use the 5% level of significance. Comment on any assumptions which you need to make.
4. Apply a Wilcoxon signed-rank test to the data in Exercise 2 to test the null hypothesis that the mean difference in abrasion loss between the left and right pads is zero. Use the 5% level of significance. Comment on any assumptions which you need to make.

Answers

1. Under the null hypothesis the probability that the failure time is greater than 15 hours is 0.5, and the number of springs with failure times greater than 15 hours has a binomial $(40, 0.5)$ distribution. Of the forty test springs, 25 had failure times greater than 15 hours (the springs still unbroken when the experiment stopped at 48 hours certainly exceed 15 hours). The probability under the null hypothesis of observing at least 25 can be found approximately using the normal distribution $N (20, 10)$ (mean 20, variance 10) with a continuity correction. Now
$\frac{24.5 - 20}{\sqrt{10}} = 1.423$

and the probability that a standard normal random variable is greater than $1.423$ is $1 - Φ (1.423) = 0.077 .$ Since $0.077 > 0.05,$ the result is not significant at the 5% level and we do not reject the null hypothesis that the median failure time is 15 hours.
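The normal approximation in answer 1 can be reproduced with a short Python sketch. It assumes that the censored times (marked *) can be entered as 48.0, since those springs certainly lasted longer than 15 hours, and it additionally computes the exact binomial tail with `math.comb` for comparison, which the answer itself does not use.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z):
    """Phi(z) for the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Failure times (hours); censored values (*48.0) entered as 48.0 (assumption:
# only their position relative to 15 hours matters for the sign test)
times = [48.0, 41.2, 1.2, 48.0, 48.0, 0.7, 0.2, 12.2,
         0.7, 19.0, 1.9, 0.0, 42.6, 48.0, 15.7, 48.0,
         4.3, 24.2, 48.0, 47.5, 33.3, 17.8, 15.9, 8.2,
         4.6, 2.7, 25.3, 3.2, 15.7, 10.5, 2.4, 37.1,
         4.1, 30.0, 48.0, 19.9, 39.3, 48.0, 17.5, 48.0]

n, p = len(times), 0.5
above = sum(t > 15 for t in times)       # number of failure times above 15 hours
mean = n * p                             # 20
var = n * p * (1 - p)                    # 10, the variance of N(20, 10)

# Continuity correction: P(X >= 25) is approximated by P(Y > 24.5)
z = (above - 0.5 - mean) / sqrt(var)
p_normal = 1 - std_normal_cdf(z)

# Exact binomial upper tail, for comparison only
p_exact = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n

print(above, round(z, 3), round(p_normal, 3))   # 25 1.423 0.077
print(round(p_exact, 4))
```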
2. In 9 cases the left-pad wear is greater than the right-pad wear. Let $X$ be the number of cases where left-pad wear is greater than right-pad wear. Under the null hypothesis $X$ has a binomial $(30, 0.5)$ distribution. The probability of observing a value less than or equal to 9 from this distribution is $0.0214$. Because we are testing against the two-sided alternative we double this to $0.0428$ and, because $0.0428 < 0.05$, the result is significant at the 5% level. We reject the null hypothesis and conclude that left-pad wear tends to be less than right-pad wear.
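The exact binomial probability in answer 2 can be checked with a short Python sketch using `math.comb`. Discarding runs with equal wear (none occur here) is an assumed convention of the sketch.

```python
from math import comb

# Abrasion loss (mg) for runs 1-30
left = [114, 149, 116, 69, 134, 117, 78, 146, 88, 105, 117, 102, 68, 105, 65,
        150, 160, 50, 128, 147, 72, 120, 103, 145, 96, 63, 85, 137, 107, 67]
right = [105, 141, 144, 130, 185, 108, 111, 170, 107, 96, 139, 140, 137, 111, 123,
         132, 161, 56, 192, 121, 74, 131, 92, 120, 112, 73, 103, 133, 141, 83]

n = sum(a != b for a, b in zip(left, right))   # runs with a sign (no ties here)
x = sum(a > b for a, b in zip(left, right))    # runs where left wear is greater

# P(X <= x) for X ~ binomial(n, 0.5), doubled for the two-sided alternative
p_one = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
p_two = 2 * p_one
print(n, x, round(p_one, 4), round(p_two, 4))   # 30 9 0.0214 0.0428
```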

3. The observations and their signed ranks are as follows.

Observation	$38$	$14$	$16$	$54$	$36$	$- 19$
Signed rank	$11.0$	$3.5$	$5.0$	$12.0$	$10.0$	$- 7.0$

Observation	$- 24$	$1$	$- 18$	$5$	$- 14$	$- 28$
Signed rank	$- 8.0$	$1.0$	$- 6.0$	$2.0$	$- 3.5$	$- 9.0$

The sum of the positive ranks is $44.5$ and the sum of the negative ranks is $33.5$. For a two-tailed test at the 5% level of significance with $n = 12$, the critical value from Table 1 is 13, and we compare the smaller rank sum with this. Since $33.5 > 13$, the result is not significant and we do not reject the null hypothesis. There is no significant evidence of a systematic difference between the weigh bridges.

Comment : We are assuming that, under the null hypothesis, the distribution of the differences is symmetric. This may well be valid in this case since, if the weigh bridges are really the same then the differences between values given by them should be distributed symmetrically about zero. (We also have to assume that the weight does not change systematically on the journey between the weigh bridges, for example by spillage.)
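The rank sums in this answer can be checked with a short Python sketch. `signed_rank_sums` is a hypothetical helper name, and the critical value 13 is the Table 1 value quoted in the answer rather than something the code computes.

```python
def signed_rank_sums(diffs):
    """Return (S_P, S_N): tied |d| share the average rank; zero d are dropped
    (an assumed convention; these data have no zero differences)."""
    d = [x for x in diffs if x != 0]
    ordered = sorted(abs(x) for x in d)
    avg = {v: sum(i + 1 for i, w in enumerate(ordered) if w == v) / ordered.count(v)
           for v in set(ordered)}
    s_p = sum(avg[abs(x)] for x in d if x > 0)
    s_n = sum(avg[abs(x)] for x in d if x < 0)
    return s_p, s_n


diffs = [38, 14, 16, 54, 36, -19, -24, 1, -18, 5, -14, -28]   # first - second (kg)
critical = 13  # Table 1: two-tailed test, 5% level, n = 12 (value quoted in the answer)

s_p, s_n = signed_rank_sums(diffs)
print(s_p, s_n)                                    # 44.5 33.5
print("reject H0:", min(s_p, s_n) <= critical)     # reject H0: False
```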

4. The thirty differences (left $-$ right) and their signed ranks are as follows.

Run	Difference	Signed rank	Run	Difference	Signed rank
$1$	$9$	$8.0$	$16$	$18$	$15.5$
$2$	$8$	$6.0$	$17$	$- 1$	$- 1.0$
$3$	$- 28$	$- 22.0$	$18$	$- 6$	$- 4.5$
$4$	$- 61$	$- 28.0$	$19$	$- 64$	$- 29.0$
$5$	$- 51$	$- 26.0$	$20$	$26$	$21.0$
$6$	$9$	$8.0$	$21$	$- 2$	$- 2.0$
$7$	$- 33$	$- 23.0$	$22$	$- 11$	$- 11.5$
$8$	$- 24$	$- 19.0$	$23$	$11$	$11.5$
$9$	$- 19$	$- 17.0$	$24$	$25$	$20.0$
$10$	$9$	$8.0$	$25$	$- 16$	$- 13.5$
$11$	$- 22$	$- 18.0$	$26$	$- 10$	$- 10.0$
$12$	$- 38$	$- 25.0$	$27$	$- 18$	$- 15.5$
$13$	$- 69$	$- 30.0$	$28$	$4$	$3.0$
$14$	$- 6$	$- 4.5$	$29$	$- 34$	$- 24.0$
$15$	$- 58$	$- 27.0$	$30$	$- 16$	$- 13.5$

The sum of the positive ranks is 101. The sum of the negative ranks is 364. (The total of the ranks is $0.5 \times 30 \times 31 = 465$.) With $n = 30$ the distribution of the rank sum under the null hypothesis is approximately normal with mean $M = n (n + 1) ∕ 4 = 30 \times 31 ∕ 4 = 232.5$ and standard deviation $σ = \sqrt{n (n + 1) (2 n + 1) ∕ 24} = \sqrt{30 \times 31 \times 61 ∕ 24} = 48.62$. For a two-sided test at the 5% level we reject the null hypothesis if either rank sum is outside the range $M \pm 1.96 σ$, which is $232.5 \pm 95.3$, or 137.2 to 327.8. Both rank sums lie outside this range, so we reject the null hypothesis at the 5% level and conclude that left-pad wear tends to be less than right-pad wear.

Comment : We are assuming that, under the null hypothesis, the distribution of the differences is symmetric. This seems reasonable since the assumption that there is no systematic difference between left and right would imply that the distribution of differences in observed wear should be symmetric.
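The normal approximation in answer 4 can be reproduced with a short Python sketch. It starts from the thirty differences in the table above, uses the hypothetical helper `signed_rank_sums` for the average (tied) ranks and, like the answer, applies no adjustment to the variance for tied ranks.

```python
from math import sqrt


def signed_rank_sums(diffs):
    """Return (S_P, S_N): tied |d| share the average rank; zero d are dropped
    (an assumed convention; these data have no zero differences)."""
    d = [x for x in diffs if x != 0]
    ordered = sorted(abs(x) for x in d)
    avg = {v: sum(i + 1 for i, w in enumerate(ordered) if w == v) / ordered.count(v)
           for v in set(ordered)}
    s_p = sum(avg[abs(x)] for x in d if x > 0)
    s_n = sum(avg[abs(x)] for x in d if x < 0)
    return s_p, s_n


# Differences (left - right, mg) for runs 1-30
diffs = [9, 8, -28, -61, -51, 9, -33, -24, -19, 9, -22, -38, -69, -6, -58,
         18, -1, -6, -64, 26, -2, -11, 11, 25, -16, -10, -18, 4, -34, -16]

s_p, s_n = signed_rank_sums(diffs)
n = sum(d != 0 for d in diffs)
M = n * (n + 1) / 4                              # mean rank sum under H0
sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)     # standard deviation under H0
lower, upper = M - 1.96 * sigma, M + 1.96 * sigma

print(s_p, s_n)                                  # 101.0 364.0
print(M, round(sigma, 2))                        # 232.5 48.62
print(round(lower, 1), round(upper, 1))          # 137.2 327.8
print("reject H0:", not (lower <= s_p <= upper)) # reject H0: True
```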
