3 The sign test for paired data

Very often, experiments are designed so that the results occur in matched pairs. In these cases the sign test can often be applied to decide between two hypotheses concerning the data. Performing a sign test involves counting the number of times that, say, the first score is higher than the second, designated by a “ $+$ ” sign, and the number of times that the first score is lower than the second, designated by a “ $-$ ” sign.

3.1 Ties

It is, of course, possible that in some cases the scores will be equal, that is, they are said to be tied.

There are two ways in which tied scores are dealt with.

3.2 Method 1

Ties may be counted as minus signs so that they count for the null hypothesis. The logic of this is that equal scores cannot be used as agents for change.

3.3 Method 2

Ties may be discounted completely and not used in any analysis performed. The logic of this is that ties can sometimes occur because of the way in which the data are collected. Throughout this Workbook, any ties occurring will be discounted and ignored in any subsequent analysis.

Essentially, we take paired observations, say $(X_{1 i}, X_{2 i}), i = 1, \dots, n$ , from a continuous population and proceed as illustrated below.
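The counting step, with ties discarded as in Method 2, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `count_signs` and the sample pairs are hypothetical and are not taken from this Workbook.

```python
def count_signs(pairs):
    """Count the signs of (first score - second score) over a list of pairs.

    Returns (n_plus, n_minus, n_ties). Following Method 2, ties are
    reported separately so that they can be left out of the analysis.
    """
    n_plus = sum(1 for first, second in pairs if first > second)
    n_minus = sum(1 for first, second in pairs if first < second)
    n_ties = len(pairs) - n_plus - n_minus
    return n_plus, n_minus, n_ties


# Hypothetical illustrative pairs (not data from this Workbook).
pairs = [(5.1, 4.8), (3.9, 4.2), (6.0, 6.0), (7.3, 6.9)]
n_plus, n_minus, n_ties = count_signs(pairs)

# Number of pairs actually used in the test once ties are discarded.
n_used = n_plus + n_minus
print(n_plus, n_minus, n_ties, n_used)
```

Under the null hypothesis the number of plus signs among the `n_used` untied pairs follows the $B (n, 0.5)$ distribution, which is what the examples below exploit.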

Example 3

In an experiment concerning gas cutting of steel for use in off-shore structures, 48 test plates were prepared. Each plate was cut using both oxy-propane cutting and oxy-natural gas cutting and, in each case, the maximum Vickers hardness near the cut edge was measured. The results were as follows.

Plate	Propane	Nat. gas	Plate	Propane	Nat. gas	Plate	Propane	Nat. gas
1	291	296	17	295	272	33	325	313
2	315	281	18	327	300	34	312	323
3	318	310	19	329	309	35	318	317
4	319	312	20	319	291	36	314	317
5	312	320	21	327	317	37	324	334
6	296	297	22	317	279	38	319	293
7	331	319	23	289	282	39	305	294
8	316	290	24	321	301	40	305	332
9	321	301	25	299	259	41	306	330
10	283	259	26	325	302	42	303	296
11	316	327	27	307	337	43	321	311
12	342	306	28	291	320	44	328	338
13	302	259	29	312	300	45	302	292
14	312	314	30	335	330	46	324	278
15	293	268	31	319	307	47	327	352
16	346	300	32	310	307	48	329	295

Use a sign test to test the null hypothesis that the median difference between the hardnesses produced by the two methods is zero against the alternative that it is not zero. Use the 1% level of significance.

Solution

We are testing to see whether there is evidence that the median difference between the hardnesses produced by the two methods differs from zero. The null and alternative hypotheses are:

$H_{0} : θ_{\text{differences}} = 0 \qquad H_{1} : θ_{\text{differences}} \neq 0$

We perform a two-tailed test. The signs of the differences (propane minus natural gas) are shown in the table below.

Plate	Prop.	N.gas	Sign	Plate	Prop.	N.gas	Sign	Plate	Prop.	N.gas	Sign
1	291	296	$-$	17	295	272	$+$	33	325	313	$+$
2	315	281	$+$	18	327	300	$+$	34	312	323	$-$
3	318	310	$+$	19	329	309	$+$	35	318	317	$+$
4	319	312	$+$	20	319	291	$+$	36	314	317	$-$
5	312	320	$-$	21	327	317	$+$	37	324	334	$-$
6	296	297	$-$	22	317	279	$+$	38	319	293	$+$
7	331	319	$+$	23	289	282	$+$	39	305	294	$+$
8	316	290	$+$	24	321	301	$+$	40	305	332	$-$
9	321	301	$+$	25	299	259	$+$	41	306	330	$-$
10	283	259	$+$	26	325	302	$+$	42	303	296	$+$
11	316	327	$-$	27	307	337	$-$	43	321	311	$+$
12	342	306	$+$	28	291	320	$-$	44	328	338	$-$
13	302	259	$+$	29	312	300	$+$	45	302	292	$+$
14	312	314	$-$	30	335	330	$+$	46	324	278	$+$
15	293	268	$+$	31	319	307	$+$	47	327	352	$-$
16	346	300	$+$	32	310	307	$+$	48	329	295	$+$

There are 34 positive differences and 14 negative differences. The probability of getting 14 or fewer negative differences, if the probability that a difference is negative is $0.5$, is

\begin{array}{rcl} P (X \leq 14) & = & \sum_{r = 0}^{14} \binom{48}{r} {\left( \frac{1}{2} \right)}^{r} {\left( \frac{1}{2} \right)}^{48 - r} = \sum_{r = 0}^{14} \binom{48}{r} {\left( \frac{1}{2} \right)}^{48} \\ & = & 0.0027576 \end{array}
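The binomial sum is tedious by hand but easy to check numerically. The following is a minimal sketch using only the Python standard library; the function name `binomial_lower_tail` is an illustrative choice and not part of this Workbook.

```python
from math import comb


def binomial_lower_tail(k, n, p=0.5):
    """Return P(X <= k) for X ~ B(n, p) by summing the binomial terms."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


# 14 negative differences out of 48 untied pairs, as in Example 3.
p_value = binomial_lower_tail(14, 48)
print(f"P(X <= 14) = {p_value:.7f}")
```

The printed value should agree with the figure 0.0027576 obtained above, up to rounding.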

We can find this value approximately by using the normal approximation. The required mean and variance are $48 \times 0.5 = 24$ and $48 \times 0.5 \times 0.5 = 12$ respectively. So we calculate the probability that a normal random variable with mean 24 and variance 12 is less than $14.5$.

\begin{array}{rcl} P (X \leq 14) \approx P (Y < 14.5) & = & P \left( \frac{Y - 24}{\sqrt{12}} < \frac{14.5 - 24}{\sqrt{12}} \right) = Φ \left( \frac{14.5 - 24}{\sqrt{12}} \right) \\ & = & Φ (- 2.742) = 1 - Φ (2.742) \\ & = & 1 - 0.9969 = 0.0031 \end{array}
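The normal approximation with its continuity correction can be sketched in the same way. The helper names `normal_cdf` and `approx_lower_tail` are illustrative assumptions; the standard normal distribution function is evaluated through `math.erf` rather than read from tables, so the last digit may differ slightly from the tabulated 0.0031.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_lower_tail(k, n, p=0.5):
    """Normal approximation to P(X <= k) for X ~ B(n, p).

    The continuity correction replaces k by k + 0.5.
    Returns the standardised value z together with the probability.
    """
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k + 0.5 - mean) / sd
    return z, normal_cdf(z)


# Example 3: mean 24, variance 12, cut-off 14.5.
z, p_approx = approx_lower_tail(14, 48)
print(f"z = {z:.3f}, approximate probability = {p_approx:.4f}")
```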

For a two-sided test at the 1% level we must compare this probability with 0.5%, that is 0.005. We see that, even using the larger approximate value, our probability is less than 0.005 so our test statistic is significant at the 1% level. We therefore reject the null hypothesis and conclude that the evidence suggests strongly that the median of the differences is not zero but is, in fact, positive. Use of propane tends to result in greater hardness.

Example 4

Automotive development engineers are testing the properties of two anti-lock braking systems in order to determine whether they exhibit any significant difference in the stopping distance achieved by different cars.

The systems are fitted to 10 cars and a test is run ensuring that each system is used on each car under conditions which are as uniform as possible.

The stopping distances (in yards) obtained are given in the table below.

	Anti-lock Braking System
Car	1	2
1	27.7	26.3
2	32.1	31.0
3	29.6	28.1
4	29.2	28.1
5	27.8	27.9
6	26.9	25.8
7	29.7	28.2
8	28.9	27.6
9	27.3	26.5
10	29.9	28.3

Solution

We are testing to find any differences in the median stopping distance figures for each braking system. The null and alternative hypotheses are:

$H_{0} : θ_{1} = θ_{2} \quad \text{or} \quad H_{0} : θ_{\text{differences}} = 0$

$H_{1} : θ_{1} \neq θ_{2} \quad \text{or} \quad H_{1} : θ_{\text{differences}} \neq 0$

We perform a two-tailed test.

The signs of the differences (system 1 minus system 2) are shown in the table below:

	Anti-lock Braking System
Car	1	2	Sign
1	27.7	26.3	+
2	32.1	31.0	+
3	29.6	28.1	+
4	29.2	28.1	+
5	27.8	27.9	$-$
6	26.9	25.8	+
7	29.7	28.2	+
8	28.9	27.6	+
9	27.3	26.5	+
10	29.9	28.3	+

We have 9 plus signs and the required probability value is calculated directly from the binomial formula as

\begin{array}{rcl} P (X \geq 9) & = & \sum_{r = 9}^{10} \binom{10}{r} {\left( \frac{1}{2} \right)}^{10 - r} {\left( \frac{1}{2} \right)}^{r} \\ & = & 10 \times {\left( \frac{1}{2} \right)}^{10} + {\left( \frac{1}{2} \right)}^{10} = 11 \times {\left( \frac{1}{2} \right)}^{10} ≃ 0.011 \end{array}

Since we are performing a two-tailed test at the 5% level, we must compare the calculated value with the value 0.025. Since $0.011 < 0.025$ we reject the null hypothesis on the basis of the available evidence and conclude that the difference in the median stopping distances is significant at the 5% level.
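The whole of Example 4 can be reproduced from the tabulated stopping distances. The sketch below uses only the Python standard library; the variable names are illustrative, and the 5% level is the one used in the solution above.

```python
from math import comb

# Stopping distances (yards) for cars 1 to 10, copied from the table above.
system_1 = [27.7, 32.1, 29.6, 29.2, 27.8, 26.9, 29.7, 28.9, 27.3, 29.9]
system_2 = [26.3, 31.0, 28.1, 28.1, 27.9, 25.8, 28.2, 27.6, 26.5, 28.3]

# Signs of the differences (system 1 minus system 2); ties would be discarded.
n_plus = sum(1 for a, b in zip(system_1, system_2) if a > b)
n_minus = sum(1 for a, b in zip(system_1, system_2) if a < b)
n = n_plus + n_minus

# Upper tail P(X >= n_plus) for X ~ B(n, 1/2).
upper_tail = sum(comb(n, r) for r in range(n_plus, n + 1)) / 2**n

# Two-tailed test at the 5% level: compare the single tail with 0.025.
reject_h0 = upper_tail < 0.025
print(n_plus, n_minus, round(upper_tail, 4), reject_h0)
```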

3.4 General comments about the sign test

Before the sign test can be applied we must be sure that the underlying distribution is continuous. Usually, the first score being higher than the second score counts as a plus sign, as in the examples above. The null hypothesis $H_{0}$ is that the probability of obtaining each sign is the same, that is $p = \frac{1}{2}$ . The alternative hypothesis $H_{1}$ may be that $p \neq \frac{1}{2}$ which gives a two-tailed test or $p > \frac{1}{2}$ or $p < \frac{1}{2}$ each of which gives a one-tailed test.
If $H_{0}$ is correct, the test involves the $B (n, 0.5)$ distribution which, if $n$ is “large” and the conditions for the normal approximation hold, can be approximated by the normal distribution with mean $n \times \frac{1}{2}$ and standard deviation $\sqrt{n \times \frac{1}{2} \times \frac{1}{2}}$ (that is, variance $\frac{n}{4}$). This approximation can save much tedious arithmetic and time.
The sign test may not be as reliable as an equivalent parametric test since it relies only on the sign of the difference of each pair and not on the size of the difference. Where the assumptions of an equivalent parametric test are satisfied, it is suggested that the parametric test is used.
If the underlying distribution is normal, either the sign test or the $t$ -test may be used to test the null hypothesis $H_{0} : θ = θ_{0}$ against the usual alternative, but the $t$ -test may not give valid results when the data are markedly non-normal, particularly for small samples. It can be shown that the $t$ -test produces a smaller Type II error probability for one-sided tests and also for two-sided tests where the critical regions are symmetric. Hence we may claim that the $t$ -test is superior to the sign test when the underlying distribution is normal.
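The comments above on one-tailed and two-tailed alternatives and on the normal approximation can be gathered into a single helper. The following is a minimal sketch, not a definitive implementation: the function name `sign_test_p_value` and its parameters are assumptions made for illustration. For the two-tailed case it doubles the smaller tail, which is equivalent to comparing a single tail with half the significance level as was done in the examples.

```python
from math import comb, erf, sqrt


def _normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sign_test_p_value(n_plus, n_minus, alternative="two-sided", approximate=False):
    """p-value for H0: p = 1/2, where p is the probability of a plus sign.

    Ties are assumed to have been discarded already. With approximate=True
    the B(n, 1/2) tails are replaced by the normal approximation
    (mean n/2, standard deviation sqrt(n)/2) with a continuity correction.
    """
    n = n_plus + n_minus
    if n == 0:
        raise ValueError("no untied pairs to test")
    mean, sd = n / 2, sqrt(n) / 2

    def lower(k):  # P(X <= k)
        if approximate:
            return _normal_cdf((k + 0.5 - mean) / sd)
        return sum(comb(n, r) for r in range(0, k + 1)) / 2**n

    def upper(k):  # P(X >= k)
        if approximate:
            return 1.0 - _normal_cdf((k - 0.5 - mean) / sd)
        return sum(comb(n, r) for r in range(k, n + 1)) / 2**n

    if alternative == "greater":  # H1: p > 1/2
        return upper(n_plus)
    if alternative == "less":  # H1: p < 1/2
        return lower(n_plus)
    if alternative == "two-sided":  # H1: p != 1/2
        return min(1.0, 2 * min(lower(n_plus), upper(n_plus)))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Example 3 (34 plus, 14 minus) and Example 4 (9 plus, 1 minus).
print(sign_test_p_value(34, 14))
print(sign_test_p_value(34, 14, approximate=True))
print(sign_test_p_value(9, 1))
```

Applied to Example 3, both the exact and the approximate two-sided values fall below 0.01, and for Example 4 the exact two-sided value falls below 0.05, in line with the conclusions reached above.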