3 The sign test for paired data

Very often, experiments are designed so that the results occur in matched pairs. In these cases the sign test can often be applied to decide between two hypotheses concerning the data. Performing a sign test involves counting the number of times when, say, the first score is higher than the second, designated by a “+” sign, and the number of times that the first score is lower than the second, designated by a “−” sign.

3.1 Ties

It is, of course, possible that in some cases the scores will be equal; they are then said to be tied.

There are two ways in which tied scores are dealt with.

3.2 Method 1

Ties may be counted as minus signs so that they count for the null hypothesis. The logic of this is that equal scores cannot be used as agents for change.

3.3 Method 2

Ties may be discounted completely and not used in any analysis performed. The logic of this is that ties can sometimes occur because of the way in which the data are collected. Throughout this Workbook, any ties occurring will be discounted and ignored in any subsequent analysis.
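The two ways of handling ties can be sketched in a few lines of Python. This is only an illustration: the function name `count_signs`, its `ties` parameter and the five pairs of scores are invented here and do not come from the Workbook.

```python
def count_signs(first, second, ties="discard"):
    """Count plus and minus signs for paired scores.

    A plus sign means the first score is higher than the second.
    ties="minus"   counts tied pairs as minus signs (Method 1).
    ties="discard" drops tied pairs altogether (Method 2).
    """
    if ties not in ("discard", "minus"):
        raise ValueError("ties must be 'discard' or 'minus'")
    plus = sum(1 for a, b in zip(first, second) if a > b)
    minus = sum(1 for a, b in zip(first, second) if a < b)
    tied = sum(1 for a, b in zip(first, second) if a == b)
    if ties == "minus":
        minus += tied
    return plus, minus


# Hypothetical scores, invented purely to show the effect of one tie.
first = [12, 15, 9, 14, 11]
second = [10, 15, 11, 12, 8]
print(count_signs(first, second, ties="minus"))    # Method 1
print(count_signs(first, second, ties="discard"))  # Method 2
```

With Method 2 the tied pair simply reduces the number of pairs used in the test from five to four.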

Essentially, we take paired observations, say $(X_{1i}, X_{2i})$, $i = 1, \dots, n$, from a continuous population and proceed as illustrated below.

Example 3

In an experiment concerning gas cutting of steel for use in off-shore structures, 48 test plates were prepared. Each plate was cut using both oxy-propane cutting and oxy-natural gas cutting and, in each case, the maximum Vickers hardness near the cut edge was measured. The results were as follows.

Plate Propane Nat. gas Plate Propane Nat. gas Plate Propane Nat. gas
1 291 296 17 295 272 33 325 313
2 315 281 18 327 300 34 312 323
3 318 310 19 329 309 35 318 317
4 319 312 20 319 291 36 314 317
5 312 320 21 327 317 37 324 334
6 296 297 22 317 279 38 319 293
7 331 319 23 289 282 39 305 294
8 316 290 24 321 301 40 305 332
9 321 301 25 299 259 41 306 330
10 283 259 26 325 302 42 303 296
11 316 327 27 307 337 43 321 311
12 342 306 28 291 320 44 328 338
13 302 259 29 312 300 45 302 292
14 312 314 30 335 330 46 324 278
15 293 268 31 319 307 47 327 352
16 346 300 32 310 307 48 329 295

Use a sign test to test the null hypothesis that the median difference between the hardnesses produced by the two methods is zero against the alternative that it is not zero. Use the 1% level of significance.

Solution

We are testing whether there is evidence that the median difference between the hardnesses produced by the two methods is not zero. The null and alternative hypotheses are:

\[ H_0 : \theta_{\text{differences}} = 0 \qquad H_1 : \theta_{\text{differences}} \neq 0 \]

We perform a two-tailed test. The signs of the differences (propane minus natural gas) are shown in the table below.

Plate Prop. N.gas Sign Plate Prop. N.gas Sign Plate Prop. N.gas Sign
1 291 296 − 17 295 272 + 33 325 313 +
2 315 281 + 18 327 300 + 34 312 323 −
3 318 310 + 19 329 309 + 35 318 317 +
4 319 312 + 20 319 291 + 36 314 317 −
5 312 320 − 21 327 317 + 37 324 334 −
6 296 297 − 22 317 279 + 38 319 293 +
7 331 319 + 23 289 282 + 39 305 294 +
8 316 290 + 24 321 301 + 40 305 332 −
9 321 301 + 25 299 259 + 41 306 330 −
10 283 259 + 26 325 302 + 42 303 296 +
11 316 327 − 27 307 337 − 43 321 311 +
12 342 306 + 28 291 320 − 44 328 338 −
13 302 259 + 29 312 300 + 45 302 292 +
14 312 314 − 30 335 330 + 46 324 278 +
15 293 268 + 31 319 307 + 47 327 352 −
16 346 300 + 32 310 307 + 48 329 295 +

There are 34 positive differences and 14 negative differences. The probability of getting 14 or fewer negative differences, if the probability that a difference is negative is 0.5, is

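\[ P(X \le 14) = \sum_{r=0}^{14} \binom{48}{r} \left(\frac{1}{2}\right)^{r} \left(\frac{1}{2}\right)^{48-r} = \sum_{r=0}^{14} \binom{48}{r} \left(\frac{1}{2}\right)^{48} = 0.0027576 \]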

We can find this value approximately by using the normal approximation. The required mean and variance are 48 × 0.5 = 24 and 48 × 0.5 × 0.5 = 12 respectively. So we calculate the probability that a normal random variable $Y$ with mean 24 and variance 12 is less than 14.5.

\[ P(X \le 14) \approx P(Y < 14.5) = P\left(\frac{Y - 24}{\sqrt{12}} < \frac{14.5 - 24}{\sqrt{12}}\right) = \Phi\left(\frac{14.5 - 24}{\sqrt{12}}\right) = \Phi(-2.742) = 1 - \Phi(2.742) = 1 - 0.9969 = 0.0031 \]

For a two-sided test at the 1% level we must compare this probability with 0.5%, that is 0.005. We see that, even using the larger approximate value, our probability is less than 0.005 so our test statistic is significant at the 1% level. We therefore reject the null hypothesis and conclude that the evidence suggests strongly that the median of the differences is not zero but is, in fact, positive. Use of propane tends to result in greater hardness.
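The counts and both probabilities in this solution can be checked with a short Python sketch. The variable names are illustrative choices, not part of the Workbook. The normal value printed by the code may differ slightly in the fourth decimal place from the hand calculation, because the text rounds Φ(2.742) to four decimal places from tables.

```python
from math import comb, erf, sqrt

# Maximum Vickers hardness for plates 1 to 48, copied from the table.
propane = [291, 315, 318, 319, 312, 296, 331, 316, 321, 283, 316, 342,
           302, 312, 293, 346, 295, 327, 329, 319, 327, 317, 289, 321,
           299, 325, 307, 291, 312, 335, 319, 310, 325, 312, 318, 314,
           324, 319, 305, 305, 306, 303, 321, 328, 302, 324, 327, 329]
nat_gas = [296, 281, 310, 312, 320, 297, 319, 290, 301, 259, 327, 306,
           259, 314, 268, 300, 272, 300, 309, 291, 317, 279, 282, 301,
           259, 302, 337, 320, 300, 330, 307, 307, 313, 323, 317, 317,
           334, 293, 294, 332, 330, 296, 311, 338, 292, 278, 352, 295]

# Signs of the differences (propane minus natural gas).
diffs = [p - g for p, g in zip(propane, nat_gas)]
plus = sum(d > 0 for d in diffs)
minus = sum(d < 0 for d in diffs)
n = plus + minus  # ties (there are none here) would be discarded

# Exact lower tail P(X <= minus) for X ~ B(n, 0.5).
exact = sum(comb(n, r) for r in range(minus + 1)) / 2**n

# Normal approximation N(n/2, n/4) with continuity correction.
z = (minus + 0.5 - n / 2) / sqrt(n / 4)
approx = 0.5 * (1 + erf(z / sqrt(2)))

print("plus signs:", plus, "minus signs:", minus)
print(f"exact tail = {exact:.7f}, normal approximation = {approx:.4f}")
print("reject H0 at the 1% level (two-tailed):", exact < 0.005)
```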

Example 4

Automotive development engineers are testing the properties of two anti-lock braking systems in order to determine whether they exhibit any significant difference in the stopping distance achieved by different cars.

The systems are fitted to 10 cars and a test is run ensuring that each system is used on each car under conditions which are as uniform as possible.

The stopping distances (in yards) obtained are given in the table below.

Car System 1 System 2
1 27.7 26.3
2 32.1 31.0
3 29.6 28.1
4 29.2 28.1
5 27.8 27.9
6 26.9 25.8
7 29.7 28.2
8 28.9 27.6
9 27.3 26.5
10 29.9 28.3

Solution

We are testing to find any differences in the median stopping distance figures for each braking system. The null and alternative hypotheses are:

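\[ H_0 : \theta_1 = \theta_2 \quad \text{or} \quad H_0 : \theta_{\text{differences}} = 0 \]

\[ H_1 : \theta_1 \neq \theta_2 \quad \text{or} \quad H_1 : \theta_{\text{differences}} \neq 0 \]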

We perform a two-tailed test.

The signs of the differences (system 1 minus system 2) are shown in the table below:

Car System 1 System 2 Sign
1 27.7 26.3 +
2 32.1 31.0 +
3 29.6 28.1 +
4 29.2 28.1 +
5 27.8 27.9 −
6 26.9 25.8 +
7 29.7 28.2 +
8 28.9 27.6 +
9 27.3 26.5 +
10 29.9 28.3 +

We have 9 plus signs and the required probability value is calculated directly from the binomial formula as

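\[ P(X \ge 9) = \sum_{r=9}^{10} \binom{10}{r} \left(\frac{1}{2}\right)^{10-r} \left(\frac{1}{2}\right)^{r} = \binom{10}{1} \left(\frac{1}{2}\right)^{10} + \left(\frac{1}{2}\right)^{10} = 11 \times \left(\frac{1}{2}\right)^{10} \approx 0.011 \]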

Since we are performing a two-tailed test, we must compare the calculated value with the value 0.025. Since 0.011 < 0.025 we reject the null hypothesis on the basis of the available evidence and conclude that the difference in the median stopping distances recorded is significant at the 5% level.
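The same calculation can be reproduced in Python as a check on the arithmetic. The names `system_1`, `system_2` and `upper_tail` are illustrative only.

```python
from math import comb

# Stopping distances (yards) for cars 1 to 10, copied from the table.
system_1 = [27.7, 32.1, 29.6, 29.2, 27.8, 26.9, 29.7, 28.9, 27.3, 29.9]
system_2 = [26.3, 31.0, 28.1, 28.1, 27.9, 25.8, 28.2, 27.6, 26.5, 28.3]

# A plus sign means system 1 gave the longer stopping distance.
plus = sum(a > b for a, b in zip(system_1, system_2))
minus = sum(a < b for a, b in zip(system_1, system_2))
n = plus + minus  # no ties occur in these data

# Exact upper tail P(X >= plus) for X ~ B(n, 0.5).
upper_tail = sum(comb(n, r) for r in range(plus, n + 1)) / 2**n

print("plus signs:", plus, "minus signs:", minus)
print(f"P(X >= {plus}) = {upper_tail:.4f}")
print("reject H0 at the 5% level (two-tailed):", upper_tail < 0.025)
```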

3.4 General comments about the sign test

  1. Before the sign test can be applied we must be sure that the underlying distribution is continuous. Usually, the second score being higher than the first score counts as a plus sign. The null hypothesis $H_0$ is that the probability of obtaining each sign is the same, that is $p = \tfrac{1}{2}$. The alternative hypothesis $H_1$ may be that $p \neq \tfrac{1}{2}$, which gives a two-tailed test, or $p > \tfrac{1}{2}$ or $p < \tfrac{1}{2}$, each of which gives a one-tailed test.
  2. If $H_0$ is correct, the test involves the $B(n, 0.5)$ distribution which, if $n$ is “large” and the conditions for the normal approximation hold, can be approximated by the $N\left(n \times \tfrac{1}{2},\; n \times \tfrac{1}{2} \times \tfrac{1}{2}\right)$ distribution. This approximation can save much tedious arithmetic and time.
  3. The sign test may not be as reliable as an equivalent parametric test, since it relies only on the sign of the difference of each pair and not on the size of the difference. Where possible, an equivalent parametric test is suggested.
  4. If the underlying distribution is normal, either the sign test or the $t$-test may be used to test the null hypothesis $H_0 : \theta = \theta_0$ against the usual alternative, but the $t$-test will not give valid results when the data are non-normal. It can be shown that the $t$-test produces a smaller Type II error probability for one-sided tests and also for two-sided tests where the critical regions are symmetric. Hence we may claim that the $t$-test is superior to the sign test when the underlying distribution is normal.
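The one-tailed and two-tailed alternatives described in comment 1 can be gathered into a single helper, sketched below under stated assumptions. The function name `sign_test_p_value` and its `alternative` argument are hypothetical, and p is taken to be the probability of a plus sign. For the two-tailed case the sketch doubles the smaller tail, which is one common convention and is equivalent to comparing a single tail with half the significance level, as was done in the examples.

```python
from math import comb


def sign_test_p_value(plus, minus, alternative="two-sided"):
    """Exact sign-test p-value from the B(n, 0.5) distribution.

    plus, minus : numbers of plus and minus signs (ties already discarded)
    alternative : "two-sided" (p != 1/2), "greater" (p > 1/2)
                  or "less" (p < 1/2), where p = P(plus sign)
    """
    n = plus + minus

    def tail(lo, hi):
        # P(lo <= X <= hi) for X ~ B(n, 0.5)
        return sum(comb(n, r) for r in range(lo, hi + 1)) / 2**n

    upper = tail(plus, n)  # P(X >= observed number of plus signs)
    lower = tail(0, plus)  # P(X <= observed number of plus signs)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Doubling the smaller tail is an assumed convention, capped at 1.
        return min(1.0, 2 * min(lower, upper))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Sign counts taken from Example 3 (34 plus, 14 minus)
# and Example 4 (9 plus, 1 minus).
print(sign_test_p_value(34, 14))
print(sign_test_p_value(9, 1))
print(sign_test_p_value(9, 1, alternative="greater"))
```

A two-sided p-value below the chosen significance level leads to the same decision as the comparisons made in Examples 3 and 4.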