3 The sign test for paired data
Very often, experiments are designed so that the results occur in matched pairs. In these cases the sign test can often be applied to decide between two hypotheses concerning the data. Performing a sign test involves counting the number of times when, say, the first score is higher than the second, designated by a “+” sign, and the number of times that the first score is lower than the second, designated by a “−” sign.
3.1 Ties
It is, of course, possible that in some cases the scores will be equal, that is, they are said to be tied.
There are two ways in which tied scores are dealt with.
3.2 Method 1
Ties may be counted as minus signs so that they count for the null hypothesis. The logic of this is that equal scores cannot be used as agents for change.
3.3 Method 2
Ties may be discounted completely and not used in any analysis performed. The logic of this is that ties can sometimes occur because of the way in which the data are collected. Throughout this Workbook, any ties occurring will be discounted and ignored in any subsequent analysis.
Essentially, we take paired observations, say $(x_i, y_i)$, $i = 1, \dots, n$, from a continuous population and proceed as illustrated below.
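The counting step and the two ways of treating ties can be sketched in a few lines of Python. This is a minimal illustration, not part of the Workbook: the function name `count_signs`, its `ties` option and the small data set are all hypothetical, and a pair is scored “+” when the first value exceeds the second.

```python
def count_signs(first, second, ties="discard"):
    """Count plus and minus signs for paired scores.

    A pair scores "+" when the first value exceeds the second and "-"
    when it is lower. `ties` is a hypothetical option: "discard" drops
    tied pairs (Method 2); "minus" counts them as minus signs (Method 1).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    if ties not in ("discard", "minus"):
        raise ValueError("ties must be 'discard' or 'minus'")
    plus = minus = tied = 0
    for a, b in zip(first, second):
        if a > b:
            plus += 1
        elif a < b:
            minus += 1
        else:
            tied += 1
    if ties == "minus":
        minus += tied
    return plus, minus


# Made-up illustration with one tied pair (the third).
first = [5.1, 4.8, 5.0, 5.3]
second = [4.9, 5.0, 5.0, 5.1]
print(count_signs(first, second))                # (2, 1)
print(count_signs(first, second, ties="minus"))  # (2, 2)
```

With ties discarded, the number of pairs used in the test is the sum of the two counts returned, here 3 rather than 4.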
Example 3
In an experiment concerning gas cutting of steel for use in off-shore structures, 48 test plates were prepared. Each plate was cut using both oxy-propane cutting and oxy-natural gas cutting and, in each case, the maximum Vickers hardness near the cut edge was measured. The results were as follows.
Plate | Propane | Nat. gas | Plate | Propane | Nat. gas | Plate | Propane | Nat. gas |
1 | 291 | 296 | 17 | 295 | 272 | 33 | 325 | 313 |
2 | 315 | 281 | 18 | 327 | 300 | 34 | 312 | 323 |
3 | 318 | 310 | 19 | 329 | 309 | 35 | 318 | 317 |
4 | 319 | 312 | 20 | 319 | 291 | 36 | 314 | 317 |
5 | 312 | 320 | 21 | 327 | 317 | 37 | 324 | 334 |
6 | 296 | 297 | 22 | 317 | 279 | 38 | 319 | 293 |
7 | 331 | 319 | 23 | 289 | 282 | 39 | 305 | 294 |
8 | 316 | 290 | 24 | 321 | 301 | 40 | 305 | 332 |
9 | 321 | 301 | 25 | 299 | 259 | 41 | 306 | 330 |
10 | 283 | 259 | 26 | 325 | 302 | 42 | 303 | 296 |
11 | 316 | 327 | 27 | 307 | 337 | 43 | 321 | 311 |
12 | 342 | 306 | 28 | 291 | 320 | 44 | 328 | 338 |
13 | 302 | 259 | 29 | 312 | 300 | 45 | 302 | 292 |
14 | 312 | 314 | 30 | 335 | 330 | 46 | 324 | 278 |
15 | 293 | 268 | 31 | 319 | 307 | 47 | 327 | 352 |
16 | 346 | 300 | 32 | 310 | 307 | 48 | 329 | 295 |
Use a sign test to test the null hypothesis that the median difference between the hardnesses produced by the two methods is zero against the alternative that it is not zero. Use the 1% level of significance.
Solution
We are testing to see whether there is evidence that the median difference between the hardnesses produced by the two methods is not zero. The null and alternative hypotheses are:
$$H_0: \text{median of differences} = 0 \qquad H_1: \text{median of differences} \neq 0$$
We perform a two-tailed test. The signs of the differences (propane minus natural gas) are shown in the table below.
Plate | Prop. | N.gas | Sign | Plate | Prop. | N.gas | Sign | Plate | Prop. | N.gas | Sign |
1 | 291 | 296 | − | 17 | 295 | 272 | + | 33 | 325 | 313 | + |
2 | 315 | 281 | + | 18 | 327 | 300 | + | 34 | 312 | 323 | − |
3 | 318 | 310 | + | 19 | 329 | 309 | + | 35 | 318 | 317 | + |
4 | 319 | 312 | + | 20 | 319 | 291 | + | 36 | 314 | 317 | − |
5 | 312 | 320 | − | 21 | 327 | 317 | + | 37 | 324 | 334 | − |
6 | 296 | 297 | − | 22 | 317 | 279 | + | 38 | 319 | 293 | + |
7 | 331 | 319 | + | 23 | 289 | 282 | + | 39 | 305 | 294 | + |
8 | 316 | 290 | + | 24 | 321 | 301 | + | 40 | 305 | 332 | − |
9 | 321 | 301 | + | 25 | 299 | 259 | + | 41 | 306 | 330 | − |
10 | 283 | 259 | + | 26 | 325 | 302 | + | 42 | 303 | 296 | + |
11 | 316 | 327 | − | 27 | 307 | 337 | − | 43 | 321 | 311 | + |
12 | 342 | 306 | + | 28 | 291 | 320 | − | 44 | 328 | 338 | − |
13 | 302 | 259 | + | 29 | 312 | 300 | + | 45 | 302 | 292 | + |
14 | 312 | 314 | − | 30 | 335 | 330 | + | 46 | 324 | 278 | + |
15 | 293 | 268 | + | 31 | 319 | 307 | + | 47 | 327 | 352 | − |
16 | 346 | 300 | + | 32 | 310 | 307 | + | 48 | 329 | 295 | + |
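There are 34 positive differences and 14 negative differences. The probability of getting 14 or fewer negative differences, if the probability that a difference is negative is 0.5, is
$$\sum_{r=0}^{14} \binom{48}{r} (0.5)^{r} (0.5)^{48-r} \approx 0.0028.$$
We can also find this value approximately by using the normal approximation. The required mean and variance are $48 \times 0.5 = 24$ and $48 \times 0.5 \times 0.5 = 12$ respectively. So we calculate the probability that a normal random variable with mean 24 and variance 12 is less than 14.5 (applying a continuity correction). The standardised value is
$$\frac{14.5 - 24}{\sqrt{12}} \approx -2.74,$$
which gives a probability of approximately 0.0031.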
For a two-sided test at the 1% level we must compare this probability with 0.5%, that is 0.005. We see that, even using the larger approximate value, our probability is less than 0.005 so our test statistic is significant at the 1% level. We therefore reject the null hypothesis and conclude that the evidence suggests strongly that the median of the differences is not zero but is, in fact, positive. Use of propane tends to result in greater hardness.
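The tail probability in Example 3 can be checked numerically. The following is a minimal sketch that assumes only the counts from the example (48 pairs, 14 negative differences); the function names are illustrative and not part of the Workbook, and the normal approximation uses a continuity correction.

```python
from math import comb, erfc, sqrt


def binomial_lower_tail(k, n, p=0.5):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def normal_lower_tail(k, n, p=0.5):
    """Normal approximation to P(X <= k), with a continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k + 0.5 - mean) / sd
    return 0.5 * erfc(-z / sqrt(2))  # standard normal CDF evaluated at z


n, n_minus = 48, 14  # counts taken from Example 3
exact = binomial_lower_tail(n_minus, n)
approx = normal_lower_tail(n_minus, n)
print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")

# Two-tailed test at the 1% level: compare the single tail with 0.005.
print("reject H0:", max(exact, approx) < 0.005)
```

Both values fall below 0.005, so the conclusion does not depend on whether the exact or the approximate probability is used.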
Example 4
Automotive development engineers are testing the properties of two anti-lock braking systems in order to determine whether they exhibit any significant difference in the stopping distance achieved by different cars.
The systems are fitted to 10 cars and a test is run ensuring that each system is used on each car under conditions which are as uniform as possible.
The stopping distances (in yards) obtained are given in the table below.
Car | Anti-lock braking system 1 | Anti-lock braking system 2 |
1 | 27.7 | 26.3 |
2 | 32.1 | 31.0 |
3 | 29.6 | 28.1 |
4 | 29.2 | 28.1 |
5 | 27.8 | 27.9 |
6 | 26.9 | 25.8 |
7 | 29.7 | 28.2 |
8 | 28.9 | 27.6 |
9 | 27.3 | 26.5 |
10 | 29.9 | 28.3 |
Solution
We are testing to find any differences in the median stopping distance figures for each braking system. The null and alternative hypotheses are:
$$H_0: \text{median of differences} = 0 \qquad H_1: \text{median of differences} \neq 0$$
We perform a two-tailed test.
The signs of the differences (system 1 minus system 2) are shown in the table below:
Car | Anti-lock braking system 1 | Anti-lock braking system 2 | Sign |
1 | 27.7 | 26.3 | + |
2 | 32.1 | 31.0 | + |
3 | 29.6 | 28.1 | + |
4 | 29.2 | 28.1 | + |
5 | 27.8 | 27.9 | − |
6 | 26.9 | 25.8 | + |
7 | 29.7 | 28.2 | + |
8 | 28.9 | 27.6 | + |
9 | 27.3 | 26.5 | + |
10 | 29.9 | 28.3 | + |
We have 9 plus signs out of 10, and the required probability value is calculated directly from the binomial formula as
$$P(X \geq 9) = \binom{10}{9}\left(\tfrac{1}{2}\right)^{9}\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{2}\right)^{10} = \frac{11}{1024} \approx 0.011.$$
Since we are performing a two-tailed test at the 5% level, we must compare the calculated value with the value 0.025. Since $0.011 < 0.025$ we reject the null hypothesis on the basis of the available evidence and conclude that the difference in the median stopping distances recorded is significant at the 5% level.
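The working for Example 4 can be reproduced directly from the tabulated stopping distances. This is a minimal sketch; the variable names are illustrative, and the differences are taken as system 1 minus system 2.

```python
from math import comb

# Stopping distances (yards) for cars 1 to 10, copied from the table.
system_1 = [27.7, 32.1, 29.6, 29.2, 27.8, 26.9, 29.7, 28.9, 27.3, 29.9]
system_2 = [26.3, 31.0, 28.1, 28.1, 27.9, 25.8, 28.2, 27.6, 26.5, 28.3]

# Count the signs of (system 1 - system 2); tied pairs, if any, are ignored.
plus = sum(a > b for a, b in zip(system_1, system_2))
minus = sum(a < b for a, b in zip(system_1, system_2))
n = plus + minus

# Upper tail P(X >= plus) for X ~ B(n, 1/2) under the null hypothesis.
upper_tail = sum(comb(n, r) for r in range(plus, n + 1)) / 2**n

print(plus, minus)           # 9 1
print(round(upper_tail, 4))  # 0.0107
print(upper_tail < 0.025)    # True: significant at the 5% level (two-tailed)
```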
3.4 General comments about the sign test
- Before the sign test can be applied we must be sure that the underlying distribution is continuous. Usually, the second score being higher than the first score counts as a plus sign. The null hypothesis $H_0$ is that the probability of obtaining each sign is the same, that is $p = \frac{1}{2}$. The alternative hypothesis $H_1$ may be that $p \neq \frac{1}{2}$, which gives a two-tailed test, or $p > \frac{1}{2}$ or $p < \frac{1}{2}$, each of which gives a one-tailed test.
- If $H_0$ is correct, the test involves the $B\left(n, \frac{1}{2}\right)$ distribution which, if $n$ is “large” and the conditions for the normal approximation hold, can be approximated by the $N\left(\frac{n}{2}, \frac{n}{4}\right)$ distribution. This approximation can save much tedious arithmetic and time.
- The sign test may not be as reliable as an equivalent parametric test since it relies only on the sign of the difference of each pair and not on the size of the difference. Where the assumptions of an equivalent parametric test are met, it is suggested that the parametric test is used instead.
- If the underlying distribution is normal, either the sign test or the $t$-test may be used to test the null hypothesis $H_0$ against the usual alternative, but the $t$-test will not give valid results when the data are non-normal. It can be shown that the $t$-test produces a smaller Type II error probability for one-sided tests and also for two-sided tests where the critical regions are symmetric. Hence we may claim that the $t$-test is superior to the sign test when the underlying distribution is normal.
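The one-tailed and two-tailed versions of the test described in these comments can be gathered into a single helper. This is a sketch under stated assumptions rather than a definitive implementation: the function name and its `alternative` parameter are hypothetical, ties are assumed to have been discarded already, and $p$ denotes the probability of a plus sign.

```python
from math import comb


def sign_test_p_value(n_plus, n_minus, alternative="two-sided"):
    """Exact p-value of the sign test under H0: p = 1/2.

    `alternative` names the alternative hypothesis about p:
    "two-sided" (p != 1/2), "greater" (p > 1/2) or "less" (p < 1/2).
    """
    n = n_plus + n_minus
    # Exact tails of B(n, 1/2) for the observed number of plus signs.
    upper = sum(comb(n, r) for r in range(n_plus, n + 1)) / 2**n
    lower = sum(comb(n, r) for r in range(0, n_plus + 1)) / 2**n
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Doubling the smaller tail is equivalent to comparing
        # that tail with half the significance level.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Counts from Example 4: 9 plus signs and 1 minus sign.
print(sign_test_p_value(9, 1))             # 0.021484375
print(sign_test_p_value(9, 1, "greater"))  # 0.0107421875
```

Comparing the doubled tail with 0.05 gives the same decision as comparing the single tail with 0.025, which is how the worked examples are set out.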