Figure 3 shows similar power functions for exponential distributions,
where the Wilcoxon rank-sum test, the Wilcoxon signed-ranks test, and the
modified t test on ranks were substituted for the parametric t tests.
Comparison of the three power functions reveals that the outcome is almost
the same as in the case of the corresponding parametric tests applied to
normally distributed data. The Wilcoxon signed-ranks test was superior to
the Wilcoxon rank-sum test for paired data, while the modified t test on
ranks was slightly superior to both.
Apparently the modified t test corrected for the correlation resulting
from pairing, while at the same time the transformation to ranks
counteracted non-normality. Figures 4, 5, and 6 indicate similar outcomes
for lognormal, chi-square, half-normal, and uniform distributions, using
several sample sizes, population correlations, and significance levels. Note
that the power functions were more widely separated for the smaller sample
sizes and converged for the larger sample sizes.
Table 8 compares Type I error probabilities when a sample
correlation is entered into equation (2) for each sample taken and when a
fixed population correlation is entered into the equation for every sample. The
first section of the table, for the normal distribution, gives the results of the t test
performed on the original scores. The remaining three sections, for non-normal
distributions, give the results of the t test on rank-transformed data. For
relatively small correlations and relatively large sample sizes, the Type I
error probabilities for both methods were about the same and close to the
nominal significance level.
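To make the comparison in Table 8 concrete, the following Python sketch estimates Type I error rates by simulation for normal data with N = 25 and ρ = .50, once with the sample r and once with the fixed ρ entered into the correction. It assumes that equation (2) has the usual form for the standard error of a difference between two correlated means, t = (M1 – M2)/√[(s1² + s2² – 2r·s1·s2)/N], referred to 2N – 2 degrees of freedom; the exact form in the original may differ. The function names `corrected_t` and `type_i_error`, the seed, and the replication count are hypothetical choices for illustration, and the critical value 2.0106 is the tabled two-tailed .05 value of t with 48 df.

```python
import math
import random

def corrected_t(x, y, rho=None):
    """Assumed form of equation (2): (M1 - M2) / sqrt((s1^2 + s2^2 - 2*r*s1*s2) / N).

    Uses the sample correlation r unless a fixed population value rho is supplied.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    r = cxy / (sx * sy) if rho is None else rho
    return (mx - my) / math.sqrt((vx + vy - 2 * r * sx * sy) / n)

def type_i_error(n=25, rho=0.5, crit=2.0106, reps=4000, seed=1):
    """Rejection rates under H0 (equal means) using the sample r and the fixed rho.

    crit defaults to the two-tailed .05 critical value of t with 2N - 2 = 48 df;
    it must be changed if n is changed.
    """
    rng = random.Random(seed)
    rej_r = rej_rho = 0
    for _ in range(reps):
        # Bivariate normal pairs with population correlation rho and equal means.
        z1 = [rng.gauss(0, 1) for _ in range(n)]
        z2 = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * b for a, b in zip(z1, z2)]
        if abs(corrected_t(z1, y)) > crit:
            rej_r += 1
        if abs(corrected_t(z1, y, rho)) > crit:
            rej_rho += 1
    return rej_r / reps, rej_rho / reps

rate_r, rate_rho = type_i_error()
print(f"sample r: {rate_r:.3f}   fixed rho: {rate_rho:.3f}   nominal: .050")
```

Under this assumed form, and with N – 1 divisors for the variances and the covariance, the denominator computed with the sample r equals the standard error of the difference scores, so the statistic coincides numerically with the paired-samples t; the two procedures then differ only in the degrees of freedom used for the critical value.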
SOME PRACTICAL IMPLICATIONS
For samples of size 25 or 50 from normal distributions, the modified t
test with a correction for correlation maintained Type I error rates close to
the significance level, increased power in the case of positive correlations,
and removed spurious increases in the probability of rejecting H0 in the case
of negative correlations. The power advantage of this test over the paired-samples
t test is about what one would expect from the difference in degrees
of freedom. The difference became less marked as sample sizes increased to
100 and 400, presumably because the difference in the critical values of the
t statistic for N – 1 and 2N – 2 degrees of freedom decreases as N increases.
Nevertheless, the power of the modified test was equal to that of the paired-samples
test for the larger sample sizes.
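The shrinking gap between the two sets of critical values can be checked directly. This sketch computes two-tailed .05 critical values of t for N – 1 and 2N – 2 degrees of freedom at the sample sizes used in the study, using only the Python standard library: the t density is integrated with Simpson's rule and the quantile is found by bisection. This numerical approach is an illustrative stand-in for a statistics library, not the procedure used in the study, and the function names are hypothetical.

```python
import math

def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: 0.5 minus the integral of the t density on [0, x] (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_crit(alpha, df):
    """Two-tailed critical value of t, found by bisection on the tail probability."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rows = []
for n in (8, 10, 15, 25, 50, 100, 400):
    c_paired, c_modified = t_crit(0.05, n - 1), t_crit(0.05, 2 * n - 2)
    rows.append((n, c_paired, c_modified))
    print(f"N = {n:3d}  df {n - 1:3d}: {c_paired:.4f}  "
          f"df {2 * n - 2:3d}: {c_modified:.4f}  gap {c_paired - c_modified:.4f}")
```

With N = 8 the gap is roughly .22 (2.365 versus 2.145), whereas with N = 400 it is well under .01, which is consistent with the convergence of the power functions described above.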
For small sample sizes (N = 8, 10, and 15), the same differences in power
functions were evident, but the interpretation of these differences is
problematic, because the Type I error rates of the modified t test were
somewhat higher than the nominal significance levels. Generally, the Type I
error rate was about .060 for the .05 significance level and about .014 for
the .01 significance level. Possibly these disparities resulted from variability
of the sample correlation coefficient for small N.
The elevation, rather than a depression, of the probability of rejecting
H0 can be explained by the left skewness of the distribution of the sample
correlation coefficient for positive values of the population correlation. For
those positive values of ρ, proportionately more large values of the sample r
entered the denominator of equation (2), shrinking it and inflating the t
statistic. However, as sample size increased, the distribution of the sample r
became more nearly symmetrical, and the inflation was smaller.
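The mechanism described above is simple arithmetic on the denominator of the corrected statistic. The sketch below holds hypothetical summary statistics fixed (N = 25, s1 = s2 = 1, mean difference .30) and varies only the correlation entered into the correction. It assumes that equation (2) has the form t = (M1 – M2)/√[(s1² + s2² – 2r·s1·s2)/N]; the helper name `corrected_t_from_summary` and all numbers are illustrative.

```python
import math

def corrected_t_from_summary(mean_diff, s1, s2, r, n):
    """Assumed form of equation (2), computed from summary statistics."""
    return mean_diff / math.sqrt((s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2) / n)

# Hypothetical summary statistics: N = 25, s1 = s2 = 1, mean difference .30.
ts = {}
for r in (0.40, 0.50, 0.60, 0.75, 0.90):
    ts[r] = corrected_t_from_summary(0.30, 1.0, 1.0, r, 25)
    print(f"r = {r:.2f}  t = {ts[r]:.3f}")
```

With unit standard deviations the denominator reduces to √[2(1 – r)/N], so each increase in r shrinks it and inflates t. For example, moving from r = .50 to r = .75 raises t from 1.50 to about 2.12, which is why sample values of r that overshoot ρ push the statistic upward.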
The skewness is evident in Figure 7, which shows distributions of
the sample correlation coefficient under the conditions represented in Figure
1 and Table 2, when the sample sizes were 25 and 100 and the population
correlation was .50 and .75. The sample correlations were substantially
left-skewed for the smaller sample size and became more symmetrical and
less variable when the sample size increased. For N = 25, there was
considerable overlap of the two distributions of sample values for
population correlations of .50 and .75, whereas for N = 100, the distributions
were more widely separated.
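The left skewness of the sample r, and the narrowing of its distribution as N grows, can be reproduced with a small simulation. This sketch draws bivariate normal samples with N = 25 and N = 100 and ρ = .50 and .75, then reports the mean, standard deviation, and moment skewness of the sample r. The function names, seed, and replication count are arbitrary choices, and the output will differ in detail from Figure 7.

```python
import math
import random

def sample_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe(values):
    """Mean, standard deviation, and moment skewness (m3 / m2^1.5)."""
    k = len(values)
    m = sum(values) / k
    m2 = sum((v - m) ** 2 for v in values) / k
    m3 = sum((v - m) ** 3 for v in values) / k
    return m, math.sqrt(m2), m3 / m2 ** 1.5

def r_distribution(n, rho, reps=2000, seed=3):
    """Simulate the sample r for bivariate normal samples of size n."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        z1 = [rng.gauss(0, 1) for _ in range(n)]
        z2 = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * b for a, b in zip(z1, z2)]
        rs.append(sample_r(z1, y))
    return describe(rs)

results = {}
for n in (25, 100):
    for rho in (0.50, 0.75):
        results[(n, rho)] = r_distribution(n, rho)
        mean, sd, skew = results[(n, rho)]
        print(f"N = {n:3d}  rho = {rho:.2f}  mean r = {mean:.3f}  "
              f"sd = {sd:.3f}  skewness = {skew:.2f}")
```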
Figure 8 plots relative frequency distributions of the values of the t
statistic. All four graphs are for a normal distribution with N = 25. The first
distribution, at the top, shows the independent-samples t statistic when
ρ = 0. The second shows the reduced variance of the same statistic
when ρ = .50. The remaining two are for the corrections for correlation
based on the sample r and on the population ρ. The two distributions of the
corrected statistics have nearly the same variance, and both restore the
distribution to nearly the shape of the one in the top graph. Means and
variances of the distributions of the independent-samples t statistic, the two
corrections, and the paired-samples t statistic are shown in Table 9 for
various sample sizes and population correlations.
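A rough simulation sketch of the four distributions in Figure 8 follows. For normal data with N = 25 it records the independent-samples t when ρ = 0 and when ρ = .50, and the corrected statistic with the sample r and with the population ρ, then prints the mean and variance of each. The corrected statistic is assumed to have the form (M1 – M2)/√[(s1² + s2² – 2r·s1·s2)/N]; the function names, seed, and replication count are hypothetical.

```python
import math
import random

def summary(x, y):
    """Mean difference, the two variances, and the covariance (N - 1 divisors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return mx - my, vx, vy, cxy

def moments(values):
    """Mean and sample variance of a list of statistics."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

def t_distributions(n=25, rho=0.5, reps=3000, seed=11):
    rng = random.Random(seed)
    stats = {"independent (rho = 0)": [], "independent (correlated)": [],
             "corrected with r": [], "corrected with rho": []}
    for _ in range(reps):
        z1 = [rng.gauss(0, 1) for _ in range(n)]
        z2 = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * b for a, b in zip(z1, z2)]
        # Independent-samples t on two uncorrelated groups.
        d0, vx0, vy0, _ = summary(z1, z2)
        stats["independent (rho = 0)"].append(d0 / math.sqrt((vx0 + vy0) / n))
        # Same statistic, but on groups with population correlation rho.
        d, vx, vy, cxy = summary(z1, y)
        stats["independent (correlated)"].append(d / math.sqrt((vx + vy) / n))
        # Corrections: 2 * cxy equals 2 * r * s1 * s2 for the sample r.
        stats["corrected with r"].append(d / math.sqrt((vx + vy - 2 * cxy) / n))
        stats["corrected with rho"].append(
            d / math.sqrt((vx + vy - 2 * rho * math.sqrt(vx * vy)) / n))
    return {label: moments(values) for label, values in stats.items()}

t_results = t_distributions()
for label, (mean, var) in t_results.items():
    print(f"{label:26s} mean = {mean:+.3f}  variance = {var:.3f}")
```

Because the numerator variance is 2σ²(1 – ρ)/N while the uncorrected denominator estimates 2σ²/N, the variance of the uncorrected statistic for correlated groups should be roughly (1 – ρ) times that of the uncorrelated case, here about one half; both corrected statistics should come out near the uncorrelated value.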
For non-normal distributions, the results were similar. The Wilcoxon
signed-ranks test is related to the Wilcoxon-Mann-Whitney rank-sum test in
the same way as the paired-samples t test is related to the independent-samples
t test. However, there is no version of the Wilcoxon-Mann-Whitney test
involving correlation coefficients that corresponds to the z test
for correlated samples. Since the Student t test with a rank transformation
and the Wilcoxon-Mann-Whitney test are essentially equivalent, the modified t test on
ranks appears suitable in the case of paired data. This test preserved Type I
error rates and increased power for the larger sample sizes. Again, the
probability of rejecting H0 was elevated above the nominal
significance level for the smaller sample sizes.
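The modified t test on ranks can be sketched as a rank transformation followed by the correction for correlation. The text does not spell out the ranking scheme, so this sketch assumes that all 2N scores are ranked jointly, with tied values receiving average ranks, and that the correction then uses the sample correlation between the paired ranks in an assumed equation (2) of the form (M1 – M2)/√[(s1² + s2² – 2r·s1·s2)/N]. The names `average_ranks`, `corrected_t`, and `modified_t_on_ranks` and the data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def corrected_t(x, y):
    """Assumed form of equation (2), using the sample correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    r = cxy / math.sqrt(vx * vy)
    return (mx - my) / math.sqrt((vx + vy - 2 * r * math.sqrt(vx * vy)) / n)

def modified_t_on_ranks(x, y):
    """Rank the pooled 2N scores jointly, then apply the correction to the paired ranks."""
    n = len(x)
    ranks = average_ranks(list(x) + list(y))
    return corrected_t(ranks[:n], ranks[n:])

x = [3.1, 4.7, 2.2, 5.9, 4.0, 3.3]  # hypothetical paired scores
y = [2.8, 4.1, 2.5, 5.0, 3.6, 3.4]
print(f"modified t on ranks = {modified_t_on_ranks(x, y):.4f}")
```

Because only the ranks enter the statistic, any strictly increasing transformation of the raw scores (for example, a logarithm of positive data) leaves the result unchanged, which is the property that makes the rank version attractive for non-normal data.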
The modified t test on ranks performed about the same as both the
paired-samples t test and the Wilcoxon signed-ranks test for small and
moderate sample sizes when the population correlation was used in the
correction formula. However, for small sample sizes, the Type I error
rates of the modified test were elevated when sample correlations were used.
For large sample sizes (100 or more), all the tests performed about the
same.
One might question, therefore, whether the advantage of acquiring
more degrees of freedom is enough to outweigh the disadvantage of
inflation of the Type I error rate for small sample sizes. Perhaps in some
special circumstances the modified test could be advantageous. First, under
some conditions, the population correlation coefficient between two paired
groups may be known in advance. In a before-after experimental design,
theory or previous research may have established the correlation between
the pairs. In that case, the known value of ρ can be substituted into equation
(2), which eliminates the variability of the sample r, as suggested by the
results in Table 8. For small sample sizes, the increase in power could
be substantial. Although these special circumstances are unlikely in
practical research, the modified t test can be a useful alternative to have
available. Second, in the case of some non-normal data, an assumption of
the Wilcoxon signed-ranks test, symmetry of the difference scores, may not
be satisfied. In that case, it is reasonable to employ the modified t test on
ranks, which appears to be effective.
More recently, many additional statistical tests have been developed
that are more accurate and more powerful than the traditional parametric
and nonparametric methods listed in Table 1 (see, for example, Huber,
1996; Wilcox, 2003). The estimation of correlation has also improved in
recent years (see, for example, Rousseeuw & Leroy, 1987; Wilcox &
Muska, 2002; Zimmerman, Zumbo, & Williams, 2003). The modified t test
of the present study is not a substitute for the best current statistical tests
available, but is provided because of its theoretical interest and because it
fills gaps in the classification of two-sample tests of location. Under
conditions where limited computing resources are available, the correction
for correlation could be useful as a practical method.