Practical use of the correlation coefficient
A simple application of the correlation coefficient can be
illustrated using data from a sample of 780 women
attending their first antenatal clinic (ANC) visit. We can
expect a positive linear relationship between maternal age in
years and parity because parity cannot decrease with age, but
we cannot predict the strength of this relationship. The task
is one of quantifying the strength of the association. That
is, we are interested in the strength of the relationship between
the two variables rather than its direction, since the direction is
obvious in this case. Maternal age is continuous and usually
skewed while parity is ordinal and skewed. With these scales
of measurement for the data, the appropriate correlation
coefficient to use is Spearman’s. The Spearman’s coefficient
is 0.84 for these data. In this case, maternal age is strongly
correlated with parity, i.e. has a high positive correlation
(Table 1). The Pearson’s correlation coefficient for these
variables is 0.80. In this case the two correlation coefficients
are similar and lead to the same conclusion. However, in
some cases the two may be very different, leading to different
statistical conclusions. For example, in the same group of
women the Spearman’s correlation between haemoglobin
level and parity is 0.3 while the Pearson’s correlation is
0.2. Here the two coefficients may lead to different
statistical inferences: a correlation coefficient
of 0.2 is considered to be negligible correlation while a
correlation coefficient of 0.3 is considered as low positive
correlation (Table 1), so it would be important to use the
most appropriate one. The most appropriate coefficient in
this case is the Spearman’s because parity is skewed.
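The comparison above can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis used for the study: the age and parity values are hypothetical, the function names are ours, and the cut-offs in `describe` are an assumed rule of thumb chosen to agree with the values quoted in the text (Table 1 remains the authority). Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks; in practice `scipy.stats.spearmanr` and `scipy.stats.pearsonr` compute the same quantities.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def describe(r):
    """Label the size of r using ASSUMED cut-offs (see Table 1 for the source)."""
    size = abs(r)
    if size >= 0.9:
        return "very high"
    if size >= 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "low"
    return "negligible"

# Hypothetical illustrative values, NOT the ANC study data.
age = [18, 19, 21, 22, 24, 26, 29, 31, 35, 42]
parity = [0, 0, 1, 1, 1, 2, 3, 3, 5, 9]

for name, r in (("Spearman", spearman(age, parity)),
                ("Pearson", pearson(age, parity))):
    print(f"{name}: r = {r:.2f} ({describe(r)})")
```

Because Spearman's coefficient depends only on ranks, it is unaffected by the long right tail in parity, whereas Pearson's coefficient is sensitive to it. That is why the two can fall on different sides of a cut-off, as with 0.3 and 0.2 above.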
In another dataset of 251 adult women, age and weight
were log-transformed. The variables were transformed to
make them approximately normally distributed so that
Pearson’s correlation coefficient could be used. We then analysed the data
for a linear association between log of age (agelog) and log
of weight (wlog). Both variables are approximately normally
distributed on the log scale. In this case Pearson’s correlation
coefficient is more appropriate. The coefficient is 0.184. This
shows that there is negligible correlation between age
and weight on the log scale (Table 1).
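The log-transform step can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the ages and weights are hypothetical, `log_pearson` is our own helper name, and the natural logarithm is assumed (the base does not affect the coefficient, because changing base only rescales each variable by a constant).

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_pearson(x, y):
    """Pearson's correlation between log(x) and log(y); values must be > 0."""
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("log transformation requires strictly positive values")
    xlog = [math.log(v) for v in x]
    ylog = [math.log(v) for v in y]
    return pearson(xlog, ylog)

# Hypothetical illustrative values, NOT the dataset of 251 women.
age_years = [21, 25, 28, 33, 37, 41, 46, 52, 58, 64]
weight_kg = [58, 71, 54, 66, 80, 59, 75, 62, 69, 73]

r = log_pearson(age_years, weight_kg)  # correlation of agelog with wlog
print(f"Pearson r on the log scale: {r:.3f}")
```

The coefficient obtained this way describes the linear association between the logged variables, so it should be reported as a correlation on the log scale, as in the text above.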