5. Discussion
As discussed above, no previous work has compared these indices using Monte
Carlo methods, leaving a gap in the literature regarding their performance under
a variety of conditions. The results described here suggest that, under the
conditions simulated in this study, the four distance measures perform very
similarly in correctly classifying simulated observations into two clusters based
on a set of dichotomous variables. In contrast, clustering on the raw data is
associated with somewhat lower accuracy unless the sample size is large or the
variation of the underlying latent trait is low. The similarity in performance of
the four distance measures appears
to hold true regardless of the level of group separation, the variation in the
underlying latent variable, the number of variables included in the study and the
size of the sample. Across all measures, the clustering solutions are more accurate
for greater group separation, lower variance in the latent trait, more variables
and a larger sample size. With respect to the real data, the five methods did not
agree perfectly in how they clustered observations. Although all five identified
three clusters, the Jaccard, Russell/Rao and Dice coefficients produced very
similar solutions, with clusters that were clearly differentiated on the
achievement measures. In contrast, the matching coefficient and raw data
approaches yielded somewhat different results, with less well-defined groups.
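One conceptual difference among the four indices is how they treat joint absences. For two 0/1 response vectors, let a be the number of variables where both are 1, b and c the two kinds of mismatch, and d the number where both are 0. The Python sketch below is a minimal illustration assuming the standard textbook coefficients (Jaccard a/(a+b+c), Dice 2a/(2a+b+c), Russell/Rao a/n, simple matching (a+d)/n) and distance = 1 - similarity. The function names binary_counts and binary_distances are hypothetical and are not taken from the software used in this study.

```python
from __future__ import annotations

import math
from typing import Sequence


def binary_counts(x: Sequence[int], y: Sequence[int]) -> tuple[int, int, int, int]:
    """Return the 2x2 counts (a, b, c, d) for two 0/1 response vectors.

    a: both 1; b: x is 1, y is 0; c: x is 0, y is 1; d: both 0.
    (Hypothetical helper; standard definitions assumed.)
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def binary_distances(x: Sequence[int], y: Sequence[int]) -> dict[str, float]:
    """Distances computed as 1 - similarity for four binary coefficients."""
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("vectors must not be empty")
    # Jaccard and Dice are undefined when neither vector contains a 1.
    jaccard = a / (a + b + c) if (a + b + c) else math.nan
    dice = 2 * a / (2 * a + b + c) if (a + b + c) else math.nan
    return {
        "jaccard": 1 - jaccard,
        "dice": 1 - dice,
        "russell_rao": 1 - a / n,     # joint absences enter only the denominator
        "matching": 1 - (a + d) / n,  # joint absences count as agreements
    }


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1, 0]
    y = [1, 0, 0, 0, 1, 1]
    print(binary_distances(x, y))
    # Padding both vectors with shared zeros (joint absences).
    print(binary_distances(x + [0] * 4, y + [0] * 4))
```

In this toy example, padding both vectors with shared zeros leaves the Jaccard and Dice distances unchanged, increases the Russell/Rao distance, and decreases the matching distance, because only the matching coefficient treats joint absences as agreement.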
The relative dearth of previous research of this type leaves little against which
to compare these results. However, there has been some discussion of the expected
performance of these indices,
given their conceptual bases. Hall (1969) noted that the Dice and