of a study in terms of the sample size and the number of variables is fairly small
with respect to clustering accuracy.
Finally, it seems clear that when group separation on the latent trait is relatively
low, the Jaccard, Dice and Russell/Rao measures perform similarly to one another, and
better than either the Matching coefficient or the raw data. Indeed, even in the case of low
group separation, these three measures are able to correctly cluster over 60% of
the subjects.
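To make the contrast among the four measures concrete, the following minimal sketch computes each coefficient for a pair of dichotomous response vectors from the cells of the 2x2 agreement table. The function name and the example vectors are hypothetical, and the formulas are the standard textbook definitions, which may differ in notation from those used elsewhere in this study.

```python
def binary_similarities(x, y):
    """Return four similarity coefficients for two equal-length 0/1 vectors."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # Cells of the 2x2 agreement table.
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # joint presence
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # y only
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # joint absence
    n = a + b + c + d
    mismatch_or_presence = a + b + c
    return {
        # Jaccard and Dice drop joint absences entirely.
        "jaccard": a / mismatch_or_presence if mismatch_or_presence else 0.0,
        "dice": 2 * a / (2 * a + b + c) if mismatch_or_presence else 0.0,
        # Russell/Rao keeps d in the denominator but not in the numerator.
        "russell_rao": a / n,
        # Simple matching treats joint absences as agreement.
        "matching": (a + d) / n,
    }


# Hypothetical response patterns for two subjects on six items.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
sims = binary_similarities(x, y)
```

Only the Matching coefficient counts a joint absence as agreement in its numerator, which is one structural property that sets it apart from the other three measures.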
As with any research, there are weaknesses in this study that should be
taken into account when interpreting the results. First of all, only one clustering
algorithm, Ward’s, was used. In order to expand upon these results, the simulation
could be replicated using other clustering algorithms. In addition, the distance
measures selected for inclusion in this study are of a particular class, albeit
one identified by several authors as among the most useful for clustering with dichotomous
data. Therefore, it would be worthwhile to compare these approaches
to other measures of distance that are calculated fundamentally differently, such
as Holley and Guilford’s G index. Finally, it might be worthwhile to expand
the parameters used in the 2PL model that simulated the data. For example, a
larger difference between the two levels of latent trait variance could be used, or a
sample size smaller than 240, so that a true lower bound on the sample size needed
for adequate performance could be identified.
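As an illustration of how such an extended simulation might be set up, the sketch below generates dichotomous responses from a 2PL model for two groups and a total sample size below 240. The item parameters, group means, standard deviations and group sizes are hypothetical placeholders rather than the values used in this study, and the scaling constant of 1.7 that appears in some formulations of the model is omitted.

```python
import math
import random


def p_response(theta, a, b):
    """2PL probability of a 1 response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_2pl(n_subjects, item_params, theta_mean, theta_sd, rng):
    """Simulate one group's 0/1 response vectors under the 2PL model."""
    data = []
    for _ in range(n_subjects):
        # Draw the subject's latent trait from a normal distribution.
        theta = rng.gauss(theta_mean, theta_sd)
        row = [1 if rng.random() < p_response(theta, a, b) else 0
               for a, b in item_params]
        data.append(row)
    return data


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical (discrimination, difficulty) pairs for five items.
items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]

# Two hypothetical groups of 60 subjects each (120 in total, below 240).
low_group = simulate_2pl(60, items, theta_mean=-0.5, theta_sd=1.0, rng=rng)
high_group = simulate_2pl(60, items, theta_mean=0.5, theta_sd=1.0, rng=rng)
responses = low_group + high_group
```

Varying `theta_sd`, the gap between the two `theta_mean` values, or the group sizes across replications would allow the conditions described above to be explored.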