The wide use of nearest-neighbor and hierarchical methods (e.g., decision
trees and agglomerative hierarchical clustering) in both
classification and clustering supports this
viewpoint. In addition, both classification and clustering
must address similar issues (e.g., feature selection, scalability,
and missing values), and many solutions to these issues
can be used in both tasks with little modification.
For example, when computing the goodness of an attribute
for classification or clustering, the main difference is that the former
usually considers only class information, while the latter
takes all attribute information into account. As a demonstration,
Fig. 3 of Section 4 shows that a classification tree
(or decision tree) built using class information can be identical
to a clustering tree (or monothetic divisive tree, in the numerical
taxonomy literature) built without class information.
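The contrast between the two attribute-goodness criteria can be sketched in a few lines. The sketch below is illustrative only (function names and the specific criteria are my own choices, not taken from the text): the supervised criterion is information gain over class labels, and the unsupervised one is reduction in total within-group variance across all attributes, one common monothetic divisive criterion.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def supervised_gain(values, labels, threshold):
    """Classification-tree view: score a split `values <= threshold`
    by information gain, using only the class labels."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child

def unsupervised_gain(rows, attr, threshold):
    """Clustering-tree view: score the same kind of split by the
    reduction in within-group variance over ALL attributes,
    with no class labels involved."""
    def sse(group):
        if not group:
            return 0.0
        total = 0.0
        for d in range(len(group[0])):
            col = [r[d] for r in group]
            mean = sum(col) / len(col)
            total += sum((x - mean) ** 2 for x in col)
        return total
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return sse(rows) - (sse(left) + sse(right))
```

On data where the attribute separates the classes and the clusters equally well, both criteria favor the same split, which is the situation in which the two trees coincide; they diverge when class structure and attribute structure disagree.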