Thus far, we have largely ignored how to choose K. In hierarchical and K-means
clustering, K represents the number of clusters. In K nearest neighbors smoothing,
K represents the number of nearest neighbors used. Although these two uses
of K are fundamentally different, both turn out to be equally challenging
to set properly in a fully automated way. Choosing K is one of the
most difficult issues in clustering, since there is no single good solution.
No magical formula exists that will predict the optimal number of clusters
to use in every possible situation. Instead, the best choice of K largely depends
on the task and data set being considered. Therefore, K is most often chosen experimentally.
In some cases, the application will dictate the number of clusters to use. This,
however, is rare. Most of the time, the application offers no clues as to the best
choice of K. In fact, even the range of values for K to try might not be obvious.
Should 2 clusters be used? 10? 100? 1,000? There is no better way to find a
good setting of K than to run experiments that evaluate the quality of the
resulting clusters for various values of K.
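As an illustration of this kind of experiment, the following sketch sweeps a range of candidate values for K with K-means and scores each clustering. It assumes scikit-learn is available and uses the silhouette coefficient as the quality measure; both the library and the measure are just one possible choice, and the synthetic data set stands in for whatever data is actually under study.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    # Synthetic data standing in for the data set under study.
    X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

    # Run K-means for a range of candidate K values and score each result.
    # The silhouette coefficient is one of many internal quality measures.
    for k in range(2, 11):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(f"K = {k:2d}  silhouette = {silhouette_score(X, labels):.3f}")

A value of K at which the quality score peaks is a reasonable candidate, although, as noted above, no single measure settles the question for every task and data set.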