We present three k-means clustering algorithms: the
Forgy/Lloyd algorithm, the MacQueen algorithm and the
Hartigan & Wong algorithm. We chose these three
algorithms because they are the most widely used k-means
clustering techniques and because each has slightly different
goals and therefore produces different results. To use any of the three, you
first need to specify how many clusters are present in your
data. As this number is often unknown, several trials with
different numbers of clusters will usually be necessary to find the best one. As a
first step, it is often useful to standardize the data if the
components of the cases are not on the same scale.
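The workflow just described (standardize, then rerun k-means for several candidate numbers of clusters) can be sketched in Python. This is a minimal, hypothetical illustration, assuming z-score standardization and the Forgy/Lloyd variant with Forgy initialization (k randomly chosen cases as the starting centres). The function names `standardize` and `lloyd_kmeans`, the made-up data in `raw`, and the use of the total within-cluster sum of squares (WSS) to compare runs are our own choices, not part of the original text.

```python
import random
import statistics


def standardize(data):
    """Z-score each column (hypothetical helper): subtract its mean, divide by its population std dev."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]


def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(data, k, max_iter=100, seed=0):
    """Sketch of Forgy initialisation followed by Lloyd iterations. Returns (labels, centres, wss)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(data, k)]  # Forgy: k random cases as centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in data]
        if new_labels == labels:
            break  # no case changed cluster: converged
        labels = new_labels
        # Update step: move each centre to the mean of the cases assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [statistics.fmean(c) for c in zip(*members)]
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(data, labels))
    return labels, centres, wss


# Hypothetical data: two features on very different scales.
raw = [[1.0, 100.0], [1.2, 110.0], [0.8, 95.0],
       [5.0, 400.0], [5.3, 420.0], [4.9, 390.0]]
data = standardize(raw)

# Try several numbers of clusters and compare the WSS of each run.
for k in range(1, 4):
    _, _, wss = lloyd_kmeans(data, k)
    print(k, round(wss, 3))
```

The optimal WSS never increases as k grows, so the aim is not to minimize it but to look for the k beyond which extra clusters stop reducing it substantially. Because Lloyd's algorithm only finds a local optimum, rerunning each k with several seeds is also advisable.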
No single algorithm is best in all cases. The choice of the
most suitable algorithm depends on the characteristics of the
dataset (its size and the number of variables per case). Jain, Duin &
Mao (2000) even suggest trying several different clustering
algorithms to gain the best possible understanding of the
dataset.
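In practice, following this advice means running more than one variant on the same data and comparing the results. The sketch below is a minimal, hypothetical Python illustration, not a reference implementation. It follows the update rule usually attributed to MacQueen: the first k cases seed the centres, and each further case is assigned to its nearest centre, which is immediately moved to the running mean of its members. Unlike Lloyd's batch updates, the outcome therefore depends on the order of the cases. The example feeds the same six made-up points (`interleaved` and `grouped` are our own names) in two orders. Hartigan & Wong's algorithm, which moves a case to another cluster only when doing so lowers the total within-cluster sum of squares, is more involved and is not sketched here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def macqueen_kmeans(data, k):
    """Single-pass MacQueen-style k-means (sketch). Returns (labels, centres, wss)."""
    # Seed: the first k cases become the initial centres, each with one member.
    centres = [list(p) for p in data[:k]]
    counts = [1] * k
    # Online pass: assign each remaining case and move its centre right away.
    for p in data[k:]:
        j = min(range(k), key=lambda i: sq_dist(p, centres[i]))
        counts[j] += 1
        # Incremental mean update: c <- c + (p - c) / n_j
        centres[j] = [c + (x - c) / counts[j] for c, x in zip(centres[j], p)]
    # Final pass: assign every case to its nearest (now fixed) centre.
    labels = [min(range(k), key=lambda i: sq_dist(p, centres[i])) for p in data]
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(data, labels))
    return labels, centres, wss


# Hypothetical data: two well-separated groups, already on the same scale.
group_a = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
group_b = [[5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
interleaved = [case for pair in zip(group_a, group_b) for case in pair]
grouped = group_a + group_b

for name, cases in [("interleaved", interleaved), ("grouped", grouped)]:
    labels, centres, wss = macqueen_kmeans(cases, 2)
    print(name, labels, [[round(v, 3) for v in c] for c in centres], round(wss, 3))
```

With the interleaved order each group receives its own seed, and the centres end up at the group means. With the grouped order both seeds come from the first group. The final reassignment still separates the two groups, but the second centre is left between them and the WSS is far larger. Practical implementations often repeat the pass until no case changes cluster, which reduces but does not remove this sensitivity to case order.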
