2.4.1 k-Means
k-Means clustering is a partitioning method. It partitions a data set of
N items into k disjoint subsets $S_j$, each containing $N_j$ items, so that the
items in each subset are as close to each other as possible according to a given
distance measure. Each cluster in the partition is defined by its $N_j$ members
and by its centroid $\lambda_j$. The centroid of each cluster is the point that
minimizes the sum of distances from all items in that cluster.
Thus, we can define the k-means algorithm as an iterative process to
minimize $E = \sum_{j=1}^{k} \sum_{n \in S_j} d(x_n, \lambda_j)$, where $x_n$ is a
vector representing the n-th item, $\lambda_j$ is the centroid of the items in
$S_j$, and $d$ is the distance measure. The k-means
algorithm moves items between clusters until $E$ cannot be decreased further.
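
The iterative process described above can be illustrated with a minimal Python sketch. It is not taken from the text and rests on several assumptions: d is taken to be the squared Euclidean distance (for which the component-wise mean is the minimizing centroid), the initial centroids are k randomly chosen items, and items are reassigned in batch (the common Lloyd-style variant) rather than one at a time. The names k_means, sq_dist, mean, max_iter and seed are hypothetical and chosen only for this illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumed choice for the measure d)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean; minimizes the summed squared distance to points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(items, k, max_iter=100, seed=0):
    """Return (labels, centroids, E) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(items, k)  # assumption: start from k random items
    labels = None
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in items
        ]
        if new_labels == labels:
            break  # no item changed cluster, so E cannot decrease further
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [x for x, lab in zip(items, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    # Objective E: summed distance of every item to its cluster centroid.
    E = sum(sq_dist(x, centroids[lab]) for x, lab in zip(items, labels))
    return labels, centroids, E


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids, E = k_means(data, k=2)
    print(labels, centroids, E)
```

Neither step of this sketch can increase E, so the loop stops once no item changes cluster. The result is in general only a local minimum of E and depends on the initial centroids, which is why the sketch exposes a seed.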