Many clustering algorithms try to minimize a function that measures the quality
of the clustering. Such a quality function is often referred to as the objective function, so clustering can be viewed as an optimization problem: the ideal clustering
algorithm would consider all possible partitions of the data and output the partition that minimizes the quality function. But the corresponding optimization problem
is NP-hard, so many algorithms resort to heuristics (e.g., the k-means algorithm
uses only local optimization procedures, which can end in local minima). The
main point is that clustering is a difficult problem for which finding optimal solutions is often not feasible. For that same reason, selection of the particular clustering
algorithm and its parameters (e.g., similarity measure) depends on many factors, including the characteristics of the data. In the following paragraphs we describe the
k-means clustering algorithm and some of its alternatives.
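
To make the optimization view concrete, the following minimal Python sketch contrasts an exhaustive search over all cluster assignments with a k-means-style local optimization on a tiny one-dimensional data set. It rests on several assumptions that are not part of the text above: the quality function is taken to be the within-cluster sum of squared distances to the cluster mean, the data are one-dimensional, and the names sse, exhaustive_clustering, and lloyd are hypothetical helpers chosen for illustration.

```python
from itertools import product


def sse(points, labels, k):
    """Assumed quality function: within-cluster sum of squared distances to the mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = sum(members) / len(members)
            total += sum((p - mean) ** 2 for p in members)
    return total


def exhaustive_clustering(points, k):
    """Evaluate every assignment of n points to k labels (k**n candidates)."""
    best = min(product(range(k), repeat=len(points)),
               key=lambda labels: sse(points, labels, k))
    return list(best), sse(points, best, k)


def lloyd(points, centers, iterations=100):
    """Local optimization: alternate nearest-center assignment and mean update."""
    centers = list(centers)
    k = len(centers)
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: (p - centers[c]) ** 2)
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = sum(members) / len(members)
    return labels, sse(points, labels, k)


points = [0, 1, 10, 11, 20, 21]  # three well-separated pairs (toy data)

optimal_labels, optimal_cost = exhaustive_clustering(points, k=3)
poor_labels, poor_cost = lloyd(points, centers=[0, 1, 10])   # poor starting centers
good_labels, good_cost = lloyd(points, centers=[0, 10, 20])  # favorable starting centers

print(optimal_cost, poor_cost, good_cost)  # 1.5 101.0 1.5
```

With the poor starting centers, the local procedure converges to an assignment that no single reassignment step improves, yet its objective value is far worse than the one found by exhaustive search. The exhaustive search is only practical here because the data set is tiny: the number of candidate assignments grows as k to the power n.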