K-means is probably the most celebrated and widely used clustering technique;
hence it is the best representative of the class of iterative centroid-based divisive
algorithms. On the other hand, PDDP is a recently proposed technique ([4-7]). It is
representative of the non-iterative techniques based upon the Singular Value
Decomposition (SVD) of a matrix built from the data-set.
The objective of this paper is twofold:
• compare the clustering performance of bisecting K-means and PDDP;
• analyze the dynamic behavior of the K-means iterative algorithm.
In the existing literature, both these issues have been considered only empirically.
The performance of PDDP and K-means have been recently studied, and have been
reported to be somehow similar, on the basis of a few application examples ([4-7]). As
for K-means behavior, the main theoretical result known so far is [16], where it is shown
that the K-means iterative procedure is guaranteed to converge; however, nothing is said
about “where” and “how” it converges.
The main contribution of this work is to provide a simple mathematical
explanation of some features of K-means and PDDP. This is done under the restrictive
assumption that the data are uniformly distributed within a 2-dimensional ellipsoid. The
main results here obtained can be summarized as follows:
• when the number of data-points tends to infinity, K-means and PDDP converge to
the same solution;
• when the number of data-points tends to infinity, the iterative bisecting K-means
algorithm is characterized by 2 stationary-points: one is an unstable equilibrium, one
is a stable equilibrium point;
The paper is organized as follows: in Section 2 K-means and PDDP are concisely
recalled and discussed; in Section 3 they are analyzed when the number of data-points
tends to infinity, whereas in Section 4 an empirical analy