The performance evaluation of clustering results is commonly
carried out using Normalized Mutual Information [28],
Rand Index [29] or Dunn Index [30].
The Normalized Mutual Information (NMI) quantifies the
agreement between the predicted cluster labels (PCL) and the desired
cluster labels (DCL) (i.e., the ground truth). NMI is computed
as the mutual information between PCL and DCL, normalized
by the sum of the respective entropies.
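As an illustration of the NMI computation described above, the following is a minimal sketch in plain Python. It assumes the common variant NMI = 2 I(PCL; DCL) / (H(PCL) + H(DCL)); the factor of 2 is an assumption made here so that the score reaches 1 for a perfect match, and it is not stated in the text. The function names (entropy, mutual_information, nmi) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(pcl, dcl):
    """Mutual information (in nats) between two equally long labelings."""
    n = len(pcl)
    joint = Counter(zip(pcl, dcl))   # co-occurrence counts of (PCL, DCL) pairs
    p_counts = Counter(pcl)          # marginal counts of predicted labels
    d_counts = Counter(dcl)          # marginal counts of desired labels
    mi = 0.0
    for (p, d), c in joint.items():
        # p(p, d) * log( p(p, d) / (p(p) * p(d)) ), written with raw counts
        mi += (c / n) * log(c * n / (p_counts[p] * d_counts[d]))
    return mi


def nmi(pcl, dcl):
    """NMI under the assumed normalization 2 * I / (H(PCL) + H(DCL))."""
    if len(pcl) != len(dcl) or len(pcl) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    h_sum = entropy(pcl) + entropy(dcl)
    if h_sum == 0.0:
        # Both labelings consist of a single cluster; returning 1.0 is a
        # convention assumed for this sketch.
        return 1.0
    return 2.0 * mutual_information(pcl, dcl) / h_sum
```

Because NMI depends only on how the samples are grouped, permuting the label names leaves the score unchanged: under this sketch, nmi([0, 0, 1, 1], [1, 1, 0, 0]) evaluates to 1 up to floating-point error, whereas two statistically independent labelings such as [0, 0, 1, 1] and [0, 1, 0, 1] score 0.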
The Rand Index (RI) measures how accurately a cluster is
generated by computing the percentage of correctly labeled
features. The accuracy score is given by the sum of true
positive and true negative PCLs (normalized by the sum of
all true and false PCLs) averaged over the whole PCL set.
True positives are the number of correct PCLs with respect
to a ground truth, whereas true negatives are the number of
features correctly classified as outliers. NMI and RI scores
lie in the interval [0, 1]; the larger the score, the better
the clustering. However, it is not possible to
use ground-truth based evaluation measures such as NMI or
RI to quantify the clustering performance on-line