How can we evaluate our performance? Precision, recall, and the other supervised metrics do not work
here, since we have no target classes to compare with. To evaluate, we need to know the
"real" clusters, whatever that means. We can suppose, for our example, that each
cluster includes every drawing of a certain number, and only that number. Knowing
this, we can compute the adjusted Rand index between our cluster assignment and
the expected one. The Rand index is a measure similar to accuracy, but it looks at pairs
of points: a pair counts as an agreement when both assignments put the two points in the
same cluster, or both put them in different clusters. Because of this, it takes into account
the fact that classes can have different names in both assignments; that is, if we rename
the classes, the index does not change. The adjusted index also corrects the result for
agreements that occur by chance. When both assignments contain exactly the same clusters,
the adjusted Rand index equals one; a random assignment scores close to zero, and the
index can even be negative when the agreement is worse than chance.
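To make the two properties above concrete (invariance to renaming and correction for chance), here is a minimal from-scratch sketch in plain Python. The function names `rand_index` and `adjusted_rand_index` are hypothetical helpers written for illustration, and the formula is the usual Hubert and Arabie formulation of the adjusted index; in practice scikit-learn provides `sklearn.metrics.adjusted_rand_score(labels_true, labels_pred)` for the adjusted version, which is what you would normally call.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two assignments agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance (Hubert and Arabie formulation)."""
    n = len(labels_true)
    # Contingency table: number of points in each (true, predicted) cell.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Count pairs of points that fall together in a cell, row, or column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected_index = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected_index:
        # Degenerate case (e.g. both assignments are one single cluster).
        return 1.0
    return (sum_cells - expected_index) / (max_index - expected_index)


truth = [0, 0, 0, 1, 1, 1]
renamed = [5, 5, 5, 2, 2, 2]  # same clusters, different names
mixed = [0, 1, 0, 1, 0, 1]    # clusters that cut across the true ones

print(rand_index(truth, renamed), adjusted_rand_index(truth, renamed))  # 1.0 1.0
print(round(rand_index(truth, mixed), 3),
      round(adjusted_rand_index(truth, mixed), 3))  # 0.467 -0.111
```

Note how the plain Rand index still gives the unrelated `mixed` assignment a score near 0.5, simply because many pairs agree by chance, while the adjusted index drops it slightly below zero.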