The purpose of this appendix is to show that when the distance
between two points is defined only on their class labels, the
within-cluster average distance is equivalent to the Gini index [10].
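The claim can be checked numerically before reading the proof. The sketch below relies on several assumptions, because the excerpt does not give the exact definition of within-cluster average distance. It assumes a hypothetical 0/1 class distance (0 for the same label, 1 otherwise), averages it over all ordered pairs in a subset with self-pairs included, and weights each subset by its share of the parent data ($p_L$, $p_R$). Under these assumptions each subset contributes $2p^0p^1 = 1 - (p^0)^2 - (p^1)^2$, which is exactly its Gini impurity. The function names `gini_split`, `class_distance` and `within_cluster_avg_distance` are illustrative only.

```python
from collections import Counter
from itertools import product


def gini_split(left, right):
    """Weighted Gini index of a binary split:
    sum over sides of p_side * (1 - sum_c (p_side^c)^2)."""
    n = len(left) + len(right)
    total = 0.0
    for side in (left, right):
        if not side:  # an empty subset contributes nothing
            continue
        counts = Counter(side)
        impurity = 1.0 - sum((k / len(side)) ** 2 for k in counts.values())
        total += len(side) / n * impurity
    return total


def class_distance(a, b):
    # Assumed distance defined only on class: 0 if same label, 1 otherwise.
    return 0.0 if a == b else 1.0


def within_cluster_avg_distance(left, right):
    """Weighted average pairwise class distance inside each subset.
    Assumption: averages over all ordered pairs, self-pairs included."""
    n = len(left) + len(right)
    total = 0.0
    for side in (left, right):
        if not side:
            continue
        pair_sum = sum(class_distance(a, b) for a, b in product(side, repeat=2))
        total += len(side) / n * pair_sum / len(side) ** 2
    return total


if __name__ == "__main__":
    left, right = [0, 0, 1], [1, 1, 1, 0]
    print(gini_split(left, right))
    print(within_cluster_avg_distance(left, right))
```

For the split `[0, 0, 1]` / `[1, 1, 1, 0]`, both functions return $17/42 \approx 0.405$. If self-pairs were excluded, each subset's average would instead be scaled by $n_{\text{side}}/(n_{\text{side}}-1)$, and the two quantities would no longer match exactly.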
PROOF. We consider only the two-class situation.⁴ The
class labels are denoted 0 and 1. Given a split into left and
right subsets, let $p^0_L + p^1_L = 1$ and $p^0_R + p^1_R = 1$
be the probabilities of labels 0 and 1 on the left and on the
right, respectively.
Let $p_L$ and $p_R$, with $p_L + p_R = 1$, denote the probabilities
of the left and the right subsets given the parent data set. The
Gini index then takes this form: