(8)
where C is the number of classes, N is defined as in (4), and σ_a is the standard deviation of the numeric values
of attribute a. Note that in practice the square root in (7) is not performed since the squared attribute
distances are needed in (5) to compute H. Similarly, the square root in (5) is not typically performed in
computing H, since the squared distance (H² instead of D² in this case) is used in (1) to compute g, the
activation of a hidden node.
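
The point about skipping both square roots can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the numeric attribute distance is assumed to take the form |x - y| / (4 σ_a), the nominal attribute distance is assumed to be a sum over classes of squared differences in class-conditional probabilities, and the activation in (1) is assumed to be a Gaussian of the squared distance with a hypothetical width parameter.

```python
import math

def squared_vdm(p_x, p_y):
    """Squared nominal-attribute distance (assumed form): sum over the C classes
    of squared differences in class-conditional probabilities. No square root."""
    return sum((px - py) ** 2 for px, py in zip(p_x, p_y))

def squared_diff(x, y, sigma_a):
    """Squared numeric-attribute distance, assuming the form |x - y| / (4 * sigma_a)."""
    return ((x - y) / (4.0 * sigma_a)) ** 2

def activation(squared_attr_distances, width):
    """Assumed Gaussian activation g = exp(-H^2 / (2 * width^2)).
    H^2 is just the sum of squared attribute distances, so H itself is never formed."""
    h_squared = sum(squared_attr_distances)
    return math.exp(-h_squared / (2.0 * width ** 2))

# One nominal attribute (two classes) and one numeric attribute.
d2 = [squared_vdm([0.2, 0.8], [0.5, 0.5]), squared_diff(3.0, 5.0, 1.5)]
g = activation(d2, width=1.0)

# Same result the long way: take each root, square again, take the root of the sum.
h = math.sqrt(sum(math.sqrt(v) ** 2 for v in d2))
g_with_roots = math.exp(-h ** 2 / 2.0)
```

Under these assumptions g and g_with_roots agree up to floating-point rounding, so the roots only add cost.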