The simplest network constructed from the FDA solution gives a classification
error as good as that of the original FDA. For datasets [12]
such as Wisconsin breast cancer, hepatitis, Cleveland heart disease, or
diabetes, the network obtains better results even before the learning
process starts; for some datasets, however, this is not the best approach,
since separating a single class from all others may be difficult.
Suppose that the FDA procedure does not separate vectors from class C1
well from all other vectors. In such a case separation from individual
classes may still work (since the weights are computed from
class means), or the classes should be broken into several subclasses
(clusters) before FDA is applied. The weights depend directly
on the selection of the vectors used to compute the mean X̄2, giving a
lot of flexibility in their choice. This leads to a more sophisticated
construction of the network, with several hidden neurons per class
and one output neuron per class connected to those hidden layer
units that discriminate this class from all others. The hidden-to-output
weights are all set to 1.0, and the bias of each output unit is set to the
smallest activation obtained when all vectors from the given class are
presented to the network.
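The construction above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation: the function names (`fda_direction`, `output_bias`), the use of a pooled within-class covariance for the Fisher direction, and the small regularization term are all assumptions made for the example.

```python
import numpy as np

def fda_direction(X1, X2):
    """Fisher-style discriminant direction separating two groups of vectors.

    Sketch only: the weight vector is the difference of the two class means,
    whitened by the pooled within-class covariance (regularized so the solve
    is always well-posed). This mirrors how the paper's weights are computed
    from class means, where X2 may be one class, a cluster, or all remaining
    vectors.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    Sw = Sw + 1e-6 * np.eye(Sw.shape[0])  # regularization (assumption)
    return np.linalg.solve(Sw, m1 - m2)

def output_bias(X_class, hidden_weights):
    """Bias for one output unit, given its hidden units' weight vectors.

    The hidden-to-output weights are fixed at 1.0, so the output activation
    is the sum of the (linear) hidden activations; the bias is the smallest
    such activation over all vectors of the class, as described in the text.
    """
    acts = np.stack([X_class @ w for w in hidden_weights], axis=1)
    summed = acts.sum(axis=1)  # hidden-output weights all equal to 1.0
    return summed.min()        # smallest activation over the class

# Toy usage: two well-separated 2-D classes.
X1 = np.array([[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]])
X2 = np.array([[-1.0, 0.0], [-1.1, 0.2], [-0.9, -0.2]])
w = fda_direction(X1, X2)          # one hidden unit discriminating C1
b = output_bias(X1, [w])           # bias of the C1 output unit
```

By construction, every vector of class C1 then produces an output activation at or above the bias, which is the property the threshold is chosen for.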