resulting networks differed in the number of cycles they took to converge upon a solution, and in whether they converged at all. However, the trained neural networks were not found to be statistically independent in their generalisation performance, i.e. they displayed very similar patterns of generalisation despite having been derived from different initial weight vectors. Thus, although varying the initial weights of neural networks is important when using a deterministic training method such as backpropagation, it seems not to be an effective stand-alone method for generating error diversity in an ensemble of neural networks.
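To make this concrete, the sketch below (not from the studies cited here; architecture, dataset and seeds are illustrative assumptions) trains several identical MLPs that differ only in their random initial weights and measures how often pairs of members disagree on held-out data. A low disagreement rate corresponds to the observation that random initialisation alone produces little error diversity.

```python
# Illustrative sketch: an ensemble whose members differ only in their
# random initial weights, and a pairwise disagreement measure on test data.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same architecture and training data; only the initial weight vector changes.
members = [
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

# Mean fraction of test points on which a pair of members disagrees.
preds = [m.predict(X_test) for m in members]
disagreement = np.mean([np.mean(p != q) for p, q in combinations(preds, 2)])
print(f"mean pairwise disagreement: {disagreement:.3f}")
```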
These observations are supported by a number of other studies. Partridge [106, 155] conducted several experiments on large (> 150,000 patterns) synthetic data sets, and concluded that, after network type, training set structure, and number of hidden units, the random initialisation of weights is the least effective method for generating diversity. Parmanto, Munro and Doyle [12] used one synthetic dataset and two medical diagnosis datasets to compare 10-fold cross-validation, Bagging, and random weight initialisations; again, the random weights method came in last place.
We have now discussed implicit diversity methods that manipulate the starting point in hypothesis space. We next discuss an explicit method of this kind, in which no randomisation of weights occurs.
Maclin and Shavlik [89] present an approach to initialising neural network weights that uses competitive learning to create networks that are initialised far from the origin of weight space, thereby potentially increasing the set of reachable local minima; they show significantly improved performance over the standard method of initialisation on two real-world datasets.
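The sketch below illustrates the general idea only: a winner-take-all (competitive) pass over the inputs places candidate hidden-unit weight vectors near data prototypes rather than near the origin. The learning rate, number of passes, and the suggestion of using the resulting rows to seed an MLP's input-to-hidden weights are assumptions made for illustration, not the published algorithm of [89].

```python
# Hedged sketch of competitive (winner-take-all) initialisation of weight vectors.
import numpy as np

def competitive_init(X, n_hidden, lr=0.1, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    # Start the prototypes on randomly chosen training points.
    W = X[rng.choice(len(X), n_hidden, replace=False)].copy()
    for _ in range(n_passes):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            W[winner] += lr * (x - W[winner])  # pull the winning unit toward x
    # Rows are candidate hidden-unit weight vectors, typically far from the origin.
    return W

# Tiny demo: two clusters centred away from the origin.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (3.0, -3.0)])
print(competitive_init(X_demo, n_hidden=4).round(2))
```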
A technique relevant to this discussion, Fast Committee Learning [131], trains a single neural network, taking M snapshots of its weights at a number of points during training. The M snapshots are then used as M different ensemble members. Although the performance was not as good as when using separately trained networks, this offers the advantage of reduced training time, since only one network needs to be trained.
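A minimal sketch of this snapshot idea is given below. The snapshot schedule, the number of members, and the voting rule are illustrative assumptions, not the specific choices made in [131]: one network is trained incrementally, its weights are copied at M points, and the copies vote as a committee.

```python
# Illustrative sketch: M snapshots of one incrementally trained network as a committee.
import copy

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(10,), random_state=0)
classes = np.unique(y_train)
M, updates_per_snapshot = 5, 20
snapshots = []

for _ in range(M):
    for _ in range(updates_per_snapshot):
        net.partial_fit(X_train, y_train, classes=classes)  # incremental training
    snapshots.append(copy.deepcopy(net))  # freeze the current weights as a member

# Majority vote over the M snapshots (binary labels 0/1, M odd).
votes = np.stack([m.predict(X_test) for m in snapshots])
committee_pred = np.round(votes.mean(axis=0)).astype(int)
print(f"committee accuracy: {np.mean(committee_pred == y_test):.3f}")
```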
