architecture of the networks, and the training algorithm used. This means providing each ensemble member with a different training set, or a different architecture, and so on. Though at first this seems a sensible way to group the literature, we found it difficult to place all ensemble techniques under these umbrellas. Specifically, we could not see where a regularisation technique [116, 85] would fit: if we regularise the error function, we change none of Sharkey’s four factors. Instead we arrived at the following categories, into which we believe the majority of neural network ensemble techniques can be placed; a brief illustrative sketch of the three axes follows the list.
Starting Point in Hypothesis Space Methods under this branch vary the starting points within the search space, thereby influencing where in hypothesis space we converge.
Set of Accessible Hypotheses These methods vary the set of hypotheses that are accessible to the ensemble. Given that certain hypotheses may be made accessible or inaccessible by a particular training subset and network architecture, these techniques vary either the training data used, or the architecture employed, for different ensemble members.
Traversal of Hypothesis Space These alter the way we traverse the search space, thereby
leading different networks to converge to different hypotheses.
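To make these axes concrete, the sketch below builds ensemble members that each differ along one chosen axis. It uses scikit-learn's MLPClassifier purely for brevity; the build_member function, the particular parameter settings, and the mapping of each parameter onto an axis are illustrative assumptions rather than part of any technique cited in this chapter.

```python
# A minimal sketch of the three axes of ensemble diversity.
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_member(X, y, axis, k, rng):
    """Train the k-th ensemble member, varying only the chosen axis."""
    if axis == "starting_point":
        # Same data, same architecture; only the seed (and hence the random
        # initial weights) differs.  Note that in scikit-learn the seed also
        # controls data shuffling, so this is only an approximation of pure
        # initial-weight diversity.
        net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=k)
        return net.fit(X, y)
    if axis == "accessible_hypotheses":
        # Vary the training subset (a bootstrap resample) and the architecture,
        # changing which hypotheses each member can reach.
        idx = rng.integers(0, len(X), size=len(X))
        net = MLPClassifier(hidden_layer_sizes=(5 + 5 * k,), max_iter=500,
                            random_state=0)
        return net.fit(X[idx], y[idx])
    if axis == "traversal":
        # Same data and architecture; alter how the search space is traversed,
        # here via the step size and the strength of the L2 penalty.
        net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0,
                            learning_rate_init=0.001 * (k + 1),
                            alpha=1e-4 * (k + 1))
        return net.fit(X, y)
    raise ValueError(f"unknown axis: {axis}")

# Example: five members differing only in their starting point, combined by
# averaging their predictions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a toy XOR-like problem
ensemble = [build_member(X, y, "starting_point", k, rng) for k in range(5)]
averaged_vote = np.mean([m.predict(X) for m in ensemble], axis=0)
```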
3.2.1 Starting Point in Hypothesis Space
Starting each network from different random initial weights increases the probability that it will follow a different trajectory from the other networks. This is perhaps the most common way of generating an ensemble, but it is now generally accepted as the least effective method of achieving good diversity; many authors use it as a default benchmark for their own methods [101]. We will first discuss implicit instances of this axis, where the initial weights are generated randomly, and then explicit instances, where networks are deliberately placed in different parts of the hypothesis space.
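As a minimal illustration of the distinction, the sketch below initialises the members of a small ensemble in both ways: implicitly, by drawing each member's initial weights at random, and explicitly, by spacing the initial output weight vectors around a circle of radius 10 (realised here, as one of several possibilities, in a two-dimensional slice of the output weight vector). The function names, layer sizes and weight ranges are illustrative assumptions, not details taken from the work discussed below.

```python
import numpy as np

def implicit_init(n_members, n_in, n_hidden, n_out, scale=0.5, seed=0):
    """Implicit diversity: each member receives independent random initial
    weights; the data and architecture are identical across members."""
    rng = np.random.default_rng(seed)
    return [{"W_hidden": rng.uniform(-scale, scale, size=(n_hidden, n_in)),
             "W_out":    rng.uniform(-scale, scale, size=(n_out, n_hidden))}
            for _ in range(n_members)]

def explicit_init(n_members, n_in, n_hidden, radius=10.0, seed=0):
    """Explicit diversity: the initial output weight vector of each member is
    placed at a deliberately chosen point, here evenly spaced on a circle of
    the given radius in a two-dimensional slice of the output weights."""
    rng = np.random.default_rng(seed)
    members = []
    for k in range(n_members):
        angle = 2.0 * np.pi * k / n_members
        W_out = np.zeros((1, n_hidden))
        W_out[0, :2] = radius * np.array([np.cos(angle), np.sin(angle)])
        members.append({"W_hidden": rng.uniform(-0.5, 0.5, size=(n_hidden, n_in)),
                        "W_out": W_out})
    return members

# Every member would then be trained by backpropagation on the same data, so
# any remaining disagreement between them is attributable to the starting
# point in weight space alone.
ensemble = explicit_init(n_members=8, n_in=2, n_hidden=4)
```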
Sharkey, Neary and Sharkey [127] investigated the relationship between initialisation of
the output weight vectors and final backpropagation solution types. They systematically
varied the initial output weight vectors of neural networks throughout a circle of radius 10
and then trained them using the fuzzy XOR task with a fixed set of training data. The
