CHAPTER 3. DEFINING DIVERSITY 50
3.2.2 Set of Accessible Hypotheses
It can be argued that there are two ways to manipulate the set of hypotheses accessible to a
network: firstly to alter the training data it receives, and secondly to alter the architecture
of the network itself. We will now discuss these and how they have been used to create
error diversity.
Manipulation of Training Data
Several methods attempt to produce diverse or complementary networks by supplying each
network with a slightly different training set. This is probably the most widely investigated
method of ensemble training. Consider Figure 3.4: the frontmost bold square represents the
training set for our ensemble. Different ensemble members can be given different parts of
this set, so they will hopefully learn different things about the same task. Some methods
will divide it by training pattern, supplying each member with all the K features, but a
different subset of the rows (patterns). Other methods will divide it by feature, supplying
each member with all the N patterns in the set, but a different subset of
the columns (features). Both of these are termed resampling methods, and could provide
overlapping or non-overlapping subsets of the rows or columns (or both) to different learners.
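The two resampling schemes can be sketched as follows. This is a minimal illustration with a hypothetical data matrix and arbitrary subset sizes; the function names are not from any particular ensemble library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: N = 6 patterns (rows), K = 4 features (columns).
X = rng.normal(size=(6, 4))

def row_subset(X, n_rows, rng):
    """Give one ensemble member all K features but a subset of the patterns."""
    idx = rng.choice(X.shape[0], size=n_rows, replace=False)
    return X[idx, :]

def column_subset(X, n_cols, rng):
    """Give one ensemble member all N patterns but a subset of the features."""
    idx = rng.choice(X.shape[1], size=n_cols, replace=False)
    return X[:, idx]

member_a = row_subset(X, 4, rng)     # shape (4, 4): fewer patterns, all features
member_b = column_subset(X, 2, rng)  # shape (6, 2): all patterns, fewer features
```

Drawing the indices with `replace=False` gives non-overlapping samples within one member's subset; overlap between different members' subsets arises simply from drawing each member's indices independently.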
Another alternative would be to pre-process the features in some way to get a different
representation, for example using a log-scaling of the features. This can be viewed in our
diagram as using a different plane, moving in the space of all possible features. The data
techniques which transform features are termed distortion methods [126].
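A distortion method, in contrast, keeps all rows and columns but changes the representation. As a minimal sketch, using the log-scaling mentioned above on a small hypothetical feature matrix:

```python
import numpy as np

# Hypothetical strictly positive feature matrix (log-scaling requires x > 0).
X = np.array([[1.0, 10.0],
              [100.0, 1000.0]])

# Distortion method: the same data in a different representation,
# here a base-10 log-scaling of every feature.
X_log = np.log10(X)
```

A member trained on `X_log` sees the same patterns and features as one trained on `X`, but on a different "plane" in the space of possible representations, which can induce different errors.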
Duin and Tax [35] find that combining the results of one type of classifier on different
feature sets is far more effective than combining the results of different classifiers on one
feature set. They conclude that the combination of independent information from the
different feature sets is more useful than the different approaches of the classifiers on the
same data.
The most well-known resampling method is probably k-fold cross-validation. By dividing
the dataset randomly into k disjoint pattern subsets, new overlapping training sets can be
created for each ensemble member, by leaving out one of these k subsets and training on
the remainder. The Bagging algorithm is another example, randomly selecting N patterns
with replacement from the original set of N patterns.
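Both constructions can be sketched in a few lines. This is an illustrative index-level sketch (the fold count k and pattern count N are arbitrary), not a full training loop:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 12, 3  # hypothetical: 12 patterns, 3 ensemble members

# Cross-validation style: partition the pattern indices into k disjoint
# subsets; each member trains on the data with one subset left out.
indices = rng.permutation(N)
folds = np.array_split(indices, k)
cv_train_sets = [np.concatenate([f for j, f in enumerate(folds) if j != i])
                 for i in range(k)]

# Bagging: each member receives N pattern indices drawn with replacement
# from the original N, so its bootstrap set repeats some patterns and
# omits others.
bootstrap_sets = [rng.choice(N, size=N, replace=True) for _ in range(k)]
```

Note the overlap structure: any two cross-validation training sets share k-2 of the k folds, while two bootstrap sets overlap only by chance.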
