The combination rule in AdaBoost is a linear (not necessarily convex) combination of the
network outputs, with weights α based on the training error but fixed with respect to the
input patterns. The obvious extension to this is to allow the weights to vary according to
the input pattern, so the α values are re-calculated by some method for every new x value.
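As a minimal sketch of this distinction (assuming binary weak learners h_t that return ±1; the function names and signatures below are illustrative and not drawn from any of the cited papers), the two combination rules might look like:

    import numpy as np

    def adaboost_combine(learners, alphas, x):
        # Standard AdaBoost: the weights alpha_t are fixed, independent of x.
        return np.sign(sum(a * h(x) for h, a in zip(learners, alphas)))

    def localized_combine(learners, alpha_fns, x):
        # "Localized" variant: each weight is a function alpha_t(x) of the input,
        # re-evaluated for every new pattern x.
        return np.sign(sum(fn(x) * h(x) for h, fn in zip(learners, alpha_fns)))
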
Several authors, including Schapire and Singer [117], Meir et al. [96] and Moerland [98], do
precisely this with versions of “localized” boosting. Their architectures bear a number of
similarities to the Mixture of Experts, which will be covered in the next section. Although
we have so far discussed Boosting in the context of classification problems, Avnimelech
and Intrator [3] present an extension of AdaBoost to boosting with regression estimators.
Schapire [119] reviews recent theoretical analyses of AdaBoost, describing links
to game theory and extensions to handle multi-class problems.
Maclin and Opitz [90], and also Bauer [8], compare Bagging and Boosting methods in
large empirical studies. Their findings show that although Bagging almost always produces an
ensemble which is better than any of its component classifiers, and is relatively impervious
to noise, it is on average not significantly better than a simple ensemble. They find Boosting
to be a powerful technique, usually beating Bagging, but one that is susceptible to noise in the data
and can quickly overfit; similar problems with overfitting in AdaBoost have been observed
by a number of authors. Most recently, Jin et al. [61] use a confidence-based regularisation
term when combining the Boosted learners: if learners early in the Boosting chain are
confident in their predictions, then the contribution of learners later in the chain is
down-played. This technique shows significantly improved tolerance to noisy datasets.
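The following is a minimal, hypothetical sketch of this idea only, not the actual scheme of Jin et al. [61]; the tanh-based confidence measure and the temperature parameter are assumptions made purely for illustration:

    import numpy as np

    def confidence_damped_combine(learners, alphas, x, temperature=1.0):
        # Illustrative sketch only -- not the exact scheme of Jin et al. [61].
        # Each learner's vote is damped in proportion to how confident the
        # running ensemble prediction already is, so confident early learners
        # dominate and later learners are down-played.
        running = 0.0
        for h, a in zip(learners, alphas):
            # Map the magnitude of the running vote to a confidence in [0, 1).
            confidence = np.tanh(abs(running) / temperature)
            running += a * (1.0 - confidence) * h(x)
        return np.sign(running)
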