For the seven data sets we examined, model selection using 1k-point validation sets was nearly as good as optimal model selection using the final test sets.
Ensemble selection uses forward stepwise selection to choose a subset of models that optimizes the ensemble's AUC. Training models with a variety of learning algorithms, and with many different parameter settings for each algorithm, proved to be an effective way of generating a collection of diverse, high-quality models, some of which have excellent AUC. In our experiments on seven test problems, ensemble selection consistently found ensembles with better AUC than the best models trained by any single learning method, including models trained with bagging and boosting.
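To make the forward stepwise procedure concrete, the following is a minimal sketch of hillclimbing on validation AUC, not the exact implementation used in our experiments. It assumes the model library is represented as a mapping from model names to predicted positive-class probabilities on the validation set, allows models to be selected with replacement, and uses scikit-learn's `roc_auc_score`; refinements such as initializing the ensemble with the best models or bagging the selection process are omitted.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def ensemble_selection(val_preds, y_val, n_steps=50):
    """Greedy forward stepwise selection of models to maximize validation AUC.

    val_preds : dict mapping model name -> predicted positive-class
                probabilities on the validation (hillclimbing) set.
    y_val     : array of 0/1 labels for that set.

    Models may be added more than once (selection with replacement), which
    implicitly gives stronger models larger weight in the averaged ensemble.
    Returns the selections for the best ensemble found and its validation AUC.
    """
    ensemble_sum = np.zeros(len(y_val))
    selected, best_auc, best_selected = [], -np.inf, []

    for _ in range(n_steps):
        step_name, step_auc = None, -np.inf
        for name, preds in val_preds.items():
            # AUC of the averaged predictions if this model were added next
            auc = roc_auc_score(y_val, (ensemble_sum + preds) / (len(selected) + 1))
            if auc > step_auc:
                step_name, step_auc = name, auc
        selected.append(step_name)
        ensemble_sum += val_preds[step_name]
        if step_auc > best_auc:          # remember the best ensemble seen so far
            best_auc, best_selected = step_auc, list(selected)

    return best_selected, best_auc
```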