As cross-validation essentially computes a prediction for each example in
the training set, it was soon realized that this information could be used in
more elaborate ways than simply counting the number of correct and incorrect
predictions. One general way to achieve this is Stacking (Wolpert, 1992), which
learns from the predictions of base (=level-0) classifiers via a single meta (=level-1)
classifier. The basic idea of Stacking is to use the predictions of the base classifiers
as attributes in a new training set that keeps the original class labels. This
new meta training set is then used to train the meta classifier, which learns to
predict the final class. Thus, for the additional cost of training an appropriate
meta classifier, it is possible to utilize all the output generated by a cross-validation.
Furthermore, the dimensionality of the meta dataset is equal to the number of
classes multiplied by the number of base classifiers, since each base classifier
contributes a full predicted class-probability distribution, and is thus fairly independent
of the dimensionality of the original dataset. The additional training cost for
the meta classifier is usually much smaller than the training costs for the base
classifiers, especially for large, high-dimensional datasets.
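To make this concrete, the following minimal sketch builds such a meta dataset from
cross-validated class-probability predictions and trains a meta classifier on it. It
assumes Python with scikit-learn, which the original does not use; the iris data, the
ten folds, and the particular base and meta classifiers are likewise arbitrary
illustrative choices.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    base_classifiers = [DecisionTreeClassifier(random_state=0),
                        GaussianNB(), KNeighborsClassifier()]

    # Level-0: out-of-fold probability predictions for every training example,
    # obtained via cross-validation, become the attributes of the meta dataset.
    meta_features = np.hstack([
        cross_val_predict(clf, X, y, cv=10, method="predict_proba")
        for clf in base_classifiers])

    # The meta dataset keeps the original class labels y; its dimensionality is
    # n_classes * n_base_classifiers (here 3 * 3 = 9), regardless of how many
    # attributes the original dataset has.
    assert meta_features.shape == (len(y), 3 * len(base_classifiers))

    # Level-1: the meta classifier learns to predict the final class from the
    # base classifiers' outputs.
    meta_classifier = LogisticRegression(max_iter=1000)
    meta_classifier.fit(meta_features, y)

At prediction time, the base classifiers (retrained on the full training set) supply
the probability vectors that are fed to the meta classifier.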