drift, where a chosen model varies throughout the learning process. As the number
of training points increases, more complex models tend to fit data better and are
therefore selected over simpler models. Since the selection of training input points
depends on the model, the training points chosen for a simpler model in the early
stages could be less useful for the more complex model selected at the end of the
learning process. Due to model drift, different portions of the training points are
gathered for different models, and the resulting training data may not be well
suited for any single model. However, because it is not known at the outset which
model will finally be selected, one possibility is to select training input points
with respect to multiple models [52], by optimizing the training data for all the
models simultaneously: