1. Introduction
One of the most common assumptions in the design
of learning algorithms is that the training data consist
of examples drawn independently from the same
underlying distribution as the examples about which
the model is expected to make predictions. In many
real-world applications, however, this assumption is violated
because we do not have complete control over
the data-gathering process.
For example, suppose we are using a learning method
to induce a model that predicts the side effects of a
treatment for a given patient. Because the treatment
is not given randomly to individuals in the general
population, the available examples are not a random
sample from the population. Similarly, suppose we are
learning a model to predict the presence/absence of an
animal species given the characteristics of a geographical
location. Because data gathering is easier in some
regions than in others, we would expect the available
data to cover some regions more heavily than others.