In this paper, we address the sample selection bias
problem in the context of learning and evaluating classifiers.
In Section 2, we formally define the sample selection
bias problem in machine learning terms. In
Section 3, we present a new categorization of learning
methods that is useful for characterizing their behavior
under sample selection bias and study how a number
of well-known classifier learning methods are affected
by sample selection bias. In Section 4, we present a
bias correction method based on estimating the probability
that an example is selected into the sample and
using rejection sampling to obtain unbiased samples from
the correct distribution. This method can be used both for learning
classifiers and, more importantly, for evaluating a
classifier using a biased sample.
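
The rejection sampling idea outlined above can be illustrated with a minimal sketch. It is not the procedure as specified in Section 4: it assumes the selection probability of each example is already available through a hypothetical function `select_prob` (in practice it would have to be estimated), and the names `rejection_resample`, `biased`, and `corrected` are introduced here purely for illustration. Each example is kept with probability inversely proportional to its selection probability, so examples that were over-selected are thinned out.

```python
import random


def rejection_resample(examples, select_prob, rng=None):
    """Resample a biased sample so it approximates the unbiased distribution.

    examples    : list of examples drawn under selection bias
    select_prob : hypothetical function returning the (assumed known or
                  estimated) probability that an example was selected
    rng         : optional random.Random instance, seeded for repeatability
    """
    rng = rng or random.Random(0)
    probs = [select_prob(x) for x in examples]
    if min(probs) <= 0:
        raise ValueError("selection probabilities must be positive")
    c = min(probs)  # ensures every acceptance probability c / p is at most 1
    # Keep each example with probability proportional to 1 / P(selected | x).
    return [x for x, p in zip(examples, probs) if rng.random() < c / p]


# Toy biased sample: examples with value 1 were selected three times as
# often as examples with value 0 (assumed selection probabilities 0.9, 0.3).
biased = [1] * 9000 + [0] * 3000
corrected = rejection_resample(biased, lambda x: 0.9 if x == 1 else 0.3)
print(sum(biased) / len(biased), sum(corrected) / len(corrected))
```

In this toy setting the share of 1s in the biased sample is 0.75, and after resampling it should fall to roughly one half, which is the share assumed for the underlying population. The cost is a smaller sample, since over-selected examples are discarded.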