While global learners are affected by sample selection
bias, local learners are not. This is a new categorization,
different from the more usual categorization of
learning methods into discriminative and generative
(Ng & Jordan, 2002). As seen in Section 3.1, generative
(or Bayesian) methods model P(x|y), P(y) and P(x), yet
their behavior is generally independent of P(x); naive
Bayes is an exception.
This categorization is also useful for defining situations
in which we can learn from both labeled and unlabeled
data, an area of research that has received some attention
in recent years (see, for example, Szummer and
Jaakkola (2003)). Because unlabeled data carry information
only about P(x), global learners can take advantage
of them, while local learners cannot.
For global learners, we showed that we can still learn
correctly under sample selection bias if we have data to
estimate the selection probabilities P(s = 1|x). Also,
we showed how to evaluate a classifier using a biased
sample and estimates of the selection probabilities
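
As an illustration of evaluating a classifier on a biased sample, the sketch below weights each selected example by the inverse of its estimated selection probability P(s = 1|x) and normalizes the weights. This self-normalized inverse-probability estimator is an assumption chosen for illustration, not necessarily the exact estimator used in the paper; the function name `bias_corrected_error` and the toy numbers are hypothetical.

```python
def bias_corrected_error(y_true, y_pred, p_select):
    """Estimate the 0/1 error on the unbiased distribution from a selected sample.

    y_true, y_pred: labels and predictions for the selected examples (s = 1).
    p_select: estimates of P(s = 1 | x) for the same examples (assumed given).
    """
    if not (len(y_true) == len(y_pred) == len(p_select)):
        raise ValueError("inputs must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in p_select):
        raise ValueError("selection probabilities must lie in (0, 1]")

    # Examples that were unlikely to be selected stand in for many
    # unselected ones, so they receive a larger weight.
    weights = [1.0 / p for p in p_select]
    losses = [float(t != h) for t, h in zip(y_true, y_pred)]

    # Normalizing by the sum of weights gives a weighted average of the losses.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Toy data (hypothetical): the only misclassified example is also the one
# least likely to have been selected.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
p_select = [0.5, 0.25, 0.5, 0.5]

naive_error = sum(t != h for t, h in zip(y_true, y_pred)) / len(y_true)
corrected_error = bias_corrected_error(y_true, y_pred, p_select)
print(naive_error, corrected_error)
```

On this toy sample the unweighted error is 0.25, while the weighted estimate is 0.4, because the misclassified example represents a region of the input space that the selection mechanism under-samples.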