In the last 10 years, there has been a lot of research on machine-learning approaches
to text categorization. Many of the applications of machine learning to
information retrieval, however, have been limited by the amount of training data
available. If the system is trying to build a separate classifier for every query, there
is very little data about relevant documents available, whereas other machinelearning
applications may have hundreds or even thousands of training examples.
Even the approaches that tried to learn ranking algorithms by using training data
from all the queries were limited by the small number of queries and relevance
judgments in typical information retrieval test collections.