Given some assumptions, such as that the relevance of a document to a query
is independent of other documents, it is possible to show that this statement is
true, in the sense that ranking by probability of relevance will maximize precision,
which is the proportion of relevant documents, at any given rank (for example,
in the top 10 documents). Unfortunately, the Probability Ranking Principle
doesn’t tell us how to calculate or estimate the probability of relevance. There are
many probabilistic retrieval models, and each one proposes a different method for
estimating this probability. Most of the rest of this chapter discusses some of the
most important probabilistic models.
In this section, we start with a simple probabilistic model based on treating
information retrieval as a classification problem. We then describe a popular and
effective ranking algorithm that is based on this model.