One of the features that a retrieval model should provide is a clear statement about
the assumptions upon which it is based. The Boolean and vector space approaches
make implicit assumptions about relevance and text representation that affect
the design and effectiveness of ranking algorithms. The ideal situation would be
to show that, given the assumptions, a ranking algorithm based on the retrieval
model will achieve better effectiveness than any other approach. Such proofs are
very hard to come by in information retrieval, since we are trying to formalize
a complex human activity. The validity of a retrieval model generally has
to be established empirically rather than theoretically.
One early theoretical statement about effectiveness, known as the Probability
Ranking Principle (Robertson, 1977/1997), encouraged the development of
probabilistic retrieval models, which are the dominant paradigm today. These
models have achieved this status because probability theory provides a strong foundation
for representing and manipulating the uncertainty that is an inherent part of the information retrieval process. The Probability Ranking Principle, as originally
stated, is as follows: