5.1 Probabilistic Information Retrieval principles
The results retrieved by probabilistic information retrieval (PIR) systems depend on probability estimates. The basic assumption is that terms are distributed differently in relevant and non-relevant documents [3]. Once the probability of relevance has been estimated for each document, a PIR system ranks the documents in decreasing order of that probability with respect to the information need [3]. The results can therefore be only as accurate as the estimated probabilities [9].
The classic probabilistic model returns documents in decreasing order of their estimated probability of relevance to the information need. After indexing, each term can be assigned a value that indicates the probability that a document containing the term is relevant to the concept the term describes. In the retrieval phase, each document receives a score equal to the sum of these values over the terms that occur in both the document and the query, and the documents are returned in descending order of this score. The document representation for this version of probabilistic information retrieval can be the same as in the Boolean model, since it only needs to record whether or not a document contains each term [9].
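The scoring and ranking steps described above can be sketched as follows. This is a minimal illustration, assuming that documents and queries are given as sets of terms (which matches the Boolean presence/absence representation) and that the per-term values have already been estimated during indexing. The function names `score` and `rank`, the document identifiers, and the example weights are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def score(doc_terms: Set[str], query_terms: Set[str],
          term_weights: Dict[str, float]) -> float:
    """Sum the precomputed values of the terms shared by the document and the query.

    A document is represented only by the set of terms it contains,
    i.e. the same binary information a Boolean index stores.
    """
    shared = doc_terms & query_terms
    # Terms without an estimated value contribute nothing (assumption).
    return sum((term_weights.get(t, 0.0) for t in shared), 0.0)


def rank(docs: Dict[str, Set[str]], query_terms: Set[str],
         term_weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs in descending order of score."""
    scored = [(doc_id, score(terms, query_terms, term_weights))
              for doc_id, terms in docs.items()]
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical collection and term values, for illustration only.
    docs = {
        "d1": {"retrieval", "probability"},
        "d2": {"boolean", "retrieval"},
        "d3": {"vector", "space"},
    }
    weights = {"retrieval": 0.5, "probability": 1.5,
               "boolean": 0.25, "vector": 0.25, "space": 0.125}
    for doc_id, s in rank(docs, {"probability", "retrieval"}, weights):
        print(doc_id, s)
```

With these example values, `d1` matches both query terms and ranks first, `d2` matches only "retrieval", and `d3` matches nothing and scores zero. Because each document is stored as a set, the sketch uses no term-frequency information, in line with the Boolean-style representation the text describes.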
Similarly to the inverse document frequency (IDF) vector in the Vector Space Model (VSM), a vector has to be created that stores how important each term is. If ‘p’ is the probability that a document contains the term given that it is relevant to the query, and ‘q’ is the probability that a document contains the term given that it is not relevant, then the weight of the term is calculated as: