Some of these steps require further explanation. In steps 1 and 4, the document
language model probabilities (P(w|D)) should be estimated using Dirichlet
smoothing. In step 2, the model allows the set C to be the whole collection, but
because low-ranked documents have little effect on the estimation of P(w|R),
usually only 10–50 of the top-ranked documents are used. This also makes the
computation of P(w|R) substantially faster.
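As a concrete illustration, the following is a minimal Python sketch of this first-pass estimation, assuming a uniform document prior P(D) and representing each document as a term-frequency table. The helper names, the Dirichlet parameter value μ = 2000, and the small floor on collection probabilities are illustrative assumptions, not values given in the text.

import math
from collections import Counter, defaultdict

def dirichlet_pwd(word, doc, doc_len, coll_prob, mu=2000):
    # Dirichlet-smoothed document model: P(w|D) = (f_w,D + mu*P(w|C)) / (|D| + mu).
    # The 1e-9 floor is an illustrative guard for words missing from coll_prob.
    return (doc.get(word, 0) + mu * coll_prob.get(word, 1e-9)) / (doc_len + mu)

def estimate_relevance_model(query, top_docs, coll_prob, mu=2000):
    # Estimate P(w|R) from the set C of 10-50 top-ranked documents, using
    # P(w|R) proportional to sum over D of P(w|D) * P(Q|D), with uniform P(D).
    lens = [sum(doc.values()) for doc in top_docs]
    # Query likelihood P(Q|D) for each document, computed in log space
    log_pqd = [sum(math.log(dirichlet_pwd(q, doc, dlen, coll_prob, mu)) for q in query)
               for doc, dlen in zip(top_docs, lens)]
    shift = max(log_pqd)  # rescale to avoid underflow; cancels in the normalization
    pwr = defaultdict(float)
    for doc, dlen, lp in zip(top_docs, lens, log_pqd):
        pqd = math.exp(lp - shift)
        for w in doc:  # only words occurring in C receive significant probability
            pwr[w] += dirichlet_pwd(w, doc, dlen, coll_prob, mu) * pqd
    total = sum(pwr.values())
    return {w: p / total for w, p in pwr.items()}  # normalize so sum_w P(w|R) = 1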
For similar reasons, the summation in step 4 is not done over all words in
the vocabulary. Typically only a small number (10–25) of the highest-probability
words are used. In addition, the importance of the original query words is emphasized by combining the original query frequency estimates with the relevance model estimates using an approach similar to Jelinek-Mercer smoothing, i.e., λP(w|Q) + (1 − λ)P(w|R), where λ is a mixture parameter whose value is determined empirically
(0.5 is a typical value for TREC experiments). This combination makes it
clear that estimating relevance models is basically a process for query expansion
and smoothing. Ranking with relevance models is thus a two-pass process: in the first pass, documents are ranked using query likelihood, and the top-ranked documents are used for relevance model estimation. In the second pass, we use KL-divergence to rank
documents by comparing the relevance model and the document model. Note
also that we are in effect adding words to the query by smoothing the relevance
model using documents that are similar to the query. Many words that had zero
probabilities in the relevance model based on query frequency estimates will now
have non-zero values. What we are describing here is exactly the pseudo-relevance
feedback process described in section 6.2.4. In other words, relevance models provide
a formal retrieval model for pseudo-relevance feedback and query expansion.
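To show how the pieces fit together, here is a sketch of the second pass under the same assumptions, reusing dirichlet_pwd and the imports from the sketch above. The values λ = 0.5 and the 10–25 word cutoff come from the text, while the helper names are hypothetical.

def expanded_query_model(query, pwr, lam=0.5, num_words=25):
    # Keep only the highest-probability words from P(w|R) and renormalize, then
    # mix with the original query model: lam*P(w|Q) + (1 - lam)*P(w|R).
    top = dict(sorted(pwr.items(), key=lambda kv: kv[1], reverse=True)[:num_words])
    total = sum(top.values())
    top = {w: p / total for w, p in top.items()}
    pwq = {w: c / len(query) for w, c in Counter(query).items()}  # query frequency estimates
    vocab = set(top) | set(pwq)
    return {w: lam * pwq.get(w, 0.0) + (1 - lam) * top.get(w, 0.0) for w in vocab}

def kl_score(qmodel, doc, coll_prob, mu=2000):
    # Rank-equivalent form of negative KL divergence between the expanded
    # relevance model and the document model: sum_w P(w|R) log P(w|D).
    doc_len = sum(doc.values())
    return sum(p * math.log(dirichlet_pwd(w, doc, doc_len, coll_prob, mu))
               for w, p in qmodel.items())

Scoring every document with kl_score under the expanded model is the second pass; running query likelihood first to obtain top_docs and then re-ranking this way is exactly the pseudo-relevance feedback loop described above.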