In the second line, we split the score into the words that occur in the document and those that don’t occur ($f_{q_i,D} = 0$). In the third line, we add $\sum_{i: f_{q_i,D} > 0} \log(\lambda c_{q_i}/|C|)$ to the last term and subtract it from the first (where it ends up in the denominator), so there is no net effect. The last term is now the same for all documents and can be ignored for ranking. The final expression gives the document score in terms of a “weight” for matching query terms. Although this weight is not identical to
a tf.idf weight, there are clear similarities in that it is directly proportional to the document term frequency and inversely proportional to the collection frequency. A different form of estimation, and one that is generally more effective, comes
from using a value of $\alpha_D$ that is dependent on document length. This approach is known as Dirichlet smoothing, for reasons we will discuss later, and uses