In the query likelihood retrieval model, we rank documents by the probability that the query text could be generated by the document language model. In other words, we calculate the probability that we could pull the query words out of the
“bucket” of words representing the document. This is a model of topical relevance, in the sense that the probability of query generation serves as a measure of how likely it is that the document is about the same topic as the query. Since we start with a query, what we would in general like to calculate in order to rank the documents is P(D|Q), the probability of the document given the query. Using Bayes’ Rule, we can calculate this by