The question that faces us now is how to compute these probabilities. To start with, let’s focus on P(R|D). It’s not clear how we would go about calculating this, but given information about the relevant set, we should be able to calculate P(D|R). For example, if we had information about how often specific words occurred in the relevant set, then, given a new document, it would be relatively straightforward to calculate the probability of observing that document’s combination of words in the relevant set. Let’s assume that the probability of the word “president” in the relevant set is 0.02, and the probability of
“lincoln” is 0.03. If a new document contains the words “president” and “lincoln”, we could say that the probability of observing that combination of words in the relevant set is 0.02 × 0.03 = 0.0006, assuming that the two words occur independently.
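The independence assumption in this example can be sketched in a few lines of code. The word probabilities below are the hypothetical values from the text, not estimates from real data:

```python
# Hypothetical per-word probabilities in the relevant set (from the example).
relevant_set_probs = {"president": 0.02, "lincoln": 0.03}

def joint_probability(words, word_probs):
    """Multiply per-word probabilities, assuming words occur independently."""
    prob = 1.0
    for word in words:
        prob *= word_probs[word]
    return prob

# 0.02 * 0.03 ≈ 0.0006
p = joint_probability(["president", "lincoln"], relevant_set_probs)
print(p)
```

Note that multiplying many small probabilities quickly underflows floating-point precision, which is why practical systems typically sum log probabilities instead.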
So how does calculating P(D|R) get us to the probability of relevance? It turns out that P(R|D) and P(D|R) are related by Bayes’ Rule: