The question that faces us now is how to compute these probabilities. To start
with, let’s focus on P(R|D). It’s not clear how we would go about calculating
this, but given information about the relevant set, we should be able to calculate
P(D|R). For example, if we had information about how often specific words
occurred in the relevant set, then, given a new document, it would be relatively
straightforward to calculate how likely the combination of words in the
document would be to occur in the relevant set. Let’s assume that the probability
of the word “president” in the relevant set is 0.02, and the probability of
“lincoln” is 0.03. If a new document contains the words “president” and “lincoln”,
we could say that the probability of observing that combination of words in the relevant set is 0.02 × 0.03 = 0.0006, assuming that the two words occur independently.
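To make the independence calculation concrete, here is a minimal Python sketch; the function name and the dictionary of term probabilities are illustrative assumptions rather than anything prescribed by the model itself.

```python
def prob_doc_given_relevant(doc_words, term_probs):
    """Estimate P(D|R) as the product of the individual term
    probabilities, assuming the words occur independently."""
    p = 1.0
    for word in doc_words:
        # a word never seen in the relevant set contributes probability 0
        p *= term_probs.get(word, 0.0)
    return p

# Hypothetical probabilities estimated from the relevant set
term_probs = {"president": 0.02, "lincoln": 0.03}

# The two-word example from the text: 0.02 x 0.03 = 0.0006
print(f"{prob_doc_given_relevant(['president', 'lincoln'], term_probs):.4f}")
```

Note that with documents containing many words, a product of small probabilities underflows quickly, so implementations typically sum log probabilities instead of multiplying the probabilities directly.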
So how does calculating P(D|R) get us to the probability of relevance? It
turns out there is a relationship between P(R|D) and P(D|R) that is expressed
by Bayes’ Rule: