The connections in the inference network graph are defined by the query and
the representation nodes connected to every document in the collection. The
probabilities for the representation nodes are estimated using language models
for each document. Note that these nodes do not represent the occurrence of a
particular feature in a document, but instead capture the probability that the feature
is characteristic of the document, in the sense that the language model could
generate it. For example, a node for the word “lincoln” represents the binary event
that a document is about that topic (or not), and the language model for the document
is used to calculate the probability of that event being TRUE.
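As a rough sketch of this idea (not code from the text; the function name, the toy data, and the choice of a Dirichlet-smoothed unigram language model are illustrative assumptions), the probability that the event for a word such as "lincoln" is TRUE can be estimated from the document's language model:

```python
def p_feature_true(word, doc_words, coll_words, mu=2000.0):
    """Probability that `word` is characteristic of the document,
    estimated with a Dirichlet-smoothed unigram language model
    (illustrative choice; other smoothing methods could be used)."""
    f_wd = doc_words.count(word)                      # term count in the document
    p_wc = coll_words.count(word) / len(coll_words)   # collection probability
    return (f_wd + mu * p_wc) / (len(doc_words) + mu)

# Toy example: the event "this document is about lincoln"
doc = "lincoln was the sixteenth president lincoln".split()
coll = ("lincoln lincoln was the sixteenth president of america "
        "america elected a president").split()
p = p_feature_true("lincoln", doc, coll)
```

Because the estimate is a probability rather than a binary occurrence count, a word that appears more often in the document receives a higher probability of being characteristic of it than a word with the same collection frequency that does not appear at all.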
Since all the events in the inference network are binary, we cannot really use
a multinomial model of a document as a sequence of words. Instead, we use a
multiple-Bernoulli model, which is the basis for the binary independence model
in section 7.2.1. In that case, a document is represented as a binary feature vector,
which simply records whether a feature is present or not. In order to capture
term frequency information, a different multiple-Bernoulli model is used where
the document is represented by a multiset of vectors, with one vector for each
term occurrence (Metzler, Lavrenko, & Croft, 2004). It turns out that with the
appropriate choice of parameters, the probability estimate based on the multiple-
Bernoulli distribution is the same as the estimate for the multinomial distribution
with Dirichlet smoothing, which is