In addition to features based on word occurrence, r_i nodes
also represent proximity features. Proximity features take a number of different
forms, such as requiring words to co-occur within a “window” of text of a certain length;
they are described in detail in the next section. Features that are not
based on language models, such as document date, are allowed but not shown in
this example.
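The sketch below makes two proximity features concrete and turns a feature count into a representation-node belief. It counts exact phrase occurrences and non-overlapping unordered windows of at most `width` tokens that contain every query word. The counts can be read from a precomputed index or, as here, computed on the fly from a document's tokens. The belief is estimated with Dirichlet smoothing, assuming μ is the smoothing parameter. The function names and the greedy, non-overlapping counting rule are illustrative assumptions, not a definitive specification of these operators.

```python
from typing import List, Sequence


def count_phrase(tokens: Sequence[str], phrase: List[str]) -> int:
    """Count exact, adjacent occurrences of `phrase` in `tokens`."""
    k = len(phrase)
    return sum(1 for i in range(len(tokens) - k + 1) if list(tokens[i:i + k]) == phrase)


def count_unordered_window(tokens: Sequence[str], terms: List[str], width: int) -> int:
    """Count non-overlapping spans of at most `width` tokens containing every term.

    Hypothetical counting rule: scan left to right, and after a match resume
    scanning just past the matched span.
    """
    needed = set(terms)
    count, start, n = 0, 0, len(tokens)
    while start < n:
        if tokens[start] not in needed:
            start += 1
            continue
        seen = set()
        end = start
        # Grow the window from `start`, never exceeding `width` tokens.
        while end < n and end - start < width:
            if tokens[end] in needed:
                seen.add(tokens[end])
            if seen == needed:
                break
            end += 1
        if seen == needed:
            count += 1
            start = end + 1  # skip past the matched span
        else:
            start += 1
    return count


def dirichlet_belief(f: int, doc_len: int, coll_count: int, coll_len: int, mu: float) -> float:
    """Dirichlet-smoothed estimate (f + mu * coll_count / coll_len) / (doc_len + mu)."""
    return (f + mu * coll_count / coll_len) / (doc_len + mu)


tokens = "a b x b a y y a b".split()
print(count_phrase(tokens, ["a", "b"]))               # 2 (positions 0 and 7)
print(count_unordered_window(tokens, ["a", "b"], 2))  # 3
```

The greedy scan keeps overlapping windows from being counted twice. A real system would typically work from positional postings rather than raw token lists.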
The query nodes q_i are used to combine evidence from representation nodes
and other query nodes. These nodes represent the occurrence of more complex evidence
and document features. A number of forms of combination are available,
with Boolean AND and OR being two of the simplest. The network as a whole
computes P(I|D, μ), which is the probability that an information need is met
given the document and the parameters μ. The information need node I is a special
query node that combines all of the evidence from the other query nodes into
a single probability or belief score. This score is used to rank documents. Conceptually,
this means we must evaluate an inference network for every document
in the collection, but, as with other ranking algorithms, indexes are used to
speed up the computation. In general, representation nodes are indexed, whereas
query nodes are specified for each query by the user or search application. This
means that indexes for a variety of proximity features, in addition to words, will be
created (as described in Chapter 5), significantly expanding the size of the indexes.
In some applications, the probabilities associated with proximity features are computed
at query time in order to provide more flexibility in specifying queries.
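As an illustration of how query nodes combine evidence, the following sketch evaluates a query tree over per-document representation beliefs and ranks documents by the root belief, which plays the role of the information need node I. The operators follow the forms commonly used in InQuery- and Indri-style inference networks: AND as a product, OR as a noisy-OR, NOT as a complement, and a geometric-mean "combine". The tuple encoding of the query, the function names, and the default belief for missing features are hypothetical. The beliefs stand in for values read from an index, while the tree is built per query.

```python
import math
from typing import Dict, List, Tuple

Index = Dict[str, Dict[str, float]]  # doc id -> {feature: P(r_i | D)}, hypothetical layout


def bel_and(ps: List[float]) -> float:
    return math.prod(ps)  # all pieces of evidence must hold


def bel_or(ps: List[float]) -> float:
    return 1.0 - math.prod(1.0 - p for p in ps)  # noisy-OR


def bel_not(p: float) -> float:
    return 1.0 - p


def bel_combine(ps: List[float]) -> float:
    # Geometric mean; beliefs must be > 0 (smoothed estimates ensure this).
    return math.exp(sum(math.log(p) for p in ps) / len(ps))


def evaluate(node: tuple, beliefs: Dict[str, float], default: float = 1e-4) -> float:
    """Recursively compute a query node's belief for one document."""
    op = node[0]
    if op == "rep":  # leaf: a representation node
        return beliefs.get(node[1], default)
    if op == "not":
        return bel_not(evaluate(node[1], beliefs, default))
    children = [evaluate(c, beliefs, default) for c in node[1:]]
    if op == "and":
        return bel_and(children)
    if op == "or":
        return bel_or(children)
    if op == "combine":
        return bel_combine(children)
    raise ValueError(f"unknown operator: {op}")


def rank(index: Index, query: tuple, k: int = 10) -> List[Tuple[str, float]]:
    """Score every document with the query tree and return the top k."""
    scores = [(doc, evaluate(query, beliefs)) for doc, beliefs in index.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


index = {"d1": {"dog": 0.5, "cat": 0.4}, "d2": {"dog": 0.1, "cat": 0.2}}
query = ("and", ("rep", "dog"), ("rep", "cat"))
print(rank(index, query))
```

Scoring every entry in `index` mirrors the conceptual "evaluate the network for every document"; a practical engine would use inverted lists to restrict evaluation to documents that contain the query features.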