5.3.1 Documents
The simplest form of an inverted list stores just the documents that contain each
word, and no additional information. This kind of list is similar to the kind of
index you would find at the back of this textbook.
Figure 5.3 shows an index of this type built from the four sentences in Table 5.1
(so in this case, the “documents” are sentences). The index contains every word
found in any of the four sentences. Next to each word, there is a list of boxes, and
each one contains the number of a sentence. Each one of these boxes is a posting. For
example, look at the word “fish”. You can quickly see that this word appears in
all four sentences, because the numbers 1, 2, 3, and 4 appear by it. You can also
quickly determine that “fish” is the only word that appears in all the sentences.
Two words come close: “tropical” appears in every sentence but S4, and “water”
is not in S3.
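
The following is a minimal sketch of how a document-only inverted list could be
built, not the method used to produce Figure 5.3. The four sentences in docs are
made up for illustration (they are not the text of Table 5.1), chosen so that the
postings show the same pattern described above. The names build_index, docs,
index, and in_all are assumptions of this sketch, as is the simple tokenization
(lowercasing, splitting on whitespace, and stripping punctuation).

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of document numbers containing it."""
    postings = defaultdict(set)
    # Number documents from 1 so they line up with S1..S4.
    for doc_id, text in enumerate(docs, start=1):
        for token in text.lower().split():
            word = token.strip('.,;:!?"')
            if word:
                # A set records each document once, however often the word occurs.
                postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical stand-ins for the four sentences (not the text of Table 5.1).
docs = [
    "Tropical fish live in warm water.",
    "Tropical fish need fresh water.",
    "Tropical fish are popular aquarium fish.",
    "Some fish live in salt water.",
]

index = build_index(docs)

print(index["fish"])      # [1, 2, 3, 4]
print(index["tropical"])  # [1, 2, 3]
print(index["water"])     # [1, 2, 4]

# Words whose inverted list has one posting per document.
in_all = [word for word, ids in index.items() if len(ids) == len(docs)]
print(in_all)             # ['fish']
```

Because each inverted list here holds only document numbers, it can tell you
which documents contain a word, but not how often or where the word occurs.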
2 Every term in a document corresponds to a dimension, so there are in effect tens
of thousands of dimensions. By comparison, a typical database application has
tens of dimensions at most.