Notice that this index does not record the number of times each word appears;
it only records the documents in which each word appears. For instance, S2
con
tains the word “fish” twice, whereas S
1
contains “fish” only once. The inverted list
for “fish” shows no distinction between sentences 1 and 2; both are listed in the
same way. In the next few sections, we will look at indexes that include informa
tion about word frequencies.
Inverted lists become more interesting when we consider their intersection.
Suppose we want to find the sentence that contains the words “coloration” and
“freshwater”. The inverted index tells us that “coloration” appears in S3
and S4
,
while “freshwater” appears in S
1
and S
4
. We can quickly tell that only S
4
contains
both “coloration” and “freshwater”. Since each list is sorted by sentence number,
finding the intersection of these lists takes O(max(m, n)) time, where m and n
are the lengths ofthe two lists. The algorithm is the same as in merge sort. With list
skipping, which we will see later in the chapter, this cost drops to O(min(m, n)).