Blocked sort-based indexing
The basic steps in constructing a nonpositional index are depicted in Figure
1.4 (page 8).
We first make a pass through the collection assembling all
term–docID pairs.
We then sort the pairs with the term as the dominant key
and docID as the secondary key.
Finally, we organize the docIDs for each
term into a postings list and compute statistics such as term and document frequency.
For small collections, all this can be done in memory.
In this chapter,
we describe methods for large collections that require the use of secondary
storage.
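
For a collection small enough to fit in memory, the three steps above (collect term–docID pairs, sort them, group them into postings lists) can be sketched as follows. This is only an illustrative sketch: the function name build_index, the dictionary layout of the result, and the lowercase whitespace tokenizer are assumptions made for the example and are not taken from the text.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def build_index(docs):
    """Build a small in-memory nonpositional index.

    docs maps an integer docID to the document text (hypothetical input format).
    Returns a dict mapping each term to its document frequency, its sorted
    postings list of docIDs, and per-document term frequencies.
    """
    # Step 1: one pass over the collection, assembling all term-docID pairs.
    # Tokenization is a naive lowercase whitespace split (an assumption).
    pairs = [
        (term, doc_id)
        for doc_id, text in docs.items()
        for term in text.lower().split()
    ]

    # Step 2: sort with the term as the dominant key and docID as the
    # secondary key; tuple ordering gives exactly this.
    pairs.sort()

    # Step 3: group the docIDs of each term into a postings list and
    # compute term and document frequency.
    index = {}
    for term, group in groupby(pairs, key=itemgetter(0)):
        tf = Counter(doc_id for _, doc_id in group)  # term frequency per doc
        postings = sorted(tf)                        # distinct docIDs, ascending
        index[term] = {"df": len(postings), "postings": postings, "tf": dict(tf)}
    return index
```

Because every pair is held in a single list and sorted in one call, this approach stops working once the pairs no longer fit in main memory, which is the situation the methods in this chapter address.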