We then introduce blocked
sort-based indexing (Section 4.2), an efficient single-machine algorithm designed
for static collections that can be viewed as a more scalable version of
the basic sort-based indexing algorithm we introduced in Chapter 1. Section
4.3 describes single-pass in-memory indexing, an algorithm that has
even better scaling properties because it does not hold the vocabulary in
memory. For very large collections like the web, indexing has to be distributed
over computer clusters with hundreds or thousands of machines.
We discuss this in Section 4.4. Collections with frequent changes require
dynamic indexing, introduced in Section 4.5, so that changes in the collection are
immediately reflected in the index. Finally, we cover some complicating issues
that can arise in indexing – such as security and indexes for ranked
retrieval – in Section 4.6.