the actual indexing time is usually dominated by the time it takes to parse the
documents (PARSENEXTBLOCK) and to do the final merge (MERGEBLOCKS).
Exercise 4.6 asks you to compute the total index construction time for RCV1,
counting these two steps as well as inverting the blocks and writing them to
disk.
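
To make the final merge step concrete, the following is a minimal sketch of a MERGEBLOCKS-style multiway merge, not the book's own implementation. It assumes each block is an already sorted run of (termID, docID) pairs; the in-memory lists stand in for block files on disk, and the name `merge_blocks` and the toy data are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_blocks(blocks):
    """Merge sorted runs of (term_id, doc_id) pairs into one index.

    Each element of `blocks` is assumed to be sorted by (term_id, doc_id),
    as it would be after a block has been inverted. Returns a dict that
    maps each term_id to its sorted postings list of doc_ids.
    """
    # heapq.merge performs a lazy k-way merge: it only looks at the
    # current head of each run, so the runs could be streamed from disk.
    merged = heapq.merge(*blocks)

    index = {}
    # Consecutive pairs with the same term_id form one postings list.
    for term_id, group in groupby(merged, key=itemgetter(0)):
        postings = []
        for _, doc_id in group:
            # Skip a docID that repeats for the same term.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
        index[term_id] = postings
    return index


# Hypothetical toy blocks, each sorted by (term_id, doc_id).
blocks = [
    [(1, 1), (1, 2), (2, 1)],
    [(1, 4), (3, 3)],
    [(2, 6), (3, 5), (3, 6)],
]
index = merge_blocks(blocks)
```

Because the merge reads every run strictly front to back, a disk-based version would need only a small read buffer per block plus a write buffer for the output, which is why this step is bounded by sequential disk transfer rather than by memory.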
Note that Reuters-RCV1 is not particularly large in an age when one or
more GB of memory is standard on personal computers. With appropriate
compression (Chapter 5), we could have built an inverted index for RCV1
in memory on a fairly modest server. The techniques we have described
are needed, however, for collections that are several orders of magnitude
larger.