Exercise 4.1
If we need T log2 T comparisons (where T is the number of termID–docID pairs) and
two disk seeks for each comparison, how much time would index construction for
Reuters-RCV1 take if we used disk instead of memory for storage and an unoptimized
sorting algorithm (i.e., not an external sorting algorithm)? Use the system
parameters in Table 4.1.
Exercise 4.2 [⋆]
How would you create the dictionary in blocked sort-based indexing on the fly to
avoid an extra pass through the data?
are needed, however, for collections that are several orders of magnitude
larger.