Notice that this analysis assumes that p is much smaller than n/c; that’s what
allows us to assume that each posting lies in its own interval. As p grows closer
to n/c, it becomes likely that some of the postings we want will lie in the same
intervals. However, notice that once p gets close to n/c, we need to read almost
all of the inverted list, so the skip pointers aren’t very helpful.
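To get a feel for when the "one posting per interval" assumption starts to break down, here is a minimal sketch. It assumes, purely for illustration, that the p postings we want fall independently and uniformly at random among the n/c intervals; real posting distributions need not behave this way. The function name `expected_intervals_touched` is hypothetical.

```python
def expected_intervals_touched(num_intervals, p):
    """Expected number of distinct intervals containing at least one of
    p postings, assuming each posting lands in an interval chosen
    uniformly at random (an illustrative assumption)."""
    miss = (1 - 1 / num_intervals) ** p  # chance a given interval is hit by no posting
    return num_intervals * (1 - miss)


if __name__ == "__main__":
    num_intervals = 1000  # stands in for n/c
    for p in (10, 100, 500, 1000):
        touched = expected_intervals_touched(num_intervals, p)
        print(f"p = {p:5d}: about {touched:.1f} distinct intervals")
```

Under this assumption, a small p touches almost exactly p intervals, while a p close to n/c touches noticeably fewer than p because many postings share an interval.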
Coming back to the formula, you can see that while a larger value of c makes
the first term smaller, it also makes the second term bigger. Therefore, picking the
perfect value for c depends on the value of p, and we don’t know what p is until
a query is executed. However, it is possible to use previous queries to simulate
skipping behavior and to get a good estimate for c. In the exercises, you will be
asked to plot some graphs of this formula and to solve for the equilibrium point.
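The trade-off can be sketched numerically. The code below assumes the formula under discussion has the form kn/c + pc/2, where n is the list length in bytes, k is the size of a skip pointer, c is the skip interval, and p is the number of postings sought; that form, the function names, and the sample values are assumptions made for illustration and should be checked against the formula as stated earlier.

```python
import math


def bytes_read(n, k, c, p):
    """Estimated bytes read under the assumed cost model:
    skip pointers (k*n/c) plus half of each of p intervals (p*c/2)."""
    return k * n / c + p * c / 2


def balanced_interval(n, k, p):
    """Interval c at which the two terms are equal, which is also the
    minimum of bytes_read: solve k*n/c = p*c/2 for c."""
    return math.sqrt(2 * k * n / p)


if __name__ == "__main__":
    n, k = 1_000_000, 4  # hypothetical list length and pointer size, in bytes
    for p in (10, 100, 1000):
        c = balanced_interval(n, k, p)
        print(f"p = {p:5d}: c = {c:8.1f}, bytes read = {bytes_read(n, k, c, p):10.1f}")
```

In this model the best interval shrinks as p grows, which is one way to see why no single value of c suits every query.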
Although it might seem that list skipping could save on disk accesses, in practice
it rarely does. Modern disks are much better at reading sequential data than
they are at skipping to random locations. Because of this, most disks require a skip
of about 100,000 postings before any speedup is seen. Even so, skipping is still
useful because it reduces the amount of time spent decoding compressed data that has
been read from disk, and it dramatically reduces processing time for lists that are
cached in memory.