Skip pointers do not improve the asymptotic running time of reading an inverted list. Suppose we have an inverted list that is n bytes long, but we add skip
pointers after each c bytes, and the pointers are k bytes long. Reading the entire
5.7 Query Processing 171
list requires reading (n) bytes, but jumping through the list using the skip pointers requires (kn/c) time, which is equivalent to (n). Even though there is
no asymptotic gain in run time, the factor of c can be huge. For typical values of
c = 100 and k = 4, skipping through a list results in reading just 2.5% of the
total data.
Notice that as c gets bigger, the amount of data you need to read to skip
through the list drops. So, why not make c as big as possible? The problem is that
if c gets too large, the average performance drops. Let’s look at this problem in
more detail.
Suppose you want to find p particular postings in an inverted list, and the list is
n bytes long, with k-byte skip pointers located at c-byte intervals. Therefore, there
are n/c total intervals in the list. To find those p postings, we need to read kn/c
bytes in skip pointers, but we also need to read data in p intervals. On average, we
assume that the postings we want are about halfway between two skip pointers,
so we read an additional pc/2 bytes to find those postings. The total number of
bytes read is then: