The SPIMI algorithm is shown in Figure 4.4. The part of the algorithm that
parses documents and turns them into a stream of term–docID pairs, which
we call tokens here, has been omitted. SPIMI-INVERT is called repeatedly on
the token stream until the entire collection has been processed.
Tokens are processed one by one (line 4) during each successive call of
SPIMI-INVERT. When a term occurs for the first time, it is added to the
dictionary (best implemented as a hash), and a new postings list is created
(line 6). The call in line 7 returns this postings list for subsequent occurrences
of the term.