In general, MapReduce breaks a large computing problem into smaller
parts by recasting it in terms of manipulation of key-value pairs. For indexing,
a key-value pair has the form (termID,docID). In distributed indexing,
the mapping from terms to termIDs is also distributed and therefore more
complex than in single-machine indexing. A simple solution is to maintain
a (perhaps precomputed) mapping for frequent terms that is copied to all
nodes and to use terms directly (instead of termIDs) for infrequent terms.
We do not address this problem here and assume that all nodes share a consistent
term → termID mapping.
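
The simple solution mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not a prescribed implementation: the table FREQUENT_TERM_IDS, its contents, and the function names term_key and map_document are hypothetical. Frequent terms are looked up in a table assumed to be replicated on every node; any other term is used directly as the key.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical precomputed table for frequent terms.
# Assumption: an identical copy is available on every node.
FREQUENT_TERM_IDS: Dict[str, int] = {"caesar": 0, "came": 1, "conquered": 2}

# A key is a termID for frequent terms, or the term itself otherwise.
Key = Union[int, str]


def term_key(term: str) -> Key:
    """Return the termID if the term is frequent, else the term itself."""
    return FREQUENT_TERM_IDS.get(term, term)


def map_document(doc_id: int, tokens: List[str]) -> Iterator[Tuple[Key, int]]:
    """Emit one (key, docID) pair per token of a document."""
    for token in tokens:
        yield (term_key(token), doc_id)


# "brutus" is not in the frequent-term table, so it is kept as a string key.
pairs = list(map_document(7, ["caesar", "came", "brutus"]))
```

Because every node consults the same table, the same term yields the same key regardless of which node processes the document.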