The distributed index construction method we describe in this section is an
application of MapReduce, a general architecture for distributed computing.
MapReduce is designed for large computer clusters. The point of a cluster is
to solve large computing problems on cheap commodity machines or nodes
that are built from standard parts (processor, memory, disk) as opposed to on
a supercomputer with specialized hardware. Although hundreds or thousands
of machines are available in such clusters, individual machines can
fail at any time. One requirement for robust distributed indexing is, therefore,
that we divide the work up into chunks that we can easily assign and
NODE – in case of failure – reassign. A master node directs the process of assigning
and reassigning tasks to individual worker nodes.