The MR frameworks provide a more sophisticated failure model
than parallel DBMSs. While both classes of systems use some form
of replication to deal with disk failures, MR is far more adept at
handling node failures during the execution of a MR computation.
In a MR system, if a unit of work (i.e., processing a block of data)
fails, then the MR scheduler can automatically restart the task on
an alternate node. Part of the flexibility is the result of the fact that
the output files of the Map phase are materialized locally instead of
being streamed to the nodes running the Reduce tasks. Similarly,
pipelines of MR jobs, such as the one described in Section 4.3.4,
materialize intermediate results to files each step of the way. This
differs from parallel DBMSs, which have larger granules of work
(i.e., transactions) that are restarted in the event of a failure. Part of
the reason for this approach is that DBMSs avoid saving intermediate results to disk whenever possible. Thus, if a single node fails
during a long running query in a DBMS, the entire query must be
completely restarted.