5.1.5 Execution Strategies
As noted earlier, the query planners in parallel DBMSs are careful to transfer data between nodes only when absolutely necessary.
This allows the systems to optimize the join algorithm depending
on the characteristics of the data and perform push-oriented messaging without writing intermediate data sets. Over time, MR advocates should study the techniques used in parallel DBMSs and
incorporate the concepts that are germane to their model. In doing
so, we again believe that the performance of MR frameworks will
improve dramatically.
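
To make the idea of choosing a join algorithm from the characteristics of the data concrete, the following Python fragment is a minimal sketch of such a decision. It is not the planner of any system discussed here: the names TableStats and choose_join_strategy, the three strategy labels, and the broadcast_limit threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical catalog entry: what a planner might know about a table."""
    name: str
    rows: int
    partition_key: str  # column the table is hash-partitioned on across nodes


def choose_join_strategy(left: TableStats, right: TableStats,
                         join_key: str, broadcast_limit: int = 100_000) -> str:
    """Pick the strategy that moves the least data between nodes.

    The threshold and the three labels are illustrative assumptions.
    """
    # Both inputs are already partitioned on the join key, so matching rows
    # live on the same node and no data needs to be transferred.
    if left.partition_key == join_key and right.partition_key == join_key:
        return "local"
    # One input is small: send a full copy of it to every node and leave
    # the large input where it is.
    if min(left.rows, right.rows) <= broadcast_limit:
        return "broadcast"
    # Otherwise rows must be re-hashed on the join key across the nodes.
    return "repartition"
```

Under these assumptions, two large tables that are both partitioned on the join column are joined in place, and data crosses the network only in the remaining two cases.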
Furthermore, parallel DBMSs construct a complete query plan
that is sent to all processing nodes at the start of the query. Because
data is “pushed” between sites only when necessary, there are no
control messages during processing. In contrast, MR systems use a
large number of control messages to synchronize processing, resulting in poorer performance due to increased overhead; Vertica also
experienced this problem but on a much smaller scale (Section 4.2).
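
The difference in control traffic can be illustrated with a toy counting model. The sketch below is an assumption-laden simplification, not a measurement of any system: it supposes that a push-based engine sends one plan message per node at query start, and that a pull-based engine has every consumer task issue one fetch request to every producer task. Both function names are hypothetical.

```python
def push_control_messages(num_nodes: int) -> int:
    """Toy model: the complete plan is sent once to each node at query start.

    After that, data is pushed to its destination without being requested.
    """
    return num_nodes


def pull_control_messages(num_producers: int, num_consumers: int) -> int:
    """Toy model: each consumer task sends one fetch request to each producer."""
    return num_producers * num_consumers
```

In this model the push-based count grows linearly with the number of nodes, whereas the pull-based count grows with the product of producer and consumer tasks.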