• We extend previous work [23] that showed the superior performance
of parallel databases relative to Hadoop. While this
previous work focused only on performance in an ideal setting,
we add fault tolerance and heterogeneous node experiments
to demonstrate some of the issues with scaling parallel
databases.
• We describe the design of a hybrid system intended to
yield the advantages of both parallel databases and MapReduce.
This system can also be used to allow single-node
databases to run in a shared-nothing environment.
• We evaluate this hybrid system on a previously published
benchmark to determine how close it comes to parallel
DBMSs in performance and to Hadoop in scalability.