For the 1TB/cluster data set experiments, Figure 5 shows that all
systems executed the task on twice as many nodes in nearly half the
amount of time, as one would expect since the total amount of data
was held constant across nodes for this experiment. Hadoop and
DBMS-X performs approximately the same, since Hadoop’s startup cost is amortized across the increased amount of data processing
for this experiment. However, the results clearly show that Vertica
outperforms both DBMS-X and Hadoop. We attribute this to Vertica’s aggressive use of data compression (see Section 5.1.3), which
becomes more effective as more data is stored per node.