The performance results for this task is displayed in Figure 9. We had to slightly change the SQL used in 100
node experiments for Vertica due to an optimizer bug in the system,
which is why there is an increase in the execution time for Vertica
going from 50 to 100 nodes. But even with this increase, it is clear
that this task results in the biggest performance difference between
Hadoop and the parallel database systems. The reason for this disparity is two-fold.