Another important reason why the parallel DBMSs are able
to outperform Hadoop is that both Vertica and DBMS-X use an index on the pageRank column and store the Rankings table already
sorted by pageRank. Thus, executing this query is trivial. It should
also be noted that although Vertica’s absolute times remain low, its
relative performance degrades as the number of nodes increases.
This is in spite of the fact that each node still executes the query in
the same amount of time (about 170 ms). Because the nodes finish executing the query so quickly, however, the system becomes flooded with
control messages from many nodes at once, which then take longer
for the system to process. Vertica uses a reliable message layer
for query dissemination and commit protocol processing [4], which
we believe has considerable overhead when more than a few dozen
nodes are involved in the query.
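
To illustrate why a Rankings table stored sorted by pageRank makes this kind of query cheap, the following is a minimal sketch in Python. It assumes the query is a range selection of the form "pageRank greater than some threshold"; the function names, the in-memory list of (pageRank, pageURL) tuples, and the sample rows are hypothetical and are not taken from either DBMS. With sorted data, a binary search finds the first qualifying row and everything after it is returned without inspecting the rest, whereas unsorted data requires examining every row.

```python
from bisect import bisect_right

def select_above_sorted(rows, threshold):
    """Range selection over rows sorted ascending by pageRank.

    rows is a list of (pageRank, pageURL) tuples (hypothetical layout).
    A binary search locates the first row with pageRank > threshold,
    so only the qualifying suffix is touched.
    """
    ranks = [rank for rank, _ in rows]  # in a real system this is the index
    start = bisect_right(ranks, threshold)
    return rows[start:]

def select_above_scan(rows, threshold):
    """Same selection without any ordering assumption: a full scan."""
    return [row for row in rows if row[0] > threshold]

# Hypothetical sample data, already sorted by pageRank.
rankings = [(1, "a.com"), (5, "b.com"), (5, "c.com"), (9, "d.com"), (12, "e.com")]

print(select_above_sorted(rankings, 5))
print(select_above_scan(rankings, 5))
```

Both functions return the same rows; the difference is that the sorted variant does O(log n) work to find the starting position, while the scan is O(n) regardless of how few rows qualify. The sketch rebuilds the list of ranks on every call only for brevity; a system with a persistent index would not.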