For each benchmark task, we describe the steps used to implement the MR program and provide the equivalent SQL statement(s) executed by the two database systems. We executed each
task three times and report the average of the trials. Each system executes the benchmark tasks separately to ensure exclusive access to
the cluster’s resources. To measure the basic performance without
the overhead of coordinating parallel tasks, we first execute each
task on a single node. We then execute the task on different cluster
sizes to show how each system scales as both the amount of data
processed and the available resources are increased. We report
results only from trials in which all nodes are available and the
system’s software operates correctly during the benchmark execution.
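
The measurement procedure above can be sketched as a small timing harness. This is a minimal illustration and not the harness used in the study: the function name `mean_runtime`, the `run_trial` callable, and the `max_attempts` limit are all hypothetical, and re-running a discarded trial until three valid ones are collected is an assumption made for the sketch.

```python
import statistics
import time
from typing import Callable


def mean_runtime(run_trial: Callable[[], bool],
                 trials: int = 3,
                 max_attempts: int = 10) -> float:
    """Return the mean wall-clock time (seconds) over `trials` valid runs.

    `run_trial` is a hypothetical callable that executes one benchmark task
    on one system and returns True only if all nodes stayed available and
    the software ran correctly. Invalid runs are discarded (assumption:
    they are simply repeated, up to `max_attempts` total runs).
    """
    durations = []
    attempts = 0
    while len(durations) < trials:
        if attempts >= max_attempts:
            raise RuntimeError("too many invalid trials")
        attempts += 1
        start = time.perf_counter()
        valid = run_trial()
        elapsed = time.perf_counter() - start
        if valid:
            # Only valid trials contribute to the reported average.
            durations.append(elapsed)
    return statistics.mean(durations)
```

In such a harness, `mean_runtime` would be called once per combination of system, task, and cluster size, with only one system running at a time so that each has exclusive access to the cluster.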