Our next task requires each system to calculate the total adRevenue generated for each sourceIP in the UserVisits table (20 GB/node),
grouped by the sourceIP column. We also ran a variant of this query
where we grouped by the seven-character prefix of the sourceIP column to measure the effect of reducing the total number of groups
on query performance. We designed this task to measure the performance of parallel analytics on a single read-only table, where
nodes need to exchange intermediate data with one another in order
to compute the final value. Regardless of the number of nodes in the
cluster, this task always produces 2.5 million records (53 MB); the
variant query produces 2,000 records (24 KB).
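
To make the data exchange described above concrete, the following is a minimal Python sketch of a two-phase aggregation: each node first sums adRevenue over its own partition, and the partial results are then merged into the final value. It is an illustration only, not the implementation used by any of the benchmarked systems. The function names (local_aggregate, merge_partials), the prefix_len parameter, and the sample rows are all hypothetical, and small in-memory lists stand in for each node's partition of UserVisits.

```python
from collections import defaultdict


def local_aggregate(rows, prefix_len=None):
    """Sum adRevenue per group key over one node's partition.

    rows is an iterable of (sourceIP, adRevenue) pairs. If prefix_len is
    given, rows are grouped by the first prefix_len characters of sourceIP
    (the variant query uses a seven-character prefix).
    """
    partial = defaultdict(float)
    for source_ip, ad_revenue in rows:
        key = source_ip if prefix_len is None else source_ip[:prefix_len]
        partial[key] += ad_revenue
    return dict(partial)


def merge_partials(partials):
    """Combine per-node partial sums into the final per-group totals.

    This step stands in for the exchange of intermediate data between nodes.
    """
    total = defaultdict(float)
    for partial in partials:
        for key, revenue in partial.items():
            total[key] += revenue
    return dict(total)


# Hypothetical sample partitions for two nodes (not benchmark data).
node_a = [("158.112.27.3", 1.5), ("158.112.27.3", 0.5), ("10.0.0.1", 2.0)]
node_b = [("158.112.99.8", 1.0), ("10.0.0.1", 3.0)]

# Main task: one group per distinct sourceIP.
full = merge_partials([local_aggregate(p) for p in (node_a, node_b)])

# Variant task: one group per seven-character sourceIP prefix.
by_prefix = merge_partials(
    [local_aggregate(p, prefix_len=7) for p in (node_a, node_b)]
)
```

With these sample rows, full contains three groups and by_prefix only two, since both 158.112.x.x addresses collapse into the single prefix 158.112. Fewer groups mean fewer partial results to exchange and merge, which is the effect the variant query is designed to measure.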