Unlike the previous tasks, the MR program
for this task consists of both a Map and Reduce function. The Map
function first splits the input value by the field delimiter, and then
outputs the sourceIP field (given as the input key) and the adRevenue field as a new key/value pair. For the variant query, only the
first seven characters (representing the first two octets, each stored
as three digits) of the sourceIP are used. These two Map functions
share the same Reduce function that simply adds together all of the
adRevenue values for each sourceIP (or prefix) and then outputs that key and its
revenue total. We also used MR’s Combine feature to pre-aggregate
the data before it is transmitted to the Reduce instances,
improving the first query’s execution time by a factor of two [8].
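
The Map, Combine, and Reduce steps described above can be sketched in a few lines of Python. This is only an illustrative, single-process simulation, not the Hadoop program used in the benchmark. The "|" delimiter, the position of adRevenue within the value, the sample records, and the helper names (map_full, map_prefix, reduce_sum, run_job) are all assumptions made for the example; only the seven-character prefix and the shared summing Reduce function come from the text.

```python
from collections import defaultdict

FIELD_DELIMITER = "|"   # assumption: the text does not name the delimiter
AD_REVENUE_INDEX = 2    # assumption: position of adRevenue within the value
PREFIX_LENGTH = 7       # from the text: two octets, three digits each, plus a dot


def map_full(source_ip, value):
    """Map for the first query: emit (sourceIP, adRevenue)."""
    fields = value.split(FIELD_DELIMITER)
    yield source_ip, float(fields[AD_REVENUE_INDEX])


def map_prefix(source_ip, value):
    """Map for the variant query: emit (first seven characters, adRevenue)."""
    fields = value.split(FIELD_DELIMITER)
    yield source_ip[:PREFIX_LENGTH], float(fields[AD_REVENUE_INDEX])


def reduce_sum(key, revenues):
    """Shared Reduce (also usable as the Combine step): sum revenue per key."""
    yield key, sum(revenues)


def run_job(splits, map_fn, use_combiner=True):
    """Simulate one MR job; each element of `splits` stands for one Map task."""
    shuffled = defaultdict(list)
    for split in splits:
        # Collect this Map task's output, grouped by key.
        local = defaultdict(list)
        for key, value in split:
            for out_key, out_value in map_fn(key, value):
                local[out_key].append(out_value)
        # Optionally pre-aggregate locally before the simulated shuffle.
        for out_key, values in local.items():
            if use_combiner:
                for ck, cv in reduce_sum(out_key, values):
                    shuffled[ck].append(cv)
            else:
                shuffled[out_key].extend(values)
    # Final aggregation in the Reduce phase.
    result = {}
    for key, values in shuffled.items():
        for rk, rv in reduce_sum(key, values):
            result[rk] = rv
    return result


# Hypothetical input records: (sourceIP, "destURL|visitDate|adRevenue").
SPLITS = [
    [("158.112.027.003", "url1|2000-01-01|1.5"),
     ("158.112.030.001", "url2|2000-01-02|2.5")],
    [("158.112.027.003", "url3|2000-01-03|4.0"),
     ("010.020.030.040", "url4|2000-01-04|0.25")],
]

if __name__ == "__main__":
    print(run_job(SPLITS, map_full))
    print(run_job(SPLITS, map_prefix))
```

Because addition is associative and commutative, the same function can serve as both the Combine and the Reduce step, and the result is the same with or without the combiner; the combiner only reduces how many key/value pairs cross the simulated shuffle.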