We assume that the input data arrives as a set of records. The records are sent to
the mapper, which transforms each record into key-value pairs.
The next step is the shuffle, which the library performs automatically. This
operation applies a hash function to each key so that all pairs with the same key
end up grouped together on the same machine. The final step is the reduce stage,
where the pairs are processed again, but this time in groups: all pairs that share
a key are processed together. The MapReduce steps are summarized in Figure 5.10.
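
To make the three stages concrete, here is a minimal single-process sketch in Python that counts words. It is an illustration under stated assumptions, not the API of any particular MapReduce library: the names `mapper`, `shuffle`, `reducer`, and `map_reduce` are hypothetical, the "machines" are simulated as dictionaries in a list, and `zlib.crc32` merely stands in for whatever hash function a real library would use.

```python
import zlib
from collections import defaultdict


def mapper(record):
    """Hypothetical mapper: emit a (word, 1) pair for every word in a line."""
    for word in record.split():
        yield (word.lower(), 1)


def shuffle(pairs, num_machines=2):
    """Route each pair to a simulated machine chosen by hashing its key.

    Equal keys hash to the same value, so all pairs with the same key
    land on the same machine, grouped into one list of values.
    """
    machines = [defaultdict(list) for _ in range(num_machines)]
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_machines
        machines[index][key].append(value)
    return machines


def reducer(key, values):
    """Hypothetical reducer: receives all values for one key together."""
    return (key, sum(values))


def map_reduce(records, num_machines=2):
    # Map stage: each record becomes a stream of key-value pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle stage: done by the "library", not by user code.
    machines = shuffle(pairs, num_machines)
    # Reduce stage: one call per key, on the machine that holds that key.
    results = {}
    for groups in machines:
        for key, values in groups.items():
            out_key, out_value = reducer(key, values)
            results[out_key] = out_value
    return results


records = ["the quick brown fox", "the lazy dog", "The fox"]
counts = map_reduce(records)
```

Only `mapper` and `reducer` contain problem-specific logic; the shuffle is generic, which is why a library can perform it without any user code.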