Map Reduce
Map Reduce is a distributed programming framework that focuses on data placement and distribution. As we saw in the last few examples, proper data placement
can make some problems very simple to compute. By focusing on data placement,
Map Reduce can unlock the parallelism in some common tasks and make it easier
to process large amounts of data.

Map Reduce gets its name from the two pieces of code that a user needs to
write in order to use the framework: the Mapper and the Reducer. The Map Reduce
library automatically launches many Mapper and Reducer tasks on a cluster of
machines. The interesting part about Map Reduce, though, is the path the data
takes between the Mapper and the Reducer.
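To make the two user-written pieces concrete, here is a minimal single-process sketch in Python of the classic word-count job. The names `mapper`, `reducer`, and `run_job` are hypothetical and do not belong to any Map Reduce library. `run_job` stands in for the framework: it runs the Mapper over every input line, groups the emitted values by key, and then calls the Reducer once per key, all on one machine.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Hypothetical user-written Mapper: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield (word.lower(), 1)


def reducer(word: str, counts: Iterable[int]) -> Iterator[tuple[str, int]]:
    """Hypothetical user-written Reducer: sum every count received for one word."""
    yield (word, sum(counts))


def run_job(lines: list[str]) -> dict[str, int]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    # Map phase: a real framework would run many Mapper tasks in parallel.
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            # Grouping step: every value with the same key ends up together.
            groups[key].append(value)
    # Reduce phase: one Reducer call per distinct key.
    result: dict[str, int] = {}
    for key, values in groups.items():
        for out_key, out_value in reducer(key, values):
            result[out_key] = out_value
    return result


print(run_job(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real cluster, the grouping step is where data moves between machines, which is the path between Mapper and Reducer that the text singles out; here it is only a dictionary.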

Before we look at how the Mapper and Reducer work, let’s examine the foundations of the Map Reduce idea. The functions map and reduce are commonly found
in functional languages. In very simple terms, the map function applies a function to each item in a list, producing another list of the same length. The reduce function combines the items of a list into a single item. The Map Reduce framework isn’t quite
so strict with its definitions: both Mappers and Reducers can return an arbitrary
number of items. However, the general idea is the same.
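The functional-language versions can be tried with Python's built-in `map` and `functools.reduce`. This is an illustrative sketch only: the input lists and the `split_words` helper are hypothetical, chosen to show that `map` preserves length, `reduce` produces one value, and a Map Reduce style Mapper is free to emit any number of items per input.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: a list of n items becomes another list of n items.
squares = list(map(lambda x: x * x, numbers))
print(squares)  # [1, 4, 9, 16]

# reduce: a list of items collapses into a single item.
total = reduce(lambda acc, x: acc + x, squares, 0)
print(total)  # 30


def split_words(line: str) -> list[str]:
    """Hypothetical Mapper-style function: may return zero or many items."""
    return line.split()


# Three input lines yield four output items, so the length is not preserved.
lines = ["map reduce", "", "is fun"]
words = [word for line in lines for word in split_words(line)]
print(words)  # ['map', 'reduce', 'is', 'fun']
```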