The input data format in the MapReduce framework is application-specific:
it is specified by the user [20] and is well suited to semi-structured
or unstructured data.
MapReduce's output is a set of ⟨key, value⟩ pairs. The name
"MapReduce" reflects the fact that users specify an algorithm
using two kernel functions: "Map" and "Reduce".
The Map function is applied to
the input data and produces a list of intermediate ⟨key, value⟩
pairs; the Reduce function then merges all intermediate values
associated with the same intermediate key [19], [20].
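The two kernel functions can be illustrated with the classic word-count job. The sketch below is a minimal, single-process simulation of the Map, shuffle, and Reduce phases; the function names (map_fn, reduce_fn, run_mapreduce) are illustrative and do not belong to any particular MapReduce API.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit one intermediate <word, 1> pair per word in the input line."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values sharing the same key."""
    return word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by their intermediate key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce phase: one reduce call per distinct intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

result = run_mapreduce(
    [(0, "to be or not to be"), (1, "to be")],
    map_fn, reduce_fn)
print(result)  # {'be': 3, 'not': 1, 'or': 1, 'to': 3}
```

In a real cluster, the shuffle phase is distributed: intermediate pairs are partitioned by key and routed over the network to the reducer responsible for that key.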
In
a Hadoop cluster, a job (i.e., a MapReduce program [11]) is executed
by breaking it down into units of work called
tasks. When a node in the Hadoop cluster receives a job, it can
split the job and run its tasks in parallel on other nodes [12].
Here the data-locality problem is handled by the JobTracker,
which communicates with the NameNode to learn where the input
blocks reside, so that tasks are assigned to DataNodes close to their data.
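This locality-aware placement can be sketched as follows. The scheduler (playing the JobTracker's role) consults the replica locations of a task's input block (as reported by the NameNode) and prefers a free worker that already hosts the block; the names and data structures here are illustrative assumptions, not Hadoop's actual API.

```python
def place_task(block_replicas, free_workers):
    """Return a worker for the task, preferring a data-local one."""
    local = [w for w in free_workers if w in block_replicas]
    # Fall back to any free worker when no replica holder is idle;
    # the task's input must then be fetched over the network.
    return local[0] if local else free_workers[0]

# Block B1 is replicated on nodes n2 and n5; n2 is free, so it is chosen.
print(place_task({"n2", "n5"}, ["n1", "n2", "n3"]))  # n2
```

Scheduling tasks where their data already lives avoids moving large inputs across the network, which is the main motivation behind this design.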
Note that this
processing in the form of ⟨key, value⟩ pairs is not a limitation, even for
computations that do not, at first glance, seem expressible in a
MapReduce manner.
Indeed, MapReduce has been successfully
used for RDF/RDFS and OWL reasoning [21,22] and for structured
data querying [23].