MapReduce. Originally introduced by Google to solve the problem of building its web search index [12], MapReduce is nowadays the main programming model, and associated implementation, for processing and generating large datasets [19]. The input data format in the MapReduce framework is application-specific, is specified by the user [20], and is suitable for semi-structured or unstructured data. MapReduce's output is a set of key/value pairs. The name "MapReduce" expresses the fact that users specify an algorithm using two kernel functions: "Map" and "Reduce". The Map function is applied to the input data and produces a list of intermediate key/value pairs, and the Reduce function merges all intermediate values associated with the same intermediate key [19,20].
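To make the two kernel functions concrete, the sketch below follows the classic word-count example from the Hadoop documentation (class and job names here are illustrative, not taken from the cited works): the Map function emits an intermediate (word, 1) pair for every token of its input split, and the Reduce function sums the values associated with each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each token of the input split, emit the intermediate pair (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: merge all intermediate values sharing the same key by summing them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer also serves as a combiner: sums are computed locally
    // on each mapper node before intermediate data crosses the network.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```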
In a Hadoop cluster, a job (i.e., a MapReduce program [11]) is executed by breaking it down into pieces called tasks. When a node in the Hadoop cluster receives a job, it is able to divide it and run it in parallel over other nodes [12]. Here the data-location problem is solved by the JobTracker, which communicates with the NameNode so that tasks are assigned to DataNodes close to the data they process.
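To illustrate the placement metadata this scheduling relies on, the following sketch (our own illustration, not code from the cited papers) uses the standard HDFS client API to ask the NameNode which DataNodes host each block of a file; the JobTracker consults the same kind of information when placing map tasks near their input splits.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // args[0] is the HDFS path of an input file.
    FileStatus status = fs.getFileStatus(new Path(args[0]));

    // The NameNode answers with, for each block, the set of DataNodes
    // holding a replica; a data-local scheduler matches map tasks
    // against these hosts.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
  }
}
```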
Let us note that this processing in the form of key/value pairs is not a limitation, even for computations that, at first glance, do not seem feasible in a MapReduce manner. Indeed, MapReduce has been successfully used in RDF/RDFS and OWL reasoning [21,22] and in structured data querying [23].