The first benchmark task (the “Grep task”) requires each system
to scan through a data set of 100-byte records looking for a
three-character pattern. This is the only task that requires processing
largely unstructured data, and was originally included in the
benchmark by the authors of [23] since the same task was included
in the original MapReduce paper [8].
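The core of the Grep task is a fixed-width scan with a substring filter. The sketch below is a minimal, single-process illustration. It assumes the records are stored back to back with no delimiters and that a match must fall entirely inside one record. The function name `grep_records` and the pattern `XYZ` are hypothetical and are not taken from [23] or [8]. The sketch shows only the per-record filter that every benchmarked system has to perform. It does not show how any of those systems partitions or parallelizes the scan.

```python
from typing import Iterator

RECORD_SIZE = 100  # each record is 100 bytes, per the benchmark description


def grep_records(data: bytes, pattern: bytes) -> Iterator[bytes]:
    """Yield every fixed-width record in `data` that contains `pattern`.

    Hypothetical helper: assumes records are laid out contiguously with no
    separators, and ignores any trailing partial record.
    """
    if len(pattern) != 3:
        raise ValueError("the Grep task searches for a three-character pattern")
    # Step through the buffer one whole record at a time. Each 100-byte slice
    # is searched independently, so a pattern split across two records is
    # (by assumption) not a match.
    for offset in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
        record = data[offset:offset + RECORD_SIZE]
        if pattern in record:
            yield record


if __name__ == "__main__":
    rec_a = b"a" * 50 + b"XYZ" + b"a" * 47  # match in the middle
    rec_b = b"b" * 100                       # no match
    rec_c = b"XYZ" + b"c" * 97               # match at the start
    matches = list(grep_records(rec_a + rec_b + rec_c, b"XYZ"))
    print(len(matches))  # 2
```

Because every record has the same width, the scan never has to parse delimiters. A distributed implementation could therefore split the input at any multiple of 100 bytes and run this same filter on each split.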
To explore more complex uses of the benchmarked systems, the
benchmark includes four more analytical tasks related to log-file
analysis and HTML document processing. Three of these tasks operate
on structured data; the final task operates on both structured
and unstructured data.
The data sets used by these four tasks include a UserVisits table
meant to model log files of HTTP server traffic, a Documents table
containing 600,000 randomly generated HTML documents, and a
Rankings table that contains some metadata calculated over the data
in the Documents table. The schema of the tables in the benchmark
data set is described in detail in [23]. In summary, the UserVisits
table contains 9 attributes, the largest of which is destinationURL,
which is of type VARCHAR(100). Each tuple is on the order of 150
bytes wide. The Documents table contains two attributes: