3) TeraSort: This is probably the most well-known
Hadoop benchmark. It sorts a large number of
100-byte records and exercises both the MapReduce
and HDFS layers of Hadoop. It represents the true
workload of Hadoop as it requires substantial networking,
computation, and storage I/O. It comprises three phases,
namely generation, sorting, and validation. TeraGen is used to
create the data. TeraSort sorts the data
and writes the sorted output to HDFS. Finally, TeraValidate
checks that the output is correctly sorted.
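
As a minimal sketch of how the three phases are typically chained, the following Python helper builds the command lines for TeraGen, TeraSort, and TeraValidate. The program names teragen, terasort, and teravalidate are those shipped in the Hadoop MapReduce examples jar; the jar file name, the HDFS directory layout, and the helper function names are assumptions made for illustration and will differ between installations. The row count follows from the 100-byte record size.

```python
import shlex

RECORD_BYTES = 100  # each generated row is one 100-byte record


def rows_for_size(total_bytes: int) -> int:
    """Number of 100-byte rows needed to produce about total_bytes of input."""
    return total_bytes // RECORD_BYTES


def terasort_commands(examples_jar: str, total_bytes: int, base_dir: str):
    """Build the generation, sorting, and validation commands, in run order.

    examples_jar and base_dir are hypothetical values supplied by the caller;
    the three sub-directory names below are an arbitrary convention.
    """
    rows = rows_for_size(total_bytes)
    gen_dir = f"{base_dir}/teragen"          # TeraGen output, TeraSort input
    sort_dir = f"{base_dir}/terasort"        # TeraSort output
    report_dir = f"{base_dir}/teravalidate"  # TeraValidate report
    prefix = ["hadoop", "jar", examples_jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],
        prefix + ["terasort", gen_dir, sort_dir],
        prefix + ["teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Assumed jar name and HDFS path; adjust to the local installation.
    jar = "hadoop-mapreduce-examples.jar"
    for cmd in terasort_commands(jar, 10**9, "/benchmarks/demo"):
        print(shlex.join(cmd))
```

The helper only prints the commands, so that they can be inspected before being run on a cluster, for example by passing each list to subprocess.run.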