To understand the system’s performance characteristics over its entire design space, we need to vary both
compute and disk resources to determine the point where
these resources are balanced and yield the best performance.
These experiments thus quantify the performance effect of
adding more nodes to a cluster. Figure 9 plots the performance (execution time) of a benchmark on the Y-axis,
with the results grouped on the X-axis by the number of disks used;
within each group, each bar corresponds to the number of CPU cores
used for the experiment. The key trend is that execution time is
lowest when the most resources are used (768 cores and all 64 disks). This
shows that Hadoop MapReduce jobs execute faster when run on a larger cluster.
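The experiment behind Figure 9 amounts to a grid sweep over (cores, disks) configurations, followed by grouping the results by disk count and picking the fastest configuration. The following is a minimal sketch under stated assumptions: `run_benchmark(cores, disks)` is a hypothetical caller-supplied function that runs the Hadoop job on a cluster with that configuration and returns its execution time in seconds, and the helper names `sweep`, `group_by_disks`, and `fastest` are illustrative rather than taken from the original experiments.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int]  # (cores, disks)


def sweep(cores_options: Iterable[int],
          disk_options: Iterable[int],
          run_benchmark: Callable[[int, int], float]) -> Dict[Config, float]:
    """Run the (hypothetical) benchmark once per (cores, disks) pair.

    Returns a mapping from each configuration to its execution time.
    """
    return {(c, d): run_benchmark(c, d)
            for c, d in product(cores_options, disk_options)}


def group_by_disks(results: Dict[Config, float]) -> Dict[int, Dict[int, float]]:
    """Cluster results by disk count (the X-axis groups in Figure 9).

    Within each disk group, entries are keyed by core count (one bar each),
    inserted in ascending order of disks, then cores.
    """
    groups: Dict[int, Dict[int, float]] = {}
    ordered = sorted(results.items(), key=lambda kv: (kv[0][1], kv[0][0]))
    for (cores, disks), seconds in ordered:
        groups.setdefault(disks, {})[cores] = seconds
    return groups


def fastest(results: Dict[Config, float]) -> Config:
    """Return the (cores, disks) configuration with the lowest execution time."""
    return min(results, key=results.get)
```

Keeping the sweep separate from the analysis lets the same grouping and selection logic be reused whether the timings come from live runs or from previously recorded logs.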