The MapReduce framework is one of the most important parts of big data processing. In earlier versions of MapReduce, the mechanisms were designed to address the basic needs of processing and resource management. More recently, it has evolved into MapReduce 2/YARN, which provides improved features and functionality with Hive.
In this research work, we focused on performance modelling and prediction of Hadoop MapReduce systems, the most popular framework for large-scale data processing. We developed the capability to evaluate application performance in hypothetical MapReduce systems using simulation. Compared to the traditional build-and-measure approach, our simulation-based evaluation is faster, cheaper, and more flexible. Although real experiments must still be conducted before full commitment, simulation-based evaluation can serve as an intermediate step that reveals obvious flaws and helps system designers better understand the performance characteristics of their applications and of the MapReduce system.
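To illustrate the general idea of evaluating a hypothetical configuration by simulation rather than by building and measuring it, the following is a minimal sketch and not the simulator developed in this work. It makes strong simplifying assumptions: task durations are given as inputs, tasks are assigned greedily to whichever slot becomes free first, and the reduce phase starts only after all map tasks have finished (real Hadoop overlaps the shuffle with the map phase). The names `simulate_phase` and `simulate_job` are hypothetical.

```python
import heapq


def simulate_phase(task_durations, num_slots, start_time=0.0):
    """Return the finish time of a phase under greedy slot assignment.

    Assumption: each task runs on the slot that becomes free earliest.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    # Min-heap holding the time at which each slot becomes free.
    slots = [start_time] * num_slots
    heapq.heapify(slots)
    finish = start_time
    for duration in task_durations:
        free_at = heapq.heappop(slots)
        end = free_at + duration
        heapq.heappush(slots, end)
        finish = max(finish, end)
    return finish


def simulate_job(map_durations, reduce_durations, map_slots, reduce_slots):
    """Estimate job completion time: map phase, then reduce phase.

    Assumption: reducers start only after every map task has finished.
    """
    map_end = simulate_phase(map_durations, map_slots)
    return simulate_phase(reduce_durations, reduce_slots, start_time=map_end)


if __name__ == "__main__":
    maps = [10.0] * 8      # eight map tasks, 10 time units each (made-up values)
    reduces = [5.0] * 2    # two reduce tasks, 5 time units each (made-up values)
    for slots in (4, 8):
        total = simulate_job(maps, reduces, map_slots=slots, reduce_slots=2)
        print(f"map_slots={slots}: estimated completion time {total}")
```

Changing the slot counts or the duration lists lets a designer compare hypothetical cluster configurations in this toy model without running a real job; the durations here are illustrative values, not measurements.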