Big data refers to data sets so large and complex that traditional systems and traditional data-warehousing tools cannot process them. Big data is generated by machines, by humans, and by nature. With the growth of technologies and services, such data is produced in structured, semi-structured, and unstructured forms from many different sources. Big data can neither be queried with traditional SQL-like queries nor stored in a relational database management system (RDBMS), so a wide variety of scalable database tools and techniques have evolved. Hadoop, an open-source distributed data processing system, is one of the most prominent and well-known solutions. NoSQL systems such as MongoDB and Amazon's DynamoDB have gained prominence as non-relational databases, as illustrated below.
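As a brief illustration of why NoSQL stores suit semi-structured data, the following is a minimal sketch using the official MongoDB Java driver: a document with nested and list-valued fields is inserted without any fixed schema, something a relational table cannot accommodate directly. The connection string, database name ("bigdata_demo"), and collection name ("posts") are illustrative assumptions, not part of any real deployment.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import java.util.Arrays;

public class NoSqlExample {
    public static void main(String[] args) {
        // Assumption: a MongoDB instance is running locally on the default port.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("bigdata_demo");        // hypothetical database
            MongoCollection<Document> posts = db.getCollection("posts");  // hypothetical collection

            // A semi-structured record: fields need not match any fixed schema,
            // unlike a row in a relational table with a predeclared set of columns.
            Document post = new Document("user", "alice")
                    .append("text", "Hello, big data!")
                    .append("tags", Arrays.asList("hadoop", "nosql"))
                    .append("media", new Document("type", "image")
                                         .append("url", "http://example.com/photo.jpg"));
            posts.insertOne(post);
        }
    }
}
```

A second document inserted into the same collection could carry entirely different fields; the store imposes no schema, which is what makes such systems a natural fit for the heterogeneous data described above.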
The need for big data technologies first arose at large companies such as Google and Facebook, which must analyze enormous volumes of mostly unstructured data. Such data is very difficult to process: it comprises billions of records about millions of people, drawn from the web, social media, images, audio, and so on. The paper is organized as follows: after this introduction, we discuss the characteristics of big data (the 5Vs), followed by a descriptive note on the various components of big data processing based on the Hadoop framework. Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware; it was developed by Doug Cutting and Mike Cafarella in 2005. Its programming model is sketched below.
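To make Hadoop's programming model concrete, the following is a minimal sketch of the canonical word-count job written against the org.apache.hadoop.mapreduce API: the mapper emits a count of 1 for every word in its input split, and the reducer sums the counts for each word across the cluster. The input and output paths are assumed to be HDFS directories supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the partial counts for each word across all mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The framework handles the splitting of input data, the scheduling of map and reduce tasks across commodity nodes, and recovery from node failures; the programmer supplies only the two functions above, which is what makes the model scale across clusters.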