As data volumes grow across the industry, new techniques and
approaches must be adopted. This paper focuses on the
unstructured aspects of data analytics and reviews, as a case
study, some of the key projects in Apache Hadoop. The
paper also describes the fundamentals of relational database
management systems (RDBMS) and their use for traditional
large data sets in data warehousing, decision support, and
analytics. The paper then reviews non-relational big data
approaches such as distributed/shared-nothing architectures,
horizontal scaling, key/value stores, and eventual
consistency. This part of the paper differentiates between
structured and unstructured data. The paper describes
various building blocks and techniques of MapReduce,
HDFS, and HBase, and their implementation in the
open-source Hadoop framework. The paper then discusses
infrastructure planning (compute, network, and storage
systems) and reviews Hadoop design criteria and
implementation considerations. Hadoop comprises many technologies,
including MapReduce, that interact with these infrastructure
elements during data analysis. The paper reviews
performance considerations and describes relevant
benchmarks for a Hadoop analytics cluster. In conclusion,
the paper reviews alternatives to hosting an analytics
cluster in a public cloud.