distributed file system designed to run on top of the local file systems of the cluster nodes and
to store extremely large files suited to streaming data access. HDFS is highly fault-tolerant and can scale up from a single server to thousands of machines, e