HDFS, the Hadoop Distributed File System, is responsible for storing data on the cluster
● Data files are split into blocks and distributed across multiple nodes in the cluster
● Each block is replicated multiple times
○ Default is to replicate each block three times
○ Replicas are stored on different nodes
○ This provides reliability (a block survives the failure of a node) and availability (the data can still be read while a node is down), as illustrated in the sketch below
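As an illustration of the block and replica layout described above, the following sketch uses the standard Hadoop Java FileSystem API to print a file's replication factor, block size, and the DataNodes holding each block. It assumes a configured Hadoop client (core-site.xml / hdfs-site.xml on the classpath); the path /user/example/data.txt is a placeholder, not a path from this material.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS and other settings from the client configuration
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; replace with a real file in HDFS
        Path file = new Path("/user/example/data.txt");
        FileStatus status = fs.getFileStatus(file);

        // Replication factor and block size are per-file attributes
        System.out.println("Replication factor: " + status.getReplication());
        System.out.println("Block size (bytes): " + status.getBlockSize());

        // One BlockLocation per block, listing the nodes that hold a replica
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(", ", block.getHosts()));
        }

        fs.close();
    }
}

With the default settings described above, each block should list three distinct hosts, since every block is replicated three times on different nodes.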