The main benefit of RDDs is an efficient mechanism for fault
recovery. Traditional main-memory databases support fine-grained
updates to tables and replicate writes across the network for fault
tolerance, which is expensive on large commodity clusters. In contrast,
RDDs restrict the programming interface to coarse-grained
deterministic operators that affect multiple data items at once