In addition to language integration, another key benefit of using
RDDs as the data structure for operators is the execution engine integration.
This common abstraction allows machine learning computations
and SQL queries to share workers and cached data without
the overhead of data movement.
Because SQL query processing is implemented using RDDs, lineage
is kept for the whole pipeline, which enables end-to-end fault
tolerance for the entire workflow. If failures occur during the machine
learning stage, partitions on faulty nodes will automatically
be recomputed based on their lineage.