The cache action leaves the dataset lazy, but hints that it should be kept in memory after the first time it is computed, because it will be reused.
The save action evaluates the dataset and writes it to a distributed filesystem such as HDFS. The saved version is used in future operations on it.
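The difference between the two actions can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's implementation or API: the class `LazyDataset`, its `collect` method, and the `compute_count` counter are hypothetical names introduced here, and a local temporary file stands in for a distributed filesystem such as HDFS.

```python
import os
import tempfile


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument function that yields the elements
        self._cache_hint = False   # set by cache(); no work is done at that point
        self._cached = None        # filled in on the first evaluation if hinted
        self._saved_path = None    # set by save() once the data has been written
        self.compute_count = 0     # how many times the dataset was recomputed

    def cache(self):
        # Lazy: only record the hint that the result should be kept in memory.
        self._cache_hint = True
        return self

    def save(self, path):
        # Eager: evaluate the dataset now and write one element per line.
        data = self.collect()
        with open(path, "w") as f:
            for item in data:
                f.write(f"{item}\n")
        self._saved_path = path
        return self

    def collect(self):
        # Later operations prefer the saved version, then the in-memory copy.
        if self._saved_path is not None:
            with open(self._saved_path) as f:
                return [line.rstrip("\n") for line in f]
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = list(self._compute())
        if self._cache_hint:
            self._cached = data
        return data


# cache(): nothing runs until the first use; the second use reuses the result.
words = LazyDataset(lambda: ["spark", "mesos"]).cache()
words.collect()
words.collect()

# save(): evaluation happens immediately; later reads come from the file.
out_path = os.path.join(tempfile.mkdtemp(), "words.txt")
saved = LazyDataset(lambda: ["spark", "mesos"]).save(out_path)
saved.collect()
```

In this sketch, calling `cache()` alone leaves `compute_count` at zero, whereas `save()` forces one evaluation right away. The toy model always honors the cache hint; a real system is free to treat it only as a hint.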