Sparkling Water combines two open-source technologies: Apache Spark and H2O, a machine learning engine. It makes H2O’s library of advanced algorithms, including Deep Learning, GLM, GBM, KMeans, PCA, and Random Forest, accessible from Spark workflows. Spark users can pick the best features of either platform to meet their machine learning needs. They can combine Spark’s RDD API and Spark MLlib with H2O’s machine learning algorithms, or they can use H2O independently of Spark to build models and then post-process the results in Spark.
Sparkling Water integrates H2O’s framework and data structures transparently into Spark’s RDD-based environment. H2O shares Spark’s execution space, and its data structures are exposed through an RDD-like API.