• Data Integration Capability
– Apache Sqoop: a tool designed for transferring data from
a relational database directly into HDFS or into Hive
[12,18]. It analyzes the schema of the tables to be
imported and automatically generates the classes needed
to load their rows into HDFS; the table contents are then
read by a parallel MapReduce job;
– Apache Flume: a distributed, reliable, and available service
for efficiently collecting, aggregating, and moving large
amounts of log data. It is designed to import streaming
data flows [12,27].
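
To make the parallel read performed by Sqoop more concrete, the sketch below shows one way a table could be divided among map tasks: take the minimum and maximum of a numeric split column and cut that range into contiguous slices, one per mapper. This is a simplified illustration, not Sqoop's actual code; the function names `split_ranges` and `where_clauses`, the column name `id`, and the assumption of an integer key are all hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer key range [lo, hi] into contiguous,
    non-overlapping slices, at most one per mapper (illustrative only)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1
    # Never create more slices than there are key values.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)

    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` slices receive one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Render each slice as the SQL predicate a single mapper would use."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]


if __name__ == "__main__":
    # Hypothetical table whose key `id` spans 1..100, read by 4 mappers.
    for clause in where_clauses("id", split_ranges(1, 100, 4)):
        print(clause)
```

Each predicate selects a disjoint slice of the table, so the mappers can read their slices independently and write separate output files. A skewed key distribution would leave some mappers with far more rows than others, which is why the choice of split column matters in practice.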