In our work, we only extended the Hadoop Distributed File System (HDFS) to support the customized data warehouse placement policy. We did not address other aspects such as varying the block size, which remains at the default of 64 MB, or the block replication policy, which keeps the default replication factor of 3.
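As a rough illustration of what such a customized placement can look like, the standalone Java sketch below assigns all blocks that belong to the same warehouse fragment group (for instance, a fact partition together with the dimension fragments it joins with) to the same set of datanodes, while keeping the replication factor at the default of 3. The class, the node names, and the grouping key are hypothetical, and the sketch does not implement the actual HDFS block placement interface; it only conveys the co-location idea under that assumption.

    import java.util.ArrayList;
    import java.util.List;

    // Standalone sketch of a co-locating placement rule (not the HDFS API).
    public class ColocationPlacementSketch {
        private final List<String> datanodes;       // hypothetical datanode names
        private static final int REPLICATION = 3;   // default HDFS replication factor

        public ColocationPlacementSketch(List<String> datanodes) {
            this.datanodes = datanodes;
        }

        // All blocks sharing the same groupKey are mapped to the same targets,
        // so related warehouse fragments end up on the same nodes.
        public List<String> chooseTargets(String groupKey) {
            int start = Math.abs(groupKey.hashCode() % datanodes.size());
            List<String> targets = new ArrayList<String>();
            for (int i = 0; i < REPLICATION && i < datanodes.size(); i++) {
                targets.add(datanodes.get((start + i) % datanodes.size()));
            }
            return targets;
        }
    }

Because the mapping is deterministic in the group key, every block of a group lands on the same nodes, which is the property a MapReduce job can exploit to read related data locally.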
Our goal is to study the plain query performance gains obtained through careful data warehouse organization in the context of parallelization with MapReduce.
On top of our extended Hadoop version, we used Apache Hive (version 0.10.0, released on January 14, 2013), a data warehouse software that facilitates querying and managing large datasets residing in distributed storage. Hive provides an SQL-like language called HiveQL and also supports creating cubes [4].
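For concreteness, the sketch below shows how a cube aggregation could be submitted to Hive 0.10 over its JDBC interface using the WITH CUBE clause of HiveQL; the host, port, database, and the star schema names (sales_fact, d_year, c_nation, lo_revenue) are placeholders rather than the schema used in our experiments.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveCubeExample {
        public static void main(String[] args) throws Exception {
            // Hive 0.10 ships the HiveServer1 JDBC driver.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // WITH CUBE (available since Hive 0.10) aggregates over every
            // combination of the grouping columns.
            ResultSet rs = stmt.executeQuery(
                    "SELECT d_year, c_nation, SUM(lo_revenue) AS revenue "
                    + "FROM sales_fact GROUP BY d_year, c_nation WITH CUBE");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t"
                        + rs.getString(2) + "\t" + rs.getLong(3));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }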
In the next section, we discuss the experimental details and the evaluation procedure.