In this paper, we studied instantiating multidimensional data warehouses using
NoSQL column-oriented systems. We proposed three approaches at the column-oriented
logical level. Using a simple formalism that separates structures from values,
we described mappings from the conceptual level (described using a multidimensional conceptual schema) to the logical level (described using NoSQL column-oriented
logical schemas).
Our experimental work illustrates the instantiation of a data warehouse with each
of our three approaches. Each model has its own weaknesses and strengths. The shattered
model (MLC2) uses less disk space, but it answers queries inefficiently because
most of them require joins in this model. The simple models MLC0 and
MLC1 do not show significant performance differences. Converting from one model
to another is shown to be easy and comparable in time to “data loading from scratch”.
One conversion is considerably more time-consuming: merging data from multiple
tables (MLC2) into a single table. Interesting results were also obtained
when computing the OLAP cuboid lattice using the column-oriented models,
and the computation times are reasonable for a big data framework.
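
To make the notion of a cuboid lattice concrete, the following minimal Python sketch enumerates the cuboids obtained from a set of dimensions. It is an illustration only, not the procedure used in the experiments: the dimension names are hypothetical, hierarchies within dimensions are ignored, and each cuboid is simply identified by the subset of dimensions it groups on, so n dimensions yield 2^n cuboids.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Group every cuboid (subset of dimensions) by its number of dimensions.

    Level len(dimensions) holds the most detailed cuboid; level 0 holds the
    fully aggregated one, identified by the empty tuple.
    """
    dims = tuple(dimensions)
    return {k: list(combinations(dims, k)) for k in range(len(dims), -1, -1)}


# Hypothetical dimension names, chosen only for illustration.
lattice = cuboid_lattice(["customer", "date", "part"])

for level, cuboids in lattice.items():
    print(level, cuboids)
```

Each cuboid at level k can be derived from a cuboid at level k+1 by aggregating away one dimension, which is why the lattice is usually computed top-down from the most detailed level.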
For future work, we will consider logical models in alternative NoSQL architectures,
i.e. document-oriented models as well as graph-oriented models. Moreover,
after exploring data warehouse instantiation across different NoSQL systems, we need
to generalize across all these logical models. We need a simple formalism to express
model differences and we need to compare models within each paradigm and across
paradigms (e.g. document versus column). Finally, we intend to study other query
language frameworks such as PIG or PHOENIX and compare them with Hive.