3.3. Adding Big Data capability to an existing information
system
An entire book could be written on this topic; indeed, this is what [3] has done in its study of data warehousing in the age of Big Data. A number of strategies for this integration are presented in Table 1. The first step of the integration concerns data acquisition: since traditional databases deal only with structured data, the existing ecosystem needs to be extended across all data types and domains.
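To make this acquisition step concrete, the following minimal Python sketch (our illustration, not drawn from [3]; names such as normalize_record and the sample sources are hypothetical) shows how structured, semi-structured and unstructured records could be wrapped in a common envelope before being landed in the extended store:

import csv, json, io
from datetime import datetime, timezone

def normalize_record(source, kind, payload):
    # Wrap any incoming record in a common envelope so the
    # acquisition layer can land every data type in one store.
    return {
        "source": source,   # originating system
        "kind": kind,       # structured | semi-structured | unstructured
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }

# Structured: a row exported from the relational database.
row = next(csv.DictReader(io.StringIO("id,amount\n42,19.90")))
# Semi-structured: a JSON document from a web service.
doc = json.loads('{"user": "alice", "tags": ["a", "b"]}')
# Unstructured: a raw application log line.
log = "2024-01-01T12:00:00Z ERROR payment gateway timeout"

for rec in (normalize_record("erp", "structured", row),
            normalize_record("api", "semi-structured", doc),
            normalize_record("app", "unstructured", log)):
    print(json.dumps(rec))  # in practice: appended to an HDFS file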
Next, the data integration capability needs to deal with velocity and frequency. The challenge here also concerns the ever-growing volume and, because many technologies leverage Hadoop, calls for tools that interact with Hadoop in a bidirectional manner: loading and storing data (HDFS), and processing the output (MapReduce) so that it can be reused for further processing.
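A minimal Hadoop Streaming sketch in Python (our illustration; file names and paths are hypothetical) makes this loop tangible. The mapper reads raw lines already loaded into HDFS and emits (word, 1) pairs:

#!/usr/bin/env python3
# mapper.py -- emits one (word, 1) pair per token on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

The accompanying reducer sums the counts per word, relying on Hadoop having sorted the mapper output by key:

#!/usr/bin/env python3
# reducer.py -- sums counts for each run of identical keys.
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")

Data would first be loaded into HDFS (e.g. with hdfs dfs -put), the job launched through the Hadoop Streaming jar with these scripts as -mapper and -reducer, and the resulting output directory either read back or passed as -input to a subsequent job; this is precisely the bidirectional load/store and process/reuse cycle described above.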
[14, page 12] reminds us that the main challenge is not to build a system "that is ideally suited for all processing tasks" but to have an underlying architecture flexible enough to let the processes built on top of it work at their full potential. There is certainly no commonly agreed solution: an infrastructure is intimately tied to the purpose of the organization in which it is used, and consequently to the kind of integration performed (real-time or batch). Further important questions must also be answered, for example: is Big Data stored in a timely manner or not [4]?
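To illustrate the real-time versus batch alternative (a sketch of ours under assumed interfaces, not a prescription from [4] or [14]), the two ingestors below implement the same contract, so the surrounding architecture remains flexible enough to swap one integration style for the other:

import time
from abc import ABC, abstractmethod

class Ingestor(ABC):
    # Common contract: the same pipeline can be wired to either style.
    @abstractmethod
    def ingest(self, record: dict) -> None: ...

class BatchIngestor(Ingestor):
    # Buffers records and lands them in bulk (e.g. a nightly HDFS load).
    def __init__(self, batch_size=1000):
        self.batch_size, self.buffer = batch_size, []
    def ingest(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()
    def flush(self):
        # Stand-in for a bulk load such as hdfs dfs -put.
        print(f"bulk-loading {len(self.buffer)} records")
        self.buffer.clear()

class StreamIngestor(Ingestor):
    # Forwards each record as it arrives (e.g. to a message broker).
    def ingest(self, record):
        print(f"streaming record at {time.time():.0f}: {record}")

if __name__ == "__main__":
    ingestor = StreamIngestor()  # or BatchIngestor() for batch loads
    ingestor.ingest({"event": "page_view", "user": 42})

Choosing between the two implementations is exactly the organizational decision discussed above: the interface stays fixed while the integration style varies with the purpose of the organization.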