data that must be targeted to understand, plan and act in a
predictive way. This perspective raises new questions about
the quality of the data. In this context, there is no consensus
on the definition of quality: the quality of data may refer to
their high level of processing or to their relevance with respect
to the reality they represent. In fact, since Big Data is big and
messy, its challenges can be classified into engineering tasks
(managing data at an unimaginable scale) and semantics (finding
and meaningfully combining information that is relevant to your
needs). The authors of [36] have each identified a relevant
challenge for Big Data:
1. the meaningful data integration challenge, which can be
seen as a five-step challenge: (1) define the problem to
solve, (2) identify the relevant pieces of data in Big Data,
(3) extract, transform and load (ETL) them into appropriate
formats and store them for processing, (4) disambiguate them,
and (5) solve the problem.
2. the Billion Triple Challenge, which aims to process large-scale
RDF data in order to provide a full description of each entity
in the triples in a single target vocabulary and to link that
entity to the corresponding sources.
3. the Linked Open Data (LOD) Ripper, which aims to provide good
use cases for LOD and to link LOD efficiently with non-LOD
data.
4. the value of using semantics in data integration and in
the design of future DBMSs.
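The five steps of the meaningful data integration challenge, combined with the Billion Triple Challenge's goal of describing each entity in a single target vocabulary linked to its sources, can be illustrated with a toy Python sketch. It is not the method of [36]: the source names (`crm.csv`, `web.json`), the `ex:` vocabulary, the field mappings, the helper names (`entity_id`, `etl`, `solve`) and the name-normalization heuristic used for disambiguation are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical raw records from two heterogeneous sources.
SOURCES = {
    "crm.csv": [
        {"name": "ACME Corp.", "city": "Paris", "last_visit": "2021-03-01"},
        {"name": "Globex Inc", "city": "Lyon"},
    ],
    "web.json": [
        {"company": "acme corporation", "location": "paris", "employees": 120},
    ],
}

# Hypothetical mapping from each source's field names to one target vocabulary ("ex:").
FIELD_MAP = {
    "crm.csv": {"name": "ex:name", "city": "ex:city"},
    "web.json": {"company": "ex:name", "location": "ex:city", "employees": "ex:employees"},
}

# Step 1: define the problem, here "which entities are located in Paris?".
PROBLEM = ("ex:city", "paris")


def entity_id(name):
    """Step 4: disambiguate surface forms of a name into one identifier.
    Crude heuristic (an assumption): lowercase, keep only alphanumerics and
    spaces, then drop one common company suffix."""
    key = "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()
    for suffix in (" corporation", " corp", " inc"):
        if key.endswith(suffix):
            key = key[: -len(suffix)].strip()
            break
    return "ex:" + key.replace(" ", "_")


def etl(sources, field_map):
    """Steps 2-3: keep only the mapped (relevant) fields, rewrite each record as
    (subject, predicate, object) triples in the target vocabulary, and record
    which sources describe each entity (provenance)."""
    triples = set()
    provenance = defaultdict(set)
    for source, records in sources.items():
        mapping = field_map[source]
        name_field = next(f for f, p in mapping.items() if p == "ex:name")
        for record in records:
            subject = entity_id(record[name_field])
            provenance[subject].add(source)
            for field, value in record.items():
                if field not in mapping:
                    continue  # field irrelevant to the problem: dropped
                obj = value.strip().lower() if isinstance(value, str) else value
                triples.add((subject, mapping[field], obj))
    return triples, provenance


def solve(triples, predicate, value):
    """Step 5: answer the problem over the integrated triples."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


triples, provenance = etl(SOURCES, FIELD_MAP)
print(solve(triples, *PROBLEM))       # ['ex:acme']
print(sorted(provenance["ex:acme"]))  # ['crm.csv', 'web.json']
```

Merging the two ACME records depends entirely on the naive `entity_id` heuristic, which is precisely the disambiguation step that makes the challenge hard; a real pipeline would rely on proper entity resolution and store the triples as RDF.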
Similar challenges have been identified by S. Auer and J.
Lehmann [37]. Unlike [36], [37] proposes solutions for some of
these challenges (data integration, scalable reasoning, etc.).
Semantics could be considered a magic solution for bridging
the gap created by the heterogeneity of data. Moreover, semantics
can be used in a decidable system, which makes it possible to
detect inconsistencies in the data, to generate new knowledge
using an inference engine, or simply to link more accurately
specific data that are not relevant for machine-learning-based
techniques. Several works in the literature address the
challenges mentioned above. Before presenting them, we must
note that the relation between Big Data and semantics is bidirectional.
While it is true that Big Data leverages semantics,