1. the meaningful data integration challenge, which can be
seen as a five-step process: (1) define the problem to
solve, (2) identify the relevant pieces of data in Big Data,
(3) extract, transform, and load (ETL) them into appropriate
formats and store them for processing, (4) disambiguate them,
and (5) solve the problem.
2. the Billion Triple Challenge, which aims to process large-scale
RDF to provide a full description of each entity of the
triple in a single target vocabulary and to link that entity
to the corresponding sources.
3. the Linked Open Data (LOD) Ripper, for providing good use
cases for LOD and for being able to link LOD with non-LOD
data efficiently.
4. the value of the use of semantics in data integration and
in the design of future DBMS.
Similar challenges have been identified by S. Auer and J.
Lehmann [37]. Unlike [36], [37] proposes solutions for some of
these challenges (data integration, scalable reasoning, etc.).
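
As a rough illustration of the goal stated for the Billion Triple Challenge (item 2 above), the following minimal sketch merges triples about the same entity into a single target vocabulary while keeping track of their sources. The quad layout, the `VOCAB_MAP` table, the example identifiers, and the `describe_entities` helper are all assumptions made for illustration; they are not taken from [36] or [37], and a realistic solution would also need entity resolution and a scalable store.

```python
from collections import defaultdict

# Hypothetical mapping from source predicates to one target vocabulary.
VOCAB_MAP = {
    "foaf:name": "schema:name",
    "rdfs:label": "schema:name",
    "dbo:birthPlace": "schema:birthPlace",
}


def describe_entities(quads, vocab_map=VOCAB_MAP):
    """Group (subject, predicate, object, source) quads by subject.

    Predicates are rewritten into the target vocabulary when a mapping
    exists and are kept unchanged otherwise.
    """
    descriptions = defaultdict(
        lambda: {"properties": defaultdict(set), "sources": set()}
    )
    for subject, predicate, obj, source in quads:
        target = vocab_map.get(predicate, predicate)
        descriptions[subject]["properties"][target].add(obj)
        descriptions[subject]["sources"].add(source)
    return descriptions


# Illustrative input: the same entity described by two sources.
quads = [
    ("ex:Ada", "foaf:name", "Ada Lovelace", "http://example.org/a"),
    ("ex:Ada", "rdfs:label", "Ada Lovelace", "http://example.org/b"),
    ("ex:Ada", "dbo:birthPlace", "ex:London", "http://example.org/b"),
]
result = describe_entities(quads)
```

Here the two name predicates collapse into one `schema:name` property, and the entity description keeps both source identifiers, which is the "link that entity to the corresponding sources" part of the challenge.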
Semantics could be seen as an almost magical way to bridge
the gap created by the heterogeneity of data. Moreover, semantics
can be used in a decidable system, which makes it possible to
detect inconsistencies in data, generate new knowledge using an
inference engine, or simply link more accurately specific data
that are not relevant for machine-learning-based techniques. In
the literature, we can find work that addresses the challenges
mentioned above. Before presenting it, we must note that the
relation between Big Data and semantics is bidirectional.
Just as Big Data leverages semantics,
some semantic tasks are optimized by using tools designed
for large dataset processing, especially the MapReduce framework.
Moreover, in the articles cited in the following lines, the term
Big Data is rarely mentioned explicitly; it can be hidden behind
terms like “web scale/web-scale” or “large scale/large-scale”
[21,38–41] to express the volume feature, “real-time” or “dynamic”
[42,43] to express velocity, and “informal/informality”,
“natural language”, “unstructured” or “data streams” [44–47]
to state the variety/variability feature. Alternatively, Big Data
can be experienced through Linked Data: it has volume, variety,
and veracity features, and we can thus assume that other
characteristics are under control [13].
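
The two uses of semantics mentioned above, generating new knowledge by inference and detecting inconsistent data, can be sketched on a toy scale as follows. This is a minimal sketch under stated assumptions: it handles only subclass propagation and class disjointness, the function names `infer_types` and `find_inconsistencies` and all `ex:` identifiers are hypothetical, and it is not the method of any work cited here. A real system would rely on a dedicated reasoner.

```python
def infer_types(subclass_of, types):
    """Propagate (entity, class) facts along subclass edges to a fixpoint."""
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        for entity, cls in list(inferred):
            for sub, sup in subclass_of:
                if sub == cls and (entity, sup) not in inferred:
                    inferred.add((entity, sup))
                    changed = True
    return inferred


def find_inconsistencies(inferred, disjoint):
    """Return (entity, class_a, class_b) for entities typed with disjoint classes."""
    by_entity = {}
    for entity, cls in inferred:
        by_entity.setdefault(entity, set()).add(cls)
    clashes = []
    for entity, classes in sorted(by_entity.items()):
        for a, b in disjoint:
            if a in classes and b in classes:
                clashes.append((entity, a, b))
    return clashes


# Illustrative schema and data (assumed, not from the cited work).
subclass_of = {("ex:Student", "ex:Person"), ("ex:Person", "ex:Agent")}
disjoint = [("ex:Person", "ex:Organization")]
types = {
    ("ex:alice", "ex:Student"),
    ("ex:alice", "ex:Organization"),
    ("ex:bob", "ex:Student"),
}

inferred = infer_types(subclass_of, types)
clashes = find_inconsistencies(inferred, disjoint)
```

In this example the inference step derives that `ex:bob` is an `ex:Agent`, a fact absent from the input, while the consistency check flags `ex:alice` only after inference has made her an `ex:Person`, which shows why the two tasks are usually run together.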