4.2.3. Information extraction
One intuitive way to perform this task is to provide
hand-written regular expressions (REs), as in [59,60]. The results are promising, but the number of manually written REs
(165 REs for a 9-concept ontology [59]) makes the approach hard to maintain. Moreover, this approach does not focus on scalability, unlike
[61,40], who propose an RE pattern-based tool named OnTeA.
OnTeA takes advantage of Hadoop MapReduce to scale. Increasingly,
automatic approaches have been proposed, as is the case
with KNOWITALL [62] and TextRunner. The former uses predefined
patterns and rule templates to populate classes in a given ontology. Though automatic, KNOWITALL does not scale: a web document is processed several times for pattern matching,
many web queries are issued to assign a probability to a
concept, etc. TextRunner, which implements the new
extraction paradigm of Open Information Extraction (OIE), was therefore
introduced. In OIE, extraction is not limited to a fixed set of triples;
the aim is to extract all of them [8,47]. More recently, following
REVERB, [63] present OLLIE. Unlike REVERB, OLLIE can extract relations not mediated by verbs and, in certain cases, can provide
the context of a relation (e.g., “If he wins five key states, Romney will be elected President.” → the winning of key states determines the election fact).
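
To make the pattern-based idea concrete, the following is a minimal sketch of how hand-written REs and a generic "<class> such as <instances>" template could populate ontology concepts. It is not the implementation of [59,60], OnTeA, or KNOWITALL: the concept names, the patterns, and the helpers `such_as_pattern` and `extract` are all assumptions made for illustration.

```python
import re

# Hypothetical hand-written patterns, one list per ontology concept.
# These are illustrative only and are not taken from [59,60].
CONCEPT_PATTERNS = {
    "Date": [re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")],
    "Email": [re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)")],
}

# Separators allowed between instances in an enumeration.
SEPARATORS = r", and |, | and "


def such_as_pattern(plural_label):
    """Build a generic '<class> such as A, B and C' template (assumed form)."""
    return re.compile(
        rf"\b{re.escape(plural_label)} such as "
        r"([A-Z]\w+(?:(?:" + SEPARATORS + r")[A-Z]\w+)*)"
    )


def extract(text, concept_patterns, class_labels):
    """Return a dict mapping each concept to the set of instances found."""
    found = {}
    # 1) Concept-specific hand-written REs.
    for concept, patterns in concept_patterns.items():
        for pattern in patterns:
            for match in pattern.finditer(text):
                found.setdefault(concept, set()).add(match.group(1))
    # 2) Generic template instantiated once per class label.
    for concept, plural_label in class_labels.items():
        for match in such_as_pattern(plural_label).finditer(text):
            for name in re.split(SEPARATORS, match.group(1)):
                found.setdefault(concept, set()).add(name)
    return found


if __name__ == "__main__":
    sample = (
        "We visited cities such as Paris, London and Berlin on 2014-03-01. "
        "Contact: alice@example.org."
    )
    print(extract(sample, CONCEPT_PATTERNS, {"City": "cities"}))
```

Even in this toy setting, each new concept needs its own patterns, which hints at why a 165-RE rule set becomes hard to maintain. The sketch also scans the text once per pattern, a small-scale version of the repeated document processing noted for KNOWITALL.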