Kulkarni and Caragea (2009) propose a
Concept Extractor and Relationship Identifier (CERI)
system to bridge the gap between the current
Web and the Semantic Web. The Concept
Extractor (CE) component is relevant to our work.
Like Gabrilovich and Markovitch (2007), CE
exploits the vast amount of information found on
the Web but, in contrast, does not rely on a
knowledge base like Wikipedia. Instead, the authors
utilise the power of existing search engines to
collect a set of documents relevant to a set of
queries that are based on the user query. Then they
use PageRank (Page et al.,
1999) in combination with the document
frequencies to find the most representative
documents w.r.t. the user query. Based on these
documents, they extract a set of concepts.
However, in contrast to our approach, they do not
extract a set of terms from the documents
themselves but rely on meta information being
available, more specifically the meta keywords and
the titles of the Web pages. It is unclear how
vulnerable this approach is to ambiguous words.
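
To make the reliance on meta information concrete, the following is a minimal sketch of how candidate concepts could be read from the title and meta keywords of a single Web page. It is not the CERI implementation: the names MetaConceptParser and extract_candidates are hypothetical, the search and PageRank-based ranking steps are omitted, and lower-casing and comma-splitting the keywords are our own assumptions.

```python
from html.parser import HTMLParser


class MetaConceptParser(HTMLParser):
    """Collects the <title> text and <meta name="keywords"> entries of one page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Assumption: keywords are comma-separated; normalise to lower case.
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect all of them.
        if self._in_title:
            self.title_parts.append(data)


def extract_candidates(html):
    """Return (title, keywords) for one page; both may be empty."""
    parser = MetaConceptParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    return title, parser.keywords


page = (
    '<html><head><title>Jaguar  Cars</title>'
    '<meta name="Keywords" content="Jaguar, cars , luxury"></head>'
    '<body>Body text is ignored.</body></html>'
)
title, keywords = extract_candidates(page)
```

A page without a title or meta keywords yields an empty result, and a keyword such as "jaguar" carries no indication of the intended sense, which illustrates the two limitations noted above.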