In view of today’s information avalanche, well-structured knowledge bases play an
important role in simplifying access to knowledge and its further processing. In
the biomedical domain, research results holding important information are hidden in
publications or online forums in the form of unstructured free text. Extracting
relational information and storing it in machine-readable form is therefore crucial to
advancing scientific research.
In this work we introduce a system providing convenient access to knowledge about
environmental and behavioral factors involved in human diseases, as well as the body
parts these diseases affect and the symptoms they cause. The system automatically
extracts relations between these entities from textual Web sources.
Our knowledge base is bootstrapped by integrating entities from hand-crafted and
well-organized sources such as MeSH, OMIM, and UMLS. Because these sources contain few
relationships between different types of biomedical entities, the system employs flexible and
robust pattern learning and constraint-based reasoning methods to automatically extract
new relational facts from textual sources, which are then added to the knowledge
base.
The result is a semantic graph of typed entities and relations between diseases, their
symptoms, affected body parts, and determining factors, with emphasis on behavioral
and environmental factors, including molecular determinants. The facts stored in our
knowledge base are presented to the user through a Web browser interface.
We validated our approach on four data sets on diseases and their factors,
obtained from different sources. With our approach, we were able to ac