Parallel mining
large linked datasets
Linked Data has become a vast repository with billions of triples available in thousands of datasets.
One of the challenges in integrating, querying, and reusing Linked Data is obtaining the ontology to which the
datasets conform. Although many ontologies are built manually, many RDF (Resource Description
Framework) datasets are still published without any prescribed schema. In this study, we propose a parallel
ontology mining approach. Ontology axioms are obtained through statistical measures by running
SPARQL queries. To improve efficiency, large Linked Data is divided into blocks based on the connectivity
of property graphs. The mining process is then executed on parallel computing units. The division method
ensures that the mining results from the parallel computing units are complete and correct. Evaluations
are performed on two kinds of DBpedia datasets, namely the Mapping-based Dataset, which has an ontology, and
the Raw Infobox Dataset, which has none. The results show the effectiveness and efficiency of our approach.
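
The divide-then-mine idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this study and rests on several assumptions: two properties are treated as connected when they share a subject or object, in-memory set operations stand in for the SPARQL queries, a thread pool stands in for the parallel computing units, and the only axiom type mined is a sub-property candidate scored by a simple confidence ratio. All function names and the 0.8 threshold are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_property_connectivity(triples):
    """Group triples into blocks so that properties sharing a resource share a block.

    Assumption: "connected" means two properties touch a common subject or object.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_property = {}  # resource -> first property seen touching it
    for s, p, o in triples:
        find(p)  # register the property even if it stays isolated
        for resource in (s, o):
            if resource in first_property:
                union(p, first_property[resource])
            else:
                first_property[resource] = p

    blocks = defaultdict(list)
    for s, p, o in triples:
        blocks[find(p)].append((s, p, o))
    return list(blocks.values())


def mine_block(block, min_confidence=0.8):
    """Propose 'p subPropertyOf q' when most (s, o) pairs of p also occur with q."""
    pairs = defaultdict(set)
    for s, p, o in block:
        pairs[p].add((s, o))
    axioms = []
    for p, p_pairs in pairs.items():
        for q, q_pairs in pairs.items():
            if p != q:
                confidence = len(p_pairs & q_pairs) / len(p_pairs)
                if confidence >= min_confidence:
                    axioms.append((p, "subPropertyOf", q, confidence))
    return axioms


def mine_parallel(triples, workers=4):
    """Mine every block independently and merge the per-block results."""
    blocks = partition_by_property_connectivity(triples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(mine_block, blocks)
    return sorted(axiom for block_axioms in results for axiom in block_axioms)
```

Under these assumptions, properties in different blocks share no resource, so their confidence is zero and no candidate axiom spans two blocks. Mining the blocks separately therefore returns the same axioms as mining the undivided dataset, which is the sense in which such a division can be complete and correct for this particular measure.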