Design and implementation of an information retrieval system based on ontology
Lachtar Nadia
Preparatory School for sciences and techniques
Annaba, Algeria nadia_ishak2002@yahoo.fr
Abstract-The volume of resources available on the web is growing significantly. The web therefore holds a large amount of information, but its content is poorly controlled. In this immense data warehouse, current information retrieval systems do not return results that exactly meet users' needs. This is due in large part to the indexing techniques in use (keywords, thesauri). As a result, web users waste much of their time examining a large number of web pages to find what they need, because the web offers no service to help with this task. The Semantic Web is a response to this problem: this new vision of the web aims to make web resources understandable not only by humans but also by machines. To improve the relevance of information retrieval, we propose in this paper an approach that uses a domain ontology to index a collection of documents and uses semantic links between the documents of the collection to infer all relevant documents. The work involves implementing a system that uses OWL ontologies to search pedagogical documents. In this setting, the descriptors are chosen not directly from the documents but from the ontology, and documents are indexed by concepts that reflect their meaning rather than by words, which are often ambiguous. To support meaning-based search, documents and their descriptors are stored in OWL ontologies that describe the documentary features of each document. The objective is to design two types of OWL ontologies: a document ontology, reserved for storing all pedagogical documents, and a domain ontology, reserved for structuring the documents stored in the document ontology. Each document is indexed by its keywords and their synonyms.
Keywords-Pedagogical document; information retrieval; ontology; semantic web; indexing
I. INTRODUCTION
Information retrieval (IR) is a long-established discipline that dates back to the 1950s. Its central problem is to satisfy a user's information need, expressed as a query over a set of documents called the corpus or collection [14, 12]. Information retrieval systems (IRS) automate the IR task. Evaluating such systems is a necessity, and this evaluation rests on the concept of relevance. To improve the relevance of IR in IRS, studies have been carried out at various levels, and several IR models have been proposed:
The Boolean model: Boolean queries are composed of words and Boolean operators (AND, OR, NOT). Documentalists have more control over this type of query, which uninitiated users often find difficult to formulate. It is the type of query most used for access to specialized databases (Pascal), and it is also available in many web search engines, such as Google and Yahoo, through their advanced search interfaces.
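As an illustration of how a Boolean query can be evaluated, the following is a minimal sketch, assuming a toy collection and a simple inverted index. The names `docs`, `build_inverted_index` and `postings` are hypothetical and are not part of the system described in this paper. Each operator maps to a set operation on the document identifiers associated with each word.

```python
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def postings(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Return the ids of the documents containing `word` (empty set if none)."""
    return index.get(word.lower(), set())


# Toy collection, assumed for illustration only.
docs = {
    "d1": "ontology based information retrieval",
    "d2": "boolean information retrieval model",
    "d3": "semantic web ontology",
}
index = build_inverted_index(docs)
all_ids = set(docs)

# ontology AND retrieval: set intersection
and_result = postings(index, "ontology") & postings(index, "retrieval")

# ontology OR boolean: set union
or_result = postings(index, "ontology") | postings(index, "boolean")

# information AND NOT ontology: intersection with the complement
not_result = postings(index, "information") & (all_ids - postings(index, "ontology"))

print(and_result, or_result, not_result)
```

In this sketch a document either satisfies the query or does not, so the result is an unordered set with no ranking among the matching documents.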
The vector model [11]: in this model, documents and queries are represented as vectors in the space of the indexing words. The documents are then ranked by their similarity to the query. Several measures (scalar product, Dice measure, Jaccard measure, ...) are used to compute this similarity, which reflects how close the document vector and the query vector are.
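The measures named above can be sketched as follows. This is a minimal illustration, assuming raw term frequencies as vector weights (the weighting scheme is an assumption, as are the toy collection and the helper names `to_vector` and `rank`). The Dice and Jaccard measures are written in their usual weighted-vector forms, 2(u·v) / (|u|² + |v|²) and (u·v) / (|u|² + |v|² - u·v).

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def to_vector(text: str) -> Counter:
    """Represent a text as a term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def scalar_product(u: Counter, v: Counter) -> float:
    """Sum of the products of the weights of the words shared by u and v."""
    return float(sum(weight * v[term] for term, weight in u.items()))


def squared_norm(u: Counter) -> float:
    """Sum of the squared weights of a vector."""
    return float(sum(weight * weight for weight in u.values()))


def dice(u: Counter, v: Counter) -> float:
    """Dice measure: 2(u.v) / (|u|^2 + |v|^2)."""
    denom = squared_norm(u) + squared_norm(v)
    return 2.0 * scalar_product(u, v) / denom if denom else 0.0


def jaccard(u: Counter, v: Counter) -> float:
    """Jaccard measure: (u.v) / (|u|^2 + |v|^2 - u.v)."""
    sp = scalar_product(u, v)
    denom = squared_norm(u) + squared_norm(v) - sp
    return sp / denom if denom else 0.0


def rank(query: str, docs: Dict[str, str],
         measure: Callable[[Counter, Counter], float]) -> List[Tuple[str, float]]:
    """Order the documents by decreasing similarity to the query."""
    q = to_vector(query)
    scored = [(doc_id, measure(q, to_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection, assumed for illustration only.
docs = {
    "d1": "ontology based information retrieval",
    "d2": "boolean information retrieval model",
    "d3": "semantic web ontology",
}

for doc_id, score in rank("ontology retrieval", docs, dice):
    print(doc_id, round(score, 3))
```

Unlike the Boolean model, every document receives a score, so partially matching documents (here those sharing only one word with the query) still appear in the ranking, below the document that shares both.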