Search engines are the practical application of information retrieval techniques to large-scale text collections. They appear in many different applications, such as desktop search and enterprise search, and come in a number of configurations that reflect the applications they are designed for. Web search engines such as Google and Yahoo! must be able to capture very large amounts of data and then provide subsecond response times to millions of queries submitted every day from all over the world. Enterprise search engines, for example Autonomy, must be able to process the large variety of information sources in a company and use company-specific knowledge as part of search and related tasks, such as data mining. Desktop search engines, such as the Microsoft Vista search feature, must be able to rapidly incorporate new documents, web pages, and email as the person creates or looks at them, as well as provide an intuitive interface for searching the information.
A digital library is a machine-readable representation of materials that might be found in a conventional library. Along with this representation, organizing information is also available to assist users in finding specific information. As described in [1], digital libraries, as a means of knowledge transmission in an electronic medium, represent a significant paradigm shift away from obtaining information from paper books, and their major characteristics are: a variety of digital information resources; a reduced need for physical space; access for remote users; facilities that let users build their own personal collections; access to distributed information resources; the ability for many users to share the same information resource at the same time; a paradigm shift in both use and ownership; the ability to handle multilingual content; and collection development based on potential usefulness, with appropriate filtering mechanisms to cope with the problem of plenty. Searching for information sources in a digital library therefore applies the same concept as an enterprise search engine: it must process and retrieve a variety of documents, in this case academic theses.
Traditional search engines do not make use of any domain knowledge, so they understand neither the meaning of a user's search request nor the inherent relations among the terms that a Web document contains. This severely limits their ability to perform content-based search.
The main problem in search engine implementation is the lack of semantics. One aspect is polysemy, where a single term has several meanings. The converse problem is that conventional search engines, which match query terms against a keyword-based index, fail to match relevant information when the keywords used in the query differ from those used in the index, in spite of having the same meaning.
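The vocabulary mismatch described above can be illustrated with a minimal sketch of a keyword-based inverted index. The function names, the two sample documents, and the conjunctive (AND) matching rule are assumptions made for illustration only; they do not describe any particular search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Build a keyword-based inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return the documents that contain every query term literally."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical documents: both are about the same subject,
# but d1 says "automobile" where d2 says "car".
docs = {
    "d1": "automobile engine repair manual",
    "d2": "car engine maintenance guide",
}
index = build_index(docs)

# Only d2 is returned; d1 is missed although "automobile" means "car".
print(keyword_search(index, "car engine"))  # {'d2'}
```

Because the index stores only surface strings, nothing connects "car" to "automobile". Closing that gap requires some external knowledge about term meaning, which is the role a domain ontology is intended to play.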
The second problem is lack of context. Many search engines fail to take into consideration aspects of the user's context to help disambiguate their queries.
The third problem is presentation of results. The results returned from a conventional search engine are usually presented to the user as a simple ranked list.
In this paper, we propose an ontology-based retrieval model that exploits complete domain ontologies and knowledge bases to support semantic search in a digital library. The search system takes advantage of both the detailed instance-level knowledge available in the knowledge base (KB) and topic taxonomies for classification. To manage large-scale information sources, we propose an adaptation of the classic vector-space model to an ontology-based representation, upon which a ranking algorithm is defined. The scope of our work is a digital library for academic theses. The domain ontology is based upon the ACM topic taxonomies.
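To give a rough idea of what a vector-space model over ontology concepts might look like, the following sketch represents each document and the query as a sparse vector of concept weights and ranks documents by cosine similarity. This is only an illustration under stated assumptions: the concept names, the weights, and the use of plain cosine similarity are hypothetical and are not the weighting scheme or ranking algorithm of the proposed model.

```python
import math


def cosine(query_vec, doc_vec):
    """Cosine similarity between two sparse vectors (dicts: concept -> weight)."""
    dot = sum(w * doc_vec.get(concept, 0.0) for concept, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical theses annotated with ontology concepts instead of keywords.
# Concept names and weights are made up for this example.
doc_vecs = {
    "thesis_a": {"InformationRetrieval": 0.8, "Ontology": 0.6},
    "thesis_b": {"ComputerGraphics": 1.0},
    "thesis_c": {"Ontology": 1.0},
}
query_vec = {"Ontology": 1.0}

ranking = rank(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

The only change from the classic keyword model in this sketch is that the vector dimensions are concepts rather than terms, so two documents that use different words for the same concept can still share a dimension.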