5.2. Indexing module
Apache Lucene was adopted for indexing and document
storage; it also provides the search interface for querying the index and the reading
interface for retrieving texts and documents. The fundamental
concepts in Apache Lucene are index, document, field and
term [27]. An index contains a sequence of documents,
each of which is a sequence of fields. Each field is tokenized,
producing terms, which are pairs of a field name and a text token.
The index stores its input in a data structure called an inverted
index, making efficient use of disk space while allowing
quick keyword lookups. The KMS provides a user interface
for document insertion, allowing users to collect
documents in a single repository. The indexing step is fully
transparent to the user (fig. 9).
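
The relationship between documents, fields, terms and the inverted index described above can be illustrated with a minimal sketch. This is not Lucene's implementation and does not use its API: the class `InvertedIndex`, its methods and the sample field names are hypothetical, and the tokenizer is assumed to be simple lowercasing and splitting on word characters.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory index (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> set of ids of the documents containing it,
        # where a term is a (field name, token) pair
        self.postings = defaultdict(set)
        # stored documents; the position in the list is the document id
        self.documents = []

    def add_document(self, fields):
        """Store a document given as a {field name: text} mapping."""
        doc_id = len(self.documents)
        self.documents.append(fields)
        for name, text in fields.items():
            # assumed tokenizer: lowercase, split on word characters
            for token in re.findall(r"\w+", text.lower()):
                self.postings[(name, token)].add(doc_id)
        return doc_id

    def search(self, field, keyword):
        """Return the sorted ids of documents whose field contains keyword."""
        return sorted(self.postings.get((field, keyword.lower()), set()))


index = InvertedIndex()
index.add_document({"title": "Indexing module",
                    "body": "Terms are stored in an inverted index"})
index.add_document({"title": "Search interface",
                    "body": "Keyword lookups are quick"})
print(index.search("body", "inverted"))  # [0]
```

A keyword lookup here is a single dictionary access on the term rather than a scan over every stored text, which is the property that makes the inverted index suitable for quick searches.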