5.1. Pre-processing module
Pre-processing is a necessary step in document
management: it extracts the data and information stored in
documents of a specific format by analyzing and tokenizing
their content. Organizations generally create and use a large
number of documents, stored in different formats such as
text files (.txt), document files (.doc, .pdf), and web pages
(.xml, .html), as Figure 8 shows [26].
How content in heterogeneous formats is analyzed, how
meaningless terms are removed, and which information is
kept to support document retrieval and recovery all depend
on the DMS.
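
The pre-processing steps described above (extracting text from a given format, tokenizing it, and removing meaningless terms) can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the function names, the tiny stop-word list, and the choice to handle only .txt, .html, and .xml are assumptions made for illustration. Formats such as .doc and .pdf would need dedicated extractors and are deliberately left unsupported here.

```python
import re
from html.parser import HTMLParser

# Hypothetical stop-word list; a real DMS would use a fuller,
# language-specific list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of a markup document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(content, extension):
    """Return plain text for the formats this sketch supports."""
    if extension in (".html", ".xml"):
        parser = _TextExtractor()
        parser.feed(content)
        parser.close()
        return " ".join(parser.parts)
    if extension == ".txt":
        return content
    # .doc and .pdf need format-specific extractors (assumption: not covered).
    raise ValueError(f"unsupported format: {extension}")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(content, extension):
    """Extract text, tokenize it, and drop stop words."""
    tokens = tokenize(extract_text(content, extension))
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, preprocess("<p>The management of documents</p>", ".html") yields the tokens "management" and "documents", which a DMS could then store as index terms for later retrieval.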