The task of the text acquisition component is to identify and make available
the documents that will be searched. Although in some cases this will involve simply
using an existing collection, text acquisition will more often require building
a collection by crawling or scanning the Web, a corporate intranet, a desktop, or
other sources of information. In addition to passing documents to the next component
in the indexing process, the text acquisition component creates a document
data store, which contains the text and metadata for all the documents.
Metadata is information about a document that is not part of the text content,
such as the document type (e.g., email or web page), document structure, and other
features, such as document length.
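
To make the two roles of text acquisition concrete, the following is a minimal sketch of an in-memory document data store, not a description of any particular search engine. The names `Document`, `DocumentStore`, and `acquire` are hypothetical, and it assumes the documents have already been fetched and arrive as (identifier, text, type) tuples. Document length is counted here in whitespace-separated tokens, which is only one possible definition.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One acquired document: its text plus metadata kept apart from the text."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class DocumentStore:
    """Toy in-memory document data store keyed by document ID (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text, doc_type):
        # Metadata is recorded alongside the text, not inside it.
        # Length is an assumption: the number of whitespace-separated tokens.
        metadata = {"type": doc_type, "length": len(text.split())}
        doc = Document(doc_id, text, metadata)
        self._docs[doc_id] = doc
        return doc

    def get(self, doc_id):
        # Look up a stored document by its ID; raises KeyError if absent.
        return self._docs[doc_id]


def acquire(sources, store):
    """Store each (doc_id, text, doc_type) tuple, then pass it on.

    Yielding the stored document stands in for handing it to the next
    component in the indexing process.
    """
    for doc_id, text, doc_type in sources:
        yield store.add(doc_id, text, doc_type)


if __name__ == "__main__":
    store = DocumentStore()
    sources = [("d1", "Quarterly report attached for review", "email")]
    for doc in acquire(sources, store):
        print(doc.doc_id, doc.metadata)
```

Writing `acquire` as a generator mirrors the description above: each document is written to the store and also handed on to whatever component consumes it next. A real system would replace the dictionary with persistent, compressed storage.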