an algorithm that has
even better scaling properties because it does not hold the vocabulary in
memory. For very large collections like the web, indexing has to be distributed
over computer clusters with hundreds or thousands of machines.
We discuss this in Section 4.4. Collections with frequent changes require dynamic
indexing, introduced in Section 4.5, so that changes in the collection are
immediately reflected in the index. Finally, we cover some complicating issues
that can arise in indexing – such as security and indexes for ranked
retrieval – in Section 4.6.
Index construction interacts with several topics covered in other chapters.
The indexer needs raw text, but documents are encoded in many ways (see
Chapter 2). Indexers compress and decompress intermediate files and the
final index (see Chapter 5). In web search, documents are not on a local
file system, but have to be spidered or crawled (see Chapter 20). In enterprise
search, most documents are encapsulated in varied content management
systems, email applications, and databases. We give some examples
in Section 4.7. Although most of these applications can be accessed via HTTP,
native Application Programming Interfaces (APIs) are usually more efficient.
The reader should be aware that building the subsystem that feeds raw text
to the indexing process can in itself be a challenging problem.
