The Filtering component blocks non-relevant documents from reaching the
focused DL. It applies several successive levels of filtering, and a document
is considered relevant only if it passes all of them. A first level, for
example, can use regular expressions to match the query keywords against the
tokens of the URL string. A second level can apply statistical techniques to
the document itself, based on keyword counts and frequencies. A third level
might use a Categorizer to classify the document and check whether it belongs
to the categories of the DL being gathered. Additional levels, or any suitably
tailored combination of levels, can yield a cleaner DL that is free of noise.

All documents that pass the filters are then sent to the Summarizer, which
extracts a summary of each document and passes a stream of summaries to the
Broker. The Broker indexes the summaries and organizes the DL. The IS builds a
tree of relevant topics for the DL, possibly using advanced IR tools for
categorization and clustering. Finally, the Retriever provides the DL user
with a user-friendly interface.
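The cascade of filtering levels described above can be sketched as a chain of predicates that every document must satisfy. This is a minimal Python illustration under stated assumptions, not the system's actual implementation: documents are hypothetical `(url, text)` pairs, the URL tokenization rule and the `min_ratio` keyword-density threshold are illustrative choices, and `toy_classify` is a placeholder for the Categorizer, whose real interface the text does not specify.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical document record (url, text); the source does not define one.
Document = tuple[str, str]
Filter = Callable[[Document], bool]


def url_regex_filter(keywords: list[str]) -> Filter:
    """Level 1: match query keywords against the tokens of the URL string."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

    def check(doc: Document) -> bool:
        url, _ = doc
        # Split the URL on common separators to obtain its string tokens.
        tokens = [t for t in re.split(r"[/.\-_?=&:]+", url) if t]
        return any(pattern.fullmatch(t) for t in tokens)

    return check


def keyword_frequency_filter(keywords: list[str], min_ratio: float) -> Filter:
    """Level 2: require the keywords to make up at least min_ratio of all words."""
    wanted = {k.lower() for k in keywords}

    def check(doc: Document) -> bool:
        _, text = doc
        words = re.findall(r"[a-z0-9]+", text.lower())
        if not words:
            return False
        counts = Counter(words)
        hits = sum(counts[k] for k in wanted)
        return hits / len(words) >= min_ratio

    return check


def category_filter(classify: Callable[[str], str], dl_categories: set[str]) -> Filter:
    """Level 3: keep the document only if its predicted category is one of the DL's."""
    def check(doc: Document) -> bool:
        return classify(doc[1]) in dl_categories

    return check


def run_filters(docs: Iterable[Document], levels: list[Filter]) -> list[Document]:
    """A document survives only if it passes every level; all() stops at the first failure."""
    return [d for d in docs if all(level(d) for level in levels)]


def toy_classify(text: str) -> str:
    """Hypothetical stand-in for the Categorizer."""
    return "ir" if "index" in text.lower() else "other"


keywords = ["library", "retrieval"]
levels = [
    url_regex_filter(keywords),
    keyword_frequency_filter(keywords, min_ratio=0.1),
    category_filter(toy_classify, {"ir"}),
]
docs = [
    ("http://example.org/digital-library/intro.html",
     "A digital library needs retrieval and an index for retrieval."),
    ("http://example.org/sports/news.html", "The match ended in a draw."),
    ("http://example.org/library/hours.html",
     "Opening hours of the city library are posted weekly."),
]
kept = run_filters(docs, levels)  # only the first document passes all three levels
```

Because `all()` stops at the first failing level, placing cheap checks (the URL regex) before costlier ones (keyword statistics, categorization) keeps most irrelevant documents from ever reaching the expensive levels. In the example, the sports page fails level 1 and the library-hours page passes levels 1 and 2 but is rejected by the placeholder categorizer.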
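The downstream flow, in which the Summarizer streams summaries to the Broker and the Broker indexes them, can be sketched in the same hedged way. The lead-sentence `summarize` function is a hypothetical stand-in, since the text does not say how summaries are extracted. The `Broker` class uses a simple inverted index over summary terms as one plausible reading of "indexes the summaries". The IS topic tree and the Retriever interface are not modeled.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator


def summarize(text: str, max_sentences: int = 2) -> str:
    """Hypothetical lead-based summarizer: keep the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def summary_stream(docs: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str]]:
    """Summarizer: yield (url, summary) pairs one at a time, as a stream."""
    for url, text in docs:
        yield url, summarize(text)


class Broker:
    """Broker sketch: stores summaries and keeps an inverted index over their terms."""

    def __init__(self) -> None:
        self.summaries: dict[str, str] = {}
        self.index: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, url: str, summary: str) -> None:
        self.summaries[url] = summary
        for term in re.findall(r"[a-z0-9]+", summary.lower()):
            self.index[term].add(url)

    def search(self, query: str) -> list[str]:
        """Return URLs whose summaries contain every query term."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


docs = [
    ("http://example.org/a",
     "Digital libraries collect documents. They also index them. Extra detail here."),
    ("http://example.org/b",
     "Focused crawlers gather pages on one topic. Filters remove noise."),
]
broker = Broker()
for url, summary in summary_stream(docs):
    broker.add(url, summary)
```

Using a generator for `summary_stream` mirrors the "stream of summaries" in the text: the Broker can index each summary as it arrives rather than waiting for the whole collection. Note that only summary text is searchable here, so a query for a term that appears only in a truncated sentence (such as "extra") finds nothing.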