A meta-search engine acts as an agent for the participant
search engines.
It receives queries from users and redirects them to one
or more of the participant search engines for processing.
A meta-search engine incorporating many participant search
engines is better than a single global search engine in
terms of the number of pages indexed and the freshness of
the indexes. The meta-search engine stores descriptive data
(i.e., descriptors) about the index maintained by each
participant search engine so that it can estimate the
relevance of each search engine when a query is received.
The ability of the meta-search engine to select the most
relevant search engines determines the quality of the final
result. To facilitate the selection process, the document
space covered by each search engine must be described not
only concisely but also precisely.
Existing methods tend to focus on the conciseness of
the descriptors by keeping a single descriptor for a search
engine’s entire index. This paper proposes to partition a
search engine’s document space into clusters and keep
a descriptor for each cluster. We show that cluster
descriptors can provide a finer and more accurate
representation of the document space, and hence enable the
meta-search engine to improve the selection of relevant
search engines. Two cluster-based search engine selection
scenarios (i.e., independent and high-correlation) are
discussed in this paper. Experiments verify that
cluster-based search engine selection can effectively
identify the most relevant search engines and consequently
improve the quality of the search results.
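
As a rough illustration of the idea, and not the method evaluated in this paper, the sketch below contrasts a single whole-index descriptor with per-cluster descriptors when scoring search engines against a query. Everything in it is an assumption introduced for illustration: the function names (`descriptor`, `cosine`, `score_engine`, `rank_engines`), the averaged term-frequency descriptor, the cosine similarity measure, the rule of scoring an engine by its best-matching cluster, and the toy documents. The clusters are given by hand; no clustering algorithm is run.

```python
import math
from collections import Counter


def descriptor(docs):
    """Hypothetical descriptor: average term frequency over tokenized documents."""
    if not docs:
        return {}
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {term: count / len(docs) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_engine(query_terms, descriptors):
    """Score an engine by its best-matching descriptor (an assumed rule)."""
    q = Counter(query_terms)
    return max((cosine(q, d) for d in descriptors), default=0.0)


def rank_engines(query_terms, engines):
    """Return engine names ordered from most to least relevant."""
    scores = {name: score_engine(query_terms, ds) for name, ds in engines.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: engine A covers two unrelated topics, engine B covers one mixed topic.
sports = [["football", "goal"], ["football", "league"]]
cooking = [["recipe", "oven"], ["recipe", "sauce"]]
travel = [["football", "ticket"], ["travel", "ticket"], ["travel", "hotel"]]

# One descriptor per cluster for each engine.
engines = {
    "A": [descriptor(sports), descriptor(cooking)],
    "B": [descriptor(travel)],
}

query = ["football"]
whole_index_score = score_engine(query, [descriptor(sports + cooking)])
cluster_score = score_engine(query, engines["A"])
ranking = rank_engines(query, engines)

print("whole-index score for A:", round(whole_index_score, 3))
print("cluster-based score for A:", round(cluster_score, 3))
print("ranking:", ranking)
```

In this toy setting the whole-index descriptor of engine A mixes sports and cooking terms, which dilutes its similarity to the query, while the sports cluster descriptor matches the query more closely. This is the intuition behind keeping a descriptor per cluster rather than per index.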