Given a mathematical expression, finding pages with relevant
mathematical content is an important problem that is
the basis of many mathematics retrieval systems. Correctly
predicting the relevance of mathematical expressions is a
core problem that should be addressed in order to develop
useful retrieval systems.
We characterized several possible approaches to this problem,
and we developed two working systems that exploit
the structure of mathematical expressions for approximate
matching: structural similarity search and pattern matching.
We empirically showed that these two search paradigms outperform
other search techniques, including those that perform
exact matching of (normalized) expressions or subexpressions
and the one that performs keyword search. We
also showed that forming queries takes more effort from the user
with pattern search than with similarity
search, but when relevant matches are found, pattern search ranks them somewhat higher. In conclusion, structural similarity
search seems to be the best way for general users
to search for mathematical expressions, but we hypothesize
that pattern search may be the preferred approach for experienced
users in specific domains.
In this paper we focused on the usability of answers and
how well a search system can find relevant documents for a
given query. Others may wish to re-evaluate these results
using more controlled methods for assessing relevance. The
study should next be extended in an ongoing effort to include
new approaches as they are developed. Optimizing
the proposed search techniques in terms of query processing
time and index size is a separate research direction [14]. Based
on the results of this paper, more complex query languages
can also be developed to accommodate queries that consist
of multiple mathematical expressions supplemented by
textual keywords that might match other parts of relevant
documents, or pattern queries with one or more similarity
constraints.
NTCIR is an international initiative to create a public
and shared infrastructure to facilitate research in Math IR.
It aims to provide a test collection and a set of math tasks.
As part of our future research, we plan to use this data
(which is not yet available) to further evaluate the
algorithms discussed here.