Information retrieval research has made significant progress
in the retrieval of text documents and images. However, relatively little attention has been given to the retrieval of information graphics (non-pictorial images such as bar charts
and line graphs) despite their proliferation in popular media
such as newspapers and magazines. Our goal is to build a
system for retrieving bar charts and line graphs that reasons
about the content of the graphic itself in deciding its relevance to the user query. This paper presents the first steps
toward such a system, with a focus on identifying the category of intended message of potentially relevant bar charts
and line graphs. Our learned model achieves accuracy higher
than 80% on a corpus of collected user queries.
CONCLUSION
This paper has presented the first steps in the development of a system for effectively retrieving information graphics in response to a user query. Our method relies on full-sentence queries to identify features of potentially
relevant graphics, rather than relying merely on keyword
matching. Thus far, we have developed learned models
that identify, from the query, the content of the independent and dependent axes and the category of intended message of relevant
graphs. Future work will combine these in a mixture model for
ranking graphs for retrieval. To our knowledge, this work
is the only research effort that specifically focuses on the
retrieval of information graphics and that takes into
account the content of the graphics themselves.
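The mixture model mentioned as future work is not specified in this paper. One common way to build such a ranker is to linearly interpolate per-component scores. The sketch below assumes that each learned model (independent axis, dependent axis, intended-message category) yields a score in [0, 1] for a candidate graphic. The class `GraphScores`, the functions `mixture_score` and `rank_graphs`, and the default weights are all hypothetical illustrations and do not represent the authors' method.

```python
from dataclasses import dataclass


@dataclass
class GraphScores:
    """Hypothetical per-graphic component scores, each assumed to lie in [0, 1]."""
    graph_id: str
    x_axis_match: float   # assumed: how well the independent axis matches the query
    y_axis_match: float   # assumed: how well the dependent axis matches the query
    message_match: float  # assumed: likelihood the intended-message category fits the query


def mixture_score(s: GraphScores, weights: tuple[float, float, float]) -> float:
    """Linearly interpolate the component scores; the weights must sum to 1."""
    w_x, w_y, w_m = weights
    if abs(w_x + w_y + w_m - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return w_x * s.x_axis_match + w_y * s.y_axis_match + w_m * s.message_match


def rank_graphs(candidates: list[GraphScores],
                weights: tuple[float, float, float] = (0.3, 0.3, 0.4)) -> list[str]:
    """Return graph ids ordered from most to least relevant under the mixture."""
    ranked = sorted(candidates, key=lambda g: mixture_score(g, weights), reverse=True)
    return [g.graph_id for g in ranked]


candidates = [
    GraphScores("g1", 0.9, 0.2, 0.5),
    GraphScores("g2", 0.6, 0.8, 0.9),
    GraphScores("g3", 0.1, 0.1, 0.1),
]
print(rank_graphs(candidates))  # ['g2', 'g1', 'g3']
```

With the illustrative weights, g2 scores 0.78, g1 scores 0.53, and g3 scores 0.10. In practice the weights would presumably be tuned on held-out queries rather than fixed by hand.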