Information graphics (non-pictorial graphics such as bar
charts and line graphs) contain a great deal of knowledge.
Information retrieval research, however, has focused on
retrieving textual documents and on retrieving images
based on words appearing in the accompanying article
or on low-level features such as color or texture.
Our goal is to build a system for retrieving information
graphics that reasons about the content of the graphic
itself in deciding its relevance to the user query. As a
first step, we aim to identify, from a full-sentence user
query, what should be depicted on the independent and
dependent axes of potentially relevant graphs. Natural
language processing techniques are used to extract features
from the query, and machine learning is employed
to build a model for hypothesizing the content of the
axes. Our results show that the models achieve
accuracy higher than 80% on a corpus of collected user
queries.
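
As a rough illustration of this first step, the sketch below extracts two simple context features for each candidate word in a query and trains a small Naive Bayes classifier to label the word as belonging on the independent or dependent axis. Everything in it is an assumption made for illustration: the features (`prev`, `late`), the three training queries, the choice of Naive Bayes, and the names `extract_features`, `NaiveBayes`, and `hypothesize_axes` are hypothetical and are not the feature set, corpus, or learner used in this work.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, index):
    """Describe one candidate word by two hand-picked context cues (assumed)."""
    prev_word = tokens[index - 1].lower() if index > 0 else "<s>"
    return {"prev": prev_word, "late": index >= len(tokens) // 2}


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (name, value) counts
        self.feature_values = defaultdict(set)      # name -> values seen in training

    def fit(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[label][(name, value)] += 1
                self.feature_values[name].add(value)

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[label][(name, value)]
                # One extra slot reserves probability mass for unseen values.
                vocab = len(self.feature_values[name]) + 1
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training queries with hand-assigned axis labels.
TRAINING = [
    ("How does revenue vary by year",
     {"revenue": "dependent", "year": "independent"}),
    ("Show unemployment across countries",
     {"unemployment": "dependent", "countries": "independent"}),
    ("What is the trend in sales over months",
     {"sales": "dependent", "months": "independent"}),
]


def train(training=TRAINING):
    examples = []
    for query, labels in training:
        tokens = query.split()
        for i, token in enumerate(tokens):
            if token.lower() in labels:
                examples.append((extract_features(tokens, i), labels[token.lower()]))
    model = NaiveBayes()
    model.fit(examples)
    return model


def hypothesize_axes(model, query, candidates):
    """Label each candidate word in the query with a hypothesized axis."""
    tokens = query.split()
    return {
        token.lower(): model.predict(extract_features(tokens, i))
        for i, token in enumerate(tokens)
        if token.lower() in candidates
    }


if __name__ == "__main__":
    model = train()
    print(hypothesize_axes(model, "How do profits change by quarter",
                           {"profits", "quarter"}))
```

The sketch assumes the candidate words are already known. In a fuller system they would presumably come from parsing the query, and the features would be considerably richer than the two used here.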