PRECISE [14] is a system based on light annotation of the database schema
that was tested over the Geoquery 250 corpus. Once the schema elements have been
named, PRECISE reduces semantic analysis to a graph matching problem.
Interestingly, the system leverages a domain-independent grammar to extract
attachment relationships between tokens in the user’s requests. The PRECISE
work identifies a class of so-called semantically tractable queries. Although
the group did not publish actual configuration times, configuration presumably
consisted only of the naming phase and was therefore short. We will
forgo a discussion of the semantically tractable class and take at
face value the claim that PRECISE achieved 100% precision and 77.5% recall, yielding
a correctness of 77.5%. For so little configuration this is an impressive result,
and for very simple databases receiving a stream of very simple queries it may
be adequate. However, experience tells us that users do at times ask somewhat
complex queries, and they will be frustrated if told that their queries
are not semantically tractable and must be rephrased or abandoned.
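The reduction of semantic analysis to graph matching can be illustrated with a toy example. The sketch below is not PRECISE's actual algorithm. It assumes a hypothetical lexicon, as might be produced by the naming phase, that lists for each query token the schema elements it may denote, and it asks only whether every token can be assigned a distinct schema element. To do so it uses plain bipartite matching via augmenting paths (Kuhn's algorithm). The function name `match_tokens` and all lexicon entries are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set


def match_tokens(lexicon: Dict[str, List[str]]) -> Optional[Dict[str, str]]:
    """Assign each token a distinct schema element, or return None if impossible.

    Toy illustration only (hypothetical lexicon format): a plain bipartite
    matching between query tokens and candidate schema elements, found with
    Kuhn's augmenting-path algorithm.
    """
    owner: Dict[str, str] = {}  # schema element -> token currently matched to it

    def try_assign(token: str, seen: Set[str]) -> bool:
        for element in lexicon.get(token, []):
            if element in seen:
                continue
            seen.add(element)
            # Use a free element, or re-route the token that currently holds it.
            if element not in owner or try_assign(owner[element], seen):
                owner[element] = token
                return True
        return False

    for token in lexicon:
        if not try_assign(token, set()):
            return None  # some token cannot be given its own schema element
    return {token: element for element, token in owner.items()}


# Hypothetical lexicon for a request such as "cities in texas by population".
query = {
    "cities": ["city"],
    "texas": ["state.name"],
    "population": ["city.population", "state.population"],
}
print(match_tokens(query))
```

In this toy, a request whose tokens cannot all be matched yields `None` and would be rejected, loosely mirroring how PRECISE declines queries that fall outside its semantically tractable class.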
The PRECISE group reported a side experiment in which a student took
over 15 hours to build a Geoquery 250 NLI using Microsoft’s EnglishQuery
tool. The resulting system achieved rather poor results for such an expensive
effort: approximately 80% precision and 55% recall, yielding a correctness of
approximately 45%. Our limited experience with Microsoft’s EnglishQuery
tool was also rather frustrating, but it would be interesting to repeat this experiment
and to see how other commercial systems, such as Progress Software’s
EasyAsk and Elfsoft’s ELF, fare.

2 This does not address larger issues such as the limitations of the underlying formal
language or users querying for information outside the scope of the database.
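The correctness figures quoted for PRECISE and for the EnglishQuery system are consistent with correctness being the product of precision and recall. Because the text does not define correctness explicitly, the sketch below assumes PRECISE-style definitions: recall is the fraction of all queries the system answers, precision is the fraction of answered queries answered correctly, and correctness is therefore the fraction of all queries answered correctly. The function names and the query counts used in the example are hypothetical.

```python
from typing import Tuple


def correctness(precision: float, recall: float) -> float:
    """Fraction of all queries answered correctly (assumed = precision * recall)."""
    return precision * recall


def metrics(total: int, answered: int, correct: int) -> Tuple[float, float, float]:
    """Return (precision, recall, correctness) from raw counts.

    Assumed definitions: recall = answered / total,
    precision = correct / answered, correctness = correct / total.
    """
    precision = correct / answered
    recall = answered / total
    return precision, recall, correctness(precision, recall)


# PRECISE as reported: 100% precision and 77.5% recall.
print(correctness(1.0, 0.775))
# Hypothetical counts matching the EnglishQuery rates: 200 queries, 110 answered, 88 correct.
print(metrics(200, 110, 88))
```

Under these assumptions PRECISE's figures give 0.775, and the hypothetical EnglishQuery counts give precision 0.80, recall 0.55 and a correctness of about 0.44, in line with the approximately 45% reported from approximate precision and recall values.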