The term “dialogue” is used in different communities in
different ways. Many researchers in the speech recognition
community view “dialogue methods” as a way of controlling
and restricting the interaction. For instance, consider
building a telephony system that answers queries about
your mortgage. The ideal system would allow you to ask
for what you need in any way you choose. The variety of
possible expressions you might use makes this a challenge
for current speech recognition technology. One approach to
this problem is to have the system engage you in a dialogue
by having you answer questions such as “What is
your account number?” “Do you want your balance information?”
and so on. On the positive side, by controlling the
interaction, your speech is much more predictable, leading
to better recognition and language processing. On the
negative side, the system has limited your interaction. You
may need to provide all sorts of information that isn’t relevant
to your current situation, making the interaction less
efficient.
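The system-directed approach described above can be pictured as a fixed slot-filling script, in which the system asks each question in turn and the recognizer only has to cover the small space of expected answers. The following is a hypothetical minimal sketch; the slot names and prompt wording are our own, not drawn from any deployed system:

```python
# Minimal sketch of a system-directed (slot-filling) dialogue:
# the system controls the interaction by asking fixed questions,
# so each user answer comes from a small, predictable space.
def directed_dialogue(answer_fn):
    """Run a fixed question script; answer_fn stands in for the caller."""
    prompts = [
        ("account", "What is your account number?"),
        ("request", "Do you want your balance information?"),
    ]
    slots = {}
    for slot, question in prompts:
        # The recognizer only needs to handle answers to this one question.
        slots[slot] = answer_fn(question)
    return slots

# A scripted caller standing in for actual speech input.
scripted = {"What is your account number?": "12345",
            "Do you want your balance information?": "yes"}
result = directed_dialogue(scripted.get)
```

The rigidity is visible in the code: the question list is fixed in advance, so any information outside the script simply cannot be expressed.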
Another view of dialogue involves basing human-computer
interaction on human conversation. In this view,
dialogue enhances the richness of the interaction and allows
more complex information to be conveyed than is
possible in a single utterance. Under this view, however, language
understanding in dialogue becomes more complex. It is this second
view of dialogue to which we subscribe. Our goal is to
design and build systems that approach human performance
in conversational interaction. We believe that such an
approach is feasible and will lead to much more effective
user interfaces to complex systems.
Some people argue that spoken language interfaces will
never be as effective as graphical user interfaces (GUIs)
except in limited special-case situations (e.g., Shneiderman,
2000). This view underestimates the potential power
of dialogue-based interfaces. First, there will continue to be
more and more applications for which a GUI is not feasible
because of the size of the device one is interacting with, or
because the task one is doing requires using one’s eyes
and/or hands. In these cases, speech provides a worthwhile
and natural additional modality (Cohen and Oviatt, 1995).
Even when a GUI is available, spoken dialogue can be a
valuable additional modality as it adds considerable flexibility
and reduces the amount of training required. For instance,
GUI designers are always faced with a dilemma—either
they provide a relatively basic set of operations,
forcing the user to perform complex tasks using long
sequences of commands, or they add higher-level commands
that perform the tasks the user desires. One problem
with providing higher-level commands is that in many
situations there is a wide range of possible tasks, so the
interface becomes cluttered with options, and the user requires
significant training to learn how to use the system.
It is important to realize that a speech interface by itself
does not solve this problem. If it simply replaces the operations
of menu selection with speaking a predetermined
phrase that performs the equivalent operation, it may aggravate
the problem, because the user would need to remember
a potentially long list of arbitrary commands.
Conversational interfaces, on the other hand, would provide
the opportunity for users to state what they want to do in
their own terms, just as they would to another person, while
the system takes care of the complexity.
Dialogue-based interfaces allow the possibility of extended
mixed-initiative interaction (Chu-Carroll and
Brown, 1997; Allen, 1999). This approach models the human-machine
interaction after human collaborative problem
solving. Rather than viewing the interaction as a series
of commands, the interaction involves defining and discussing
tasks, exploring ways to perform the task, and collaborating
to get it done. Most importantly, all interactions
are contextually interpreted with respect to the interactions
performed so far, allowing the system to anticipate the user’s needs and provide responses that best further the
user’s goals. Such systems will create a new paradigm for
human-computer interaction.
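One way to picture this contextual interpretation is as each new turn being merged into the dialogue state accumulated so far, so that an elliptical follow-up inherits the task established by earlier turns. The sketch below is a hypothetical toy example; the dictionary representation is our own and is far simpler than what a real system would need:

```python
# Toy sketch of contextual interpretation: each utterance is read
# against the dialogue context built up so far, so an elliptical
# follow-up ("what about Friday?") keeps the task from earlier turns.
def interpret(utterance, context):
    """Return a full task description, filling gaps from context."""
    parts = dict(context)     # start from what was already established
    parts.update(utterance)   # the new turn overrides or extends it
    return parts

context = {}
turns = [{"task": "schedule meeting", "day": "Thursday"},
         {"day": "Friday"}]   # elliptical follow-up: only the day changes
for turn in turns:
    context = interpret(turn, context)
```

Even in this toy form, the second turn is meaningless on its own; it acquires its full interpretation only against the prior interaction, which is the property the paragraph above describes.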