MMDAgent is a toolkit for building voice interaction
systems. It uses Julius [5] as its voice recognition engine,
Open JTalk [6] as its text-to-speech engine, MikuMikuDance
[7] for its 3D models, and Bullet Physics [8] as its physics
engine. MMDAgent for Android, which functions as a standalone
app on smartphones, has also been developed (Fig. 1).
MMDAgent operates according to a Finite State Transition
(FST) model defined in a script file called the FST script.
An example of an FST script is shown in Fig. 2. The
script is composed of transition source state numbers, destination
state numbers, transition conditions (events), and
commands. In the example in Fig. 2, MMDAgent starts in state 1
and recognizes the word "hello." The state then changes
from 1 to 10, and MMDAgent says "hello." The state
subsequently changes from 10 to 11 and then from 11 back
to 1. FST scripts are relatively
easy to rewrite; consequently, voice navigation can be
realized with MMDAgent by converting guide annotations
into FST scripts. Multiple FST scripts can also be applied
in parallel by using sub-FST scripts, so MMDAgent can follow
several state transitions simultaneously. One of the major advantages of
MMDAgent on smartphones is response speed. Because the app
operates in standalone mode, voice synthesis and voice
recognition both run in real time on the smartphone without
any network delay, so navigation proceeds smoothly.
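
The state-transition behavior described above can be illustrated with a minimal Python sketch. It is not MMDAgent code and does not use FST script syntax. The event and command strings, the conditions for the 10-to-11 and 11-to-1 transitions, and the second (sub) rule table are all hypothetical placeholders chosen for illustration. Only the four rule fields (source state, destination state, event, command) and the state sequence 1, 10, 11, 1 come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One rule = (source state, destination state, event, command).
# Event and command strings are hypothetical, not MMDAgent syntax.
Rule = Tuple[int, int, str, Optional[str]]

MAIN_RULES: List[Rule] = [
    (1, 10, "recognized:hello", "say:hello"),
    (10, 11, "speech:started", None),    # assumed condition
    (11, 1, "speech:finished", None),    # assumed condition
]

# Hypothetical second rule table standing in for a sub-FST script.
SUB_RULES: List[Rule] = [
    (1, 2, "recognized:hello", "motion:wave"),
    (2, 1, "speech:finished", None),
]


@dataclass
class Fst:
    rules: List[Rule]
    state: int = 1

    def step(self, event: str) -> Optional[str]:
        """Fire the first rule matching (current state, event)."""
        for src, dst, cond, command in self.rules:
            if src == self.state and cond == event:
                self.state = dst
                return command
        return None  # no rule matched: remain in the current state


def dispatch(machines: List[Fst], event: str) -> List[str]:
    """Offer one event to every machine, each keeping its own state."""
    commands = []
    for machine in machines:
        command = machine.step(event)
        if command is not None:
            commands.append(command)
    return commands


if __name__ == "__main__":
    main, sub = Fst(MAIN_RULES), Fst(SUB_RULES)
    for ev in ["recognized:hello", "speech:started", "speech:finished"]:
        print(ev, dispatch([main, sub], ev), main.state, sub.state)
```

Each machine holds only its own current state, so adding another parallel behavior in this sketch means appending another rule table to the list passed to `dispatch`.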