In the JAMES project [1], we are investigating these challenges by developing a robot
bartender that supports interactions with multiple customers in a dynamic
setting. The robot hardware consists of a pair of manipulator arms with grippers,
mounted to resemble human arms, along with an animatronic talking head capable
of producing facial expressions, rigid head motion, and lip-synchronized synthesized
speech. The input sensors include a vision system that tracks the location, facial expressions,
gaze behavior, and body language of all people in the scene in real time, along
with a linguistic processing system combining a speech recognizer with a natural-language
parser to create symbolic representations of the speech produced by all
users.
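
To make the division of labor between the two input channels concrete, the sketch below shows one plausible shape for the per-customer state such a system might assemble, with the vision system contributing real-time social signals and the linguistic pipeline contributing a symbolic parse of the user's speech. All class names, fields, and thresholds here are hypothetical illustrations, not the actual JAMES interfaces.

```python
# A minimal sketch (assumed structures, not the JAMES codebase) of the
# per-customer record that fuses vision and speech hypotheses.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class VisionEstimate:
    """Real-time social signals for one tracked person (hypothetical fields)."""
    user_id: int
    location: Tuple[float, float, float]  # 3D position in the bar scene (m)
    facial_expression: str                # e.g. "neutral", "smiling"
    gaze_target: Optional[str]            # e.g. "robot", "other_customer"
    body_orientation_deg: float           # torso orientation relative to the bar

@dataclass
class SpeechHypothesis:
    """Symbolic representation of recognized, parsed speech (hypothetical)."""
    user_id: int
    asr_confidence: float
    dialogue_act: str                     # e.g. "order(drink=beer)"

@dataclass
class CustomerState:
    """Fused per-customer record handed on to the interaction manager."""
    vision: VisionEstimate
    speech: Optional[SpeechHypothesis] = None

    def seeks_attention(self) -> bool:
        # A simple illustrative heuristic: a customer standing close to
        # the bar and looking at the robot is bidding for attention.
        near_bar = self.vision.location[1] < 0.5  # assumed distance threshold (m)
        return near_bar and self.vision.gaze_target == "robot"
```

Keeping the vision and speech hypotheses as separate records mirrors the split between the real-time vision system and the linguistic processing system described above, while leaving the fusion step explicit for whatever component consumes the combined state.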