This thesis introduces iHeAR, a system that uses a camera-equipped iPad 2 as the
interaction device to perform speech recognition, with optional language translation,
and to display the resulting text in an easy and natural way. The system detects a face
and its position in the camera frames and shows the resulting string next to the
detected face in a cartoon-like bubble. To use iHeAR, the user simply angles the device
towards the person’s face; once both text and a face are detected, the final string is
displayed on the screen without requiring any additional steps from the user.