Using an iPad 2 with a camera as the interaction device, this thesis introduces a system that performs speech recognition, optional language translation, and display of the resulting string in an easy and natural way: it detects a face and its position in the camera frames and renders the recognized text next to that face in a cartoon-like bubble. To use the system, dubbed iHeAR, the user simply angles the device towards the speaker's face; once both the recognized text and the face are available, the final string is displayed on the screen without requiring any additional steps from the user.
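The placement step described above can be sketched as follows. This is a minimal, hypothetical illustration (not the thesis's actual implementation, which runs on iOS): given a detected face bounding box and the screen size, it chooses a position for the text bubble beside the face, preferring the side with enough free space and clamping the result to the screen. All names and parameters here are assumptions for illustration.

```python
def place_bubble(face, screen, bubble_size, margin=8):
    """Choose where to draw a cartoon-like text bubble next to a detected face.

    face:        (x, y, w, h) bounding box of the detected face, in pixels
    screen:      (width, height) of the display, in pixels
    bubble_size: (w, h) of the rendered text bubble
    Returns the (x, y) top-left corner of the bubble, clamped to the screen.
    """
    fx, fy, fw, fh = face
    sw, sh = screen
    bw, bh = bubble_size

    # Prefer the right side of the face if the bubble fits there,
    # otherwise fall back to the left side, comic-strip style.
    if fx + fw + margin + bw <= sw:
        bx = fx + fw + margin
    else:
        bx = fx - margin - bw

    # Align the bubble with the top of the face, then clamp so the
    # bubble never leaves the visible screen area.
    by = fy
    bx = max(0, min(bx, sw - bw))
    by = max(0, min(by, sh - bh))
    return bx, by
```

For example, a face near the left edge of a 1024x768 screen yields a bubble to its right, while a face near the right edge pushes the bubble to its left. The real system would rerun this placement every frame as the face moves.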
This system illustrates an aspect of augmented reality that goes beyond augmenting the sense of sight to include hearing augmentation as well. In addition, the following conditions and limitations are assumed:
• The user does not know sign language or how to read lips,
• The environment is quiet and free of background noise,
• The system will be used for one-on-one conversations,
• Speech recognition needs to happen on the device so as not to depend on network availability.