The scope of this thesis encompasses a detailed description of augmented reality technologies, systems, and applications, as well as the future of augmented reality technologies as the author sees it. This research uses the iOS platform as the means to implement augmented reality, so the author believed it was important to introduce the relevant iOS tools and the steps to creating an augmented reality application by presenting a designed and implemented non-augmented-reality application and then offering a design for its augmented reality transformation. Accordingly, chapter 3 of this thesis introduces iTranslatAR, a “picture translator” that uses Optical Character Recognition (OCR) to translate text present in images, and explains the methods relevant to its transformation into an augmented reality application, namely the use of Tesseract OCR and its integration as a foreign library on the iOS platform. The author then offers a design for transforming iTranslatAR into an augmented reality application, in which the application “translates” text from real-time video feed frames rather than from still pictures. In addition, section 4.3 presents an explanation of the iOS AV Foundation framework and the implementation necessary for extracting video frames from the camera.
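To make the foreign-library integration concrete, the following is a minimal sketch rather than the thesis code: it assumes an Objective-C++ (.mm) source file, a hypothetical helper named RecognizeText, and English “tessdata” language files bundled at the application bundle root. It shows how the Tesseract C++ API can be invoked on a raw grayscale buffer such as one extracted from an AV Foundation video frame (section 4.3).

```objc
// Hypothetical helper, a sketch only: run Tesseract OCR over a raw grayscale
// pixel buffer, e.g. one extracted from an AV Foundation video frame.
#include <tesseract/baseapi.h>
#import <Foundation/Foundation.h>

NSString *RecognizeText(const unsigned char *pixels,
                        int width, int height, int bytesPerRow) {
    tesseract::TessBaseAPI tess;
    // Init takes the directory containing the "tessdata" folder and a
    // language code; here that folder is assumed to sit at the bundle root.
    NSString *dataPath = [[NSBundle mainBundle] bundlePath];
    if (tess.Init([dataPath UTF8String], "eng") != 0) {
        return nil; // language data missing or unreadable
    }
    // One byte per pixel (grayscale); Tesseract also accepts RGB(A) input.
    tess.SetImage(pixels, width, height, 1, bytesPerRow);
    char *utf8 = tess.GetUTF8Text();          // caller owns the returned text
    NSString *result = [NSString stringWithUTF8String:utf8];
    delete [] utf8;
    tess.End();
    return result;
}
```

The returned string is then handed to the translation step; in the augmented reality design this function would be called per selected video frame rather than per still picture.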
This research also anticipated that augmented reality brings the possibility not only of enhancing our current senses, but of possibly “making up” for missing ones. In this thesis, the author designed and implemented an augmented reality application for hearing augmentation, in which hearing-impaired users see visual cues of what is being said to them in a natural and intuitive way. The application, dubbed iHeAR, uses the iOS platform and an iPad 2 as the supporting device. It is implemented using current algorithms for speech recognition and face detection in order to output the “heard” speech in real time in a “text bubble” next to the speaker’s face. The speech recognition system used is the open-source OpenEars, an iOS wrapper around the PocketSphinx system for on-device speech recognition; a detailed explanation of OpenEars was provided in section 4.3. Face detection is achieved using OpenCV’s implementation of the Viola-Jones face detection method, which was explained in section 4.4.
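As an illustration, the following minimal sketch assumes the OpenEars 1.x class and method names available at the time of this research (later releases renamed these classes) and hypothetical language model file names; it shows how on-device listening is started and how recognition hypotheses arrive through the events observer delegate.

```objc
// Sketch only (OpenEars 1.x-style API, hypothetical file names): start
// PocketSphinx listening and receive hypotheses via the delegate callback.
#import "PocketsphinxController.h"
#import "OpenEarsEventsObserver.h"

@interface SpeechListener : NSObject <OpenEarsEventsObserverDelegate>
@property (nonatomic, strong) PocketsphinxController *pocketsphinxController;
@property (nonatomic, strong) OpenEarsEventsObserver *openEarsEventsObserver;
@end

@implementation SpeechListener

- (void)startListening {
    self.openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
    self.openEarsEventsObserver.delegate = self;
    self.pocketsphinxController = [[PocketsphinxController alloc] init];

    // Language model and phonetic dictionary generated offline and bundled
    // with the application (hypothetical file names).
    NSString *lmPath  = [[NSBundle mainBundle] pathForResource:@"iHeAR"
                                                        ofType:@"languagemodel"];
    NSString *dicPath = [[NSBundle mainBundle] pathForResource:@"iHeAR"
                                                        ofType:@"dic"];
    [self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath
                                                      dictionaryAtPath:dicPath
                                                   languageModelIsJSGF:FALSE];
}

// Fired when PocketSphinx produces a hypothesis; iHeAR renders this text in
// the bubble positioned next to the detected face.
- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    NSLog(@"Heard: %@ (score %@)", hypothesis, recognitionScore);
}

@end
```

Because all recognition happens on the device, no network round-trip is involved, which matches the fourth limitation listed below.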
In order to make the face detection algorithm work in real time while performing the computations for both speech recognition and face detection on the device, the author optimized the system to run the face detection algorithm only when speech is detected and only when a previous frame is not already being analyzed for a face, since the detection algorithm runs slower than the video feed. In this way, the final system is not overloaded with heavy computation; a sketch of this frame-gating logic follows the list below. The system built assumes the following conditions and limitations:
• The user does not know sign language or how to read lips,
• The environment is quiet and free of background noise,
• The system will be used for one-on-one conversation,
• Speech recognition needs to happen on the device so as not to depend on network availability.
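The following minimal sketch, with hypothetical class and variable names rather than the thesis code, illustrates the frame-gating optimization described above: each captured frame is discarded unless speech is in progress and no earlier frame is still under analysis, and OpenCV’s Viola-Jones detector (cv::CascadeClassifier) runs off the capture queue. An Objective-C++ (.mm) file and BGRA video frames are assumed.

```objc
// Sketch only: gate Viola-Jones face detection on (a) speech being detected
// and (b) no earlier frame still being analyzed, since detection runs slower
// than the video feed.
#import <AVFoundation/AVFoundation.h>
#import <opencv2/objdetect/objdetect.hpp>
#import <opencv2/imgproc/imgproc.hpp>

@interface FrameGate : NSObject <AVCaptureVideoDataOutputSampleBufferDelegate>
- (void)speechStarted;  // to be called from the speech recognizer's callbacks
- (void)speechEnded;
@end

@implementation FrameGate {
    cv::CascadeClassifier _faceCascade;
    volatile BOOL _speechInProgress;    // toggled by the speech recognizer
    volatile BOOL _frameBeingAnalyzed;  // guards against overlapping detections
}

- (instancetype)init {
    if ((self = [super init])) {
        NSString *path = [[NSBundle mainBundle]
            pathForResource:@"haarcascade_frontalface_alt" ofType:@"xml"];
        _faceCascade.load([path UTF8String]);
    }
    return self;
}

- (void)speechStarted { _speechInProgress = YES; }
- (void)speechEnded   { _speechInProgress = NO;  }

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection {
    // Gate: skip this frame unless speech is active and the detector is idle.
    if (!_speechInProgress || _frameBeingAnalyzed) return;
    _frameBeingAnalyzed = YES;

    // Copy the BGRA frame into a grayscale cv::Mat so the sample buffer
    // does not need to outlive this callback.
    CVImageBufferRef pb = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pb, 0);
    cv::Mat bgra((int)CVPixelBufferGetHeight(pb), (int)CVPixelBufferGetWidth(pb),
                 CV_8UC4, CVPixelBufferGetBaseAddress(pb),
                 CVPixelBufferGetBytesPerRow(pb));
    cv::Mat gray;
    cv::cvtColor(bgra, gray, CV_BGRA2GRAY);
    CVPixelBufferUnlockBaseAddress(pb, 0);

    // Run the (slow) detector off the capture queue; frames that arrive in
    // the meantime fail the gate above and are simply skipped.
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        std::vector<cv::Rect> faces;
        _faceCascade.detectMultiScale(gray, faces, 1.1, 2, 0, cv::Size(40, 40));
        if (!faces.empty()) {
            // ...position the text bubble next to faces[0] on the main thread...
        }
        _frameBeingAnalyzed = NO;
    });
}

@end
```

Because the flag stays set for the whole detection, intervening frames are dropped rather than queued, which keeps the device from being overloaded with heavy computation.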
The challenges of this thesis included the selection and integration of multiple components to build a hearing augmented reality application on the iOS platform. The APIs and libraries used as part of this research are listed in Table 5.1, along with comments to guide readers who wish to know more about the reasons for using these libraries. Readers should note that the Google Translate API, while a good choice for this thesis work, will no longer be available as a free service starting December 1st, 2011.