Two modes co-exist in the system: motion tracking and image matching. The former uses a lightweight tracking algorithm; the latter is based on robust local features and is more computationally intensive [7].
As shown in Fig. 4, the tracking module estimates the motion model between consecutive low-resolution viewfinder frames, and transforms the previous recognition results to the next frame according to that model before displaying them. The recognition results are therefore displayed even though image matching is not performed for every frame. In addition, the locations of matched frames (key frames) are displayed in a mini-map on screen.
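The paper does not specify the form of the motion model; as an illustration, the sketch below assumes a 2x3 affine model and shows how a previously recognized region (here a quadrilateral of corner points, a hypothetical representation) could be propagated to the next frame. The function name and the pure-translation example model are assumptions for illustration only.

```python
import numpy as np

def propagate_box(corners, A):
    """Transform recognition-result corners (N x 2 array) to the next
    frame using a 2x3 affine motion model A (illustrative sketch)."""
    pts = np.hstack([corners, np.ones((corners.shape[0], 1))])  # homogeneous coords
    return pts @ A.T  # apply [A | t] to each corner

# Hypothetical motion model: pure translation by (+5, -3) pixels.
A = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -3.0]])
box = np.array([[10.0, 10.0], [50.0, 10.0], [50.0, 40.0], [10.0, 40.0]])
print(propagate_box(box, A))
```

In this way the overlay stays registered with the scene between key frames without running the expensive matcher on every frame.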
When the tracking module determines that a large portion of the matched frame has moved out of view, the system extracts and matches robust local features in the current viewfinder frame again.
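The paper does not say how "a large portion out of view" is measured; one plausible sketch is to track the matched key frame's rectangle in current-frame coordinates and re-run image matching when its visible fraction drops below a threshold. The threshold value and function names here are assumptions.

```python
def visible_fraction(box, frame_w, frame_h):
    """Fraction of the tracked key-frame rectangle still inside the
    viewfinder. box = (x0, y0, x1, y1) in current-frame coordinates."""
    x0, y0, x1, y1 = box
    ix0, iy0 = max(x0, 0.0), max(y0, 0.0)           # clip to the frame
    ix1, iy1 = min(x1, float(frame_w)), min(y1, float(frame_h))
    inter = max(ix1 - ix0, 0.0) * max(iy1 - iy0, 0.0)
    area = (x1 - x0) * (y1 - y0)
    return inter / area if area > 0 else 0.0

REMATCH_THRESHOLD = 0.4  # hypothetical cutoff, not from the paper

def should_rematch(box, frame_w, frame_h):
    """Trigger robust-feature matching when most of the matched frame
    has left the view."""
    return visible_fraction(box, frame_w, frame_h) < REMATCH_THRESHOLD

# Example on a 320x240 viewfinder: half of the rectangle has panned out.
print(should_rematch((-160, 0, 160, 240), 320, 240))
```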
The tracking algorithm essentially estimates the camera ego-motion on the mobile device. The problem of global motion estimation has been well studied in the past. The tracking algorithm in our system is an accelerated area-matching method: it extracts a set of point features from an image and performs local motion estimation on patches centered at these feature points. On a Nokia N95 mobile phone, which has a 330 MHz ARM11 CPU, the tracking algorithm runs at around 30 fps.
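As a minimal sketch of area matching (not the accelerated implementation described above), the code below searches, for each feature point, the SAD-minimizing shift of its patch in the next frame, then takes the median of the local motions as a robust global translation estimate. Patch size, search radius, and the pure-translation model are assumptions; the actual system may estimate a richer motion model.

```python
import numpy as np

def track_translation(prev, curr, points, patch=4, search=6):
    """Estimate a global (dy, dx) translation between two grayscale
    frames by area matching: per-point SAD search, then a median over
    the local motions to reject outliers. Illustrative sketch."""
    h, w = prev.shape
    motions = []
    for (y, x) in points:
        tpl = prev[y - patch:y + patch + 1, x - patch:x + patch + 1]
        best, best_d = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                # skip candidate patches that fall outside the frame
                if yy - patch < 0 or xx - patch < 0 or \
                   yy + patch + 1 > h or xx + patch + 1 > w:
                    continue
                cand = curr[yy - patch:yy + patch + 1,
                            xx - patch:xx + patch + 1]
                d = np.abs(tpl.astype(int) - cand.astype(int)).sum()
                if d < best_d:
                    best_d, best = d, (dy, dx)
        motions.append(best)
    dy, dx = np.median(np.array(motions), axis=0)
    return (dy, dx)
```

A usage example: shifting a synthetic frame by (2, 3) pixels and tracking three interior feature points recovers that translation.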
Fig. 3 plots the timing results for a test sequence. With the hybrid algorithm, the system is considerably more efficient and provides smoother output to the user, even though image matching takes a perceivable amount of time (around 1 second) per frame.