We have shown that the use of selective weighting based on saliency
offers a viable method of 2D/3D registration. Saliency-based
2D/3D registration utilising as little as one quarter of the image content
has been shown to achieve accuracy comparable to that of traditional
correlation-based 2D/3D registration. Utilising eye tracking during
visual search tasks is a novel idea for determining visual saliency.
The feasibility of this approach has been demonstrated by acquiring
eye-tracking data from a group of volunteers and using it as the
training data set to automatically determine salient features
in video frames, most of which had not been seen before. Salient image
features were extracted based on an analysis of human visual
behaviour but without the use of domain-specific knowledge.
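As an illustration of the selective weighting summarised above, the following is a minimal sketch of a saliency-weighted normalised cross-correlation in Python with NumPy. It is not the implementation used in this work: the function names `top_fraction_mask` and `weighted_ncc`, the binary thresholding of the saliency map, and the default fraction of 0.25 (chosen to mirror the "one quarter of the image content" figure) are assumptions made for the example.

```python
import numpy as np

def top_fraction_mask(saliency, fraction=0.25):
    """Binary weights that keep roughly the most salient `fraction` of pixels.

    Hypothetical helper: the thresholding rule is an assumption of this sketch.
    """
    saliency = np.asarray(saliency, dtype=float)
    threshold = np.quantile(saliency, 1.0 - fraction)
    return (saliency >= threshold).astype(float)

def weighted_ncc(a, b, weights):
    """Normalised cross-correlation of images a and b under per-pixel weights."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total == 0:
        raise ValueError("weights must not all be zero")
    # Weighted means, so that only salient pixels define the intensity baseline.
    mean_a = (w * a).sum() / total
    mean_b = (w * b).sum() / total
    da, db = a - mean_a, b - mean_b
    covariance = (w * da * db).sum()
    denominator = np.sqrt((w * da * da).sum() * (w * db * db).sum())
    # A flat image inside the weighted region carries no correlation signal.
    return covariance / denominator if denominator > 0 else 0.0
```

In a registration loop, `a` would be the video frame, `b` the rendered view for the current pose estimate, and `weights` the mask derived from the saliency map, so that the optimiser is driven only by the salient quarter of the pixels.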
There are a number of factors that affect the quality of the derived
saliency maps, such as prior knowledge, expectation, the nature
of the question being asked, the number of volunteers, feature selection
and extraction, and the process of combining different human
visual search strategies. Furthermore, the completeness of the chosen
feature space library significantly influences the effectiveness
of this method. It is worth noting that in deriving the saliency map,
only one feature component selected from a large set of candidates
was utilised. Strategies for combining saliency maps [Itti and Koch
1999] can further enhance the quality of the system, improve immunity
to noise, and increase robustness to spurious eye movements.
It is also worth noting that the method for assessing the success rate
of the 2D/3D registration, although consistently applied, was based
on subjective observations.
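To make the idea of combining saliency maps concrete, the sketch below rescales each map to a common range and forms a weighted average. This is only the simplest conceivable scheme, given here as an assumption for illustration; it is not claimed to be one of the strategies of the cited work, nor was it used in this study. The names `normalise_map` and `combine_maps` are hypothetical.

```python
import numpy as np

def normalise_map(m):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    lo, hi = m.min(), m.max()
    if hi == lo:
        return np.zeros_like(m)
    return (m - lo) / (hi - lo)

def combine_maps(maps, weights=None):
    """Weighted average of range-normalised saliency maps of equal shape."""
    normalised = np.stack([normalise_map(m) for m in maps], axis=0)
    if weights is None:
        weights = np.ones(len(maps))
    weights = np.asarray(weights, dtype=float)
    # Contract the weight vector against the stack's first axis.
    return np.tensordot(weights, normalised, axes=1) / weights.sum()
```

Averaging over several feature components in this way would be expected to dilute the influence of any single noisy map, which is the motivation given above for exploring combination strategies.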
It is known that individuals can employ different visual search
strategies for the same image, so the extraction of common visual
search behaviour is better realised by expanding the study population.
This should improve the robustness of the technique in the
face of spurious eye movements, and also enable the exploration of
how differences in experience level affect the quality of the saliency
map. Variations of the core technique for deriving saliency form another
avenue for future investigation. Although features based on
multi-scale contrast and Gabor filter response (Eqn. 3) have been
shown to be important in biological vision systems, there exist alternative
approaches for generating saliency maps which utilise information
theory [Jägersand 1993] and multi-scale wavelets [Shokoufandeh
et al. 1999].
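For readers unfamiliar with Gabor features, the following sketch builds a real-valued Gabor kernel in its common textbook form (a Gaussian envelope multiplied by a cosine carrier) and computes the magnitude of its response on an image. It is offered under stated assumptions and is not necessarily identical to the formulation of Eqn. 3: the parameter names (`wavelength`, `theta`, `sigma`, `gamma`), their default values, the zero-mean correction, and the use of FFT-based circular convolution are all choices made for this example.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0, gamma=1.0):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier along theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    kernel = envelope * carrier
    # Remove the mean so that uniform regions produce no response.
    return kernel - kernel.mean()

def gabor_response(image, kernel):
    """Response magnitude via circular convolution (image larger than kernel)."""
    image = np.asarray(image, dtype=float)
    kh, kw = kernel.shape
    padded = np.zeros_like(image)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the array origin before transforming.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    response = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)).real
    return np.abs(response)
```

A kernel with `theta=0.0` responds strongly to intensity variation along the horizontal axis at the chosen wavelength and only weakly to variation along the vertical axis. Because the kernel is zero-mean, a uniform brightness offset does not change the response.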
The experiments confirm that, when no special lighting
model is adopted, contrast-based saliency maps can improve
normalised cross-correlation as an image similarity measure
in 2D/3D registration. This shows that image comparison using
features selected according to Gabor filter response tends to be
less sensitive to lighting conditions. It must be noted, however, that the
effectiveness of the normalised cross-correlation measure in 2D/3D
registration depends greatly on how accurately one can model the
illumination conditions in the rendered images. One method by
which the convergence of the optimisation process can be significantly
improved is to employ a carefully tuned function that attenuates
reflected light intensity according to the distance from the light
source. In practice, however, the attenuation parameters may have
to be adjusted for each specific situation.
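The exact attenuation function is not specified here. As a hedged illustration, the sketch below uses the constant-linear-quadratic form that is common in graphics pipelines; the function name `attenuation` and the coefficient names `constant`, `linear` and `quadratic` are assumptions for this example, and their values would need to be tuned per situation, as noted above.

```python
import numpy as np

def attenuation(distance, constant=1.0, linear=0.0, quadratic=0.0):
    """Scale factor for reflected intensity at a given distance from the light.

    Assumed form: 1 / (constant + linear * d + quadratic * d**2).
    """
    d = np.asarray(distance, dtype=float)
    return 1.0 / (constant + linear * d + quadratic * d ** 2)

# Example with hypothetical coefficients: intensity falls off with distance.
distances = np.array([0.0, 1.0, 2.0, 4.0])
factors = attenuation(distances, constant=1.0, linear=0.5, quadratic=0.25)
```

Multiplying the rendered surface intensity by such a factor darkens distant surfaces in the rendered image.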
In summary, a first attempt at 2D/3D image registration using
observer-derived saliency maps has been presented. It has been
demonstrated that implicit modelling of human visual search behaviour
can enhance computer vision