Some experiments
To cover a range of scales of approximately 4:1, the same detection procedure
is carried out at 5-6 resolutions. On a 200 MHz Pentium-II the computation
time for all scales together is about 1.5 seconds for generic 240x320 grayscale
images. Example detections are shown in figures 7, 8, and 9. These figures also
show random samples from the training sets and some of the local features used
in each representation, at their respective locations on the reference grid. The
detectors for the 2D view of the clip and for the ‘eight’ were made from only one
image, which was then synthetically deformed using random linear
transformations. Because of the invariances built into the detection algorithm,
the clip can be detected over a moderate range of viewing angles around the
original one, even though these angles were not present in training.
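
As a minimal sketch of how a single prototype image might be expanded into a synthetic training set by random linear transformations, consider the following. The text does not specify the distribution of the transformations, so the parameter ranges, the nearest-neighbour resampling, and the names `random_linear_map` and `warp` are all illustrative assumptions rather than the procedure actually used.

```python
import math
import random


def random_linear_map(rng, max_rot_deg=15.0, max_log_scale=0.2, max_shear=0.15):
    """Draw a random 2x2 matrix A = R @ S (rotation times scale-and-shear).

    All ranges are assumed values chosen for illustration only.
    """
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    sx = math.exp(rng.uniform(-max_log_scale, max_log_scale))
    sy = math.exp(rng.uniform(-max_log_scale, max_log_scale))
    sh = rng.uniform(-max_shear, max_shear)
    c, s = math.cos(theta), math.sin(theta)
    # R = [[c, -s], [s, c]],  S = [[sx, sh], [0, sy]]
    return ((c * sx, c * sh - s * sy),
            (s * sx, s * sh + c * sy))


def warp(image, A):
    """Apply the linear map A about the image centre (nearest-neighbour sampling).

    `image` is a list of rows; pixels mapped from outside the image become 0.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    (a, b), (c, d) = A
    det = a * d - b * c
    # Inverse of A, used to pull each output pixel back to its source location.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for q in range(w):
            x, y = q - cx, r - cy
            src_col = int(round(ia * x + ib * y + cx))
            src_row = int(round(ic * x + id_ * y + cy))
            if 0 <= src_row < h and 0 <= src_col < w:
                out[r][q] = image[src_row][src_col]
    return out


# Hypothetical 32x32 binary prototype: a filled rectangle stands in for the object.
prototype = [[1 if 10 <= r < 22 and 6 <= q < 26 else 0 for q in range(32)]
             for r in range(32)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
maps = [random_linear_map(rng) for _ in range(5)]
training_set = [warp(prototype, A) for A in maps]
print(len(training_set), "synthetic training images")
```

Each deformed copy would then be treated as an ordinary training example. Drawing the scale factors on a log scale keeps enlargements and reductions symmetric, which is a design choice of this sketch and not something stated in the text.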
The face detector was produced from only 300 faces of the Olivetti database.
Still, faces are detected in very diverse lighting conditions and in the presence of
occlusion. More examples can be viewed at http://galton.uchicago.edu/~amit/detect. For
each image we show both the detections of the Hough transform, using a representation
with 40 local features of 2 edges each, and the result of an
algorithm that employs one additional stage of filtering of the false positives, as
described in Amit & Geman (1999). In this data set there are on average 13 false
positives per image over all 6 resolutions together. This amounts to 0.00005 false
positives per pixel. The false negative rate for this collection of images is 5%.
Note that the lighting conditions of these faces are far more diverse than those in
the Olivetti training set.
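
The conversion from false positives per image to false positives per pixel can be illustrated with a rough calculation. The sketch below assumes 240x320 images and 6 geometrically spaced resolutions spanning a 4:1 range; neither the exact scale factors nor the image sizes of this data set are given in the text, so the function names and inputs are assumptions, and the result is only expected to agree with the reported figure in order of magnitude.

```python
def pyramid_shapes(height, width, levels, scale_range):
    """Image sizes for `levels` resolutions spanning `scale_range`:1.

    Geometric spacing between successive resolutions is an assumption.
    """
    step = scale_range ** (1.0 / (levels - 1))
    return [(round(height / step ** k), round(width / step ** k))
            for k in range(levels)]


def false_positives_per_pixel(fp_per_image, shapes):
    """Divide the per-image false positive count by all pixels searched."""
    total_pixels = sum(h * w for h, w in shapes)
    return fp_per_image / total_pixels


shapes = pyramid_shapes(240, 320, levels=6, scale_range=4.0)
rate = false_positives_per_pixel(13, shapes)
print(shapes)
print(f"{rate:.1e} false positives per pixel")
```

Under these assumptions the total number of pixels searched is a little more than twice that of the full-resolution image, since each coarser level contributes progressively fewer pixels.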