this issue is
not dealt with here.
Selection can therefore be viewed as a process whereby each location is classified
as ‘object’ or ‘background’. By ‘object’ we mean that an object is present at
one of an admissible range of deformations at that location. There are two pitfalls
arising from this point of view that should be avoided.
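
As a minimal sketch of this per-location view, the following Python fragment labels every location of a small image as ‘object’ or ‘background’. The function names, the use of window size as a stand-in for the admissible range of deformations, and the toy classifier are all assumptions made for illustration and are not taken from the text.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def extract_window(image: Image, row: int, col: int, size: int) -> Image:
    """Return the size x size window whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in image[row:row + size]]


def select_locations(
    image: Image,
    classify: Callable[[Image], bool],
    sizes: Sequence[int],
) -> List[List[bool]]:
    """Label each location True ('object') or False ('background').

    A location is labelled 'object' if the classifier accepts the window
    anchored there for at least one admissible size. Window size is an
    illustrative stand-in for the admissible range of deformations.
    """
    n_rows, n_cols = len(image), len(image[0])
    labels = [[False] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            for size in sizes:
                # Skip windows that would extend past the image border.
                if row + size > n_rows or col + size > n_cols:
                    continue
                if classify(extract_window(image, row, col, size)):
                    labels[row][col] = True
                    break
    return labels


def bright_blob(window: Image) -> bool:
    """Toy classifier (hypothetical): accept windows with mean above 0.5."""
    total = sum(sum(r) for r in window)
    return total / (len(window) * len(window[0])) > 0.5


# A 4 x 4 image containing one bright 2 x 2 block.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
labels = select_locations(image, bright_blob, sizes=(2,))
```

Note that the classifier is evaluated once per location and per admissible size, so the cost grows with the image area, the number of sizes, and the number of object classes.
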
The first is the notion that one simply needs training samples from the object at
the predetermined range of scales and samples from background images. These are
then processed through some learning algorithm (e.g., tree classifiers, feedforward
neural nets, or support vector machines) to produce the classifier. The second is that
this classifier is subsequently applied at each location. This point of view has
three problems: it produces a different global architecture (tree, neural net, etc.)
for each new object that needs to be detected; it requires a large training set
both to represent background images and to ‘learn’ the required range of scales;
and it does not directly address the efficiency of the computation and resources
required for detecting multiple objects. It should be noted, however,
that this approach has led to successful computer vision algorithms for specific
objects. For example, in the context of face detection, see