This paper addresses the problem of generating possible object locations
for use in object recognition. We introduce Selective Search,
which combines the strengths of both exhaustive search and segmentation.
Like segmentation, we use the image structure to guide
our sampling process. Like exhaustive search, we aim to capture
all possible object locations. Instead of relying on a single technique to generate
possible object locations, we diversify our search and use a
variety of complementary image partitionings to deal with as many
image conditions as possible. Our Selective Search results in a
small set of data-driven, class-independent, high-quality locations,
yielding 99% recall and a Mean Average Best Overlap of 0.879 at
10,097 locations. The reduced number of locations compared to
an exhaustive search enables the use of stronger machine learning
techniques and stronger appearance models for object recognition.
In this paper we show that our Selective Search enables the use of
the powerful Bag-of-Words model for recognition. The Selective
Search software is made publicly available.
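
To make the two quality measures quoted above concrete, the following is a minimal sketch of how recall and Mean Average Best Overlap (MABO) could be computed for a set of candidate locations. It is an illustration under stated assumptions, not the released Selective Search software. It assumes that overlap is intersection over union, that the Average Best Overlap of a class is the mean, over that class's ground-truth boxes, of the best overlap with any candidate, that MABO is the mean over classes, and that an object counts as recalled when its best overlap is at least 0.5. The box format (x1, y1, x2, y2), the function names, and the toy data are all hypothetical, and for brevity the sketch treats a single image.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not intersect
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_overlaps(ground_truth, proposals):
    """Map each class label to the best IoU found for each of its objects.

    ground_truth: list of (label, box) pairs (hypothetical format).
    proposals: list of candidate boxes, class-independent.
    """
    per_class = defaultdict(list)
    for label, gt_box in ground_truth:
        best = max((iou(gt_box, p) for p in proposals), default=0.0)
        per_class[label].append(best)
    return per_class


def mabo(per_class):
    """Mean over classes of the per-class Average Best Overlap."""
    abo = [sum(scores) / len(scores) for scores in per_class.values()]
    return sum(abo) / len(abo)


def recall(per_class, threshold=0.5):
    """Fraction of ground-truth objects whose best overlap meets the threshold.

    The 0.5 default is an assumption of this sketch.
    """
    scores = [s for class_scores in per_class.values() for s in class_scores]
    return sum(s >= threshold for s in scores) / len(scores)


# Toy data, invented for illustration only.
ground_truth = [("cat", (0, 0, 10, 10)), ("dog", (20, 20, 40, 40))]
proposals = [(0, 0, 10, 10), (20, 20, 40, 30), (100, 100, 110, 110)]

overlaps = best_overlaps(ground_truth, proposals)
print(mabo(overlaps))    # 0.75
print(recall(overlaps))  # 1.0
```

In this toy case the first object is matched exactly (overlap 1.0) and the second is covered by a candidate of half its area (overlap 0.5), so MABO is 0.75 and both objects count as recalled. A real evaluation would accumulate the best overlaps over all images of a test set before averaging.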