Computer vision and human-powered services can provide blind
people access to visual information in the world around them, but
their efficacy depends on high-quality photo inputs. Blind
people often have difficulty capturing the information these
applications need because they cannot see what they are
photographing. In this paper, we present Scan Search, a
mobile application that offers a new way for blind people to take
high-quality photos to support recognition tasks. To support real-time
scanning of objects, we developed a key frame extraction
algorithm that automatically retrieves high-quality frames from
the continuous camera video stream of a mobile phone. These key
frames are streamed to a cloud-based recognition engine that
identifies the most significant object inside the picture. This way,
blind users can scan for objects of interest and hear potential
results in real time. We also present a study exploring the
tradeoffs in how many photos are sent for recognition, and report a user study
with 8 blind participants that compares Scan Search with a
standard photo-snapping interface. Our results show that Scan
Search allows users to capture objects of interest more efficiently
and is preferred by users to the standard interface.
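
To make the idea of key frame extraction concrete, the following is a
minimal sketch and not the algorithm used in Scan Search, whose selection
criteria are not described here. It assumes two hypothetical criteria: a
frame must be sharp enough (mean squared pixel gradient) and must differ
enough from the last key frame kept. The function names, the thresholds
`min_sharpness` and `min_change`, and the representation of a frame as a
grid of grayscale values are all illustrative assumptions.

```python
from typing import List, Optional, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def sharpness(frame: Frame) -> float:
    """Mean squared difference between adjacent pixels; higher means sharper."""
    h, w = len(frame), len(frame[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                total += (frame[y][x + 1] - frame[y][x]) ** 2
                count += 1
            if y + 1 < h:  # vertical neighbour
                total += (frame[y + 1][x] - frame[y][x]) ** 2
                count += 1
    return total / count if count else 0.0


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two equally sized frames."""
    h, w = len(a), len(a[0])
    return sum(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w)) / (h * w)


def extract_key_frames(
    frames: Sequence[Frame],
    min_sharpness: float = 100.0,  # hypothetical threshold
    min_change: float = 10.0,      # hypothetical threshold
) -> List[int]:
    """Return indices of frames that are sharp and differ from the last key frame."""
    key_indices: List[int] = []
    last_key: Optional[Frame] = None
    for i, frame in enumerate(frames):
        if sharpness(frame) < min_sharpness:
            continue  # too blurry to be worth sending
        if last_key is not None and mean_abs_diff(frame, last_key) < min_change:
            continue  # nearly identical to the previous key frame
        key_indices.append(i)
        last_key = frame
    return key_indices
```

In a sketch like this, raising `min_change` sends fewer frames to the
recognition engine at the cost of possibly skipping a useful view, which is
one way to think about the tradeoff in how many photos are sent.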