This section introduces the important concepts of visual servoing and describes
its relationship to other research areas such as active vision and structure from
motion. Terminology used in the paper is then introduced, followed by a brief
overview of image-based and position-based visual servoing.
The use of vision with robots has a long history [110], and today major robot
vendors offer vision systems that are highly integrated with the robot's
programming system. Capabilities range from simple binary image processing to
more complex edge- and feature-based systems capable of handling overlapped parts [16].
However, the feature common to all these systems is that they are static, with
image processing times typically of the order of 0.1 to 1 second.
Traditionally, visual sensing and manipulation are combined in an open-loop fashion,
'looking' then 'moving'. The accuracy of the operation depends directly on the
accuracy of the visual sensor and of the manipulator and its controller. An alternative
to increasing the accuracy of these subsystems is to close a visual-feedback control
loop around them, which increases the overall accuracy of the system: a principal concern in
any application. The term visual servoing appears to have been introduced by Hill
and Park [46] in 1979 to distinguish their approach from earlier 'blocks world' experiments
in which the system alternated between picture taking and moving. Prior to the
introduction of this term, the less specific term visual feedback was generally used.
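To make the closed-loop principle concrete, the following sketch (in Python; not from the paper, with all sensor and robot interfaces as hypothetical stand-ins, simulated here by a toy point-feature model) shows the structure of a visual-feedback loop: the image-plane error is re-measured on every cycle and fed back as a proportional velocity command, so residual calibration errors are rejected by feedback rather than eliminated by better calibration.

```python
import numpy as np

# Toy world: the "robot" state is the target's image-plane location, which
# the simulated robot interface integrates under the commanded velocity.
feature = np.array([100.0, 400.0])       # current image location (pixels)

def measure_feature():
    # Stand-in for image capture plus feature extraction, with pixel noise.
    return feature + np.random.normal(0.0, 0.2, size=2)

def send_velocity(v, dt=0.05):
    # Stand-in for the manipulator interface: integrate the command.
    global feature
    feature += v * dt

desired = np.array([320.0, 240.0])       # desired image location (pixels)
gain = 2.0                               # proportional feedback gain

for cycle in range(400):
    error = desired - measure_feature()  # error re-measured every cycle
    if np.linalg.norm(error) < 1.0:
        break
    send_velocity(gain * error)          # simple proportional visual feedback

print(f"stopped after {cycle} cycles, residual error {desired - feature}")
```

Because the loop closes around the image measurement itself, neither the camera model nor the robot kinematics needs to be known exactly; only the sign and rough scale of the feedback gain matter for convergence.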
Visual servoing is the fusion of results from many elemental areas including high-speed
image processing, kinematics, dynamics, control theory, and real-time computing.
It has much in common with research into active vision and structure from
motion, but is quite different from the often-described use of vision in hierarchical
task-level robot control systems.
Some robot systems [60, 65] which incorporate vision are designed for task-level programming.
Such systems are generally hierarchical, with higher levels corresponding
to more abstract data representation and lower bandwidth. The highest level is capable
of reasoning about the task, given a model of the environment. In general
a look-then-move approach is used: first the target location and grasp sites are
determined from calibrated stereo vision or laser rangefinder images, and then a sequence
of moves is planned and executed, as sketched below. Vision sensors have tended to be used in
this fashion because of the richness of the data they can produce about the world, in
contrast to an encoder or limit switch, which would be dealt with at the lowest level.
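For contrast with the feedback loop above, a look-then-move system consults vision only once, so its final accuracy is bounded by the calibration of the sensor and manipulator rather than corrected by feedback. A minimal sketch, again with hypothetical stand-in interfaces:

```python
def locate_target():
    # Stand-in for target localisation from calibrated stereo vision or a
    # laser rangefinder: a single pose estimate (metres).
    return (0.45, 0.10, 0.02)

def plan_moves(target_pose):
    # Stand-in for the task planner: an approach point above the target,
    # then the grasp pose itself.
    approach = (target_pose[0], target_pose[1], target_pose[2] + 0.15)
    return [approach, target_pose]

def move_to(pose):
    # Stand-in for the manipulator's position controller.
    print("moving to", pose)

# Open loop: look once, then move blind. Any error in the pose estimate or
# in the manipulator kinematics goes uncorrected.
for waypoint in plan_moves(locate_target()):
    move_to(waypoint)
```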
Visual servoing is no more than the use of vision at the lowest level, with simple