Traditional video retrieval based on visual feature extraction cannot simply be applied to lecture recordings, because lecture videos have a largely homogeneous scene composition.
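To illustrate the consequence of this homogeneity, the following minimal Python/OpenCV sketch (our illustration, not a method from the literature discussed here) implements the standard histogram-difference cue for shot-boundary detection; the video path and the distance threshold of 0.4 are assumptions chosen for the example.

    import cv2

    def shot_boundaries(video_path, threshold=0.4):
        # Flag frames whose colour histogram differs strongly from the
        # previous frame -- the classic visual cue for a shot boundary.
        cap = cv2.VideoCapture(video_path)
        prev_hist, boundaries, idx = None, [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [32, 32],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                # Bhattacharyya distance: 0 = identical, 1 = disjoint.
                d = cv2.compareHist(prev_hist, hist,
                                    cv2.HISTCMP_BHATTACHARYYA)
                if d > threshold:
                    boundaries.append(idx)
            prev_hist, idx = hist, idx + 1
        cap.release()
        return boundaries

On lecture footage, where consecutive frames show the same slide and lectern, one would expect nearly every inter-frame distance to stay well below the threshold, so hardly any boundaries are found and the resulting visual segmentation carries little retrieval value.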
Fig. 1a shows an example of a lecture video recorded in an older format using a single video camera. Several factors can degrade recordings in this format: camera motion may alter the size, shape, and brightness of the slide; the slide can be partially occluded when the speaker moves in front of it; and changes of camera focus (switching between the speaker view and the slide view) can impair the subsequent slide detection process.
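This fragility can be made concrete with a minimal Python/OpenCV sketch (a simplified illustration under the assumption that the projected slide appears as the largest bright region in the frame; it is not the detection method of any specific system). Each factor above perturbs exactly the step this sketch relies on: camera motion changes the contour's size and shape, an occluding speaker splits or shrinks it, and a focus switch to the speaker view removes it entirely.

    import cv2

    def detect_slide_region(frame):
        # Assume the projected slide is the largest bright region.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None  # e.g. camera currently focused on the speaker
        largest = max(contours, key=cv2.contourArea)
        # Reject candidates too small to be a slide
        # (partial occlusion, zoomed-out camera position).
        if cv2.contourArea(largest) < 0.1 * gray.size:
            return None
        return cv2.boundingRect(largest)  # (x, y, w, h)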
Nowadays, lecture videos are commonly produced in a multi-scene format (cf. Fig. 1b), in which the speaker and the presentation slides are displayed synchronously. This can be achieved either by displaying a video of the speaker together with a synchronized slide file, or by applying a state