In the second stage, both audio and visual features are
extracted. For the visual stream, color is used
as the content feature. For the audio stream, the 154 features
that Ellis and Lee (2004) originally used to describe
audio segments are computed.
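
As an illustration of the visual side of this stage, the sketch below computes one possible color feature for a single frame. The text does not specify which color descriptor is used, so the per-channel histogram, the function name `color_histogram`, and the `bins` parameter are all assumptions made for the example. The 154 audio features are not reproduced here, since their definitions come from Ellis and Lee (2004).

```python
import numpy as np


def color_histogram(frame, bins=8):
    """Return a concatenated, L1-normalized per-channel color histogram.

    Assumption: `frame` is an H x W x 3 array of 8-bit color values.
    This is a hypothetical stand-in for the color feature in the text,
    not the descriptor the authors necessarily used.
    """
    frame = np.asarray(frame)
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")

    channels = []
    for c in range(3):
        # Count pixel values of one channel in `bins` equal-width bins over [0, 256).
        hist, _ = np.histogram(frame[:, :, c], bins=bins, range=(0, 256))
        channels.append(hist)

    feature = np.concatenate(channels).astype(np.float64)
    total = feature.sum()
    # Normalize so frames of different sizes yield comparable vectors.
    return feature / total if total > 0 else feature
```

Calling `color_histogram(frame)` with the default `bins=8` yields a 24-dimensional vector per frame. How such per-frame vectors would be aggregated over a video segment is left open here, as the text does not describe it.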