In the second stage, both audio and visual features are extracted. For the visual features, color is used as the content feature. For the audio features, the 154 features originally used by Ellis and Lee to describe audio segments are computed.
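As an illustration of this stage, the sketch below computes a simple color descriptor for one frame and joins it to an audio feature vector. The text does not say which color descriptor is used, so the per-channel histogram here is an assumption, as are the names `color_histogram` and `combine_features` and the `bins` parameter. The 154 audio features are taken as already computed, since their definitions are not given in this section.

```python
from typing import Iterable, List, Sequence, Tuple

N_AUDIO_FEATURES = 154  # feature count stated in the text


def color_histogram(pixels: Iterable[Tuple[int, int, int]], bins: int = 8) -> List[float]:
    """Per-channel RGB histogram, normalized so each channel block sums to 1.

    Assumption: the source only says color is the content feature;
    a histogram is one common choice, not necessarily the authors' descriptor.
    """
    if bins <= 0 or 256 % bins != 0:
        raise ValueError("bins must be a positive divisor of 256")
    width = 256 // bins
    hist = [0.0] * (3 * bins)
    count = 0
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            if not 0 <= value <= 255:
                raise ValueError("pixel values must lie in 0..255")
            hist[channel * bins + value // width] += 1.0
        count += 1
    if count == 0:
        return hist  # empty frame: all-zero descriptor
    return [h / count for h in hist]


def combine_features(visual: Sequence[float], audio: Sequence[float]) -> List[float]:
    """Concatenate a visual descriptor with a precomputed 154-dim audio vector."""
    if len(audio) != N_AUDIO_FEATURES:
        raise ValueError(f"expected {N_AUDIO_FEATURES} audio features, got {len(audio)}")
    return list(visual) + list(audio)
```

For example, `combine_features(color_histogram(frame_pixels), audio_vector)` would give one joint vector per segment, where `frame_pixels` and `audio_vector` are hypothetical inputs. Whether the two modalities are concatenated or kept separate in later stages is not stated here.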