2. Related work
Recently, generative and discriminative models are two
major categories of models used for representing the target
in object tracking. The generative tracking models
typically model the target in a feature subspace and find
the candidates which are the most similar to the target
model [1,4,19] or search for the regions with the highest
likelihood [20–23] to handle the target variation. The
discriminative models formulate tracking as a binary
classification problem that distinguishes the target object
from the background, such as SVM classifier [24–26],
online boosting classifier [27,28], P–N learning classifier
[29]. Generally, the generative model achieves higher
generalization with limited data, but the appearance
model needs to be dynamically updated frequently to fit
the target appearance variation. The discriminative model
performs better when the size of training set is large, but it
requires enough samples to initialize and update during
tracking since the target is often given at the first frame.
Some approaches [30,31] combine the discriminative
model and generative model together to improve the
tracking accuracy.
Sparse representation based tracking approaches [11–
17,30,32] achieve impressive results due to the strong representative
capacity of sparse coding. Mei et al. [11] propose L1
tracker by modeling the target as a single entity and using
trivial templates as a sparse noise component to handle
occlusion, and then Bao et al. [12] improve the efficiency of L1
tracker via accelerated proximal gradient approach. Wang
et al. [13] replace the feature of target templates by orthogonal
PCA and propose an object model with sparse prototypes.
Liu et al. [16] consider the background information in
the dictionary construction. Jia et al. [17] propose an alignment
pooling approach to obtain global sparse representations
and update the templates by replacing old templates.
The previous approaches generally locate the best
candidate by the minimal reconstruction error. Nevertheless,
a dictionary without learning does not characterize
the appearance variation well because it is constructed
only by target templates in the first frame and updated
heuristically to adapt to variation. Hence, the minimal
reconstruction error evaluated by such dictionary is unlikely
to have high accuracy to measure the best tracking
result. In addition, it will introduce accumulated error into
the dictionary when significant occlusion or variation
occurs during tracking.
Motivated by recent progress of tracking algorithms
and supervised dictionary learning method [33], we propose
a novel cascaded probabilistic tracking algorithm via
supervised dictionary learning. The most similar approach
with ours is the tracking approach presented in [14]. We
both learn a discriminative dictionary by supervised dictionary
learning method. However, our method is different
from theirs in several ways. First, they construct the initial
dictionary with samples of target and pure background
while we use assembled sample set of three parts: target,