Our key hypothesis is that the temporal dynamics of an action are similar across
views. For example, the timing pattern of acceleration and deceleration of the limbs is
largely preserved under viewpoint changes. We decompose an action into movement
primitives, each corresponding roughly to a body part, and encode the fine-grained
temporal dynamics of each primitive in a descriptor that we call the movement pattern
histogram (MPH). An action is then described as a collection of MPHs (see Fig. 1).
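
As a toy illustration of the hypothesis only (it is not the MPH computation), the
following sketch moves a single point along a straight 3D line with a smooth
accelerate-then-decelerate profile, projects it into two views, and compares the
peak-normalized speed profiles. The straight-line motion, the orthographic cameras, and
all function names are assumptions made for this example; with perspective cameras or
curved motion the profiles would be expected to match only approximately.

```python
import math


def limb_trajectory(n_frames=50):
    """Hypothetical limb endpoint: straight-line 3D motion with smooth speed-up and slow-down."""
    start = (0.0, 0.0, 0.0)
    direction = (1.0, 0.5, 0.3)
    trajectory = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        s = 3 * t**2 - 2 * t**3  # smoothstep: speed rises, peaks mid-way, then falls
        trajectory.append(tuple(p + s * d for p, d in zip(start, direction)))
    return trajectory


def project(point, yaw):
    """Orthographic projection for a camera rotated by `yaw` about the vertical (y) axis."""
    x, y, z = point
    u = x * math.cos(yaw) + z * math.sin(yaw)  # horizontal image coordinate
    return (u, y)  # vertical image coordinate is the world y


def normalized_speed_profile(points_2d):
    """Per-frame 2D speed divided by its peak, so only the timing pattern remains."""
    speeds = [math.dist(a, b) for a, b in zip(points_2d, points_2d[1:])]
    peak = max(speeds)
    return [s / peak for s in speeds]


trajectory = limb_trajectory()
profile_front = normalized_speed_profile([project(p, 0.0) for p in trajectory])
profile_side = normalized_speed_profile(
    [project(p, math.radians(60)) for p in trajectory]
)

# Largest per-frame gap between the two views' timing patterns.
max_gap = max(abs(a - b) for a, b in zip(profile_front, profile_side))
print(f"largest difference between normalized speed profiles: {max_gap:.2e}")
```

Under these assumptions the projection scales every frame's speed by the same
view-dependent constant, so normalizing by the peak removes the viewpoint dependence and
leaves the acceleration and deceleration timing unchanged.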