We propose a hierarchical fall detection framework that consists of a representation layer and two classification layers. At the representation layer, RGBD images are acquired from a Kinect sensor and the 3D joint positions are estimated; the MPGD representation is then constructed for each input frame.
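As a rough illustration of the per-frame input at this layer, the sketch below turns one frame of estimated 3D joint positions into a normalized feature vector. This is only a generic stand-in written in Python with NumPy; the actual MPGD construction is not reproduced here, and the root-joint centering and scale normalization are illustrative assumptions.

```python
import numpy as np

def frame_feature(joints_xyz, root_idx=0):
    """Flatten one frame of estimated 3D joint positions into a feature vector.

    joints_xyz: (num_joints, 3) array of skeleton joint positions.
    This is a generic placeholder for the per-frame input; it is not the
    MPGD representation described in the text.
    """
    joints = np.asarray(joints_xyz, dtype=float)
    # Translate so the chosen root joint (e.g., hip center) is at the origin,
    # removing the person's absolute position in the room.
    centered = joints - joints[root_idx]
    # Scale-normalize by the largest joint distance from the root to reduce
    # sensitivity to body size and distance from the sensor.
    scale = np.linalg.norm(centered, axis=1).max() or 1.0
    return (centered / scale).ravel()
```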
At the first classification layer, we train a set of support vector machine (SVM) classifiers that assign each frame to one of a set of predefined states. The state of a frame describes the spatiotemporal configuration of the elderly person in that frame (e.g., the person is standing still or is stretching out the right arm).
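A minimal sketch of this first classification layer is given below, assuming each frame has already been converted into a fixed-length feature vector and that per-frame state labels are available for training. The file names, the single multi-class RBF SVM (standing in for the set of SVM classifiers), and its hyperparameters are illustrative assumptions, not the exact setup of the framework.

```python
import numpy as np
from sklearn.svm import SVC

# X_train: (num_frames, feature_dim) per-frame feature vectors;
# y_train: (num_frames,) integer state labels (e.g., 0 = standing still,
#          1 = stretching out the right arm, ...).
X_train = np.load("train_features.npy")   # hypothetical file names
y_train = np.load("train_states.npy")

# One multi-class SVM with an RBF kernel, standing in for the set of
# per-state SVM classifiers described above.
state_clf = SVC(kernel="rbf", C=1.0, gamma="scale")
state_clf.fit(X_train, y_train)

def frames_to_state_sequence(frame_features):
    """Classify every frame of a video into a state, yielding a state sequence."""
    return state_clf.predict(np.asarray(frame_features))
```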
At the second classification layer, constrained dynamic time warping (cDTW) is used to classify the whole sequence of states produced by the SVM classifiers as a falling or non-falling activity. The use of cDTW allows large variations in the duration of human activity video sequences to be handled efficiently.
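A minimal sketch of this second classification layer is given below: cDTW with a Sakoe-Chiba band compares the observed state sequence against labelled template sequences and returns the label of the nearest one. The band width, the 0/1 state-mismatch cost, and the nearest-template decision rule are assumptions for illustration rather than the exact formulation used in the framework.

```python
import numpy as np

def cdtw_distance(seq_a, seq_b, band=10):
    """Constrained DTW distance between two discrete state sequences."""
    n, m = len(seq_a), len(seq_b)
    # Widen the band if needed so the end point (n, m) stays reachable.
    band = max(band, abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # Sakoe-Chiba band: only fill cells within `band` of the diagonal.
        lo = max(1, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else 1.0
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_sequence(state_seq, templates):
    """templates: list of (state_sequence, label) pairs,
    with label in {'falling', 'non-falling'}."""
    best = min(templates, key=lambda t: cdtw_distance(state_seq, t[0]))
    return best[1]
```

Restricting the warping path to a band around the diagonal reduces the alignment cost from O(nm) to roughly O(n x band), which is why the constrained variant can absorb large differences in sequence duration while remaining efficient.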