Abstract: In this study, the authors propose a complete framework based on a hierarchical activity model to understand
and recognise activities of daily living in unstructured scenes. At each time instant of a long-duration video, the framework extracts a set of space-time trajectory features describing the global position of an observed person and the motion of his/
her body parts. Human motion information is gathered in a new feature that the authors call perceptual feature chunks
(PFCs). The set of PFCs is used to learn, in an unsupervised way, the regions of the scene (the topology) where important activities occur. Using the topologies and PFCs, the video is broken into a set of small events ('primitive events') that carry semantic meaning. The sequences of 'primitive events' and topologies are then used to construct hierarchical activity models. The proposed approach has been tested in a medical application for monitoring patients suffering from Alzheimer's disease and dementia. The authors have compared their approach with their previous work and with a rule-based approach. Experimental results show that the framework achieves better performance than existing works and has the potential to be used as a monitoring tool in medical applications.
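
To make the pipeline concrete, the following minimal sketch illustrates the unsupervised topology-learning and primitive-event steps summarised above. It is not the authors' implementation: the choice of k-means (via scikit-learn), the representation of PFCs as 2-D trajectory points, and the reduction of primitive events to transitions between learned scene regions are simplifying assumptions made purely for illustration.

# Illustrative sketch only; the clustering algorithm and PFC representation
# are assumptions, not the method described in the paper.
import numpy as np
from sklearn.cluster import KMeans

def learn_scene_topology(pfc_positions: np.ndarray, n_regions: int = 5):
    """Cluster PFC trajectory positions (an N x 2 array of image or
    ground-plane coordinates) into scene regions standing in for the
    learned topology."""
    kmeans = KMeans(n_clusters=n_regions, n_init=10, random_state=0)
    region_labels = kmeans.fit_predict(pfc_positions)
    return kmeans.cluster_centers_, region_labels

def to_primitive_events(region_labels: np.ndarray):
    """Collapse per-frame region labels into a sequence of 'primitive
    events', simplified here to transitions between scene regions."""
    events = []
    for prev, curr in zip(region_labels[:-1], region_labels[1:]):
        if prev != curr:
            events.append((int(prev), int(curr)))  # e.g. region 2 -> region 0
    return events

if __name__ == "__main__":
    # Synthetic trajectory: 50 points near one region, then 50 near another.
    rng = np.random.default_rng(0)
    points = rng.normal(loc=[[0.0, 0.0]] * 50 + [[5.0, 5.0]] * 50, scale=0.3)
    centers, labels = learn_scene_topology(points, n_regions=2)
    print(to_primitive_events(labels))

In the full framework, the resulting event sequences would feed the hierarchical activity models rather than being printed, but the sketch shows how PFC positions can yield a scene topology and a symbolic event stream without supervision.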