Nowadays, there are many applications (e.g. surveillance, human–
computer interaction etc.) that require an efficient and accurate
analysis of human activities using video input. For example, in the
medical field, the behaviour of patients (e.g. those suffering from
dementia or Alzheimer’s disease) needs to be studied over long periods
(days and weeks) to help medical staff (doctors, carers and nurses)
understand the difficulties of patients and propose solutions that
improve their daily living conditions [1].
Modelling and recognising activities are growing research areas in computer
vision and machine learning. Recent approaches [2, 3] address the
problem of detecting complex daily activities using egocentric
wearable cameras, which provide a close-up view of the scene and
show objects in their natural positions. However, a wearable
camera can be very intrusive for the user, especially for people
suffering from dementia. Visual information can also be obtained
with fixed cameras. The majority of work in activity recognition
using fixed cameras addresses short-term actions (i.e. a few seconds)
in acted footage of posture-defined classes such as ‘punching’ [4, 5].
In order to recognise human activities, scenes need to be analysed
from a sequence of frames (low-level task of computer vision) and
interpreted (high-level task). The inability to connect these two
levels is known as the semantic gap problem [6], and reducing it
remains a challenging task.
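To make these two levels concrete, consider the following minimal, purely illustrative sketch (it is not the method proposed in this paper): low-level tracked positions are grouped without supervision into scene regions, from which a high-level activity label is read off. The coordinates, the use of k-means clustering and the region-to-activity mapping are assumptions introduced only for exposition.

# Illustrative sketch only: the features, clustering and labels are assumptions
# for exposition, not the method proposed in this paper.
import numpy as np
from sklearn.cluster import KMeans

# Low-level layer: (x, y) positions of a tracked person, one row per frame.
tracked_positions = np.array([
    [1.0, 1.1], [1.2, 0.9], [1.1, 1.0],   # frames spent near the stove
    [4.9, 5.2], [5.1, 4.8], [5.0, 5.0],   # frames spent near the table
])

# Intermediate layer: group positions into scene regions without supervision.
regions = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tracked_positions)

# High-level layer: a hand-written region-to-activity mapping (hypothetical;
# in practice this association would itself have to be discovered or learned).
region_to_activity = {0: 'cooking', 1: 'eating'}
print([region_to_activity[r] for r in regions])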
In this paper, we propose a new approach to reduce this gap by
constructing, in an unsupervised manner, an intermediate layer
between low-level information (tracked objects from video) and
high-level interpretation of activity (e.g. cooking, eating and
sitting). Our method enables the detection of complex, long-duration
activities in an unstructured scene. We
have developed a complete vision-based framework that enables us
to model, discover and recognise activities online while monitoring
a patient. The two main contributions of this work are as follows:
