Abstract — With the continuous improvements in video-analysis
techniques, automatic low-cost video surveillance
is gradually emerging for consumer applications. Video
surveillance can contribute to the safety of people in the
home and ease control of home-entrance and equipment-usage
functions. In this paper, we study a flexible
framework for semantic analysis of human behavior from a
monocular surveillance video, captured by a consumer
camera. Successful trajectory estimation and human-body
modeling facilitate the semantic analysis of human
activities and events in video sequences. An additional
contribution is the introduction of a 3-D reconstruction
scheme for scene understanding, so that the actions of
persons can be analyzed from different views. The complete
framework consists of four processing levels: (1) a preprocessing
level including background modeling and
multiple-person detection, (2) an object-based level
performing trajectory estimation and posture classification,
(3) an event-based level for semantic analysis, and (4) a
visualization level including camera calibration and 3-D
scene reconstruction. In our evaluation, the proposed framework
achieved 86% accuracy for posture classification and 90% for
events, while running at near real-time speed
(6-8 frames/second).
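
As a rough illustration of how the four processing levels could be chained, the following minimal Python sketch passes each frame through a detection step, a trajectory and posture step, and an event step, and finally maps the trajectory into scene coordinates. Every name in it (`Track`, `detect_person`, `update_track`, `detect_event`, `to_ground_plane`, `process`), the threshold value, the aspect-ratio posture rule, the event label, and the use of a plane homography are assumptions made for illustration only; the abstract does not specify the actual algorithms used at each level.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Frame = List[List[int]]           # toy grayscale image: rows of pixel intensities
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


@dataclass
class Track:
    trajectory: List[Tuple[float, float]] = field(default_factory=list)
    postures: List[str] = field(default_factory=list)


def detect_person(frame: Frame, background: Frame, threshold: int = 30) -> Optional[BBox]:
    """Level 1 (placeholder): background subtraction, then one bounding box."""
    points = [
        (x, y)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if abs(value - background[y][x]) > threshold
    ]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def update_track(track: Track, bbox: BBox) -> None:
    """Level 2 (placeholder): extend the trajectory and label the posture."""
    x0, y0, x1, y1 = bbox
    track.trajectory.append(((x0 + x1) / 2, (y0 + y1) / 2))
    width, height = x1 - x0 + 1, y1 - y0 + 1
    # Hypothetical rule: a tall box counts as "upright", a wide box as "lying".
    track.postures.append("upright" if height >= width else "lying")


def detect_event(track: Track) -> Optional[str]:
    """Level 3 (placeholder): derive an event from the posture history."""
    if track.postures[-2:] == ["upright", "lying"]:
        return "posture_change_to_lying"  # hypothetical event label
    return None


def to_ground_plane(point, homography):
    """Level 4 (placeholder): map an image point to scene coordinates
    with a 3x3 homography assumed to come from camera calibration."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in homography)
    return u / w, v / w


def process(frames, background, homography):
    """Run all four levels over a frame sequence for a single person."""
    track, events = Track(), []
    for index, frame in enumerate(frames):
        bbox = detect_person(frame, background)
        if bbox is None:
            continue
        update_track(track, bbox)
        event = detect_event(track)
        if event:
            events.append((index, event))
    scene_path = [to_ground_plane(p, homography) for p in track.trajectory]
    return track, events, scene_path
```

The sketch handles a single person and one foreground blob per frame; multiple-person detection, as named in the first level, would need a connected-component or similar grouping step in place of the single bounding box.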
