Abstract
Static and temporally varying 3D invariants are proposed for capturing the spatio-temporal dynamics of a general human action, enabling its representation in a compact, view-invariant manner. Two variants of the representation are presented and studied: (1) a restricted-3D version, whose theory and implementation are simple and efficient but which applies only to a restricted class of human actions, and (2) a full-3D version, whose theory and implementation are more complex but which applies to any general human action. A detailed analysis of the two representations is presented. We show why a straightforward implementation of the key ideas does not work well in the general case, and present strategies designed to overcome inherent weaknesses in the approach. The result is an approach to human action modeling and recognition that is not only invariant to viewpoint, but also robust enough to handle different people, different speeds of action (and hence, frame rates), and minor variability in a given action, while encoding sufficient distinction among actions. Results on 2D projections of human motion-capture data and on manually segmented real image sequences demonstrate the effectiveness of the approach.
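As a toy illustration of the kind of view-invariant quantity the abstract refers to, the sketch below computes the classical cross-ratio of four collinear image points, a standard projective invariant that is unchanged under perspective projection of the underlying 3D points from any viewpoint. This is not the paper's model-based or mutual invariants, which are not detailed in the abstract; the function name and point ordering are illustrative assumptions only.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four (approximately) collinear 2D image points.

    For collinear 3D points, this value is preserved under any
    perspective projection, i.e. it is independent of camera viewpoint.
    Illustrative sketch only; not the invariants used in the paper.
    """
    pts = np.asarray([p1, p2, p3, p4], dtype=float)
    d = pts[3] - pts[0]                 # direction of the image line
    d /= np.linalg.norm(d)
    t = (pts - pts[0]) @ d              # signed positions along the line
    # CR(p1, p2; p3, p4) = (t3 - t1)(t4 - t2) / ((t3 - t2)(t4 - t1))
    return ((t[2] - t[0]) * (t[3] - t[1])) / ((t[2] - t[1]) * (t[3] - t[0]))
```

Because the cross-ratio depends only on the projective structure of the points, the same value is obtained from images of the same collinear configuration taken from different cameras, which is the sense of "view invariance" used above.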
Original language | English (US) |
---|---|
Pages (from-to) | 294-324 |
Number of pages | 31 |
Journal | Computer Vision and Image Understanding |
Volume | 98 |
Issue number | 2 |
DOIs | |
State | Published - May 2005 |
Externally published | Yes |
Keywords
- Human action recognition
- Model-based invariants
- Mutual invariants
- View invariance
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition