TY - GEN
T1 - An improved model for segmentation and recognition of fine-grained activities with application to surgical training tasks
AU - Lea, Colin
AU - Hager, Gregory D.
AU - Vidal, René
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/2/19
Y1 - 2015/2/19
AB - Automated segmentation and recognition of fine-grained activities is important for enabling new applications in industrial automation, human-robot collaboration, and surgical training. Many existing approaches to activity recognition assume that a video has already been segmented and perform classification using an abstract representation based on spatio-temporal features. While some approaches perform joint activity segmentation and recognition, they typically suffer from a poor modeling of the transitions between actions and a representation that does not incorporate contextual information about the scene. In this paper, we propose a model for action segmentation and recognition that improves upon existing work in two directions. First, we develop a variation of the Skip-Chain Conditional Random Field that captures long-range state transitions between actions by using higher-order temporal relationships. Second, we argue that in constrained environments, where the relevant set of objects is known, it is better to develop features using high-level object relationships that have semantic meaning instead of relying on abstract features. We apply our approach to a set of tasks common for training in robotic surgery: suturing, knot tying, and needle passing, and show that our method increases micro and macro accuracy by 18.46% and 44.13% relative to the state of the art on a widely used robotic surgery dataset.
UR - http://www.scopus.com/inward/record.url?scp=84925400796&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84925400796&partnerID=8YFLogxK
DO - 10.1109/WACV.2015.154
M3 - Conference contribution
AN - SCOPUS:84925400796
T3 - Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015
SP - 1123
EP - 1129
BT - Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 15th IEEE Winter Conference on Applications of Computer Vision, WACV 2015
Y2 - 5 January 2015 through 9 January 2015
ER -