TY - JOUR
T1 - Classifying web videos using a global video descriptor
AU - Solmaz, Berkan
AU - Assari, Shayan Modiri
AU - Shah, Mubarak
N1 - Funding Information:
The research presented in this paper is supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center, contract number D11PC20071. The US government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC or the US government.
PY - 2013/10
Y1 - 2013/10
N2 - Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors and the quantization of descriptors into a code book; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters on the frequency spectrum of a video sequence; hence, it integrates information about motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al.; Proceedings of the 17th international conference on pattern recognition (ICPR'04), vol. 3, pp. 32-36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al.; HMDB: a large video database for human motion recognition, 2011), and obtained promising results which demonstrate the robustness and the discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor and a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.
AB - Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors and the quantization of descriptors into a code book; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters on the frequency spectrum of a video sequence; hence, it integrates information about motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al.; Proceedings of the 17th international conference on pattern recognition (ICPR'04), vol. 3, pp. 32-36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al.; HMDB: a large video database for human motion recognition, 2011), and obtained promising results which demonstrate the robustness and the discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor and a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.
KW - Action recognition
KW - Frequency spectrum
KW - Spatio-temporal analysis
KW - Video descriptors
UR - http://www.scopus.com/inward/record.url?scp=84885330892&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84885330892&partnerID=8YFLogxK
U2 - 10.1007/s00138-012-0449-x
DO - 10.1007/s00138-012-0449-x
M3 - Article
AN - SCOPUS:84885330892
SN - 0932-8092
VL - 24
SP - 1473
EP - 1485
JO - Machine Vision and Applications
JF - Machine Vision and Applications
IS - 7
ER -