Zhi LIU, Jiang-tao HUANG, Xin FENG. Action recognition model construction based on multi-scale deep convolution neural network[J]. Optics and Precision Engineering, 2017, 25(3): 799-805. DOI: 10.3788/OPE.20172503.0799.
Action recognition model construction based on multi-scale deep convolution neural network
To simplify feature extraction for Human Activity Recognition (HAR) and improve the generalization of the extracted features, an algorithm based on a multi-scale deep convolutional neural network is proposed. The algorithm takes depth video as its research object and builds a deep network of parallel CNNs (Convolutional Neural Networks) that processes coarse global information about the action and fine-grained local information about the hand region simultaneously. Experiments were carried out on the MSRDailyActivity3D dataset. The average recognition accuracy on actions No. 11 to No. 16 was 98%, while that on all actions was 60.625%. The experimental results show that the proposed algorithm can recognize human activity effectively: almost all actions with obvious movements, and most actions whose movement is confined to the hands, were recognized.
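The abstract does not give the network architecture, input sizes, or how the hand region is located. As a rough illustration only, the sketch below shows one way a single depth frame could be turned into the two inputs that parallel branches would consume: a downsampled view of the whole frame (coarse, global) and a full-resolution patch around the hand (fine-grained, local). The function names `average_pool`, `crop_patch` and `two_scale_inputs`, the pooling factor, the patch size, and the hand position are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one depth frame as rows of depth values


def average_pool(frame: Frame, factor: int) -> Frame:
    """Downsample by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(frame) // factor, len(frame[0]) // factor
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [frame[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def crop_patch(frame: Frame, center: Tuple[int, int], size: int) -> Frame:
    """Cut a size x size patch around center (row, col).

    The window is shifted so it stays inside the frame.
    Assumes size is no larger than either frame dimension.
    """
    height, width = len(frame), len(frame[0])
    top = min(max(center[0] - size // 2, 0), height - size)
    left = min(max(center[1] - size // 2, 0), width - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def two_scale_inputs(frame: Frame, hand_center: Tuple[int, int],
                     factor: int = 4, patch: int = 4) -> Tuple[Frame, Frame]:
    """Return (coarse global view, fine local hand patch) for one frame."""
    return average_pool(frame, factor), crop_patch(frame, hand_center, patch)


# Toy 8x8 "depth frame" whose value encodes its position (hypothetical data).
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
global_view, hand_view = two_scale_inputs(frame, hand_center=(6, 6))
print(global_view)   # 2x2 coarse view of the whole frame
print(hand_view[0])  # first row of the 4x4 full-resolution hand patch
```

In a two-branch setup of this kind, each view would feed its own convolutional branch, and the branch outputs would be combined before classification. How the paper fuses its branches is not stated in this abstract.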