1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
2. College of Computer and Information Engineering, Guangxi Teachers Education University, Nanning, Guangxi 530001, China
LIU Zhi (b. 1977), male, from Gao'an, Jiangxi; Ph.D., associate professor. He received his Ph.D. in Computer Science and Technology from Sichuan University in 2011. His research interests include deep learning, human activity recognition, image processing, object tracking, and information fusion. E-mail: liuzhi@cqut.edu.cn
Received: 2016-12-21; Accepted: 2017-01-15; Published in print: 2017-03-25
刘智, 黄江涛, 冯欣. 构建多尺度深度卷积神经网络行为识别模型[J]. 光学 精密工程, 2017,25(3):799-805. DOI: 10.3788/OPE.20172503.0799.
Zhi LIU, Jiang-tao HUANG, Xin FENG. Action recognition model construction based on multi-scale deep convolution neural network[J]. Optics and Precision Engineering, 2017, 25(3): 799-805. DOI: 10.3788/OPE.20172503.0799.
Abstract: To simplify the feature-extraction process of traditional human activity recognition (HAR) methods and improve the generalization of the extracted features, this paper proposes an HAR method based on a multi-scale deep convolutional neural network. Taking depth video as the research object, a parallel CNN-based deep architecture is constructed that simultaneously processes coarse-grained global motion patterns and fine-grained local hand movements. Experiments on the MSRDailyActivity3D dataset show an average recognition accuracy of 98% on actions No. 11-16 and 60.625% over all actions. The results indicate that the proposed method recognizes human activities effectively: actions with obvious whole-body motion are recognized accurately, while accuracy decreases for actions involving only local hand motion.
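The multi-scale idea described in the abstract, a coarse global stream and a fine-grained hand-region stream processed in parallel and then fused, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the 3x3 kernel, the 4x downsampling factor, and the hand bounding box are placeholder assumptions chosen only to show how the two scales are combined into one descriptor.

```python
import numpy as np

def conv_pool(frame, kernel):
    """Valid 2-D convolution followed by 2x2 max pooling (one toy CNN stage)."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    # 2x2 max pooling (drop any odd trailing row/column)
    ph, pw = out.shape[0] // 2, out.shape[1] // 2
    return out[:ph * 2, :pw * 2].reshape(ph, 2, pw, 2).max(axis=(1, 3))

def multiscale_features(depth_frame, hand_box, kernel):
    """Fuse a coarse global stream with a fine-grained hand-region stream."""
    # Global stream: downsample the whole depth frame (coarse scale).
    global_in = depth_frame[::4, ::4]
    # Local stream: crop the hand region at full resolution (fine scale).
    y0, y1, x0, x1 = hand_box
    local_in = depth_frame[y0:y1, x0:x1]
    g = conv_pool(global_in, kernel).ravel()
    l = conv_pool(local_in, kernel).ravel()
    # Concatenation mimics the fusion of the two parallel branches.
    return np.concatenate([g, l])

rng = np.random.default_rng(0)
frame = rng.random((128, 128))            # one synthetic depth frame
kernel = rng.random((3, 3))               # placeholder learned filter
feat = multiscale_features(frame, (40, 72, 40, 72), kernel)
print(feat.shape)                         # (450,): 225 global + 225 local
```

In the actual method each branch would be a trained multi-layer CNN and the fused descriptor would feed a classifier; the sketch only shows the parallel coarse/fine structure and the fusion step.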