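State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China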
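CHEN Ren-wen (1966-), male, born in Pingjiang, Hunan; Ph.D., professor and doctoral supervisor. He received his bachelor's, master's and Ph.D. degrees from Nanjing University of Aeronautics and Astronautics in 1988, 1991 and 1999, respectively. His research interests include machine vision, artificial intelligence, measurement and control technology, energy harvesting and smart structures. E-mail: rwchen@nuaa.edu.cn
YUAN Ting-ting (1997-), female, born in Yinchuan, Ningxia; master's student at Nanjing University of Aeronautics and Astronautics. She received her bachelor's degree from Sichuan University in 2018. Her research interests include pattern recognition and computer vision. E-mail: yttjy@nuaa.edu.cn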
Received: 2020-08-13
Revised: 2020-09-21
Published in print: 2021-04-15
CHEN Ren-wen, YUAN Ting-ting, HUANG Wen-bin, et al. Driver pose estimation using convolutional neural networks[J]. Optics and Precision Engineering, 2021, 29(04): 813-821. DOI: 10.37188/OPE.20212904.0813.
To estimate drivers' driving postures, a pose estimation dataset containing video clips of 26 drivers was collected and constructed, and a lightweight convolutional neural network was proposed for efficient recognition of driving postures. First, through mathematical modeling, driver pose estimation was reformulated as finding the mapping function between the predicted confidence maps of the joints and the ground-truth confidence maps that minimizes the loss function. A fully convolutional neural network was then constructed, using the Hourglass module as the backbone of each stage, the residual block as the basic building unit, and batch normalization and activation functions. To exploit both the original image information and basic contextual information, a two-stage cascaded structure with multi-feature aggregation was adopted: the features of the previous stage are fed to the current stage, and features from different stages are aggregated to capture both local detail and global context, so that the coarse prediction maps of the first stage guide the prediction of the subsequent stage and each stage refines the estimated location of every joint. Using multiple loss functions enabled the network to learn deeper and more accurate representations. Comparative experiments verified the feasibility of the model and showed that the cascaded structure and the multi-loss strategy together improve prediction accuracy by 3.84%. The experimental results show that the computational cost and parameter count of the proposed network are far lower than those of other human pose estimation models: the model has only 0.7 M parameters and reaches an average prediction accuracy of 95.74%, allowing driving posture to be detected in real time on board the vehicle.
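The abstract frames pose estimation as learning a mapping from an image to per-joint confidence maps that minimizes a loss against ground-truth maps, with a loss applied at each of the two cascaded stages. The plain-Python sketch below illustrates only that training target and the loss bookkeeping, under assumptions the abstract does not state: ground-truth maps are 2D Gaussians centred on each labelled joint (a common choice in heatmap-based pose estimation), the per-stage loss is the mean squared error averaged over joints, stage losses are summed with equal weight, and joint positions are read off as the arg-max of each map. All function names, the sigma value and the toy coordinates are hypothetical; the network itself (Hourglass stages, residual blocks, batch normalization) is not shown.

```python
import math

def gaussian_confidence_map(height, width, cx, cy, sigma=1.5):
    """Ground-truth confidence map (assumed form): a 2D Gaussian with peak 1.0
    centred on the labelled joint at column cx, row cy. Indexed as map[y][x]."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def mse(pred, target):
    """Mean squared error between two equally sized 2D maps."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count

def stage_loss(pred_maps, target_maps):
    """Loss for one stage: MSE averaged over all joints (assumed weighting)."""
    return sum(mse(p, t) for p, t in zip(pred_maps, target_maps)) / len(target_maps)

def multi_stage_loss(stage_outputs, target_maps):
    """Total loss: one term per cascaded stage, summed with equal weight, so the
    coarse first-stage maps are supervised directly (assumption)."""
    return sum(stage_loss(maps, target_maps) for maps in stage_outputs)

def decode_joint(conf_map):
    """Read a joint position off a confidence map as its arg-max, returned as (x, y)."""
    best_x, best_y, best_v = 0, 0, -math.inf
    for y, row in enumerate(conf_map):
        for x, v in enumerate(row):
            if v > best_v:
                best_x, best_y, best_v = x, y, v
    return best_x, best_y

# Toy usage with hypothetical values: two joints on a 16x16 map.
H, W = 16, 16
joints = [(4, 5), (10, 12)]                                          # hypothetical (x, y) labels
targets = [gaussian_confidence_map(H, W, x, y) for x, y in joints]
coarse = [gaussian_confidence_map(H, W, x + 2, y) for x, y in joints]   # stage 1: off by 2 px
refined = [gaussian_confidence_map(H, W, x, y) for x, y in joints]      # stage 2: on target

print(multi_stage_loss([coarse, refined], targets))  # > 0, all of it from stage 1
print([decode_joint(m) for m in refined])            # [(4, 5), (10, 12)]
```

Because the loss is summed over stages, errors in the coarse first-stage maps are penalized directly rather than only through the final output; this is the intuition behind the multi-loss strategy that the abstract credits, together with the cascaded structure, with a 3.84% accuracy gain.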