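1. School of Electronic Engineering and Intelligent Manufacturing, Anqing Normal University, Anqing 246133, Anhui, China
2. School of Computer and Information, Hefei University of Technology, Hefei 230009, Anhui, China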
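HUANG Zhong (1981-), male, born in Anqing, Anhui; Ph.D., associate professor and master's supervisor. He received his Ph.D. from Hefei University of Technology in 2016 and is currently a postdoctoral researcher at the Postdoctoral Research Station of Information and Communication Engineering, Hefei University of Technology. His research interests include image processing and optical inspection. E-mail: huangzhong3315@163.com
TAO Mengyuan (1996-), female, born in Hefei, Anhui; master's student. She received her bachelor's degree from Anqing Normal University in 2020. Her research interests include optical image processing and natural human-computer interaction. E-mail: tmy2424851034@163.com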
Received: 2022-05-16
Revised: 2022-07-08
Print publication date: 2023-02-25
HUANG Zhong,TAO Mengyuan,HU Min,et al.Combining residual shrinkage and spatio-temporal context for behavior detection network[J].Optics and Precision Engineering,2023,31(04):552-564. DOI: 10.37188/OPE.20233104.0552.
To address the high redundancy of behavior feature extraction and the inaccurate localization of behavior boundaries in the R-C3D behavior detection network, an improved behavior detection network combining residual shrinkage and spatio-temporal context (RS-STCBD) is proposed. First, a residual shrinkage structure and a soft-thresholding operation are integrated into the residual module of 3D-ResNet to design a 3D residual shrinkage unit with channel-adaptive soft thresholds (3D-RSST), and multiple 3D-RSST units are cascaded to build a feature extraction network that adaptively eliminates redundant information, such as noise and background, from behavioral features. Second, multi-layer convolutions replace the single convolution in the temporal proposal subnet to enlarge the temporal receptive field of the temporal proposal segments. Finally, a non-local attention mechanism is introduced into the behavior classification subnet to obtain spatio-temporal context information by capturing long-range dependencies among high-quality behavior proposals. Experimental results on the THUMOS14 and ActivityNet1.2 datasets show that the improved network reaches mAP@0.5 (mean average precision at a temporal IoU threshold of 0.5) values of 36.9% and 41.6%, respectively, which are 8.0 and 14.8 percentage points higher than those of R-C3D. The behavior detection method based on the improved network improves both the accuracy of action boundary localization and the accuracy of behavior classification, which helps improve the quality of human-robot interaction in natural scenes.
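The abstract names the channel-adaptive soft-thresholding step of 3D-RSST but does not spell it out. The sketch below shows how such a step can work, assuming the channel-wise design of deep residual shrinkage networks (ZHAO M H et al., "Deep residual shrinkage networks for fault diagnosis", in the reference list): each channel's threshold is tau_c = alpha_c * mean(|F(x)_c|), where alpha_c in (0, 1) comes from a small two-layer fully connected network ending in a sigmoid, and the unit outputs x + soft(F(x), tau_c). Everything else is an assumption for illustration, not the authors' implementation: the 3D convolutional residual branch F is stubbed out as a precomputed input, batch normalization in the threshold branch is omitted, the helper names are hypothetical, and the feature map is flattened to channels x (T*H*W) positions in plain Python rather than a real 3D-ResNet tensor.

```python
import math
from typing import List

Features = List[List[float]]  # [channel][flattened T*H*W position]
Weights = List[List[float]]   # [out_features][in_features]


def soft_threshold(v: float, tau: float) -> float:
    """Shrink v toward zero by tau; values with |v| <= tau become zero."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_thresholds(f: Features, w1: Weights, b1: List[float],
                       w2: Weights, b2: List[float]) -> List[float]:
    """Per-channel thresholds tau_c = alpha_c * mean(|f_c|)."""
    # Global average pooling of absolute values: one scalar per channel.
    gap = [sum(abs(v) for v in ch) / len(ch) for ch in f]
    # Hidden layer: ReLU(W1 @ gap + b1) (batch norm omitted in this sketch).
    hidden = [max(0.0, sum(w * g for w, g in zip(row, gap)) + b)
              for row, b in zip(w1, b1)]
    # Output layer: one scaling factor alpha_c in (0, 1) per channel.
    alpha = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]
    return [a * g for a, g in zip(alpha, gap)]


def residual_shrinkage(x: Features, f: Features, w1: Weights, b1: List[float],
                       w2: Weights, b2: List[float]) -> Features:
    """Shortcut plus soft-thresholded residual: x + soft(F(x), tau_c)."""
    taus = channel_thresholds(f, w1, b1, w2, b2)
    return [[xi + soft_threshold(fi, tau) for xi, fi in zip(x_ch, f_ch)]
            for x_ch, f_ch, tau in zip(x, f, taus)]


# Toy input: 2 channels x 4 flattened spatio-temporal positions.
x = [[1.0] * 4, [0.5] * 4]                             # shortcut features
f = [[0.1, -0.2, 2.0, -1.9], [0.3, -0.3, 0.3, -0.3]]  # stand-in for F(x)
zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero FC weights and biases, alpha_c = sigmoid(0) = 0.5 per channel.
y = residual_shrinkage(x, f, zeros, [0.0, 0.0], zeros, [0.0, 0.0])
print([[round(v, 3) for v in ch] for ch in y])
```

Because alpha_c < 1, each threshold stays below its channel's mean absolute value, so the largest response in a non-zero channel always survives; small-magnitude responses, such as 0.1 and -0.2 in the first toy channel, are zeroed. This is the kind of suppression the abstract credits with removing noise and background, with the threshold learned per channel rather than fixed by hand.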
LIU C, LI X, LI Q, et al. Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model [J]. Neurocomputing, 2021, 430: 174-184. doi: 10.1016/j.neucom.2020.10.016
HU X J, DAI J Z, LI M, et al. Online human action detection and anticipation in videos: a survey [J]. Neurocomputing, 2022, 491: 395-413. doi: 10.1016/j.neucom.2022.03.069
ZHANG H Y, AN ZH. Human action recognition based on improved two-stream spatiotemporal network [J]. Opt. Precision Eng., 2021, 29(2): 420-429. (in Chinese). doi: 10.37188/OPE.20212902.0420
LIU Y, YANG F, GINHAC D. ACDnet: an action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation [J]. Pattern Recognition Letters, 2021, 145: 118-126. doi: 10.1016/j.patrec.2021.02.001
YUAN Z H, STROUD J C, LU T, et al. Temporal action localization by structured maximal sums [C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. IEEE, 2017: 3215-3223. doi: 10.1109/cvpr.2017.342
ZHANG W. I2Net: mining intra-video and inter-video attention for temporal action localization [J]. Neurocomputing, 2021, 444: 16-29. doi: 10.1016/j.neucom.2021.02.085
HUANG Y P, DAI Q, LU Y T. Decoupling localization and classification in single shot temporal action detection [C]. 2019 IEEE International Conference on Multimedia and Expo (ICME), July 8-12, 2019, Shanghai, China. IEEE, 2019: 1288-1293. doi: 10.1109/icme.2019.00224
ZHAO Y, XIONG Y J, WANG L M, et al. Temporal action detection with structured segment networks [J]. International Journal of Computer Vision, 2020, 128(1): 74-95. doi: 10.1007/s11263-019-01211-2
LIN T W, ZHAO X, SU H S. Joint learning of local and global context for temporal action proposal generation [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(12): 4899-4912. doi: 10.1109/tcsvt.2019.2962063
XU H J, DAS A, SAENKO K. R-C3D: region convolutional 3D network for temporal activity detection [C]. 2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. IEEE, 2017: 5794-5803. doi: 10.1109/iccv.2017.617
CHEN G, ZHANG C, ZOU Y X. AFNet: temporal locality-aware network with dual structure for accurate and fast action detection [J]. IEEE Transactions on Multimedia, 2021, 23: 2672-2682. doi: 10.1109/tmm.2020.3014555
XU H J, DAS A, SAENKO K. Two-stream region convolutional 3D network for temporal activity detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(10): 2319-2332. doi: 10.1109/tpami.2019.2921539
YANG L, PENG H W, ZHANG D W, et al. Revisiting anchor mechanisms for temporal action localization [J]. IEEE Transactions on Image Processing, 2020. doi: 10.1109/tip.2020.3016486
MENG Y B, JIN D, LIU G H, et al. Text detection with kernel-sharing dilated convolutions and attention-guided FPN [J]. Opt. Precision Eng., 2021, 29(8): 1955-1967. (in Chinese). doi: 10.37188/OPE.20212908.1955
MAO L, CAO ZH, YANG D W, et al. Multi-stage boundary reference network for action segmentation [J]. Opt. Precision Eng., 2022, 30(3): 340-349. (in Chinese). doi: 10.37188/OPE.20223003.0340
LI B R. Learning frame-level affinity with video-level labels for weakly supervised temporal action detection [J]. Neurocomputing, 2021, 463: 109-121. doi: 10.1016/j.neucom.2021.07.059
YANG W F, ZHANG T Z, MAO Z D, et al. Multi-scale structure-aware network for weakly supervised temporal action detection [J]. IEEE Transactions on Image Processing, 2021, 30: 5848-5861. doi: 10.1109/tip.2021.3089361
YANG L, HAN J W, ZHAO T, et al. Background-click supervision for temporal action localization [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9814-9829. doi: 10.1109/tpami.2021.3132058
ZHAO M H, ZHONG S S, FU X Y, et al. Deep residual shrinkage networks for fault diagnosis [J]. IEEE Transactions on Industrial Informatics, 2020, 16(7): 4681-4690. doi: 10.1109/tii.2019.2943898
LI L W, QIN S Y, LU Z, et al. One-shot learning gesture recognition based on joint training of 3D ResNet and memory module [J]. Multimedia Tools and Applications, 2020, 79(9): 6727-6757. doi: 10.1007/s11042-019-08429-9
WANG Y W. Temporal convolutional network with soft thresholding and attention mechanism for machinery prognostics [J]. Journal of Manufacturing Systems, 2021, 60: 512-526. doi: 10.1016/j.jmsy.2021.07.008
CUI W X, LIU S H, JIANG F, et al. Image compressed sensing using non-local neural network [J]. IEEE Transactions on Multimedia, 2021, PP(99): 1. doi: 10.1109/tmm.2021.3132489
JIANG Y, LIU J, ZAMIR A, et al. THUMOS challenge: action recognition with a large number of classes [OL]. http://crcv.ucf.edu/THUMOS14/, 2014.
HEILBRON F C, ESCORCIA V, GHANEM B, et al. ActivityNet: a large-scale video benchmark for human activity understanding [C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. IEEE, 2015: 961-970. doi: 10.1109/cvpr.2015.7298698
ZHANG X Y, SHI H C, LI C S, et al. TwinNet: twin structured knowledge transfer network for weakly supervised action localization [J]. Machine Intelligence Research, 2022, 19(3): 227-246. doi: 10.1007/s11633-022-1333-4
LI G Z, LI J, WANG N N, et al. Multi-hierarchical category supervision for weakly-supervised temporal action localization [J]. IEEE Transactions on Image Processing, 2021, 30: 9332-9344. doi: 10.1109/tip.2021.3124671