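College of Electromechanical Engineering, Dalian Minzu University, Dalian 116600, Liaoning, China
MAO Lin (1977-), female, born in Rongcheng, Shandong; Ph.D., associate professor, and master's supervisor. She received her master's degree from Heilongjiang University in 2005 and her Ph.D. from Harbin Engineering University in 2011. Her research focuses on machine-vision target tracking and multi-sensor information fusion. E-mail: maolin@dlnu.edu.cn
CAO Zhe (1998-), male, born in Chifeng, Inner Mongolia; master's student. He received his bachelor's degree from Dalian Minzu University in 2020. His research focuses on computer vision and video action segmentation algorithms. E-mail: cao_zhe@foxmail.com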
Received: 2021-04-20
Revised: 2021-07-16
Published in print: 2022-02-15
MAO Lin, CAO Zhe, YANG Dawei, et al. Multi-stage boundary reference network for action segmentation[J]. Optics and Precision Engineering, 2022, 30(03): 340-349. DOI: 10.37188/OPE.20223003.0340.
Existing action segmentation algorithms suffer from over-segmentation, which produces incorrect predictions and lowers segmentation quality. To address this, a multi-stage reference network is proposed that uses adjustable video action boundary information as a reference: within a backbone based on the multi-stage temporal convolutional network, boundary information is introduced independently at each stage. Because applying identical boundary information at every stage would make the model rigid, a weight adjusting unit composed of multiple parallel convolution layers is proposed, allowing the backbone to adjust the boundary values that enter each stage's output computation and to treat different samples differently. With the adjustable boundary information as a reference, the output of each stage is smoothed along the temporal dimension, which significantly reduces over-segmentation errors. Experiments show that the method outperforms existing comparable methods on three video action segmentation datasets: GTEA, 50Salads, and Breakfast. Compared with the Boundary-Aware Cascade Networks (BCN) algorithm, the segmental edit score improves by 1.7% on average, and the F1 score (the harmonic mean of precision and recall) improves by 1.5% on average.
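To make the over-segmentation problem concrete, the following minimal sketch compares frame-wise accuracy with a segmental edit score on a toy prediction. It assumes the edit score convention commonly used in the action segmentation literature (normalized Levenshtein distance over the ordered segment labels), which the abstract itself does not spell out. The function names and the toy labels `pour` and `stir` are illustrative and are not taken from the paper.

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse a frame-wise label sequence into its ordered segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def edit_score(pred, truth):
    """Segmental edit score in percent (assumed convention, see lead-in)."""
    p, t = segments(pred), segments(truth)
    return 100.0 * (1.0 - levenshtein(p, t) / max(len(p), len(t)))


def frame_accuracy(pred, truth):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Toy video: 10 frames of "pour" followed by 10 frames of "stir".
truth = ["pour"] * 10 + ["stir"] * 10

# Over-segmented prediction: only two frames are wrong, but each wrong frame
# splits a long action into several short fragments.
over_segmented = truth.copy()
over_segmented[4] = "stir"
over_segmented[14] = "pour"

print(frame_accuracy(over_segmented, truth))
print(edit_score(over_segmented, truth))
```

In this toy case two wrong frames barely lower frame-wise accuracy, yet they turn two ground-truth segments into six predicted ones, so the edit score drops sharply. This is why temporally smoothing each stage's output can raise the segmental metrics even when frame accuracy changes little.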