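1. Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650000, Yunnan, China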
E-mail: zyhhzf1998@163.com
HE Zifen, XU Lin, ZHANG Yinhui, et al. Mask generation dynamically regulates weakly supervised video instance segmentation[J]. Optics and Precision Engineering, 2023, 31(19): 2884-2897. DOI: 10.37188/OPE.20233119.2884
Fully supervised video instance segmentation networks depend heavily on fine mask annotations, whose labor and time costs are high, so intelligent machines cannot quickly adapt to new scenes. To address this, an end-to-end weakly supervised video instance segmentation (WSVIS) network with dynamically regulated mask generation is proposed. First, to overcome the loss of instance activation features caused by the sudden drop in channel dimension at the initial mask prediction layer, a multi-level feature fusion module is constructed; it predicts the initial instance features through a step-by-step feature reuse strategy and fuses relative position information to generate the initial predicted mask. Second, a dynamic regulation mechanism establishes mask feature dependencies in the channel and spatial dimensions, strengthening the dynamic interaction between the initial predicted mask and instance-aware information. Finally, the network uses binary color similarity to generate pseudo-affinity labels that replace fine mask annotations, and combines bounding box and mask consistency losses to achieve weakly supervised video instance segmentation with bounding-box annotations only. Experimental results show that on the BoxSet and YT-VIS datasets, the WSVIS network achieves segmentation accuracy and quality close to those of fully supervised networks while meeting real-time inference requirements, providing theoretical support and an algorithmic basis for intelligent machines to quickly adapt to new scenes and achieve real-time environmental perception and understanding.
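The abstract does not spell out how binary color similarity turns into pseudo-affinity labels, so the following is only a minimal sketch of one common formulation from box-supervised segmentation, not the authors' confirmed method. It assumes a similarity of the form exp(-||c_i - c_j|| / theta) between neighboring pixels, thresholded at tau to give a 0/1 label. The function names, the choice of right and down neighbors, and the values theta = 2.0 and tau = 0.3 are all illustrative assumptions.

```python
import math


def color_similarity(c1, c2, theta=2.0):
    """Similarity exp(-||c1 - c2|| / theta); theta is an assumed temperature."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return math.exp(-dist / theta)


def pseudo_affinity_labels(image, theta=2.0, tau=0.3):
    """Label each (pixel, neighbor) edge 1 if the colors are similar, else 0.

    image: H x W nested list of color tuples (for example LAB values).
    Only right and down neighbors are visited, an assumption made to keep
    the sketch short; tau is an assumed similarity threshold.
    """
    height, width = len(image), len(image[0])
    labels = {}
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    sim = color_similarity(image[y][x], image[ny][nx], theta)
                    labels[((y, x), (ny, nx))] = 1 if sim >= tau else 0
    return labels


# Toy 2 x 2 image: three nearly identical pixels and one very different pixel.
image = [
    [(50.0, 0.0, 0.0), (50.5, 0.0, 0.0)],
    [(50.0, 0.0, 0.0), (90.0, 20.0, 20.0)],
]
labels = pseudo_affinity_labels(image)
for edge, label in sorted(labels.items()):
    print(edge, label)
```

In a scheme of this kind, edges labeled 1 would typically be the only ones that contribute to a pairwise consistency term, encouraging the predicted mask to assign the same class to both pixels; edges labeled 0 carry no supervision because a color difference alone does not prove an object boundary.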
Keywords: intelligent machine; weakly supervised video instance segmentation; multi-level feature fusion; dynamic regulation; binary color similarity