Weakly supervised video instance segmentation with scale adaptive generation regulation

ZHANG Yinhui; HAI Weiqi; HE Zifen; HUANG Ying; CHEN Dongdong

doi:10.37188/OPE.20233118.2736

Information Sciences | Updated: 2023-09-25

- Optics and Precision Engineering Vol. 31, Issue 18, Pages: 2736-2751 (2023)
- Author affiliation: Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650000, China
- CLC: TP391.4
- Received: 14 December 2022; Revised: 18 January 2023; Published: 25 September 2023

ZHANG Yinhui, HAI Weiqi, HE Zifen, et al. Weakly supervised video instance segmentation with scale adaptive generation regulation[J]. Optics and Precision Engineering, 2023, 31(18): 2736-2751. DOI: 10.37188/OPE.20233118.2736.

Abstract

Video instance segmentation is a key technology for multi-target perception and scene understanding in assisted driving. However, because weakly supervised video instance segmentation trains the network using only bounding-box annotations, segmentation accuracy for targets with a large dynamic range of scales in traffic scenes is severely restricted. To address this issue, we propose a scale adaptive generation regulation weakly supervised video instance segmentation network (SAGRNet). First, a dynamic adaptive regulation module for multi-scale feature map contributions is designed to replace the original linear weighting. By dynamically adjusting the contribution of the feature maps at each scale, the module strengthens the focus on both the local position and the overall contour of a target, which addresses the excessive scale range that arises when targets such as vehicles and pedestrians are imaged at different distances. Second, a multi-fine-grained spatial information aggregation generation regulation module for target instances is constructed. It aggregates multi-fine-grained spatial information extracted at different dilation rates to generate weight parameters that regulate the features at each scale. This module refines instance boundaries and strengthens the representation of mask feature maps through cross-channel information interaction, compensating for the loss of continuity in edge contour masks caused by scarce instance edge information. Finally, to alleviate the weak supervision provided by bounding-box labels, orthogonal and color similarity losses are introduced to reduce the deviation between the predicted mask and the ground-truth bounding box and to address the ambiguity in assigning label attributes to pixel pairs. Experimental results on a traffic scene dataset extracted from Youtube-VIS2019 indicate that SAGRNet improves the average segmentation accuracy by 5.1% over the weakly supervised baseline, reaching 38.1%. These results indicate that the method provides an effective algorithmic basis for multi-target perception and instance-level scene understanding.
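
The abstract does not say how the per-scale contributions are computed, so the following is only a minimal sketch of the general idea of replacing a fixed linear weighting with input-dependent weights. It assumes the feature maps have already been resized to a common resolution, that each scale is scored from its globally average-pooled channels, and that the scores are normalized with a softmax. The function name `adaptive_scale_fusion` and the scoring vector `proj` are hypothetical and are not taken from SAGRNet.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_scale_fusion(features, proj):
    """Fuse feature maps from several scales with input-dependent weights.

    features: list of S maps; each map is a list of C channels and each
              channel is a flat list of H*W floats (same shape for all maps).
    proj:     list of C floats, a hypothetical learned scoring vector.
    Returns (fused_map, weights), where weights has one entry per scale.
    """
    scores = []
    for fmap in features:
        # Global average pooling: one descriptor value per channel.
        pooled = [sum(channel) / len(channel) for channel in fmap]
        # Scalar score for this scale (assumed linear scoring).
        scores.append(sum(p * w for p, w in zip(pooled, proj)))

    # Weights depend on the input and sum to 1, unlike fixed coefficients.
    weights = softmax(scores)

    n_channels = len(features[0])
    n_pixels = len(features[0][0])
    fused = [
        [
            sum(w * fmap[c][i] for w, fmap in zip(weights, features))
            for i in range(n_pixels)
        ]
        for c in range(n_channels)
    ]
    return fused, weights


# Toy input: two scales, two channels, four pixels per channel.
fine_scale = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
coarse_scale = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
proj = [1.0, 0.0]  # hypothetical scoring vector that favors channel 0

fused, weights = adaptive_scale_fusion([fine_scale, coarse_scale], proj)
print(weights)
```

In this toy input the first scale has the stronger response on the channel that `proj` favors, so it receives the larger weight. A fixed linear weighting would assign the same coefficients regardless of the input.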
