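1. Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650000, Yunnan, China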
ZHANG Yinhui, HAI Weiqi, HE Zifen, et al. Weakly supervised video instance segmentation with scale adaptive generation regulation[J]. Optics and Precision Engineering, 2023, 31(18): 2736-2751. DOI: 10.37188/OPE.20233118.2736.
Video instance segmentation is a key technology for multi-target perception and scene understanding in assisted driving. Weakly supervised video instance segmentation trains the network using only bounding box annotations, which severely limits segmentation accuracy for targets with a large dynamic range of scales in traffic scenes. To address this issue, we propose a scale adaptive generation regulation weakly supervised video instance segmentation network (SAGRNet). First, a dynamic adaptive regulation module for multi-scale feature map contributions is designed to replace the original linear weighting. By dynamically adjusting how much the feature map at each scale contributes, the module strengthens the focus on both the local position and the overall contour of a target, which addresses the excessive scale range that arises when targets such as vehicles and pedestrians are imaged at different distances. Second, a multi-fine-grained spatial information aggregation generation regulation module is constructed for target instances. It aggregates multi-fine-grained spatial information extracted at different dilation rates to generate weight parameters, and these weights regulate the features at each scale. This module refines instance boundaries and strengthens the representation of mask feature maps through cross-channel information interaction, effectively compensating for the loss of continuity in edge contour masks caused by scarce instance edge information. Finally, to alleviate the weak supervision provided by bounding box labels, an orthogonal loss and a color similarity loss are introduced to reduce the deviation between the predicted mask and the ground-truth bounding box and to resolve the ambiguity in assigning labels to pairs of pixels. Experimental results on a traffic scene dataset extracted from YouTube-VIS 2019 show that SAGRNet improves the mean segmentation accuracy by 5.1% over the weakly supervised baseline, reaching 38.1%. These results suggest that the method provides an effective algorithmic basis for multi-target perception and instance-level scene understanding.
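The abstract does not define the orthogonal loss, so the following is only a minimal sketch under an assumption: that it behaves like the box-projection loss used in BoxInst, where the predicted mask and the ground-truth box are each projected onto the x- and y-axes and the projections are compared with a Dice loss. The function names (`project`, `dice_loss`, `box_to_mask`, `projection_loss`) and the inclusive pixel box format `(x0, y0, x1, y1)` are hypothetical choices for illustration, not taken from the paper.

```python
def project(mask):
    """Project a 2-D mask onto the x- and y-axes by taking the max along each axis."""
    proj_y = [max(row) for row in mask]        # one value per row
    proj_x = [max(col) for col in zip(*mask)]  # one value per column
    return proj_x, proj_y


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two 1-D projections; 0 means a perfect match."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def box_to_mask(box, height, width):
    """Rasterize a box (x0, y0, x1, y1), inclusive pixel coordinates (assumed format)."""
    x0, y0, x1, y1 = box
    return [
        [1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0 for x in range(width)]
        for y in range(height)
    ]


def projection_loss(pred_mask, box):
    """Sum of Dice losses between the mask projections and the box projections."""
    height, width = len(pred_mask), len(pred_mask[0])
    box_x, box_y = project(box_to_mask(box, height, width))
    pred_x, pred_y = project(pred_mask)
    return dice_loss(pred_x, box_x) + dice_loss(pred_y, box_y)


# A non-rectangular mask that still touches all four sides of the box.
tight_mask = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
empty_mask = [[0.0] * 4 for _ in range(4)]
box = (1, 1, 2, 2)

tight_loss = projection_loss(tight_mask, box)
empty_loss = projection_loss(empty_mask, box)
```

In this sketch any mask whose extent matches the box on both axes gets a loss near zero, regardless of its interior shape. That is why a box-only term of this kind needs a companion such as the color similarity loss to constrain which pixels inside the box belong to the instance.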
Keywords: assisted driving; weakly supervised; video instance segmentation; adaptive generation regulation; fine-grained