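1. Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, Tianjin University of Technology, Tianjin 300384
2. National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384
3. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033
4. 天津卓越信通科技有限公司 (a technology company in Tianjin), Tianjin 300384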
Received: 12 September 2022
Revised: 01 October 2022
Published: 25 April 2023
REN Fenglei, YANG Lu, ZHOU Haibo, et al. Real-time semantic segmentation based on improved BiSeNet[J]. Optics and Precision Engineering, 2023, 31(08): 1217-1227. DOI: 10.37188/OPE.20233108.1217.
To improve image semantic segmentation so that it meets both accuracy and real-time requirements, this paper proposes a real-time semantic segmentation algorithm based on an improved BiSeNet. First, the heads of the two branches are shared, which eliminates redundant channels and parameters in the BiSeNet structure while still extracting rich shallow features effectively. The shared layers then split into a detail branch and a semantic branch, which extract spatial detail information and semantic context information, respectively. In addition, channel and spatial attention mechanisms are introduced at the tail of the semantic branch to strengthen feature representation, so that the dual attention lets the network extract contextual semantic features more effectively. Finally, the features of the detail branch and the semantic branch are fused and up-sampled to the resolution of the input image to produce the segmentation result. The proposed algorithm achieves 77.2% mIoU at 95.3 FPS on the Cityscapes dataset and 73.8% mIoU at 179.1 FPS on the CamVid dataset. The experiments show that the algorithm strikes a good balance between real-time performance and accuracy, and that its segmentation performance improves significantly over BiSeNet and other existing algorithms.
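The two figures reported in the abstract, mIoU and FPS, can be made concrete with a small sketch. The code below is not the authors' evaluation code; it assumes the standard definition of mean intersection-over-union (per-class TP / (TP + FP + FN), averaged over the classes that occur) and assumes an ignore label of 255, a common convention for Cityscapes-style annotations. The function names `confusion_matrix`, `mean_iou`, and `latency_ms` are hypothetical, and the labels are toy data.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions.

    ignore_index=255 is an assumed convention, not taken from the paper.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def latency_ms(fps: float) -> float:
    """Convert a frames-per-second figure to per-frame latency in milliseconds."""
    return 1000.0 / fps


# Toy flattened label maps with three classes; the last pixel is unlabeled.
target = [0, 0, 1, 1, 2, 2, 255]
pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)
print(f"toy mIoU: {miou:.3f}")  # prints "toy mIoU: 0.500"

# Per-frame time budgets implied by the reported frame rates.
print(f"Cityscapes: {latency_ms(95.3):.2f} ms per frame")
print(f"CamVid: {latency_ms(179.1):.2f} ms per frame")
```

Read this way, 95.3 FPS corresponds to roughly 10.5 ms per frame and 179.1 FPS to roughly 5.6 ms per frame, which is the time budget any accuracy gain has to fit within.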