Improved DeepLabv3+ semantic segmentation incorporating attention mechanisms

YAN He; LEI Qiuxia; WANG Xu

doi: 10.37188/OPE.20253301.0123 CSTR: 32169.14.OPE.20253301.0123

Information Science | Updated: 2025-03-03

- Chinese title: 融合注意力机制的改进型DeepLabv3+语义分割
- Improved DeepLabv3+ semantic segmentation incorporating attention mechanisms
- Optics and Precision Engineering, 2025, Vol. 33, No. 1, pp. 123-134
- Author affiliation:
  
  Liangjiang Artificial Intelligence College, Chongqing University of Technology, Chongqing 401135
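- About the authors:
  
  YAN He (1972-), male, born in Mian County, Shaanxi; Ph.D., professor, and master's supervisor. His research covers multiscale geometric analysis of images, object tracking, and pattern recognition. E-mail: yanhe@cqut.edu.cn
  LEI Qiuxia (1999-), female, born in Guang'an, Sichuan; received her bachelor's degree from Xihua University in 2022. Her main research interest is localization and mapping combined with semantic segmentation. E-mail: 3486412760@qq.com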
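- Funding:
  
  National Key Research and Development Program of China, "Intelligent Robots" Key Special Project (2018YFB1308602); National Natural Science Foundation of China, General Program (61173184); Natural Science Foundation of Chongqing (cstc2018jcy-jAX0694)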
- DOI: 10.37188/OPE.20253301.0123 CSTR: 32169.14.OPE.20253301.0123
  CLC number: TP391; TP181
- Received: 2024-06-28
  
  Revised: 2024-08-14
  
  Published in print: 2025-01-10

YAN He, LEI Qiuxia, WANG Xu. Improved DeepLabv3+ semantic segmentation incorporating attention mechanisms[J]. Optics and Precision Engineering, 2025, 33(01): 123-134. DOI: 10.37188/OPE.20253301.0123 CSTR: 32169.14.OPE.20253301.0123.

Abstract

The DeepLabv3+ semantic segmentation network has high computational complexity, extracts image detail weakly, and produces blurred segmentation boundaries. To address these problems, this study proposes an improved DeepLabv3+ network that incorporates attention mechanisms. The lightweight MobileNetV2 serves as the backbone, which significantly reduces the number of model parameters while retaining strong representational capacity. A lightweight, parameter-free attention mechanism, the Simple, Parameter-Free Attention Module (SimAM), is added after the low-level features of the backbone to weight the input features and strengthen the extraction of key features. The global average pooling in the ASPP module is replaced with Haar Wavelet Downsampling (HWD) to avoid losing spatial information, and an external attention mechanism (External Attention, EANet) is added after the ASPP module to make better use of contextual information and achieve multi-scale fusion, which improves semantic understanding and segmentation accuracy. Experimental results show that, compared with the original DeepLabv3+ model, the proposed model improves the mean Intersection over Union (mIoU) on the VOC2012 dataset by 2.82%. The improved model significantly raises semantic segmentation accuracy and offers a new approach for applications in computer vision.
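
Two of the operations named in the abstract can be made concrete with a small sketch. The plain-Python code below is an illustration under stated assumptions, not the authors' implementation: it applies the closed-form weighting commonly given for SimAM to a single feature channel, and it computes a one-level 2D Haar transform of the kind that Haar wavelet downsampling builds on. The function names, the nested-list representation of a channel, and the regularisation value `lam=1e-4` are assumptions made for illustration; any learned layers that follow the transform, and the way the paper wires these pieces into ASPP, are not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simam_channel(channel, lam=1e-4):
    """Reweight one H x W channel (a list of rows) with SimAM-style attention.

    lam is the regularisation constant in the energy term; 1e-4 is an
    assumed default, not a value taken from this paper.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of other neurons in the channel (needs >= 1)
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / n
    weighted = []
    for row in channel:
        out_row = []
        for v in row:
            # Inverse energy grows as a neuron departs from the channel mean.
            inv_energy = (v - mean) ** 2 / (4.0 * (variance + lam)) + 0.5
            out_row.append(v * sigmoid(inv_energy))
        weighted.append(out_row)
    return weighted


def haar_downsample(channel):
    """One-level 2D Haar transform of an H x W channel (H and W even).

    Returns four (H/2) x (W/2) sub-bands: the low-pass band plus
    column-difference, row-difference and diagonal detail bands.
    """
    low, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, len(channel), 2):
        top, bottom = channel[i], channel[i + 1]
        rows = ([], [], [], [])
        for j in range(0, len(top), 2):
            a, b, c, d = top[j], top[j + 1], bottom[j], bottom[j + 1]
            rows[0].append((a + b + c + d) / 2.0)  # local average (scaled)
            rows[1].append((a - b + c - d) / 2.0)  # left minus right column
            rows[2].append((a + b - c - d) / 2.0)  # top minus bottom row
            rows[3].append((a - b - c + d) / 2.0)  # diagonal difference
        for band, row in zip((low, col_diff, row_diff, diag), rows):
            band.append(row)
    return low, col_diff, row_diff, diag


def global_average_pool(channel):
    """Collapse a channel to one number, for comparison with the Haar bands."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 4 x 4 feature channel with one outlying activation.
    feature = [[1.0, 1.0, 2.0, 2.0],
               [1.0, 9.0, 2.0, 2.0],
               [3.0, 3.0, 4.0, 4.0],
               [3.0, 3.0, 4.0, 4.0]]
    print("SimAM-weighted:", simam_channel(feature))
    print("Global average:", global_average_pool(feature))
    for name, band in zip(("low", "col_diff", "row_diff", "diag"),
                          haar_downsample(feature)):
        print(name, band)
```

Global average pooling reduces the channel to a single number, whereas the four half-resolution Haar sub-bands together allow every input value to be reconstructed exactly. That is the sense in which the replacement described in the abstract avoids discarding spatial information. The SimAM step introduces no learnable weights: each activation is scaled by a sigmoid of how far it lies from the channel mean relative to the channel variance.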
