Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer

ZHANG Yunzuo; GUO Wei; WU Cunyu

doi:10.37188/OPE.20233111.1700

Information Science | Updated: 2023-07-08

- Optics and Precision Engineering, 2023, Vol. 31, No. 11, pp. 1700-1709
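- Affiliations:
  
  1. School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China
  2. Hebei Key Laboratory of Electromagnetic Environment Effects and Information Processing, Shijiazhuang 050043, Hebei, China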
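- Author biographies:
  
  ZHANG Yunzuo (1984-), male, born in Shijiazhuang, Hebei; Ph.D., associate professor, and doctoral supervisor. He received his Ph.D. from Beijing Institute of Technology in 2016. His research interests include computer vision, artificial intelligence, and big data. E-mail: zhangyunzuo888@sina.com
  GUO Wei (1998-), female, born in Shijiazhuang, Hebei; master's student. She received her bachelor's degree from Baoding University in 2020. Her research interests include image processing and object detection. E-mail: gw19981225@126.com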
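- Funding:
  
  National Natural Science Foundation of China (61702347; 62027801); Natural Science Foundation of Hebei Province (F202210007; F2017210161); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022100; QN2017132); Central Government Guided Local Science and Technology Development Fund Project (226Z0501G)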
- CLC number: TP751
- Received: 2022-09-15; revised: 2022-10-12; published in print: 2023-06-10

ZHANG Yunzuo, GUO Wei, WU Cunyu. Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer[J]. Optics and Precision Engineering, 2023, 31(11): 1700-1709. DOI: 10.37188/OPE.20233111.1700.

Abstract

The efficient extraction of buildings from remote sensing images plays an important role in urban planning, disaster rescue, and military reconnaissance. Deep-learning-based building extraction methods have made significant progress in accuracy; the sparse token transformer network (STTNet), in particular, achieves very high accuracy. However, these methods usually rely on complex convolution operations and very large network models, so their extraction speed is low and they struggle to meet practical needs. This study therefore designs a method for the fast extraction of buildings from remote sensing images. First, multi-scale convolution is introduced into the feature extraction network of STTNet so that multi-scale features are extracted within a single convolution layer, further improving the model's feature extraction capability. Second, the structure of the spatial sparse feature extractor is improved: channel attention is applied to the feature map that already carries spatial attention weights, so that the channel attention weights are learned effectively. This resolves the fluctuation of channel attention weights that occurs when they are learned from the feature map output by the backbone network. Finally, to reduce the number of model parameters and speed up computation, the STTNet structure is changed from parallel to serial. Experiments on the INRIA building dataset show that, while maintaining accuracy and intersection over union (IoU), the proposed method is 18.3% faster than STTNet and clearly outperforms current mainstream methods.
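The abstract gives no layer-level details, so the following is only a toy sketch under stated assumptions. It contrasts a parallel arrangement, where spatial and channel attention weights are both learned from the raw backbone feature map, with a serial arrangement, where channel attention weights are learned from a map that already carries spatial attention weights. The spatial weighting (sigmoid of the channel-wise mean) and the SE-style channel attention (global average pooling followed by two fully connected layers) are assumed stand-ins chosen for illustration; STTNet's actual branches operate on sparse tokens with Transformer layers, and this is not the authors' implementation. All function names and the weight matrices `w1` and `w2` are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spatial_weights(fmap):
    """Assumed spatial attention: sigmoid of the channel-wise mean at each pixel.

    fmap is a C x H x W nested list; returns an H x W map of weights in (0, 1).
    """
    c = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]


def channel_weights(fmap, w1, w2):
    """Assumed SE-style channel attention: global average pool -> FC -> ReLU -> FC -> sigmoid.

    w1 is R x C (reduction layer), w2 is C x R (expansion layer); both hypothetical.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    hidden = [max(0.0, sum(a * p for a, p in zip(row, pooled))) for row in w1]
    return [sigmoid(sum(a * z for a, z in zip(row, hidden))) for row in w2]


def scale_pixels(fmap, s):
    """Multiply every channel element-wise by the H x W spatial weight map s."""
    return [[[v * s[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in fmap]


def scale_channels(fmap, a):
    """Multiply channel k by the scalar weight a[k]."""
    return [[[v * a[k] for v in row] for row in ch] for k, ch in enumerate(fmap)]


def parallel_branch(x, w1, w2):
    # Both weight sets are computed from the raw backbone output x.
    s, a = spatial_weights(x), channel_weights(x, w1, w2)
    return scale_channels(scale_pixels(x, s), a), a


def serial_branch(x, w1, w2):
    # Channel weights are computed from the map that already carries spatial weights.
    xs = scale_pixels(x, spatial_weights(x))
    a = channel_weights(xs, w1, w2)
    return scale_channels(xs, a), a


if __name__ == "__main__":
    rng = random.Random(0)
    C, H, W, R = 4, 3, 3, 2  # channels, height, width, hidden (reduced) size
    x = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(R)]
    w2 = [[rng.gauss(0, 1) for _ in range(R)] for _ in range(C)]
    _, a_par = parallel_branch(x, w1, w2)
    _, a_ser = serial_branch(x, w1, w2)
    print("parallel channel weights:", [round(v, 3) for v in a_par])
    print("serial channel weights:  ", [round(v, 3) for v in a_ser])
```

In the serial form, the channel statistics are pooled from a feature map that has already been reweighted by the spatial map, so the channel weights depend on where the spatial attention focuses. In the parallel form they are pooled from the unweighted backbone output. Chaining the two steps also means a single path is evaluated, which is in line with the abstract's motivation of reducing parameters and computation, although the actual savings in the paper's model cannot be inferred from this sketch.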
