ZHANG Yunzuo, GUO Wei, WU Cunyu. Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer[J]. Optics and Precision Engineering, 2023, 31(11): 1700-1709. DOI: 10.37188/OPE.20233111.1700.
Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer
The efficient extraction of buildings from remote sensing images plays an important role in urban planning, disaster rescue, and military reconnaissance. Deep-learning-based building extraction methods have made significant progress in accuracy; in particular, the sparse token transformer network (STTNet) achieves very high accuracy. However, these methods typically rely on complex convolution operations within very large network models, which slows extraction and makes it difficult to meet practical needs. This study therefore designs a method for the fast extraction of buildings from remote sensing images. First, multi-scale convolution is introduced into the feature extraction network of STTNet so that features at multiple scales are extracted within the same convolution layer, further improving the model's feature extraction capability. Second, channel attention is applied to the feature maps output by the backbone network so that the channel attention weights are learned effectively, addressing the problem of unstable (floating) channel attention weights when they are learned from the backbone's output feature maps. Finally, to reduce the number of model parameters and accelerate the model, the STTNet structure is changed from parallel to serial. Experiments on the INRIA building dataset show that the proposed method outperforms current mainstream methods in terms of accuracy and the intersection over union (IoU) metric, while running 18.3% faster than STTNet.
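The description above does not detail how the channel attention module is built. As a hedged illustration only, the sketch below assumes a generic squeeze-and-excitation (SE) style block: global average pooling per channel, two fully connected layers (ReLU, then sigmoid), and per-channel rescaling of the backbone feature map. This is a well-known, swapped-in technique, not necessarily the authors' module; the function name `channel_attention`, the nested-list C x H x W layout, the reduction ratio r, and the weight values are hypothetical choices made to keep the sketch dependency-free.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # C x H x W as nested lists (hypothetical layout)
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(
    fmap: FeatureMap, w_reduce: Matrix, w_expand: Matrix
) -> Tuple[FeatureMap, List[float]]:
    """Generic SE-style channel attention (an assumption, not the paper's exact module).

    fmap:     C x H x W feature map, e.g. the backbone output.
    w_reduce: (C // r) x C weights of the first fully connected layer.
    w_expand: C x (C // r) weights of the second fully connected layer.
    Returns the reweighted feature map and one weight in (0, 1) per channel.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    descriptor = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one attention weight per channel.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, descriptor))) for row in w_reduce]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every value of channel c by weights[c].
    reweighted = [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(fmap, weights)
    ]
    return reweighted, weights


# Usage: two 2x2 channels, reduction ratio r = 2 (one hidden unit); weights are arbitrary.
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
w_reduce = [[1.0, 1.0]]     # 1 x 2
w_expand = [[2.0], [-2.0]]  # 2 x 1
out, weights = channel_attention(fmap, w_reduce, w_expand)
# weights ≈ [0.881, 0.119]
```

In a trained network the two weight matrices are learned. Gating with a sigmoid rather than a softmax lets each channel be scaled independently instead of forcing the channels to compete for a fixed total weight.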
XU SH J, OUYANG P Y, GUO X Y, et al. Building segmentation in remote sensing image based on multiscale-feature fusion dilated convolution resnet[J]. Optics and Precision Engineering, 2020, 28(7): 1588-1599. (in Chinese). DOI: 10.37188/OPE.20202807.1588.
WANG S Y, MU X D, YANG D F, et al. High-order statistics integration method for automatic building extraction of remote sensing images[J]. Optics and Precision Engineering, 2019, 27(11): 2474-2483. (in Chinese). DOI: 10.3788/ope.20192711.2474.
ZHANG Z X, WANG Y H. JointNet: a common neural network for road and building extraction[J]. Remote Sensing, 2019, 11(6): 696. DOI: 10.3390/rs11060696.
PAN X R, YANG F, GAO L R, et al. Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms[J]. Remote Sensing, 2019, 11(8): 917. DOI: 10.3390/rs11080917.
LUC P, COUPRIE C, CHINTALA S, et al. Semantic segmentation using adversarial networks[EB/OL]. 2016: arXiv: 1611.08408. [2022-08-25]. https://arxiv.org/abs/1611.08408.
ZHANG X Q, XIAO Z H, LI D Y, et al. Semantic segmentation of remote sensing images using multiscale decoding network[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(9): 1492-1496. DOI: 10.1109/lgrs.2019.2901592.
LIU P H, LIU X P, LIU M X, et al. Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network[J]. Remote Sensing, 2019, 11(7): 830-848. DOI: 10.3390/rs11070830.
HE N J, FANG L Y, PLAZA A. Hybrid first and second order attention Unet for building segmentation in remote sensing images[J]. Science China Information Sciences, 2020, 63(4): 1-12. DOI: 10.1007/s11432-019-2791-7.
ZHENG S X, LU J C, ZHAO H S, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA. IEEE, 2021: 6877-6886. DOI: 10.1109/cvpr46437.2021.00681.
ZHAO X, GUO J Y, ZHANG Y T, et al. Memory-augmented transformer for remote sensing image semantic segmentation[J]. Remote Sensing, 2021, 13(22): 4518. DOI: 10.3390/rs13224518.
XU Z Y, ZHANG W C, ZHANG T X, et al. Efficient transformer for remote sensing image segmentation[J]. Remote Sensing, 2021, 13(18): 3585. DOI: 10.3390/rs13183585.
YUAN W, XU W B. MSST-net: a multi-scale adaptive network for building extraction from remote sensing images based on swin transformer[J]. Remote Sensing, 2021, 13(23): 4743. DOI: 10.3390/rs13234743.
CHEN K Y, ZOU Z X, SHI Z W. Building extraction from remote sensing images with sparse token transformers[J]. Remote Sensing, 2021, 13(21): 4441-4462. DOI: 10.3390/rs13214441.
LI D, YAO A B, CHEN Q F. PSConv: squeezing feature pyramid into one compact poly-scale convolutional layer[M]. Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 615-632. DOI: 10.1007/978-3-030-58589-1_37.
MAGGIORI E, TARABALKA Y, CHARPIAT G, et al. Can semantic labeling methods generalize to any city? The Inria aerial image labeling benchmark[C]. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA. IEEE, 2017: 3226-3229. DOI: 10.1109/igarss.2017.8127684.
KHALEL A, EL-SABAN M. Automatic pixelwise object labeling for aerial imagery using stacked U-nets[EB/OL]. 2018: arXiv: 1803.04953. https://arxiv.org/abs/1803.04953.
LI X, YAO X J, FANG Y. Building-A-nets: robust building extraction from high-resolution remote sensing images with adversarial networks[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(10): 3680-3687. DOI: 10.1109/jstars.2018.2865187.
MA J J, WU L L, TANG X, et al. Building extraction of aerial images by a global and multi-scale encoder-decoder network[J]. Remote Sensing, 2020, 12(15): 2350. DOI: 10.3390/rs12152350.