To address the poor building-extraction performance caused by low discrimination between building targets and the background in remote sensing images, an encoder-decoder network that integrates high-order statistics is proposed to improve the accuracy of automatic building extraction. First, a deep encoder-decoder network is used to extract the low-order semantic features of building targets. Then, polynomial kernels are used to build a high-order description of the intermediate feature maps, which improves the ability to recognize ambiguous features. Finally, the low-order feature maps, cascaded with the high-order features, are sent to the end of the network to obtain the building segmentation results. Experiments on the Massachusetts Buildings dataset show that the proposed approach achieves a recall of 85.1%, a precision of 77.5%, and an F1-score of 80.9%. Compared with the baseline network, the proposed approach is 4% higher in F1-score. The proposed method improves the performance of encoder-decoder networks for automatic building extraction from remote sensing images, and can extract building targets with low discrimination more accurately; hence
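
The abstract does not give the exact form of the polynomial kernels, so the following is only a minimal sketch of the general idea under stated assumptions: an order-r term is approximated by the elementwise product of r linear projections of each pixel's feature vector (a per-pixel analogue of 1×1 convolutions), and "cascading" is taken to mean channel-wise concatenation. The function names, weights, and feature values are hypothetical and are not taken from the paper.

```python
def linear(x, w):
    """Project feature vector x (length C) with weight matrix w (D rows, C columns)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def polynomial_feature(x, weights):
    """Order-r term: elementwise product of r linear projections of x.

    Assumption: the polynomial kernel is factorised into rank-1 terms,
    so the full outer product of x with itself is never formed.
    """
    out = [1.0] * len(weights[0])
    for w in weights:
        out = [o * p for o, p in zip(out, linear(x, w))]
    return out


def high_order_concat(feature_map, weights_by_order):
    """Concatenate each pixel's low-order vector with its high-order terms.

    feature_map: list of per-pixel feature vectors (flattened H*W x C).
    weights_by_order: one list of projection matrices per polynomial order.
    """
    result = []
    for x in feature_map:
        fused = list(x)  # keep the low-order features
        for weights in weights_by_order:
            fused += polynomial_feature(x, weights)
        result.append(fused)
    return result


# Toy example: two pixels with C = 2 channels and one second-order term.
identity = [[1.0, 0.0], [0.0, 1.0]]
mixing = [[1.0, 1.0], [1.0, -1.0]]
pixels = [[1.0, 2.0], [0.5, 0.0]]
fused = high_order_concat(pixels, [[identity, mixing]])
print(fused)  # [[1.0, 2.0, 3.0, -2.0], [0.5, 0.0, 0.25, 0.0]]
```

In a trained network the projection matrices would be learned, and the fused features would be passed on to the final segmentation layers rather than printed.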