To address the difficulty that pedestrian detection methods based on infrared and visible modal fusion have in adapting to changes in the external environment, a pedestrian detection network based on weight learning for fusing multimodal information was developed. First, unlike the fusion used in several recent studies, in which the two modalities are stacked directly, the weight learning fusion network reflects the different contributions that each modality makes to the pedestrian detection task under different environmental conditions. The differences between the two modalities were learned through dual-stream interaction learning. Next, based on the current characteristics of each modal feature, the weight learning fusion network autonomously assigned a corresponding weight to each modal feature and generated the fusion feature by weighted fusion. Finally, a new feature pyramid was built from the fusion feature, and the prior information about pedestrians was improved by changing the size and density of the prior boxes to complete the pedestrian detection task. The experimental results indicated that the log-average miss rate on the KAIST multispectral pedestrian detection dataset reached 26.96%, which was 2.77% and 27.84% lower than that of the direct stacking method and the baseline method, respectively. The adaptive weighted fusion of infrared and visible modal information can effectively obtain complementary modal information, adapt to external environmental changes, and significantly improve pedestrian detection performance.
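The two computable ideas in this abstract, weighted modal fusion and the log-average miss rate, can be illustrated with a minimal Python sketch under stated assumptions. `direct_stacking` shows the channel concatenation the abstract contrasts against. `weighted_fusion` pools each modality's feature map, scores each modality from the joint pooled descriptor with hypothetical linear weights `w_ir` and `w_vis`, normalizes the two scores with a softmax, and forms the weighted sum. Scalar per-modality weights, the pooling, and the linear scorer are illustrative assumptions, not the network's actual architecture. `log_average_miss_rate` follows the metric of Dollár et al.: miss rate sampled at nine FPPI values evenly spaced in log space over [10^-2, 10^0] and averaged in log space (a geometric mean). Using a miss rate of 1.0 where the curve has no point at or below a reference FPPI is an assumption.

```python
import math
from typing import List, Sequence, Tuple

# A feature map stored as [channels][flattened spatial positions].
FeatureMap = List[List[float]]


def direct_stacking(f_ir: FeatureMap, f_vis: FeatureMap) -> FeatureMap:
    """Baseline: concatenate the two modalities along the channel axis."""
    return f_ir + f_vis


def global_avg_pool(feat: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions (one value per channel)."""
    return [sum(ch) / len(ch) for ch in feat]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_weights(f_ir: FeatureMap, f_vis: FeatureMap,
                     w_ir: Sequence[float], w_vis: Sequence[float]) -> Tuple[float, float]:
    # Assumption: the joint descriptor (pooled IR stats followed by pooled visible
    # stats) lets each modality's score depend on both streams, a simple stand-in
    # for the dual-stream interaction. w_ir / w_vis are hypothetical learned weights.
    desc = global_avg_pool(f_ir) + global_avg_pool(f_vis)
    s_ir = sum(a * b for a, b in zip(w_ir, desc))
    s_vis = sum(a * b for a, b in zip(w_vis, desc))
    a_ir, a_vis = softmax([s_ir, s_vis])  # weights are positive and sum to 1
    return a_ir, a_vis


def weighted_fusion(f_ir: FeatureMap, f_vis: FeatureMap,
                    w_ir: Sequence[float], w_vis: Sequence[float]) -> FeatureMap:
    """Fuse the two modalities as an element-wise weighted sum (same channel count)."""
    a_ir, a_vis = modality_weights(f_ir, f_vis, w_ir, w_vis)
    return [[a_ir * x + a_vis * y for x, y in zip(ch_ir, ch_vis)]
            for ch_ir, ch_vis in zip(f_ir, f_vis)]


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """fppi must be ascending; miss_rate[i] is the miss rate at fppi[i]."""
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]  # 1e-2 ... 1e0, 9 points
    logs = []
    for r in refs:
        # Miss rate at the largest FPPI not exceeding r. If the curve ends before r,
        # this is the last (minimum) miss rate; 1.0 if it starts after r (assumption).
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= r:
                mr = m
        logs.append(math.log(max(mr, 1e-10)))
    return math.exp(sum(logs) / len(logs))
```

For example, `weighted_fusion([[2.0, 4.0]], [[0.0, 0.0]], [1.0, 1.0], [1.0, 1.0])` gives both modalities equal weight and returns their average, `[[1.0, 2.0]]`. Unlike `direct_stacking`, the fused map keeps the original channel count, and the balance between modalities can shift with the input.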
Keywords