1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
CHENG Deqiang, ZHANG Huaqiang, KOU Qiqi, et al. Indoor self-supervised monocular depth estimation based on level feature fusion[J]. Optics and Precision Engineering, 2023, 31(20): 2993-3009. DOI: 10.37188/OPE.20233120.2993.
In complex indoor scenes dominated by low-texture and poorly lit regions, current self-supervised monocular depth estimation networks produce imprecise depth predictions, blurred object edges, and severe loss of detail. This paper proposes an indoor self-supervised monocular depth estimation network based on hierarchical feature fusion. First, a mapping-consistent image enhancement module preprocesses the indoor images, improving the visibility of poorly lit regions while preserving brightness consistency and enriching texture detail; this mitigates the spurious flat planes that otherwise degrade the model during training. Second, the depth network incorporates an attention-based cross-level feature adjustment module that fully fuses multi-level features within the encoder and between the encoder and decoder, strengthening the network's use of feature information and narrowing the semantic gap between predicted and true depth. Finally, a Gram-matrix similarity loss built on image style features serves as an additional self-supervised signal that further constrains the model and improves prediction accuracy. Trained and tested on the NYU Depth V2 and ScanNet indoor datasets, the model correctly predicts depth for 81.9% and 76.0% of pixels, respectively. Experiments show that, compared with existing mainstream indoor self-supervised monocular depth estimation networks, the proposed model better preserves object edges and details and effectively improves the accuracy of the predicted depth.
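The Gram-matrix similarity loss mentioned above follows the style-representation idea common in neural style transfer: the Gram matrix of a feature map encodes channel-wise correlations ("style") independently of spatial layout, so comparing Gram matrices penalizes style inconsistency even where per-pixel photometric losses are uninformative (e.g. on low-texture walls). A minimal NumPy sketch of the concept; the function names and normalization are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map.

    Channel-wise inner products capture the style statistics of the
    features, independent of where in the image they occur."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def gram_similarity_loss(feat_a, feat_b):
    # Mean absolute difference between the two Gram matrices, usable as
    # an extra style-consistency term in a self-supervised loss.
    return float(np.abs(gram_matrix(feat_a) - gram_matrix(feat_b)).mean())
```

In a training loop, `feat_a` and `feat_b` would be feature maps extracted from the synthesized (warped) view and the target view, respectively.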
Keywords: self-supervision; monocular depth estimation; image enhancement; hierarchical feature fusion; Gram matrix
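The reported percentages of correctly predicted depth pixels (81.9% and 76.0%) presumably refer to the standard depth-estimation threshold accuracy: the fraction of pixels whose ratio of predicted to ground-truth depth, max(d/d*, d*/d), falls below 1.25. A sketch under that assumption:

```python
import numpy as np

def threshold_accuracy(pred, gt, thr=1.25):
    """Fraction of pixels whose predicted depth is within a factor
    `thr` of the ground truth: max(pred/gt, gt/pred) < thr."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < thr).mean())
```

For example, predictions [1.0, 2.0] against ground truth [1.0, 3.0] give ratios [1.0, 1.5], so only half the pixels count as correct at the 1.25 threshold.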