Three-Dimensional Semantic Scene Reconstruction Network

LIN Jin-hua; WANG Yan-jie

doi:10.3788/OPE.20182605.1231


Information Science | Updated: 2020-08-13

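- Title: Three-dimensional semantic scene reconstruction network (三维语义场景复原网络)
- English title: Three-dimensional reconstruction of semantic scene based on RGB-D map
- Optics and Precision Engineering (光学精密工程), 2018, Vol. 26, No. 5, pp. 1231-1241
- Author affiliations:
  
  1. College of Applied Technology, Changchun University of Technology, Changchun, Jilin 130000
  2. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130031
- About the authors:
  
  LIN Jin-hua (b. 1980), female, from Changchun, Jilin; Ph.D., lecturer. She received her bachelor's and master's degrees from Xi'an Jiaotong University in 2004 and 2008, respectively, and her Ph.D. from the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, in 2017. Her research focuses on digital image processing and object recognition. E-mail: ljh3832@163.com
  WANG Yan-jie (b. 1963), male, from Changchun, Jilin; researcher and doctoral supervisor. He received his bachelor's degree from Jilin University of Technology in 1988 and his master's degree from the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, in 1998. His research focuses on digital image processing, information processing, and automatic target recognition. E-mail: wangyj@ciomp.ac.cn
- Funding:
  
  Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2014AA7031010B); Scientific Research Project of the Jilin Province "13th Five-Year Plan" (Jijiaozi [2016] No. 345)
- DOI: 10.3788/OPE.20182605.1231
  CLC number: TP391.41
- Received: 2017-10-10,
  
  Accepted: 2017-11-06,
  
  Published in print: 2018-05-25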

Jin-hua LIN, Yan-jie WANG. Three-dimensional reconstruction of semantic scene based on RGB-D map[J]. Optics and Precision Engineering, 2018, 26(5): 1231-1241. DOI: 10.3788/OPE.20182605.1231.

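Abstract (translated from the Chinese)

Inferring the 3D geometry of objects from incomplete visual information is an important capability for a machine vision system, and recognizing the semantics of the objects in a scene is the core of such a system. Traditional methods usually handle the two tasks separately. This paper tightly couples scene reconstruction with object semantics and proposes a 3D semantic scene reconstruction network that takes only a single depth map as input and performs both semantic classification and scene reconstruction of the 3D scene. First, an end-to-end 3D convolutional neural network is built; its input is a depth map, and a 3D context module learns over the region inside the camera view frustum to output 3D voxels with semantic labels. Second, a synthetic 3D scene dataset with dense volumetric labels is built to train the proposed deep learning network. Finally, experiments show that, compared with existing semantic classification and scene reconstruction methods, the receptive region of semantic scene reconstruction increases by 2.0%. The results show that the 3D learning network reconstructs scenes well and labels semantics with high accuracy.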

Abstract

Reconstruction of 3D objects is an important part of a machine vision system, and the semantic understanding of 3D objects is a core function of such a system. In this paper, 3D restoration was combined with the semantic understanding of 3D objects, and a 3D semantic scene recovery network was proposed. Semantic classification and scene restoration of a 3D scene were achieved using only a single RGB-D map as input. First, an end-to-end 3D convolutional neural network was established. The input of the network was a depth map. A 3D context module was used to learn the region within the camera view, and 3D voxels with semantic labels were then generated. Second, a synthetic dataset with dense volumetric labels was established to train the deep learning network. Finally, the experimental results showed that the recovery performance was improved by 2.0% compared with the state of the art. The 3D learning network performs well in 3D scene restoration and achieves high accuracy in the semantic annotation of objects in the scene.
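
The abstract says the network consumes a single depth map and emits labelled voxels for the region seen by the camera, but it does not say how the depth map is turned into a volume. The following is a minimal sketch of one common way to do that step, assuming a pinhole camera and a regular voxel grid. The function name `depth_to_voxels`, the intrinsics `fx, fy, cx, cy`, the voxel size, and the grid layout are all hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def depth_to_voxels(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    voxel_size: float,
    grid_dims: Tuple[int, int, int],
    grid_origin: Tuple[float, float, float],
) -> Dict[Voxel, int]:
    """Back-project a depth map and count observed surface points per voxel.

    All parameters are illustrative assumptions, not values from the paper.
    """
    counts: Dict[Voxel, int] = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # treat non-positive depth as a missing measurement
                continue
            # Pinhole back-projection from pixel (u, v) to camera coordinates.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize the 3D point to an integer voxel index.
            idx = tuple(
                int((p - o) // voxel_size)
                for p, o in zip((x, y, z), grid_origin)
            )
            # Keep only points that fall inside the grid.
            if all(0 <= i < n for i, n in zip(idx, grid_dims)):
                counts[idx] = counts.get(idx, 0) + 1
    return counts


# Toy 2x2 depth map in metres; the zero entry is a missing pixel.
occupied = depth_to_voxels(
    depth=[[1.0, 1.0], [0.0, 2.0]],
    fx=2.0, fy=2.0, cx=0.5, cy=0.5,
    voxel_size=0.5,
    grid_dims=(4, 4, 6),
    grid_origin=(-1.0, -1.0, 0.0),
)
print(sorted(occupied))
```

A grid like this marks only the visible surface. Predicting occupancy and labels for the occluded voxels behind that surface is the part the abstract assigns to the 3D convolutional network.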


