Jin-hua LIN, Yan-jie WANG. Three-dimensional reconstruction of semantic scene based on RGB-D map[J]. Optics and Precision Engineering, 2018, 26(5): 1231-1241. DOI: 10.3788/OPE.20182605.1231.
Three-dimensional reconstruction of semantic scene based on RGB-D map
Reconstruction of 3D objects is an important part of a machine vision system, and semantic understanding of 3D objects is one of its core functions. In this paper, 3D restoration was combined with semantic understanding of 3D objects, and a 3D semantic scene recovery network was proposed. Semantic classification and scene restoration of a 3D scene were achieved using only a single RGB-D map as input. Firstly, an end-to-end 3D convolutional neural network was established. The input of the network was a depth map. A 3D context module was used to learn the region within the camera view, and 3D voxels with semantic labels were then generated. Secondly, a synthetic data set with dense volumetric labels was established to train the deep learning network. Finally, the experimental results showed that the recovery performance was improved by 2.0% compared with the state of the art. The 3D learning network performs well in 3D scene restoration and achieves high accuracy in semantic annotation of the objects in the scene.
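To illustrate the input side of such a pipeline, the following minimal sketch back-projects a single depth map into a binary voxel occupancy grid using NumPy. It is not the network proposed in the paper. The function name `depth_to_voxels`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the voxel size, the grid shape, and the grid origin are all assumptions chosen for illustration.

```python
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy,
                    voxel_size=0.5, grid_shape=(8, 8, 8),
                    origin=(-2.0, -2.0, 0.0)):
    """Back-project a depth map (in metres) into a boolean occupancy grid.

    Hypothetical helper: assumes a pinhole camera model and a grid whose
    corner sits at `origin` in camera coordinates.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]            # pixel row (v) and column (u) indices
    valid = depth > 0                    # treat zero depth as a missing reading
    z = depth[valid]
    x = (u[valid] - cx) * z / fx         # pinhole back-projection
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)

    # Convert metric coordinates to integer voxel indices.
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)

    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True    # mark every voxel hit by a point
    return grid

if __name__ == "__main__":
    depth = np.full((4, 4), 1.0)         # a flat surface 1 m from the camera
    grid = depth_to_voxels(depth, fx=4.0, fy=4.0, cx=2.0, cy=2.0)
    print(grid.shape, int(grid.sum()))
```

A grid of this kind holds only the visible surface. A semantic scene recovery network of the sort described above would additionally have to predict occupancy and a class label for the voxels that the camera cannot see.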