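1. Faculty of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
2. College of Science and Technology, Ningbo University, Ningbo, Zhejiang 315300, China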
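FENG Zheyu (2000-), male, born in Haining, Zhejiang, is a master's student. He received his bachelor's degree from Ningbo University in 2023 and is currently a graduate student at the Faculty of Information Science and Engineering, Ningbo University. His research focuses on light field image compression. E-mail: 15068368500@163.com
JIANG Gangyi (1964-), male, born in Shaoxing, Zhejiang, is a professor and doctoral supervisor. He received his Ph.D. from Ajou University, Korea, in 2000. His research interests include multimedia signal processing and communication, computational imaging, visual perception and coding, and image and video quality assessment. E-mail: jianggangyi@nbu.edu.cn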
Received: 2025-03-31
Revised: 2025-07-10
Published in print: 2025-09-25
FENG Zheyu, JIANG Zhidi, WAN Lifei, et al. Efficient Mamba-driven end-to-end light field image compression [J]. Optics and Precision Engineering, 2025, 33(18): 2980-2995. DOI: 10.37188/OPE.20253318.2980. CSTR: 32169.14.OPE.20253318.2980.
Light field images record both the spatial and angular information of light rays and therefore provide richer visual information than conventional 2D images. Their high dimensionality, however, leaves existing compression methods limited in global feature utilization, long-range correlation modeling, and computational complexity, which constrains both compression performance and efficiency. To address these issues, this paper proposes an efficient Mamba-driven end-to-end light field image compression method. First, 2D slices containing spatial and epipolar-plane information are extracted from the 4D light field image, and Mamba is used to capture their global context. Second, to scan the light field image in multiple directions without a large increase in computational complexity, a channel-efficient 2D selective scanning strategy is introduced to extract light field features accurately and efficiently. Finally, a residual reconstruction module is designed at the decoder; it markedly improves the quality of the reconstructed image while reducing the parameter count and the encoding and decoding time. Experimental results show that, compared with the representative method SADN, the proposed method achieves an average bitrate reduction of 7.4% and a PSNR gain of 0.37 dB on light field images with 7×7 angular resolution, together with better subjective visual quality. In encoding and decoding time, the proposed method is 10 to 20 times faster. Furthermore, compared with the recent method LFIC-DRASC, the proposed method achieves an average bitrate reduction of 19.5% and a PSNR gain of 0.58 dB on light field images with 13×13 angular resolution.
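The slice extraction mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the 4D light field is stored as an array of shape (U, V, H, W), with the two angular axes first and the two spatial axes last. The function name `extract_slices`, the axis order, and the random toy data are hypothetical choices made for illustration and are not taken from the paper, which may organize its slices differently.

```python
import numpy as np

def extract_slices(lf, u, v, h, w):
    """Return one spatial slice and two epipolar-plane slices of a 4D light field.

    Assumption (not from the paper): lf has shape (U, V, H, W), i.e.
    angular indices (u, v) first and spatial indices (h, w) last.
    """
    spatial = lf[u, v, :, :]  # sub-aperture image at view (u, v), shape (H, W)
    epi_h = lf[u, :, h, :]    # horizontal epipolar-plane image, shape (V, W)
    epi_v = lf[:, v, :, w]    # vertical epipolar-plane image, shape (U, H)
    return spatial, epi_h, epi_v

# Toy light field: 7x7 views of 32x48 pixels, filled with random values.
rng = np.random.default_rng(0)
lf = rng.random((7, 7, 32, 48))

spatial, epi_h, epi_v = extract_slices(lf, u=3, v=3, h=16, w=24)
print(spatial.shape, epi_h.shape, epi_v.shape)
```

Each epipolar-plane slice fixes one angular and one spatial coordinate, so a scene point traces a line across it whose slope depends on depth. This is one reason such slices are a natural input for a sequence model that is meant to capture long-range correlation across views.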
