Multi-stage frame alignment video super-resolution network

WANG Sen (王森); ZHU Yang (祝阳); ZHANG Yinhui (张印辉); WANG Qingjian (王庆健); HE Zifen (何自芬)

doi:10.37188/OPE.20233116.2430

Information Science | Updated: 2023-08-25

- Title (Chinese): 多阶段帧对齐的视频超分辨率重建网络
- Title (English): Multi-stage frame alignment video super-resolution network
- Optics and Precision Engineering, 2023, Vol. 31, No. 16, pp. 2430-2443
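- Author affiliation:
  
  Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, Yunnan, China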
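- About the authors:
  
  WANG Sen (1983-), male, born in Xinyang, Henan; Ph.D., associate professor, master's supervisor. He received his bachelor's degree from Zhengzhou University of Light Industry in 2007, and his master's and doctoral degrees from Kunming University of Science and Technology in 2014 and 2017, respectively. He is currently an associate professor at the Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology. His research covers machine vision, intelligent visual perception and measurement, and fault diagnosis. E-mail: wangsen0401@126.com
  ZHU Yang (1998-), male, born in Yingtan, Jiangxi; master's student. He received his bachelor's degree from Wenzhou University of Technology in 2021 and is currently a master's student at the Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology. His research focuses on image restoration algorithms in computer vision. E-mail: zhuyang1023@foxmail.com
  ZHANG Yinhui (1977-), male, born in Hengshui, Hebei; professor, doctoral supervisor. He received his bachelor's and master's degrees from Xi'an University of Technology in 2000 and 2005, respectively, and his doctoral degree from Kunming University of Science and Technology in 2010. He is currently a professor at the Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology. His research focuses on image segmentation algorithms in computer vision. E-mail: zhangyinhui@kust.edu.cn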
- Funding:
  
  National Natural Science Foundation of China (No. 52065035; No. 62061022; No. 62171206)
- DOI: 10.37188/OPE.20233116.2430
  CLC number: TP391
- Received: 2022-12-14
  
  Revised: 2023-01-13
  
  Published in print: 2023-08-25
WANG Sen, ZHU Yang, ZHANG Yinhui, et al. Multi-stage frame alignment video super-resolution network[J]. Optics and Precision Engineering, 2023, 31(16): 2430-2443. DOI: 10.37188/OPE.20233116.2430.

Abstract

Video super-resolution (VSR) aims to reconstruct a low-resolution video frame sequence into a high-resolution one. Compared with single-image super-resolution, VSR has access to the temporal dimension, so reconstructing the current frame usually relies on highly correlated information from neighboring frames. How to align adjacent frames and extract this highly correlated inter-frame information is therefore the key issue in VSR. In this paper, the VSR task is divided into three stages: deblurring, alignment, and reconstruction. In the deblurring stage, the current frame is pre-aligned with its adjacent frames to obtain feature information highly correlated with the current frame, and the details of the current frame are enhanced so that more feature information can be extracted at the initial stage. In the alignment stage, a secondary alignment is performed on the input features, and the highly correlated information in adjacent frames is used to further strengthen the features of the current frame. In the reconstruction stage, the original low-resolution frames are aggregated to provide more feature information at the end of the network. A multi-layer perceptron (MLP) is used in place of conventional convolution to build the feature extraction module, and the generated features are aligned a second time to refine the image features and obtain better reconstruction results. Experimental results show that, on several public datasets, the proposed algorithm achieves higher reconstruction accuracy for video frame sequences with fewer network parameters and more temporally coherent reconstructed sequences.
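The three-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the network from the paper: an exhaustive integer-shift search stands in for the learned alignment, pixel-wise averaging stands in for the MLP-based feature extraction, and nearest-neighbour upsampling stands in for the learned reconstruction head. All function names (`shift`, `sad`, `align`, `fuse`, `upsample`, `super_resolve`) and the `radius` and `scale` parameters are hypothetical and chosen only for illustration.

```python
"""Toy three-stage flow: pre-align -> re-align -> reconstruct (illustration only)."""


def shift(frame, dy, dx):
    """Circularly shift a 2-D list-of-lists frame down by dy and right by dx."""
    h, w = len(frame), len(frame[0])
    return [[frame[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def sad(a, b):
    """Sum of absolute differences between two frames of equal size."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def align(reference, neighbor, radius=2):
    """Shift neighbor by the integer offset that best matches reference.

    Assumption: a brute-force search replaces the learned alignment module.
    """
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    best = min(offsets, key=lambda o: sad(reference, shift(neighbor, *o)))
    return shift(neighbor, *best)


def fuse(frames):
    """Average frames pixel-wise (placeholder for learned feature fusion)."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]


def upsample(frame, scale):
    """Nearest-neighbour upsampling by an integer factor."""
    h, w = len(frame), len(frame[0])
    return [[frame[y // scale][x // scale] for x in range(w * scale)]
            for y in range(h * scale)]


def super_resolve(current, neighbors, scale=2):
    """Run the three toy stages on one low-resolution frame and its neighbors."""
    # Stage 1 (deblurring stage in the abstract): pre-align the neighbors to
    # the current frame and use them to enhance it.
    enhanced = fuse([current] + [align(current, n) for n in neighbors])
    # Stage 2 (alignment stage): align a second time, now against the
    # enhanced frame, and fuse again.
    refined = fuse([enhanced] + [align(enhanced, n) for n in neighbors])
    # Stage 3 (reconstruction stage): upsample and aggregate the original
    # low-resolution frame at the end of the pipeline.
    up_refined = upsample(refined, scale)
    up_raw = upsample(current, scale)
    return [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(up_refined, up_raw)]
```

As a usage example, calling `super_resolve(frame, [shift(frame, 0, 1), shift(frame, 1, 0)])` on a small frame with distinct pixel values undoes both shifts during alignment, so the neighbors contribute only information that agrees with the current frame. A real VSR network would replace each placeholder with a learned module and operate on feature maps rather than raw pixels.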
