EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

University of Science and Technology of China · Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
ICML 2024

Video Demos

Abstract

Event-based vision has drawn increasing attention due to its unique characteristics, such as high temporal resolution and high dynamic range. It has recently been used in video super-resolution (VSR) to enhance flow estimation and temporal alignment. Rather than using events for motion learning, we propose in this paper the first VSR method that utilizes event signals for texture enhancement. Our method, called EvTexture, leverages the high-frequency details of events to better recover texture regions in VSR. In EvTexture, a new texture enhancement branch is presented. We further introduce an iterative texture enhancement module to progressively explore the high-temporal-resolution event information for texture restoration. This allows for gradual refinement of texture regions across multiple iterations, leading to more accurate and rich high-resolution details. Experimental results show that our EvTexture achieves state-of-the-art performance on four datasets. On the texture-rich Vid4 dataset, our method achieves a gain of up to 4.67dB over recent event-based methods.

Motivation


Comparative results of VSR methods on the City clip of Vid4. Current VSR methods, with (EGVSR and EBVSR) or without (BasicVSR++) event signals, still suffer from blurry textures or jitter effects, resulting in large errors in texture regions. In contrast, our method successfully predicts the texture regions and greatly reduces errors in the restored frames.

Comparison of VSR Methods


(a) RGB-based methods usually focus on motion learning to recover missing details from other, unaligned frames. (b) Previous event-based methods use events to enhance motion learning. (c) In contrast, our method is the first to utilize events to enhance texture restoration in VSR. The red dotted line denotes an optional branch: our method can easily be adapted to approaches that use events to enhance motion learning.

Network Architecture


(a) Following BasicVSR, our EvTexture adopts a bidirectional recurrent network in which features are propagated forward and backward. At each timestamp, it includes a motion branch and a parallel texture branch that explicitly enhances the restoration of texture regions. (b) In the texture branch, the ITE module plays a key role: it progressively refines features across multiple iterations, leveraging high-frequency textural information from events along with context information from the current frame.
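
As a rough illustration of this design, the PyTorch-style sketch below mirrors the forward propagation pass with a motion branch and an iterative texture branch. All module names (MotionBranch, ITEModule, EvTextureSketch), the layer choices, and the iteration count are hypothetical and are not taken from the official implementation.

# Minimal PyTorch-style sketch of the bidirectional recurrent design described
# above. Module names and layers are illustrative, not the official EvTexture code.
import torch
import torch.nn as nn


class MotionBranch(nn.Module):
    """Fuses the current frame feature with the propagated hidden state."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, 1, 1), nn.ReLU(inplace=True)
        )

    def forward(self, feat: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # In the real model the hidden state would first be aligned (e.g. by
        # flow-based warping); here we simply concatenate for brevity.
        return self.fuse(torch.cat([feat, hidden], dim=1))


class ITEModule(nn.Module):
    """Iterative Texture Enhancement: refines a feature over several iterations
    using high-frequency event information plus current-frame context."""
    def __init__(self, channels: int, event_bins: int, iters: int = 3):
        super().__init__()
        self.iters = iters
        self.refine = nn.Sequential(
            nn.Conv2d(channels + event_bins, channels, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, 1, 1),
        )

    def forward(self, feat: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        for _ in range(self.iters):
            # Residual update conditioned on the event voxel grid.
            feat = feat + self.refine(torch.cat([feat, events], dim=1))
        return feat


class EvTextureSketch(nn.Module):
    """Recurrence with parallel motion and texture branches (forward pass only)."""
    def __init__(self, channels: int = 64, event_bins: int = 5, scale: int = 4):
        super().__init__()
        self.extract = nn.Conv2d(3, channels, 3, 1, 1)
        self.motion = MotionBranch(channels)
        self.texture = ITEModule(channels, event_bins)
        self.upsample = nn.Sequential(
            nn.Conv2d(2 * channels, 3 * scale ** 2, 3, 1, 1), nn.PixelShuffle(scale)
        )

    def forward(self, frames: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W); events: (B, T, event_bins, H, W) voxel grids.
        b, t, _, h, w = frames.shape
        hidden = frames.new_zeros(b, self.extract.out_channels, h, w)
        outs = []
        for i in range(t):  # forward direction only; the paper also propagates backward
            feat = self.extract(frames[:, i])
            motion_feat = self.motion(feat, hidden)
            texture_feat = self.texture(feat, events[:, i])
            hidden = motion_feat
            outs.append(self.upsample(torch.cat([motion_feat, texture_feat], dim=1)))
        return torch.stack(outs, dim=1)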

Quantitative Results


Quantitative comparison (PSNR↑/SSIM↑) on Vid4, REDS4 and Vimeo-90K-T for 4× VSR. All results are calculated on Y-channel except REDS4 (RGB-channel). The input types "I" and "I+E" represent RGB-based and event-based methods, respectively. Red and blue colors indicate the best and second-best performances, respectively.
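
As a side note on the evaluation protocol, Y-channel metrics are commonly computed by converting RGB to YCbCr (BT.601) and measuring PSNR on the luminance channel only. A minimal sketch, assuming 8-bit RGB inputs; the function names are illustrative and not taken from the paper's evaluation scripts.

# Minimal sketch of Y-channel PSNR, assuming 8-bit RGB images in [0, 255].
# The BT.601 conversion follows the common VSR evaluation convention.
import numpy as np


def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB image (0-255) to the Y channel of YCbCr."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.257 * r + 0.504 * g + 0.098 * b + 16.0


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


# Y-channel PSNR (used for Vid4 and Vimeo-90K-T); REDS4 uses RGB-channel PSNR.
# psnr_y = psnr(rgb_to_y(pred_rgb), rgb_to_y(gt_rgb))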

Qualitative Results

BibTeX

@inproceedings{kai2024evtexture,
  title={{E}v{T}exture: {E}vent-driven {T}exture {E}nhancement for {V}ideo {S}uper-{R}esolution},
  author={Kai, Dachun and Lu, Jiayao and Zhang, Yueyi and Sun, Xiaoyan},
  booktitle={Proceedings of the 41st International Conference on Machine Learning},
  pages={22817--22839},
  year={2024},
  volume={235},
  publisher={PMLR}
}