IEEE TPAMI 2026, ICML 2024

EvTexture++: Event-Driven Texture Enhancement for Video Super-Resolution

Journal extension of EvTexture, with the ICML 2024 conference paper and the IEEE TPAMI 2026 article collected on one project page.

Corresponding author
1 University of Science and Technology of China, 2 Midea Group

Video Demos

Abstract

Event-based vision has drawn increasing attention owing to its distinctive properties, including ultra-high temporal resolution and extreme dynamic range. Recent works have introduced it to video super-resolution (VSR) to enhance flow estimation and temporal alignment. In contrast, this paper shifts the focus of event signals from motion refinement to texture enhancement in VSR. We propose EvTexture++, the first event-driven framework dedicated to texture enhancement in VSR. It leverages high-frequency spatiotemporal details from events to improve texture recovery. EvTexture++ incorporates a customized texture enhancement branch, along with an iterative texture enhancement module that progressively exploits high-temporal-resolution event information for texture restoration. This enables gradual refinement of texture regions across iterations, yielding more accurate and detailed high-resolution outputs. Besides intra-frame texture recovery, large motions could degrade inter-frame temporal consistency, particularly in texture regions, leading to texture flickering. To mitigate this, we further exploit the continuous-time motion cues of events to enhance temporal consistency, introducing a temporal texture alignment module that estimates event-guided texture-aware flow for precise inter-frame texture alignment. Moreover, EvTexture++ is designed as a plug-and-play tool to flexibly boost the performance of existing VSR models. Experiments on five datasets demonstrate that EvTexture++ achieves state-of-the-art performance. When integrated into recent VSR models, it yields significant improvements, with gains of up to 1.55 dB in PSNR on the texture-rich Vid4 dataset.

Motivation

Visual comparison on a challenging texture-rich scene. While current VSR methods, whether frame-based (e.g., MIA-VSR and IART) or event-based (e.g., EGVSR), suffer from severe over-smoothing, our EvTexture++ successfully reconstructs coherent building stripes. This is further validated by the error maps, where our method exhibits significantly lower residuals by leveraging high-frequency event information.

VSR Method Comparison

Comparison of different VSR paradigms. (a) RGB-based methods primarily rely on motion alignment to aggregate temporal information. (b) Previous event-based methods leverage events mainly to assist motion learning. (c) In contrast, EvTexture++ pioneers the use of events for explicit texture restoration, while simultaneously utilizing them to refine motion alignment for better robustness.

Network Architecture

Network architecture of EvTexture++. (a) EvTexture++ adopts a bidirectional recurrent structure with parallel event-guided texture and motion branches for spatial texture restoration and temporal texture consistency, respectively. (b) The iterative texture enhancement (ITE) module progressively refines features with richer textural details via a shared ConvGRU, leveraging high-frequency spatiotemporal event signals and the current frame context.
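To make the idea of a shared recurrent unit concrete, the toy sketch below applies one GRU-style gated update repeatedly with the same weights. It is not the ITE module itself: the convolutions of a ConvGRU are replaced by scalar weights applied per feature element, and feeding one event slice per iteration is an assumption made for illustration. The names `gru_step`, `iterative_refine`, and the weight keys are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(hidden: List[float], inp: List[float],
             w: Dict[str, float]) -> List[float]:
    """One GRU-style update per element, using scalar weights.

    A ConvGRU would replace each scalar product with a convolution;
    scalars are used here only to keep the sketch dependency-free.
    """
    out = []
    for h, x in zip(hidden, inp):
        z = sigmoid(w["z_h"] * h + w["z_x"] * x)             # update gate
        r = sigmoid(w["r_h"] * h + w["r_x"] * x)             # reset gate
        cand = math.tanh(w["c_h"] * (r * h) + w["c_x"] * x)  # candidate state
        out.append((1.0 - z) * h + z * cand)                 # gated blend
    return out


def iterative_refine(context: List[float],
                     event_slices: List[List[float]],
                     w: Dict[str, float]) -> List[List[float]]:
    """Start from frame-context features and refine once per event slice.

    The same weight dictionary `w` is reused at every iteration, which is
    what "shared" means here. Returns the hidden state after each step.
    """
    hidden = list(context)
    states = []
    for event_slice in event_slices:
        hidden = gru_step(hidden, event_slice, w)
        states.append(hidden)
    return states


# Hypothetical weights and inputs, chosen only to exercise the code.
weights = {"z_h": 0.5, "z_x": 1.0, "r_h": 0.5, "r_x": 1.0,
           "c_h": 0.8, "c_x": 1.2}
states = iterative_refine([0.0, 0.5],
                          [[1.0, -1.0], [0.5, 0.0], [0.0, 0.2]],
                          weights)
```

Because each new state is a gated blend of the previous state and a `tanh` candidate, every iteration can only adjust the features incrementally, which matches the gradual refinement described in the caption.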

Event-guided Motion Branch

Event-guided motion branch in EvTexture++

EvTexture++ further integrates event signals into the motion branch and introduces a Temporal Texture Alignment (TTA) module, which consists of an RGB-based MEMC (motion estimation and motion compensation) and an event-based MEMC that jointly improve feature alignment. In the event-based MEMC, events are converted into voxel grids and processed by a U-Net to estimate fast, non-linear motion for alignment. The RGB-based MEMC estimates optical flow from images using SpyNet and aligns features accordingly.
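The voxel-grid conversion mentioned above can be sketched as follows. This is a minimal illustration rather than the project's actual preprocessing: it assumes each event is a `(timestamp, x, y, polarity)` tuple with polarity in {-1, +1}, and it splits each event's polarity between its two nearest temporal bins, a common convention that the page does not confirm. The function name `events_to_voxel_grid` is hypothetical.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp, x, y, polarity in {-1, +1}).
Event = Tuple[float, int, int, int]


def events_to_voxel_grid(events: List[Event], num_bins: int,
                         height: int, width: int) -> List[List[List[float]]]:
    """Accumulate events into a grid indexed as grid[bin][y][x]."""
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    t_start = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    span = t_end - t_start

    for t, x, y, p in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events outside the sensor area
        # Map the timestamp onto the continuous bin axis [0, num_bins - 1].
        pos = (t - t_start) / span * (num_bins - 1) if span > 0 else 0.0
        lower = int(pos)      # pos is non-negative, so this is a floor
        frac = pos - lower
        # Split the polarity linearly between the two neighbouring bins.
        grid[lower][y][x] += p * (1.0 - frac)
        if lower + 1 < num_bins:
            grid[lower + 1][y][x] += p * frac
    return grid


# Four synthetic events on a 2x2 sensor, binned into 3 temporal slices.
demo_events = [(0.0, 0, 0, 1), (0.25, 1, 1, 1), (0.5, 0, 1, 1), (1.0, 1, 0, -1)]
demo_grid = events_to_voxel_grid(demo_events, num_bins=3, height=2, width=2)
```

The resulting grid has a fixed shape regardless of how many events arrive, which is what allows a convolutional network such as a U-Net to consume it.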

Plug-in Framework

Overview of the EvTexture++ plug-in framework. During training, the frozen backbone extracts spatial features before propagation, temporal features after propagation, and bidirectional optical flow. The EvTexture++ plug-in refines propagated features conditioned on event information and the other extracted features. This flexible design can be integrated into various VSR models to consistently improve performance.

Quantitative Results

Quantitative comparison on Vid4, REDS4, and Vimeo-90K-T
Quantitative comparison at different upsampling scales
Quantitative comparison with the EvTexture++ plug-in

EvTexture++ achieves state-of-the-art performance on standard VSR benchmarks, extended scale settings, and plug-in evaluations. The plug-in variants consistently improve frozen CNN- and Transformer-based backbones, indicating that the gains come from event-guided texture cues rather than simply increasing parameter count.
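For readers less familiar with the metric, the sketch below shows how PSNR is computed from mean squared error and what a gain in dB implies. It assumes pixel values normalised to [0, 1] and images given as nested lists; the evaluation protocol used for the benchmarks above (for example, which colour channel is measured) is not specified here. The helper names `psnr` and `mse_ratio_for_gain` are hypothetical.

```python
import math
from typing import List


def psnr(reference: List[List[float]], estimate: List[List[float]],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    if not flat_ref or len(flat_ref) != len(flat_est):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by `gain_db`."""
    return 10.0 ** (-gain_db / 10.0)


# A 1.55 dB gain corresponds to an MSE about 0.70 times the original.
ratio = mse_ratio_for_gain(1.55)
```

Since PSNR is logarithmic in the error, a gain of 1.55 dB means the mean squared error drops to roughly 70% of its previous value.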

Qualitative Results

BibTeX

@article{kai2026evtexture++,
  title={{E}v{T}exture++: {E}vent-{D}riven {T}exture {E}nhancement for {V}ideo {S}uper-{R}esolution},
  author={Kai, Dachun and Lu, Jiayao and Zhang, Yueyi and Sun, Xiaoyan},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume={48},
  number={6},
  pages={6642--6659},
  year={2026},
  publisher={IEEE}
}

@inproceedings{kai2024evtexture,
  title={{E}v{T}exture: {E}vent-driven {T}exture {E}nhancement for {V}ideo {S}uper-{R}esolution},
  author={Kai, Dachun and Lu, Jiayao and Zhang, Yueyi and Sun, Xiaoyan},
  booktitle={Proceedings of the 41st International Conference on Machine Learning},
  pages={22817--22839},
  year={2024},
  volume={235},
  publisher={PMLR}
}