Abstract

In this work, we propose a unified representation, termed Factorized Features, for low-level vision tasks, which we evaluate on Single Image Super-Resolution (SISR) and Image Compression. These tasks share a common principle: both require recovering and preserving fine image details, whether by enhancing resolution in SISR or reconstructing compressed data in Image Compression. Unlike previous methods that mainly focus on network architecture, our approach uses a basis-coefficient decomposition together with an explicit formulation of frequencies to capture structural components and multi-scale visual features in images, addressing the core challenges of both tasks. To validate its broad generalizability, we replace the simple feature maps of prior models with Factorized Features. In addition, we further optimize the compression pipeline by leveraging the mergeable-basis property of our Factorized Features, which consolidates shared structures in multi-frame compression. Extensive experiments show that our unified representation delivers state-of-the-art performance, achieving an average relative improvement of 204.4% in PSNR over the baseline in Super-Resolution (SR) and a 9.35% BD-rate reduction in Image Compression compared to the previous SOTA.


TL;DR: A unified image representation for reconstructing fine details.

Comparison between decomposition-based, architecture-based, and our Factorized Features representation

A comparison illustrating decomposition-based methods, architecture-driven approaches, and our factorized representation.
(a) Prior decomposition-based methods use fixed Fourier- or wavelet-like bases to capture periodic structures, but often fail to preserve high-fidelity textures. (b) Architecture-oriented approaches enhance model depth or attention mechanisms, yet lack explicit frequency modeling and thus struggle with repetitive patterns. (c) Our method introduces learned Factorized Features with generalizable basis–coefficient decomposition, enabling accurate reconstruction of both global structure and local periodic patterns. Pixel-importance visualizations (middle row) illustrate that our approach focuses more effectively on structurally meaningful regions.

The full Factorized Features formulation

Diagram illustrating coordinate transformation and multi-frequency modulation in Factorized Features.
This figure visualizes our complete representation defined in Eq. (7) and (8): \[ \hat{I}(x) = P\!\left( \text{Concat}_{i=1}^{N} \text{Concat}_{j=1}^{K} \left[ c_{ij}(x)\,\odot\, \psi\!\left(\alpha_j \cdot b_i(\gamma_i(x))\right) \right] \right). \] The sawtooth coordinate transform \(\gamma_i(x)\) enforces explicit periodic sampling, causing each basis \(b_i\) to repeat at controlled spatial intervals. Multi-frequency scaling \(\alpha_j\) modulates every basis with both low- and high-frequency responses, while \(\psi\) provides a nonlinear periodic mapping that encourages sharper oscillations. By combining spatially varying coefficients \(c_{ij}(x)\) with frequency-modulated and transformed bases, the formulation captures fine textures, repetitive patterns, and multi-scale structures essential for Super-Resolution and Image Compression.
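Below is a minimal PyTorch sketch of this reconstruction, written to mirror the formula rather than the authors' actual implementation. The choices of \(\psi = \sin\), the projection \(P\) as a single 3×3 convolution, the sawtooth transform \(\gamma_i\) realized by periodic tiling, and all shapes and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FactorizedFeatures(nn.Module):
    """Minimal sketch of the reconstruction in Eq. (7)/(8).

    Assumptions (not taken from the paper's code): psi = sin, the projection P
    is a single 3x3 convolution, and the sawtooth transform gamma_i is realized
    by tiling each basis so it repeats at a fixed spatial interval. Shapes and
    hyperparameters are illustrative only.
    """

    def __init__(self, num_bases=4, num_freqs=3, feat_ch=16, out_ch=3):
        super().__init__()
        self.num_bases = num_bases          # N
        self.num_freqs = num_freqs          # K
        # Multi-frequency scales alpha_j (e.g., octaves 1, 2, 4, ...)
        self.register_buffer("alpha", 2.0 ** torch.arange(num_freqs).float())
        # Projection P mapping the concatenated products to the output image
        self.proj = nn.Conv2d(num_bases * num_freqs * feat_ch, out_ch, 3, padding=1)

    def sawtooth_tile(self, basis, out_h, out_w):
        """gamma_i(x): repeat the basis periodically to cover the output grid."""
        _, _, h, w = basis.shape
        reps_h = -(-out_h // h)             # ceiling division
        reps_w = -(-out_w // w)
        tiled = basis.repeat(1, 1, reps_h, reps_w)
        return tiled[:, :, :out_h, :out_w]

    def forward(self, coeffs, bases):
        """coeffs: (B, N, K, C, H, W) spatially varying c_ij(x).
        bases: list of N tensors, each (B, C, h_i, w_i) for basis b_i."""
        B, N, K, C, H, W = coeffs.shape
        products = []
        for i in range(N):
            b_i = self.sawtooth_tile(bases[i], H, W)          # b_i(gamma_i(x))
            for j in range(K):
                # psi(alpha_j * b_i(gamma_i(x))), with psi = sin as an example
                modulated = torch.sin(self.alpha[j] * b_i)
                products.append(coeffs[:, i, j] * modulated)  # c_ij(x) ⊙ (...)
        return self.proj(torch.cat(products, dim=1))          # P(Concat[...])
```

For example, `FactorizedFeatures()(torch.randn(1, 4, 3, 16, 64, 64), [torch.randn(1, 16, 16, 16) for _ in range(4)])` produces a 3-channel 64×64 output, with each basis repeating four times across the output grid.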

Super-Resolution and Image Compression with Factorized Features

Pipeline diagrams of Factorized Features applied to Super-Resolution and Image Compression.
(a) In Super-Resolution, the low-resolution input is processed by the Coefficient Backbone to extract \(X_{\text{coeff}}\). From this shared feature representation, convolution layers generate spatially varying coefficients, while the Basis Swin Transformer produces a multi-scale set of basis maps. The final output is reconstructed through the Factorized Features formulation, enabling explicit modeling of high- and low-frequency components.

(b) For Image Compression, the synthesis transform of a learned compression model is replaced with our SR module. This design leverages SR priors and the structured basis–coefficient representation to restore details more effectively after quantization, reducing distortion at comparable bitrates.
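The following sketch shows how the two pipelines fit together at a high level. All submodule names (`coeff_backbone`, `to_coeffs`, `basis_net`) are hypothetical stand-ins for the Coefficient Backbone, the convolutional coefficient head, and the Basis Swin Transformer shown in the figure; `FactorizedFeatures` refers to the reconstruction sketched above.

```python
import torch
import torch.nn as nn


class FactorizedSR(nn.Module):
    """Sketch of the Super-Resolution pipeline in panel (a); all submodules
    are placeholders and must be supplied by the caller."""

    def __init__(self, coeff_backbone, to_coeffs, basis_net, factorized_features):
        super().__init__()
        self.coeff_backbone = coeff_backbone   # Coefficient Backbone
        self.to_coeffs = to_coeffs             # conv head producing c_ij(x)
        self.basis_net = basis_net             # Basis Swin Transformer producing b_i
        self.ff = factorized_features          # Eq. (7)/(8) reconstruction

    def forward(self, lr_image):
        x_coeff = self.coeff_backbone(lr_image)   # shared feature X_coeff
        coeffs = self.to_coeffs(x_coeff)           # spatially varying coefficients
        bases = self.basis_net(x_coeff)            # multi-scale basis maps
        return self.ff(coeffs, bases)              # reconstructed image


# For Image Compression (panel b), the learned codec's synthesis transform
# would be swapped for a module of this form, so the quantized latent is
# decoded through the same basis-coefficient pathway; the exact interface to
# the entropy model is an assumption and depends on the codec used.
```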

Visual comparisons on super-resolution (4×)

Qualitative 4× super-resolution results comparing our method with prior approaches.


Rate-distortion (RD) curve evaluation on image compression across different datasets

Rate-distortion curves comparing our compression model with prior methods on multiple benchmark datasets.


Acknowledgements

This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-E-A49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.