FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution
Abstract
In this work, we propose a unified representation for Super-Resolution (SR) and Image Compression, termed Factorized Fields, motivated by the principles these two tasks share. Both single-image super-resolution (SISR) and image compression require recovering and preserving fine image details, whether by enhancing resolution or by reconstructing compressed data. Unlike previous methods that mainly focus on network architecture, our approach uses a basis-coefficient decomposition to explicitly capture multi-scale visual features and structural components in images, addressing the core challenges of both tasks. We first derive our SR model, which includes a Coefficient Backbone and a Basis Swin Transformer for generalizable Factorized Fields. Then, to further unify the two tasks, we use the strong information-recovery capabilities of the trained SR modules as priors in the compression pipeline, improving both compression efficiency and detail reconstruction. Additionally, we introduce a merged-basis compression branch that consolidates shared structures, further optimizing the compression process. Extensive experiments show that our unified representation delivers state-of-the-art performance, achieving an average relative improvement of 204.4% in PSNR over the baseline in SR and a 9.35% BD-rate reduction in image compression compared to the previous SOTA.
TL;DR: A new unified representation for both super-resolution and image compression.
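To make the phrase "basis-coefficient decomposition" concrete, the sketch below factorizes a synthetic image into a coefficient matrix and a basis matrix with a truncated SVD and reports the PSNR of the reconstruction. This is only a generic low-rank illustration, not the learned Factorized Fields representation of the paper: the function names `factorize` and `psnr`, the synthetic test image, and the choice of SVD are all assumptions made for this example.

```python
import numpy as np


def factorize(image, rank):
    """Truncated SVD: image (H, W) ~= coeffs (H, rank) @ basis (rank, W).

    Hypothetical helper for illustration; the paper learns its basis and
    coefficients with neural networks rather than computing an SVD.
    """
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    coeffs = u[:, :rank] * s[:rank]  # per-row weights for each basis vector
    basis = vt[:rank, :]             # shared structural patterns across rows
    return coeffs, basis


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given peak value."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


# Synthetic 64x64 "image": a smooth pattern plus mild noise, clipped to [0, 1].
rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64] / 64.0
image = 0.5 + 0.25 * np.sin(2 * np.pi * 3 * x) * np.cos(2 * np.pi * 2 * y)
image = np.clip(image + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)

for rank in (1, 4, 16):
    coeffs, basis = factorize(image, rank)
    approx = coeffs @ basis
    stored = coeffs.size + basis.size  # numbers kept instead of 64 * 64 pixels
    print(f"rank={rank:2d}  stored={stored:4d}  PSNR={psnr(image, approx):.2f} dB")
```

Raising the rank stores more numbers and lowers the reconstruction error, which is the same storage-versus-fidelity trade-off that both compression and super-resolution have to manage.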
The correlation between coordinate transformation and downsampling
The overall pipeline of image super-resolution with our Factorized Fields
The illustration of our joint image-compression and super-resolution framework compared with the traditional compression-only method
Visual comparisons on super-resolution (4×)
Performance (RD-Curve) evaluation on image compression using different datasets
Acknowledgements
This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-E-A49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.