Restoring Quality from Bitrate Collapse: A Two-Stage GAN for Enhancing Heavily Compressed Video
Keywords:
Video enhancement, compression artifact removal, GANs for video restoration, low-bitrate video, temporal consistency, perceptual quality metrics, deep learning for post-processing
Abstract
Low-bitrate video compression (e.g., H.264/AVC at ≤300 kbps) typically introduces visible artifacts such as blocking, blurring, and texture loss. This paper proposes a two-stage Generative Adversarial Network (GAN) architecture tailored to restoring visual quality in such degraded video sequences. The system incorporates motion alignment, residual blocks with attention mechanisms, and multi-frame temporal modeling to enhance spatial fidelity and temporal consistency. A novel training dataset is constructed by synthetically compressing high-quality video content to simulate real-world degradation. We analyze the architecture in detail, discuss training stability (including mode-collapse mitigation), and propose a combination of distortion and perceptual losses comprising L1, SSIM, LPIPS, and adversarial objectives. Quantitative evaluation on standard benchmarks shows that the proposed model performs on par with or better than earlier methods such as ESRGAN, EDVR, and CVEGAN, as well as traditional deblocking techniques. We further present visual comparisons, ablation studies, and training dynamics to validate each architectural component. The enhanced frames exhibit restored detail and consistent temporal structure across sequences. A key novelty lies in targeting extremely compressed content and demonstrating restoration capability under these constraints, making the approach suitable for scenarios such as cloud video storage and ultra-low-bandwidth transmission, where post-decompression enhancement is crucial.
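The abstract gives no implementation details, so the following is only a minimal sketch of the kind of synthetic-degradation pipeline it describes: pristine source clips are re-encoded with H.264/AVC at a capped bitrate so that (degraded, pristine) frame pairs can be extracted for training. The 300 kbps cap echoes the abstract's example; the directory layout, encoder settings, and helper name `degrade_clip` are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch: simulate heavy H.264/AVC compression with ffmpeg.
# Paths, the bitrate cap, and rate-control settings are assumptions, not
# values taken from the paper.
import subprocess
from pathlib import Path

def degrade_clip(src: Path, dst: Path, bitrate_kbps: int = 300) -> None:
    """Re-encode one clip with libx264 at a capped average bitrate."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", str(src),
            "-c:v", "libx264",
            "-b:v", f"{bitrate_kbps}k",          # target average bitrate
            "-maxrate", f"{bitrate_kbps}k",      # hard cap to force artifacts
            "-bufsize", f"{2 * bitrate_kbps}k",  # small VBV buffer
            "-an",                               # drop audio; only frames matter
            str(dst),
        ],
        check=True,
    )

if __name__ == "__main__":
    out_dir = Path("degraded")
    out_dir.mkdir(exist_ok=True)
    for clip in sorted(Path("pristine").glob("*.mp4")):
        degrade_clip(clip, out_dir / clip.name)
```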
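Likewise, the combined objective named in the abstract (L1, SSIM, LPIPS, plus an adversarial term) could be assembled roughly as below. The weights, the non-saturating adversarial formulation, and the use of the `lpips` and `pytorch-msssim` packages are assumptions for illustration, not the paper's reported setup.

```python
# Hypothetical sketch of the L1 + SSIM + LPIPS + adversarial generator loss.
# Weight values are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F
import lpips                      # pip install lpips
from pytorch_msssim import ssim   # pip install pytorch-msssim

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance in VGG feature space

def generator_loss(fake, real, disc_logits_fake,
                   w_l1=1.0, w_ssim=0.2, w_lpips=0.5, w_adv=0.01):
    """Distortion + perceptual + adversarial terms; frames assumed in [0, 1]."""
    l1 = F.l1_loss(fake, real)
    ssim_term = 1.0 - ssim(fake, real, data_range=1.0)         # SSIM is a similarity
    lpips_term = lpips_fn(fake * 2 - 1, real * 2 - 1).mean()   # LPIPS expects [-1, 1]
    # Non-saturating GAN term: push the discriminator's logits on
    # generated frames toward the "real" label.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return w_l1 * l1 + w_ssim * ssim_term + w_lpips * lpips_term + w_adv * adv
```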
References
M. Maksymiv and T. Rak, “Method of video quality-improving,” Artificial Intelligence, vol. 28, no. 3, pp. 47–62, 2023. https://doi.org/10.15407/jai2023.03.047.
Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, M. Manohara, “Toward a practical perceptual video quality metric,” Netflix Technology Blog, 2016. [Online]. Available: https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652.
A. Hore, D. Ziou, “Image quality metrics: PSNR vs. SSIM,” Proc. 20th Int. Conf. Pattern Recognit. (ICPR), 2010, pp. 2366–2369. https://doi.org/10.1109/ICPR.2010.579.
Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004. https://doi.org/10.1109/TIP.2003.819861.
K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017. https://doi.org/10.1109/TIP.2017.2662206.
A. Foi, V. Katkovnik, K. Egiazarian, “Pointwise shape-adaptive DCT for high-quality deblocking,” Proc. SPIE, vol. 6064, 2006. https://doi.org/10.1117/12.642839.
I. Goodfellow et al., “Generative Adversarial Nets,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2014. [Online]. Available: https://papers.nips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4681–4690. https://doi.org/10.1109/CVPR.2017.19.
X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,” Proc. ECCV Workshops, 2018. https://doi.org/10.1007/978-3-030-11021-5_5.
M. Chu et al., “Learning temporal coherence via self-supervision for GAN-based video generation (TecoGAN),” ACM Trans. Graph., vol. 39, no. 4, 2020. https://doi.org/10.1145/3386569.3392457.
M.S.M. Sajjadi et al., “Frame-recurrent video super-resolution,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6626–6634. https://doi.org/10.1109/CVPR.2018.00694.
X. Wang et al., “EDVR: Video restoration with enhanced deformable convolutional networks,” Proc. CVPRW, 2019. https://doi.org/10.1109/CVPRW.2019.00247.
R. Yang et al., “Multi-frame quality enhancement for compressed video,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6664–6673. https://doi.org/10.1109/CVPR.2018.00697.
R. Yang et al., “MFQE 2.0: A new benchmark and model for multi-frame quality enhancement on compressed video,” IEEE Trans. Image Process., vol. 29, pp. 6076–6090, 2020. https://doi.org/10.1109/TIP.2020.2982381.
D. Ma et al., “CVEGAN: A perceptually-inspired GAN for compressed video enhancement,” Signal Process. Image Commun., vol. 114, 2024, Art. no. 117127. https://doi.org/10.1016/j.image.2024.117127.
S.S. Andrei et al., “SUPERVEGAN: Super resolution video enhancement GAN for perceptually improving low bitrate streams,” IEEE Access, vol. 9, pp. 129456–129469, 2021. https://doi.org/10.1109/ACCESS.2021.3090344.
M. Maksymiv and T. Rak, “Multi-scale temporal GAN-based method for high-resolution and motion stable video enhancement,” Radio Electronics, Computer Science, Control, no. 3, pp. 86–95, 2025. https://doi.org/10.15588/1607-3274-2025-3-9.
A. Dosovitskiy et al., “FlowNet: Learning optical flow with convolutional networks,” Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015, pp. 2758–2766. https://doi.org/10.1109/ICCV.2015.316.
D. Pathak et al., “Context encoders: Feature learning by inpainting,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2536–2544. https://doi.org/10.1109/CVPR.2016.278.
P. Isola et al., “Image-to-image translation with conditional adversarial networks,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1125–1134. https://doi.org/10.1109/CVPR.2017.632.
T. Mahmood, “AV1 compression performance compared to H.264/HEVC,” IEEE Commun. Stand. Mag., vol. 3, no. 1, pp. 32–38, 2019. https://doi.org/10.1109/MCOMSTD.001.1800023.
Z. Wang et al., “Why is image quality assessment so difficult?,” Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2002, vol. 4, pp. 3313–3316. https://doi.org/10.1109/ICASSP.2002.5745084.
T. Karras et al., “Alias-free generative adversarial networks,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2021. [Online]. Available: https://arxiv.org/abs/2106.12423.
Y. Wu et al., “AnimeSR: Learning real-world super-resolution models for animation videos,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2022.
X. Wang et al., “Distilling the knowledge in a neural network for efficient GAN inference,” Proc. Eur. Conf. Comput. Vis. (ECCV), 2020. https://doi.org/10.1007/978-3-030-58545-7_36.