Restoring Quality from Bitrate Collapse: A Two-Stage GAN for Enhancing Heavily Compressed Video
Keywords:
Video enhancement, compression artifact removal, GANs for video restoration, low-bitrate video, temporal consistency, perceptual quality metrics, deep learning for post-processing
Abstract
Low-bitrate video compression (e.g., H.264/AVC at ≤300 kbps) typically introduces visible artifacts such as blocking, blurring, and texture loss. This paper proposes a two-stage Generative Adversarial Network (GAN) architecture tailored to restoring visual quality in such degraded video sequences. The system incorporates motion alignment, residual blocks with attention mechanisms, and multi-frame temporal modeling to enhance spatial fidelity and temporal consistency. A novel training dataset is constructed by synthetically compressing high-quality video content to simulate real-world degradation. We analyze the architecture in detail, discuss training stability (including mode-collapse mitigation), and propose a combination of distortion and perceptual losses comprising L1, SSIM, LPIPS, and adversarial objectives. Quantitative evaluation on standard benchmarks shows that the proposed model performs on par with or better than earlier methods such as ESRGAN, EDVR, and CVEGAN, as well as traditional deblocking techniques. We further present visual comparisons, ablation studies, and training dynamics to validate each architectural component. The enhanced frames exhibit restored detail and consistent temporal structure across sequences. A key novelty lies in targeting extremely compressed content and demonstrating restoration capability under these constraints, making the approach suitable for scenarios such as cloud video storage and ultra-low-bandwidth transmission, where post-decompression enhancement is crucial.
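The abstract gives no implementation details, so the following is only a minimal sketch of the kind of synthetic-degradation pipeline it describes: pristine source clips are re-encoded with H.264/AVC at a capped bitrate so that (degraded, pristine) frame pairs can be extracted for training. The 300 kbps cap echoes the abstract's example; the directory layout, encoder settings, and helper name `degrade_clip` are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch: simulate heavy H.264/AVC compression with ffmpeg.
# Paths, the bitrate cap, and rate-control settings are assumptions, not
# values taken from the paper.
import subprocess
from pathlib import Path

def degrade_clip(src: Path, dst: Path, bitrate_kbps: int = 300) -> None:
    """Re-encode one clip with libx264 at a capped average bitrate."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", str(src),
            "-c:v", "libx264",
            "-b:v", f"{bitrate_kbps}k",          # target average bitrate
            "-maxrate", f"{bitrate_kbps}k",      # hard cap to force artifacts
            "-bufsize", f"{2 * bitrate_kbps}k",  # small VBV buffer
            "-an",                               # drop audio; only frames matter
            str(dst),
        ],
        check=True,
    )

if __name__ == "__main__":
    out_dir = Path("degraded")
    out_dir.mkdir(exist_ok=True)
    for clip in sorted(Path("pristine").glob("*.mp4")):
        degrade_clip(clip, out_dir / clip.name)
```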
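Likewise, the combined objective named in the abstract (L1, SSIM, LPIPS, plus an adversarial term) could be assembled roughly as below. The weights, the non-saturating adversarial formulation, and the use of the `lpips` and `pytorch-msssim` packages are assumptions for illustration, not the paper's reported setup.

```python
# Hypothetical sketch of the L1 + SSIM + LPIPS + adversarial generator loss.
# Weight values are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F
import lpips                      # pip install lpips
from pytorch_msssim import ssim   # pip install pytorch-msssim

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance in VGG feature space

def generator_loss(fake, real, disc_logits_fake,
                   w_l1=1.0, w_ssim=0.2, w_lpips=0.5, w_adv=0.01):
    """Distortion + perceptual + adversarial terms; frames assumed in [0, 1]."""
    l1 = F.l1_loss(fake, real)
    ssim_term = 1.0 - ssim(fake, real, data_range=1.0)         # SSIM is a similarity
    lpips_term = lpips_fn(fake * 2 - 1, real * 2 - 1).mean()   # LPIPS expects [-1, 1]
    # Non-saturating GAN term: push the discriminator's logits on
    # generated frames toward the "real" label.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return w_l1 * l1 + w_ssim * ssim_term + w_lpips * lpips_term + w_adv * adv
```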
References
M. Maksymiv and T. Rak, “Method of video quality-improving,” Artificial Intelligence, vol. 28, no. 3, pp. 47–62, 2023. https://doi.org/10.15407/jai2023.03.047.
Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, M. Manohara, “Toward a practical perceptual video quality metric,” Netflix Technology Blog, 2016. [Online]. Available: https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652.
A. Hore, D. Ziou, “Image quality metrics: PSNR vs. SSIM,” Proc. 20th Int. Conf. Pattern Recognit. (ICPR), 2010, pp. 2366–2369. https://doi.org/10.1109/ICPR.2010.579.
Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004. https://doi.org/10.1109/TIP.2003.819861.
K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017. https://doi.org/10.1109/TIP.2017.2662206.
A. Foi, V. Katkovnik, K. Egiazarian, “Pointwise shape-adaptive DCT for high-quality deblocking,” Proc. SPIE, vol. 6064, 2006. https://doi.org/10.1117/12.642839.
I. Goodfellow et al., “Generative Adversarial Nets,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2014. [Online]. Available: https://papers.nips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4681–4690. https://doi.org/10.1109/CVPR.2017.19.
X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,” Proc. ECCV Workshops, 2018. https://doi.org/10.1007/978-3-030-11021-5_5.
M. Chu et al., “Learning temporal coherence via self-supervision for GAN-based video generation (TecoGAN),” ACM Trans. Graph., vol. 39, no. 4, 2020. https://doi.org/10.1145/3386569.3392457.
M.S.M. Sajjadi et al., “Frame-recurrent video super-resolution,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6626–6634. https://doi.org/10.1109/CVPR.2018.00694.
X. Wang et al., “EDVR: Video restoration with enhanced deformable convolutional networks,” Proc. CVPRW, 2019. https://doi.org/10.1109/CVPRW.2019.00247.
R. Yang et al., “Multi-frame quality enhancement for compressed video,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6664–6673. https://doi.org/10.1109/CVPR.2018.00697.
R. Yang et al., “MFQE 2.0: A new benchmark and model for multi-frame quality enhancement on compressed video,” IEEE Trans. Image Process., vol. 29, pp. 6076–6090, 2020. https://doi.org/10.1109/TIP.2020.2982381.
D. Ma et al., “CVEGAN: A perceptually-inspired GAN for compressed video enhancement,” Signal Process. Image Commun., vol. 114, 2024, Art. no. 117127. https://doi.org/10.1016/j.image.2024.117127.
S.S. Andrei et al., “SUPERVEGAN: Super resolution video enhancement GAN for perceptually improving low bitrate streams,” IEEE Access, vol. 9, pp. 129456–129469, 2021. https://doi.org/10.1109/ACCESS.2021.3090344.
M. Maksymiv and T. Rak, “Multi-scale temporal GAN-based method for high-resolution and motion stable video enhancement,” Radio Electronics, Computer Science, Control, no. 3, pp. 86–95, 2025. https://doi.org/10.15588/1607-3274-2025-3-9.
A. Dosovitskiy et al., “FlowNet: Learning optical flow with convolutional networks,” Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015, pp. 2758–2766. https://doi.org/10.1109/ICCV.2015.316.
D. Pathak et al., “Context encoders: Feature learning by inpainting,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2536–2544. https://doi.org/10.1109/CVPR.2016.278.
P. Isola et al., “Image-to-image translation with conditional adversarial networks,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1125–1134. https://doi.org/10.1109/CVPR.2017.632.
T. Mahmood, “AV1 compression performance compared to H.264/HEVC,” IEEE Commun. Stand. Mag., vol. 3, no. 1, pp. 32–38, 2019. https://doi.org/10.1109/MCOMSTD.001.1800023.
Z. Wang et al., “Why is image quality assessment so difficult?,” Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2002, vol. 4, pp. 3313–3316. https://doi.org/10.1109/ICASSP.2002.5745084.
T. Karras et al., “Alias-free generative adversarial networks,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2021. [Online]. Available: https://arxiv.org/abs/2106.12423.
Y. Wu et al., “AnimeSR: Learning real-world super-resolution models for animation videos,” Adv. Neural Inf. Process. Syst. (NeurIPS), 2022.
X. Wang et al., “Distilling the knowledge in a neural network for efficient GAN inference,” Proc. Eur. Conf. Comput. Vis. (ECCV), 2020. https://doi.org/10.1007/978-3-030-58545-7_36.