Development of an Effective Bootleg Videos Retrieval System as a Part of Content-Based Video Search Engine

Authors

  • Ahmad Sedky Adly
  • Islam Hegazy
  • Taha Elarif
  • M. S. Abdelwahab

DOI:

https://doi.org/10.47839/ijc.21.2.2590

Keywords:

Content-based video search engine, bootleg video detection, content-based video indexing and retrieval, bootleg videos, CBVSE, CBVIR, feature extraction, search engine, video search, video retrieval

Abstract

Many research studies on content-based video search engines are concerned with query-by-example retrieval, where an example video is submitted in order to retrieve a list of visually similar videos. Far less research, however, addresses indexing and searching public video streaming services such as YouTube, where copyrighted video material is frequently misused and bootleg (manipulated) videos should be detected before they are uploaded. In this paper, a novel and effective technique for a content-based video search engine with effective detection of bootleg videos is evaluated on a large-scale video index dataset of 1088 video records. A novel feature vector is introduced that combines temporal shot features with key-object/concept features; matching is performed with combinational-based algorithms and evaluated under various similarity metrics. The retrieval system was evaluated on more than 200 non-semantic video queries covering both normal and bootleg videos. For normal videos, retrieval precision was 97.9% and retrieval recall was 100%, combined by the F1 measure into 98.3%. For bootleg videos, retrieval precision was 99.2% and retrieval recall was 96.7%, combined by the F1 measure into 97.9%. These results suggest that the technique can enhance both traditional text-based search engines and commonly used bootleg detection techniques.
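
As a brief illustration of the evaluation measures reported above, the Python sketch below computes retrieval precision, recall, and the F1 measure for a single query from sets of retrieved and relevant video IDs. Only the standard precision/recall/F1 definitions are taken as given; the function name, the set-based representation, and the example IDs are illustrative assumptions, not the paper's implementation.

def retrieval_metrics(retrieved, relevant):
    # retrieved: video IDs returned by the search engine for one query
    # relevant:  video IDs that are true matches for that query
    # (both hypothetical; any hashable ID type works)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical query: 5 videos retrieved, 4 of which are true matches,
# and all 4 relevant videos were found.
p, r, f1 = retrieval_metrics({"v1", "v2", "v3", "v4", "v5"},
                             {"v1", "v2", "v3", "v4"})
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.800 recall=1.000 F1=0.889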

Published

2022-06-30

How to Cite

Adly, A. S., Hegazy, I., Elarif, T., & Abdelwahab, M. S. (2022). Development of an Effective Bootleg Videos Retrieval System as a Part of Content-Based Video Search Engine. International Journal of Computing, 21(2), 214-227. https://doi.org/10.47839/ijc.21.2.2590

Issue

Vol. 21 No. 2 (2022)

Section

Articles