Cluster Analysis of Information in Complex Networks

Authors

  • Oksana Kyrychenko
  • Serhii Ostapov
  • Ihor Malyk

DOI:

https://doi.org/10.47839/ijc.22.4.3360

Keywords:

complex networks, information system, random matrix, graph clustering, Monte Carlo method

Abstract

The research is devoted to the study of information in complex networks, namely: calculation of statistical characteristics and cluster analysis of data. Special software (crawler) was developed for direct data collection from the web space. In addition, a new structure of information technology has been developed for the collection, processing, and storage of large volumes of data collected from the web space. With the help of this structure, statistical characteristics of different segments of the web space (Ukrainian – edu.ua, Polish – edu.pl and Israeli – ac.il) are studied and their cluster structure is studied. The study of the cluster structure of web space zones was carried out using the spectral clustering algorithm of РІС (Rower iteration clustering). The results of the search for the optimal number of clusters using the "elbow" method and the k-Core decomposition method are presented, graphs illustrating the cluster structure of the investigated subnets are drawn. The paper also proposes a new approach to solving the problem of clustering and finding the optimal number of clusters when clustering objects are given by unstructured data (graphs) based on the spectral analysis of the stochastic matrix of the given graph. On this basis, a new method developed by the authors for determining the optimal number of clusters is proposed. Model examples are given and testing of the new method based on Monte Carlo simulation is performed. The optimal number of clusters was found by four methods: the "elbow" method, the k-core decomposition method, the silhouette method, and a new method developed by the authors. A conclusion is made concerning the accuracy of the developed new method, its advantages and disadvantages.

References

Yu. Holovatch, O. Olemskoi, C. von Ferber, T. Holovatch, O. Mryglod, I. Olemskoi, V. Palchykov, “Complex networks,” Journal of Physical Studies, vol. 10, no. 4, pp. 247–289, 2006. https://doi.org/10.30970/jps.10.247

M. E. J. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001. https://doi.org/10.1073/pnas.98.2.404.

M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, issue 2, pp. 167–256, 2003. https://doi.org/10.1137/S003614450342480.

S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, pp. 268-276, 2001. https://doi.org/10.1038/35065725.

M. E. J. Newman, “Models of the small world,” J. Stat. Phys, vol. 101, pp. 819–841, 2000. https://doi.org/10.1023/A:1026485807148.

A. Broder, R. Kumar, F. Maghoul et al. “Graph structure in the web,” Proceedings of the 9th World Wide Web Conference on Computer networks, 2000, vol. 33 (1), pp. 309-320.

A. Barrat, M. Weight, “On the properties of small-world networks models,” The European Physical Journal, vol. 13, pp. 547–560, 2000. https://doi.org/10.1007/s100510050067.

D. J. Watts, S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442, 1998. https://doi.org/10.1038/30918.

L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley, “Classes of small-world networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 21, pp. 11149–11152, 2000. https://doi.org/10.1073/pnas.200327197.

D. J. Watts, Small Worlds: The Dynamics of Networks Between Order and Randomness, Princeton University Press, 1999, 262 p. https://doi.org/10.1515/9780691188331.

M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, issue 2, pp. 167-256, 2003. https://doi.org/10.1137/S003614450342480

R. Baeza-Yates, C. Castillo, E. N. Efthimiadis, “Characterization of national web domains,” Journal ACM Transactions on Internet Technology, vol. 7, no. 2, pp. 9–33, 2007. https://doi.org/10.1145/1239971.1239973.

V. V. Pasichnyk, N. M. Ivanushchak, “Research and modelling of complex networks,” Eastern-European Journal of Enterprise Technologies, vol. 2, no. 3(44), pp. 43–48, 2010.

J. M. Kleinberg, “Navigation in a small world,” Nature, vol. 406, p. 845, 2000. https://doi.org/10.1038/35022643

M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Random graphs with arbitrary degree distribution and their applications,” Physical Review E 64, 026118, 2001. https://doi.org/10.1103/PhysRevE.64.026118.

N. Mishra, R. Schreiber, I. Stanton, and R. Tarjan, “Clustering social networks,” In Anthony Bonato and Fan R. K. Chung, editors, Algorithms and Models for the Web-Graph, Vol. 4863 of Lecture Notes in Computer Science, chapter 5, pp. 56–67. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

P. Domingos and M. Richardson, “Mining the network value of customers,” Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’01, New York, pp. 57–66, 2001. https://doi.org/10.1145/502512.502525.

S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, issues 3-5, pp. 75–174, 2010. https://doi.org/10.1016/j.physrep.2009.11.002

R. C. Qiu, P. Antonik, Smart Grid using Big Data Analytics: A Random Matrix Theory Approach, Wiley, 2017, 632 p.

T Tao, Topics in Random Matrix Theory, American Mathematical Society, 2023. 282 p.

F. Lin and W. W. Cohen, “Power iteration clustering,” Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, June 21-24, 2010, pp. 1-10.

U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, pp. 395–416, 2007. https://doi.org/10.1007/s11222-007-9033-z.

T. Xiang and S. Gong, “Spectral clustering with eigenvector selection,” Pattern Recognition, vol. 41, issue 3, pp. 1012–1029, 2008. https://doi.org/10.1016/j.patcog.2007.07.023

A. Y. Ng, M. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” Advances in Neural Information Processing Systems 14 (NIPS 2001), 2001, pp. 1-8.

T. C. Ramos, J. Mourão-Miranda and A. Fujita, “Spectral density-based clustering algorithms for complex networks,” Front. Neurosci., vol. 17, article 926321, 2023. https://doi.org/10.3389/fnins.2023.926321.

O. Kyrychenko, “Features of software architecture for collecting and analyzing statistical information on the global network,” Information Technology: Computer Science, Software Engineering and Cyber Security, no. 2, pp. 107–112, 2023. https://doi.org/10.32782/it/2023-2-13.

R. Xin, J. Gonzalez, M. Franklin, and I. Stoica “GraphX: A resilient distributed graph system on spark,” AMPLab, EECS, UC Berkeley 2013. https://doi.org/10.1145/2484425.2484427

S.-T. Cheng, Y.-C. Chen, and M.-S. Tsai, “Using k-Core decomposition to find cluster centers for k-Means algorithm in GraphX on Spark,” Proceedings of the 2017 Eighth International Conference on Cloud Computing, GRIDs, and Virtualization Cloud Computing’2017, 2017, pp. 93-98.

A. Kuraria, N. Jharbade, & M. Soni, Manish, “Centroid selection process using WCSS and elbow method for K-Mean clustering algorithm in data mining,” International Journal of Scientific Research in Science, Engineering and Technology, vol. 4, issue 11, pp. 190-195, 2018. https://doi.org/10.32628/IJSRSET21841122

Determining the number of clusters in a data set. [Online]. Available at: https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

O. Kyrychenko, “Information technology for statistical cluster analysis of information in complex networks,” Computer Systems and Information Technologies, vol. 4, pp. 47–51, 2022. https://doi.org/10.31891/csit-2022-4-7

O. L. Kyrychenko, S. Ostapov, I. Kanovsky, “Investigation of the certain internet domain statistical characteristics,” Eastern-European Journal of Enterprise Technologies, vol. 6, no. 12(66), pp. 91–96, 2014. https://doi.org/10.15587/1729-4061.2013.19698

O. L. Kyrychenko, I. V. Malyk, & S. E. Ostapov, “Stochastic models in artificial intelligence development,” Bulletin of Taras Shevchenko National University of Kyiv. Physics and Mathematics, no. 2, pp. 53–57, 2021. https://doi.org/10.17721/1812-5409.2021/2.7

E. P. Wigner, “On the distribution of the roots of certain symmetric matrices,” The Annals of Mathematics, Second Series, vol. 67, no. 2, pp. 325-327, 1958. https://doi.org/10.2307/1970008

E. P. Wigner, “Characteristic vectors of bordered matrices with infinite dimensions,” The Annals of Mathematics, second series, vol. 62, no. 3, pp. 548-564, 1955. https://doi.org/10.2307/1970079

L. Lenssen, E. Schubert, Erich, “Clustering by direct optimization of the medoid silhouette,” Proceedings of the International Conference on Similarity Search and Applications (SISAP’2022), 2022, pp. 190–204. https://doi.org/10.1007/978-3-031-17849-8_15

F. Batool, C. Hennig, “Clustering with the average silhouette width,” Computational Statistics & Data Analysis, vol. 158, 2021. https://doi.org/10.1016/j.csda.2021.107190

Downloads

Published

2023-12-31

How to Cite

Kyrychenko, O., Ostapov, S., & Malyk, I. (2023). Cluster Analysis of Information in Complex Networks. International Journal of Computing, 22(4), 515-523. https://doi.org/10.47839/ijc.22.4.3360

Issue

Section

Articles