Classification of Diabetes using Multinomial Naive Bayes, Logistic Regression, and Multi-Layer Perceptron Algorithms

Emad Majeed Hameed; Hardik Joshi; Hayder Jasim Habil; Mudhar A. Al-Obaidi

doi:10.47839/ijc.24.4.4342

Keywords:

Diabetes, Prediction, Multinomial Naive Bayes, Logistic Regression, MLP

Abstract

Diabetes is a chronic condition in which a person's blood glucose rises to unacceptably high levels. It should be detected as early as possible so that the required treatment can be organized and the more severe diseases it may bring on can be avoided. This study classifies diabetes with several models in order to determine which one is most appropriate for the problem. Logistic Regression, Multinomial Naive Bayes, and Multi-Layer Perceptron are used as classification models and tested on the Pima Indians Diabetes dataset. Preprocessing consists of handling noisy data, scaling the data by normalization, balancing the classes with the SMOTE approach, and selecting features with the sequential backward selection (SBS) technique. With the dataset divided into 80% training data and 20% testing data, the classification performances of Logistic Regression, Multinomial Naive Bayes, and Multi-Layer Perceptron are 74.5%, 78%, and 62%, respectively. The study also specifically addresses the issues of underfitting and overfitting.
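As a rough illustration of two of the preprocessing steps named above, the following minimal Python sketch rescales features to [0, 1] and then balances the classes with SMOTE-style interpolation. It is not the authors' implementation. The choice of min-max scaling, the function names `min_max_scale` and `smote`, the neighbour count `k`, and the toy values are all assumptions made for the example; the abstract says only that normalization and SMOTE were used.

```python
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data (hypothetical values, not taken from the Pima dataset).
X = [[85.0, 22.0], [90.0, 25.0], [100.0, 30.0], [110.0, 28.0],
     [160.0, 35.0], [180.0, 40.0]]
y = [0, 0, 0, 0, 1, 1]

X_scaled = min_max_scale(X)
minority = [x for x, label in zip(X_scaled, y) if label == 1]
n_missing = y.count(0) - y.count(1)  # synthetic samples needed for balance
X_balanced = X_scaled + smote(minority, n_missing)
y_balanced = y + [1] * n_missing
```

Scaling to a non-negative range also suits Multinomial Naive Bayes, which expects non-negative feature values. The abstract does not say whether oversampling was applied before or after the 80/20 split; restricting it to the training portion is the usual way to keep synthetic points out of the test data.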

International Journal of Computing
