Enhancing Obesity Detection Through SMOTE-based Classification Models: A Comparative Study

John Kamwele Mutinda; Amos Langat; Regis Konan Marcel Djaha; Jackson Ndoto Munyao; Lee Whitaker; Millicent Auma Omondi

doi:10.18041/2665-427X/ijeph.1.11532

Authors

John Kamwele Mutinda University of Science and Technology of China, Langfang, Hebei, China, People’s Republic of China https://orcid.org/0009-0004-7919-2064
Amos Langat Department of Mathematics, Technology and Innovation-JKUAT, Pan African University Institute for Basic Sciences, Nairobi, Kenya https://orcid.org/0000-0002-7813-6835
Regis Konan Marcel Djaha Basque Center for Applied Mathematics, Bilbao, Basque, Spain https://orcid.org/0009-0004-5252-759X
Jackson Ndoto Munyao African Institute for Mathematical Sciences, Limbe, Cameroon https://orcid.org/0009-0004-6017-4691
Lee Whitaker African Institute for Mathematical Sciences, Limbe, Cameroon https://orcid.org/0009-0005-3408-637X
Millicent Auma Omondi South Eastern Kenya University, Kitui County, Kenya https://orcid.org/0009-0008-8422-7427

DOI:

https://doi.org/10.18041/2665-427X/ijeph.1.11532

Keywords:

Obesity, Synthetic Minority Over-sampling Technique, SMOTE, imbalanced data, healthcare decision-making

Abstract

Objective: To use the Synthetic Minority Over-sampling Technique (SMOTE) to improve class balance and to compare the performance of different classification methods before and after applying SMOTE.
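The core idea of SMOTE can be illustrated with a minimal sketch: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority-class neighbors. The function name `smote_sketch`, the parameters `k` and `seed`, and the sample points below are assumptions made for illustration; this is not the implementation used in the study.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical two-feature minority class, for illustration only.
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, it always lies inside the region already spanned by the minority class. In practice, oversampling of this kind is normally applied to the training data only, so that the test set keeps its original class distribution.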

Methods: The classification methods were Logistic Regression, Naive Bayes, KNN (k=5), and Deep Learning. Each model was trained and tested on the dataset before and after applying SMOTE. The evaluation metrics were accuracy, sensitivity, specificity, balanced accuracy, and F1 score.
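As a rough illustration of how the five evaluation metrics relate to one another, the sketch below computes them from binary confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study's data.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Return the five evaluation metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # recall on the positive class
    specificity = _ratio(tn, tn + fp)  # recall on the negative class
    precision = _ratio(tp, tp + fp)    # positive predictive value, used by F1
    return {
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical imbalanced test set: 10 positives, 90 negatives.
scores = binary_metrics(tp=2, fn=8, tn=88, fp=2)
```

With these hypothetical counts, accuracy is 0.90 while sensitivity is only 0.20. This is why balanced accuracy, the mean of sensitivity and specificity, is a more informative summary than plain accuracy on imbalanced data.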

Results: Without SMOTE, logistic regression and Naive Bayes suffered from low sensitivity and specificity, and KNN (k=5) showed poor specificity. With SMOTE, significant improvements were observed in all models. For logistic regression, accuracy decreased (-8.8%), but sensitivity and specificity increased substantially (+56.7%) and balanced accuracy improved (+16.6%). Naive Bayes saw a modest increase in accuracy (+2.3%) and improved sensitivity and specificity (+47.9%). The KNN classifier showed a transformative improvement, with gains in sensitivity, specificity (+96.0%), and balanced accuracy (+28.3%). Deep learning showed a significant increase in sensitivity (+69.8%) and balanced accuracy (+29.4%), and a notable improvement in accuracy and F1 score, despite a slight decrease in specificity (-10.9%).

Conclusions: SMOTE contributes to more accurate and reliable predictions. Although there may be slight trade-offs, the overall improvements in the metrics used confirm the usefulness of SMOTE for improving model performance on imbalanced datasets.


