Enhancing Obesity Detection Through SMOTE -based Classification Models: A comparative Study
DOI:
https://doi.org/10.18041/2665-427X/ijeph.1.11532Keywords:
Obesity, classification algorithms, synthetic minority over-sampling technique, smote, imbalanced data, healthcare decision-makingAbstract
Objective: To use SMOTE to enhance class balance and compare the performance of different classification methods before and after applying SMOTE.
Methods: The study used a dataset from Kaggle. Consisted of several health-related features linked to obesity prediction. Checking for class imbalance within the dataset affected initial model performance. SMOTE was applied to synthetically increase the representation of minority classes, reducing the class imbalance. It was conducted in two stages: 1. Training and testing the classification algorithms before applying SMOTE. 2. Training and testing the same models after applying SMOTE to enhance class balance. The performance of all models was evaluated based on metrics before and after the SMOTE application.
Results: Models Logistic Regression and Naive Bayes struggled with low sensitivity and specificity, and KNN (k=5) showed poor specificity. Significant improvements were observed across all models after applying SMOTE. Logistic Regression, despite a decrease in accuracy(-8.8), sensitivity and specificity increased substantially(+56.7%), with balanced accuracy improving(+16.6%). Naive Bayes saw a modest accuracy increase(+2.3%), with sensitivity and specificity improving(+47.9%). The KNN classifier exhibited a transformative enhancement with sensitivity and specificity increasing(+96.0%) and balanced accuracy(+28.3%). Deep Learning showed a significant increase in sensitivity (+69.8%), balanced accuracy (+29.4%), and an improvement in precision and F1-score despite a slight decrease in specificity(-10.9%).
Conclusion: The results demonstrate that while there might be slight trade-offs, the overall improvements in key metrics such as sensitivity, specificity, balanced accuracy, precision, and F1-score affirm the utility of SMOTE in enhancing model performance for imbalanced datasets
Downloads
References
Peter G Kopelman. Obesity as a medical problem. Nature. 2000; 404(6778): 635-643.
Ng M, Fleming T, Robinson M, Thomson B, Graetz N, Margono C, et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980-2013: a systematic analysis for the global burden of disease study 2013. Lancet. 2014; 384(9945): 766-781.
Omer T. The causes of obesity: an in-depth review. Adv Obes Weight Manag Control. 2020; 10(4): 90-94.
Aljanabi M, Qutqut MH, Hijjawi M. Machine learning classification techniques for heart disease prediction: a review. Internat J Engineer Technol. 2018; 7(4): 5373-5379.
Al-Hashem MA, Alqudah AM, Qananwah Q. Performance evaluation of different machine learning classification algorithms for disease diagnosis. Internat J E-Health Med Communicat. 2021; 12(6):1-28.
An Q, Rahman S, Zhou J, Kang JJ. A comprehensive review on machine learning in healthcare industry: Classification, restrictions, opportunities and challenges. Sensors. 2023; 23(9): 4178.
Safaei M, Sundararajan EA, Driss M, Boulila W, Shapi´´. A systematic literature review on obesity: Understanding the causes & consequences of obesity and reviewing various machine learning approaches used to predict obesity. Computers Biol Med. 2021; 136: 104754.
Ferdowsy F, Alam RKS, Jabiullah I, Habib T. A machine learning approach for obesity risk prediction. Current Res Behavioral Sci. 2021; 2: 100053.
Astuti TS, Sidik AD, Kuswanto H, Lawi A, Nasir S. Predicting obesity in adults using machine learning techniques: an analysis of Indonesian basic health research 2018. Frontiers Nutrition. 2021; 8: 669155.
Curbelo MCA, Fergus P, Hussain A, Al-Jumeily D, Abdulaimma B, Hind J, Radi N. Machine learning approaches for the prediction of obesity using publicly available genetic profiles. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2743-2750. IEEE, 2017.
Zheng Z, Ruggiero K. Using machine learning to predict obesity in high school students. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2132-2138. IEEE, 2017.
Cheng X, Lin S-Y, Liu J, Liu S, Zhang J, Nie P, et al. Does physical activity predict obesity-a machine learning and statistical method-based analysis. Internat J Environm Res Public Health. 2021; 18(8): 3966.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. Smote: synthetic minority over-sampling technique. J Artificial Intelligence Research. 2002; 16: 321-357.
Kosolwattana T, Liu C, Hu R, Han S, Chen H, Lin Y. A self-inspected adaptive smote algorithm (sasmote) for highly imbalanced data classification in healthcare. BioData Mining. 2023; 16(1): 15.
Fernández A, Garcia S, Herrera F, Chawla NV. Smote for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artificial Intelligence Res. 2018; 61: 863-905.
Blagus R, Lara L. Smote for high-dimensional class-imbalanced data. BMC bioinformatics. 2013; 14: 1-16.
Han H, Wang W-Y, Mao B-H. Borderline-smote: a new over-sampling method in imbalanced data sets learning. International conference on intelligent computing, pages 878-887. Springer; 2005.
Sreejith S, Khanna NH, Kannan A. Clinical data classification using an enhanced smote and chaotic evolutionary feature selection. Computers Biology Medicine. 2020; 126: 103991.
Elreedy D, Atiya AF. A comprehensive analysis of synthetic minority oversampling technique (smote) for handling class imbalance. Information Sciences. 2019; 505: 32-64.
Ismail E, Gad W, Hashem M. A hybrid stacking-smote model for optimizing the prediction of autistic genes. BMC bioinformatics. 2023; 24(1): 379.
Wang L. Imbalanced credit risk prediction based on smote and multi-kernel FCM improved by particle swarm optimization. Applied Soft Computing. 2022; 114: 108153.
Yee CPC, Yang Y, Giin LB. Enhancing financial fraud detection through addressing class imbalance using hybrid smote-gan techniques. Internat J Financial Studies. 2023; 11(3): 110.
Li H, Liu H, Hu Y. Prediction of unbalanced financial risk based on gra-topsis and smote-cnn. Scientific Programming. 2022; 2022(1): 8074516.
Özdemir A, Polat K, Alhudhaif A. Classification of imbalanced hyperspectral im- ages using smote-based deep learning methods. Expert Systems Applications. 2021; 178: 114986.
Chamseddine E, Mansouri N, Soui M, Abed M. Handling class imbalance in covid-19 chest x-ray images classification: Using smote and weighted loss. Applied Soft Computing. 2022; 129: 109588.
Sami JA. Heart disease prediction system using (smote technique) balanced dataset and decision tree classifier. AIP Conference Proceedings, volume 2834. AIP Publishing; 2023.
Prasad PS, Sreedevi M. An improved prediction of kidney disease using smote. Indian J Sci Technol. 2016; 9(31): 1-7.
Sowjanya AM, Mrudula O. Effective treatment of imbalanced datasets in health care using modified smote coupled with stacked deep learning algorithms. Applied Nanoscience. 2023; 13(3): 1829-1840.
Nasteski V. An overview of the supervised machine learning methods. Horizons b. 2017; 4: 51-62.
Peterson LE. K-nearest neighbor. Scholarpedia. 2009; 4(2): 1883.
Cunningham P, Delany SJ. k-nearest neighbour classifiers-a tutorial. ACM computing surveys. 2021; 54(6): 1-25.
Rish I. An empirical study of the naive bayes classifier. IJCAI 2001 workshop on empirical methods in artificial intelligence; 2001.
Goodfellow I, Bengio Y, Courville A. Deep learning. MIT press; 2016.
Stevens E, Antiga L, Viehmann T. Deep learning with PyTorch. Manning Publications; 2020.
Kofi NI, Nyarko-Boateng O, Aning J, et al. Performance of machine learning algorithms with different k values in k-fold crossvalidation. Internat J Information Technol Computer Sci. 2021; 13(6): 61-71.
Tamilarasi P, Rani RU. Diagnosis of crime rate against women using k-fold cross validation through machine learning. 2020 fourth international conference on computing methodologies and communication (ICCMC). IEEE; 2020.
Misra P, Singh YA. Improving the classification accuracy using recursive feature elimination with cross-validation. Int J Emerg Technol. 2020; 11(3): 659-665.
Ram DR, Mukherjee I, Chakraborty C. Obesity disease risk prediction using machine learning. Internat J Data Sci Analytics. 2024. Doi: 0.1007/s41060-023-00491-9.
Ab MNL, Anuar S. Machine learning modelling for imbalanced dataset: Case study of adolescent obesity in malaysia. J Adv Res Applied Sci Engineer Technol. 2023; 36(1): 189-202.

Published
Issue
Section
License
Copyright (c) 2024 Interdisciplinary Journal of Epidemiology and Public Health

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
-
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
-
NonCommercial — You may not use the material for commercial purposes.
-
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.