Hybrid Gradient Boosting and SMOTE-ENN for Toddler Nutritional Status Classification on Imbalanced Data
(1) Universitas Katolik Widya Mandira
(2) Universitas Tunas Pembangunan Surakarta
(3) Universitas Sahid Surakarta
(4) Universitas Mercubaktijaya
(5) Universiti Muhammadiyah Malaysia
(*) Corresponding Author
Abstract
Stunting in toddlers remains a serious global health issue with long-term effects on children's physical and cognitive development. A major challenge in classifying nutritional status is class imbalance: normal cases far outnumber minority classes such as stunted and severely stunted. This study develops a hybrid approach that integrates the Gradient Boosting algorithm with SMOTE-ENN (Synthetic Minority Oversampling Technique–Edited Nearest Neighbors) to improve classification performance on imbalanced data. The dataset, obtained from Kaggle, contains 121,000 entries across four nutritional status categories. Preprocessing comprised label encoding, standardization of numerical features, and a stratified 80:20 train-test split. The model was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The proposed hybrid model increased recall for the stunted class from 61.80% to 98.41% and the F1-score from 71.93% to 83.58%. Overall accuracy improved from 92.39% to 93.35%, and ROC-AUC increased from 99.08% to 99.63%. These results show that integrating Gradient Boosting with SMOTE-ENN improves sensitivity to minority classes as well as overall multi-class classification performance.
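The abstract names SMOTE-ENN without describing its mechanics. The sketch below illustrates the general technique in pure Python; it is not the authors' code. SMOTE first grows each smaller class to the majority-class size by interpolating between a minority sample and one of its nearest same-class neighbours. ENN then removes every sample whose label disagrees with the majority of its nearest neighbours. The function names, the toy data, and the neighbour counts (5 for SMOTE, 3 for ENN, the usual defaults) are illustrative assumptions. The ENN step uses Wilson's majority-vote rule, which is looser than the default of imbalanced-learn's `EditedNearestNeighbours` (`kind_sel="all"`). The neighbour search is quadratic, so this suits illustration only, not a 121,000-row dataset. For real use, imbalanced-learn's `SMOTEENN` class is the usual choice, and resampling is typically applied only to the training portion of the split so that the test set keeps its original class distribution.

```python
import random
from collections import Counter


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _nearest(point, pool, k, skip=None):
    """Indices of the k rows of `pool` closest to `point`, ignoring index `skip`."""
    ranked = sorted((_sq_dist(point, row), i) for i, row in enumerate(pool) if i != skip)
    return [i for _, i in ranked[:k]]


def smote(X, y, k=5, rng=None):
    """SMOTE: grow every smaller class to the majority-class size by interpolating
    between a class member and one of its k nearest same-class neighbours."""
    rng = rng or random.Random(0)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for cls, n in counts.items():
        if n == target or n < 2:
            continue  # already majority-sized, or too few samples to interpolate
        members = [row for row, label in zip(X, y) if label == cls]
        for _ in range(target - n):
            i = rng.randrange(n)
            neighbours = _nearest(members[i], members, min(k, n - 1), skip=i)
            partner = members[rng.choice(neighbours)]
            gap = rng.random()  # random position along the segment, in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(members[i], partner)])
            y_out.append(cls)
    return X_out, y_out


def enn(X, y, k=3):
    """Wilson's Edited Nearest Neighbours: drop every sample whose label differs
    from the majority label among its k nearest neighbours."""
    kept = []
    for i, row in enumerate(X):
        votes = Counter(y[j] for j in _nearest(row, X, k, skip=i))
        if votes.most_common(1)[0][0] == y[i]:
            kept.append(i)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """Oversample with SMOTE first, then clean the enlarged set with ENN."""
    X_res, y_res = smote(X, y, k=k_smote, rng=random.Random(seed))
    return enn(X_res, y_res, k=k_enn)


if __name__ == "__main__":
    # Toy data: 20 majority points on a grid, 5 minority points on a line,
    # plus one mislabelled majority point sitting inside the minority cluster.
    X = [[float(i % 5), float(i // 5)] for i in range(20)]
    X += [[10.0 + i, 10.0] for i in range(5)] + [[12.0, 10.0]]
    y = [0] * 20 + [1] * 5 + [0]
    X_bal, y_bal = smote_enn(X, y)
    print(Counter(y_bal))
```

On this toy data, SMOTE brings the minority class from 5 to 21 samples to match the 21 majority samples. ENN then removes the single mislabelled point at (12, 10), leaving 20 class-0 and 21 class-1 samples. This shows the two roles of the hybrid: oversampling raises minority-class counts, and editing removes noisy points near the class boundary.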
DOI: http://dx.doi.org/10.61944/bids.v3i2.93
Copyright (c) 2024 Alfry Aristo Jansen Sinlae, Moh. Erkamim, Farid Fitriyadi, Lilik Suhery, Rachmat Destriana

This work is licensed under a Creative Commons Attribution 4.0 International License.
Bulletin of Informatics and Data Science
Asosiasi Peneliti Data Science Indonesia
Email: pdsi.bids@gmail.com
