Diabetes Classification using Gain Ratio Feature Selection in Support Vector Machine Method

Nabila Al Rasyid(1), Iis Afrianty(2*), Elvia Budianita(3), Siska Kurnia Gusti(4)

(1) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru
(2) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru
(3) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru
(4) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru
(*) Corresponding Author

Abstract


Diabetes is a major cause of many chronic conditions, such as visual impairment, stroke, and kidney failure. Early detection, especially in groups at high risk of developing diabetes, is needed to prevent problems with wide-reaching impact. Indonesia ranks seventh in the world by number of people with diabetes, with a reported prevalence of 10.7%. This research aims to determine which attributes in the diabetes dataset most affect classification and to apply the Support Vector Machine (SVM) method to diabetes classification. The Gain Ratio feature selection technique is used to determine these attributes. The dataset consists of 768 records with 8 attributes. Classification uses three SVM kernels (Linear, Polynomial, and RBF) and three training-to-testing split ratios (70:30, 80:20, and 90:10). Without feature selection, all 8 attributes were used, and the highest accuracy was 94.81%, obtained at the 80:20 ratio with the RBF kernel under two parameter settings: C = 100 with Gamma = 3, and C = 100 with Gamma = Scale. The feature selection thresholds tested were 0.02, 0.03, and 0.05. With feature selection, the highest accuracy was obtained using 6 attributes: 95.45% at a threshold of 0.02 with the 80:20 ratio, using the RBF kernel with C = 100 and Gamma = Scale. The results show that accuracy increased after applying feature selection.
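
The threshold-based Gain Ratio selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the attributes have already been discretized into bins (the abstract does not say how continuous attributes were handled), and the function names, column names, and toy data are hypothetical and invented for illustration only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature_values, labels):
    """Gain ratio of one discrete feature: information gain / split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value has zero split information; score it 0.
    return info_gain / split_info if split_info > 0 else 0.0


def select_features(columns, labels, threshold):
    """Return the names of features whose gain ratio is at least the threshold."""
    return [name for name, values in columns.items()
            if gain_ratio(values, labels) >= threshold]


# Toy data, invented for illustration (not taken from the 768-record dataset).
toy_columns = {
    "glucose_bin": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "noise_bin": ["a", "b", "a", "b", "b", "a", "a", "b"],
}
toy_labels = [1, 1, 0, 0, 1, 0, 1, 0]

# 0.02 is one of the thresholds named in the abstract.
selected = select_features(toy_columns, toy_labels, threshold=0.02)
print(selected)
```

In this toy case the binned feature that separates the classes is kept and the uninformative one is dropped. The retained columns would then be passed to the SVM classifier.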


Keywords


Data Mining; Diabetes; Feature Selection; Gain Ratio; Support Vector Machine


DOI: http://dx.doi.org/10.61944/bids.v4i1.114


Copyright (c) 2025 Nabila Al Rasyid, Iis Afrianty, Elvia Budianita, Siska Kurnia Gusti

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Bulletin of Informatics and Data Science
Asosiasi Peneliti Data Science Indonesia
Email: pdsi.bids@gmail.com