Enhancing Support Vector Machine Performance for Heart Attack Prediction using RobustScaler-Based Outlier Handling

M Munawir Lasiyono(1*), Nurhayati Nurhayati(2), Teotino Gomes Soares(3), Mulyadi Mulyadi(4)

(1) Politeknik Mitra Karya Mandiri, Brebes
(2) Universitas Muhammadiyah Tangerang, Tangerang
(3) Dili Institute of Technology, Dili
(4) Universitas Nurdin Hamzah, Jambi
(*) Corresponding Author

Abstract


Cardiovascular disease remains the leading cause of death worldwide, with most of these deaths attributed to heart attacks and strokes. Early detection is crucial, yet conventional diagnostic methods are often constrained by time, cost, and the uneven distribution of clinical expertise. Machine learning-based approaches therefore offer a promising way to support heart attack prediction efficiently. This study evaluates how RobustScaler, a preprocessing technique intended to limit the influence of the outliers common in medical datasets, affects the performance of a Support Vector Machine (SVM) in heart attack classification. The model was developed using a dataset of 303 patient records, consisting of eight numerical features and one binary target label. Experiments were conducted under two preprocessing scenarios: without scaling (baseline) and with RobustScaler. Model performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC. Applying RobustScaler substantially improved model performance: accuracy increased from 64.77% to 85.23%, a gain of 20.46 percentage points, and ROC-AUC rose from 73.65% to 93.36%, a gain of 19.71 percentage points (reported as a 26.78% relative increase in discriminatory ability). Recall for the negative class also improved markedly, from 26.47% to 99.02%, meaning the model became far better at correctly identifying non-heart attack cases (that is, its specificity increased). These findings demonstrate that proper preprocessing, particularly with RobustScaler, plays a vital role in optimizing SVM performance, especially when handling clinical data with extreme values.
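
As an illustration of the scaling idea the abstract relies on, the sketch below centers a feature on its median and divides by its interquartile range (IQR), which is what scikit-learn's RobustScaler does with its default settings. It is a minimal standard-library sketch, not the authors' pipeline: the function name `robust_scale` and the sample `readings` are hypothetical and are not taken from the study's dataset.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # method="inclusive" uses linear interpolation between data points,
    # treating the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant feature: avoid dividing by zero
    center = median(values)
    return [(v - center) / iqr for v in values]


# Hypothetical cholesterol-like readings; the last value is an extreme outlier.
readings = [180, 195, 200, 210, 225, 600]
scaled = robust_scale(readings)

for raw, value in zip(readings, scaled):
    print(f"{raw:>4} -> {value:6.2f}")
```

In this example the median (205) and IQR (25) are barely affected by the outlier, so the five typical readings map to values between -1.0 and 0.8 while the outlier remains clearly separated at 15.8. A distance-based model such as an SVM then sees the typical values on a comparable scale instead of having them compressed by a single extreme point.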

DOI: http://dx.doi.org/10.61944/bids.v4i1.94


Copyright (c) 2025 M Munawir Lasiyono, Nurhayati, Teotino Gomes Soares, Mulyadi

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Bulletin of Informatics and Data Science
Asosiasi Peneliti Data Science Indonesia
Email: pdsi.bids@gmail.com