ICENTE'22, Konya, Türkiye, 17-19 November 2022, p. 55
According to the World Health Organization, approximately 17.9 million people die from cardiovascular diseases each year, accounting for about 32% of all deaths worldwide. More than 75% of these deaths occur in low- and middle-income countries. Identifying the features that most strongly affect the death or survival of heart patients, and developing models that accurately predict patient survival, is therefore an important current problem. In recent years, machine learning has been used to predict patients' survival during follow-up by combining their medical records with other features such as gender, age, and weight. However, a large number of features makes diagnosis harder for physicians and degrades machine learning prediction performance in terms of cost and time. It is therefore essential to keep the number of features small and to select the most effective ones. This study uses a dataset on the survival of heart patients from the University of California, Irvine data repository. The dataset contains 13 patient features collected from 299 individuals. Recursive feature elimination was used for feature selection to identify the parameters that have the greatest impact on patient survival. Among the selected features, those that were not normally distributed were brought closer to a normal distribution with the Yeo-Johnson power transformation. Finally, Support Vector Machine, Naive Bayes, Random Forest, Decision Tree, Logistic Regression, XGBoost, CatBoost, and K-Nearest Neighbors machine learning algorithms were used to predict the survival of patients with heart disease.
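The workflow described above (recursive feature elimination, then a Yeo-Johnson transformation, then a classifier) can be sketched with scikit-learn roughly as follows. This is only an illustrative sketch, not the study's implementation: the data are synthetic stand-ins generated with `make_classification`, the estimator inside RFE (a random forest) and the final classifier (logistic regression) are assumptions, and the transformation is applied to all selected features rather than only to the non-normal ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in for the patient data (assumption): 299 rows and
# 13 feature columns, mirroring the counts stated in the abstract.
X, y = make_classification(
    n_samples=299, n_features=13, n_informative=6, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Recursive feature elimination down to six features. The ranking
    # estimator is an assumption; the abstract does not name one.
    ("select", RFE(RandomForestClassifier(n_estimators=100, random_state=0),
                   n_features_to_select=6)),
    # Yeo-Johnson power transformation, applied here to every selected
    # feature for simplicity.
    ("transform", PowerTransformer(method="yeo-johnson")),
    # Any of the classifiers listed in the abstract could be swapped in here.
    ("classify", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

# Boolean mask over the 13 input columns marking the six retained features.
selected_mask = pipeline.named_steps["select"].support_
test_accuracy = pipeline.score(X_test, y_test)
print("selected feature indices:", [i for i, keep in enumerate(selected_mask) if keep])
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing the selection and transformation steps inside a single pipeline means they are fitted on the training split only, which avoids leaking information from the held-out data into the feature ranking.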
As a result, the number of features used to predict patient survival was reduced to six, and a confusion matrix was produced to assess and compare the machine learning models in terms of accuracy, recall, and precision. According to the results, XGBoost predicted patient survival best, with an accuracy of 90%.
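For reference, accuracy, precision, and recall follow directly from the four cells of a binary confusion matrix. The sketch below uses small made-up label vectors, not the study's predictions, and the helper names `confusion_counts` and `summarize` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall, guarding against empty denominators."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts)   # (3, 1, 1, 3)
print(metrics)  # accuracy, precision, and recall are all 0.75 here
```

Which class is treated as positive (death or survival) changes precision and recall but not accuracy, so it should be stated alongside any reported figures.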