PREDICTIVE MODELING OF STROKE OCCURRENCE USING PYTHON FOR IMPROVED RISK ASSESSMENT
Abstract
This paper examines the use of machine learning (ML) techniques, specifically Logistic Regression and Random Forests, to predict stroke occurrence from demographic, clinical, and lifestyle factors. Python is the primary tool for model development and analysis. The task is framed as binary classification: each individual is labeled as either having had a stroke or not. The dataset includes attributes such as age, gender, hypertension, and smoking status, which are used to train and evaluate the models. The experiments show that both methods are effective for stroke prediction: Logistic Regression provides a straightforward baseline, while Random Forests achieve higher predictive accuracy. The findings highlight the value of ML-based approaches to healthcare risk assessment and show that Python is well suited to such analyses.
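To make the binary classification setup concrete, the following is a minimal sketch of a Logistic Regression baseline written with only the Python standard library. It is an illustration under stated assumptions, not the study's implementation: the data are synthetic, the three features (scaled age, hypertension, smoking) and the risk model that labels them are invented for the example, and all function names are hypothetical. A Random Forest is not reproduced here; in practice it would typically come from a library such as scikit-learn.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic(n, seed=0):
    """Generate hypothetical records: [age_scaled, hypertension, smoker] -> stroke label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(18, 90)
        hypertension = 1 if rng.random() < 0.15 else 0
        smoker = 1 if rng.random() < 0.25 else 0
        # Assumed (made-up) risk model, used only to label the synthetic rows.
        logit = -6.0 + 0.07 * age + 1.0 * hypertension + 0.5 * smoker
        label = 1 if rng.random() < sigmoid(logit) else 0
        # Scale age to roughly unit range so gradient descent behaves well.
        X.append([(age - 54.0) / 21.0, float(hypertension), float(smoker)])
        y.append(label)
    return X, y


def predict_proba(X, w, b):
    """Return the predicted stroke probability for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi, pi in zip(X, y, predict_proba(X, w, b)):
            err = pi - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def evaluate(y_true, proba, threshold=0.5):
    """Return (accuracy, recall) for the positive (stroke) class."""
    pred = [1 if p >= threshold else 0 for p in proba]
    correct = sum(1 for t, p in zip(y_true, pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(y_true), recall


# Simple hold-out split: first 900 rows for training, last 300 for testing.
X, y = make_synthetic(1200, seed=42)
X_train, y_train = X[:900], y[:900]
X_test, y_test = X[900:], y[900:]

w, b = train_logistic(X_train, y_train)
test_proba = predict_proba(X_test, w, b)
accuracy, recall = evaluate(y_test, test_proba)
print(f"weights={w}, bias={b:.3f}")
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Recall is reported alongside accuracy because stroke cases are usually a small minority of records, so a model can score high accuracy while missing most positive cases.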