Ozturk, Berk, Lawton, Tom, Smith, Stephen orcid.org/0000-0002-6885-2643 et al. (1 more author) (2024) Balancing Acts: Tackling Data Imbalance in Machine Learning for Predicting Myocardial Infarction in Type 2 Diabetes. In: Mantas, John, Hasman, Arie, Demiris, George, Saranto, Kaija, Marschollek, Michael, Arvanitis, Theodoros N., Ognjanovic, Ivana, Benis, Arriel, Gallos, Parisis, Zoulias, Emmanouil and Andrikopoulou, Elisavet, (eds.) Digital Health and Informatics Innovations for Sustainable Health Care Systems - Proceedings of MIE 2024. 34th Medical Informatics Europe Conference, MIE 2024, 25-29 Aug 2024 Studies in Health Technology and Informatics. IOS Press BV, GRC, pp. 626-630.
Abstract
Type 2 Diabetes (T2D) is a prevalent lifelong health condition. It is predicted that over 500 million adults will be diagnosed with T2D by 2040. T2D can develop at any age, and if it progresses, it may cause serious comorbidities. One of the most critical T2D-related comorbidities is Myocardial Infarction (MI), commonly known as a heart attack. MI is a life-threatening medical emergency, and it is important to predict it and intervene in a timely manner. The use of Machine Learning (ML) for clinical prediction is gaining pace, but class imbalance in predictive models remains a key challenge for the trustworthy deployment of the technology. It may lead to bias and overfitting in ML models and to misleading interpretations of their outputs. In this study, we show how the systematic use of Class Imbalance Handling (CIH) techniques may improve the performance of ML models. We used the Connected Bradford dataset, consisting of over one million real-world health records. Three commonly used CIH techniques, Oversampling, Undersampling, and Class Weighting (CW), were applied to Naive Bayes (NB), Neural Network (NN), Random Forest (RF), Support Vector Machine (SVM), and Ensemble models. We report that CW outperforms the other techniques, achieving the highest Accuracy and F1 values of 0.9948 and 0.9556, respectively. Applying the most appropriate CIH techniques to ML models trained on real-world healthcare data yields promising results that may help reduce the risk of MI in patients with T2D.
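For readers unfamiliar with the three CIH techniques named in the abstract, the following is a minimal illustrative sketch in plain Python. It is not the authors' implementation. The function names, the toy labels, and the 95:5 class ratio are assumptions made for illustration only. The weighting rule shown, n_samples / (n_classes * class_count), is one common "balanced" heuristic; the abstract does not state which weighting scheme the study used.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of larger classes so every class matches the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 95 records without the outcome (0), 5 with it (1).
toy_labels = [0] * 95 + [1] * 5
toy_rows = list(range(len(toy_labels)))  # stand-ins for feature vectors

weights = balanced_class_weights(toy_labels)
_, over_labels = random_oversample(toy_rows, toy_labels)
_, under_labels = random_undersample(toy_rows, toy_labels)
print(weights, Counter(over_labels), Counter(under_labels))
```

Class weighting leaves the data unchanged and instead scales each record's contribution to the training loss, whereas oversampling and undersampling change the training set itself. Resampling would normally be applied to the training split only, so that evaluation still reflects the original class distribution.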
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | |
| Editors: | |
| Copyright, Publisher and Additional Information: | © 2024 The Authors. |
| Keywords: | class imbalance,dataset,heart attack,machine learning,Type 2 diabetes |
| Dates: | |
| Institution: | The University of York |
| Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York); The University of York > Faculty of Sciences (York) > Physics (York) |
| Date Deposited: | 01 Apr 2026 11:00 |
| Last Modified: | 06 May 2026 05:05 |
| Published Version: | https://doi.org/10.3233/SHTI240491 |
| Status: | Published |
| Publisher: | IOS Press BV |
| Series Name: | Studies in Health Technology and Informatics |
| Identification Number: | 10.3233/SHTI240491 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:239708 |
