dc.contributor.author | García-Vicente, Clara | |
dc.contributor.author | Chushig-Muzo, David | |
dc.contributor.author | Mora-Jimenez, Inmaculada | |
dc.contributor.author | Fabelo, Himar | |
dc.contributor.author | Gram, Inger Torhild | |
dc.contributor.author | Løchen, Maja-Lisa | |
dc.contributor.author | Granja, Conceição | |
dc.contributor.author | Soguero-Ruiz, Cristina | |
dc.date.accessioned | 2024-03-26T09:50:17Z | |
dc.date.available | 2024-03-26T09:50:17Z | |
dc.date.issued | 2023-01-21 | |
dc.description.abstract | Noncommunicable diseases are among the most significant health threats in our society, with cardiovascular diseases (CVD) being the most prevalent. Because of the severity and prevalence of these illnesses, early detection and prevention are critical for reducing the worldwide health and economic burden. Although machine learning (ML) methods usually outperform conventional approaches in many domains, class imbalance can hinder the learning process. Oversampling the minority classes can help to overcome this issue. In this paper, we apply oversampling methods to categorical data, aiming to improve the identification of risk factors associated with CVD. The study uses categorical questionnaire data from healthy individuals and CVD patients, obtained by the Norwegian Centre for E-health Research. The goal of this work is two-fold: first, to evaluate the influence of combining oversampling techniques with linear and nonlinear supervised ML methods in binary classification tasks; second, to identify the most relevant features for predicting healthy and CVD cases. Experimental results show that oversampling and feature selection (FS) techniques help to improve CVD prediction. Specifically, combining Generative Adversarial Networks with linear models usually achieves the best performance (area under the curve of 67%), outperforming other oversampling techniques. Synthetic data generation proved beneficial both for identifying risk factors and for creating models with reasonable generalization capability in CVD prediction. | en_US |
dc.identifier.citation | García-Vicente, Chushig-Muzo, Mora-Jimenez, Fabelo, Gram, Løchen, Granja, Soguero-Ruiz. Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases. Lecture Notes in Computer Science (LNCS). 2023;13814 | en_US |
dc.identifier.cristinID | FRIDAID 2226604 | |
dc.identifier.doi | 10.1007/978-3-031-23905-2_6 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.uri | https://hdl.handle.net/10037/33275 | |
dc.language.iso | eng | en_US |
dc.publisher | Springer Nature | en_US |
dc.relation.journal | Lecture Notes in Computer Science (LNCS) | |
dc.rights.accessRights | openAccess | en_US |
dc.rights.holder | Copyright 2023 The Author(s) | en_US |
dc.title | Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases | en_US |
dc.type.version | acceptedVersion | en_US |
dc.type | Journal article | en_US |
dc.type | Tidsskriftartikkel | en_US |
dc.type | Peer reviewed | en_US |