dc.contributor.author | García-Vicente, Clara
dc.contributor.author | Chushig-Muzo, David
dc.contributor.author | Mora-Jimenez, Inmaculada
dc.contributor.author | Fabelo, Himar
dc.contributor.author | Gram, Inger Torhild
dc.contributor.author | Løchen, Maja-Lisa
dc.contributor.author | Granja, Conceição
dc.contributor.author | Soguero-Ruiz, Cristina
dc.date.accessioned | 2024-03-26T09:50:17Z
dc.date.available | 2024-03-26T09:50:17Z
dc.date.issued | 2023-01-21
dc.description.abstract | Noncommunicable diseases are among the most significant health threats in our society, with cardiovascular diseases (CVD) being the most prevalent. Because of the severity and prevalence of these illnesses, early detection and prevention are critical for reducing the worldwide health and economic burden. Although machine learning (ML) methods usually outperform conventional approaches in many domains, class imbalance can hinder the learning process. Applying oversampling techniques to the minority classes can help to overcome this issue. In this paper, we apply oversampling methods to categorical data, aiming to improve the identification of risk factors associated with CVD. To conduct this study, we use categorical questionnaire data from healthy individuals and CVD patients, obtained by the Norwegian Centre for E-health Research. The goal of this work is twofold: first, to evaluate the influence of combining oversampling techniques with linear and nonlinear supervised ML methods in binary classification tasks; second, to identify the most relevant features for predicting healthy and CVD cases. Experimental results show that oversampling and feature selection (FS) techniques help to improve CVD prediction. Specifically, combining Generative Adversarial Networks with linear models usually achieves the best performance (area under the curve of 67%), outperforming the other oversampling techniques. Synthetic data generation has proved beneficial both for identifying risk factors and for building models with reasonable generalization capability for CVD prediction. | en_US
dc.identifier.citation | García-Vicente, Chushig-Muzo, Mora-Jimenez, Fabelo, Gram, Løchen, Granja, Soguero-Ruiz. Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases. Lecture Notes in Computer Science (LNCS). 2023;13814 | en_US
dc.identifier.cristinID | FRIDAID 2226604
dc.identifier.doi | 10.1007/978-3-031-23905-2_6
dc.identifier.issn | 0302-9743
dc.identifier.issn | 1611-3349
dc.identifier.uri | https://hdl.handle.net/10037/33275
dc.language.iso | eng | en_US
dc.publisher | Springer Nature | en_US
dc.relation.journal | Lecture Notes in Computer Science (LNCS)
dc.rights.accessRights | openAccess | en_US
dc.rights.holder | Copyright 2023 The Author(s) | en_US
dc.title | Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases | en_US
dc.type.version | acceptedVersion | en_US
dc.type | Journal article | en_US
dc.type | Tidsskriftartikkel | en_US
dc.type | Peer reviewed | en_US
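The core step described in the abstract is rebalancing an imbalanced categorical dataset before training a binary classifier, then scoring that classifier by area under the curve. It can be illustrated with the minimal Python sketch below. The sketch uses random oversampling, which duplicates existing minority-class records. This simpler technique is swapped in for illustration and is not the GAN-based generator the abstract reports as best. The function names `random_oversample` and `auc`, the toy questionnaire answers and the label encoding (0 = healthy, 1 = CVD) are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Random oversampling (illustrative stand-in, not the paper's GAN):
    duplicate randomly chosen records of each smaller class until every
    class has as many records as the largest one. Suitable for categorical
    rows because it only copies existing records and never interpolates."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def auc(scores, labels):
    """ROC AUC computed as the probability that a randomly chosen positive
    (label 1) scores higher than a randomly chosen negative (label 0);
    ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical questionnaire answers: (smoking status, hypertension).
    rows = [("never", "no"), ("former", "no"), ("never", "yes"),
            ("current", "yes"), ("current", "no")]
    labels = [0, 0, 0, 0, 1]  # 0 = healthy, 1 = CVD (hypothetical encoding)
    # Oversample the training split only, so duplicated records never
    # leak into the evaluation data.
    X_bal, y_bal = random_oversample(rows, labels)
    print(sorted(Counter(y_bal).items()))  # [(0, 4), (1, 4)]
    # Hypothetical classifier scores on a held-out test set.
    print(auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0]))  # 0.75
```

Random duplication adds no new information. Generative approaches such as the GANs named in the abstract instead aim to synthesize new, plausible records. The sketch only shows where such a generator would fit in the pipeline: between the training split and the classifier.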

