Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases
Permanent link
https://hdl.handle.net/10037/33275
Date
2023-01-21
Type
Journal article
Peer reviewed
Author
García-Vicente, Clara; Chushig-Muzo, David; Mora-Jimenez, Inmaculada; Fabelo, Himar; Gram, Inger Torhild; Løchen, Maja-Lisa; Granja, Conceição; Soguero-Ruiz, Cristina
Abstract
Noncommunicable diseases are among the most significant health threats in our society, with cardiovascular diseases (CVD) being the most prevalent. Because of the severity and prevalence of these illnesses, early detection and prevention are critical for reducing the worldwide health and economic burden. Although machine learning (ML) methods usually outperform conventional approaches in many domains, class imbalance can hinder the learning process. Oversampling the minority classes can help to overcome this issue. In this paper, we apply oversampling methods to categorical data, aiming to improve the identification of risk factors associated with CVD. The study uses categorical questionnaire data, obtained by the Norwegian Centre for E-health Research, from healthy individuals and CVD patients. The goal of this work is twofold: first, to evaluate the influence of combining oversampling techniques with linear and nonlinear supervised ML methods in binary classification tasks; second, to identify the most relevant features for predicting healthy and CVD cases. Experimental results show that oversampling and feature selection (FS) techniques help to improve CVD prediction. Specifically, combining Generative Adversarial Networks with linear models usually achieves the best performance (area under the curve of 67%), outperforming other oversampling techniques. Synthetic data generation has proved beneficial both for identifying risk factors and for creating models with reasonable generalization capability in CVD prediction.
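To illustrate what oversampling a minority class means for categorical questionnaire data, the following is a minimal sketch of plain random oversampling (duplicating minority rows). It is not the GAN-based generation reported in the abstract, and it is not the authors' code. The function name `random_oversample`, the feature names, and the example rows are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(dict(rng.choice(pool)))  # copy of an existing row
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical categorical questionnaire answers (not data from the study).
rows = [
    {"smoker": "no", "activity": "high"},
    {"smoker": "no", "activity": "medium"},
    {"smoker": "yes", "activity": "high"},
    {"smoker": "no", "activity": "low"},
    {"smoker": "yes", "activity": "low"},
    {"smoker": "yes", "activity": "medium"},
]
labels = ["healthy", "healthy", "healthy", "healthy", "cvd", "cvd"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Random duplication adds no new information. Generative approaches such as the GANs mentioned above instead synthesize new rows, which is the distinction the study evaluates.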
Publisher
Springer Nature
Citation
García-Vicente, Chushig-Muzo, Mora-Jimenez, Fabelo, Gram, Løchen, Granja, Soguero-Ruiz. Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases. Lecture Notes in Computer Science (LNCS). 2023;13814
Copyright 2023 The Author(s)