
Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases

Permanent link
https://hdl.handle.net/10037/33275
DOI
https://doi.org/10.1007/978-3-031-23905-2_6
File
article.pdf (837.0Kb), accepted manuscript version (PDF)
Date
2023-01-21
Type
Journal article
Peer reviewed

Author
García-Vicente, Clara; Chushig-Muzo, David; Mora-Jimenez, Inmaculada; Fabelo, Himar; Gram, Inger Torhild; Løchen, Maja-Lisa; Granja, Conceição; Soguero-Ruiz, Cristina
Abstract
Noncommunicable diseases are among the most significant health threats in our society, with cardiovascular diseases (CVD) being the most prevalent. Because of the severity and prevalence of these illnesses, early detection and prevention are critical for reducing the worldwide health and economic burden. Although machine learning (ML) methods usually outperform conventional approaches in many domains, class imbalance can hinder the learning process. Oversampling the minority classes can help to overcome this issue. In this paper we apply oversampling methods to categorical data, aiming to improve the identification of risk factors associated with CVD. The study uses categorical questionnaire data obtained by the Norwegian Centre for E-health Research from healthy individuals and CVD patients. The goal of this work is two-fold: first, to evaluate the influence of combining oversampling techniques with linear and nonlinear supervised ML methods in binary classification tasks; second, to identify the most relevant features for predicting healthy and CVD cases. Experimental results show that oversampling and feature selection (FS) techniques help to improve CVD prediction. Specifically, combining Generative Adversarial Networks with linear models usually achieves the best performance (area under the curve of 67%), outperforming other oversampling techniques. Synthetic data generation has proved beneficial both for identifying risk factors and for creating models with reasonable generalization capability in CVD prediction.
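As an illustration of the two ideas the abstract relies on, balancing a minority class by oversampling and scoring a binary classifier by area under the curve, the following is a minimal sketch using only the Python standard library. It is not the authors' method: it substitutes plain random oversampling for the Generative Adversarial Networks reported in the paper, and the function names (`random_oversample`, `roc_auc`) and the toy questionnaire rows are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case gets the
    higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical categorical questionnaire rows: 1 = CVD, 0 = healthy.
rows = [("smoker", "low"), ("non-smoker", "high"), ("non-smoker", "low"),
        ("non-smoker", "high"), ("smoker", "high"), ("non-smoker", "low")]
labels = [1, 0, 0, 0, 0, 0]

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))                            # both classes now have 5 rows
print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4]))    # 0.875
```

Random oversampling only repeats existing minority rows, whereas a generative model produces new synthetic rows. In either case, the balancing step should be applied to the training split only, so that the area under the curve is measured on data with the original class distribution.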
Publisher
Springer Nature
Citation
García-Vicente, Chushig-Muzo, Mora-Jimenez, Fabelo, Gram, Løchen, Granja, Soguero-Ruiz. Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases. Lecture Notes in Computer Science (LNCS). 2023;13814
Collections
  • Artikler, rapporter og annet (samfunnsmedisin) [1515]
Copyright 2023 The Author(s)
