Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases

Permanent link
https://hdl.handle.net/10037/33275
DOI
https://doi.org/10.1007/978-3-031-23905-2_6
article.pdf (837.0 KB)
Accepted manuscript version (PDF)
Date
2023-01-21
Type
Journal article
Peer reviewed

Authors
García-Vicente, Clara; Chushig-Muzo, David; Mora-Jimenez, Inmaculada; Fabelo, Himar; Gram, Inger Torhild; Løchen, Maja-Lisa; Granja, Conceição; Soguero-Ruiz, Cristina
Abstract
Noncommunicable diseases are among the most significant health threats in our society, with cardiovascular diseases (CVD) being the most prevalent. Because of the severity and prevalence of these illnesses, early detection and prevention are critical for reducing the worldwide health and economic burden. Although machine learning (ML) methods usually outperform conventional approaches in many domains, class imbalance can hinder the learning process. Oversampling the minority classes can help to overcome this issue. In this paper, we apply oversampling methods to categorical data, aiming to improve the identification of risk factors associated with CVD. To conduct this study, we use categorical questionnaire data from healthy individuals and CVD patients, obtained by the Norwegian Centre for E-health Research. The goal of this work is twofold: first, to evaluate the influence of combining oversampling techniques with linear and nonlinear supervised ML methods in binary classification tasks; second, to identify the most relevant features for distinguishing healthy and CVD cases. Experimental results show that oversampling and feature selection (FS) techniques help to improve CVD prediction. Specifically, combining Generative Adversarial Networks (GANs) with linear models usually achieves the best performance (area under the curve of 67%), outperforming the other oversampling techniques. Synthetic data generation has proved beneficial both for identifying risk factors and for creating models with reasonable generalization capability in CVD prediction.
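The record does not include implementation details. As a rough illustration of what oversampling categorical data and scoring by area under the curve involve, the sketch below uses plain random oversampling (duplicating randomly chosen minority-class rows). This is a deliberately simpler technique than the GAN-based generation the authors report as best, and it is not their method. The helper names `random_oversample` and `auc` and the example questionnaire answers are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the majority class.
    Rows are tuples of categorical answers, so copies stay valid categories."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


def auc(scores, labels):
    """Hypothetical helper: ROC AUC via the rank-sum (Mann-Whitney) view,
    i.e. the probability that a random positive (label 1) scores higher
    than a random negative (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical questionnaire answers: (smoking status, physical activity)
    rows = [("never", "low"), ("daily", "high"), ("never", "medium"), ("former", "low")]
    labels = [0, 1, 0, 0]  # 1 = CVD case (minority class), 0 = healthy
    X, y = random_oversample(rows, labels)
    print(Counter(y))
    print(auc([0.9, 0.2, 0.4, 0.1], [1, 0, 0, 0]))
```

Because random oversampling only copies existing answer combinations, every synthetic row remains a valid set of categories; a GAN instead learns the joint distribution of the answers and can generate new combinations. Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used to compute the AUC.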
Publisher
Springer Nature
Citation
García-Vicente, Chushig-Muzo, Mora-Jimenez, Fabelo, Gram, Løchen, Granja, Soguero-Ruiz. Clinical Synthetic Data Generation to Predict and Identify Risk Factors for Cardiovascular Diseases. Lecture Notes in Computer Science (LNCS). 2023;13814
Collections
  • Artikler, rapporter og annet (samfunnsmedisin) (Articles, reports and other works, community medicine)
Copyright 2023 The Author(s)


Munin is built on DSpace

UiT The Arctic University of Norway
University Library
uit.no/ub - munin@ub.uit.no
