
dc.contributor.author: Chomutare, Taridzo
dc.contributor.author: Yigzaw, Kassaye Yitbarek
dc.contributor.author: Budrionis, Andrius
dc.contributor.author: Makhlysheva, Alexandra
dc.contributor.author: Godtliebsen, Fred
dc.contributor.author: Dalianis, Hercules
dc.date.accessioned: 2021-06-20T21:14:21Z
dc.date.available: 2021-06-20T21:14:21Z
dc.date.issued: 2020
dc.description.abstract: Sensitive data is normally required to develop rule-based or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes; and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data; (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed improved precision and recall from 55.62% and 80.02% with the base EHR embedding layer, to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text; and this could be useful in cases where the data is both sensitive and in low-resource languages. [en_US]
dc.identifier.citation: Chomutare, Yigzaw, Budrionis, Makhlysheva, Godtliebsen, Dalianis H. De-identifying Swedish EHR text using public resources in the general domain. Studies in Health Technology and Informatics. 2020;270:148-152 [en_US]
dc.identifier.cristinID: FRIDAID 1819501
dc.identifier.doi: 10.3233/SHTI200140
dc.identifier.issn: 0926-9630
dc.identifier.issn: 1879-8365
dc.identifier.uri: https://hdl.handle.net/10037/21473
dc.language.iso: eng [en_US]
dc.publisher: IOS Press [en_US]
dc.relation.journal: Studies in Health Technology and Informatics
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2020 The Author(s) [en_US]
dc.subject: VDP::Medical disciplines: 700::Health sciences: 800::Epidemiology medical and dental statistics: 803 [en_US]
dc.subject: VDP::Medisinske Fag: 700::Helsefag: 800::Epidemiologi medisinsk og odontologisk statistikk: 803 [en_US]
dc.title: De-identifying Swedish EHR text using public resources in the general domain [en_US]
dc.type.version: publishedVersion [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]

