dc.contributor.author | Chomutare, Taridzo | |
dc.contributor.author | Yigzaw, Kassaye Yitbarek | |
dc.contributor.author | Budrionis, Andrius | |
dc.contributor.author | Makhlysheva, Alexandra | |
dc.contributor.author | Godtliebsen, Fred | |
dc.contributor.author | Dalianis, Hercules | |
dc.date.accessioned | 2021-06-20T21:14:21Z | |
dc.date.available | 2021-06-20T21:14:21Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Developing rule-based models or training machine learning-based models for de-identifying electronic health record (EHR) clinical notes normally requires sensitive data, which presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02%, respectively, with the base EHR embedding layer, to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful in cases where the data is sensitive and the language is low-resource. | en_US |
dc.identifier.citation | Chomutare T, Yigzaw KY, Budrionis A, Makhlysheva A, Godtliebsen F, Dalianis H. De-identifying Swedish EHR text using public resources in the general domain. Studies in Health Technology and Informatics. 2020;270:148-152 | en_US |
dc.identifier.cristinID | FRIDAID 1819501 | |
dc.identifier.doi | 10.3233/SHTI200140 | |
dc.identifier.issn | 0926-9630 | |
dc.identifier.issn | 1879-8365 | |
dc.identifier.uri | https://hdl.handle.net/10037/21473 | |
dc.language.iso | eng | en_US |
dc.publisher | IOS Press | en_US |
dc.relation.journal | Studies in Health Technology and Informatics | |
dc.rights.accessRights | openAccess | en_US |
dc.rights.holder | Copyright 2020 The Author(s) | en_US |
dc.subject | VDP::Medical disciplines: 700::Health sciences: 800::Epidemiology medical and dental statistics: 803 | en_US |
dc.subject | VDP::Medisinske Fag: 700::Helsefag: 800::Epidemiologi medisinsk og odontologisk statistikk: 803 | en_US |
dc.title | De-identifying Swedish EHR text using public resources in the general domain | en_US |
dc.type.version | publishedVersion | en_US |
dc.type | Journal article | en_US |
dc.type | Tidsskriftartikkel | en_US |
dc.type | Peer reviewed | en_US |