• De-identifying Swedish EHR text using public resources in the general domain 

      Chomutare, Taridzo; Yigzaw, Kassaye Yitbarek; Budrionis, Andrius; Makhlysheva, Alexandra; Godtliebsen, Fred; Dalianis, Hercules (Journal article; Peer reviewed, 2020)
      Developing rule-based models or training machine learning-based models for de-identifying electronic health record (EHR) clinical notes normally requires sensitive data, which presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a ...