Emnet: A System for Privacy-preserving Statistical Computation on Distributed Health Data
Author: Hailemichael, Meskerem Asfaw
Motivation: Despite its enormous benefits, reuse of electronic health record (EHR) data is limited by multi-dimensional challenges, with privacy at the forefront. Various privacy-preserving statistical computation tools have emerged recently. However, they offer limited privacy guarantees and rely on ad hoc techniques for privacy-preserving computation of statistical functions.
Purpose: The purpose of this thesis is to develop a system that enables computation of a wide variety of statistical functions on distributed EHRs while preserving the privacy of patients and health institutions.
Materials and Methods: A systematic literature review of privacy-preserving techniques for health data reuse was performed to understand the state of the art. The results of the review and meetings with users served as sources of requirements. An agile methodology was used to implement a prototype system called Emnet. Emnet uses openEHR-based EHRs as a common data model to achieve interoperability among health institutions. We prepared test openEHR data sets and a virtual environment that simulates the real working environment for testing.
Result: We have developed and tested privacy-preserving techniques for research data set preparation and statistical computation. The research eligibility criteria and required attributes are expressed as a computable query in Archetype Query Language (AQL); each health institution executes the query and stores the resulting data set locally. The data sets are physically distributed across the health institutions, yet together they form the research data set, which we call the Virtual Dataset. Statistical computations on the Virtual Dataset rely on two main techniques: (1) decomposition of statistical functions into summation forms, described as a computation graph; and (2) secure summation protocols.
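To illustrate how these two techniques can fit together, the following is a minimal Python sketch, not the Emnet implementation. It assumes a generic additive secret-sharing scheme as the secure summation protocol (the abstract does not say which protocol Emnet uses), non-negative integer measurements, and honest sites. The names `MODULUS`, `make_shares`, `secure_sum`, and `distributed_mean_variance` are hypothetical and chosen only for this example.

```python
import secrets

# Assumed public modulus; it must exceed any true total being summed.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_values):
    """Sum one non-negative integer per site; only random-looking shares are exchanged."""
    n = len(local_values)
    # Row i holds the shares site i sends out; column j is what site j receives.
    share_matrix = [make_shares(v, n) for v in local_values]
    # Each site publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in share_matrix) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


def distributed_mean_variance(site_data):
    """Mean and sample variance built from three secure sums (the summation form)."""
    count = secure_sum([len(d) for d in site_data])
    total = secure_sum([sum(d) for d in site_data])
    total_sq = secure_sum([sum(x * x for x in d) for d in site_data])
    mean = total / count
    variance = (total_sq - count * mean**2) / (count - 1)
    return mean, variance


# Three hypothetical sites, each holding its local slice of the data.
sites = [[120, 135], [110, 140, 150], [125]]
print(distributed_mean_variance(sites))  # (130.0, 210.0)
```

In this sketch each site reveals only local aggregates in masked form: the count, the sum, and the sum of squares. The statistic is then assembled from the three global totals, which is the sense in which a function "decomposes into summation forms".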
Conclusion: The developed techniques enable statistical computation on distributed health data while preserving the privacy of patients and health institutions. Currently, mean, variance, standard deviation, covariance, and Pearson’s r are implemented in Emnet. However, the techniques are generic enough to support more statistical functions, as long as these can be decomposed into summation forms. The work presented in this thesis contributes to the advancement of privacy-preserving health data reuse. It is also relevant to other domains with requirements similar to those of health care.
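For reference, the five statistics listed above can all be written in terms of a handful of global sums, each of which is a sum of per-institution partial sums. The identities below are the standard textbook forms, not necessarily the exact decomposition used in the thesis; the symbols $K$ (number of institutions), $D_k$ (local data set of institution $k$), and the sample ($n-1$) normalization are assumptions made for this illustration.

```latex
% Global sums, each obtainable with one secure summation over K institutions
\[
n = \sum_{k=1}^{K} |D_k|, \quad
S_x = \sum_{k=1}^{K} \sum_{i \in D_k} x_i, \quad
S_y = \sum_{k=1}^{K} \sum_{i \in D_k} y_i,
\]
\[
S_{xx} = \sum_{k=1}^{K} \sum_{i \in D_k} x_i^2, \quad
S_{yy} = \sum_{k=1}^{K} \sum_{i \in D_k} y_i^2, \quad
S_{xy} = \sum_{k=1}^{K} \sum_{i \in D_k} x_i y_i .
\]

% Statistics expressed through the global sums
\[
\bar{x} = \frac{S_x}{n}, \qquad
s_x^2 = \frac{S_{xx} - S_x^2 / n}{n - 1}, \qquad
s_x = \sqrt{s_x^2},
\]
\[
\operatorname{cov}(x, y) = \frac{S_{xy} - S_x S_y / n}{n - 1}, \qquad
r = \frac{n S_{xy} - S_x S_y}
         {\sqrt{\left(n S_{xx} - S_x^2\right)\left(n S_{yy} - S_y^2\right)}} .
\]
```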
Publisher: UiT Norges arktiske universitet
UiT The Arctic University of Norway