Emnet: A System for Privacy-preserving Statistical Computation on Distributed Health Data
Permanent link
https://hdl.handle.net/10037/9154
Date
2015-05-18
Type
Master thesis
Author
Hailemichael, Meskerem Asfaw
Abstract
Motivation: Despite its enormous benefits, the reuse of electronic health record (EHR) data is limited by multi-dimensional challenges, with privacy at the forefront. Various privacy-preserving statistical computation tools have emerged recently. However, they offer limited privacy guarantees and rely on ad hoc techniques for privacy-preserving computation of statistical functions.
Purpose: The purpose of this thesis is to develop a system that enables the computation of a wide
variety of statistical functions on distributed EHRs while preserving the privacy of patients
and health institutions.
Materials and Methods: A systematic literature review of privacy-preserving techniques for
health data reuse was performed to understand the state of the art. The results of the review
and meetings with users were used as sources of requirements. An agile methodology was used
to implement a prototype system called Emnet.
Emnet uses openEHR-based EHRs as a common data model to achieve interoperability among
health institutions. For testing, we prepared openEHR test data sets and a virtual environment
that simulates the real working environment.
Results: We have developed and tested privacy-preserving techniques for research data set
preparation and statistical computation. The research eligibility criteria and required
attributes are expressed as a computable query using Archetype Query Language (AQL), and
each health institution executes the query and stores the resulting data set locally. The data
sets are physically distributed across the health institutions, yet collectively they make up the
research data set, which we call the Virtual Dataset.
Statistical computations on the Virtual Dataset are performed using two main techniques: (1)
decomposition of statistical functions into summation forms, described as a computation
graph; and (2) secure summation protocols.
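The two techniques can be illustrated with a minimal Python sketch. It is not Emnet's implementation: the function names, the modulus, the integer-valued attributes, the use of population (divide by n) variance, and the demo values are all assumptions made for illustration. Each site computes six local sums over its own rows, a simple masked ring-style summation (one of several possible secure summation protocols, and one that does not resist colluding neighbours) reveals only the pooled totals, and the statistics are then derived from those totals alone.

```python
import math
import secrets

# Assumed public modulus; it must exceed every true pooled total.
MODULUS = 2**61 - 1


def local_sums(rows):
    """Summation-form terms a site computes on its own (x, y) rows.

    Assumes non-negative integer attributes (hypothetical simplification).
    """
    return [
        len(rows),                       # n
        sum(x for x, _ in rows),         # sum of x
        sum(y for _, y in rows),         # sum of y
        sum(x * x for x, _ in rows),     # sum of x^2
        sum(y * y for _, y in rows),     # sum of y^2
        sum(x * y for x, y in rows),     # sum of x*y
    ]


def secure_sum(site_values):
    """Simulate a masked ring summation over one value per site.

    The initiator adds a random mask, each site adds its value to the
    masked running total, and the initiator removes the mask at the end,
    so no site sees another site's unmasked contribution.
    """
    mask = secrets.randbelow(MODULUS)
    running = mask
    for value in site_values:
        running = (running + value) % MODULUS
    return (running - mask) % MODULUS


def pooled_statistics(sites):
    """Derive statistics for the pooled data from securely summed terms."""
    per_site = [local_sums(rows) for rows in sites]
    # zip(*per_site) groups the same term across sites, e.g. all local n's.
    n, sx, sy, sxx, syy, sxy = (secure_sum(term) for term in zip(*per_site))
    mean_x, mean_y = sx / n, sy / n
    var_x = sxx / n - mean_x**2          # population variance (assumption)
    var_y = syy / n - mean_y**2
    cov_xy = sxy / n - mean_x * mean_y
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "std_x": math.sqrt(var_x),
        "std_y": math.sqrt(var_y),
        "cov_xy": cov_xy,
        "pearson_r": cov_xy / math.sqrt(var_x * var_y),
    }


# Made-up rows for three hypothetical institutions (illustrative only).
demo_sites = [
    [(54, 131), (61, 140)],
    [(47, 125)],
    [(70, 152), (58, 135)],
]
result = pooled_statistics(demo_sites)
print(result)
```

In this sketch only the six pooled totals leave the summation step, so adding a further statistic means expressing it in terms of sums that each site can compute locally.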
Conclusion: The developed techniques enable statistical computation on distributed health
data while preserving the privacy of patients and health institutions. Currently, mean,
variance, standard deviation, covariance and Pearson’s r are implemented in Emnet.
However, the techniques are generic enough to implement more statistical functions, as long as
they can be decomposed into summation forms. The work presented in this thesis contributes to
the advancement of privacy-preserving health data reuse. It is also relevant to other domains
with requirements similar to those of health care.
Publisher
UiT The Arctic University of Norway
Copyright 2015 The Author(s)