Secure and scalable statistical computation of questionnaire data in R
ForfatterYigzaw, Kassaye Yitbarek; Michalas, Antonis; Bellika, Johan Gustav
Collecting data via a questionnaire and analyzing them while preserving respondents' privacy may increase the number of respondents and the truthfulness of their responses. It may also reduce the systematic differences between respondents and non-respondents. In this paper, we propose a privacy- preserving method for collecting and analyzing survey responses using secure multi-party computation. The method is secure under the semi-honest adversarial model. The proposed method computes a wide variety of statistics. Total and stratified statistical counts are computed using the secure protocols developed in this paper. Then, additional statistics, such as a contingency table, a chi-square test, an odds ratio, and logistic regression, are computed within the R statistical environment using the statistical counts as building blocks. The method was evaluated on a questionnaire data set of 3158 respondents sampled for a medical study and simulated questionnaire data sets of up to 50 000 respondents. The computation time for the statistical analyses linearly scales as the number of respondents increases. The results show that the method is efficient and scalable for practical use. It can also be used for other applications in which categorical data are collected.
Source: doi: 10.1109/ACCESS.2016.2599851