Show simple item record

dc.contributor.advisor	Kozyri, Elisavet
dc.contributor.advisor	Årsand, Eirik
dc.contributor.advisor	Henriksen, André
dc.contributor.author	Lorentzen, Nikolai
dc.date.accessioned	2025-07-26T15:32:52Z
dc.date.available	2025-07-26T15:32:52Z
dc.date.issued	2025
dc.description.abstract	In this thesis we investigate the preservation of privacy in user interactions with Large Language Models (LLMs), focusing on transforming user queries to enhance privacy while maintaining the usability of answers. The research is contextualized within the FysBot mobile health application, which aims to motivate physical activity. The core problem addressed is the potential leakage of sensitive user information through prompts sent to LLM-based chatbots, stemming from risks such as data memorization, re-identification, and logging. This thesis proposes a privacy-preserving system designed to mitigate these risks by modifying queries before they reach the external LLM. The system employs several techniques: numerical data (e.g., steps, geolocation, heart rate, time) is perturbed with randomized noise using General Additive Data Perturbation (GADP) and Multiplicative Data Perturbation (MDP), tailored to the specific data type to maintain utility, while sensitive textual information is identified and substituted with semantic labels chosen via cosine similarity on text embeddings. The system was implemented in Python, using models such as ChatGPT 3.5 and text-embedding-3-small. Evaluation of the system involved performance benchmarking and a user survey. Benchmarking revealed significant overhead: an approximate 2.3-fold increase in data sent, a 3.7-fold increase in data received, and a 3-fold increase in execution time when the privacy-preserving system was used. The user survey, conducted with participants from health research and the general public, indicated that, although a vast majority preferred answers generated from the original, sensitive prompts, 50% of participants were willing to accept a reduction in the usability of answers in exchange for enhanced privacy. Hesitancy was often linked to the criticality of the sensitive information (e.g., diagnoses), where accuracy was deemed paramount. This thesis concludes that it is feasible to develop a system that enhances end-user privacy in LLM interactions with a manageable loss in usability. However, the introduced overhead suggests that a backend implementation is more viable for mobile applications. Future work could focus on real-time detection of sensitive data or on optimizing the system's performance.
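The abstract names two numerical perturbation techniques, GADP and MDP, without showing code. As a rough illustration only, the following Python sketch applies zero-mean additive Gaussian noise (GADP-style) and unit-mean multiplicative noise (MDP-style) to health-sensor values; the function names, sigma parameters, and clamping bounds are hypothetical, not the thesis's actual implementation.

import random

def gadp_perturb(value: float, sigma: float) -> float:
    """GADP-style perturbation: add zero-mean Gaussian noise."""
    return value + random.gauss(0.0, sigma)

def mdp_perturb(value: float, rel_sigma: float) -> float:
    """MDP-style perturbation: scale by random noise centred at 1."""
    return value * random.gauss(1.0, rel_sigma)

# Illustrative parameters only; the thesis tailors noise to each data type.
steps = 8432          # daily step count
heart_rate = 72       # beats per minute
noisy_steps = max(0, round(gadp_perturb(steps, sigma=250.0)))
noisy_hr = max(30, round(mdp_perturb(heart_rate, rel_sigma=0.05)))
print(noisy_steps, noisy_hr)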
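Likewise, the textual substitution step can be pictured as choosing, for each detected sensitive term, the candidate semantic label whose embedding is closest by cosine similarity. A self-contained sketch with toy vectors follows; the bracketed label names and 3-dimensional vectors are placeholders, whereas the thesis obtains real embeddings from text-embedding-3-small.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def closest_label(term_vec: list[float], labels: dict[str, list[float]]) -> str:
    """Pick the semantic label whose embedding is most similar to the term."""
    return max(labels, key=lambda lbl: cosine_similarity(term_vec, labels[lbl]))

# Toy vectors; in practice each label and term would be embedded by a model.
label_vecs = {
    "[MEDICAL_CONDITION]": [0.9, 0.1, 0.0],
    "[LOCATION]": [0.1, 0.8, 0.2],
}
print(closest_label([0.85, 0.15, 0.05], label_vecs))  # -> [MEDICAL_CONDITION]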
dc.identifier.uri	https://hdl.handle.net/10037/37874
dc.identifier	no.uit:wiseflow:7267640:62187160
dc.language.iso	eng
dc.publisher	UiT The Arctic University of Norway
dc.rights.holder	Copyright 2025 The Author(s)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0	en_US
dc.rights	Attribution 4.0 International (CC BY 4.0)	en_US
dc.title	Preserving Privacy in Interactions with Large Language Models
dc.type	Master thesis


