Show simple item record

dc.contributor.advisor	Kozyri, Elisavet
dc.contributor.advisor	Årsand, Eirik
dc.contributor.advisor	Henriksen, André
dc.contributor.author	Lorentzen, Nikolai
dc.date.accessioned	2025-07-26T15:32:52Z
dc.date.available	2025-07-26T15:32:52Z
dc.date.issued	2025
dc.description.abstract	In this thesis we investigate the preservation of privacy in user interactions with Large Language Models (LLMs), focusing on transforming user queries to enhance privacy while maintaining the usability of answers. The research is contextualized within the FysBot mobile health application, which aims to motivate physical activity. The core problem addressed is the potential leakage of sensitive user information through prompts sent to LLM-based chatbots, stemming from risks such as data memorization, re-identification, and logging. This thesis proposes a privacy-preserving system designed to mitigate these risks by modifying queries before they reach the external LLM. The system employs several techniques: numerical data (e.g., steps, geolocation, heart rate, time) is perturbed with randomized noise using General Additive Data Perturbation (GADP) and Multiplicative Data Perturbation (MDP), tailored to the specific data type to maintain utility, while sensitive textual information is identified and substituted with semantic labels chosen via cosine similarity on text embeddings. The system was implemented in Python, using models such as ChatGPT 3.5 and text-embedding-3-small. Evaluation of the system involved performance benchmarking and a user survey. Benchmarking revealed significant overhead: an approximate 2.3-fold increase in data sent, a 3.7-fold increase in data received, and a 3-fold increase in execution time when the privacy-preserving system was used. The user survey, conducted with participants from health research and the general public, indicated that, although a vast majority preferred answers generated from the original, sensitive prompts, 50% of participants were willing to accept a reduction in the usability of answers in exchange for enhanced privacy. Hesitancy was often linked to the criticality of the sensitive information (e.g., diagnoses), where accuracy was deemed paramount. This thesis concludes that it is feasible to develop a system that enhances end-user privacy in LLM interactions with a manageable loss in usability. However, the introduced overhead suggests that a backend implementation is more viable for mobile applications. Future work could focus on real-time detection of sensitive data or on optimizing the system's performance.
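The abstract names two numerical perturbation techniques, GADP and MDP, without showing code. As a rough illustration only, the following Python sketch applies zero-mean additive Gaussian noise (GADP-style) and unit-mean multiplicative noise (MDP-style) to health-sensor values; the function names, sigma parameters, and clamping bounds are hypothetical, not the thesis's actual implementation.

import random

def gadp_perturb(value: float, sigma: float) -> float:
    """GADP-style perturbation: add zero-mean Gaussian noise."""
    return value + random.gauss(0.0, sigma)

def mdp_perturb(value: float, rel_sigma: float) -> float:
    """MDP-style perturbation: scale by random noise centred at 1."""
    return value * random.gauss(1.0, rel_sigma)

# Illustrative parameters only; the thesis tailors noise to each data type.
steps = 8432          # daily step count
heart_rate = 72       # beats per minute
noisy_steps = max(0, round(gadp_perturb(steps, sigma=250.0)))
noisy_hr = max(30, round(mdp_perturb(heart_rate, rel_sigma=0.05)))
print(noisy_steps, noisy_hr)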
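Likewise, the textual substitution step can be pictured as choosing, for each detected sensitive term, the candidate semantic label whose embedding is closest by cosine similarity. A self-contained sketch with toy vectors follows; the bracketed label names and 3-dimensional vectors are placeholders, whereas the thesis obtains real embeddings from text-embedding-3-small.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def closest_label(term_vec: list[float], labels: dict[str, list[float]]) -> str:
    """Pick the semantic label whose embedding is most similar to the term."""
    return max(labels, key=lambda lbl: cosine_similarity(term_vec, labels[lbl]))

# Toy vectors; in practice each label and term would be embedded by a model.
label_vecs = {
    "[MEDICAL_CONDITION]": [0.9, 0.1, 0.0],
    "[LOCATION]": [0.1, 0.8, 0.2],
}
print(closest_label([0.85, 0.15, 0.05], label_vecs))  # -> [MEDICAL_CONDITION]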
dc.identifier.uri	https://hdl.handle.net/10037/37874
dc.identifier	no.uit:wiseflow:7267640:62187160
dc.language.iso	eng
dc.publisher	UiT The Arctic University of Norway
dc.rights.holder	Copyright 2025 The Author(s)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0	en_US
dc.rights	Attribution 4.0 International (CC BY 4.0)	en_US
dc.title	Preserving Privacy in Interactions with Large Language Models
dc.type	Master thesis


