Preserving Privacy in Interactions with Large Language Models
Author
Lorentzen, Nikolai

Abstract
In this thesis we investigate the preservation of privacy in user interactions with Large Language Models (LLMs), focusing on transforming user queries to enhance privacy while maintaining the usability of answers. The research is contextualized within the FysBot mobile health application, which aims to motivate physical activity.
The core problem addressed is the potential leakage of sensitive user information through prompts sent to LLM-based chatbots, stemming from risks like data memorization, re-identification, and logging. This thesis proposes a privacy-preserving system designed to mitigate these risks by modifying queries before they reach the external LLM.
The developed system employs several techniques: numerical data (e.g., steps, geolocation, heart rate, time) is perturbed using randomized noise through methods like General Additive Data Perturbation (GADP) and Multiplicative Data Perturbation (MDP), tailored to the specific data type to maintain utility. Sensitive textual information is identified and substituted with semantic labels chosen via cosine similarity on text embeddings. The system was implemented in Python, utilizing models like ChatGPT 3.5 and text-embedding-3-small.
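The two perturbation methods described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the use of Gaussian noise, and the noise scales are assumptions chosen for readability.

```python
import random

def perturb_additive(value, sigma):
    """GADP-style perturbation: add zero-mean random noise to the value.
    sigma controls the privacy/utility trade-off (assumed parameter)."""
    return value + random.gauss(0.0, sigma)

def perturb_multiplicative(value, sigma):
    """MDP-style perturbation: scale by noise centred on 1, so the
    relative magnitude of the value is roughly preserved."""
    return value * random.gauss(1.0, sigma)

random.seed(42)                 # fixed seed for a reproducible sketch
steps = 8500                    # hypothetical daily step count
heart_rate = 72                 # hypothetical resting heart rate (bpm)

noisy_steps = perturb_additive(steps, sigma=200)
noisy_hr = perturb_multiplicative(heart_rate, sigma=0.05)
```

Tailoring the method to the data type, as the thesis does, might mean using additive noise for counts such as steps and multiplicative noise for rates, so that perturbed values stay plausible for their scale.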
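The substitution step for sensitive text can likewise be sketched with plain cosine similarity. The label names, the toy 3-dimensional vectors (standing in for text-embedding-3-small embeddings), and the example term are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_label(term_vec, label_vecs):
    """Pick the semantic label whose embedding is most similar to the term."""
    return max(label_vecs, key=lambda lbl: cosine(term_vec, label_vecs[lbl]))

# Toy embeddings for candidate labels (illustrative values, not real embeddings).
labels = {
    "[MEDICAL_CONDITION]": [0.9, 0.1, 0.0],
    "[LOCATION]":          [0.1, 0.9, 0.1],
    "[PERSON_NAME]":       [0.0, 0.1, 0.9],
}
sensitive_term = [0.85, 0.2, 0.05]  # hypothetical embedding of e.g. "asthma"

print(best_label(sensitive_term, labels))  # → [MEDICAL_CONDITION]
```

In the actual system, both the sensitive term and the candidate labels would be embedded with a model such as text-embedding-3-small, and the chosen label would replace the term in the prompt before it is sent to the external LLM.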
Evaluation of the system involved performance benchmarking and a user survey. Benchmarking revealed significant overhead: approximately a 2.3-fold increase in data sent, a 3.7-fold increase in data received, and a 3-fold increase in execution time when the privacy-preserving system was used. The user survey, conducted with participants from health research and the general public, indicated that while a vast majority preferred answers generated from the original, sensitive prompts, 50% of participants were willing to accept a reduction in the usability of answers in exchange for enhanced privacy. Hesitancy was often linked to the criticality of the sensitive information (e.g., diagnoses), where accuracy was deemed paramount.
This thesis concludes that it is feasible to develop a system that enhances end-user privacy in LLM interactions with a manageable loss in usability. However, the introduced overhead suggests that a backend implementation is more viable for mobile applications. Future work could focus on real-time sensitive-data detection or on optimizing the system's performance.
Publisher
UiT The Arctic University of Norway
Copyright 2025 The Author(s)