dc.contributor.advisor | Dalmo, Rune | |
dc.contributor.advisor | Pedersen, Bjørn-Richard | |
dc.contributor.author | Wilhelmsen, Kristoffer Berg | |
dc.date.accessioned | 2024-07-18T06:24:31Z | |
dc.date.available | 2024-07-18T06:24:31Z | |
dc.date.issued | 2024-05-15 | en |
dc.description.abstract | This thesis assesses the impact of fine-tuning and retrieval-augmented generation (RAG) on the accuracy of large language models (LLMs) in assigning ICD-10 codes to historical causes of death. Using funeral records from Trondheim, Norway (1830-1920), we fine-tuned Llama 3 and Mistral on 2000 records. Twelve experiments were conducted on 2000 additional records to evaluate the accuracy of each knowledge-injection technique, as well as a combination of the two.
The results indicate that fine-tuning as a standalone knowledge-injection technique achieved the highest accuracy, generating 88% full matches and 2% partial matches for ICD-10 codes, up from 58% full matches and 25% partial matches in previous research. However, concerns remain about memorization of training data, given the lack of diversity in the available dataset. Moreover, combining RAG with fine-tuning led to a decrease in accuracy, and a RAG-only approach reduced accuracy even further. These findings serve as a proof of concept for the automatic assignment of ICD-10 codes to historical causes of death, paving the way for future research. | en_US |
dc.identifier.uri | https://hdl.handle.net/10037/34160 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | no |
dc.publisher | UiT The Arctic University of Norway | en |
dc.rights.holder | Copyright 2024 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject.courseID | DTE-3900 | |
dc.subject | fine-tuning | en_US |
dc.subject | large language models | en_US |
dc.subject | retrieval-augmented generation | en_US |
dc.subject | icd-10 | en_US |
dc.subject | quantization | en_US |
dc.subject | low-rank adaptation | en_US |
dc.title | Fine-tuning Large Language Models on historical causes of death data | en_US |
dc.type | Master thesis | en |
dc.type | Mastergradsoppgave | no |