dc.contributor.author: Pedersen, Bjørn-Richard
dc.contributor.author: Islam, Maisha
dc.contributor.author: Kristoffersen, Doris Tove
dc.contributor.author: Bongo, Lars Ailo Aslaksen
dc.contributor.author: Garrett, Eilidh
dc.contributor.author: Reid, Alice
dc.contributor.author: Sommerseth, Hilde Leikny
dc.date.accessioned: 2025-01-20T12:48:02Z
dc.date.available: 2025-01-20T12:48:02Z
dc.date.issued: 2024-10-31
dc.description.abstract: This paper investigates the feasibility of using pre-trained generative Large Language Models (LLMs) to automate the assignment of ICD-10 codes to historical causes of death. Due to the complex narratives often found in historical causes of death, this task has traditionally been manually performed by coding experts. We evaluate the ability of GPT-3.5, GPT-4, and Llama 2 LLMs to accurately assign ICD-10 codes on the HiCaD dataset that contains causes of death recorded in the civil death register entries of 19,361 individuals from Ipswich, Kilmarnock, and the Isle of Skye in the UK between 1861–1901. Our findings show that GPT-3.5, GPT-4, and Llama 2 assign the correct code for 69%, 83%, and 40% of causes, respectively. However, we achieve a maximum accuracy of 89% by standard machine learning techniques. All LLMs performed better for causes of death that contained terms still in use today, compared to archaic terms. Also, they performed better for short causes (1–2 words) compared to longer causes. LLMs therefore do not currently perform well enough for historical ICD-10 code assignment tasks. We suggest further fine-tuning or alternative frameworks to achieve adequate performance. [en_US]
dc.identifier.citation: Pedersen, Islam, Kristoffersen, Bongo, Garrett, Reid, Sommerseth. Coding Historical Causes of Death Data with Large Language Models. Lecture Notes in Computer Science (LNCS). 2024;Bridging the Gap Between AI and Reality [en_US]
dc.identifier.cristinID: FRIDAID 2342336
dc.identifier.doi: 10.1007/978-3-031-73741-1_3
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: https://hdl.handle.net/10037/36234
dc.language.iso: eng [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.journal: Lecture Notes in Computer Science (LNCS)
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2025 The Author(s) [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0 [en_US]
dc.rights: Attribution 4.0 International (CC BY 4.0) [en_US]
dc.title: Coding Historical Causes of Death Data with Large Language Models [en_US]
dc.type.version: publishedVersion [en_US]
dc.type: Journal article [en_US]
dc.type: Journal article (Norwegian: Tidsskriftartikkel) [en_US]
dc.type: Peer reviewed [en_US]


Unless otherwise stated, this item's license is described as Attribution 4.0 International (CC BY 4.0)