dc.contributor.author: Pedersen, Bjørn-Richard
dc.contributor.author: Johansen, Rigmor Katrine
dc.contributor.author: Holsbø, Einar Jakobsen
dc.contributor.author: Sommerseth, Hilde Leikny
dc.contributor.author: Bongo, Lars Ailo Aslaksen
dc.date.accessioned: 2024-09-05T09:29:55Z
dc.date.available: 2024-09-05T09:29:55Z
dc.date.issued: 2024-04-04
dc.description.abstract: Any machine learning method for transcribing historical text requires manual verification and correction, which is often time-consuming and expensive. Our aim is to make it more efficient. Previously, we developed a machine learning model to transcribe 2.3 million handwritten occupation codes from the Norwegian 1950 census. Here, we manually review the 90,000 codes (3%) for which our model had the lowest confidence scores. We allocated these codes to human reviewers, who used our custom annotation tool to review them. The reviewers agreed with the model's labels 31.9% of the time. They corrected 62.8% of the labels, and 5.1% of the images were uncertain or assigned invalid labels. 9,000 images were reviewed by multiple reviewers, resulting in an agreement of 86.4% and a disagreement of 9%. The results suggest that one reviewer per image is sufficient. We recommend that reviewers indicate any uncertainty about the label they assign to an image by adding a flag to their label. Our interviews show that the reviewers performed internal quality control and found our custom tool to be useful and easy to operate. We provide guidelines for efficient and accurate transcription of historical text by combining machine learning and manual review. We have open-sourced our custom annotation tool and made the reviewed images open access. [en_US]
dc.identifier.citation: Pedersen, Johansen, Holsbø, Sommerseth, Bongo. More Efficient Manual Review of Automatically Transcribed Tabular Data. Historical Life Course Studies. 2024;14:3-15 [en_US]
dc.identifier.cristinID: FRIDAID 2268298
dc.identifier.doi: 10.51964/hlcs15456
dc.identifier.issn: 2352-6343
dc.identifier.uri: https://hdl.handle.net/10037/34528
dc.language.iso: eng [en_US]
dc.publisher: IISH [en_US]
dc.relation.journal: Historical Life Course Studies
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2024 The Author(s) [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0 [en_US]
dc.rights: Attribution 4.0 International (CC BY 4.0) [en_US]
dc.title: More Efficient Manual Review of Automatically Transcribed Tabular Data [en_US]
dc.type.version: publishedVersion [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]


Unless otherwise stated, this item is licensed under Attribution 4.0 International (CC BY 4.0).