
Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes

Permanent link
https://hdl.handle.net/10037/24000
DOI
https://doi.org/10.51964/hlcs11331
View/Open
article.pdf (1.723Mb)
Published version (PDF)
Date
2022-01-06
Type
Journal article
Peer reviewed

Author
Pedersen, Bjørn-Richard; Holsbø, Einar; Andersen, Trygve; Shvetsov, Nikita; Ravn, Johan; Sommerseth, Hilde Leikny; Bongo, Lars Ailo
Abstract
Machine learning approaches achieve high accuracy for text recognition and are therefore increasingly used for the transcription of handwritten historical sources. However, using machine learning in production requires a streamlined end-to-end pipeline that scales to the dataset size and a model that achieves high accuracy with few manual transcriptions. The correctness of the model results must also be verified. This paper describes the lessons we learned while developing, tuning and using the Occode end-to-end machine learning pipeline for transcribing 2.3 million handwritten occupation codes from the Norwegian 1950 population census. We achieve an accuracy of 97% for the automatically transcribed codes, and we send 3% of the codes for manual verification. We verify that the occupation code distribution found in our results matches the distribution found in our training data, which should be representative of the census as a whole. We believe our approach and lessons learned may be useful for other transcription projects that plan to use machine learning in production.
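The two checks mentioned in the abstract (sending some codes for manual verification, and comparing the code distribution of the results with that of the training data) can be illustrated with a minimal sketch. This is not the Occode pipeline itself: the confidence threshold, the function names, and the use of total variation distance as the comparison measure are all assumptions made for illustration, and the abstract does not state how the authors perform either step.

```python
from collections import Counter


def route_predictions(predictions, threshold=0.9):
    """Split (code, confidence) pairs into auto-accepted and manual-review lists.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    accepted, manual = [], []
    for code, confidence in predictions:
        if confidence >= threshold:
            accepted.append((code, confidence))
        else:
            manual.append((code, confidence))
    return accepted, manual


def relative_frequencies(codes):
    """Return the share of each code in an iterable of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two frequency tables.

    0.0 means identical distributions, 1.0 means no overlap. This measure is
    an assumed stand-in for whatever comparison the authors used.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

In use, one would call `route_predictions` on the model output, pass the accepted codes and the manually transcribed training codes through `relative_frequencies`, and inspect `total_variation_distance` between the two tables; a small value suggests the automatic results follow the same occupation code distribution as the training data.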
Citation
Pedersen B, Holsbø EJ, Andersen T, Shvetsov N, Ravn J, Sommerseth HL, Bongo LA. Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes. Historical Life Course Studies. 2022;11:1-17
Collections
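  • Artikler, rapporter og annet (arkeologi, historie, religionsvitenskap og teologi) [301] (English: Articles, reports and other items (archaeology, history, religious studies and theology))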
Copyright 2022 The Author(s)

UiT The Arctic University of Norway
The University Library