Automating Historical Source Transcription

Permanent link
https://hdl.handle.net/10037/24070
DOI
https://doi.org/10.51964/hlcs9568
Published version (PDF)
Date
2021-03-31
Type
Journal article
Peer reviewed

Author
Thorvaldsen, Gunnar
Abstract
Transcribing the 1950 Norwegian census, with its 3.3 million person records, and linking it to the Central Population Register (CPR) provides longitudinal information about significant population groups during the understudied mid-20th century. Because this source is closed to the public, we receive no help from genealogists and instead use machine learning techniques to semi-automate the transcription. First, the scanned manuscripts are split into individual cells and multiple names are divided. With the birthdates transcribed manually in India, a lookup routine searches for families with matching sets of birthdates in the 1960 census and the CPR. After manual checks with GUI routines, the names are copied to the text version of the 1950 census, and the links to the CPR are stored alongside them. Other fields, such as occupation or gender, contain numeric or letter codes and are transcribed wholesale by routines that interpret the layout of the graphical images. Work employing these methods has also started on the 1930 census, which is the last of the Norwegian censuses to be transcribed.
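The birthdate lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the record layout, the function names (`build_birthdate_index`, `candidate_matches`, `copy_names`), the `min_shared` threshold, the identifier format, and the sample people are all assumptions made for illustration. The idea shown is only that a household known by its birthdates can be matched against a reference source (standing in for the 1960 census or the CPR) by counting shared birthdates, after which names and links are copied for unambiguous hits and the rest is left for manual checking.

```python
from collections import Counter, defaultdict
from datetime import date


def build_birthdate_index(reference_households):
    """Map each birthdate to the ids of reference households containing it."""
    index = defaultdict(set)
    for household_id, members in reference_households.items():
        for member in members:
            index[member["birthdate"]].add(household_id)
    return index


def candidate_matches(birthdates_1950, index, min_shared=2):
    """Rank reference households by the number of birthdates they share
    with one 1950 household; min_shared is an assumed threshold."""
    shared = Counter()
    for birthdate in set(birthdates_1950):
        for household_id in index.get(birthdate, ()):
            shared[household_id] += 1
    return [(hid, n) for hid, n in shared.most_common() if n >= min_shared]


def copy_names(birthdates_1950, reference_members):
    """Copy name and link for birthdates that occur exactly once in the
    reference household; ambiguous or missing ones stay empty for manual checks."""
    by_birthdate = defaultdict(list)
    for member in reference_members:
        by_birthdate[member["birthdate"]].append(member)
    linked = []
    for birthdate in birthdates_1950:
        hits = by_birthdate.get(birthdate, [])
        if len(hits) == 1:
            linked.append({"birthdate": birthdate,
                           "name": hits[0]["name"],
                           "cpr_id": hits[0]["cpr_id"]})
        else:
            linked.append({"birthdate": birthdate, "name": None, "cpr_id": None})
    return linked


# Made-up sample data; names and identifiers are hypothetical.
reference = {
    "H1": [
        {"name": "Ola Hansen", "birthdate": date(1910, 5, 2), "cpr_id": "A1"},
        {"name": "Kari Hansen", "birthdate": date(1912, 8, 14), "cpr_id": "A2"},
        {"name": "Per Hansen", "birthdate": date(1938, 1, 30), "cpr_id": "A3"},
    ],
    "H2": [
        {"name": "Nils Berg", "birthdate": date(1910, 5, 2), "cpr_id": "B1"},
        {"name": "Anna Berg", "birthdate": date(1915, 3, 9), "cpr_id": "B2"},
    ],
}

index = build_birthdate_index(reference)
household_1950 = [date(1910, 5, 2), date(1912, 8, 14), date(1938, 1, 30)]
matches = candidate_matches(household_1950, index)
linked = copy_names(household_1950, reference[matches[0][0]])
```

A single shared birthdate is common by chance, which is why this sketch requires at least two before proposing a candidate. Twins share a birthdate, so the sketch leaves such rows unlinked rather than guessing.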
Publisher
Openjournals
Citation
Thorvaldsen G. Automating Historical Source Transcription. Historical Life Course Studies. 2021;10(3):59-63
Copyright 2021 The Author(s)
