Record linkage of Norwegian historical census data using machine learning
Permanent link
https://hdl.handle.net/10037/28399
Date
2022-08-02
Type
Master thesis
Author
Park, Narae
Abstract
The Historical Population Register (HPR) is a project to build longitudinal life histories of individuals by integrating historical records of people in Norway since the 19th century. This study attempted to improve the linking rate between the 1875-1900 censuses in HPR, which is currently low, using machine learning approaches. To this end, I developed a machine learning model for linking that is suitable for the Norwegian census and tested various algorithms, feature sets, and match selection options. I compared the results in terms of performance and match size, and also examined how well the linked data represent the entire population. The results showed that the linking rate of HPR can be significantly improved by machine learning approaches while maintaining high accuracy. In addition, this study provides a reference for future work by demonstrating how performance varies with the feature set and match selection. On the other hand, this study also revealed that linked data generally do not represent the census population, and that the characteristics and degree of bias vary depending on the linking algorithm, suggesting that caution is needed when using linked data for research.
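
To make the terms "feature set", "match selection", and "linking rate" concrete, the following is a minimal sketch and not the thesis's implementation (the actual source code is in the repository listed under Description). The toy records, the field names, the hand-set weights in `score`, and the `threshold` and `margin` parameters are all assumptions for illustration; in a real setup the scoring function would be a classifier trained on known links.

```python
from difflib import SequenceMatcher

# Hypothetical toy records; field names are illustrative, not the HPR schema.
census_1875 = [
    {"id": "a1", "first": "Ole", "last": "Hansen", "birth_year": 1850},
    {"id": "a2", "first": "Anna", "last": "Olsdatter", "birth_year": 1862},
    {"id": "a3", "first": "Peder", "last": "Larsen", "birth_year": 1840},
]
census_1900 = [
    {"id": "b1", "first": "Ole", "last": "Hanssen", "birth_year": 1851},
    {"id": "b2", "first": "Anne", "last": "Olsdatter", "birth_year": 1862},
    {"id": "b3", "first": "Ola", "last": "Hansen", "birth_year": 1870},
]


def name_sim(a, b):
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(r1, r2):
    """Feature set for one candidate pair (an assumed, minimal choice)."""
    return {
        "first_sim": name_sim(r1["first"], r2["first"]),
        "last_sim": name_sim(r1["last"], r2["last"]),
        "birth_diff": abs(r1["birth_year"] - r2["birth_year"]),
    }


def score(f):
    """Stand-in for a trained classifier's match probability.

    The weights are hand-set assumptions, not learned values.
    """
    age_term = max(0.0, 1.0 - f["birth_diff"] / 5.0)
    return 0.35 * f["first_sim"] + 0.35 * f["last_sim"] + 0.30 * age_term


def select_matches(source, target, threshold=0.8, margin=0.1):
    """Match selection: keep the best candidate only if it scores at least
    `threshold` and beats the runner-up by at least `margin`."""
    links = {}
    for r1 in source:
        scored = sorted(
            ((score(features(r1, r2)), r2["id"]) for r2 in target),
            reverse=True,
        )
        best_score, best_id = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        if best_score >= threshold and best_score - runner_up >= margin:
            links[r1["id"]] = best_id
    return links


links = select_matches(census_1875, census_1900)
linking_rate = len(links) / len(census_1875)
print(links, linking_rate)
```

In this sketch, raising `threshold` or `margin` rejects more ambiguous candidates, which tends to trade match size for accuracy; this is the kind of trade-off the abstract refers to when it compares match selection options.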
Description
For errata and source code: https://github.com/uit-hdl/rhd-linking.
Publisher
UiT The Arctic University of Norway
Copyright 2022 The Author(s)