Faculty of Health Science

Educational inequality in population-based health studies
Insights from the Tromsø Study

Chi Quynh Vo
A dissertation for the degree of Philosophiae Doctor – November 2023

Acknowledgments

I want to thank High North Population Studies for funding my PhD project. Being a part of this interdisciplinary project has been an interesting and instructive journey!

A special thank you to my team of supervisors: Elise, Torbjørn, Hilde, and Per-Jostein. It has been a privilege to work under your guidance. Thanks for all the interesting discussions, thorough feedback, and support you have shown over the past four years. All of you inspire me in so many ways. Hilde, a special thank you for starting the monthly PhD seminar. This was my “safe zone.” As a PhD student, I really needed a place like that.

A special thanks to my co-author, Tom Wilsgaard, for all the discussions we had around the papers and for giving me good statistical advice. You have been very understanding and patient.

To all my co-authors and professors in the research group “Social Inequality in Health,” who have given me good suggestions and constructive feedback to improve my work: I feel blessed to have worked with such a fantastic group of researchers.

To my “Social Inequality in Health” research group friends Emre, Petja, Sigbjørn, and Bjørn-Richard: I came in as the last PhD student in the research group, and you welcomed me with your kindness and introduced me to a world of inequalities. Thanks for all the interesting discussions about everything, your support, and, most of all, your friendship.

To all my colleagues at ISM, thank you for the discussions, encouragement, help with diverse challenges, and support. Ina, Ragnhild, and Torill, thanks for being so supportive and taking good care of me. It has been such a joy to work with you and to discuss teaching and research experience with you. Sondre, thanks for being a good colleague and for all the cinnamon rolls!
To my dear family, thank you for your unconditional support and love. Thanks for hearing me complain, wiping away my tears, and giving me motivation to finish this journey. Mom and Dad, you frequently visited me in Tromsø because you were concerned that I wouldn’t eat properly. Thank you for always taking good care of me and being there for me no matter what. You two are the inspiration in my life!

Table of Contents

1 Introduction
1.1 Social inequality in health
1.2 Indicators of social inequality in health
1.3 Educational inequality in health
1.4 The education variable in epidemiologic research
1.5 Education in population-based studies
1.6 Rationale and aims
2 Materials and Methods
2.1 The Tromsø Study
2.2 Measurements of educational level
2.3 Statistics Norway – national institute of Norway
2.4 Study samples and variables
2.5 Ethics
2.6 Statistical analyses
3 Results – main results of papers
3.1 Paper I
3.2 Paper II
3.3 Paper III
4 Discussion
4.1 Discussion of methodology
4.1.1 Data considerations
4.1.2 Study design
4.1.3 Internal validity
4.1.4 Selection bias
4.1.5 Recall and information bias
4.1.6 Confounding
4.1.7 External validity
4.1.8 Statistical considerations
4.2 Summary of the results
4.3 Discussion of main results
5 Conclusion
6 Implications and further perspectives
References
Paper I-III
Appendix

List of Tables

Table 1. Overview of surveys in the Tromsø Study this thesis is based on
Table 2. The Norwegian Standard Classification of Education as of 2023 (109).
Table 3. Calculation of completeness and correctness by Hogan and Wagner (117).

List of Figures

Figure 1. Directed acyclic graph of variables in paper III
Figure 2. Histogram of residuals for the dependent variable (A) and the random intercept (B)
Abbreviations

BMI: body mass index
CI: confidence interval
CVD: cardiovascular disease
DAGs: directed acyclic graphs
LLDs: lipid-lowering drugs
mmol/L: millimole per liter
NUS: The Norwegian Standard Classification of Education
OR: odds ratio
PPV: positive predictive value
REK North: Regional Committees for Medical and Health Research Ethics North
SD: standard deviation
SES: socioeconomic status
SSB: Statistics Norway

List of Papers

This thesis is based on the following papers, which hereafter are referred to as papers I, II, and III:

I. Vo CQ, Samuelsen PJ, Sommerseth HL, Wisløff T, Wilsgaard T, Eggen AE. Comparing the sociodemographic characteristics of participants and non-participants in the population-based Tromsø Study. BMC Public Health. doi: 10.1186/s12889-023-15928-w

II. Vo CQ, Samuelsen PJ, Sommerseth HL, Wisløff T, Wilsgaard T, Eggen AE. Validity of self-reported educational level in the Tromsø Study. Scandinavian Journal of Public Health, 2022. doi: 10.1177/14034948221088004

III. Vo CQ, Wilsgaard T, Samuelsen PJ, Sommerseth HL, Mathiesen EB, Eggen AE, Wisløff T. Longitudinal cholesterol trends across educational groups: the influence of lipid-lowering drugs in a population-based Tromsø Study 1994–2016. (Submitted, under review).

Abstract

Background: Population-based studies have provided insight into the sociodemographic profile and health status of the population. However, ensuring that population-based studies effectively represent the entire spectrum of education levels is challenging. Participation in these studies is voluntary, and some groups choose not to participate or provide inaccurate information about their education, which can lead to inaccurate estimates of the association between education and health outcomes.
Aim: The aims of this thesis were to assess the sociodemographic characteristics of participants and non-participants in the Tromsø Study (Tromsø7, 2015–16); to examine education as a proxy for socioeconomic status (SES) by investigating the completeness and correctness of self-reported education compared with national registry data and exploring the consequences of using these two data sources for educational trends in cardiometabolic diseases; and to explore the longitudinal educational gradient in total cholesterol levels and the potential influence of lipid-lowering drug (LLD) treatment on this gradient.

Methods: In paper I, the sociodemographic characteristics of participants and non-participants were examined by linking the Tromsø7 invitation file to data from Statistics Norway (SSB). Logistic regression was applied to explore the association between these characteristics and participation. In paper II, Tromsø7 data were also linked to SSB data to compare self-reported and SSB-recorded educational levels. Completeness and correctness were used to measure the validity of self-reported educational level. Moreover, multivariable logistic regression was used to compare educational trends in cardiometabolic diseases between self-reported and SSB-recorded educational levels. In paper III, linear mixed models were used to assess longitudinal change in cholesterol according to educational level in sex-specific models, stratified by whether participants were treated with LLDs.

Results: Sociodemographic characteristics varied between participants and non-participants. Non-participants were more likely to be men, aged 40–49 or 80–99 years, unmarried, widowed, separated, or divorced, and foreign-born; they were also more likely to have lower education and income, to rent their homes, and to live in low-SES areas. Self-reported education was found to be adequately complete and correct; however, it yielded a weaker association with cardiometabolic diseases than the registry data.
The educational gradient in total cholesterol attenuated over time and disappeared in Tromsø7. There was no educational gradient in cholesterol levels among LLD users in any survey or age group.

Conclusion: Sociodemographic differences in participation must be considered, particularly when investigating the relationship between SES and health outcomes. The self-reported educational level was adequately complete and correct. Educational trends in the risk of cardiometabolic diseases were observed in both the self-reported and the SSB data. No educational gradient was observed among LLD users, which suggests a potential role of LLD treatment in reducing social inequality in health.

Summary (Sammendrag)

Background: Population-based studies provide insight into the sociodemographic profile and health status of a population. However, it is a challenge to ensure that population-based studies represent people at all educational levels. Participation in these studies is voluntary, which means that some groups choose not to participate or provide inaccurate information about their education. This can potentially lead to inaccurate estimation of the association between education and health outcomes.

Aim: The aim was to compare the sociodemographic characteristics of participants and non-participants in the Tromsø Study (Tromsø7, 2015–16); to examine the validity of self-reported educational level and explore the consequences of using self-reported and registry data for educational trends in cardiometabolic diseases; and, finally, to examine the educational gradient in total cholesterol level and explore the influence of lipid-lowering drugs (LLDs) over time.

Methods: In paper I, sociodemographic differences between participants and non-participants were examined by linking the Tromsø7 invitation file to Statistics Norway (SSB). We estimated odds ratios (ORs) for participation. In paper II, we also linked Tromsø7 to SSB to compare self-reported and SSB-registered educational levels. Completeness and correctness were used to measure the validity of self-reported educational level. Multinomial logistic regression was used to examine the educational trend in cardiometabolic diseases between Tromsø7 and SSB. In paper III, we used linear mixed models to examine the longitudinal change in cholesterol level in the different educational groups and the effect of LLDs.

Results: The sociodemographic profile varies between participants and non-participants. Non-participants were characterized by being men, being in the age groups 40–49 and 80–99 years, being unmarried, widowed, or separated/divorced, and being born outside Norway; they had lower education and income, were renters, and lived in low socioeconomic areas. Self-reported educational level is adequately complete and correct but may give weaker associations between educational level and cardiometabolic diseases compared with registry data. The educational gradient weakened over time, and no gradient was observed in Tromsø7. No educational gradient in cholesterol levels was observed among LLD users in any of the surveys or age groups.

Conclusion: Sociodemographic differences in participation do not appear to affect the estimation of the association between exposure and health outcomes to any great extent. Self-reported educational level is adequately complete and correct. Educational trends in cardiometabolic diseases were observed in both the Tromsø7 and the SSB data. The educational gradient in cholesterol levels observed among non-users was not observed among LLD users. The findings indicate that LLD treatment has an equalizing effect on social inequality in health.

1 Introduction

When comparing socioeconomic groups in society, one can observe systematic differences in health. The higher a group’s education and income, the larger the proportion of its members who are in good health. These differences are known as social inequality in health (1).
Social inequality in health is a major concern of public health (2) and therefore also of epidemiology, given its focus on the causes of the distribution of health and disease in populations. Epidemiological research is one of many areas of study that provide evidence for understanding the causes of social inequality in health and how to reduce it (3). Information on an individual’s socioeconomic status (SES) is often collected through self-administered questionnaires in epidemiological population-based studies. The people from diverse backgrounds who participate in epidemiological research should represent the study population (4), and the questionnaire should gather the intended information, so that the estimated exposure-outcome associations are valid.

In this thesis, I address sociodemographic differences in participation and the validity of the variable serving as a proxy for SES in the Tromsø Study. I then apply this knowledge in empirical research by investigating the educational gradient in total cholesterol levels and the longitudinal influence of lipid-lowering drugs (LLDs). Hence, the thesis focuses on education as a proxy for SES, and education is its primary exposure. The three aims of the thesis are described in more detail at the end of this chapter.

This chapter first presents social inequality in health and the indicators usually used to measure it. I then narrow the focus to how education is used in epidemiological research and population-based studies.

1.1 Social inequality in health

Norway is characterized by an increasingly high standard of living for much of the population. It is one of the best countries to live in worldwide, with high scores on health parameters (5). Nevertheless, there are some significant and growing social inequalities. Social inequality in health can be described as unequally distributed resources due to social positions or statuses (6).
For example, a social gradient is present when people with higher education and income live healthier and longer lives than their peers with less education and lower income.

During the 1980s and 1990s, socioeconomic inequalities in mortality widened in many countries (7). Thus, differences between social groups within countries are often substantial today. It has been 16 years since Norway adopted its first National Strategy to Reduce Social Inequalities in Health (8). Even after its adoption, inequalities in life expectancy and mortality rates persist in Norway (5). For instance, men and women with higher education have 6.4 and 5 years longer life expectancy, respectively, than men and women with primary education in Norway (9). In European countries, there is a difference of 5 to 10 years in average life expectancy at birth (6). Social inequality in health is seen not only in Europe but also worldwide (10-14).

Social inequality in health currently represents one of the greatest challenges for public health worldwide because it leads to unfair suffering while hindering social progress and economic development (15). Health inequalities are believed to arise from disparities in social, economic, and environmental living conditions, constituting an unfair burden on certain groups of the population (16). Therefore, an in-depth investigation of the characteristics of the groups most affected by these inequalities is important.

1.2 Indicators of social inequality in health

Several social indicators have been used in epidemiological research to describe SES. The most frequently used are education, income, and occupation. In health research, they have been used to understand the complex interplay between SES and various health outcomes.

Occupation

In Western European countries, occupation is often categorized based on prestige, skills, social influence, and power (17). There are several ways to categorize occupation.
One possibility is the International Standard Classification of Occupations (ISCO-08), prepared by the International Labour Organization. The ISCO-08 divides occupations into ten major groups: managers; professionals; technicians and associate professionals; clerical support workers; service and sales workers; skilled agricultural, forestry, and fishery workers; craft and related trades workers; plant and machine operators and assemblers; elementary occupations; and armed forces occupations (18). One of the strengths of occupational data is that it is often available in routine data sources, including census data. However, a drawback is that occupation cannot be readily assigned to people who are currently unemployed (19).

Income

Income is a straightforward indicator of material resources. Individual and household income are two variables often used in research to describe a person’s SES. Individual income reflects an individual’s earning ability, while household income captures the living standards and life chances that household members experience by sharing goods and services (20, 21). The most typical income-based indicators are a household’s total cash income, measured over months and calendar years. Nonetheless, there are limitations to using household income, since household members may have unequal access to it (21). Moreover, accurately measuring family income through interviews or self-administered questionnaires may be difficult for several reasons, such as a lack of information about a spouse’s income or because income is a sensitive matter and the question is seen as intrusive. It can also be difficult for the person completing the questionnaire to accurately report the income of all family members, which increases the likelihood of measurement error. The proportion of (informative) missing values may therefore be higher than for education or occupation (22).
In addition, comparing income across populations and studies can be complex, as different studies collect different types of income (e.g., family disposable income vs. income from work only, and net vs. gross).

Education

Education is often measured as years completed or formal schooling credentials (17). Education is commonly used as an indicator of SES in health inequalities research because it is generally accepted to be easy to measure and is normally fixed in early adulthood. Additionally, in most nations, education shapes individuals’ future occupational positions and earning potential (23). Once established, the level of education rarely changes, making it less applicable than occupation and income when it comes to tackling important intervention questions (24). People’s occupations and incomes may change throughout their lifetimes, for instance because of their health or market fluctuations, whereas their level of education likely remains relatively stable as they age. Thus, when people reach retirement age, their SES may change to that of a pensioner while their educational level remains constant (25).

However, there are drawbacks to measuring education. Most societies have complex educational systems that have often changed over time. Education is also difficult to compare over time because the percentage of people with higher education has increased, which makes age an important factor when investigating educational differences (26). Such cohort effects can be essential and should be considered in epidemiological studies (19). Furthermore, years of education or levels of attainment may contain no information about the quality of the educational experience, which can be crucial when considering the impact of education on health outcomes tied to knowledge, cognitive skills, and analytic abilities (19).
Other indicators

As interest in social inequality has grown, more social indicators have been used to investigate its association with health inequality, because various health outcomes are associated not just with education, income, and occupation but also with other sociodemographic indicators. The term “sociodemographic” refers to a combination of social and demographic indicators that define individuals within a specific group or population. Other commonly used indicators in epidemiological research are age (27, 28), sex (29, 30), race and ethnicity (31, 32), marital status (33, 34), residential status (35), geographic area (36, 37), and neighborhood deprivation/area-level SES (38-40). Area-level characteristics are derived from individual-level or small-area data and can categorize regions along the spectrum from disadvantaged to prosperous. They also serve as a proxy for the SES of those living in such areas (41). These indicators are often used to compare participants and non-participants in epidemiological population-based cohorts. These diverse characteristics provide valuable insights about both individuals and their communities (42).

In epidemiological research, lifestyle (43, 44), disease (45, 46), and health behaviors (47-49) are correlated with sociodemographic indicators. Each indicator is often related to aspects of socioeconomic stratification and may be more or less relevant to different health outcomes at different stages of the life course. Hence, the choice of indicator depends on the aim or research question and on which indicator best represents an individual’s SES.

1.3 Educational inequality in health

Extensive epidemiological research over the decades has established that education is strongly correlated with health. Individuals with higher educational attainment live healthier and longer lives than individuals with lower education (50).
The general pattern of increasing education being related to better health is referred to as the “educational gradient” (51). Education shapes lives and is critical to lifting people out of poverty and reducing socioeconomic inequalities (52). With decreasing education, individuals may experience more unfavorable health across a range of outcomes, including depressive symptoms (53), cardiovascular risk factors (54-56), and poor self-rated health (57). Moreover, educational attainment is associated with health literacy, which has been hypothesized to lie on the pathway between education and health (58). Poor health in early life can limit educational attainment. However, unlike occupation and income, which can change as a result of ill health and thus give rise to reverse causation, educational level has the advantage of being unlikely to be influenced by adverse health conditions in adulthood (59).

Despite perceptions of education as a straightforward measure for social epidemiological purposes, the relationship between education and health is complex, with underlying psychosocial, material, and behavioral mechanisms and pathways (23). Hahn and Truman (60) constructed a model with three major pathways through which education, in the form of knowledge, problem-solving skills, emotional awareness, self-regulation, personal values, and interactional skills, is linked to health outcomes in adulthood. The first pathway is the psychosocial environment, which includes a sense of control, social standing, and social support, reflecting and bolstering capacity and agency. The second pathway is work, through which individuals may achieve satisfaction and income while gaining access to many health-related resources. The third pathway is healthy behavior, which may protect an individual against health risks and facilitate negotiation with the healthcare system (60).
The knowledge and skills attained through education may affect people’s cognitive functioning, making them more receptive to health messages and more able to communicate with and access appropriate health services (19). Education may improve the ability to navigate the healthcare system and obtain access to optimal care (61) and may facilitate the understanding of therapeutic measures (62), resulting in better compliance and a higher commitment to treatment. Patient adherence to medication treatment has been associated with educational attainment, since education contributes to understanding a medication’s effects, side effects, and usage (63).

1.4 The education variable in epidemiologic research

In epidemiological research, education is often used as a proxy for SES because it is believed to capture individuals’ knowledge-related assets. Education is often used as an exposure (64), covariate (65), or mediator (66) in statistical analyses. Thus, an inaccurate measure of education can affect the study results in several ways. First, errors in estimates may result in false claims of association. Second, if education is used as an exposure, the results may not accurately reflect the true effect of the exposure on the outcome of interest. Lastly, if education is used as a covariate or mediator, mismeasurement can lead to reduced precision and residual confounding. It can also introduce bias in the statistical adjustment, which may then fail to account for the true association between variables, leading to inaccurate or misleading results (67). Hence, education must be accurately measured and transparently reported, since it can have significant implications for the interpretation of research findings and the development of public health policies.

Pharmacoepidemiology is “the study of interactions between drugs and the human population, investigating, in real life conditions of life, benefits, risks and use of drugs” (68).
The pharmacoepidemiologic literature has demonstrated an association between SES and medication treatment (69). Cardiovascular disease (CVD) is the leading cause of death globally and the second leading cause of death in Norway after cancer (70). In Norway, high total cholesterol ranked as the fourth most important risk factor for mortality in 2016 (71). Medication is a critical intervention for preventing and treating CVD, and LLDs help control elevated levels of different lipid forms in patients with hyperlipidemia (72). Understanding the efficacy and potential side effects of medication treatment has been associated with education (73, 74). Specifically, educational gradients in drug consumption have been reported to be similar to socioeconomic gradients in disease prevalence and incidence. For instance, a Swedish study reported an educational gradient in dispensed drugs, with individuals with lower levels of education having the highest odds of being prescribed drugs related to CVD, such as antihypertensive drugs and LLDs (73). Moreover, low SES has been positively associated with nonadherence (75, 76), which can widen the social differences in LLD treatment. On the other hand, some researchers have suggested that LLDs reduce the educational gradient (56, 77), indicating the importance of medication treatment in reducing social health inequality.

1.5 Education in population-based studies

Population-based studies play an essential role in enhancing our understanding of associations, causation, and the prevalence and incidence of disease, while helping to identify risk factors (78). Results from population-based studies provide a wide spectrum of knowledge on various health outcomes (79-82). Moreover, they provide knowledge that has implications for health policy. Hence, it is important to achieve sufficient participation to ensure that the sample accurately represents the population under investigation.
Participation in research is voluntary, and it has been reported that individuals with lower education participate less in population-based studies (83-85). Systematic differences between participants and non-participants can challenge the validity and generalizability of the results. Investigating the characteristics of non-participants can provide valuable insight and enable the development of strategies to increase participation among specific groups, not only educational groups. However, research on non-participation in population-based studies is often limited by high costs, for example, the cost of linking to a registry. Nonetheless, studies in some countries have investigated the characteristics of non-participants. Certain characteristics (e.g., being female, being married, having a high education and income, owning a home, and residing in a privileged neighborhood) have been extensively documented in the literature as factors influencing participation in population-based studies (86-89). Individuals with these characteristics are more likely to accept invitations to participate in health studies. For instance, individuals with higher education tend to have greater trust in science and are more motivated to contribute to health-related research (90), while declining health has been reported as a reason for lower participation among those with lower education (91).

Moreover, measures of participants’ education have played a crucial role in population-based studies. They have described the population while providing a critical explanatory variable for diverse health-related outcomes (92-95). Self-reported questionnaires are commonly used in population-based studies to collect data. While they provide valuable participant information, self-reported data are prone to various biases and can potentially yield less reliable findings (96).
Participants with lower education have reported feeling overwhelmed by comprehensive questionnaires and being uncertain about the meaning of the questions (97). Given that a self-reported questionnaire targets an entire population, including individuals from diverse SES backgrounds, these instruments must be designed to be easily comprehensible. A common risk arises when a question intended to measure a particular aspect fails to do so due to the complexity of the phrasing or the participant’s misunderstanding of the question. Regardless of whether participants intentionally or unintentionally self-report wrong personal information, it affects the questionnaire’s validity and introduces measurement error, which can weaken the study’s internal validity and make it more difficult to establish a clear cause-and-effect relationship. Validity (i.e., data accuracy) can be determined by comparing self-reported data to registry data; however, this method requires comprehensive registry data that allows linkage, which is limited in most countries (98). Validation studies of self-reported education have previously been conducted in countries like Switzerland (99), the UK (100), and the US (101). However, to my knowledge, these studies have not used population-based data. Such an investigation is important to contextualize the accuracy of self-reported education within a population-based setting.

1.6 Rationale and aims

There is a common understanding in the literature that individuals with higher educational attainment have fewer risk factors related to CVD, and this information is often provided by population-based studies. Yet, what if the information available from these studies regarding education and health is compromised by bias? This raises questions about the extent to which these biases might affect the interpretation and implications of research findings.
The overall aim of this thesis was to investigate how the Tromsø Study reflects the target population and produces valid education information. Moreover, the study sought to determine how these findings affected the longitudinal education gradient in cholesterol levels, especially in the context of LLD use. The specific aims are delineated in three papers, listed below:

Paper I: A comparison of the sociodemographic characteristics of participants and non-participants of the seventh survey in the Tromsø Study (Tromsø7, 2015-16), a population-based health survey.

Paper II: An investigation of the completeness and correctness of self-reported educational level in the Tromsø Study, using data from Statistics Norway (SSB), and an exploration of the consequences of using these two data sources on educational trends in cardiometabolic diseases.

Paper III: An investigation of the longitudinal educational gradient in total cholesterol levels and whether this is affected by the use of LLDs.

2 Materials and Methods

2.1 The Tromsø Study

The Tromsø Study is an ongoing, longitudinal, population-based cohort study covering a broad range of health problems and diseases. Tromsø is the largest urban area in Northern Norway, and the Tromsø municipality is the eighth largest municipality in the country, with approximately 75,000 inhabitants (102). There was a rapid increase in CVD mortality in Norway from 1951 to 1970, and CVD mortality was especially high in the northern areas. In 1974, researchers at the newly established University of Tromsø started a population study aiming to identify the major CVD risk factors and to investigate why Northern Norway had a very high CVD mortality rate compared to other regions (103). Since then, the Tromsø Study has expanded to cover many conditions and purposes. Seven surveys, referred to as Tromsø1–7, were conducted between 1974 and 2016. Tromsø8 is planned for 2025–2026.
The Tromsø Study has been extensively described in cohort profiles (104-106) and on the Tromsø Study homepage, www.tromsostudy.com. Individuals with registered home addresses in Tromsø municipality meeting the age criteria (Table 1) have been recruited to the Tromsø Study. The Tromsø Study’s data collection has included questionnaires and interviews, measurements, biological sampling, and clinical examinations. From Tromsø4 onward, more extensive clinical examinations were added, and each survey included two visits. This thesis includes participation from Tromsø4 (1994–1995), Tromsø5 (2001), Tromsø6 (2007–2008), and Tromsø7 (2015–2016). The English translation of the full questionnaires is available on the home page.

Who was invited?

The first visit consisted of a basic examination of the total study sample invited to participate. The data collection included questionnaires and several examinations, such as blood pressure and pulse measurements, anthropometric measurements, pain sensitivity tests (Tromsø7 only), and the sampling of blood, urine, hair, and nose and throat swabs. Questionnaires covered previous and present diseases, symptoms, and health problems. Moreover, they inquired about the use of drugs, lifestyle choices, social and psychological functioning, and SES (106).

The second visit included a pre-defined subsample of the total invited sample, which differed from survey to survey. The data collection included additional biological sampling, physical function tests, cognitive tests, 12-lead ECGs, carotid artery ultrasonography, echocardiography, lung function tests, eye examinations (Tromsø6-7 only), and DXA scans (Tromsø6-7 only) (104). We did not use information from the second visit in the papers in this thesis.

Table 1.
Overview of the surveys in the Tromsø Study on which this thesis is based

Survey    Year      Invited   Participants   Age      Attendance
Tromsø4   1994–95   37,558    27,158         25–97    72%
Tromsø5   2001      10,353    8,130          30–89    79%
Tromsø6   2007–08   19,762    12,984         30–87    66%
Tromsø7   2015–16   32,591    21,083         40–104   65%

Tromsø4

Tromsø4 was conducted during 1994–1995 and was the largest of the Tromsø Study surveys. In Tromsø4, all inhabitants 25 years and older were invited to participate (N = 37,558), and 27,158 women and men participated (72% attendance). All participants received a brief questionnaire together with the invitation. Everyone in the 55–74 age group, along with 5–10% of those in the 25–54 and 75–85 age groups, was invited to undergo an extensive second examination (107).

Tromsø5

Tromsø5 was conducted in 2001. Two groups were invited: those who participated in the second visit of Tromsø4 and a small group that the Norwegian Institute of Public Health (NIPH) wanted to investigate as part of its nationwide health study (the NIPH panel; n = 1,916). The attendance rate was high among individuals who participated in Tromsø4 (89%) but somewhat lower for individuals in the NIPH panel (57%). All invited individuals received a three-page questionnaire together with the invitation. Different questionnaires were sent to people under 70 and those 70 years and older. All participants received another questionnaire for the health survey by Troms and Finnmark (2001–2002), which they were asked to complete (108).

Tromsø6

Tromsø6 was conducted during 2007–2008, and four groups were invited: all residents aged 40–42 and 60–97 years, a 40% random sample of subjects aged 43–59 years, a 10% random sample of subjects aged 30–39, and subjects who attended the second visit in Tromsø4. New questions were also introduced in Tromsø6 concerning household income and occupation status (105).

Tromsø7

Tromsø7 was conducted during 2015–2016, and all inhabitants 40 years of age or older were invited to participate (N = 32,591).
All invited participants received a personal invitation by mail, including an invitation letter, an information brochure, and a four-page questionnaire (Q1) in paper format. The invitation letter included a username and password for completing the questionnaire online (106).

2.2 Measurements of educational level

The Tromsø Study

In each survey, education was assessed by self-report, but the wording of the question differed between surveys. For instance, Tromsø4 and Tromsø6 inquired about education as follows: What is the highest level of education you have completed?

1. 7–10-year primary/secondary school, modern secondary school
2. Technical school, middle school, vocational school, and 1–2 years of senior high school
3. High school diploma (3–4 years)
4. College/university, less than 4 years
5. College/university, 4 or more years

In Tromsø5, education was assessed with the question “How many years of education have you had?” Participants wrote down their years of education.

Tromsø7 inquired about education as follows: What is the highest level of education you have completed?

1. Primary/secondary education (up to 10 years of schooling)
2. Upper secondary education (a minimum of 3 years)
3. Tertiary education, short: college/university less than 4 years
4. Tertiary education, long: college/university 4 years or more

Statistics Norway

The Norwegian Standard Classification of Education (NUS) was initially prepared for SSB in 1970 and subsequently revised in 1973, 1989, and 2000 (NUS2000). NUS, consisting of nine levels (0–8) along with a value (9) for an unspecified level, was used for grouping people’s education activities and education background (Table 2). The level classification was meant to provide the best possible picture of the structure of the Norwegian education system (109). In NUS2000, the digit “9” refers to “other” or “unspecified” for the level as well as for the broad field, narrow field, detailed field, and individual educational programs.

Table 2.
The Norwegian Standard Classification of Education as of 2023 (109).

SSB regrouped the NUS2000 codes for our study before linking the Tromsø data with theirs. The new codes were as follows: no education, primary education, upper secondary education, vocational education, university/college education, short university/college, long university/college, and unspecified/missing. Individuals in the category of no education were mainly immigrants who self-reported no education and Norwegians who did not complete primary education before the 1980s and did not continue with any education afterwards (110). Individuals registered with unspecified education were those for whom SSB did not have educational information.

2.3 Statistics Norway – national institute of Norway

SSB, established in 1876, is the national statistics institute of Norway and the main producer of official statistics. It is responsible for collecting, producing, and communicating statistics related to the economy, population, and society at national, regional, and local levels. SSB contributes to statistics on sociodemographic variables in the population. Since 1964, with the introduction of the Norwegian personal identification number for all citizens, it has been possible to identify individual characteristics with high accuracy in SSB’s database (111). While aggregated data are publicly available, person-specific data can be purchased upon request. In this study, the data in papers I and II were linked to SSB data, and both study periods were from 2015 to 2016 (Tromsø7). More detailed information on the SSB variables used in this thesis appears in Section 2.4.

2.4 Study samples and variables

Paper I

Paper I’s purpose was to compare the sociodemographic characteristics of participants and non-participants. A linkage using the unique 11-digit personal identification number of all individuals invited to Tromsø7 (N = 32,591) was created by SSB so that we could assess information on non-participants.
All variables in the analyses were collected from SSB; none were self-reported. We obtained information on sociodemographic variables normally included in epidemiological research: age, sex (women/men), and marital status (married, unmarried, widow(er), and divorced/separated). Country of birth included Norway, Western countries (Western Europe, North America, and Oceania), Eastern Europe (including Russia), and other countries (Asia, Africa, and South America). The regions of birth in Norway included Tromsø, Northern Norway (Finnmark, Troms, and Nordland), and South Norway (counties south of Nordland). We were mainly interested in the northern area, which is why all counties south of Nordland were grouped as one. Education was defined and grouped as in Tromsø7. Residential ownership was defined as owner (freeholder or part-/shareholder) or renter. Individual income was income from work, defined as the sum of employee income and net income from self-employment earned during the calendar year; cash-for-care and parental benefits were included. Household income was after-tax income calculated as the sum of wages and salaries, income from self-employment, property income, and transfers received minus the total assessed taxes and negative transfers of the household. Because of few observations, the lowest and highest income categories in our dataset were merged with the second lowest and second highest, respectively, for both individual and household income, leading to six categories: < 249,999 NOK, 250,000–349,999 NOK, 350,000–449,999 NOK, 450,000–549,999 NOK, 550,000–749,999 NOK, and ≥ 750,000 NOK. Residential ownership status, education, and household and individual income were needed to create two additional variables: individual SES and area SES. More information on how individual- and area-specific SESs were constructed appears in Section 2.6.
Paper II

Paper II included participants attending Tromsø7 (N = 21,083) to investigate the validity of self-reported education in a population study by linking Tromsø7 data to SSB, using the unique 11-digit personal identification number assigned to each resident of Norway at birth or immigration. The education variable we received from SSB was the regrouped NUS2000 code, which contained seven categories, as mentioned in Section 2.2 (Statistics Norway). We excluded those who did not self-report their education in Tromsø7 (n = 369), those who lacked information about education in SSB (n = 80), and those registered as having “no education, unspecified, and preschool education” (n = 19); these three categories were grouped as 0 in the SSB dataset. The final sample included 20,615 participants from Tromsø7. Upper secondary and vocational education were merged to compare the self-reported education variable in Tromsø7 with SSB. Variables in paper II included information about age and sex and self-reported information about myocardial infarction, cerebral stroke, angina pectoris, and diabetes (i.e., “Do you have or have you had a myocardial infarction/cerebral stroke/angina pectoris/diabetes?” Answers included “No,” “Yes, now,” and “Yes, previously.”). Self-rated health status was a holistic reflection of the person’s disease burden and mental and social conditions (i.e., “How do you generally consider your health to be?” Answer alternatives included “Very bad,” “Bad,” “Neither good nor bad,” “Good,” and “Excellent.”). Differences in SES misreporting were studied previously (112). We wanted to examine whether health status and self-rated health status were associated with education misreporting, so they were included in the analyses. We also included a variable asking whether participants were living with a spouse (i.e., “Do you live with a spouse/partner?” Answer choices were binary: “Yes” or “No”) since marital status was unavailable at the time of applying for variables.
Moreover, information on household income was requested (i.e., “What was the household’s total taxable income last year, including income from work, social benefits, and similar sources?” Possible answers included “Less than 150,000 NOK,” “150,000–250,000 NOK,” “251,000–350,000 NOK,” “351,000–450,000 NOK,” “451,000–550,000 NOK,” “551,000–750,000 NOK,” “751,000–1,000,000 NOK,” and “More than 1,000,000 NOK”).

Paper III

Paper III’s purpose was to investigate the longitudinal educational gradient in total cholesterol levels and whether it is affected by the use of LLDs. Data were from participants who attended Tromsø4 in 1994–1995 and participated in at least one of the surveys conducted in 2001 (Tromsø5), 2007–2008 (Tromsø6), and 2015–2016 (Tromsø7). Participants who used LLDs (n = 241), were over 80 years of age (n = 5), or had missing information on education (n = 103) in Tromsø4, and those who did not have a total cholesterol measurement in the surveys (n = 353), were excluded, leaving 17,550 participants between the ages of 25 and 78. Baseline information on education was collected using a self-reported questionnaire. Use of LLDs was assessed through the questionnaire (“Yes” or “No” in Tromsø4–Tromsø7) and a written list of brand-name medicines used regularly during the preceding four weeks; participants could bring the medications with them to the study center. Those who listed a brand name with ATC code C10 were included as LLD users. The questionnaire information was checked by health personnel at the examination site. Information regarding a history of stroke, myocardial infarction, and daily smoking was also collected using a self-report questionnaire. Trained personnel performed all clinical measurements and blood sampling in the Tromsø Study using similar procedures. Non-fasting venous blood samples were collected with the participant sitting, with a brief venous stasis applied to the upper arm and released before venipuncture.
The serum total cholesterol concentrations were analyzed within 24 hours (laboratory ISO certification NS-EN ISO 15189:2012) at the Department of Laboratory Medicine, University Hospital of North Norway, Tromsø. Weight and height were measured at the examination site as well. Body mass index (BMI; kg/m²) was calculated as weight (kg) divided by the square of height (m).

2.5 Ethics

The Regional Committee for Medical and Health Research Ethics (REK) and the Norwegian Data Protection Authority approved the Tromsø Study. All procedures performed were in accordance with the 1964 Declaration of Helsinki and its later amendments. This PhD project was approved by REK North (reference 60845) and the Norwegian Centre for Research Data (now Sikt) (ref. 809230). In paper I, we requested information on individuals who did not attend the Tromsø Study. Therefore, the project team conducted a data protection impact assessment (DPIA). The project was approved on 22.11.2021 by the Data Protection Officer at UiT The Arctic University of Norway. Papers I and II were not defined as health research by REK North and were exempt from the requirement of study preapproval. For paper III, informed written consent was obtained from all participants in Tromsø4–7.

2.6 Statistical analyses

Descriptive analyses are presented as means and standard deviations (SDs) for continuous variables and as numbers and percentages for categorical variables. All analyses in papers I and II were performed in Stata 16.0 (StataCorp, College Station, TX, USA). Paper III used Stata MP 17.0 (StataCorp LLC, College Station, TX, USA).

Paper I

In paper I, sex-specific binary logistic regression was used to estimate odds ratios (ORs) and corresponding 95% confidence intervals (CIs) of participation in unadjusted and age-adjusted models. We additionally adjusted for individual-level SES in the models of area-level SES.
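As an illustration of what an unadjusted OR expresses, the following minimal Python sketch computes an OR and its 95% CI from a 2×2 table. For a single binary characteristic, this cross-product ratio with a Wald interval on the log-odds scale coincides with the estimate from an unadjusted logistic regression. The function name and the counts are hypothetical and are not taken from Tromsø7; the analyses in paper I were performed in Stata.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: participants in the comparison group
    b: non-participants in the comparison group
    c: participants in the reference group
    d: non-participants in the reference group
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts for illustration only (not data from Tromsø7)
or_, lower, upper = odds_ratio_ci(600, 400, 700, 300)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An OR below 1 in such a table would indicate lower odds of participation in the comparison group than in the reference group. Age-adjusted estimates require a regression model and cannot be read from a single table.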
To illustrate differences in participation, we created sex-specific forest plots across levels of each variable. We used z-scores to create an index for individual SES and area SES. Z-scores allow data points from populations with different means and standard deviations to be placed on a common scale (113). Individual-level SES was calculated based on educational level, individual income, total household income, and residential status. For each of these four variables, a z-score was calculated, and the four z-scores were summed to give an individual-level SES score. We also created a geographical SES index based on 36 geographical subdivisions of the municipality of Tromsø defined in a local public health report (114). The geographical SES index was calculated as the average individual-level SES score in each geographical subdivision, resulting in a continuous variable ranging from –1.73 to 1.24. We then categorized the subdivisions as low-SES, medium-SES, or high-SES areas based on tertiles using the command “xtile” in Stata. Participation in Tromsø7 within each of the 36 geographical subdivisions was also divided into tertiles: low (59.3%), medium (66.7%), and high (68.5%). The spatial distribution of SES areas and participation in the 36 geographical subdivisions was graphed using choropleth maps created in Python 3 (mainly using the pandas, geopandas, and Plotly Express packages). A GeoJSON file was obtained from the Norwegian Mapping Authority (115), and a base map from OpenStreetMap (116) was used.

Paper II

Paper II assessed the validity of self-reported education by linkage with SSB-recorded educational level as the comparator or “gold standard” to calculate sensitivity and positive predictive value (PPV). We did not include specificity and negative predictive value because the focus was on the accuracy of self-reported data, and sensitivity and PPV address this aspect.
However, those two terms are normally used in clinical diagnostics. Since the aim of paper II is not related to diagnostics, we used terms that are more accurate for this setting: completeness for sensitivity and correctness for PPV. Completeness measured the proportion of observations at a given level in the gold standard data source that were also self-reported at that level, while correctness measured the proportion of self-reported observations at a given level that were correct according to the registry (117). Agreement between self-reported and SSB-recorded educational levels was measured by the observed percentage agreement together with Cohen’s kappa and weighted kappa, two standard tools for describing the degree of agreement between two observers on a categorical scale (118). When study results can be expressed in more than two ordered categories, weighted kappa is the most appropriate (119). Weighted kappa coefficients and kappa agreement were interpreted as proposed by Viera and Garrett (120) (less than chance: < 0.00, slight: 0.00–0.20, fair: 0.21–0.40, moderate: 0.41–0.60, substantial: 0.61–0.80, and almost perfect: 0.81–1.00). Multinomial logistic regression was further used to calculate ORs, with 95% CIs, of over- or underreporting educational levels. Comparisons between self-reported and SSB-recorded educational levels were stratified by sex and age group (40–52, 53–62, and 63–99 years). These age groups were constructed after considering the school reform of 1959, which made seven years of primary education mandatory. Those who started primary school in 1959 were 63 years old in Tromsø7. The 53–62 age group was constructed to reflect another school reform in 1969, when nine years of primary schooling became mandatory in Norway. We recognized that 63–99 was a wide age range, but we decided not to split this group because the oldest age group would have become too small.
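The validity and agreement measures described above can be illustrated with a minimal Python sketch. It assumes a square cross-tabulation in which rows are registry (SSB) levels and columns are self-reported levels, and it assumes linear agreement weights for the weighted kappa, since the weighting scheme is not stated here. The function names and the counts are hypothetical and are not data from Tromsø7.

```python
def margins(table):
    """Return row totals, column totals, and the grand total of a square table."""
    k = len(table)
    rows = [sum(table[i]) for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    return rows, cols, sum(rows)


def completeness_correctness(table):
    """table[i][j]: count with registry level i and self-reported level j."""
    rows, cols, _ = margins(table)
    k = len(table)
    # Completeness (sensitivity): share of each registry level reported correctly
    completeness = [table[i][i] / rows[i] for i in range(k)]
    # Correctness (PPV): share of each self-reported level confirmed by the registry
    correctness = [table[j][j] / cols[j] for j in range(k)]
    return completeness, correctness


def kappa(table, weighted=False):
    """Cohen's kappa; weighted=True uses linear weights (an assumption here)."""
    rows, cols, n = margins(table)
    k = len(table)

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * rows[i] * cols[j] for i, j in cells) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not data from Tromsø7).
# Rows: registry level; columns: self-reported level (three ordered levels).
table = [
    [80, 15, 5],
    [20, 150, 30],
    [2, 18, 180],
]
completeness, correctness = completeness_correctness(table)
print([round(x, 2) for x in completeness])
print([round(x, 2) for x in correctness])
print(round(kappa(table), 2), round(kappa(table, weighted=True), 2))
```

Because the weighted version gives partial credit to disagreements between adjacent levels, it is typically higher than the unweighted kappa for ordered categories when most disagreements are between neighboring levels.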
Logistic regression models were used to estimate ORs of self-reported cardiometabolic diseases in Tromsø7 according to self-reported and SSB-recorded educational levels. A randomization test with 10,000 permutations of the data file was used to compare trends (i.e., the categorical educational level variable modeled as a linear term) between self-reported and SSB-recorded educational levels. This analysis was conducted in R version 4.2.2 (2022-11-01 ucrt).

Paper III

In paper III, we used linear mixed models to estimate mean cholesterol according to baseline education, survey of the Tromsø Study, and LLD use. The main models were adjusted for baseline age and included indicator variables for baseline education, survey time, LLD use, and all two- and three-way cross-products between the indicator variables. A random intercept at the participant level was included to account for repeated observations within each subject. In separate models, we tested for a linear trend over education by modeling education as a continuous variable in the aforementioned models. We also calculated the change in the coefficient of total cholesterol from Tromsø4 to Tromsø7. We additionally performed sex-specific analyses in each of two baseline age groups: 25–49 and 50–78 years. A sensitivity analysis was conducted among the participants who attended all four surveys (i.e., completely observed) to determine if the results were consistent with the principal analysis. Figures in the article were created in Excel.

3 Results – main results of papers

3.1 Paper I

Comparing the sociodemographic characteristics of participants and non-participants in the population-based Tromsø Study

The aim of paper I was to compare the sociodemographic characteristics of participants and non-participants of the seventh survey of the Tromsø Study (Tromsø7, 2015–2016). A total of 32,591 individuals aged 40 years or older were invited to participate in Tromsø7, of whom 21,083 (65%) participated.
Non-participation was characterized by being male, being 40–49 or 80–99 years old, having a marital status other than married, being foreign-born, and renting a residence. Moreover, non-participants had lower education and income and lived in low-SES areas.

3.2 Paper II

Validity of self-reported educational level in the Tromsø Study

The aim of paper II was to investigate the completeness and correctness of self-reported educational level in the Tromsø Study using data from SSB and to explore the consequences of using these two data sources on educational trends in cardiometabolic diseases. A total of 20,615 participants aged 40 years or older were included in this study. Overall, we found that the self-reported educational level in the Tromsø Study was adequately complete and correct for research, with fair weighted kappa values in all age groups for women and men. The completeness of self-reported educational level was very high among those with a college/university education ≥4 years (≥ 97% in all age groups and both sexes) and high among those with primary education (67-92% in all age groups and both sexes). Low correctness was found in both of these educational groups, with 29-62% for college/university education ≥4 years and 48-67% for primary education. The highest degree of underreporting was among those with SSB-recorded upper secondary educational levels and self-reported primary educational levels, while the highest degree of overreporting was among those with SSB-recorded college/university education of less than four years and self-reported college/university education of four years or more. Women were more likely to overreport or underreport their educational level than men.

Unfortunately, the coefficient presented as the weighted kappa coefficient in supplementary Tables 1 and 2 of paper II is Cohen’s kappa coefficient. The coefficient in the paper was reported as “fair” for all age groups and both women (0.41, 0.48 & 0.51) and men (0.52, 0.53 & 0.59).
The correct weighted kappa coefficients for women (0.66, 0.65 & 0.60) and men (0.72, 0.69 & 0.68) (Appendices 2 & 3) indicate moderate to substantial agreement as proposed by Viera and Garrett (120). Educational trends were found in the risk of self-reported cardiometabolic diseases when using both self-reported and SSB-recorded educational levels. The educational trends in cardiometabolic diseases were most pronounced when using the educational levels found in the registry and less pronounced when using self-reported educational levels. However, the ways in which the questionnaire and SSB measure education do not overlap completely.

3.3 Paper III

Longitudinal cholesterol trends across educational groups: The influence of lipid-lowering drugs in the population-based Tromsø Study 1994-2016

The aim of paper III was to investigate the longitudinal educational gradient in total cholesterol levels and whether this is affected by the use of LLDs. Among untreated women, the cholesterol level increased from 1994-95 to 2015-16 (0.33-0.48 mmol/L), except for women with primary education (-0.12 mmol/L). Cholesterol levels decreased among untreated men (-0.40 to -0.06 mmol/L) and decreased more extensively among treated women (-1.88 to -1.35 mmol/L) and treated men (-2.21 to -1.84 mmol/L) across all educational groups. Among both treated and untreated participants, the lowest educational levels were generally associated with the largest decreases in cholesterol levels. At baseline, we observed a significant inverse association between education and cholesterol levels among the untreated. This educational trend attenuated over time and had disappeared by 2015-16, when no association between educational level and cholesterol level was observed among women or men, whether untreated or among those who became LLD users. LLD users experienced a more substantial decrease in cholesterol levels over time compared to non-users.
The educational gradient in cholesterol levels observed among LLD non-users was not apparent among users. LLD treatment may reduce the social disparity associated with cholesterol management.

4 Discussion

The overall aim of this thesis was to investigate how the Tromsø Study reflects the target population and produces valid education information. Moreover, the study sought to determine how these findings affected the longitudinal education gradient in cholesterol levels, especially in the context of LLD use.

4.1 Discussion of methodology

This chapter presents the methodological considerations of this thesis and discusses bias types and their implications.

4.1.1 Data considerations

Official statistics and administrative data offer a broader and potentially more accurate information source for population estimates (121). However, population-based studies often provide more comprehensive information, allowing researchers to collect a broader range of data directly from individuals. For instance, the Norwegian Prescription Database can provide information on prescribed drugs only, while population-based surveys can provide information about both prescribed and over-the-counter drugs. SSB gathers information about Norway’s population from various reliable sources, and quality requirements for Norwegian official statistics are described in the Statistics Act (111). The educational attainment statistics for the Norwegian population cover residents of Norway aged 16 years or older by the end of the calendar year (122). In general, missing data on education are minimal (about 3%), and most concern immigrants (123). SSB collects sociodemographic data from various administrative sources, such as the Norwegian Tax Administration, the National Diploma Database, and tax registers. However, data collection and processing errors are unavoidable and include coding errors, revision errors, and data processing errors.
SSB states that comprehensive efforts have been made to minimize these errors, and SSB now regards these types of errors as relatively insignificant (124). SSB does not have information on education completed abroad by Norwegian-born individuals between 1980 and 1986. In 1992, data on Norwegian-born individuals who studied abroad between 1986 and 1992 and received financial support were obtained from the State Educational Loan Fund (Lånekassen). There was some missing information on education from SSB in papers I and II (0.5%). However, we do not know whether this missing information concerned Norwegian-born individuals or immigrants. SSB has attempted to overcome this issue by collecting information on immigrants’ educational levels through surveys and imputation techniques. However, as in the Tromsø Study, non-response among immigrants has been high, making it difficult to check the accuracy of the information (125).

4.1.2 Study design

The aim of papers I and II was to compare participants and non-participants and to validate the education variable in Tromsø7. Using cross-sectional data made it possible to compare and validate data collected at a specific time with SSB’s data. A longitudinal design, following participants and non-participants over a period, was also an option for understanding how their characteristics changed over time and how they influenced participation and reported education. However, cross-sectional analysis seemed appropriate for these two papers due to the efficiency of comparing and validating existing data at a single time point. Paper III used an observational longitudinal design, which allowed for describing time trends in population subgroups, such as cohorts. The design was appropriate for the aim of paper III, which was to investigate the educational gradient with the same participants over time.
The design, together with the statistical method of linear mixed models described below (Section 4.1.8), was considered an appropriate approach to investigate the educational gradient in cholesterol levels over time.

4.1.3 Internal validity

A valid study is equivalent to an unbiased study, that is, a study that, based on its design, methods, and procedures, will produce overall results close to the truth (119). The validity of a research study includes two domains: internal and external. Internal validity concerns whether the study was designed, conducted, and analyzed satisfactorily to answer the research question (126). Internal validity can be threatened by three types of bias: selection bias, information bias, and confounding. These biases could also affect the external validity. External validity is defined as “the degree to which results of a study may apply, be generalized, or be transported to populations or groups that did not participate in the study” (127).

4.1.4 Selection bias

In paper I, I reported and discussed groups that are underrepresented in the Tromsø Study, which could have introduced selection bias. Therefore, this section does not discuss the characteristics of non-participants but rather other factors related to selection bias.

Selection bias is defined as “bias in the estimated association or effect of an exposure on an outcome that arises from the procedures used to select individuals into the study or the analysis” (127). Several concerns arise in cases of high proportions of non-participation, for instance, if there are systematic differences between participants and non-participants concerning sociodemographics or health. The participation rate in the Tromsø Study was relatively high (Table 1), while other population-based studies have experienced a decrease in participation (128). Hopstock et al.
(106) believed that continuous participant dialogue, strong community ties, and researcher involvement were key to maintaining participation in the Tromsø Study. The participation rate can affect selection bias if there is a systematic difference in who chooses to participate.

Participation in the Tromsø Study is voluntary, so potential bias due to differences in non-participation is unavoidable. Participants in population-based studies tend to have more favorable health statuses (129), which could make them unrepresentative of the target population. Women with higher education were overrepresented in our study. However, in a validation study, despite a slight overrepresentation of women with higher education, no significant source of selection bias was observed that could seriously invalidate the estimation of population-attributable risk (130).

From Tromsø4 onward, participants from the previous survey were invited to follow up, and new participants were invited to reduce the risk of selection bias. However, the response rate has been consistently higher among repeat than new participants, which could lead to selection bias because of the health profile differences between participants and non-participants (87, 131). Since we included repeated measures in paper III, certain individuals may have been more motivated to participate in repeated surveys, potentially introducing self-selection bias into its sample. In paper III, only participants who attended Tromsø4 and met the criteria described in Section 2.4 were included in the analyses. New participants recruited after Tromsø4 were not included in the paper III sample.

One of the key purposes of epidemiological research is the measurement of disease outcomes in relation to a population at risk, understanding risk factors and observing potential changes in these factors (132).
This could be achieved with cohort studies. However, re-inviting previous participants for longitudinal follow-up must be considered to allow for a more comprehensive examination of health-related trends, unless data can be obtained from other sources such as registers.

4.1.5 Recall and information bias

Self-reported data are prone to bias since individuals may intentionally or unintentionally misreport information due to social desirability or recall bias (96). Systematic error due to differences in accuracy or completeness in recalling past events or experiences is called recall bias (127). Paper II had 561 participants over 80 years old, so recall bias could have applied, since a longer time had potentially passed since these participants completed their education. I performed a multinomial analysis in which the oldest age group was split into 63–79 and 80–99 years to investigate whether participants aged 80–99 years were affected by recall bias (data not shown). The results showed that participants aged 80–99 years were less likely to overreport (OR 0.59, 95% CI 0.43–0.82; reference group: 40–52 years) and underreported to almost the same extent as those aged 63–79 years (63–79 years: OR 3.01, 95% CI 2.64–3.45; 80–99 years: OR 3.32, 95% CI 2.56–4.09). When stratified by sex, women underreported their education four times more and men two times more than the reference group, indicating potential recall bias among the older age groups. Recall bias may apply more to the oldest age groups, as other studies have also found participants’ age to be significantly associated with misreporting (99). Moreover, the response options in the Tromsø7 questionnaire may not have been recognizable to the oldest participants in our study, as the educational system has changed over time.
Furthermore, those who finished primary education a long time ago could have reported correctly according to the system in place when they finished. Because the educational system has changed over the years, SSB may have placed them in upper secondary education, which does not necessarily mean that they self-reported incorrectly. This issue is discussed further in Section 4.1.6 (Confounding).

In paper III, we used self-reported LLD use, which is also prone to recall bias. However, in a population-based study from the Netherlands linked with the drug registry, self-reported LLD use showed very good agreement with registry data (kappa 0.81) (133). Moreover, self-reported use of drugs taken regularly over time reflected chronic exposure and was found reliable (134). The main weakness of our study is that we do not know exactly when the participants started LLD treatment. We only know their status at each survey they attended, where they self-reported whether they were users or not.

Information bias is the systematically inaccurate measurement of the exposure or outcome variables, which produces misclassification bias (78). Misclassification bias is a systematic error occurring when an individual is assigned to the wrong category, which can occur at any stage of the research process (135). Comparisons between correctly classified and misclassified information can be informative, but such data are not always available or are expensive. Moreover, inaccuracies in population education information have been reported in Brazilian censuses (136). Exposure misclassification was observed due to misreporting of educational level in paper II, which could have led to a weaker association between education and cardiometabolic diseases. A Danish study (137) reported findings similar to ours. Despite some inconsistencies, the Danish authors concluded that their data were valid for socioeconomic analyses.
A validation study supported this finding by reporting that although survey and registry measures capture education differently, both data sources showed an association with the outcome variable (138). This finding was confirmed in our study. Even though the association between self-reported education and cardiometabolic diseases in paper II was somewhat weakened, we still found an association, which was confirmed using SSB data.

4.1.6 Confounding

Confounding occurs when a third variable is related to both the exposure and the outcome. Therefore, if the confounding variable is not properly controlled, the estimated effect of the exposure may be biased (127). Confounders can be handled in the design of a study or in the statistical analysis by stratification or adjustment to attain valid results (139).

Age and sex were considered potential confounders in all three papers because they had the potential to influence the exposure and the outcome. Not only is the risk of high cholesterol levels associated with age, but our papers also included participants aged 40 to 99 years (25 to 78 years in paper III), which means that the meaning of educational levels and the implications of educational achievements vary among the birth cohorts. Comparing the educational levels of participants aged over 80 years to those aged 40 years was complicated since they were educated under different circumstances, with varying access to educational opportunities beyond compulsory education (23). Therefore, since we had participants from several birth cohorts, adjusting for age was necessary. In the literature, sex differences in health behavior (86, 140) and outcome (CVD) (141) have been reported.

In paper I, age and sex were considered confounders and were handled by adjustment and stratification.
We did not adjust for other potential confounders since the main goal was to examine the effect of each sociodemographic characteristic and its contribution to participation. The only exception was that individual-level SES was adjusted for to understand the influence of area SES on participation.

In paper II, we stratified by sex and by the school reforms of 1959 and 1969 and adjusted for age. In paper III, the confounders, centered age and sex, were handled by stratification and adjustment in the regression models. There is, however, always the possibility of not adjusting for all possible confounders, and some confounders could be unmeasured and therefore not accounted for. We did not include potential confounders like BMI because the specific aim was to investigate the presence of an educational gradient in total cholesterol levels over time and the impact of LLDs. Thus, including BMI would have broadened the scope beyond our intention with paper III.

The use of directed acyclic graphs (DAGs) in conjunction with the existing literature helped me clarify the factors involved and how they fit into a conceptual causal model (78). Concerning paper III, a DAG is illustrated in Figure 1. However, as described in the Introduction chapter, the pathways from education to good health are complex, so causal pathways are often not fully understood, which is the case in paper III. For instance, there may be unmeasured confounders that paper III did not consider and that would contribute to understanding this complex pathway, such as health literacy and reimbursement policy.

Figure 1. Directed acyclic graph of variables in paper III

4.1.7 External validity

Other population-based studies, both national and international, have shown that individuals with lower education (142-144) and income (95, 145), men (87, 146), individuals with a marital status other than married (89), and individuals living in lower-SES areas or neighborhoods (129) were less likely to participate.
These findings were supported in paper I, indicating a common challenge within population-based studies to recruit the same subgroups across countries. However, these studies were mainly from Western countries (i.e., Norway, Denmark, Finland, Belgium, the UK, and Switzerland) (85, 87-89, 129, 144, 147), with one Chinese study (143). Therefore, the results of the present study have general validity corresponding to similar studies with the same social distribution, mainly in Western countries.

Furthermore, immigrants were also underrepresented in our study. Indeed, immigrants are a complex group to recruit, so they are underrepresented in most population-based studies (88, 97, 148). Various explanations have been proposed, such as language barriers, reduced participation when the questionnaire explicitly states that the respondent’s identity will be saved by the investigator, the questionnaire’s complexity, respondents’ lack of identification with the questions, and sensitive questions (97, 148).

The proportion of participants with lower education and the increasing use of LLDs in paper III corresponded with the national report by SSB (149), indicating that the study participants were representative of the Norwegian population but not of low- and middle-income countries, where higher statin use is associated with higher education (150). If the generous reimbursement policy in Norway could explain the results in paper III (i.e., no educational gradient among LLD users), this effect may not be seen in other countries that have different reimbursement policies. In Norway, reimbursement is available to all residents, regardless of social status.

4.1.8 Statistical considerations

Paper I

We used logistic regression as it is an appropriate statistical method when the outcome variable is binary.
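For a single binary explanatory variable, the odds ratio estimated by logistic regression equals the cross-product ratio of the corresponding 2×2 table, so the quantity reported in such analyses can be illustrated without a regression library. The following is a minimal sketch under stated assumptions: the function name, the Wald-type interval on the log scale, and the counts are hypothetical illustrations and are not taken from paper I.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: group 1, participated        b: group 1, did not participate
    c: group 2, participated        d: group 2, did not participate
    (Group labels and counts are hypothetical.)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_wald_ci(600, 400, 500, 500)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant difference in participation odds at the 5% level. Adjusting for age, as in the bivariate analyses, would require fitting an actual regression model rather than this closed-form calculation.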
Multicollinearity may be an issue when several explanatory variables are included, but it was not relevant for this paper, including between the different indicators of SES, since variables were only included in univariate and bivariate (with age) analyses. Individual and household income were not normally distributed in our data, so in addition to the mean values in the paper, we also reported interquartile ranges.

The strength of using z-scores to calculate individual and area SES was the standardization and aggregation of the data. However, it was important that the selected variables adequately captured the SES of individuals. If not, the calculated SES score may not have accurately represented the true SES. The selected variables (education, individual and household income, and residential status) were considered to capture SES adequately. Furthermore, data from SSB was used in all analyses to help avoid bias (i.e., social desirability and recall biases) while providing an accurate reflection of each individual’s SES.

Paper II

We used completeness and correctness to investigate the validity of self-reported education compared to the gold standard: SSB. Table 3 explains how completeness and correctness were calculated, similarly to sensitivity and PPV. Sensitivity and PPV are well-known validation tools for screening and clinical tests, and using both completeness and correctness in paper II was considered complementary because both measures were necessary for a complete understanding of the data’s accuracy. However, one weakness of using completeness and correctness was that there were no unambiguous cut-off points for low-, moderate-, and high-quality data. Nevertheless, most authors who have used these two terms have seemed to agree that sensitivity and PPV ≥ 80% are fairly good to good, while ≥ 90% is very good to excellent (151).

Table 3. Calculation of completeness and correctness by Hogan and Wagner (117).
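As a rough illustration of the measures named in the caption above, the sketch below computes completeness (the sensitivity analogue) and correctness (the PPV analogue) for one education level from a cross-tabulation of registry against self-reported education. It also computes a linearly weighted kappa with the weights 1 − |i − j|/(k − 1) described in the following paragraph. The function names, the three-level coding, and all counts are hypothetical and are not taken from paper II. The registry is treated as the gold standard, as in the text.

```python
from typing import List, Tuple

# Rows: registry (gold standard) level; columns: self-reported level.
Table = List[List[int]]


def completeness_correctness(table: Table, level: int) -> Tuple[float, float]:
    """Completeness (sensitivity analogue) and correctness (PPV analogue)."""
    agree = table[level][level]
    registry_total = sum(table[level])                    # all with this registry level
    self_report_total = sum(row[level] for row in table)  # all who reported this level
    return agree / registry_total, agree / self_report_total


def linear_weighted_kappa(table: Table) -> float:
    """Weighted kappa with agreement weights w_ij = 1 - |i - j| / (k - 1)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for three ordered education levels.
example = [
    [80, 15, 5],
    [10, 70, 20],
    [5, 10, 85],
]
completeness, correctness = completeness_correctness(example, level=0)
kappa_w = linear_weighted_kappa(example)
print(f"completeness={completeness:.3f}, correctness={correctness:.3f}, "
      f"weighted kappa={kappa_w:.3f}")
```

Because the weights shrink with the distance between categories, a disagreement between adjacent education levels lowers the weighted kappa less than a disagreement between the lowest and highest levels.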
Agreement between self-report and registry data was measured with Cohen’s kappa, which is frequently used to test interrater reliability, that is, the extent to which the data collected in the study are correct representations of the variables measured (152). Simple percent agreement is easily calculated and directly interpretable but cannot account for possible guessing by raters and may therefore overestimate agreement, whereas kappa corrects for agreement expected by chance (152). Because one of the purposes was to investigate agreement between two data sources on an ordinal scale, the weighted kappa coefficient was used. Kappa can be weighted to reflect the degree of disagreement, which emphasizes larger differences between ratings over smaller ones. This is impossible with unweighted kappa, which treats all disagreements equally. Hence, unweighted kappa is unsuitable for ordinal scales (153), so weighted kappa was chosen. The weights for weighted kappa in Stata were calculated as follows:

$$w_{ij} = 1 - \frac{|i - j|}{k - 1}$$

Here, $i$ and $j$ index the rows and columns of the ratings by the two raters, while $k$ is the maximum number of possible ratings (154).

Paper III

To investigate longitudinal trends over 22 years (1994–2016) for the same participants, linear mixed models were considered a suitable statistical method. One of their advantages was the ability to uncover data relationships and patterns, even with missing values. In using linear mixed models, we made two assumptions: 1) the residuals of the dependent variable (i.e., cholesterol level) were normally distributed, and 2) the random intercept was included to account for variation in the baseline levels of the outcome variable among different educational groups. Residuals, including those associated with the random intercepts, were examined to ensure that the assumption of normally distributed and homoscedastic residuals was met. As shown in Figure 2, the residuals were normally distributed for cholesterol levels (A) and the random intercept (B).

Figure 2. Histogram of residuals for the dependent variable (A) and the random intercept (B)

4.2 Summary of the results

Differences in sociodemographic characteristics were observed between participants and non-participants: men, younger (40–49 years old) and older (80–99 years old) individuals, individuals with a marital status other than married, foreign-born individuals, residential renters, individuals with lower education and income, and individuals living in low-SES areas were less likely to participate in a population-based study. Self-reported educational levels were adequately complete and correct but could have produced weaker associations with cardiometabolic diseases than registry data. LLD users experienced a more substantial decrease in cholesterol over time than non-users. The educational gradient in cholesterol levels observed before the introduction of statins was not apparent among LLD users in surveys of the Tromsø Study conducted after statins were introduced.

4.3 Discussion of main results

The main results were discussed in detail in papers I–III. Therefore, this section gives an overview of this thesis’ contribution to knowledge in this field. I also discuss how the results from papers I and II could have affected the results in paper III, ending with a conclusion, implications, and further perspectives.

Paper I

Participation in the Tromsø Study varied according to sociodemographic variables, with women, individuals aged 50–79 years, married and Norwegian-born individuals, individuals with higher education and income, residential owners, and individuals living in high-SES areas being more likely to participate. The findings in paper I agreed with other population-based studies in Norway that reported sociodemographic differences in participation.
In addition to sociodemographic characteristics, the Tromsø Study has also reported that living in a low-SES area is associated with increased CVD risk due to unhealthy lifestyle (155). The Hordaland Health Study (HUSK) reported that non-participants had poorer health (156), the HUNT study found higher mortality and a higher prevalence of several chronic diseases among non-participants (87), and the Oslo Health Study (HUBRO) reported that non-participants received disability benefit to a greater extent than participants (88). These studies, including paper I, underscore the presence of health inequality in Norway since health is distributed differently among the Norwegian population.

Lower levels of education and a higher CVD risk profile are often reported among non-participants (157). This could have led to underestimation of the association between education and cholesterol levels in paper III. However, HUSK (156) reported that differences in sociodemographic profiles in participation did not seem to introduce bias in the association between the exposure and outcome, which was supported by another study (158). Since HUSK had a similar profile to the Tromsø Study, selection bias might not have affected the estimated association to a significant extent in paper III.

Paper II

In paper II, we reported that self-reported education was adequately complete and correct for research, but results should be interpreted with caution since they had certain limitations. The results showed that those with upper secondary education underreported while those with college/university <4 years overreported, leading to misclassification bias and weaker associations with cardiometabolic diseases. One way to estimate more accurately is to dichotomize the self-reported education into low (primary and upper secondary) and high (college/university <4 years and college/university ≥4 years).
Dichotomizing education may give a stronger association and more precise estimates. However, dichotomizing may come at the cost of some insights within the low and high educational groups. For instance, one could not speak of a gradient if there were only two groups.

In Tromsø7, education was based on duration, whereas SSB measures the highest completed degree. These differences in measurement approaches could explain the low correctness in the lowest and highest educational levels found in paper II. However, would the accuracy improve if SSB and the Tromsø Study measured education in precisely the same way? The answer is uncertain since we must consider other biases, for instance, social desirability bias, which is the tendency to overreport socially desirable behaviors and underreport undesirable ones (159). This was confirmed by other validation studies of education (99, 138). Moreover, since SSB only categorizes completed degrees, its data might not reflect an individual’s true educational achievement. In Norway, it is common for people to take work-related courses over the years that do not lead to a formal degree but are aimed at self-development and improving their professional knowledge. These courses may have been part of the self-reported education in the Tromsø7 questionnaire, but they do not match the SSB registry.

The wording of the education question in the Tromsø7 questionnaire should also be discussed. The wording “4 years or more” was meant as the normative duration of tertiary education in the Tromsø Study. However, this phrase could have been interpreted as years spent in education, although it was probably intended to mean education at the master’s level or higher. This could partly explain why women with college/university <4 years overreported their education: they could have been on maternity leave and thereby been delayed in their education.
The wording of the education question in questionnaires was also of concern in another Norwegian population-based study (160). As population-based studies have individuals with different knowledge of how to interpret questions, a clear question and perhaps a follow-up question could help to avoid confusion by specifying the completed education level. Simplified wording to make the question more understandable would be important for immigrants and older participants. Even though immigrants among the participants in Tromsø7 were few, differences between countries’ educational systems could confuse them, resulting in misreporting of their actual educational levels (101). Since the education system has changed over time and could have confused those who finished their education a long time ago, adding extra questions could make the line of questioning less confusing.

We observed underreporting among participants with upper secondary education and overreporting among those with college/university <4 years. This could have affected our results in paper III. I speculate that the misreporting of educational level observed in Tromsø7 also applies to Tromsø4. The results in paper III might be underestimated due to misclassification of self-reported educational levels. If we were to use the education variable from SSB, the educational gradient might be more pronounced. Although misreporting of self-reported educational levels produced a weaker gradient compared to SSB data in this thesis, there was still an observable educational gradient. This finding suggests that self-reported education can be a valid indicator of SES when investigating the relationship between education and health.

Paper III

A general decrease in cholesterol was observed among non-users. However, even with this decrease, their cholesterol levels were still high (≥ 5 mmol/l). Randomized controlled trials of statins have demonstrated their efficacy in primary and secondary prevention of CVD (161, 162).
Similar effects of statins have also been reported in population-based studies (163, 164).

In paper III, we observed that the prevalence of current LLD use was substantially higher among participants with lower education, which could have been due to the greater prevalence of risk factors such as high BMI, smoking, poor diet, and low exercise among individuals with lower education (165). All these risk factors are related to increased total cholesterol levels. We also observed that individuals with lower education who were on LLD treatment experienced a larger reduction in cholesterol levels. A Danish study found that individuals with lower education have more frequent cholesterol measurements and medical treatment than other educational groups (77). This might indicate that individuals with lower education have more risk factors for CVD than those with higher education. This is in line with a Norwegian report, which found that Norwegian adults with lower education visited their general practitioners more frequently than those with higher education, which could have led to closer follow-up on cholesterol measurements (166).

Whether LLD treatment is as effective in women as in men has been a never-ending debate in the literature since women are often underrepresented in statin trials (167). Furthermore, non-adherence among women has been attributed to dissatisfaction with cholesterol treatment, not being prescribed LLDs, and side effects (168, 169). Hunt et al. (170) reported no sex differences in cholesterol levels with LLD treatment. Our study supported this finding. Even though we observed a larger effect of LLD treatment in men, both women and men experienced a substantial decrease in cholesterol levels over 22 years across educational levels compared to non-users. The impact of medication treatment on reducing social inequalities has been reported not only for cholesterol but also for blood pressure levels (171).
We also observed that, among LLD users older than 50 years, there was a larger decrease in cholesterol levels than among participants under 49 years. This observation was consistent with findings from another study (80) and could be explained by changes in the recommended dosages of statin types, resulting in more effective treatment over time (172).

The reimbursement policy may have played a role in reducing the social inequality observed in our study. Reimbursements contributed to increasing statin usage in a study from Italy. However, changes in reimbursement policies (i.e., removal of co-payment) led to a drop in statin use, which could lead to health risks and increased health inequalities (173). This was also reported in a systematic review article, where reimbursement was found to affect adherence to statins (174).

5 Conclusion

This thesis showed that certain subgroups are underrepresented in the Tromsø Study. Exposure misclassification of self-reported education was explored. Even with this limitation, an educational gradient in cardiometabolic diseases was observed when using both self-report and registry data. The use of LLDs seemed to reduce the educational gradient in cholesterol levels and, therefore, may reduce social inequality in health. The Norwegian reimbursement policy of providing medications at a low cost to everyone may explain this faded gradient.

6 Implications and further perspectives

To my knowledge, the Tromsø Study has never conducted a study of participants and non-participants with linkage to registry data. The results in paper I contributed to a better understanding of the non-participants and the geographic area of Tromsø municipality. Non-participants are likely to be a heterogeneous group. Hence, recruitment efforts should target specific groups rather than everyone, since untargeted efforts may mainly involve those already more likely to participate, possibly widening the gap between participants and non-participants further.
The findings in this thesis can inform where to recruit for Tromsø8 and future surveys. Some suggestions include:

• Efforts should be made to recruit subgroups that are underrepresented, since they are a part of the population being studied.
• The health status or risk profile of individuals with lower education who chose to participate and those who chose not to participate could differ, which should be further investigated.
• The presence of participation bias may not affect the estimates of association to a large extent, but it might have implications for prevalence studies.

One of the purposes of collecting information about education in the Tromsø Study was to use it as a proxy for SES since education serves as a significant explanatory variable for diverse health and behavior outcomes. Paper II contributed to validating self-reported education. Hence, further research can strengthen this understanding of self-reported education when used as an exposure, covariate, or confounder. Some suggestions include:

• Validate other SES variables in the Tromsø Study and explore the socioeconomic gradient in health compared to registry data.
• Changing the wording or adding a follow-up question may lead to more accurate self-reporting of educational level and help avoid misunderstandings in future Tromsø Study surveys.
• Validation of self-reported variables is highly recommended, for example, by linking the Tromsø Study to the Norwegian Prescription Database to assess the validity of self-reported use of LLDs and other medications.

Cholesterol is an important risk factor for CVD, and we know that lower levels of education are associated with a higher risk of cardiovascular mortality. Our findings, supported by the literature, indicate that social inequality in health exists, while LLD treatment contributes to reducing the inequality. This finding underscores the necessity of optimal medical treatment.
Some suggestions include:

• More studies are needed to clarify the association between different educational levels and cholesterol levels from a longitudinal perspective.
• The effect of LLDs on reducing the educational gradient in cholesterol levels requires further study for policy implications.
• Lower deductibles for all health-related expenses may contribute to reducing social inequality in health.

Finally, educational inequality in health is unfair. Hence, further research is needed to fully understand the underlying mechanisms and health behavior, especially among those with lower education.

References

1. Dahl E. Sosial ulikhet i helse: en norsk kunnskapsoversikt. Oslo: Høgskolen i Oslo og Akershus; 2014.
2. Marmot M, Bell R. Social inequalities in health: a proper concern of epidemiology. Annals of Epidemiology. 2016;26(4):238-40.
3. Marmot M. Social justice, epidemiology and health inequalities. European Journal of Epidemiology. 2017;32:537-46.
4. Lieb R. Population-Based Study. In: Gellman MD, Turner JR, editors. Encyclopedia of Behavioral Medicine. New York, NY: Springer New York; 2013. p. 1507-8.
5. Goldblatt P, Castedo A, Allen J, Lionello L, Bell R, Marmot M, et al. Rapid review of inequalities in health and wellbeing in Norway since 2014 - Full Report. 2023.
6. Mackenbach JP. The persistence of health inequalities in modern welfare states: the explanation of a paradox. Soc Sci Med. 2012;75(4):761-9.
7. Mackenbach JP, Kulhánová I, Menvielle G, Bopp M, Borrell C, Costa G, et al. Trends in inequalities in premature mortality: a study of 3.2 million deaths in 13 European countries. J Epidemiol Community Health. 2015;69(3):207-17.
8. Ministry of Health and Care Services. National Strategy to Reduce Social Inequalities in Health. Oslo: Ministry of Health and Care Services; 2007.
9. Bævre K. Life expectancy in Norway. Norwegian Institute of Public Health; 2018 [cited 2023 10.aug].
Available from: https://www.fhi.no/en/he/hin/population/life-expectancy/?term=#major- differences-between-counties. 10. Singh GK, Daus GP, Allender M, Ramey CT, Martin EK, Perry C, et al. Social determinants of health in the United States: addressing major health inequality trends for the nation, 1935-2016. International Journal of MCH and AIDS. 2017;6(2):139. 11. Bor J, Cohen GH, Galea S. Population health in an era of rising income inequality: USA, 1980–2015. The Lancet. 2017;389(10077):1475-90. 12. Latif E. Income Inequality and Health: Panel Data Evidence from Canada. The BE Journal of Economic Analysis & Policy. 2015;15(2):927-59. 13. Bakkeli NZ. Income inequality and health in China: a panel data analysis. Social Science & Medicine. 2016;157:39-47. 14. de Azevedo Barros MB, Lima MG, Medina LdPB, Szwarcwald CL, Malta DC. Social inequalities in health behaviors among Brazilian adults: National Health Survey, 2013. International journal for equity in health. 2016;15(1):1-10. 15. Mackenbach JP, Kulhánová I, Artnik B, Bopp M, Borrell C, Clemens T, et al. Changes in mortality inequalities over two decades: register based study of European countries. bmj. 2016;353. 16. Gómez CA, Kleinman DV, Pronk N, Gordon GLW, Ochiai E, Blakey C, et al. Practice full report: addressing health equity and social determinants of health through healthy people 2030. Journal of Public Health Management and Practice. 2021;27(6):S249. 17. Braveman PA, Cubbin C, Egerter S, Chideya S, Marchi KS, Metzler M, et al. Socioeconomic status in health research: one size does not fit all. Jama. 2005;294(22):2879-88. 18. International Labour Organization 2008 [cited 2023 20.aug]. Available from: https://ilostat.ilo.org/resources/concepts-and-definitions/classification-occupation/. 19. Galobardes B, Shaw M, Lawlor DA, Lynch JW, Davey Smith G. Indicators of socioeconomic position (part 1). Journal of epidemiology and community health. 2006;60(1):7-12. 20. 
Shi J, Tarkiainen L, Martikainen P, van Raalte A. The impact of income definitions on mortality inequalities. SSM Popul Health. 2021;15:100915. 21. Duncan GJ, Daly MC, McDonough P, Williams DR. Optimal indicators of socioeconomic status for health research. American journal of public health. 2002;92(7):1151-7. 22. Pizzi C, Richiardi M, Charles M-A, Heude B, Lanoe J-L, Lioret S, et al. Measuring child socio-economic position in birth cohort research: the development of a novel standardized household income indicator. International Journal of Environmental Research and Public Health. 2020;17(5):1700. 39 23. Khalatbari-Soltani S, Maccora J, Blyth FM, Joannès C, Kelly-Irving M. Measuring education in the context of health inequalities. International Journal of Epidemiology. 2022;51(3):701-8. 24. Krokstad S, Kunst A, Westin S. Trends in health inequalities by educational level in a Norwegian total population study. Journal of Epidemiology & Community Health. 2002;56(5):375-80. 25. Brønnum-Hansen H, Baadsgaard M. Increase in social inequality in health expectancy in Denmark. Scandinavian Journal of Public Health. 2008;36(1):44-51. 26. Grøholt EK, Lyshol H, Helleve A, Alver K, Hagle M, Rusås-Heyerdahl N. Indicators for health inequality in the Nordic countries. 2019. 27. MacNee W, Rabinovich RA, Choudhury G. Ageing and the border between health and disease. European Respiratory Journal. 2014;44(5):1332-52. 28. Motel-Klingebiel A, von Kondratowitz H-J, Tesch-Römer C. Social inequality in the later life: cross-national comparison of quality of life. European Journal of Ageing. 2004;1(1):6-14. 29. Deeks A, Lombard C, Michelmore J, Teede H. The effects of gender and age on health related behaviors. BMC Public Health. 2009;9(1):213. 30. Matthews S, Manor O, Power C. Social inequalities in health: are there gender differences? Social science & medicine. 1999;48(1):49-60. 31. Williams DR, Mohammed SA, Leavell J, Collins C. 
Race, socioeconomic status, and health: complexities, ongoing challenges, and research opportunities. Annals of the new York Academy of Sciences. 2010;1186(1):69-101. 32. Cooper RS. Social inequality, ethnicity and cardiovascular disease. International journal of epidemiology. 2001;30(suppl_1):S48. 33. Hu YR, Goldman N. Mortality differentials by marital status: an international comparison. Demography. 1990;27(2):233-50. 34. Robards J, Evandrou M, Falkingham J, Vlachantoni A. Marital status, health and mortality. Maturitas. 2012;73(4):295-9. 35. Revold MK, With ML. Leietakere mindre fornøyd med livet [Renters less satisfied with life] 2022 [updated 15. February 2023; cited 2023 15.July]. Available from: https://www.ssb.no/sosiale- forhold-og-kriminalitet/levekar/artikler/leietakere-mindre-fornoyd-med-livet. 36. Oates GR, Jackson BE, Partridge EE, Singh KP, Fouad MN, Bae S. Sociodemographic patterns of chronic disease: how the mid-south region compares to the rest of the country. American journal of preventive medicine. 2017;52(1):S31-S9. 37. Kataoka A, Fukui K, Sato T, Kikuchi H, Inoue S, Kondo N, et al. Geographical socioeconomic inequalities in healthy life expectancy in Japan, 2010-2014: An ecological study. The Lancet Regional Health–Western Pacific. 2021;14. 38. Echeverria SE, Diez-Roux AV, Link BG. Reliability of self-reported neighborhood characteristics. Journal of Urban Health. 2004;81(4):682-701. 39. Stafford M, Marmot M. Neighbourhood deprivation and health: does it affect us all equally? International journal of epidemiology. 2003;32(3):357-66. 40. Ribeiro AI, Fraga S, Severo M, Kelly-Irving M, Delpierre C, Stringhini S, et al. Association of neighbourhood disadvantage and individual socioeconomic position with all-cause mortality: a longitudinal multicohort analysis. The Lancet Public Health. 2022;7(5):e447-e57. 41. Galobardes B, Shaw M, Lawlor DA, Lynch JW, Davey Smith G. Indicators of socioeconomic position (part 2). J Epidemiol Community Health. 
2006;60(2):95-101. 42. Abdullahi KB. Socio-demographic statuses: Theory, methods, and applications. 2020. 43. Ohlsson B, Manjer J. Sociodemographic and Lifestyle Factors in relation to Overweight Defined by BMI and "Normal-Weight Obesity". J Obes. 2020;2020:2070297. 44. Lagström H, Halonen JI, Suominen S, Pentti J, Stenholm S, Kivimäki M, et al. Neighbourhood characteristics as a predictor of adherence to dietary recommendations: A population- based cohort study of Finnish adults. Scand J Public Health. 2022;50(2):245-9. 45. Ghanem AS, Nguyen CM, Mansour Y, Fábián G, Rusinné Fedor A, Nagy A, et al. Investigating the Association between Sociodemographic Factors and Chronic Disease Risk in Adults Aged 50 and above in the Hungarian Population. Healthcare (Basel). 2023;11(13). 46. Ashworth M, Durbaba S, Whitney D, Crompton J, Wright M, Dodhia H. Journey to multimorbidity: longitudinal analysis exploring cardiovascular risk factors and sociodemographic determinants in an urban setting. BMJ open. 2019;9(12):e031649. 40 47. Shi L. Sociodemographic characteristics and individual health behaviors. South Med J. 1998;91(10):933-41. 48. Smedberg J, Lupattelli A, Mårdby A-C, Nordeng H. Characteristics of women who continue smoking during pregnancy: a cross-sectional study of pregnant women and new mothers in 15 European countries. BMC pregnancy and childbirth. 2014;14(1):1-16. 49. Ibarra-Sanchez AS, Chen G, Wisløff T. Are relative educational inequalities in multiple health behaviors widening? A longitudinal study of middle-aged adults in Northern Norway. Frontiers in Public Health. 2023;11. 50. Hoffmann K, De Gelder R, Hu Y, Bopp M, Vitrai J, Lahelma E, et al. Trends in educational inequalities in obesity in 15 European countries between 1990 and 2010. International Journal of Behavioral Nutrition and Physical Activity. 2017;14:1-10. 51. Walsemann KM, Gee GC, Ro A. Educational attainment in the context of social inequality: new directions for research on education and health. 
American Behavioral Scientist. 2013;57(8):1082- 104. 52. Health TLP. Education: a neglected social determinant of health. The Lancet Public Health. 2020;5(7):e361. 53. von dem Knesebeck O, Pattyn E, Bracke P. Education and depressive symptoms in 22 European countries. International journal of public health. 2011;56:107-10. 54. Yusuf S, Joseph P, Rangarajan S, Islam S, Mente A, Hystad P, et al. Modifiable risk factors, cardiovascular disease, and mortality in 155 722 individuals from 21 high-income, middle-income, and low-income countries (PURE): a prospective cohort study. The Lancet. 2020;395(10226):795- 808. 55. Petrelli A, Sebastiani G, Di Napoli A, Macciotta A, Di Filippo P, Strippoli E, et al. Education inequalities in cardiovascular and coronary heart disease in Italy and the role of behavioral and biological risk factors. Nutrition, Metabolism and Cardiovascular Diseases. 2022;32(4):918-28. 56. Eggen AE, Mathiesen EB, Wilsgaard T, Jacobsen BK, Njolstad I. Trends in cardiovascular risk factors across levels of education in a general population: is the educational gap increasing? The Tromso study 1994-2008. J Epidemiol Community Health. 2014;68(8):712-9. 57. Hill TD, Needham BL. Gender-specific trends in educational attainment and self-rated health, 1972-2002. Am J Public Health. 2006;96(7):1288-92. 58. Van Der Heide I, Wang J, Droomers M, Spreeuwenberg P, Rademakers J, Uiters E. The relationship between health, education, and health literacy: results from the Dutch Adult Literacy and Life Skills Survey. Journal of health communication. 2013;18(sup1):172-84. 59. d'Errico A, Ricceri F, Stringhini S, Carmeli C, Kivimaki M, Bartley M, et al. Socioeconomic indicators in epidemiologic research: A practical example from the LIFEPATH study. PloS one. 2017;12(5):e0178071-e. 60. Hahn RA, Truman BI. Education improves public health and promotes health equity. International journal of health services. 2015;45(4):657-78. 61. Carlsen F, Kaarboe OM. 
The relationship between educational attainment and waiting time among the elderly in Norway. Health Policy. 2015;119(11):1450-8. 62. Goldman DP, Smith JP. Can patient self-management help explain the SES health gradient? Proceedings of the National Academy of Sciences. 2002;99(16):10929-34. 63. Wei MY, Ito MK, Cohen JD, Brinton EA, Jacobson TA. Predictors of statin adherence, switching, and discontinuation in the USAGE survey: understanding the use of statins in America and gaps in patient education. Journal of clinical lipidology. 2013;7(5):472-83. 64. Khan N, Javed Z, Acquah I, Hagan K, Khan M, Valero-Elizondo J, et al. Low educational attainment is associated with higher all-cause and cardiovascular mortality in the United States adult population. BMC Public Health. 2023;23(1):900. 65. Sanchez-Santos MT, Zunzunegui MV, Otero-Puime A, Canas R, Casado-Collado AJ. Self- rated health and mortality risk in relation to gender and education: a time-dependent covariate analysis. European Journal of Ageing. 2011;8:281-9. 66. Sheikh MA, Abelsen B, Olsen JA. Role of respondents’ education as a mediator and moderator in the association between childhood socio-economic status and later health and wellbeing. BMC Public Health. 2014;14:1-15. 41 67. Delgado-Rodriguez M, Llorca J. Bias. Journal of Epidemiology & Community Health. 2004;58(8):635-41. 68. Montastruc JL, Benevent J, Montastruc F, Bagheri H, Despas F, Lapeyre-Mestre M, et al. What is pharmacoepidemiology? Definition, methods, interest and clinical applications. Therapie. 2019;74(2):169-74. 69. Franks P, Tancredi D, Winters P, Fiscella K. Cholesterol treatment with statins: Who is left out and who makes it to goal? BMC health services research. 2010;10:1-8. 70. Strøm MS, Sveen KA, Raknes G, Slungård GF, Fagerås SJ. Dødsårsaksregisteret. Dødsårsaker i Norge 2022 [The cause of death register. Causes of death in Norway 2022]. 2023. Report No.: 8284063859. 71. 
Øverland SN, Knudsen AK, Vollset SE, Kinge JM, Skirbekk VF, Tollånes MC. Sykdomsbyrden i Norge i 2016. Resultater fra Global Burden of Diseases, Injuries, and Risk Factors Study 2016 (GBD 2016). 2018. 72. Pahan K. Lipid-lowering drugs. Cellular and molecular life sciences CMLS. 2006;63:1165-78. 73. Weitoft GR, Rosen M, Ericsson O, Ljung R. Education and drug use in Sweden--a nationwide register-based study. Pharmacoepidemiol Drug Saf. 2008;17(10):1020-8. 74. Achelrod D, Gray A, Preiss D, Mihaylova B. Cholesterol- and blood-pressure-lowering drug use for secondary cardiovascular prevention in 2004–2013 Europe. European Journal of Preventive Cardiology. 2020;24(4):426-36. 75. Aarnio E, Martikainen J, Winn AN, Huupponen R, Vahtera J, Korhonen MJ. Socioeconomic inequalities in statin adherence under universal coverage: does sex matter? Circulation: Cardiovascular Quality and Outcomes. 2016;9(6):704-13. 76. Hope HF, Binkley GM, Fenton S, Kitas GD, Verstappen SM, Symmons DP. Systematic review of the predictors of statin adherence for the primary prevention of cardiovascular disease. PLoS One. 2019;14(1):e0201196. 77. Flege MM, Kriegbaum M, Jørgensen HL, Lind BS, Bathum L, Andersen CL, et al. Associations between education level, blood-lipid measurements and statin treatment in a Danish primary health care population from 2000 to 2018. Scand J Prim Health Care. 2023;41(2):170-8. 78. Rothman KJ, Greenland S, Lash TL. Modern epidemiology: Wolters Kluwer Health/Lippincott Williams & Wilkins Philadelphia; 2008. 79. Arntsen SH, Borch KB, Wilsgaard T, Njølstad I, Hansen AH. Time trends in body height according to educational level. A descriptive study from the Tromsø Study 1979–2016. PLoS One. 2023;18(1):e0279965. 80. Hopstock LA, Bønaa KH, Eggen AE, Grimsgaard S, Jacobsen BK, Løchen M-L, et al. 
Longitudinal and secular trends in total cholesterol levels and impact of lipid-lowering drug use among Norwegian women and men born in 1905–1977 in the population-based Tromsø Study 1979– 2016. BMJ open. 2017;7(8):e015001. 81. Langholz PL, Wilsgaard T, Njølstad I, Jorde R, Hopstock LA. Trends in known and undiagnosed diabetes, HbA1c levels, cardiometabolic risk factors and diabetes treatment target achievement in repeated cross-sectional surveys: the population-based Tromsø study 1994–2016. BMJ open. 2021;11(3):e041846. 82. Silverman ME, Reichenberg A, Savitz DA, Cnattingius S, Lichtenstein P, Hultman CM, et al. The risk factors for postpartum depression: A population‐based study. Depression and anxiety. 2017;34(2):178-87. 83. Strandhagen E, Berg C, Lissner L, Nunez L, Rosengren A, Torén K, et al. Selection bias in a population survey with registry linkage: potential effect on socioeconomic gradient in cardiovascular risk. European Journal of Epidemiology. 2010;25(3):163-72. 84. Reinikainen J, Tolonen H, Borodulin K, Härkänen T, Jousilahti P, Karvanen J, et al. Participation rates by educational levels have diverged during 25 years in Finnish health examination surveys. European Journal of Public Health. 2017;28(2):237-43. 85. Rosendahl Jensen HA, Thygesen LC, Møller SP, Dahl Nielsen MB, Ersbøll AK, Ekholm O. The Danish Health and Wellbeing Survey: Study design, response proportion and respondent characteristics. Scandinavian journal of public health. 2022;50(7):959-67. 42 86. Van Loon AJM, Tijhuis M, Picavet HSJ, Surtees PG, Ormel J. Survey non-response in the Netherlands: effects on prevalence estimates and associations. Annals of epidemiology. 2003;13(2):105-10. 87. Langhammer A, Krokstad S, Romundstad P, Heggland J, Holmen J. The HUNT study: participation is associated with survival and depends on socioeconomic status, diseases and symptoms. BMC Medical Research Methodology. 2012;12(1):143. 88. Søgaard AJ, Selmer R, Bjertness E, Thelle D. 
The Oslo Health Study: The impact of self- selection in a large, population-based survey. Int J Equity Health. 2004;3(1):3. 89. Bopp M, Braun J, Faeh D, Group ftSNCS. Variation in Mortality Patterns Among the General Population, Study Participants, and Different Types of Nonparticipants: Evidence From 25 Years of Follow-up. American Journal of Epidemiology. 2014;180(10):1028-35. 90. Nadelson L, Jorcyk C, Yang D, Jarratt Smith M, Matson S, Cornell K, et al. I just don't trust them: the development and validation of an assessment instrument to measure trust in science and scientists. School Science and Mathematics. 2014;114(2):76-86. 91. Spitzer S. Biases in health expectancies due to educational differences in survey participation of older Europeans: It's worth weighting for. Eur J Health Econ. 2020;21(4):573-605. 92. Awadalla P, Boileau C, Payette Y, Idaghdour Y, Goulet J-P, Knoppers B, et al. Cohort profile of the CARTaGENE study: Quebec’s population-based biobank for public health and personalized genomics. International journal of epidemiology. 2013;42(5):1285-99. 93. Wamala SP, Mittleman MA, Schenck-Gustafsson K, Orth-Gomer K. Potential explanations for the educational gradient in coronary heart disease: a population-based case-control study of Swedish women. American Journal of Public Health. 1999;89(3):315-21. 94. Vathesatogkit P, Batty GD, Woodward M. Socioeconomic disadvantage and disease-specific mortality in Asia: systematic review with meta-analysis of population-based cohort studies. J Epidemiol Community Health. 2014;68(4):375-83. 95. Harald K, Salomaa V, Jousilahti P, Koskinen S, Vartiainen E. Non-participation and mortality in different socioeconomic groups: the FINRISK population surveys in 1972–92. Journal of Epidemiology and Community Health. 2007;61(5):449-54. 96. Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthc. 2016;9:211-7. 97. 
Ahlmark N, Algren MH, Holmberg T, Norredam ML, Nielsen SS, Blom AB, et al. Survey nonresponse among ethnic minorities in a national health survey–a mixed-method study of participation, barriers, and potentials. Ethnicity & health. 2015;20(6):611-32. 98. Künn S. The challenges of linking survey and administrative data. IZA World of Labor. 2015. 99. Müller F, Roberts C. Measurement error in self-and proxy reports of educational qualifications. A validation using administrative data. Unpublished Masters dissertation, University of Neuchâtel Available on request [Permission to cite gained: 220222]. 2017. 100. Battistin E, De Nadai M, Sianesi B. Misreported schooling, multiple measures and returns to educational qualifications. Journal of Econometrics. 2014;181(2):136-50. 101. Black D, Sanders S, Taylor L. Measurement of Higher Education in the Census and Current Population Survey. Journal of the American Statistical Association. 2003;98(463):545-54. 102. Statistics Norway. Norges 100 mest folkerike commune [Norway's 100 most populous communes] 2021 [cited 2022 02.February 2022]. Available from: https://www.ssb.no/befolkning/artikler-og-publikasjoner/norges-100-mest-folkerike- kommuner?tabell=446939. 103. Njølstad I, Mathiesen EB, Schirmer H, Thelle DS. The Tromsø study 1974–2016: 40 years of cardiovascular research. Scandinavian Cardiovascular Journal. 2016;50(5-6):276-81. 104. Jacobsen BK, Eggen AE, Mathiesen EB, Wilsgaard T, Njolstad I. Cohort profile: the Tromso Study. Int J Epidemiol. 2012;41(4):961-7. 105. Eggen AE, Mathiesen EB, Wilsgaard T, Jacobsen BK, Njolstad I. The sixth survey of the Tromso Study (Tromso 6) in 2007-08: collaborative research in the interface between clinical medicine and epidemiology: study objectives, design, data collection procedures, and attendance in a multipurpose population-based health survey. Scand J Public Health. 2013;41(1):65-80. 106. Hopstock LA, Grimsgaard S, Johansen H, Kanstad K, Wilsgaard T, Eggen AE. 
The seventh survey of the Tromsø Study (Tromsø7) 2015–2016: study design, data collection, attendance, and 43 prevalence of risk factors and disease in a multipurpose population-based health survey. Scandinavian Journal of Public Health. 2022;0(0):14034948221092294. 107. The Tromsø Study. The Fourth Tromsø Study 1994-1995 [cited 2023 10.aug]. Available from: https://uit.no/research/tromsostudy/project?pid=708901. 108. The Tromsø Study. The Fifth Tromsø Study 2001 [cited 2023 10.aug]. Available from: https://uit.no/research/tromsostudy/project?pid=708903. 109. Barrabés N, Østli GK. Norwegian Standard Classification of Education 2016. Oslo: Statistisk sentralbyrå; 2016. Contract No.: Documents 2017/02. 110. Holseter AM. Hvordan klassifiseres en persons høyeste utdanningsnivå? [How is a person's highest level of education classified?] Statistisk Sentralbyrå2019 [cited 2020 17.March 2020]. Available from: https://www.ssb.no/utdanning/artikler-og-publikasjoner/hvordan-klassifiseres-en- persons-hoyeste-utdanningsniva. 111. Statistics Norway. About Statistics Norway - An institution that counts [cited 2023 10.aug]. Available from: https://www.ssb.no/en/omssb/ssbs-virksomhet/tall-som-forteller. 112. Ljungvall Å, Gerdtham UG, Lindblad U. Misreporting and misclassification: implications for socioeconomic disparities in body-mass index and obesity. Eur J Health Econ. 2015;16(1):5-20. 113. Beyer A. z-scores and the Standard Normal Distribution. Introduction to Statistics for Psychology. 2021. 114. Hopstock L, Løvsletten O, Johansen H, Tiwari S, Njølstad I, Løchen M-L. Folkehelserapport: Den sjuende Tromsøundersøkelsen 2015-16 [Public health report: The seventh Tromsø survey 2015- 16]. Septentrio Reports. 2019. 115. GeoNorge. Statistiske enheter grunnkretser N.A [Available from: https://kartkatalog.geonorge.no/metadata/statistiske-enheter-grunnkretser/51d279f8-e2be-4f5e-9f72- 1a53f7535ec1. 116. OpenStreetMap. OpenStreetMap. N.A. 117. Hogan WR, Wagner MM. 
Accuracy of data in computer-based patient records. J Am Med Inform Assoc. 1997;4(5):342-55. 118. Warrens MJ. Conditional inequalities between Cohen’s kappa and weighted kappas. Statistical Methodology. 2013;10(1):14-22. 119. Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. 4 ed. Sudbury: Sudbury: Jones & Bartlett Learning, LLC; 2018. p. 140-51. 120. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360-3. 121. Grath-Lone LM, Jay MA, Blackburn R, Gordon E, Zylbersztejn A, Wiljaars L, et al. What makes administrative data "research-ready"? A systematic review and thematic analysis of published literature. Int J Popul Data Sci. 2022;7(1):1718. 122. Nygård G, Holseter AMR. New classification of educational attainment Statistics Norway: Statistics Norway; 2016 [cited 2021 22.02]. Available from: https://www.ssb.no/en/utdanning/artikler- og-publikasjoner/new-classification-of-educational-attainment. 123. Jentoft S. Imputation of missing data among immigrants in the Register of the Population's Level of Education (BU). 2014. 124. Statistics Norway. Income and wealth statistics for households 2022 [updated 21 December 2022; cited 2023 16.June]. Available from: https://www.ssb.no/en/inntekt-og-forbruk/inntekt-og- formue/statistikk/inntekts-og-formuesstatistikk-for-husholdninger. 125. Saarela J, Weber R. Assessment of educational misclassification in register-based data on Finnish immigrants in Sweden. Scand J Public Health. 2017;45(17_suppl):20-4. 126. Patino CM, Ferreira JC. Internal and external validity: can you apply research study results to your patients? Jornal brasileiro de pneumologia. 2018;44:183-. 127. Porta M. A dictionary of epidemiology: Oxford university press; 2014. 128. Åsvold BO, Langhammer A, Rehn TA, Kjelvik G, Grøntvedt TV, Sørgjerd EP, et al. Cohort Profile Update: The HUNT Study, Norway. International Journal of Epidemiology. 2022;52(1):e80- e91. 129. 
Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. American journal of epidemiology. 2017;186(9):1026-34. 44 130. Eiliv L, Merethe K, Tonje B, Anette H, Kjersti B, Elise E, et al. External validity in a population-based national prospective study–the Norwegian Women and Cancer Study (NOWAC). Cancer Causes & Control. 2003;14:1001-8. 131. Tolonen H, Dobson A, Kulathinal S. Effect on trend estimates of the difference between survey respondents and non-respondents: results from 27 populations in the WHO MONICA Project. European journal of epidemiology. 2005;20(11):887-98. 132. Coggon D, Barker D, Rose G. Epidemiology for the Uninitiated: John Wiley & Sons; 2009. 133. Monster TB, Janssen WM, de Jong PE, de Jong‐van den Berg LT, Group PS. Pharmacy data in epidemiological studies: an easy to obtain and reliable tool. Pharmacoepidemiology and drug safety. 2002;11(5):379-84. 134. Noize P, Bazin F, Pariente A, Dufouil C, Ancelin ML, Helmer C, et al. Validity of chronic drug exposure presumed from repeated patient interviews varied according to drug class. J Clin Epidemiol. 2012;65(10):1061-8. 135. Pham A, Cummings M, Lindeman C, Drummond N, Williamson T. Recognizing misclassification bias in research and medical practice. Family Practice. 2019;36(6):804-7. 136. Nepomuceno MR, Turra CM. Assessing the quality of education reporting in Brazilian censuses. Demographic Research. 2020;42:441-60. 137. Bingley P, Martinello A. Measurement error in the Survey of Health, Ageing and Retirement in Europe: A validation study with administrative data for education level, income and employment. Work Pap Ser. 2014;16:2014. 138. Kleven Ø, Ringdal K. Causes and effects of measurement errors in educational attainment.: Statistisk sentralbyrå; 2020. Report No.: 2535-7271. 139. Pourhoseingholi MA, Baghestani AR, Vahedi M. 
How to control confounding effects by statistical analysis. Gastroenterology and hepatology from bed to bench. 2012;5(2):79. 140. Nummela O, Sulander T, Helakorpi S, Haapola I, Uutela A, Heinonen H, et al. Register-based data indicated nonparticipation bias in a health study among aging people. Journal of Clinical Epidemiology. 2011;64(12):1418-25. 141. Bots SH, Peters SA, Woodward M. Sex differences in coronary heart disease and stroke mortality: a global assessment of the effect of ageing between 1980 and 2010. BMJ global health. 2017;2(2):e000298. 142. Christensen AI, Lau CJ, Kristensen PL, Johnsen SB, Wingstrand A, Friis K, et al. The Danish National Health Survey: Study design, response rate and respondent characteristics in 2010, 2013 and 2017. Scandinavian Journal of Public Health. 2020:1403494820966534. 143. Chou P, Kuo H-S, Chen C-H, Lin H-C. Characteristics of non-participants and reasons for non-participation in a population survey in Kin-Hu, Kinmen. European journal of epidemiology. 1997;13:195-200. 144. Demarest S, Van der Heyden J, Charafeddine R, Tafforeau J, Van Oyen H, Van Hal G. Socio- economic differences in participation of households in a Belgian national health survey. The European Journal of Public Health. 2013;23(6):981-5. 145. Selmer R, Sögaard AJ, Bjertness E, Thelle D. The Oslo Health Study: reminding the non- responders - effects on prevalence estimates. Norsk epidemiologi. 2003;13(1):89. 146. Tolonen H, Lundqvist A, Jääskeläinen T, Koskinen S, Koponen P. Reasons for non- participation and ways to enhance participation in health examination surveys—the Health 2011 Survey. European Journal of Public Health. 2017;27(5):909-11. 147. Korkeila K, Suominen S, Ahvenainen J, Ojanlatva A, Rautava P, Helenius H, et al. Non- response and related factors in a nation-wide health survey. European journal of epidemiology. 2001;17(11):991-9. 148. Carlsson F, Merlo J, Lindström M, Östergren P-O, Lithman T. 
Representativity of a postal public health questionnaire survey in Sweden, with special reference to ethnic differences in participation. Scandinavian Journal of Public Health. 2006;34(2):132-9. 149. Lunde ES, Otnes B, Ramm J. Sosial ulikhet i bruk av helsetjenester. En kartlegging [Social inequality in the use of health services. A mapping]. Statistisk sentralbyrå; 2017. 150. Marcus ME, Manne-Goehler J, Theilmann M, Farzadfar F, Moghaddam SS, Keykhaei M, et al. Use of statins for the prevention of cardiovascular disease in 41 low-income and middle-income 45 countries: a cross-sectional study of nationally representative, individual-level data. Lancet Glob Health. 2022;10(3):e369-e79. 151. Varmdal T, Mathiesen EB, Wilsgaard T, Njølstad I, Nyrnes A, Grimsgaard S, et al. Validating acute myocardial infarction diagnoses in national health registers for use as endpoint in research: the Tromsø study. Clinical Epidemiology. 2021:675-82. 152. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276-82. 153. Sim J, Wright CC. The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Physical Therapy. 2005;85(3):257-68. 154. rkappa - stata manuals [cited 2023 10.aug.2023]. Available from: https://www.stata.com/manuals/rkappa.pdf. 155. Tiwari S, Cerin E, Wilsgaard T, Løvsletten O, Njølstad I, Grimsgaard S, et al. Lifestyle factors as mediators of area-level socio-economic differentials in cardiovascular disease risk factors. The Tromsø Study. SSM-Population Health. 2022;19:101241. 156. Knudsen AK, Hotopf M, Skogen JC, Overland S, Mykletun A. The health status of nonparticipants in a population-based health study: the Hordaland Health Study. Am J Epidemiol. 2010;172(11):1306-14. 157. Dryden R, Williams B, McCowan C, Themessl-Huber M. What do we know about who does and does not attend general health checks? Findings from a narrative scoping review. BMC public health. 2012;12:1-23. 158. 
Cheung KL, Ten Klooster PM, Smit C, de Vries H, Pieterse ME. The impact of non-response bias due to sampling in public health studies: A comparison of voluntary versus mandatory recruitment in a Dutch national survey on adolescent health. BMC public health. 2017;17(1):1-10. 159. Brenner PS, DeLamater J. Lies, Damned Lies, and Survey Self-Reports? Identity as a Cause of Measurement Bias. Soc Psychol Q. 2016;79(4):333-54. 160. Kristensen P, Corbett K, Mohn FA, Hanvold TN, Mehlum IS. Information bias of social gradients in sickness absence: a comparison of self-report data in the Norwegian Mother and Child Cohort Study (MoBa) and data in national registries. BMC Public Health. 2018;18(1):1275. 161. Cheung BM, Lauder IJ, Lau CP, Kumana CR. Meta‐analysis of large randomized controlled trials to evaluate the impact of statins on cardiovascular outcomes. British journal of clinical pharmacology. 2004;57(5):640-51. 162. Taylor F, Ward K, Moore TH, Burke M, Smith GD, Casas JP, et al. Statins for the primary prevention of cardiovascular disease. Cochrane database of systematic reviews. 2011(1). 163. Kytö V, Saraste A, Tornio A. Early statin use and cardiovascular outcomes after myocardial infarction: a population-based case-control study. Atherosclerosis. 2022;354:8-14. 164. Svensson E, Nielsen RB, Hasvold P, Aarskog P, Thomsen RW. Statin prescription patterns, adherence, and attainment of cholesterol treatment goals in routine clinical care: a Danish population- based study. Clin Epidemiol. 2015:213-23. 165. Kubota Y, Heiss G, MacLehose R, Roetker N, Folsom A. Association of educational attainment with lifetime risk of cardiovascular disease: the Atherosclerosis Risk in Communities Study. JAMA Intern Med. 2017; 177 (8): 1165–72. 2017. 166. Lunde ES, Ramm J. Sosial ulikhet i bruk av helsetjenester–2 [Social inequality in the use of health services. A mapping-2]. En kartlegging. 2017;16. 167. Cangemi R, Romiti GF, Campolongo G, Ruscio E, Sciomer S, Gianfrilli D, et al. 
Gender related differences in treatment and response to statins in primary and secondary cardiovascular prevention: the never-ending debate. Pharmacological research. 2017;117:148-55. 168. Karalis DG, Wild RA, Maki KC, Gaskins R, Jacobson TA, Sponseller CA, et al. Gender differences in side effects and attitudes regarding statin use in the Understanding Statin Use in America and Gaps in Patient Education (USAGE) study. Journal of clinical lipidology. 2016;10(4):833-41. 169. Nanna MG, Wang TY, Xiang Q, Goldberg AC, Robinson JG, Roger VL, et al. Sex differences in the use of statins in community practice: patient and provider assessment of lipid management registry. Circulation: Cardiovascular Quality and Outcomes. 2019;12(8):e005562. 46 170. Hunt NB, Emmens JE, Irawati S, de Vos S, Bos JHJ, Wilffert B, et al. Sex disparities in the effect of statins on lipid parameters: The PharmLines Initiative. Medicine (Baltimore). 2022;101(2):e28394. 171. Rydland HT. Medical innovations can reduce social inequalities in health: an analysis of blood pressure and medication in the HUNT study. Health Sociology Review. 2021;30(2):171-87. 172. Gitsels LA, Bakbergenuly I, Steel N, Kulinskaya E. Do statins reduce mortality in older people? Findings from a longitudinal study using primary care records. Family Medicine and Community Health. 2021;9(2). 173. Damiani G, Federico B, Anselmi A, Bianchi CBNA, Silvestrini G, Iodice L, et al. The impact of Regional co-payment and National reimbursement criteria on statins use in Italy: an interrupted time-series analysis. BMC health services research. 2014;14:1-8. 174. Ingersgaard MV, Helms Andersen T, Norgaard O, Grabowski D, Olesen K. Reasons for nonadherence to statins–a systematic review of reviews. Patient preference and adherence. 2020:675- 91. Paper I Vo et al. BMC Public Health (2023) 23:994 https://doi.org/10.1186/s12889-023-15928-w RESEARCH ARTICLE Open Access © The Author(s) 2023. 
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

BMC Public Health

Comparing the sociodemographic characteristics of participants and non-participants in the population-based Tromsø Study

Chi Quynh Vo1*, Per-Jostein Samuelsen1,2, Hilde Leikny Sommerseth3, Torbjørn Wisløff4, Tom Wilsgaard1 and Anne Elise Eggen1

Abstract

Background Differences in the sociodemographic characteristics of participants and non-participants in population-based studies may introduce bias and reduce the generalizability of research findings. This study aimed to compare the sociodemographic characteristics of participants and non-participants of the seventh survey of the Tromsø Study (Tromsø7, 2015–16), a population-based health survey.

Methods A total of 32,591 individuals were invited to Tromsø7.
We compared the sociodemographic characteristics of participants and non-participants by linking the Tromsø7 invitation file to Statistics Norway, and explored the association between these characteristics and participation using logistic regression. Furthermore, we created a geographical socioeconomic status (area SES) index (low-SES, medium-SES, and high-SES area) based on individual educational level, individual income, total household income, and residential ownership status. We then mapped the relationship between area SES and participation in Tromsø7.

Results Men, people aged 40–49 and 80–89 years, and those who were unmarried, widowed, or separated/divorced, were born outside of Norway, had lower education, had lower income, were residential renters, or lived in a low-SES area had a lower probability of participation in Tromsø7.

Conclusions Sociodemographic differences in participation must be considered to avoid biased estimates in research based on population-based studies, especially when the relationship between SES and health is being explored. Particular attention should be paid to the recruitment of groups with lower SES to population-based studies.

Keywords Epidemiological studies, Sociodemographic characteristics, Survey, Area socioeconomic status

*Correspondence: Chi Quynh Vo, chi.q.vo@uit.no
Full list of author information is available at the end of the article

Background

Population-based studies are important, as they are often used as a source of data on determinants of health and as a source of information on people's health status [1]. As such, these surveys should adequately reflect the target population for the relevant indicators. A problem with population-based studies is that participation is voluntary, so people can choose not to participate.
Non-participation can reduce the precision of estimates and, more seriously, may introduce selection bias if both the exposure and the outcome under investigation affect the probability of participation; it may also reduce the generalizability of the results [2]. The presence of selection bias cannot usually be inferred from the study data alone; participation studies are therefore necessary to identify any underrepresented subgroups [3]. Knowledge of the characteristics of non-participants may help to improve recruitment procedures and representativeness, leading to more accurate assumptions and conclusions in population-based studies, i.e., estimations of prevalence and incidence, and associations between exposures and outcomes.

Sociodemographic characteristics refer to a combination of social and demographic factors [4], including socioeconomic status (SES), which is often measured by an individual's educational attainment, occupation, and income [5]. Individuals with low SES have been reported to have poorer health status and to be less likely to participate in health surveys compared with individuals with high SES [6–10]. Men, people who are unmarried, and those with low education or low income are also less likely to participate, according to previous studies [10–13]. The association between participation and age [14–16] or belonging to an ethnic minority [11, 17] is inconsistent in the literature.

National registers with high-quality individual-level data can be useful in providing information on non-participants, which can be compared with information on participants. The present study used register data to compare the sociodemographic characteristics of participants and non-participants of the seventh survey of the Tromsø Study (Tromsø7).

Methods

Study population

The Tromsø Study is an ongoing population-based health survey.
It currently consists of seven surveys (Tromsø1–7) conducted between 1974 and 2016 in the municipality of Tromsø, Northern Norway. The study population consists of complete birth cohorts and random samples [18, 19]. Tromsø7 was carried out in 2015–2016, and all inhabitants aged 40 years and above in the municipality of Tromsø were invited to participate. A total of 32,591 eligible individuals were invited and 65% participated in Tromsø7 [20].

Linkage to Statistics Norway

Information on sociodemographic characteristics recorded in Statistics Norway (SSB), which covers the entire Norwegian population, was linked with data from the Tromsø7 invitation file, which covered all 32,591 invited individuals, using the unique 11-digit personal identification number assigned to each resident of Norway at birth or immigration. SSB performed the linkage, after which all personal identification numbers were deleted.

Sociodemographic characteristics of participants and non-participants

All sociodemographic characteristics of participants and non-participants of Tromsø7 were taken from the SSB, including age (10-year age intervals), sex, and marital status (married, unmarried, widow(er), and divorced/separated). The category "divorced/separated" included the subgroups separated (n = 517), separated partnership (n = 4), and divorced partner (n = 25). The category "married" included registered partnerships (n = 20). Data were also collected on country of birth, which was categorized into four broad groups: Norway, Western countries (Western Europe, North America, and Oceania), Eastern Europe (including Russia), and other countries (Asia, Africa, and South America). Individuals born in Norway were further categorized into three regions of birth: Tromsø, Northern Norway (Finnmark, Troms, and Nordland), and South Norway (counties south of Nordland).
Finally, information was extracted on the highest completed educational level (primary education, upper secondary education, college/university < 4 years, and college/university ≥ 4 years), income (defined as individual income and total household income and categorized as in the Tromsø Study questionnaire: ≤ 250,000 Norwegian kroner (NOK) to ≥ 750,000 NOK), and residential ownership status (owner or renter).

Statistical analyses

Descriptive characteristics were presented as number (percent). Sex-specific binary logistic regression analyses were used to estimate odds ratios (ORs) and corresponding 95% confidence intervals (CIs) of participation in unadjusted and age-adjusted models. The model for area SES was additionally adjusted for individual-level socioeconomic status.

Individual-level SES was calculated based on educational level, individual income, total household income, and residential ownership status. For each of these four variables, a Z-score was calculated, and the four Z-scores were then summed to give an individual-level SES score. We also created a geographical SES index, based on 36 geographical subdivisions of the municipality of Tromsø defined in a local Public Health report [21]. These geographical subdivisions are based on the basic geographical and statistical units of the municipality of Tromsø, in order to establish small, stable geographical units that give a flexible basis for the presentation of regional statistics [22]. The geographical SES index was calculated as the average individual-level SES score in each of the geographical subdivisions, resulting in a continuous variable ranging from -1.73 to 1.24, and then categorized as low-SES area, medium-SES area, or high-SES area, based on tertiles using the command xtile in the statistical program Stata.
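The index construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Stata code: the six records are invented toy data, the numeric coding of education and ownership is hypothetical, and the rank-based `tertile_labels` helper is a simplified stand-in for Stata's xtile, which uses percentile cut points and may treat ties differently.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize numbers to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def tertile_labels(values):
    """Label each value 1 (lowest third) to 3 (highest third) by rank.

    Simplified stand-in for Stata's xtile; ties are broken by input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 1 + (3 * rank) // len(values)
    return labels


# Hypothetical toy records, NOT study data:
# (area id, education level 1-4, individual income, household income, owner flag)
people = [
    ("A", 1, 200_000, 350_000, 0),
    ("A", 2, 300_000, 500_000, 1),
    ("B", 2, 420_000, 700_000, 1),
    ("B", 3, 480_000, 820_000, 1),
    ("C", 4, 650_000, 990_000, 1),
    ("C", 3, 560_000, 900_000, 1),
]

# One Z-score per person for each of the four SES variables, then summed.
columns = list(zip(*[p[1:] for p in people]))
standardized = [z_scores(col) for col in columns]
ses_score = [sum(zs) for zs in zip(*standardized)]  # individual-level SES score

# Area SES index: mean individual-level score within each subdivision.
area_scores = {}
for (area, *_), score in zip(people, ses_score):
    area_scores.setdefault(area, []).append(score)
area_mean = {area: mean(scores) for area, scores in area_scores.items()}

# Categorize areas into low / medium / high by tertile of the area mean.
areas = sorted(area_mean)
labels = tertile_labels([area_mean[a] for a in areas])
area_ses = dict(zip(areas, (("low", "medium", "high")[k - 1] for k in labels)))
print(area_ses)
```

Because each Z-score has mean zero, the summed individual-level scores also average to zero across the whole sample, which is consistent with an area index that takes both negative and positive values.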
Participation in Tromsø7 within each of the 36 geographical subdivisions was also divided into tertiles: low (59.3%), medium (66.7%), and high (68.5%), and the spatial distribution of SES areas and participation in the 36 geographical subdivisions was graphed using choropleth maps.

Analyses were performed in Stata 16.0 (StataCorp, College Station, TX, USA). Choropleth maps were created in Python 3 (using mainly the pandas, geopandas, and plotly express packages). A GeoJSON file was collected from the Norwegian Mapping Authority [23], while a base map from OpenStreetMap [24] was used.

Results

A total of 32,591 individuals were invited to Tromsø7, of whom 11,508 (35%) did not participate. The mean age of participants and non-participants was 57.3 years and 57.6 years, respectively. The median individual and total household income for participants were 431,799 NOK (IQR: 8680 – 585,830 NOK) and 725,354 NOK (IQR: 489,059 – 943,548 NOK), respectively. The corresponding figures for non-participants were 244,083 NOK (IQR: 0 – 524,675 NOK) and 546,086 NOK (IQR: 321,302 – 831,602 NOK). The sociodemographic distribution of participants differed from that of non-participants (Table 1). In both women and men, those who were unmarried, widowed, or separated/divorced, were born outside of Norway, had lower education, had lower income, were residential renters, or lived in a low-SES area had a lower probability of participation (Fig. 1 and Supplementary Table 1).

Men were less likely to participate than women (age-adjusted OR 0.79, 95% CI 0.75 – 0.82, analysis not shown). Invitees aged 80–99 years were less likely to participate (women: OR 0.27, 95% CI 0.24 – 0.31; men: OR 0.76, 95% CI 0.65 – 0.89) compared to the youngest age group (40–49 years) and other age groups. However, the youngest age group was less likely to participate than those aged 50–79 years in both sexes.
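As a rough check on the direction of the sex difference, a crude (unadjusted) odds ratio can be computed directly from the participant and non-participant counts by sex given in Table 1. The sketch below is an illustration only: it uses a Wald interval on the log-odds scale rather than the logistic regression models fitted in Stata, and the function name `odds_ratio_wald` is hypothetical. Because it ignores age, the crude estimate need not equal the age-adjusted OR of 0.79 reported above.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) with a Wald 95% CI computed on the log scale.

    a, b: events and non-events in the group of interest
    c, d: events and non-events in the reference group
    """
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from Table 1: participants / non-participants by sex.
men_yes, men_no = 10_010, 6_044
women_yes, women_no = 11_073, 5_464

or_men, ci_low, ci_high = odds_ratio_wald(men_yes, men_no, women_yes, women_no)
print(f"Crude OR, men vs women: {or_men:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

An interval that lies entirely below 1 points in the same direction as the adjusted result, namely lower odds of participation among men than among women.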
The odds of participation were highest among those with an educational level of college/university < 4 years, for both women (OR 2.20, 95% CI 1.99 – 2.42) and men (OR 2.22, 95% CI 2.00 – 2.47). Participation decreased with decreasing individual and total household income for men. Among women, those with medium individual income (450,000–549,999 NOK) were more likely to participate than those with the highest individual income, while women with the lowest individual income were less likely to participate. Lastly, individuals living in medium- and high-SES areas had higher odds of participating than those living in low-SES areas, after adjustment for individual-level SES. However, the estimated effect of area SES was not very strong (women: OR 1.24, 95% CI 1.13 – 1.35; men: OR 1.17, 95% CI 1.08 – 1.28). Individual-level SES showed a stronger effect, and those with high individual-level SES were around three times more likely to participate than those with low individual-level SES, in both sexes.

Generally, individuals living in high-SES areas, located on the west side of the city, had higher participation. None of the low-SES areas had high participation, but not all high-SES areas had high participation, and there was more variation in participation in medium-SES areas (Fig. 2).

Discussion

This study showed that men, people aged 40–49 and 80–89 years, and those who were unmarried, widowed, or separated/divorced, were born outside of Norway, had lower education, had lower income, were residential renters, or lived in a low-SES area had a lower probability of participation in Tromsø7.

In accordance with results from Norwegian [9, 25, 26], Finnish [27, 28], and Dutch [29] studies, our study found that men were less likely to participate than women. In a previous Finnish study, women were found to engage more frequently in health behavior and to seek health-related information more often than men [30].
The tendency of men to have lower interest in participating in population-based studies has also been shown previously [31], and previous surveys of the Tromsø Study have had lower participation among men [18, 19]. Men in the age group 40–49 years were specifically targeted during the planning of Tromsø7 in an attempt to increase their participation [20].

Table 1 Distribution of sociodemographic characteristics among participants and non-participants by sex, Tromsø7 (2015–2016)

| Characteristic | Women (n = 16,537): participants, n = 11,073 (%) | Women: non-participants, n = 5464 (%) | Men (n = 16,054): participants, n = 10,010 (%) | Men: non-participants, n = 6044 (%) |
|---|---|---|---|---|
| Age, years | | | | |
| 40–49 | 3377 (30.5) | 1816 (33.3) | 3055 (30.5) | 2509 (41.4) |
| 50–59 | 3245 (29.3) | 1289 (23.6) | 2790 (27.9) | 1537 (25.4) |
| 60–69 | 2677 (24.2) | 909 (16.6) | 2502 (25.0) | 1041 (17.2) |
| 70–79 | 1361 (12.3) | 640 (11.7) | 1315 (13.1) | 582 (9.6) |
| 80–99 | 413 (3.7) | 810 (14.8) | 348 (3.5) | 375 (6.2) |
| Marital status | | | | |
| Married | 5768 (52.1) | 2096 (38.4) | 6023 (60.2) | 2634 (43.6) |
| Unmarried | 2525 (22.8) | 1429 (26.1) | 2491 (24.9) | 2259 (37.3) |
| Widowed | 850 (7.7) | 899 (16.5) | 207 (2.0) | 193 (3.2) |
| Separated/divorced | 1930 (17.4) | 1040 (19.0) | 1289 (12.9) | 958 (15.9) |
| Country of birth (a) | | | | |
| Norway | 10,328 (93.3) | 4848 (88.7) | 9464 (94.5) | 5048 (83.5) |
| Western countries | 403 (3.6) | 215 (3.9) | 354 (3.5) | 362 (6.0) |
| Eastern Europe | 138 (1.2) | 197 (3.6) | 63 (0.6) | 366 (6.1) |
| Other countries | 204 (1.8) | 204 (3.7) | 129 (1.3) | 268 (4.4) |
| Region of birth (b) | | | | |
| Tromsø | 4084 (39.5) | 1817 (37.5) | 3966 (41.9) | 2201 (43.6) |
| Northern Norway (c) | 3674 (35.6) | 1719 (35.4) | 3125 (33.0) | 1556 (30.8) |
| South Norway (d) | 2570 (24.9) | 1312 (27.1) | 2373 (25.1) | 1291 (25.6) |
| Educational level | | | | |
| Primary | 1875 (17.0) | 1655 (30.9) | 1612 (16.2) | 1516 (25.9) |
| Upper secondary | 4071 (36.9) | 1784 (33.2) | 4428 (44.9) | 2406 (41.1) |
| College/university < 4 years | 3589 (32.6) | 1293 (24.1) | 2299 (23.1) | 1030 (17.6) |
| College/university ≥ 4 years | 1486 (13.5) | 632 (11.8) | 1576 (15.8) | 900 (15.4) |
| Individual income (NOK) (e) | | | | |
| < 249,999 | 4474 (40.4) | 3151 (58.2) | 3214 (32.1) | 2583 (43.0) |
| 250,000–349,999 | 660 (6.0) | 288 (5.3) | 309 (3.1) | 289 (4.8) |
| 350,000–449,999 | 1642 (14.8) | 596 (11.0) | 844 (8.4) | 613 (10.2) |
| 450,000–549,999 | 1978 (17.9) | 589 (10.9) | 1597 (16.0) | 772 (12.8) |
| 550,000–749,999 | 1746 (15.8) | 550 (10.2) | 2263 (22.6) | 926 (15.6) |
| ≥ 750,000 | 568 (5.1) | 239 (4.4) | 1775 (17.8) | 820 (13.6) |
| Total household income (NOK) (e) | | | | |
| < 249,999 | 543 (4.9) | 958 (17.7) | 319 (3.2) | 797 (13.3) |
| 250,000–349,999 | 1040 (9.4) | 756 (14.0) | 559 (5.6) | 728 (12.1) |
| 350,000–449,999 | 1219 (11.0) | 576 (10.6) | 774 (7.7) | 680 (11.3) |
| 450,000–549,999 | 1161 (10.5) | 611 (11.3) | 947 (9.5) | 667 (11.1) |
| 550,000–749,999 | 2268 (20.5) | 896 (16.6) | 2333 (23.3) | 1084 (18.1) |
| ≥ 750,000 | 4839 (43.7) | 1616 (29.8) | 5071 (50.7) | 2048 (34.1) |
| Residential ownership status | | | | |
| Owner | 10,208 (92.2) | 4269 (80.6) | 9245 (92.4) | 4635 (77.5) |
| Renter | 860 (7.8) | 1025 (19.4) | 761 (7.6) | 1349 (22.5) |
| Area SES | | | | |
| Low | 3526 (31.8) | 2128 (38.9) | 3142 (31.4) | 2447 (40.5) |
| Medium | 4028 (36.4) | 1891 (34.6) | 3731 (37.3) | 1986 (32.8) |
| High | 3519 (31.8) | 1445 (26.5) | 3137 (31.3) | 1611 (26.7) |

In the literature, evidence regarding study participation and age is much less consistent. We found that people aged 40–49 and 80–99 years were less likely to participate, whereas some studies have found that age does not affect participation [16], others found that younger individuals (40–49 years old) were more likely to participate [15], and still others found higher participation among older (> 60 years) individuals [14, 32]. Lower participation among the oldest age group could be associated with poorer health among the very old [27, 28]; however, findings from another study suggested that older people's health conditions do not affect survey participation [33]. Different explanations for participation in health surveys have been explored earlier [31, 34, 35].
Older persons (≥ 65 years) tend to regard participation in population-based research as a civic duty, while lower participation among younger individuals may be due to a lack of time and a perception that their health is good [31, 34].

It has been suggested that marriage may encourage positive health behaviors, which accumulate over time and facilitate desirable health outcomes [36]. We observed that people with marital statuses other than married were less likely to participate than married individuals in both sexes. This is in accordance with other population-based studies [16, 25, 37]. Previous studies have highlighted the increased health and survival among married individuals compared to unmarried individuals [38, 39], which seems to be the case for men in particular [39, 40]. A possible explanation was proposed in a qualitative study on participants and non-participants of community health screening, which found that the decision to participate in screening is often made by a partner [41]. Sala et al. [33] reported that, among couples, if one partner took part in a health survey the other was more likely to respond as well.

According to several studies, individuals born in the country where a survey is conducted are more likely to participate than those born outside of the country [9, 11, 12, 15]. Even though the municipality of Tromsø is currently the 12th most populous in Norway, it has relatively few immigrants (16% in 2021) compared to other populous municipalities in the country [42, 43]. Furthermore, the Tromsø Study questionnaires are in Norwegian, and to participate in the Tromsø Study, individuals had to master the Norwegian language. In an Australian study, speaking the same language at home as was used in the questionnaire was associated with higher odds of participation [15]. This indicates that language difficulties hinder participation.
In our study, higher educational level, higher total household income, and being a residential owner were all socioeconomic factors associated with an increased probability of participation. Prior literature has also reported that participation was more likely among individuals with high educational level or high income [7–11, 44] and among residential owners [14, 15, 37]. Bopp et al. [37] suggested that residential owners are more likely to participate because they move less frequently and are therefore easier to track. Education is considered an important social determinant of health, as it helps to promote and sustain healthy lifestyles and positive health choices [45]. Nadelsen et al. [46] found that as years of college increased, trust in science also increased. Furthermore, the authors suggested that people with more education are more likely to have a deeper understanding of science and the work of scientists, and are thus more likely to be engaged in critical examinations of scientific issues. For instance, UiT The Arctic University of Norway and The University Hospital of North Norway are among the largest public workplaces in the municipality of Tromsø [47], and their employees belong to occupational groups with a higher educational level. As research is a part of their work tasks, they have a deeper understanding of science, and their willingness to participate might be higher than that observed in other workplaces.

Table 1 (continued)

| Characteristic | Women (n = 16,537): participants, n = 11,073 (%) | Women: non-participants, n = 5464 (%) | Men (n = 16,054): participants, n = 10,010 (%) | Men: non-participants, n = 6044 (%) |
|---|---|---|---|---|
| Individual-level SES | | | | |
| Low | 3289 (29.9) | 2597 (50.5) | 2249 (22.6) | 2490 (43.3) |
| Medium | 4358 (39.5) | 1509 (29.3) | 3669 (36.9) | 1699 (29.5) |
| High | 3372 (30.6) | 1042 (20.2) | 4039 (40.5) | 1567 (27.2) |

Percentage calculated to equal 100% in column
NOK Norwegian kroner, SES Socioeconomic status, EUR Euro, USD United States dollar
(a) Western countries (Western Europe, North America, and Oceania), Eastern Europe (including Russia), and Others (Asia, Africa, and Southern America)
(b) Among individuals born in Norway
(c) Northern Norway: County of Troms, Nordland, and Finnmark, excluding Tromsø
(d) South Norway: Counties south of Nordland County
(e) 100,000 NOK = 10,480 EUR/11,526 USD

In addition, different employers in Tromsø were asked to give their employees time off from work to participate in Tromsø7 [48]. Indeed, a Norwegian qualitative study showed that reasons for not participating in a population-based study included difficulty in taking a day off from work and loss of salary during participation [34]. This might apply especially to individuals in low-income groups, as they are more financially vulnerable than those with higher income. Furthermore, in this qualitative study, an informant suggested that if people were to get paid by their employer to participate in health research, more people might participate [34]. Some have suggested providing modest financial compensation for lost work time and travel expenses as a token of appreciation to increase participation rates [28, 34]. Olsen et al. [49] found that a scratch lottery ticket incentive increased participation among individuals with lower education, and this might apply to low-income groups as well.
However, these approaches are expensive, especially for a population-based study whose target group is the entire general population.

Individuals with low SES not only participate less than individuals with high SES, but their participation also decreases over time, according to a follow-up study of a randomized controlled trial [50]. Indeed, participation in health surveys decreases over time at all educational levels, though the decline seems fastest for those with low education [44].

Fig. 1 Age-adjusted odds ratios for participation by sex, Tromsø7 (2015–2016). *Reference group. **Additionally adjusted for individual-level socioeconomic status

Fig. 2 Choropleth maps of socioeconomic status (SES) areas (A) and participation (%) in the Tromsø Study (B) in 36 subdivisions of the municipality of Tromsø, Tromsø7 (2015–2016). Maps: Kartverket (CC-BY 4.0), Carto/OpenStreetMap© (CC BY-SA 2.0), MapBox©. SES area based on the average individual-level SES score in each geographical subdivision [21]. SES: socioeconomic status

It has been suggested that participation may not depend only on individual characteristics, but also on geographical features [51]. Low SES of the surrounding area has been reported to be associated with lower participation in cohort studies [14, 16, 51]. This is consistent with our findings, which showed that those living in high-SES areas were more likely to participate than those living in low-SES areas. Sala et al. [33] found that participation among women was associated with their socioeconomic background and the wealth of their residential area. Bender et al. [52] hypothesize that residents of deprived neighborhoods may lack the social support needed to participate in health checks and may also be less trusting of public health authorities.
The literature is generally consistent in showing that those living in economically disadvantaged areas have poorer health [53, 54]. For instance, the prevalence of diabetes mellitus has been found to be lower for those living in areas with medium and high SES for both women and men, compared to those living in areas with low SES [55]. Those who volunteer to participate in health surveys are often more likely to have favorable exposures and health profiles compared to those who do not [14, 56, 57]. Sociodemographic differences in participation can lead to bias in the population-level estimates and in the associations with health status and health behaviors. For instance, low educational level and low income are both positively associated with unhealthy dietary habits [58]. If low educational level is also associated with non-participation, as we have shown, and unhealthy dietary habits are associated with the probability of participation, any resultant associations will be biased (selection bias). A false conclusion might thus be drawn about the health status of the population. Furthermore, non-participation can lead to an underestimation of the prevalence of health indicators and harmful health behaviors [29], as well as reduced precision of estimates.

Efforts should be made to recruit subgroups that we have shown to be underrepresented in our study. For example, our findings show non-participation by area SES. This can help the Tromsø Study and other population-based studies when planning recruitment for future surveys. However, sending extra reminders has shown little impact on the sociodemographic distribution among participants, so other methods to increase the participation of underrepresented groups should be explored [59].

Strengths and limitations

The main strength of this study is the linkage of information on sociodemographic characteristics from the Tromsø Study to that from the SSB, a national register.
Another strength is the use of a large population-based study with reasonably high participation. Our study provides an overview of the representativeness of the Tromsø Study regarding a variety of sociodemographic characteristics. A potential limitation of this study is that we have categorized continuous variables; as there are no perfect cut-off points for variables like income, information might be lost in categorization. Errors in the collection and processing of the income data are unavoidable, even for administrative data. Income information for employed individuals was based on complete registration from employers and other administrative data. SSB has carried out extensive work to minimize errors, and we consider errors to be relatively insignificant. Income information for self-employed individuals in Norway is self-reported, but they are required to report accurate taxable income, which is carefully controlled by the tax authorities.

In conclusion, sociodemographic differences in participation must be considered to avoid biased estimates in research based on population-based studies, especially when the relationship between SES and health is being explored. Particular attention should be paid to the recruitment of groups with lower SES to population-based studies.

Abbreviations
CI Confidence interval
DPIA Data Protection Impact Assessment
IQR Interquartile range
NSD The Norwegian Data Protection Authority
OR Odds ratio
REK North The Regional Committee for Medical and Health Research Ethics North
SES Socioeconomic status
SSB Statistics Norway

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12889-023-15928-w.
Additional file 1: Supplementary Table 1. Odds ratios for participation by sex, Tromsø7 (2015-2016).

Acknowledgements
We sincerely thank participants for participating in the Tromsø Study.
We thank Dr Sweta Tiwari for meaningful discussions regarding the categorization of area SES.

Authors' contributions
All authors contributed to the study conception and design. Material preparation and analysis were performed by CQV. Project administration was performed by CQV and AEE. Choropleth maps were created by PJS with Python 3. The first draft of the manuscript was written by CQV, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding
Open access funding provided by UiT The Arctic University of Norway (incl University Hospital of North Norway). This study received funding from the High North Population Studies, UiT The Arctic University of Norway.

Availability of data and materials
The data that support the findings of this study are available from Statistics Norway (SSB), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Declarations

Ethics approval and consent to participate
This study is based on secondary use of data from an administrative registry. This study was not defined as health research by The Regional Committee for Medical and Health Research Ethics North (REK North) and thus was exempted from the requirement of study preapproval. We have written consent from participants. For non-participants, who have not consented to the project, the legal basis for the processing is the performance of a task in the public interest. The project has therefore conducted a Data Protection Impact Assessment (DPIA), and the project was approved on 22.11.2021 by the Data Protection Officer at UiT The Arctic University of Norway (ref. 809230).

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.
Author details
1 Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, N-9037 Tromsø, Norway.
2 Regional Medicines Information and Pharmacovigilance Centre (RELIS), University Hospital of North Norway, Tromsø, Norway.
3 The Norwegian Historical Data Centre, Department of Archaeology, History, Religious Studies and Theology, UiT The Arctic University of Norway, Tromsø, Norway.
4 Health Services Research Unit, Akershus University Hospital, Lørenskog, Norway.

Received: 28 September 2022 Accepted: 17 May 2023

References
1. Ezzati-Rice TM, Curtin LR. Population-based surveys and their role in public health. Am J Prev Med. 2001;20(4, Supplement 1):15–6.
2. Jousilahti P, Salomaa V, Kuulasmaa K, Niemelä M, Vartiainen E. Total and cause specific mortality among participants and non-participants of population based health surveys: a comprehensive follow up of 54 372 Finnish men and women. J Epidemiol Community Health. 2005;59(4):310.
3. Lash TL, Rothman KJ, editors. Selection Bias and Generalizability. Philadelphia: Lippincott Williams and Wilkins; 2021. p. 315–31.
4. Hatch SL, Frissa S, Verdecchia M, Stewart R, Fear NT, Reichenberg A, et al. Identifying socio-demographic and socioeconomic determinants of health inequalities in a diverse London community: the South East London Community Health (SELCoH) study. BMC Public Health. 2011;11(1):861.
5. Mackenbach JP, Stirbu I, Roskam AJ, Schaap MM, Menvielle G, Leinsalu M, et al. Socioeconomic inequalities in health in 22 European countries. N Engl J Med. 2008;358(23):2468–81.
6. Lorant V, Demarest S, Miermans P-J, Van Oyen H. Survey error in measuring socio-economic risk factors of health status: a comparison of a survey and a census. Int J Epidemiol. 2007;36(6):1292–9.
7. Harald K, Salomaa V, Jousilahti P, Koskinen S, Vartiainen E. Non-participation and mortality in different socioeconomic groups: the FINRISK population surveys in 1972–92. J Epidemiol Community Health. 2007;61(5):449–54.
8. McElfish PA, Long CR, Selig JP, Rowland B, Purvis RS, James L, et al. Health Research Participation, Opportunity, and Willingness Among Minority and Rural Communities of Arkansas. Clin Transl Sci. 2018;11(5):487–97.
9. Søgaard AJ, Selmer R, Bjertness E, Thelle D. The Oslo Health Study: The impact of self-selection in a large, population-based survey. Int J Equity Health. 2004;3(1):3.
10. Tolonen H, Helakorpi S, Talala K, Helasoja V, Martelin T, Prättälä R. 25-year trends and socio-demographic differences in response rates: Finnish adult health behaviour survey. Eur J Epidemiol. 2006;21(6):409–15.
11. Christensen AI, Lau CJ, Kristensen PL, Johnsen SB, Wingstrand A, Friis K, Davidsen M, Andreasen AH. The Danish National Health Survey: Study design, response rate and respondent characteristics in 2010, 2013 and 2017. Scand J Public Health. 2022;50(2):180–8. https://doi.org/10.1177/1403494820966534.
12. Ahlmark N, Algren MH, Holmberg T, Norredam ML, Nielsen SS, Blom AB, et al. Survey nonresponse among ethnic minorities in a national health survey–a mixed-method study of participation, barriers, and potentials. Ethn Health. 2015;20(6):611–32.
13. Martikainen P, Laaksonen M, Piha K, Lallukka T. Does survey non-response bias the association between occupational social class and health? Scand J Public Health. 2007;35(2):212–5.
14. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–34.
15. Livingston P, Lee S, Taylor H. A comparison of participants with non-participants in a population-based epidemiologic study: The Melbourne Visual Impairment Project. [Lisse, the Netherlands]. 1997. p. 73–81.
16. Goldberg M, Chastang JF, Leclerc A, Zins M, Bonenfant S, Bugel I, et al. Socioeconomic, Demographic, Occupational, and Health Factors Associated with Participation in a Long-term Epidemiologic Survey: A Prospective Study of the French GAZEL Cohort and Its Target Population. Am J Epidemiol. 2001;154(4):373–84.
17. Wendler D, Kington R, Madans J, Wye GV, Christ-Schmidt H, Pratt LA, et al. Are racial and ethnic minorities less willing to participate in health research? PLoS Med. 2006;3(2):e19.
18. Eggen AE, Mathiesen EB, Wilsgaard T, Jacobsen BK, Njolstad I. The sixth survey of the Tromso Study (Tromso 6) in 2007–08: collaborative research in the interface between clinical medicine and epidemiology: study objectives, design, data collection procedures, and attendance in a multipurpose population-based health survey. Scand J Public Health. 2013;41(1):65–80.
19. Jacobsen BK, Eggen AE, Mathiesen EB, Wilsgaard T, Njolstad I. Cohort profile: the Tromso Study. Int J Epidemiol. 2012;41(4):961–7.
20. Hopstock LA, Grimsgaard S, Johansen H, Kanstad K, Wilsgaard T, Eggen AE. The seventh survey of the Tromsø Study (Tromsø7) 2015–2016: study design, data collection, attendance, and prevalence of risk factors and disease in a multipurpose population-based health survey. Scand J Public Health. 2022;50(7):919–29.
21. Hopstock L, Løvsletten O, Johansen H, Tiwari S, Njølstad I, Løchen M-L. Folkehelserapport: Den sjuende Tromsøundersøkelsen 2015–16. Septentrio Reports. 2019.
22. Statistics Norway. Basic statistical unit. Available from: https://www.ssb.no/a/metadata/conceptvariable/vardok/135/en. [Accessed 20 June 2022].
23. GeoNorge. Statistiske enheter grunnkretser. Available from: https://kartkatalog.geonorge.no/metadata/statistiske-enheter-grunnkretser/51d279f8-e2be-4f5e-9f72-1a53f7535ec1. [Accessed 09 May 2022].
24. OpenStreetMap. Base map. Available from: https://www.openstreetmap.org/#map=5/65.401/17.864. [Accessed 09 May 2022].
25. Langhammer A, Krokstad S, Romundstad P, Heggland J, Holmen J. The HUNT study: participation is associated with survival and depends on socioeconomic status, diseases and symptoms. BMC Med Res Methodol. 2012;12(1):143.
26. Knudsen AK, Hotopf M, Skogen JC, Overland S, Mykletun A. The health status of nonparticipants in a population-based health study: the Hordaland Health Study. Am J Epidemiol. 2010;172(11):1306–14.
27. Nummela O, Sulander T, Helakorpi S, Haapola I, Uutela A, Heinonen H, et al. Register-based data indicated nonparticipation bias in a health study among aging people. J Clin Epidemiol. 2011;64(12):1418–25.
28. Tolonen H, Lundqvist A, Jääskeläinen T, Koskinen S, Koponen P. Reasons for non-participation and ways to enhance participation in health examination surveys – the Health 2011 Survey. Eur J Pub Health. 2017;27(5):909–11.
29. Van Loon AJM, Tijhuis M, Picavet HSJ, Surtees PG, Ormel J. Survey non-response in the Netherlands: effects on prevalence estimates and associations. Ann Epidemiol. 2003;13(2):105–10.
30. Ek S. Gender differences in health information behaviour: a Finnish population-based survey. Health Promot Int. 2013;30(3):736–45.
31. 
Enzenbach C, Wicklein B, Wirkner K, Loeffler M. Evaluating selection bias in a population‑based cohort study with low baseline participation: the LIFE‑Adult‑Study. BMC Med Res Methodol. 2019;19(1):135. 32. Laaksonen M, Aittomäki A, Lallukka T, Rahkonen O, Saastamoinen P, Sil‑ ventoinen K, et al. Register‑based study among employees showed small nonparticipation bias in health surveys and check‑ups. J Clin Epidemiol. 2008;61(9):900–6. 33 Sala E, Zaccaria D, Guaita A. Survey participation to the first Wave of a lon‑ gitudinal study of older people: the case of the Italian InveCe. Ab study. Qual Quan. 2020;54(1):99–110. 34. Antonsen S. Motivasjon for deltakelse i helseundersøkelser. Nor J Epi‑ demiol [Internet]. 2009;15(1). Available from: https:// www. ntnu. no/ ojs/ index. php/ norep id/ artic le/ view/ 232. [cited 23 May 2023]. 35. Galea S, Tracy M. Participation rates in epidemiologic studies. Ann Epide‑ miol. 2007;17(9):643–53. 36. Kiecolt‑Glaser JK, Newton TL. Marriage and health: his and hers. Psychol Bull. 2001;127(4):472. 37. Bopp M, Braun J, Faeh D, Group ftSNCS. Variation in Mortality Patterns Among the General Population, Study Participants, and Different Types of Nonparticipants: Evidence From 25 Years of Follow‑up. Am J Epidemiol. 2014;180(10):1028–35. 38. Hu YR, Goldman N. Mortality differentials by marital status: an interna‑ tional comparison. Demography. 1990;27(2):233–50. 39. Robards J, Evandrou M, Falkingham J, Vlachantoni A. Marital status, health and mortality. Maturitas. 2012;73(4):295–9. 40. Ben‑Shlomo Y, Smith GD, Shipley M, Marmot MG. Magnitude and causes of mortality differences between married and unmarried men. J Epide‑ miol Community Health. 1993;47(3):200–5. 41. Engebretson J, Mahoney JS, Walker G. Participation in community health screenings: a qualitative evaluation. J Community Health Nurs. 2005;22(2):77–92. 42. Statistics Norway. Norges 100 mest folkerike kommune 2021. Available from: https:// www. ssb. 
no/ befol kning/ artik ler‑ og‑ publi kasjo ner/ norges‑ 100‑ mest‑ folke rike‑ kommu ner? tabell= 446939. [Accessed 10 Aug 2022]. 43. Gulbrandsen F, Kulasingam AS, Molstad CS, Steinkellner A. Innvandrere og norskfødte med innvandrerforeldres fordeling på kommunenivå. Statistisk sentralbyrå: Statistisk sentralbyrå; 2021. 44. Reinikainen J, Tolonen H, Borodulin K, Härkänen T, Jousilahti P, Karvanen J, et al. Participation rates by educational levels have diverged dur‑ ing 25 years in Finnish health examination surveys. Eur J Pub Health. 2017;28(2):237–43. 45. Raghupathi V, Raghupathi W. The influence of education on health: an empirical assessment of OECD countries for the period 1995–2015. Arch Public Health. 2020;78(1):20. 46. Nadelson L, Jorcyk C, Yang D, Jarratt Smith M, Matson S, Cornell K, et al. I just don’t trust them: the development and validation of an assessment instrument to measure trust in science and scientists. Sch Sci Math. 2014;114(2):76–86. 47. Tromsø Municipality. Fakta om Tromsø. Available from: https:// tromso. kommu ne. no/ fakta‑ om‑ tromso. [Accessed 20 June 2022]. 48. Johansen H. Sluttrapport Tromsø 7: Den sjuende Tromsøundersøkelsen 2015–16. 2019. 49. Olsen F, Abelsen B, Olsen JA. Improving response rate and quality of sur‑ vey data with a scratch lottery ticket incentive. BMC Med Res Methodol. 2012;12(1):52. 50 Bender A, Jørgensen T, Helbech Kleist B, Linneberg A, Pisinger C. Socio‑ economic position and participation in baseline and follow‑up visits: The Inter99 study. Eur J Prev Cardiol. 2012;21:899–905. 51. Chaix B, Billaudeau N, Thomas F, Havard S, Evans D, Kestens Y, et al. Neigh‑ borhood effects on health: correcting bias from neighborhood effects on participation. Epidemiology. 2011;22(1):18–26. 52. Bender AM, Kawachi I, Jørgensen T, Pisinger C. Neighborhood deprivation is strongly associated with participation in a population‑based health check. PLoS ONE. 2015;10(6): e0129819. 53. Diez Roux AV, Jacobs DR, Kiefe CI. 
Neighborhood characteristics and components of the insulin resistance syndrome in young adults: the coronary artery risk development in young adults (CARDIA) study. Diabe‑ tes Care. 2002;25(11):1976–82. 54. Howard VJ, McClure LA, Kleindorfer DO, Cunningham SA, Thrift AG, Roux AVD, et al. Neighborhood socioeconomic index and stroke incidence in a national cohort of blacks and whites. Neurology. 2016;87(22):2340–7. 55. Bilal U, Hill‑Briggs F, Sanchez‑Perruca L, Del Cura‑Gonzalez I, Franco M. Association of neighbourhood socioeconomic status and diabetes bur‑ den using electronic health records in Madrid (Spain): the HeartHealthy‑ Hoods study. BMJ Open. 2018;8(9): e021143. 56. Drivsholm T, Eplov LF, Davidsen M, Jørgensen T, Ibsen H, Hollnagel H, et al. Representativeness in population‑based studies: a detailed descrip‑ tion of non‑response in a Danish cohort study. Scand J Public Health. 2006;34(6):623–31. 57. Tolonen H, Dobson A, Kulathinal S. Effect on trend estimates of the difference between survey respondents and non‑respondents: results from 27 populations in the WHO MONICA Project. Eur J Epidemiol. 2005;20(11):887–98. 58. Baraldi LG, Steele EM, Canella DS, Monteiro CA. Consumption of ultra‑ processed foods and associated sociodemographic factors in the USA between 2007 and 2012: evidence from a nationally representative cross‑ sectional study. BMJ Open. 2018;8(3): e020574. 59. Selmer R, Sögaard AJ, Bjertness E, Thelle D. The Oslo Health Study: reminding the non‑responders ‑ effects on prevalence estimates. Norsk epidemiologi. 2003;13(1):89. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in pub‑ lished maps and institutional affiliations. 
Paper II

Scandinavian Journal of Public Health, 2023; 51: 1061–1068
Original Article
https://doi.org/10.1177/14034948221088004
© Author(s) 2022

Validity of self-reported educational level in the Tromsø Study

Chi Q Vo¹, Per-Jostein Samuelsen¹,², Hilde L Sommerseth³, Torbjørn Wisløff¹, Tom Wilsgaard¹ & Anne E Eggen¹

¹Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway, ²Regional Medicines Information and Pharmacovigilance Centre (RELIS), University Hospital of North Norway, Tromsø, Norway, ³The Norwegian Historical Data Centre, UiT The Arctic University of Norway, Tromsø, Norway

Abstract
Background: Self-reported data on educational level have been collected for decades in the Tromsø Study, but their validity has yet to be established.
Aim: To investigate the completeness and correctness of self-reported educational level in the Tromsø Study, using data from Statistics Norway. In addition, we explored the consequence of using these two data sources on educational trends in cardiometabolic diseases.
Methods: We compared self-reported and Statistics Norway-recorded educational level (primary, upper secondary, college/university <4 years, and college/university ⩾4 years) among 20,615 participants in the seventh survey of the Tromsø Study (Tromsø7, 2015–2016). Sensitivity, positive predictive value and weighted kappa were used to measure the validity of self-reported educational level in three age groups (40–52, 53–62, 63–99 years). Multivariable logistic regression was used to compare educational trends in cardiometabolic diseases between self-reported and Statistics Norway-recorded educational level.
Results: Sensitivity of self-reported educational level was highest among those with a college/university education of 4 years or more (⩾97% in all age groups and both sexes). Sensitivity for primary educational level ranged from 67% to 92% (all age groups and both sexes). The lowest positive predictive value was observed among women with a college/university education of 4 years or more (29–46%). Weighted kappa was substantial (0.52–0.59) among men and moderate to substantial (0.41–0.51) among women. Educational trends in the risk of cardiometabolic diseases were less pronounced when self-reported educational level was used.
Conclusions: Self-reported educational level in Tromsø7 is adequately complete and correct. Self-reported data may produce weaker associations between educational level and cardiometabolic diseases than registry-based data.

Keywords: Self-report, education, validation, survey, completeness, correctness

Correspondence: Chi Q Vo, Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, N-9037 Tromsø, Norway. E-mail: chi.q.vo@uit.no

Date received 20 November 2021; reviewed 25 February 2022; accepted 28 February 2022

Introduction
Self-administered questionnaires are often used in epidemiological studies to obtain information about a person's education. Inaccurate self-reported data occur when individuals answer questions incorrectly, which can lead to exposure misclassification, and thereby to less reliable study findings [1]. Education is an important determinant of socioeconomic status, as it confers skills that help individuals utilise health information, and it affects future income and occupational class [2, 3]. Indeed, education has become the principal pathway to higher incomes, stable employment and a healthier lifestyle [4]. Furthermore, as self-reported education is often used as an exposure and covariate in health research [5, 6], it is important to assess the validity of that variable. Validation studies on this variable should be done to produce estimates of misclassification in self-reported data and help determine if study results are biased. Data accuracy can be determined by comparing self-reported data to a gold standard data source and is often calculated by two measures: correctness, the proportion of recorded observations in the registry that are correct; and completeness, which measures the proportion of recorded observations that are actually recorded in the gold standard data source [7].

Studies of the quality of reported education are not new in the literature. However, research on the validity of self-reported education within epidemiology is still scarce. This study aimed to investigate the completeness and correctness of self-reported educational level in the Tromsø Study, using data from Statistics Norway (SSB). In addition, we explored the consequence of using these two data sources on educational trends in cardiometabolic diseases.

Methods
The Tromsø Study
The Tromsø Study is an ongoing population-based health survey, which consists of seven surveys (Tromsø1–7) conducted between 1974 and 2016 in the municipality of Tromsø, Northern Norway. The study population consists of complete birth cohorts and random samples of other cohorts [8, 9].
All inhabitants of the municipality aged 40 years and above were invited to participate in Tromsø7 (2015–2016), and the study questionnaire collected information on topics such as health issues, symptoms, diseases, use of medication and healthcare services, employment, and sociodemographic and lifestyle factors.

Study population
Data on self-reported educational level from Tromsø7 were linked to data from SSB, the national statistical institute of Norway and the main producer of official statistics, using the unique 11-digit identification number assigned to all individuals living in Norway. A total of 21,083 people participated in Tromsø7 (attendance 65%), of whom 20,615 had records in SSB and were included in the analyses. A total of 468 were excluded from the analysis. Of these, 99 persons lacked information about education in SSB (19 persons were specified as 'no education, unspecified, and preschool education') and 369 had no educational information in Tromsø7.

Self-reported educational level, household income, and other variables
In the Tromsø7 questionnaire, participants were asked to respond to the question: 'What is the highest level of education you have completed?'. Response options were: primary/partly secondary education (up to 10 years of schooling); upper secondary education (minimum 3 years); tertiary education, short: college/university less than 4 years; and tertiary education, long: college/university 4 years or more (see link to questionnaire in Supplemental material). They were also asked to report their total pre-tax household income for the previous year, using eight categories from 150,000 NOK or less to 1,000,000 NOK or more. The two lowest income groups (⩽150,000 NOK and 150,000–250,000 NOK) were merged in the analysis. Participants reported their current and previous status for the following cardiometabolic diseases: diabetes, myocardial infarction, angina pectoris, and cerebral stroke, which were categorised as binary variables.
Participants reported their self-rated health status as 'very bad', 'bad', 'neither good nor bad', 'good' and 'excellent', which was regrouped into three categories ('bad', 'neither good nor bad' and 'good'). Finally, participants reported whether or not they lived with a spouse.

SSB-recorded educational level
Educational information in SSB comes from administrative sources, such as educational institutions, and the State Educational Loan Fund provides supplemental data on education acquired abroad [10]. SSB records the highest completed educational level. The Norwegian Standard Classification of Education has nine educational levels, including a value for unspecified level [11]. These were regrouped by SSB into: no education or preschool education; primary education; upper secondary education; vocational education; university/college education, short; and university/college education, long. We furthermore excluded participants in the group with no education, preschool education or unspecified education from the analysis. We also merged the categories upper secondary education and vocational education, leaving four educational levels (primary education, upper secondary education, university/college education <4 years, and university/college education ⩾4 years) that were comparable to the self-reported educational levels in Tromsø7.

Statistical analyses
We assessed the validity of self-reported educational level in Tromsø7 by estimating sensitivity (completeness) and positive predictive value (PPV, correctness), using SSB-recorded educational level as the gold standard. Agreement between self-reported and SSB-recorded educational level was measured by percentage observed agreement and weighted kappa.
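The validity measures named above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name and the counts are hypothetical, and the linear disagreement weights are an assumption, since the text does not state which kappa weighting scheme was used.

```python
from typing import Dict, List


def validity_measures(table: List[List[int]]) -> Dict[str, object]:
    """Validity of self-report against a registry treated as gold standard.

    table[i][j] = number of people with registry level i and
    self-reported level j, levels ordered from lowest to highest.
    Assumes at least two levels and no empty row or column.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    registry_totals = [sum(row) for row in table]
    self_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Sensitivity (completeness): of those the registry places in a level,
    # the share who self-reported that level.
    sensitivity = [table[i][i] / registry_totals[i] for i in range(k)]
    # PPV (correctness): of those who self-reported a level,
    # the share the registry places in that level.
    ppv = [table[j][j] / self_totals[j] for j in range(k)]

    observed_agreement = sum(table[i][i] for i in range(k)) / n

    # Weighted kappa with linear disagreement weights |i - j| / (k - 1).
    # ASSUMPTION: linear weights; the source does not specify the scheme.
    observed_dis = sum(
        abs(i - j) / (k - 1) * table[i][j]
        for i in range(k) for j in range(k)
    ) / n
    expected_dis = sum(
        abs(i - j) / (k - 1) * registry_totals[i] * self_totals[j]
        for i in range(k) for j in range(k)
    ) / n ** 2
    weighted_kappa = 1 - observed_dis / expected_dis

    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "observed_agreement": observed_agreement,
        "weighted_kappa": weighted_kappa,
    }


# Hypothetical counts (NOT study data); rows = registry, columns = self-report.
example = [
    [80, 15, 3, 2],
    [20, 60, 10, 10],
    [2, 8, 40, 50],
    [0, 1, 1, 98],
]
for name, value in validity_measures(example).items():
    print(name, value)
```

In the hypothetical table, many people the registry places in the third level report the fourth, which pulls down sensitivity for the third level and PPV for the fourth while leaving sensitivity for the fourth level high. That is the kind of pattern the two measures are designed to separate.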
Kappa values and kappa agreement were interpreted as proposed by Viera and Garrett [12] (less than chance: <0.00, slight: 0.00–0.20, fair: 0.21–0.40, moderate: 0.41–0.60, substantial: 0.61–0.80, or almost perfect: 0.81–1.00). Multinomial logistic regression was used to calculate odds ratios (ORs) of over- or underreporting educational level. Comparisons between self-reported and SSB-recorded educational level were stratified by age group (40–52, 53–62 and 63–99 years) and sex. These age groups were constructed after taking into account the school reform of 1959, when 7 years of primary education was made mandatory. Those who started primary school in 1959 were 63 years old in Tromsø7. The 53–62 age group was constructed to reflect another school reform in 1969. Logistic regression models were also used to estimate ORs of self-reported cardiometabolic diseases in Tromsø7 according to self-reported and SSB-recorded educational levels. A randomisation test with 10,000 permutations of the data file was used to compare trends, that is, the categorical educational level variable modelled as a linear term, between self-reported and SSB-recorded educational level. The linearity assumption was reasonably met, and self-reported and SSB-recorded educational levels were therefore modelled as linear terms.

Ethics
This study was approved by the Norwegian Centre for Research Data (NSD Data Protection Services) (reference 809230). All participants in the Tromsø Study have given written informed consent for their data to be used in research. This study was not defined as health research by the Regional Ethics Committee North and was exempted from the requirement of study preapproval.

Results
Of the 20,615 individuals included in the analysis, 53% were women; the mean age was 57 years (standard deviation (SD): 11.3 years, range: 40–99 years).
The proportion of women with college/university education of 4 years or more was higher than that among men (33% vs. 26%, respectively); this was also seen for the primary educational level (24% vs. 22%, respectively). The proportion of women with household income of 1,000,000 NOK or more was lower than that among men (22% vs. 28%, respectively) (Table I).

Sensitivity of self-reported educational level was highest among those with a college/university education of 4 years or more (⩾97% in all age groups and both sexes), and lowest among those with a college/university education of less than 4 years (37–58% in all age groups and both sexes) (Table II). Among women who self-reported primary educational level, sensitivity ranged from 67% to 92%, compared to 72–91% among men. PPVs for a college/university education of 4 years or more ranged from 29% to 46% among women, compared to 59–62% among men. For primary education, the PPV was 48–67% among women, compared to 52–66% among men. In all age groups and both sexes, the highest degree of underreporting in Tromsø7 was observed among those with SSB-recorded upper secondary educational level, but a self-reported primary educational level, whereas the highest degree of overreporting was observed among those with SSB-recorded college/university education less than 4 years, but a self-reported college/university education of 4 years or more (Supplemental Table I and Table II).

Table I. Socioeconomic characteristics of the study population in the Tromsø Study 2015–16.

                                  Women, n=10,826    Men, n=9789
Age group
  40–52 years                     4372 (41.4)        3865 (39.5)
  53–62 years                     3067 (28.3)        2682 (27.4)
  63–99 years                     3387 (31.3)        3242 (33.1)
Educational level
  Primary education               2597 (24.0)        2163 (22.1)
  Upper secondary education       2749 (25.4)        2989 (30.5)
  College/university <4 years     1913 (17.7)        2082 (21.3)
  College/university ⩾4 years     3567 (32.9)        2555 (26.1)
Household income (a, b)
  <250,000 NOK                    725 (7.0)          396 (4.1)
  251,000–350,000 NOK             892 (8.7)          509 (5.3)
  350,000–450,000 NOK             1110 (10.8)        764 (7.9)
  450,000–550,000 NOK             1311 (12.7)        976 (10.1)
  550,000–750,000 NOK             1749 (17.0)        1780 (18.5)
  750,000–1,000,000 NOK           2259 (22.0)        2453 (25.5)
  ⩾1,000,000 NOK                  2244 (21.8)        2744 (28.5)

Values are numbers (%). a: 100,000 NOK ≈ 11,500 USD. b: 703 missing values.

For women, kappa agreement varied from moderate to substantial (57–64%), and was substantial in all age groups for men (65–71%). A fair corresponding weighted kappa value was found in all age groups for women (0.41, 0.48 and 0.51, respectively), and for men (0.52, 0.54 and 0.59, respectively).

Among those aged 40–52 and 53–62 years, the proportions of self-reported and SSB-recorded primary educational level were similar. However, in those aged 63–99 years, there was a notable difference (Supplemental Table III). All age groups showed higher self-reported than SSB-recorded college/university education of 4 years or more, and this was especially evident in the youngest age group.

The difference between self-reported and SSB-recorded educational levels varied by sex (Figure 1), that is, the difference between the level of education registered by SSB and the self-reported level of education in Tromsø7. Zero represents individuals who self-reported the same educational level as in the SSB registry. Numbers ±1, 2 and 3 indicate levels of underreporting or overreporting.
Women were more likely than men to overreport (OR 1.46, 95% confidence interval (CI) 1.36–1.57) and to underreport (OR 1.10, 95% CI 0.99–1.21) their educational level. Women aged 53–62 years overreported more often than those aged 40–52 years (OR 1.13, 95% CI 1.02–1.26), and the odds of underreporting were higher among women aged 53–62 years (OR 1.95, 95% CI 1.58–2.41) and 63–99 years (OR 4.47, 95% CI 3.68–5.43) compared to those aged 40–52 years (Table III). Higher odds of underreporting were also found among men aged 53–62 years (OR 1.47, 95% CI 1.10–1.94) and 63–99 years (OR 1.29, 95% CI 1.11–1.50) compared to those aged 40–52 years. Men aged 53–62 years overreported more than those aged 40–52 years (OR 1.06, 95% CI 0.94–1.20). For participants who lived with a spouse, the ORs for overreporting were 0.56 (95% CI 0.48–0.64) for women, and 0.73 (95% CI 0.63–0.86) for men. The ORs for underreporting were 1.62 (95% CI 1.36–1.92) for women and 1.46 (95% CI 1.21–1.77) for men. Underreporting educational level was more common among men with bad (OR 1.46, 95% CI 1.10–1.94) and neither good nor bad health (OR 1.29, 95% CI 1.11–1.50), compared with those with good health. Finally, overreporting of educational level increased, while underreporting decreased, with increasing household income.

We found educational trends in the risk of self-reported cardiometabolic diseases when using both self-reported and SSB-recorded educational level (Table IV). For women, the odds of diabetes increased by 31% per one-level decrease in self-reported educational level (OR 1.31, 95% CI 1.20–1.42), while the odds increased by 44% per one-level decrease in SSB-recorded educational level (OR 1.44, 95% CI 1.29–1.61). We saw the same trends for myocardial infarction, angina pectoris, and stroke, and also for men. However, the educational trend was less pronounced when using the self-reported educational level.
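The per-level odds ratios reported above can be read with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the three-level figure is an extrapolation that is not reported in the paper and holds only to the extent that the linear-term model does.

```python
import math


def cumulative_odds_ratio(or_per_level: float, levels: int) -> float:
    """OR implied across `levels` steps when education enters as a linear term."""
    # A linear term adds log(OR) to the log-odds per step, so ORs multiply.
    return math.exp(levels * math.log(or_per_level))


def percent_change_in_odds(odds_ratio: float) -> float:
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Per-level ORs for diabetes among women, as reported in the text.
reported = {"self-reported": 1.31, "SSB-recorded": 1.44}
for source, or_value in reported.items():
    print(
        source,
        round(percent_change_in_odds(or_value)),
        round(cumulative_odds_ratio(or_value, 3), 2),
    )
```

Because per-level ORs compound multiplicatively, a modest gap between the self-reported and registry-based estimates per level widens when the lowest and highest educational levels are compared.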
Table II. Validity of self-reported educational level compared to that recorded in Statistics Norway, by age and stratified by sex. The Tromsø Study 2015–2016.

                                  Women                        Men
Age                               Sensitivity (%)  PPV (%)     Sensitivity (%)  PPV (%)
40–52 years
  Primary education               67.2             66.8        72.3             65.8
  Upper secondary education       73.6             86.2        69.0             87.3
  College/university <4 years     40.2             77.2        51.3             64.3
  College/university ⩾4 years     99.1             46.0        99.1             60.3
53–62 years
  Primary education               70.5             62.8        75.5             61.4
  Upper secondary education       64.5             82.2        60.1             86.1
  College/university <4 years     37.0             66.4        53.9             54.0
  College/university ⩾4 years     98.4             37.5        97.2             58.5
63–99 years
  Primary education               92.0             48.4        90.8             51.5
  Upper secondary education       43.3             88.1        49.7             88.6
  College/university <4 years     37.2             68.3        57.9             56.5
  College/university ⩾4 years     96.7             29.1        96.6             62.1

PPV: positive predictive value.

Discussion
We found that self-reported data on educational level in Tromsø7 achieved very high completeness (⩾97% in all age groups and both sexes) for participants with a college/university education of 4 years or more, and high completeness (67–92% in all age groups and both sexes) for those with a primary educational level. However, low correctness was found for both of these educational levels (29–62% for college/university education ⩾4 years and 48–67% for primary educational level, respectively). Our findings showed substantial agreement (65–71%) in all age groups for men, and moderate to substantial agreement for women (57–64%). Fair weighted kappa values were found in both women (0.41–0.51) and men (0.52–0.59). Educational trends in cardiometabolic diseases were less pronounced when self-reported educational level was used rather than registry-recorded educational level.

The degree of completeness was highest among those with a college/university education of 4 years or more, indicating near-perfect self-reporting. However, completeness among those with primary educational level was slightly lower.
Table III. Sex-specific odds ratios of under- and overreporting of educational level from Tromsø7 and Statistics Norway.

                            n (%)          Overreporting vs.      Underreporting vs.
                                           correctly reported     correctly reported
                                           OR (95% CI)            OR (95% CI)
Women
Age group
  40–52 years               3977 (41.7)    Reference group        Reference group
  53–62 years               2767 (29.1)    1.13 (1.02–1.26)       1.95 (1.58–2.41)
  63–99 years               2784 (29.2)    0.97 (0.85–1.11)       4.47 (3.68–5.43)
Living with spouse          6926 (72.7)    0.56 (0.48–0.64)       1.62 (1.36–1.92)
Self-rated health
  Bad                       575 (6.0)      1.05 (0.85–1.29)       0.94 (0.71–1.26)
  Neither good nor bad      2364 (24.8)    0.87 (0.77–0.98)       1.05 (0.90–1.22)
  Good                      6589 (69.2)    Reference group        Reference group
Household income
  <250,000 NOK              606 (6.4)      0.17 (0.12–0.22)       5.71 (3.92–8.33)
  251,000–350,000 NOK       754 (7.9)      0.29 (0.23–0.38)       7.25 (5.07–10.38)
  351,000–450,000 NOK       945 (9.9)      0.35 (0.28–0.44)       6.75 (4.79–9.52)
  451,000–550,000 NOK       1155 (12.1)    0.65 (0.55–0.78)       4.54 (3.23–6.37)
  551,000–750,000 NOK       1635 (17.1)    0.52 (0.45–0.61)       3.40 (2.46–4.69)
  751,000–1,000,000 NOK     2217 (23.3)    0.76 (0.67–0.87)       2.40 (1.74–3.33)
  ⩾1,000,000 NOK            2216 (23.3)    Reference group        Reference group
Men
Age group
  40–52 years               3702 (39.8)    Reference group        Reference group
  53–62 years               2576 (27.7)    1.06 (0.94–1.20)       1.47 (1.10–1.94)
  63–99 years               3024 (32.5)    0.98 (0.86–1.12)       1.29 (1.11–1.50)
Living with spouse          7621 (81.9)    0.73 (0.63–0.86)       1.46 (1.21–1.77)
Self-rated health
  Bad                       429 (4.6)      1.37 (1.07–1.75)       1.46 (1.10–1.94)
  Neither good nor bad      2394 (25.7)    0.93 (0.82–1.05)       1.29 (1.11–1.50)
  Good                      6479 (69.6)    Reference group        Reference group
Household income
  <250,000 NOK              341 (3.7)      0.29 (0.20–0.44)       6.38 (4.47–9.10)
  251,000–350,000 NOK       466 (5.0)      0.41 (0.30–0.56)       5.18 (3.70–7.25)
  351,000–450,000 NOK       715 (7.7)      0.52 (0.41–0.67)       4.50 (3.32–6.09)
  451,000–550,000 NOK       923 (9.9)      0.64 (0.52–0.79)       4.80 (3.62–6.36)
  551,000–750,000 NOK       1717 (18.5)    0.73 (0.62–0.85)       3.20 (2.47–4.14)
  751,000–1,000,000 NOK     2417 (26.0)    0.85 (0.75–0.97)       2.26 (1.76–2.91)
  ⩾1,000,000 NOK            2723 (29.3)    Reference group        Reference group

CI: confidence interval; OR: odds ratio. The Tromsø Study 2015–16. 100,000 NOK ≈ 11,500 USD. Mutually adjusted for all listed variables. Total missing values for women n=1298. Total missing values for men n=486.

Low correctness was found in all age groups in our highest and lowest categories of educational level. There are several possible explanations for this low correctness. First, individuals might consider that they belong in the highest educational category because they have taken courses or programmes that were not necessarily included in a degree. Indeed, it is common in Norway to take work-related continuing education courses, but they do not necessarily culminate in a formal degree. SSB only places individuals in the category of college/university education of 4 years or more if they have completed a master's degree or a PhD [11]. In addition, Tromsø7 and SSB measure the educational level differently: SSB asks for the highest completed degree, whereas Tromsø7 asked for the duration of education. Moreover, it has been hypothesised that questionnaire respondents sometimes give answers that are more in line with prevailing social norms than their factual situation [13]. When individuals provide answers they believe to be more socially desirable, rather than revealing their true attitudes, preferences, or beliefs, it is referred to as social desirability bias [14]; it is one of the most common and pervasive sources of bias that affects the validity of survey research findings and might also explain some of the overreporting of the educational level in our study. Previous studies also found that some of those who claimed to have a degree did not, in fact, have one [15, 16]. It is often difficult to get a correct answer to questions about education.
Some might think they do not have the education they 'should have' due to a feeling of social prestige, and therefore report a higher educational level than they actually have [10]. The age group 53–62 years had a higher tendency to overreport their educational level than the youngest age group, while others have found a higher tendency of overreporting among the youngest age group [10].

Second, it is difficult to measure education appropriately, as most societies have complex educational systems that change over time [17, 18]. In Tromsø7, the participants of different age groups have received their education within different school systems, as the Norwegian educational system has been reformed continuously since 1959, which may make it difficult for these participants to report their educational level correctly; for example, the transition from several different degrees with specific Norwegian and Latin titles to bachelor and master degrees [19]. SSB has re-classified the educational level of those with what were previously the lowest and middle educational levels due to changes in the Norwegian educational system [10, 20]. Self-reporting of educational level could also be subject to recall bias, particularly among the oldest participants [10, 16].

Finally, overreporting of educational level in questionnaires due to misunderstanding has been reported [21, 22]. It has been suggested that this misunderstanding is linked to the question regarding the duration of education (total years of education versus highest obtained degree) [21, 22], and misclassification can occur when inferring attainment of a degree from years of schooling.

Figure 1. Differences between self-reported and Statistics Norway-recorded educational level by sex. The Tromsø Study 2015–16. Negative numbers indicate underreporting and positive numbers indicate overreporting.

Table IV. Age-adjusted odds ratios for the association between cardiometabolic diseases and educational level from Tromsø7 and Statistics Norway.

                                               Tromsø7             Statistics Norway   P value
                                               OR (95% CI) (a)     OR (95% CI) (a)     equality (b)
Women
  Diabetes mellitus (509 out of 10,510) (c)    1.31 (1.20–1.42)    1.44 (1.29–1.61)    0.004
  Myocardial infarction (166 out of 10,459)    1.44 (1.23–1.72)    1.66 (1.35–2.17)    0.098
  Angina pectoris (158 out of 10,447)          1.17 (1.01–1.36)    1.47 (1.20–1.81)    0.102
  Stroke (206 out of 10,482)                   1.21 (1.07–1.39)    1.38 (1.17–1.66)    0.063
Men
  Diabetes mellitus (579 out of 9580) (c)      1.21 (1.12–1.31)    1.26 (1.14–1.38)    0.034
  Myocardial infarction (550 out of 9540)      1.19 (1.09–1.29)    1.28 (1.14–1.40)    0.044
  Angina pectoris (290 out of 9514)            1.08 (0.98–1.21)    1.13 (0.99–1.29)    0.128
  Stroke (314 out of 9561)                     1.12 (1.01–1.25)    1.13 (1.00–1.29)    0.109

CI: confidence interval; OR: odds ratio. The Tromsø Study 2015–16. a: Education as linear term, per level decrease. b: P value for equality between ORs based on education from Statistics Norway and Tromsø7. c: Diabetes mellitus types 1 and 2.

Previous studies observed misreporting of educational level in both sexes, although it was higher among women, which was also the case in our study [15, 23]. Our data suggest that participants from the most affluent households are more likely to overreport their educational level. A previous study found that women who reported having a higher degree also tended to have higher earnings than those who reported their educational level correctly [15]. High-income individuals are more likely than low-income individuals to report their education correctly [23], which is consistent with our findings in the highest household income category.

Knowing the extent of misreporting also has obvious implications for the interpretation of other studies that use educational attainment as an exposure or for descriptive purposes.
When education is used as a confounding variable, misclassification may affect the efficiency of adjustment for confounding effects, and thus seriously bias the results [1, 24]. Extensive literature over several decades has reported that people of lower socioeconomic status tend to have a higher prevalence of cardiometabolic diseases [5, 6, 25, 26]. Education is often used as a proxy for socioeconomic status [27], and one purpose of collecting information about education in the Tromsø Study was to use this variable as such a proxy; misreporting may therefore lead to misclassification of socioeconomic status. This distortion in the association between the exposure and the outcome might create a less pronounced educational trend when self-reported educational data are used. Researchers should therefore be aware of the potential shortcomings of self-reported education compared to administrative records.

Strengths and limitations

The main strength of this study is the complete individual linkage between a health survey and a national register, using the unique national identification number. The Tromsø Study is a population-based study with a relatively large sample and good representativeness of both women and men. Data on educational level from SSB are based on reports from various educational institutions in Norway and abroad, and we assessed the criterion validity to be reasonably high.

This study also has some limitations, as the Tromsø Study and SSB measure educational level differently: Tromsø7 records years of completed education, whereas SSB records the highest completed education. This might have lowered the correctness and kappa values in our study. Although the dataset from SSB had some missing values, the proportion was very low (0.5%) and did not impact the results.

Changes in the wording of questionnaires or the addition of extra questions might help future participants to provide their educational level more correctly.
For instance, asking for the highest level of education, rather than the number of years of education, could improve accuracy.

In conclusion, this study found that data on self-reported educational level in Tromsø7 are adequately complete and correct for research, with fair weighted kappa values in all age groups and both sexes. A considerable proportion of participants, however, did not answer these questions correctly, which can lead to misclassification and may explain why educational trends in cardiometabolic diseases were less pronounced when using self-reported educational level. We consider our findings to be important for epidemiological research, as they contribute to knowledge on the degree of misclassification and validation of self-reported educational level.

Acknowledgements
The author(s) thank the participants of Tromsø7, as their willingness to participate is fundamental to our research.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the High North Population Studies, UiT The Arctic University of Norway.

ORCID iD
Chi Q Vo https://orcid.org/0000-0003-3988-5726

Supplemental material
Supplemental material for this article is available online.

References
[1] Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthcare 2016;9:211–217.
[2] Galobardes B, Shaw M, Lawlor DA, et al. Indicators of socioeconomic position (part 1). J Epidemiol Community Health 2006;60:7–12.
[3] Lahelma E, Martikainen P, Laaksonen M, et al. Pathways between socioeconomic determinants of health. J Epidemiol Community Health 2004;58:327–332.
[4] Ross CE and Wu C-L. The links between education and health.
Am Sociol Rev 1995;60:719–745, https://doi.org/10.2307/2096319.
[5] Beltrán-Sánchez H and Andrade FC. Time trends in adult chronic disease inequalities by education in Brazil: 1998–2013. Int J Equity Health 2016;15:139.
[6] Espelt A, Borrell C, Roskam AJ, et al. Socioeconomic inequalities in diabetes mellitus across Europe at the beginning of the 21st century. Diabetologia 2008;51:1971.
[7] Hogan WR and Wagner MM. Accuracy of data in computer-based patient records. J Am Med Inform Assoc 1997;4:342–355.
[8] Eggen AE, Mathiesen EB, Wilsgaard T, et al. The sixth survey of the Tromso Study (Tromso 6) in 2007–08: collaborative research in the interface between clinical medicine and epidemiology: study objectives, design, data collection procedures, and attendance in a multipurpose population-based health survey. Scand J Public Health 2013;41:65–80.
[9] Jacobsen BK, Eggen AE, Mathiesen EB, et al. Cohort profile: the Tromso Study. Int J Epidemiol 2012;41:961–967.
[10] Kleven Ø and Ringdal K. Causes and effects of measurement errors in educational attainment. Report no. 2535-7271. Oslo: Statistisk Sentralbyrå, 2020.
[11] Barrabés N and Østli GK. Norwegian Standard Classification of Education 2016. Contract no. Documents 2017/02. Oslo: Statistisk Sentralbyrå, 2016.
[12] Viera AJ and Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med 2005;37:360–363.
[13] Sjöström O and Holst D. Validity of a questionnaire survey: response patterns in different subgroups and the effect of social desirability. Acta Odontologica Scand 2002;60:136–140.
[14] Brenner PS and DeLamater J. Lies, damned lies, and survey self-reports? Identity as a cause of measurement bias. Soc Psychol Q 2016;79:333–354.
[15] Black D, Sanders S and Taylor L. Measurement of higher education in the Census and Current Population Survey. J Am Stat Assoc 2003;98:545–554.
[16] Battistin E, De Nadai M and Sianesi B.
Misreported schooling, multiple measures and returns to educational qualifications. J Econometrics 2014;181:136–150.
[17] Connelly R, Gayle V and Lambert PS. A review of educational attainment measures for social survey research. Methodol Innovat 2016;9:2059799116638001.
[18] Nepomuceno MR and Turra CM. Assessing the quality of education reporting in Brazilian censuses. Demogr Res 2020;42:441–460.
[19] Kehm BM, Michelsen S and Vabø A. Towards the two-cycle degree structure: Bologna, reform and path dependency in German and Norwegian universities. Higher Educat Policy 2010;23:227–245.
[20] Nygård G and Rustad HAM. New classification of educational attainment Statistics Norway. Statistics Norway; 2016. https://www.ssb.no/en/utdanning/artikler-og-publikasjoner/new-classification-of-educational-attainment (accessed 21 March 2021).
[21] Kristensen P, Corbett K, Mohn FA, et al. Information bias of social gradients in sickness absence: a comparison of self-report data in the Norwegian Mother and Child Cohort Study (MoBa) and data in national registries. BMC Public Health 2018;18:1275.
[22] Kominski R and Siegel PM. Measuring education in the Current Population Survey. Monthly Labor Review 1993;116:34+.
[23] Matthes B, Kruppe T and Unger S. Effectiveness of data correction rules in process-produced data. The case of educational attainment. IAB-Discussion Paper, 15/2014, 2014, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
[24] Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, 4th ed. Sudbury: Jones & Bartlett Learning LLC, 2018. pp. 140–151.
[25] de Mestral C and Stringhini S. Socioeconomic status and cardiovascular disease: an update. Curr Cardiol Rep 2017;19:115.
[26] Dalstra J, Kunst A, Borrell C, et al. Socioeconomic differences in the prevalence of common chronic diseases: an overview of eight European countries. Int J Epidemiol 2005;34:316–326.
[27] Janković S, Stojisavljević D, Janković J, et al. Association of socioeconomic status measured by education, and cardiovascular health: a population-based cross-sectional study. BMJ Open 2014;4:e005222.

Paper III

Appendices
1. Supplementary Table 1 in paper I. Odds ratios for participation by sex, Tromsø7 (2015-2016).
2. Supplementary Table 1 in paper II. Sex-specific cross tabulation of self-reported and Statistics Norway-recorded educational level by age. The Tromsø Study 2015-2016.
3. Supplementary Table 2 in paper II. Sex-specific cross tabulation of self-reported and Statistics Norway-recorded educational level by age. The Tromsø Study 2015-2016. Bolded numbers reflect positive predictive value.
4. Supplementary Table 3 in paper II. Distribution of self-reported and Statistics Norway-recorded educational level by age and sex. The Tromsø Study 2015-2016.

Supplementary Table 1. Odds ratios for participation by sex, Tromsø7 (2015-2016).
Variable | Women, crude OR (95% CI) | Men, crude OR (95% CI) | Women, adjusted OR (95% CI)a | Men, adjusted OR (95% CI)b
Age, years
40-49 | Ref. | Ref. | - | -
50-59 | 1.35 (1.24–1.48) | 1.49 (1.37–1.62) | - | -
60-69 | 1.58 (1.44–1.74) | 1.97 (1.80–2.16) | - | -
70-79 | 1.14 (1.02–1.28) | 1.86 (1.66–2.07) | - | -
80-99 | 0.27 (0.24–0.31) | 0.76 (0.65–0.89) | - | -
Marital status
Married | Ref. | Ref. | Ref. | Ref.
Unmarried | 0.64 (0.59–0.70) | 0.48 (0.45–0.52) | 0.66 (0.61–0.72) | 0.52 (0.49–0.57)
Widowed | 0.34 (0.31–0.38) | 0.47 (0.38–0.57) | 0.61 (0.54–0.70) | 0.56 (0.45–0.69)
Separated/divorced | 0.67 (0.62–0.74) | 0.59 (0.53–0.65) | 0.65 (0.60–0.72) | 0.56 (0.51–0.61)
Country of birth
Norway | Ref. | Ref. | Ref. | Ref.
Western countries | 0.88 (0.74–1.04) | 0.52 (0.45–0.61) | 0.81 (0.68–0.96) | 0.54 (0.46–0.63)
Eastern Europe | 0.26 (0.26–0.41) | 0.09 (0.07–0.12) | 0.30 (0.24–0.38) | 0.10 (0.08–0.13)
Other countries | 0.39 (0.39–0.57) | 0.26 (0.21–0.32) | 0.44 (0.36–0.53) | 0.28 (0.23–0.35)
Region of birth
Tromsø | Ref. | Ref. | Ref. | Ref.
Northern Norway | 0.95 (0.88–1.03) | 1.11 (1.03–1.21) | 1.02 (0.94–1.11) | 1.09 (1.00–1.18)
South Norway | 0.87 (0.80–0.95) | 1.02 (0.94–1.11) | 1.03 (0.93–1.13) | 0.98 (0.89–1.07)
Educational level
Primary | Ref. | Ref. | Ref. | Ref.
Upper secondary | 2.01 (1.85–2.20) | 1.75 (1.61–1.91) | 1.80 (1.65–1.97) | 1.77 (1.62–1.93)
College/university <4 years | 2.45 (2.24–2.69) | 2.10 (1.90–2.32) | 2.20 (1.99–2.42) | 2.22 (2.00–2.47)
College/university ≥4 years | 2.08 (1.85–2.33) | 1.65 (1.48–1.83) | 1.88 (1.67–2.12) | 1.74 (1.56–1.94)
Individual income (NOK)
<249,999 | 0.60 (0.51–0.70) | 0.57 (0.52–0.63) | 0.52 (0.44–0.61) | 0.34 (0.30–0.38)
250,000-349,999 | 0.96 (0.79–1.18) | 0.49 (0.41–0.59) | 0.90 (0.73–1.10) | 0.42 (0.35–0.51)
350,000-449,999 | 1.16 (0.97–1.38) | 0.64 (0.56–0.73) | 1.14 (0.95–1.36) | 0.61 (0.53–0.70)
450,000-549,999 | 1.41 (1.18–1.69) | 0.96 (0.85–1.08) | 1.44 (1.21–1.71) | 0.99 (0.88–1.12)
550,000-749,999 | 1.34 (1.12–1.60) | 1.13 (1.01–1.26) | 1.35 (1.13–1.62) | 1.17 (1.04–1.31)
≥750,000 | Ref. | Ref. | Ref. | Ref.
Total household income (NOK)
<249,999 | 0.19 (0.17–0.21) | 0.16 (0.14–0.19) | 0.22 (0.19–0.25) | 0.15 (0.13–0.17)
250,000-349,999 | 0.46 (0.41–0.51) | 0.31 (0.27–0.35) | 0.45 (0.40–0.51) | 0.28 (0.25–0.31)
350,000-449,999 | 0.71 (0.63–0.79) | 0.46 (0.41–0.52) | 0.71 (0.63–0.80) | 0.43 (0.38–0.49)
450,000-549,999 | 0.63 (0.57–0.71) | 0.57 (0.51–0.64) | 0.59 (0.53–0.67) | 0.49 (0.44–0.55)
550,000-749,999 | 0.85 (0.77–0.93) | 0.87 (0.80–0.95) | 0.80 (0.72–0.88) | 0.79 (0.72–0.86)
≥750,000 | Ref. | Ref. | Ref. | Ref.
Residential ownership status
Owner | 2.84 (2.58–3.14) | 3.53 (3.21–3.89) | 2.66 (2.41–2.94) | 3.32 (3.02–3.66)
Renter | Ref. | Ref. | Ref. | Ref.
Area SES2
Low | Ref. | Ref. | Ref. | Ref.
High | 1.47 (1.35–1.59) | 1.52 (1.40–1.64) | 1.24 (1.13–1.35) | 1.17 (1.08–1.28)
Individual-level SES
Low | Ref. | Ref. | Ref. | Ref.
Medium | 2.28 (2.11–2.46) | 2.39 (2.20–2.59) | 2.23 (2.05–2.43) | 2.62 (2.41–2.85)
High | 2.56 (2.34–2.79) | 2.85 (2.63–3.10) | 2.84 (2.58–3.14) | 3.73 (3.41–4.08)
a Adjusted for age.
b Additionally adjusted for individual-level SES.
OR: odds ratio, CI: confidence interval, NOK: Norwegian kroner, SES: socioeconomic status.

Supplementary Table 1. Sex-specific cross tabulation of self-reported and Statistics Norway-recorded educational level by age. The Tromsø Study 2015-2016. Bolded numbers reflect positive predictive value.
Women. Rows: Tromsø7 (self-reported); columns: Statistics Norway.
Tromsø7 level | Primary education, n (%) | Upper secondary education, n (%) | College/university <4 years, n (%) | College/university ≥4 years, n (%)
Age group 40-52 years: agreement 64.4%, weighted kappa 0.66
Primary education | 258 (66.8) | 125 (32.4) | 3 (0.8) | 0 (0.0)
Upper secondary education | 100 (9.3) | 921 (86.2) | 47 (4.4) | 1 (0.1)
College/university <4 years | 20 (2.1) | 190 (20.0) | 733 (77.2) | 7 (0.7)
College/university ≥4 years | 6 (0.3) | 16 (0.8) | 1040 (52.9) | 905 (46.0)
Age group 53-62 years: agreement 60.8%, weighted kappa 0.65
Primary education | 403 (62.8) | 234 (36.4) | 5 (0.8) | 0 (0.0)
Upper secondary education | 127 (14.5) | 719 (82.2) | 28 (3.2) | 1 (0.1)
College/university <4 years | 36 (6.4) | 147 (26.3) | 371 (66.4) | 5 (0.9)
College/university ≥4 years | 6 (0.6) | 15 (1.5) | 598 (60.4) | 372 (37.5)
Age group 63-99 years: agreement 56.7%, weighted kappa 0.60
Primary education | 759 (48.4) | 803 (51.2) | 7 (0.4) | 0 (0.0)
Upper secondary education | 55 (6.8) | 709 (88.1) | 40 (5.0) | 1 (0.1)
College/university <4 years | 6 (1.5) | 117 (29.0) | 276 (68.3) | 5 (1.2)
College/university ≥4 years | 5 (0.8) | 7 (1.1) | 420 (69.0) | 177 (29.1)
Percentages calculated to equal 100% in row.

Supplementary Table 2. Sex-specific cross tabulation of self-reported and Statistics Norway-recorded educational level by age. The Tromsø Study 2015-2016. Bolded numbers reflect positive predictive value.
Men. Rows: Tromsø7 (self-reported); columns: Statistics Norway.
Tromsø7 level | Primary education, n (%) | Upper secondary education, n (%) | College/university <4 years, n (%) | College/university ≥4 years, n (%)
Age group 40-52 years: agreement 70.6%, weighted kappa 0.72
Primary education | 337 (65.8) | 175 (34.2) | 0 (0.0) | 0 (0.0)
Upper secondary education | 101 (8.1) | 1083 (87.3) | 54 (4.4) | 3 (0.2)
College/university <4 years | 23 (2.7) | 274 (32.5) | 543 (64.3) | 4 (0.5)
College/university ≥4 years | 5 (0.4) | 38 (3.0) | 461 (36.4) | 764 (60.2)
Age group 53-62 years: agreement 66.9%, weighted kappa 0.69
Primary education | 364 (61.4) | 227 (38.2) | 1 (0.2) | 1 (0.2)
Upper secondary education | 89 (10.5) | 730 (86.1) | 26 (3.1) | 3 (0.3)
College/university <4 years | 28 (4.7) | 239 (40.2) | 321 (53.9) | 7 (1.2)
College/university ≥4 years | 1 (0.2) | 19 (2.9) | 248 (38.4) | 378 (58.5)
Age group 63-99 years: agreement 64.9%, weighted kappa 0.68
Primary education | 545 (51.5) | 513 (48.5) | 0 (0.0) | 0 (0.0)
Upper secondary education | 45 (5.0) | 797 (88.6) | 56 (6.2) | 2 (0.2)
College/university <4 years | 9 (1.4) | 259 (40.3) | 363 (56.4) | 12 (1.9)
College/university ≥4 years | 1 (0.2) | 34 (5.3) | 208 (32.4) | 398 (62.1)
Percentages calculated to equal 100% in row.

Supplementary Table 3. Distribution of self-reported and Statistics Norway-recorded educational level by age and sex. The Tromsø Study 2015-2016.
Age group and educational level | Women (n = 10 826), Tromsø7, n (%) | Women, Statistics Norway, n (%) | Men (n = 9789), Tromsø7, n (%) | Men, Statistics Norway, n (%)
40-52 years
Primary education | 386 (8.8) | 384 (8.8) | 512 (12.3) | 466 (12.1)
Upper secondary education | 1069 (24.5) | 1251 (28.6) | 1241 (32.1) | 1570 (40.6)
College/university <4 years | 950 (21.7) | 1823 (41.7) | 844 (21.8) | 1058 (27.4)
College/university ≥4 years | 1967 (45.0) | 913 (20.9) | 1268 (32.8) | 771 (19.9)
53-62 years
Primary education | 642 (20.9) | 572 (18.6) | 593 (22.1) | 482 (18.0)
Upper secondary education | 875 (28.5) | 1115 (36.4) | 848 (31.6) | 1215 (45.3)
College/university <4 years | 559 (18.2) | 1002 (32.7) | 595 (22.2) | 596 (22.2)
College/university ≥4 years | 991 (32.3) | 378 (12.3) | 646 (24.1) | 389 (14.5)
63-99 years
Primary education | 1569 (46.3) | 825 (24.4) | 1058 (32.6) | 600 (18.5)
Upper secondary education | 805 (23.8) | 1636 (48.3) | 900 (27.8) | 1630 (49.4)
College/university <4 years | 404 (11.9) | 743 (21.9) | 643 (19.8) | 627 (19.3)
College/university ≥4 years | 609 (18.0) | 183 (5.4) | 641 (19.8) | 412 (12.7)
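The agreement, weighted kappa, and positive predictive values reported with the cross tabulations in Supplementary Tables 1 and 2 of paper II can be recomputed from the cell counts. The following is a minimal sketch, not the authors' analysis code: it uses the counts for women aged 40-52 years, treats rows as self-reported (Tromsø7) levels and columns as Statistics Norway levels, and assumes linear disagreement weights for kappa, since the weighting scheme is not stated in this section. All function and variable names are hypothetical.

```python
# Counts for women aged 40-52 years (Supplementary Table 1, paper II).
# Rows: self-reported level (Tromsø7); columns: Statistics Norway level.
# Level order: primary, upper secondary, college/university <4 y, >=4 y.
WOMEN_40_52 = [
    [258, 125, 3, 0],
    [100, 921, 47, 1],
    [20, 190, 733, 7],
    [6, 16, 1040, 905],
]


def percent_agreement(table):
    """Percentage of participants whose two recorded levels coincide."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return 100.0 * diagonal / total


def linear_weighted_kappa(table):
    """Cohen's kappa with linear disagreement weights |i - j|.

    Assumption: linear weights are an illustrative choice; the source
    does not say which weighting scheme was used.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(
        abs(i - j) * table[i][j] for i in range(k) for j in range(k)
    )
    expected = sum(
        abs(i - j) * row_totals[i] * col_totals[j] / total
        for i in range(k)
        for j in range(k)
    )
    return 1.0 - observed / expected


def positive_predictive_values(table):
    """Row percentage of each diagonal cell, one value per reported level."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(table)]


def reporting_shares(table):
    """Percentages reporting below, equal to, and above the register level."""
    k = len(table)
    total = sum(sum(row) for row in table)
    under = sum(table[i][j] for i in range(k) for j in range(k) if i < j)
    over = sum(table[i][j] for i in range(k) for j in range(k) if i > j)
    correct = total - under - over
    return tuple(100.0 * n / total for n in (under, correct, over))


if __name__ == "__main__":
    print("agreement (%):", round(percent_agreement(WOMEN_40_52), 1))
    print("weighted kappa:", round(linear_weighted_kappa(WOMEN_40_52), 2))
    print("PPV (%):", [round(v, 1) for v in positive_predictive_values(WOMEN_40_52)])
    print("under/correct/over (%):", [round(v, 1) for v in reporting_shares(WOMEN_40_52)])
```

Under these assumptions the sketch gives 64.4% agreement and a weighted kappa of about 0.66 for this stratum, in line with the tabulated values, and the diagonal row percentages match the tabulated 66.8, 86.2, 77.2, and 46.0. Other strata can be checked by substituting their counts.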