Validity of self-reported educational level in the Tromsø Study

Background: Self-reported data on educational level have been collected for decades in the Tromsø Study, but their validity has yet to be established. Aim: To investigate the completeness and correctness of self-reported educational level in the Tromsø Study, using data from Statistics Norway. In addition, we explored the consequence of using these two data sources on educational trends in cardiometabolic diseases. Methods: We compared self-reported and Statistics Norway-recorded educational level (primary, upper secondary, college/university <4 years, and college/university ⩾4 years) among 20,615 participants in the seventh survey of the Tromsø Study (Tromsø7, 2015–2016). Sensitivity, positive predictive value and weighted kappa were used to measure the validity of self-reported educational level in three age groups (40–52, 53–62, 63–99 years). Multivariable logistic regression was used to compare educational trends in cardiometabolic diseases between self-reported and Statistics Norway-recorded educational level. Results: Sensitivity of self-reported educational level was highest among those with a college/university education of 4 years or more (⩾97% in all age groups and both sexes). Sensitivity for primary educational level ranged from 67% to 92% (all age groups and both sexes). The lowest positive predictive value was observed among women with a college/university education of 4 years or more (29–46%). Weighted kappa was substantial (0.52–0.59) among men and moderate to substantial (0.41–0.51) among women. Educational trends in the risk of cardiometabolic diseases were less pronounced when self-reported educational level was used. Conclusions: Self-reported educational level in Tromsø7 is adequately complete and correct. Self-reported data may produce weaker associations between educational level and cardiometabolic diseases than registry-based data.


Introduction
Self-administered questionnaires are often used in epidemiological studies to obtain information about a person's education. Inaccurate self-reported data occur when individuals answer questions incorrectly, which can lead to exposure misclassification, and thereby to less reliable study findings [1]. Education is an important determinant of socioeconomic status, as it confers skills that help individuals utilise health information, and it affects future income and occupational class [2,3]. Indeed, education has become the principal pathway to higher incomes, stable employment and a healthier lifestyle [4]. Furthermore, as self-reported education is often used as an exposure and covariate in health research [5,6], it is important to assess the validity of that variable. Validation studies on this variable should be done to produce estimates of misclassification in self-reported data and help determine if study results are biased. Data accuracy can be determined by comparing self-reported data to a gold standard data source and is often calculated by two measures: correctness, the proportion of recorded observations in the registry that are correct; and completeness, which measures the proportion of recorded observations that are actually recorded in the gold standard data source [7]. Studies of the quality of reported education are not new in the literature. However, research on the validity of self-reported education within epidemiology is still scarce. This study aimed to investigate the completeness and correctness of self-reported educational level in the Tromsø Study, using data from Statistics Norway (SSB). In addition, we explored the consequence of using these two data sources on educational trends in cardiometabolic diseases.

Methods
The Tromsø Study
The Tromsø Study is an ongoing population-based health survey, which consists of seven surveys (Tromsø1-7) conducted between 1974 and 2016 in the municipality of Tromsø, Northern Norway. The study population consists of complete birth cohorts and random samples of other cohorts [8,9]. All inhabitants of the municipality aged 40 years and above were invited to participate in Tromsø7 (2015-2016), and the study questionnaire collected information on topics such as health issues, symptoms, diseases, use of medication and healthcare services, employment, and sociodemographic and lifestyle factors.

Study population
Data on self-reported educational level from Tromsø7 were linked to data from SSB, the national statistical institute of Norway and the main producer of official statistics, using the unique 11-digit identification number assigned to all individuals living in Norway. A total of 21,083 people participated in Tromsø7 (attendance 65%), of whom 20,615 had records in SSB and were included in the analyses. A total of 468 were excluded from the analysis. Of these, 99 persons lacked information about education in SSB (19 persons were specified as 'no education, unspecified, and preschool education') and 369 had no information on education in Tromsø7.

Self-reported educational level, household income, and other variables
In the Tromsø7 questionnaire, participants were asked to respond to the question: 'What is the highest level of education you have completed?'. Response options were: primary/partly secondary education (up to 10 years of schooling); upper secondary education (minimum 3 years); tertiary education, short: college/university less than 4 years; and tertiary education, long: college/university 4 years or more (see link to questionnaire in Supplemental material). They were also asked to report their total pre-tax household income for the previous year, using eight categories from 150,000 NOK or less to 1,000,000 NOK or more. The two lowest income groups (⩽150,000 NOK and 150,000-250,000 NOK) were merged in the analysis. Participants reported their current and previous status for the following cardiometabolic diseases: diabetes, myocardial infarction, angina pectoris, and cerebral stroke, which were categorised as binary variables. Participants reported their self-rated health status as 'very bad', 'bad', 'neither good nor bad', 'good' and 'excellent', which was regrouped into three categories ('bad', 'neither good nor bad' and 'good'). Finally, participants reported whether or not they lived with a spouse.

SSB-recorded educational level
Educational information in SSB comes from administrative sources, such as educational institutions, and the State Educational Loan Fund provides supplemental data on education acquired abroad [10]. SSB records the highest completed educational level. The Norwegian Standard Classification of Education has nine educational levels, including a value for unspecified level [11]. These were regrouped by SSB into: no education or preschool education; primary education; upper secondary education; vocational education; university/college education, short; and university/college education, long. We furthermore excluded participants in the group with no education, preschool education or unspecified education from the analysis. We also merged the categories upper secondary education and vocational education, leaving four educational levels (primary education, upper secondary education, university/college education <4 years, and university/college education ⩾4 years) that were comparable to the self-reported educational levels in Tromsø7.
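The regrouping described above can be sketched as a simple lookup. This is only an illustration of the merging and exclusion rules stated in the text: the category strings, the names SSB_TO_COMMON and EXCLUDED, and the function harmonise are hypothetical and do not reflect the actual SSB coding or the authors' analysis scripts.

```python
# Hypothetical labels for the regrouped SSB categories (assumed spelling,
# not the official SSB codes).
SSB_TO_COMMON = {
    "primary education": "primary",
    "upper secondary education": "upper secondary",
    # Vocational education is merged with upper secondary, as described in the text.
    "vocational education": "upper secondary",
    "university/college education, short": "college/university <4 years",
    "university/college education, long": "college/university >=4 years",
}

# Groups excluded from the analysis according to the text.
EXCLUDED = {"no education or preschool education", "unspecified"}


def harmonise(ssb_level):
    """Map an SSB category to one of the four common levels.

    Returns None for records that the text says were excluded.
    Raises KeyError for labels this sketch does not know about.
    """
    if ssb_level in EXCLUDED:
        return None
    return SSB_TO_COMMON[ssb_level]


if __name__ == "__main__":
    for label in ["vocational education", "unspecified"]:
        print(label, "->", harmonise(label))
```

Keeping the mapping in one table makes it easy to check that every registry category ends up in exactly one of the four levels that are compared with the questionnaire.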

Statistical analyses
We assessed the validity of self-reported educational level in Tromsø7 by estimating sensitivity (completeness) and positive predictive value (PPV, correctness), using SSB-recorded educational level as the gold standard. Agreement between self-reported and SSB-recorded educational level was measured by percentage observed agreement and weighted kappa. Kappa values and kappa agreement were interpreted as proposed by Viera and Garrett [12] (less than chance: <0.00, slight: 0.00-0.20, fair: 0.21-0.40, moderate: 0.41-0.60, substantial: 0.61-0.80, or almost perfect: 0.81-1.00). Multinomial logistic regression was used to calculate odds ratios (ORs) of over- or underreporting educational level. Comparisons between self-reported and SSB-recorded educational level were stratified by age group (40-52, 53-62 and 63-99 years) and sex. These age groups were constructed after taking into account the school reform of 1959, when 7 years of primary education was made mandatory. Those who started primary school in 1959 were 63 years old in Tromsø7. The 53-62 age group was constructed to reflect another school reform in 1969. Logistic regression models were also used to estimate ORs of self-reported cardiometabolic diseases in Tromsø7 according to self-reported and SSB-recorded educational levels. A randomisation test with 10,000 permutations of the data file was used to compare trends, that is, the categorical educational level variable modelled as a linear term, between self-reported and SSB-recorded educational level. The linearity assumption was reasonably met and self-reported and SSB-recorded educational levels were therefore modelled as linear terms.
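As an illustration of the validity measures named above, the following minimal sketch computes sensitivity, PPV, observed agreement and weighted kappa from a cross-tabulation of registry level (rows) against self-reported level (columns). The counts are invented for demonstration and are not study data. The text does not state which weighting scheme was used for kappa, so linear weights are an assumption here, and the function names are hypothetical.

```python
import numpy as np


def validity_measures(table):
    """Per-level sensitivity and PPV plus overall observed agreement.

    table[i, j] = number of people with registry level i and self-reported level j.
    """
    table = np.asarray(table, dtype=float)
    diag = np.diag(table)
    sensitivity = diag / table.sum(axis=1)  # share of each registry level reported correctly
    ppv = diag / table.sum(axis=0)          # share of each self-reported level confirmed by registry
    observed_agreement = diag.sum() / table.sum()
    return sensitivity, ppv, observed_agreement


def weighted_kappa(table):
    """Weighted kappa with linear agreement weights (an assumption of this sketch)."""
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()
    idx = np.arange(k)
    # Weight 1 on the diagonal, decreasing linearly with distance between levels.
    weights = 1.0 - np.abs(idx[:, None] - idx[None, :]) / (k - 1)
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    p_obs = (weights * p).sum()
    p_exp = (weights * expected).sum()
    return (p_obs - p_exp) / (1.0 - p_exp)


# Invented counts for four ordered levels (primary ... college/university >=4 years).
toy = np.array([
    [80, 15, 4, 1],
    [20, 60, 15, 5],
    [2, 10, 50, 38],
    [0, 1, 2, 97],
])

if __name__ == "__main__":
    sens, ppv, agree = validity_measures(toy)
    print("sensitivity:", np.round(sens, 2))
    print("PPV:", np.round(ppv, 2))
    print("observed agreement:", round(agree, 3))
    print("weighted kappa:", round(weighted_kappa(toy), 3))
```

In the invented table the highest level has high sensitivity but a lower PPV, because many people registered one level below report the highest level. This is the same qualitative pattern that sensitivity and PPV are designed to separate.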

Ethics
This study was approved by the Norwegian Centre for Research Data (NSD Data Protection Services) (reference 809230). All participants in the Tromsø Study have given written informed consent for their data to be used in research. This study was not defined as health research by the Regional Ethics Committee North and was exempted from the requirement of study preapproval.

Results
Of the 20,615 individuals included in the analysis, 53% were women; the mean age was 57 years (standard deviation (SD): 11.3 years, range: 40-99 years). The proportion of women with college/university education of 4 years or more was higher than that among men (33% vs. 26%, respectively); this was also seen for the primary educational level (24% vs. 22%, respectively). The proportion of women with household income of 1,000,000 NOK or more was lower than that among men (22% vs. 28%, respectively) (Table I).
Sensitivity of self-reported educational level was highest among those with a college/university education of 4 years or more (⩾97% in all age groups and both sexes), and lowest among those with a college/university education of less than 4 years (37-58% in all age groups and both sexes) (Table II). Among women who self-reported primary educational level, sensitivity ranged from 67% to 92%, compared to 72-91% among men. PPVs for a college/university education of 4 years or more were 29-46% among women and 59-62% among men. For primary education, the PPV was 48-67% among women, compared to 52-66% among men. In all age groups and both sexes, the highest degree of underreporting in Tromsø7 was observed among those with SSB-recorded upper secondary educational level, but a self-reported primary educational level, whereas the highest degree of overreporting was observed among those with SSB-recorded college/university education less than 4 years, but a self-reported college/university education of 4 years or more (Supplemental II). For women, kappa agreement varied from moderate to substantial (57-64%), and was substantial in all age groups for men (65-71%).
A corresponding fair weighted kappa value was found in all age groups for women (0.41, 0.48 and 0.51, respectively), and for men (0.52, 0.54 and 0.59, respectively). Among those aged 40-52 and 53-62 years, the proportions of self-reported and SSB-recorded primary educational level were similar. However, in those aged 63-99 years, there was a notable difference (Supplemental Table III). All age groups showed higher self-reported than SSB-recorded college/university education of 4 years or more, and this was especially evident in the youngest age group.
We found educational trends in the risk of self-reported cardiometabolic diseases when using both self-reported and SSB-recorded educational level (Table IV). For women, the odds for diabetes increased by 31% per one-level decrease in self-reported educational level (OR 1.31, 95% CI 1.20-1.42), while the odds increased by 44% per one-level decrease in SSB-recorded educational level (OR 1.44, 95% CI 1.29-1.61). We saw the same trends for myocardial infarction, angina pectoris, and stroke, also for men. However, the educational trend was less pronounced when using the self-reported educational level.
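To make the per-level odds ratios above easier to read, the short sketch below shows how an OR for a linear term translates into a percentage increase in odds, and what it would imply across several levels if the linearity assumption held exactly. The cumulative figures are arithmetic consequences of the linear model, not results reported in the study, and the function names are hypothetical.

```python
def percent_increase(or_per_level):
    """Percentage increase in odds per one-level decrease in education."""
    return (or_per_level - 1.0) * 100.0


def cumulative_or(or_per_level, levels):
    """OR implied across several levels when education enters as a linear term.

    On the log-odds scale the effect adds up level by level, so the OR multiplies.
    """
    return or_per_level ** levels


if __name__ == "__main__":
    # ORs for diabetes among women, as quoted in the text.
    for source, or_value in [("self-reported", 1.31), ("SSB-recorded", 1.44)]:
        print(
            source,
            f"{percent_increase(or_value):.0f}% per level,",
            f"implied OR across three levels: {cumulative_or(or_value, 3):.2f}",
        )
```

Under this assumption, a modest difference in the per-level OR between the two data sources grows when the lowest and highest educational levels are compared directly.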

Discussion
We found that self-reported data on educational level in Tromsø7 achieved very high completeness (⩾97% in all age groups and both sexes) for participants with a college/university education of 4 years or more, and high completeness (67-92% in all age groups and both sexes) for those with a primary educational level. However, low correctness was found for both of these educational levels (29-62% for college/university education ⩾4 years and 48-67% for primary educational level, respectively). Our findings showed substantial agreement (65-71%) in all age groups for men, and moderate to substantial agreement for women (57-64%). Fair weighted kappa values were found in both women (0.41-0.51) and men (0.52-0.59). Educational trends in cardiometabolic diseases were less pronounced when self-reported educational level was used rather than registry-recorded educational level.
The degree of completeness was highest among those with a college/university education of 4 years or more, indicating near-perfect self-reporting. However, completeness among those with primary educational level was slightly lower. Low correctness was found in all age groups in our highest and lowest categories of educational level. There are several possible explanations for this low correctness. First, individuals might consider that they belong in the highest educational category because they have taken courses or programmes that were not necessarily included in a degree. Indeed, it is common in Norway to take work-related continuing education courses, but they do not necessarily culminate in a formal degree. SSB only places individuals in the category of college/university education of 4 years or more if they have completed a master's degree or a PhD [11]. In addition, Tromsø7 and SSB measure the educational level differently: SSB records the highest completed degree, whereas Tromsø7 asked for the duration of education. Moreover, it has been hypothesised that questionnaire respondents sometimes give answers that are more in line with prevailing social norms than their factual situation [13]. When individuals provide answers they believe to be more socially desirable, rather than revealing their true attitudes, preferences, or beliefs, it is referred to as social desirability bias [14]; it is one of the most common and pervasive sources of bias that affects the validity of survey research findings and might also explain some of the overreporting of the educational level in our study. Previous studies also found that some of those who claimed to have a degree did not, in fact, have any degree [15,16].
It is often harder to get a correct answer to questions about education. Some might think they do not have the education they 'should have' due to a feeling of social prestige, and therefore report a higher educational level than they actually have [10]. The age group 53-62 years had a higher tendency to overreport their educational level than the youngest age group, while others have found a higher tendency of overreporting among the youngest age group [10]. Second, it is difficult to measure education appropriately, as most societies have complex educational systems that change over time [17,18]. In Tromsø7, the participants of different age groups have received their education within different school systems, as the Norwegian educational system has been reformed continuously from 1959, which may make it difficult for these participants to report their educational level correctly; for example, the transition from several different degrees with specific Norwegian and Latin titles to bachelor's and master's degrees [19]. SSB has re-classified the educational level of those with what were previously the lowest and middle educational levels due to changes in the Norwegian educational system [10,20]. Self-reporting of educational level could also be subject to recall bias, particularly among the oldest participants [10,16]. Finally, overreporting of educational level in questionnaires due to misunderstanding has been reported [21,22]. It has been suggested that this misunderstanding is linked to the question regarding the duration of education (total years of education versus highest obtained degree) [21,22], and misclassification can occur when inferring attainment of a degree from years of schooling.
Previous studies observed misreporting of educational level in both sexes, although it was higher among women, which was also the case in our study [15,23]. Our data suggest that participants from the most affluent households are more likely to overreport their educational level. A previous study found that women who reported having a higher degree also tended to have higher earnings than those who reported their educational level correctly [15]. High-income individuals are more likely than low-income individuals to report their education correctly [23], which is consistent with our findings in the highest household income category.
Knowing the extent of misreporting also has obvious implications for the interpretation of other studies that use educational attainment as an exposure or for descriptive purposes. When education is used as a confounding variable, misclassification may affect the efficiency of adjustment for confounding effects, and thus seriously bias the results [1,24]. Extensive literature over several decades has reported that people of lower socioeconomic status tend to have a higher prevalence of cardiometabolic diseases [5,6,25,26]. Education is often used as a proxy for socioeconomic status [27], and one purpose of collecting information about education in the Tromsø Study was to use this variable as a proxy for socioeconomic status; thus misreporting may lead to misclassification. This distortion in the association between the exposure and outcome might create a less pronounced educational trend when self-reported educational data are used. Researchers should therefore be aware of the potential shortcomings of using self-reported education compared to administrative records.

Strengths and limitations
The main strength of this study is the complete individual-level linkage between a health survey and a national register, using the unique national identification number. The Tromsø Study is a population-based study with a relatively large sample and good representativeness of both women and men. Data on educational level from SSB are based on reports from various educational institutions in Norway and abroad, and we assessed the criterion validity to be reasonably high. This study also has some limitations, as the Tromsø Study and SSB measure educational level differently, with Tromsø7 recording years of completed education, and SSB measuring completed education. This might result in low correctness and kappa values in our study. Although the dataset from SSB had some missing values, the proportion was very low (0.5%) and did not impact the results. Changes in the wording of questionnaires or the addition of extra questions might help future participants to report their educational level more correctly. For instance, asking for the highest level of education, rather than the number of years of education, could improve accuracy.
In conclusion, this study found that data on self-reported educational level in Tromsø7 are adequately complete and correct for research, with fair weighted kappa values in all age groups and both sexes. A considerable proportion of participants, however, did not answer these questions correctly, which can lead to misclassification, and may explain why educational trends in cardiometabolic diseases were less pronounced when using self-reported educational level. We consider our findings to be important for epidemiological research, as they contribute to knowledge on the degree of misclassification and validation of self-reported educational level.

The author(s) thank the participants of Tromsø7, as their willingness to participate is fundamental to our research.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the High North Population Studies, UiT The Arctic University of Norway.

Figure 1. Differences between self-reported and Statistics Norway-recorded educational level by sex. The Tromsø Study 2015-16. Negative numbers indicate underreporting and positive numbers indicate overreporting.

Table I. Socioeconomic characteristics of study population in the Tromsø Study 2015-16.

Table II. Validity of self-reported educational level compared to that recorded in Statistics Norway by age and stratified by sex. The Tromsø Study 2015-2016.

Table III. Sex-specific odds ratios of under- and overreporting of educational level from Tromsø7 and Statistics Norway.

Table IV. Age-adjusted odds ratios for the association between cardiometabolic diseases and educational level from Tromsø7 and Statistics Norway. P value for equality between ORs based on education from Statistics Norway and Tromsø7. The Tromsø Study 2015-16.
a Education as linear term, per level decrease. b c Diabetes mellitus types 1 and 2.