Associations of breast cancer related exposures and gene expression profiles in normal breast tissue—The Norwegian Women and Cancer normal breast tissue study

Abstract
Background: Normal breast tissue is utilized in tissue-based studies of breast carcinogenesis. While gene expression in breast tumor tissue is well explored, our knowledge of transcriptomic signatures in normal breast tissue is still incomplete. The aim of this study was to investigate variability of gene expression in a large sample of normal breast tissue biopsies, according to breast cancer related exposures (obesity, smoking, alcohol, hormone therapy, and parity).
Methods: We analyzed gene expression profiles from 311 normal breast tissue biopsies from cancer-free, post-menopausal women, using Illumina bead chip arrays. Principal component analysis and K-means clustering were used for initial analysis of the dataset. The association of exposures and covariates with gene expression was determined using linear models for microarrays.
Results: Heterogeneity of the breast tissue and cell composition had the strongest influence on gene expression profiles. After adjusting for cell composition, obesity, smoking, and alcohol showed the highest numbers of associated genes and pathways, whereas hormone therapy and parity were associated with negligible gene expression differences.
Conclusion: Our results provide insight into associations between major exposures and gene expression profiles and provide an informative baseline for improved understanding of exposure-related molecular events in normal breast tissue of cancer-free, post-menopausal women.

Risk factors for breast cancer, other than sex and age, include overweight/obesity, alcohol consumption, family history of breast cancer, reproductive history, postmenopausal hormone therapy (HT), as well as smoking for pre-menopausal breast cancer. 2,3 With growing incidence rates, increased understanding of the mechanisms of cancer development is needed for preventive measures and early detection.
Despite growing understanding of breast cancer development at the molecular level, our knowledge of transcriptomic signatures of normal breast tissue is still incomplete. Tissue samples of normal breast have been widely used as control tissue in breast cancer research. 4-7 However, these tissue samples often originate from reduction mammoplasty, benign breast lesions, or histologically normal tissue adjacent to breast cancer. 4,5,8 Such tissue samples often show more histological abnormalities than tissue obtained from healthy donors, 9-11 and the use of different sources of control breast tissue in different studies makes comparisons between studies difficult. In addition, most gene expression profiles have been generated from small sets of samples that were likely not representative of the general population. 8,12 Finally, a better understanding of the natural variability of gene expression in normal breast tissue would represent a significant step forward in our understanding of early disease-related mechanisms.
With this study of a large, random sample of cancer-free, postmenopausal Norwegian women, we investigated the variability of gene expression in normal breast tissue. In particular, we explored gene expression patterns associated with exposure to known risk factors for breast cancer, such as obesity, parity, alcohol consumption, and use of menopausal HT. We also examined if smoking was associated with gene expression. The generated data represent a baseline of gene expression patterns in normal breast tissue from cancer-free, post-menopausal women, and can potentially play an important role in the feasibility, design, and analysis of future tissue-based studies investigating biomarkers of exposure, as well as breast cancer development.

| Study population
A detailed description of the recruitment process and study population, as well as ethical aspects of genetic research in healthy populations, is presented and discussed in our previously published article. 13 Briefly, study participants were recruited through the national mammography screening program at the Breast Diagnostic Center at the University Hospital of North Norway (UNN), Tromsø, Norway, during the years 2010-2011. They were not referred due to pathological clinical findings or irregularities on previous mammograms but attended a scheduled routine mammogram. Eligible women were aged between 53 and 67 years, were post-menopausal, and were already participating in the nationally representative Norwegian Women and Cancer study (NOWAC). 14 Exclusion criteria for the present study included self-reported previous history of breast cancer, positive mammogram, other current malignant diseases, and use of anticoagulation therapy with warfarin, heparin, dipyridamole, or clopidogrel. Eligible women who agreed to participate received written and oral information, signed an informed consent form, and answered a two-page questionnaire regarding menopausal status, weight and height, smoking and alcohol consumption, use of HT, and other medication.
The number of included participants was 317. Three years after inclusion, data were linked to the Cancer Registry of Norway, using the unique personal identification number. This resulted in the exclusion of five participants who developed breast cancer within 3 years after the biopsy was taken, and one participant due to a prior lymphoma diagnosis with unknown treatment. Thus, the final number of women included for statistical analysis was 311. The North Norway Regional Committee for Medical and Health Research Ethics (REK-Nord case no # 200603551) approved the study.

| Definition of exposures
Information on year of birth, menopausal status, current height and weight and exposures (HT use, smoking and alcohol consumption) was extracted from the two-page questionnaires answered at the time of inclusion.
Body mass index (BMI) was calculated, and obesity was defined according to the definition of the World Health Organization (WHO, BMI ≥ 30). Women were considered post-menopausal if they reported that menstruation had ceased. In case of incomplete information, women were defined as post-menopausal if they were older than 53 years. Women who had consumed alcohol during the week prior to the biopsy, regardless of the type or amount, were defined as alcohol consumers. Similarly, women who had smoked during the week prior to biopsy were defined as smokers. Only women who were current users of systemic HT (tablets or patch) were defined as HT users. Data on parity were retrieved from the NOWAC database, and the variable was dichotomized into parous versus non-parous for analyses of gene expression.
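As an illustration only, the BMI-based grouping can be sketched in a few lines of Python. The function names and the example participant are hypothetical, and the cut-off assumes the WHO obesity category of 30 kg/m² and above, which matches the grouping used in the results (BMI of 30 and above versus below 30).

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2 from self-reported weight and height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(bmi: float, threshold: float = 30.0) -> bool:
    """Assumed WHO obesity category: BMI at or above 30 kg/m^2."""
    return bmi >= threshold


# Hypothetical participant (not study data): 85 kg, 1.65 m
bmi = body_mass_index(85.0, 1.65)
print(round(bmi, 1), is_obese(bmi))
```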
We also collected data on smoking status and alcohol consumption (g/day) from the more comprehensive eight-page questionnaire answered by the participants as part of the prospective data collection in NOWAC. Participants of the biopsy study answered the eight-page questionnaire 0-20 years prior to donating a biopsy (1991-2011).
These data were used for a sensitivity analysis.

| Tissue samples
An experienced radiologist obtained tissue samples after mammography, by ultrasound-guided needle-biopsy (

| Statistical analysis
Data analysis was done using R (r-project.org). Raw files were quantile normalized using the Bioconductor lumi package. 15 Principal component analysis (PCA) and clustering were used for initial analysis of the dataset. The PCA was computed with all genes included. To obtain distinct clusters that correlate with the PCA scores and thereby simplify interpretation, we clustered the genes with the most variability (interquartile range [IQR] > 1 log2 unit).
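The high-variability filter can be sketched as follows. This is a minimal NumPy illustration rather than the R code used in the study; the function name and the toy matrix are hypothetical, and it assumes a genes-by-samples matrix of log2 expression values with the interquartile range (IQR) computed across samples.

```python
import numpy as np


def high_variance_genes(log2_expr: np.ndarray, min_iqr: float = 1.0) -> np.ndarray:
    """Return row indices of genes whose IQR across samples exceeds min_iqr.

    log2_expr: genes x samples matrix of log2 expression values.
    """
    q75, q25 = np.percentile(log2_expr, [75, 25], axis=1)
    return np.flatnonzero((q75 - q25) > min_iqr)


# Toy data (hypothetical): 3 genes x 5 samples
toy = np.array([
    [7.0, 7.1, 7.2, 7.1, 7.0],  # nearly constant across samples
    [5.0, 6.0, 7.0, 8.0, 9.0],  # spread of several log2 units
    [4.0, 4.2, 4.4, 4.6, 4.8],  # small spread
])
selected = high_variance_genes(toy)
print(selected)
```

Only the rows returned by such a filter would be passed on to clustering; the PCA, as stated above, uses all genes.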
The three gene clusters identified were analyzed for overrepresented gene ontology (GO) terms using the clusterProfiler package. 16 This analysis highlighted cell composition as a potentially important covariate, and non-negative matrix factorization (NMF) was used to obtain improved cellular composition estimates. 17 NMF was run with the "nsNMF" method and initialized with non-negative singular value decomposition to get a sparser estimate for the gene profiles of the cell types. The association of exposures and covariates with gene expression was determined using linear models for microarrays (LIMMA), with the scores on principal components one and two included as covariates to correct for bias due to the cellular composition of the biopsies. P values from the linear models were corrected for multiple testing using the method of Benjamini and Hochberg. 18
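For readers unfamiliar with the multiple-testing step, the Benjamini-Hochberg adjustment can be sketched as below. This is a generic NumPy illustration, not the code used in the study; the function name and the example p values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p value by m / i
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# Hypothetical p values for four genes
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(adj)
```

A gene would then be called significant when its adjusted p value falls below the chosen threshold.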

| Unsupervised clustering
After normalization of data, the initial analyses identified 607 genes with a high level of variance (IQR larger than one log2 unit). These 607 genes were analyzed by K-means clustering, and three dominating clusters were identified (Figure 1). These clusters appear unrelated to any of the exposures. PCA plots with the exposure variables illustrated are shown in Figure S1.
Genes from the three clusters were analyzed using clusterProfiler to identify GO categories that describe the functionality of the clusters (Figure 2).
Figure 1. High-variance genes in breast tissue from healthy women represent three main clusters unrelated to major breast cancer risk factors. We analyzed 607 high-variance genes (interquartile range larger than 1 log2 unit) by K-means clustering and identified three dominating clusters (1-red, 2-green, and 3-blue). The distribution of the exposures is shown in the top pane (in color, legend to the right). A, alcohol; BMI, body mass index; H, HT use; P, parity; S, smoking.

Figure 2. The three main gene expression clusters identified in breast tissue from healthy women likely reflect biopsy composition. Genes from the three clusters identified using K-means clustering (Figure 1) were analyzed using clusterProfiler to identify overrepresented gene ontology categories that describe the functionality of the clusters.

| Associations between exposures and gene expression profiles
To analyze associations of selected exposures with gene expression profiles, we used LIMMA. An overview of the results is presented in Table 2. The 20 most significant genes and pathways associated with each exposure are presented in Tables 3 and 4 for obesity, Tables 5 and 6 for smoking, Tables 7 and 8 for alcohol, Tables 9 and 10 for HT, and Tables 11 and 12 for parity. The lists of differentially expressed genes and gene sets are provided in Supporting Information S2 and S3.
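The idea of estimating an exposure effect per gene while adjusting for the first two principal component scores can be sketched with ordinary least squares. Note that this is a simplification: LIMMA additionally moderates the gene-wise variances with an empirical Bayes step, which is not reproduced here. The function name and the toy data are hypothetical.

```python
import numpy as np


def adjusted_exposure_effects(expr, exposure, covariates):
    """Per-gene OLS coefficient for a binary exposure, adjusted for covariates.

    expr:       genes x samples matrix of log2 expression values
    exposure:   length-n vector, 1 = exposed, 0 = unexposed
    covariates: samples x k matrix (for example, scores on PC1 and PC2)
    """
    expr = np.asarray(expr, dtype=float)
    n_samples = expr.shape[1]
    design = np.column_stack([np.ones(n_samples), exposure, covariates])
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return coef[1]  # row 1 holds the exposure coefficient for every gene


# Toy data (hypothetical): 6 samples, 2 genes, 2 composition covariates
exposure = np.array([0, 1, 0, 1, 0, 1], dtype=float)
pcs = np.array([[-2, 1], [-1, -1], [0, 1], [1, -1], [2, 0], [0, 0]], dtype=float)
gene_a = 5.0 + 2.0 * exposure + 3.0 * pcs[:, 0]  # true exposure effect: 2
gene_b = 4.0 - 1.0 * pcs[:, 1]                   # true exposure effect: 0
effects = adjusted_exposure_effects(np.vstack([gene_a, gene_b]), exposure, pcs)
print(np.round(effects, 6))
```

Because the composition covariates are part of the design matrix, the exposure coefficient reflects the difference between exposed and unexposed women at the same composition scores.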
When comparing gene expression profiles from breast tissue biopsies from women with BMI of 30 and above to those with BMI below 30, we identified 1577 significantly differentially expressed genes (Top 20 genes in Table 3). The differentially expressed genes included three alcohol dehydrogenases. There were more than 600 differentially expressed gene sets from GO and Kyoto Encyclopedia of Genes and Genomes, the majority of which were up-regulated in women with obesity (Top 20 pathways in Table 4). The upregulated gene sets were dominated by immune-related processes, with both innate and adaptive immunity represented. The list of down-regulated gene sets included processes related to aerobic oxidation, fatty acid metabolism, amino acid metabolism, and protein translation in the mitochondria, which were all present among the top 20 gene sets when sorted by p value.
Nine genes were significantly associated with alcohol exposure (Table 7). Eight of these were upregulated (MAMDC4, ISCA2, FAM171A2, BCDIN3D, SMIM20, RIT1, DHRS4-AS1, and UNC50) and one, EPB42, was downregulated. Pathway analysis revealed 80 alcohol-associated gene sets, and the 30 downregulated gene sets were all related to immunological processes (Top 20 pathways in Table 8). The 50 upregulated pathways were related to aerobic oxidation and fatty acid metabolism, and these were all among the top 20 gene sets when sorted by p value.
Two genes (ZCCHC12 and SEL1L2) were associated with HT use, both upregulated, but no pathways were identified (Tables 9 and 10).
Finally, when comparing parous versus non-parous women, we found no associated genes or pathways at our chosen level of statistical significance (p < .05; Tables 11 and 12).
We carried out a sensitivity analysis combining two sources of exposure data for smoking and alcohol: the detailed, eight-page questionnaire answered 0-20 years prior to the biopsy, and the two-page questionnaire answered at the time of the biopsy. Being classified as a current smoker when combining data from the two time points was associated with the same top five genes as having smoked during the last week before the biopsy (data not shown). Being classified as a former smoker when combining data from the two timepoints was not associated with any differentially expressed genes (data not shown).
Assessed in the eight-page questionnaire, the median amount of

During obesity, macrophages, putatively of the M1 type, accumulate in the adipose tissue, serving as a rich source of cytokines. 20,21 In obese breast tissue, inflammatory foci with dead adipocytes encircled by macrophages have been observed. 22,23 In the breast tissue, macrophages are exposed to saturated fatty acids from lipolysis, leading to TLR4 signaling via NF-κB and culminating in increased expression of pro-inflammatory genes such as COX-2, IL-1β, IL-6, and TNF-α. 20 These obesity-linked pro-inflammatory mediators may have local, pro-neoplastic effects, but also contribute to diminished overall health in obesity. However, our study cannot distinguish between breast tissue transcriptomic patterns associated with local inflammation and transcriptomic patterns associated with systemic, circulating inflammatory factors. Biologically, however, this distinction is somewhat artificial, as the local and systemic effects of obesity are closely interrelated. 20 Estrogen receptor signaling, increased by obesity and inflammation, is a key contributor to increased risk of hormone receptor-positive breast cancer. In our data, five prostaglandin-related gene sets were identified. Prostaglandin contributes to increased breast tissue expression of aromatase, the rate-limiting enzyme in estrogen biosynthesis. 24,25 Hence, our results support the "obesity-inflammation-aromatase" axis 23

of body fat leading to obesity. 26 In the breast, adipocytes are involved in normal tissue development, but there is also close interaction between stromal adipocytes and tumor cells. 27 Our results are in line with the finding that gene expression related to lipogenesis and fatty acid oxidation is downregulated in subcutaneous fat of both moderately and morbidly obese women, potentially as a mechanism for limiting further development of fat mass. 28,29 Of note, it has been suggested that this profile may be reversed by tumor cells, allowing adipocytes to provide lipids for the growing tumor. 27 We also identified several down-regulated gene sets related to protein translation in the mitochondria. Metabolic imbalance is closely related to mitochondrial function, as mitochondria, in addition to producing ATP, are involved in the production and elimination of reactive oxygen species (ROS). 30 Obesity causes increased inflammation and oxidative stress through ROS production, which in turn may lead to mitochondrial dysfunction. 30 In adipose tissue and skeletal muscle, the number of mitochondria and the rate of mitochondrial biogenesis may decrease during obesity. 30 Although these processes have not been described in breast tissue, our gene expression data from normal breast tissue support this overall concept.

| Smoking
Our results revealed several genes and pathways significantly associated with smoking (Tables 5 and 6). Eight genes were up-regulated, one gene was down-regulated, and 19 gene sets were associated with smoking. Cytochrome P-450 1A1 and -1B1 (CYP1A1 and CYP1B1) were among the top up-regulated genes. These genes are involved in  39 Further, F2RL3 methylation was suggested as a biomarker of smoking, 40 and over-expression and hypo-methylation were associated with higher risk of lung cancer, 41 and with tumor aggressiveness and poor survival in renal cancer. 42 In the smoking group, stabilin-1 (STAB1) was up-regulated, and secreted protein acidic and rich in cysteine (SPARC) was down-regulated. STAB1 is a scavenger receptor mediating both phagocytosis of

| Alcohol
Exposure to alcohol was associated with nine differentially expressed genes and 80 gene sets (Tables 7 and 8). Overall, the magnitude of the differential gene expression is comparable to previous analyses of alcohol and gene expression in breast tumors. 50 Interestingly, the upregulated BCDIN3 domain containing RNA methyltransferase gene (BCDIN3D) has been linked to breast cancer progression via down-regulation of tumor suppressor miRNAs. 51 In a cohort of 227 breast cancer patients, tumor levels of BCDIN3D were associated with lower disease-free survival. 52 Hence, BCDIN3D could serve as a link between alcohol consumption and breast cancer tumorigenesis and survival.
Among the 80 pathways associated with alcohol consumption, 30 were down-regulated. All of these were related to immunological processes, and the majority describe aspects of the innate immune system. Particularly, mast cell mediated immunity was present, including mast cell activation and degranulation. Mast cells have been linked to alcohol consumption, as they may mediate the damaging effects of alcohol by contributing to chronic inflammation, tissue damage, and remodeling, especially in the gastrointestinal tract. 53 Their role in cancer, 54 including breast cancer, is controversial, with conflicting results on the association with disease subtypes and prognosis. 55,56 In sum, these findings warrant further investigation of the effects of alcohol in the pre-cancerous breast tissue environment.
Fifty pathways were up-regulated in alcohol consumers (Table 8). Among those, two related processes were represented: aerobic oxidation, including translation of mitochondrial proteins for oxidative phosphorylation, and fatty acid metabolism. As in the liver, alcohol is metabolized in breast tissue into acetaldehyde, a class 1 carcinogen forming DNA and protein adducts, and further into acetic acid and acetyl-CoA, the latter of which enters the citric acid cycle. 57,58 This increased acetyl-CoA input may drive energy metabolism and increase the cellular energy state. 59 Furthermore, it has been suggested that acetyl-CoA is not merely a passive metabolite, but rather an important signaling molecule dictating cell function in a variety of settings. 58 Similarly, the various metabolites of the tricarboxylic acid (TCA) cycle, which increase in concentration when the cycle is up-regulated, affect intracellular and organismal processes, such as innate immunity, inflammation, and immune effector cells (succinate), and tumor cell growth (fumarate). The release of TCA metabolites from the mitochondria is one of the main processes by which the mitochondria influence cell function. 58 Given the upregulation of aerobic oxidation pathways in our data, perhaps as a response to increased levels of acetyl-CoA from ethanol, it is also plausible that the genotoxic acetaldehyde is present in the breast tissue. Taken together, our data suggest that alcohol consumption may influence gene expression related to breast tissue physiology and metabolism.

| Hormone therapy
A current exposure to HT in our study was associated with two upregulated single genes, but no significant pathways (Tables 9 and 10).
Prolonged systemic use of any type of HT, and especially of combined HT, is associated with increased risk of breast cancer. 60 Hall et al. found a distinct gene expression profile in breast cancer tissue associated with HT use, and linked HT use to better recurrence-free survival. 61 Changes in gene expression patterns in normal breast tissue after treatment with HT were observed in one experimental study, 62 and a recent study on DNA methylation showed an association between HT use and epigenetic changes in normal breast tissue. 63 We did not observe similar changes in our data. The small number of exposed participants and the lack of information on prior HT use and duration of use could partially explain these results.

| Parity
In our study, parity was not associated with any significant single genes or pathways (Tables 11 and 12). Epidemiological studies have shown that a first full-term pregnancy at an early age, as well as multiple pregnancies, are associated with long-term risk reduction for breast cancer. 64 Several studies found a genomic signature of pregnancy in the breast tissue by comparing gene expression profiles of parous and non-parous postmenopausal women. 65,66 We did not reproduce these findings, perhaps because of the low number of non-parous women in our cohort (55 women, 17% of the study sample).

| Sensitivity analysis
As a sensitivity analysis, we combined two sources of exposure data for smoking and alcohol: the detailed, eight-page questionnaire answered 0-20 years prior to the biopsy, and the two-page questionnaire answered at the time of the biopsy. The combined information on smoking exposure provided no further insight. This was also true for participants being classified as former smokers. Similarly, combining information on alcohol intake during the week before the biopsy with the data on alcohol intake 0-20 years prior to the biopsy provided no further insight. The average alcohol consumption of our participants was low (median: 3.08 g/day), which limits our ability to discern effects of higher alcohol consumption.
For these sensitivity analyses of smoking and alcohol, the detailed exposure information was separated by up to 20 years from the more limited two-page questionnaire answered at the time of the biopsy. As the additional data did not add much, we chose to present the most recent exposure information as our main result, even though it was less detailed than the eight-page questionnaire. The lack of additional findings ties in with the understanding of gene expression as a highly dynamic and responsive biological process, which is likely to reflect recent exposures rather than exposure history. The results are in line with findings on gene expression in blood related to smoking history and current smoking. 67 In comparison, DNA methylation patterns may to a larger extent reflect previous exposure. 68

| STRENGTH AND LIMITATIONS
The main strength of this study is the analysis of normal breast tissue

Several limitations must be considered. We collected whole biopsies, and no histological assessment of the tissue composition was performed. In general, using self-report as the data collection method may introduce information bias, and unmeasured confounding factors may influence our findings. Twenty-one percent of our study population were defined as smokers. This number is comparable to the smoking prevalence for adult females in Norway in 2010, at the time of our sample collection. 69 We did not assess smoking beyond current smoking status in our main analysis. Hence, a certain degree of misclassification is expected, for example, in categorizing former smokers as non-smokers, which may drive our results toward the null. However, our sensitivity analysis suggests that smoking history does not have a detectable effect on gene expression in the breast tissue. Further, alcohol consumption is associated with smoking and is itself a known risk factor for breast cancer. We adjusted for alcohol intake in the smoking analyses. Nonetheless, statistical adjustment using self-reported alcohol consumption may not be adequate to control fully for confounding by alcohol. Our alcohol-related analysis also has a few limitations. There was a high percentage of alcohol consumers in this study, but the proportion is comparable to that in the whole NOWAC study. 70 Additionally, we only have data on alcohol consumption during the week before biopsy in our main analysis. Potential effects of alcohol dose were addressed in the sensitivity analysis, although these data were separated by up to several years from the biopsy. Finally, we did not include any analyses stratified by type and amount of alcohol, due to loss of power in such subgroup analyses.
The descriptive, cross-sectional design of this study provides a snapshot in time of gene expression profiles and does not allow any discussion of causality. By its nature, gene expression analysis is hypothesis generating. Testing the identified gene expression associations by using other study designs such as randomized controlled trials, or in an experimental, in vitro setting was beyond the scope of the present study.

| CONCLUSION
To our knowledge, this is the first study describing associations of breast cancer related exposures and gene expression profiles in normal breast tissue from cancer-free, post-menopausal women. Obesity, smoking, and alcohol had the highest numbers of associated genes and pathways, whereas HT use and parity were associated with negligible gene expression differences in our data. Our results provide both confirmation of some previously reported findings and new hypotheses for further exploration. We conclude that our data provide an informative baseline for improved understanding of exposure-related molecular events in normal breast tissue.

ACKNOWLEDGMENTS
We would like to thank all the women who participated and donated their time and breast tissue biopsy for this study. Thanks to all personnel working at the Breast Diagnostic Center at the University Hospital of North Norway (UNN), Tromsø, who made this collection possible.
We thank Inger Riise Bergheim, Anita Halvei and Eldri Undlien Due at the Department of Cancer Genetics, Oslo University Hospital for isolating the RNA. The microarray service was provided by the Genomics Core Facility (GCF) at the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. Bente Augdal and Marita Melhus have been responsible for the administration of the data collection and biobank. Finally, we are grateful to Dr Marko Lukic for data management services.

CONFLICT OF INTEREST
The authors have stated explicitly that there are no conflicts of interest in connection with this article.

DATA AVAILABILITY STATEMENT
Due to ethical restrictions on this dataset, which contains potentially sensitive information, the data will be made available upon request.

DISCLAIMER
The gene expression laboratory analyses were provided by the Genomics Core Facility (GCF), Norwegian University of Science and Technology (NTNU). GCF is funded by the Faculty of Medicine and Health Sciences at NTNU and Central Norway Regional Health Authority.

ETHICS STATEMENT
The Regional Committee for Medical and Health Research Ethics (case no # 200603551) has approved this study. Participants in this study signed an informed consent form.