Behind the Paper: DNA methylation as a marker of education
Epigenome-wide association studies (EWAS) test, for hundreds of thousands of locations in the genome (usually at cytosines in cytosine–guanine (CpG) dinucleotides), whether the proportion of methylated alleles in a biological sample (such as whole blood) differs between cases and controls, or is associated with a continuous trait that varies, for example, from low to high. In 2015, we decided to perform such a study for educational attainment, which is an extremely interesting human trait to study from an epigenetic perspective.
The paper in npj Science of Learning is here.
By definition, educational attainment is the highest completed level of education of an adult person. But it may hold much more information: educational attainment is connected to a range of human conditions, from cognitive functioning and personality to socioeconomic conditions across the lifespan, and physical and mental health. Even in a country as wealthy and highly educated as the Netherlands, where our study population and research team live, there is a rather remarkable difference between higher and lower educated Dutch people in terms of health, for reasons that are not at all clear. According to records from 2010, men and women with a higher education have an average life expectancy at birth that is 6 to 7 years longer than that of men and women without one, and the difference is a striking 14 years when considering lifespan without physical limitation. Where does this difference come from?
We decided to perform an EWAS to identify loci where the methylation level in blood correlates with educational attainment. Such loci may point to:
1) Consequences of different life conditions of lower educated versus higher educated people on the epigenome of white blood cells, or even consequences of having parents with a lower versus higher education (epigenetic biomarkers of exposures).
2) Epigenetic mechanisms that contribute to individual differences in educational attainment, for instance by regulating gene expression in neurons and thereby affecting brain development or later cognitive functioning. We can only detect such relationships with cognitive functioning in peripheral tissues such as blood, if the individual differences in these epigenetic mechanisms are reflected in the epigenetic marks of white blood cells (peripheral epigenetic markers of cognitive functioning).
3) Epigenetic mechanisms that contribute to education-related physical health differences or that are a biomarker of health differences without being the causal trigger.
I am a postdoc at the Netherlands Twin Register, where I did my PhD work on biomarkers and epigenetics in twins, and where, after my graduation, I continued to work on aggression in children and on epigenetics within BBMRI-NL, a large collaboration between biobanks in the Netherlands.
In 2015, the unique opportunity arose to perform an EWAS on a large group of well-characterized individuals from multiple biobanks in the Netherlands, using data collected by the Biobank-based Integrative Omics Studies (BIOS) consortium.
Amsterdam, capital of the Netherlands, where part of the research team is based at the Vrije Universiteit Amsterdam.
In the BIOS consortium, over 4000 blood samples with in-depth phenotypic information and GWAS data from six Dutch biobanks were assessed with genome-wide DNA methylation arrays (Illumina 450k) and RNA-sequencing (>15 M paired end reads). Data processing and QC were performed for all samples together in a harmonized way. This powerful resource has contributed to many papers and a comprehensive online catalogue that includes eQTLs, meQTLs (genetic variants associated with gene expression and DNA methylation, respectively), and DNA methylation heritability, and will continue to grow as more associations of human traits and diseases with the epigenome, transcriptome and metabolome are added.
In the current project, we analyzed data from DNA methylation arrays and educational attainment in four cohorts: Lifelines-Deep, the Leiden Longevity Study, the Netherlands Twin Register, and the Rotterdam Study. Having access to such a large dataset from within one country has multiple advantages, including a genetically homogeneous population and a national educational system. Nonetheless, the participants were born between 1925 and 1989. In this period, educational opportunities changed drastically, especially for women, as illustrated in the picture below.
Birth cohort trends in educational attainment
Educational attainment in the Netherlands. The x-axis shows birth year and the y-axis shows educational attainment on a 7-point scale. The cohorts taking part in the study are indicated by different colors: Lifelines-Deep, the Leiden Longevity Study (LLS), the Netherlands Twin Register (NTR) and the Rotterdam Study (RS). The pink and blue lines show the average level of educational attainment of females and males across birth cohorts. As can be seen, the sex differences in educational attainment are large in earlier birth cohorts, and only really start to disappear in the generations of people born in the late sixties and afterwards.
One of the first challenges we faced was how to account in our analyses for these birth cohort differences in educational attainment. I sent an email to Jaap Dronkers, who was a renowned Professor in international comparative research on educational performance and social inequality at Maastricht University, to ask for his advice on possible ways to standardize educational attainment. Prof. Dronkers pointed out the ridit method, which we then used to calculate birth cohort- and sex-specific education scores. These scores better reflect the social value of a certain level of education within society at a given time, and capture individual differences in educational attainment when individuals from different birth cohorts and sexes are analyzed simultaneously. Very sadly, Prof. Dronkers passed away in 2016, only two weeks after we had been in touch about the data harmonization.
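To make the ridit idea concrete, here is a minimal sketch. It is not the code used in the study. It assumes the standard ridit definition: a person's score is the proportion of their reference group (here, people of the same birth cohort and sex) with a lower level of education, plus half the proportion with the same level. The function names, the grouping by birth decade, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def ridit_scores(levels):
    """Map each observed education level to its ridit score.

    Ridit = proportion of the reference group in lower categories
    plus half the proportion in the category itself.
    """
    counts = Counter(levels)
    total = len(levels)
    scores = {}
    below = 0  # number of people in strictly lower categories
    for level in sorted(counts):
        scores[level] = (below + counts[level] / 2) / total
        below += counts[level]
    return scores


def ridit_by_group(records):
    """records: list of (group_key, level) pairs.

    Returns one ridit score per record, computed within each group,
    in the same order as the input.
    """
    by_group = defaultdict(list)
    for key, level in records:
        by_group[key].append(level)
    tables = {key: ridit_scores(levels) for key, levels in by_group.items()}
    return [tables[key][level] for key, level in records]


# Hypothetical toy data: (birth decade, sex) and education on a 1-7 scale.
records = [
    (("1940s", "F"), 2), (("1940s", "F"), 2),
    (("1940s", "F"), 4), (("1940s", "F"), 6),
    (("1980s", "F"), 4), (("1980s", "F"), 6),
    (("1980s", "F"), 6), (("1980s", "F"), 7),
]
scores = ridit_by_group(records)
for (group, level), score in zip(records, scores):
    print(group, level, score)
```

In this toy example the same education level receives a higher score in the older group than in the younger one, because fewer people in the older group reached it. Within each group the scores average 0.5, which is what puts birth cohorts and sexes on a common scale.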
In each cohort, we performed an EWAS on educational attainment ridit scores, adjusting for typical EWAS covariates such as white blood cell counts, technical covariates, age and sex, and combined the results in a meta-analysis (total sample size = 4,179). The analysis identified 58 genome-wide significant CpGs.
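For readers unfamiliar with how per-cohort results are combined, the sketch below shows one common approach for a single CpG. This post does not say which meta-analysis method was used, so the fixed-effect inverse-variance weighting here is an assumption, and the per-cohort betas and standard errors are made-up numbers for illustration only.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis for one CpG.

    Each cohort's estimate is weighted by 1 / SE^2, so more precise
    cohorts contribute more to the pooled estimate.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p


# Hypothetical regression results for one CpG in four cohorts.
cohort_betas = [0.021, 0.015, 0.030, 0.018]
cohort_ses = [0.006, 0.008, 0.010, 0.005]
pooled_beta, pooled_se, z, p = inverse_variance_meta(cohort_betas, cohort_ses)
print(pooled_beta, pooled_se, z, p)
```

In a real EWAS this calculation is repeated for every CpG on the array, which is why the significance threshold has to be corrected for the number of tests.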
We were quite surprised to find that a majority of the CpG sites that were significantly associated with educational attainment had also been reported by previous EWASs of smoking behavior. In fact, when the large EWAS meta-analyses of maternal smoking (sample size = 6,685; April 2016) and individual smoking (sample size = 15,907; October 2016) came out, we found that all CpG sites that were significantly associated with educational attainment were also reported in the smoking EWAS, and exactly 50% were found in the EWAS of maternal smoking.
Overlap of CpGs reported in previous EWASs of smoking and maternal smoking, and CpGs associated with educational attainment.
After adjusting for smoking status, 11 CpG sites, including CpG sites in the AHRR gene, were still significantly associated with educational attainment under stringent Bonferroni correction for genome-wide tests.
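As a reminder of what Bonferroni correction does: the significance level is divided by the number of tests, so with hundreds of thousands of CpGs the per-site threshold becomes very small. The sketch below is illustrative only. The number of tests, the site labels, and the p-values are hypothetical, not values from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_sites(p_values, alpha=0.05):
    """Return the sites whose p-value passes the Bonferroni threshold.

    p_values: dict mapping a site label to its p-value; the number of
    tests is taken to be the number of entries in the dict.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [site for site, p in p_values.items() if p < threshold]


# Hypothetical: 400,000 tested CpGs at an overall alpha of 0.05.
print(bonferroni_threshold(0.05, 400_000))

# Hypothetical p-values for three sites (so the threshold is 0.05 / 3).
toy_p_values = {"site_a": 0.001, "site_b": 0.5, "site_c": 0.04}
print(significant_sites(toy_p_values))
```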
Although it is known that smoking has relatively large effects on DNA methylation in blood (larger than thus far described effects of other ‘exposures’ such as alcohol use and BMI), it seems reasonable to speculate that other exposures with effects of similar magnitude as smoking on DNA methylation do exist, and that such exposures may correlate with educational attainment to a similar extent as smoking.
We had not at all expected a priori that all top hits for educational attainment would overlap with places in the genome that had previously been identified as differently methylated due to smoking.
Obviously, since smoking has large effects on DNA methylation, we were concerned that adjusting for smoking status might not completely remove the effects of smoking on methylation, a problem referred to as residual confounding.
I then made use of the wealth of information that had been collected by the Netherlands Twin Register on individuals’ smoking history and quantity, maternal smoking during pregnancy, and serum cotinine levels, a biomarker of (second-hand) smoking. We examined the effects of adjusting for smoking pack-years, a methylation-based smoking predictor, and serum cotinine level, and tested the association in never-smokers and within (smoking concordant) twin pairs.
These analyses indicated that our concern was justified to a certain extent: when we took other indicators of smoking into account, the association between DNA methylation and educational attainment was reduced.
Methylation is associated with educational attainment beyond effects of own smoking
The analyses also indicated that our top sites were associated with educational attainment beyond the effects of one’s own smoking (some CpGs stronger than others), because education showed a similar trend of association with methylation in never smokers. We also found that several CpGs were associated with maternal smoking, on top of the effect of own smoking, in this adult population.
In never smokers, the association between DNA methylation and educational attainment was on average 60% smaller. None of the CpGs was genome-wide significant in never smokers, but the pattern of effect sizes in never smokers was similar to the pattern in the total sample (the correlation between results was 0.83, and 98% of CpGs showed the same direction of effect). Of course, one problem when splitting up datasets is that group sizes become smaller, so even effects of similar size may no longer be statistically significant.
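The two summary numbers above (a correlation between sets of effect sizes, and the share of CpGs with the same direction of effect) are simple to compute. The sketch below assumes a Pearson correlation, which is not specified in this post, and uses made-up effect sizes rather than the study's results.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sign_concordance(x, y):
    """Fraction of pairs where both effects point in the same direction.

    An effect of exactly zero is treated as non-positive.
    """
    same = sum(1 for a, b in zip(x, y) if (a > 0) == (b > 0))
    return same / len(x)


# Hypothetical effect sizes for the same five CpGs in two analyses.
beta_total_sample = [-0.040, -0.025, 0.010, 0.030, -0.015]
beta_never_smokers = [-0.014, -0.012, 0.005, 0.009, 0.002]

print(pearson(beta_total_sample, beta_never_smokers))
print(sign_concordance(beta_total_sample, beta_never_smokers))
```

A high correlation combined with smaller effects, as in this toy example, is the pattern one would expect if the never-smoker analysis picks up the same signal at reduced strength.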
What could explain the strong association between education and DNA methylation at smoking-associated loci, and the observation that this association remains after accounting for own smoking? We recognize that there are multiple possibilities, including:
1) DNA methylation is a biomarker of lifelong differential (second-hand) exposure to cigarette smoke (or other chemicals) between higher and lower-educated people, but does not play a role in the causal chain of biological mechanisms that influence educational attainment or smoking.
2) Genetic variants have independent effects on educational attainment, smoking and DNA methylation (genetic pleiotropy), which may induce a correlation among these outcomes even if none of the outcomes has a causal effect on the others.
3) DNA methylation in blood marks epigenetic mechanisms in the brain that influence both educational attainment and (becoming exposed to) smoking.
If educational attainment and smoking correlate, correction for smoking will also take out relevant variation related to educational attainment. If we restrict the analysis to never smokers, the sample size is also reduced, which leads to a loss in statistical power, as explained above. We performed a power calculation to estimate the sample size required to detect the effects in never smokers at genome-wide significance, which indicated that larger studies are required to detect these effects at genome-wide significance.
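To give a feel for why a stricter significance threshold and smaller effects demand larger samples, here is a rough sketch of a sample size calculation. It is not the power calculation from the paper. It assumes the effect can be expressed as a correlation between methylation and education, and it uses the textbook Fisher z approximation. The effect sizes and thresholds in the example are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(r, alpha, power=0.8):
    """Approximate n needed to detect a correlation r (two-sided test).

    Uses the Fisher z approximation:
        n = ((z_(1 - alpha/2) + z_power) / atanh(r))^2 + 3
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


# Hypothetical small effect, tested at a conventional threshold and at a
# much stricter, genome-wide style threshold.
for alpha in (0.05, 1e-7):
    print(alpha, required_sample_size(r=0.05, alpha=alpha))

# Halving the hypothetical effect size at the strict threshold.
print(required_sample_size(r=0.025, alpha=1e-7))
```

Because the required sample size scales roughly with one over the squared effect, an effect that is about half as large needs about four times as many participants.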
We started a quest to find additional cohorts in the Netherlands with similar data on educational attainment and DNA methylation. Unfortunately, this quest was unsuccessful, but we hope this may change in the near future, especially if we can follow some cohorts of children longitudinally who had their epigenomes measured at a young age.
What about other exposures?
One can imagine that at some point we started to wonder: are we only picking up a smoking signal with a blood-based EWAS of educational attainment?
Fortunately, several EWASs came online in 2016 that allowed us to explore this question. It turned out that the answer to this question was no.
We took the significant CpGs from previously published EWASs of smoking, maternal smoking, maternal plasma folate, airborne fine particulate matter and alcohol consumption, together with the effect sizes reported in those papers. This allowed us to assess the correlation between the effect sizes of these CpGs for educational attainment and their effect sizes for each of the respective exposures. It turned out that there was a significant correlation for all exposures except alcohol intake (although we did find that 5 education-associated CpGs were also significant in the EWAS of alcohol intake). For fine particulate matter (a measure of air pollution), epigenetic signatures correlated positively with epigenetic signatures of educational attainment, which suggests that higher educated people in our study population show greater exposure to ambient fine particulate matter. This effect is opposite to what we see for smoking, for which the correlations suggest higher exposure in lower educated people.
In this figure, there are 6 panels that show the effect size from our EWAS of educational attainment (beta from the regression of methylation level on educational attainment, adjusted for smoking) on the x-axis, and the effect size from previously published EWASs on the y-axis: a) CpGs associated with smoking, b) CpGs associated with maternal smoking in cord blood at birth, c) CpGs associated with maternal smoking in older children, d) CpGs associated with maternal plasma folate level in cord blood at birth, e) CpGs associated with airborne fine matter, f) CpGs associated with alcohol consumption.
Is cigarette smoke the only exposure associated with AHRR methylation?
CpGs annotated to the AHRR gene are amongst the most strongly and consistently reported sites associated with smoking exposures. Although a great body of literature exists on the AHR (aryl hydrocarbon receptor) pathway and the diversity of chemical compounds that can act on this pathway by binding to the AHR (of which dioxin is the best-characterized), cigarette smoke currently seems to be the only chemical exposure that has been linked to this pathway by EWA studies. To facilitate the interpretation of EWA studies of human traits and diseases in the future, there clearly is a great need for EWA studies on other chemical exposures!
Are any of these findings relevant to cognition?
We argue that CpGs connected to genes that have been previously linked to cognition-related phenotypes (several of our top 58) are potential candidates for being involved in causal epigenetic mechanisms that contribute to individual differences in cognition, provided that these loci also show education-associated differences in epigenetic regulation in the relevant tissue (most likely, the brain). We could not measure DNA methylation in the brain of our participants and therefore turned to a publicly available dataset from a previous study that assessed DNA methylation in matched samples from blood and brain. At 17% of the 58 sites associated with educational attainment, DNA methylation level in blood samples correlated significantly with methylation level in one or more brain regions. We don’t know yet why these CpGs correlate between blood and brain. Two possibilities are:
1) Genetic effects on DNA methylation correlate across tissues
2) DNA methylation in these tissues responds similarly to environmental exposures
Using data from another publicly available dataset, we found that 31% of CpG sites associated with educational attainment in blood show dynamic methylation in fetal brains. This may indicate that these loci have a role in brain development.
Venn diagram illustrating how many of the 58 top-CpGs for educational attainment show correlated methylation levels between blood and brain, or show dynamic methylation in fetal brains, or are associated with maternal smoking (in cord blood).
How interesting would it be to perform an EWAS of educational attainment and exposures such as (maternal) smoking across multiple tissues including the (fetal) brain? This might be possible with existing postmortem brain datasets on DNA methylation.
Lessons learned and future directions
On a critical note, our study illustrates how difficult the interpretation of EWA studies can be.
On the positive side, our findings point at many exciting questions that ask (no, scream!) for further research.
Which CpGs are related to exposure to other pollutants? Which CpGs might have a causal role in smoking behavior and educational attainment? Which have causal effects on education-related health differences? There are multiple opportunities to obtain answers. We can, for instance, look for families where the mother smoked during one pregnancy but not during the next; we can study, ideally longitudinally, monozygotic twin pairs where one twin starts smoking and the other does not; and we can follow children whose epigenomes were measured before they had to make educational choices. We provide more suggestions to study these questions in the discussion of the paper!
The results from this EWAS are available here: BBMRI-NL omics atlas.