“Are these predictions from DNA of how far a person gets in school strong or weak?” This question from Antonio Regalado, a science reporter from MIT Technology Review, captures how confusing new discoveries in social science genetics can be, even to experts.

In this blog post, we want to help clarify this question: What do social science geneticists mean when we say that DNA can “predict” educational attainment, and that those predictions are “strong” or “weak”? Given the history of atrocities perpetrated under the banner of eugenic ideologies, any scientific effort to connect DNA differences to social inequalities between people is bound to be controversial, to say the least. A clear understanding of what DNA measures can (and cannot) statistically predict is essential for grounding debates about how DNA measures should be used.

Two new papers have re-animated the public conversation about prediction in social science genetics. This week, Daniel Benjamin and his colleagues from the Social Science Genetic Association Consortium (SSGAC) reported an analysis of the genomes of over 1 million people that uncovered more than a thousand genetic variants associated with educational attainment.

One of the products of this giant study is an algorithm called a polygenic score. This algorithm can be applied to the genomes of people not included in the original study to predict their educational attainment and, as one of us showed in a paper published earlier this month, their career success and wealth accumulation.

The three figures below all illustrate DNA predictions of life course outcomes. Figures A and C, which are from the SSGAC study, show educational attainment in a sample called the Health and Retirement Study. Figure A is a scatterplot of each person’s educational attainment by their polygenic score, whereas Figure C is the percentage of people attaining a college degree by quintile of polygenic score. Figure B gives yet another way of visualizing polygenic associations with life course outcomes – a binned scatterplot. The outcome here is wealth, rather than educational attainment, plotted separately for people from low, medium, and high childhood socioeconomic status.

 

Figure A. Figure provided by the Social Science Genetic Association Consortium to The Atlantic. Data and analyses reported in Lee et al., 2018, Nature Genetics. Data are from the Health and Retirement Study.

 

Figure B. From Belsky et al., 2018, PNAS. Each plotted point reflects average x and y coordinates for a bin of 50 participants. The red regression lines are plotted from the raw data. The box-and-whisker plots at the bottom of the graphs show the distribution of the education polygenic score for each childhood SES category. Data are from the Health and Retirement Study.

 

Figure C. From Lee et al., 2018, Nature Genetics. Mean prevalence of college completion by polygenic score quintile. Data are mean ± 95% confidence interval.

 

Relationships with DNA seem to be stronger as you move from Figure A to Figure C, but the major difference between these figures is not the size of the effect. It’s how many people are represented by each data point in the graphs. The data points in Figure A (the small grey dots) represent single individuals. The data points in Figure B (the big blue dots) represent averages from groups of 50 people. And the data points in Figure C represent averages from groups of about 1000 people (the blue bars) or about 1700 people (the yellow bars).

The genetic variants discovered in the new study of educational attainment are highly predictive of average outcomes in large groups of people but not very predictive of outcomes for any one individual.

This is a basic point about statistics that often gets lost: For an effect of any size, statistical models predict the average value for a group of people with much more certainty than they predict the individual value for any one person. It comes down to signal-to-noise ratio.
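
To make the signal-to-noise point concrete, here is a short worked equation under a simplifying assumption that is ours, not the studies': each person's outcome $y_i$ is a shared signal $\mu$ plus personal noise $\varepsilon_i$ that is independent across people and has standard deviation $\sigma$. Averaging over a group of $n$ people gives

$$
\bar{y} = \mu + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i,
\qquad
\operatorname{SD}(\bar{y}) = \frac{\sigma}{\sqrt{n}}.
$$

Under that assumption, the noise in an average of 50 people shrinks by a factor of about 7 ($\sqrt{50} \approx 7.1$), and in an average of 1,000 people by a factor of about 32 ($\sqrt{1000} \approx 31.6$), while the signal stays the same size.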

When you reflect on the course your life has taken, you can often identify some unique and serendipitous events and circumstances that were influential in bringing you to where you are. But serendipity is just that: Those unique and serendipitous events that might have steered your life didn’t matter in someone else’s. That’s what we mean by statistical noise. In group averages, that noise gets cancelled out, because it’s different for each person. What’s left is the signal – what’s common between us. All three plots show us the DNA signal predicting educational attainment; the difference between the figures is in the amount of noise. Figure A shows more noise than Figure B, and Figure B shows more than Figure C.
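
The same idea can be sketched as a small simulation. This is an illustration with made-up data, not the analysis from either paper: the correlation of 0.3 between score and outcome, the sample of 10,000 simulated people, and all function names are assumptions chosen for the example. It sorts people by their simulated score, averages them in bins of 1, 50, and 1,000, and reports how tightly the binned points track a straight line.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_people=10_000, r=0.3, seed=1):
    """Simulate standardized scores and outcomes with correlation about r.

    r = 0.3 is an illustrative assumption, not an estimate from the studies.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(n_people)]
    noise_sd = (1 - r ** 2) ** 0.5  # keeps the outcome variance near 1
    outcomes = [r * s + rng.gauss(0, noise_sd) for s in scores]
    return scores, outcomes


def binned_means(scores, outcomes, bin_size):
    """Sort people by score, then average score and outcome within each bin."""
    pairs = sorted(zip(scores, outcomes))
    xs, ys = [], []
    for start in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[start:start + bin_size]
        xs.append(fmean(s for s, _ in chunk))
        ys.append(fmean(o for _, o in chunk))
    return xs, ys


scores, outcomes = simulate()
results = {}
for bin_size in (1, 50, 1000):
    xs, ys = binned_means(scores, outcomes, bin_size)
    results[bin_size] = pearson(xs, ys)
    print(f"bin size {bin_size:>4}: {len(xs):>5} points, "
          f"correlation {results[bin_size]:.2f}")
```

The underlying relationship is identical in all three runs; only the number of people averaged into each plotted point changes. The bins of 1,000 only loosely mimic the quintile groups in Figure C, but the pattern is the same: the larger the group behind each point, the more the individual noise cancels and the cleaner the trend looks.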

So, how predictive are these DNA differences for life outcomes? It depends on the question.

Researchers are interested in averages. We want to know how patterns of educational differences in the population come about. For that question, these DNA differences are predictive enough to be useful. (Think about Figure C.) However, parents and educators might want to make predictions about an individual child, for example, in order to tailor a curriculum in a precision education intervention. For that question, these DNA differences are likely not predictive enough. The DNA will guess wrong more often than we will feel comfortable with. Think about Figure A: pick any given value of the polygenic score, and the dots – individual human lives – are scattered up and down the full range of educational attainment.

Something else we should consider when understanding DNA predictions of educational attainment is the extent to which DNA is capturing information about people’s social environments. A paper published in Science earlier this year found that a polygenic score calculated for parental DNA that their children did not inherit still predicted their children’s educational attainment. As one of us wrote in a commentary on that study, DNA associations with life course outcomes “could operate through any physical or social environment woven by genetic kin – a tangled web indeed.”

Public debates regarding the value of polygenic scores vacillate between a ready embrace of their possibilities for individual prediction, such as personalized education, and an overly pessimistic dismissal of them as “next to worthless.” The reality is much more nuanced. They are useful tools for social science researchers who are interested in average trends, but specific predictions about an individual human life will be wildly uncertain. For polygenic prediction, there is safety only in (large) numbers.

4 comments

  1. If they are predictive for large groups, but not individuals, that is suggestive of population stratification. It might be that they are picking up common genetic variants in an insular group (white people who only marry others of a certain status, for example). All they are predicting is whether you are genetically similar to that group in the way that you might be flagged as Italian at 23andMe from your genetic variations. The genes probably have no causal relationship to “educational attainment”, which is kind of an absurd notion on its face.

    1. The polygenic score for education predicts differences in the educational outcomes of siblings (and also differences in career success and wealth). So it’s clearly capturing something beyond what geneticists call population stratification. It is a good question how DNA differences come to predict children’s educational outcomes. Obviously there is no direct connection. But there are many pathways through which genetic differences between people could lead to differences in brain structure and function, and in behavior that ultimately contribute to success in school. You can check out my other BOLD blog post for some discussion of this issue. And stay tuned for more from us on this topic.

  2. Saying that you observe a very attenuated effect among siblings does not negate population stratification. This is clear if, let’s say, you devised a score with an ancestry component, which is the case for colorism. You can mistakenly validate your score by looking at siblings and take that as a causal claim about some SNPs. The same goes for tallness and other things like that. Within-family variation can hence be affected by social stratification. I should add that differential treatment by, say, other family members like grandparents, and subjects’ own self-identification, can affect behaviors. This is the same criticism that applies to twin studies: the pairs don’t differ only in half of their genetic material (here, the polygenic score related to the targeted trait), but also in a whole set of environmental treatments.

  3. What if your score has ancestry-related residual anthropometric components which elicit differential treatment, both within the family (by parents or grandparents, via homophily and other dynamics) and outside it, in cognitive-ability-enhancing settings like schools? This has been documented for tallness and colorism. Family designs do not insulate from social stratification, then.

    Another confounding factor for the family design.
