Razib Khan One-stop-shopping for all of my content

April 10, 2019

The Insight Show Notes — Season 2, Episode 22: Solving the Missing Heritability

Filed under: Genetics,Heritability — Razib Khan @ 2:50 pm


“Narrow-sense” heritability

This week on The Insight (Apple Podcasts, Spotify, Stitcher and Google Podcasts) we discuss the “missing heritability,” and whether it has been “solved,” with Alexander Young.

Heritability is a subtle statistic

First, we described what heritability is, and the difference between “narrow-sense” heritability, which is mostly what we discussed, and “broad-sense” heritability. In short, narrow-sense heritability captures the linear, additive effects that are useful for selection in large populations. Broad-sense heritability also includes dominance effects, which are more important in comparisons between siblings.

We also discussed the field’s origins in biostatistics, how it differed from Mendelian genetics, and how R. A. Fisher theoretically fused the two fields.

Genomics, its history, and how it changed quantitative genetics and heritability were extensively discussed. This is the point historically when the “missing heritability” became a thing: the gap between genomic estimates and classical estimates of heritability.

Much of the second half of the podcast involved reviewing a preprint that just came out, Recovery of trait heritability from whole genome sequence data. We got into the details of rare variants, population structure, and other considerations. In short, Alex is moderately skeptical of the result of the paper.

Finally, if you want to dig deeper, I highly recommend Alex’s blog post: Missing Heritability Revisited.

Perhaps the missing heritability has not been solved….

The Insight Show Notes — Season 2, Episode 22: Solving the Missing Heritability was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 8, 2019

In search of the missing heritability

Filed under: Genetics,Genomics,heredity,Heritability — Razib Khan @ 9:45 pm

We’ve always known that parents resemble their offspring. An intuitive understanding of how traits are passed down in families is probably as old as our species and its ability to reflect on the world around us. The ancient Romans would often observe an association between a characteristic, for example, red hair, and a particular aristocratic family. And today, it is common to notice how a particular child resembles a particular grandparent. An interest in heredity is part of human nature.

But it has only been within the last 100 years that this intuition was transformed into a quantitative and rigorous science.
Resemblance within a dynasty

And believe it or not, this began before we understood the Mendelian genetic basis of inheritance. In the 19th century, Charles Darwin’s cousin Francis Galton developed the concept of correlation to explore the relationship of characteristics between parents and offspring. Some traits, such as the height of fathers and sons, turn out to exhibit a very strong correlation between the generations. Other traits, such as hairstyle, don’t (this is probably a good thing).

Heritable traits are those variable characteristics where parents and offspring resemble each other due to heredity. Those traits where parents and offspring show no correlation due to relatedness across the population are not heritable. Or, more precisely in the language of statistical genetics, their heritability is very low. In cases where there is a strong correlation between parents and offspring, the heritability is very high. Heritability is evaluated in the range of 0 to 1, with moderate heritability being ~0.5.

A heritability of 0.5 means that 50% of the variation in the trait in the population is due to variation in the genes.
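
To make that variance ratio concrete, here is a minimal Python sketch. It assumes the simplest possible model, where a phenotype is a genetic value plus an independent environmental deviation, both normally distributed. The function name and the variance values are illustrative choices of mine, not numbers from any study.

```python
import random
import statistics


def simulate_heritability(var_genetic, var_environment, n=50_000, seed=0):
    """Estimate heritability as Var(G) / Var(P) in a simulated population.

    Assumes phenotype = genetic value + environmental deviation, with the
    two components independent (an illustrative simplification).
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, var_genetic ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, var_environment ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


if __name__ == "__main__":
    # Equal genetic and environmental variance: heritability close to 0.5.
    print(round(simulate_heritability(1.0, 1.0), 2))
```

With equal genetic and environmental variance the estimate lands near 0.5; with no environmental variance at all it goes to 1.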

This understanding of heritability was decoupled from an analysis of Mendelian inheritance. Though theoretically fused in the early 20th century thanks to the work of R. A. Fisher, the manner in which heritable polygenic characteristics expressed themselves genetically meant that they were beyond the power of Mendelianism to examine. The genetic effects of any particular gene were very small.

A Mendelian trait, such as cystic fibrosis, is passed through a pedigree and expresses a particular genotype. That is, most of the variation in the expression of cystic fibrosis is due to mutations at a particular gene. Mendelian analysis develops around the insight of genes encoding characteristics, but in the early 20th century the methods had the power to only detect associations at traits where a single gene, or perhaps a few, influenced the variation. In contrast, very polygenic characteristics were only understood through statistical analysis.

One of the distinctions of most heritable traits is that their causes are complex, and they are often quantitative or continuous. Though some people are ‘tall’ or ‘short’, the reality is that we measure people to obtain a single number on a numerical range. Similarly, though we can divide people into ‘extroverts’ and ‘introverts’, we understand that this disposition is really on a spectrum.

This is in contrast to Mendelian traits, which are of the form where there are clear discrete differences between people with different genotypes. You have cystic fibrosis. Or you don’t.


Historically, for quantitative traits, the goal was to assess the total genome-wide heritability. In domains such as agricultural genetics, this was very important. The more heritable the trait, the more artificial selection could change a characteristic of a line of plants or stock of animals.

In humans, the understanding of heritability had different implications. In the middle decades of the 20th century, there were many theories of environmental triggers for illnesses such as schizophrenia, often focusing upon a child’s mother and her treatment of her offspring. The reality is schizophrenia is a highly heritable trait, so the likelihood of manifesting the illness is strongly dependent on one’s genes, rather than details of upbringing. For decades clinicians were looking at the wrong primary causes.

We may not have known the genes responsible, but we knew it was genes.
The pacifiers are not heritable

One of the most common methods used to understand heritability in humans utilized patterns in twins and their siblings. Scientists realized that identical twins share 100% of their genes, while siblings share about 50% of their genes. The heritability of a trait can then be expressed in terms of the difference between the correlation on the trait between identical twins and full siblings.

Identical twins tend to be rather close in height. Siblings are closer in height than you would expect from two random individuals selected from a population, with a correlation of ~0.50. But that is far less than the correlation between identical twins. That’s because siblings share only 50% of their genes, and it turns out that height is a very heritable trait (estimates are 0.8 to 0.9). If the correlation between full siblings and identical twins on a trait is the same, it is quite likely genes have little to do with variation of the trait in the population (as opposed to the environment).
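
This twin comparison is commonly summarized by what is usually called Falconer's formula, which this post does not name: heritability is roughly twice the difference between the identical-twin correlation and the correlation for pairs who share half their genes. The sketch below assumes purely additive genetic effects and an environment shared equally by both kinds of pairs. The classic version uses fraternal twins rather than ordinary siblings, and the correlations passed in are hypothetical inputs, not measured values.

```python
def falconer_heritability(r_identical, r_half_sharing):
    """Falconer's approximation: h2 = 2 * (r_MZ - r_DZ).

    r_identical: trait correlation between identical (MZ) twins.
    r_half_sharing: trait correlation between pairs expected to share 50%
        of their genes (classically fraternal twins; full siblings here).
    """
    return 2.0 * (r_identical - r_half_sharing)


if __name__ == "__main__":
    # Hypothetical inputs: siblings at 0.50 and identical twins at 0.90.
    print(round(falconer_heritability(0.90, 0.50), 2))
    # Equal correlations imply no detectable genetic contribution.
    print(falconer_heritability(0.60, 0.60))
```

The second call mirrors the last point in the paragraph: when the two correlations match, the estimated heritability is zero.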

And yet these analyses are very sensitive to broader environmental conditions. Within a family in a developed society, it is unlikely one sibling would get more nutritional resources than another. But this is not true in a pre-modern society, or in the developing world. If an older sibling is born in the midst of a famine, it would not be surprising if there was some permanent stunting later in life. The shorter adult height of this sibling in comparison to their younger brothers and sisters would then be due to the environment.

So another feature of complex heritable traits is that there is an environmental component to the variation of outcome across the population, at least for any trait where the heritability is less than 1. And, that environmental component is going to vary from society to society. Heritability is not a fixed statistic, but a dynamic one, dependent on conditions.

All of these complexities for heritable traits make it very clear why conventional Mendelian genetics did not attempt to tackle them. Pedigrees and experiments with linked physical characteristics were never going to get very far.

This landscape of indirect inference only began to change at the end of the 20th century, with the revolution in molecular biology which transformed genetics from an abstract field of pattern recognition to a concrete one where scientists began to hunt for specific genes in the physical genome.

A good understanding of genome-wide variation in human populations has only been available for the last ten years.

Until modern sequencing and genotyping technology emerged, we did not even know the number of genes in the human genome! Twenty years ago the number was estimated to be ~100,000. A ballpark guess based on intuition and hunches more than anything else. Fifteen years ago, after the first human genome had been published, the number was reduced to 40,000 genes. Today, the best estimate is 19,000 genes.

Within any given human genome there are about 3 billion DNA base pairs. Of these base pairs, only about 1% are functional in terms of coding proteins. In the vast majority of cases, differences between individuals in traits are due to differences in these regions of the functional genome. That’s about 30 million base pairs. So it is within these 30 million base pairs that the search for the biophysical basis of heritability will occur.

Genetic relatedness of full siblings

In the past, estimates of heritability rested upon “good enough” assumptions, such as relatedness between individuals. Today, genomic methods allow researchers to look at the truth beneath assumptions. Statistical methods assumed that the relatedness between full siblings is 50%. But this is the expected relatedness. Geneticists have always known that due to Mendelian segregation the real value is often different between any two siblings. But they had no direct way of assessing this.

That is, until genomic techniques became more advanced and cheaper. In 2006 researchers confirmed that twin studies were correct in their estimate of the heritability of height by looking at how differences between full siblings varied in relation to how truly genetically similar they were. Simply due to the rules of chance, some full siblings shared more than 60% of their genome, while others shared less than 40%.

Full siblings who were genetically more similar turn out to be more similar in height, to the extent that the inferred heritability to explain the pattern was exactly the same as twin studies.
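
The logic of that sibling comparison can be sketched with a small simulation. This is not the 2006 study's actual method or data, only a toy version of the idea. It assumes realized sharing varies around 0.5 with a standard deviation of 0.04 (a value I picked for illustration), total trait variance of 1, no shared family environment, and purely additive effects. Under those assumptions the expected squared difference between siblings is 2 - 2 × sharing × h2, so the slope of squared difference on sharing is -2 × h2.

```python
import random


def estimate_h2_from_sibling_sharing(true_h2, n_pairs=200_000, seed=1):
    """Recover heritability from how sibling similarity tracks realized sharing.

    Simulates sibling pairs whose realized genome sharing varies around 0.5,
    then regresses the squared trait difference on sharing. With total trait
    variance fixed at 1, the regression slope is -2 * h2.
    """
    rng = random.Random(seed)
    sharing, squared_diff = [], []
    for _ in range(n_pairs):
        pi = rng.gauss(0.5, 0.04)  # assumed spread of realized sharing
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Additive genetic values with variance h2 and correlation pi.
        a1 = true_h2 ** 0.5 * z1
        a2 = true_h2 ** 0.5 * (pi * z1 + (1 - pi * pi) ** 0.5 * z2)
        # Independent environmental deviations with variance 1 - h2.
        e_sd = (1 - true_h2) ** 0.5
        y1 = a1 + rng.gauss(0, e_sd)
        y2 = a2 + rng.gauss(0, e_sd)
        sharing.append(pi)
        squared_diff.append((y1 - y2) ** 2)

    # Ordinary least-squares slope of squared difference on sharing.
    mean_x = sum(sharing) / n_pairs
    mean_y = sum(squared_diff) / n_pairs
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sharing, squared_diff))
    var = sum((x - mean_x) ** 2 for x in sharing)
    return -(cov / var) / 2.0


if __name__ == "__main__":
    print(round(estimate_h2_from_sibling_sharing(0.8), 2))
```

Because sharing varies so little around 0.5, the estimate is noisy even with a very large number of simulated pairs, which is one reason this design needs big samples.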

Researchers were also able to do more than just refine their older methods of assessing heritability: they finally had the tools to begin discovering the specific genes which underlie the heritability of complex traits. For a century scientists understood and assumed that there were genes, physical entities responsible for the biology underlying their statistical results. But finally, they would be able to zero in on candidates!

The earliest attempts to understand heritability with genomic technologies and modern computational methods were somewhat disappointing.

For example, for height, work in the late 2000s discovered 40 genetic positions that correlated with height in humans, but these positions explained only 5% of the total heritability estimated from earlier studies using classical methods.

Why were these state-of-the-art methods only detecting a small proportion of the statistically inferred heritability? Some possibilities presented themselves:

  • Perhaps statistical geneticists used flawed methods, and the environmental component was greater than they understood
  • Perhaps single base changes, SNPs, were not responsible for much of the variation. Perhaps it was copy number variation, for example.
  • Perhaps the sample sizes were too small to detect the effects of single genes because most of the effects were too small
  • Perhaps the “SNP chips” did not have enough markers to detect the effects
  • Perhaps many of the variants were at very low frequency and were not typed on the SNP chips

For about a decade this issue of the “missing heritability” hung over the new synthesis between genomics and quantitative genetics. But recently a research group has presented results which suggest that they have solved the missing heritability problem for height. Using whole-genome analysis, which was prohibitively expensive ten years ago, but on the margin of the feasible today, the researchers captured almost all of the missing heritability genomically.

With 20,000 individuals and 50 million markers, the authors argue that rare variation accounts for most of what was missed in earlier studies.

Will these results hold up? Possibly. But the bigger take-home message is to reflect on how far we’ve come in our understanding of heritability in the past century. In the beginning, heritability was understood by looking at similarities across families. This sounds simple, but this straightforward design required a great deal of statistical ingenuity. And the reality is that, even after the discovery of DNA and the molecular understanding of the gene, researchers could not satisfactorily answer the question until they connected the statistical causes to molecular processes.

Illumina Sequencing Machine

Today whole-genome sequences, which may have cost $100,000 ten years ago, can be had for less than $1,000. This is the total sequence information of human genetic variation. Whereas decades ago researchers didn’t even know the total number of human genes or the full genetic map of our species, today we can count and locate 19,000 genes.

It is no surprise that many of the genes associated with height in humans turn out to be related to bone development. In this way, the statistical wizardry has produced results in keeping with our expectations of standard biology.

In another ten years, it seems likely that the search for the missing heritability will be a footnote in the history of genetics. But, that footnote will have been fruitful in generating a great deal of science, and helping us solve the mysteries of complex traits.

In search of the missing heritability was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

September 29, 2012

The moral measure of bad teeth

Filed under: Health,Heritability — Razib Khan @ 8:57 pm

Recently I was at the dentist and I was told that because I did not have any caries at this age, I would probably not have to worry about that in the future (in contrast, I do have some issues with gingivitis). I wasn’t surprised that I didn’t have caries, since I have no great love of sweet confections. I had chalked up my evasion of this dental ailment to my behavior. To make a long story short, my dentist disabused me of the notion that dental pathologies are purely a function of dental hygiene and diet. Rather, he explained that many of these ailments exhibit strong family and ethnic patterns, and are substantially heritable. My mother did suffer from periodontal disease a few years back, and that has made me much more proactive about my own dental health.

As someone who is quite conscious of the power of genetics, I was quite taken aback by this blind spot. I realized that not only did I attribute my own rather fortunate dental health (so far) to my personal behaviors, but, I had long suspected those with dental issues of less than optimal habits. Obviously environment (e.g., high sugar diet) does matter. ...

September 4, 2012

Me & my 0.55 brother against my 0.45 brother

Filed under: Genetics,Heritability — Razib Khan @ 10:41 pm

One of the more fascinating things about getting much of your child’s pedigree genotyped is that one can ascertain true relatedness to various relatives, rather than just expected relatedness. For example, 28% of her genome is identical by descent from my father, while 22% is from my mother. She is 26% identical by descent with one uncle, and 24% with another. More practically, the understanding of patterns of realized and concrete genetic relatedness within families allows us another avenue into teasing apart heritability. Though this method has been around for more than half-a-decade, I find it curious that when I post on it some commenters immediately make objections to twin studies. Why? Because they assume that the analysis had to be a twin study because they don’t know of the genomic methodology!

But on a broader evolutionary scale, does this matter? Two of my siblings have a relatedness of 41%. In other words, as you can see in the histogram there is a wide variation in relatedness. Might this perhaps impact social relations? One can imagine genetically more similar siblings aligning against those who are dissimilar. Or not. I am skeptical ...

January 29, 2012

Most people don’t understand “heritability”

Filed under: Heritability,Quantitative Genetics — Razib Khan @ 1:08 pm

According to the reader survey 88 percent said they understood what heritability was. But only 34 percent understood the concept of additive genetic variance. For the purposes of this weblog it highlights that most people don’t understand heritability in the technical sense, but rather “heritability” in the colloquial sense. The former is the technical definition of heritability which I use on this weblog; the latter is heritability as a synonym for inheritance, biological and cultural. Almost everyone who understands the technical definition of heritability will know what heritability in the ‘narrow sense’ is, often just informally termed heritability itself. It is the proportion of phenotype variability that can be attributed to additive genetic variation. Those who understand additive genetic variance and heritability in the survey were 32 percent of readers. If you understand heritability in the technical manner you have to understand additive genetic variance. This sets the floor for the number who truly understand the concept in the way I use on this weblog (I suspect some exceedingly modest people who basically understand the concept for ‘government purposes’ put themselves in the ‘maybe’ category). After nearly 10 years of blogging (the first year or so of which I myself wasn’t totally clear on the issue!) that’s actually a pretty impressive proportion. You take what you can get.

July 11, 2011

Blank slate when you want it that way

Filed under: Behavior Genetics,Genetics,Heritability,homosexuality,Psychology — Razib Khan @ 10:44 am

Tim Pawlenty debates Lady Gaga’s ‘Born This Way’ idea:

Gregory pressed, asking “Is being gay a choice?”

Pawlenty ultimately said, “I defer to the scientists in that regard.”

Again, Gregory pressed: “So you, you think it’s not a choice. … That you are, as Lady Gaga says, you’re born that way.”

Said Pawlenty: “There’s no scientific conclusion that it’s genetic. We don’t know that. So we don’t know to what extent, you know, it’s behavioral, and that’s something that’s been debated by scientists for a long time. But as I understand the science, there’s no current conclusion that it’s genetic.”

This is one issue where the American Left has a tendency to be on the side of the hereditarians. In contrast, the American Right emphasizes the plasticity of human behavior, and its amenability to cultural pressures and individual will and contingency. Transpose the structure of the arguments to male-female sex differences, and many of the basic elements would be preserved, but those espousing them would invert politically.

One issue which needs to be clarified is the distinction between something which is explainable by genetics, and something which is not explainable by genetics but may still have a biological basis. It does seem that

June 26, 2011

Heritability and genomics of facial characteristics

Filed under: Face genetics,Genetics,Genomics,Heritability,Human Genetics,Morphology — Razib Khan @ 11:49 pm

On several occasions I’ve gotten into discussions with geneticists about the possibility of reconstructing someone’s facial structure by genes alone. Combined with advances in pigmentation prediction by genetics, this could put the sketch artist out of business! But all that begs the question: how heritable are facial features anyhow? Impressionistically we know that features are broadly heritable. This isn’t a tenuous supposition, you see the resemblance over and over across families. All that being said, what are the specific quantitative heritability estimates? How do they relate to other traits we’re interested in? This review from the early 1990s seems to have what I’m looking for, The Role of Genetics in Craniofacial Morphology and Growth. Below is a table which shows averaged heritabilities for a range of facial quantitative traits from a large number of studies:

h2 is the narrow-sense heritability. Also, in case you are curious, cephalometry seems to be utilizing imaging of some sort. Anthropometry refers to the more conventional measuring techniques (get out the calipers!). These results suggest that facial features are typically more heritable than behavioral traits (usually < 0.50), but less heritable than ...

June 20, 2011

Breaking the “Central Dogma”

Epigenetics is making it “big time,” Slate has a review up of the new book Epigenetics: The Ultimate Mystery of Inheritance. In case you don’t know epigenetics in terms of “what it means/why it matters” holds out the promise to break out of the genes → trait conveyor belt. Instead positing genes → trait → experience → genes, and so forth. Or perhaps more accurately  genes → trait × experience → genes. Epigenetics has obviously long been overlooked as a biological phenomenon. But, I think the same could be said for the ubiquity of asexual reproduction and unicellularity! Life science exhibits anthropocentrism. That’s why there’s human genetics, and biological anthropology. My own concern is that epigenetics will give some a license to posit that the old models have been overthrown, when in fact in many cases they have been modified on the margin. Especially at the level of organisms which we’re concerned about; human-scaled eukaryotes. Humans most of all.

The last paragraph in the review highlights the hope, promise, and perils of epigenetics in regards to social relevance:

It’s almost enough to make one nostalgic for the simplicity of old-style genetic determinism, which at least offered the sense that the ...

June 17, 2011

Does heritability of political orientation matter?

Filed under: Behavior Genetics,Heritability,Politics — Razib Khan @ 1:40 am

At The Intersection Chris Mooney points to new research which reiterates that 1) political ideology exhibits some heritability, 2) and, there are associations between political ideology and specific genes. I’ll set #2 aside for now, because this is a classic “more research needed” area at this point. But as I mentioned in the comments the heritability of political ideology is well known and robust. From what I can gather most people assume it’s mediated through personality traits. In the comments Chris asks:

That sounds sensible. What i find amazing is that if the heritability of politics is so robust–and I agree, it would happen via personality–why is this so widely ignored?

There are I think several issues at work. First, many people are not comfortable with imagining that beliefs which they attribute to their conscious rational choice are not only subject to social inculcation, but may also have an element of genetic disposition. Second, most people have a poor grasp of what heritability implies. Take a look at some of Chris’ commenters. The response is generally in the “not even wrong” class. Finally, what’s the actionable component to this? In other words, what are people going to do ...

March 27, 2011

It’s about heritability….

Filed under: Behavior Genetics,Behavior Genomics,Genetics,Genomics,Heritability — Razib Khan @ 1:25 pm

I’m going to promote a comment:

…would knowing the root biological cause for differences which are already apparent to us change anything?

It’s obvious to you that there’s a contradiction here, but to the average educated person this makes total sense.

The proximal reason seems to be that in thinking about “genetic” and “environmental” factors, the average educated person still fundamentally views “genetics” as equivalent to genetic determinism and “environmental” factors as equivalent to social norms or parenting tactics. In this black-and-white view of human development, quantitative distinctions and complex causal models have no place. Genetic causes are irremediable and ever-lasting, whereas environmental causes are a generation-away from disappearing with the right appropriations to social programs. That’s why an environmental cause for phenotypic differences doesn’t “count” but a genetic one is game changing.

It seems as if the nature-nurture world view painted in the 1970s by the anti-heredity crowd has remained largely intact with only minor modifications in the mind of the average educated person. Since the 1970s, they now know to respond to questions about nature-vs-nurture by saying “both”, but their understanding goes no deeper than that. As best as I can ...

February 3, 2011

Why siblings differ differently

The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and, reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.

Humans are diploid organisms. We have two copies of each gene, inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between ...

January 12, 2011

When genes matter for intelligence

Filed under: Behavior Genetics,Genetics,Heritability,I.Q.,Psychology — Razib Khan @ 3:01 am

Image credit: Aleksandra Pospiech

One of the interesting and robust nuggets from behavior genetics is that heritability of psychological traits increases as one ages. Imagine for example you have a cohort of individuals you follow over their lives. At the age of 1 the heritability of I.Q. may be ~20%. This means that ~20% of the variation in I.Q. in the population is explained by variation in the genes of the population. More concretely, you would only expect a weak parent-offspring correlation in I.Q. in this sample. At the age of 10 the heritability of I.Q. in the same sample may be ~40%, and in mature adulthood it may rise to ~80% (those are real numbers which I’ve borrowed from Robert Plomin). Many people find this result rather counterintuitive. How can a trait like intelligence become “more genetic”?

Remember that I’m talking about heritability here, not an ineffable “more” or “less” quantum of “genetic” aspect of a trait. In other words: does variation in genes due to different parental backgrounds matter for a trait? Second, the nature of psychological traits is somewhat slippery and plastic. As I’ve noted before the correlation between a score on a 10-word vocabulary test and general intelligence is pretty good. You can expect people with high scores on the vocabulary test to have higher I.Q.’s than those who have low scores. But if you take an individual and lock them in a room without human contact for their first 15 years, they are unlikely to exhibit any such correspondence. You don’t have to be a rocket scientist to understand why. Quantitative behavior genetic traits are complex and are subject to a host of background conditions, and express themselves in an environmental context.

So why can you explain more of the variance of a psychological trait like I.Q. at age 40 than at age 5 with genes? It has to do with environment. Specifically, intelligence isn’t something you’re born with, it’s something that you develop over time, through a complex confluence between biology and environment. The developmental process exhibits a level of contingency as well. Decision A redounds to the choice between B and C, which redounds between a further set of choices. Small initial differences in disposition and talent can compound over time through positive feedback loops. Practice may make perfect, but perfection may be a goal to which you aspire only if you have initial talent or inclination.

In other words, your genetic disposition can shape the environment you select, which can then serve to express your genetic potential in a specific manner. Children have less power in selection of their environment than adults. Over time the model is that environmental variables which differentiate children diminish in importance as they select contexts and situations which express their own preference sets as adults. This dynamic can be illustrated with a rather strange example. Consider two siblings who are pressured to be academic by their parents. One has a natural disposition toward scholarly activities, while the other does not. Their realized performance difference in youth may be small. People can respond to incentives! But at 18 the two siblings become adults, and begin to make their own decisions. At 25 one sibling may be a university drop out, and the other a graduate student. The modest differences in adolescence may start amplifying due to the positive feedback loops which consist of a set of choices which exhibit dependencies. Of course siblings would tend to be more similar than two random individuals off the street. But even within families there is genetic variance and so innate differences of disposition (the average difference in I.Q. between siblings is about the same as the average difference in I.Q. between two random people off the street, one standard deviation, or 15 points).

Modeling behavior genetic phenomena in a rough & ready fashion is then a matter of keeping dynamic networks of parameters in your head. Traits aren’t constructed out of static blocks; they’re the outcomes of a set of parameters at a given moment, as well as a developmental arc shaped by a previous set of parameters (some of them the same, some of them new). Thinking like this gives you a method by which to analyze phenomena, it does not tell you in a clear and general manner how a whole range of phenomena emerged down to the last detail.

The analysis doesn’t just apply to populations over time. You can also look to different groups which are contemporary. In 2003 a paper was published, Socioeconomic Status Modifies Heritability of IQ in Young Children. The major findings are illustrated by this figure (I’ve added some clarifying labels):

On the x-axis you see socioeconomic status (SES). This variable is a compound of traits which reflect one’s position in the social status hierarchy. Income and wealth are clearly important, but a salesman for a fertilizer company could presumably be more economically well off than a physics professor. So other variables such as education also matter. It is clear then that as SES increases genetic variation explains much more of the variation in I.Q., while environment explains less and less. The shared environment is rather straightforward: your family. The non-shared environment is more vague, and to some extent is just the remainder from the model which predicts I.Q. In The Nurture Assumption Judith Rich Harris posited that non-shared environment was mostly peer group effects. Interestingly, by adulthood non-shared environment tends to be a more important variable than shared environment for most psychological traits.

Any guess for why genetic variance is more efficacious in prediction of I.Q. among the high status than the low status? Here’s a clue: heritability of height is much higher in developed nations than in developing nations. In other words, environment explains more of the variance in height in developing nations, while it explains almost none of the variance in height in developed nations. There’s only so much you can eat, and there are diminishing returns on nutritional inputs. In developed nations most of the environmental variance has been removed due to adequate nutrition. When you remove the environmental variance, the genetic variance remains. Heritability is roughly the ratio of the additive genetic variance over the total variance, so its value gets larger.
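To make the ratio concrete, here is a minimal sketch. The variance components are made-up numbers in arbitrary units, and the function name is mine; the only thing taken from the argument above is that heritability is additive genetic variance over total variance, with the total simplified here to additive plus environmental variance.

```python
def narrow_sense_heritability(additive_var, environmental_var):
    """Return V_A / (V_A + V_E); non-additive genetic variance is ignored in this sketch."""
    total = additive_var + environmental_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return additive_var / total


# Hypothetical variance components, chosen only for illustration.
V_A = 4.0  # additive genetic variance, held fixed across both settings
settings = [
    ("large environmental variance", 6.0),
    ("small environmental variance", 1.0),
]

for label, V_E in settings:
    h2 = narrow_sense_heritability(V_A, V_E)
    print(f"{label}: heritability = {h2:.2f}")
```

The genetic variance is identical in both settings. Only the denominator shrinks, which is why the same genes "explain more" once the environment has been evened out.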

The analogy to I.Q. should be relatively easy. Don’t tell Amy Chua, but there are probably diminishing marginal returns on “nurturing” environments for a child when it comes to their intellectual development. You have only a maximum of 24 hours in the day you can study and drill, and a personal library of 10,000 books is probably not very different from one of 1,000, if all the books fall within the purview of your interest. Even in well off suburban communities there are differences of wealth and income, but on the margin vast increases in wealth and income do not allow one’s child to develop their mental faculties proportionally more. What remains in well off suburban communities are differences of genetic disposition and aptitude. Bill Gates’ children are probably good candidates for the Ivy League. Not because he is worth billions of dollars in relation to a professional whose net assets barely break a million. Gates got into Harvard, and reputedly did well before dropping out to pursue his business. His wife is also an overachiever.

This is, I believe, a fascinating topic, and it needs to be explored in more detail. Some members of the same group now have a study out which shows that differences in socioeconomic status matter differently for infants at 10 months and tots at 2 years. Emergence of a Gene × Socioeconomic Status Interaction on Infant Mental Ability Between 10 Months and 2 Years:

Recent research in behavioral genetics has found evidence for a Gene × Environment interaction on cognitive ability: Individual differences in cognitive ability among children raised in socioeconomically advantaged homes are primarily due to genes, whereas environmental factors are more influential for children from disadvantaged homes. We investigated the developmental origins of this interaction in a sample of 750 pairs of twins measured on the Bayley Short Form test of infant mental ability, once at age 10 months and again at age 2 years. A Gene × Environment interaction was evident on the longitudinal change in mental ability over the study period. At age 10 months, genes accounted for negligible variation in mental ability across all levels of socioeconomic status (SES). However, genetic influences emerged over the course of development, with larger genetic influences emerging for infants raised in higher-SES homes. At age 2 years, genes accounted for nearly 50% of the variation in mental ability of children raised in high-SES homes, but genes continued to account for negligible variation in mental ability of children raised in low-SES homes.

They used a standard structural equation model (SEM). I’m not going to go over that in detail, but suffice it to say that they related a set of variables to the outputs which they wanted to predict: performance on I.Q. tests for very young children. If you are curious, the demographic sample was rather diverse, and controlling for race did not impact their outcomes. So let’s outline what’s going on here.

First, predicted:

- Performance at 10 months
- Performance at 2 years

Second, putative predictors:

- Genes (A). Specifically, additive genetic variance
- Shared environment (C)
- Non-shared environment (E)
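For a feel of how A, C, and E can be pulled apart with twins, here is a rough sketch using the classic Falconer formulas. To be clear, this is not the structural equation model the authors fit; it is a simpler textbook approximation swapped in for illustration, and the twin correlations below are hypothetical values, not numbers from the paper.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate A, C, E shares of variance from twin correlations.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    """
    a = 2.0 * (r_mz - r_dz)  # additive genetic share
    c = 2.0 * r_dz - r_mz    # shared environment share
    e = 1.0 - r_mz           # non-shared environment share (plus measurement error)
    return a, c, e


# Hypothetical correlations, chosen only for illustration.
a, c, e = falconer_ace(r_mz=0.75, r_dz=0.50)
print(f"A = {a:.2f}, C = {c:.2f}, E = {e:.2f}")
```

The three shares sum to one by construction. With noisy correlations the simple formulas can return values below zero or above one, which is one reason model-based estimates are preferred in practice.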

I’ve reedited some of the main results. On the Y axis you see the % of variance explainable by A, C, and E. The variance components are broken down into two levels: SES, and age. 2 SD means 2 standard deviations. In a normal distribution that’s the ~2% tail at the ends.
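As a quick check on the "~2% tail" figure, the share of a normal distribution lying more than a given number of standard deviations above the mean can be computed from the complementary error function in the standard library. The function name is mine; this is a sketch that assumes SES is normally distributed.

```python
import math


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(f"share above +2 SD: {upper_tail(2.0):.4f}")
print(f"share above +1 SD: {upper_tail(1.0):.4f}")
```

Under that assumption, roughly 2.3% of families sit above +2 SD, and by symmetry the same share sits below -2 SD.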

What you see are two trends with age and SES:

- For infants at the age of 10 months parents matter. Genes do not. SES is not a major issue.

- For tots at the age of 2 years, SES matters quite a bit. You see a recapitulation of the earlier data, where higher SES parents seem to be providing environments which probably exhibit diminishing marginal returns (environmental variance doesn’t have much of an effect on the margin), so that genetic variance is much more important by default. The trend is clear as you move in a step-wise fashion up the class ladder. Though I have to say, the top ~2% in SES is an elite group already, so I wonder what sort of environmental variance could be found there.

The figure to the left shows the same outcome out of their model, only now the curves illustrate the variation of the effects as you modify SES in a continuous fashion. These are estimates generated out of their model, so that probably explains the > 100% values you see on the margins. The key is to focus on the broad qualitative trends. Even at 2 years of age genes start to trump shared environment ~1 standard deviation above the norm (though not aggregate “environment”). If the earlier data is correct, the heritability will continue to increase over time for higher SES individuals, as their affluent backgrounds will give them the freedom to go where their dispositions lead them.

Why does all this matter? There are practical outcomes to this sort of research. I’ll quote from the paper:

These findings build on a growing body of literature that highlights the importance of early life experiences for cognitive development…Current evidence suggests that, although children maintain a great deal of neurobiological and behavioral plasticity well past infancy…the predictive validity of infant mental ability for later cognitive ability is moderate…We agree with Bornstein and Sigman…who have strongly argued against the perspective “that infancy might play little or no role in determining the eventual cognitive performance of the child and, therefore, that individuals could sustain neglect in infancy if remediation were later made available”…Heckman…has recently taken an economic perspective on this topic. He argued that prophylactic interventions for disadvantaged younger children produce much higher rates of return on what he termed “human skill formation” than later remedial interventions for older children and adults do. On the basis of this perspective, Heckman concluded that “at current levels of funding, we overinvest in most schooling and post-schooling programs and underinvest in preschool programs for disadvantaged persons”….

My understanding is that the long-term effectiveness of even Head Start is non-existent, so I don’t know what proposals could be made based on this. Preschool for 1-2 years? I find it broadly plausible that high SES parents do provide more enriching environments, but I don’t see the detailed understanding necessary for genuinely effective prescriptions. Rather, we’re doing conventional trial & error when it comes to policy.

Additionally, the authors also admit that the high and low SES populations may have been stratified for genes. That’s just a way of saying that it isn’t as if genetic variance for things like intelligence is necessarily equally distributed across the social classes. If a genuine meritocracy exists what one should rapidly see is a crystallization of hereditary class castes, as individuals marry and associate assortatively on a meritocratic basis. Remember, assortative mating should increase heritability estimates (Quantitative Genetics says so!). This is part of the irony of some people’s conception of how genes relate to outcomes. Equality of opportunity will almost certainly lead to a cleaner separation of outcomes by genetic variation. In a chaotic world defined by random acts many people will find themselves in positions at variance with their aptitudes or dispositions. Once you remove the environmental randomness, then from each according to their capabilities should be the outcome.

For future investigation: the hypothesis that Goldman Sachs partners are precursors to Guild Navigators!

Citation: Tucker-Drob EM, Rhemtulla M, Harden KP, Turkheimer E, & Fask D (2010). Emergence of a Gene × Socioeconomic Status Interaction on Infant Mental Ability Between 10 Months and 2 Years. Psychological Science. PMID: 21169524

December 22, 2010

Heritability and genes as causes

Filed under: Genetics,Genomics,Heritability,Missing Heritability — Razib Khan @ 11:25 am

Since the beginning of this weblog (I’ve been writing for eight years) heritability has been a major confusion. Even long time readers misunderstand what I’m trying to get at when I talk about heritability. That’s why posts such as Mr. Luke Jostins‘ are so helpful. I had seen references to a piece online, The Causes of Common Diseases are Not Genetic Concludes a New Analysis, but I hadn’t given it much thought. Until Ms. Mary Carmichael’s post DNA, Denial, and the Rise of “Environmental Determinism”. She begins:

Michael Pollan, the well-known writer on food and agriculture, is a smart guy. His arguments tend to be nuanced and grounded in common sense. I like his basic maxim on nutrition – “Eat food. Not too much. Mostly plants” – so much that I recently promoted it in a Newsweek cover story. He’s the last person I’d suspect of reactionary thinking, which is why I wish I didn’t have to say this: Michael Pollan has made a deeply unfortunate mistake.

A few days ago, speaking to his 43,000 followers on Twitter, Pollan linked to an essay written by an environmental advocacy group that spends much of its time fighting the depradations of Big Agriculture. Curiously, the essay wasn’t about ecological destruction or even about agriculture. It was about human genetics. It argued that since genetics currently can’t explain everything about inheritance, genes must not influence the development of disease, and thus the causes of illness must be overwhelmingly environmental (meaning “uninherited” as opposed to “caused by pollution,” though the latter category of factors is part of the former one). This was a little like arguing that your engine doesn’t power your car because sometimes it breaks down in a way that confuses your mechanic — and concluding that gasoline alone is sufficient to make a car with no engine run. But Pollan took the argument at face value. He said it showed “how the gene-disease paradigm appears to be collapsing.” He was troubled that its contentions apparently had gone unnoticed: “Why aren’t we hearing about this?!”

Of course I had seen Dr. Daniel MacArthur’s post Bioscience Resource Project critique of modern genomics: a missed opportunity in my RSS, but when I started reading the rebuttal I immediately thought “Dr. Dan’s interlocutors sound kind of dumb,” and I stopped reading. After reading the post I don’t think they’re dumb, I think they’re being lawyerly. Much of the piece is a rhetorical tour de force in leveraging the prejudices and biases of the intended readership. This is the Intelligent Design version of Left-wing “Blank Slate” Creationism.* They smoothly manipulate real findings in a deceptive shell game intended to convince the public, and shape public policy. Their success is evident in Pollan’s response. “X paradigm appears to be collapsing.” “Why aren’t we hearing about this?” Does this sound familiar? Like Dr. MacArthur I think some of the criticisms within the piece are valid. Despite not being hostile to the maxim “better living through chemistry,” I do think that there has been an excessive trend toward pharmaceutical or surgical “cures” in relation to diseases of lifestyle (anti-depressants, gastric bypass, etc.). But we go down a very dangerous path when we make recourse to shoddy means toward ostensibly admirable ends. This sort of discourse is not sustainable! (just used a buzzword intended to appeal right there!)

I honestly can’t be bothered to say much more when so many others already have. This is a boat I missed. But if some of what I say above isn’t clear, I recommend you read the original essay. Then read Dr. MacArthur and Ms. Carmichael. If you’re hungry for more, Ms. Carmichael has a helpful list of links.

* Left Creationism had its most negative manifestation as Lysenkoism, but it suffuses the outlook of many who fear the emergence of a new Nazi abomination. Leon Kamin in the 1970s even claimed that IQ was not heritable at all! Though he backed off such an extreme position, it shows how confident he was that he could claim such a thing.

September 29, 2010

Every variant with an author!

I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on though if you can’t get beyond the paywall.

In this study they pooled together several studies into a meta-analysis. One thing not mentioned in the abstract: they checked their GWAS SNPs against a family based study. This was important because in the latter population stratification isn’t an issue. Family members naturally overlap a great deal in their genetic background. Also, if I read it correctly they’re focusing on populations of European origin, so this might not capture larger effect alleles which impact between population variance in height but don’t vary within a given population (note that if you explored pigmentation genetics just through Europeans you would miss the most important variable on the world wide scale, SLC24A5, because it’s fixed in Europeans). In any case, as you can see what they did was extrapolate out the number of loci which their methods could capture to explain variation with the predictor being the sample size. At 500,000 individuals they’re at ~700 loci, and around 20% of the heritable variation. My initial thought is that I’m not seeing diminishing returns here, but since I haven’t read the supplements I’ll let that pass since I don’t know the guts of this anyhow. They do assert that they are likely underestimating the power of these methods because there may be smaller effect common variants which can top off the fraction.
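The two percentages in play here are easy to conflate, so a small arithmetic sketch may help. It assumes the roughly 80% heritability of height that the paper cites, and simply divides the share of phenotypic variation explained by that heritability; the function name is mine.

```python
def share_of_heritable_variation(phenotypic_share, heritability):
    """Convert 'fraction of phenotypic variance explained' into
    'fraction of heritable variance explained'."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return phenotypic_share / heritability


H2_HEIGHT = 0.80  # heritability of height as cited in the paper

for label, phenotypic_share in [("identified variants", 0.10),
                                ("projected common variants", 0.16)]:
    heritable_share = share_of_heritable_variation(phenotypic_share, H2_HEIGHT)
    print(f"{label}: {phenotypic_share:.0%} of phenotypic variation, "
          f"{heritable_share:.1%} of heritable variation")
```

So the abstract's 16% of phenotypic variation and 20% of heritable variation are the same quantity measured against two different denominators, and on either scale most of the heritability is still unaccounted for.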

But even they admit that they can go only so far. Here are some sections from the conclusion that lay it out pretty clearly:

By increasing our sample size to more than 100,000 individuals, we identified common variants that account for approximately 10% of phenotypic variation. Although larger than predicted by some models, this figure suggests that GWA studies, as currently implemented, will not explain most of the estimated 80% contribution of genetic factors to variation in height. This conclusion supports the idea that biological insights, rather than predictive power, will be the main outcome of this initial wave of GWA studies, and that new approaches, which could include sequencing studies or GWA studies targeting variants of lower frequency, will be needed to account for more of the ‘missing’ heritability. Our finding that many loci exhibit allelic heterogeneity suggests that many as yet unidentified causal variants, including common variants, will map to the loci already identified in GWA studies, and that the fraction of causal loci that have been identified could be substantially greater than the fraction of causal variants that have been identified.

In our study, many associated variants are tightly correlated with common nsSNPs, which would not be expected if these associated common variants were proxies for collections of rare causal variants, as has been proposed. Although a substantial contribution to heritability by less common and/or quite rare variants may be more plausible, our data are not inconsistent with the recent suggestion that many common variants of very small effect mostly explain the regulation of height.

In summary, our findings indicate that additional approaches, including those aimed at less common variants, will likely be needed to dissect more completely the genetic component of complex human traits. Our results also strongly demonstrate that GWA studies can identify many loci that together implicate biologically relevant pathways and mechanisms. We envisage that thorough exploration of the genes at associated loci through additional genetic, functional and computational studies will lead to novel insights into human height and other polygenic traits and diseases.

The second to last paragraph takes a shot at David Goldstein’s idea of synthetic associations.

We’re still where we were a few years back though: old-fashioned Galtonian quantitative genetics, a branch of statistics, is the best bet to predict the heights of your offspring. As with intelligence, “height genes” are not improvements upon common sense. But if you’re going into the 10-20% range of variation explained it’s certainly not trivial, and the biological details are going to be of interest.
