Razib Khan One-stop-shopping for all of my content

April 8, 2019

In search of the missing heritability

Filed under: Genetics,Genomics,heredity,Heritability — Razib Khan @ 9:45 pm

We’ve always known that parents resemble their offspring. An intuitive understanding of how traits are passed down in families is probably as old as our species and its ability to reflect on the world around us. The ancient Romans would often observe an association between a characteristic, for example, red hair, and a particular aristocratic family. And today, it is common to notice how a particular child resembles a particular grandparent. An interest in heredity is part of human nature.

But it has only been within the last 100 years that this intuition was transformed into a quantitative and rigorous science.

[Image: Resemblance within a dynasty]

And believe it or not, this began before we understood the Mendelian genetic basis of inheritance. In the 19th century, Charles Darwin’s cousin Francis Galton developed the concept of correlation to explore the relationship of characteristics between parents and offspring. Some traits, such as the height of fathers and sons, turn out to exhibit a very strong correlation between the generations. Other traits, such as hairstyle, don’t (this is probably a good thing).

Heritable traits are those variable characteristics where parents and offspring resemble each other due to heredity. Traits where relatedness produces no correlation between parents and offspring across the population are not heritable. Or, more precisely in the language of statistical genetics, their heritability is very low. Where there is a strong correlation between parents and offspring, the heritability is very high. Heritability is measured on a scale from 0 to 1, with ~0.5 counting as moderate.

A heritability of 0.5 means that 50% of the variation in the trait in the population is due to variation in the genes.
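The link between parent-offspring resemblance and this variance fraction can be checked with a small simulation. This is a minimal sketch under textbook assumptions that the post does not spell out: purely additive genetic effects, random mating, no environment shared between parents and children, and a total trait variance of 1. The function name and parameters are hypothetical. Under those assumptions, the slope of the child's trait on the average of the two parents' traits should come out close to the heritability.

```python
import random
import statistics


def simulate_midparent_slope(h2, n_families=20000, seed=1):
    """Slope of offspring trait on mid-parent trait in a simulated population.

    Assumes additive genetics, random mating, no shared environment,
    and total trait variance of 1 (so genetic variance equals h2).
    """
    rng = random.Random(seed)
    sd_genetic = h2 ** 0.5
    sd_environment = (1 - h2) ** 0.5
    # Mendelian segregation adds half the genetic variance within a family.
    sd_segregation = (h2 / 2) ** 0.5

    midparents, children = [], []
    for _ in range(n_families):
        g_father = rng.gauss(0, sd_genetic)
        g_mother = rng.gauss(0, sd_genetic)
        p_father = g_father + rng.gauss(0, sd_environment)
        p_mother = g_mother + rng.gauss(0, sd_environment)
        g_child = (g_father + g_mother) / 2 + rng.gauss(0, sd_segregation)
        p_child = g_child + rng.gauss(0, sd_environment)
        midparents.append((p_father + p_mother) / 2)
        children.append(p_child)

    # Ordinary least-squares slope: covariance divided by variance.
    mean_x = statistics.fmean(midparents)
    mean_y = statistics.fmean(children)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(midparents, children))
    var = sum((x - mean_x) ** 2 for x in midparents)
    return cov / var


for true_h2 in (0.2, 0.5, 0.8):
    print(true_h2, round(simulate_midparent_slope(true_h2), 3))
```

The printed slopes should track the heritability that was fed in, which is why resemblance between relatives can stand in for a variance fraction nobody can observe directly.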

This understanding of heritability was decoupled from an analysis of Mendelian inheritance. Though the two were theoretically fused in the early 20th century thanks to the work of R. A. Fisher, heritable polygenic characteristics were beyond the power of Mendelianism to examine, because the effect of any single gene was very small.

A Mendelian trait, such as cystic fibrosis, is passed through a pedigree and reflects a particular genotype. That is, most of the variation in the expression of cystic fibrosis is due to mutations in a single gene. Mendelian analysis developed around the insight that genes encode characteristics, but in the early 20th century its methods only had the power to detect associations for traits where a single gene, or perhaps a few, influenced the variation. In contrast, highly polygenic characteristics could only be understood through statistical analysis.

One of the distinctions of most heritable traits is that their causes are complex, and they are often quantitative or continuous. Though some people are ‘tall’ or ‘short’, the reality is that we measure people to obtain a single number on a numerical range. Similarly, though we can divide people into ‘extroverts’ and ‘introverts’, we understand that this disposition is really on a spectrum.

This is in contrast to Mendelian traits, which show clear, discrete differences between people with different genotypes. You have cystic fibrosis. Or you don’t.


Historically, for quantitative traits, the goal was to assess the total genome-wide heritability. In domains such as agricultural genetics, this was very important. The more heritable the trait, the more artificial selection could change a characteristic of a line of plants or stock of animals.

In humans, the understanding of heritability had different implications. In the middle decades of the 20th century, there were many theories of environmental triggers for illnesses such as schizophrenia, often focusing upon a child’s mother and her treatment of her offspring. The reality is schizophrenia is a highly heritable trait, so the likelihood of manifesting the illness is strongly dependent on one’s genes, rather than details of upbringing. For decades clinicians were looking at the wrong primary causes.

We may not have known the genes responsible, but we knew it was genes.

[Image: The pacifiers are not heritable]

One of the most common methods used to understand heritability in humans relies on patterns in twins and their siblings. Scientists realized that identical twins share 100% of their genes, while ordinary full siblings share about 50%. The heritability of a trait can then be expressed in terms of the difference between the trait correlation among identical twins and the trait correlation among full siblings.

Identical twins tend to be rather close in height. Siblings are closer in height than you would expect from two random individuals selected from a population, with a correlation of ~0.50. But that is far lower than the correlation between identical twins. That’s because siblings share only 50% of their genes, and it turns out that height is a very heritable trait (estimates are 0.8 to 0.9). If the correlation on a trait is the same for full siblings as for identical twins, it is quite likely that genes have little to do with variation of the trait in the population (as opposed to the environment).
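The twin comparison above can be turned into a rough number. A minimal sketch, assuming purely additive genetic effects and assuming that identical twins and ordinary siblings share their family environment to the same degree (neither assumption is stated in the post): doubling the gap between the two correlations gives a heritability estimate. This is in the spirit of Falconer's formula, which classically compares identical with fraternal twins. The function name and the identical-twin correlation of 0.9 are illustrative; only the sibling correlation of ~0.5 comes from the text.

```python
def heritability_from_correlations(r_identical, r_sibling):
    """Rough heritability estimate: twice the gap between the two correlations.

    Identical twins share ~100% of their genes and full siblings ~50%, so the
    extra half of shared genes is credited with the extra resemblance.
    """
    return 2 * (r_identical - r_sibling)


# Illustrative inputs: 0.5 is the sibling correlation quoted in the text,
# 0.9 is an assumed identical-twin correlation, not a measured value.
print(heritability_from_correlations(0.9, 0.5))

# Equal correlations imply genes explain little of the variation.
print(heritability_from_correlations(0.6, 0.6))
```

With the assumed inputs the estimate lands in the 0.8 to 0.9 range quoted for height, but that agreement is built into the chosen numbers rather than being evidence of anything.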

And yet these analyses are very sensitive to broader environmental conditions. Within a family in a developed society, it is unlikely one sibling would get more nutritional resources than another. But this is not true in a pre-modern society, or in the developing world. If an older sibling is born in the midst of a famine, it would not be surprising if there was some permanent stunting later on in life. The shorter adult height of this sibling in comparison to their younger brothers and sisters would then be due to the environment.

So another feature of complex heritable traits is that there is an environmental component to the variation of outcome across the population, at least for any trait where the heritability is less than 1. And that environmental component is going to vary from society to society. Heritability is not a fixed statistic, but a dynamic one, dependent on conditions.
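A short worked example makes the dependence on conditions concrete. It is only a sketch: the function name and the variance values are made up for illustration, and it assumes genetic and environmental contributions simply add. The same genetic variance yields a different heritability once the environmental variance changes.

```python
def heritability(genetic_variance, environmental_variance):
    """Fraction of total trait variance attributable to genetic variance."""
    total = genetic_variance + environmental_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return genetic_variance / total


# Hypothetical society with fairly uniform nutrition: little environmental variance.
print(heritability(1.0, 0.25))  # 0.8

# Same genetic variance, but famine and plenty coexist: more environmental variance.
print(heritability(1.0, 1.0))  # 0.5
```

Nothing about the genes differs between the two calls. Only the spread of environments does.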

All of these complexities for heritable traits make it very clear why conventional Mendelian genetics did not attempt to tackle them. Pedigrees and experiments with linked physical characteristics were never going to get very far.

This landscape of indirect inference only began to change at the end of the 20th century, with the revolution in molecular biology which transformed genetics from an abstract field of pattern recognition to a concrete one where scientists began to hunt for specific genes in the physical genome.

A good understanding of genome-wide variation in human populations has only been available for the last ten years.

Until modern sequencing and genotyping technology emerged, we did not even know the number of genes in the human genome! Twenty years ago the number was estimated to be ~100,000, a ballpark guess based on intuition and hunches more than anything else. Fifteen years ago, after the first human genome had been published, the number was reduced to 40,000 genes. Today, the best estimate is 19,000 genes.

Within any given human genome there are about 3 billion DNA base pairs. Of these, only about 1% code for proteins. In the vast majority of cases, differences between individuals in traits are due to differences in these functional regions of the genome. That’s about 30 million base pairs (1% of 3 billion). So it is within these 30 million base pairs that the search for the biophysical basis of heritability will occur.

[Image: Genetic relatedness of full siblings]

In the past, estimates of heritability rested upon “good enough” assumptions, such as relatedness between individuals. Today, genomic methods allow researchers to look at the truth beneath assumptions. Statistical methods assumed that the relatedness between full siblings is 50%. But this is the expected relatedness. Geneticists have always known that due to Mendelian segregation the real value is often different between any two siblings. But they had no direct way of assessing this.

That is, until genomic techniques became more advanced and cheaper. In 2006 researchers confirmed that twin studies were correct in their estimate of the heritability of height by looking at how differences between full siblings varied in relation to how genetically similar they truly were. Simply due to the rules of chance, some full siblings shared more than 60% of their genome, while others shared less than 40%.
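The spread around the expected 50% can be illustrated with a toy simulation. This is a sketch, not the method used in the 2006 study: it treats the genome as a fixed number of independently inherited segments, and the default of 90 segments is an assumption picked for illustration, since real recombination is messier. For each segment, two siblings received the same copy from their father with probability 1/2, and likewise from their mother.

```python
import random
import statistics


def simulate_sibling_sharing(n_pairs, n_segments=90, seed=0):
    """Fraction of the genome shared by each simulated full-sibling pair.

    n_segments is a hypothetical count of independently inherited chunks.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_pairs):
        total = 0.0
        for _ in range(n_segments):
            # Did both siblings inherit the same paternal / maternal copy?
            paternal_match = rng.random() < 0.5
            maternal_match = rng.random() < 0.5
            total += (paternal_match + maternal_match) / 2
        shares.append(total / n_segments)
    return shares


shares = simulate_sibling_sharing(5000)
print("mean sharing:", round(statistics.fmean(shares), 3))
print("std deviation:", round(statistics.stdev(shares), 3))
print("range:", round(min(shares), 3), "to", round(max(shares), 3))
```

The average sits near one half, but individual pairs scatter several percentage points to either side, and that scatter is the variation the sibling study exploited.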

Full siblings who were genetically more similar turned out to be more similar in height, to the extent that the heritability inferred from this pattern matched the estimate from twin studies.

Researchers were also able to do more than just refine their older methods of assessing heritability: they finally had the tools to begin discovering the specific genes which underlie the heritability of complex traits. For a century scientists understood and assumed that there were genes, physical entities responsible for the biology underlying their statistical results. But finally, they would be able to zero in on candidates!

The earliest attempts to understand heritability with genomic technologies and modern computational methods were somewhat disappointing.

For example, for height, work in the late 2000s discovered 40 genetic positions that correlated with height in humans, but these positions explained only 5% of the total heritability estimated from earlier studies using classical methods.

Why were these state-of-the-art methods only detecting a small proportion of the statistically inferred heritability? Some possibilities presented themselves:

  • Perhaps statistical geneticists used flawed methods, and the environmental component was greater than they understood.
  • Perhaps single base changes, SNPs, were not responsible for much of the variation. Perhaps it was copy number variation, for example.
  • Perhaps the sample sizes were too small to detect the effects of single genes, because most of the effects were too small.
  • Perhaps the “SNP chips” did not have enough markers to detect the effects.
  • Perhaps many of the variants are at very low frequency and were not typed on the SNP chips.
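The sample-size explanation in the list above can be made concrete with a rough power calculation. This is a sketch under stated assumptions, none of which come from the post: a single variant explains a fixed fraction of trait variance (0.1% in the example), the association test statistic is approximately normal, and significance is judged at 5e-8, the conventional genome-wide threshold. The function and parameter names are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power to detect one variant in an association scan.

    Uses a normal approximation: the test statistic is shifted by the
    square root of n * q / (1 - q), where q is the variance explained.
    """
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided threshold
    shift = (n_samples * variance_explained / (1 - variance_explained)) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_critical - shift)
    lower_tail = std_normal.cdf(-z_critical - shift)
    return upper_tail + lower_tail


# Assumed effect: one variant explaining 0.1% of the variance in the trait.
for n in (1_000, 10_000, 50_000, 250_000):
    print(n, round(gwas_power(n, 0.001), 4))
```

Under these assumptions a study of a few thousand people almost never flags such a variant, while one with hundreds of thousands almost always does. That is one way small effects could stay hidden without the heritability estimates being wrong.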

For about a decade this issue of the “missing heritability” hung over the new synthesis between genomics and quantitative genetics. But recently a research group has presented results which suggest that they have solved the missing heritability problem for height. Using whole-genome analysis, which was prohibitively expensive ten years ago, but on the margin of the feasible today, the researchers captured almost all of the missing heritability genomically.

With 20,000 individuals and 50 million markers, the authors argue that rare variation accounts for most of what was missed in earlier studies.

Will these results hold up? Possibly. But the bigger take-home message is to reflect on how far we’ve come in our understanding of heritability in the past century. In the beginning, heritability was understood by looking at similarities across families. This sounds simple, but this straightforward design required a great deal of statistical ingenuity. And the reality is that even after the discovery of DNA and the molecular understanding of the gene, researchers could not satisfactorily answer the question until they connected the statistical patterns to molecular processes.

[Image: Illumina Sequencing Machine]

Today whole-genome sequences, which may have cost $100,000 ten years ago, can be had for less than $1,000. Each one is the complete sequence information for an individual’s genome. Whereas decades ago researchers didn’t even know the total number of human genes or the full genetic map of our species, today we can count and locate 19,000 genes.

It is no surprise that many of the genes associated with height in humans turn out to be related to bone development. In this way, the statistical wizardry has produced results in keeping with our expectations of standard biology.

In another ten years, it seems likely that the search for the missing heritability will be a footnote in the history of genetics. But that footnote will have been fruitful in generating a great deal of science, and in helping us solve the mysteries of complex traits.

In search of the missing heritability was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.
