Razib Khan One-stop-shopping for all of my content

April 23, 2017

Why the rate of evolution may only depend on mutation

Filed under: Evolutionary Genetics,Genetics,Population genetics — Razib Khan @ 10:07 pm

Sometimes people think evolution is about dinosaurs.

It is true that natural history plays an important role in inspiring and directing our understanding of evolutionary process. Charles Darwin was a natural historian, and evolutionary biologists often have strong affinities with the natural world and its history. Though many people exhibit a fascination with the flora and fauna around us during childhood, often the greatest biologists retain this wonderment well into adulthood (if you read W. D. Hamilton’s collections of papers, Narrow Roads of Gene Land, which have autobiographical sketches, this is very evidently true of him).

But another aspect of evolutionary biology, which began in the early 20th century, is the emergence of formal mathematical systems of analysis. So you have fields such as phylogenetics, which have gone from intuitive and aesthetic trees of life, to inferences made using the most new-fangled Bayesian techniques. And, as told in The Origins of Theoretical Population Genetics, in the 1920s and 1930s a few mathematically oriented biologists constructed much of the formal scaffold upon which the Neo-Darwinian Synthesis was constructed.

The product of evolution

At the highest level of analysis evolutionary process can be described beautifully. Evolution is beautiful, in that its end product generates the diversity of life around us. But a formal mathematical framework is often needed to clearly and precisely model evolution, and so allow us to make predictions. R. A. Fisher’s aim when he wrote The Genetical Theory of Natural Selection was to create for evolutionary biology something equivalent to the laws of thermodynamics. I don’t really think he succeeded in that, though there are plenty of debates around something like Fisher’s fundamental theorem of natural selection.

But the revolution of thought that Fisher, Sewall Wright, and J. B. S. Haldane unleashed has had real yields. As geneticists they helped us reconceptualize evolutionary process as more than simply heritable morphological change, but an analysis of the units of heritability themselves, genetic variation. That is, evolution can be imagined as the study of the forces which shape changes in allele frequencies over time. This reduces a big domain down to a much simpler one.

Genetic variation is a concrete currency with which one can track evolutionary process. Initially this was done via inferred correlations between marker traits and particular genes in breeding experiments. Ergo, the origins of “the fly room”.

But with the discovery of DNA as the physical substrate of genetic inheritance in the 1950s the scene was set for the revolution in molecular biology, which also touched evolutionary studies with the explosion of more powerful assays. Lewontin & Hubby’s 1966 paper triggered an order-of-magnitude increase in our understanding of molecular evolution through both theory and results.

The theoretical side occurred in the form of the development of the neutral theory of molecular evolution, which also gave birth to the nearly neutral theory. Both of these theories hold that most of the polymorphic variation within and between species is due to random processes. In particular, genetic drift. As a null hypothesis neutrality was very dominant for the past generation, though in recent years some researchers are suggesting that selection has been undervalued as a parameter for various reasons.

Setting aside the live scientific debates, which continue to this day, one of the predictions of neutral theory is that the rate of evolution will depend only on the rate of mutation. More precisely, the rate of substitution of new mutations (where the allele goes from a single copy to fixation of ~100%) is proportional to the rate of mutation of new alleles. Population size doesn’t matter.

The algebra behind this is straightforward.

First, remember that the frequency of a new mutation within a population is \frac{1}{2N}, where N is the population size (the 2 is because we’re assuming diploid organisms with two gene copies). This is also the probability of fixation of a new mutation in a neutral scenario; its probability is just equal to its initial frequency (it’s a random walk process between 0 and 1.0 proportions). The rate of mutations is defined by \mu, the number of expected mutations at a given site per generation (this is a pretty small value, for humans it’s on the order of 10^{-8}). Again, there are 2N gene copies in the population, so you have 2N\mu to count the number of new mutations.

The probability of fixation of a new mutation multiplied by the number of new mutations is:

    \[ \left( \frac{1}{2N} \right) \times 2N\mu = \mu \]

So there you have it. The rate of fixation of these new mutations is just a function of the rate of mutation.
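If you want to check this numerically, here is a minimal Python sketch. It is my own illustration, not something from the literature above: the function names are made up, and the simulation assumes a textbook Wright-Fisher population (constant size, random mating, no selection). The first function just does the algebra; the other two follow a single new neutral mutation under drift and estimate how often it fixes, which should come out close to \frac{1}{2N}.

```python
import random


def substitution_rate(n_individuals, mu):
    """Expected neutral substitutions per site per generation."""
    new_mutations = 2 * n_individuals * mu   # 2N gene copies, each mutating at rate mu
    p_fix = 1 / (2 * n_individuals)          # neutral fixation probability of one new copy
    return new_mutations * p_fix             # N cancels, leaving mu


def fixes(n_individuals, rng):
    """Follow one new neutral mutation under drift; return True if it reaches fixation."""
    total_copies = 2 * n_individuals
    count = 1                                # the mutation starts as a single copy
    while 0 < count < total_copies:
        freq = count / total_copies
        # Next generation: each gene copy is drawn at random from the current one.
        count = sum(rng.random() < freq for _ in range(total_copies))
    return count == total_copies


def estimate_fixation_probability(n_individuals, trials, seed=0):
    """Fraction of simulated new mutations that fix (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(fixes(n_individuals, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, substitution_rate(n, 1e-8))           # the rate does not change with N
    print(estimate_fixation_probability(10, 20_000))    # should land near 1/20
```

The population in the simulation is kept tiny (N = 10) only so that it runs quickly; the point is that the estimate tracks \frac{1}{2N} while the substitution rate stays at \mu whatever N is.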

Simple formalisms like this have a lot more gnarly math that extends them and from which they derive. But they’re often pretty useful to gain a general intuition of evolutionary processes. If you are genuinely curious, I would recommend Elements of Evolutionary Genetics. It’s not quite a core dump, but it is a way you can borrow the brains of two of the best evolutionary geneticists of their generation.

Also, you will be able to answer the questions on my survey better the next time!

The logic of human destiny was inevitable 1 million years ago

Filed under: Evolution,Genetics,Genomics,Human Evolution,Human Genetics — Razib Khan @ 1:11 pm

Robert Wright’s best book, Nonzero: The Logic of Human Destiny, was published nearly 20 years ago. At the time I was moderately skeptical of his thesis. It was too teleological for my tastes. And, it does pander to a bias in human psychology whereby we look to find meaning in the universe.

But this is 2017, and I have somewhat different views.

In the year 2000 I broadly accepted the thesis outlined a few years later in The Dawn of Human Culture. That our species, our humanity, evolved and emerged in rapid sequence, likely due to biological changes of a radical kind, ~50,000 years ago. This is the thesis of the “great leap forward” of behavioral modernity.

Today I have come closer to models proposed by Michael Tomasello in The Cultural Origins of Human Cognition and Terrence Deacon in The Symbolic Species: The Co-evolution of Language and the Brain. Rather than a punctuated event, an instance in geological time, humanity as we understand it was a gradual process, driven by general dynamics and evolutionary feedback loops.

The conceit at the heart of Robert J. Sawyer’s often overly preachy Neanderthal Parallax series, that if our own lineage went extinct but theirs did not they would have created a technological civilization, is I think in the main correct. It may not be entirely coincidental that the hyper-drive cultural flexibility of African modern humans evolved in African modern humans first. There may have been sufficient biological differences to enable this to be likely. But I believe that if African modern humans were removed from the picture Neanderthals would have “caught up” and been positioned to begin the trajectory we find ourselves in during the current Holocene inter-glacial.

Luke Jostins’ figure showing across-the-board encephalization

The data indicate that all human lineages were subject to increased encephalization. That process trailed off ~200,000 years ago, but it illustrates the general evolutionary pressures, ratchets, or evolutionary “logic”, that applied to all of them. Overall there were some general trends in the hominin lineage that began to characterize us about a million years ago. We pushed into new territory. Our rate of cultural change seems to have gradually increased across our whole range.

One of the major holy grails I see now and then in human evolutionary genetics is to find “the gene that made us human.” The scramble is definitely on now that more and more whole genome sequences from ancient hominins are coming online. But I don’t think any such gene will ever be found. There isn’t “a gene,” but a broad set of genes which were gradually selected upon in the process of making us human.

In the lingo, it wasn’t just a hard sweep from a de novo mutation. It was as much, or even more, soft sweeps from standing variation.

April 20, 2017

Aryan marauders from the steppe came to India, yes they did!

Filed under: Genetics,Genomics,History,India — Razib Khan @ 10:21 pm

It seems every post on Indian genetics elicits dissents from loquacious commenters who are woolly on the details of the science, but convinced in their opinions (yes, they operate through uncertainty and obfuscation in their rhetoric, but you know where the axe is lodged). This post is an attempt to answer some questions so I don’t have to address this in the near future, as ancient DNA papers will finally start to come out soon, I hope (at least earlier than Winds of Winter).

In 2001’s The Eurasian Heartland: A continental perspective on Y-chromosome diversity Wells et al. wrote:

The current distribution of the M17 haplotype is likely to represent traces of an ancient population migration originating in southern Russia/Ukraine, where M17 is found at high frequency (>50%). It is possible that the domestication of the horse in this region around 3,000 B.C. may have driven the migration (27). The distribution and age of M17 in Europe (17) and Central/Southern Asia is consistent with the inferred movements of these people, who left a clear pattern of archaeological remains known as the Kurgan culture, and are thought to have spoken an early Indo-European language (27, 28, 29). The decrease in frequency eastward across Siberia to the Altai-Sayan mountains (represented by the Tuvinian population) and Mongolia, and southward into India, overlaps exactly with the inferred migrations of the Indo-Iranians during the period 3,000 to 1,000 B.C. (27). It is worth noting that the Indo-European-speaking Sourashtrans, a population from Tamil Nadu in southern India, have a much higher frequency of M17 than their Dravidian-speaking neighbors, the Yadhavas and Kallars (39% vs. 13% and 4%, respectively), adding to the evidence that M17 is a diagnostic Indo-Iranian marker. The exceptionally high frequencies of this marker in the Kyrgyz, Tajik/Khojant, and Ishkashim populations are likely to be due to drift, as these populations are less diverse, and are characterized by relatively small numbers of individuals living in isolated mountain valleys.

In a 2002 interview with the Indian site Rediff, the first author was more explicit:

Some people say Aryans are the original inhabitants of India. What is your view on this theory?

The Aryans came from outside India. We actually have genetic evidence for that. Very clear genetic evidence from a marker that arose on the southern steppes of Russia and the Ukraine around 5,000 to 10,000 years ago. And it subsequently spread to the east and south through Central Asia reaching India. It is on the higher frequency in the Indo-European speakers, the people who claim they are descendants of the Aryans, the Hindi speakers, the Bengalis, the other groups. Then it is at a lower frequency in the Dravidians. But there is clear evidence that there was a heavy migration from the steppes down towards India.

But some people claim that the Aryans were the original inhabitants of India. What do you have to say about this?

I don’t agree with them. The Aryans came later, after the Dravidians.

Over the past few years I’ve gotten to know the above first author Spencer Wells as a personal friend, and I think he would be OK with me relaying that to some extent he was under strong pressure to downplay these conclusions. Not only were, and are, these views not popular in India, but the idea of mass migration was in bad odor in much of the academy during this period. Additionally, there was later work which was less clear, and perhaps supported an Indian origin for R1a1a. Spencer himself told me that it was not impossible for R1a to have originated in India, but a branch eventually back-migrated to southern Asia.

But even researchers from the group at Stanford where he had done his postdoc did not support this model by the mid-2000s: see Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. In 2009 a paper out of an Indian group was even stronger in its conclusion for a South Asian origin of R1a1a: The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system.

By 2009 one might have admitted that perhaps Spencer was wrong. I was certainly open to that possibility. There was very persuasive evidence that the mtDNA lineages of South Asia had little to do with Europe or the Middle East.

Yet a closer look at the above papers reveals two major systematic problems.

First, ancient DNA has made it clear that there has been major population turnover during the Holocene, but this was not the null hypothesis in the 2000s. Looking at extant distributions of lineages can give one a distorted view of the past. Frankly, the 2009 Indian paper was egregious in this way because they included Turkic groups in their Central Asian data set. Even in 2009 there was a whole lot of evidence that Central Asian Turkic groups were likely very different from Indo-European Turanian populations which would have been the putative ancestors of Indo-Aryans. Honestly the authors either consciously loaded the dice to reduce the evidence for gene flow from Central Asia, or they were ignorant (the nature of the samples is much clearer in the supplements than the primary text for what it’s worth).

Second, Y chromosomal marker sets in the 2000s were constrained to fast mutating microsatellite regions or fewer than 100 variant SNPs on the Y. Because it is so repetitive the Y chromosome is hard to sequence, and it really took the technologies of the last ten years to get it done. Both the above papers estimate the coalescence of extant R1a1a lineages to be 10-15,000 years before the present. In particular, they suggest that European and South Asian lineages date back to this period, pushing back any possible connection between the groups, and making it possible that European R1a1a descended from a South Asian founder group which was expanding after the retreat of the ice sheets. The conclusions were not unreasonable based on the methods they had. But now we have better methods.*

Whole genome sequencing of the Y, as well as ancient DNA, seems to falsify the above dates. Though microsatellites are good for very coarse-grained phylogenetic inferences, one has to be very careful about them when looking at more fine-grained population relationships (they are still useful in forensics to cheaply differentiate between individuals, since they accumulate variation very quickly). They mutate fast, and their clock may be erratic.

Additionally, diversity estimates were based on a subset of SNPs that were clearly not robust. R1a1a is not diverse anywhere, though basal lineages seem to be present in ancient DNA on the Pontic steppe in some cases.

To show how lacking in diversity R1a1a is, here are the results of a 2016 paper which performed whole genome sequencing on the Y. Instead of relying on the order of 10 to 100 SNPs, this paper discovered over 65,000 Y variants worldwide. Notice how little difference there is between different South Asian groups below, indicative of a massive population expansion relatively recently in time which didn’t even have time to exhibit regional population variation. They note that “The most striking are expansions within R1a-Z93 [the South Asian clade], ~4.0–4.5 kya. This time predates by a few centuries the collapse of the Indus Valley Civilization, associated by some with the historical migration of Indo-European speakers from the western steppes into the Indian sub-continent.”

(BEB = Bengali, GIH = Gujarati, PJL = Punjabi, STU = Sri Lanka Tamil, ITU = Indian Telugu)

The spatial distribution of Z93 lineages of R1a is as you can see to the left. There are branches in South Asia, Central Asia, and in the Altai region. Ancient DNA from Bronze Age Mongolia has found Z93. Modern Mongolians clearly have a small, but appreciable, fraction of West Eurasian ancestry. Some also carry R1a1a. Z93 has also been found in North-Central Asian steppe samples that date to ~4,500 years before the present.

Today with ancient DNA we’re discovering individuals who lived around the time of the massive expansion alluded to above. What are these individuals like? They are a mix of European, Central Eurasian, Near Eastern, and Siberian. Many of them share quite a bit of ancestry with South Asian populations, in particular those from the northwest of the subcontinent, as well as upper castes more generally.

A new paper using ancient DNA from Scythians (Iranian speakers) also shows that they carried Z93. Some of them had East Asian admixture. These were the ones from the eastern steppe. So not entirely surprising. In the supplements of the paper they have an admixture plot with many populations. At K = 15 in supplementary figure 14 you see many ancient Central Eurasian populations run against modern groups. At this K there is a South Asian modal cluster which is found in South Asians as well as nearby Iranian groups from Afghanistan.

It is not light green or dark blue. You can see that this salmon color is modal in tribal South Indian populations, or non-Brahmin South Indians. It drops in frequency as you move north and west, and as you move up the caste ladder. Observe that it is present even among the relatively isolated Kalash people of Chitral.

Outside of South Asia-Afghanistan, this salmon component is found among Thai and Cambodians. From talking to various researchers, and recent published findings, it seems clear that this signature is not spurious, but is indicative of some migration from South Asia to Southeast Asia in the historical period, as one might infer based on cultural affinities. It is also found at lower frequencies among the Uyghur of Xinjiang. This is not entirely surprising either. This region of the Tarim basin was connected to Kashmir across the Pamirs. The 4th century Buddhist monk from the Tarim basin city of Kucha, who was instrumental in the translation of texts into Chinese, Kumārajīva, may have had a Kashmiri father.

Even before Islam much of Northwest India and Central Asia were under the rule of the same polity, and after Islam there is extensive record of the enslavement of many Indians in the cities of the eastern Islamic world, as well as the travel of some Indian merchants and intellectuals into these regions.

And yet this South Asia cluster is not present in the ancient steppe samples carrying R1a1a-Z93. None of them to my knowledge. Many ancient samples share ancestry with South Asians. For example it seems that many ancient West Asian samples from Iran share common history as evident in genetic drift patterns with many South Asians. And, there is good evidence that a subset of South Asians, skewed toward northwest and upper caste groups, share drift with steppe Yamna samples. But South Asians are often clearly composites of these exogenous populations and an indigenous component with affinities with Andaman Islanders, and more distantly Southeast Asians and other eastern non-Africans.

How can you reconcile this with migration out of South Asia? The path is found in publications such as Genetic Evidence for Recent Population Mixture in India. Here you have a paper which models mixing between Ancestral North Indians (ANI) and Ancestral South Indians (ASI). The ANI would be the source population for the ancestry shared with West Eurasians. And, they would lack ASI ancestry because the mixing had not occurred. The admixture dates in the paper are between two and four thousand years before the present.

There is a problem though. These methods detect the last admixture events. Therefore, they are a lower bound on major mixing events, not a record of when there was no mixing. Secondarily, but not less importantly, recent work indicates that because of the pulse admixture simplification these methods likely underestimate the time period of admixture.

Another issue for me is the idea that ANI and ASI could be so separate within India. If ANI is the source of gene flow into other parts of Eurasia from South Asia, then I believe that ASI is intrusive to the subcontinent. I don’t think that ASI being intrusive is so implausible. Southeast Asia has undergone massive genetic changes over the Holocene, and it may be that there was much more ASI ancestry in places like Burma before the arrival of Austro-Asiatic rice farmers. The presence of Austro-Asiatic languages in northeast India and central India shows a precedent of migration from Southeast Asia into the subcontinent.

In sum, the balance of evidence suggests male mediated migration into South Asia from Central Asia on the order of ~4-5,000 years ago. There are lots of details to be worked out, and this is not an assured model in terms of data, but it is the most likely. In the near future ancient DNA will clear up confusions. Writing very long but confused comments just won’t change this state of affairs. New data will.

Addendum: Indian populations have finally been relatively well sampled, thanks to Mait Metspalu’s group in Estonia, David Reich’s lab, the Indian collaborators of both, and the 1000 Genomes (HGDP gave us Pakistanis). Additionally, Zack Ajmal’s Harappa website did some work filling in some holes in the early 2010s.

* A Facebook argument broke out about one of my posts where one interlocutor asserted that he leaned on papers from the late 2000s, not all the new stuff. That’s obviously because the new stuff did not support his preferred position, while the old stuff did. I would prefer that faster-than-light travel were possible, so I’ll just stick to physics before 1910?

April 19, 2017

Mouse fidelity comes down to the genes

Filed under: Genetics,Genomics,Human Genetics — Razib Khan @ 10:02 pm

While birds tend to be at least nominally monogamous, this is not the case with mammals. This strikes some people as strange because humans seem to be monogamous, at least socially, and often we take ourselves to be typically mammalian. But of course we’re not. Like many primates we’re visual creatures, rather than relying on smell and hearing. Obviously we’re also bipedal, which is not typical for mammals. And, our sociality scales up to massive agglomerations of individuals.

How monogamous we are is up for debate. Desmond Morris, who is well known to many from his roles in television documentaries, has been a major promoter of the idea that humans are monogamous, with a focus on pair-bonds. In contrast, other researchers have highlighted our polygamous tendencies. In The Mating Mind Geoffrey Miller argues for polygamy, and suggests that pair-bonds in a pre-modern environment were often temporary, rather than lifetime (Miller is now writing a book on polyamory).

The fact that in many societies high status males seem to engage in polygamy, despite monogamy being more common, is one phenomenon which confounds attempts to quickly generalize about the disposition of our species. What is preferred may not always be what is practiced, and the external social adherence to norms may be quite violated in private.

Adducing behavior is simpler in many other organisms, because their range of behavior is more delimited. When it comes to studying mating patterns in mammals voles have long been of interest as a model. There are vole species which are monogamous, and others which are not. Comparing the diverged lineages could presumably give insight as to the evolutionary genetic pathways relevant to the differences.

But North American deer mice, Peromyscus, may turn out to be an even better bet: there are two lineages which exhibit different mating patterns and which are phylogenetically close enough that they can interbreed. That is crucial, because it allows one to generate crosses and see how the characteristics distribute themselves across subsequent generations. Basically, it allows for genetic analysis.

And that’s what a new paper in Nature does, The genetic basis of parental care evolution in monogamous mice. In figure 3 you can see the distribution of behaviors in parental generations, F1 hybrids, and the F2, which is a cross of F1 individuals. The widespread distribution of F2 individuals is likely indicative of a polygenic architecture of the traits. Additionally, they found that some traits are correlated with each other in the F2 generation (probably due to pleiotropy, the same gene having multiple effects), while others were independent.

With the F2 generation they ran a genetic analysis which looked for associations between traits and regions of the genome. They found 12 quantitative trait loci (QTLs), basically zones of the genome associated with variation on one or more of the six traits. From this analysis they immediately realized there was sexual dimorphism in terms of the genetic architecture; the same locus might have a different effect in the opposite sex. This is evolutionarily interesting.

Because the QTLs are rather large in terms of physical genomic units the authors looked to see which were plausible candidates in terms of function. One of their hits was vasopressin, which should be familiar to many from vole work, as well as some human studies. Though the QTL work as well as their pup-switching experiment (which I did not describe) is persuasive, the fact that a gene you’d expect shows up as a candidate really makes it an open and shut case.

The extent of the variation explained by any given QTL seems modest. In the extended figures you can see it’s mostly in the 1 to 5 percent range. In Carl Zimmer’s excellent write up he ends:

But Dr. Bendesky cautioned that the vasopressin gene would probably turn out to be just one of many that influence oldfield mice. Though it is strongly linked to parental behavior, the vasopressin gene accounts for 6.7 percent of the variation in nest building among males, and only 2.9 percent among females.

The genetic landscape of human parenting will turn out to be even more rugged, Dr. Bendesky predicted.

“You cannot do a 23andMe test and find out if your partner is going to be a good father,” he said.

Sort of. The genetic architecture above is polygenic…but not incredibly diffuse. The proportion of variation explained by the largest effect allele is more than for height, and far more than for education. If human research follows up on this, I wouldn’t be surprised if you could develop a polygenic risk score.

But I don’t have a good intuition on how much variation in humans there really is for these sorts of traits that are heritable. I assume some. But I don’t know how much. And how much of the variance in behavior might be explained by human QTLs? Humans don’t lick or build nests, or retrieve pups. Also, as one knows from Genetics and Analysis of Quantitative Traits sexually dimorphic traits take a long time to evolve. These are two deer mice species. Within humans there may not have been enough time for this sort of heritable complexity of behavior to evolve.

There are a lot of philosophical issues here about translating to a human context.

Nevertheless, this research shows that ingenious animal models can powerfully elucidate the biological basis of behavior.

Citation: The genetic basis of parental care evolution in monogamous mice. Nature (2017) doi:10.1038/nature22074

April 18, 2017

Women hate going to India

Filed under: Anthroplogy,Genetics,Human Genetics,India,Parsi — Razib Khan @ 9:11 pm

For some reason women do not seem to migrate much into South Asia. In the late 2000s I, along with others, noticed a strange discrepancy in the Y and mtDNA lineages which trace one’s direct male and female lines: in South Asia the male lineages were likely to cluster with populations to the north and west, while the female lines did not. South Asia’s female lines in fact had a closer relationship to the mtDNA lineages of Southeast and East Asia, albeit distantly.

One solution which presented itself was to contend there was no paradox at all. That the Y chromosomal lineages found in South Asia were basal to those to the west and north. In particular, there were some papers suggesting that perhaps R1a1a originated in South Asia at the end of the Pleistocene. Whole genome sequencing of Y chromosomes does not bear this out though. R1a1a went through rapid expansion recently, and ancient DNA has found it in Russia first. But in 2009 David Reich came out with Reconstructing Indian population history, which offered up somewhat of a possible solution.

What Reich and his coworkers found was that South Asia seems to be characterized by the mixture of two very different types of populations. One set, ANI (Ancestral North Indian), are basically another western or northwestern Eurasian group. ASI (Ancestral South Indian), are indigenous, and exhibit distant affinities to the Andaman Islanders. The India-specific mtDNA then were from ASI, while the Y chromosomes with affinities to people to the north and west were from ANI. In other words, the ANI mixture into South Asia was probably through a mass migration of males.

But this pattern of Y and mtDNA discordance is not limited to this one case. A minority of South Asians speak Austro-Asiatic languages. The most interesting of these populations are the Munda, who tend to occupy uplands in east-central India. Older books on Indian history often suggest that the Munda are the earliest aboriginals of the subcontinent, but that has to confront the fact that most Austro-Asiatic languages are spoken in Southeast Asia. There was no true consensus where they were present first.

Genetics seems to have solved this question. The evidence is building up that Austro-Asiatic languages arrived with rice farmers from Southeast Asia. Though most of the ancestry of the Munda is of ANI-ASI mix, a small fraction is clearly East Asian. And interestingly, though they carry no East Asian mtDNA, they do carry East Asian Y. Again, gene flow mediated by males.

The same is true of India’s Bene Israel Jewish community.

A new preprint on biorxiv confirms that the Parsis are another instance of the same dynamic: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection:

Zoroastrianism is one of the oldest extant religions in the world, originating in Persia (present-day Iran) during the second millennium BCE. Historical records indicate that migrants from Persia brought Zoroastrianism to India, but there is debate over the timing of these migrations. Here we present novel genome-wide autosomal, Y-chromosome and mitochondrial data from Iranian and Indian Zoroastrians and neighbouring modern-day Indian and Iranian populations to conduct the first genome-wide genetic analysis in these groups. Using powerful haplotype-based techniques, we show that Zoroastrians in Iran and India show increased genetic homogeneity relative to other sampled groups in their respective countries, consistent with their current practices of endogamy. Despite this, we show that Indian Zoroastrians (Parsis) intermixed with local groups sometime after their arrival in India, dating this mixture to 690-1390 CE and providing strong evidence that the migrating group was largely comprised of Zoroastrian males. By exploiting the rich information in DNA from ancient human remains, we also highlight admixture in the ancestors of Iranian Zoroastrians dated to 570 BCE-746 CE, older than admixture seen in any other sampled Iranian group, consistent with a long-standing isolation of Zoroastrians from outside groups. Finally, we report genomic regions showing signatures of positive selection in present-day Zoroastrians that might correlate to the prevalence of particular diseases amongst these communities.

The paper uses lots of fancy ChromoPainter methodologies which look at the distributions of haplotypes across populations. But some of the primary results are obvious using much simpler methods.

1) About 2/3 of the ancestry of Indian Parsis derives from an Iranian population
2) About 1/3 of the ancestry of Indian Parsis derives from an Indian population
3) Almost all the Y chromosomes of Indian Parsis can be accounted for by Iranian ancestry
4) Almost all the mtDNA haplogroups of Indian Parsis can be accounted for by Indian ancestry
5) Iranian Zoroastrians are mostly endogamous
6) Genetic isolation has resulted in drift and selection on Zoroastrians

The fact that the ancestry proportion is clearly more than 50% Iranian for Parsis indicates that there was more than one generation of males who migrated. They did not contribute mtDNA, but they did contribute genome-wide to Iranian ancestry. There are wide intervals on the dating of this admixture event, but they are consonant with the oral history that was later written down by the Parsis.
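To make the "more than one generation" logic concrete, here is a toy sketch. It is not the method of the preprint, and the function names and the two-generation setup are my own assumptions. If Iranian men marry Indian women once and the community is endogamous afterward, autosomal ancestry stays at 50% Iranian. Getting to about 2/3 requires further Iranian fathers marrying into the already mixed group, and in this toy model that leaves the mtDNA Indian and the Y chromosomes Iranian.

```python
from fractions import Fraction


def child_ancestry(father_pool, mother_pool):
    """Expected autosomal Iranian fraction in children: the mean of both parental pools."""
    return (father_pool + mother_pool) / 2


def iranian_fraction_after_second_wave(migrant_father_share):
    """Iranian autosomal fraction after two generations of male-only migration.

    Generation 1: Iranian fathers (fraction 1) and Indian mothers (fraction 0).
    Generation 2: a share of the fathers are new Iranian migrants; the remaining
    fathers and all mothers come from generation 1. (Hypothetical setup.)
    """
    gen1 = child_ancestry(Fraction(1), Fraction(0))
    fathers = migrant_father_share * Fraction(1) + (1 - migrant_father_share) * gen1
    return child_ancestry(fathers, gen1)


single_wave = child_ancestry(Fraction(1), Fraction(0))
two_waves = iranian_fraction_after_second_wave(Fraction(2, 3))
print("one wave of migrant men:", single_wave)
print("second wave supplying 2/3 of fathers:", two_waves)
```

Many other migration schedules would give the same proportion; the point is only that a single pulse of men cannot push the autosomal fraction past one half.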

So there you have it. Another example of a population formed from admixture because women hate going to India.

Citation: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection.
Saioa Lopez, Mark G Thomas, Lucy van Dorp, Naser Ansari-Pour, Sarah Stewart, Abigail L Jones, Erik Jelinek, Lounes Chikhi, Tudor Parfitt, Neil Bradman, Michael E Weale, Garrett Hellenthal
bioRxiv 128272; doi: https://doi.org/10.1101/128272

April 15, 2017

Genetic variation in human populations and individuals

Filed under: Genetics,Genomics,Human Genetics,Polymorphisms,SNPs — Razib Khan @ 9:25 pm

I’m old enough to remember when we didn’t have a good sense of how many genes humans had. I vaguely recall numbers around 100,000 at first, which in hindsight seems rather like a round and large number. A guess. Then it went to 40,000 in the early 2000s and then further until it converged to some number just below 20,000.

But perhaps more fascinating is that we have a much better catalog of the variation across the whole human genome now. Often friends ask me questions of the form: “so DTC genomic company X has about 800,000 SNPs, is that enough to do much?” To answer such a question you need some basic numbers in your head, as well as what you want to “do.”

First, the human genome has about 3 billion base pairs (3 Gb). That’s a lot. But most of the genome famously doesn’t code for proteins. The exome, the portion of the genome where bases directly translate into a protein, accounts for about 1% of the whole genome. That’s 30 million bases (30 Mb). But this small region of the genome is very important, as the vast majority of major disease mutations are found in the exome.

When it comes to a standard 800K SNP chip, which samples 800,000 positions across the 3 Gb genome, it is likely that the designers enriched the marker set for functional positions relevant to diseases. Not all marker positions are created equal. Though even outside of those functional positions there are often nearby SNPs that can “tag” them, so you can infer one from the state of the other.
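How well one SNP "tags" another is usually summarized by the squared correlation between the two sites, r². A minimal sketch, assuming phased haplotypes coded 0/1; the toy haplotype counts are invented for illustration and come from no real data set.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2) pairs coded 0/1.
    Both sites must be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical samples of 100 haplotypes each.
perfect_tag = [(1, 1)] * 30 + [(0, 0)] * 70
weak_tag = [(1, 1)] * 20 + [(1, 0)] * 10 + [(0, 1)] * 30 + [(0, 0)] * 40

print("perfect tag r^2:", round(r_squared(perfect_tag), 3))
print("weak tag r^2:", round(r_squared(weak_tag), 3))
```

When r² is close to 1 the genotyped marker tells you nearly everything about the untyped one; when it is close to 0 it tells you almost nothing.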

But are 800,000 positions enough to make good ancestry inference (to give one example)? Yes. 800,000 is actually a substantial proportion of the polymorphism in any given genome. There have been some papers which improved on the numbers in 2015’s A global reference for human genetic variation, but it’s still a good comprehensive review to get an order-of-magnitude sense. The table below gives you a sense of individual variation:

Median autosomal variant sites per genome

When it comes to single nucleotide polymorphisms (SNPs), what SNP chips are getting at, an 800K array should get a substantial proportion of your genome-wide variation. More than enough for ancestry inference or forensics. The singleton column shows mutations specific to the individual. When focusing on new mutations specific to an individual that might cause disease, singleton large deletions and nonsynonymous SNPs are really where I’d look.

But what about whole populations? The plot to the left shows the count of variants as a function of alternative allele frequency. When we say “SNP”, we really mean variants which exhibit polymorphism at a particular cut-off frequency for the minor allele (often 1%). It is clear that as the minor allele frequency increases in relation to the human reference genome the number of variants decreases.

From the paper:

The majority of variants in the data set are rare: ~64 million autosomal variants have a frequency <0.5%, ~12 million have a frequency between 0.5% and 5%, and only ~8 million have a frequency >5% (Extended Data Fig. 3a). Nevertheless, the majority of variants observed in a single genome are common: just 40,000 to 200,000 of the variants in a typical genome (1–4%) have a frequency <0.5% (Fig. 1c and Extended Data Fig. 3b). As such, we estimate that improved rare variant discovery by deep sequencing our entire sample would at least double the total number of variants in our sample but increase the number of variants in a typical genome by only ~20,000 to 60,000.

An 800K SNP chip will be biased toward the 8 million or so variants with a frequency above 5%. This number gives you a sense of the limited scope of variation in the human genome: those 8 million sites amount to about 0.27% of the genome, and they capture a lot of the polymorphism.
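The back-of-the-envelope numbers in this post are easy to reproduce. A small sketch using only the round figures quoted above; the variable names are mine.

```python
GENOME_BP = 3_000_000_000      # ~3 Gb human genome
CHIP_SNPS = 800_000            # markers on a typical DTC array
COMMON_VARIANTS = 8_000_000    # variants above 5% frequency, per the 1000 Genomes paper

exome_bp = GENOME_BP // 100    # the exome is roughly 1% of the genome
chip_share_of_genome = CHIP_SNPS / GENOME_BP
common_share_of_genome = COMMON_VARIANTS / GENOME_BP
chip_share_of_common = CHIP_SNPS / COMMON_VARIANTS

print(f"exome size: {exome_bp:,} bp")
print(f"chip positions as share of genome: {chip_share_of_genome:.3%}")
print(f"common variants as share of genome: {common_share_of_genome:.2%}")
print(f"chip positions as share of common variants: {chip_share_of_common:.0%}")
```

The last ratio is an upper bound on coverage of common variation by direct genotyping, since not every chip marker is a common variant; tagging and imputation do the rest.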

Citation: 1000 Genomes Project Consortium. “A global reference for human genetic variation.” Nature 526.7571 (2015): 68-74.

April 14, 2017

Why overdominance probably isn’t responsible for much polymorphism

Filed under: Genetics,Population genetics — Razib Khan @ 10:54 pm

Hybrid vigor is a concept that many people have heard of, because it is very useful in agricultural genetics, and makes some intuitive sense. Unfortunately it often gets deployed in a variety of contexts, and its applicability is often overestimated. For example, many people seem to think (from personal communication) that it may somehow be responsible for the genetic variation around us.

This is just not so. As you may know each human carries millions of genetic variants within their genome. Populations have various levels of polymorphism at particular positions in the genome. How’d they get there? In the early days of population genetics there were two broad schools, the “balance” and “classical.” The former made the case for the importance of balancing selection in maintaining variation. The latter suggested that the variation we see around us is simply a transient between fixation of a favored mutation from a low frequency or extinction of a disfavored variant (perhaps environmental conditions changed and a high frequency variant is now disfavored). Arguably the rise of neutral theory and empirical results from molecular evolution supported the classical model more than the balance framework (at least this was Richard Lewontin’s argument, and I follow his logic here).

But even in relation to alleles which are maintained at polymorphism through balancing selection, overdominance isn’t going to be the major player.

Sickle cell disease is a classic consequence of overdominance; the heterozygote is more fit than either the wild-type homozygote or the homozygote for the mutation, which suffers the recessive disease. Obviously polymorphism is maintained despite the decreased fitness of the mutant homozygote because the heterozygote is so much more fit than the wild type. The final proportion of the alleles segregating in the population will be conditional on the fitness drag of the homozygote in the mutant type, because as per HWE it will be present in the population at a frequency of ~q².

The problem is that this is clearly not going to scale across loci. That is, even if the fitness drag is more minimal than is the case with the sickle cell locus, one can imagine a cumulative situation. The segregation load is just going to be too high. Overdominance is probably a transient strategy which fades away as populations evolve more efficient ways to adapt that don’t have such a fitness load.
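The scaling problem can be sketched with the standard one-locus overdominance model. The assumptions are mine: genotype fitnesses of 1 - s, 1 and 1 - t for the wild-type homozygote, the heterozygote and the mutant homozygote; random mating; and independent loci whose fitness effects multiply. The selection coefficients are illustrative, not estimates for any real locus.

```python
def equilibrium_and_load(s, t):
    """Equilibrium mutant-allele frequency and segregation load at one locus.

    Fitnesses: wild-type homozygote 1 - s, heterozygote 1, mutant homozygote 1 - t.
    """
    q_hat = s / (s + t)          # equilibrium frequency of the mutant allele
    load = s * t / (s + t)       # drop in mean fitness relative to the heterozygote
    return q_hat, load


def mean_fitness_across_loci(s, t, n_loci):
    """Mean fitness if n_loci independent overdominant loci combine multiplicatively."""
    _, load = equilibrium_and_load(s, t)
    return (1 - load) ** n_loci


# Illustrative values only: a sickle-cell-like locus, then many weak loci.
q_hat, load = equilibrium_and_load(s=0.1, t=0.8)
print(f"strong locus: q_hat = {q_hat:.3f}, load = {load:.3f}")
for n_loci in (10, 100, 1000):
    w = mean_fitness_across_loci(s=0.01, t=0.01, n_loci=n_loci)
    print(f"{n_loci} weak loci: mean fitness = {w:.4f}")
```

Even with a 1% disadvantage for each homozygote, a thousand such loci leave mean fitness at well under 1% of the all-heterozygote ideal, which is the sense in which the load is too high.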

So how does balancing selection still lead to variation without heterozygote advantage? W. D. Hamilton argued that much of it was due to negative frequency dependent selection. Co-evolution with pathogens is the best case of this. As strategies get common pathogens adapt, so rare strategies encoded by rare alleles gain in fitness. As these alleles increase in frequency their fitness decreases because pathogens adapt to them. Their frequency declines, and eventually the pathogens lose the ability to counter them, and their frequency increases again.
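A minimal sketch of negative frequency dependence, assuming a haploid two-allele model in which each allele's fitness declines linearly with its own frequency. The functional form and the value of s are assumptions for illustration, not Hamilton's model. This version has no time lag for pathogen adaptation, so it settles at an intermediate frequency rather than cycling, but it shows why neither allele is lost.

```python
def next_frequency(p, s):
    """One generation of selection where being common is costly."""
    w_a = 1 - s * p            # fitness of the focal allele falls as it spreads
    w_b = 1 - s * (1 - p)      # same rule for the alternative allele
    mean_w = p * w_a + (1 - p) * w_b
    return p * w_a / mean_w


def trajectory(p0, s, generations):
    """Allele frequency over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


rare_start = trajectory(p0=0.05, s=0.5, generations=200)
common_start = trajectory(p0=0.95, s=0.5, generations=200)
print("rare allele, first five generations:", [round(p, 3) for p in rare_start[:5]])
print("final frequencies:", round(rare_start[-1], 3), round(common_start[-1], 3))
```

Whether the allele starts rare or common, it is pulled back toward the middle, so polymorphism persists without any heterozygote advantage.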

What if you call for a revolution and no one revolts?

Filed under: EES,Evolution,Genetics,Neo-Darwinian Synthesis — Razib Khan @ 3:30 pm

When I was in 8th grade my earth science teacher explained he did not believe in Darwinism. He seemed a reasonable fellow so my first reaction was shock. My best friend at the time, who sat next to me, laughed, “Yeah, some people believe we’re descended from monkeys! Crazy, huh?” I didn’t really know what to say. But what followed was even more confusing to me: my teacher explained that he accepted punctuated equilibrium, not Darwinism. He did not elaborate much beyond this, though I tried to get at what he believed after class in the few minutes I had.

Later on I realized that he had drunk deeply at the well of Stephen Jay Gould, paleontologist and polymath. I will quote Richard Lewontin, Gould’s longtime collaborator and friend:

Now I should warn you about my prejudices. Steve and I taught evolution together for years and in a sense we struggled in class constantly because Steve, in my view, was preoccupied with the desire to be considered a very original and great evolutionary theorist. So he would exaggerate and even caricature certain features, which are true but not the way you want to present them. For example, punctuated equilibrium, one of his favorites. He would go to the blackboard and show a trait rising gradually and then becoming completely flat for a while with no change at all, and then rising quickly and then completely flat, etc. which is a kind of caricature of the fact that there is variability in the evolution of traits, sometimes faster and sometimes slower, but which he made into punctuated equilibrium literally. Then I would have to get up in class and say “Don’t take this caricature too seriously. It really looks like this…” and I would make some more gradual variable rates. Steve and I had that kind of struggle constantly. He would fasten on a particular interesting aspect of the evolutionary process and then make it into a kind of rigid, almost vacuous rule, because—now I have to say that this is my view—I have no demonstration of it—that Steve was really preoccupied by becoming a famous evolutionist.

Gould succeeded, after a fashion. His reputation within evolutionary biology is mixed, at best. Just look at what someone who thinks he made genuine original contributions to science admits above. But in the mind of the public Stephen Jay Gould was an oracle of sorts.

A revolution is sexy. A revolution sells. Having read both of them, I would say that Richard Dawkins is the better stylist when compared to Gould. Additionally, though some might disagree with this, Dawkins is closer to the mainline of the modern evolutionary biological tradition than Gould. But in the United States Gould far overshadowed Dawkins…until the latter began to make a name for himself as an anti-religion polemicist in the 2000s. Revolution. Controversy. They’re salient. The press eats it up, and the public trusts the press.

And some things never change. Every few years there is an impending “revolution” in evolutionary biology or genetics. But the revolution is mostly in the minds of a few journalists, and a public that reads a little too much into a puff piece here and there. The sort of well educated public woolly on what the “central dogma” is, but clear that it has been overthrown.

Sometimes this gets out of control. Suzan Mazur’s The Altenberg 16: An Exposé of the Evolution Industry is probably the weirdest instance of this genre of “the sky is falling in evolutionary theory!” But of late some scholars have been coming out with more sober critiques, arguing that the Neo-Darwinian Synthesis needs to be extended or modified significantly. Kevin Laland’s Darwin’s Unfinished Symphony: How Culture Made the Human Mind is the latest instance of this, but you can also read Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of Life. You can also read David Dobbs’ sympathetic treatment from a few years back around this issue.

I can communicate to you what seems to be the majority view among the evolutionary biologists I know: there isn’t a need for a revolution in conceptual thought, just a working out of details and reallocation of resources. Many who are sympathetic to Kevin Laland’s argument still believe that it’s about emphases and semantics. There’s no reason to put out a clarion call that evolution needs to be rethought in its conceptual foundations.

Honestly I don’t know if there’s been much that is revolutionary since the original period of the synthesis. Perhaps the rise of molecular evolution and neutrality as a null hypothesis? But even I’m not sure about that.

Erik I. Svensson has put up a preprint which speaks for many people, On reciprocal causation in the evolutionary process. Read the whole thing, it’s thorough, and accessible to a lay audience. The main thing that is a bit surprising is the good word put in for The Dialectical Biologist, which I have heard is an interesting book:

Recent calls for a revision the standard evolutionary theory (ST) are based on arguments about the reciprocal causation of evolutionary phenomena. Reciprocal causation means that cause-effect relationships are obscured, as a cause could later become an effect and vice versa. Such dynamic cause-effect relationships raises questions about the distinction between proximate and ultimate causes, as originally formulated by Ernst Mayr. They have also motivated some biologists and philosophers to argue for an Extended Evolutionary Synthesis (EES). Such an EES will supposedly replace the Modern Synthesis (MS), with its claimed focus on unidirectional causation. I critically examine this conjecture by the proponents of the EES, and conclude, on the contrary, that reciprocal causation has long been recognized as important in ST and in the MS tradition. Numerous empirical examples of reciprocal causation in the form of positive and negative feedbacks now exists from both natural and laboratory systems. Reciprocal causation has been explicitly incorporated in mathematical models of coevolutionary arms races, frequency-dependent selection and sexual selection. Such feedbacks were already recognized by Richard Levins and Richard Lewontin, long before the call for an EES and the associated concept of niche construction. Reciprocal causation and feedbacks is therefore one of the few contributions of dialectical thinking and Marxist philosophy in evolutionary theory, and should be recognized as such. While reciprocal causation have helped us to understand many evolutionary processes, I caution against its extension to heredity and directed development if such an extension involves futile attempts to restore Lamarckian or soft inheritance.

April 13, 2017

The reality of cultural hitchhiking

Filed under: Anthroplogy,Cultural hitchhiking,Genetics,History — Razib Khan @ 2:55 pm

The figure to the left is from a paper, The mountains of giants: an anthropometric survey of male youths in Bosnia and Herzegovina, which attempts to explain why the people from the uplands of the western Balkans are so tall. Anyone who has watched high level basketball, or perused old physical anthropology textbooks, knows that average heights in the Dinaric Alps are quite high in comparison to the rest of Europe, matched only in the region around Scandinavia. The Dutch of late have been the world champions in height, and explanations such as recent selection and their high consumption of dairy products have been given. In this paper the authors point out that the people who live in the Dinaric uplands are not a population which consumes an inordinately high protein diet, at least in relation to their neighbors.

Rather, they suggest that the height of the people who reside in the Dinarics is due to a genetic factor. There is now good genomic evidence that selection accounts for at least some of the difference in height between Northern and Southern Europeans. That is, it seems that there have been divergent pressures in these two locales, their genetic differences due to historical demography aside.

The exception to this north-south gradient is obviously in the Dinarics. Another way in which the Dinarics are exceptional is that they have the highest frequency of Y chromosomal haplogroup I. The other mode of haplogroup I is in Scandinavia. I1 is common among people who live in Sweden, while I2 is common among the peoples of the western Balkans. I has an interesting history because the vast majority of Mesolithic hunter-gatherer males in Europe belong to this haplogroup. It is very rare outside of Europe. This is in contrast to the other major European haplogroups, which are found outside of Europe at appreciable frequencies.

It is likely that I is indicative of a lineage whose roots in Europe go back to the late Pleistocene period after the Last Glacial Maximum ~20,000 years ago. As the world warmed ~10,000 years ago small populations of hunter-gatherers rapidly expanded from their refuges and either most of the males were I, or in the drift process on the edge of the wave of advance I became very common. It is plausible that in terms of alleles which account for variation in height these hunter-gatherers were enriched for those conferring larger size. Cold weather populations tend to be larger. Additionally, they probably consumed a relatively diversified but high protein diet, allowing for greater median size than among farmers at the Malthusian carrying capacity.

But, there has been a lot of selection over the past 10,000 years, and I am skeptical that this correlation between I and height in Europe is anything but a coincidence. Rather, the phylogeny which I exhibits brings me to another issue which I think is not often highlighted: I1 in particular may have “hitchhiked” with the exogenous lineages such as R1b and R1a in early Indo-European society.

That is, in the patrilineal descent groups expanding across the landscape and monopolizing access to resources and mates, the non-invasive I somehow integrated themselves into the broader cultural complex, and partook in the plenty. Like R1b and R1a it exhibits a rake-like topology which suggests rapid recent expansion.

This would not be exceptional. The modern Russian state’s origins are in the polities created by the Kievan Rus, who were famously Scandinavian. Rurik was by origin a Swede, and his dynasty eventually came to encompass most of the eastern Slavic peoples, and rule over the Russian people and state until the 17th century. Because there were so many descendants of this dynasty it was possible to adduce its Y chromosomal haplogroup, N1c1. The kicker is that this is clearly a Finnic lineage, with the most recent evidence being that it is a remnant of a recent migration out of Siberia to the west. The implication here is that the direct male lineage of Rurik was assimilated into the Scandinavian culture and power structure, and its members were possibly chieftains of Finnic tribes somewhere along the Baltic littoral.

Another example is the House of Wessex. Alfred the Great is arguably the first true king of England. Here are the names of some of the earlier monarchs of the House of Wessex: Ceawlin, Cynric, and Cynegils. Even someone without a background in historical linguistics may be curious about whether these are Anglo-Saxon names, and there is a line of thinking that perhaps the forebears of Alfred were British warlords, who “went Saxon,” in a fashion analogous to Gallo-Roman aristocrats who assimilated to Frankish-Germanic norms and forms in the 6th and 7th centuries in the Merovingian domains.

Overall what you see in the genetic data are many things, but rarely a straightforward story. Just as genes can impact culture (e.g., lactase persistence), so culture impacts the distribution of genes. Just as human polities are coalitions, so genetic lineages themselves in their distribution and evolutionary history exhibit fingerprints of these past socio-political events and ideas.

April 12, 2017

Fisherianism in the genomic era

Filed under: Evolutionary Genetics,Genetics — Razib Khan @ 1:07 am

There are many things about R. A. Fisher that one could say. Professionally he was one of the founders of evolutionary genetics and statistics, and arguably the second greatest evolutionary biologist after Charles Darwin. With his work in the first few decades of the 20th century he reconciled the quantitative evolutionary framework of the school of biometry with mechanistic genetics, and formalized evolutionary theory in The Genetical Theory of Natural Selection.

He was also an asshole. This is clear in the major biography of him, R.A. Fisher: The Life of a Scientist. It was written by his daughter. But The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century also seems to indicate he was a dick. And W. D. Hamilton’s Narrow Roads of Gene Land portrays Fisher as rather cold and distant, despite the fact that Hamilton idolized him.

Notwithstanding his unpleasant personality, R. A. Fisher seems to have been a veritable mentat in his early years. Much of his thinking crystallized in the first few decades of the 20th century, when genetics was a new science and mathematical methods were being brought to bear on a host of topics. It would be decades until DNA was understood to be the substrate of heredity. Instead of deriving from molecular first principles which were simply not known in that day, Fisher and his colleagues constructed a theoretical formal edifice which drew upon patterns of inheritance that were evident in lineages of organisms that they could observe around them (Fisher had a mouse colony which he utilized now and then to vent his anger by crushing mice with his bare hands). Upon that observational scaffold they placed a sturdy superstructure of mathematical formality. That edifice has been surprisingly robust down to the present day.

One of Fisher’s frameworks which still gives insight is the geometric model of the distribution of fitness of mutations. If an organism is near its optimum of fitness, then large jumps in any direction will reduce its fitness. In contrast, small jumps have some probability of getting closer to the optimum of fitness. In plainer language, mutations of large effect are bad, and mutations of small effect are not as bad.
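The geometric intuition is easy to check by simulation. A rough Monte Carlo sketch under my own assumptions: the phenotype is a point in a space of n traits, the optimum sits at the origin, and a mutation is a step of fixed length in a uniformly random direction. The parameter values are arbitrary.

```python
import math
import random


def prob_beneficial(effect_size, n_traits, distance_to_optimum=1.0, trials=20000, seed=0):
    """Fraction of random mutations of a given size that move closer to the optimum."""
    rng = random.Random(seed)
    improved = 0
    for _ in range(trials):
        # Uniformly random direction: normalize a vector of independent Gaussians.
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        norm = math.sqrt(sum(x * x for x in direction))
        # Current phenotype lies on the first axis; add the mutational step to it.
        mutant = [effect_size * x / norm for x in direction]
        mutant[0] += distance_to_optimum
        new_distance = math.sqrt(sum(x * x for x in mutant))
        if new_distance < distance_to_optimum:
            improved += 1
    return improved / trials


small = prob_beneficial(effect_size=0.05, n_traits=10)
large = prob_beneficial(effect_size=1.0, n_traits=10)
huge = prob_beneficial(effect_size=2.5, n_traits=10)
print("small step:", small)
print("large step:", large)
print("step longer than twice the distance to the optimum:", huge)
```

Tiny steps improve fitness a bit less than half the time, large steps rarely, and a step longer than twice the distance to the optimum can never help.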

A new paper in PNAS loops back to this framework, Determining the factors driving selective effects of new nonsynonymous mutations:

Our study addresses two fundamental questions regarding the effect of random mutations on fitness: First, do fitness effects differ between species when controlling for demographic effects? Second, what are the responsible biological factors? We show that amino acid-changing mutations in humans are, on average, more deleterious than mutations in Drosophila. We demonstrate that the only theoretical model that is fully consistent with our results is Fisher’s geometrical model. This result indicates that species complexity, as well as distance of the population to the fitness optimum, modulated by long-term population size, are the key drivers of the fitness effects of new amino acid mutations. Other factors, like protein stability and mutational robustness, do not play a dominant role.

In the title of the paper itself is something that would have been alien to Fisher’s understanding when he formulated his geometric model: the term “nonsynonymous” to refer to mutations which change the amino acid corresponding to the triplet codon. The paper is understandably larded with terminology from the post-DNA and post-genomic era, and yet comes to the conclusion that a nearly blind statistical geneticist from about a century ago correctly adduced the nature of mutation’s effects on fitness in organisms.

The authors focused on two primary species with different histories, both well characterized in the evolutionary genomic literature: humans and Drosophila. The models they tested are as follows:


Basically they checked the empirical distribution of the site frequency spectra (SFS) of the nonsynonymous variants against expected outcomes based on particular details of demographics, which were inferred from synonymous variation. Drosophila have effective population sizes orders of magnitude larger than humans, so if that is not taken into account, then the results will be off. There are also a bunch of simulations in the paper to check for robustness of their results, and they also caveat the conclusion with admissions that other models besides the Fisherian one may play some role in their focal species, and more in other taxa. A lot of this strikes me as accruing through the review process, and I don’t have the time to replicate all the details to confirm their results, though I hope some of the reviewers did so (again, I suspect that the reviewers were demanding some of these checks, so they definitely should have in my opinion).
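For readers who have not met the site frequency spectrum before, here is a toy sketch of what it is. It is not the paper's pipeline: the derived-allele counts are made up, and the neutral expectation assumes a constant-size population (expected count in class i proportional to 1/i), whereas the authors fit a demographic model to synonymous sites.

```python
def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Count segregating sites by how many sampled chromosomes carry the derived allele."""
    sfs = [0] * (n_chromosomes - 1)
    for count in derived_counts:
        if 0 < count < n_chromosomes:  # skip sites that are fixed or absent in the sample
            sfs[count - 1] += 1
    return sfs


def neutral_expectation(n_chromosomes, n_segregating):
    """Expected SFS under neutrality and constant size: class i is proportional to 1/i."""
    weights = [1 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [n_segregating * w / total for w in weights]


# Hypothetical counts for 100 nonsynonymous sites in a sample of 10 chromosomes.
toy_counts = ([1] * 60 + [2] * 15 + [3] * 8 + [4] * 5 + [5] * 4
              + [6] * 3 + [7] * 3 + [8] * 1 + [9] * 1)
observed = site_frequency_spectrum(toy_counts, 10)
expected = neutral_expectation(10, sum(observed))

print("observed:", observed)
print("neutral expectation:", [round(x, 1) for x in expected])
```

An excess of rare variants relative to the neutral expectation, as in this invented example, is the kind of signal that purifying selection against deleterious mutations leaves behind.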

In the Fisherian model more complex organisms are more fine-tuned due to pleiotropy and other such dynamics. So new mutations are more likely to deviate away from the optimum. This is the major finding that they confirmed. What does “complex” mean? The Drosophila genome is less than 10% of the human genome’s size, but the migratory locust has twice as large a genome as humans, while wheat has a sequence more than five times as large. But organism to organism, it does seem that Drosophila has less complexity than humans. And they checked with other organisms besides their two focal ones…though the genomes there are not as complete presumably.
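The dependence on complexity falls out of Fisher's classic large-n approximation: the chance that a mutation of size r is beneficial is about 1 - Φ(x), with x = r√n / (2z), where n is the number of traits and z the distance to the optimum. A short sketch; treating the number of traits as a stand-in for "complexity" and the specific values of r, z and n are my assumptions, not quantities from the paper.

```python
import math


def prob_beneficial_approx(effect_size, n_traits, distance_to_optimum=1.0):
    """Fisher's approximation: P(beneficial) = 1 - Phi(r * sqrt(n) / (2 * z))."""
    x = effect_size * math.sqrt(n_traits) / (2 * distance_to_optimum)
    return 0.5 * math.erfc(x / math.sqrt(2))  # upper tail of the standard normal


probabilities = {n: prob_beneficial_approx(0.2, n) for n in (10, 50, 250, 1000)}
for n, p in probabilities.items():
    print(f"{n} traits: P(beneficial) = {p:.4f}")
```

The same mutation that has a decent chance of helping a simple organism is almost always harmful in one with many interacting traits, which is the direction of the human versus Drosophila result.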

As I indicated above, the authors believe they’ve checked for factors such as background selection, which may confound selection coefficients on specific mutations. The paper is interesting as much for the fact that it illustrates how powerful analytic techniques developed in a pre-DNA era were. Some of the models above are mechanistic, and require a certain understanding of the nature of molecular processes. And yet they don’t seem as predictive as a more abstract framework!

Citation: Christian D. Huber, Bernard Y. Kim, Clare D. Marsden, and Kirk E. Lohmueller, Determining the factors driving selective effects of new nonsynonymous mutations PNAS 2017 ; published ahead of print April 11, 2017, doi:10.1073/pnas.1619508114

April 10, 2017

Sexual selection decreasing difference

Filed under: Evolution,Genetics,Sexual Selection — Razib Khan @ 10:32 pm

Sexual selection is often considered a driver of diversification of a lineage. I was introduced to the concept in Jared Diamond’s The Third Chimpanzee, where he suggested that racial differences in appearance might be due to sexual preference, following a suggestion originally made by Charles Darwin. Though sexual selection emerges now and then as a deus ex machina in discussion sections of papers, in general it hasn’t panned out addressing this topic.

But a new paper using shorebirds offers results which oppose this sort of inference, in that sexual selection may be a homogenizing force. Basically the authors used the fact that shorebird lineages have related monogamous and polygamous species. They looked at species richness and genetic diversity using STRUCTURE and microsatellites.

Polygamy slows down population divergence in shorebirds:

Examining microsatellite data from 79 populations in 10 plover species (Genus: Charadrius) we found that polygamous species display significantly less genetic structure and weaker isolation-by-distance effects than monogamous species. Consistent with this result, a comparative analysis including 136 shorebird species showed significantly fewer subspecies for polygamous than for monogamous species. By contrast, migratory behavior neither predicted genetic differentiation nor subspecies richness. Taken together, our results suggest that dispersal associated with polygamy may facilitate gene flow and limit population divergence. Therefore, intense sexual selection, as occurs in polygamous species, may act as a brake rather than an engine of speciation in shorebirds.

A reminder that lots of theorizing may lead you nowhere fast, but a quick empirical check can be very humbling. I’m not sure as to the generality of this result, and ultimately it probably has to do with reproductive variance. But it is a starting point.

Addendum: Overall Geoffrey Miller’s The Mating Mind is probably wrong in most of the details, though perhaps on the most general level there may be something there (I’m wondering particularly in regards to mutational load). But it’s a decent introduction to sexual selection theory in a human context, and has a lot of interesting ideas. And Miller is actually a good writer as far as scientists go.

The human extended phenotype

Filed under: Evolution,Genetics,Neural Crest,Self-domestication — Razib Khan @ 8:17 pm

I think there is something to the hypothesis that we as a species are self-domesticated, but a new preprint really doesn’t change my probability up or down, Comparative Genomic Evidence for Self-Domestication in Homo sapiens. Notwithstanding my own participation in some comparative genomic work, a lot of the conclusions from this field are as clear and obvious to me as the above figure, not very.

To be fair at least the authors of the preprint have a hypothesis they’re testing, the “domestication syndrome” as caused by neural crest gene modification. Two major issues I’d bring up: it’s comparative genomic because of a paucity of samples, and, tidy explanations often don’t pan out.

Genomic analysis of ancient genomes is very preliminary. Phylogenomic work, which establishes relationships between lineages, can accept a noisy and poor marker set with only a few representative samples. But when looking at population genomics one should at least have either really good data on a small number of individuals, or, preferably, good-enough data on lots of individuals. The ancient genomic data set for hominins is not rich enough that I’m confident about any but the most obvious and clear differences between our closest relations and ourselves. The reality of gene flow across populations also adds a confounding element, because it might not be implausible that “modern” alleles actually derive from another ancient lineage, and our modern forebears exhibited the ancestral state.

Second, the neural crest hypothesis and a general model of domestication is rather attractive. I myself find it intriguing, and am curious from a professional scientific perspective. But, attractive hypotheses often do not pan out, and gain early attention because scientists are human, and exhibit some bias and hope. A case in point, mirror neurons have stalled as a silver bullet to explain all sorts of unique aspects of human cognition. Neural crest models are part of the long quest to establish the genes which make us unique and human, even though I’m not even sure this is a wrong question.

The preprint did remind me of an excellent book I read over 10 years ago, The Cultural Origins of Human Cognition. I am much more well disposed toward the thesis now than I was then, in large part because I no longer hold to a “big bang” theory of the origin of modern humanity due to a behavioral revolution triggered by a rapid suite of genetic changes. Rather, I suspect a cultural model where there is reciprocal feedback with genetic changes in a sort of ratchet has a lot more utility, in part because the gap between “archaic” H. sapiens and our own ancestors was I believe much smaller in many ways in relation to behavior than we’ve assumed until lately. Finally, the genetic evidence of lots of lateral gene flow across these distinct branches is indicative of more complexity in the origin of humanity than we had previously understood.

There is also the whole idea of “self-domestication.” I think perhaps it needs to be more explicitly formulated in an ecological sense. Rather than self-domestication, what occurred is that a host of species began to inhabit an evolving “extended phenotype” which humans were a motive engine within. But we need to be cautious about overemphasizing our agency. Once human societies became agricultural beyond a certain point it is not possible to revert back to hunter-gathering lifestyles without migration or mass die off. In some ways we are as much pawns in the forces unleashed by our original choices and actions as the domestic animals and plants and parasites which have come along for the ride.

Citation: Comparative Genomic Evidence for Self-Domestication in Homo sapiens, Constantina Theofanopoulou, Simone Gastaldon, Thomas O’Rourke, Bridget D Samuels, Angela Messner, Pedro Tiago Martins, Francesco Delogu, Saleh Alamri, Cedric Boeckx, doi: https://doi.org/10.1101/125799

April 8, 2017

Just because it’s not hereditary does not mean you can affect it

Filed under: Behavior Genetics,Culture,Genetics — Razib Khan @ 10:11 pm

A comment below from John:

Love to see a post about which human traits worth caring about are notable for having little or no hereditary component. It is all good and well to know what we cannot change, but it makes more sense to focus personally and as a parent on those things that aren’t genetically preordained.

This is a common sentiment I’ve seen. If you haven’t read The Nurture Assumption, please do so. I’d say a substantial reason I think that The Blank Slate is a good book is that Steven Pinker promoted Judith Rich Harris’ work.

With that out of the way: the implication in the comment above is that hereditary traits are the ones you have least control over, so you should focus on the non-hereditary traits. To some extent there is truth in this. Micronutrients are important. You don’t want to turn your children into cretins.

But a major problem with the idea that we can shape the environmental influences on characteristics is that on many traits we don’t know what those environmental influences are. You can take a behavior genetic model and come to the following conclusion: within the population 50% of the variation is due to genes, 40% of the variation is due to non-shared environment, and 10% of the variation is due to shared environment (parents). We don’t really usually know what the non-shared environment means. It might be just developmental noise. It might be epistatic genetic effects. Or, in relation to behavior, it might be peer group, as Judith Rich Harris asserts.
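
To make the decomposition above concrete, here is a minimal sketch of how such percentages are classically estimated from twin data using Falconer's formulas. The twin correlations are hypothetical values picked only so that the result matches the 50/40/10 split in the example, and the function name is my own invention.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Falconer-style split of trait variance from twin correlations.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    Assumes additive genetic effects and equal environments for both twin types.
    """
    heritability = 2 * (r_mz - r_dz)   # share attributed to genes
    shared_env = 2 * r_dz - r_mz       # share attributed to shared (family) environment
    non_shared_env = 1 - r_mz          # everything left over, measurement noise included
    return heritability, shared_env, non_shared_env


# Hypothetical correlations, chosen only to reproduce the 50/40/10 example.
a, c, e = ace_from_twin_correlations(r_mz=0.60, r_dz=0.35)
print(f"genes: {a:.0%}, shared environment: {c:.0%}, non-shared environment: {e:.0%}")
```

Note that the non-shared term is simply whatever the identical-twin correlation fails to explain. It is a residual, which is part of why nobody can say what is actually in it.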

We just don’t know. What that means is that the hereditary components are what you have legitimate effective control over through mate choice. And shared environment. These two combined are not nothing. And of course there is the impact of nation or community on the environment in which propensities are expressed.

Addendum: The non-shared environmental variance was once explained to me as a “noise” factor. Just to give you a sense of how well we understand it.

Why only one migrant per generation keeps divergence at bay

The best thing about population genetics is that because it’s a way of thinking and modeling the world it can be quite versatile. If Thinking Like An Economist is a way to analyze the world rationally, thinking like a population geneticist allows you to have the big picture on the past, present, and future, of life.

I have some personal knowledge of this as a transformative experience. My own background was in biochemistry before I became interested in population genetics as an outgrowth of my lifelong fascination with evolutionary biology. It’s not exactly useless knowing all the steps of the Krebs cycle, but it lacks in generality. In his autobiography I recall Isaac Asimov stating that one of the main benefits of his background as a biochemist was that he could rattle off the names on medicine bottles with fluency. Unless you are an active researcher in biochemistry your specialized research is quite abstruse. Population genetics tends to be more applicable to general phenomena.

In a post below I made a comment about how one migrant per generation or so is sufficient to prevent divergence between two populations. This is an old heuristic which goes back to Sewall Wright, and is encapsulated in the formalism Fst ≈ 1/(4mN + 1). Basically the divergence, as measured by Fst, is approximately the inverse of 4 times the proportion of migrants (m) times the total population (N), plus 1. The mN is equivalent to the number of migrants per generation (proportion times the total population). As the mN becomes very large, the Fst converges to zero.
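
A quick numerical sketch of that relationship, assuming the standard island-model approximation Fst ≈ 1/(4mN + 1). The function name and the deme size of 1,000 are illustrative assumptions, not anything taken from Wright.

```python
def equilibrium_fst(population_size, migration_rate):
    """Island-model approximation: Fst ~ 1 / (4*m*N + 1)."""
    migrants_per_generation = population_size * migration_rate  # this is mN
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


# Hypothetical deme of 1,000 individuals; vary the number of migrants per generation.
for migrants in (0.1, 1, 10, 100):
    fst = equilibrium_fst(population_size=1000, migration_rate=migrants / 1000)
    print(f"{migrants:>5} migrants/generation -> Fst ~ {fst:.3f}")
```

With one migrant per generation the approximation gives an Fst of 0.2, so "preventing divergence" here means capping it at a modest level rather than erasing it.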

The intuition is pretty simple. Imagine you have two populations which separate at a specific time. For example, sea level rise, so now you have a mainland and island population. Since before sea level rise the two populations were one random mating population their initial allele frequencies are the same at t = 0. But once they are separated random drift should begin to subject them to divergence, so that more and more of their genes exhibit differences in allele frequencies (ergo, Fst, the between population proportion of genetic variation, increases from 0).

Now add to this the parameter of migration. Why is one migrant per generation sufficient to keep divergence low? The two extreme scenarios are like so:

  1. Large populations change allele frequency very slowly due to drift, so only a small proportion of migration is needed to prevent them from diverging
  2. Small populations change allele frequency very fast due to drift, so a larger proportion of migration is needed to prevent them from drifting

Within a large population one migrant is a small proportion, but drift is occurring very slowly. Within a small population drift is occurring fast, but one migrant is a relatively large proportion of a small population.
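
The balance between drift and migration can be sketched by iterating the textbook island-model recursion, Fst' = (1 - m)^2 [1/(2N) + (1 - 1/(2N)) Fst], for a small and a large deme. The deme sizes and the number of generations are arbitrary choices for illustration, and the function name is hypothetical.

```python
def fst_after(generations, population_size, migrants_per_generation):
    """Iterate the island-model recursion for Fst, starting from Fst = 0.

    Two gene copies drawn from a deme are identical by descent only if
    neither arrived as a migrant this generation, probability (1 - m)^2.
    Given that, drift makes them copies of the same parental gene with
    probability 1/(2N); otherwise they inherit the previous Fst.
    """
    n = population_size
    m = migrants_per_generation / n   # migration as a proportion of the deme
    drift = 1.0 / (2.0 * n)           # per-generation contribution of drift
    fst = 0.0                         # the populations start out identical
    for _ in range(generations):
        fst = (1.0 - m) ** 2 * (drift + (1.0 - drift) * fst)
    return fst


# Hypothetical small and large demes, run long enough to approach equilibrium.
for n in (50, 5000):
    isolated = fst_after(50_000, n, migrants_per_generation=0)
    connected = fst_after(50_000, n, migrants_per_generation=1)
    print(f"N={n:>5}: no migration Fst={isolated:.3f}, one migrant Fst={connected:.3f}")
```

Under these assumptions both demes settle near the same Fst with one migrant per generation, even though that migrant is 2% of the small deme and 0.02% of the large one. Without migration both drift toward complete divergence, the small deme much faster.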

Obviously this is a stylized fact with many details which need elaborating. Some conservation geneticists believe that the focus on one migrant is wrongheaded, and the number should be set closer to 10 migrants.

But it still gets at a major intuition: gene flow is extremely powerful and effective at reducing differences between groups. This is why most geneticists are skeptical of sympatric speciation. Though the focus above is on drift, the same intuition applies to selective divergence. Gene flow between populations works at cross-purposes with selection which drives two groups toward different equilibrium frequencies.

This is why it was surprising when results showed that Mesolithic hunter-gatherers and farmers in Europe were extremely genetically distinct in close proximity for on the order of 1,000 years. That being said, strong genetic differentiation persists between Pygmy peoples and their agriculturalist neighbors, despite a long history of living near each other (Pygmies do not have their own indigenous languages, but speak the tongue of their farmer neighbors). In the context of animals physical separation is often necessary for divergence, but for humans cultural differences can enforce surprisingly strong taboos. Culture is as strong a phenomenon as mountains or rivers….

April 7, 2017

Why humans have so many pulse admixtures

Filed under: Admixture,Evolution,Genetics,Genomics — Razib Khan @ 5:38 pm

The Blank Slate is one of my favorite books (though I’d say The Language Instinct is unjustly overshadowed by it). There is obviously a substantial biological basis in human behavior which is mediated by genetics. When The Blank Slate came out in the early 2000s one could envisage a situation in 2017 when empirically informed realism dominated the intellectual landscape. But that was not to be. In many ways, for example in sex differences, we’ve gone backward, while there is still undue overemphasis in our society on the environmental impact parents have on children (as opposed to society more broadly).

But genes do not determine everything, obviously. Several years after reading The Blank Slate I read Not by Genes Alone: How Culture Transformed Human Evolution. In this work Peter Richerson and Robert Boyd outline their decades-long project of modeling cultural variation and evolution formally in a manner reminiscent of biological evolution. Richerson and Boyd’s program does not start from a “blank slate” assumption. Rather, it is focused on broad macro-social dynamics where cultural variation “swamps” out biological variation.

Recall that in classic population genetic theory a major problem with group level selection is that gene flow between adjacent groups quickly removes between group variation. One migrant between two groups per generation is enough for them not to diverge genetically. For group selection to occur the selective effect has to be very strong or the between group difference has to be very high. Rather than talking about genetics though, where the debate is still live, and the majority consensus is still that biological group selection is not that common (depending on how you define it), let’s talk about human culture.

Here the group level differences are extreme and the boundaries can be sharp. Historically it seems likely that most groups which were adjacent to each other looked rather similar because of gene flow and similar selective pressures. Even though in medieval Spain there was a generality, probably true, that Muslims were swarthier than Christians*, there was a palpable danger in battle of mistaking friend for foe because the two groups overlapped too much in appearance.

This brings up how one might delineate differences culturally. In battle opposing armies wear distinct uniforms and colors so that the distinction can be made. But obviously one can change uniforms surreptitiously (perhaps taking the garb from the enemy dead). This is why physical adornments such as tattoos are useful, as they are “hard to fake.” Perhaps the clearest illustration of this dynamic is the Biblical story for the origin of the term shibboleth. Even slight differences in accent are clear to all and often difficult to mimic once in adulthood.

Biological evolution mediated through genes is relatively slow and constrained compared to cultural evolution. Whole regions of central and northern Europe shifted from adherence to Roman Catholicism to forms of Protestantism on the order of 10 years. Of course religion is an aspect of culture where change can happen very rapidly, but even language shifts can occur in only a few generations (e.g., the decline of regional German and Italian dialects in the face of standard forms of the language).

Cultural evolution as a formally modeled neofunctionalism is credibly outlined in works such as Peter Turchin’s Ultrasociety: How 10,000 Years of War Made Humans the Greatest Cooperators on Earth. That’s not what I want to focus on here. Rather, I contend that the reality of massive pulse admixtures evident in the human genome over the past 10,000 years, at minimum, is a function of the fact that human cultural evolutionary processes result in winner-take-all genetic consequences.

A concrete example of what I’m talking about would compare the peoples of the Italian peninsula and the Iberian peninsula around 1500. The two populations are not that different genetically, and up to that point shared many cultural traits (and continue to do so). But a combination of geography and history resulted in Iberian demographic expansion in the several hundred years after 1500, whereby today there are probably many more descendants of Iberians than Italians. This is not a function of any deep genetic difference between the two groups. There aren’t deep genetic differences in fact. Rather, the social and demographic forces which propelled Iberia to imperial status redounded upon the demographic production of Iberians in the future. In addition, the New World underwent a massive pulse admixture between Iberians and native Amerindians, as well as Africans, usually brought over as slaves, due to the cultural and political history of the period.

The pulse admixture question is rather interesting academically. To some extent current methods are biased toward detection of pulse admixtures, and even fit continuous gene flow as pulse admixtures. A quick rapid exchange of gene flow and then recombination breaking apart associations of markers which are ancestrally informative haplotypes is something you can test for. But I think we can agree that the gene flow triggered by the Columbian Exchange was a pulse admixture, and there’s too much concurrent evidence from uniparental lineage turnover in the ancient DNA to dismiss the non-historically corroborated signatures of pulses as simply artifacts.
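
The recombination clock described above can be sketched numerically. After a single pulse, the association between ancestry-informative markers separated by a genetic distance d (in Morgans) is expected to decay roughly as exp(-t·d) after t generations, which is the kind of curve that LD-decay dating approaches fit. The function name and the two dates below are illustrative assumptions, not values from any particular study.

```python
import math


def admixture_ld_remaining(generations_since_pulse, distance_morgans):
    """Fraction of the initial admixture LD left between two markers.

    Assumes one pulse of admixture followed by random mating, so that
    the association decays exponentially with generations times distance.
    """
    return math.exp(-generations_since_pulse * distance_morgans)


# Hypothetical pulses: about 20 generations ago (roughly 500 years at
# ~25 years per generation) versus about 150 generations ago.
for generations in (20, 150):
    remaining = [admixture_ld_remaining(generations, cm / 100) for cm in (0.5, 1, 5)]
    print(generations, "generations:", [f"{r:.2f}" for r in remaining])
```

Continuous gene flow behaves like a mixture of many such clocks started at different times, which is one way a single fitted exponential can make a slow simmer look like a pulse.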

Nevertheless continuous gene flow does occur. That is, normal exchange of individuals between neighboring demes as a slow simmer over time. But the idea that we are a clinal ring species or something like that isn’t right in my opinion. Part of the story is strong geographical barriers. But another major part is that cultural revolutions and advantages introduce huge short-term demographic advantages to particular groups, and the shake out of inter-group competition can be dramatic.

Therefore, I make a prediction: the more cultural evolutionary dynamics a species is subject to, the more pulse admixture you’ll be able to detect. For example, pulse admixture should be more important in social insects than their solitary relatives.

* Not only was some of the ancestry of Muslims North African, Muslim rule was longest in the southern and southeastern regions, where people were not as fair as in the north.

Direct-to-consumer genomics, it’s back on!

Filed under: 23andMe,DTC,Genetics,Personal genomics — Razib Khan @ 8:11 am

The past three and a half years, and arguably longer, there has been something of a dark night passing over direct to consumer (DTC) personal genomics. The regulatory issues have been unclear to unfavorable. If you have read this blog you know 23andMe’s saga with the Food and Drug Administration.

It looks like in 2017 DTC is finally turning a regulatory corner, with some clarity and freedom to operate. From FDA Opens Genetic Floodgates with 23andMe Decision:

Today, the U.S. Food and Drug Administration told gene-testing company 23andMe that it will be allowed to directly tell consumers whether their DNA puts them at higher risk for 10 different diseases, including late-onset Alzheimer’s disease and Parkinson’s.

The decision to allow these direct-to-consumer tests is a big vindication for 23andMe, which in 2013 was forced to cease marketing such results after the FDA said they could be inaccurate and risky to consumers, and that they required regulatory approval.

I still agree with my assessment in 2013: this won’t mean anything in the long run. DTC is here to stay, and if the decentralization of medical testing and services doesn’t happen in the USA, it’ll happen elsewhere, and at some point medical tourism will get cheap enough that any restrictions in this nation won’t be of relevance. But this particular decision alters the timeline in the grand scheme of things, and matters a great deal for specific players.

It’s on!

April 4, 2017

Sex bias in migration from the steppe (revisited)

Filed under: Anthroplogy,Genetics,Genomics,History — Razib Khan @ 11:21 pm

Last fall I blogged a preprint which eventually came out as a paper in PNAS, Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations. The upshot is that the authors found that there was far less steppe ancestry on the X chromosomes of Bronze Age Central Europeans than across the whole genome. The natural inference here is that you had migrations of males into territory where they had to find local wives.

But the story does not end there. Iosif Lazaridis and David Reich have put out a short note on bioRxiv, Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe. It’s short, so I suggest you read the note yourself, but the major issue seems to be that on X chromosomes ADMIXTURE in supervised mode seems to behave really strangely. Lazaridis and Reich find that there seems to be a downward bias of steppe ancestry. Ergo, the finding was an artifact.

Goldberg et al. almost immediately responded, Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe. Their response seems to be that yes, ADMIXTURE does behave strangely, but the overall finding is still robust.

With these uncertainties I do wonder if it’s hard at this point to evaluate the alternative models. But, we do have archaeology and mtDNA. What do those say? On that basis, from what little I know, I am inclined to suspect a strong male bias of migration.

Citation: Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe, Amy Goldberg, Torsten Gunther, Noah A Rosenberg, Mattias Jakobsson, bioRxiv 122218; doi: https://doi.org/10.1101/122218

Citation: Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe, Iosif Lazaridis, David Reich, bioRxiv 114124; doi: https://doi.org/10.1101/114124

How Tibetans can function at high altitudes

Filed under: Altitude Adaptation,Evolution,Genetics,Genomics,Human Evolution,Tibetans — Razib Khan @ 11:10 am

About seven years ago I wrote two posts about how Tibetans manage to function at very high altitudes. And it’s not just physiological functioning, that is, fitness straightforwardly understood. High altitudes can cause a sharp reduction in reproductive fitness because women cannot carry pregnancies to term. In other words, high altitude is a very strong selection pressure. You adapt, or you die off.

For me there have been two things of note since those original papers came out. First, one of those loci seems to have been introgressed from a Denisovan genetic background. I want to be careful here, because the initial admixture event may not have been into the Tibetans proper, but into earlier hunter-gatherers who descend from Out of Africa groups, who were assimilated into the Tibetans as they expanded 5-10,000 years ago. Second, it turns out that dogs have also been targeted for selection on EPAS1 (the “Denisovan” introgression) for altitude adaptation.

This shows that in mammals at least there are a few genes which show up again and again. The fact that EPAS1 and EGLN1 were hits on relatively small sample sizes also reinforces their powerful effect. When the EPAS1 results initially came out they were highlighted as the strongest and fastest instance of natural selection in human evolutionary history. One can quibble about whether this was literally true, but that it was a powerful selective event no one could deny.

A new paper in PNAS, Genetic signatures of high-altitude adaptation in Tibetans, revisits the earlier results with a much larger sample size (the research group is in China) comparing Han Chinese and Tibetans. They confirm the earlier results, but, they also find other loci which seem likely targets of selection in Tibetans. Below is the list:

| SNP | A1 | A2 | Frequency of A1 (Tibetan) | Frequency of A1 (EAS, Han) | P value | FST | Nearest gene |
|---|---|---|---|---|---|---|---|
| rs1801133 | A | G | 0.238 | 0.333 | 6.30E-09 | 0.021 | MTHFR |
| rs71673426 | C | T | 0.102 | 0.013 | 1.50E-08 | 0.1 | RAP1A |
| rs78720557 | A | T | 0.498 | 0.201 | 4.70E-08 | 0.191 | NEK7 |
| rs78561501 | A | G | 0.599 | 0.135 | 6.10E-15 | 0.414 | EGLN1 |
| rs116611511 | G | A | 0.447 | 0.003 | 3.60E-19 | 0.57 | EPAS1 |
| rs2584462 | G | A | 0.211 | 0.549 | 3.90E-09 | 0.203 | ADH7 |
| rs4498258 | T | A | 0.586 | 0.287 | 1.70E-08 | 0.171 | FGF10 |
| rs9275281 | G | A | 0.095 | 0.365 | 1.10E-10 | 0.162 | HLA-DQB1 |
| rs139129572 | GA | G | 0.316 | 0.449 | 5.80E-09 | 0.036 | HCAR2 |

P value indicates the P value from the MLMA-LOCO analysis. FST is the FST value between Tibetans and EASs. Nearest gene indicates the nearest annotated gene to the top differentiated SNP at each locus except EGLN1, which is known to be associated with high-altitude adaptation; rs139129572 is an insertion SNP with two alleles: GA and G. A1, allele 1; A2, allele 2.

Many of these genes are familiar. Observe the allele frequency differences between the Tibetans and other East Asians (mostly Han). The sample sizes are on the order of thousands, and the SNP-chip had nearly 300,000 markers. What they found was that the between population Fst of Han to Tibetan was ~0.01. So only 1% of the SNP variance in their data was partitioned between the two groups. These alleles are huge outliers.
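
As a rough yardstick for how far these loci sit from the genome-wide background, the sketch below divides each FST value reported in the table by the ~0.01 figure. It is plain arithmetic on the published numbers, not a reimplementation of the paper's test, and the variable names are my own.

```python
# FST per locus as listed in the table; the background is the ~0.01 genome-wide value.
GENOME_WIDE_FST = 0.01
locus_fst = {
    "MTHFR": 0.021, "RAP1A": 0.1, "NEK7": 0.191, "EGLN1": 0.414, "EPAS1": 0.57,
    "ADH7": 0.203, "FGF10": 0.171, "HLA-DQB1": 0.162, "HCAR2": 0.036,
}

# How many times each locus exceeds the genome-wide background.
fold_over_background = {
    gene: fst / GENOME_WIDE_FST for gene, fst in locus_fst.items()
}

# Print the loci from most to least differentiated.
for gene, fold in sorted(fold_over_background.items(), key=lambda item: -item[1]):
    print(f"{gene:<9} Fst={locus_fst[gene]:.3f}  ~{fold:.0f}x background")
```

By this crude measure EPAS1 and EGLN1 stand far apart from everything else, while MTHFR and HCAR2 are only a few times the background. Their place on the list presumably owes more to the association test behind the P values than to Fst alone.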

The authors used some sophisticated statistical methods to correct for exigencies of population structure, drift, admixture, etc., to converge upon these hits, but even through inspection the deviation on these alleles is clear. And as they note in the paper it isn’t clear all of these genes are selected simply for hypoxia adaptation. MTHFR, which is quite often a signal of selection, may have something to do with folate metabolism (higher altitudes have more UV). ADH7 is part of a set of genes which always seem to be under selection, and HLA is never a surprise.

Rather than get caught up in the details it is important to note here that expansion into novel habitats results in lots of changes in populations, so that two groups can diverge quite fast on functional characteristics. The PCA makes it clear that Tibetans and Han have very little West Eurasian admixture, and the Fst based analysis puts their divergence on the order of 5,000 years before the present. The authors admit honestly that this is probably a lower bound value, but I also think it is quite likely that Tibetans, and probably Han too, are compound populations, and a simple bifurcation model from a common ancestral population is probably shaving away too many realistic edges. In plainer language, there has been gene flow between Han and Tibetans probably <5,000 years ago, and Tibetans themselves probably assimilated more deeply diverged populations in the highlands as they expanded as agriculturalists. An estimate of a single divergence quite often fits a complex history to too simple a model.

The take home: understanding population history is probably important to get a better sense of the dynamics of adaptation.

Citation: Jian Yang, Zi-Bing Jin, Jie Chen, Xiu-Feng Huang, Xiao-Man Li, Yuan-Bo Liang, Jian-Yang Mao, Xin Chen, Zhili Zheng, Andrew Bakshi, Dong-Dong Zheng, Mei-Qin Zheng, Naomi R. Wray, Peter M. Visscher, Fan Lu, and Jia Qu, Genetic signatures of high-altitude adaptation in Tibetans, PNAS 2017; published ahead of print April 3, 2017, doi:10.1073/pnas.1617042114

March 29, 2017

List of top 10 evolutionary biologists in history

Filed under: Evolution,Evolutionary Biologists,Genetics — Razib Khan @ 4:00 am

What is your list of the top 10 evolutionary biologists in history? I’m asking because this came up in a discussion with a friend. Obviously the composition of the list will have to do with disciplinary bias and geography and history (there are Russian population geneticists from the 20th century who should be more famous who aren’t).

Here are my top 10 (with two minutes thought given):

1. Charles Darwin
2. R. A. Fisher
3. Sewall Wright
4. J. B. S. Haldane
5. W. D. Hamilton
6. G. G. Simpson
7. John Maynard Smith
8. August Weismann
9. Motoo Kimura
10. Theodosius Dobzhansky

What’s your list? (in the comments)

March 28, 2017

How Indians are a lot like Latin Americans

Filed under: Anthroplogy,Genetics,India — Razib Khan @ 5:45 am

Pretty much any person of Indian subcontinental origin in the United States of a certain age who isn’t very dark skinned has probably had the experience of being spoken to in Spanish at some point. When I was younger growing up in Oregon I had the experience multiple times of Spanish speakers, probably Mexican, pleading with me to interpret for them because there was no one else who seemed likely. It isn’t a genius insight to conclude I was most likely South Asian…but it wasn’t out of the question I was Mexican. This applies even more to lighter skinned South Asians. In the Central Valley of California, where there are many Sikhs from Punjab and many Mexicans, this confusion occurred a lot for some Indian kids.

Of course biogeographically there isn’t that much connection between South Asia and the New World. But it isn’t crazy that Christopher Columbus labelled the peoples of the New World “Indian.” After all, they were a brown-skinned people whose features were not African, East Asian, or West Eurasian. And, it turns out genetically there is a coincidence that connects the New World and South Asia: the mixed peoples of Latin America with Amerindian and European ancestry recapitulate an admixture which resembles what occurred in South Asia thousands of years ago. It looks as if about half the ancestry of South Asians is West Eurasian and half something more like eastern Eurasians.

On principal component analysis that means that South Asian and Mexican and Peruvian samples often overlap. This is somewhat curious because the non-West Eurasian ancestors of South Asians and Amerindians diverged in ancestry on the order of 25 to 45 thousand years before the present. And the Iberian ancestry of the mixed people of the New World is almost as far from the character of South Asian West Eurasian ancestry as you can get (in the parlance of this blog, lots of EEF, less CHG, not too much ANE).

A new paper, A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals, highlights another similarity: massive bias in biogeographic ancestry by sex. More precisely, the rank order of West Eurasian ancestry in South Asia is skewed like so: Y chromosome > whole-genome > mtDNA (as is evident in the above figure).

I actually began writing about this in the late 2000s, when the fact that South Asian mtDNA was very different from West Eurasian mtDNA, and South Asian Y chromosome was mostly West Eurasian, was obvious. Then work using genome-wide data sets began to point to massive intra-Eurasian admixture between very diverged lineages. The paper is not revolutionary, but worth reading for its thoroughness and how it brings together all the lines of evidence.

Finally, no ancient DNA. That’s probably for the future, but I don’t expect any surprises.

Citation: A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals.
