Razib Khan One-stop-shopping for all of my content

September 22, 2017

The non-European ancestry of Afrikaners

Filed under: Afrikaner,Genetics — Razib Khan @ 12:37 am


A few years ago I got some South African genotypes. Some of the individuals were clearly African. A few mapped perfectly upon Northern Europeans. But many of the samples consistently were European but shifted toward non-European populations.

Based on the history of the assimilation of slaves into the European population of Cape Colony in the 18th century, my assumption is that these individuals are Afrikaners.

Recently I realized that Brenna Henn had released some more Khoisan samples, so I decided to look at this question of admixture again. The two Khoisan populations are the Nama and the Khomani. I removed those with lots of Bantu and European admixture and combined them together into one population.

Running unsupervised Admixture shows how distinct the South African whites are.

The average Utah white in this sample (this population is a mix of British, German, and Scandinavian in ancestry) is 99% European modal cluster, and 1% South Asian. The average for the white South Africans in this data set is 94% European modal cluster. The residual is 1% East Asian (Dai modal), 1% Khoisan, 1% non-Khoisan African, and 2% South Asian.

I ran Treemix a bunch of times, and every single plot came out like this when I ran it for three migrations:

 

The gene flow from the Utah whites to the Gujaratis is simply an artifact of the fact that the Gujarati sample is mixed caste, and some of the Brahmins or Lohanas have more “Ancestral North Indian.” The gene flow from the Europeans to the Khoisan is probably real, or might be due to pastoralist admixture via East Africans. The last migration arrow goes from the African populations to the South African whites, with a shift toward the Khoisan.

I also ran a three population test where A is the outgroup, and B and C are a clade. A significantly negative f3-statistic indicates admixture in population A. The negative values are listed below:

A             B             C          f3            f3-error     Z-score
Gujrati       Dai           UtahWhite  -0.00121718   0.000140141  -8.68539
South_Africa  EsanNigeria   UtahWhite  -0.00127718   0.000147982  -8.63059
South_Africa  Khoisan_SA    UtahWhite  -0.0012928    0.000151416  -8.53802
Gujrati       South_Africa  Dai        -0.000778791  0.000155656  -5.00329
South_Africa  Dai           UtahWhite  -0.000541974  0.000133262  -4.06699
South_Africa  UtahWhite     Gujrati    -0.000103581  8.46193e-05  -1.22408
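
The Z-score column is just the f3 estimate divided by its standard error, so the table can be checked by hand. Here is a minimal sketch in Python. The cutoff of -3 is an assumption on my part (a common rule of thumb, not something stated in the output), and the function names are made up for illustration:

```python
# Rows copied from the table above: (A, B, C, f3, standard error).
rows = [
    ("Gujrati", "Dai", "UtahWhite", -0.00121718, 0.000140141),
    ("South_Africa", "EsanNigeria", "UtahWhite", -0.00127718, 0.000147982),
    ("South_Africa", "Khoisan_SA", "UtahWhite", -0.0012928, 0.000151416),
    ("Gujrati", "South_Africa", "Dai", -0.000778791, 0.000155656),
    ("South_Africa", "Dai", "UtahWhite", -0.000541974, 0.000133262),
    ("South_Africa", "UtahWhite", "Gujrati", -0.000103581, 8.46193e-05),
]

Z_CUTOFF = -3.0  # assumed threshold, a rule of thumb rather than a value from the post


def z_score(f3, std_err):
    """Z-score of an f3 estimate: the estimate in units of its standard error."""
    return f3 / std_err


def significant_rows(table, cutoff=Z_CUTOFF):
    """Return the rows whose Z-score falls below the cutoff."""
    return [r for r in table if z_score(r[3], r[4]) < cutoff]


for a, b, c, f3, se in rows:
    z = z_score(f3, se)
    flag = "admixture signal" if z < Z_CUTOFF else "not significant"
    print(f"f3({a}; {b}, {c})  Z = {z:.2f}  {flag}")
```

By that cutoff the last row (Utah whites and Gujaratis as the two sources) does not reach significance, while the other five do.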

This aligns well with the Admixture results. Afrikaners have both kinds of African ancestry (Khoisan and non-Khoisan), as well as Asian ancestry.

In James Michener’s The Covenant one of the plot lines alludes to mixed ancestry in one of the Afrikaner families. The results above suggest that mixed ancestry is very common, and perhaps ubiquitous, in this population. True, there are some Afrikaners such as Hendrik Verwoerd who migrated to South Africa from the Netherlands in the past century or so, but these are uncommon to my knowledge.

September 14, 2017

After agriculture, before bronze

 

The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s about ~10% of the variation).
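
To make the "proportion of variation partitioned between groups" idea concrete, here is a minimal sketch of the textbook definition at a single biallelic locus, for two equally weighted populations. This is an illustration only: real analyses average over many SNPs and use estimators that correct for sample size, and the function name and the allele frequencies below are made up.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if the two populations were pooled into one.
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # the locus is fixed for the same allele everywhere
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, from identical to completely fixed differences.
for p1, p2 in [(0.5, 0.5), (0.45, 0.55), (0.34, 0.66), (0.0, 1.0)]:
    print(f"p1={p1:.2f} p2={p2:.2f}  Fst={fst_two_pops(p1, p2):.3f}")
```

With these made-up numbers, a frequency difference of about 0.1 around 50% gives an Fst near 0.01, while a difference of about 0.32 gives something close to the continental-scale ~0.10.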

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

  Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but instead peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.

September 10, 2017

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain uses about 20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allows for a signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods of selection at the phenotype of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table

September 7, 2017

South Asian gene flow into Burmese and Malays?

Filed under: Burma,Genetics,Malaysia,Southeast Asia — Razib Khan @ 10:22 pm


I happen to have a data set merged from the 1000 Genomes and Estonian Biocentre which has Malays, Burmans, and other assorted Southeast Asians, East Asians, and South Asians. In light of recent posts I thought I would throw out something in relation to this data set (you can download the data here). Above you can see the populations in the data. You see Bangladeshis consistently are shifted toward Southeast Asians in comparison to other South Asians. But both Burmans and Malays exhibit some shift toward South Asians.

I ran ADMIXTURE at K = 4. Click the image for the larger file which shows the populations, but I will tell you what’s going on.

The yellow to green represent a north-south axis in East Asia. The Han sample is mostly yellow, but there is a green component in varying degrees. This almost certainly represents heterogeneity in the Han sample of north to south Chinese. The green component is nearly ~100% in some individuals from indigenous tribes in Borneo, and balanced with the yellow among peninsular Malays. It is at a higher frequency in Cambodia than in Vietnam or Burma, indicating the older roots of Khmers and their relative insulation from later migrations of Sino-Tibetan and Tai peoples.

The red South Asian component is found in many Southeast Asians, but curiously in the Burmans and Malays there is a lot of variation within the population. That indicates admixture over time that has not homogenized throughout the population.

I ran Treemix with 5 migration edges and French rooted (1000 SNP blocks out of 225,000 SNPs) and they all looked like this. Commentary I will leave to readers….

August 29, 2017

Genetics books for the masses!

Filed under: Genetics — Razib Khan @ 10:49 pm

Since I’ve become professionally immersed in genetics I haven’t read many books on the topic. I read papers. And I do genetics. But back in the day I did enjoy a good book. The standard recommendation would be to read Matt Ridley’s Genome. It’s a bit dated now (it was published around when the Human Genome Project was being completed), but I’d still recommend it.

But when in the mid-2000s I dabbled a little bit in the world of worm (C. elegans) genetics I read Andrew Brown’s In the Beginning Was the Worm: Finding the Secrets of Life in a Tiny Hermaphrodite. It’s pretty far from my current concerns and fixations, with more of a focus on developmental processes, but it is pretty cool to read about the race to “map” every cell in C. elegans.

The second book I’d recommend readers of this blog is the late Will Provine’s The Origins of Theoretical Population Genetics. Modern population genomics is a massive edifice built atop the foundations of the early 20th century fusion of Mendelism and the biometrical heirs of Darwin. Provine outlines how primitive genetics eventually seeded the birth of the Neo-Darwinian Synthesis.

Why do percentage estimates of “ancestry” vary so much?

Filed under: Genetics,Human Genetics — Razib Khan @ 10:36 pm

When looking at the results in Ancestry DNA, 23andMe, and Family Tree DNA my “East Asian” percentage is:

– 19%
– 13%
– 6%

What’s going on here? In science we often make a distinction between precision and accuracy. Precision is how much your results vary when you re-run an experiment or measurement. Basically, can you reproduce your result? Accuracy refers to how close your measurement is to the true value. A measurement can be quite precise, but consistently off. Similarly, a measurement may be imprecise, but it bounces around the true value…so it is reasonably accurate if you get enough measurements to cancel out the errors (which are random).
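
A toy simulation makes the distinction easy to see. This is only a sketch: the true value, the bias, and the noise levels are all made-up numbers, and the two "instruments" are hypothetical.

```python
import random
import statistics

TRUE_VALUE = 10.0  # hypothetical quantity being measured


def measure(rng, bias, noise_sd, n):
    """Draw n measurements with a fixed systematic bias and Gaussian random error."""
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


rng = random.Random(0)  # seeded so the run is repeatable
precise_but_off = measure(rng, bias=3.0, noise_sd=0.1, n=10_000)
noisy_but_centered = measure(rng, bias=0.0, noise_sd=3.0, n=10_000)

for name, xs in [("precise, inaccurate", precise_but_off),
                 ("imprecise, accurate", noisy_but_centered)]:
    print(f"{name}: mean={statistics.mean(xs):.2f}  sd={statistics.stdev(xs):.2f}")
```

The first instrument barely varies from run to run but its average sits well away from the true value; the second is all over the place, yet its average over many measurements lands close to the truth.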

The values above are precise. That is, if you got re-tested on a different chip, the results aren’t going to be much different. The tests are using as input variation on 100,000 to 1 million markers, so a small proportion will give different calls than in the earlier test. But that’s not going to change the end result in most instances, even though these methods often have a stochastic element.

But what about accuracy? I am not sure that old chestnuts about accuracy apply in this case, because the percentages that these services provide are summaries and distillations of the underlying variation. The model of precision and accuracy that I learned would be more applicable to the DNA SNP array which returns calls on the variants; that is, how close are the calls of the variant to the true value (last I checked these arrays are around 99.5% accurate in terms of matching the true state).

What you see when these services pop out a percentage for a given ancestry is the outcome of a series of conscious choices that designers of these tests made keeping in mind what they wanted to get out of these tests. At a high level here’s what’s going on:

  1. You have a model of human population history and dynamics with various parameters
  2. You have data that varies that you put into that model
  3. You have results which come back with values which are the best fit of that data to the model you specified

Basically you are asking the computational framework a question, and it is returning its best answer to the question posed. To ask whether the answer is accurate or not is almost not even wrong. The frameworks vary because they are constructed by humans with different preferences and goals.

Almost, but not totally wrong. You can for example simulate populations whose histories you know, and then test the models on the data you generated. Since you already know the “truth” about the simulated data’s population structure and history, you can see how well your framework can infer what you already know from the patterns of variation in the generated data.
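
Here is a minimal sketch of that kind of check, under heavy simplifying assumptions: two source populations with made-up allele frequencies, one simulated person whose true ancestry fraction we set ourselves, unlinked SNPs, and a plain least-squares estimator rather than anything a real company or ADMIXTURE actually runs. All names are hypothetical.

```python
import random


def simulate_individual(rng, freqs_a, freqs_b, alpha):
    """Diploid genotypes (0, 1 or 2 copies) for someone with fraction alpha of ancestry from A."""
    genotypes = []
    for pa, pb in zip(freqs_a, freqs_b):
        q = alpha * pa + (1 - alpha) * pb  # expected allele frequency for the admixed person
        # Two independent allele draws; True counts as 1 when summed.
        genotypes.append((rng.random() < q) + (rng.random() < q))
    return genotypes


def estimate_alpha(genotypes, freqs_a, freqs_b):
    """Least-squares estimate of the fraction of ancestry from A."""
    num = sum((g / 2 - pb) * (pa - pb) for g, pa, pb in zip(genotypes, freqs_a, freqs_b))
    den = sum((pa - pb) ** 2 for pa, pb in zip(freqs_a, freqs_b))
    return num / den


rng = random.Random(1)  # seeded so the run is repeatable
n_snps = 20_000
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # made-up frequencies
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

true_alpha = 0.13  # the "truth" we chose for the simulated person
person = simulate_individual(rng, freqs_a, freqs_b, true_alpha)
print(f"true={true_alpha:.2f}  inferred={estimate_alpha(person, freqs_a, freqs_b):.3f}")
```

Because the simulated truth is known, the gap between the two printed numbers is a direct read on how well this particular model recovers it.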

Going back to my results, why do my East Asian percentages vary so much? The short answer is that one of the major variables in the model alluded to above is the nature of the reference population set and the labels you give them.

Looking at Bengalis, the ethnic group I’m from, it is clear that in comparison to other South Asian populations they are East Asian shifted. That is, it seems clear I do have some East Asian ancestry. But how much?

The “simple” answer is to model my ancestry as a mix of two populations, an Indian one and an East Asian one, and then see what the values are for my ancestry across the two components. But here is where semantics becomes important: what is Indian and East Asian? Remember, these are just labels we give to groups of people who share genetic affinities. The labels aren’t “real”, the reality is in the raw read of the sequence. But humans are not capable of really getting anything from millions of raw SNPs assigned to individuals. We have to summarize and re-digest the data.

The simplest explanation for what’s going on here is that the different companies have different populations put into the boxes which are “Indian/South Asian” and “East Asian.” If you are using fundamentally different measuring sticks, then there are going to be problems with doing apples to apples comparisons.

My personal experience is that 23andMe tends to give very high percentages of South Asian ancestry for all South Asians. Because “South Asian” is a very diverse category when tests come back that someone is 95-99% South Asian…it’s not really telling you much. In contrast, some of the other services may be using a small subset of South Asians, who they define as “more typical”, and so giving lower percentages to people from Pakistan and Bengal, who have admixture from neighboring regions to the west and east respectively.*

Something similar can occur with East Asian ancestry. If the “donor” ancestral groups are South Asian and East Asian for me, then the proportions of each are going to vary by how close the donor groups selected by the company are to the true ancestral groups. If, for example, Family Tree DNA chose a more Northeastern Asian population than Ancestry DNA, then my East Asian percentage would vary between the two services because I know my East Asian ancestry is more Southeast Asian.
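
A toy calculation shows how the choice of donor reference alone can move the number. Everything here is an assumption for illustration: the allele frequencies are random made-up values, the "drifted proxy" is just the true donor plus noise, and the least-squares fit stands in for whatever the companies really do.

```python
import random


def least_squares_fraction(target, ref_east, ref_south):
    """Best-fit fraction of ref_east when target is modeled as a mix of the two references."""
    num = sum((t - s) * (e - s) for t, e, s in zip(target, ref_east, ref_south))
    den = sum((e - s) ** 2 for e, s in zip(ref_east, ref_south))
    return num / den


rng = random.Random(2)  # seeded so the run is repeatable
n_snps = 5_000
south = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]      # made-up "South Asian" frequencies
true_east = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]  # the actual donor population
# A related but drifted stand-in for the donor, clamped to valid frequencies.
proxy_east = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.15))) for p in true_east]

true_fraction = 0.15  # chosen by us for the toy example
target = [true_fraction * e + (1 - true_fraction) * s for e, s in zip(true_east, south)]

with_true_ref = least_squares_fraction(target, true_east, south)
with_proxy_ref = least_squares_fraction(target, proxy_east, south)
print(f"true donor as reference:    {with_true_ref:.3f}")
print(f"drifted proxy as reference: {with_proxy_ref:.3f}")
```

In this toy setup the same person gets a lower "East Asian" fraction when the reference is a poorer match for the real donor. The size of the effect in real data will depend on the populations and the method, so treat this as a cartoon of the mechanism and nothing more.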

The moral of the story is that the values you obtain are conditional on the choices you make, and those choices emerge from the process of reducing and distilling the raw genetic variation into a manner which is human interpretable. If the companies decided to use the same model, they would come out with the same results.

* I helped develop an earlier version of MyOrigins, and so can attest to this firsthand.

August 28, 2017

When journalists get out of their depth on genetic genealogy

Filed under: DTC personal genomics,Genealogy,Genetics,Personal genomics — Razib Khan @ 7:39 pm

For some reason The New York Times tasked Gina Kolata to cover genetic genealogy and its societal ramifications, With a Simple DNA Test, Family Histories Are Rewritten. The problem here is that to my knowledge Kolata doesn’t cover this as part of her beat, and so isn’t well equipped to write an accurate and in depth piece on the topic in relation to the science.

This is a general problem in journalism. I notice it most often when it comes to genetics (a topic I know a lot about for professional reasons) and the Middle East and Islam (topics I know a lot about because I’m interested in them). It’s unfortunate, but it has also made me a lot more skeptical of journalists whose track record I’m unfamiliar with.* To give a contrasting example, Christine Kenneally is a journalist without a background in genetics who nevertheless is immersed in genetic genealogy, so that she could have written this sort of piece without objection from the likes of me (she did write a book on the topic, The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures, which I had a small role in fact-checking).

What are the problems with the Kolata piece? I think the biggest issue is that she didn’t go in to test any particular proposition, and leaned on the wrong person for the science. She quotes Joe Pickrell, who knows this stuff like the back of his hand. But more space is given to Jonathan Marks, an anthropologist who is quite opinionated and voluble, and so probably a “good source” for any journalist.

Marks seems well respected in anthropology from what I can tell, but he’s also the person who put up a picture of L. L. Cavalli-Sforza juxtaposed with a photo of Josef Mengele in the late 1990s during a presentation at Stanford. Perhaps this is why anthropologists respect him, I don’t know, but I do not like him because of his nasty tactics (I wouldn’t be surprised if Marks had power he would make sure people like me were put in political prison camps, his rhetoric is often so unhinged).

Marks’ quotes wouldn’t be much of an issue if Kolata could figure out when he’s making sense, and when he’s just bullshitting. But she can’t. For example:

…“tells me I’m 95 percent Ashkenazi Jewish and 5 percent Korean, is that really different from 100 percent Ashkenazi Jewish and zero percent Korean?”

The precise numbers offered by some testing services raise eyebrows among genetics researchers. “It’s all privatized science, and the algorithms are not generally available for peer review,” Dr. Marks said.

The part about precise numbers is an issue, though a lot less of an issue with high density SNP-chips (the real issue is sensitivity to reference population and other such parameters). But if a modern test says you are 95 percent Ashkenazi Jewish and 5 percent Korean it really is different from 100% Ashkenazi. Someone who comes up as 5% Korean against an Ashkenazi Jewish background is most definitely of some East Asian heritage. In the early 2000s with ancestrally informative markers and microsatellite based tests you’d get somewhat weird results like this, but with the methods used by the major DTC companies (and in academia) today these sorts of proportions are just not reported as false positives. Marks may not know because this isn’t his area, but Pickrell would have. Kolata probably did not think to double-check with him, but that’s because she isn’t able to smell out tendentious assertions. She has no feel for the science, and is flying blind.

Second, Marks notes that the science is privatized, and it isn’t totally open. But it’s just false that the algorithms are not generally available for peer review. Not all the details of the pipeline are downloadable on GitHub, but the core ancestry estimation methods are well known. Eric Durand, who wrote the original 23andMe Ancestry Composition methodology, presented on it at ASHG 2013. I know because I was there during his session.

You can find a white paper for 23andMe’s method and Ancestry’s. Not everything is as transparent as open science would dictate (though there are scientific papers and publications which also mask or hide elements which make reproducibility difficult), but most geneticists with domain experience can figure out what’s going on and whether it is legitimate. It is. The people who work at the major DTC companies often come out of academia, and are known to academic scientists. This isn’t blackbox voodoo science like “soccer genomics.”

Then Marks says this really weird thing:

“That’s why their ads always specify that this is for recreational purposes only: lawyer-speak for, ‘These results have no scientific standing.’”

Actually, it’s lawyer-speak for “do not sue us, as we aren’t providing you actionable information.” Perhaps I’m ignorant, but lawyers don’t get to define “scientific standing”.

The problem, which is real, is that the public is sometimes not entirely clear on what the science is saying. This is a problem of communication from the companies to the public. I’ve even been in scientific sessions where geneticists who don’t work in population genomics have weak intuition on what the results mean!

Earlier Kolata states:

Scientists simply do not have good data on the genetic characteristics of particular countries in, say, East Africa or East Asia. Even in more developed regions, distinguishing between Polish and, for instance, Russian heritage is inexact at best.

This is not totally true. We have good data now on China and Japan. Korea also has some data. Using haplotype-based methods you can do a lot of interesting things, including distinguish someone who is Polish from Russian. But these methods are computationally expensive and require lots of information on the reference samples (Living DNA does this for British people). The point is that the science is there. Reading this sort of article is just going to confuse people.

On the other hand a lot of Kolata’s piece is more human interest. The standard stuff about finding long lost relatives, or discovering your father isn’t your father. These are fine and not objectionable factually, though they’ve been done extensively before and elsewhere. I actually enjoyed the material in the second half of the piece, which had only a tenuous connection to scientific detail. I just wish these sorts of articles represented the science correctly.

Addendum: Just so you know, three journalists who regularly cover topics I can make strong judgments on, and are always pretty accurate: Carl Zimmer, Antonio Regalado, and Ewen Callaway.

* I don’t follow Kolata very closely, but to be frank I’ve heard from scientist friends long ago that she parachutes into topics, and gets a lot of things wrong. Though I can only speak on this particular piece.

August 10, 2017

But evolution converges!

Filed under: Evolution,Genetics — Razib Khan @ 10:43 pm

Stephen Jay Gould became famous in part for his book Wonderful Life: The Burgess Shale and the Nature of History. By examining the strange creatures in the Burgess Shale formation Gould makes the case that evolution is a highly contingent process, and that if you reran the experiment of life what we’d see might be very different from what we have now.

But Simon Conway Morris, the scientist whose study of the formation inspired Gould’s interpretation, had very different views. Though it can sometimes be churlish, his rebuttal can be found in The Crucible of Creation: The Burgess Shale and the Rise of Animals. Simon Conway Morris does not believe that contingency is nearly as powerful a force as Gould would have you believe. And his viewpoints are influential. Richard Dawkins leaned on him to make the case for convergence in evolution in The Ancestor’s Tale.

This crossed my mind when reading Carl Zimmer’s new column, When Dinosaurs Ruled the Earth, Mammals Took to the Skies:

Today, placental mammals like flying squirrels and marsupials like sugar gliders travel through the air from tree to tree. But Volaticotherium belonged to a different lineage and independently evolved the ability to glide.

They were not the only mammals to do so, it turns out. Dr. Luo and his colleagues have now discovered at least two other species of gliding mammals from China, which they described in the journal Nature.

Dr. Meng said that the growing number of fossil gliders showed that many different kinds of mammals followed the same evolutionary path. “They did their own experiments,” he said.

This ultimately comes down to physics. There are only so many ways you can make an organism that flies or glides. Mammals come to the table with a general body plan, and that can be modified only so many different ways.

This is not a foolproof data point in favor of convergence as opposed to contingency. Frankly these are often vague verbal arguments which are hard to refute or confirm. And even molecular evolutionary analyses come to different conclusions. It may be that we are asking the wrong question. But it does suggest that evolution may work in a much narrower range of parameters as time progresses because of the winnowing power of selection.

August 8, 2017

Jon Snow + Daenerys Targaryen far creepier genetically than you know

Filed under: A Game of Thrones,Game of Thrones,Genetics — Razib Khan @ 10:51 pm
Credit: poly-m (deviantART)

If you have been following Game of Thrones you have been noticing that there is a brewing romance between Jon Snow, King in the North, and Daenerys Targaryen, the aspiring claimant to her father’s Iron Throne.

Of course there is a twist to all of this: unbeknownst to either, Jon Snow’s biological father is Daenerys’ dead brother, Rhaegar. This means that Daenerys is Jon Snow’s aunt.

Long-time followers of the world of Game of Thrones are aware that incest between near relations is neither unknown nor shocking. But there is a non-trivial detail which it is important to note. Jon and Daenerys are far more closely related than typical aunts and nephews.

The reason is simple: Daenerys and her brother were the products of two generations of sibling incest. Incest results in inbreeding, and inbreeding, as you know, results in loss of genetic diversity. By Daenerys’s generation the coefficient of relationship between herself and her brothers was much higher than normal.

To be concrete, the coefficient of relationship of full-siblings is 0.50. That of half-siblings 0.25. Identical twins? Obviously 1.0. Another way to think about this is how much of the genome any two individuals share in terms of long tracts of inheritance from recent ancestors. On the whole siblings share about half of their genomes in such a fashion. After two generations of inbreeding Daenerys and Rhaegar have a coefficient of relationship of 0.727 (using Wright’s method). They’re not identical twins, obviously, but their genetic relationship is far closer than full-siblings!

Don’t let the mother of dragons ride you Jon!

Dividing this in half gives 0.36 as the coefficient of relationship between Jon and Daenerys, as opposed to 0.50 for full-siblings and 0.25 for a conventional aunt-nephew. Jon and Daenerys have almost the same genetic relationship as 3/4 siblings; two individuals who share a common parent, like half-siblings, but whose unshared parents are first-order relatives (full-siblings or parent-child).
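For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes the pedigree starts from two unrelated, non-inbred founders, that each subsequent pairing is between full siblings, and that Jon’s mother is unrelated to the Targaryens. The function names (sibling_line, wright_r) are my own labels for illustration, not anything from a library.

```python
from math import sqrt

def sibling_line(n_generations):
    """Kinship along a line of repeated full-sibling matings.

    Generation 1 is a pair of full siblings born to two unrelated,
    non-inbred founders (an assumption); every later generation is a pair
    of full siblings born to the previous pair. Returns (phi_self, phi_sib):
    an individual's kinship with itself and with its full sibling.
    """
    phi_self, phi_pair = 0.5, 0.0  # the unrelated founder couple
    for _ in range(n_generations):
        child_self = (1 + phi_pair) / 2                 # (1 + F) / 2, F = parents' kinship
        child_pair = (2 * phi_self + 2 * phi_pair) / 4  # average over the four parent pairings
        phi_self, phi_pair = child_self, child_pair
    return phi_self, phi_pair

def wright_r(phi_xy, f_x, f_y):
    """Wright's coefficient of relationship from kinship and inbreeding coefficients."""
    return 2 * phi_xy / sqrt((1 + f_x) * (1 + f_y))

# Ordinary full siblings: one sibship, no inbreeding.
_, phi_plain = sibling_line(1)
r_full_sibs = wright_r(phi_plain, 0.0, 0.0)

# Daenerys and Rhaegar: the third sibship in the line
# (two sibling matings sit between them and the founders).
phi_self, phi_sib = sibling_line(3)
f_targaryen = 2 * phi_self - 1  # inbreeding coefficient of each sibling
r_dany_rhaegar = wright_r(phi_sib, f_targaryen, f_targaryen)

# Jon: child of Rhaegar and an unrelated, non-inbred mother (assumption).
r_halved = r_dany_rhaegar / 2          # the shortcut used in the text
phi_jon_dany = (phi_sib + 0.0) / 2     # average of his parents' kinship with Daenerys
r_jon_dany = wright_r(phi_jon_dany, 0.0, f_targaryen)

print(f"full siblings:       {r_full_sibs:.3f}")
print(f"Daenerys-Rhaegar:    {r_dany_rhaegar:.3f}")
print(f"halved for Jon:      {r_halved:.3f}")
print(f"Wright, Jon-Daenerys: {r_jon_dany:.3f}")
```

Under these assumptions the sibling value comes out at 0.727 and halving it gives 0.36, as above. Applying Wright’s normalization directly to the Jon-Daenerys pair gives a somewhat higher figure, about 0.43, because Jon himself is not inbred. Either way the pair sits well above a conventional aunt and nephew.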

Not Jaime & Cersei creepy, but still creepy.

Addendum: Though Daenerys is quite inbred, Jon is not at all. One generation of outbreeding can eliminate all inbreeding.

August 2, 2017

When the ancestors were cyclops

Filed under: Genetics,Greek History,Greeks,Minoans,Mycenaeans — Razib Khan @ 4:44 pm


The Greeks are important because Western civilization began with Greece. And therefore modern civilization. I don’t think the Greeks were “Western” truly; my own preference is to state that the West as we understand it is really just Latin Christendom, which emerged in the late first millennium A.D. in any coherent fashion. Yet without Classical Greece and its accomplishments the West wouldn’t make any sense.

But here I have to stipulate Classical, because Greeks existed before the Classical period. That is, a people who spoke a language that was recognizably Greek and worshipped gods recognizable to the Greeks of the Classical period. But these Greeks were not proto-Western in any way. These were the Mycenaeans, a Bronze Age civilization which flourished in the Aegean in the centuries before the cataclysms outlined in 1177 B.C.

The issue with the Mycenaean civilization is that its final expiration in the 11th century ushered in a centuries-long Dark Age. During this period the population of Greece seems to have declined, and society reverted to a simpler structure. By the time the Greeks emerged from this Dark Age much had changed. For example, they no longer used Linear B writing. Presumably this technique was passed down along lineages of scribes, whose services were no longer needed, because the grand warlords of the Bronze Age were no longer there to patronize them and make use of their skills. In its stead the Greeks modified the alphabet of the Phoenicians.

To be succinct the Greeks had to learn civilization all over again. The barbarian interlude had broken continuous cultural memory between the Mycenaeans and the Greeks of the developing polises of the Classical period. The fortifications of the Mycenaeans were assumed by their Classical descendants to be the work of a lost race which had the aid of monstrous cyclops.

Of course not all memories were forgotten. Epic poems such as The Iliad retained the memory of the past through the centuries. The list of kings who sailed to Troy actually reflected the distribution of power in Bronze Age Greece, while boar’s tusk helmets mentioned by Homer were typical of the period. To be sure, much of the detail in Homer seems more reflective of a simpler society of petty warlords, so the nuggets of memory are encased in later lore accrued over the centuries.

When antiquarians and archaeologists began to take a look at the Bronze Age Aegean the assumption by many was that the Mycenaeans were not Greek, but extensions of the earlier Minoan civilization. The whole intellectual history here is outlined in Michael Wood’s 1980s documentary In Search of the Trojan War. But suffice it to say that many were shocked when Michael Ventris deciphered Linear B, and found that it was clearly Greek!

The surprise here was partly due to the fact that though Mycenaean cultural remains indicated a different civilization from that of the Minoans, its motifs were clearly inherited from the earlier group. Mycenaeans seemed in many ways to be Minoans in chariots. And the presumption has long been that the Minoans themselves were not an Indo-European group. In fact, the island of Crete had developed early on and become part of the orbit of civilized states from the northern Levant down to Egypt, including Cyprus. Therefore some scholars hypothesized an Egyptian connection.

In any case, the Mycenaeans were Greek. And Homer then most certainly must have transmitted traditions which went back to the Bronze Age.

At this point we can now speak to demographics with some data, as Nature has come out with a paper using ancient DNA from Mycenaeans, Minoans, as well as Bronze Age Anatolians, Genetic origins of the Minoans and Mycenaeans:

The origins of the Bronze Age Minoan and Mycenaean cultures have puzzled archaeologists for more than a century. We have assembled genome-wide data from 19 ancient individuals, including Minoans from Crete, Mycenaeans from mainland Greece, and their eastern neighbours from southwestern Anatolia. Here we show that Minoans and Mycenaeans were genetically similar, having at least three-quarters of their ancestry from the first Neolithic farmers of western Anatolia and the Aegean, and most of the remainder from ancient populations related to those of the Caucasus and Iran. However, the Mycenaeans differed from Minoans in deriving additional ancestry from an ultimate source related to the hunter–gatherers of eastern Europe and Siberia, introduced via a proximal source related to the inhabitants of either the Eurasian steppe or Armenia. Modern Greeks resemble the Mycenaeans, but with some additional dilution of the Early Neolithic ancestry. Our results support the idea of continuity but not isolation in the history of populations of the Aegean, before and after the time of its earliest civilizations.

About 85% of the ancestry of the Minoan samples could be modeled as being derived from Anatolian farmers, the ancestors of the “Early European Farmers” (EEF) that introduced agriculture to most of the continent, and whose heritage is most clear in modern populations among Sardinians. For the three Mycenaean samples the value is closer to 80% (though perhaps high 70s is more accurate).

Now the question though is what’s the balance? For the Minoans the residual is a component which seems to derive from “Eastern Farmer” populations. Additionally the authors note that the Y chromosomes in four out of five individuals in their Mycenaean-Minoan-Anatolian samples are haplogroup J associated with these eastern groups, rather than the ubiquitous G2 of the earlier farmer populations. The authors suggest that in the 4th millennium B.C. there was a demographic event where this ancestral component swept west, and served as the common Mycenaean-Minoan (and Anatolian) substrate.

But the Mycenaean samples (one of which was elite, two of which were not) also have a third component: affinities with steppe populations. One model which presents itself is that there was a pulse out of the Balkans, and this was part of the dynamic described in Massive migration from the steppe was a source for Indo-European languages in Europe. But another model, which they could not reject, is that the steppe affinity came from the east, perhaps from a proto-Armenian population. Additionally, they did not find much steppe ancestry in the Anatolian samples at all.

My own preference is for a migration through the Balkans. It seems relatively straightforward. As for why the Anatolian samples did not have the steppe ancestry, the authors provide the reasonable supposition that Indo-European in Anatolia branched off first, and the demographic signal was diluted over successive generations. Perhaps. But another aspect of Anatolia is that it seems the Hittites, the Nesa, were never a numerous population in comparison to the Hatti amongst whom they lived. Perhaps a good model for their rise and takeover may be that of the post-Roman West and the Franks in Gaul.

Then the question becomes how does a less numerous people impose their language on a more numerous one? This happens. See the Hungarians for an example. In fact the paper which covered the other end of the Mediterranean, The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, suggests that language shift can occur in unpredictable ways. On the one hand Basques seem to have mostly Indo-European Y chromosomes, but their whole genome ancestry indicates less exogenous input than their neighbors. Speaking of which, we know that by the Classical period large regions of western Spain were dominated by Celtic-speaking peoples, but the genetic imprint of the Indo-Europeans is still very modest in the Iberian peninsula.

I think what we’re seeing here is the difference between Indo-European agro-pastoralists arriving to a landscape of relatively simple societies with more primal institutions, and those who migrated into regions where local population densities are higher and social complexity is also greater. This higher social complexity means that external elites can take over a system, as opposed to an almost animal competition for resources as seems to have occurred in Northern Europe.

Finally, at the end of the supplements there is an analysis of the physical features of the Minoans and Mycenaeans. There’s not much that’s surprising. The Minoans and Mycenaeans were a dark haired and dark eyed folk. Why should this surprise us at all? We actually have self representations of them! That’s what they look like. If anything they were darker than modern Greeks (small sample size means power to draw conclusions is not high). Why?

Two reasons that come to mind: natural selection, and the fact that modern Greeks seem to be shifted to continental Europeans to their north, likely due to migration. My number one contender here is the Sclaveni, Slavic tribes which pushed into much of Greece in the second half of the 6th century A.D. (though a minority of Greek samples I’ve seen don’t exhibit much skew toward Slavs at all).

In the future with more samples and more genomes we’ll know more. But I think this work emphasizes that when it comes to Europe most of the demographic patterns we see around us date to the Bronze Age or earlier.

July 28, 2017

The Indo-Aryan question nearing resolution

Filed under: Genetics,science — Razib Khan @ 5:50 pm


India Today published my review of the current state of the genetics and genomics of the Indian subcontinent, and what it can tell us about the ethnogenesis of South Asians generally. In the piece I tried to be very circumspect and stick to what we know with a high, if not perfect, degree of certainty. Here I will add some comments where I reduce the threshold of certainty somewhat. That is, I’m going to include here my beliefs where I think I’m right, but in some details wouldn’t be surprised if I was wrong.

First, the title is Aryan wars: Controversy over new study claiming they came from the west 4,000 years ago. Writers don’t get to choose titles, and this is not one I would have chosen. But I am not in a position to care or know what draws clicks. Let’s note that this “controversy” is restricted mostly to India. Outside of India it’s not controversial, but a matter of the science, because people don’t have any political or social investment in the topic. It reminds me of debates about genetics and intelligence in the West, where emotions get overwrought and lies fly wildly with abandon.*

Second, there is a reference in the figures to an “Out of India” (OIT) model. That is, the Aryans migrated out of India, and implicitly the Indo-European languages derive from South Asia. I don’t think this theory has any support at all. That is, I think it is rather clear that proto-Indo-European probably emerged neither in Europe proper, nor in South Asia, but in the Inner Eurasian spaces between. But for an Indian audience ignoring OIT would seem a peculiar lacuna, so there was a reference added to the figure on that account (I pushed back against this, but do not make ultimate decisions on figures).

But I do think it was plausible up until 2009’s Reconstructing Indian Population History to suggest that most modern South Asian ancestry dates to the Pleistocene. In this framework the Indo-Europeanization of the subcontinent was primarily a cultural one, where small groups of Central Asians imposed their language on the native population. What the genome-wide work has shown is that South Asians are the product of a large-scale mixing process between a population very distant from West Eurasians (“Ancestral South Indians”, ASI) and a population which was indistinguishable from other West Eurasians (“Ancestral North Indians”, ANI).

Since ANI is indistinguishable from West Eurasians I hold it is clearly a West Eurasian population in provenance. Those who reject this position from a scientific perspective believe that there could have been some sort of continuous zone of “ANI-like” habitation from northwestern South Asia up into northern Inner Eurasia (and perhaps toward West Asia as well) dating from the late Pleistocene. I do not believe that this is plausible, and I will tell you that prominent researchers to whom I have brought up this idea are somewhat incredulous.**

Third, there are major unresolved issues genetically in relation to the dates and the total number of mixing populations. I am quite confident saying around half of the total South Asian genomic ancestry today derives from populations who were living outside of South Asia on the Holocene-Pleistocene boundary 11,700 years ago. Much of that ancestry probably flourished between the Caucasus and Zagros mountains. The remainder somewhere in the vast swath of territory between the Baltic and Siberia (perhaps further south, toward the Pamirs?).

But I am not confident of the relative balances of contribution to the ANI. It does seem that the northern component, which is derived in part from the southern component, is much more prominent in upper castes and northwestern populations. In contrast the southern component is found throughout the subcontinent.

In Genomic insights into the origin of farming in the Near East there is analysis of South Asia in the supplements. The author concludes that ANI cannot be modeled as a single population (Zack Ajmal and I were saying this in 2010). The top hits for the sources of ANI tend to be the genomic sample from the Zagros, in western Iran (before subsequent admixture with Levantine farmers), and a population similar to the Yamna culture of the steppe. The issue seems to be that later steppe populations which harbor a fair amount of “Early European Farmer” ancestry (e.g., LBK in Central Europe), likely due to back migration, aren’t good model fits.

Below are two plots, one showing a scatter of South Asian groups with their Iran_N (a sample from ~10,000 years ago) vs. Yamna (from ~5,000 years ago), and another with the ratios.

   

DO NOT TAKE THE PROPORTIONS LITERALLY. My intuition is that these models are overestimating the proportion of steppe ancestry, but my confidence in my intuition is low.

There are two groups enriched for Iran_N ancestry.

  1. Lower caste groups, especially from South India.
  2. Populations in southern Pakistan.

The reasons differ. If you have done genetic analysis of the Pakistani populations it seems quite obvious that, unlike other groups in South Asia, Pakistani groups facing the Arabian Sea across from Oman have genuine Near Eastern ancestry. This affinity declines rather rapidly as you go north in Pakistan. Notice though one South Indian group: Jews from Cochin. This population clearly has recent Near Eastern ancestry.

The Kharia are an Austro-Asiatic Munda group. For whatever reason Austro-Asiatic groups seem to consistently have very little steppe ancestry. The Mala are Dalits from South India. The further up you go on the modal Iran_N-Yamna cline you see the populations are either upper caste, or, they are from the far northwest of the subcontinent.

The conclusion I derive from this is that first there was an early migration of West Eurasian populations consisting of Iranian farmers. This group mixed with the ASI element. The Indo-Aryans, who probably correlate with the Yamna-like component, arrived later as an overlay (and nearly half of their ancestry was derived from Iranian farmers). Then many South Asian populations have modifications on this base model of compound ANI + ASI; Munda and Bengali have later East Asian ancestry, while populations on the Arabian Sea have Near Eastern ancestry.

Fourth, the story in India Today leans heavily on Y chromosomes R1a1a. It is true we are Lords of the Steppe and destined to drive out enemies before us. But, it is not the primary story. And yet Y chromosomal phylogenies are easy for the public to understand. But they only make sense in light of the above framework. R1a1a is found in South Indian tribal populations. It seems likely that Indo-Aryan paternal lineages were highly invasive across the subcontinent, just as they were in Europe. In many cases they likely extended far beyond domains where Indo-European acculturation occurred.

I’m probably wrong on some of the details. But I suspect the final story will not be so different from this.

Finally, I will mention the cultural element here. There is a fair amount of the discussion of the form “so you are saying the ancestors of Indians are Europeans?” or “does this mean Hinduism is not Indian?”

The piece was about genetics and demography, not my opinions about culture. So I will say this:

  1. The “West” as an entity is no older than Classical Greece. 500 BC. My own personal position, strongly held, is that the West should indicate cultures and societies which descend from the European societies which adhered to the Western Church around ~1000 AD (some nations, like Lithuania, became absorbed into this cultural complex hundreds of years later). So Russia is not the West. And Merovingian Francia is not the West.
  2. Indian civilization of what we term the Hindu variety coalesced in the period between 500 BC and 500 AD, from before the Mauryas up to the Guptas. Obviously the period before 1000 BC was important in setting the ground-work, but I do not believe it was Indian as we’d understand it in anything but the geographical sense, nor was it Hindu in any way we’d recognize it today (similarly, Shang dynasty China was not China as we’d understand, which came into being after 500 BC).

These positions mean that I think nationalist passions are in the “not even wrong” category. Indian Hindu civilization is indigenous by definition, since it was synthesized in situ on the edge of historical perception and attestation (for the record, I think Adi Shankara was critical in the completion of a crystallized self-conception of Hindu religio-philosophical thought, but its origins predate him). Similarly, Indian civilization was not seeded by white Europeans because white Europeans were only coming into being in Europe when the Indus Valley civilization was collapsing.

That is all (for now).

Addendum: The first tranche of ancient DNA should be out in a few months. Also, there is another paper on Indian genetics in the works from the usual suspects. There won’t be anything totally surprising (or so I’ve been told).

* By lies, I mean the contention that intelligence is an “invalid” instrument in relation to predictiveness, or, if it is valid, it is not genetically heritable. People routinely lie about these facts in discussion or spread lies because there are socially preferred positions which they conform to. Similarly, many questions about Indian history seem to hinge on widely promoted lies.

** This model needs to also confront the massive mixing of the last 4,000 years. If it is true then it is ASI which is most likely intrusive, because it is not credible that these two populations were in close proximity for tens of thousands of years without exchanging genes.

July 26, 2017

The future will be genetically engineered

Filed under: Genetics,Genomics — Razib Khan @ 4:04 pm


If the film Rise of the Planet of the Apes had come out a few years later I believe there would have been mention of CRISPR. Sometimes science leads to technology, and other times technology aids in science. On occasion the two are one and the same.

The plot I made above shows that in the first five years of the second decade of the 21st century CRISPR went from being an obscure aspect of bacterial genetics to ubiquitous. Friends who had been utilizing “advanced” genetic engineering methods such as TALENs and zinc fingers switched overnight to a CRISPR/Cas9 framework.

As I’ve said before the 2010s are the decade when “reading” the genome becomes normal. We really don’t know what the CRISPR/Cas9 technology is capable of. It’s early years yet. With that, First Human Embryos Edited in U.S. Technically they’re single-celled zygotes. The science itself is not astounding. Rather, it is that the human Rubicon has been crossed in the United States. As indicated in the article there has been some jealousy about what the Chinese have been able to do because of a different cultural and regulatory framework.

There are those calling for a moratorium on this work (on humans). I’m not in favor or opposed. Rather, my question is simple: if CRISPR/Cas9 makes genetic engineering cheap, easy, and effective, how exactly are we going to enforce a world-wide moratorium? A Butlerian Jihad?

Note: I know that people are freaking out about humans + genetic engineering. But most geneticists I know are more excited about the prospects of non-human work, since human clinical trials are going to be way in the future. Over 20 years since Dolly it’s notable to me that no human has been cloned from adult somatic cells yet.

June 24, 2017

Indian genetic history: before the storm

Filed under: Genetics,History,India — Razib Khan @ 2:52 pm

Over at Brown Pundits I’ve mentioned the continuing simmer of controversy over a recent piece, How genetics is settling the Aryan migration debate. This has prompted responses in the Indian media from a Hindu nationalist perspective. One of these notes that the author of the piece above cites me, and then goes on to observe I was fired from The New York Times a few years ago due to accusations of racism (also, there is the implication that I’m just a blogger and we should trust researchers with credibility like Gyaneshwer Chaubey; well, perhaps he should know that Gyaneshwer Chaubey considers me “unbiased” according to an email exchange which I had with him last week [we all have biases, so I think he’s wrong in a literal sense]).

I was a little surprised that a right-wing magazine would lend legitimacy to the slanders of social justice warriors, but this is the world we live in. To those who believe everything written about me in the media: I invite you to submit your name and background to me. I have contacts in the media and can get things written if I so choose. Watch me write something which is mostly fact, but can easily be misinterpreted by those who Google you, and watch how much you value the objective “truth-telling” power of the press.

There’s a reason so many of us detest vast swaths of the media, though to be fair we the public give people who don’t make much money a great deal of power to engage in propaganda. Should we be surprised they sensationalize and misrepresent with no guilt or shame? I have seen most of those who snipe at me in the comments disappear once I tell them that I know what their real identity is. Most humans are cowards. I have put some evidence into the public record to suggest that I’m not.

Perhaps more strange for me is that the above piece was passed around favorably by Sanjeev Sanyal, who I was on friendly terms with (we had dinner & drinks in Brooklyn a few years back). I asked him about the slander in the piece and he unfollowed me on Twitter (a friend of Hindu nationalist bent asked Sanjeev on Facebook about the article’s attack on me, but the comment was deleted). It shows how strongly people feel about these issues.

I’m in a weird position because I’m brown and have a deep interest in Indian history. But that interest in Indian history isn’t because I’m brown, I’m pretty interested in all the major zones of the Old World Oikoumene. Aside from some jocular R1a1a chauvinism I don’t have much investment personally (I just told said Hindu nationalist friend who turns out to be R2 to clean my latrine; joking of course, though I’m sure he resents that I’m descended on the direct paternal line from the All-Father & Lord of the Steppes and he is not!).

In the aughts I accepted the model outlined in 2006’s The Genetic Heritage of the Earliest Settlers Persists Both in Indian Tribal and Caste Populations. But to be frank it always struck me as a little confusing because the tentative autosomal data we had suggested that many South Asians were closer to West Eurasians than deep divergences dating to the Last Glacial Maximum would suggest. Since I’ve written something like 5 million words in 15 years, I actually can check if I’m remembering correctly. So here’s a post from 2008 where I express reservations about the idea of long term deep heritage of Indians separate from other West Eurasians. The reason I was so impressed by 2009’s Reconstructing Indian Population History is that it resolved the paradox of South Asian genetic relatedness.

To recap, Reich et al. proposed that modern Indians (South Asians) could be modeled as a two way mixture between two distinct populations with separate evolutionary genetic histories, Ancestral North Indians and Ancestral South Indians (ANI and ASI). How distinct? ANI were basically another West Eurasian population, while ASI was likely nested in the clade with Eastern Non-Africans. Additionally, there was a NW-to-SE and caste admixture cline. In other words, the higher you were on the caste ladder the more ANI you had, and the closer your ancestors were to the north and west, the more ANI you had. The difference between Y and mtDNA, male and female, could be explained by sex-biased migration.
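To make the idea of a two way mixture concrete, here is a toy sketch in Python. It is not the method of Reich et al., who had no unadmixed ANI or ASI samples to work with and relied on more indirect statistics. It simply assumes we know the allele frequencies of both sources, treats a mixed group’s frequency at each SNP as a weighted average of the two, and recovers the weight by least squares. All frequencies and group labels below are invented for illustration, and mixture_fraction is a hypothetical helper name.

```python
def mixture_fraction(p_mix, p_a, p_b):
    """Least-squares estimate of alpha in p_mix ~ alpha * p_a + (1 - alpha) * p_b.

    Each argument is a list of allele frequencies over the same SNPs.
    """
    numerator = sum((m - b) * (a - b) for m, a, b in zip(p_mix, p_a, p_b))
    denominator = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    return numerator / denominator

# Invented allele frequencies at five SNPs for the two source populations.
p_ani = [0.10, 0.80, 0.35, 0.60, 0.05]
p_asi = [0.70, 0.20, 0.90, 0.15, 0.50]

def simulate_group(alpha):
    """Expected frequencies in a group with ANI fraction alpha (no drift, no noise)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_ani, p_asi)]

# Two hypothetical groups at different points along the cline.
groups = {
    "northwestern, upper caste (hypothetical)": simulate_group(0.7),
    "southern, lower caste (hypothetical)": simulate_group(0.4),
}

estimates = {name: mixture_fraction(freqs, p_ani, p_asi)
             for name, freqs in groups.items()}

for name, alpha_hat in estimates.items():
    print(f"{name}: estimated ANI fraction {alpha_hat:.2f}")
```

The cline described above then amounts to the estimated fraction sliding up or down as you move across geography and caste. Real data add drift, sampling noise, and the problem that neither source population exists in unmixed form today.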

But there were still aspects of the paper which I had reservations about. After all, it was a model.

  • Models are imperfect fits onto reality. The idea of mass migration seemed ridiculous to me at the time, because even by the time of the Classical Greeks it was noted that India was reputedly the most populous land in the world (to their knowledge). But ancient DNA has convinced me of the reality of mass migrations.
  • I wasn’t sure about the nature of the closest modern populations to the ANI. The researchers themselves (in particular, Nick Patterson) told me that the relatedness of ANI to Europeans was very close (on the order of intra-European differences). But modern Indians do not look to be descended from a population that is half Northern European physically. Again, ancient DNA has shown that there was lots of population turnover, and it turns out that Europeans and ANI were likely both compounds and mixed daughter populations of common ancestors (also, typical European physical appearance seems to have emerged in situ over the past 5,000 years).
  • The two way admixture model seemed too simple. I had run some data and it struck me that North Indian populations like Jats had something different than South Indian groups like Pulayars. In 2013 Priya Moorjani’s paper pretty much confirmed that it was more than a two way admixture along the ANI-ASI cline.

This March BMC Evolutionary Biology published Silva et al’s A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals. It has made a huge splash in India, arguably triggering the write up in The Hindu. But for me it was a bit ho-hum. If you read my 2008 post it is pretty clear that I suspected the most general of the findings in this paper at least 10 years back. It is nice to get confirmation of what you suspect, but I’m more interested to be surprised by something novel.

Nevertheless A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals has come in for lots of repeated attack in the right-wing Indian press. This is unfair, because it is a rather good paper. I suspect that it wasn’t published in a higher ranked journal because most scientists don’t consider the history of India to be that important, and they didn’t really apply new methods, as opposed to bringing a bunch of data and methods together (in contrast, the 2009 Reich et al. paper was one of the first publications which showed how to utilize “ghost populations” in explicit phylogenetic models with relevance to human demographic history).

As it happens I will be writing up my thoughts in detail in an article for a major Indian publication (similar circulation numbers as The Hindu). This has been in talks for over six months, but I’ve been busy. But a month or so ago I thought it was time that I put something into print for the Indian audience, because I felt there was some misrepresentation going on (i.e., the Aryan invasion theory has not been refuted by genetics, but this is what many Indians assert).

For many years people have told me there are certain topics that shouldn’t be talked about. I have offended people greatly. There are many things people do not want to know. I have come to the conclusion this is not an entirely indefensible viewpoint (though if you accept this viewpoint, I think acceptance of authoritarianism is inevitable, so I hope people will toe the line when the new order arrives; knowing their personalities I think they will conform fine). But my nature is such that I continue to have nothing but contempt for the duplicitous and craven manner in which people go about these sorts of private conversations. I assume that as someone with the name “Razib Khan” I will be attacked vociferously by Hindu nationalists, who will no doubt make recourse to the Left-wing hit pieces against me to undermine my credibility. The fact that these groups are fellow travelers should tell us something, though I will leave that as an exercise for the reader.

I will write my piece that reflects the science as I believe it is, without much consideration of the attacks. That is rather easy for me to do in part because I live in the United States, where denigrating the deeply held views and self-esteem of Hindu nationalists is not sensitive or politically protected (unlike say, Muslims). And Hindu nationalists are less likely to kill me by orders of magnitude than Muslim radicals, and they have far less purchase in this nation than the latter (though you may be interested to know that very conservative Muslims follow me on Twitter; they’re actually more open-minded than many SJWs to be entirely honest).

Let me go over some general points that I see coming up over and over on the relationship between Indian (pre)history and genetics in the critiques.

One of the major critiques has to do with the nature of R1a-Z93 and its subclades. Basically this Y chromosomal haplogroup, the greatest that has ever been known, exhibits a strong signature of very rapid expansion over the past 4,000 years or so. It is divided from Z282. While Z93 is found in South Asia, Central Asia, and Siberia, Z282 is European, with its dominant subclade the one associated with Eastern Europeans. Both of these clades of R1a have gone through massive expansion. In the Altai region R1a is 40% of the heritage of peoples who are now predominantly East Eurasian today. But they are Z93. Additionally, ancient DNA from the Pontic Steppe dated ~4,000 years ago from Srubna remains is Z93, as are Scythian remains from the Iron Age.

Much of the argument comes down to dating, and citing papers that give deep coalescence numbers between different branches of R1a1a. Hindu nationalists and their fellow travelers point to recent papers which give dates >10,000 years ago, and so place the origin of Z93 plausibly in the Pleistocene. The problem is that Y chromosomal coalescence dating is something of a mug’s game. Often they use microsatellite data whose mutational rates are highly uncertain. In contrast, using SNP data, which has a slower mutation rate but requires a lot more data, you get a TMRCA (time to the most recent common ancestor) between Z93 and Z282 around ~5,800 years ago. But coalescence estimates often have wide confidence intervals of thousands of years. And even with these intervals, the assumptions you make (e.g., mutation rate) strongly influence your midpoint estimate.
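To make the point about assumptions concrete, here is a minimal sketch of how a pairwise coalescence date moves with the assumed mutation rate. It is not the method of any of the papers under discussion. The function name, the number of SNP differences, the number of callable sites, and the three mutation rates are all hypothetical values chosen for illustration.

```python
# Illustrative only: every number below is hypothetical, not taken from a paper.

def tmrca_years(snp_differences, callable_sites, rate_per_site_per_year):
    """Estimate the time to the most recent common ancestor of two Y lineages.

    Mutations accumulate along both branches since the split, so the expected
    number of differences is 2 * rate * sites * T. Solving for T gives the date.
    """
    return snp_differences / (2.0 * rate_per_site_per_year * callable_sites)


differences = 90          # hypothetical SNP differences between the two lineages
sites = 10_000_000        # hypothetical number of callable Y-chromosome sites

# The same data, dated under three assumed mutation rates (per site per year).
for rate in (0.6e-9, 0.8e-9, 1.0e-9):
    estimate = tmrca_years(differences, sites, rate)
    print(f"rate={rate:.1e}: TMRCA ~ {estimate:,.0f} years")
```

Nothing about the data changes between the three lines of output, only the assumed rate, and the midpoint moves by thousands of years.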

The Y chromosomal data is powerful, but its interpretation is still buttressed upon other assumptions. The really big picture framework is the nature of ancient genome-wide variation across Eurasia. Lazaridis et al. 2016 condition us to a prior where much of Eurasia was subject to massive population-wide genetic changes since the Holocene. Therefore, I am much less surprised if there was massive genetic change in India relatively recently. The methods in Priya Moorjani’s paper and in other publications make it obvious that mixture was extensive in South Asia between very distinct groups until ~2,000 years ago. In fact, Moorjani et al. use patterns of variation across the genome to arrive at a number of two to four thousand years ago as the period of massive admixture.
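The general logic behind dating admixture from genome-wide patterns is that the linkage disequilibrium created by mixing decays with genetic distance at a rate set by the number of generations since the mixture. Below is a toy sketch of that idea, not the actual pipeline of Moorjani et al. The decay curve is synthetic and noise-free, the function name is my own, and the 29-year generation time is an assumption.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Least-squares slope of log(LD) against genetic distance.

    If LD ~ A * exp(-n * d), then log(LD) = log(A) - n * d, so the negated
    slope of the fitted line estimates n, the generations since admixture.
    """
    logs = [math.log(v) for v in ld_values]
    k = len(distances_morgans)
    mean_d = sum(distances_morgans) / k
    mean_l = sum(logs) / k
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances_morgans, logs))
    var = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -cov / var


# Synthetic decay curve generated with a known answer (100 generations).
true_generations = 100
distances = [0.001 * i for i in range(1, 51)]   # 0.1 to 5 centimorgans, in Morgans
ld_curve = [0.2 * math.exp(-true_generations * d) for d in distances]

estimated = fit_generations(distances, ld_curve)
years_per_generation = 29                       # assumed generation time
print(f"~{estimated:.1f} generations, ~{estimated * years_per_generation:,.0f} years")
```

Real data are noisy and may reflect several pulses of mixture, which is part of why published estimates come as ranges rather than single dates.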

Though we don’t have relevant ancient DNA from India proper to answer any questions yet, we do have ancient DNA from across much of Europe, Central Asia, and the Near East. What they show is that Indian populations share ancestry from both Neolithic Iranians and peoples of the Pontic steppe, who flourished ~5 to ~10,000 years ago. To some extent the latter population is a daughter population of the former…which makes things complicated. Conversely, no West Eurasian population seems to harbor ancient signals of ASI ancestry.

One scientist who holds to the position that most South Asian ancestry dates to the Pleistocene argued to me that we don’t know if ancient Indian samples from the northwest won’t share even more ancestry than the Iranian Neolithic and Pontic steppe samples. In other words, ANI was part of some genetic continuum that extended to the west and north. This is possible, but I do not find it plausible.

The reasons are threefold. First, it doesn’t seem that continuous isolation-by-distance works across huge and rugged regions of Central Eurasia. Rather, there are demographic revolutions, and then relative stasis as the new social-cultural environment crystallizes. This inference I’m making from ancient DNA and extrapolating. This may be wrong, but I would bet I’m not off base here.

Second, it strikes me as implausible that there was literally apartheid between ASI and ANI populations for the whole Holocene right up until ~4,000 years before the present. That is, if Northwest India was involved in reciprocal gene flow with the rest of Eurasia over thousands of years I expect there should have been some distinctive South Asian ASI-like ancestry in the ancient DNA we have. We do not see it.

Third, one of the populations with strong affinities to some Indian populations are those of the Pontic steppe. But we know that this group itself is a compound of admixture that arose 5,000-6,000 years ago. Because of the complexity of the likely population model of ANI this is not definitive, but it seems strange to imagine that ANI could have predated one of the populations with which it was in genetic continuum as part of a quasi-panmictic deme.

Finally, many of the critiques involve evaluation of the scientific literature in this field. Unfortunately this is hard to do from the outside. Citing papers from the aughts, for example, is not wrong, but evolutionary human population genomics is such a fast moving field that even papers published a few years ago are often out of date.

Many are citing a 2012 paper by a respected group which argues for the dominant model of the aughts (marginal population movement into South Asia). One of their arguments, that Central Asian migrants should have East Asian ancestry, is a red herring since it is well known that this dates to the last ~2,000 years or so (we know more now with ancient DNA). But the second point that is more persuasive in the paper is that when they look at local ancestry of ANI vs. ASI in modern Indians, the ANI haplotypes are more diverse than West Eurasians, indicating that they are not descendants but rather antecedents (usually the direction of ancestry is from more diverse to less due to subsampling).

There are two points that I have to make here. First, local ancestry analysis is difficult, so I would not be surprised if they integrated ASI regions into ANI and so elevated the diversity in that way (though they think they’ve taken care of it in the paper). Second, if the ANI are a compound of several West Eurasian groups then we expect them to be more diverse than their parents. In other words, the paper is refuting a model which is almost certainly incorrect, but the alternative hypothesis is not necessarily the one they are supporting within the paper.
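The second point is just arithmetic on allele frequencies. A minimal sketch, with made-up frequencies at three sites in two hypothetical source populations and an assumed 50/50 mixture: the expected heterozygosity of the mixed population is never below the ancestry-weighted average of its sources, and where the sources differ enough it exceeds both of them.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def admixed_frequency(p_a, p_b, fraction_a):
    """Allele frequency in a population drawing fraction_a of its ancestry from A."""
    return fraction_a * p_a + (1.0 - fraction_a) * p_b

def mean_h(freqs):
    """Average expected heterozygosity across sites."""
    return sum(heterozygosity(p) for p in freqs) / len(freqs)


# Hypothetical allele frequencies at three sites in two source populations.
source_a = [0.10, 0.50, 0.90]
source_b = [0.60, 0.20, 0.30]
mixed = [admixed_frequency(a, b, 0.5) for a, b in zip(source_a, source_b)]

print(f"source A: {mean_h(source_a):.3f}")
print(f"source B: {mean_h(source_b):.3f}")
print(f"mixed:    {mean_h(mixed):.3f}")
```

So higher diversity in a population is what a compound origin predicts. By itself it does not tell you which population is the ancestor.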

But there are many things we do not know still. Many free variables which we haven’t nailed down. Here are some major points:

  • Y chromosomal lineages have a correlation with ethno-linguistic groups, but the correlation is imperfect. R1b and R1a seem correlated with Indo-European groups, but both these are found in high proportions in groups which are putatively most “pre-Indo-European” in origin (e.g., Basques, Sardinians, and South Indian tribals and non-Brahmin Dravidian speaking groups). Also, haplogroups like I1 in Europe expand with Indo-Europeans locally, suggesting there was lots of heterogeneity in Indo-Europeans as they expanded. In other words, Indo-European expansion in relation to powerful paternal lineages did not always correlate with ethno-linguistic change.
  • There are probably at minimum two Holocene intrusions from the northwest into South Asia, but this is a floor. The models that are constructed always lack power to detect more complexity. E.g., it is not impossible that there were several migrations of Indo-Europeans into South Asia which we can not distinguish genetically over a period of a few thousand years.
  • If one looks over all of South Asia it may be that ASI ancestry in totality is >50% of the total genome ancestry. I don’t have a good guess of the numbers. If this is correct, perhaps most South Asian ancestors 10,000 years ago were living in South Asia (though fertility rates are such in Pakistan that ANI ancestry is increasing right now in relative terms).
  • But, this presupposes that ASI were present in South Asia in totality 10,000 years ago, rather than being migrants themselves. If ancient DNA confirms that ANI were long present in Northwest India, I hold then it is entirely likely that ASI was intrusive to South Asia! The BMC Evolutionary Biology Paper does a lot of interpretation of deep structure in haplogroup M in South Asia. I’m moderately skeptical of this. Europe may not be a good model for South Asia, but there we see lots of Pleistocene turnover.

So where does this leave us? Ancient DNA will answer a lot of questions. Pretty much all scientists I’ve talked to agree on this. My predictions, some of which I’ve made before:

  1. The first period of admixture is old, and dates to the founding of Mehrgarh as an agricultural settlement. The dominant ANI component dates to this period and mixture event, all across South Asia. The presence in South India is due to expansion of these farming populations.
  2. A second admixture event occurred with the arrival of steppe people. Those who argue for the Aryan invasion model posit 1500 BCE as the date. But these people probably were expanding in some form before this date.
  3. We still don’t know who the antecedents for the Indo-Aryans were. Probably they were a compound of different steppe groups, and also other populations which were mixed in (by analogy, in Europe it is obvious now that there was some mixture with the local European farmers and hunter-gatherers as Europeans expanded their frontier westward; the same probably applies for Indo-Aryans and the BMAC).

June 19, 2017

Indian genetics, the never-ending argument

Filed under: Genetics,India,Indian Genetics,Indo-Europeans,science — Razib Khan @ 10:44 pm

I am at this point somewhat fatigued by Indian population genetics. The real results are going to be ancient DNA, and I’m waiting on that. But people keep asking me about an article in Swarajya, Genetics Might Be Settling The Aryan Migration Debate, But Not How Left-Liberals Believe.

First, the article attacks me as being racist. This is not true. The reality is that the people who attack me on the Left would probably attack magazines like Swarajya as highly “problematic” and “Islamophobic.” They would label Hindu nationalism as a Nazi derivative ideology. People should be careful about the sort of allies they make; if you dance with snakes they will bite you in the end. Much of the media lies about me, and the Left constantly attacks me. I’m OK with that because I do believe that the day will come when all the ledgers will be balanced. The Far Left is an enemy of civilization of all stripes. I welcome being labeled an enemy of barbarians. My small readership, which is of diverse ideologies and professions, is aware of who I am and what I am, and that is sufficient. Either truth or power will be the ultimate arbiter of justice.

With that out of the way, there is one thing about the piece that I think is important to highlight:

To my surprise, it turned out that that Joseph had contacted Chaubey and sought his opinion for his article. Chaubey further told me he was shocked by the drift of the article that appeared eventually, and was extremely disappointed at the spin Joseph had placed on his work, and that his opinions seemed to have been selectively omitted by Joseph – a fact he let Joseph know immediately after the article was published, but to no avail.

Indeed, this itself would suggest there are very eminent geneticists who do not regard it as settled that the R1a may have entered the subcontinent from outside. Chaubey himself is one such, and is not very pleased that Joseph has not accurately presented the divergent views of scholars on the question, choosing, instead to present it as done and dusted.

I do wish Tony Joseph had quoted Gyaneshwer Chaubey’s response, and I’d like to know his opinions. Science benefits from skepticism. Unfortunately though the equivocation of science is not optimal for journalism, so oftentimes things are presented in a more stark and clear manner than perhaps is warranted. I’ve been in this position myself, when journalists are just looking for a quote that aligns with their own views. It’s frustrating.

There are many aspects of the Swarajya piece I could point out as somewhat weak. For example:

The genetic data at present resolution shows that the R1a branch present in India is a cousin clade of branches present in Europe, Central Asia, Middle East and the Caucasus; it had a common ancestry with these regions which is more than 6000 years old, but to argue that the Indian R1a branch has resulted from a migration from Central Asia, it should be derived from the Central Asian branch, which is not the case, as Chaubey pointed out.

The Srubna culture, the Scythians, and the people of the Altai today, all bear the “Indian” branch of R1a. First, these substantially post-date 6000 years ago. I think that that is likely due to the fact that South Asian R1a1a-Z93 and that of the Srubna descend from a common ancestor. But in any case, the nature of the phylogeny of Z93 indicates rapid expansion and very little phylogenetic distance between the branches. Something happened 4-5,000 years ago. One could imagine simultaneous expansions in India and Central Asia/Eastern Europe. Or, one could imagine an expansion from a common ancestor around that time. The latter seems more parsimonious.

Additionally, while South Asians share ancestry with people in West Asia and Eastern Europe, these groups do not have distinctive South Asian (Ancestral South Indian) ancestry. This should weight our probabilities as to the direction of migration.

Second, I read some of the papers linked to in the article, such as Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia and Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese. The first paper has good data, but I’ve always been confused by the interpretations. For example:

A few studies on mtDNA and Y-chromosome variation have interpreted their results in favor of the hypothesis,70–72 whereas others have found no genetic evidence to support it.3,6,73,74 However, any nonmarginal migration from Central Asia to South Asia should have also introduced readily apparent signals of East Asian ancestry into India (see Figure 2B). Because this ancestry component is absent from the region, we have to conclude that if such a dispersal event nevertheless took place, it occurred before the East Asian ancestry component reached Central Asia. The demographic history of Central Asia is, however, complex, and although it has been shown that demic diffusion coupled with influx of Turkic speakers during historical times has shaped the genetic makeup of Uzbeks75 (see also the double share of k7 yellow component in Uzbeks as compared to Turkmens and Tajiks in Figure 2B), it is not clear what was the extent of East Asian ancestry in Central Asian populations prior to these events.

Actually the historical and ancient DNA evidence both point to the fact that East Asian ancestry arrived in the last two thousand years. The spread of the first Gokturk Empire, and then the documented shift in the centuries around 1000 A.D. from Iranian to Turkic in what was Turan, signals the shift toward an East Asian genetic influx. Alexander the Great and other Greeks ventured into Central Asia. The people were described as Iranian looking (when Europeans encountered Turkic people like Khazars they did note their distinctive physical appearance).

We have ancient DNA from the Altai, and those individuals initially seemed overwhelmingly West Eurasian. Now that we have Scythian ancient DNA we see that they mixed with East Asians only on the far east of their range.

The second paper is very confused (or confusing):

The time divergence between Indian and European Y-chromosomes, based on the closest neighbour analysis, shows two different distinctive divergence times for J2 and R1a, suggesting that the European ancestry in India is much older (>10 kya) than what would be expected from a recent migration of Indo-European populations into India (~4 to 5 kya). Also the proportions suggest the effect might be less strong than generally assumed for the Indo-European migration. Interestingly, the ANI ancestry was recently suggested to be a mix of ancestries from early farmers of western Iran and people of the Bronze Age Eurasian steppe (Lazaridis et al. 2016). Our results agree with this suggestion. In addition, we also show that the divergence time of this ancestry is different, suggesting a different time to enter India.

Lazaridis et al. accept a mass migration from the steppe. In fact, the migration is to such a magnitude that I’m even skeptical. Also, there couldn’t have been a European migration to South Asia during the Pleistocene because Europeans as we understand them genetically did not exist then!!!

I assume that many of the dates of coalescence are sensitive to parameter conditions. Additionally, they admit limitations to their sampling.

Ultimately the final story will be more complex than we can imagine. R1a is too widespread to be explained by a simple Indo-Aryan migration in my opinion. But we can’t get to these genuine conundrums if we keep having to rebut ideologically motivated salvos.

Related: Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts. I wish David would be a touch more equivocal. But I have to admit, if the model fits, at some point you have to quit.

June 17, 2017

Indian media is finally reporting on the Aryan migration into South Asia

Filed under: Genetics,science — Razib Khan @ 2:49 pm

For various ideological reasons in India there has been a strong resistance to the idea that Aryans came from outside of South Asia. When David Reich’s Reconstructing Indian Population History was published in 2009 the Indian media had a weird response. For example, Aryan-Dravidian divide a myth: Study.

Though Reich’s paper was equivocal, it was clear to me that it was likely going to be the launching point for a resurrection of the Aryan migration theory. Now Tony Joseph in The Hindu has published a pretty good survey of the literature, How genetics is settling the Aryan migration debate. Nothing new for readers of this weblog, but he has some good quotes:

The avalanche of new data has been so overwhelming that many scientists who were either sceptical or neutral about significant Bronze Age migrations into India have changed their opinions. Dr. Underhill himself is one of them. In a 2010 paper, for example, he had written that there was evidence “against substantial patrilineal gene flow from East Europe to Asia, including to India” in the last five or six millennia. Today, Dr. Underhill says there is no comparison between the kind of data available in 2010 and now. “Then, it was like looking into a darkened room from the outside through a keyhole with a little torch in hand; you could see some corners but not all, and not the whole picture. With whole genome sequencing, we can now see nearly the entire room, in clearer light.”

In relation to online debates I have had Indian interlocutors tell me flat out that they believe in the papers published between 2005 and 2010. It is nice to get the scientists who actually published this work now admit that new results overturn the older theories.

Note: I am going to refer to this as a migration, because “invasion” seems to connote too much specificity as to how it happened. But I have a difficult time imagining that it was a peaceful process.

June 14, 2017

The fad for dietary adaptations is not going away

Filed under: Diet,FADS,Genetics,Human Genetics — Razib Khan @ 7:21 pm


Food is a big deal for humans. Without it we die. Unlike some animals (here’s looking at you pandas) we’re omnivorous. We eat fruit, nuts, greens, meat, fish, and even fungus. Some of us even eat things which give off signals of being dangerous or unpalatable, whether it be hot sauce or lutefisk.

This ability to eat a wide variety of items is a human talent. Those who have put their cats on vegetarian diets know this. After a million or so years of being hunters and gatherers with a presumably varied diet, for thousands and thousands of years most humans at any given time ate some form of grain based gruel. Though I am sympathetic to the argument that in terms of quality of life this was a detriment to median human well being, agriculture allowed our species to extract orders of magnitude more calories from a unit of land, though there were exceptions, such as in marine environments (more on this later).

Ergo, some scholars, most prominently Peter Bellwood, have argued that farming did not spread through cultural diffusion. Rather, farmers simply reproduced at much higher rates because of the efficiency of their lifestyle in comparison to that of hunter-gatherers. The latest research, using ancient DNA, broadly confirms this hypothesis. More precisely, it seems that cultural revolutions in the Holocene have shaped most of the genetic variation we see around us.

But genetic variation is not just a matter of genealogy. That is, the pattern of relationships, ancestor to descendant, and the extent of admixtures across lineages. Selection is also another parameter in evolutionary genetics. This can even have genome-wide impacts. It seems quite possible that current levels of Neanderthal ancestry are lower than might otherwise have been the case due to selection against functional variants derived from Neanderthals, which are less fit against a modern human genetic background.

The importance of selection has long been known and explored. Sickle-cell anemia only exists because of balancing selection. Ancient DNA has revealed that many of the salient traits we associate with a given population, e.g., lactose tolerance or blue eyes, have undergone massive changes in population wide frequency over the last 10,000 years. Some of this is due to population replacement or admixture. But some of it is due to selection after the demographic events. To give a concrete example, the frequency of variants associated with blue eyes in modern Europeans dropped rapidly with the expansion of farmers from the Near East ~10,000 years ago, but has gradually increased over time until it is the modal allele in much of Northern Europe. Lactase persistence in contrast is not an ancient characteristic which has had its ups and downs, but something new that evolved due to the cultural shock of the adoption of dairy consumption by humans as adults. The region around lactase is one of the strongest signals of natural selection in the European genome, and ancient DNA confirms that the ubiquity of the lactase persistent allele is a very recent phenomenon.

But obviously lactase is not going to be the only target of selection in the human genome. Not only can humans eat many different things, but we change our portfolio of proportions rather quickly. In A Farewell to Alms the economic historian Gregory Clark observed that English peasants ate very differently before and after the Black Death. As any ecologist knows populations are resource constrained when they are near the carrying capacity, and in England during the High Medieval period there was massive population growth due to gains in productivity (e.g., the moldboard plough) as well as intensification of farming and utilization of all the marginal land.

After the Black Death (which came in waves repeatedly) there was a massive population decline across much of Europe. Because institutions and practices were optimized toward maintaining a much higher population, European peasants lived a much better lifestyle after the population crash because the pie was being cut into far fewer pieces. In other words, centuries of life on the margins just scraping by did not mean that English peasants couldn’t live large when the times allowed for it. We were somewhat pre-adapted.

Our ability to eat a variety of items, and the constant varying of the proportions and kind of elements which go into our diet, mean that sciences like nutrition are very difficult. And, it also means that attempts to construct simple stories of adaptation and functional patterns from regions of the genome implicated in diet often fail. But with better analytic technologies (whole genome sequencing, large sample sizes) and some elbow grease some scientists are starting to get a better understanding.

A group of researchers at Cornell has been taking a closer look at the FADS genes over the past few years (as well as others at CTEG). These are three nearby genes, FADS1, FADS2, and FADS3 (they probably underwent duplication). These genes are involved in the metabolization of fatty acids, and dietary regime turns out to have a major impact on variation around these loci.

The most recent paper out of the Cornell group, Dietary adaptation of FADS genes in Europe varied across time and geography:

Fatty acid desaturase (FADS) genes encode rate-limiting enzymes for the biosynthesis of omega-6 and omega-3 long-chain polyunsaturated fatty acids (LCPUFAs). This biosynthesis is essential for individuals subsisting on LCPUFA-poor diets (for example, plant-based). Positive selection on FADS genes has been reported in multiple populations, but its cause and pattern in Europeans remain unknown. Here we demonstrate, using ancient and modern DNA, that positive selection acted on the same FADS variants both before and after the advent of farming in Europe, but on opposite (that is, alternative) alleles. Recent selection in farmers also varied geographically, with the strongest signal in southern Europe. These varying selection patterns concur with anthropological evidence of varying diets, and with the association of farming-adaptive alleles with higher FADS1 expression and thus enhanced LCPUFA biosynthesis. Genome-wide association studies reveal that farming-adaptive alleles not only increase LCPUFAs, but also affect other lipid levels and protect against several inflammatory diseases.

The paper itself can be difficult to follow because they’re juggling many things in the air. First, they’re not just looking at variants (e.g., SNPs, indels, etc.), but also the haplotypes that the variants are embedded in. That is, the sequence of markers which define an association of variants which indicate descent from common genealogical ancestors. Because recombination can break apart associations one has to engage with care in historical reconstruction of the arc of selection due to a causal variant embedded in different haplotypes.

But the great thing about this paper is that in the case of Europe they can access ancient DNA. So they perform inferences utilizing whole genomes from many extant human populations, but also inspect change in allele frequency trajectories over time because of the density of the temporal transect. The figure to the left shows variants in both an empirical and modeling framework, and how they change in frequency over time.

In short, variants associated with higher LCPUFA synthesis actually decreased over time in Pleistocene Europe. This is similar to the dynamic you see in the Greenland Inuit. With the arrival of farmers the dynamic changes. Some of this is due to admixture/replacement, but some of it can not be accounted for by admixture and replacement. In other words, there was selection for the variants which synthesize more LCPUFA.

This is not just limited to Europe. The authors refer to other publications which show that the frequency of alleles associated with LCPUFA production is high in places like South Asia, notable for a cultural preference for plant-based diets, reinforced by the reality that animal protein was in very short supply. In Europe they can look at ancient DNA because we have it, but the lesson here is probably general: alternative allelic variants are being whipsawed in frequency by protean shifts in human cultural modes of production.

In War Before Civilization Lawrence Keeley observed that after the arrival of agriculture in Northern Europe in a broad zone to the northwest of the continent, facing the Atlantic and North Sea, farming halted rather abruptly for centuries. Keeley then recounts evidence of organized conflict between two populations across a “no man’s land.”

But why didn’t the farmers just roll over the old populations as they had elsewhere? Probably because they couldn’t. It is well known that marine regions can often support very high densities of humans engaged in a gathering lifestyle. Though not farmers, these peoples are often also not nomadic, and occupy areas at high density. The tribes of the Pacific Northwest, dependent upon salmon fisheries, are classic examples. Even today much of the Northern European maritime fringe relies on the sea. High density means they had enough numbers to resist the human wave of advance of farmers. At least for a time.

Just as cultural forms wane and wax, so do some of the underlying genetic variants. If you dig into the guts of this paper you see much of the variation dates to the out of Africa period. There were no great sweeps which expunged all variation (at least in general). Rather, just as our omnivorous tastes are protean and changeable, so the genetic variation changes over time and space in a difficult to reduce manner. The flux of lifestyle change is probably usually faster than biological evolution can respond, so variation reducing optimization can never complete its work.

The modern age of the study of natural selection in the human genome began around when A Map of Recent Positive Selection In the Human Genome was published. And it continues with methods like SDS, which indicate that selection operates to this day. Not a great surprise, but solidifying our intuitions. In the supplements to the above paper the authors indicate that the focal alleles that they are interrogating exhibit coefficients of selection around ~0.5% or so. This is rather appreciable. The fact that fixation has not occurred indicates in part that selection has reversed or halted, as they noted. But another aspect is that there are correlated responses; the FADS genes are implicated in many things, as the authors note in relation to inflammatory diseases. But I’m not sure that the selection effects of these are really large in any case. I bet there are more important things going on that we haven’t discovered or understood.
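To get a feel for what a selection coefficient of ~0.5% means in practice, here is a back-of-the-envelope sketch. It assumes the simplest deterministic model of genic selection in an effectively infinite population, with no drift, dominance, or changing environment, which is not the model fitted in the paper. The starting and target frequencies are hypothetical.

```python
import math

def next_frequency(p, s):
    """One generation of genic selection: the favored allele has fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)

def generations_between(p_start, p_end, s):
    """Closed form: the odds p / (1 - p) grow by a factor of (1 + s) per generation."""
    odds_ratio = (p_end / (1.0 - p_end)) / (p_start / (1.0 - p_start))
    return math.log(odds_ratio) / math.log(1.0 + s)


s = 0.005     # roughly the ~0.5% quoted above
p = 0.10      # hypothetical starting frequency
count = 0
while p < 0.90:               # hypothetical target frequency
    p = next_frequency(p, s)
    count += 1

print(f"iterated: {count} generations")
print(f"closed form: {generations_between(0.10, 0.90, s):.1f} generations")
```

Even an appreciable coefficient like this needs many hundreds of generations of consistent selection to carry an allele from rare to common, which is a long time for a dietary regime to hold still.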

Obviously genome-wide analyses are going to continue for the foreseeable future. Ten years ago my late friend Mike McKweon predicted that at some point genomics was going to have to be complemented by detailed follow up through bench-work. I’m not sure if we’re there yet, but there are only so many populations you can sequence, and only to a particular coverage to obtain any more information. Some selection sweeps will be simple stories with simple insights. But I suspect many more like FADS will be more complex, with the threads of the broader explanatory tapestry assembled publication by publication over time.

Citation: Ye, K., Gao, F., Wang, D., Bar-Yosef, O. & Keinan, A. Dietary adaptation of FADS genes in Europe varied across time and geography. Nat. Ecol. Evol. 1, 0167 (2017).

June 6, 2017

Origin of modern humanity pushed back 260,000 years BP (?)

Filed under: Ancient DNA,Genetics,Khosian,South Africa — Razib Khan @ 12:45 am


The above figure is from a preprint, Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago. The title and abstract are pretty clear:

Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region’s human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|’hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to beyond 260,000 years ago. These estimates dramatically increases the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) ‘hot spot’ for the evolution of our species.

The outlines of these results were actually presented at a conference. I saw it on Twitter and don’t remember which conference anymore. But this is not entirely surprising.

First, much respect to Mattias Jakobsson’s group for breaking through the Reich-Willerslev duopoly. Hopefully this presages some democratization of the ancient DNA field as expenses are going down.

Second, notice how in most cases ancient DNA shows that modern reference populations turn out to be admixed. This was the problem with much of Eurasia, and why using modern genetic variation to make inferences about the past totally failed.

I am entirely convinced that the genome from Ballito Bay dating to ~2,000 years does not carry the Eurasian inflected East African admixture. The Mota genome implies that Eurasian admixture did not come to eastern Africa much before 4,500 years ago. There needs to be a much deeper big picture analysis of the archaeology of Africa and the genetic information we have to get a sense of what happened back then…but, it seems likely that the Bantu migration has over-written much of the earlier genetic variation.

The fact that ancient genomes always show that our current populations are admixed makes me wonder if the Ballito Bay sample itself is admixed from more ancient populations. That is, if we found a genome from 20,000 years ago, would it be very different from the Ballito Bay samples? The relatively thick time transect from Europe indicates that turnover happens every 10,000 years or so. Australian Aborigines seem to have been resident in their current locations for ~50,000 years, but this seems the exception, not the rule. Do we really think that the ancestors of the Bushmen were living in southern Africa for five times as long as Australian Aborigines?

Another curious aspect of this paper is that it suggests the effective population size of Bushmen is smaller than we might have thought, and they’re somewhat less diverse than we’d thought. That’s because East African (with Eurasian ancestry) gene flow increased heterozygosity, as well as inferred effective population sizes. I’ve mentioned this effect on statistics before. Unless you have a true model of population history (or close to it) your assumptions might distort the numbers you get.
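To make the heterozygosity point concrete, here is a minimal sketch at a single biallelic site. The allele frequencies are made up for illustration and are not values from the preprint; the only number borrowed from the text is an admixture fraction inside the quoted 9-22% range. The identity it checks is standard: heterozygosity in the mixed population equals the ancestry-weighted average of the two sources plus a non-negative term that grows with how different the sources are.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def admixed_heterozygosity(p_source1, p_source2, alpha):
    """Heterozygosity after mixing: fraction alpha from source 1, the rest from source 2."""
    p_mix = alpha * p_source1 + (1.0 - alpha) * p_source2
    return heterozygosity(p_mix)


# Hypothetical allele frequencies at one site (assumptions, not data).
p_hunter_gatherer = 0.10
p_pastoralist = 0.60
alpha = 0.85  # 15% pastoralist-related ancestry, within the 9-22% range quoted above

h_unadmixed = heterozygosity(p_hunter_gatherer)
h_admixed = admixed_heterozygosity(p_hunter_gatherer, p_pastoralist, alpha)

# Weighted average of the source heterozygosities, and the excess due to mixing.
h_weighted = alpha * h_unadmixed + (1.0 - alpha) * heterozygosity(p_pastoralist)
h_excess = 2.0 * alpha * (1.0 - alpha) * (p_hunter_gatherer - p_pastoralist) ** 2

print(f"unadmixed: {h_unadmixed:.4f}")
print(f"admixed:   {h_admixed:.4f}")
print(f"weighted average + excess: {h_weighted + h_excess:.4f}")
```

At any one site the admixed value can land above or below the unadmixed one, but averaged across the genome a modest pulse from a diverged population pushes heterozygosity up, and methods that read effective population size off diversity will be inflated with it.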

There is another aspect to this preprint mentioned glancingly in the text, and a bit more in the supplements: they seem to only be able to model Yoruba well if you assume that they themselves are a mix of “Basal Humans” (BH) and another African population which gave rise to East Africans and “Out of Africa” populations. Note that the BH seem to diverge from other human populations before the ancestors of Southern Africans like the Ballito Bay sample. That is, BH could push the diversification of the ancestors of modern humans to considerably earlier than 260,000 years before the present.

The possibility of deep structure in the Yoruba is pretty notable because they’ve been the gold standard in many human population genetic data sets as a reference population. But this result of deep structure is not entirely surprising. For years researchers have been hinting at confusing results in relation to the possibility of Eurasian back-migration. Perhaps the deep structure was confounding inferences?

The authors themselves are quite cautious about their dating of the divergence. It’s sensitive to many assumptions, and in particular the mutation rate being known and constant over time. But I think it’s hard to deny that this is pushing back the emergence of modern humans beyond what we know today. The earliest anatomically modern humans are found in Ethiopia 195,000 years ago from what I know. As I said, I’m convinced that the ancient genome has shown that modern “pristine” populations have some serious admixture. But I’m not as convinced about any specific point estimate, because that’s sensitive to a lot of assumptions which might not hold.
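The sensitivity to the mutation rate is easy to see with back-of-the-envelope arithmetic. The sketch below uses the simplest possible clock, time = divergence / (2 × mutation rate), which ignores ancestral polymorphism and everything else the preprint’s methods account for. The divergence value is hypothetical, and the two mutation rates and the 30-year generation time are assumptions chosen only to show the scaling.

```python
def split_time_years(divergence_per_site, mu_per_generation, generation_years):
    """Naive molecular-clock estimate: divergence accumulates on both lineages."""
    generations = divergence_per_site / (2.0 * mu_per_generation)
    return generations * generation_years


# Hypothetical per-site divergence between two lineages (not from the preprint).
divergence = 2.5e-4
generation_years = 30.0  # assumed generation time

# Two assumed per-generation mutation rates, one half the other.
slow_rate = 1.25e-8
fast_rate = 2.5e-8

t_slow = split_time_years(divergence, slow_rate, generation_years)
t_fast = split_time_years(divergence, fast_rate, generation_years)

print(f"slow rate: ~{t_slow:,.0f} years")
print(f"fast rate: ~{t_fast:,.0f} years")
```

Halving the assumed rate doubles the date, which is why the shape of the result is more convincing than any specific point estimate.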

Finally, a quick shout out to the blogger Dienekes. As early as ten years ago he anticipated the basic outlines of these sorts of results in the generality, if not the details. We really have come a long way from popular science declaring that all humans descend from a small group of East Africans who lived 50,000 to 100,000 years ago. The real picture was much more complex.

Also, I have to admit I considered titling this blog post “Wolpoff’s revenge.” As in Milford Wolpoff. The reason being that we’re getting quite close to territory familiar to the much maligned multi-regionalist model of modern human origins.

Note: These findings should make us less surprised perhaps by a “modern” human migration before the primary one out of Africa.

June 2, 2017

The nadir of genetics in the Soviet Union

Filed under: Genetics,History — Razib Khan @ 8:05 pm

A fascinating excerpt in Slate from How to Tame a Fox (and Build a Dog):

This skepticism of genetics all started when, in the mid-1920s, the Communist Party leadership elevated a number of uneducated men from the proletariat into positions of authority in the scientific community, as part of a program to glorify the average citizen after centuries of monarchy had perpetuated wide class divisions between the wealthy and the workers and peasants. Lysenko fit the bill perfectly, having been raised by peasant farmer parents in the Ukraine. He hadn’t learned to read until he was 13, and he had no university degree, having studied at what amounted to a gardening school, which awarded him a correspondence degree. The only training he had in crop-breeding was a brief course in cultivating sugar beets. In 1925, he landed a middle-level job at the Gandzha Plant Breeding Laboratory in Azerbaijan, where he worked on sowing peas. Lysenko convinced a Pravda reporter who was writing a puff piece about the wonders of peasant scientists that the yield from his pea crop was far above average and that his technique could help feed his starving country. In the glowing article the reporter claimed, “the barefoot professor Lysenko has followers … and the luminaries of agronomy visit … and gratefully shake his hand.” The article was pure fiction. But it propelled Lysenko to national attention, including that of Josef Stalin.

Sometimes it is easy to believe that the periods in the Soviet Union under Stalin or in China under Mao or in Germany under Hitler, to name a few, were aberrations. But I think that’s the wrong way to look at it. The story of how Lysenko became influential hooks into so many historical tropes and psychological instincts of our species that we should be wary of it.

There have been great scholars without requisite qualifications. Ramanujan and Faraday come to mind. But great scholars are exceptional people. They are not average.

May 30, 2017

Ancient Egyptians: black or white?

Filed under: Egypt,Genetics,Historical Genetics,History — Razib Khan @ 9:20 pm

One of the most fascinating things about ancient Egypt is its continuity, and our granular and detailed knowledge of that continuity. We can thank in part the dry climate, as well as the Egyptian penchant for putting their hieroglyphs on walls and monuments (as well as graffiti!). And we can also thank the fact that both the ancient Greeks and Hebrews, Athens and Jerusalem so to speak, were deeply connected to and perceived themselves to be indebted to Egyptian civilization. Even before the translation of the Rosetta Stone and the deciphering of ancient Egyptian writing the Hebrews’ interactions with Egyptians, in particular in Exodus, mean that their memory would echo down through the millennia (the newly Christianized Irish interpolated Egyptian ancestry into their own genealogy).

The Greek relationship with Egypt was less fraught and at greater remove than that of the Hebrews. But the Classical period philosophers correctly perceived that Egyptian civilization was ancient, and preceded their own. Aegean-Egyptian connections were actually more longstanding than the Classical scholars knew. In Brotherhood of Kings: How International Relations Shaped the Ancient Near East, the correspondence retrieved from state archives is rather clear that Minoan civilization was part of the orbit of Egypt early on. Though Egyptians never conquered the Aegean polities, mercantile and diplomatic connections were extremely old and persistent. The late Bronze Age eruption of barbarian Sea Peoples who attacked the whole civilized Near East may have been facilitated in part by the broad familiarity engendered by widespread trade networks.

The most recent book devoted to ancient Egypt I have read was Toby Wilkinson’s The Rise and Fall of Ancient Egypt. Synthesizing extensive written material with archaeology, perhaps the most impactful argument in Wilkinson’s narrative was the persistence of the temple-based institutions from the Old Kingdom down to the Ptolemaic era. Religious institutions carried on even with the shocks of Nubian and Libyan conquest in the post-New Kingdom period, down to Late Antiquity. The temple at Philae in southern Egypt was an active center of the traditional religion, and therefore the culture which dates to the Old Kingdom in continuous form, down to the 6th century A.D. (when it was closed by Justinian in his kulturkampf against ancient heterodoxies).

For various ideological reasons though many people are very curious about the racial characteristics of the ancient Egyptians. There are two basic extreme positions, Afrocentrists and Eurocentrists. Though I have not done a deep dive of the literature of either group, I’ve read a few books from either camp over my lifetime. In fact I believe the last time I read the “primary literature” of Afrocentrism and Eurocentrism was when I was an early teen, and it was rather strange because both groups seem to be recapitulating racial disagreements and viewpoints relevant to the American context, and projecting them back to the ancient world.

In college I stumbled upon Mary Lefkowitz’s Not Out Of Africa, a book length argument against the more sophisticated Afrocentrist views articulated in the wake of Martin Bernal’s Black Athena: The Afroasiatic Roots of Classical Civilization. Lefkowitz was a classicist, so many of her objections were exceedingly scholarly. The reality is that the best refutation of an Afrocentrist view of ancient Egypt, which reduces to the idea that ancient Egyptians would be recognizably black African today, is the Fayum portraits. It is notable to me how similar these portraits are to modern Copts. In fact the actor Rami Malek, of Coptic background, looks strikingly like someone who stepped out of the Fayum portraits.

I have read no book length refutation of the Eurocentrist, usually Nordicist, perspective. Mostly because this is a view associated with white supremacism, and that ideology is generally attacked on normative, not positive, grounds. But the visible evidence of the Fayum portraits is a strong refutation of the Nordic model. Of course, there is the reality that we now know that the Nordic phenotype, and the genetic components which congealed into that typical of Northern Europe today, was only coming into existence when the Old Kingdom of Egypt was already a mature civilization.

Of course both Afrocentrists and Eurocentrists will reject the evidence of the Fayum portraits because they came from the Roman era, and they would argue that the demographic nature of Egyptians changed quite a bit between that period and the end of the New Kingdom. And they are not incorrect that the period between the arrival of the Romans and the fall of the New Kingdom was characterized by a great deal of change. There were Libyan dynasties, Nubian dynasties, and periods of rule by Assyrians, Persians, and Macedonians. Large colonies of Greeks, Macedonians, and Hebrews-becoming-Jews were also resident in Egypt. Especially, but not limited to, the urban areas.

But now we have ancient DNA! Ancient Egyptian mummy genomes suggest an increase of Sub-Saharan African ancestry in post-Roman periods:

Egypt, located on the isthmus of Africa, is an ideal region to study historical population dynamics due to its geographic location and documented interactions with ancient civilizations in Africa, Asia and Europe. Particularly, in the first millennium BCE Egypt endured foreign domination leading to growing numbers of foreigners living within its borders possibly contributing genetically to the local population. Here we present 90 mitochondrial genomes as well as genome-wide data sets from three individuals obtained from Egyptian mummies. The samples recovered from Middle Egypt span around 1,300 years of ancient Egyptian history from the New Kingdom to the Roman Period. Our analyses reveal that ancient Egyptians shared more ancestry with Near Easterners than present-day Egyptians, who received additional sub-Saharan admixture in more recent times. This analysis establishes ancient Egyptian mummies as a genetic source to study ancient human history and offers the perspective of deciphering Egypt’s past at a genome-wide level.

Because modern people care about the Afrocentrist question, the extent of Sub-Saharan African ancestry is highlighted in this paper. I do not think this is actually the most interesting aspect. But I’ll get to that. Since this post will be read by a fair number of people I’ll talk about the relationship of ancient and modern Egyptians to (Northern) Europeans and Sub-Saharan Africans.

The figure to the left is looking at 90 ancient Egyptian mitochondrial genomes (and some modern ones in the two rightmost columns). Since mtDNA is copious it was relatively easy to extract and analyze. Haplogroup L, the red to orange shades in the bar plots, is associated without dispute with Sub-Saharan Africa. Haplogroup U6, M1 and a few others may be “back to Africa” variants of different periods (they are generally found in Afro-Asiatic groups).

What you can see is that somewhat more than half of Ethiopia’s mtDNA lineages are L, in keeping with the whole genome estimate of Sub-Saharan African ancestry in most Cushitic populations. In Egypt there is a difference over time; haplogroup L goes from low frequencies to much higher frequencies in modern periods. The ~20% fraction in the modern samples is in line with the population-wide Sub-Saharan admixture one sees in modern Egyptians.

I actually recomputed the haplogroups to a finer granularity from the supplements for readers who know this stuff well. Here they are:

 

Haplogroup Count
H 2
H13c1 2
H5 2
H6b 2
HV 3
HV1a’b’c 4
HV1a2a 3
HV1b2 2
HV21 2
I 5
J1d 2
J2a1a1 2
J2a2b 2
J2a2c 4
J2a2e 3
K 16T 2
K1a 2
K1a4 2
L3 2
M1a1 4
M1a1e 2
M1a1i 2
M1a2a 2
N 2
N1’5 2
N1a1a2 2
R 3
R0 2
R0a 2
R0a1 2
R0a1a 3
R0a2 3
R0a2f 2
R2’JT 2
T 3
T1a 3
T1a2 2
T1a5 4
T1a7 7
T1a8a 2
T2 3
T2c1 2
T2c1c 2
T2e 2
U 2
U1a1 2
U1a1a3 2
U3b 3
U5a 2
U6a 2
U6a2 2
U6a3 2
U7 4
U8b1a1 3
U8b1b1 2
W3a1 2
W6 2
W8 2
X 2
X1 2
X1c 2

A quick inspection of mtDNA haplogroup frequencies shows that ancient Egyptians are not typical of modern Europeans. Not that much H, and lots of T, J and K. What that does remind me of are Early European Farmers. These people, who brought agriculture to Europe from Anatolia, contributed a large fraction of the ancestry of modern Southern Europeans, and a lesser component to Northern Europeans.

But ultimately what’s great about this paper is that they have ancient autosomal DNA. That is, genome-wide results.

They got three samples of reasonably high quality. More precisely: “Two samples from the Pre-Ptolemaic Periods (New Kingdom to Late Period) had 5.3 and 0.5% nuclear contamination and yielded 132,084 and 508,360 SNPs, respectively, and one sample from the Ptolemaic Period had 7.3% contamination and yielded 201,967 SNPs.”

You can see the three samples on this bar plot. What is interesting is that they’re all pretty similar.

What you can see here is that to a great extent ancient Egyptians were descended from a population closely related to Natufians, or Natufians themselves. This easily explains the mtDNA affinity to Neolithic farmers: Natufians and Anatolian Neolithic populations were sister populations. The f3 statistic which looks at shared drift shows an affinity of ancient Egyptians with ancient farmer populations with Near Eastern provenance, but also with modern Sardinians. This is a common pattern: ancient groups lack the ancestry brought by later migration waves, and the Sardinians are the modern population that comes closest to that state.
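For readers who haven’t met the “shared drift” form of the f3 statistic, here is a minimal sketch. It computes the plain average of (o − a)(o − b) over SNPs, where o, a and b are allele frequencies in an outgroup and two test populations. The frequencies are toy numbers invented for illustration, and the sketch omits the small-sample bias correction and standard errors that real tools apply, so treat it as the idea rather than the paper’s procedure.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Average of (o - a) * (o - b) across SNPs.

    Larger values mean A and B share more drift after splitting
    from the outgroup. No bias correction is applied.
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("all populations need the same number of SNPs")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


# Toy allele frequencies at four SNPs (hypothetical, not from any data set).
outgroup = [0.5, 0.5, 0.5, 0.5]
target = [0.7, 0.3, 0.6, 0.4]
close_pop = [0.8, 0.2, 0.6, 0.5]  # drifted in the same direction as the target
far_pop = [0.5, 0.5, 0.4, 0.6]    # drifted independently of the target

f3_close = outgroup_f3(outgroup, target, close_pop)
f3_far = outgroup_f3(outgroup, target, far_pop)

print(f"f3(outgroup; target, close_pop) = {f3_close:.4f}")
print(f"f3(outgroup; target, far_pop)   = {f3_far:.4f}")
```

Ranking many reference populations by this value against a fixed target is what produces the kind of affinity list described above.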

You see in the bar plot that northern Levantine populations are placed between Anatolian Neolithics and Natufians, as one might expect based on their geographical position and gene flow between these two regions. Additionally, the cyan color is associated with eastern farmers from the Zagros. I’ve already talked about gene flow from this area to the Levant recently. If you compare the Bronze Age Sidon samples I think you’ll see broad affinities with these Late Period Egyptians.

The PCA gives us results consonant with the model-based clustering. If you plot the genetic variation of ancient Egyptians they’re closest to Neolithic eastern Mediterranean populations. No great surprise.

Not the modern Egyptians. Why? It’s pretty clearly because modern Egyptians are shifted toward Sub-Saharan Africans. But there is also another component: modern Egyptians have more of the cyan eastern farmer component. What could this be?

An immediate thought comes to mind. We focus a great deal on Sub-Saharan African slavery. One reason is that it is visible. Black Africans are physically distinct from most Middle Eastern populations. But Egypt was long the center of another slave trade: “white slaves” from the Caucasus. Circassians. For hundreds of years Mamluks were recruited from the Caucasus as military slaves. They eventually became the ruling class of Egypt, until their decimation in the 19th century under Muhammad Ali (who himself was an Albanian Ottoman who never learned to speak Arabic well).

As noted in the paper earlier work looking at patterns in ancestry tracts and LD decay had made it obvious that much of the admixture of Sub-Saharan ancestry in Egypt, as in much of the Middle East, is relatively recent. In particular, it dates to the Islamic period, when trade and conquest took on new dimensions in Africa and north into Central Asia. One way ethnic minorities like Assyrians and Lebanese Christians differ from their Muslim neighbors is that they have much lower fractions of Sub-Saharan African ancestry, and no East Asian component. The latter might surprise, but remember that Central Asian Turkic slaves have been prominent in Muslim armies since at least the 9th century.
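The logic behind dating admixture from LD decay can be sketched in a few lines. Under a single-pulse model, the admixture-induced association between two sites separated by d Morgans falls off roughly as exp(−g·d) after g generations, so the slope of the decay gives the date. Everything below is an assumption for illustration: the 40-generation pulse, the two distances, and the 29-year generation time used to convert to years. Real methods fit a curve across many distance bins rather than two points.

```python
import math


def admixture_ld(distance_morgans, generations, initial_ld=1.0):
    """Expected admixture LD at a given genetic distance under a single-pulse model."""
    return initial_ld * math.exp(-generations * distance_morgans)


def generations_from_decay(ld_near, ld_far, d_near, d_far):
    """Recover the number of generations from LD at two genetic distances."""
    return math.log(ld_near / ld_far) / (d_far - d_near)


true_generations = 40    # hypothetical pulse date
generation_years = 29.0  # assumed generation time

ld_1cm = admixture_ld(0.01, true_generations)
ld_2cm = admixture_ld(0.02, true_generations)

estimated = generations_from_decay(ld_1cm, ld_2cm, 0.01, 0.02)
print(f"estimated generations: {estimated:.1f}")
print(f"approximate years ago: {estimated * generation_years:.0f}")
```

Recent admixture leaves long tracts and slow decay with distance; old admixture has been chopped up by recombination, so the signal drops off quickly.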

But some of the Sub-Saharan ancestry in Egyptians is old. The ancient Egyptian samples have it. To have none of it would seem strange, considering the history of contact between Nubia and Egypt, dating back to the Old Kingdom. Second, there is evidence of low levels of Sub-Saharan African gene flow into Southern Europeans. How did that happen? The highest fractions are in Spain, and can there be attributed to the Moorish period. But that explanation does not hold in much of Italy, where there are a few percent of haplogroup L. This probably is due to south-to-north gene flow across the Mediterranean during the Classical period. Some of the peoples on the south shore of the Mediterranean almost certainly already had some Sub-Saharan African admixture.

Not getting into the details of it, there are ways to explicitly model gene flow into a target population from donors defined by a phylogeny. In this case the authors tested various models of gene flow from Sub-Saharan Africans and Eurasians (non-Africans) to generate allele frequency patterns we see in modern Egyptians and ancient Egyptians.

What they consistently found is that modern Egyptians are about twice as much Sub-Saharan African as ancient Egyptians. The proportions for modern Egyptians ranged from ~10 to ~20 percent Sub-Saharan African against a Eurasian background, with a bias toward the higher values (depending on which populations you put into the phylogeny for non-Africans), and ~0 to ~10 percent for the ancient Egyptians, again with a bias toward the higher values. The pattern is consistent in these tests.
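One way to read “about twice as much” is to ask how large a later pulse would have to be to move the ancient fraction to the modern one. The sketch below assumes a single pulse from an entirely Sub-Saharan African source into an ancient-Egyptian-like population, which is my simplification and not a model from the paper. If the ancient fraction is a and a pulse of size x arrives, the modern fraction is m = x + (1 − x)·a, so x = (m − a) / (1 − a).

```python
def extra_pulse_fraction(ancient_fraction, modern_fraction):
    """Size of a single later pulse needed to raise ancient_fraction to modern_fraction.

    Assumes the incoming source carries 100% of the ancestry in question.
    """
    if not 0.0 <= ancient_fraction <= modern_fraction < 1.0:
        raise ValueError("need 0 <= ancient <= modern < 1")
    return (modern_fraction - ancient_fraction) / (1.0 - ancient_fraction)


# Endpoints taken from the approximate ranges quoted above.
scenarios = [(0.00, 0.10), (0.10, 0.20)]
for ancient, modern in scenarios:
    pulse = extra_pulse_fraction(ancient, modern)
    print(f"ancient {ancient:.0%} -> modern {modern:.0%}: pulse of about {pulse:.1%}")
```

Under these assumptions the implied pulse is on the order of a tenth of modern ancestry across the quoted ranges.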

An issue here is that we’re going off three samples. That being said, the authors observe that despite differences in contamination/quality and time period they’re very concordant with each other. If I had to bet I think Old Kingdom samples would have somewhat less Sub-Saharan and eastern farmer ancestry. But the basic pattern persisted down to the Roman period, and was only shifted by admixture due to slavery.

And not to belabor the point, but a paper from a few years ago which had some Copt samples looks familiar in its broad outlines. You see that the Copts have very little Sub-Saharan African ancestry, though it does seem to be evident (the marker set is in the hundreds of thousands of SNPs). Additionally, they are quite distinct from the Qatari Arab sample.

Unfortunately the data for this paper just published is not on the European Nucleotide Archive. I really want to dig a little deeper into it.

What are the takeaways here? Egypt has been the sink for a lot of migration and gene flow over the past several thousand years, and probably earlier. Not surprising considering that it was relatively wealthy in the aggregate. The Natufian population that the Late Period Egyptians resemble the most did not have Sub-Saharan African ancestry according to earlier research. These Late Period Egyptians do have some. This is reasonable in light of the long interaction with Nubia which is historically attested. Similarly, there was clearly gene flow from Southwest Asia. This is again historically attested, especially in the Nile Delta (though foreign garrisons of mercenaries are recorded in Upper Egypt as well).

The Roman period probably did introduce some gene flow from Southeast Europe and Southwest Asia. But these populations are not that distinct from Egyptians.

Similarly, the Islamic period also brought in different peoples from Arabia and the Caucasus. But the most salient dynamic during the Islamic period was a massive trans-Saharan slave trade (though the Caucasus impact may have been comparable, and I think these results support the proposition that it was).

It seems entirely likely that the Copts are descended from a mix of Roman era Egyptians. Not only do they resemble the people in the Fayum portraits, but the circumstantial genetic data is that they have fewer “exotic” components which increased in frequency during the Islamic era. This would be exactly parallel to ethno-religious minorities in the Levant and Iraq.

One curious element to me is the suggestion that gene flow before ~5,000 BCE between Sub-Saharan Africa and the lower Nile valley was low. If it hadn’t been low, it seems unlikely that the fraction of Sub-Saharan ancestry (or shift in that direction in relation to other Eurasians) in Copts would be so small.

So what explains the lack of earlier gene flow? I think the answer is going to be the fact that the human demographic landscape is characterized by lots of local population extinctions. As ancient DNA sampling coverage gets better and better meta-population dynamics are coming into focus, and we see gene flow, and die offs, in several areas. It is fashionable to say that human population variation is characterized by clines. But much of this clinal aspect is an outcome of the period after massive admixture over the last ~10,000 years.

And yet it may not be the case that the period before the Holocene lacked clines. Rather, it may be that large depopulations of areas of human occupation fragmented clinal ranges, and resulted in new range expansions from “core” zones.

About 8,000 years ago there was a major desertification period in the Sahara desert. Many trans-Saharan populations may have gone extinct during this time due to rapid climate change. Eventually repopulation may have occurred from outside of the Sahara, so that post-Natufian Levantines and Sub-Saharan Africans from what we today call the Sahel pushed up and down the Nile drainage basin respectively, meeting in the zone of Nubia on the boundary of history and prehistory.

Unlike many other areas of the world we have a long attested record of Egyptian history. As we get more mummy samples it seems likely that we’ll get a crisper, clearer, picture. And the time transects will not be narrative blind; we already know the general arc of Egyptian history. If, for example, we see a new ancestral component around ~1500 B.C., in Egypt it’s not mysterious what this might be: the Hyksos.

This is just the prologue to a fascinating book that will be written over the next decade.

Related: Blog post analyzing one Copt’s results suggests that Sub-Saharan admixture is more like Dinka than Yoruba (in contrast, Muslim Egyptians have a mix of both, the latter probably coming during the Islamic slave trade, while the former is probably ancient admixture).

Citation: Schuenemann, V. J. et al. Ancient Egyptian mummy genomes suggest an increase of Sub-Saharan African ancestry in post-Roman periods. Nat. Commun. 8, 15694 doi: 10.1038/ncomms15694 (2017).

May 25, 2017

At an inflection point of archaeology and genetics

Filed under: Genetics,History — Razib Khan @ 1:54 pm

People always ask me what to read in relation to the field of historical population genetics. In the 2000s there were a series of books which focused on the mtDNA and Y results from modern phylogeographic analysis. Journey of Man, Seven Daughters of Eve, The Real Eve, and Mapping Human History. But there hasn’t been much equivalent in the 2010s.

Why? I think part of the issue is that the rate of change has been so fast that scholars and journalists haven’t been able to keep up. And, the change is happening right now, so it would likely mean that any book written over a year would be moderately out of date by publication.

I noticed today that Jean Manco has an updated and revised version of her book, Ancestral Journeys: The Peopling of Europe from the First Venturers to the Vikings. This was needed, because the original book was written before some major recent findings, though after some preliminary ones. As Manco has observed herself it was feasible to replace speculations with facts.

Since it seems likely that George R. R. Martin’s next book will be published before David Reich’s, I think that’s all you got. Any suggestions would be welcome.

As for the flip side, history that might be useful to understanding the genetics results, J. M. Roberts’ The History of the World is the best CliffsNotes I can think of. It’s obviously a high level survey, but frankly that would improve the interpretation I see in some papers. The fact that much of the history has no contemporary relevance is pretty unimportant, since you want to focus on the older stuff, which is where ancient DNA really shows its mettle.

At some point ancient DNA will start to exhibit diminishing returns. Then the long hard slog of interpretation and synthesis will have to begin in earnest.
