Razib Khan One-stop-shopping for all of my content

March 31, 2018

The maturation of the South Asian genetic landscape

Filed under: Dravidian,Human Population Genetics,India,Indo-Aryan — Razib Khan @ 9:39 pm


The above is a stylized map from the preprint, The Genomic Formation of South and Central Asia. In broad strokes, it says some things that are very expected, and some things that are not so expected.

The abstract is long, but I’ll reproduce it in full:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gathers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. 
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

First Turk Empire

Though the abstract is focused on South Asia, the preprint actually has quite a bit about Inner Asia, because of the provenance of the samples. We often view the typical person in the past as a peasant in an agricultural society, and therefore relatively immobile over their lifetime. The story we like to tell ourselves is that non-elites in premodern societies, on the whole, had narrow horizons, delimited by their home village, or the neighboring network of villages.

But results from this work and others show that highly mobile populations, in which individuals ranged across vast areas of Eurasia over their lifetimes, were not that uncommon among pastoralists. We know this historically, as empires such as those of the Turks and Mongols were defined by a ruling elite whose writ extended from eastern to western Eurasia. The genetic heterogeneity of the Sintashta samples, with some individuals very different from the norm in their settlement, is exactly what you’d expect from a social and political culture which was united in some fashion over huge distances.

As the sample sizes for ancient DNA have increased it seems rather clear that the demographic dynamics that we see in later historical expansions of Inner Asian polities extend back to the Bronze Age. With expanding populations across the ecologically friendly landscape, the ancient proto-Indo-Europeans seem to have mixed with the local substrate wherever they went, just as Turks did later. As they moved west, they mixed with late Neolithic Europeans, as they went east, they mixed with Siberian populations, and as they conquered south they mixed with descendants of West Asian farmers.

One of the primary aspects that I think one needs to keep in mind is that one can’t just imagine that this was defined by simple diffusion dynamics. Historically the boundary between pastoralists and peasants could be fluid, but when political resistance collapses pastoralists have been able to use their military prowess to swarm across the lands of agriculturalists. In other words, centuries of gradual inter-demic gene flow might be interrupted by a rapid “pulse” admixture. There’s no reason that pre-literate polities couldn’t exist. The Inca were one such example, and the homogeneity of the Uruk civilization in the 4th millennium BC is strongly suggestive of an imperial hegemony or paramountcy.

Another dynamic is that pastoralists are highly mobile, and so may leapfrog over territory which is unsuitable. Or, they may move so rapidly that there isn’t much mixing with populations in between point A and point B.

This is apparently the case with the Bactria–Margiana Archaeological Complex. These people were mostly descended from people related to the eastern farmers of West Asia, those in modern day Iran. Some of their ancestry had affinities with Anatolian farmers, and there is some evidence even of Siberian admixture in this region. But there are three important takehomes of this preprint in relation to this area: 1) the BMAC did not contribute much genetically to South Asia at all, 2) steppe ancestry, related to that of the Yamna culture of the Pontic region, only shows up in the BMAC ~2000 BC, and 3) there is actually evidence of South Asian (Indus valley?) migration into the BMAC.

The fact that Yamna-like ancestry shows up in the BMAC region so late is a strong reason to suspect that Indo-Iranian peoples did not move to Iran and India until after 2000 BC. In earlier comments on this issue, I was rather vague about timing, because the Corded-Ware people show up in Europe before 2500 BC, and I was going along with the parsimonious idea that this was part of one single cultural and social revolution.

I was wrong. Going back to the Turkic analogy, there were multiple waves of migration and folk wandering by Turkic pastoralists. By different Turkic groups. One of the major ones occurred due to the rise of the Mongols, and the Mongols were not even Turks. The same seems to be true of Inner Eurasian Indo-European groups.

Moving on to South Asia, there are two primary constructs which come out of this preprint: “Indus Periphery” and “Ancient Ancestral South Indians.” I’ll call the former InPe and the latter is termed AASI. To some extent these complement and replace the earlier terms “Ancestral North Indian” and “Ancestral South Indian” (ANI and ASI). The AASI are the ancient hunter-gatherers of the Indian subcontinent. The authors suggest that the divergence of this group from other eastern Eurasians occurred very early, and that the division between the ancestors of the Papuans, Onge, and AASI was even polytomic (that is, they separated very quickly without discernible structure).

The InPe samples are from eastern Iran and the BMAC. They’re unique in having AASI ancestry, at variable fractions (indicating contemporaneous admixture). They also resemble samples from Swat Valley which date to 1200 BC and later, with one major difference: the Swat Valley samples have steppe ancestry.

There are no samples from the Indus Valley proper, so the authors suggest that the InPe are reasonable proxies. Additionally, they assert that ASI can best be modeled as a mixture between InPe and AASI. In other words, there were two admixture events. Their Palliyar samples are actually pretty good proxies for the resultant ASI, while the Kalash of Pakistan are good proxies for the ANI, who are presumably now modeled as a mixture of steppe populations with the InPe.
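
The nesting of these mixtures is easier to see when written out. Below is a minimal sketch, assuming each population can be treated as a simple weighted blend of its sources. The mixing proportions are arbitrary placeholders chosen only to make the arithmetic visible; they are not estimates from the preprint, and the function and variable names are hypothetical.

```python
def mix(pop_a, pop_b, share_a):
    """Blend two ancestry profiles; share_a is the fraction drawn from pop_a."""
    components = set(pop_a) | set(pop_b)
    return {
        c: share_a * pop_a.get(c, 0.0) + (1.0 - share_a) * pop_b.get(c, 0.0)
        for c in components
    }


# "Pure" source populations, each 100% one component.
IRAN = {"Iranian_farmer": 1.0}
AASI = {"AASI": 1.0}
STEPPE = {"Steppe": 1.0}

# PLACEHOLDER proportions for illustration only, not values from the preprint.
inpe = mix(IRAN, AASI, share_a=0.8)    # first admixture: Iranian farmer + AASI
asi = mix(inpe, AASI, share_a=0.3)     # second admixture: InPe + more AASI
ani = mix(STEPPE, inpe, share_a=0.4)   # steppe groups + InPe

if __name__ == "__main__":
    for name, profile in (("InPe", inpe), ("ASI", asi), ("ANI", ani)):
        parts = ", ".join(f"{k}={v:.2f}" for k, v in sorted(profile.items()))
        print(f"{name}: {parts}")
```

The point of the sketch is that under this kind of model both ANI and ASI carry Iranian farmer-related ancestry through the InPe, and the AASI share of ASI comes from two separate events.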

This resolves the enigmatic result that Priya Moorjani reported to me last year: less than 4,000 years ago “pure” ANI and ASI people existed. She was presumably going off admixture timing estimates. These results suggest that in some form ANI and ASI still exist, and the first admixture occurred with the creation of InPe.

Using a new method the authors contend that InPe emerged 4700-3000 BC. If this is true then the Indus Valley Civilization (IVC) was a compound of AASI and Iranian agriculturalists (sampled from the eastern end of the cline of admixture with Anatolians, that is, they had none of that ancestry). These dates also postdate the first arrival of agriculture at Mehrgarh by at least 2,000 years. I suspect that it will turn out there were earlier admixtures, which are not being detected. For various ecological reasons the West Asian cultural complex was portable only to the northwest fringe of South Asia, and there it persisted for ~4,000 years. This served as a natural eastern limit for cultures which were migrating out of the West Asian zone, and a point where AASI hunter-gatherers constantly mixed into the local population.

As the IVC sites begin to get sampled in the future I predict that instead of a homogeneous transect of admixture over time and space we’ll see a lot of heterogeneity.

In the Swat samples, the authors see two correlated trends, an increase in steppe ancestry, and an increase in AASI ancestry. No doubt this dates to the “great admixture” which occurred between 2000 BC, and some time before 1000 AD (the Bengali admixture with East Asians dates to between 0 and 1000 AD, as does that of Brahmins who left the North Indian plain and mixed with local populations elsewhere).

Finally, the authors detect a skew toward steppe ancestry among some populations, in particular, Brahmins. The skew is in relation to Iranian farmer ancestry, the two being the primary constituents of ANI ancestry. In Who We Are and How We Got Here David Reich says some of the ANI admixture is much more recent than the rest, judging by tract length. And also going by the BMAC and Swat samples it seems that the time period for when Indo-Aryans arrived in South Asia has to be in the interval between 2000 BC and 1200 BC.

There’s another aspect of the preprint which allows for dating. The arrival of Austro-Asiatic people in South Asia probably has to postdate the expansion of the same group in Vietnam about 4,000 years ago (though not necessarily obviously). But the Munda Austro-Asiatic people of northeast India exhibit curious genetic patterns. They clearly have East Asian ancestry related to other Austro-Asiatic populations in Southeast Asia, but they have a lot less “West Eurasian” in their ANI/ASI mix. The authors resolve this by suggesting that the Munda arrived in South Asia when there was still heterogeneity among the ASI, and unadmixed AASI.

After 2000 BC the IVC went into decline. Various groups of Indo-Aryans were expanding and admixing. From the other end of the subcontinent arrived rice cultivators from Southeast Asia. At some point, they ran into an ASI population that had some Iranian admixture, but not as much as typical. All of this probably occurred in the period between 2000 BC and 1000 BC. I know that some researchers have argued that the Gangetic plain was inhabited by Munda speaking peoples before it was inhabited by Indo-Aryans. The main issue I’ve had with this is that modern Munda peoples are very genetically distinctive, and there’s no evidence of East Asian ancestry in most populations of the Gangetic plain (the main exceptions are those which have experienced Tibetan influence/contact).

So here is my interpretation of the genetic and historical evidence:

1) IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers. I would not be surprised if later genetic work recapitulates the findings in Europe of an initial period of separation, and then a “resurgence” of indigenous ancestry as the barriers between the two groups break.

2) The period between 2000 BC and 1000 BC is the beginning of the transformation of the South Asian genetic and ethnolinguistic landscape, with the intrusion of two different groups from different directions, Indo-Aryans from the west and Austro-Asiatics from the east. Austro-Asiatic rice culture was superior to western wheat culture because rice is more delicious than wheat, but the Indo-Aryans ultimately established cultural supremacy across South Asia by the Iron Age.

3) The situation in South India is more complicated and confused. The admixture of groups like Palliyar from InPe and AASI into the classic ASI configuration seems to be more recent than 2000 BC (their low bound dates go as late as 400 BC). The admixture may have occurred in various places, not just in South India. The evidence from this paper suggests that the Andronovo/Sintashta cultural zone was characterized by some genetic heterogeneity due to variation in admixture with neighboring peoples, and the same could be said for the IVC then. I would not be surprised if northern IVC locations had more AASI than southern IVC, as the latter were more insulated from the east due to the Thar desert (the results are consistent with earlier work that suggests modern populations in the lower Indus basin have less Indo-Aryan and more Iranian, with less AASI).

4) We need to be careful about assuming that everything here is a linear combination of distinct and separable atomic units of cultural integrity and wholeness. What I mean is that though Brahmins and some other North Indian groups are enriched for steppe ancestry, it is not only their purview. Rather, it may be that these upper caste groups simply mixed less with the other populations with Iranian and AASI ancestry. The statistics in this paper do not detect enrichment of steppe ancestry in South Indian Brahmins. I believe this is simply an artifact of the reality that South Indian Brahmins mixed with Iranian-enriched elites, like Reddys, when they emigrated to the south.

Though the model outlined in the preprint is much more complicated than a simple ANI/ASI mix, it still simplifies the demographic histories of many populations. For example, my own survey of the data suggests that Brahmins who left the Indo-Gangetic plain mixed with local elites wherever they went (Bengali Brahmins have East Asian ancestry, just as South Indian Brahmins have more Iranian-like ancestry).

5) Language is important but is not determinative. R1a1a-Z93 arrived in South Asia relatively late with groups from the steppe. Its frequency is highest in the northwest, and among upper castes. That is, it is correlated in a coarse manner to steppe ancestry. But R1a1a-Z93 is pervasive throughout South Asia irrespective of caste and region. Even in Dravidian speaking southern populations, some groups have quite a bit of R1a1a-Z93.

The analogy that presents itself here is Southern Europe, where some groups with high frequencies of R1b, such as the Basques and Sardinians, are clearly descended in the main from pre-steppe populations. What this suggests is that a broad socio-cultural prestige network mediated by males extended itself into regions where its cultural hegemony was not assured. Additionally, the autosomal genetic impact was modest, even if privileges given to particular male lineages allowed them to sweep other groups out of the gene pool.

Tamil history precipitates out only a little later than that of North Indian Indo-Aryan civilization. I suspect that this is not a coincidence: South Asia after the collapse of the IVC and the arrival of the Indo-Aryans and Mundas could be thought of as a broad mixing cauldron, genetically and culturally. In many regions, Dravidian languages persisted in the face of the expansive Indo-Aryan, but there was a cultural influence, likely reciprocal. This is why, once Indian civilization reemerged, its coherent unity set against peoples to the west and east was not strange despite the linguistic gap between the north and the south.

The only exception here might be the Munda. As I have said, R1a1a-Z93 is pervasive. But it is nearly unfound among the Munda, who tend to carry relatively exotic Southeast Asian Y lineages such as O. I believe that the Munda were in some way losers in a cultural conflict, but they maintained themselves in the hills above the Gangetic plain.

Finally, two reflections, one navel-gazing, one big picture. Genome bloggers in the years around 2010 actually anticipated many of these results. There’s some hindsight bias here because you remember the times you are right and not the times you were wrong. We were right that there was more than one ANI pulse. Additionally, we were looking at the ratio between “Eastern European” and “West Asian” ancestry years ago and noticing the skewed patterns, with North Indian Brahmins biased toward the former and South Indian elite non-Brahmins skewed toward the latter. Chaubey 2010 suggested to us that something was different about the Munda not only in their East Asian ancestry but in their ANI/ASI ancestry. They just didn’t seem to have any Indo-European ancestry (steppe), and a lot of ASI. Over the past few years I’ve been suggesting that Dravidian languages were not primal to South India, but the product of a recent expansion (though part of this is due to scientific publications).

The truth was out there. It just took ancient DNA and the analytic chops of the Reich group and their collaborators to prune the tree of possibilities so that we could zero in on a few precise and likely models.

In general, I wonder about the role of clines, diffusions, and pulses. The models that the foremost practitioners of the science of ancient DNA utilize tend to assume pulse admixtures, rather than isolation-by-distance gene flow. This isn’t always a crazy assumption. But there was a discussion in the paper of a west-east admixture cline between Anatolian farmers and Iranian farmers. Is this cline due to admixture, or was it always there? A paper from a few years ago implied that early farmers were highly structured, structure that broke down later.

Also, the polytomy at the base of the eastern Eurasian human family tree, where all the major lineages diverge rapidly from each other, makes me wonder about gene flow vs. admixture. It seems possible that the polytomy may mask a phylogenetic tree topology which had gradually bifurcating nodes, if periodically a single daughter population replaced all its sister lineages in a local geographic zone. Much of history in human meta-populations may be characterized by isolation-by-distance and gene flow, erased by the extinction of most lineages and expansion of a favored lineage.

January 18, 2018

The Dravidianization of India

Filed under: Dravidian,India Genetics,India genomics,Indo-Aryan — Razib Khan @ 9:36 pm

On this week’s The Insight Spencer Wells and I talk about the Indo-Aryan arrival to South Asia. This was recorded very early last summer, and I’m rather unguarded (it’s well before I had the piece published in India Today).

I think 2018 will finally be the year that a lot of South Asia will be “solved.” There has been some foot-dragging on papers and results, but that can only go on for so long.

All that being said I suppose I should make some suppositions I have arrived at on this topic more explicit. In a discussion with an Indian friend, when I expressed some of my views he admitted he had no idea about them, though he reads this weblog. That’s because they are speculative and my confidence in them is weak, though you can infer my opinions if you look very closely.

The figure to the left is from Genomic insights into the origin of farming in the ancient Near East, a paper published about a year and a half ago. You see various South Asian populations being modeled as a mixture of four different source populations. The Onge are an Andaman Islander population (and the closest we can get to the aboriginal peoples of South Asia). Iran_N represents Neolithic Iranians, the canonical “eastern farmer” population. Steppe_EMBA represents Yamnaya pastoralists, who are themselves modeled as a mixture of Eastern European Hunter-Gatherers (EHG) and a southern population which has affinities with the Iran_N cluster. EHG in their turn seem to exhibit ancestry from Western European Hunter-Gatherers (WHG), whose heritage dates to the late Pleistocene, and Ancient North Eurasians (ANE), who flourished in Siberia, and contributed ancestry to populations to the west and east (including the ancestors of Native Americans).

When I first saw this specific figure I was incredulous. I had long thought that “Ancestral North Indians” (ANI) were a compound of two elements, one related to the farmers of West Asia (Iran_N), and the other steppe Indo-European (Steppe_EMBA/Yamnaya). But the fraction of Yamnaya/Indo-European/Indo-Aryan ancestry seemed far too high.

A few years later I am now less certain about my skepticism. The fractions here in the details are debatable. Within the text of the paper, the authors admit that the true ancestral populations are probably not represented by the model. But they are close. In most cases, the “Han” ancestry is probably indicative of the fact that the non-ANI component of South Asian ancestry is most closely related to the Onge, but is significantly different nonetheless.

The ratio of Iran_N and Steppe_EMBA is the key. Here is a selection from the paper:

Group             Iran_N   Steppe_EMBA   Ratio (Iran_N / Steppe_EMBA)
Jew_Cochin        0.53     0.23          2.27
Brahui            0.60     0.30          1.98
Kharia            0.13     0.07          1.97
Balochi           0.57     0.32          1.75
Mala              0.23     0.18          1.25
Vishwabrahmin     0.25     0.20          1.21
GujaratiD         0.29     0.28          1.04
Sindhi            0.38     0.38          1.00
Bengali           0.22     0.25          0.91
Pathan            0.36     0.45          0.81
Punjabi           0.24     0.33          0.72
GujaratiB         0.27     0.38          0.72
Lodhi             0.21     0.29          0.72
Burusho           0.27     0.43          0.64
GujaratiC         0.23     0.37          0.61
Kalash            0.29     0.50          0.58
GujaratiA         0.26     0.46          0.57
Brahmin_Tiwari    0.23     0.44          0.51
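
For anyone who wants to play with the ratios, here is a minimal sketch that recomputes Iran_N / Steppe_EMBA from the two-decimal fractions in the table and sorts the groups. The function names are my own. Note that the recomputed values differ slightly from the Ratio column (0.53 / 0.23 comes to about 2.30 for Cochin Jews rather than 2.27), presumably because the listed ratios were calculated from unrounded estimates; that explanation is an assumption, not something stated in the paper.

```python
# Rounded fractions copied from the table above: group -> (Iran_N, Steppe_EMBA).
FRACTIONS = {
    "Jew_Cochin": (0.53, 0.23),
    "Brahui": (0.60, 0.30),
    "Kharia": (0.13, 0.07),
    "Balochi": (0.57, 0.32),
    "Mala": (0.23, 0.18),
    "Vishwabrahmin": (0.25, 0.20),
    "GujaratiD": (0.29, 0.28),
    "Sindhi": (0.38, 0.38),
    "Bengali": (0.22, 0.25),
    "Pathan": (0.36, 0.45),
    "Punjabi": (0.24, 0.33),
    "GujaratiB": (0.27, 0.38),
    "Lodhi": (0.21, 0.29),
    "Burusho": (0.27, 0.43),
    "GujaratiC": (0.23, 0.37),
    "Kalash": (0.29, 0.50),
    "GujaratiA": (0.26, 0.46),
    "Brahmin_Tiwari": (0.23, 0.44),
}


def iran_to_steppe_ratio(iran_n, steppe_emba):
    """Ratio above 1 means skewed toward Iran_N; below 1, toward Steppe_EMBA."""
    return iran_n / steppe_emba


def ranked_by_ratio(fractions):
    """Return (group, ratio) pairs, most Iran_N-skewed first."""
    ratios = {g: iran_to_steppe_ratio(i, s) for g, (i, s) in fractions.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for group, ratio in ranked_by_ratio(FRACTIONS):
        print(f"{group:<16}{ratio:.2f}")
```

Because the inputs are rounded, near-ties (Punjabi, Lodhi, GujaratiB) can swap places relative to the table, which is one more reason to read the ordering coarsely.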

Any way you slice it, a group like the Tiwari Brahmins of Northern India has more Onge-like ancestry than most of the groups in Pakistan. But also observe that the ratio is more skewed toward Steppe_EMBA in them than among even Pathans or Kalash. The Lodhi, a non-upper caste population from Uttar Pradesh in north-central South Asia, are more skewed toward Steppe_EMBA than Pathans.

It is important for me to reiterate that the key is to focus on ratios and not exact percentages. Though the Steppe_EMBA fraction did strike me as high, glimmers of these sorts of results were evident in model-based clustering approaches as early as 2010. The population in the list above most skewed toward Iran_N are Cochin Jews. This group has known Middle Eastern ancestry. But next on the list are Brahui, a Dravidian speaking group in Pakistan. There is a north-south cline within Pakistan, with northern populations (Burusho) being skewed toward Steppe_EMBA and southern ones (Sindhi) being skewed toward Iran_N. Additionally, Iranian groups such as Pathans and Baloch likely have had some continuous gene flow with Middle Eastern groups, probably inflating their Iran_N.

Trends I see in the data:

  1. There is a north-south cline within Pakistan with Steppe_EMBA vs. Iran_N
  2. There is a north-south cline within South Asia with Steppe_EMBA vs. Iran_N
  3. There is caste stratification within regions between Steppe_EMBA vs. Iran_N
  4. Though not clear in this table, there are strong suggestions that Indo-European speaking groups tend to be enriched in Steppe_EMBA, all things equal (e.g., the Bengalis in the 1000 Genomes look a lot like the middle-caste Telugus in the 1000 Genomes when you remove the East Asian ancestry…except for a noticeable small fraction of a component which I think points to Indo-European ancestry)

What does this mean in terms of a model of the settlement of South Asia over the past 4,000 years? One conclusion I have come to is that Dravidian speaking groups are not the aboriginal peoples of the subcontinent. Rather, their settlement across much of South Asia is very recent. Almost as recent as Indo-Aryan habitation. In First Farmers the archaeologist Peter Bellwood proposed this model, whereby Indo-Aryans and Dravidians both expanded across South Asia concurrently. Though I think elements of Bellwood’s model are incorrect, it’s far more correct in my opinion than I believed when I first encountered it.

Why do I believe this?

  1. The Neolithic begins in South India in 3000 BC.
  2. Sri Lanka is Indo-European speaking
  3. The Dravidian languages of South India don’t seem particularly diverged from each other
  4. There is ancestry/caste stratification in South India even excluding Brahmins (e.g., Reddys and Naidus in Andhra Pradesh look somewhat different from Dalits and tribals)
  5. Some scholars claim that there isn’t a Dravidian substrate in the Gangetic plain
  6. R1a1a-Z93, almost certainly associated with Indo-Aryans, is found in South Indian tribal populations
  7. Using LD-based methods researchers are rather sure that the last admixture events between ANI and ASI (“Ancestral South Indian”) populations occurred ~4,000 years ago

Here is my revised model as succinctly as I can outline it. The northwest fringes of South Asia, today Pakistan, and later to be the home of the Indus Valley Civilization (IVC), were populated by a mix of indigenous populations, a form of ASI, when West Asian agriculturalists arrived ~9,000 years ago from what is today Iran. These were the Iran_N or “eastern farmer” groups. The West Asian agricultural toolkit was serviceable in northwestern South Asia for reasons of climate and ecology, but could not expand further east and south for thousands of years.

That is where the first admixture occurred, leading to a population that was mixed between ANI and ASI. These people lacked Steppe_EMBA. They were pre-Indo-European. They were almost certainly not all Dravidian speaking. The Burusho people of northern Pakistan, for example, speak a language isolate (in India proper you have Nihali and Kusunda).

By ~3000 BC this proto-South Asian (in a modern sense) population began to expand, while the IVC matured and waxed. Eventually, the IVC waned, fragmented, and disappeared.

Around 2000 BC, or perhaps somewhat later, Indo-Aryans arrive in South Asia. The situation at this stage is not one of a primordial and static Dravidian India, on which Indo-Aryans place themselves on top. Rather, it’s a dynamic one as the collapse of the IVC has opened up a disordered power vacuum, and a reconfiguration of cultural and sociopolitical alliances.

In the paper above the authors allude to the pervasiveness of both Iran_N and Steppe_EMBA ancestry in South Asia, including in South India. “Indo-European” Y chromosomal lineages are also found among many South Indian groups, albeit at attenuated proportions region-wide. In Peter Turchin’s formulation, I believe that “Indo-Aryan” and “Dravidian” identities became meta-ethnic coalitions in the post-IVC world. Genetically the two groups are different, on average. But some Dravidian populations assimilated and integrated Indo-Aryan tribes and bands, while Indo-Aryans as newcomers assimilated many Dravidian populations.

The reason that the ratio of Iran_N to Steppe_EMBA does not decline monotonically as one goes from west to east along the North Indian plain is that Indo-Aryans were not expanding into a Dravidian India. Dravidian India was expanding only somewhat ahead of Indo-Aryan India, and in some places not at all. In the northwest fringe of South Asia there had long been a settled population of peasants with West Asian ancestry with Iran_N affinities. In contrast, to the east the landscape was populated by nomadic tribal populations with ASI affinities. North Indian Brahmins may have more Steppe_EMBA than some populations in Pakistan and more ASI because they descend from Indo-Aryan groups who absorbed indigenous ASI populations as they expanded across the landscape.

Dravidian groups as they expanded also assimilated indigenous populations. This explains some groups with very high fractions of ASI. Their ASI ancestry is a compound, of an old admixture in Northwest India, and also later assimilation in South India. The presence of R1a1a-Z93 in these populations reflects the integration of some originally Indo-Aryan groups into the expanding Dravidian wavefront.

Where does this leave us?

  1. The Indo-Aryan vs. Dravidian dichotomy is not one of newcomers vs. aboriginals. It is of two different sociocultural configurations which came into their current shape in the waning days of the IVC. That is, it is less than 4,000 years old
  2. The two populations were clearly interacting closely around the time of the collapse and disintegration of the IVC and post-IVC societies. There has been gene flow between the two
  3. ~4000 years ago ANI and ASI populations existed in their “pure” form, but that is because ASI aboriginals still existed to the south and east of the IVC, while Indo-Aryans were a new intrusive presence in the Indian subcontinent
