Razib Khan One-stop-shopping for all of my content

January 18, 2018

The Dravidianization of India

Filed under: Dravidian,India Genetics,India genomics,Indo-Aryan — Razib Khan @ 9:36 pm

On this week’s The Insight Spencer Wells and I talk about the Indo-Aryan arrival to South Asia. This was recorded very early last summer, and I’m rather unguarded (it’s well before I had the piece published in India Today).

I think 2018 will finally be the year that a lot of South Asia will be “solved.” There has been some foot-dragging on papers and results, but that can only go on so long.

All that being said, I suppose I should make some suppositions I have arrived at on this topic more explicit. In a discussion with an Indian friend, when I expressed them he admitted he had no idea about some of my views, though he reads this weblog. That’s because they are speculative and my confidence in them is weak, though you can infer my opinions if you look very closely.

The figure to the left is from Genomic insights into the origin of farming in the ancient Near East, a paper published about a year and a half ago. You see various South Asian populations being modeled as a mixture of four different source populations. The Onge are an Andaman Islander population (and the closest we can get to the aboriginal peoples of South Asia). Iran_N represents Neolithic Iranians, the canonical “eastern farmer” population. Steppe_EMBA represents Yamnaya pastoralists, who are themselves modeled as a mixture of Eastern European Hunter-Gatherers (EHG) and a southern population which has affinities with the Iran_N cluster. EHG in their turn seem to exhibit ancestry from Western European Hunter-Gatherers (WHG), whose heritage dates to the late Pleistocene, and Ancient North Eurasians (ANE), who flourished in Siberia and contributed ancestry to populations to the west and east (including the ancestors of Native Americans).

When I first saw this specific figure I was incredulous. I had long thought that “Ancient North Indians” (ANI) were a compound of two elements, one related to the farmers of West Asia (Iran_N), and the other steppe Indo-European (Steppe_EMBA/Yamnaya). But the fraction of Yamnaya/Indo-European/Indo-Aryan ancestry seemed far too high.

A few years later I am now less certain about my skepticism. The fractions here in the details are debatable. Within the text of the paper, the authors admit that the true ancestral populations are probably not represented by the model. But they are close. In most cases, the “Han” ancestry is probably indicative of the fact that the non-ANI component of South Asian ancestry is most closely related to the Onge, but is significantly different nonetheless.

The ratio of Iran_N and Steppe_EMBA is the key. Here is a selection from the paper:

Group Iran_N Steppe_EMBA Ratio (Iran_N / Steppe_EMBA)
Jew_Cochin 0.53 0.23 2.27
Brahui 0.60 0.30 1.98
Kharia 0.13 0.07 1.97
Balochi 0.57 0.32 1.75
Mala 0.23 0.18 1.25
Vishwabrahmin 0.25 0.20 1.21
GujaratiD 0.29 0.28 1.04
Sindhi 0.38 0.38 1.00
Bengali 0.22 0.25 0.91
Pathan 0.36 0.45 0.81
Punjabi 0.24 0.33 0.72
GujaratiB 0.27 0.38 0.72
Lodhi 0.21 0.29 0.72
Burusho 0.27 0.43 0.64
GujaratiC 0.23 0.37 0.61
Kalash 0.29 0.50 0.58
GujaratiA 0.26 0.46 0.57
Brahmin_Tiwari 0.23 0.44 0.51

Any way you slice it, a group like the Tiwari Brahmins of Northern India has more Onge-like ancestry than most of the groups in Pakistan. But also observe that the ratio is more skewed toward Steppe_EMBA in them than among even Pathans or Kalash. The Lodhi, a non-upper caste population from Uttar Pradesh in north-central South Asia, are more skewed toward Steppe_EMBA than Pathans.

It is important for me to reiterate that the key is to focus on ratios and not exact percentages. Though the Steppe_EMBA fraction did strike me as high, glimmers of these sorts of results were evident in model-based clustering approaches as early as 2010. The population in the list above most skewed toward Iran_N is the Cochin Jews. This group has known Middle Eastern ancestry. But next on the list are the Brahui, a Dravidian-speaking group in Pakistan. There is a north-south cline within Pakistan, with northern populations (Burusho) being skewed toward Steppe_EMBA and southern ones (Sindhi) being skewed toward Iran_N. Additionally, Iranian groups such as Pathans and Baloch have likely had some continuous gene flow with Middle Eastern groups, probably inflating their Iran_N.
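For anyone who wants to play with the ratios rather than the raw percentages, here is a minimal Python sketch that recomputes them from the table above. The fractions are copied from the table as printed (rounded to two decimals), so the recomputed ratios will not exactly match the Ratio column (Kharia comes out near 1.86 rather than 1.97). My guess, and it is only a guess, is that the published column was computed from unrounded values. The `remainder` column is simply whatever is left after Iran_N and Steppe_EMBA in the four-source model, that is, Onge plus Han combined; the names `FRACTIONS`, `summarize`, and `ROWS` are my own, not anything from the paper.

```python
# Fractions copied from the table above, rounded to two decimals as printed.
# Each entry is (Iran_N, Steppe_EMBA).
FRACTIONS = {
    "Jew_Cochin": (0.53, 0.23),
    "Brahui": (0.60, 0.30),
    "Kharia": (0.13, 0.07),
    "Balochi": (0.57, 0.32),
    "Mala": (0.23, 0.18),
    "Vishwabrahmin": (0.25, 0.20),
    "GujaratiD": (0.29, 0.28),
    "Sindhi": (0.38, 0.38),
    "Bengali": (0.22, 0.25),
    "Pathan": (0.36, 0.45),
    "Punjabi": (0.24, 0.33),
    "GujaratiB": (0.27, 0.38),
    "Lodhi": (0.21, 0.29),
    "Burusho": (0.27, 0.43),
    "GujaratiC": (0.23, 0.37),
    "Kalash": (0.29, 0.50),
    "GujaratiA": (0.26, 0.46),
    "Brahmin_Tiwari": (0.23, 0.44),
}


def summarize(fractions):
    """Return (group, ratio, remainder) rows, most Iran_N-skewed first.

    ratio     = Iran_N / Steppe_EMBA
    remainder = 1 - Iran_N - Steppe_EMBA, i.e. the Onge + Han share
                (assumes the four modeled sources sum to 1)
    """
    rows = []
    for group, (iran_n, steppe) in fractions.items():
        ratio = round(iran_n / steppe, 2)
        remainder = round(1.0 - iran_n - steppe, 2)
        rows.append((group, ratio, remainder))
    return sorted(rows, key=lambda row: row[1], reverse=True)


ROWS = summarize(FRACTIONS)

if __name__ == "__main__":
    print(f"{'Group':<16}{'Ratio':>7}{'Remainder':>11}")
    for group, ratio, remainder in ROWS:
        print(f"{group:<16}{ratio:>7.2f}{remainder:>11.2f}")
```

Even with the rounding noise the ordering holds up: the Tiwari Brahmins sit at the bottom of the ratio column while carrying a larger Onge-plus-Han remainder than the Pathans, which is the pattern discussed above.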

Trends I see in the data:

  1. There is a north-south cline within Pakistan with Steppe_EMBA vs. Iran_N
  2. There is a north-south cline within South Asia with Steppe_EMBA vs. Iran_N
  3. There is caste stratification within regions between Steppe_EMBA vs. Iran_N
  4. Though not clear in this table, there are strong suggestions that Indo-European speaking groups tend to be enriched in Steppe_EMBA, all things equal (e.g., the Bengalis in the 1000 Genomes look a lot like the middle-caste Telugus in the 1000 Genomes when you remove the East Asian ancestry…except for a noticeable small fraction of a component which I think points to Indo-European ancestry)

What does this mean in terms of a model of the settlement of South Asia over the past 4,000 years? One conclusion I have come to is that Dravidian-speaking groups are not the aboriginal peoples of the subcontinent. Rather, their settlement across much of South Asia is very recent. Almost as recent as Indo-Aryan habitation. In First Farmers the archaeologist Peter Bellwood proposed this model, whereby Indo-Aryans and Dravidians both expanded across South Asia concurrently. Though I think elements of Bellwood’s model are incorrect, it’s far more correct in my opinion than I believed when I first encountered it.

Why do I believe this?

  1. The Neolithic begins in South India in 3000 BC.
  2. Sri Lanka is Indo-European speaking
  3. The Dravidian languages of South India don’t seem particularly diverged from each other
  4. There is ancestry/caste stratification in South India even excluding Brahmins (e.g., Reddys and Naidus in Andhra Pradesh look somewhat different from Dalits and tribals)
  5. Some scholars claim that there isn’t a Dravidian substrate in the Gangetic plain
  6. R1a1a-Z93, almost certainly associated with Indo-Aryans, is found in South Indian tribal populations
  7. Using LD-based methods researchers are rather sure that the last admixture events between ANI and ASI (“Ancestral South Indian”) populations occurred ~4,000 years ago

Here is my revised model as succinctly as I can outline it. The northwest fringes of South Asia, today Pakistan, and later to be the home of the Indus Valley Civilization (IVC), were populated by a mix of indigenous populations, a form of ASI, when West Asian agriculturalists arrived ~9,000 years ago from what is today Iran. These were the Iran_N or “eastern farmer” groups. The West Asian agricultural toolkit was serviceable in northwestern South Asia for reasons of climate and ecology, but could not expand further east and south for thousands of years.

This is where the first admixture occurred, leading to a population that was a mix of ANI and ASI. These people lacked Steppe_EMBA. They were pre-Indo-European. They were almost certainly not all Dravidian-speaking. The Burusho people of northern Pakistan, for example, speak a language isolate (in India proper you have Nihali, and in Nepal Kusunda).

By ~3000 BC this proto-South Asian (in a modern sense) population began to expand, while the IVC matured and waxed. Eventually, the IVC waned, fragmented, and disappeared.

Around 2000 BC, or perhaps somewhat later, Indo-Aryans arrive in South Asia. The situation at this stage is not one of a primordial and static Dravidian India, on top of which Indo-Aryans place themselves. Rather, it’s a dynamic one, as the collapse of the IVC has opened up a disordered power vacuum and a reconfiguration of cultural and sociopolitical alliances.

In the paper above the authors allude to the pervasiveness of both Iran_N and Steppe_EMBA ancestry in South Asia, including in South India. “Indo-European” Y chromosomal lineages are also found among many South Indian groups, albeit at attenuated proportions region-wide. In Peter Turchin’s formulation, I believe that “Indo-Aryan” and “Dravidian” identities became meta-ethnic coalitions in the post-IVC world. Genetically the two groups are different, on average. But some Dravidian populations assimilated and integrated Indo-Aryan tribes and bands, while Indo-Aryans as newcomers assimilated many Dravidian populations.

The reason that the ratio of Iran_N to Steppe_EMBA does not decline monotonically as one goes from west to east along the North Indian plain is that Indo-Aryans were not expanding into a Dravidian India. Dravidian India was expanding only somewhat ahead of Indo-Aryan India, and in some places not at all. In the northwest fringe of South Asia there had long been a settled population of peasants with West Asian ancestry with Iran_N affinities. In contrast, to the east the landscape was populated by nomadic tribal populations with ASI affinities. North Indian Brahmins may have more Steppe_EMBA than some populations in Pakistan and more ASI because they descend from Indo-Aryan groups who absorbed indigenous ASI populations as they expanded across the landscape.

Dravidian groups as they expanded also assimilated indigenous populations. This explains some groups with very high fractions of ASI. Their ASI ancestry is a compound, of an old admixture in Northwest India, and also later assimilation in South India. The presence of R1a1a-Z93 in these populations reflects the integration of some originally Indo-Aryan groups into the expanding Dravidian wavefront.

Where does this leave us?

  1. The Indo-Aryan vs. Dravidian dichotomy is not one of newcomers vs. aboriginals. It is of two different sociocultural configurations which came into their current shape in the waning days of the IVC. That is, it is less than 4,000 years old
  2. The two populations were clearly interacting closely around the time of the collapse and disintegration of the IVC and post-IVC societies. There has been gene flow between the two
  3. ~4000 years ago ANI and ASI populations existed in their “pure” form, but that is because ASI aboriginals still existed to the south and east of the IVC, while Indo-Aryans were a new intrusive presence in the Indian subcontinent

April 21, 2011

Visualization of genetic distances, part n

Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he’s been driving his computer to crunch up ADMIXTURE results ascending up a ladder of K’s. Because it is the Harappa Ancestry Project, Zack’s populations are overloaded a touch on South Asians. He managed to get a hold of the data set from Reconstructing Indian History. If you will recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, “Ancestral North Indian” (ANI) and “Ancestral South Indian” (ASI). ANI are a population which can be compared pretty easily to other West Eurasians. There are no “pure” groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago.

At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:

Now let’s take all the reference populations with an Onge component between 10% ...

March 28, 2011

Genetics as the myth buster: Indian edition

Filed under: Genetics,Genomics,India Genetics,India genomics,Vishwakarma — Razib Khan @ 12:17 pm

Whenever Zack Ajmal posts a new update to the Harappa Ancestry Project he appends some data to his ethnic database. This sends me to Wikipedia, because how many people are supposed to know what a “Muslim Rawther” means? Well, if you are a Muslim Rawther, and perhaps from Southern India, you would. But South Asian ethno-linguistic categories and hierarchies are notoriously Byzantine, and I have difficulty making sense of them. This isn’t too surprising in my case, as my family’s background is relatively mixed in the very recent past (e.g., Hindus and Muslims, and people of various caste backgrounds), so we’re not the sort who can go on at length about our pure ancestry and all that stuff. Unfortunately, Wikipedia isn’t always useful, because the people editing the entries on particular South Asian ethnic groups are often people from those ethnic groups, so you get a lot of extraneous information, and a particular slant on how awesome and high achieving the group is (also, sometimes there’s funny stuff about how notoriously good looking that particular caste is!). On occasion there are other sources which are informative. For example, Zack has several individuals from the Tamil Nadar caste. I know ...
