School girls in Hunza, Pakistan
A few days ago I observed that pseudonymous blogger Dienekes Pontikos seemed intent on throwing as much data and interpretation into the public domain via his Dodecad Ancestry Project as possible. What are the long term implications of this? I know that Dienekes has been cited in the academic literature, but it seems more plausible that this sort of project will simply distort the nature of academic investigation. Distort has negative connotations, but it need not be deleterious at all. Academic institutions have legal constraints on what data they can use and how they can use it (see why Genomes Unzipped started). Not so with Dienekes’ project. He began soliciting for data ~2 months ago, and Dodecad has already yielded a rich set of results (granted, it would not be possible without academically funded public domain software, such as ADMIXTURE). Even if researchers don’t cite his results (and no doubt some will), he’s reshaping the broader framework. In other words, he’s implicitly updating everyone’s priors. Sometimes it isn’t even a matter of new information, as much as putting a spotlight on information which was already there. Below is a slice of a bar plot from Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. It uses STRUCTURE with K = 7. To the right of the STRUCTURE slice are two plots of individual data on French and French Basque from the same HGDP data set using ADMIXTURE at K = 10 from Dodecad.
Repeated runs and higher K’s make it clear that the French Basque lack a “West Asian” aspect which other French, and Iberians as well, have. Some of this is clear in the paper I referenced above as well…the key is you have to look at the supplements at K = 6. Because the Basque are the only native non-Indo-European speakers in Western Europe, their origin and relationship to nearby populations have always been of interest (they also have the highest Rh- frequency of world populations). Granted, the French Basque are very similar genetically to the French as a whole. But, it is obviously highly informative that they lack an ancestral component in totality which seems to exist at low but consistent levels across Western European populations. The only other European population at K = 15 who lack the West Asian component in totality are Finns (the Lithuanians come very close).
This is all preamble to a discussion of a post Dienekes put up today, A solution to the problem of Indo-Aryan origins. Remember that Dienekes has been “playing” with ADMIXTURE for only a few months. To claim to have found a ‘solution’ to a problem as intellectually and politically intractable and explosive as this is rather bold. The crux of the matter is that at certain confluences of K’s and population sets Dienekes has discovered a distinctive signature of ancestry which seems to be modal on the north slope of the Caucasus, and spans India and Europe. He terms this “Dagestani,” due to the fact that among a population sample from this province in Russia this ancestral component is overwhelmingly dominant. The patterns of Dagestani admixture in Europe and India are curious and suggestive.
1 – In Europe the frequencies are low, but irregularly distributed (excepting around the North Caucasus). Scandinavians and British have appreciable fractions, Finns and Southern Europeans do not. Here’s Dienekes:
Interpreting this pattern is not easy, but it does seem that this component seems to have a V-like distribution, achieving its maximum in Caucasus and its environs, then undergoing a diminution, and achieving a secondary (lower) frequency mode in NW Europe.
The surprising appearance of the homonymous Dagestan component in India suggests a widespread presence of a common ancestry element. The West Asian element, by comparison seems to have a more normal /\-like distribution around its center in Anatolia-Caucasus-Iran region. It does reach the Atlantic coast, but is lacking in Scandinavia and Finland, and also in India itself.
2 – South Indian Brahmins have appreciable fractions, but non-Brahmins in the same region do not. In contrast, those who come from Indo-Aryan speaking backgrounds do seem to have Dagestani ancestral components, irrespective of other aspects of ancestry. For example Pakistanis don’t have that much more Dagestani than South Indian Brahmins or Gujaratis. Also compare the relatively narrow window of Dagestani ancestry variance among Dodecad South Asians (I’m DOD075). DOD088 is, from what I recall, a Reddy from Andhra Pradesh, a non-Brahmin but non-low caste. It is interesting that they have a high proportion of “Pakistan,” but no Dagestani. I have ~10% Dagestani, but no Pakistani.
Below is K = 10 for a selection of populations. Dienekes has now included two non-Indo-European speaking Pakistani populations: the Brahui (Dravidian) and Burusho (linguistic isolate in the mountains of Pakistan):
Some general patterns are evident. The light blue is indicative of generic “Indian” ancestry. It is not found in appreciable proportions outside of subcontinental populations (or those of recent subcontinental origin). The same with the red, and light orange. For your reference the dark orange is a “Northern European” component, modal in Lithuania. The light and dark green are both East Asian components. The dark blue is a “West Asian” component modal in Georgia, and prominent across Europe, declining as a function of distance from the eastern shore of the Black Sea (this is surely the West Asian which distinguishes the French from the French Basque). I believe that the light purple dominant in the Brahui and the light red dominant in the Burusho probably form as a compound the aforementioned Pakistani component. The dark purple is the Dagestani.
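For readers unfamiliar with what “modal in” means here, a minimal sketch may help. It assumes an ADMIXTURE-style table in which every individual gets K ancestry fractions summing to 1, averages those fractions within each population, and reports the population where each component peaks. The population labels, component names, and every number are invented for illustration; none of them come from Dodecad or the HGDP.

```python
# Hypothetical ADMIXTURE-style output: each individual gets K ancestry
# fractions that sum to 1. All labels and numbers below are invented.
from collections import defaultdict

COMPONENTS = ["component_1", "component_2", "component_3"]

# (population label, ancestry fractions in COMPONENTS order)
individuals = [
    ("Pop A", [0.80, 0.15, 0.05]),
    ("Pop A", [0.70, 0.25, 0.05]),
    ("Pop B", [0.10, 0.60, 0.30]),
    ("Pop B", [0.20, 0.50, 0.30]),
    ("Pop C", [0.05, 0.05, 0.90]),
]


def population_means(rows):
    """Average each component's fraction within each population."""
    grouped = defaultdict(list)
    for pop, fractions in rows:
        grouped[pop].append(fractions)
    return {
        pop: [sum(col) / len(col) for col in zip(*members)]
        for pop, members in grouped.items()
    }


def modal_population(means, component_index):
    """Return the population with the highest mean share of a component."""
    return max(means, key=lambda pop: means[pop][component_index])


means = population_means(individuals)
for i, name in enumerate(COMPONENTS):
    print(name, "is modal in", modal_population(means, i))
```

A component being modal in a population says only where its average share is highest; it does not by itself say the component originated there.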
First, a word on the Brahui. These are a group of tribes who reside in northern Balochistan in Pakistan. A small number are even to be found in Afghanistan. Historically they have had close relations with the Baloch, an Iranian speaking cluster of tribes who totally envelop the Brahui. The Brahui do speak a Dravidian language, of a family dominant in South India and found in isolated regions of Central and Eastern India. There are two broad models for the existence of a Dravidian language in Pakistan. The first is that the Brahui are remnants of more widely spoken Dravidian languages which date back to the Indus Valley civilization. The second is that the Brahui arrived during the medieval period from another region of South Asia where Dravidian languages were more common. Assuming either model, it has long been presumed that their involution by the Baloch has had a strong impact on the Brahui genetically; the two groups are very close. This is evident in Dienekes’ results as well. But observe that the Baloch are the group which seems more cosmopolitan in ancestry than the Brahui. If the Brahui were Dravidians from deep in India it seems that they would have a greater residual component of India-specific ancestry (light blue and orange). This is not so. In fact the Baloch have more of the Indian ancestral component than the Brahui. The Brahui component is found across Pakistan, and into India, albeit at lower proportions. Naturally, the Baloch have the second highest fraction. I believe these results should shift us toward the position that the Brahui are indigenous in relation to the Baloch, and that the Baloch ethnic identity emerged through the shift of a Brahui substrate, as evidenced by the greater cosmopolitanism of the Baloch. Additionally, Dienekes observes that the Brahui have a lower proportion of the Dagestani component than most other Pakistani groups, and several Indo-Aryan groups in India proper.
The Burusho are even more interesting than the Brahui. Unlike the Brahui the Burusho are very isolated in the mountainous fastness of Baltistan in northern Pakistan. Additionally, their language, Burushaski, is a linguistic isolate. Others of the class are Basque and Sumerian. In general it is assumed that linguistic isolates were once part of broader families of languages which have gone extinct. Burushaski probably persists in large part because of the geography which its speakers inhabit. Mountainous areas often preserve ethnic and linguistic diversity because the terrain allows for the persistence of local variety. I believe it is plausible that the Burusho have been far more isolated than the Brahui. This seems to show up in the ADMIXTURE plot: the Burusho have a greater proportion of their modal ancestral component than the Brahui. Additionally, the Burusho have an even smaller component of Dagestani than the Brahui.
Below is a chart Dienekes constructed ordered by proportion of Dagestani for his South Asian populations. Next to it I’ve placed a chart from a PCA which has some of the same population samples. Compare & contrast:
The PCA is looking at between population variation in totality. So naturally the Dagestani component isn’t going to be predictive of that. Rather, it speaks to the possibility which Dienekes is mooting: that the Dagestani component spread in the Indian subcontinent with the Indo-Aryans specifically, overlying the local resident substrate. In South India this meant that Brahmins brought this, mixing with the indigenous Dravidian population. In Pakistan the Indo-Aryan, and Iranians, were overlain on a substrate which were the ancestors of the Burusho and Brahui. The dominant signal of genetic relationship has to do with the substrate, not the Indo-Aryans. So that’s what’s going to show up on the PCA. In other PCA plots the model where South Indian Brahmins are a linear combination of a Pakistani-like population and a Dravidian population becomes clearer. But when you look at ancestry using something like ADMIXTURE you have the potential to tease apart different components, and so uncover relationships which may have been obscured when looking at aggregate variation.
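To make the “linear combination” idea concrete, here is a rough sketch under strong simplifying assumptions. It treats a target population’s allele frequencies as alpha times one source plus (1 - alpha) times another, and recovers alpha by ordinary least squares. This is not what ADMIXTURE or a PCA actually computes, the function name is my own, and the frequencies are invented rather than taken from any real sample.

```python
def mixing_fraction(target, source_a, source_b):
    """Least-squares alpha for target ~ alpha*source_a + (1 - alpha)*source_b."""
    # Rearranged: (target - source_b) ~ alpha * (source_a - source_b)
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical; alpha is undefined")
    return num / den


# Invented allele frequencies at five markers for two source populations.
source_a = [0.10, 0.80, 0.35, 0.60, 0.20]
source_b = [0.70, 0.30, 0.55, 0.10, 0.90]

# Build a target that is exactly 30% source_a and 70% source_b.
true_alpha = 0.3
target = [true_alpha * a + (1 - true_alpha) * b
          for a, b in zip(source_a, source_b)]

alpha = mixing_fraction(target, source_a, source_b)
print(round(alpha, 3))
```

With real data the target would never be an exact mixture, so the estimate would carry noise from drift and sampling, and the choice of source populations would matter a great deal.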
Dienekes’ model seems to posit three steps in rapid succession ~4,000 years ago. A background variable which must be mentioned is that one must account for the Mitanni, a dominant Syrian power circa 1500 BC where a non-Indo-European language was the lingua franca, and yet a definite Indo-Aryan element existed within the elite. Indo-Aryan specifically because the Indo-European element within the Mitanni was not Iranian, but specifically Indo-Aryan. An easy explanation for this is that the Indo-Aryan component of the Indo-Iranian branch of the Indo-European languages crystallized outside South Asia, and independently reached Syria and India. In Syria it went extinct, while in India it obviously did not. By Dienekes’ model the Mitanni would be rather closer to the urheimat of the Indo-Aryans.
An aspect of his model which I do not understand is why it has to be Indo-Aryan, instead of Indo-Iranian. The South Asian population in which the Dagestani component is modal, the Pathans, are Iranian, not Indo-Aryan. Additionally, this model seems to not speak in detail to the existence of the Dagestani element among Europeans. Here is a sorting of European populations (with Iranians included) by the Dagestani component:
There is here a strange pattern of rapid drop off from the Caucasus, and a bounce back very far away, on the margins of Germanic Northwestern Europe. This to me indicates some sort of leapfrog dynamic. A well known illustration of this would be the Ugric languages. The existence of Hungarian in what was Roman Pannonia is a function of the mobility and power of Magyar horsemen, and their cultural domination over the Romance and Slavic speaking peasantry (their genetic impact seems to have been slight). No one believes that Germanic languages are closely related to Indo-Aryan (rather, if there is structure in Indo-European beyond Indo-Iranian, Celtic, etc., it would place the Indo-Iranian languages with Slavic). So what’s going on? I think perhaps the Dagestani component is in part a reflection of the common Indo-European origin in that region. For whatever reason that signal is diminished in much of the rest of Europe. Perhaps Southern Europe was much more densely populated when the Indo-Europeans arrived. Additionally, it seems highly likely that in places like Sardinia, much of Spain, and Cyprus, Indo-European speech came through cultural diffusion (elite emulation) and not population movement. Or perhaps we’re seeing the vague shadows of population admixtures on the Pontic steppe, where distinct Germanic and Indo-Iranian confederations admixed with a common North Caucasian substrate.
Going back to India, let’s revisit the model of a two-way admixture between “Ancestral North Indians,” who were genetically similar to Europeans and West Asians, and “Ancestral South Indians,” who were closer to, but not very close to, East Eurasians. The ANI & ASI. The ASI were probably one of the ancient populations along the fringe of southern Eurasia, all of whom have been submerged by demographic movements from other parts of Eurasia over the past 10,000 years, excepting a few groups such as the Andaman Islanders and some Southeast Asian tribes. The model was admittedly a simplification. But taking that model as a given, and accepting that the Dagestani element is indeed Indo-Aryan, we can infer that the ANI were not Indo-European. It is notable that the South Indian Brahmins have elevated fractions of both the Brahui and Burusho modal components. This is probably indicative of admixture of the Indo-Aryan element in the Indus Valley, prior to their expansion to other parts of India. I assume one of the languages spoken was Dravidian, though if ancient Mesopotamia was linguistically polyglot at the dawn of history I would not be surprised if the much more geographically expansive Indus Valley civilization was as well.
The irony is that today when someone refers to a “Dravidian” physical type, they’re not talking about someone who looks like a Pakistani. They’re talking about someone who looks South Indian, where most Dravidian languages are spoken. But combining the inference from Dienekes’ model and the previous two-way admixture model, you reach the conclusion that lighter skin and more West Asian features among South Asians may be more due to Dravidian-speaking ancestors in the Indus Valley, not Indo-Aryans! It goes to show the wisdom of differentiating linguistic classes from biological ones when discussing historical population genetics. Unfortunately, it is a wisdom most of us interested in these topics do not show, alas.
As I like to say, interesting times….
Note: If you leave a comment, please don’t be smarter-than-thou in your tone. I have stopped publishing those sorts of comments because the reality is that most of them have not been that smart or informed. At least by my estimation. If you actually are smarter than the average bear, and impress me with your erudition and analytical clarity, I’ll probably let your comment through no matter your attitude. But I wouldn’t bet on it if I were you, so show some class and humility. Most of us are muddling through.
Image Credit: Georges Biard, iStockPhoto