Razib Khan One-stop-shopping for all of my content

December 21, 2010

Not misunderstanding the past requires suspicion

In my post on African farmers someone responded:

It was famously reported last winter that Bushmen seem to differ genetically amongst themselves more than Europeans and Asians do. These two latter groups have been separate for at least 40,000 years.

At least? Razib, you are way off on the separation time of Europeans and East Asians. I think it’s much closer to 30,000 years at most. There is growing evidence that ancestral Europeans and ancestral East Asians were one and the same people until 22,500 years ago.

Present-day Europeans and East Asians descend largely from a small nomadic population that once roamed Eurasia’s northern tier—a belt of steppe-tundra that stretched from southwestern France to Beringia during the last ice age. This population then split in two around the time of the glacial maximum (Rogers, 1986; Crawford et al, 1997). Chronologically, this barrier to east-west gene flow matches the dating by Laval et al. (2010) of the split between ancestral Europeans and ancestral East Asians.

The italics are my words, emphasized by the commenter. The bolding is the commenter’s as well (I had to fix some HTML in that comment, but I think I corrected in line with the commenter’s intent). I read (and blogged) the paper cited, so I’m well aware of the low bound value implying more recent common ancestry of East and West Eurasians posited here. I’m moderately skeptical. Part of the issue is that these sorts of computational models are tricky, and most of us aren’t versed in the various moving parts which go into constructing the model. Consider the following from the cited paper:

We tested different evolutionary models…that allow different levels of introgression of archaic hominids to modern human populations. We assumed an early diffusion of archaic hominids (Homo erectus) out of Africa ~1.25 and ~2.25 million years ago…various ancestral migration rate intensities (m0, ancestral migration rate is the proportion of migrants before the Out-of-Africa exodus) and an African exodus of modern humans between ~40,000–100,000 years ago…By tuning the replacement rate δ, we then simulated scenarios that consider different levels of replacement of archaic hominids by modern humans (i.e. different levels of introgression of archaic material into the modern gene pool), including the most extreme cases of complete (δ = 1) and no replacement (δ = 0) as well as several scenarios with varying intermediate levels of replacement…The summary statistics were calculated by merging all population samples (except for global FST) in order to minimize the effects of recent demographic events related to the continental populations. We thus considered in all models a constant size for the three modern human populations. The model with residual ancestral migration rate (m0~10−10) and full replacement (δ = 1) clearly better fitted our data than any other model…highest ψ1, the ψ1 of this model is significantly higher after correction for multiple testing when compared with the other ψ1 values, P<0.01). However, we could not discern between a complete (δ = 1) and an almost-complete (δ≥0.99) replacement of archaic hominids (difference between ψ1 is not significant for this pairwise comparison), indicating that a small contribution of archaic humans to our present-day genome cannot be completely ruled out….

This part of the paper produced predictions which are likely false. The highest probability scenario derived from their model is “full replacement,” which was close to falsified less than a month after the publication of the paper (though until more data comes in we should obviously still include some wiggle room for error in the results which indicated Neandertal admixture). But even the second highest probability model, 0-1% archaic admixture, is also probably false. Neandertal admixture is estimated at between 1-4%, with the true value naturally more likely to fall within the interval than on its margin (I am to understand that future research will clarify the fraction, and that it is near the midpoint, not along the edges of the distribution). Obviously that does not necessarily falsify the contention that Europeans and East Asians share ancestors ~20,000 years BP, but it should make us cautious of putting too much weight on models which are likely sensitive to a range of inputs when it comes to putative demographic models. I assume that a very similar model with some parameters fine-tuned could align appropriately with the more recent findings which confirm admixture via ancient DNA. But what’s the utility of such models post facto?

A bigger issue when it comes to positing relatively recent common ancestry of Eurasians has to do with the fact that in many analyses (though not all) it is clear that East Asians are closer to Oceanians than Europeans are. When it comes to Oceanians (Papuans, Melanesians, and Australian Aboriginals), the archeology is nicely clear and distinct. It looks as if Sahul was settled ~50,000 years ago. The assumption currently is that the modern populations of the highlands of Papua and the Australian Aborigines are descendants of these populations. This means that Oceanians have been isolated from other non-Africans for ~50,000 years, excluding recent admixture (e.g., Austronesian influence, European admixture among Aborigines). By intuition then East Eurasians and West Eurasians should form a clade with Oceanians as an outgroup, and yet this is not what we always see (though we do see it sometimes). The above is a visualization of Fst distances from the recent paper on Australian Aborigines. It is likely that you are seeing more than simply time to last common ancestor between the groups. Some of these populations have been isolated, and likely gone through population bottlenecks (both Amerindians and Oceanians).
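To make the point about bottlenecks concrete, here is a minimal sketch of how population size alone can inflate Fst for a fixed split time. It assumes an idealized pure-drift model (no migration, no mutation, constant diploid population sizes after the split), and the function name and the numbers are my own illustrative assumptions, not values from any of the papers discussed. Under those assumptions within-population heterozygosity decays by a factor of (1 - 1/2N) per generation while between-population heterozygosity stays at the ancestral value, so the expected Fst (defined as 1 - H_within / H_between) depends on N as much as on time.

```python
def expected_fst(generations, n_a, n_b):
    """Expected F_ST (1 - H_within / H_between) for two populations that
    split `generations` ago and then drifted in isolation.

    Assumes pure drift: no migration, no mutation, constant diploid
    sizes n_a and n_b. This is a toy model, not a fitted demography.
    """
    # Fraction of ancestral heterozygosity each population retains.
    keep_a = (1 - 1 / (2 * n_a)) ** generations
    keep_b = (1 - 1 / (2 * n_b)) ** generations
    # Between-population heterozygosity stays at the ancestral level,
    # so it cancels out of the ratio.
    return 1 - (keep_a + keep_b) / 2


if __name__ == "__main__":
    split = 2000  # hypothetical number of generations since the split
    large_vs_large = expected_fst(split, 10_000, 10_000)
    large_vs_small = expected_fst(split, 10_000, 1_000)
    print(f"same split, both populations large: {large_vs_large:.3f}")
    print(f"same split, one small population:   {large_vs_small:.3f}")
```

The split time is identical in both calls; only the size of one population changes, yet the second Fst comes out several times larger. That is the sense in which a distance plot can reflect isolation and bottlenecks rather than time to last common ancestor.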

To the right is a PCA from the same paper. It’s the standard one you see of HGDP populations, along with the addition of Aborigines, who clearly have a great deal of admixture with Europeans. Rather, focus on the Papuans. Remember that you’re capturing only the two largest components of variation in the HGDP set. But the PCA 3 and 4 show that Eurasians and Africans cluster against Oceanians and Amerindians, who are distributed orthogonally with respect to each other. I personally wouldn’t read too much into the Papuan position closer to Europeans than East Asians, though I don’t think it is admixture with Europeans in this case. Let’s look at this in another way.

Below is a standard STRUCTURE bar plot from the same paper, K = 2 to K = 8. The populations on the far right are Oceanian: Aborigines, Melanesians and Papuans. On the far left are Africans, then East Asians, then Europeans (I removed South and Central Asians):


Again, you have to interpret these results with care. East Asians, Amerindians, and Oceanians tend to separate off from Eur-Africans first. Later you see the obvious European admixture among Aborigines, as well as Austronesian admixture among the Melanesians. But still observe that the genetic variation among Oceanians and East Eurasians tends to split off from Europeans first as you ascend the K’s.

Now let’s look at another paper, The Genetic Structure and History of Africans and African Americans. Two figures.



As they say, curiouser and curiouser. The second figure might actually be consistent with a branching off of Oceanians far earlier than Europeans + East Asians. But look again at the Amerindian branch. All the research currently suggests they’re derived from Siberian groups. Their genetic distance is probably a function of population bottlenecks and/or lack of genetic exchange with populations on the World Island. One issue is that this second paper has as inputs a wider range of African populations. Here’s a slice of the STRUCTURE results from this paper:


Finally, a tree from Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation:


I kind of saturated this post with charts, but the reason is that these sorts of charts always run through my head whenever I’m trying to evaluate a historical or archaeological hypothesis. Obviously this doesn’t prevent flubs. I so internalized the model whereby non-Africans are a branch of Africans that I neglected to consider the variant genetic distances of non-Africans from Africans, even excluding those with obvious recent admixture from Sub-Saharan Africa such as the Mozabites. Additionally, you always have to account for local genetic peculiarities due to isolation which might deceive as to the time of the last common ancestor between populations because of different evolutionary parameters on some branches of a lineage. And obviously the Neandertal admixture story combined with other confusing results from ancient DNA should make us exceedingly cautious about how rock solid the priors are which we use to frame our probability calculations for any given assertion.

So back to Eurasians and Oceanians. The key for me is that I do believe that archaeologists are correct in pegging the peopling of Sahul (New Guinea and Australia) to 50,000 years B.P. I also have modest confidence that the populations of the highlands of New Guinea and the non-Eurasian component of Aborigine ancestry derive back to this period of first settlement. Looking at the set of trees above I am simply not confident that East Asians and Europeans separated half as long ago as they did from Oceanians. One part of the solution though could be admixture. I modestly accept the proposition that South Asians are a hybrid population, and that their position between West Eurasians and East Eurasians, but closer to the former, has to do with admixture between an ancient substrate and a West Eurasian population which was intrusive to the subcontinent within the last 10,000 years. A similar model may apply to East Eurasia, where these populations are a compound of an ancient group which was distantly related to those of Sahul, and an intrusive group from the fringes of Siberia. The relative closeness of East Asians to Oceanians (in particular, Papuans, who do not seem to have Austronesian admixture) can then be explained by this common ancient ancestry during the Paleolithic. The current pattern of East Asian variation south of the Amur may be due mostly to a demographic expansion of farmers from a locus of agricultural innovation somewhere in northern China.

Of course there are plenty of models one could construct verbally, and find some support for in the literature. My own preference, though only weakly at that, would be a separation of West Eurasians and all other non-Africans ~40,000 years ago. Then a separation of Oceanians from all other non-West Eurasian non-Africans. Then the separation of Amerindians from the East Eurasians. And then finally an expansion of a subset of West Eurasians and East Eurasians during the Holocene which replaced by and large the Southeast Eurasian population which spanned India to the fringes of Sahul. This does not negate the possibility (likelihood in fact) of gene flow around the northern fringe of Eurasia.

But at the end of the day these are the days for humility and caution. Naturally in the course of writing I will forget this on occasion, but the response is not to refute and assert boldly ex cathedra one’s own position.

September 17, 2010

Of Iran, Turan, and Turks

There’s a new paper out in The European Journal of Human Genetics which is of great interest because it surveys the genetic and linguistic affinities of two dozen ethno-linguistic groups from the three Central Asian nations of Uzbekistan, Kyrgyzstan, and Tajikistan. This is what the Greeks referred to as Transoxiana, and the Persians as Turan. Originally inhabited by peoples with close cultural affinities with those of Persia, indeed, likely the root of the peoples of Persia, by the historical period Turan developed a distinctive identity as a frontier or march. It was in Turan where the Turk met the Iranian (a class which included non-Persian groups, such as the Sogdians), from the pre-Islamic Sassanians down to the present day. It is a region of the world which has a very ancient urban culture, cities such as Merv, as well as peoples that were only recently nomads, forcibly made sedentary by the Soviet regime.

To add another twist to the picture many of the ethno-linguistic groups which we are familiar with today and which serve as the cores of the new Central Asian nations only came into being within the last few centuries, with a particular “push” from Russian Imperial and Soviet ethnologists who were tasked with fleshing out national identities with which the center could negotiate. A “Tajik” is after all simply part of the Persian-speaking residual population of Central Asia, spreading down into Afghanistan. The carving out of an independent Tajikistan out of the Central Asian landscape is as much a creation of the modern age as the state of Israel. The “Uzbek” identity was once simply that of the ruling caste of Transoxiana who came to power after the decline of the Timurids. Today it is an appellation which brackets the settled Turkic speaking peoples of Uzbekistan and beyond.

Into this near Gordian knot of history and ideology walk the naive and well-meaning geneticists. There is no great objection one can make to the genetics within the paper, but the historical framework and some of the assertions are peculiar and tendentious indeed. It’s a problem which starts within the abstract. In the heartland of Eurasia: the multilocus genetic landscape of Central Asian populations:

Located in the Eurasian heartland, Central Asia has played a major role in both the early spread of modern humans out of Africa and the more recent settlements of differentiated populations across Eurasia. A detailed knowledge of the peopling in this vast region would therefore greatly improve our understanding of range expansions, colonizations and recurrent migrations, including the impact of the historical expansion of eastern nomadic groups that occurred in Central Asia. However, despite its presumable importance, little is known about the level and the distribution of genetic variation in this region. We genotyped 26 Indo-Iranian- and Turkic-speaking populations, belonging to six different ethnic groups, at 27 autosomal microsatellite loci. The analysis of genetic variation reveals that Central Asian diversity is mainly shaped by linguistic affiliation, with Turkic-speaking populations forming a cluster more closely related to East-Asian populations and Indo-Iranian speakers forming a cluster closer to Western Eurasians. The scattered position of Uzbeks across Turkic- and Indo-Iranian-speaking populations may reflect their origins from the union of different tribes. We propose that the complex genetic landscape of Central Asian populations results from the movements of eastern, Turkic-speaking groups during historical times, into a long-lasting group of settled populations, which may be represented nowadays by Tajiks and Turkmen. Contrary to what is generally thought, our results suggest that the recurrent expansions of eastern nomadic groups did not result in the complete replacement of local populations, but rather into partial admixture.

In my initial comment on this paper in a link round-up I wondered what the authors were thinking making such a comment: anyone who knows Central Asians would see on their faces that the Turks did not completely replace the local populations. The image above is of an Uzbek man, who does not exhibit any visible “Mongolian” features. This is not the norm, but is not unheard of. Even populations which are presumed to have less Iranian admixture, such as the Kazakhs, exhibit a range of physical types. It would be one thing if this reference was an isolated peculiarity, but there are other comments within the paper which indicate to me that the research group’s familiarity with the non-genetic literature is cursory at best. They refer to Huns as having “brought the East-Asian anthropological phenotype to Central Asia.” There is no clear foundation for this assertion. Unfortunately historians do not have a clear idea what the ethno-linguistic character of the Huns was. By the time Roman observers encountered them the Hunnic horde seems to have been predominantly German, with an Iranian (Alan) secondary component, the Huns themselves being a small elite (Attila’s name itself may be Gothic). In light of subsequent eruptions into Europe of Turkic and Ugric nomads it is easy to slot the Huns into this exotic category, but the primary literature makes it clear that you can’t ascertain their ethnic character from the contemporary sources (the “White Huns” of Central and South Asia had no real connection to the Huns of Europe).

Near the end of the paper they say something really peculiar: “The Westernized view of westward invasions usually emphasizes the extreme violence and cruelty of the hordes led by Attila the Hun (AD 406–453), or that from the Mongolian empire led by Genghis Khan. However, our results somehow challenge this view and rather suggest that these more recent expansions did not lead to the massacre and complete replacement of the locally settled populations….” It is true that European observers of the Mongol expansion did not have a sanguine attitude. But the idea that Mongols were genocidal exterminationists really comes to us via the Islamic historians, for whom the Mongol conquests were totally shocking and a literal world-turned-upside-down moment. The Mongol conquests did seem to result in a decline in population between Mesopotamia and Transoxiana. Whole cities in Central Asia were depopulated. There is an assumption that the Mongol conquests mark the turning point where Central Asia passed from being a predominantly Iranian world with a Turkic military elite (which was to be the nature of Iran proper until the 20th century) to a Turkic world with a large Persian minority. Though the military conquests of the Mongols were important punctuating events, I do not believe that scholars today would assume that they produced an ethnic shift in toto. On the contrary, the null hypothesis is generally against migrationism.

With those preliminaries out of the way, what’s going on with the genetics? Below are the less interesting tables & figures. The first is important because it has the abbreviations which they use. Basically anything that starts with a “T” is an Indo-Iranian Tajik group, and everything else is Turkic, except LUzn and LUza, who are Indo-Iranian Uzbek nationals, but I presume would be ethnic Tajiks in Uzbekistan (this stuff is really confusing in regards to labels, because as I said the national categories are to some extent ad hoc impositions on more ancient identities which don’t always follow the European language = nation formula). The second image is a figure which shows the sampling locations, as well as pie charts with ancestral quanta. The third image is a table which shows that Indo-Iranians are genetically more varied than Turks. The fourth is a STRUCTURE plot which I reedited to zoom in on peoples of interest for this study, as well as removing some of the lower K’s. Remember that each K is a putative ancestral population. As Dienekes notes, since they used only 27 microsatellite markers across their 26 populations, the plot may inflate minor ancestral contributions.

Of more interest is the correspondence analysis, which is conceptually similar to principal component analysis. The variate inputs are allele counts. I’ve obviously reedited the figure a bit, and added some labels (yeah, I ended up thinking that rotating after I’d added some labels was best, sorry). Note the clear color-coding of Turkic vs. Iranian Central Asian groups.


There’s a clear separation linguistically between Iranian speaking and Turkic speaking groups in Central Asia. Some of the Turkic groups are close to Iranian groups, closer than to other Turkic groups, but still the two broad sets have a coherent identity. Undergirding the linguistic variation is classical geographic variation. The eastern Turkic groups seem the least impacted by the Iranian substrate which was dominant before the arrival of Turks, while the Turcoman group sampled from western Uzbekistan seems to have been the most genetically “Iranized.” In a world wide context the central position of Central Asians is not surprising. Interestingly the Iranian groups of Central Asia seem to overlap rather well with the Indo-Iranian groups from the HGDP data set. In contrast, the Turkic groups are distributed along a linear axis from East Asians to the Iranian cluster. This is the same pattern evident among African Americans as individuals. It’s a two-way admixture, with different dosage degrees by population as a function of history and geography (I presume you’d see the same pattern if it was broken down on individuals with a SNP-chip).
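The “linear axis” reading of a two-way admixture can be sketched in a few lines. This is only an illustration under strong assumptions: exactly two source populations, an admixed group whose allele frequencies are a simple weighted average of the two sources at every locus, and no drift or sampling noise. The frequencies and group labels below are made up for the example and are not taken from the paper; the function name is hypothetical too.

```python
def admixture_fraction(mixed, source_a, source_b):
    """Least-squares estimate of the share of source_a ancestry.

    Assumes a two-way mix where, at each locus,
    mixed = f * source_a + (1 - f) * source_b.
    Geometrically this projects the mixed group onto the line
    joining the two sources, which is the "linear axis" in the plot.
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixed, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("source populations are identical at every locus")
    return num / den


# Made-up allele frequencies at four loci for two hypothetical sources.
east = [0.90, 0.10, 0.60, 0.25]
west = [0.20, 0.70, 0.30, 0.75]


def blend(frac_east):
    """Build a hypothetical admixed group with a known eastern share."""
    return [frac_east * e + (1 - frac_east) * w for e, w in zip(east, west)]


if __name__ == "__main__":
    for label, true_share in [("group X", 0.15), ("group Y", 0.60)]:
        estimate = admixture_fraction(blend(true_share), east, west)
        print(f"{label}: estimated eastern share {estimate:.2f}")
```

Groups with different “dosages” land at different points along the same line between the two sources, which is what the Turkic samples appear to do in the correspondence analysis. With real data, drift and small marker sets would scatter the points off the line and blur the estimate.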

Moving to the explicit admixture estimates, the labels leave something to be desired. The shaded area is for Turkic speakers. The very last group, TJY, indicates the Yagnobis of Dushanbe. I happen to know offhand that the Yagnobis are reputed to be descendants of the Sogdians, having preserved their language and Zoroastrian religion relatively late in history before switching to Tajik and Islam. Like many ethno-linguistic relics these people preserved their independent identity after the Arab conquest, which saw the decline of Sogdian influence on the Silk Road, by taking refuge in isolated regions. It is no surprise then that this group shows the least East Asian admixture of all the Iranian samples, as they were isolated from many of the social and historical processes which were operative in Transoxiana after the conquest by the Arabs, and the later pushing in of the zone of Turkic hegemony after the fall of the Samanids.

These admixture estimates definitely put the spotlight on the role of Central Asia as a nexus of sorts. In the archaeology and history it is clear that Central Asia has been affected by peoples of European, South Asian, Middle Eastern, and East Asian origin. Central Asia itself has been the mother of empires, famously the seat of Timur, but also the original base of what later became the Abbasid dynasty. At one point the Caliphate was split between western and eastern factions and there was a possibility that the capital would be relocated from Baghdad to the Central Asian city of Merv! I do not believe that the Arabs had a strong genetic impact, nor was there a large South Asian migration in recent periods into Central Asia. So the admixture estimates adduced for these groups may be due to the natural cline in allele frequencies which are found in different peripheral Eurasian populations. Frequencies which are naturally intermediate in Central Asia. The main caveat is that it is probable that local conditions will vary a great deal. In contrast we have strong reason to suspect that the East Asian component arrived relatively recently with the Turks, and we see that its aspect is most evident among the groups which were nomadic within living memory, the Kazakhs and Kyrgyz. These two ethnicities, which are really compounds of several tribes or “hordes,” were only marginally integrated into sedentary Islamic society where the Tajik element would be prominent (shamanism among many of these tribes only disappeared under the influence of the Islamic missionaries sponsored by Russian Empire). I think this pattern is reinforced by what we saw in the correspondence analysis, where the Turkic groups exhibited a linear distribution toward East Asia, while the Iranian ones were placed where you’d expect them geographically. 
Finally, I want to note that Dienekes observes that using South Asians as a Central Asian population source is strange since South Asia is more appropriately thought of as a demographic sink for Turan. True, but the HGDP populations are strongly biased toward groups with relatively little indigenous South Asian ancestry, with the Sindhi being the only Indo-Aryan speakers within the set. So I think that objection is mitigated by these factors. Rather, the Iranian-speaking Pakistani groups serve as proxies for the original Central Asian Iranian substrate, from which both they and the Tajiks presumably derive.

Moving back to the Turk vs. Iranian distinction, the authors note that the Turkic groups exhibit a strong degree of genetic homogeneity on the Y chromosomal lineages. This points to the possible manner in which the East Asian genetic element spread in Central Asia, not necessarily just through population displacement, but also through polygamy and the high reproductive fitness of particular “super-male” lineages. The children of elite Turkic men who took Iranian wives presumably adopted the culture of their fathers, including the linguistic identity. This may have been particularly easy in Central Asia because they did not have to repudiate their maternal heritage in totality, as Persian culture still had great status and currency. If we partition the ancestry into “East Eurasian” and “West Eurasian” components the Turkic groups have much more of the latter than the Iranian ones have of the former. That stands to reason as the Turks were newcomers, and an elite which the locals would wish to assimilate to if they had the opportunity. In contrast, the shift from Turk to Iranian may have been rarer, and a switch which individuals would wish to avoid since the latter did not have the same level of temporal power. Over ~1,500 years gene flow does occur between the groups, and even the Yagnobis have appreciable East Asian ancestry. Eventually the linguistic differences would probably be dwarfed by the geographical ones, but currently we’re taking a snapshot of a “transient.”

It’s complicated. And one has to be very careful about using terms like “Turk” in a localized context, vs. a more international one. The Turks of Turkey are overwhelmingly derived from the same source populations as their Balkan (because of Rumelian Turks), Iranian, and Armenian neighbors. The decline in East Asian fraction is evident even in this sample, as the Turcomans from western Uzbekistan have the least eastern ancestry of any of the Turkic groups. But this paper is an excellent window into a critical geographical hinge of genetic variation and historical tumult (though one must set aside some of their tacked-on historical speculations).

Citation: Martínez-Cruz B, Vitalis R, Ségurel L, Austerlitz F, Georges M, Théry S, Quintana-Murci L, Hegay T, Aldashev A, Nasyrova F, & Heyer E (2010). In the heartland of Eurasia: the multilocus genetic landscape of Central Asian populations. European Journal of Human Genetics. PMID: 20823912

Image Credit: Wikimedia
