Razib Khan One-stop-shopping for all of my content

May 14, 2020

The deep origins of the Han Chinese

Filed under: East Asia,Historical Population Genetics — Razib Khan @ 10:25 pm

A new paper came out today on ancient East Asian DNA. More precisely, this work focused on early and late Neolithic samples from China, especially the lower Yellow River basin (north-central China) and Fujian in southeast China. A major result can be boiled down to the Admixturegraph to the right.

The first ancient DNA out of East Eurasia was that from Tianyuan cave near modern Beijing. As you can see, that individual is basal to other ancient (and modern) East Asians. That is, it isn’t representative of the ancestors of modern East Asians. But the Tianyuan individual was already closer to modern East Asians than to West Eurasians. Since the Tianyuan individual is ~40,000 years old, that means the bifurcation between eastern and western Eurasian groups predates 40,000 years ago.

This is not a surprising result, as the bifurcations between various “eastern” Eurasian groups (e.g., the ancestors of the Andamanese and East Asians) date to close to 50,000 years ago. The separation from Western Eurasians had to have happened after ~55,000 years ago since that’s about when the common shared Neanderthal admixture occurred.

The graph also shows that some ancient West Eurasian ancestry did come into the ancestors of East Asians through Siberians. More precisely, the Paleo-Siberian populations (replaced more recently by Neo-Siberian groups) had some ancestry from Ancient North Eurasians, who themselves were ~70% West Eurasian in ancestry (the other ~30% being a deeply basal East Eurasian). These Paleo-Siberians contributed ancestry to many northern East Asian groups, and likely explain the affinity between these groups and the Mal’ta-related individuals.

Finally, most of the edges show the separations between northern and southern East Asians and differences between inland and coastal populations. Though there is a deep distinction between northern and southern groups, the paper makes it clear that there is gene flow between coastal groups. This may explain affinities between the Japanese and Koreans, and peoples in southern China.

In terms of broad dynamics, one pattern that is evident, and repeats what we see all across Eurasia, is that the more recent periods seem to have undergone some level of panmixia. Ancient samples from northern and southern China are well differentiated, with pairwise Fst of around 0.04. Modern individuals sampled from these regions are closer to 0.02. Part of this is due to a significant expansion of “northern” ancestry at the expense of “southern”. But there is also some flow northward of “southern” ancestry. Though not highlighted in this paper because they lacked the samples, the movement throughout the Chinese Empire over the last 2,000 years is surely mediating this. In instances of famine or war resulting in depopulation in a province, the Chinese central authorities routinely encouraged migration from overpopulated provinces (modern Sichuan was repopulated from Hunan after a series of wars during the Ming-Qing transition). After 800 AD the demographic center of China was in the Yangzi river valley and further south.
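To make the Fst numbers above concrete, here is a minimal sketch of how a pairwise Fst can be computed from allele frequencies. It uses the textbook heterozygosity form, Fst = (H_T - H_S) / H_T, pooled across loci as a ratio of sums, and assumes two equally weighted populations. This is not the estimator used in the paper (real analyses correct for sample size and work on genotype data); the function name and the toy frequencies are made up for illustration.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Wright-style Fst for two equally weighted populations.

    freqs_a, freqs_b: allele frequencies (0..1) at the same biallelic loci.
    Returns sum(H_T - H_S) / sum(H_T) across loci (ratio of sums).
    Illustrative only: no sample-size correction is applied.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)                          # pooled heterozygosity
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-pop heterozygosity
        numerator += h_total - h_within
        denominator += h_total
    # If every locus is fixed for the same allele, there is no variation to partition.
    return numerator / denominator if denominator > 0 else 0.0


# Toy example: a frequency gap of 0.2 around 0.5 gives an Fst of roughly 0.04.
print(fst_two_pops([0.4], [0.6]))
# Shrinking the gap (as gene flow would) pulls Fst down.
print(fst_two_pops([0.43], [0.57]))
```

In this toy setup Fst scales with the squared frequency difference, so halving Fst from 0.04 to 0.02 corresponds to the frequency gap shrinking by a factor of about 1.4, not 2.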

Unsurprisingly, the authors find that the southern samples from Fujian seem most similar to Austronesians. Today no one from these regions is “pure” southern. Rather, they are a mix. The Austronesians migrated out early enough that they carry southern East Asian ancestry exclusively. This recapitulates a common phenomenon where the ancestral “homeland” of a given group changes over time, reducing the ability to infer origins (e.g., the percentage of “Middle Eastern” ancestry in Southern Europe was underestimated because Anatolian farmers were partially replaced in Anatolia by migrants from the east).

There are also details in the supplements which confirm earlier inferences. For example, the Tianyuan individual has affinities with the Goyet Aurignacian sample from Belgium which dates to 35,000 years ago. But other East Asians do not. This seems to imply that Tianyuan was much more closely connected to a population that had trans-Eurasian affinities (another possibility is ancient structure, but the bifurcation between eastern and western Eurasian populations was more than 15,000 years before the time of Goyet so I am skeptical). Additionally, they also detect possible gene flow into Mesolithic Europeans from a population with East Asian ancestry (one possibility here that doesn’t seem to be explored is shared Ancient North Eurasian ancestry into both groups).

What is the overall takeaway? I think this confirms the other early papers that East Asia exhibits more continuity with its past than Europe and South Asia, rather like West Asia. While Europeans and South Asians have substantial ancestry from profoundly intrusive groups during the Holocene, the Han Chinese are in many ways “sons of the soil.” They did to some extent marginalize and absorb many other peoples in the modern area of “China proper”, and are themselves a compound of two ancestral streams, but at the end of the last Ice Age, more than 90% of their ancestors were living within the boundaries of China proper.

More generally, modern imperial polities are exactly what some of their critics accuse them of being: panmixia machines. Pre-state people were more genetically differentiated across local spatial scales. This seems to be the case everywhere there are good transects.

March 26, 2020

Version alpha of trying to understand East Asian population history is now out!

Filed under: Ancient DNA,East Asia,Han Chinese,Historical Genetics — Razib Khan @ 12:58 am

We’ve been waiting for ancient DNA to answer some questions about Eastern Eurasia for a while. I always thought Qiaomei Fu would spearhead it, but it doesn’t seem like it worked out that way. That’s because she’s not on a new preprint, The Genomic Formation of Human Populations in East Asia, which fills in a lot of gaps and confusing aspects of what has been reported from fragments of publications before (e.g., this clarifies a lot of things with Japan, see below). Since there has already been ancient DNA work on eastern Siberia and Southeast Asia, this is really focusing on the area in and around what is today the People’s Republic of China. The first author has an affiliation with a university in Fujian.

Much of the analysis can be understood in terms of language families. In this way, it goes back to L. L. Cavalli-Sforza’s correlations between gene trees and language trees, as well as his work on the agricultural diasporas.

First, there isn’t something radically surprising here. As I suggest above, the mass of ancient DNA in the preprint and model-building just snap together a lot of what you can see in other work, some going back decades.

Let’s start with the “Onge-like/related” ancestry.

Below you see the strange pattern of Y chromosomal haplogroup D. It’s common in Tibet, Japan, and among the Andamanese.

In the preprint the authors argue that there is a deep division among East Eurasian populations, going back further than 40,000 years, between a set of populations descended from groups related to Tianyuan man, and populations with affinities to the indigenous peoples of southeast Eurasia and Australia. Modern populations in East Asia can be thought of as a mix between these two groups, in various pulses and waves. The finding that some peoples in the Amazon had “Australo-Melanesian” affinity is very strange, but note that there’s no guarantee that the geographic distribution of the two clades was so skewed in the past.

The Onge-related ancestry is apparently found as the deepest layer in the Tibetan plateau and contributes 45% of the ancestry to the Jomon of Japan. Among ancient proto-Austronesian peoples of Taiwan, it contributed 14% of the ancestry. Earlier work on Southeast Asia indicated that even before the expansion of Austro-Asiatic farmers out of southern China they mixed with a basal East Eurasian lineage related to the Onge.

Chinese annals record the presence of dark-skinned peoples in Yunnan nearly into historical periods. These could very well be legends or rumors, or, they could be the last relic populations that had not been fully absorbed into the Tianyuan-descended farmer expansion.

Moving to the more recent past, the preprint finds that among the Tianyuan-descended populations in East Asia there is a northern and a southern grouping. The northern grouping has been discussed before; it is the classic Amur-river valley population. It turns out that a sample from 5,000 years ago in northern Shaanxi, just to the north of the hearth of classical Chinese civilization in Henan, resembles these Amur-river valley populations. Though the authors don’t have samples from southern China, or even the Yangzi, they use modern samples from southern Chinese peoples, as well as ancient samples from Taiwan, to infer that it is likely that the Yangzi river valley was inhabited by a somewhat different group during prehistory.

In the preprint, the argument is made that Austronesian, Tai-Kadai, and Austro-Asiatic all emerged out of the Yangzi valley and its rice cultures. As noted above, other papers have already outlined the peopling of Southeast Asia using ancient DNA, so I will ignore that. But note that for Austro-Asiatic populations, ~1/3 of the ancestry is Onge-related. Some of this was mixed in while in southern China, but some of it probably accrued later on in Southeast Asia. Modern Austro-Asiatic populations can then be thought of as a compound of Tianyuan-related and various Onge-related groups.


The modern Han Chinese seem to be a fusion of the two idealized ancestral populations:

No great surprise. The Han have more of an affinity for northern East Asian populations than southern ones, with those in the south having more of an affinity for southerners than those in the north. A simple model might be expansion out of Shaanxi and Henan across a landscape with many southern agriculturalists. But that makes us ask: why is there “southern” ancestry among many northern Han today?

I think the explanation is that the expansion of the Han was characterized by reversals, as well as panmixia induced by political unification. Let me outline this explicitly:

– proto-Han identity is focused around Henan and Shaanxi between 2000 BC and 300 AD. As this culture expanded into the margins of the Yangzi and into Sichuan, it absorbed “southern” ancestry (as well as elements of culture).

– During the Han dynasty, 200 BC to 200 AD, the Chinese colonized portions of the far south, and aspects of panmixia occurred, as individuals moved across China north to south and vice versa.

– The fall of the Han dynasty after 200 AD saw North China come to be ruled by “barbarians”, usually of Turkic provenance. South China maintained classical Han culture and political forms without external influence. Many northern families moved south between 200 AD and 600 AD. Many barbarians “became” Han, and mixed into the population. I believe this is when the 2-4% “West Eurasian” started to become prevalent in the north. This western ancestry was mediated through Turkic groups who were predominantly Siberian or Amur-river valley in ancestry. R1a1a is found in North China, so I believe that this ancestry is from Iranian groups absorbed into the Turco-Mongol populations.

– The reemergence of an integrated China after 600 AD sees the center of gravity of the Chinese economy shift to the center and south, in particular the Yangzi river valley. Movement northward from South China, repopulating areas that had been left uninhabited, moves “southern” ancestry north. Most of the population growth in the south is endogenous, and not due to migration. There is very little to no West Eurasian ancestry in the south, which is not what one might expect if large numbers of North Chinese had moved south (the exception is probably the Hakka, who are known to be Northerners).

– There are still ethnic minorities in the South. Over the past 1,000 years, they have slowly been Sinicized and assimilated in many areas, so the proportion of “southern” ancestry in places like Guangdong has increased in part through such processes.


The Japanese are not entirely surprising. Using a two-way model with Han or Korean vs. Jomon, the Japanese are about 85% the former and 15% the latter. The proportion is a bit higher for Korean. The reason is straightforward: the Yayoi rice farmers probably derived from the Korean peninsula. Even into the edge of history Japan and the Baekje kingdom of Korea had close relations.
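A two-way model like this boils down to asking which mixing proportion makes a blend of the two sources look most like the target. Here is a toy sketch of that idea using least squares on allele frequencies. To be clear, this is a simplification I am assuming for illustration: the preprint's actual fitting procedure is not described here, the function name is hypothetical, and the frequencies are invented.

```python
def two_way_proportion(target, source_a, source_b):
    """Least-squares estimate of alpha in: target ~ alpha * A + (1 - alpha) * B.

    All arguments are allele frequencies at the same loci.
    The result is clipped to [0, 1]. Toy model: it ignores drift,
    sampling noise, and the outgroup-based statistics real tools rely on.
    """
    numerator = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    denominator = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if denominator == 0:
        raise ValueError("sources are identical; the proportion is undefined")
    return min(1.0, max(0.0, numerator / denominator))


# Invented frequencies for a "mainland" source and a "Jomon" source.
mainland = [0.10, 0.50, 0.90, 0.30]
jomon = [0.70, 0.20, 0.30, 0.60]
# Build a synthetic target that is exactly 85% mainland, 15% Jomon.
japanese = [0.85 * m + 0.15 * j for m, j in zip(mainland, jomon)]
print(two_way_proportion(japanese, mainland, jomon))
```

On this synthetic input the estimate recovers the 85% that was built into the target. Swapping in a source that is itself closer to the target (as Korean would be relative to Han) shifts the estimated proportion.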

The interesting thing about Japan is this is an area where agriculturalists nearly overwhelmed the indigenous population, albeit absorbing them. The Jomon culture is unique because it was a sedentary hunter-gatherer society that also used pottery extensively. Previous analysis of Jomon remains produced “strange” results. In this preprint the authors give a good explanation of why: the Jomon are an even mixture of a population descended from the Onge-related clade and another one that is closer to the Amur river valley Northeast Eurasian populations, who descend from Tianyuan.
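As a back-of-the-envelope check, the figures quoted in this post can be chained: if modern Japanese are ~15% Jomon, and the Jomon were ~45% Onge-related, then the Onge-related share arriving via the Jomon is the product of the two. The sketch below just does that multiplication. It assumes the mainland source contributes no Onge-related ancestry of its own, which the post does not state, so treat the result as a rough lower-end illustration and not a figure from the preprint.

```python
def chained_fraction(*proportions):
    """Multiply nested admixture proportions along one ancestry path."""
    result = 1.0
    for p in proportions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each proportion must be between 0 and 1")
        result *= p
    return result


# Jomon share of modern Japanese (~15%) times Onge-related share of Jomon (~45%).
onge_related_via_jomon = chained_fraction(0.15, 0.45)
print(f"{onge_related_via_jomon:.2%}")
```

That works out to a bit under 7% on these assumptions.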

Basically the Ainu are a fusion of a Siberian group and a population that has affinities with those indigenous to Southeast Asia before the arrival of agriculturalists. Before genetics, archaeologists and anthropologists argued about the Ainu affinities. Despite the Ainu sometimes looking “European,” early blood group analysis quickly established an eastern affinity, but morphology and culture suggested connections to Siberia or Australia. The Australian Aboriginals descend from one of the Onge-related groups to a great extent, so the affinities are now intelligible.


Tibetans seem to be mixed between a small proportion of Onge-related, a larger proportion of an East Asian population descended from Tianyuan and closer to the Amur river valley groups than “southern” rice farmers, and finally a population similar to the Han. The latter mixed with the fusion of the first two ~3-4,000 years ago. This makes intelligible the “Sino-Tibetan” language family, whose validity I’m not clear on. But the linguistic affinity might date to this period.


This is the portion that is somewhat “controversial.” In Mongolia, they find that there was the arrival of an early western group, the post-Yamnaya Afanasievo, about 5,000 years ago. They flourished in and around the Altai. They are genetically almost identical to the Yamnaya. Then, at some point in the Bronze Age, this group was totally replaced by another much more like the Sintashta-Andronovo. These groups were similar to the Yamnaya, but ~30% of their ancestry is like “European-farmers.” The conjecture you can make here is that there was reflux from Europe that came back onto the steppe. These were almost certainly Iranian. This second wave clearly contributed much of the western ancestry into Mongols, judging by the high fraction of R1a1a-Z93 in the Altai.

But the more intriguing aspect is that south and east in Xinjiang, overlapping the zone occupied by the Indo-European Tocharians, the populations remained similar to the Afanasievo, albeit mixing with East Eurasian groups over time. The implication then is that the authors have “pegged” a separation date for the Tocharian Indo-European branch from the others, about 5,000 years ago. Aside from Anatolian (e.g., Hittite), Tocharian is often seen to be the most basal.

Later Xinjiang also saw the arrival of Iranians. The western and southern oases of Xinjiang were Iranian, while the northern and eastern ones were Tocharian.

Genetic admixture:

They find that genetic distance between populations in East Asia declines over time. This is analogous to what happened in Western Eurasia.

This might be a generalized process, but I think there’s a specific thing driving this: the rise of the Chinese state-polity. Not only did the Han expand and absorb, but there was gene flow to neighboring groups. It is well known that Han Chinese have been moving into Vietnam, and assimilating, for 2,000 years. Similarly, many Han in the north have been known to “go barbarian.”

August 10, 2019

The Neolithic roots of modern East Asian human geography

Filed under: East Asia,Historical Population Genetics — Razib Khan @ 2:32 pm

Because of the long and thorough tradition of Chinese historiography, we have a good and deep chronological record of East Asia going back two to three thousand years. Chinese records also help illuminate and clarify aspects of Japanese, Korean, and Southeast Asian history. For example, what we know about the Indianized kingdom of Funan in eastern mainland Southeast Asia comes from Chinese textual sources.

But history can take us only so far. We know this for Western Eurasia, where ancient DNA has revolutionized our understanding of Holocene transformations. Unfortunately, we don’t have that much ancient DNA from East Asia. So we still have to make recourse mostly to modern data. A new preprint, Inland-coastal bifurcation of southern East Asians revealed by Hmong-Mien genomic history, proposes to use a lot of modern (and some ancient) data to answer a very specific question. The basic results are totally unsurprising:

Consistent with the two distinct routes of agricultural expansion from southern China, this Hmong-Mien founding ancestry is phylogenetically closer to the founding ancestry of Neolithic Mainland Southeast Asians and present-day isolated Austroasiatic-speaking populations than Austronesians. The spatial and temporal distribution of the southern East Asian lineage is also compatible with the scenario of out-of-southern-China farming dispersal. Thus, our finding reveals an inland-coastal genetic discrepancy related to the farming pioneers in southern China and supports an inland southern China origin of an ancestral meta-population contributing to both Hmong-Mien and Austroasiatic speakers.

More interesting to me is the admixture graph to the right. It uses a bunch of ancient and modern populations to model ancient and modern populations. You can see some general patterns and suggestions of what might come out of ancient DNA.

For example, the green component is defined by the Hoabinhian samples. These are the people who are distantly related to the Andaman Islanders, and occupied Southeast Asia before the arrival of rice farmers. They are distantly related to “Ancient Ancestral South Indians” (AASI) as well. It is unsurprising that this component is well represented in a Munda tribe (Kharia) from northeast India, or in Austro-Asiatic people of Southeast Asia. But notice that it is well represented in the Jomon of Japan, and modern Tibetans.

If you read the preprint, the authors clearly don’t think that this is Hoabinhian ancestry as such. Rather, the model is looking for something very basal (distant) from other East Eurasians, and Hoabinhians fit that (and are somewhat closer to this basal group). This is probably the same phenomenon as the “Australo-Melanesian” ancestry in the Amazon. Curiously, Y haplogroup D is found in Tibet, Japan, and among the Andaman Islanders.

The largest group in East Asia, the Han Chinese, can be modeled as an admixture of the ancient Northeast Asian Devil’s Gate Cave people and modern Ami Taiwanese aboriginals (Austronesians). This is basically a north-south cline. Obviously, one doesn’t need to posit that the modern Han are truly a mix of these two groups, but rather that Han identity emerged out of a synthesis of various Neolithic groups with differential affinities to these two groups.

Two ancient samples give a good picture of how these groups are related to West Eurasians. The Afanasievo was almost exactly like the Yamnaya. The Namazga sample comes from ancient prehistoric Khorasan, on the border of modern Iran and Turkmenistan. These two samples do have some affinities with each other. Both have ancestry that is related to or derived from “Ancient North Eurasians” (ANE) and “Caucasus Hunter-Gatherers” (CHG), with the Yamnaya having more ANE and Namazga more CHG. But the Yamnaya also had affinities with “Western Hunter-Gatherers” (WHG) that Namazga lacked. You see that the Kharia has affinities to Namazga, but not Afanasievo. This is not surprising: the Munda tribes of Northeast India seem almost untouched by Indo-Aryan influence (they are entirely lacking in R1a1a, which is found in South Indian tribals). Rather, they mixed with Indian populations which were impacted by migrations of farmers from West Asia.

The proportions of Afanasievo and Namazga are illustrative of particular historical dynamics. Mongols and Xiongnu (ancient) had some connection to the Afanasievo. This is almost certainly Indo-European (probably East Iranian) contact. In contrast, the Hui, Chinese Muslims who are mostly no different from Han aside from religion, have contributions from both Afanasievo and Namazga. This is a strong indication that Hui do have more recent Central Asian (Muslim) ancestry, while Mongolians do not. The increase in Namazga ancestry across Central Asia is probably a function of the rise of Persian and Islamic polities, and the movement north of agriculturalists. The shift to Turkic dominated polities integrated Turan with the rest of the Islamic steppe, which happens to exclude the Mongolians.

It is also interesting that the Thai have more Namazga than Khmer. This is strongly suggestive of a large contribution of Indian ancestry to the Dvaravati culture (the enrichment for Devil’s Gate Cave in the Khmer is probably due to the reality that a few of the HGDP samples seem to be mixed with Chinese), though it could be more recent admixture from India. Note however that the Mon people of Burma seem to have more Indian ancestry, and were often associated with Dvaravati.

Finally, the authors point out that the red southern Northeast Asian component is now common in peoples like the Koreans and Japanese. This is a clear indication of the spread of farming from southern people, as well as the likely later demographic impact of the expansion of the Chinese state and its spillover impact on Korea.
