October 7, 2017

The Tibeto-Burman and Austro-Asiatic ancestry of Bengalis

My father’s mtDNA lineage phylogeography

When I first got my father’s 23andMe results the Y and mtDNA were an interesting contrast. He, and therefore myself, carried Y lineage R1a1a, the lord of the paternal lineages. That was not that great a surprise. In the 1000 Genomes results for the Bangladeshi sample 20% of the men were direct paternal descendants of the R1a1a progenitor.

The mtDNA was a surprise. It was G1a2. This was curious to me since Bangladesh has some of the highest frequencies in the world of haplogroup M, the subhaplogroups in question being mostly restricted to South Asia. I wasn’t surprised that I was R1a1a, but I was even more confident that my maternal lineage was going to be an M, as would my father’s (my own mtDNA is U2b, not common, but not so surprising). As you can see from the map 23andMe places my father’s maternal lineage somewhere in Northeast Asia. The only information I could get about the geography was for G1a: “G1a has been found in samples from China (Daur, Hui, Kazakh, Korean, Manchu, and a sample of the general population of the city of Shenyang), Japan, Korea, Vietnam, and Siberia (Yakut).”

The biggest sample of mtDNA results from Bangladesh I could find, at N = 240, does not find any G at all, let alone G1a2. So this is clearly a rare haplogroup in the region. But the authors do classify 13% of the Bangladeshis as carrying an “East Eurasian” haplogroup. Haplogroup A is found among Southeast Asians and in Southern China, though not among Austronesians. Haplogroup F seems to have a similar distribution, as do D and B. The other haplogroups also seem “correctly” assigned in terms of modal distribution. They are all mostly East Asian.

Looking at the Y chromosome haplogroups in the 1000 Genomes there are two each of O2 and O3, and one of C3, which are clearly of Southeast Asian origin. With N = 5 out of 44 samples that is ~11%. O2 is interesting because it is found at very high frequencies among the Austro-Asiatic populations in South Asia, whether it be the Khasi, or Munda groups (generally O2a). O3 seems associated with Tibeto-Burman populations, and C3 with East Asia more generally.

If you know much about the ethnolinguistics of South Asia you know that the two major language families are Indo-Aryan and Dravidian. But there are other groups. In the northwest you have various other Indo-European speaking populations, and along the northern and northeast fringe, you have Tibeto-Burman languages being spoken. But most anomalous is the distribution of Austro-Asiatic languages. The most numerous Austro-Asiatic language in the world today is Vietnamese, followed by the language of the Khmers.

But there are numerous other Austro-Asiatic languages in Southeast and South Asia. The indigenous people of the deep forests of the Malay peninsula, including the Negritos, speak Austro-Asiatic languages. As one moves west there are Austro-Asiatic languages in Burma, such as Mon, which used to be far more common. And in India there are two groups: the language of the Khasi of the northeast, which seems to share some affinity with the Palaungic dialects of interior Burma and southern China, and the Munda languages farthest west, which seem very distinct from all the other branches.

The genetics seems to suggest that the Munda tribes do have East Asian ancestry, but it is almost totally male-mediated. Their Y chromosomal lineages are very distinctive, with high proportions of O2a, but their mtDNA lineages are overwhelmingly South Asian macro-haplogroup M. The Khasi of the hills north of Bangladesh occupy a different position, with both maternal and paternal East Asian heritage, as well as much higher genome-wide ancestry that is not South Asian. At this point, I am convinced that the Austro-Asiatic language groups came into South Asia from the east to the west.

The other language family with East Asian connections in South Asia is that of the Tibeto-Burmans. Unlike the Austro-Asiatic group, these peoples tend to occupy only the periphery of South Asia, the far north and east.

Finally, there are historically attested Tai peoples who migrated into South Asia. The most famous of these are the Ahoms of Assam. These were part of the same migrations ~1,000 years ago that led to the shift of Thailand from being a zone dominated by Mon and Khmer Austro-Asiatic peoples, to Tai peoples. In Burma, the Tai migrations resulted in the Shan states of the uplands, though the Burman and Mon polities were able to fight off the attempts at take over.

Ultimately the Ahom became totally Indianized. Their traditional language became relegated to ritual, and they adopted the Indo-Aryan Assamese language. Additionally, at some point, they converted to orthodox Hinduism. This became so much a part of their identity that by the 17th century they were checking Islamic expansion to the east by defeating the Mughals.

All of this ultimately goes back to the question: how did my father get his mtDNA? If you read my post from a few years back, How did Bengalis get East Asian?, you will know that it is probably a mix of Austro-Asiatic and Tibeto-Burman ancestry. Can we say any more at this stage?

Some Austronesian data sets have come online. So I thought I’d give it another shot. Additionally, I spent several hours removing outliers and combining populations to generate a full data set. The number of markers was 195,000 SNPs.

Label           N    Notes
AA              17   Munda (outliers removed)
BD              74   Bangladesh, 1K BEB (outliers removed)
Borneo          31   Orang Asli tribes (outliers removed)
Burmese         20   Bamar ethnicity
Cambodians      39   Outliers removed
Dai             40
Han_C           47   Pooled Han from HGDP and 1K
Han_N           28   Pooled Han from HGDP and 1K
Han_S           29   Pooled Han from HGDP and 1K
Japanese        28
Malay           21
Miao            10
Phil            16   Luzon and Visaya
Phil_Highland   15   Igorot tribesmen, Luzon (outliers removed)
Telugu          34   1K STU (outliers removed)
Viet            18

I ran ADMIXTURE at K = 4 on the full data set. Please click on the image if you want details, but the results are straightforward:

yellow = South Asian (modal in Telugu)

green = Northeast Asian (modal in Japan and northern Han)

navy = Southeast Asian/Austro-Asiatic (modal in Cambodians)

red = Austronesian (modal in Igorot tribesmen from the highlands of the Philippines)

The two bottom population groups are Bangladeshis and Munda. You can see that both are mostly yellow. That is, they’re mostly South Asian. But the Munda have a much lower South Asian proportion than the Bangladeshis. This is not surprising. The Munda language and mythology are very distinct from those of other South Asians. Clearly, they have ancient East Asian connections, and this shows in their genome-wide ancestry.

But notice a difference between Bangladeshis and Munda: most of the Bangladeshis have a green component, which is common among Northeast Asians, while none of the Munda do. The total fractions are 38% navy (Austro-Asiatic) for the Munda, and 7% each for navy and green (Northeast Asian) for the Bangladeshis.

The two components also exhibit a negative correlation in the Bangladeshis of -0.47. Why? My own suspicion is that there is some population structure and clinal variation within Bangladesh. As I’ve noted before, my parents are among the most East Asian of Bangladeshis I’ve ever analyzed…and it is no surprise that we are from the east of eastern Bengal. In contrast, when I’ve looked at genotypes from West Bengalis, they tend to have less East Asian ancestry, though still an appreciable amount in a broader South Asian context (in fact, even Bengali Brahmins have East Asian ancestry, though at smaller fractions).

This seems to be a pretty clear rejection of the model where Bangladeshis are a two-population mix of Munda tribesmen and a more conventional South Asian group.
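
A correlation like this is just a Pearson coefficient computed across individuals on two columns of ADMIXTURE output. Here is a minimal sketch, assuming made-up per-individual fractions; the `navy` and `green` lists are illustrative placeholders, not the actual Bangladeshi values, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-individual ancestry fractions (NOT real data):
# individuals with more "navy" tend to have less "green".
navy = [0.10, 0.08, 0.07, 0.05, 0.04]
green = [0.04, 0.06, 0.07, 0.08, 0.10]

r = pearson(navy, green)
print(f"correlation between components: {r:.2f}")
```

A negative value only says the two components trade off against each other across individuals. It does not by itself say whether geography, as I suspect, is what drives the trade-off.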

Here are the average percentages by population:

Group           Austro-Asiatic   Austronesian   South Asian   Northeast Asian
AA              38%              0%             62%           0%
BD              7%               2%             84%           7%
Borneo          61%              38%            0%            0%
Burmese         29%              0%             23%           48%
Cambodians      73%              1%             15%           11%
Dai             49%              7%             0%            44%
Han_C           16%              5%             0%            79%
Han_N           1%               1%             2%            96%
Han_S           27%              7%             0%            66%
Japanese        0%               1%             2%            97%
Malay           64%              16%            13%           7%
Miao            24%              3%             0%            73%
Phil            34%              37%            6%            22%
Phil_Highland   0%               100%           0%            0%
Telugu          0%               3%             96%           0%
Viet            45%              7%             0%            48%
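
Population averages of this kind come from grouping the per-individual ancestry fractions by population label and taking column means. A minimal sketch, assuming one row of K = 4 fractions per individual; the rows below are invented for illustration, the column order is an assumption (ADMIXTURE does not name its components), and `mean_by_population` is a hypothetical helper.

```python
from collections import defaultdict


def mean_by_population(labels, q_rows):
    """Average each ancestry column within each population label."""
    sums = {}
    counts = defaultdict(int)
    for label, row in zip(labels, q_rows):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Hypothetical individuals (NOT real data). Assumed column order:
# Austro-Asiatic, Austronesian, South Asian, Northeast Asian.
labels = ["BD", "BD", "AA", "AA"]
q_rows = [
    [0.08, 0.02, 0.84, 0.06],
    [0.06, 0.02, 0.84, 0.08],
    [0.40, 0.00, 0.60, 0.00],
    [0.36, 0.00, 0.64, 0.00],
]

averages = mean_by_population(labels, q_rows)
for label, fractions in averages.items():
    print(label, " ".join(f"{f:.0%}" for f in fractions))
```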

I’m 99% sure that “South Asian” is in some of these cases a proxy for anything that’s not East Asian. But the Malay and Cambodian results are probably South Asian. And the Burmese certainly are.

Click to enlarge the PCA plot to the left, but PC1 is South Asian to East Asian, and PC2 is Northeast Asian to Southeast Asian.

Both the Malays and the Burmese exhibit a “South Asia cline.” This is due to admixture. But the Burmese project toward the position of the central Han, while the Malays are shifted toward a Southeastern Asian population.

Both the Bangladeshis and Munda samples are East Asia shifted, but the Munda sample clearly skews toward the Southeast Asian populations. The Bangladeshi samples do not seem to exhibit this clear pattern.

Then I ran Treemix with blocks of 1000 SNPs, no migration edges, global rearrangements turned on, and the tree rooted with the Telugu.
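
A run with those settings might be launched roughly as sketched below. This is not the exact command used: the file names are placeholders, `treemix_args` is a hypothetical helper, and the flags (`-i`, `-o`, `-k`, `-root`, `-global`, `-m`) are as I understand them from the TreeMix manual, so check them against your installed version.

```python
def treemix_args(input_path, output_stem, root, block_size=1000,
                 migration_edges=0, global_rearrangements=True):
    """Build a TreeMix argument list (flags assumed from the TreeMix manual)."""
    args = ["treemix", "-i", input_path, "-o", output_stem,
            "-k", str(block_size), "-root", root]
    if migration_edges > 0:
        # -m sets the number of migration edges to fit
        args += ["-m", str(migration_edges)]
    if global_rearrangements:
        # -global asks for a round of global rearrangements of the tree
        args.append("-global")
    return args


# Placeholder file names; no migration edges, rooted with the Telugu.
cmd = treemix_args("freqs.treemix.gz", "out_m0", root="Telugu")
print(" ".join(cmd))
```

Passing `migration_edges=3` to the same helper gives the variant with three migration edges. The list can be handed to `subprocess.run` once TreeMix is on the path.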


The results are absolutely unsurprising. Unfortunately adding migration edges doesn’t really add much value with so many populations, as there is a great deal of complex population history in Southeast Asia.

Removing many of the populations and setting the migration edges to 3, you get:


The Austro-Asiatic connection between Cambodians and Munda is always clear no matter what you do. The Bangladeshis tend to have more complex relationships, but often the edges are toward the Burmese, who are a compound between South Asian, Austro-Asiatic, and Northeast Asian.

At this point I ran a “three population test.” Basically, you take an outgroup, and compare it to a clade of two other populations, and see how good the fit of the data to the model is. If there is “complex population history” you’ll get a negative f3 statistic. In practice, a significantly negative f3 means that the “outgroup” is itself admixed, with ancestry from populations related to both of the other two.
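
To make the statistic concrete, here is a minimal sketch of a naive f3 estimate from allele frequencies: the average over SNPs of (c - a)(c - b), where c is the frequency in the population being tested and a and b are the frequencies in the other two. The frequencies below are toy values, `naive_f3` is a hypothetical name, and real tools add a finite-sample correction and estimate the standard error by block jackknife, both of which are left out here.

```python
def naive_f3(target, source1, source2):
    """Mean of (c - a) * (c - b) over SNPs; c is the target's allele frequency."""
    n = len(target)
    if n == 0 or len(source1) != n or len(source2) != n:
        raise ValueError("need three non-empty, equal-length frequency lists")
    total = sum((c - a) * (c - b)
                for c, a, b in zip(target, source1, source2))
    return total / n


# Toy allele frequencies (NOT real data) for two source populations.
src1 = [0.1, 0.9, 0.2, 0.8]
src2 = [0.9, 0.1, 0.7, 0.3]
# A population that is an even mix of the two sources.
mixed = [(a + b) / 2 for a, b in zip(src1, src2)]

f3_mixed = naive_f3(mixed, src1, src2)    # admixed target: negative
f3_unmixed = naive_f3(src1, mixed, src2)  # unadmixed target: positive
print(f3_mixed, f3_unmixed)
```

For an even mix the product at each SNP is -(a - b)²/4, which is why the admixed population comes out negative while a source population tested the same way does not.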

Below are results where the Bangladeshis are the outgroup, and f3 statistics are negative (sorted most negative to least).

Outgroup   Pop1     Pop2            f3             f3-error      Z-score
BD         Telugu   Miao            -0.00240554    6.21107e-05   -38.7298
BD         Telugu   Han_S           -0.00238905    5.49332e-05   -43.4901
BD         Telugu   Dai             -0.00238103    5.73977e-05   -41.4831
BD         Telugu   Han_C           -0.00237904    5.74148e-05   -41.4359
BD         Telugu   Viet            -0.0023151     5.63663e-05   -41.0725
BD         Telugu   Han_N           -0.00229979    5.55838e-05   -41.3752
BD         Telugu   Japanese        -0.00225745    5.65642e-05   -39.9095
BD         Telugu   Phil_Highland   -0.00225153    6.87595e-05   -32.745
BD         Telugu   Borneo          -0.00219619    5.91978e-05   -37.0992
BD         Telugu   Phil            -0.00209752    5.97396e-05   -35.1111
BD         Telugu   Cambodians      -0.00198719    4.88719e-05   -40.6613
BD         Telugu   Malay           -0.00195706    5.32466e-05   -36.7547
BD         Telugu   Burmese         -0.00183415    4.79121e-05   -38.2816
BD         AA       Telugu          -0.000744786   4.17995e-05   -17.818
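
The Z-score column is simply the f3 estimate divided by its standard error, which is easy to check against a few rows of the table. A small sketch using four of the rows above; `z_score` is a hypothetical helper name.

```python
def z_score(f3, std_err):
    """Z-score of an f3 estimate: the estimate divided by its standard error."""
    return f3 / std_err


# (Pop1, Pop2, f3, f3-error, reported Z) copied from the table; target is BD.
rows = [
    ("Telugu", "Miao", -0.00240554, 6.21107e-05, -38.7298),
    ("Telugu", "Han_S", -0.00238905, 5.49332e-05, -43.4901),
    ("Telugu", "Burmese", -0.00183415, 4.79121e-05, -38.2816),
    ("AA", "Telugu", -0.000744786, 4.17995e-05, -17.818),
]

recomputed = [(p1, p2, z_score(f3, err), z) for p1, p2, f3, err, z in rows]
for p1, p2, z_new, z_reported in recomputed:
    print(f"BD; {p1}, {p2}: recomputed Z = {z_new:.2f}, reported Z = {z_reported:.2f}")
```

Because the standard errors differ from row to row, the ordering by f3 is not the same as the ordering by Z-score, which is why the Han_S row has the largest Z even though the Miao row has the most negative f3.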


The model where Bangladeshis are a combination of Austro-Asiatic populations and conventional South Asians is not crazy. But observe that there is a jump in the f3 statistics between that row and the previous row. Bangladeshis almost certainly have non-Austro-Asiatic ancestry, which is why the scores are more extreme for cases such as (Bangladesh(Telugu, Vietnamese)).

What I’ve established then are:

  • Bangladeshi East Asian ancestry is not sufficiently explained by Munda ancestry.
  • A minority of Bangladeshi Y and mtDNA lineages have East Asian connections, and this cannot be explained exclusively by Munda ancestry.
  • Some of these Y and mtDNA lineages seem to be of Tibeto-Burman affinity.
  • Admixture analysis genome-wide indicates ancestry from non-Munda populations of East Asian origin.
  • The fraction of Austro-Asiatic ancestry is balanced with more “northern” elements, while in Burma the northern element is a greater proportion than in Bangladesh.
  • There is a moderate negative correlation between Austro-Asiatic ancestry and Northeast Asian ancestry in the Bangladeshi sample.
  • Bangladeshis seem to have moderate signatures of gene flow from a wide range of East Asian populations.
  • In contrast, the Mundas seem to have a connection most strongly with Cambodians.

A paper from several years ago looking at the patterns of genetic ancestry in the Bangladeshi population found that a single pulse of admixture around 500 AD from an East Asian population was a good fit for the origins of the variation they saw. A two-pulse model with more ancient and more recent admixture events did not improve the fit.

I assume that there is a true signal there. But the model may still be too parsimonious.

My own predictions are as follows:

  • There will be an east-west cline of Tibeto-Burman ancestry.
  • There will be a more constant fraction of Austro-Asiatic ancestry.
  • The ratio of Austro-Asiatic ancestry will be reversed from the Tibeto-Burman cline.
  • Two admixture events will eventually be detected. A strong sex-balanced pulse at 500 AD and later. And an older continuous event that will be more male skewed, as it will involve absorption of Munda substrate.
  • The Padma river will turn out to be a major differentiator, with much more Tibeto-Burman ancestry to the east (Bengali dialects from east of the Padma show more Tibeto-Burman influence).

Note: a separate issue that I did not want to explore is that the South Asian ancestry of the Munda seems to show almost no Indo-Aryan influence. The Bengali population does have a small, but consistent, “Indo-Aryan” signature that you cannot find in the Telugu sample. Naturally this will bias the statistics a touch.

November 29, 2012

Burma’s “Muslims” are kalar Bengali

The American media often confuses the subtleties of international ethnography. For example, there is a tendency to use the terms “Uyghur” and “Chinese Muslim” interchangeably. This is misleading. The largest Muslim ethnic group in China is the Hui, who are rather culturally similar to the Han, except in the many areas where the Islamic religion results in their deviation from Han practice (e.g., they do not eat pork). Though Uyghur religious feelings are real, and their resentment at the government of China does derive from religious persecution, it is also an expression of nationalistic alienation. Uyghurs are ethnic Turks. In short, the Uyghurs are Muslims in the People’s Republic of China (the governmental entity which is the heir to the extra-Chinese territories of the Manchu dynasty: Xinjiang, Manchuria, and Tibet). The Hui are Muslims of China.

“Burmese Muslims”

A similar nuance is surely important when considering the situation of “Burmese Muslims.” In the article itself the author is peculiarly cryptic about who these people are aside from their religious identity, and their putative foreign origins. These people are Rohingyas. They are the Muslim inhabitants of Arakan state, which extends southeast of Bangladesh. And importantly Rohingyas are descended from and closely related to ethnic Bengalis. Their language is a sister to Sylheti, standard Bengali, and Chittagonian, with a particular affinity to the latter. Additionally, there are other Muslims in Burma who are not Rohingya! Some of these are ethnic Burmans, also called Bamars, who are the majority community within Burma/Myanmar. Aung San Suu Kyi herself reportedly has some Muslim ancestry from the civil servants and soldiers who were to be found around the courts of the kings of old.

There are two issues which need to be highlighted. First, it seems reasonable that the Rakhine people of Arakan worry that the Islamic demographic wave will inundate them. Though Bangladesh now has the same fertility as Burma, until recently Muslim demographic expansion has been a fact on the eastern marchlands of South Asia. The ratio of Rakhine to Rohingya seems to be on the order of 3 or 4 to 1, which is a majority, but not a comfortable one. But there is a clear racial element to the animus here, which would likely not be present if the Muslims were of one of the Sino-Tibetan or Mon people. Following attack, Muslims demonstrate in Rangoon:

“We should either kill all the Kalars in Burma or banish them, otherwise Buddhism will cease to exist,” said another user.

“Kalar” is used to describe perceived outsiders within the country, especially individuals with dark skin, but the term often carries a pejorative tone. In the Burmese edition of the New Light of Myanmar today, the victims of the sectarian attack were referred to as “Kalar” instead of Muslims.

Second, the Rohingya themselves deny strenuously their association with Bengal and Bengalis, because that would give credence to the Rakhine accusation that they are recent migrants into Arakan. As it happens I think in the main the Rakhine are probably right. Though some of the Rohingya date to the long-standing Muslim minority of Arakan which likely dates to the vassalage of the region to the Sultanate of Bengal in the late medieval period, most of the Rohingya probably are the descendants of peasants from Bengal, who were part of the great global migration which brought Tamils to Malaysia further south.

But, when the ancestors of most of the Rohingya were leaving Bengal a self-consciously Muslim and Bengali identity was inchoate at best. Elite culture in Bengal by the late Mughal period was the purview of Urdu speaking elites, and elite Bengali culture arose in the early 19th century with the Hindu bhadralok. The Rohingya detachment from a Bengali identity is to a great extent natural, insofar as their peasant ancestors were never part of the consciousness raising and nation-creation project of the 19th and early 20th centuries, whereby an elite nationalistic and Muslim Bengali identity emerged.


June 16, 2011

Bengali Muslims are new (?)

A quick follow up to Zack’s post on Rohingya. On the demographics, if you believe the claims of Muslims and Christians in Burma, they are the majority of the population, not the Theravada Buddhists. This means ethnic Burmans are a minority, as are the combination of Burmans, Mons, and Shans, three ethnic groups that are overwhelmingly Buddhist (the majority of Karens are also Buddhist, but these Buddhist Karens tend to assimilate to Burman identity, while the large and politically mobilized Christian Karen minority remains distinct). I wouldn’t put too much stock in the demographic exaggerations, though because of Burma’s lack of a good census it seems plausible that there’s an undercount of minority groups. Until democracy comes, the government and minority activists can keep making up whatever numbers they want.

More interestingly, the Rohingya have an ambiguous ethnic identity. As a matter of fact they are clearly derived from the southeastern Bengali people. Their language has affinities to the dialect of Chittagong. And they have the standard look of South Asians (ergo, the Burmese accuse them of being ugly black trolls!), with the tinge of Southeast Asian which is very common amongst eastern Bengalis. But from the reading, and some interaction with a few Bangladeshi Rohingyas I’ve met personally (these are the descendants of recent refugees), they have an ambivalent attitude toward identification with the Bengali nation. Some of this is political, as the Rakhine of Arakan amongst whom they reside accuse them of being arriviste interlopers. This has some truth; the Rohingya demographic heft probably is a function of the last few centuries. But then, so is the white American demographic heft! I tend to think that if a people have a rootedness of centuries in a locale they are local…but then I’m American, so I would think that!

But some of the ambivalence is I think a function of the reality that the Rohingya were not part of the creation of the Bengali Muslim identity in the 19th and 20th centuries. This is clear when you notice that they don’t utilize the Bengali script! The Rohingya are folk Bengalis. There are many of these in Bangladesh and West Bengal. They speak a Bengali dialect, but are not participants in high Bengali culture, and wouldn’t know literary Bengali because they’re not literate. But there’s a vertical integration between the peasantry and an elite culture which is nationally self-conscious. In West Bengal this is led by the intelligentsia of Calcutta. But in Bangladesh it’s focused on Dhaka.

To do a quick summary from the history that I’ve read, there’s a two act aspect to the self-consciousness of Bengali Muslims. The first act preceded the Mughals, when Afghans and other Islamic groups patronized literary Bengali as a counterweight to the Sanskrit favored by local Hindu elites (though these groups also patronized Persian naturally). With the rise of Mughal power though the Muslim elite of Bengal shifted toward an Urdu orientation. A large proportion of the Muslim peasantry were Bengali speaking in dialect, but in the 19th century they didn’t have a natural leadership class which identified with them in both religion and language. The Bengal Renaissance was a Hindu affair, because the elite Muslims of Bengal were participants in the high culture of Urdu speaking North Indian Islam.

Economic and social development in the 19th and especially 20th century led to the reemergence of a Bengali Muslim elite. This class did not assimilate to Urdu literary norms, and though it gave due deference to the cultural attainments of Hindu Bengalis, it also asserted its own religious distinctiveness, as is made clear by the strength of the Muslim League in eastern Bengal. Middle class Muslim Bengalis who came to maturity in the time before Pakistan resented the Hindu elite of Calcutta a great deal because of its cultural and political hegemony. They felt their religious difference keenly, not their ethnic one. I know this personally because my grandfather, who was often the only Muslim doctor in a given town where he practiced, expressed this attitude (he began practicing medicine in the 1920s). This is in contrast to my parents’ generation, who were more resentful of the racism and discrimination which they experienced from Biharis and West Pakistanis, and had a somewhat rose-tinted view of the beauty and elegance of Hindu Bengali culture in Calcutta. They felt their ethnic difference more keenly, and have no social discomfort around Bengali Hindus, because they have never had the memory of Bengali Hindu hegemony.

Shifting back toward the Rohingyas: their ambivalence to Bengali identity is due to the fact that they “missed out” on these centuries of interplay between Muslim and Bengali self-identification, at least at the elite level. The Rohingya nationalists don’t want to make aliyah “back” to Bengal. They don’t consider themselves from Bengal, they’re from Arakan, they’re from Burma. Their identity is as nationals of Burma, if not ethnic Burmans. Like many South Asian Muslims they are wont to construct a false identity of descent from Arabs, but at least they often used the Arabic script, unlike Bengali Muslims in Bangladesh and India! The Rohingya are assertive in their Islam, and they certainly wouldn’t part with that. But I suspect that it wouldn’t be a major issue for them if their descendants no longer spoke the Rohingya dialect. The Burmese Rohingya I’ve met exhibit little of the fixation with the Bengali language which Bengali Muslims steeped in Tagore express as a matter of course. I know my parents will be sad when the last Bengali speaking generation passes. The term “mother tongue” has more than clinical descriptive connotation for them (part of this is obviously due to the Language Movement, but part of it is probably the reality that Bengali Muslims accept some of the metaphorical aspects of linguistic unity which Bengali Hindus also espouse).

November 9, 2010

The importance of representativeness

A few weeks ago when I posted on the results of a high likelihood of a partially eastern origin for the Mundari people I received a message via Facebook that the article really wasn’t relevant to most South Asians, since only 1-2% spoke a Mundari language (along with pointers to old out of date articles). I immediately replied that it is likely that the Mundari were one of the base populations from which the Indo-Aryan speaking peoples of Bengal, Orissa and Assam arose. The Santals are present as a minority in all three of these states, and the likelihood is that Santal tribals were assimilated into the Hindu (and later Muslim) society, not the other way around. My interlocutor was a little too fixated on issues having to do with colonialism to see clearly what I was trying to get at. That’s fine, we all have our own experiences.

But in any case the bigger point of that post was to emphasize the importance of representativeness. This is something that really stands out with South Asians. There are around 1.3 billion of us, but the HGDP sample has only Pakistani groups. Some of these, such as the Kalash and Burusho, are cultural isolates, whose sampling was justified on the grounds that these people were likely going to be assimilated in the near future. Of the HGDP South Asians only one, the Sindhi, are Indo-Aryan speakers, the language family which covers about 80% of South Asians. More recent papers have moderately rectified that situation. Though as an Indian American Bengali friend of mine observed, “there are 200 million of us!” I believe, and hope, that in three years these sorts of worries and questions will seem like ancient history. Below the fold I’ve taken Dienekes’ ADMIXTURE estimates for HGDP and HapMap3 South Asian groups and appended myself to them.


I’m soon going to get my parents tested via 23andMe, and I’ll have a better sense of whether my elevated “East Asian” ancestry is due to recent admixture, or part of the normal range in eastern Bengal. If, as I suspect, most of the East Asian is from my father I’ll increase the probability of the former. If it’s more balanced I’ll increase the likelihood that I’m representative of many Bengalis. There are a few Bengalis on 23andMe and most of them have elevated “Asian” ancestry, though not as much as me.

