Razib Khan One-stop-shopping for all of my content

January 11, 2011

Me, myself, and Myanmar

Filed under: BGA,Dodecad,Genetics,Genomics,Personal genomics — Razib Khan @ 3:22 am

I have spoken of my somewhat atypical, for a South Asian, genetic results before. Recently Dienekes performed some cluster analysis which confirmed the initial findings, while adding a little detail:

I am DOD075. The Southeast Asian component is modal in Malays, while the East Asian component is modal in the North Chinese. Vietnamese and Cambodians are mixed, with the former biased toward East Asian, and the latter Southeast Asian. My own proportions are more balanced, but there might be some noise in there. That being said, from what I have read of Southeast Asia it is highly likely that Burmese ethnicities will be between the Cambodians and Vietnamese in proportions. The Burmans were more shaped by the indigenous Mon-Khmer people than the Vietnamese were, though like the Vietnamese they seem to hail from southern China. My family is traditionally from eastern Bengal, and has been at various points the subjects of the kingdom of Tripura.

Here’s the Dodecad Indians, HapMap Gujaratis, and Behar et al. North Kannadans. The orange is Asian. Can you tell which one I am?

I’m pretty sure I’m second from the left. Not only am I atypically Asian, but I often show trace levels of other ancestral components in Dienekes’ ADMIXTURE results (I suspect this is due to a rather cosmopolitan great-grandfather who was from Delhi). In any case, so far I’ve had pretty general pointers of what’s going on here. Unlike most people who get this stuff done, I found something interesting, though not too surprising (more on this later). But I ran into something which makes the case for specific Burmese origin even stronger.

I got involved in BGA thanks to the urging of my friend Paul. Recently David of BGA sent me matches for various HGDP and other populations, as well as his own project samples, for extended haplotype blocks on the first 3 chromosomes. These are long stretches of correlated markers which haven’t been broken apart by recombination. They may be indications of recent common ancestry between two individuals who share regions of genomic affinity. I decided to look at my matches. Also, there are several other South Asian individuals in the project. I don’t know who they are, but it’s clear they’re South Asian. I was curious to compare myself to them in terms of my matches.

First, I removed all the project samples. So basically I limited it to populations whose names I could recognize easily. Then I limited it to blocks of at least 100 bases. Below are the numbers of hits per population, in descending order. I matched some Pathans more than others, but I threw them all into a big pool. You can see some of the Genomes Unzipped guys, and Ilana Fisher & Kate Morley too.

Group Razib # hits
Pathan 72
Burusho 58
Hazara 47
Han 26
Naxi 19
North Russian 18
She 18
Yizu 16
Buryat 15
Miaozu 14
Chuvash 13
Adygei 12
Altai 12
Uygur 12
Tuva 11
Tujia 9
Mongol 8
Xibo 8
Daur 7
Yakut 7
Athabask 5
Evenk 5
Oroqen 5
Hezhen 4
Ilana Fisher 4
Maya 3
Chukchi 2
Ket 2
Nganassan 2
Vincent Plagnol 2
Daniel MacArthur 1
Joe Pickrell 1
Kate Morley 1
Komi 1
Luke Jostins 1
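The filtering and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the record layout (population label, project-sample flag, block length) and every row in `matches` are made up, and only the 100-unit threshold comes from the post.

```python
from collections import Counter

# Hypothetical records: (population label, is_project_sample, block length).
# These rows are invented for illustration; they are not the real BGA data.
matches = [
    ("Pathan", False, 250),
    ("Pathan", False, 120),
    ("Pathan", False, 80),           # shorter than the threshold, dropped
    ("Naxi", False, 140),
    ("Project sample", True, 300),   # project sample, dropped
    ("Han", False, 100),
]

MIN_BLOCK_LENGTH = 100  # threshold mentioned in the post


def count_hits(records, min_length=MIN_BLOCK_LENGTH):
    """Count qualifying shared blocks per reference population."""
    counts = Counter(
        population
        for population, is_project_sample, length in records
        if not is_project_sample and length >= min_length
    )
    # most_common() orders populations from most to fewest hits
    return counts.most_common()


for population, hits in count_hits(matches):
    print(population, hits)
```

Pooling all Pathan samples under one label, as in the table, falls out of counting by population name rather than by individual.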

The raw number of hits isn’t really that informative. But obviously I’m going to be more open about my data than other people’s. So here’s what I did: I took the # of hits that I had for these populations, and calculated the ratio with the mean for all other South Asians. For example, the mean South Asian # of hits for Pathans without me was 75. I was 72. So my ratio was 0.96. Here’s a table of the higher hit # groups for me, and my ratio with the South Asian mean:

Group Me vs. Mean
Pathan 0.96
Burusho 0.69
Hazara 1.77
Han 2.04
Naxi 10.86
North Russian 0.77
She 5.4
Yizu 4.27
Buryat 3
Miaozu 2.8
Chuvash 0.85
Adygei 0.52
Altai 1.71
Uygur 1.09
Tuva 2.2
Tujia 2.4
Mongol 2.13
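The ratios in the table above are just my hit count divided by the mean hit count of the other South Asians. A small sketch of that arithmetic, using only the two means stated in the post (75 for Pathans, and 1.75 for the Naxi, which is quoted in the next paragraph); the function name is my own label:

```python
def ratio_to_mean(my_hits, mean_other_hits):
    """Ratio of one person's hit count to the mean of everyone else's."""
    return round(my_hits / mean_other_hits, 2)


# (my hits, mean hits of the other South Asians), both taken from the post
examples = {
    "Pathan": (72, 75.0),
    "Naxi": (19, 1.75),
}

for group, (mine, others_mean) in examples.items():
    print(group, ratio_to_mean(mine, others_mean))
```

A ratio near 1 means I look like the other South Asians for that population; a ratio far above 1 marks an excess of shared blocks.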

I bolded what I noticed to be way out of the norm. The mean number of hits for South Asians with the Naxi, aside from me, was 1.75. I had 19. The She and Yizu were also atypical. These are three HGDP populations. The Naxi and Yizu reside predominantly in Yunnan. The She are based further east in southern China. The connection with my East Asian ancestry seems pretty straightforward then. Here’s a section on the origins of the Bamars in Wikipedia, the dominant group in Burma: “They migrated from the present day Yunnan in China into the Ayeyarwady river valley in Upper Burma about 1200–1500 years ago. Over the last millennium, they have largely replaced/absorbed the Mon and the earlier Pyu, ethnic groups that originally dominated the Ayeyarwady valley.” I believe that ~1/6th of my ancestry had something to do with the massive Völkerwanderung which brought the various ethnic groups dominant in Burma, Thailand, Laos and Vietnam to their current locations. Only Cambodia managed to maintain a native elite culture, though the modern Khmer polity was being crushed between Thailand and Vietnam when the French colonial authorities froze the current borders roughly in place.

~1/6th is not trivial, but it isn’t quite one grandparent either. So how did I come by this? It could just be a natural component of the eastern Bengali genetic landscape. I wasn’t too surprised by the results because so many of my extended family members do resemble people from Southeast Asia. I myself have a few characteristics which are not typical for South Asians (e.g., very little body hair). But I have reason to suspect that there might be some recent admixture. All my grandparents were Muslim, but I know the original Hindu caste origins of two of them (one of them was from a family who converted when she was an infant). They were unlikely to have had recent admixture. My maternal grandmother was only half-Bengali in terms of recent ancestry. But her father was from Delhi, from that city’s Muslim elite. That leaves my paternal grandfather, who died just before I was born. No photos of him were ever taken, but I was always told that his physical appearance was not typical for a Bengali. He was tall and relatively light in complexion. His title as a Khan came down through his paternal lineage, and was a legacy of the late Mughal (really, de facto post-Mughal by then from what I can gather) period. My paternal Y chromosomal lineage is R1a1a*, not an eastern haplogroup at all. I had always assumed that, like my great-grandfather on my mother’s side, my paternal grandfather was from a mixed-heritage Muslim upper-class family. I now strongly suspect that his background was more exotic than my family has let on, or at least more than they suspected.

It turns out that his family may have come to Bengal from Assam, to the northeast. Assam has an even stronger Tibeto-Burman presence than eastern Bengal. My paternal grandfather may have been from a family which mixed with the Tibeto-Burman tribal people in Assam, some of whom converted to Islam and assimilated to a Bengali identity. I find this rather interesting, and am curious as to the omission. My own personal experience discussing the eastern element of my ancestry is that many South Asians are just confused by the whole idea. Persian or Scythian ancestors they can grok. Burmese, not so much.

I will know more soon. I have had my parents typed, and will be able to ascertain if the eastern element is more from my father as I suspect, or whether it is from both parents. If the latter is the case, then there need be no exotic story. Rather, eastern Bengal is simply on the clinal continuum of allele frequencies which differentiate South Asia from Southeast Asia.

November 23, 2010

Eurogenes 500K SNP BioGeographicAncestry Project

Filed under: Admixture,Ancestry,BGA,Genetics,Genomes Unzipped,Genomics — Razib Khan @ 12:11 am

Since I have been promoting the Dodecad Ancestry Project, it seems only fair to bring to your attention Eurogenes 500K SNP BioGeographicAncestry Project. The sample populations are a bit different from Dodecad, but again ADMIXTURE is the primary tool. But the author also makes recourse to other methodologies to explore more than simply population level variation. For example, his most recent post is Locating and visualizing minority non-European admixtures across our genomes:

Imagine, for example, a white American carrying a couple of tiny segments of West African origin, from an ancestor who lived 250 years ago, and an eastern Finn with no Asian ancestors in the last 4000 years or more. If we run an inter-continental ADMIXTURE analysis with these two, it’s very likely the American will score 100% European, while the eastern Finn will probably come out around 9% North and East Asian due to really old Uralic influence.

That sort of thing isn’t a huge problem when comparing the genetic structure of populations. Obviously, overall, eastern Finns rather than white Americans are genetically closer to North Asians, and that’s basically what ADMIXTURE picks up. However, if the focus is also on individuals, this certainly can become an issue. Our hypothetical American might be aware of that African ancestor, with solid paperwork backing up their genealogical connection, but he’s pulling his hair out because nothing’s showing up via genetic tests.

So let’s take a look at a real life example of how RHHcounter can pick up segments of potentially recent Sub-Saharan African origin…

Olivia Munn & Uyghur woman

The basic issue here is that in terms of genomic variation old admixture looks different from new admixture. Someone who is a first generation Eurasian, with a Chinese and European parent, may be about the same ancestral mix proportionally as a Uyghur. They would resemble a Uyghur on STRUCTURE and be placed within that cluster on a PCA chart (this is what happens in 23andMe). But the Uyghur “Eastern” and “Western” genetic heritage has been reshuffled to a great extent by recombination over the past 1,000-2,000 years. In contrast, a first generation Eurasian will have huge swaths of their genome which are Eastern or Western on alternating strands (from their respective parents). In population genetic language a group of first generation hybrids would exhibit a lot of linkage disequilibrium (LD). In a panmictic hybrid population LD will decay due to recombination, which breaks apart the distinctive allelic associations inherited from the parental populations.
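To put rough numbers on that decay: under the textbook random-mating model, the LD between two loci shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between them, so D_t = D_0 (1 - r)^t. The sketch below uses that standard formula. The value r = 0.01 and the generation counts are my assumptions (40 to 80 generations corresponds to 1,000-2,000 years at an assumed 25 years per generation).

```python
def ld_remaining(recomb_rate, generations):
    """Fraction of the initial LD (D_t / D_0) left after random mating.

    Uses the standard two-locus result D_t = D_0 * (1 - r)**t.
    """
    return (1.0 - recomb_rate) ** generations


r = 0.01  # assumed recombination fraction between two markers

# 1 and 3 generations: recent admixture; 40 and 80: old admixture,
# assuming roughly 25 years per generation
for t in (1, 3, 40, 80):
    print(t, round(ld_remaining(r, t), 3))
```

The point is qualitative: after a handful of generations nearly all of the admixture LD between nearby markers is intact, while after dozens of generations much of it is gone.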

This is the key to differentiating between the old “Asian” ancestry which sometimes falls out of the genetic variation of Finns at low frequencies, and more recent Asian ancestry. For example, the paleoanthropologist Vance Haynes is apparently a great-grandson of one of the original “Siamese Twins,” Chang Bunker. Chang Bunker was a Chinese Thai, so presumably Vance Haynes would come out to be ~10% Asian, and would be shifted toward the Asian cluster in relation to other Europeans. On the other hand, a closer look at his genome would indicate differences from a Turk who was ~10% Asian, because Vance Haynes’ Asian ancestry has only had three generations for recombination to break apart the original allelic associations which were passed down from Chang Bunker. After only these few generations the genome would still show many segments of clustered ancestry with distinctive sets of markers characteristic of Han Chinese.
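The "~10%" figure is a rounding of simple pedigree arithmetic: each generation halves the expected contribution of a single ancestor. A quick sketch of that expectation (the function name is mine, and real inherited fractions vary around the expectation because of the randomness of recombination):

```python
def expected_ancestry_fraction(generations_back):
    """Expected genome share from one ancestor that many generations back."""
    return 0.5 ** generations_back


# parent = 1, grandparent = 2, great-grandparent = 3
print(expected_ancestry_fraction(3))  # 0.125
```

So a great-grandson of Chang Bunker would be expected to carry about 12.5% of his genome from that one ancestor.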

Let’s make this more concrete. Below are two “ancestry paintings” from 23andMe. One is of a reference example, a Uyghur woman, and another is of a Eurasian individual. The difference is pretty obvious:


For the record, 23andMe says that the Eurasian man is 50% Asian, 50% European. For the Uyghur woman, 52% European, 48% Asian. As I indicated above, Eurasian individuals who are projected onto the variation of the HGDP sample tend to cluster with the Uyghurs. In the image to the left the black mark indicates the Eurasian man. The Uyghurs are green. The purple rectangles are Hazaras.

But obviously this is a trivial example. What’s the point of sniffing around for non-European ancestry in individuals whose non-European ancestry is 1) visible, and 2) recent and immediate? No, a bigger question here is the claims and suggestions by some white Americans that they have significant non-European ancestry. Usually this is Native American. But in the case of one of the European-origin samples which “Polako” (the principal behind the BGA Project) analyzed it seems there is a suggestion of West African ancestry.

This individual is Dr. Don Conrad of Genomes Unzipped. In particular, Polako found that there were two nearby segments on two chromosomes which exhibited a pattern of population atypical heterozygosity in Dr. Don Conrad’s genome. Look at chromosomes 7 and 13. Contrast the pattern with my distant paternal cousin, Dr. Daniel MacArthur. He also exhibits points of heterozygosity, but they’re randomly distributed across the genome. It’s old admixture or just noise.

Polako doesn’t make much of Dr. Don Conrad’s results, and neither do I (presumably as Dr. Don Conrad is a member of Genomes Unzipped it’s easy to talk about his results without any of the ethical or moral hassles about confidentiality). On the other hand, unlike Dr. Dan MacArthur, a little utilization of the powers of the interwebs indicates that Dr. Don Conrad is an American. In particular, of recent Midwestern background. I’m not a total creep, though, so I didn’t start poking around Ancestry.com. But after the Pickrell affair I am probably just a touch more hesitant to laugh off peculiar results from these sorts of analyses as simply algorithms-gone-meshugana.

Image Credit: Colegota
