Razib Khan One-stop-shopping for all of my content

July 8, 2011

On the genetic structure of Afro-Indians

The Pith: Afro-Indians are mostly African, with a substantial minority of Indian ancestry. The latter is disproportionately female mediated. It also seems that the Indian ancestry is more northwest Indian, and that natural selection has been operating upon them outside of the African environment.

Along the western coast of South Asia, from Makran in southwest Pakistan down to the Konkan coast of southwest India, there are isolated communities of Afro-Indians. They are called Siddis or Habshi. Their African origin is clear in their physical appearance, as well as in aspects of their folk customs which tie them back to Sub-Saharan Africa. Nevertheless, they have assimilated to many Indian cultural traits. They generally speak the local language, and practice Islam, Hinduism, or Roman Catholic Christianity (in that order in proportion).

How and why did the Siddis arrive in India? The earliest date for their arrival almost certainly must be bounded by the period when Indo-Islamic polities rose to prominence in the early second millennium. The cosmopolitan melange of the armies of the Muslim warlords included diverse groups of Africans, some of whom took power, and established their own self-conscious Afro-Indian dynasties, set apart from the Turkish, Afghan, ...

April 23, 2011

Resolutions in the Indian genetic layer cake

Filed under: Genetics,Genomics,Indian Genetics,Indian genomics — Razib Khan @ 7:54 pm

Two years ago Reconstructing Indian Population History reframed how we should view South Asian historical genomics. In short, Indians can be viewed as a hybrid between a West Eurasian group, “Ancestral North Indians” (ANI), and a very different group, “Ancestral South Indians” (ASI), which had distant connections to West and East Eurasians. At least to a first approximation. Last fall I posted on a new paper which surveyed the Austro-Asiatic speaking peoples of India, and concluded that they were exogenous to the subcontinent. This is an interesting point. Prehistoric treatments of South Asia often use linguistic terms to denote putative ancient populations. One model is that first it was the Munda, the most ancient Austro-Asiatics. Then the Dravidians. And finally the Indo-Aryans. These genetic data imply that the Munda arrived after the initial ANI-ASI synthesis. The Munda people of India can be thought of as ANI-ASI, with an overlay of East Eurasian ancestry.

Zack Ajmal’s K = 11 ADMIXTURE run has highlighted some further issues. He has a set of Austro-Asiatic samples, as well as a host of Indo-Aryan and Dravidian speaking populations. I now believe we can further clarify and refine our model of the peopling ...

March 12, 2011

Harappa Ancestry Project @ N ~ 50

Zack Ajmal now has over 50 participants in the Harappa Ancestry Project. This does not include the Pakistani populations in the HGDP, the HapMap Gujaratis, or the Indians from the SGVP. Nevertheless, all these samples still barely cover the vast heart of South Asia, the Indo-Gangetic plain. Here is the provenance of the submitted samples Zack has so far:

- Punjab: 7
- Iran: 7
- Tamil: 6
- Bengal: 5
- Andhra Pradesh: 2
- Bihar: 2
- Karnataka: 2
- Caribbean Indian: 2
- Kashmir: 2
- Uttar Pradesh: 2
- Sri Lankan: 2
- Kerala: 2
- Iraqi Arab: 2
- Anglo-Indian: 1
- Roma: 1
- Goa: 1
- Rajasthan: 1
- Baloch: 1
- Unknown: 1
- Egyptian/Iraqi Jew: 1
- Maharashtra: 1

Again, note the underrepresentation of two of India’s most populous states, Uttar Pradesh, ~200 million, and Bihar, ~100 million. Nevertheless, there are already some interesting yields from the project. Below I’ve reedited Zack’s static images (though go to his website for something more dynamic) with the labels of individuals. I’ve highlighted myself and my parents with the red pointers.

To the left is a set of plots and tables which I’ve spliced together from Zack’s various posts. What you need to know is that this is at K = 12, and I’ve used the labels that Zack gave the various putative “ancestral populations” which emerged out ...

January 28, 2011

Harappa Ancestry Project, before the first wave

Zack has been posting his data sources, as well as how he filtered and formatted them, all this week. I assume that the first wave of results will be online soon. As of yesterday, this is what he had (I know he got some more today):

- Punjab 7
- Bengal 1
- Bihar 1
- Tamil 5
- Karnataka 1
- Anglo-Indian 1
- Roma 1
- Iran 3

Whole swaths of north-central India are missing. I am hopeful that more people will join in after the first wave of results is put out there. But, from what I have discussed with Zack, it looks plausible that the very first wave will have a richer set of results because of the necessity of preliminary steps. So there’s some benefit in getting in early. It’s really ridiculous to have literally 1 sample representing the 300 million people of Uttar Pradesh and Bihar. That’s 25% of South Asians represented by one person. I’ve gotten a commitment from one friend who was born in U.P. to give his data up once it comes in, but there have to be others out there. (the Bengali N should go up to 2 when I swap my parents ...

November 24, 2010

We were all Africans…before the intermission

Quick review. In the 19th century, once the idea that humans were derived from non-human ancestral species was injected into the bloodstream of the intellectual classes, there was an immediate debate as to the location of the proto-human homeland; the Urheimat of us all. Charles Darwin favored Africa, but in many ways this ran against the cultural grain. The theory of evolution was birthed before the highest tide of the age of white supremacy and European hegemony, and Darwin’s model had to swim against the conviction that Africans were the most primitive of the colored races. After the waning of the ideological edifice of white supremacy, and the shock it received during and after World War II, the debates as to the origin of humanity still remained contentious and followed the same outlines (though without the charged normative inferences). But as the decades wore on many more researchers began to believe that Darwin was correct, and that the origin of humanity lay in the African continent. First, the deep origin of the human lineage in Africa was accepted, but eventually a more recent expansion out of Africa was argued for by one school. The turning point in these academic disputes was the popularization of the “mitochondrial Eve” theory of the 1980s.

What some paleontologists had long argued, that anatomically modern humans have their locus of origin in Africa, was supported now by research from genetics which indicated that Africans were the most basal clade of humans on a continental scale, so that non-Africans could be conceived of as a subset of Africans. From this originates the chestnut of wisdom that Africans have more genetic diversity than all other human populations combined. By the year 2000 one could say that the “Out of Africa” triumphalism had proceeded to the point where an almost exterminationist model had taken hold when it came to the relationships of anatomically modern H. sapiens, and other groups which had evolved outside of Africa over the past million or so years, such as the Neandertals.

But the theoretical dichotomies were too coarse and absolute, as it turns out. A division between multiregionalist phyletic gradualism, where H. sapiens evolved out of its hominin ancestors concurrently on a world wide scale, and a model of rapid expansion of one tribe in Africa to replace all others in totality, may have been warranted in the age of classical genetics and morphometric analysis, but now we can look at the raw genomic material in a more fine-grained fashion. In fact, we can now look at the genomic patterns of variation among extinct hominins! There have long been hints from the genetic and morphological data that the expansion-and-replacement paradigm was too extreme. But with the publication last spring in Science of a paper which made the claim for admixture between Neandertals and non-Africans in the range of 1-4% in all non-African groups, based on a comparison of Neandertal and modern human genetic variation, one can dismiss the notion that absolutist expansion-and-replacement is self-evidently true orthodoxy. But one orthodoxy has not given way to another, and the shock to the old models presented by the data has not resulted in the coalescence of new robust paradigms. We live in a time of scientific troubles, so to speak.

One of the more notable results in the Science paper from last spring was that all non-Africans had about the same admixture in relation to the Neandertal reference genome, ~1-4%. This means from the Orkneys to New Guinea. Because Neandertals were distributed only in the western half of Eurasia, this implies that the admixture was an early event. By the time of modern human expansion across Eurasia, Australasia, and the New World, it had become equally distributed across the individuals within the population. Recall the contrast between African Americans and Uyghurs. Among the Uyghurs the ancestral quanta are equitably distributed from individual to individual, but among African Americans there remains substantial intra-population variance. The reason is that African Americans are quite new, an order of magnitude younger than the Uyghurs in a genetic sense, and admixture is still occurring into the African American population from the ancestral groups. The Uyghurs as we know them today genetically are probably ~1,000-2,000 years old (though their cultural origins are both more and less ancient, as a matter of linguistics in the former, and ethnic self-conception as a Muslim East Turkic group in the latter). The implication here is clear: there was a pause in the Out of Africa movement, where the proto-non-Africans mixed with a Neandertal group, possibly in the Middle East, and only began a massive demographic expansion after an unspecified sojourn. A paper from last spring makes this all explicit:

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation. Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals. The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.

Now the same group has published a follow up paper in Genome Biology which fleshes out the Deep Time aspect of human evolutionary history by looking closely at the genetic variation of an under-sampled population: South Asians. You may have noticed that the HGDP populations include Pakistani groups as South Asian exemplars. That’s apparently because during the Permit Raj era in India the government was wary of cooperating with the HGDP consortium. But more recently the barriers have come down in India, and one can viably supplement the data sets with Indian Americans. So the GIH sample in HapMap3 consists of Gujaratis from Houston. At ~1.25 billion, or nearly 20% of the world’s population, South Asians are a critical portion of the “big picture” when it comes to world wide genetic variation.

Genetic diversity in India and the inference of Eurasian population expansion:

To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100 kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90-110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.

First, I want to put into the record that I think there are high enough uncertainties (evident in the confidence intervals in the paper itself) that we need to be careful about taking the divergence times from their results as values we’d bet the house on. Someone with a better knowledge of the fossils (e.g., John Hawks) or controversies about the mutational rates (e.g., Dienekes) can comment on the plausibility of the dating. But I think we can infer that the time lag for the Middle Eastern sojourn of non-African humans was closer to the order of 10,000 years than 1,000 years.

The basic method here is that the research group zoomed in on a ~100 kb region of the genome, on chromosome 12, and surveyed their Indian populations, as well as the HapMap3 ones. This is important because the SNPs in the HapMap probably exhibit an ascertainment bias toward variants in European and other more widely surveyed groups. The fact that 30% of the SNPs in the South Indian groups seem not to be found among the HapMap populations confirms this hunch. Before digging into the details of the paper, let’s note that the South Indian groups are from the state of Andhra Pradesh: Brahmins, a lower caste group (Yadava), Dalits (Mala/Madiga), and a tribe (Irula). This is a case where even more thorough coverage is necessary. There is some suggestion that South Asian groups have a long history of endogamy and genetic peculiarities, which would limit the usefulness of extrapolations from this sample. Even within the HapMap Gujarati sample there seem to be two clusters when the PCA is used with reference to the European samples.

There are basically three portions of the paper:

- A survey of conventional population genetic statistics,

θ = 4Neμ (Ne = effective population size, μ = mutation rate)
π = nucleotide diversity
H = heterozygosity
D = Tajima’s D

- Measures of genetic distance between contemporary populations, Fst and PCA

- Finally, taking the genetic variance from the ~100 kb and plugging it into explicit models of human evolutionary history
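As a rough illustration of the summary statistics listed above, here is a minimal Python sketch that computes the number of segregating sites (S), nucleotide diversity (π, as the mean number of pairwise differences), Watterson's estimator of θ, and Tajima's D from a few aligned haplotypes. The function name `diversity_stats` and the four toy sequences are made up for illustration. This is not the paper's pipeline, just the textbook formulas applied to invented data.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, pi, theta_w, tajimas_d) for equal-length aligned haplotypes.

    pi and theta_w are per-locus values; divide by the sequence length
    to get per-site values. Tajima's D is None when there is no variation.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    if n < 2 or any(len(h) != length for h in haplotypes):
        raise ValueError("need at least two haplotypes of equal length")

    # S: number of segregating (polymorphic) sites
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

    # pi: mean number of differences over all pairs of haplotypes
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    pi = total_diffs / len(pairs)

    # Watterson's theta: S scaled by the harmonic number a1
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1

    if seg_sites == 0:
        return seg_sites, pi, theta_w, None  # D is undefined without variation

    # Constants from Tajima (1989) for the variance of (pi - theta_w)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return seg_sites, pi, theta_w, d


# Toy data (hypothetical): four haplotypes, eight sites
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "AAAAATTT"]
s, pi, theta_w, d = diversity_stats(toy)
print(f"S={s}  pi={pi:.3f}  theta_W={theta_w:.3f}  Tajima's D={d:.3f}")
```

When π and Watterson's θ agree, D sits near zero. An excess of rare variants (for example after a population expansion) pushes D negative, and an excess of intermediate-frequency variants pushes it positive.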

Table 1 (I reformatted) shows the genetic statistics by “continent.” Indian includes some Gujarati individuals. They sampled out of the HapMap populations to equalize the numbers.


Some of these results are striking. The general truism is that Africans are the most diverse population in the world, but some of the South Indian groups are very diverse indeed. Of particular interest though is that some Indian groups are not very diverse at all. What’s going on here? Here you have to look at the specifics of each group. It is likely that South Indian Brahmins are the result of a relatively recent population expansion, with some uptake of other genes through hypergamy. A paper from last year argued that all Indian populations can be modeled as a two-way admixture of different quantities from two ancestral groups, Ancestral North Indians and Ancestral South Indians. The heterozygosity values may be explained in such a fashion, though the relatively low values for Gujaratis and Andhra Pradesh Brahmins would still surprise. Frankly, I’m just mostly confused by the diversity statistics. Probably substructure due to endogamy and population bottlenecks is obscuring broader dynamics. We can, though, conclude that the idea that all non-Africans are uniformly homogeneous in comparison to Africans may not hold water. Figure 2 above illustrates this by plotting heterozygosity vs. distance from Africa.

Next, let’s move to genetic distance. There are two ways you can look at this: a summary statistic like Fst, which partitions between and within population variance, and PCA, which visualizes the largest dimensions of variation in the data set. So you have both below (reedited for reasons of space):


In general the results are as expected, but there are weird details. For example, the Brahmins from Andhra Pradesh are on the margins, where you’d expect them to cluster with the Gujaratis. The Gujaratis are closer to the Chinese from Denver than Utah Whites? This is a provisional paper, so I’m almost wondering if there’s a typo or coding error here, as I don’t understand how the GIH can be so close to the Tuscans and Chinese from Denver, and much further from the Northern Europeans and Chinese from Beijing. The two European and Chinese samples are rather close in other analyses.

So let’s get to the real deal. The modified Out of Africa model where non-Africans take a “break” after they leave the mother continent:


I’ve mashed up the figures. The models were generated by looking at allele frequencies. They took the variants they found by sequencing the ~100 kb on chromosome 12, which was in a very gene-poor region so as to bias it toward neutrality, and plugged them into a few models in the ∂a∂i program. I’ll jump to the text here:

…the divergence time between African and the ancestral Eurasian population (88-112 kya, CIs: 63-150 kya) is much older than the divergence time among the Eurasian groups (27-39 kya, CI: 20-59 kya). The more recent divergence time and the low migration rate estimates among the current Eurasian populations support the “delayed expansion” hypothesis for the human colonization of Eurasia (Figure 5). Consistent with previous studies…these estimates indicate that a single Eurasian ancestral population remained separated from African populations for more than 40 thousand years prior to the population expansion throughout Eurasia and the divergence of individual Eurasian populations.

Image: Manafi al-Hayawan, Adam and Eve

Take a good look at those confidence intervals. We know that some of those have to be false: the bones don’t lie. From what little I know a very young consensus date for the settlement of Australasia by modern humans is 40,000 years ago. That happens nicely to be their median, but the dispersion toward younger dates is probably not right, unless Aborigines are a separate population who are remnants of an earlier wave of migrants (or the current Aborigines replaced earlier waves). It is also hard to reconcile these dates for the diversification of non-African humanity with very old dates for Chinese fossils which exhibit some elements of modern morphology.

In the broad outlines I think we can accept that the model outlined in this paper may be correct. It would explain the uniform admixture of Neandertal in non-Africans, since they’d need time as a compact population before demographic expansion to integrate the Neandertal genes as part of their genetic background. But before the Neandertal genome came out there were plenty of papers which purported to show that there was no archaic admixture in modern humans, and plenty of papers which did claim there was evidence for such admixture. The point is that these computational models are sensitive to their inputs, and being models they simplify what really happened. In the discussion the authors repeatedly observe that migration between the various non-African demes doesn’t affect the outcome. That is fine, but there is modestly strong evidence that the Indian samples that they’re using are an admixed population of old. That would make me skeptical of claims about dating the separation of “Indians” when Indians are themselves possibly a compound between other groups.

Below is the model presented from Reconstructing Indian population history:


The teens of this century are going to be very exciting when it comes to reconstructing human evolutionary history. You’d be a fool to put bets on any horse at this time.

Addendum: I need a term for non-African humanity. So I’m making up one right now: Eurausicans. From Eurasians, Australasians, and Americans.

Citation: Jinchuan Xing, W Scott Watkins, Ya Hu, Chad D Huff, Aniko Sabo, Donna M Muzny, Michael J Bamshad, Richard A Gibbs, Lynn B Jorde, & Fuli Yu (2010). Genetic diversity in India and the inference of Eurasian population expansion. Genome Biology. DOI: 10.1186/gb-2010-11-11-r113

August 10, 2010

PCA, Razib around the world (a little)

I have put up a few posts warning readers to be careful of confusing PCA plots with real genetic variation. PCA plots are just ways to capture variation in large data sets and extract out the independent dimensions. It’s great at detecting population substructure because the largest components of variation often track between population differences, which consist of sets of correlated allele frequencies. Remember that PCA plots usually are constructed from the two largest dimensions of variation, so they will be drawn from just these correlated allele frequency differences between populations which emerge from historical separation and evolutionary events. Observe that African Americans are distributed along an axis between Europeans and West Africans. Since we know that these are the two parental populations this makes total sense; the between population differences (e.g., SLC24A5 and Duffy) are the raw material from which independent dimensions can pop out. But on a finer scale one has to be cautious because the distribution of elements on the plot as a function of principal components is sensitive to the variation you input to generate the dimensions in the first place.

I can give you a concrete example: me. I showed you my 23andMe ancestry painting yesterday. I didn’t show you my position on the HGDP data set because I’ve shared genes with others and I don’t want to take the step of displaying other peoples’ genetic data, even if at a remove. But, I have reedited some “demo” screenshots and placed where I am on the plot to illustrate what I’m talking about above. The first shot is my position on the two-dimensional plot of first and second principal components of genetic variation from the HGDP data set.

No surprise that I’m in the Central/South Asian cluster. But what may surprise you is that I’m not in the South Asian cluster, I’m in the Central Asian cluster. In the Central Asian cluster are Uyghurs and Hazaras. These are two hybrid populations, a mixture of West and East Eurasian elements. The Uyghurs are likely the outcome of a process of admixture between the Iranian and Tocharian Indo-European populations of the cities of the Tarim basin, and later Turkic speaking settlers who arrived in the wake of the expansion and later collapse of the first Uyghur Empire (the historical connection between the current Uyghurs and ancient Uyghurs is tenuous at best, and complicated). The Hazaras are a more recent population, likely emerging as the product of intermarriages between Mongol soldiers who arrived in the 13th century, and indigenous women, Persians, Turks, and assorted Indo-Iranian groups between the Zagros and Khyber Pass. It is somewhat ironic that I’m on the edge of the Hazara cluster since they are almost certainly in part descended from Genghis Khan’s family, and my own surname is Khan. But I know that my Y chromosomal lineage is R1a1, very common across Central and Southern Eurasia, and not a Mongolian one at all.

Zoom! Now we’ve constrained the input data set to the Central/South Asian groups. First, look at the Kalash. They’re strange, which is no surprise, they’re an inbred mountain group in Pakistan who have not adopted Islam. The Pakistani Taliban looks to be ending them as we speak. I really would prefer that they were just thrown out of the data set for this zoom view, because on this fine grained scale I don’t think they add much at all. They’re just an example of what long term endogamy can do to your allele frequencies. The bigger picture is the axis between the populations of Pakistan, and those of Central Asia. Observe that I’ve changed position. Whereas when taking world wide genetic variation into account I clustered with Central Asians, now I’m 2/3 of the way to the South Asian cluster. I will tell you that I’ve shared “genes” with around 50 South Asians now, from various parts of the subcontinent, and in the 23andMe plot they overlay the South Asians nearly perfectly. I’ve put labels at the approximate ethno-linguistic position. I’m an outlier. 23andMe tells me that I’m 43% “East Asian.” The typical South Asian is in the 10-30% range. My first assumption was that I have a lot of ancient South Indian, which just shows up as East Asian in their algorithm. With this in mind I tried sharing with a lot of South and East Indians, and found out two interesting points. First, South Indians seem no higher than 30-35% East Asian. Bengalis on the other hand are more East Asian, with Bangladeshis more East Asian than West Bengalis. My sample size for Bengalis is small, so take that with caution. Second, the PCA plots put the South Indians firmly in the South Asian cluster, but the Bengalis trail out toward my own position. This indicates again that different methods are telling you slightly different things. The PCA is only a thin slice of variation, but it’s highly informative of between population differences.
A Bengali and a South Indian with the same “East Asian” fraction in the ancestry painting nevertheless have consistently different positions on the PCA, with Bengalis closer to the East Asians. Additionally, there’s an ethnic Persian in this zoom plot that I’m describing, and they are positioned near the Balochi. But on the world wide plot they’re on the margins of the European cluster. Another illustration that position of an element is sensitive to the input data because of how the dimensions are generated.
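To make the point about input sensitivity concrete, here is a toy Python sketch. It assumes only two made-up axes of variation per individual (real PCA runs on hundreds of thousands of SNPs), and every coordinate, population label, and function name below is hypothetical. It computes the first principal component from a "world" reference set and from a "zoom" reference set, then asks where the same focal individual falls between the South Asian and Central Asian centroids in each case.

```python
from math import atan2, cos, sin


def first_pc(points):
    """Unit vector along the leading principal axis of 2-D points.

    Uses the closed form for a 2x2 covariance matrix:
    angle = 0.5 * atan2(2 * s_xy, s_xx - s_yy).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    return cos(angle), sin(angle)


def centroid(points):
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def relative_position(focal, pop_a, pop_b, reference):
    """Position of `focal` on PC1 of `reference`.

    0 means the centroid of pop_a, 1 means the centroid of pop_b.
    """
    ux, uy = first_pc(reference)

    def proj(p):
        return p[0] * ux + p[1] * uy

    a, b = proj(centroid(pop_a)), proj(centroid(pop_b))
    return (proj(focal) - a) / (b - a)


# Hypothetical coordinates, two individuals per population
europeans = [(0.0, 0.0), (0.2, 0.1)]
east_asians = [(10.0, 0.0), (10.2, 0.1)]
south_asians = [(3.0, 2.0), (3.2, 2.1)]
central_asians = [(5.0, 1.0), (5.2, 1.1)]
focal = (5.0, 3.0)

world = europeans + east_asians + south_asians + central_asians
zoom = south_asians + central_asians

t_world = relative_position(focal, south_asians, central_asians, world)
t_zoom = relative_position(focal, south_asians, central_asians, zoom)
print(f"world reference: {t_world:.2f}   zoom reference: {t_zoom:.2f}")
```

With these invented numbers the focal individual lands roughly 0.90 of the way toward the Central Asian centroid on the world axis, but only about 0.57 of the way on the zoomed axis. Nothing about the individual changed; only the set of samples used to define the axis did.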

Blaine Bettinger, who inspired me to post this, told a story with his ancestry painting which was plausible. What can I say? First, I have less than 1% African ancestry. This could be noise. But, I do observe that the South Asians with Muslim names are enriched in the set of those who I’ve shared genes with and who have less than 1%, but not 0%, African ancestry. Just as Muslim South Asians have non-trivial West Asian ancestry, I suspect that many of us have Sub-Saharan African ancestry through the same dynamic. Sub-Saharan African soldiers were prominent across South Asia with the arrival of Muslims. Bengal even had a period of Abyssinian rule. But the bigger issue for me is the East Asian component. Here is a figure from a paper published 4 years ago:


The figure shows Fst values comparing Indian Americans with Europeans and East Asians. Fst measures between population differences in allele frequency, in this case the alleles being 207 indels. Take a look at the Bengalis. These are West Bengalis, who I believe have a lesser East Asian component, but even there the allele frequency difference to East Asians is near that of Europeans. The Assamese, who speak a language very close to Bengali, are similar. Assam was ruled by a Tibeto-Burman people for nearly 600 years. The Oriya speakers, from the southwest of Bengal, are more distant from East Asians. As one goes south and east, and west and north, the distance from East Asians increases. This shouldn’t be that surprising, but nice to confirm. The fact that the genetic distance increases as one goes south means that for northeast South Asia you need to complexify the model from a two-way admixture with “ancient North Indians” and “ancient South Indians.” Set next to these two is an East Asian element, which is also clear in the Indo-Aryan peoples of Nepal.
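For readers who want to see what Fst is doing, here is a minimal Python sketch of one common textbook form, Fst = (H_T - H_S) / H_T, for two equally weighted populations, taken as a ratio of averages over loci. The function name `fst` and the allele frequencies are invented for illustration; the paper behind the figure may well have used a different estimator.

```python
def fst(freqs_pop1, freqs_pop2):
    """Fst = (H_T - H_S) / H_T from per-locus allele frequencies.

    H_S is the mean expected heterozygosity within the two populations,
    H_T is the expected heterozygosity of the pooled population.
    Both populations are weighted equally (an assumption of this sketch).
    """
    if not freqs_pop1 or len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("need the same non-zero number of loci in both populations")
    h_total = 0.0
    h_within = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_total += 2 * p_bar * (1 - p_bar)
        h_within += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # no variation at any locus, so no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies at three loci
bengali = [0.30, 0.55, 0.80]
east_asian = [0.10, 0.70, 0.95]
european = [0.45, 0.40, 0.60]
print(f"Bengali vs East Asian: {fst(bengali, east_asian):.3f}")
print(f"Bengali vs European:   {fst(bengali, european):.3f}")
```

Identical frequencies give an Fst of 0, and populations fixed for different alleles give 1. Real comparisons between human populations fall well toward the low end of that range.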

Of course anyone who knows Bengalis won’t be totally surprised by an East Asian component to their ancestry. To the left are head shots of the two women who have dominated Bangladeshi politics for the past two decades, Khaleda Zia and Sheikh Hasina. They’re both Bengalis, but they do look different, and I know many people who look like one or the other (or a combination). My family is from one of the easternmost districts of Bengal, next to Tripura. In fact my late maternal grandmother lived in Tripura for some of her childhood (she was almost trampled to death by the Maharani of Tripura’s insane elephant as a young girl!). When I was a young child I once saw a black and white photo from my father’s college days, and I was curious who the Asiatic looking young man in the middle of the photograph was. Turns out it was my father! Sometimes our expectations affect how we perceive people. I have never perceived my father to have an Asian cast to his features as a more mature man, but others have told me that he does still exhibit them.

There is still the question of how Bengalis came to have this particular admixture. I think the most plausible scenario probably synthesizes conventional village-to-village intermarriage and isolation-by-distance, along with some component of migrationism. Tribes such as the Chakma have left Burma in historical time. The Chakma of Bangladesh now speak a dialect of Bengali, not their ancestral Sino-Tibetan tongue. I believe that a non-trivial portion of Bengalis have ancestors who were tribal people who shifted their religious identity to that of Hinduism or Islam (from Theravada Buddhism in the case of the Chakma, or animism in the case of the Garos before their Christianization). But eastern South Asia is adjacent to mainland Southeast Asia, and it stands to reason that continuous gene flow over time would also have introduced East Asian alleles into the Bengali gene pool.

Image Credit: TopNews.in

August 6, 2010

Strange genetic variation in South Asia

Filed under: Genetics,Genomics,Indian Genetics,Indian genomics — Razib Khan @ 12:11 am

Dienekes has a post up where he highlights the fact that the recent paper on South Asian metabolic diseases has a figure which elucidates population structure within the region. Accounting for structure is important for genome-wide associations since you might get spurious correlations if trait value/disease frequency is simply tracking cryptic population variation. Dienekes says:

The existence of two clusters is kind of obvious, while their interpretation is not as dots of the same color appear in both clusters: a placement of these individuals in a global context might have been useful here. Things are clearer at the top cluster which shows a clear gradient anchored by Punjabi Sikh and Hindu Tamils on either end.

Also of interest is the group of isolated Muslim/Christian individuals on the left which deviate strongly from the mainstream; these probably represent exogenous elements that don’t resemble the bulk of the Indian population.

The second issue is easily addressed. The Christian outliers both give English as their native language. That suggests to me that they’re Anglo-Indian, a community of mixed South Asian and European origin. South Asian Muslims are overwhelmingly of indigenous origin. But, a minority of the Muslim elite are West Asian, or have substantial West Asian ancestry, as is evident by the fact that they look white. Benazir Bhutto’s mother was of Kurdish and Persian ethnic background (her family was from Esfahan in Iran). I’ve reedited the religious & linguistic PC plots to fit onto the screen.


So what’s going on with the cluster which extends along the second principal component? The first component is probably just a European/West Asian-South Asian axis of variation. But I don’t understand where the variation for the second is coming from. Observe that the one South Indian group, Tamil speakers, are not represented in the secondary cluster. The plot reminded me of something I saw last fall.

Below is figure S4 from the supplements of Reconstructing Indian population history. I added some labels. The Indian cluster is tight when the genetic variation includes non-Indian groups. But, when you constrain the variation to Europeans and South Asians only, something strange happens:

The Gujarati sample is from Houston, and is from HapMap Phase 3. I have a suspicion that the secondary cluster among the Gujaratis here is of the same class of phenomenon as the secondary cluster in the first plot. The Anglo-Indians and West Asian Muslims serve as rough proxies for Europeans, and you have an expected European-South Asian axis. But you also have this strange orthogonal component. I had assumed that the plot from the Reich et al. paper was an anomaly, but I’m not so sure seeing the second paper.
