Razib Khan One-stop-shopping for all of my content

November 20, 2013

The long First Age of mankind

Filed under: Anthroplogy,Archaeology,Siberians — Razib Khan @ 10:22 am

“What it begins to suggest is that we’re looking at a ‘Lord of the Rings’-type world – that there were many hominid populations,” says Mark Thomas, an evolutionary geneticist at University College London who was at the meeting but was not involved in the work.

- Mark Thomas, as reported by Nature

This is in reference to the ancient DNA meeting where David Reich reported that the Denisovans, an exotic archaic population which contributed ~5-10 percent of the ancestry of Papuans, was itself a synthesis of Neandertals and a mysterious group currently unknown. This is not surprising, as the broad outlines of these results were presented at ASHG 2012, though no doubt they’re moving closer to publication. But for this post I want to shift the focus to a different time and place, after the ancient admixture with archaic lineages, and to the reticulation present within our own.


But first we need to backtrack a bit. Let’s think about what we knew in the early 2000s. If you want a refresher, you might check out Spencer Wells’ The Journey of Man or Stephen Oppenheimer’s Out of Eden, which focused on Y and mtDNA lineages respectively. These books were capstones to the era of uniparental phylogeographic analysis of the spread and diversification of anatomically modern African hominids ~50-100,000 years ago. Rather than looking at the whole genome (the technology was not there yet) these researchers focused on pieces of DNA passed down via direct maternal or paternal lineages, and reconstructed clean phylogenetic trees using a coalescent framework. Broadly speaking these trees were concordant, and told us that our lineage, all extant humans, derived from a small African population which flourished ~100,000 years ago. These insights suffused the thought of human evolutionary thinkers in other disciplines (see The Dawn of Human Culture). H. sapiens sapiens, veni, vidi, vici.

After that initial “Out of Africa” migration a series of bottlenecks and founder events led to the expansion of our lineage, as it replaced all predecessors. By the Last Glacial Maximum, ~20-25,000 years ago, the rough outlines of human genetic variation were established (with the exception of the expansion into the New World). We know now that this picture is very incomplete at the most innocuous, and highly misleading given the least charitable interpretation.

Reticulation. Graphs. Admixture. These words all point to the reality that rather than being the culmination of deep rooted regional populations which date back to the depths of the Pleistocene, most modern humans are recombinations of ancient lineages. On the grandest scale this is illustrated by the evidence of ‘archaic’ ancestry in modern humans. But even more pervasively we see evidence of widespread admixture between distinct lineages in the major world populations which we think of as archetypes. This is true for Amerindians, South Asians, and Europeans. This is also the case for Ethiopians, and Australian populations. A major problem crops up when we talk about extinct ancient populations which were the founding constituent elements of modern ones: it doesn’t make sense to use modern referents when they are simply recombinations of what they are describing. But language and history being what they are we can’t change the awkwardness of talking about “Ancestral North Eurasians,” anodyne and somewhat incoherent at the same time (Eurasia is a modern construct with contemporary historical salience).

Into the mix comes another ancient DNA paper which reconstructs the genome of a boy who lived in Siberia, near Lake Baikal, somewhat over 20,000 years ago. It’s titled Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Here’s the topline finding: a substantial minority of the ancestry of modern Native Americans derives from a North Eurasian population which has closer affinities to West Eurasians than East Eurasians. And, this is an old admixture event. In the paper itself they observe that all “First American” populations seem to exhibit the same admixture distance to the Siberian genome. These results are also broadly consistent with the admixture of this population in Western Eurasia, especially northeast Europe. As among Amerindian populations it seems that this element is a substantial minority across Europe as a whole, and perhaps at parity in some populations, such as Finns.

To the left you see the geographical affinities of the MA-1 Siberian sample. It is shifted toward West Eurasians in the PCA. But on the map with circles representing populations, the definite evidence of admixture between Amerindians and MA-1 is clear in the shading. The statistic used, f-3, looks for complex population history between an outgroup (X) and a putative clade. From this test it is evident Amerindians had some admixture related to MA-1. Because of the dating of Siberian remains it does not seem likely that admixture was from Amerindians to West Eurasian and related populations. Rather, the reverse seems more plausible. You can also see from the map the close affinities with particular European and Central Asian populations of MA-1. This is intriguing, and requires further follow up. Though MA-1 and its kin were closer to West Eurasians than East Eurasians, it still seems likely that there was an early divergence between the populations of north-northeast Eurasia, and those of the southwest. Eventually they came back together in various proportions to produce modern Europeans, but it seems likely that during the Pleistocene these two groups went their own way.
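For readers who have not met the f-3 statistic before, here is a minimal sketch of the quantity itself. It is the naive version, a plain average over SNPs of (c - a) * (c - b), with none of the sample-size corrections or block-jackknife standard errors that real analyses rely on, and the allele frequencies are invented purely for illustration. The function and variable names are hypothetical, not taken from the paper or any software package.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; a, b): the mean over SNPs of (c - a) * (c - b).

    Each argument is a list of allele frequencies, one value per SNP.
    No bias correction is applied, so this is for intuition only.
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(terms) / len(terms)


# Made-up allele frequencies at five SNPs in two source populations.
pop_a = [0.10, 0.80, 0.30, 0.90, 0.20]
pop_b = [0.70, 0.20, 0.90, 0.10, 0.60]

# A toy population built as an even mix of the two sources.
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

# An admixed target sits between its sources at every SNP,
# so each term is <= 0 and the average comes out negative.
print(f3(mixed, pop_a, pop_b))
```

A negative value with the test population in the first slot is the classic signal of admixture. With an outgroup in the first slot instead, the same quantity is read the other way around: larger positive values mean the other two populations share more drift since they split from the outgroup.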

There are hints of this in the TreeMix plot to the right. Note how drifted MA-1 is in relation to other West Eurasians (the branch is long). I suspect some of this is due to the fact that this individual is nearly 1,000 generations in the past. Not only is it difficult to name ancient populations in terms of modern ones, I suspect that some of the variation in the ancient populations has been lost, and so they seem exotic and difficult to fit into a broader phylogenetic framework (they had hundreds of thousands of SNPs though). And yet MA-1 can be fitted into the broader framework of populations which went north or west after leaving Africa because of mtDNA and Y chromosome results. Both of these indicate that MA-1 was basal to West Eurasians, with haplogroup U for mtDNA, and R for the Y lineage.

To really understand what’s going on here is going to take a while. A later subfossil, circa ~15,000 years before the present, yielded some genetic material, and exhibited continuity with MA-1. This suggests that Siberia may have had massive population replacement relatively recently. We know this was likely the case elsewhere. Reading Jean Manco’s Ancestral Journeys one possible scenario is that Pleistocene Europeans were MA-1 like, but were replaced by Middle Eastern farmers in the early Neolithic. But later eruptions from Central Asia brought mixed populations (Indo-Europeans?) with substantial MA-1 affinities to the center of European history.

Finally, one must make a note of phenotype. The authors looked at 124 pigmentation related SNPs (see supplemental). The conclusion seems to be that MA-1 was not highly de-pigmented, as is the case with most modern Northern Europeans. This stands to some reason, as substantial ancestry of this sort in Amerindians would result in phenotypic variation which does not seem to be present. Though the authors do suggest that coarse morphological variation among early First Americans (e.g., Kennewick Man) might be due to this population, which had West Eurasian affinities.

Where does this leave us? More questions of course. Though I’m confident the befuddlement will clear up in a few years….

Citation: doi:10.1038/nature12736

Addendum: Please read the supplements. They’re rich enough that you don’t need to read the letter if you don’t have access. Also, can we now finally bury the debate when east and west Eurasians diverged? Obviously it can’t have been that recent if a >20,000 year old individual had closer affinity to western populations.

The post The long First Age of mankind appeared first on Gene Expression.

November 13, 2013

The color of life as a coincidence

Filed under: Anthroplogy,Evolution,Evolutionary Genetics,Genetics of taste,Taste — Razib Khan @ 12:35 am

Credit: Eric Hunt

I do love me some sprouts! Greens, bitters, strong flavors of all sorts. I’ve always been like this. Some of this is surely environment. My family comes from a part of South Asia known for its love of bracing and bold sensation. But perhaps I was born this way? There’s a fair amount of evidence that taste has a substantial genetic component. This does not mean genes determine what one tastes, but it certainly opens the door for passive gene-environment correlations. If you do not find a flavor offensive, you are much more likely to explore its depths, and cultivate your palate.

Dost thou dare?
Credit: W.A. Djatmiko

And of course I’m not the only one with a deep interest in such questions. With the marginal income available to us many Americans have become “foodies,” searching for flavor bursts and novelties which their ancestors might never have been able to comprehend. More deeply in a philosophical sense the question of qualia reemerges if there is a predictable degree of inter-subjectivity in taste perception (OK, qualia is always there, though scientific sorts tend to view it as intractable in a fundamental sense).


But there’s heritability, and then there’s genes. We know that perception in some ways is heritable, but what is perhaps more interesting is if you can peg a specific genomic location to it. Then the evolutionary story becomes all the richer. And so it is with the locus TAS2R16, where a nonsynonymous mutation at location 516 seems to result in heightened sensitivity to bitter tastes. More specifically, it’s rs846664, and the derived T allele is fixed outside of Africa, while the ancestral G allele still segregates at appreciable fractions within African populations. A new paper in Molecular Biology and Evolution puts this locus under a microscope, though it does not come up with any clear conclusions. Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa presents some interesting findings. First, let’s look at the distribution of the variation in their sample populations at the SNP of most particular interest:

Region                 Population           Frequency of G (T516G)
Outside of Africa      Non-Africans         0.000
Ethiopia               Semitic              0.059
Tanzania               Sandawe              0.083
Ethiopia               Omotic               0.093
Ethiopia               Cushitic             0.095
Tanzania               Iraqw                0.111
West Central Africa    Fulani               0.114
Kenya                  Niger-Kordofanian    0.133
Ethiopia               Nilo-Saharan         0.156
Kenya                  Afroasiatic          0.162
West Central Africa    Niger-Kordofanian    0.214
Kenya                  Nilo-Saharan         0.225
Kenya                  Luo                  0.250
Central Africa         Niger-Kordofanian    0.329
Tanzania               Hadza                0.333
Central Africa         Bulala               0.361
Central Africa         Nilo-Saharan         0.367
West Central Africa    Afroasiatic          0.462
West Central Africa    Nilo-Saharan         0.500

As you can see T is fixed outside of Africa, and varies across many African populations. Previous work implied this, though coverage within Africa was not good. One thing to observe though is that the frequency of T within Africa cannot be explained by recent Eurasian admixture. The frequency is way too high for that to be the sole explanation, and in any case there is no evidence that ~33% of the Hadza’s ancestry is of Eurasian provenance (the Hadza being one of the three major groups of African hunter-gatherers, along with the Bushmen and Pygmies).

Within the paper the authors resequenced ~1,000 base pairs across diverse African populations in an exonic region of this gene (the stuff that codes for amino acids). What they discovered is that of the SNPs segregating, 516 in particular was critical toward effecting phenotypic change. Not only did individuals with the T variant notably exhibit stronger bitter sensitivity, but in vitro expression with a reporter was elevated. Because they had such a densely sequenced genomic region they could perform various nucleotide based tests to detect natural selection, and attempt coalescent models to infer genealogical history.

I’m going to spare you some of the gory details at this point. Here’s what they found. First, it does look like the region is under natural selection in many African populations, in particular, the derived haplotype with T at 516 at the center. But this result is not reproduced across all tests. The coalescent simulations make clear why: the mutation is an old variant with deep roots in the hominin lineage. In other words this variation pre-dates H. sapiens. It looks like the T allele has rapidly increased in frequency relatively recently, though more on the order of ~50,000 years, rather than ~10,000.* Basically around the time of the “Out of Africa” event. Additionally, there’s a tell-tale sign that this is being subject to selection within Africa: the genetic differences across populations at TAS2R16 far exceed the genome-wide values (the Fst at this locus is in the top 1% of loci within the African genome). Finally, one should note that the G allele haplotypes seem to be much more strongly constrained, as if they’re under purifying selection. This means that the switch to T is not all gain.
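To make the Fst comparison concrete, here is a rough sketch of how population differentiation at a single biallelic SNP can be computed from allele frequencies alone. It uses the textbook form Fst = (H_T - H_S) / H_T with every population weighted equally and no sample-size correction, which is an assumption made for simplicity and almost certainly not the estimator used in the paper. The three frequencies are G allele frequencies copied from the table above; the function names are hypothetical.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Wright's Fst = (H_T - H_S) / H_T, all populations weighted equally.

    freqs holds one allele frequency per population for a single SNP.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = expected_het(p_bar)  # heterozygosity of the pooled population
    h_within = sum(expected_het(p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # locus is fixed everywhere, nothing to partition
    return (h_total - h_within) / h_total


# Frequency of G at T516G in three of the sampled populations.
g_freq = {
    "Ethiopia Semitic": 0.059,
    "Tanzania Hadza": 0.333,
    "West Central Africa Nilo-Saharan": 0.500,
}

print(fst(list(g_freq.values())))
```

The paper's claim is about where this locus falls relative to the rest of the genome, so a single number like this only means something once it is compared against the same calculation at many other loci.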

At this point you may be ready for a story about how some African populations, like Eurasians, underwent a lifestyle change, and diet changes resulted in a shift in sensory perception. That does not seem to be the story. Rather, the authors did not seem to be able to agree upon a neat explanation for what is driving these recent sweeps up from ancient standing genetic variation. They do observe that the variation does tend to cluster geographically, more so than the genome-wide results would imply. There’s likely some adaptation going on; they simply don’t know what. In the introduction and elsewhere you can see that variation at TAS2R16 does correlate with other traits. Not too surprising due to the relative ubiquity of pleiotropy; one gene with many effects.

Stepping outside of the implications of this specific result, let’s think about what might be a takeaway: something as essential as taste perception might be a side effect of other aspects of evolutionary processes. In other words, we don’t know what the phenotypic target of selection is in this case, but we do have a good handle on one of the major side effects, which is sensory perception. How one tastes seems like a big deal.** And there have been many theories propounded that variation in bitter sensitivity is due to adaptation to poisonous plants and such, but really no one knew, and that was just the most plausible of low hanging fruit. With these results from Africa, where there is more variation in the trait and genes, and good geographic coverage, that seems to be an implausible model to adhere to (one would think the hunter-gatherer Hadza would exhibit the most sensitivity, no?). Many of the traits and tendencies which we humans see as fundamental, essential, and of great import, may actually be side effects of powerful evolutionary forces hammering at the genetic-correlation matrices which define the hidden network of co-dependencies within the genome. So there, I said it. Life is an accident. Enjoy it.

Citation: Campbell, Michael C., et al. “Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa.” Molecular biology and evolution (2013): mst211.

* If it was closer to ~10,000 I think haplotype based tests would come back with something, but they do not.

** Some Epicureans might be accused of reducing the good to taste!

The post The color of life as a coincidence appeared first on Gene Expression.

November 8, 2013

Selection happens; but where, when, and why?

Filed under: Anthroplogy,Genetics,Genomics,Pigmentation — Razib Khan @ 1:49 am
Distribution of SLC24A5 variation at SNP rs1426654. Credit: HGDP Browser

Nina Davuluri, Miss America 2014, Credit: Andy Jones

One of the secondary issues which cropped up with Nina Davuluri winning Miss America is that it seems implausible that someone with her complexion would be able to win any Indian beauty contest. A quick skim of Google images for “Miss India” will make clear the reality that I’m alluding to. The Indian beauty ideal, especially for females, is skewed to the lighter end of the complexion distribution of native South Asians. Nina Davuluri herself is not particularly dark skinned if you compare her to the average South Asian; in fact she is likely at the median. But it would be surprising to see a woman who looks like her held up as conventionally beautiful in the mainstream Indian media. When I’ve pointed this peculiar aspect out to Indians* some of them will submit that there are dark skinned female celebrities, but when I look up the actresses in question they are invariably not very dark skinned, though perhaps by comparison to what is the norm in that industry they may be. But whatever the cultural reality is, the fraught relationship of color variation to aesthetic variation prompts us to ask, why are South Asians so diverse in their complexions in the first place? A new paper in PLoS Genetics, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, explores this genetic question in depth.

Much of the low hanging fruit in this area was picked years ago. A few large effect genetic variants which are known to be polymorphic across many populations in Western Eurasia segregate within South Asian populations. What this means in plainer language is that a few genes which cause major changes in phenotype are floating around in alternative flavors even within families among people of Indian subcontinental origin. Ergo, you can see huge differences between full siblings in complexion (African Americans, as an admixed population, are analogous). While loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.


So what’s the angle on this paper you may ask? Two things. The first is that it has excellent coverage of South Asian populations. This matters because to understand variation in complexion you should probably look at populations which vary a great deal. Much of the previous work has focused on populations at the extremes of the human distribution, Africans and Europeans. There are obvious limitations to this approach. If you are looking at variant traits, then focusing on populations where the full range of variation is expressed can be useful. Second, this paper digs deeply into the subtle evolutionary and phylogenomic questions which are posed by the diversification of human pigmentation. It is often said that race is only skin deep, as if to dismiss the importance of human biological variation. But skin is a rather big deal. It’s our biggest organ, and the pigmentation loci do seem to be rather peculiar.

You probably know that on the order of ~20% of genetic variation is partitioned between continental populations (races). But this is not the case at all genes. And pigmentation ones tend to be particularly notable exceptions to the rule. In late 2005 a paper was published which arguably ushered in the era of modern pigmentation genomics, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. The authors found that one nonsynonymous mutation was responsible for on the order of 25 to 33% of the skin color difference between Africans and Europeans. And, the allele frequency was nearly disjoint across the two populations, and between Europeans and East Asians. When comparing Europeans to Africans and East Asians almost all the variation was partitioned across the populations, with very little within them. The derived SNP, which differs from the ancestral state, is found at ~100% frequency in Europeans, and ~0% in Africans and East Asians. It is often stated (you can Google it!) that this variant is the second most ancestry informative allele in the human genome in relation to Europeans vs. Africans.

SLC24A5 was just the beginning. SLC45A2, TYR, OCA2, and KITLG are just some of the alphabet soup of loci which have come to be understood to affect normal human variation in pigmentation. Despite the relatively large roll call of pigmentation genes one can safely say that between any two reasonably distinct geographic populations ~90 percent of the between population variation in the trait is going to be due to ~10 genes. Often there is a power law distribution as well. The first few genes of large effect account for over 50% of the variance, while subsequent loci are progressively less important.

So how does this work to push the overall results forward?

- With their population coverage the authors confirm that SLC24A5 seems to be polymorphic in all Indo-European and Dravidian speaking populations in the subcontinent. The frequency of the derived variant ranges from ~90% in the Northwest, and ~80% in Brahmin populations all over the subcontinent, to ~10-20% in some tribal groups.

- Though there is a north-south gradient, it is modest, with a correlation of ~0.25. There is a much stronger correlation with longitude, but I’m rather sure that this is an artifact of their low sampling of Indo-European populations in the eastern Gangetic plain. As hinted in the piece the correlation with longitude has to do with the fact that Tibetan and Burman populations in these fringe regions tend to lack the West Eurasian allele.

- Using haplotype based tests of natural selection the authors infer that the frequency of this allele has been driven up by positive selection in north, but not south, India. It could be that the authors lack power to detect selection in the south because of lower frequency of the derived allele. And, I did wonder if selection in the north was simply an echo of what occurred in West Eurasia. But if you look at the frequency of the A allele in the north most of the populations seem to have a higher frequency of the derived variant than they do of inferred “Ancestral North Indian” ancestry.

What’s perhaps more interesting is the bigger picture of human evolutionary dynamics and phylogenetics that these results illuminate. Resequencing the region around SLC24A5 these researchers confirmed it does look like the derived variant is identical by descent in all populations across Western Eurasia and into South Asia. What this means is that this mutation arose in someone at some point around the Last Glacial Maximum, after West Eurasians separated from East Eurasians. The authors give some numbers using some standard phylogenetic techniques, but admit that it is ancient DNA that will give true clarity on the deeper questions. When I see something written like that my hunch, and hope, is that more papers are coming soon.

When I first read The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, I thought that it was essential to read Ancient DNA Links Native Americans With Europe and Efficient moment-based inference of admixture parameters and sources of gene flow. The reason goes back to the plot which I generated at the top of this post: notice that Native Americans do not carry the West Eurasian variant of SLC24A5. What the find of the ~24,000 year old Siberian boy, and his ancient DNA, suggests is that there was a population with affinities closer to West Eurasians than East Eurasians that contributed to the ancestry of Native Americans. The lack of the European variant of SLC24A5 in Native Americans suggests to me that the sweep had not begun, or, that the European variant was disfavored. What the other paper reports is that on the order of 20-40% of the ancestry of Europeans may be derived from an ancient North Eurasian population, unrelated to West Eurasians (or at least not closely related). It is likely that this population has something to do with the Siberian boy. Since Europeans are fixed for the derived variant of SLC24A5, that implies to me that the sweep must have occurred after 24,000 years ago.
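The arithmetic behind that last inference can be sketched roughly as follows. The assumptions are deliberately extreme and illustrative, not taken from either paper: the ancient North Eurasian source is treated as carrying none of the derived SLC24A5 allele (as the Native American data hint), the remaining ancestors of Europeans are treated as already fixed for it, and drift and selection after mixing are ignored. The function name is hypothetical.

```python
def admixed_frequency(alpha, p_source, p_other):
    """Allele frequency right after mixing: alpha*p_source + (1 - alpha)*p_other.

    alpha is the ancestry fraction contributed by the first source.
    """
    return alpha * p_source + (1.0 - alpha) * p_other


# Assumed frequencies of the derived allele in the two ancestral groups.
p_north_eurasian = 0.0  # assumption: allele absent, as in Native Americans
p_other_ancestors = 1.0  # assumption: allele already fixed

# The 20-40% range is the ancestry estimate quoted in the text.
for alpha in (0.2, 0.4):
    p = admixed_frequency(alpha, p_north_eurasian, p_other_ancestors)
    print(f"North Eurasian ancestry {alpha:.0%}: derived allele at about {p:.0%}")
```

Under these assumptions the derived allele would sit somewhere around 60 to 80 percent right after admixture. Getting from there to the near fixation seen in Europeans today requires something to push the frequency up after the populations mixed, which is what a later sweep would do.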

At this point I have to admit that I believe we need to be careful calling this a “European variant.” Just because it is nearly fixed in Europe does not imply that the variant arose in Europe. If you look at the frequency of the derived variant you see it is rather high in the northern Middle East. Looking at some of the populations in the Middle Eastern panel, the ancestral variant might be entirely explained by admixture in historical time from Africa. If the sweep began during the last Ice Age, then most of Europe would have been uninhabited. The modern distribution is informative, but it surely does not tell the whole story.

Where we are is that SLC24A5, and pigmentation as a whole, is coming to be genomically characterized fully. We don’t know the whole story of why light skin was selected so strongly. And we don’t quite know where the selection began, and when it began. But through gradually filling in pieces of the puzzle we may come to grips with this adaptively significant trait in the near future.

Citation: Basu Mallick C, Iliescu FM, Möls M, Hill S, Tamang R, et al. (2013) The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. PLoS Genet 9(11): e1003912. doi:10.1371/journal.pgen.1003912

* From my personal experience American born Indians often do not share the same prejudices and biases, partly because subtle shades of brown which are relevant in the Indian context seem ludicrous in the United States.

The post Selection happens; but where, when, and why? appeared first on Gene Expression.

December 18, 2012

Buddy, can you spare some ascertainment?

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility of and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only >100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and they look rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. 100-200,000 SNPs is also sufficient to elucidate relationships even in genetically homogeneous regions such as Europe in my experience (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many fold greater. On the other hand, Geno 2.0 may be more useful on the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

December 12, 2012

A lighter shade of brown: the Dan MacArthur chronicles, not a Romani

Filed under: Anthroplogy,Daniel MacArthur,Human Genetics,Human Genomics — Razib Khan @ 9:25 am

Pakistani honor guard

A few days ago I suggested that Dr. Daniel MacArthur might have South Asian ancestry. Now, when confronted with surprise the best option is to stick with your prior assumption, unless that surprise is powerful enough for you to “update” your model. After a few days of further analysis I will update: I do think Dan MacArthur has South Asian ancestry. Dienekes dug further, and noticed that there are hallmarks of “Ancestral South Indian” ancestry along the first 2/3 or so of chromosome 10. Now, you do have to remember that this genomic region is only half South Asian. The other half is European.

But in any case, one question that some people brought up: perhaps MacArthur has Romani heritage? I’m skeptical of this partly because:

1) there weren’t that many Romani in Britain in the 19th century

2) The British Romani are already very highly admixed

Another friend, who is a population genomicist himself, expressed some skepticism that such a long segment wasn’t broken up by recombination over the generations. My only moderately informed answer is this: we’d only notice the long segments, because if a very small region of ‘exotic’ ancestry was embedded within the dominant ancestral component it probably would not show up on some of these tests (or, we’d assume it was noise). Dan has another segment of South Asian ancestry, but much smaller in size. It may be there are other regions which we could find if we used better reference populations.

Here’s what I tentatively want to do with Dan’s data now. First, take the 80 Mb or so which has South Asian ancestry, and phase it. That way I’d have a South Asian chromosome and a European one, and we could look for matches for only the South Asian one. But being busy I didn’t have time to do this. What I did have time to do was reduce the chromosomal region under consideration, and then run an IBS distance analysis in a private data set I have. This is a crude, but not always uninformative analysis. But by looking at the relationships I can now conclude that Dan MacArthur probably does not have Romani ancestry. Why? Because the Romani are of Northwest Indian heritage, and MacArthur’s match pattern using the diploid genotype (so South Asian + European) does not match what I expect would emerge from such a combination.
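For readers who have not run one, here is a minimal sketch of what an IBS (identity-by-state) distance between two individuals amounts to. It assumes genotypes coded as 0/1/2 counts of one allele per SNP, with `None` for a missing call; the function names are hypothetical and this is not the exact pipeline behind the numbers in this post. The `standardize` helper rescales distances so the largest is 100, which appears consistent with the "standardized distance" column in the table further down, though that reading is also an assumption.

```python
from typing import List, Optional, Sequence


def ibs_distance(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Return 1 minus the proportion of alleles shared identical-by-state.

    Genotypes are allele counts (0, 1 or 2) per SNP; None marks a missing
    call, and SNPs missing in either individual are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return 1.0 - shared / compared


def standardize(distances: Sequence[float]) -> List[float]:
    """Rescale distances so that the largest one equals 100 (assumed scheme)."""
    top = max(distances)
    return [100.0 * d / top for d in distances]
```

Because the comparison is on diploid genotypes, a region that is half South Asian and half European gets scored as a blend, which is why this sort of measure is crude compared to matching on phased haplotypes.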

The full table is below, but to me the fact that he has so many matches with Northwest Indian populations is evidence that his ancestry was not Northwest Indian. Otherwise, he would be matching Utah whites (CEU samples) more often. Rather, someone with a mix of more conventional South Asian ancestry and European ancestry often resembles some of the less South Asian populations of South Asia (e.g., Brahui) in these crude measures. In fact, one of the closest matches to Dan’s IBS profile is that of my own mother. She is a rather vanilla ethnic Bengali, so I think there is a strong chance that his Indian ancestry is similar. This weak genetic data isn’t really the primary reason. The British East India Company operated out of Bengal for much of its history, and there are simply a lot of Bengalis.

There’s a lot more that can be done here. Since I don’t have time, here’s the pedigree file if anyone wants to play with it (Dan is DGM001).

Population Genetic distance from Dan Standardized distance
Brahui 0.253 81.268
Burusho 0.257 82.736
Razib’s Mother 0.258 82.783
CEU 0.258 82.993
Burusho 0.258 83.024
CEU 0.26 83.547
Sakilli 0.26 83.555
Brahui 0.261 83.831
Brahui 0.261 83.857
GIH 0.261 83.955
CEU 0.261 83.972
CEU 0.261 83.985
CEU 0.262 84.043
North Kannadi 0.262 84.169
CEU 0.262 84.207
CEU 0.262 84.318
CEU 0.262 84.33
CEU 0.263 84.391
Paniya 0.263 84.408
CEU 0.263 84.437
CEU 0.263 84.445
CEU 0.263 84.488
CEU 0.263 84.606
CEU 0.263 84.609
CEU 0.264 84.691
Brahui 0.264 84.709
CEU 0.264 84.752
CEU 0.264 84.764
Brahui 0.264 84.822
GIH 0.264 84.826
Burusho 0.264 84.841
CEU 0.264 84.898
CEU 0.264 84.975
North Kannadi 0.264 84.992
CEU 0.265 85.087
Paniya 0.265 85.212
CEU 0.265 85.226
CEU 0.265 85.25
CEU 0.265 85.25
CEU 0.265 85.278
CEU 0.265 85.299
North Kannadi 0.265 85.3
Burusho 0.265 85.309
Burusho 0.266 85.328
CEU 0.266 85.363
CEU 0.266 85.409
North Kannadi 0.266 85.412
CEU 0.266 85.436
Burusho 0.266 85.446
Bene Israel 0.266 85.508
CEU 0.266 85.521
GIH 0.266 85.618
GIH 0.267 85.661
CEU 0.267 85.696
CEU 0.267 85.722
CEU 0.267 85.732
Brahui 0.267 85.777
GIH 0.267 85.793
CEU 0.267 85.799
CEU 0.267 85.816
Cochin Jews 0.267 85.85
CEU 0.267 85.943
Brahui 0.268 85.996
CEU 0.268 86.005
Cochin Jews 0.268 86.011
CEU 0.268 86.08
CEU 0.268 86.115
CEU 0.268 86.18
GIH 0.268 86.229
Cochin Jews 0.268 86.234
CEU 0.268 86.244
Burusho 0.268 86.265
CEU 0.268 86.277
CEU 0.268 86.278
CEU 0.269 86.288
CEU 0.269 86.291
CEU 0.269 86.318
CEU 0.269 86.325
CEU 0.269 86.326
GIH 0.269 86.327
CEU 0.269 86.329
CEU 0.269 86.354
CEU 0.269 86.387
CEU 0.269 86.463
CEU 0.269 86.515
CEU 0.269 86.517
CEU 0.269 86.55
CEU 0.27 86.609
Paniya 0.27 86.682
CEU 0.27 86.687
CEU 0.27 86.696
CEU 0.27 86.717
CEU 0.27 86.733
Sakilli 0.27 86.74
CEU 0.27 86.866
Malayan 0.27 86.879
North Kannadi 0.27 86.883
CEU 0.271 86.937
Brahui 0.271 86.952
Burusho 0.271 86.956
CEU 0.271 86.957
CEU 0.271 86.977
North Kannadi 0.271 86.995
GIH 0.271 87.018
CEU 0.271 87.042
CEU 0.271 87.066
CEU 0.271 87.07
Brahui 0.271 87.09
Bene Israel 0.271 87.094
Sakilli 0.271 87.141
CEU 0.271 87.2
CEU 0.271 87.24
North Kannadi 0.272 87.253
CEU 0.272 87.297
Burusho 0.272 87.307
CEU 0.272 87.327
GIH 0.272 87.353
CEU 0.272 87.355
Cochin Jews 0.272 87.381
CEU 0.272 87.384
CEU 0.272 87.5
CEU 0.272 87.535
CEU 0.273 87.594
Malayan 0.273 87.676
CEU 0.273 87.702
CEU 0.273 87.741
Burusho 0.273 87.806
CEU 0.273 87.846
Cambodians 0.274 87.932
North Kannadi 0.274 87.951
CEU 0.274 87.951
Burusho 0.274 88.03
CEU 0.274 88.047
CEU 0.274 88.081
CEU 0.274 88.089
CEU 0.274 88.101
CEU 0.274 88.179
CEU 0.274 88.19
North Kannadi 0.275 88.243
CEU 0.275 88.32
GIH 0.275 88.325
CEU 0.275 88.349
Brahui 0.275 88.393
CEU 0.275 88.402
CEU 0.275 88.457
Bene Israel 0.276 88.552
CEU 0.276 88.577
CEU 0.276 88.603
CEU 0.276 88.647
CEU 0.276 88.7
CEU 0.276 88.729
CEU 0.276 88.814
CEU 0.276 88.85
Brahui 0.276 88.855
CEU 0.277 88.923
GIH 0.277 88.99
Paniya 0.277 89.082
CEU 0.277 89.118
CEU 0.277 89.15
CEU 0.277 89.151
CEU 0.277 89.17
CEU 0.278 89.184
Cambodians 0.278 89.208
Cambodians 0.278 89.233
Cambodians 0.278 89.383
CEU 0.278 89.45
CEU 0.278 89.493
Cambodians 0.279 89.522
CEU 0.279 89.595
CEU 0.279 89.679
CEU 0.279 89.753
CEU 0.279 89.762
CEU 0.279 89.807
Cambodians 0.28 89.942
GIH 0.28 90.085
CEU 0.281 90.178
Brahui 0.281 90.364
Cambodians 0.282 90.543
Cambodians 0.282 90.559
Cambodians 0.282 90.77
Cambodians 0.283 90.898
CEU 0.283 90.956
CEU 0.284 91.316
CHD 0.289 92.952
Sakilli 0.29 93.103
Bene Israel 0.29 93.122
CHD 0.291 93.619
CHD 0.291 93.663
CHD 0.293 94.125
CHD 0.293 94.248
CHD 0.294 94.451
CHD 0.294 94.629
CHD 0.296 94.965
CHD 0.296 95.279
Yorubas 0.297 95.298
CHD 0.297 95.368
CHD 0.297 95.438
CHD 0.297 95.441
Yorubas 0.297 95.567
CHD 0.298 95.678
CHD 0.298 95.828
CHD 0.299 96.032
CHD 0.299 96.127
CHD 0.3 96.349
CHD 0.3 96.403
CHD 0.3 96.443
CHD 0.3 96.508
CHD 0.3 96.523
CHD 0.3 96.533
CHD 0.301 96.575
CHD 0.301 96.598
CHD 0.301 96.624
CHD 0.301 96.625
CHD 0.301 96.738
CHD 0.301 96.758
CHD 0.301 96.869
Yorubas 0.302 97.106
CHD 0.303 97.37
CHD 0.303 97.41
Yorubas 0.304 97.681
CHD 0.304 97.713
CHD 0.304 97.747
Yorubas 0.304 97.829
CHD 0.304 97.838
CHD 0.305 98.106
CHD 0.306 98.309
Yorubas 0.307 98.499
CHD 0.307 98.546
CHD 0.307 98.547
CHD 0.307 98.606
CHD 0.307 98.764
CHD 0.307 98.78
CHD 0.307 98.803
Yorubas 0.308 98.947
Yorubas 0.308 99.03
Yorubas 0.309 99.411
Yorubas 0.309 99.417
CHD 0.309 99.452
CHD 0.31 99.624
Yorubas 0.311 100

December 2, 2012

TreeMix: Who were the West Eurasian ancestors of Ethiopians?

Filed under: Anthroplogy,Ethiopia,Genetics,Genomics — Razib Khan @ 3:46 pm

One of the primary concerns/questions I had about Luca Pagani’s paper on the genetic origin of Ethiopians is that he found that their West Eurasian ancestor was closer to Levantine than Arabian. I was confused by this because on model-based clustering (e.g., Admixture) when you push down to a fine level of granularity you always see that the Ethiopians cluster with the Yemenis for their non-African ancestry. More precisely, Yemeni Jews are often ~100% component X, which is ~50% of the ancestry of Ethiopians.

From what I recall Pagani et al. used haplotype windows which they assigned to Eurasian or African ancestral components, and they compared these to the populations related to the putative ancestral groups. Because Pagani et al. used blocks of the genome, rather than just specific genotypes, I weight their finding more strongly. But I wanted to double check with TreeMix if the finding in Admixture was peculiar.

So again, I took a ~150,000 SNP set and ran it on TreeMix with migration = 5.

Again, you see that the gene flow to the Ethiopians is coming from a position on the tree rather close to Yemenite Jews. One model which may explain this, and still align with Pagani’s findings, is that Arabians themselves are a synthetic population. A “pure” Yemenite Jew may have ancient admixture of African affinity beneath an intrusive element from the north. The parallelism between Ethiopia and Arabia in this model is clear, with the major difference being magnitude of the source population admixture (greater in Arabia), as well as some differences of the target population.

This again reminds us to be careful about trusting first-blush summaries.

Layering genetic histories

Filed under: Anthroplogy,Genetics,Genomics,Human Genetics,Human Genomics — Razib Khan @ 12:14 pm

As a follow up to my post from yesterday, I decided to run TreeMix on a data set I happened to have had on hand (see Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data for more on TreeMix). Basically I wanted to display a tree with, and without, gene flow.

The technical details are straightforward. I LD pruned ~550,000 SNPs down to ~150,000. I ran TreeMix without and with migration parameters with the Bantu Kenya population being the root. Finally, when I did turn on the migration parameter I set it for 5. You can see the results below.

Most of the flows are pretty expected. The West Eurasian flow from the Turks to the Uygurs makes sense, because there is a large West Asian component to what the Uygurs have (from East Iranians?). The Chuvash are a Turkic group with minor, but significant, Turkic component. The HGDP Russian sample does have some East Eurasian ancestry. And the Moroccans also have African ancestry. But your guess is as good as mine with the Bantu flow in. These are, I think, from Kenya, so it might be trying to interpret Nilotic admixture as generalized Eurasian.

A minor note: installing TreeMix and generating the appropriate files from pedigree format is not too difficult. But you might have confusion in how to generate the pedigree input file. You do it like so in PLINK:

./plink --noweb --bfile YourFile --freq --within YourGroupNamesFile --out YourOutPutFile

It’s the last of these, the output file, that you want to put into TreeMix’s python conversion script. The YourGroupNamesFile is basically the .fam file with an extra column, the population names for each individual.
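If you need to build the group names file from scratch, here is a rough sketch in Python. It assumes the layout PLINK documents for cluster files passed to `--within`, one row per individual with family ID, individual ID, and cluster name, which may differ in detail from the file I actually used. The function name and the `population_of` mapping are hypothetical; spaces in population names are replaced with underscores so that each row stays at three whitespace-separated fields.

```python
from typing import Dict, Iterable, List


def make_within_lines(fam_lines: Iterable[str], population_of: Dict[str, str]) -> List[str]:
    """Build 'FID IID cluster' rows from .fam rows and an IID -> population map."""
    rows = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid = fields[0], fields[1]  # first two .fam columns
        label = population_of[iid].replace(" ", "_")
        rows.append(f"{fid} {iid} {label}")
    return rows
```

Writing the returned rows to a text file, one per line, gives something that can be passed as YourGroupNamesFile in the command above.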

December 1, 2012

Africa’s hidden people hold the keys to the past

I mentioned this in passing on my post on ASHG 2012, but it seems useful to make explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. To a great extent in the paper which is forthcoming there is the likely answer to the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture on the order of 2-3,000 years before the present.* If the admixture was so recent we should find many of the “first people,” no? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.

Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were of a quality that I could accept that these populations were not being well modeled by the variation which dominated their data set. Though Cushitic in language the Sanye had far less of the West Eurasian element present among other Cushitic speaking populations of the Horn of Africa. Neither were their African ancestral components quite like that of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).

Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe this is a population which seems to have been hunter-gatherers very recently, and to some extent still engages in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.

* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.

Northern Europeans and Native Americans are not more closely related than previously thought

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS

 

The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We are both ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
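To make the contrast concrete, here is a toy sketch in Python. The segment lengths are invented for illustration and are not read off the actual 23andMe paintings, and the function name is hypothetical. It summarizes a painted chromosome pair two ways: the overall fraction of "Asian" ancestry, and the number of places where the ancestry label switches along a chromosome copy.

```python
from typing import List, Tuple

Copy = List[Tuple[str, float]]  # (ancestry label, segment length) along one copy


def summarize_painting(copies: List[Copy]) -> Tuple[float, int]:
    """Return (fraction of length labelled 'Asian', total ancestry switches)."""
    total = sum(length for copy in copies for _, length in copy)
    asian = sum(length for copy in copies for label, length in copy if label == "Asian")
    switches = sum(
        1
        for copy in copies
        for (a, _), (b, _) in zip(copy, copy[1:])
        if a != b  # adjacent segments with different labels
    )
    return asian / total, switches


# Invented examples: one unbroken copy from each parent versus copies
# chopped up by many generations of recombination.
first_generation = [[("Asian", 100)], [("European", 100)]]
old_admixture = [
    [("Asian", 10), ("European", 15), ("Asian", 20),
     ("European", 5), ("Asian", 25), ("European", 25)],
    [("European", 30), ("Asian", 20), ("European", 25), ("Asian", 25)],
]
```

Both toy individuals come out at a fraction of 0.5, but only the second has any switches. The proportion is the summary; the pattern of switches is where the history is hiding.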

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.

Northern Europeans and Native Americans are not more closely related than previously thought

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS

 

The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and on the right is a friend. The orange represents “Asian” ancestry, and the blue represents “European” ancestry. We are each ~50% of each ancestral component. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
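To make the contrast concrete, here is a minimal Python sketch. Everything in it is a toy assumption of mine: the 240 Mb chromosome length, the segment boundaries, and the function names are invented for illustration and are not 23andMe's data or method. It computes the same ~50% summary for two painted individuals, then counts how often the ancestry label switches along each chromosome copy, which is the signal that separates a first-generation mix from an old one.

```python
from typing import List, Tuple

# One painted segment: (start_mb, end_mb, ancestry_label). Toy data only.
Segment = Tuple[float, float, str]


def ancestry_fraction(copies: List[List[Segment]], label: str) -> float:
    """Fraction of the painted length (over both chromosome copies) with this label."""
    total = sum(end - start for hap in copies for start, end, _ in hap)
    match = sum(end - start for hap in copies for start, end, lab in hap if lab == label)
    return match / total


def count_switches(haplotype: List[Segment]) -> int:
    """Number of points along one chromosome copy where the ancestry label changes."""
    labels = [lab for _, _, lab in sorted(haplotype)]
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Hypothetical first-generation individual: one copy from each parent, unbroken.
first_generation = [
    [(0, 240, "Asian")],
    [(0, 240, "European")],
]

# Hypothetical product of old admixture: recombination has chopped up both copies.
old_admixture = [
    [(0, 30, "Asian"), (30, 70, "European"), (70, 120, "Asian"),
     (120, 180, "European"), (180, 240, "Asian")],
    [(0, 50, "European"), (50, 90, "Asian"), (90, 160, "European"),
     (160, 220, "Asian"), (220, 240, "European")],
]

for name, person in [("first generation", first_generation),
                     ("old admixture", old_admixture)]:
    frac = ancestry_fraction(person, "Asian")
    switches = sum(count_switches(hap) for hap in person)
    print(f"{name}: Asian fraction = {frac:.2f}, ancestry switches = {switches}")
```

Both toy individuals come out at half “Asian,” but only the second shows any switches. The summary statistic is identical; the history behind it is not.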

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well-understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this, so it is important that people are not misled as to the details of the implications.

November 17, 2012

The archaeologist, James Fallows, and Neandertals

Filed under: Anthroplogy,Archaeology,Genetics,Neandertals — Razib Khan @ 10:10 pm

A month ago I posted Don’t trust an archaeologist about genetics, don’t trust a geneticist about archaeology, in response to James Fallows’ At 5% Neanderthal, You Are an Outlier. Fallows has now put up a follow-up, The Neanderthal Defense Committee Swings Into Action, where he links to my response post. This prompted the original archaeologist in question to reach out to me via email. I am posting the letter, with their permission, below.

Hello!

I’m dropping an email because I followed a link from Fallows to find my email to him highlighted negatively on your blog. I’ve emailed a few times with Mr. Fallows on various topics and had no idea he was going to post that email – he didn’t ask until after it was already up and so, yes you’re right it was a casual dashed off email and confused two different articles (both of which incidentally I have read so please no more comments on what I may or may not have done). Mea Culpa. And you’re right, I’m not a geneticist – I’m not even a lab scientist. However, I know a heck of a lot about archaeology and I work closely with ...

November 15, 2012

Open Thread, 11-15-2012

Filed under: Anthroplogy,Open Thread — Razib Khan @ 1:58 am

Your cry is heard!

November 11, 2012

The Genographic Project’s Scientific Grants Program

While I was at Spencer Wells’ poster at ASHG I was primarily curious about bar plots. He’s got really good spatial coverage, so I’m moderately excited about the paper (though I didn’t see much explicit testing of phylogenetic hypotheses, which I think this sort of paper has to do now; we’re beyond PCA-and-bar-plots-only papers). That being said, Spencer was more interested in me promoting the Scientific Grants Program. Here’s some more information:

The Genographic Project’s Scientific Grants Program awards grants on a rolling basis for projects that focus on studying the history of the human species utilizing innovative anthropological genetic tools. The variety of projects supported by the scientific grants will aim to construct our ancient migratory and demographic history while developing a better understanding of the phylogeographic structure of world populations. Sample research topics could include subjects like the origin and spread of the Indo-European languages, genetic insights into Papua New Guinea’s high linguistic diversity, the number and routes of migrations out of Africa, the origin of the Inca, or the genetic impact of the spread of maize agriculture in the Americas.

Recipients will typically be population geneticists, students, linguists, and other researchers or scientists interested ...

Reflections on the evolution at ASHG 2012

As most readers know I was at ASHG 2012. I’m going to divide this post in half. First, the generalities of the meeting. And second, specific posters, etc.

Generalities:

- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if HR thought it was smart to hire guys (the d-bags) who thought it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!

- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.

- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate ...

November 4, 2012

Honey Boo Boo, Jersey Shore, and the Sistine Chapel

Filed under: Anthroplogy,Honey Boo Boo,Jersey Shore,Sistine Chapel — Razib Khan @ 10:32 pm

Scott Jackisch (a.k.a., “Oakland Futurist Guy”) has a post up with the title, Jersey Shore is better than cat burning. Provocative? Yes. Timely? No. Jersey Shore is so 2010. We live in the age of Honey Boo Boo. This is clear from Google Trends. The red line represents searches for Honey Boo Boo, and the blue for Jersey Shore.

Mama June farting up a storm is still superior to cat burning. But about the Sistine Chapel, I’ll leave you with Mr. Jackisch’s ruminations on that and genital mutilation:

Take genital mutilation. That’s cultural. I place it right along side of the Sistine Chapel as an example of culture. Most of the Mills Students agreed that we need to take the good and leave the bad behind in regard to the old cultures. But I wonder how divisible cultural artifacts truly are. Is the Sistine Chapel integrally linked to oppression and Inquisition? Can the beauty really be expunged of the horrors that funded it and the message it inheres? Some things were lost with the passing of Culture. Some horrible things along with the great.

The question of modularity and contingency ...

October 31, 2012

R1a1a conquers the world…in a few pulses?

Filed under: Anthroplogy,R1a1a — Razib Khan @ 1:52 am

As many of you know, around the year 2000 the analyses of Y chromosomal human lineages became a pretty big deal. The reason these lineages are important and useful is that they record the uninterrupted ancestry of males, from father to son, along the Y chromosome. Instead of the complexities of the whole genome, as with mtDNA you have a simple and elegant phylogenetic tree to interpret. The clusters along this tree are defined as broad haplogroups, united by derived states from a common ancestor. One of the largest haplogroups is R1a1a. It happens to be my paternal lineage, as well as Dr. Daniel MacArthur’s and Dr. Zack Ajmal’s.

The map above illustrates the peculiarity of R1a1a: it is geographically enormously expansive. How to explain this distribution? A naive response might be that this distribution is surprisingly similar to that of the Indo-European languages. Unfortunately this runs up against the conundrum that low caste South Indian groups, relatively untouched by Indo-Aryan culture (at least until the past few hundred years), also manifest high frequencies of R1a1a.

To make a long story short it seems that R1a1a is an ...

October 23, 2012

Why chimpanzees can donate blood in movies

There is a high likelihood that you know which ABO blood group you belong to. I am A. My daughter is A. My father is B. My mother is A. I have siblings who are A, O, B, and AB. The inheritance is roughly Mendelian, with O being “recessive” to A and B (which are co-dominant with each other, ergo, AB). It is also generally common knowledge that O is a “universal donor,” while A and B can only give to individuals within their respective blood group and AB.
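The inheritance and donation rules above are simple enough to sketch in a few lines of Python. The function names are mine, and the parental genotypes (AO for a type A mother, BO for a type B father) are what the simple Mendelian model implies for a family with a type O child; they are an inference, not a lab result. The donation check covers ABO only and ignores Rh and every other antigen.

```python
from itertools import product


def phenotype(genotype: str) -> str:
    """ABO blood group for a two-allele genotype such as 'AO', 'BB' or 'AB'."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"                      # O shows up only when both alleles are O
    if {"A", "B"} <= alleles:
        return "AB"                     # A and B are co-dominant
    return "A" if "A" in alleles else "B"


def offspring_phenotypes(parent1: str, parent2: str) -> set:
    """All blood groups possible among children of two parental genotypes."""
    return {phenotype(a + b) for a, b in product(parent1, parent2)}


def can_donate(donor: str, recipient: str) -> bool:
    """ABO-only red cell rule: the donor must carry no antigen the recipient lacks."""
    donor_antigens = set(donor) - {"O"}
    recipient_antigens = set(recipient) - {"O"}
    return donor_antigens <= recipient_antigens


# A type A mother (assumed AO) and a type B father (assumed BO).
print(sorted(offspring_phenotypes("AO", "BO")))

# Which groups can each group give to, under the ABO-only rule?
for donor in ["O", "A", "B", "AB"]:
    recipients = [r for r in ["O", "A", "B", "AB"] if can_donate(donor, r)]
    print(donor, "->", recipients)
```

An AO by BO cross can produce all four groups, which is exactly the spread among my siblings.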

Because ABO was easy to assay it was one of the earliest Mendelian markers utilized in human genetics. In the first half of the 20th century while some anthropologists were measuring skulls, others were mapping out the frequency of A, B, and O. Today with much more robust genetic methods ABO has lost its old luster as a genetic marker, especially since there is a strong suspicion that the variants are strongly shaped by natural selection. This makes them only marginally useful for systematics, which relies upon loci which are honest mirrors of demographic history.


But there’s another aspect of ...

October 21, 2012

The arcane art of ancient admixture

Filed under: Anthroplogy,Human Evolution,Human Genetics — Razib Khan @ 9:29 pm


I have mentioned the PLoS Genetics paper, The Date of Interbreeding between Neandertals and Modern Humans, before because a version of it was put up on arXiv. The final paper has a few additions. For example, it mentions the generally panned (at least in the circles I run in) PNAS paper which suggested that ancient population structure could produce the same patterns which were earlier used to infer admixture with Neandertals (the authors also point to Yang et al. as support for the proposition of admixture rather than structure). The primary result, dating the admixture between Neandertals and anatomically modern humans ~40-80,000 years before the present, is reiterated.

An interesting aspect is that their method is to utilize linkage disequilibrium (LD) decay. It’s interesting because tens of thousands of years is a hell of a long time to be able to detect an admixture event via LD! In particular because there’s likely a palimpsest effect where there are intervening admixtures and other assorted demographic events (e.g., bottlenecks and selective sweeps can also generate LD). So how’d they do it? Basically the authors figured out a way to ...
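For intuition only, here is the textbook version of dating by LD decay. This is not the authors' method, which has to cope with the complications listed above. Under a simple single-pulse model, admixture LD between markers separated by a genetic distance d (in Morgans) is expected to fall off roughly as exp(-t·d) after t generations, so the slope of log(LD) against distance estimates the time since admixture. The function name, the synthetic data, and the 29-year generation time are all assumptions of mine.

```python
import math


def fit_admixture_time(distances_morgans, ld_values):
    """Estimate generations since admixture from an assumed exp(-t * d) LD decay.

    Uses ordinary least squares on log(LD) versus genetic distance;
    the fitted slope is -t under the single-pulse model.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic, noise-free data generated from the model itself (not real LD measurements).
true_generations = 2000
distances = [0.0002 * k for k in range(1, 11)]          # 0.02 to 0.2 cM
ld = [0.1 * math.exp(-true_generations * d) for d in distances]

t_hat = fit_admixture_time(distances, ld)
years_per_generation = 29                                # assumed generation time
print(f"estimated generations: {t_hat:.1f}")
print(f"estimated years: {t_hat * years_per_generation:.0f}")
```

With real data the curve is noisy and contaminated by other sources of LD, which is why the short-distance signal that survives tens of thousands of years is so hard to read.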
