Razib Khan One-stop-shopping for all of my content

January 27, 2012

Out of Africa and out of Siberia

The latest edition of The American Journal of Human Genetics has two papers using “old fashioned” uniparental markers to trace human migration out of Africa and out of Siberia, respectively. I say old fashioned because the peak novelty of these techniques was around 10 years ago, before dense autosomal SNP marker analyses, let alone whole genome sequencing. But mtDNA, passed down the maternal line, and Y chromosomes, passed from father to son, are still useful. Prosaically, they’re useful because after nearly 20 years of surveying populations the data sets for these markers are now very large. More technically, because these two regions of the genome do not recombine, they lend themselves to excellent representation as a tree phylogeny. Finally, mtDNA is particularly amenable to molecular clock estimates: it has a region with a higher mutational rate, so you can sample a larger range of variation over a given number of base pairs. For Y chromosomes you can use STRs, which mutate rapidly, but there seems to be a lot of controversy about the resulting dates.
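A minimal sketch of the molecular clock logic, assuming a strict clock with a constant, known substitution rate: if two lineages differ at a fraction d of sites and each lineage accumulates substitutions at rate μ per site per year, the time since they split is T = d / (2μ), because mutations pile up independently along both branches. A faster-mutating region (higher μ) yields more differences per base pair for the same T, which is why it samples variation more efficiently. The function name and the numbers below are hypothetical placeholders, not published mtDNA or Y chromosome calibrations.

```python
def divergence_time_years(pairwise_diff_per_site: float,
                          rate_per_site_per_year: float) -> float:
    """Strict molecular clock estimate: T = d / (2 * mu).

    After the split both lineages accumulate mutations independently,
    so the observed difference d reflects 2 * T * mu substitutions per site.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return pairwise_diff_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical inputs for illustration only (not a real calibration):
# a 1% pairwise difference and 1e-7 substitutions per site per year.
estimate = divergence_time_years(0.01, 1e-7)
print(f"approx. {estimate:,.0f} years")  # roughly 50,000 years
```

The hard part in practice is the rate itself, which is exactly where the dating controversies come from; the arithmetic is trivial once μ is fixed.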

The papers are The Arabian Cradle: Mitochondrial Relicts of the First Steps along the Southern Route out of Africa and Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians. Dienekes has already commented on the first paper. I am not going to take a detailed position on either, but I have to add that we need to be very careful about extrapolating from maternal or paternal lineages, and about assuming that population turnover is low enough that we can make phylogeographic inferences about the past from the present. For example, if you look at mtDNA, South Asians as a whole cluster strongly with East Asians and not Europeans, while if you look at Y chromosomes you see the reverse. The whole genome gives a more mixed picture. Additionally, ancient DNA analyses in Northern Eurasia are showing strong discontinuities between past and present populations. So when two lineages in two different regions coalesce back to a last common ancestor, that may actually reflect diversity within a more recent common source population, which underwent demographic expansion and replaced other groups.

If you need the papers, email me. Some of you know the alphabet soup of haplogroups better than I do. Below are two figures which I think give the top line results.

September 17, 2011

Out of Africa’s end?

The BBC has a news report up gathering reactions to a new PLoS ONE paper, The Later Stone Age Calvaria from Iwo Eleru, Nigeria: Morphology and Chronology. The paper reports on remains found in Nigeria, dated to ~13,000 years B.P., which exhibit a very archaic morphology. In other words, they may not be from anatomically modern humans. A few years ago this would have been laughed out of the room, but science moves. Here is Chris Stringer in the BBC piece:

“[The skull] has got a much more primitive appearance, even though it is only 13,000 years old,” said Chris Stringer, from London’s Natural History Museum, who was part of the team of researchers.

“This suggests that human evolution in Africa was more complex… the transition to modern humans was not a straight transition and then a cut off.”

Prof Stringer thinks that ancient humans did not die away once they had given rise to modern humans.

They may have continued to live alongside their descendants in Africa, perhaps exchanging genes with them, until more recently than had been thought.


In broad outline, most people still seem to hold that within the last ~100,000 years there was a major demographic pulse which swept out of Africa and populated the rest of the world. Something special did happen. Oceania and the New World were settled by the descendants of anatomically modern humans, whom we can trace back to Africa. The key modifications to the old model seem to be two-fold:

1) The possibility of admixture with other lineages on the way out

2) The sublocalization of the “Out of Africa” scenario, and further admixture with lineages within Africa

There have long been debates about an East or South African Urheimat for the first anatomically modern humans. Others are now even positing a North African origin! I wonder to what extent a West or Central African origin is discounted simply because of the paucity of fossil remains there, itself a consequence of conditions unfavorable to preservation.

However the details shake out, the story seems to be getting more, not less, complicated. This makes for fewer pithy one-liners for the media, but also more work for scientists. Figuring out stuff can be fun!

July 24, 2011

Why the human X chromosome is less diverse

The Pith: The human X chromosome is subject to more pressure from natural selection, resulting in less genetic diversity. But the differences in X chromosome diversity across human populations seem to be more a function of population history than of differences in the power of natural selection across those populations.

Research in the past few years has found that the human X chromosome exhibits less genetic diversity than the non-sex regions of the genome, the autosomes. Why? On the face of it this might seem inexplicable, but a few basic structural factors derived from the architecture of the human genome present themselves.

First, in males the X chromosome is hemizygous, rendering it more exposed to selection. This is rather straightforward once you move beyond the jargon. Human males have only one copy of the genes located on the X chromosome, because they have only one X chromosome. In contrast, females have two X chromosomes. This is the reason why sex-linked traits in humans are disproportionately expressed in males. For genes on the X chromosome women can be carriers of many diseases, because they have two copies of a gene and one copy may be functional. In contrast, a male ...
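As a neutral baseline, before selection enters the picture, there is a standard population-genetic result (background, not from the post): with N_m breeding males and N_f breeding females, the effective size for autosomal loci is 4·N_m·N_f / (N_m + N_f), and for X-linked loci it is 9·N_m·N_f / (4·N_m + 2·N_f). With an even sex ratio the X/autosome ratio is 3/4, simply because there are fewer X chromosomes in the population than copies of any autosome. The sketch below assumes random mating and equal per-generation mutation rates on the X and autosomes; the function names and population sizes are hypothetical.

```python
def autosomal_ne(n_males: float, n_females: float) -> float:
    """Effective size for autosomal loci given breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def x_linked_ne(n_males: float, n_females: float) -> float:
    """Effective size for X-linked loci (males carry one X, females two)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_males: float, n_females: float) -> float:
    """Neutral X/autosome diversity ratio, proportional to the Ne ratio
    (assuming equal mutation rates on the X and the autosomes)."""
    return x_linked_ne(n_males, n_females) / autosomal_ne(n_males, n_females)


# Hypothetical breeding population sizes, for illustration only.
print(x_to_autosome_ratio(1000, 1000))  # 0.75 with an even sex ratio
print(x_to_autosome_ratio(500, 1500))   # above 0.75 when breeding females outnumber males
print(x_to_autosome_ratio(1500, 500))   # below 0.75 when breeding males outnumber females
```

Observed X diversity falling below this 3/4 baseline is the kind of excess reduction that selection on hemizygous males, or demographic history, would have to explain.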

July 13, 2011

What one (or more) genomes can tell us

The Pith: We are now moving from the human genome project to the human genomes project. As more and more full genomes of various populations come online, new methods will arise to take advantage of the surfeit of data. In this paper the authors crunch through the genomes of half a dozen individuals to make sweeping inferences about the history of the human species over the past few hundred thousand years.

Since the integration of evolution and genetics in the early years of the 20th century there have been several revolutions in our ability to perceive the underlying variation which is the raw material and result of evolutionary genetics. The understanding that DNA was the concrete substrate of Mendelian genetics, and the rise to prominence of molecular genetic techniques in understanding evolution in the 1970s and 1980s, was one key transition. No longer were geneticists simply tracking the coat colors of mice or the visible mutations of fruit flies. In the 1990s the uniparental loci, the maternal and paternal lineages as inferred from the mtDNA and Y chromosomes, came into their own. Finally, the 2000s saw the post-genomic era, and researchers routinely began analyzing ...

April 30, 2011

“Out of Africa” vs. “Multi-regionalism” revisited

Filed under: Evolution,Human Evolution,Multiregionalism,Out-of-Africa — Razib Khan @ 12:48 pm

A few months ago I exchanged some emails with Milford H. Wolpoff and Chris Stringer. These are the two figures who have loomed large in paleoanthropology and the study of modern human origins for a generation, and they were keen on making sure that their perspectives were represented accurately in the media. To further that, they sent me some documents which lay out their perspectives in their own words, away from the public glare (as in, they’re academic publications).


Here is Wolpoff’s 1984 manifesto, of sorts, for ‘Multi-regionalism.’ Much of the morphological material is totally opaque to me, but the basic evolutionary logic is rather clear. Stringer sent me two documents, a scientific paper and a more personal book chapter. These works predate recent developments, so they are of interest from a history of thought perspective.

I’m obviously not one of the personalities at the heart of this debate. There are hard feelings here. Wolpoff indicated to me that he still has issues with Stringer, despite reports that there was some sort of reconciliation. But one of the things that is really evident to me in reading through this material is ...

April 27, 2011

The continuing tangling of the human tree

Last summer I made a thoughtless and silly error in relation to a model of human population history when a reader asked me: “which population is most distantly related to Africans?” I contended that all non-African populations are equally distant. This is obviously wrong on the face of it if you look at any genetic distance measures. West Eurasians, even those without recent Sub-Saharan African admixture (e.g., North Europeans), are closer to Africans than East Eurasians are, and East Eurasians are often closer than Oceanians and Amerindians. One explanation I offered is that these latter groups were subject to greater genetic drift through a series of population bottlenecks. In this framework the number of generations back to the last common ancestor with Sub-Saharan Africans should be about the same for all groups outside of Africa, but due to evolutionary factors such as more extreme genetic drift or different selective pressures, some non-African groups have diverged more from Africans than others in terms of their genetic state. In other words, the most genetically divergent groups in relation to Africans did not diverge any earlier, but simply diverged more ...
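The point that a group can be more divergent without having split earlier can be sketched with the textbook pure-drift approximation (an assumption: Wright-Fisher drift only, no mutation, migration, or selection). Each generation in a population of effective size N retains a fraction (1 − 1/(2N)) of its heterozygosity, so after a series of epochs the drift-driven differentiation from the ancestral pool is F = 1 − ∏(1 − 1/(2N_i))^(t_i). The sketch tracks only one branch's own drift, as a proxy for how far it has moved from the shared ancestor. The two histories below are hypothetical and share the same split time; one passes through a founder bottleneck.

```python
from typing import Iterable, Tuple


def heterozygosity_retained(epochs: Iterable[Tuple[float, int]]) -> float:
    """Fraction of ancestral heterozygosity retained under pure drift.

    Each epoch is (effective_size, generations); every generation multiplies
    heterozygosity by (1 - 1/(2N)) under the Wright-Fisher model.
    """
    retained = 1.0
    for ne, generations in epochs:
        retained *= (1.0 - 1.0 / (2.0 * ne)) ** generations
    return retained


def drift_divergence(epochs: Iterable[Tuple[float, int]]) -> float:
    """Drift-driven differentiation from the ancestral pool, F = 1 - H_t / H_0."""
    return 1.0 - heterozygosity_retained(epochs)


# Hypothetical histories with the SAME split time (2,000 generations in total).
steady = [(10_000, 2_000)]
bottlenecked = [(500, 50), (10_000, 1_950)]  # founder bottleneck, then recovery

print("steady      ", round(drift_divergence(steady), 3))
print("bottlenecked", round(drift_divergence(bottlenecked), 3))
```

The bottlenecked history ends up further from the ancestral pool even though neither lineage split any earlier, which is the shape of the argument in the paragraph above.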

March 8, 2011

Where in the world did anatomically modern humans come from?

The Pith: I review a recent paper which argues for a southern African origin of modern humanity. I argue that its statistical inference shouldn’t be trusted as the final word. The paper reinforces previously known facts, but does not add much that is both novel and robust.

I have now read the paper which I expressed a touch of skepticism toward yesterday. Do note, I did not dispute the validity of their results. They seem eminently plausible. I was simply skeptical that we could, with any level of robustness, claim that anatomically modern humans arose in southern vs. eastern, or western, Africa. If I had to bet, my rank order would be southern ~ eastern > western. But my confidence in my assessment is very low.

First things first. You should read the whole paper, since someone paid for it to be open access. Second, much props to whoever decided to put their original SNP data online. I’ve already pulled it down, and sent off emails to Zack, David, and Dienekes. There are some northern African populations which allow us to expand beyond the Mozabites, though unfortunately there are only 55,000 ...

January 28, 2011

A ‘leaky’ model

John Farrell pointed me to this piece by Ann Gibbons, A New View Of the Birth of Homo sapiens. Here are some interesting passages:

The new picture most resembles so-called assimilation models, which got relatively little attention over the years. “This means so much,” says Fred Smith of Illinois State University in Normal, who proposed such a model. “I just thought ‘Hallelujah! No matter what anybody else says, I was as close to correct as anybody.’ ”

But the genomic data don’t prove the classic multiregionalism model correct either. They suggest only a small amount of interbreeding, presumably at the margins where invading moderns met archaic groups that were the worldwide descendants of H. erectus, the human ancestor that left Africa 1.8 million years ago. “I have lately taken to talking about the best model as replacement with hybridization, … [or] ‘leaky replacement,’ ” says paleogeneticist Svante Pääbo of the Max Planck Institute for Evolutionary Anthropology in Leipzig, lead author of the two nuclear genome studies.

I suppose ‘assimilation’ sounds too generic, but ‘leaky replacement’ seems more fitting for a building ‘super’. But it isn’t as if paleoanthropology has a Don Draper, a rogue with a way with words.

Here’s the ...

January 27, 2011

After the evolutionary revolution


Image credit: Luna04

My post The paradigm is dead, long live the paradigm! expressed to some extent my befuddlement at the current state of human evolutionary genetics and paleoanthropology. After the review of the paper on possibly elevated admixture with Neandertals at the dystrophin locus, a friend emailed, “Remember when we thought everything would be so simple once we could finally see this stuff?” Indeed I do remember. The fact that things aren’t simple is very exhilarating, but it is also a major blow to theoretical clarity. Science is, after all, not a collection of facts; it is in part facts sieved through an analytic framework.

In hindsight, with the relative robustness of ancient DNA results, we can make some assessments about the role of human bias within particular heuristic frameworks over the past generation. From the mid-1980s up until 2000 it was victory after victory for the Out-of-Africa with total replacement model. The rise of mtDNA and Y chromosomal lineage studies seemed to buttress the idea of common descent from neo-Africans within the last 100,000-200,000 years for all human populations. There wasn’t much ...

December 31, 2010

Mapping the “Green Sahara”

Filed under: geography,Geology,Green Sahara,Human Evolution,Out-of-Africa,Sahara — Razib Khan @ 2:30 pm


Guelta d’Archei, Chad. Credit: Dario Menasce.

Everyone who is literate knows that the Sahara desert is the largest of its kind in the world. The chasm in cultural, biological, and physical geography across it is very noticeable. Northern Africa is part of the Palearctic zone, while the peoples north of the Sahara have long been part of the circum-Mediterranean population continuum. The primary continuous habitable corridor is that of the Nile valley. And yet scholars have long known that the climatic regime of the Sahara has varied. The pharaohs of ancient Egypt seem to have hunted a wider range of fauna than is to be found in the deserts surrounding the current Nile valley, perhaps relics from a more humid period. Rock art in some regions of the desert indicates aquatic life, and species more characteristic of the savanna. And yet we should not think of the Sahara as a recent phenomenon; it does seem to be geologically ancient, despite periodic humid interregnums.

A new paper in PNAS attempts to map the hydrography of the Sahara over the Holocene, as well as back into the Pleistocene. The ultimate aim seems to be to better frame the geographic constraints on the expansion of humanity from its African homeland, and to refute a simple projection from the present to the past. In this case, the projection is that the Nile, a verdant and habitable watercourse which connects north and south and bisects the continuous desert, has always been the corridor across the Sahara. From Ancient watercourses and biogeography of the Sahara explain the peopling of the desert:

Evidence increasingly suggests that sub-Saharan Africa is at the center of human evolution and understanding routes of dispersal “out of Africa” is thus becoming increasingly important. The Sahara Desert is considered by many to be an obstacle to these dispersals and a Nile corridor route has been proposed to cross it. Here we provide evidence that the Sahara was not an effective barrier and indicate how both animals and humans populated it during past humid phases. Analysis of the zoogeography of the Sahara shows that more animals crossed via this route than used the Nile corridor. Furthermore, many of these species are aquatic. This dispersal was possible because during the Holocene humid period the region contained a series of linked lakes, rivers, and inland deltas comprising a large interlinked waterway, channeling water and animals into and across the Sahara, thus facilitating these dispersals. This system was last active in the early Holocene when many species appear to have occupied the entire Sahara. However, species that require deep water did not reach northern regions because of weak hydrological connections. Human dispersals were influenced by this distribution; Nilo-Saharan speakers hunting aquatic fauna with barbed bone points occupied the southern Sahara, while people hunting Savannah fauna with the bow and arrow spread southward. The dating of lacustrine sediments show that the “green Sahara” also existed during the last interglacial (∼125 ka) and provided green corridors that could have formed dispersal routes at a likely time for the migration of modern humans out of Africa.

This paper was written before the Denisovan admixture results shifted the necessity to genuflect so explicitly to Out of Africa. But its results are interesting nonetheless, since they don’t depend too deeply on a paleoanthropological model. Rather, by surveying biogeographic and geologic data the authors produce a sense of how the Sahara’s climate fluctuated over the past 100,000 years as a function of time and space. The latter is important because the Sahara is not an amorphous sandy waste. Rather, it exhibits a great deal of topographical variation:

Credit: T L Miles

In the Tibesti mountains the highest peaks are ~11,000 feet (~3,400 meters) above sea level. Because of the general aridity of the Sahara, even these elevations do not induce enough precipitation to produce a “green mountain” effect, which is common in other arid parts of northern Africa and Arabia. But in a regime of only slightly higher precipitation and milder temperatures (subtract roughly 3 degrees Fahrenheit per 1,000 feet of elevation from the sea level temperature at the same latitude), one can imagine the Tibesti having been much more biologically productive in the past. Consider this from the Tassili n’Ajjer region of southern Algeria:

Because of the altitude and the water-holding properties of the sandstone, the vegetation is somewhat richer than the surrounding desert; it includes a very scattered woodland of the endangered endemic species Saharan Cypress and Saharan Myrtle in the higher eastern half of the range.

The range is also noted for its prehistoric rock paintings and other ancient archaeological sites, dating from neolithic times when the local climate was much moister, with savannah rather than desert. The art depicts herds of cattle, large wild animals including crocodiles, and human activities such as hunting and dancing….
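The rule of thumb used above for the Tibesti (roughly 3°F cooler per 1,000 feet of elevation, relative to sea level at the same latitude) is easy to make concrete. This sketch applies that constant lapse rate naively; real lapse rates vary with humidity and season, and the sea-level temperature below is a hypothetical placeholder rather than a measured value.

```python
def temperature_at_elevation_f(sea_level_temp_f: float,
                               elevation_ft: float,
                               lapse_f_per_1000ft: float = 3.0) -> float:
    """Approximate temperature at altitude using a constant lapse rate
    (default: the ~3 F per 1,000 ft rule of thumb from the text)."""
    return sea_level_temp_f - lapse_f_per_1000ft * elevation_ft / 1000.0


# Tibesti peaks at ~11,000 ft (from the text); the 95 F sea-level value is hypothetical.
print(temperature_at_elevation_f(95.0, 11_000))  # 62.0
```

At the highest Tibesti peaks that amounts to a drop of about 33°F, which is why modestly wetter conditions could plausibly have turned the uplands green.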

The main thrust of the paper seems to be to refute the common assumption that an eternal Nile served as the north-south corridor for African fauna, including humans. Here is the reason:

Reanalysis of the Saharan zoogeography…suggests that many animals, including water-dependent creatures such as fish and amphibians, dispersed across the Sahara recently. For example, 25 North African animal species have a spatial distribution with population centers both north and south of the Sahara and small relict populations in central regions. This distribution suggests a trans-Saharan dispersal in the past, with subsequent local isolation of central Saharan populations during the more recent arid phase. If a diverse range of species (including fish) can cross the Sahara, it is impossible to envisage the Sahara functioning as barrier to hominin dispersal. The zoogeography of the Nile suggests that it was a much less effective corridor…Only nine animal species that occupy the Nile corridor today are also found both north and south of the Sahara….

There are also isolated pieces of evidence which refute a Nile-only model, such as Saharan oases which harbor relict populations of crocodiles. The existence of these “desert crocodile” populations signals a better-watered past, with a subsequent retreat into isolated oases (some of these populations did go extinct in the 20th century, though). In some ways this is a problem. Simple models make simple predictions, and are easier to test. But if simple models are false, that is an even greater problem.

Here are the figures which outline the primary results from geology and biogeography:

There are two primary inferences made in regards to humans:

1) The Holocene inference seems to be that Nilo-Saharan populations have their origins in the societies which expanded north and south along the liminal zone of the Sahara. The authors argue that Nilo-Saharan populations on isolated oases in the northern Sahara are relics of the expansion in the early Holocene. This sounds plausible, but it would be nice to explore it in more depth via linguistic and genetic analysis. With the rise of the camel and Islam, a trans-Saharan trade in humans may have resulted in a great deal of translocation of whole populations from one area to another. Concurrent with the Nilo-Saharans who pushed north, the authors also suggest that savanna hunters moved south. From the paper it is not clear to me who these people are, and the mapping between archaeology and linguistics here seems more tentative.

2) A deep history inference also seems to be that trans-Saharan population movements were feasible in a period around 120,000-100,000 years BP, but not 60,000-50,000 years BP. The distinction matters because the latter is a relatively young age for the Out of Africa migration, while the former is an older one. If the younger date is correct, then the only plausible route of migration is probably the coastal fringe of the Horn of Africa. If the older date is correct, then a whole host of possibilities confront us, because although the hydrography of the Sahara may have been constraining, there were several avenues of migration.

In regard to #2, a clement phase followed by a resealing of the genetic barrier may align well with recent models which posit a non-trivial period of separation between Africans and non-Africans after the Out of Africa migration. In other words, early modern humans may have followed the pattern of many species, with an expansion into, and beyond, the Sahara, and then the separation of two populations by a resurgent desert. The difference is that the daughter population isolated on the far side of the desert eventually “broke out” from the margins of the African homeland into the rest of the world.

Citation: Drake NA, Blench RM, Armitage SJ, Bristow CS, & White KH (2010). Ancient watercourses and biogeography of the Sahara explain the peopling of the desert. Proceedings of the National Academy of Sciences of the United States of America PMID: 21187416

December 27, 2010

Out of Africa: mend it, don’t end it!

Filed under: Culture,Genetics,Genomics,Multiregionalism,Out-of-Africa — Razib Khan @ 11:51 pm

Dilettante human genetics blogger Dienekes Pontikos has a post up with a somewhat oblique title, Is multi-regional evolution dead? I say oblique because a straightforward title would be “Multi-regionalism lives!” He posted a chart from a 2008 paper which outlines various models of human origins, and their relationship to molecular data at the time. I have also posted the chart, but with a little creative editing on the “assimilation” scenario to reflect the possible Neandertal and Denisovan admixture events. Of these models the “candelabra” can be rejected as highly implausible. It posits very deep roots in a given region for distinct human populations. Unless you accept some sort of hominin population structure within Africa, maintained through distinct migrations out of Africa, the “replacement” model can also be discarded (since the classic replacement model did not posit that ancient African population structure had any relevance outside of Africa, you’d have to salvage it with a modification in light of new results).

So the two primary disputants are a resurrected multi-regional model and the assimilation model. But these two are really endpoints on a spectrum of models. What you need to do is vary the number of discrete populations and the rate of migration between them over time. The beauty of the replacement model was its parsimony: as far as recent human origins were concerned, past gene flow via migration was a relatively academic concern. It was an exceedingly simple narrative framework. Consider this summary of the first episode of a 2009 British documentary, The Incredible Human Journey:

In the first episode, Roberts introduces the notion that genetic analysis suggests that all modern humans are descended from Africans. She visits the site of the Omo remains in Ethiopia, which are the earliest known anatomically modern humans, and visits the San people of Namibia to demonstrate the hunter-gatherer lifestyle. In South Africa, she visits Pinnacle Point, to see the cave in which very early humans lived. She then explains that genetics suggests that all non-Africans may descend from a single, small group of Africans who left the continent tens of thousands of years ago. She explores various theories as to the route they took. She describes the Jebel Qafzeh remains in Israel as a likely dead end of a traverse across Suez, and sees a route across the Red Sea and the around the Arabian coast as the likelier route for modern human ancestors, especially given the lower sea levels in the past.

A neat and tidy story. But reality is getting a lot less tidy & neat. Personally, I find the assimilation model, as we understand it now, to be the most plausible. It remains more parsimonious than the alternatives, ancient population structure and complex patterns of gene flow and hybridization. But parsimony has misled us toward undue confidence in the recent past, so we should not weight it as strongly at this point. Where would we be without ancient DNA extraction? Some researchers have long argued for a more complex model than Out of Africa, but as long as we relied on inferences from extant populations these results were ignored or dismissed (notably, ancient DNA extraction is also unsettling our understanding of the very recent human past).

There is though the pattern of greater African genetic diversity. Dienekes observes that a recent paper reports that some Indian populations may be more diverse genetically than HapMap Africans. I’m not too keen on overturning a generation of consensus yet in regards to this question based on one deeply sequenced region on one chromosome comparing some Indian tribal groups to two HapMap African populations (Yoruba, and a Kenyan Bantu group). So I accept the pattern of greater diversity until further research brings it more into doubt. Now the question is to explain the pattern. The most plausible explanation would naturally be the one outlined above in the 2009 documentary: non-Africans are the descendants by and large of a small number of Africans who left ~100,000 years B.P. They went through a population bottleneck which reduced genetic diversity sharply. Their genetic variance was a subset of that of Africans (with some admixture from other human lineages outside of Africa, as it now happens).

But there are other possibilities. One option sounds rather bizarre to me at first blush:

With respect to the reduced genetic diversity, one idea is that it is the result of genetic drift following a bottleneck in a small African population. But, the data can just as well be explained by species-wide selection which culled genetic variation.

Presumably selection would operate outside of Africa and homogenize non-Africans through a series of sweeps. Remember that selection and stochastic population events can sometimes be hard to differentiate, because both expunge variation from long swaths of the genome, resulting in long linkage disequilibrium blocks. This seems rather incredible as a proposition to me. Could selection operate all across Eurasia in such a fashion? From what I can tell of more recent signatures of natural selection, that does not tend to occur. The pattern for skin color, for example, is convergent phenotypes through different genetic architectures. And how could gene flow tie together ancient human lineages but not H. sapiens sapiens? On the other hand, this could be an explanation for the consistent, taxon-wide pattern of encephalization (though I believe this occurred in Africa as well).

A second alternative would be that Africa’s greater genetic diversity is simply a function of a much larger long-term effective population. In this model the climatic fluctuations of the Pleistocene periodically reduced non-African populations to such an extent that these groups became a very minor proportion of the total census size of humans, and so were swamped by gene flow from the more numerous African humans. It seems to me that an extreme case of this model really verges into the same territory as the assimilation model. So I see this as more of a difference of degree than kind.

Dienekes points to Y chromosomal markers which suggest “back-migration” to Africa. I don’t totally discount this, but given the enormous diversity in groups like the Bushmen, I don’t think we can attribute that to back-migration from Eurasia. It is notable that the Bushmen are basal to the rest of humanity, including the Yoruba + (Eurasians + Australasians). Also, the genetic divergence between the Denisovan/Neandertal clade and modern humans is only ~33% greater than that between Bushmen and Papuans. Speaking of differences of degree, that is becoming more and more the case when it comes to the so-called “dead ends” of human evolution and ourselves.

Finally, there’s the issue of non-neo-African admixture. Reich et al. give a figure of ~7.5% in Melanesians, and ~2.5% in Eurasians. It is fair, I think, to point out that although others have offered figures in the literature before, only with reference sequences from ancient DNA have these values become widely accepted. Perhaps they will be revised upward with other sequences. But two cautions:

- There are only so many hominins to go around. Australia and the New World were only settled by modern humans. So how many were there running around in Eurasia? I think perhaps there may have been something different in South Asia, but that’s just a very uninformed guess.

- On the margin, it seems clear that autosomal DNA had enough interpretive fudge that the archaic admixture signal could be dismissed. But the upper bound can’t be that high, or the Fst values would be more extreme than they currently are. Modern humans do seem to share a great deal of “shallow” common ancestry.

At the end of the day I am going to put my money on the assimilationist model because I believe in diminishing marginal returns. The Out of Africa replacement model was maximalist. Some tweaking on the margin is not very surprising, at least in hindsight, but more baroque forms of multi-regionalism have far too many moving parts. Newtonian mechanics may have been superseded in some domains by Einstein’s theories and quantum mechanics, but for many purposes it does very well at predicting phenomena and modeling the world. I fully expect further refinements in the assimilation model, but I would bet that the age of revolutions is over for a long time. Then again, my confidence is modest at best. This is no time for certitudes.

Note: An illustration of the models:

November 24, 2010

We were all Africans…before the intermission

Quick review. In the 19th century, once the idea that humans were derived from non-human ancestral species was injected into the bloodstream of the intellectual classes, there was an immediate debate as to the location of the proto-human homeland, the Urheimat of us all. Charles Darwin favored Africa, but in many ways this ran against the cultural grain. The theory of evolution was birthed before the highest tide of the age of white supremacy and European hegemony, and Darwin’s model had to swim against the conviction that Africans were the most primitive of the colored races. After the waning of the ideological edifice of white supremacy, and the shock it received during and after World War II, the debates as to the origin of humanity still remained contentious and followed the same outlines (though without the charged normative inferences). But as the decades wore on, many more researchers began to believe that Darwin was correct, and that the origin of humanity lay in the African continent. First the deep origin of the human lineage in Africa was accepted, and eventually one school argued for a more recent expansion out of Africa. The turning point in these academic disputes was the popularization of the “mitochondrial Eve” theory in the 1980s.

What some paleontologists had long argued, that anatomically modern humans have their locus of origin in Africa, was supported now by research from genetics which indicated that Africans were the most basal clade of humans on a continental scale, so that non-Africans could be conceived of as a subset of Africans. From this originates the chestnut of wisdom that Africans have more genetic diversity than all other human populations combined. By the year 2000 one could say that the “Out of Africa” triumphalism had proceeded to the point where an almost exterminationist model had taken hold when it came to the relationships of anatomically modern H. sapiens, and other groups which had evolved outside of Africa over the past million or so years, such as the Neandertals.

But the theoretical dichotomies turned out to be too coarse and absolute. A division between multiregionalist phyletic gradualism, where H. sapiens evolved out of its hominin ancestors concurrently on a worldwide scale, and a model of rapid expansion of one tribe in Africa to replace all others in totality, may have been warranted in the age of classical genetics and morphometric analysis, but now we can look at the raw genomic material in a more fine-grained fashion. In fact, we can now look at the genomic patterns of variation among extinct hominins! There have long been hints from the genetic and morphological data that the expansion-and-replacement paradigm was too extreme. But last spring Science published a paper which, based on a comparison of Neandertal and modern human genetic variation, claimed admixture between Neandertals and non-Africans in the range of 1-4% in all non-African groups, and with that one can no longer treat absolutist expansion-and-replacement as self-evidently true orthodoxy. But one orthodoxy has not given way to another, and the shock to the old models presented by the data has not resulted in the coalescence of new robust paradigms. We live in a time of scientific troubles, so to speak.

One of the more notable results in the Science paper from last spring was that all non-Africans had about the same admixture in relation to the Neandertal reference genome, ~1-4%. That holds from the Orkneys to New Guinea. Because Neandertals were distributed only in the western half of Eurasia, this implies that the admixture was an early event. By the time of modern human expansion across Eurasia, Australasia, and the New World, it had become equally distributed across the individuals within the population. Recall the contrast between African Americans and Uyghurs. Among the Uyghurs the ancestral quanta are equitably distributed from individual to individual, but among African Americans there remains substantial intra-population variance. The reason is that African Americans are quite new, an order of magnitude younger than the Uyghurs in a genetic sense, and admixture is still occurring into the African American population from the ancestral groups. The Uyghurs as we know them today genetically are probably ~1,000-2,000 years old (though their cultural origins are both more and less ancient, as a matter of linguistics in the former, and ethnic self-conception as a Muslim East Turkic group in the latter). The implication here is clear: there was a pause in the Out of Africa movement, where the proto-non-Africans mixed with a Neandertal group, possibly in the Middle East, and only began a massive demographic expansion after an unspecified sojourn. A paper from last spring makes this all explicit:

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation. Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals. The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.
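The point that early admixture ends up spread evenly across individuals can be sketched with a toy model. Assume a single admixture pulse, random mating, and (unrealistically) infinitely many unlinked loci, so that a child’s ancestry fraction is exactly the mean of its two parents’. Under those assumptions the variance in individual ancestry roughly halves every generation. The population size, the starting mix, and the number of generations are hypothetical, and continued gene flow, which is what keeps the variance high among African Americans, is deliberately left out.

```python
import random
import statistics

def ancestry_variance_by_generation(n=2000, initial_fraction=0.5, generations=8, seed=1):
    """Variance in individual ancestry after one admixture pulse.

    Toy assumptions: random mating, constant size n, no further gene flow, and
    a child's ancestry equal to the mean of its parents' (infinitely many
    unlinked loci, so no Mendelian sampling noise)."""
    rng = random.Random(seed)
    # Generation 0: unadmixed founders, ancestry 1.0 (source A) or 0.0 (source B).
    pop = [1.0 if rng.random() < initial_fraction else 0.0 for _ in range(n)]
    variances = [statistics.pvariance(pop)]
    for _ in range(generations):
        # Each child gets two parents drawn at random from the previous generation.
        pop = [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(n)]
        variances.append(statistics.pvariance(pop))
    return variances

variances = ancestry_variance_by_generation()
for gen, v in enumerate(variances):
    print(f"generation {gen}: variance in ancestry fraction = {v:.4f}")
```

In this toy model the decay is geometric, so a compact, well-mixed population needs only a few dozen generations for an admixed share to become nearly uniform; real genomes add Mendelian and linkage noise that the sketch ignores.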

Now the same group has published a follow up paper in Genome Biology which fleshes out the Deep Time aspect of human evolutionary history by looking closely at the genetic variation of an under-sampled population: South Asians. You may have noticed that the HGDP populations include Pakistani groups as South Asian exemplars. That’s apparently because during the Permit Raj era in India the government was wary of cooperating with the HGDP consortium. But more recently the barriers have come down in India, and one can viably supplement the data sets with Indian Americans. So the GIH sample in HapMap3 consists of Gujaratis from Houston. At ~1.25 billion, or nearly 20% of the world’s population, South Asians are a critical portion of the “big picture” when it comes to world wide genetic variation.

Genetic diversity in India and the inference of Eurasian population expansion:

To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100 kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90-110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.

First, I want to put on the record that I think the uncertainties are high enough (evident in the confidence intervals in the paper itself) that we need to be careful about treating the divergence times from their results as values we’d bet the house on. Someone with a better knowledge of the fossils (e.g., John Hawks) or of the controversies about mutation rates (e.g., Dienekes) can comment on the plausibility of the dating. But I think we can infer that the Middle Eastern sojourn of non-African humans was closer to 10,000 years in order of magnitude than to 1,000 years.
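To get a feel for why a sojourn of that length matters, here is the textbook Wright-Fisher expectation for heterozygosity under drift alone, H_t = H_0 (1 - 1/(2N_e))^t. The effective sizes, the 25-year generation time, and the 20,000-year span are hypothetical values chosen for illustration, not estimates from the paper.

```python
def expected_heterozygosity(h0, ne, generations):
    """Wright-Fisher expectation under pure drift (no mutation, no migration):
    H_t = H_0 * (1 - 1 / (2 * Ne)) ** t."""
    return h0 * (1 - 1 / (2 * ne)) ** generations

# Hypothetical inputs, not estimates from the paper: a 20,000-year sojourn
# at an assumed 25 years per generation.
years, generation_time = 20_000, 25
generations = years // generation_time
for ne in (500, 1_000, 5_000):
    remaining = expected_heterozygosity(1.0, ne, generations)
    print(f"Ne = {ne:>5}: {remaining:.0%} of the starting heterozygosity remains")
```

The smaller the isolated population, the faster diversity drains away, which is the logic behind the “Middle East isolation” explanation for the lower diversity of non-Africans.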

The basic method here is that the research group zoomed in on a ~100 kb region of the genome, on chromosome 12, and surveyed their Indian populations as well as the HapMap3 ones. This is important because the SNPs in the HapMap probably exhibit an ascertainment bias toward variants in Europeans and other more widely surveyed groups. The fact that 30% of the SNPs in the South Indian groups seem not to be found among the HapMap populations confirms this hunch. Before digging into the details of the paper, let’s note that the South Indian groups are from the state of Andhra Pradesh: Brahmins, a lower caste group (Yadava), Dalits (Mala/Madiga), and a tribe (Irula). This is a case where even more thorough coverage is necessary. There is some suggestion that South Asian groups have a long history of endogamy and genetic peculiarities, which would limit the usefulness of extrapolations from this sample. Even within the HapMap Gujarati sample there seem to be two clusters when PCA is run alongside the European samples.

There are basically three portions of the paper:

- A survey of conventional population genetic statistics,

θ = 4Neμ (Ne = effective population size, μ = mutation rate per generation)
π = nucleotide diversity
H = heterozygosity
D = Tajima’s D

- Measures of genetic distance between contemporary populations, Fst and PCA

- Finally, taking the genetic variation in the ~100 kb region and plugging it into explicit models of human evolutionary history
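For readers who want to see what the first set of statistics involves, here is a minimal sketch computing Watterson’s estimator of θ, nucleotide diversity π, and Tajima’s D from aligned haplotypes, using the textbook formulas (not necessarily the exact implementation the authors used). Under the standard neutral model both θ_W and π estimate θ = 4Neμ; Tajima’s D scales their difference, and a negative value signals an excess of rare variants, as expected after a population expansion. The haplotypes are made up.

```python
from itertools import combinations
from math import sqrt

def diversity_stats(haplotypes, length):
    """Watterson's theta, nucleotide diversity (pi) and Tajima's D for aligned
    0/1 haplotype strings covering `length` base pairs."""
    n = len(haplotypes)
    sites = list(zip(*haplotypes))
    S = sum(1 for site in sites if len(set(site)) > 1)  # segregating sites
    # pi: mean number of pairwise differences between haplotypes.
    pairs = list(combinations(haplotypes, 2))
    mean_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1
    # Tajima's (1989) normalizing constants.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    denom = sqrt(e1 * S + e2 * S * (S - 1))
    tajima_d = (mean_diff - theta_w) / denom if S > 0 else float("nan")
    return {
        "S": S,
        "theta_w_per_site": theta_w / length,
        "pi_per_site": mean_diff / length,
        "tajima_d": tajima_d,
    }

# Hypothetical sample: every variant is a singleton, as after a rapid expansion.
haplotypes = ["0000000000", "1000000000", "0100000000", "0010000000", "0001000000"]
result = diversity_stats(haplotypes, length=10)
print(result)
```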

Table 1 (which I reformatted) shows the genetic statistics by “continent.” The Indian group includes some Gujarati individuals. They subsampled the HapMap populations to equalize the numbers.


Some of these results are striking. The general truism is that Africans are the most diverse population in the world, but some of the South Indian groups are very diverse indeed. Of particular interest, though, is that some Indian groups are not very diverse at all. What’s going on here? Here you have to look at the specifics of each group. It is likely that South Indian Brahmins are the result of a relatively recent population expansion, with some uptake of other genes through hypergamy. A paper from last year argued that all Indian populations can be modeled as a two-way admixture, in different proportions, of two ancestral groups, Ancestral North Indians and Ancestral South Indians. The heterozygosity values may be explained in such a fashion, though the relatively low values for Gujaratis and Andhra Pradesh Brahmins would still surprise. Frankly, I’m mostly just confused by the diversity statistics. Probably substructure due to endogamy and population bottlenecks is obscuring broader dynamics. We can, though, conclude that the idea that all non-Africans are uniformly homogeneous in comparison to Africans may not hold water. Figure 2 above illustrates this by plotting heterozygosity vs. distance from Africa.

Next, let’s move to genetic distance. There are two ways you can look at this: a summary statistic like Fst, which partitions variance into between- and within-population components, and PCA, which visualizes the largest dimensions of variation in the data set. So you have both below (re-edited for reasons of space):


In general the results are as expected, but there are weird details. For example, the Brahmins from Andhra Pradesh are on the margins, whereas you’d expect them to cluster with the Gujaratis. The Gujaratis are closer to the Chinese from Denver than to the Utah Whites? This is a provisional paper, so I’m almost wondering if there’s a typo or coding error here, as I don’t understand how the GIH can be so close to the Tuscans and the Chinese from Denver, and much further from the Northern Europeans and the Chinese from Beijing. The two European samples, and the two Chinese samples, are rather close to each other in other analyses.
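Fst values like the ones discussed here summarize how much of the total heterozygosity is due to allele frequency differences between populations. Here is a minimal sketch of the simple Nei-style estimator, Fst = (H_T - H_S) / H_T, combined across loci as a ratio of sums. The allele frequencies are hypothetical, and the sample-size corrections that published estimators (e.g., Weir and Cockerham’s or Hudson’s) apply are omitted.

```python
def fst_two_populations(p1, p2):
    """Nei-style Fst for two populations from allele frequencies at biallelic
    loci, combined as sum(H_T - H_S) / sum(H_T). No sample-size correction."""
    numerator = denominator = 0.0
    for a, b in zip(p1, p2):
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity if the two were pooled
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical allele frequencies at four loci.
pop_a = [0.10, 0.50, 0.90, 0.30]
pop_b = [0.30, 0.50, 0.60, 0.35]
print(f"Fst = {fst_two_populations(pop_a, pop_b):.3f}")
```

Identical frequencies give 0, and alleles fixed for opposite variants give 1; real populations fall in between.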

So let’s get to the real deal. The modified Out of Africa model where non-Africans take a “break” after they leave the mother continent:


I’ve mashed up the figures. The models were generated by looking at allele frequencies. They took the variants they found by sequencing the ~100 kb on chromosome 12, which was in a very gene-poor region so as to bias it toward neutrality, and plugged them into a few models in the ∂a∂i program. I’ll jump to the text here:

…the divergence time between African and the ancestral Eurasian population (88-112 kya, CIs: 63-150 kya) is much older than the divergence time among the Eurasian groups (27-39 kya, CI: 20-59 kya). The more recent divergence time and the low migration rate estimates among the current Eurasian populations support the “delayed expansion” hypothesis for the human colonization of Eurasia (Figure 5). Consistent with previous studies…these estimates indicate that a single Eurasian ancestral population remained separated from African populations for more than 40 thousand years prior to the population expansion throughout Eurasia and the divergence of individual Eurasian populations.
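∂a∂i works from the allele frequency spectrum: a tally of how many SNPs carry their derived allele at each count in each sampled population, which is compared against the spectra expected under candidate demographic models. The sketch below only builds an observed two-population spectrum from derived-allele counts with NumPy; it does not call ∂a∂i, and the counts are hypothetical.

```python
import numpy as np

def joint_sfs(counts_a, counts_b, n_a, n_b):
    """Unfolded joint site frequency spectrum for two populations.

    Entry [i, j] counts SNPs whose derived allele is seen i times among the n_a
    sampled haplotypes of population A and j times among the n_b of B."""
    sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly even when (i, j) pairs repeat.
    np.add.at(sfs, (np.asarray(counts_a), np.asarray(counts_b)), 1)
    return sfs

# Hypothetical derived-allele counts at five SNPs, four haplotypes per population.
sfs = joint_sfs([1, 1, 0, 3, 2], [0, 1, 2, 4, 2], n_a=4, n_b=4)
print(sfs)
print("marginal spectrum for A:", sfs.sum(axis=1))
```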

Manafi al-Hayawan, Adam and Eve

Take a good look at those confidence intervals. We know that some of those have to be false: the bones don’t lie. From what little I know a very young consensus date for the settlement of Australasia by modern humans is 40,000 years ago. That happens nicely to be their median, but the dispersion toward younger dates is probably not right, unless Aborigines are a separate population who are remnants of an earlier wave of migrants (or the current Aborigines replaced earlier waves). It is also hard to reconcile these dates for the diversification of non-African humanity with very old dates for Chinese fossils which exhibit some elements of modern morphology.

In the broad outlines I think we can accept that the model outlined in this paper may be correct. It would explain the uniform admixture of Neandertal in non-Africans, since they’d need time as a compact population before demographic expansion to integrate the Neandertal genes as part of their genetic background. But before the Neandertal genome came out there were plenty of papers which purported to show how there was no archaic admixture in modern humans, and plenty of papers which did claim there was evidence for such admixture. The point is that these computational models are sensitive to their inputs, and being models they simplify what really happened. In the discussion the authors repeatedly observe that migration between the various non-African demes doesn’t affect the outcome. That is fine, but there is modestly strong evidence that the Indian samples that they’re using are an admixed population of old. That would make me skeptical of claims about dating the separation of “Indians” when Indians are themselves possibly a compound between other groups.

Below is the model presented in Reconstructing Indian population history:


The teens of this century are going to be very exciting when it comes to reconstructing human evolutionary history. You’d be a fool to put bets on any horse at this time.

Addendum: I need a term for non-African humanity. So I’m making one up right now: Eurausicans. From Eurasians, Australasians, and Americans.

Citation: Jinchuan Xing, W Scott Watkins, Ya Hu, Chad D Huff, Aniko Sabo, Donna M Muzny, Michael J Bamshad, Richard A Gibbs, Lynn B Jorde, & Fuli Yu (2010). Genetic diversity in India and the inference of Eurasian population expansion Genome Biology : 10.1186/gb-2010-11-11-r113

July 23, 2010

One principal component to rule them all?

Despite the reality that I’ve cautioned against taking PCA plots too literally as Truth, unvarnished and without any interpretive juice needed, papers which rely on them are almost magnetically attractive to me. They transform complex patterns of variation which you are not privy to via your gestalt psychology into a two or at most three dimensional representation which you can grok immediately. That is why The History and Geography of Human Genes was so engrossing. You recognize patterns which were otherwise unrecognizable. But how you interpret those patterns is a wholly different matter. And how those patterns arise is also not something one can ignore.

First, let’s start with an easy case. To the left is a PCA plot with four populations: Nigerians, East Asians (Chinese + Japanese), Europeans (whites from Utah), and finally, African Americans. The x-axis is the first principal component of variation, and the y-axis the second. That means that the x-axis is the dimension of variation within the patterns of genetic data which explains the largest fraction of the total amount of genetic variation. The sum totality of the variation can be decomposed into a large set of uncorrelated dimensions which can be rank ordered from the largest explanatory components to the smaller ones, successively by number. In a human genetic context, when Africans are in the sample, the first principal component almost always separates Africans from non-Africans, and the second principal component often maps onto a west-east axis from Europe to the New World. Subsequent principal components can often be useful in smoking out fine scale distinctions, or relationships which are confused by the existence of similar but different signals in admixed populations.

The interpretation of this plot is rather easy. You see that African Americans lie along a continuum between Nigerians and Europeans, skewed toward Nigerians, with some outliers toward East Asians. We know from other genetic findings that ~20% of the African American ancestral quanta is European, but that quanta is not equally distributed across the population. ~10% of the African American population is more than 50% European in ancestry, while 90% is less than 50% European. And so you have a distribution which reflects this variation. As for the outliers, I will speculate and suggest that these are indications of Native American ancestry among some African Americans.
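The geometry described above can be reproduced with a toy simulation: two hypothetical source populations, plus admixed individuals whose genotypes are drawn from allele frequencies mixed in proportion to each person’s own ancestry fraction q. PCA here is plain singular value decomposition of the mean-centered genotype matrix (tools such as EIGENSOFT’s smartpca also rescale each SNP, which this skips). All frequencies and the Beta(4, 2) distribution of q are assumptions for illustration, not the data behind the plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 2_000
# Hypothetical allele frequencies for two source populations.
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.01, 0.99)

def sample_genotypes(freqs, n):
    # Diploid genotypes coded as allele counts (0, 1 or 2) at each SNP.
    return rng.binomial(2, freqs, size=(n, freqs.size))

pop_a = sample_genotypes(freq_a, 50)
pop_b = sample_genotypes(freq_b, 50)
# Admixed individuals: each has its own fraction q of ancestry from source A.
q = rng.beta(4, 2, 50)
admixed = rng.binomial(2, q[:, None] * freq_a + (1 - q[:, None]) * freq_b)

genotypes = np.vstack([pop_a, pop_b, admixed]).astype(float)
centered = genotypes - genotypes.mean(axis=0)
# SVD of the centered matrix: singular values come out sorted, so the
# components are already rank ordered by the variance they explain.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s
explained = s**2 / np.sum(s**2)
print("share of variance on PC1 and PC2:", explained[:2].round(3))

pc1_admixed = scores[100:, 0]
r = np.corrcoef(pc1_admixed, q)[0, 1]
print(f"|correlation| between PC1 and ancestry fraction q: {abs(r):.3f}")
```

The sign of a principal component is arbitrary, hence the absolute value; what matters is that each admixed individual’s PC1 position tracks its ancestry fraction.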

The story I presented above is probably plausible as an explanation of the visual because we have a wealth of historical data to corroborate the plausibility of that narrative. The fit between the results from the technique of analysis of genetic variation and what scholars have long inferred from textual sources is relatively easy. It is far more difficult to look at a PCA plot, and generate a plausible narrative that you yourself accept with a high degree of confidence with little external support. It is with that caveat in mind that I present Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping:

High-throughput genotyping data are useful for making inferences about human evolutionary history. However, the populations sampled to date are unevenly distributed, and some areas (e.g., South and Central Asia) have rarely been sampled in large-scale studies. To assess human genetic variation more evenly, we sampled 296 individuals from 13 worldwide populations that are not covered by previous studies. By combining these samples with a data set from our laboratory and the HapMap II samples, we assembled a final dataset of ~ 250,000 SNPs in 850 individuals from 40 populations. With more uniform sampling, the estimate of global genetic differentiation (FST) substantially decreases from ~ 16% with the HapMap II samples to ~ 11%. A panel of copy number variations typed in the same populations shows patterns of diversity similar to the SNP data, with highest diversity in African populations. This unique sample collection also permits new inferences about human evolutionary history. The comparison of haplotype variation among populations supports a single out-of-Africa migration event and suggests that the founding population of Eurasia may have been relatively large but isolated from Africans for a period of time. We also found a substantial affinity between populations from central Asia (Kyrgyzstani and Mongolian Buryat) and America, suggesting a central Asian contribution to New World founder populations.

The studies which came out of the original HapMap had northern Europeans, Yoruba from Nigeria, and Chinese & Japanese. These three populations can tell us a lot, but there’s something lacking in the coverage. The HGDP sample is better. But specifically because of political considerations it was not feasible to collect Indian samples, so Pakistani ones are used in their stead. Additionally, the HGDP sample is a touch biased toward isolated and distinctive populations, such as the Kalash of Pakistan. This genetic distinctiveness is important to catalog because it is fast disappearing. But the Kalash are so unique because of their long history of isolation, so one can’t really use them as a proxy population for Pakistanis, as one could with Sindhis. The POPRES sample seems to complement the HGDP well, but I don’t see it being used so much. Since the next phase of the HapMap has more populations, some of the deficiencies which emerged with the utilization of just three terminal groups (in a World Island context) will soon no longer be an issue.

But until that time it’s nice when studies come out which close some of the gaps in our knowledge of worldwide genetic variation. This is one such study. I’m somewhat familiar with the samples already because I’ve seen them in an analysis of Indian populations. The sample seems somewhat skewed toward South and Southeast Asian populations, but hey, these are groups which need to draw the long straw sometimes as well.

Before I go any further I should mention that they use a SNP-chip with hundreds of thousands of markers. Additionally, they looked at copy number variation. These are two rather different types of variation within the genome, probably examined to double-check that the outcomes were the same. Population historical events which shape patterns of genomic variation would presumably have a similar large-scale effect on both types of variation. In their results that checked out, or so they claim; the paper is a manuscript without the supplements attached.

Though there’s some interesting fine-grained analysis to be had, they draw some macro-scale and deep time inferences as well. First, you probably know the famous fact that 15% of variation in genes is between races, and 85% within races. That’s derived from the Fst statistic, which is basically partitioning between and within population variance across two populations. Obviously the value of Fst varies by the set of populations you’re comparing. That between Mbuti Pygmies and Japanese is far higher than between Chinese and Japanese. Using the HapMap the Fst was 16%. About what you’d expect. To equalize sample sizes with the HapMap they randomly selected individuals from a pooled set grouped by continent from their populations, and calculated Fst. They found values around 11%. Why the difference? Because their data set included populations which were between the three clusters within the HapMap.
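Why intermediate populations pull Fst down can be seen in a stripped-down example: populations placed evenly along a linear allele frequency cline between two “terminal” groups. This is not the authors’ procedure (they pooled individuals by continent); it just isolates the intuition. The multi-population Fst is the same Nei-style ratio of H_T - H_S to H_T, and all frequencies are hypothetical.

```python
def multi_pop_fst(freq_table):
    """Nei-style Fst across several populations, as sum(H_T - H_S) / sum(H_T).

    freq_table[k][j] is the allele frequency at locus j in population k."""
    numerator = denominator = 0.0
    for locus in zip(*freq_table):
        h_s = sum(2 * p * (1 - p) for p in locus) / len(locus)
        p_bar = sum(locus) / len(locus)
        h_t = 2 * p_bar * (1 - p_bar)
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator

def cline(p_start, p_end, n_pops):
    """Allele frequencies for n_pops hypothetical populations evenly spaced on a cline."""
    return [p_start + (p_end - p_start) * k / (n_pops - 1) for k in range(n_pops)]

# Hypothetical frequencies at three loci in the two "terminal" populations.
ends = [(0.1, 0.9), (0.2, 0.7), (0.5, 0.95)]
terminal_only = [[a for a, _ in ends], [b for _, b in ends]]
with_intermediates = [list(pop) for pop in zip(*(cline(a, b, 7) for a, b in ends))]

fst_terminal = multi_pop_fst(terminal_only)
fst_cline = multi_pop_fst(with_intermediates)
print(f"Fst, terminal populations only:     {fst_terminal:.3f}")
print(f"Fst, with intermediate populations: {fst_cline:.3f}")
```

H_T - H_S equals twice the variance of allele frequencies across populations, and evenly spaced points along a line vary less around the midpoint than the two endpoints alone, so Fst falls; with seven populations it drops to 4/9 of the endpoints-only value.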

This is naturally not a surprising result at all, but it does reiterate one issue which sometimes crops up: Platonism in relation to race. The northern European whites in the HapMaps are the whites par excellence. Turks, who are perhaps more centrally located in the genetic variation of West Eurasian and North African peoples, what used to be termed “Caucasoid,” are “less white.” Similarly, Nigerians are more African than Ethiopians. Chinese and Japanese are more Asian than Burmese. And so forth. When modeling between group differences there is I think a somewhat old-fashioned tendency to consider some populations racial archetypes. That modulates the input which modifies the results somewhat. The analytical techniques may be as cold as stone, but they are used by flesh and blood human beings.

There is also some funny business going on with haplotype and SNP heterozygosities which I think needs to be highlighted, and speaks to the fact that SNP-chips are not perfect. They’re tools, and human tools are impacted by arbitrary or instrumental choices humans make. Let me quote:

We also compared the SNP and haplotype heterozygosity values in each population (Figure 2B). These two quantities are generally highly correlated, although there are several exceptions: First, SNP heterozygosity is higher than haplotype heterozygosity in European and Central Asian populations. This may reflect a SNP ascertainment bias, since many of these polymorphisms were historically selected to maximize heterozygosity in European populations. Second, the Pygmy sample shows a low SNP heterozygosity despite relatively high haplotype heterozygosity. This unusual pattern could be caused by stronger effects of SNP ascertainment bias in this population than in others. Indeed, a recent study of Khoisan individuals (another hunter-gatherer group from Africa) showed a similar pattern: despite high SNP heterozygosity (~60%) in whole-genome sequence data, a Khoisan individual showed low heterozygosity on the SNP microarray genotypes (~22%). Alternatively, this difference could also reflect unique attributes of population history.

In plain English, the gene chips were designed with Europeans in mind, so they don’t necessarily pick up all the variation in non-European groups, who are, believe it or not, genetically different. This issue cropped up (as alluded to in the above text) with the recent paper which sequenced some Bushmen as well as Desmond Tutu. The Bushmen have a lot of variation, this is well known, but they have variation at markers where Europeans don’t, and if Europeans don’t, the chips may not look for polymorphism at that locus. This sort of thing probably doesn’t affect broad population relationships, but if you want to zoom in and do analysis which is sensitive to fine distinctions and quantitative differences, then it might be problematic.
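A toy simulation of that ascertainment problem: every locus below is variable in a “target” population, but only about half are shared with the “discovery” population, and a locus makes it onto the hypothetical chip only if a small discovery panel of eight haplotypes happens to carry both alleles. The frequencies, the 50/50 split, and the panel size are invented for illustration.

```python
import random

def heterozygosity(p):
    return 2 * p * (1 - p)

def simulate_ascertainment(n_loci=20_000, panel_size=8, seed=7):
    """Toy model of SNP-chip ascertainment bias; all numbers are hypothetical.

    Every locus is variable in the target population; about half are also
    variable in the discovery population, the rest are private to the target.
    A locus goes on the chip only if the discovery panel carries both alleles."""
    rng = random.Random(seed)
    chip = []
    for _ in range(n_loci):
        p_target = rng.uniform(0.01, 0.99)
        p_discovery = rng.uniform(0.01, 0.99) if rng.random() < 0.5 else 0.0
        carriers = sum(rng.random() < p_discovery for _ in range(panel_size))
        if 0 < carriers < panel_size:
            chip.append((p_discovery, p_target))
    het_discovery = sum(heterozygosity(d) for d, _ in chip) / len(chip)
    het_target = sum(heterozygosity(t) for _, t in chip) / len(chip)
    return len(chip) / n_loci, het_discovery, het_target

share_on_chip, het_discovery, het_target = simulate_ascertainment()
print(f"share of target's variable sites on the chip: {share_on_chip:.2f}")
print(f"mean heterozygosity at chip SNPs, discovery population: {het_discovery:.3f}")
print(f"mean heterozygosity at chip SNPs, target population:    {het_target:.3f}")
```

At the chip SNPs the discovery population shows higher heterozygosity than the target, even though the target has roughly twice as many variable sites, because screening in a small panel favors intermediate frequencies in that panel’s population and simply misses the target’s private variants.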

Let’s jump to the pretty charts. First, a PCA plot with all of the individuals from all of the populations:


Note that PC 1 accounts for nearly eight times as much variation as PC 2. This speaks to the African vs. non-African gap. Because their data set is relatively thick in “intermediate” groups you see a spectrum. The vertical axis is obviously mostly east-west. And here’s the accompanying bar plot derived from the ADMIXTURE program. K = putative ancestral populations.


With this many populations at K = 12 I think you could write a fantasy novel worthy of Tolkien. K = 4 is more realistic. Among the African populations you see likely Eurasian admixture in some eastern, and it seems Bushmen, individuals. In Eurasia itself you see a clinal gradation of admixture between putative ancestral components that seems to follow longitude rather well.
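For a sense of what K means in these bar plots: ADMIXTURE-style models treat each genotype as a binomial draw whose allele frequency is a mix of the K ancestral populations’ frequencies, weighted by that individual’s ancestry proportions, and then search for the proportions (Q) and frequencies (F) that maximize the likelihood. The sketch below only evaluates that log-likelihood for a given Q and F (dropping the constant binomial coefficient); it is not the ADMIXTURE program’s optimizer, and the simulated data are hypothetical.

```python
import numpy as np

def admixture_log_likelihood(genotypes, q, f):
    """Binomial log-likelihood of the admixture model, up to a constant.

    genotypes: (individuals x SNPs) allele counts in {0, 1, 2}
    q: (individuals x K) ancestry proportions, rows summing to 1
    f: (K x SNPs) allele frequencies in the K putative ancestral populations
    """
    p = np.clip(q @ f, 1e-9, 1 - 1e-9)  # each individual's expected frequency per SNP
    g = np.asarray(genotypes, dtype=float)
    return float(np.sum(g * np.log(p) + (2 - g) * np.log(1 - p)))

rng = np.random.default_rng(42)
K, n_ind, n_snp = 2, 20, 1_000
f = rng.uniform(0.05, 0.95, size=(K, n_snp))
q_true = rng.dirichlet(np.ones(K), size=n_ind)
genotypes = rng.binomial(2, q_true @ f)
q_flat = np.full((n_ind, K), 1 / K)

ll_true = admixture_log_likelihood(genotypes, q_true, f)
ll_flat = admixture_log_likelihood(genotypes, q_flat, f)
print(f"log-likelihood, true proportions: {ll_true:.1f}")
print(f"log-likelihood, flat 50/50 guess: {ll_flat:.1f}")
```

Raising K always adds parameters and so never lowers the best attainable likelihood, which is part of why the choice of K leans on judgment.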

Because so much of the variation in the total sample is due to Africans, removing them from the picture will allow us to focus more on the relationships of the Eurasian groups. And so that’s exactly what they did. Note that focusing on the Eurasian groups does not mean simply magnifying or zooming in on the Eurasian section of the PCA plot, rather, the plots are regenerated with a subset of the previous genetic variation. In other words, the dimensions will shake out a bit differently.
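To see why regenerating the PCA on a subset differs from zooming in, here is a toy version with one strongly diverged hypothetical population and two closely related ones. The drift model (adding Gaussian noise to allele frequencies and clipping) is a crude stand-in chosen for brevity, and every number is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n_snps, n_per_pop = 3_000, 40

def perturb(freqs, sd):
    # Crude stand-in for drift: jitter allele frequencies and keep them in range.
    return np.clip(freqs + rng.normal(0, sd, freqs.size), 0.01, 0.99)

ancestral = rng.uniform(0.1, 0.9, n_snps)
outgroup = perturb(ancestral, 0.25)  # the strongly diverged population
shared_root = perturb(ancestral, 0.05)
group_1, group_2 = perturb(shared_root, 0.1), perturb(shared_root, 0.1)

labels = np.repeat([0, 1, 2], n_per_pop)
genotypes = np.vstack(
    [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in (outgroup, group_1, group_2)]
).astype(float)

def first_pc(x):
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def separation(pc, lab, a, b):
    # Gap between the two groups' means on one axis, in within-group SD units.
    xa, xb = pc[lab == a], pc[lab == b]
    return abs(xa.mean() - xb.mean()) / np.sqrt((xa.var() + xb.var()) / 2)

keep = labels > 0
sep_full = separation(first_pc(genotypes), labels, 1, 2)
sep_subset = separation(first_pc(genotypes[keep]), labels[keep], 1, 2)
print(f"groups 1 vs 2 on PC1, all three populations: {sep_full:.2f}")
print(f"groups 1 vs 2 on PC1, recomputed without the outgroup: {sep_subset:.2f}")
```

With the outgroup present, PC1 is spent on the big split and the two related groups barely separate on it; recomputed without the outgroup, PC1 becomes the axis that distinguishes them.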

The first plot shows Eurasian populations as a whole. The second removes Europeans and Near Easterners.


Notice again the scale. The vast majority of the variance seems to be east-west. But there is a noticeable north-south split. For the South Asian population it looks like they had Pakistanis who were farmers of modest means (Arain), high caste South Indians, and very low caste or tribal South Indians. For this Indian sample there’s a problem, and it’s the same problem which plagued the Up Series: they are looking at the very top and bottom of Indian society and ignoring the middle. Presumably the middle is going to be somewhere in the middle genetically as well, but nevertheless that’s something to consider in a paper which presumes to fill in the patchiness of others. In contrast, the Nepali sample was notably ethnically diverse, including both the dominant Indo-Aryan segment and the Tibeto-Burman Newar.

In the first panel there are some curious patterns with the Southeast Asian groups. Culturally, as in language and history, the Thai and Vietnamese have relatively recent roots in the southern regions of modern China. The Dai of Yunnan are the same people in origin as the Thai of Thailand and the Lao of Laos. Both derive from migrations from Yunnan. This is historically attested, even if somewhat fragmentarily. The heartland of the Vietnamese was in the Red River valley and north into southern China, and they spread down the coast and toward the Mekong only within the last 1,000 years. Southeast Asia was not uninhabited during this period. It was dominated by the Khmer Empire, which was slowly consumed by the expanding Thai and Vietnamese polities. Some scholars argue that French colonialism actually preserved an independent Khmer nation, which otherwise would have been divided between Thailand and Vietnam, as Poland was between Germany and Russia. So the Khmer are the indigenous people, while the Thai and Vietnamese are intrusive.

What do the PCA plots tell us? I do not know where the Vietnamese samples were collected. If they were from South Vietnam, then their close position to the Chinese suggests to me that there was substantial demographic replacement or expansion from the Red River valley. In contrast, the Thai are relatively distant from the Chinese. In fact, the Cambodians are somewhat closer to the Chinese! The samples here are small, and the sets overlap, so I wouldn’t put too much stock in that. But Thailand is geographically closer to South Asia, so isolation by distance models would predict this pattern. It seems that the ethnogenesis of the Thai occurred through the expansion of the Thai identity, likely among Khmer peoples. And it is intriguing that the Iban, an indigenous people of Borneo, are closer to the Vietnamese than they are to the Cambodians. We know that there was substantial migration between coastal Vietnam and Maritime Southeast Asia; the Chams of central Vietnam, who were dominant in the southern half of the nation before the Vietnamese expansion, are a Malayan people who may have migrated from Borneo.

Shifting to the second panel, there’s more here to say about the South Asians. First, geography. The two lower caste groups are actually Dalits from Andhra Pradesh, a South Indian state. Dalits used to be called outcastes, so they aren’t even lower caste, but without caste. The upper caste groups are Brahmins from Andhra Pradesh and Tamil Nadu. Finally, the Irula are tribal people from Tamil Nadu. To me the tribal samples often produce weird results, and I suspect that has to do with population bottlenecks and their demographic isolation. People leave the tribes (becoming part of Hindu society, or converting to Islam or Christianity), but few join them. The Pakistani samples are Arain, a group of conventional Punjabi farmers who have a made-up ancestry from Arabs (obviously made up because they don’t cluster with Near Easterners). Let’s compare to a chart from Reich et al.:


It seems to me that they’re in rough agreement (Reich et al. use the same two low caste groups from Andhra Pradesh to represent low caste South Indians, by the way). Though South Indian Brahmins speak South Indian languages, and reside amongst other South Indian groups, their genetic heritage is somewhat different. Similarly, tribal peoples are also distinct from caste Hindus. Reich et al. posit that South Asians can be modeled as a composite of two groups, Ancestral North Indians, ANI, and Ancestral South Indians, ASI. Presumably the former are intrusive to the subcontinent in relation to the latter. There seem to be two clear dimensions along which the ratio of ANI to ASI varies: geography and caste. The proportion of ASI seems to increase from the northwest to the southeast. And the proportion of ANI seems to increase from tribal to low caste to upper caste. The Pakistani sample does not seem to be from an elite caste (or it does not seem they were converted from an elite caste), but they have more affinity with West Eurasian populations than South Indian Brahmins do. It is likely that the latter are intrusive to the south, and have admixed with the local population.
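The ANI-ASI idea is a linear mixture: a population’s allele frequencies are modeled as a weighted average of two ancestral populations’ frequencies. If you had frequencies for both sources, the mixing weight could be estimated by least squares, as in this sketch. Reich et al. had no unmixed ANI or ASI samples to plug in, so they relied on more indirect statistics; this only illustrates the mixture model itself, on hypothetical data.

```python
import numpy as np

def two_way_admixture_fraction(p_target, p_source1, p_source2):
    """Least-squares alpha in p_target ≈ alpha * p_source1 + (1 - alpha) * p_source2."""
    x = np.asarray(p_source1) - np.asarray(p_source2)
    y = np.asarray(p_target) - np.asarray(p_source2)
    # Minimizing ||y - alpha * x||^2 gives alpha = (x . y) / (x . x).
    alpha = float(x @ y / (x @ x))
    return min(max(alpha, 0.0), 1.0)  # proportions must lie in [0, 1]

# Hypothetical data: a target population that is 40% "source 1".
rng = np.random.default_rng(11)
p1 = rng.uniform(0.05, 0.95, 5_000)
p2 = rng.uniform(0.05, 0.95, 5_000)
target = np.clip(0.4 * p1 + 0.6 * p2 + rng.normal(0, 0.02, 5_000), 0, 1)
alpha_hat = two_way_admixture_fraction(target, p1, p2)
print(f"estimated share from source 1: {alpha_hat:.3f}")
```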

Finally, a word on the Nepali sample. On top of the ANI-ASI mixture, the Nepali groups have varying levels of Tibeto-Burman, and so East Asian, affinity. This is not a surprise if you have met Nepalis. The Assamese, and to a lesser extent Bengalis, also exhibit this pattern of Tibeto-Burman admixture. The Brahmins of Nepal are intrusive like the Brahmins of South India, and like the South Indians they admixed with the local substrate.

Next let’s move to an ADMIXTURE plot.

[Figure: ADMIXTURE plot]

The selection of a particular K is obviously conditioned by the patterns which “fit” with what you know, and what you expect. With that caution aired, the population represented by red can easily be thought of as a Middle Eastern group which expanded with agriculture; that seems to be what the authors favor. The brown population is the modal Indian ancestral population, which has little presence outside the subcontinent (nice color coding, by the way! Brown people are brown). Green represents a population on which the tribal Irula are heavily weighted. This reminds me too much of the Kalash. I suspect that the Irula went through a bottleneck or some other distinctive demographic event, and that some Irula have since assimilated into various low status groups in South India.
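The model behind an ADMIXTURE plot treats each individual as drawn from K ancestral clusters: at each SNP, the chance of carrying the counted allele is the individual’s ancestry proportions weighted by the clusters’ allele frequencies. Below is a minimal sketch of that binomial likelihood plus one simple EM update for a single person’s proportions with cluster frequencies held fixed. It is a simplification, not ADMIXTURE’s code: the program itself also estimates the frequencies and uses a faster optimizer, and the function names and synthetic data here are hypothetical.

```python
import math
import random


def genotype_loglik(genotypes, q, freqs):
    """Binomial log-likelihood of one individual's genotypes (0, 1, 2 copies of the
    counted allele) given ancestry proportions q and freqs[k][j], the allele
    frequency of SNP j in cluster k."""
    ll = 0.0
    for j, g in enumerate(genotypes):
        p = sum(q[k] * freqs[k][j] for k in range(len(q)))
        ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def em_update_q(genotypes, q, freqs):
    """One EM step for q with cluster frequencies fixed; never lowers the likelihood."""
    K, J = len(q), len(genotypes)
    new_q = [0.0] * K
    for j, g in enumerate(genotypes):
        p = sum(q[k] * freqs[k][j] for k in range(K))
        for k in range(K):
            # Expected number of this person's two alleles at SNP j that came from cluster k.
            new_q[k] += g * q[k] * freqs[k][j] / p
            new_q[k] += (2 - g) * q[k] * (1 - freqs[k][j]) / (1 - p)
    return [x / (2 * J) for x in new_q]


# Hypothetical synthetic data: K = 3 clusters, 500 SNPs, true q = (0.7, 0.2, 0.1).
rng = random.Random(42)
K, J = 3, 500
freqs = [[rng.uniform(0.05, 0.95) for _ in range(J)] for _ in range(K)]
true_q = [0.7, 0.2, 0.1]
genotypes = []
for j in range(J):
    p = sum(true_q[k] * freqs[k][j] for k in range(K))
    genotypes.append((rng.random() < p) + (rng.random() < p))  # two allele draws

q_hat = [1 / K] * K
for _ in range(200):
    q_hat = em_update_q(genotypes, q_hat, freqs)
print([round(x, 2) for x in q_hat])
```

Nothing in this likelihood says which K is “right”: a model with more clusters can always reproduce one with fewer, so the maximized likelihood never gets worse as K grows, and choosing K remains partly a judgment call.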

I’m not a fantasist intent on world-building, so I’ll stop reading the tea leaves of the charts there. But there’s an important section which I skipped over, and will move back to now. And that’s the deep time aspect:

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation. Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals. The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.
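The “sufficient time for genetic drift” argument can be made concrete with the textbook expectation for an idealized population of constant effective size N: heterozygosity shrinks by a factor of (1 - 1/2N) each generation. This is a minimal sketch that ignores mutation, migration, and changing population size; the 25-year generation time and the N values are assumptions for illustration, while the 20,000-year span comes from the quoted passage.

```python
def expected_heterozygosity(h0, n_e, generations):
    """Expected heterozygosity after `generations` of pure drift in an idealized
    (Wright-Fisher) population of constant effective size n_e:
        H_t = H_0 * (1 - 1 / (2 * n_e)) ** t
    No mutation, migration, or selection is modeled."""
    return h0 * (1 - 1 / (2 * n_e)) ** generations


# Hypothetical: ~20,000 years of isolation at an assumed 25 years per generation.
GENERATIONS = 20_000 // 25  # 800
for n_e in (500, 1_000, 5_000):
    remaining = expected_heterozygosity(1.0, n_e, GENERATIONS)
    print(f"N_e = {n_e:>5}: {remaining:.1%} of starting heterozygosity retained")
```

The point is qualitative: a population numbering in the hundreds or low thousands, isolated for hundreds of generations, can lose a large share of its variation without ever passing through a handful of founders.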

Like the vast majority of genetic studies, this work supports the Out of Africa hypothesis. Non-Africans are all branches from a specific African branch, or more accurately, an African branch which left Africa. The reduction in heterozygosity, a measure of genetic variation, from Africans to Eurasians was large. Additionally, within Africa south of the Sahara there’s little difference in heterozygosity as a function of geography, but outside of Africa it drops off as a function of distance from Africa. A plausible model, then, is a radiation from a small ancestral population to the four corners of the world, going through a series of bottlenecks along the way. Or at least that’s a model supported by genomic data. But the drop in heterozygosity is so great that a quick separation from the parental African population would require an implausibly small number of founders (fewer than 10 in one generation). So, to explain the data, the authors suggest that the original population was not quite so small, but was isolated from the large African population for thousands of years. They assume genetic drift reduced heterozygosity, but if the model is correct I suspect that what actually happened was that bottlenecks due to climatic fluctuations swept away a lot of the genetic variation. And in the interregnum the isolated population may have interbred with Neandertals. In fact, perhaps they picked up genes from Neandertals when their own effective population was extremely small.
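The trade-off between “a few founders, quickly” and “a modest population, for a long time” can be sketched by inverting the same drift expectation, H_t / H_0 = (1 - 1/2N)^t, to ask what constant size N yields a given loss over t generations. The 80% retention target below is purely illustrative, not a figure from the paper, and the paper’s own “fewer than 10 founders” estimate comes from its analysis, not from this formula; the function name is hypothetical.

```python
import math


def constant_size_for_retention(retained_fraction, generations):
    """Constant effective size N that leaves `retained_fraction` of the starting
    heterozygosity after `generations` of drift, solving
        retained_fraction = (1 - 1 / (2 * N)) ** generations
    for N. expm1 keeps the per-generation loss accurate when it is tiny."""
    if not 0 < retained_fraction < 1 or generations < 1:
        raise ValueError("need 0 < retained_fraction < 1 and generations >= 1")
    loss_per_generation = -math.expm1(math.log(retained_fraction) / generations)
    return 1 / (2 * loss_per_generation)


# Hypothetical target: keep 80% of heterozygosity (an illustrative number, not the paper's).
for t in (1, 100, 1_000):
    print(t, round(constant_size_for_retention(0.8, t), 1))
```

Squeezing the loss into a single generation demands a tiny population (2.5 individuals for this target), whereas spreading it over a thousand generations is compatible with a population in the low thousands, which is the logic behind the isolation hypothesis.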

In any case, a wide-ranging paper. They manage to tie their results into two other blockbuster papers.

H/T Dienekes

Citation Xing J, Watkins WS, Shlien A, Walker E, Huff CD, Witherspoon DJ, Zhang Y, Simonson TS, Weiss RB, Schiffman JD, Malkin D, Woodward SR, & Jorde LB (2010). Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping. Genomics PMID: 20643205

May 7, 2010

The three layers of the Neandertal cake

I assume by now that everyone has read A Draft Sequence of the Neandertal Genome. It’s free to all, so you should. At least look at the figures. Also, if you haven’t at least skimmed the supplement, you should do that as well. It’s nearly 200 pages, and basically feels more like a collection of minimally edited papers than anything else. There’s no point in me reviewing the paper, since you can read it, and plenty of others have hit the relevant ground already.

Since there seem to be three main segments of the paper, here are a few minimal thoughts on each.

First, the draft genome. What would you have said if someone came up to you ten years ago and told you that you’d live to see this? Svante Paabo himself admitted he didn’t think he’d see something like this in his lifetime. A lot of hard work went into figuring out how to get at the genetic material, purify it, and confirm that it was actually from the samples in question and not handler contamination and such (remember that there was a problem with contamination a few years back). To a great extent the focus on the results, instead of the methods, is like critiquing a set of landscape photographs taken from a very high peak. We can’t forget the effort and energy that went into scaling the peak itself. A lot of labor obviously went into this, but we can also thank the fact that we live in a technological society where progress is not only expected, but often can’t be accounted for in our projections of future possibilities. I think that’s a very hopeful thing which makes me a little less pessimistic about the possibility of the magic carpet economy.

Second, there are the comparisons between Neandertals, modern humans, and chimpanzees. As Carl Zimmer noted, there is an alphabet soup of genes thrown at you in the results. It is hard to make sense of it all, though I did note that genes involved in skin function and phenotype seem to have been the subject of differential evolution between Neandertals and modern humans (i.e., SNP differences reflecting substitutions along each lineage). We already know that there are suggestive signs that Neandertals lost function at pigmentation loci independently of modern humans. That shouldn’t be too surprising, given that it seems that West and East Eurasians evolved light skin independently. There are some uncertainties about the timing of this, but the different genetic architecture implies that it was unlikely to have occurred immediately after the Out-of-Africa event, and in fact some of the loci imply that depigmentation may have occurred in the Holocene. Skin is famously our biggest organ, so it shouldn’t be that shocking that it is possibly a target of selection, but it is curious nonetheless (recall that it seems that humans evolved darker skin from a paler ancestor as we lost our fur in the tropics).

Additionally, I think the finding that Neandertals and modern humans seem to share most of the same HARs, regions of the genome where our human lineage seems to differ from other mammals in exhibiting a lot of evolutionary change, is of great interest, though not necessarily surprising. When pointing to Luke Jostins’ post on rates of encephalization, I observed that in some ways it seems like there was a very powerful and consistent lineage-specific trend toward greater cranial capacity which had incredible time depth. In The Dawn of Human Culture Richard Klein puts the emphasis on the sharp break between populations before ~50 thousand years ago and those after. This period is marked by the shift toward behavioral, as opposed to just anatomical, modernity (there were anatomically modern humans in Africa ~200 thousand years ago). Klein’s thesis is that some mutation triggered a radical biocultural change and was responsible for the Great Leap Forward, the efflorescence of creative symbolic culture which we truly consider the sine qua non of culture. The sharing of HARs between Neandertals and pure humans, and the consistent trend toward encephalization (aside from the post-Ice Age reversal), makes me shift the priors a touch more toward inevitable continuity and away from contingency. I find much of the politics of Robert J. Sawyer’s Neanderthal Parallax series a bit heavy-handed, but his depiction of Neandertals as fundamentally intelligent creatures who differ only on the margins seems a lot more plausible to me now than it was when I first read it in the early to mid-aughts.

Third, and finally, there’s the story of admixture and sex. This is getting all the press, but of course it is the most uncertain, inferential, and speculative aspect of the paper. It’s impressive, but it should be open to skepticism, especially after the Out-of-Africa totalism which was ascendant until recently. John Hawks accepts the thrust of the findings, but obviously has his own ideas as to modifications, extensions and qualifications. Dienekes Pontikos favors an alternative interpretation of the data, which the authors point to in the text but dismiss as less parsimonious. My own inclination is to side with the authors on the question of parsimony, but I will admit that this assertion is disputable. Dienekes and others would suggest that it is just as plausible, or more so, that the shared variants between non-Africans and Neandertals arise from their common northeast African ancestral population (or some ancestral population of non-Africans and Neandertals). He rightly points out that there may be ancient population substructure within Africa, and using a particular African group as a “reference” for the whole continent may lead to false inferences. The main issue is that the probability of retrieving ancient DNA from northeast African samples in the near future seems low because the conditions for preservation are not optimal (tropical climates famously degrade and recycle biological material more efficiently than temperate or boreal climates). Additionally, using modern northeast African populations is somewhat problematic because there has clearly been some back-migration from nearby Arabian populations into this area in the medium-term past (many of the languages of the Ethiopian highlands are Semitic).
One supposes that one could differentiate between the African and Arabian components of the genome of Ethiopians and Somalis, but if the admixture event was two to three thousand years ago I presume it would be technically more challenging than in African Americans, where so few generations have passed since admixture that recombination has not yet fragmented the long genomic regions attributable to one ancestral population. In other words, how do you distinguish Neandertal variants which arrived back from Eurasia from ancient African ones? (I suppose that the haplotypes would differ, so that the genuinely African ones would be more diverse.)
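Why older admixture is harder to dissect can be sketched with a standard rough approximation for a single admixture pulse: tracts from a source contributing a fraction m of the genome end at crossovers that switch into the other ancestry, at about T(1 - m) switches per Morgan after T generations, so the mean tract length is roughly 100 / (T(1 - m)) centimorgans. The generation counts and ancestry fractions below are hypothetical illustrations, not estimates for any real population, and the function name is made up.

```python
def mean_tract_length_cm(generations_since_admixture, ancestry_fraction):
    """Rough mean length (cM) of uninterrupted tracts from one source population
    after a single admixture pulse.

    Approximation: tracts end at crossovers that land in the other ancestry,
    about T * (1 - m) per Morgan, so the mean length is 100 / (T * (1 - m)) cM.
    """
    if generations_since_admixture <= 0 or not 0 <= ancestry_fraction < 1:
        raise ValueError("need T > 0 and 0 <= m < 1")
    return 100.0 / (generations_since_admixture * (1 - ancestry_fraction))


# Hypothetical values, for illustration only:
# ~7 generations for a recent pulse contributing 20% of the genome,
# ~100 generations (~2,500 years at 25 years/generation) for an older pulse at 40%.
print(round(mean_tract_length_cm(7, 0.2), 1))
print(round(mean_tract_length_cm(100, 0.4), 1))
```

Tracts tens of centimorgans long are easy to assign to a source; tracts around a centimorgan, carrying only a few informative markers, are much harder, which is the intuition behind the worry above.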

But even if you reject the top-line finding, that most of us are not pure human, I think the paper is a game-changer in terms of shifting your priors when evaluating the plausibility of a result which suggests admixture from an ancient non-African population. I found out about the high likelihood of this paper just before the UNM results were presented at the American Association of Physical Anthropologists meeting, and it is clear in hindsight, given the large author list, that many people knew what was coming down the pipeline and had recalibrated their assessment of results which indicated admixture. It is perhaps time to go back and take a second look at papers you skipped over before because, lying outside the orthodox paradigm, they seemed likely to be spurious or to reflect a statistical quirk. This is clearly a case where it is good to live in interesting times.

Citation: Green, R., Krause, J., Briggs, A., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M., Hansen, N., Durand, E., Malaspinas, A., Jensen, J., Marques-Bonet, T., Alkan, C., Prufer, K., Meyer, M., Burbano, H., Good, J., Schultz, R., Aximu-Petri, A., Butthof, A., Hober, B., Hoffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Doronichev, V., Golovanova, L., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R., Johnson, P., Eichler, E., Falush, D., Birney, E., Mullikin, J., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., & Paabo, S. (2010). A Draft Sequence of the Neandertal Genome Science, 328 (5979), 710-722 DOI: 10.1126/science.1188021

April 26, 2010

Bayes & Out-of-Africa vs. Alan Templeton

Alan Templeton, whose text Population Genetics and Microevolutionary Theory is right below Hartl & Clark in my book, recently published a strongly worded paper, Coherent and incoherent inference in phylogeography and human evolution. The possibility of statistical errors in published work is not shocking; I have heard that when statisticians are asked to sort through papers in medical genetics journals, they find elementary errors in ~3/4 of those which have made it through peer review. That being said, Templeton seems to be making a stronger case than a simple refutation of basic errors. In particular, he is suggesting that the “ABC” method which lay at the heart of the paper I reviewed last week is incoherent at the root. Here’s Templeton’s abstract:

A hypothesis is nested within a more general hypothesis when it is a special case of the more general hypothesis. Composite hypotheses consist of more than one component, and in many cases different composite hypotheses can share some but not all of these components and hence are overlapping. In statistics, coherent measures of fit of nested and overlapping composite hypotheses are technically those measures that are consistent with the constraints of formal logic. For example, the probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is said to be incoherent. An example of incoherence is shown in human evolution, for which the approximate Bayesian computation (ABC) method assigned a probability to a model of human evolution that was a thousand-fold larger than a more general model within which the first model was fully nested. Possible causes of this incoherence are identified, and corrections and restrictions are suggested to make ABC and similar methods coherent. Another coalescent-based method, nested clade phylogeographic analysis, is coherent and also allows the testing of individual components of composite hypotheses, another attribute lacking in ABC and other coalescent-simulation approaches. Incoherence is a highly undesirable property because it means that the inference is mathematically incorrect and formally illogical, and the published incoherent inferences on human evolution that favor the out-of-Africa replacement hypothesis have no statistical or logical validity.
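The nested-model issue in the abstract can be seen in a toy rejection-ABC comparison. In Bayesian model choice each model carries its own prior, so a special case (a parameter fixed at one value) can receive a higher posterior probability than the general model containing it, because the general model spreads its prior over many values that fit the data poorly. Whether that counts as “incoherence” is exactly what is in dispute; this sketch, with hypothetical models, priors, and tolerance, only shows the mechanism and is not the ABC analysis Templeton criticizes.

```python
import math
import random


def abc_model_choice(observed_mean, n, draws, epsilon, seed=0):
    """Rejection-ABC posterior model probabilities for two toy models of a normal mean.

    M0 (special case): mu is fixed at 0.
    M1 (general model, contains M0): mu ~ Uniform(-10, 10).
    Summary statistic: the sample mean of n draws from Normal(mu, 1).
    """
    rng = random.Random(seed)
    accepted = {"M0": 0, "M1": 0}
    for _ in range(draws):
        model = rng.choice(["M0", "M1"])  # equal prior probability for each model
        mu = 0.0 if model == "M0" else rng.uniform(-10, 10)
        # The mean of n Normal(mu, 1) draws is itself Normal(mu, 1 / sqrt(n)).
        simulated_mean = rng.gauss(mu, 1 / math.sqrt(n))
        if abs(simulated_mean - observed_mean) < epsilon:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase draws or epsilon")
    return {m: c / total for m, c in accepted.items()}


print(abc_model_choice(observed_mean=0.1, n=20, draws=20_000, epsilon=0.1))
```

Here the special case M0 collects most of the posterior mass even though every value of mu it allows is also allowed by M1.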

The method which Templeton favors is naturally one which he has pushed in the past. In any case, I don’t know the statistical details well enough to comment with much authority, but I see that a statistician has responded to Templeton already, so I would recommend checking that out. I immediately went looking for responses because the paper uses really strong and dismissive language, and I am somewhat wary of that sort of thing when it is used to tear down the fundamentals of a whole field of research (I want to emphasize that overall I enjoy Templeton’s work, but the paper reminded me a bit too much of Jerry Fodor). His citation of Popper in particular seems an appeal to authority aimed at convincing the non-statisticians in the audience, and I don’t see the point of that beyond rhetorical utility. I do somewhat accept Templeton’s critique of models which assume very little gene flow between hominin populations before the Out-of-Africa migration, though from what I can tell Africa south of the Sahara has received relatively little back-migration over the past 50,000 years, so perhaps this is an older dynamic as well. I am cautiously optimistic that DNA extraction from fossils themselves may put to bed some of these arguments over the dance of parameters, though naturally interpretation is always an issue outside of pure mathematics.

For what it’s worth, here’s the model which Templeton’s method favors:

[Figure: the model of human evolution favored by Templeton’s method]

The thin lines represent continuous gene flow between populations, and the thick lines represent extremely strong demographic and genetic pulses which periodically overwhelm the genetic structure status quo. I have implied that something similar operates on the smaller scale of H. sapiens sapiens.
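One standard way to see why the “thin lines” of recurrent gene flow matter is Wright’s island-model approximation, F_ST ≈ 1 / (1 + 4Nm), where N is a deme’s effective size and m the fraction of migrants per generation. This is a minimal sketch of that equilibrium formula only; it assumes many equal-sized demes and says nothing about the pulses drawn as thick lines, and the sizes and rates below are hypothetical.

```python
def island_model_fst(n_e, m):
    """Wright's island-model equilibrium approximation F_ST ≈ 1 / (1 + 4 * N * m),
    with N the effective size of each deme and m the per-generation migrant fraction."""
    return 1 / (1 + 4 * n_e * m)


# Hypothetical deme size of 1,000 with migration rates spanning Nm = 0.1, 1, and 10.
for m in (0.0001, 0.001, 0.01):
    print(f"m = {m}: F_ST ≈ {island_model_fst(1_000, m):.3f}")
```

The familiar rule of thumb falls out of the formula: around one migrant per generation (Nm ≈ 1) already holds differentiation to a modest level, so populations linked by even thin lines of gene flow do not drift very far apart.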

Citation: Coherent and incoherent inference in phylogeography and human evolution, PNAS 2010 107 (14) 6376-6381; doi:10.1073/pnas.0910647107
