Razib Khan One-stop-shopping for all of my content

December 10, 2017

Visualizing intra-European phylogenetic distances

Filed under: Europe,European genetics,Population genetics,Population genomics — Razib Khan @ 4:53 pm
Neighbor-joining tree of genetic distances between populations


In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.

Roughly this is used to correlate with genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more and more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.

Today we have other techniques, Structure, Treemix, fineStructure, and various local ancestry packages.

But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there wasn’t that many samples).

Using results from the 2015 paper Massive migration from the steppe was a source for Indo-European languages in Europe, I visualized pairwise genetic distances for European populations, ancient and modern (Han Chinese as an outgroup), on a tree. What the results illustrate is that

  1. Ancient populations were very distinct in Europe from modern ones.
  2. Many modern groups are clustered close together.

The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.

Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….

November 27, 2017

Understanding prehistory through genetic inference and ancient DNA

Filed under: Ancient DNA,Population genomics — Razib Khan @ 8:23 pm

Before David Reich’s book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, I highly recommend a new preprint from Pontus Skoglund and Iain Mathieson*, Ancient genomics: a new view into human prehistory and evolution.

It’s basically at the sweet spot for a lot of readers: doesn’t overemphasize methods or archaeological minutiae that’s hard to follow. That being said I do think you would benefit if you read two things which would complement in those directions, First Farmers: The Origins of Agricultural Societies, and Ancient Admixture in Human History.

* I have to say, I consider Iain a friend, but am I the only one a bit perplexed by how a British person can have such a difficult to spell version of his name? I always have to look it up!

November 24, 2017

Soft selection for gentleness in Puerto Rican African Honeybees

Filed under: Population genetics,Population genomics,Soft Selection,Soft Sweep — Razib Khan @ 3:07 pm

When I was a kid “killer bees” were a major pop culture thing. There were movies about the bees, and we would get updates about their march northward in the news. They were a cautionary tale of our species’ hubris.

Today we have a little bit more perspective. These bees were actually just African honeybees, the ancestral population to European honeybees, which were introduced to the New World with Europeans centuries earlier than the African honeybees. African honeybees were not that different from European honeybees, but they were more aggressive and tended to outcompete European honeybee colonies. They are a major problem for the beekeeping industry, but not a major threat to human life.

Today the African and European populations in the United States seem to have stabilized in their ranges, with a hybrid zone between them. African bee’s migratory behavior makes them less competitive with European bees in colder climates.

A friend of mine once mentioned to me that if he had to do it all over again he would do research on the evolutionary genomics of Hymenoptera, and in particular bees. People care about bees. So it ‘s no surprise that I noticed this paper out in Nature Communications, A soft selective sweep during rapid evolution of gentle behavior in an Africanized honeybee:

Highly aggressive Africanized honeybees (AHB) invaded Puerto Rico (PR) in 1994, displacing gentle European honeybees (EHB) in many locations. Gentle AHB (gAHB), unknown anywhere else in the world, subsequently evolved on the island within a few generations. Here we sequence whole genomes from gAHB and EHB populations, as well as a North American AHB population, a likely source of the founder AHB on PR. We show that gAHB retains high levels of genetic diversity after evolution of gentle behaviour, despite selection on standing variation. We observe multiple genomic loci with significant signatures of selection. Rapid evolution during colonization of novel habitats can generate major changes to characteristics such as morphological or colouration traits, usually controlled by one or more major genetic loci. Here we describe a soft selective sweep, acting at multiple loci across the genome, that occurred during, and may have mediated, the rapid evolution of a behavioural trait.

Come for the bees, but stay for the soft selection! If you talk to anyone in evolutionary and population genomics you know that the future is in understanding patterns of soft selection and polygenic selection from standing variation. Though these are related phenomena which are associated with each other, all are all distinct.

Standing variation just refers to the diversity which is segregating in the population at any given time. At any given moment many loci exhibit polymorphism. This polymorphism can be a target of natural selection if it is correlated with heritable variation and differentials in fitness. Though soft selection can be quite wooly it’s inverse, hard selection, is clear: in genetic terms hard selection can be seen in allele frequency changes at a single variant in a locus, going from the point where it is a novel mutation to nearly fixed in the population. In Haldane’s original conception hard selection involved excess deaths, and imposed a limit on the rate of evolution as well as the amount variation you could expect within a given population. This model was convenient in the pre-genomic and early genomic era because empirical selection tests had to focus on large allele frequency changes around singular loci. Researchers didn’t have large numbers of whole-genome samples available (nor the computational ability to analyze them).

Today this is not a limitation. In the analysis above the authors had 30 individuals of the 3 populations sequenced at high quality (20x). They ended up with millions of genetic variants they could analyze.

The plot to the left shows that “gentle African honeybees” (gAHB) tend to be closer to the African honeybee populations (AHB) overall (though with some hybridization with European honeybees, EHB). This is not surprising.

But the key observation was that over 12 generations the African honeybees of Puerto Rico became progressively less aggressive, despite maintaining overall morphological similarities to the mainland Mexican African bees from which they likely derive. Though buried in the discussion, there is a rationale for why this morphological change may have occurred: the Puerto Rican bees are subject to a lot of negative selection against aggression because of the density of the island, as well as the reality that aside from humans there aren’t other many species where their aggressive tendencies are beneficial. Basically, if you are an aggressive colony, it’s harder to make a go in densely settled areas (the implication here then is that there are probably “gentle” African honeybee populations across Latin America, they just are never disaggregated from the broader meta-population).

Credit: Phillip Messer and Nandita Garud

It’s the genomics where the real evolutionary insight comes in: they found that there were multiple soft sweep events around genetic regions implicated in behavior. In their overall genome the gAHB of Puerto Rico resembled mainland AHB, but in this subset of genetic loci they resembled EHB. Many of these loci had also been known to be targets of selection when the original European bee population diverged from the ancestral African population. Basically this is a genomic illustration of convergent evolution.

Regular readers of this blog will recognize the ways they detected selection. They used a modified form of EHH, which is reasonable since the selection event was recent enough to have been associated with distinct haplotype blocks. Also, standard Fst analysis showed that these were outliers in relation to the broader genetic pattern of relatedness (these loci were more like EHB than AHB, while most loci were more like AHB than EHB).

So this a form of polygenic selection. Remember, natural selection only knows genes through the phenotype (with intra-genomic selection being an exception). A behavior like aggression is probably subject to the fourth law of behavior genetics. That is, variation won’t be defined around a single genetic locus. Rather, variation across the genome will be correlated with variation in the phenotype. As selection favors a particular value of the phenotype across the distribution the allele frequencies across many genetic loci will shift, but they will not necessarily fix. Polygenic selection operates on the dispersed standing genetic variation which explains much of the variation of the phenotype in question. Instead of total sweeps to fixation due to large fitness differences between a given allele and its alternative form, the selection impact is distributed and diffused across the genome.

Though most of the genetic variants seem to recapitulate the evolution of the less aggressive phenotype that occurred with the original migration north of African honeybees, some of the selection signatures were novel. This points to the reality that when you have soft selection on standing variation you may have similar phenotypes which evolve via different means. Additionally, the authors noted that these results were in contrast to controlled breeding experiments in mammals where selection for gentility (“domestication”) often targeted a few loci and exhibited strong pleiotropic effects (due to the genetic correlation). These results point to the limitations of inferences made from human-directed selection.

Soft selection is probably ubiquitous. Consider the evolution of skin color in humans. There are lots of variants and lots of variation, and most of the variation seems to be ancestral. Only at the locus SLC24A5 do you have a perfect illustration of a hard selective sweep, probably from a de novo mutation that emerged around the Last Glacial Maximum.

From a geneticists’ perspective evolution is basically conceived of as changes in allele frequencies over time. Much of this is due to natural selection. Now that the world of soft selection is opening up, I suspect that we’ll understand a lot more of what we see around us, at least in the generality.

Citation: A soft selective sweep during rapid evolution of gentle behaviour in an Africanized honeybee.

September 23, 2017

Africa, the churning continent

Martin Meredith’s The Fortunes of Africa glosses very quickly over one of the major reasons that the “great scramble” for the continent occurred in the late 19th century, the discovery of the usefulness of quinine as an anti-malarial agent. Perhaps because I’ve read Plagues and Peoples and The Retreat of the Elephants: An Environmental History of China, I have always been conscious of the role of disease in discouraging conquest and migration (malaria in Italy was also a way to limit the extent of long-term occupation).

The coastal regions of Africa had been subject to the trade and depredations of European actors for nearly 400 years when the Berlin Conference partitioned the continent amongst European powers. Despite the fact that much of the interior was not charted, there had long been a colonial presence. Accra, the modern capital of Ghana, was originally a 16th-century Portuguese fort, but for several centuries between the 17th and 19th centuries, it was actually a possession of Scandinavian powers, Sweden and Denmark! (before passing on to the British)

For all these centuries the heart of Africa was unknown to Europeans, in part because there were native powers blocking their way, but also because the mortality rates were so high for outsiders, as indicated above. It is no surprise that the main European settlement in Africa which was more than a simple trading fort was at the southern tip of the continent, where the climate was Mediterranean and so the disease burden low.

But once quinine, and machine guns, came into the equation the interior was accessible. It all happened rather quickly in a few decades, though in some cases European ‘colonialism’ involved little more than nominal allegiance of tribal chieftains.

Now A new paper in Cell may herald the beginning of a great genomic scramble to understand the history of Africa. Carl Zimmer in The New York Times has a piece up, Clues to Africa’s Mysterious Past Found in Ancient Skeletons. It begins:

It was only two years ago that researchers found the first ancient human genome in Africa: a skeleton in a cave in Ethiopia yielded DNA that turned out to be 4,500 years old.

On Thursday, an international team of scientists reported that they had recovered far older genes from bone fragments in Malawi dating back 8,100 years. The researchers also retrieved DNA from 15 other ancient people in eastern and southern Africa, and compared the genes to those of living Africans.

The general results of the paper, Skoglund et.al’s Reconstructing Prehistoric African Population Structure, was presented at the SMBE meeting this summer. So in broad sketches I was not surprised at the results, though the details require some digging into.

The Bantu Expansion repatterned the population structure of Africa

Between 1000 BC and 500 AD the expansion of iron wielding agriculturalists from the environs of modern day southern Cameroon reshaped the cultural and genetic landscape of Sub-Saharan Africa. The relatively late date of this expansion should give us a general sense of how careful we need to be about making assertions about “prehistoric Africa.” When Egypt’s New Kingdom was expanding southward along the Nile and into the Levant Sub-Saharan Africa was qualitatively very different from what we see today in both culture and genetic structure. The continent’s contemporary human geography does not have a deep time depth.

In any case, anyone who has worked with genetic data from Africa is struck by how similar Bantu-speaking populations are genetically. So these results are not surprising. South African Zulus occupy positions far closer to Kenyans and Congolese than they do to Khoisan peoples to the west of them facing the Kalahari. The Xhosa people on the cultural frontier of the Bantus in South Africa exhibit substantial admixture from Khoisan (to the point where they have even integrated clicks into their language!), but even they are preponderantly non-Khoisan.

By sampling ancient genomes from South Africa across a geographical transect which runs up the Rift Valley to Ethiopia Skoglund et al. show that before the Bantu Expansion there was a north-south genetic relatedness cline. When this result was presented at SMBE a few friends were quite excited that they were being presented a cline, as some researchers have felt that this particular lab group has a tendency to model everything as pulse admixtures between distinct groups. But the reasonably deep time transect in Malawi exhibited no variance in admixture fractions, which is indicative of the likelihood that its “mixed” status at a particular K cluster is simply an artifact (see this post for what’s going on).

One particular aspect of the results from Malawi is that they found no continuity between contemporary populations, Bantu agriculturalists, and these ancient hunter-gatherers. That is, hunter-gatherers were replaced in toto. This is not entirely surprising, as many researchers who have worked with European ancient DNA believe that hunter-gatherers in many areas left no descendants at all (the “hunter-gatherer” fractions in modern groups in a particular region are believed to be due to migration of mixed populations who obtained “hunter-gatherer” ancestry at another locale).

But the Bantus were not the first “intrusive” population

These results also have some moderate surprises. A Tanzanian sample from 1100 BC from a pastoralist context exhibits an ancestral mix which is Sub-Saharan African and West Eurasian/North African. More precisely, about 38 percent of this individual’s ancestry resembles that of the Pre-Pottery Neolithic culture of the Levant, and the rest of the genome most resembles a 4500 year old sample from Ethiopia.

This date is before the initiation of the Bantu Expansion. The genetic results in this work, and earlier publications, strongly points to the likelihood that this population(s) mediated the spread of pastoralism to the south and west. In particular, all Khoisan groups of southern Africa seem to have admixture from this group, more (Khoi) or less (San).

But a curious aspect of this result is that these early pastoralists do not carry any evidence of admixture from ancient eastern farmers from the Zagros region. That is, the West Eurasian gene flow into the Tanzanian pastoralists predates the great exchange/admixture in the Middle East between western and eastern lineages. Since that reciprocal gene flow seems to have occurred at least 2,000 years before the Tanzanian pastoralist’s time, it suggests that this West Eurasian element was in Africa for thousands of years.

The second important point to emphasize is that the Iranian-like component is found among Cushitic speaking Somali and Afar samples, at 15-20% clips. Looking at the supporting tables a wide range of East African populations have the Tanzanian pastoralist ancestry but do not show evidence of the Iranian-like ancestry, which is now ubiquitous in the Middle East, and presumably in the highlands of Ethiopia as well (which usually show somewhat higher levels of Eurasian ancestry than is the case on the coast, especially among Semitic language speakers).

This fact is important because many of the Nilotic peoples are reputed to have absorbed Cushitic groups relatively recently in the past. This is also true for Bantu speaking groups according to these and other data. Finally, the Sandawe, who speak a language with clicks, and so may have some affinity to Khoisan, are often stated to have Cushitic affinities (looking at the data they clearly have West Eurasian ancestry). But their Eurasian ancestry seems to lack the Iranian-like component as well.

None of the populations with putative Cushitic ancestry, but who lack Iranian-like ancestry, speak a Cushitic language (most speak Nilotic languages, but East African Bantus have mixed with these Nilotic groups, so they have the same ancestry). Therefore I wonder if these pastoralists spoke an Afro-Asiatic language in the first place.

A patchy landscape

The phylogenetic tree illustrates the relationships of various African populations without much recent Eurasian ancestry. In The New York Times article David Reich indicates that the Hadza people of Tanzania are the closest Sub-Saharan Africans to the lineage ancestral to non-Africans. This is actually a simplification of what you see in the paper, and is illustrated in the tree to the left. The 4500 year old Ethiopian sample, which does not have Eurasian ancestry, nevertheless is the closest of all Sub-Saharan groups to Eurasians. The Hadza have the highest fraction of this ancestral component of all Sub-Saharan Africans in their data set, but many other populations also carry this ancestry (the Tanzanian pastoralist combined the PPN ancestry with this element).

This was patchy landscape of inhabitation, because though the Tanzanian pastoralist ancestry, a combination of PPN and proto-Ethiopian, spread all the way to the Cape, there were populations, such as the Hadza and a 400 year old individual sampled from the Kenya island of Pemba, which lacked this genetic variation. Indeed, they are also not on the north-south (proto-Ethiopian to Khoisan) cline that featured so prominently above.

The sampling of ancient individuals is not very dense yet, so we can’t say much. But I think it does indicate we need to be cautious about assumpting gene flow dynamics as-the-crow-flies, simply a function of distance. Ecological suitability no doubt plays a strong role in how populations expand. The Bantus, for example, were stopped in South Africa by the fact that their agricultural toolkit was not suitable for the western half of the country. So when Europeans arrived in the 16th century the residents of the Cape where Khoi pastoralists.

The presence of the Hadza in Tanzania, or an individual of unmixed proto-Ethiopian ancestry on Pemba 400 years ago, indicates that the ethnic geography of East Africa has long been fluid and dynamic. There is no reason to suppose that the Hadza are not themselves migrants from further north, perhaps easily explaining why they are not on the north-south cline so evident from the ancient DNA.

The rise of Basal Humans

Several years ago researchers discovered that the first farmers of Europe, who descended from an Anatolian population, were in part derived from a group which split off very early from other Eurasian populations. This group was termed “Basal Eurasian” (BEu) because it was an outgroup to all other Eurasians, including European hunter-gatherers, East Asians, Oceanians, and the natives of the New World. Subsequent work has shown that the early Neolithic farmers of the Near East, whether they’re from the Levant or the Zagros, had about half their ancestry from this population.

No ancient genomes which are predominantly BEu have been discovered yet. The fact that populations on the cusp of the Holocene seem to have Basal Eurasian ancestry across the Middle East suggests that the admixture with hunter-gatherers related to those of Europe must have occurred during the Pleistocene. But Basal Eurasian is arguably the most parsimonious explanation of the shared drift patterns that we see.

Skoglund et al. suggest that there may be the necessity of a similar construct in Africa. They are not the first, Schlebusch et al. also suggested the necessity of this lineage in the supplements of their preprint on ancient South Africans. Within Skoglund et al. the authors see variation between the far West African Mende and the eastern West African Yoruba, where the latter exhibits closer affinity to East African populations than the former (this includes those such as the proto-Ethiopian with no Eurasian admixture). Additionally, the authors found that Khoisan groups share more alleles with populations in East Africa than they do with those in West Africa even when you account for admixture.

One model that can explain this variation is long range gene flow, so that there would be connections between various regions as a function of their distance. Another explanation is that West African populations are the product of a Basal Human (BHu) population which separated first, before the bifurcation of Khoisan from other human populations. This would reorder our understanding of who the most basal humans are. Additionally, it would align with long-standing work of deep lineages within Africa contributing a minor component of the continent’s ancestry.

As should be clear due to the tree above, BHu postdates the separation of African humans from Neanderthals. One does wonder about the relevance of the Moroccan “modern” human to these models.

Understanding culture from genetics and genetics from culture

The spread of the Bantus over 1500 years from one end of the continent to the other is perhaps one of the most important dynamics we can use to understanding the spread of farming more generally. The linguistic unity of the Bantus, or at least their affinity, suggests to us that the first farmers of Europe, who spread across much of the continent in 2500 years, probably exhibited the same pattern. The low levels of gene flow between hunter-gatherers and farmers, despite living in the same regions for thousands of years, can be illustrated with African examples (e.g., the Hadza vs. their Bantu neighbors).

We are rather in the early phase of understanding these dynamics. There are more remains to be found, perhaps in the dry fastness of the Sahara or Sahel? (though unfortunately political considerations may prevent excavation due to danger to archaeologists) The genetics will give us a general idea about the nature of genetic variation and how it arose, but robust cultural models also need to be developed which illustrate how these genetic patterns arose.

Citation: Reconstructing Prehistoric African Population Structure, Skoglund, Pontus et al. Cell , Volume 171 , Issue 1 , 59 – 71.e21

Powered by WordPress