Razib Khan One-stop-shopping for all of my content

May 21, 2018

The mutation accumulation controversy continues….

Filed under: Mutation,Population genetics — Razib Khan @ 10:56 pm

Every few years I check to see if the great mutation accumulation controversy has resolved itself. I don’t know if anyone calls it that, but that’s what I think of it as. There are two major issues that matter here: mutation rates are a critical parameter in evolutionary models, and mutation accumulation over time matters for parental age effects when it comes to disease (speaking as an older father!).

In the latter case, I’m talking about the reasons that people freeze their eggs or sperm. In the former case, I’m talking about whether we can easily extrapolate mutation rates over evolutionary time as semi-fixed, so we can infer dates of last common ancestry and such. To give a concrete example of what I’m talking about, if mutation rates varied a lot over the evolutionary history of our hominin lineage, then we might need to rethink some of the inferred timings.
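To make the dating point concrete, here is a minimal sketch of strict molecular clock arithmetic. The divergence value and both mutation rates are hypothetical numbers picked for illustration, not estimates from either preprint, and the function name is mine.

```python
def divergence_time_years(per_site_divergence, mutation_rate_per_site_per_year):
    """Time since the last common ancestor under a strict molecular clock.

    Differences accumulate along both lineages, hence the factor of 2.
    """
    return per_site_divergence / (2.0 * mutation_rate_per_site_per_year)


# Hypothetical inputs, chosen only to show how sensitive the date is to the rate.
divergence = 0.012   # assumed fraction of sites that differ between two lineages
slow_rate = 0.5e-9   # assumed mutations per site per year
fast_rate = 1.0e-9   # assumed mutations per site per year

t_slow = divergence_time_years(divergence, slow_rate)
t_fast = divergence_time_years(divergence, fast_rate)

print(f"slow clock: {t_slow / 1e6:.1f} million years")
print(f"fast clock: {t_fast / 1e6:.1f} million years")
```

Halving the assumed rate doubles the inferred date, which is why a rate that drifted over the history of a lineage would force a rethink of the timings.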

Today two preprints came out on mutation accumulation. First, Overlooked roles of DNA damage and maternal age in generating human germline mutations. Second, Reproductive longevity predicts mutation rates in primates. What a coincidence in synchronicity!

Additionally, the last author on the second preprint, Matt Hahn, is someone I’ll be doing a podcast with this week. So aside from talking about neutral theory, and his book Molecular Population Genetics, I’m going to have to bring up this mutation business.

The figure above from the first preprint shows that the proportion of mutations derived from the father doesn’t increase over time, as textbooks generally state it should. Why would we expect an increase? Sperm keeps replicating after puberty, so fathers should keep gaining mutations with age. In contrast, the eggs are arrested in meiosis. There are various mechanistic reasons that the authors of the first preprint give for why the ratio between paternal and maternal mutations does not change (e.g., non-replicative mutations seem to be the primary one). The authors are using a very “pedigree” strategy, rather than an “evolutionary” one. They’re looking at sequenced trios, and noticing patterns. I think in the near future they’ll be far more sure of what’s going on because they’ll have bigger sample sizes. They admit the effects are subtle (also, some of the p-values are getting close to 0.05).

Instead of focusing on a human pedigree, the second preprint does some sequencing on owl monkeys (I had no idea there were “owl monkeys” before this paper). They find that the mutation rate is ~32% lower in owl monkeys than in humans. Why is this?

The plot to the left shows that mutations increase with age across species (though the number of data points is pretty small). The authors contend that:

The association between mutation rates and reproductive longevity implies that changes in life history traits rather than changes to the mutational machinery are responsible for the evolution of these rates. Species that have evolved greater reproductive longevity will have a higher mutation rate per generation without any underlying change to the replication, repair, or proofreading proteins.

If I read this right: owl monkeys reproduce fast and don’t have as much reproductive longevity. Ergo, lower mutation rates (less mutational build-up from paternal side).
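A toy version of that argument, as I understand it: hold the per-year mutational "machinery" fixed and change only the life history. This is a sketch under my own assumptions, not the preprint's model; the linear form, the function name, and every number below are hypothetical.

```python
def mutations_per_generation(age_at_puberty, age_at_reproduction,
                             mutations_at_puberty, mutations_per_year):
    """Toy linear model: a fixed load at puberty plus a steady yearly gain."""
    years_of_buildup = age_at_reproduction - age_at_puberty
    return mutations_at_puberty + mutations_per_year * years_of_buildup


# Hypothetical parameters, shared by both species: identical "machinery".
LOAD_AT_PUBERTY = 30.0   # assumed mutations already present at puberty
GAIN_PER_YEAR = 2.0      # assumed extra mutations per year after puberty

# Only the (made-up) life history differs between the two cases.
long_lived = mutations_per_generation(13, 30, LOAD_AT_PUBERTY, GAIN_PER_YEAR)
short_lived = mutations_per_generation(1, 6, LOAD_AT_PUBERTY, GAIN_PER_YEAR)

print(long_lived, short_lived, short_lived / long_lived)
```

With the same parameters, the species that reproduces sooner after puberty ends up with fewer mutations per generation, with no change to replication, repair, or proofreading.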

After all these years I’m still not convinced about anything. I assume that eventually bigger data sets will come online and we’ll resolve this. Someone has to be right!

(not too many people on Twitter get what’s going on either)

May 8, 2018

The peoples of the Maghreb have some Pleistocene roots

Filed under: Human Population Genetics,North Africa,Population genetics — Razib Khan @ 11:58 pm
Moroccan Berber man

The Maghreb is an important and interesting place. In the history of Western civilization, the tension between Carthage, the ancient port city based out of modern-day Tunisia, and Rome, is one of the more dramatic and tragic rivalries that has resonances down through the ages. Read Adrian Goldsworthy’s chapter on the Battle of Cannae in The Punic Wars for what I’m alluding to (and of course there were Cato the Elder’s dramatic remonstrations).

Later Roman Africa, which really encompassed northern Morocco, coastal Algeria, and Tunisia and Tripolitania, became a major social and economic pillar of the Imperium. Not only did men such as the emperor Septimius Severus and St. Augustine have roots in the region, but these provinces were a major economic bulwark for the Western Empire in its last century. The wealthy Senators of the 4th and 5th century were often absentee landlords of vast estates in North Africa. The fall of these provinces to the Vandals and Alans in the 430s began the transformation of the Western Empire based in Rome into a more regional player, rather than a true hegemon (perhaps an analogy here can be made to the loss of Anatolia by the Byzantines in the 11th century).

Another important aspect of North Africa is that it is the westernmost extension of the region possibly settled by Near Eastern farmers in Africa. The native Afro-Asiatic Berber languages seem to have been dominant in the region despite the influence and prestige of Punic and Latin in the cities when Muslim Arabs conquered the region in the late 7th century. The genetic-demographic characteristics of the region are relevant to attempts to understand the origins of the Afro-Asiatic languages more generally since Berber is part of the clade with the Semitic languages.

A preprint and a paper utilizing ancient DNA have shed a great deal of light on these questions recently. The paper is in Science, Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations. The preprint is Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe. They are in broad agreement, though they cover somewhat different periods.

The figure below is the big finding of the Science paper:

They retrieved some genotypes from a site in northern Morocco, Taforalt, which dates to ~15,000 years before the present. This is a Pleistocene site, before the rise of agriculture. The Taforalt individuals are about 65% Eurasian in affinity, and 35% Sub-Saharan African. This confirms that the Eurasian back-migration to northern Africa predates the Holocene, just as many archaeologists and geneticists have reported earlier.

The samples from the preprint date to a later time. The IAM samples date to 7,200 years before the present, and the KEB samples to ~5,000 years before the present. It seems pretty clear that the IAM samples in the preprint exhibit continuity with the Taforalt samples. Though it is not too emphasized in the preprint, the lower K’s seem to strongly suggest that the IAM samples have Sub-Saharan African ancestry, just like the Taforalt samples, which are nearly 8,000 years older. In the KEB samples, the fraction drops, probably diluted in part by ancestry related to what we elsewhere term “Early European Farmer” (EEF), related to the Anatolian farming expansion.

Both the Taforalt and IAM samples, in particular, seem to exhibit strong affinities to Natufian/Levantine peoples. Additionally, many of these samples carry Y chromosome haplogroup E1b, just like some of the Natufians. These results indicate that the Natufian and North African populations were exchanging genes, or were part of one cline, rather deep in the Pleistocene.

Though various methods have suggested that there is a lot of recent Sub-Saharan African admixture, dating to the Arab period, in North Africa, these results suggest that much of it is far older. The Mozabites, as an isolated Berber group, reflect this tendency. Though some individuals have inflated African ancestry due to recent admixture, much of it is older and more evenly distributed. And yet the Mozabites seem to have less Sub-Saharan African ancestry on average than the IAM sample.

There aren’t enough data points to make a strong inference about the temporal transect, but these few results imply a decline in the Sub-Saharan ancestral component after the Pleistocene with further farming migration, and then a rise again with the trans-Saharan slave trade during the Muslim period. Another issue, highlighted in the preprint, is likely heterogeneity in ancestry within the Maghreb (lowland populations in modern North Africa tend to have more Sub-Saharan ancestry due to where slaves were settled).

In the Science paper the authors make an attempt to adduce the origin of the Sub-Saharan contribution to the Taforalt individuals. The result is that there is no modern or ancient proxy that totally fits the bill. These individuals have affinities to many Sub-Saharan African populations. The Sub-Saharan component is likely heterogeneous, but recall that attempts to model European genetic variation during the Ice Age ran into the trouble that divergence from modern populations was quite great. Until we get more ancient DNA there probably won’t be too much more clarity.

On the issue of the Eurasian ancestry, it’s clearly quite like the Natufians. But curiously the authors find that the Neanderthal ancestry in these samples is greater than that found in early Holocene Iran samples. From this, the authors conclude that they may have had a lower fraction of “Basal Eurasian” (BEu) than those populations further to the east. But already 15,000 years ago BEu populations were mixed with more generic West Eurasians to generate the back-migration to Africa. If BEu diverged from other Eurasians >50,000 years ago, then it may have merged back into the “Out-of-Africa” populations around or before the Last Glacial Maximum, ~20,000 years ago.

Finally, the authors looked at some pigmentation genes. Curiously the Taforalt and IAM individuals did not carry the derived variants for pigmentation found in many West and South Eurasians, but the KEB did. This confirms results from Europe, and population genomic inference in modern samples, that selection for derived pigmentation variants is relatively recent in the Holocene.

I do want to add that one possibility about the Sub-Saharan ancestry in the Taforalt, and probably all modern North Africans to a lesser extent, is that it is ancient and local. We now know proto-modern humans were present in the region >300,000 years ago. Northwest Africa may have been part of the multi-regional metapopulation of H. sapiens, as opposed to the Eurasian biogeographic zone in which it is often placed, before a post-LGM back migration of Eurasians.

February 24, 2018

Are Turks Armenians under the hood?

Filed under: Historical Population Genetics,Population genetics — Razib Khan @ 8:31 pm

Benedict Anderson’s Imagined Communities: Reflections on the Origin and Spread of Nationalism is one of those books I haven’t read, but should. In contrast, I have read Azar Gat’s Nations, which is a book-length counterpoint to Imagined Communities. To take a stylized and extreme caricature, Imagined Communities posits nations to be recent social and historical constructions, while Nations sees them as primordial, and at least originally founded on ties of kinship and blood.

The above doesn’t capture the subtlety of Gat’s book, and I’m pretty sure it doesn’t capture that of Anderson’s either. But, those are the caricatures that people take away and project in public, especially Anderson’s (since Gat’s is not as famous).

When it comes to “imagined communities” I recently have been thinking about how well that of modern Turks fits into the framework. Though forms of pan-Turkic nationalism can be found as early as 9th-century Baghdad, the ideology truly emerges in force in the late 19th century, concomitantly with the development of a Turkish identity in Anatolia which is distinct from the Ottoman one.

The curious thing is that though Turkic and Turkish identity is fundamentally one of language and secondarily of religion (the vast majority of Turkic peoples are Muslim, and there are periods, such as the 17th century, when the vast majority of Muslims lived in polities ruled by people of Turkic origin*), there are some attempts to engage in biologism. This is despite the fact that the physical dissimilarity of Turks from Turkey and groups like the Kirghiz and Yakut is manifestly clear.

Several years ago this was made manifestly clear in the paper The Genetic Legacy of the Expansion of Turkic-Speaking Nomads across Eurasia. This paper clearly shows that Turkic peoples across Eurasia have been impacted by the local genetic substrate. In plainer language, the people of modern-day Turkey mostly resemble the people who lived in Turkey before the battle of Manzikert and the migration of Turkic nomads into the interior of the peninsula in the 11th century A.D. Of course, there is some genetic element which shows that there was a migration of an East Asian people into modern-day Anatolia, but this component is the minority one.**

Sometimes the Turkish fascination with the biological comes out in strange ways; see Turkish genealogy database fascinates, frightens Turks. Much of the discussion has to do with prejudice against Armenians and Jews. But the reality is that most Turks at some level do understand that they are descended from Greeks, Armenians, Georgians, etc.

To interrogate this further I decided to look at a data set of Greeks, Turks, Armenians, Georgians, and a few other groups, including Yakuts, who are the most northeastern of Turkic peoples. The SNP panel had >200,000 markers, and I did some outlier pruning. Additionally, I didn’t have provenance on a lot of the Greeks, except some labeled as from Thessaly. I therefore just split those up into clusters, with “1” being closest to the Thessaly sample and “3” the farthest.

First, let’s look at the PCA.

The Turks are shifted toward the Yakuts, but not too much. In contrast, there is much more of a Yakut shift in Tajiks, and especially Turkmens. These are two groups from further east, closer to the heart of the zone of Turkic expansion. Curiously, the Tajiks, who are the dominant non-Turkic Iranian-speaking people of Central Asia, actually have more East Asian ancestry than the Turks of Turkey. This goes to show that ethnicity is somewhat fluid, and Turkic people have assimilated into the Tajik identity. That being said, please note that the Turkmen are notably more east-shifted than the Tajik.

Let’s see how this looks on pairwise Fst.

Fst is kind of difficult for fine distinctions when you have outgroups like Yakuts and Dai. So let’s look at Treemix with five migrations:

On this, you can see that the relationship of the Greece clusters on Treemix to Lithuanians matches PCA. Greece1 is the closest, Greece3 the farthest.

The Turks are close to the Georgians and Armenians, but not the Kurds, or Tajiks. And, they receive gene flow from the Turkmen-Yakut region of the graph. So do the Tajiks…but the Tajiks also receive gene flow from the Lithuanians. The admixture plot makes it more clear what’s happening, I think.

Yellow ~ modal in Southern Europe, green ~ modal in Northern Europe, red ~ Central Asian, while blue and purple are northern and southern East Asian. In comparison to Turks of Anatolia, Tajiks have a lot more Northern European affinity, probably because of the common steppe heritage. Not surprisingly, Turks have more Southern European-like ancestry.

Curiously the East Asian ancestry in the Turkic people seems to be both Yakut and Dai like, so perhaps it was more cosmopolitan than we might think? The Yakuts after all are from the northern edge of the range, and may have absorbed a lot of indigenous Siberian ancestry.

Georgians have none of the Northern European sort of ancestry, but Armenians do, and Turks even more. One could posit that this is due to Slavic ancestry arriving with the Rumelian Turks who arrived in the 20th century, but just as likely is the possibility that Turks have a lot of ancestry from western Anatolia which was Greek, and Greeks have more of this than Armenians.

It’s hard to tell from these results whether Turks have more of an affinity with Greeks or Armenians as their non-Turkic ancestors. So I ran a three population test.

Target X1 X2 f3 std_err z
Turkey Armenians Yakut -0.00253688 6.70852e-05 -37.8158
Turkey Greece3 Yakut -0.00246931 6.72384e-05 -36.7247
Turkey Georgian Yakut -0.00256555 7.60158e-05 -33.7502
Turkey Armenians Dai -0.00246779 7.40038e-05 -33.3468
Turkey Greece3 Dai -0.0024101 7.34629e-05 -32.8071
Turkey Georgian Dai -0.00249174 8.11957e-05 -30.688
Turkey Greece2 Yakut -0.00222382 7.62368e-05 -29.1699
Turkey Greece2 Dai -0.00231001 8.39207e-05 -27.5261
Turkmen Turkey Dai -0.00288213 0.000108049 -26.6742
Turkmen Turkey Yakut -0.00254805 0.000102816 -24.7826
Turkey Greece1 Yakut -0.00225638 9.94722e-05 -22.6836
Turkey GreekCentral Dai -0.00235681 0.000104014 -22.6587
Turkey Greece3 Tajik -0.000622671 2.76666e-05 -22.5063
Turkey GreekCentral Yakut -0.00221985 0.000101654 -21.8373
Turkey Greece1 Dai -0.00243254 0.000112011 -21.717
Turkey Greece3 Turkmen -0.000640439 3.33529e-05 -19.2019
Turkey GreekThessaly Yakut -0.00208436 0.00011042 -18.8767
Turkey Dai GreekThessaly -0.00225435 0.00012241 -18.4163
Turkey Greece2 Turkmen -0.000584983 3.29819e-05 -17.7365
Turkey Armenians Turkmen -0.000520887 3.07253e-05 -16.953
Turkey Armenians Tajik -0.000421139 2.55274e-05 -16.4975
Tajik Turkey Dai -0.00140423 8.51697e-05 -16.4875
Tajik Turkey Yakut -0.00124601 7.60725e-05 -16.3793
Turkey Georgian Turkmen -0.000532496 3.80694e-05 -13.9875
Turkey Greece2 Tajik -0.000412419 3.04172e-05 -13.5587
Turkey Armenians Lithuanians -0.000459831 3.75838e-05 -12.2348
Turkey Greece1 Turkmen -0.000570715 4.7753e-05 -11.9514
Turkey Kurds Yakut -0.00146087 0.000124799 -11.7058
Turkey GreekThessaly Turkmen -0.000516877 4.46683e-05 -11.5714
Turkey Georgian Tajik -0.000328859 3.02443e-05 -10.8734
Turkey GreekCentral Turkmen -0.000504962 4.92555e-05 -10.2519

Armenians beat out Greece3 by a bit, but it’s hard to call this definitive. It’s likely that my Turkish sample has both, and/or the original Turkic nomads had Iranian-like ancestry which was more like Armenian than Greek. Hard to say. Additionally, the fact that Greece3 is better than the other Greek options suggests to me that the source is Anatolian Greeks who were less impacted by migrations from the north than Greeks in Greece proper.
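For readers who haven’t met the three population test, here is a minimal sketch of what the f3 number measures, treating the first column as the population being tested and the other two as candidate sources. The allele frequencies are made up, the function name is mine, and the 80/20 mix is an arbitrary assumption. As far as I know, real implementations also correct for sample size and get the standard error by resampling blocks of SNPs, which this sketch skips.

```python
def f3(target, source1, source2):
    """Mean of (c - a) * (c - b) across SNPs, given allele-frequency lists.

    A clearly negative value suggests the target is a mixture of populations
    related to the two sources. No sample-size correction is applied here.
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


# Made-up allele frequencies at four SNPs for two source populations.
pop_a = [0.10, 0.50, 0.90, 0.30]
pop_b = [0.80, 0.20, 0.40, 0.70]

# A hypothetical population that is 80% pop_a and 20% pop_b.
mixed = [0.8 * a + 0.2 * b for a, b in zip(pop_a, pop_b)]

f3_mixed = f3(mixed, pop_a, pop_b)     # admixed target: negative
f3_unmixed = f3(pop_a, mixed, pop_b)   # unadmixed target: positive

# The z column in the table is just f3 divided by its standard error.
# Values copied from the first row (Turkey; Armenians, Yakut).
z_first_row = -0.00253688 / 6.70852e-05

print(f3_mixed, f3_unmixed, round(z_first_row, 4))
```

The sign is the signal and the z-score says how far from zero it sits, which is why rows with similar f3 values can still rank differently.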

 

* The Mughals were Central Asian Turks, while the Safavids were mostly Azeri Turks.

** Since the Turks who arrived in Anatolia had long sojourned in Turan and Iran it is important not to assume that their contribution is limited only to the East Asian component of ancestry.

January 27, 2018

Genetic distances across Eurasia

Filed under: Human Population Genetics,Population genetics — Razib Khan @ 10:00 pm

I feel that, for whatever reason, over the past few years many people on this weblog have started to exhibit weak intuitions about the magnitude of between-population differences. Two suggestions for why this might occur:

* First, the proliferation of PCA plots with individuals can make it hard to discern averages

* Second, model-based admixture plots don’t explicitly quantify the differences between the different clusters

To get a better sense of between-group differences I decided to take a step back and look at Fst. Fst basically looks at all the genetic variance across the groups and quantifies the proportion that can be attributed to differences between groups.
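As a rough illustration of that definition, here is a sketch that computes Fst for two populations from allele frequencies as (total heterozygosity minus within-group heterozygosity) over total heterozygosity. This is a simple textbook-style estimator with equal weights for the two groups, not the estimator Plink uses (which I believe is Weir and Cockerham’s), and the frequencies below are invented.

```python
def fst(freqs_pop1, freqs_pop2):
    """Two-population Fst from per-SNP allele frequencies.

    Uses a ratio of sums across SNPs: sum(H_total - H_within) / sum(H_total).
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Average expected heterozygosity inside each group.
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        # Expected heterozygosity if the two groups were pooled equally.
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


# Invented frequencies: a small shift, a fixed difference, and no difference.
print(fst([0.6], [0.4]))              # modest differentiation
print(fst([1.0], [0.0]))              # completely different alleles
print(fst([0.3, 0.7], [0.3, 0.7]))    # identical groups
```

A frequency gap of 0.6 versus 0.4 at a SNP comes out around 0.04 with this estimator, which gives some feel for how similar populations with Fst values of a few thousandths really are.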

The plot at the top of this post is from an Fst matrix I generated with Plink (I wrote a script to do the pairwise comparison). I did some PCA pruning of the populations to be clear (e.g., with both Cambodians and Filipinos I made them more distinct than they would otherwise be). The goal was to give people a sense of genetic distances within regions and between them.

I also generated a PCA plot and a Treemix plot, for the sake of comparison.

It’s also useful to look at a few group comparisons and judge them in a global context.

Population 1      Population 2           Fst
Tamil             Telugu                 0.0011
Tamil             Tamil Scheduled Caste  0.0016
Tamil             Bangladeshi            0.0024
Tamil             South Indian Brahmin   0.0031
Tamil             Uttar Pradesh Brahmin  0.0041
Tamil             Sindhi                 0.0087
Tamil             Vietnamese             0.0668
Southern Chinese  Northern Chinese       0.0033
Southern Chinese  Vietnamese             0.0034
Southern Chinese  Korea                  0.0045
Southern Chinese  Japanese               0.0087
Southern Chinese  Tamil                  0.0711
Southern Chinese  Polish                 0.1141
Gujurati_Patel    Telugu                 0.0062
Gujurati_Patel    Uttar Pradesh Brahmin  0.0065
Gujurati_Patel    Bangladeshi            0.0069
Gujurati_Patel    Velama                 0.0094
Gujurati_Patel    Sindhi                 0.0104
Gujurati_Patel    Polish                 0.0405
Gujurati_Patel    Japanese               0.0781
GreatBritain      Ireland                0.0015
GreatBritain      Polish                 0.0043
GreatBritain      Sicily                 0.0077
GreatBritain      Uttar Pradesh Brahmin  0.0264
GreatBritain      Tamil                  0.0430
GreatBritain      Korea                  0.1130

The non-Brahmin and non-Dalit samples in the 1000 Genomes are not partitioned much by geography. The Tamil vs. Telugu difference is smaller than that between the British and Irish. Within Tamil Nadu, though, Brahmins are nearly as different from typical Tamils as Poles are from the English (most of the British sample is English). The biggest differences in Europe are between Sicilians and Northern European groups, which are similar in degree to those between South Indians and Pakistanis. The South Chinese sample is nearly as close to Vietnamese as it is to a North Chinese group, while the difference between Koreans and Chinese is relatively small when compared to the variance you see in South Asia and Europe.

Note: Drift tends to inflate Fst.

January 25, 2018

How South Asian populations relate to each other

Filed under: Population genetics,South Asian Genetics — Razib Khan @ 12:45 am

Since people are asking me about this, and I’m running the South Asian Genotype Project, I thought I would post two non-PCA visualizations of how various South Asian groups relate to each other (along with a few outgroups).

The radial plot above is a neighbor-joining tree visualized from pairwise Fst statistics (basically a proxy for genetic distance).
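For anyone curious how a tree falls out of a table of pairwise distances, here is a bare-bones sketch of neighbor joining that returns only the topology (no branch lengths). It is my own simplified illustration, not the software used for the plot, and the four populations and their distances are hypothetical.

```python
def neighbor_joining(names, dist):
    """Join the pair minimizing the NJ criterion until two nodes remain.

    names: list of labels; dist: square matrix of pairwise distances.
    Returns a nested tuple describing the unrooted topology.
    """
    nodes = list(names)
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[(a, b)] for b in nodes) for a in nodes}
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        # Standard NJ criterion: (n - 2) * d(a, b) - total(a) - total(b).
        a, b = min(pairs,
                   key=lambda p: (n - 2) * d[p] - totals[p[0]] - totals[p[1]])
        new = (a, b)
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to every other node.
                d[(new, c)] = d[(c, new)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(new, new)] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)


# Hypothetical distances for four made-up populations A, B, C, D.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 3, 5, 8],
    [3, 0, 6, 9],
    [5, 6, 0, 5],
    [8, 9, 5, 0],
]
tree = neighbor_joining(labels, matrix)
print(tree)
```

With real Fst matrices the distances are not perfectly tree-like, so the topology is a summary of the statistic rather than a tested model, which is the contrast with Treemix.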

I also used Treemix to generate a plot. You see similar patterns to the one above, though the two methods are different. Treemix tests a bunch of models and sees how the data fit those models. The visualization of Fst is just a way of representing the summary statistic.

I added 5 migration edges to the plot to the right. Not sure if they add anything, but you can see that some of the nodes move around because they are so mixed.

January 9, 2018

Island demes in an empty world

Filed under: Beringia,Population genetics — Razib Khan @ 9:58 pm

 

As you probably know a new ancient genome paper was published last week in Nature, Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. There is at least one other involving Willerslev in the works for what it’s worth. Carl Zimmer has a good write-up in The New York Times, while Greg Cochran picked up the fact that the latest results show no evidence of “Australo-Melanesian” affinities that have been found in Amazonians.

The key issue here is that they found 11,500-year-old remains from Alaska, one of which they sequenced at 17x coverage, which is rather good (not medical grade good, but really sufficient for a lot of population genomic work). It’s clear that the lineage represented by these remains is “basal” to that of other Native American peoples, whom David Reich’s group labeled “First Americans.” Later, the First Americans diverged into different populations, with the two in modern focus being a northern cluster, including the Aboriginal peoples of Canada and parts of the United States, and a southern one including everyone else. This does not mean that the Beringians were isolated outliers. There may have been many other peoples related to the Beringians who diversified and went extinct as well. The settlement of Alaska by other peoples suggests to me that extreme conditions in the Arctic made it likely that there would be population turnover there. Also, the fact that these samples were located close to the source of settlement in the New World by modern humans makes their distant relation to all other New World populations unsurprising.

The big thing that the press is highlighting is the confirmation of the Beringian Standstill model, where modern humans percolated into the area between Siberia and Alaska, Beringia, and did not move east for thousands of years. Basically, the conditions were inclement toward human habitation on both sides of Beringia, while a relict modern human group likely occupied a pocket of more moderate climes for thousands of years, with minimal gene flow from the west, and blocked from migration to the east. Genetically the Beringian Standstill made sense for a long time…the divergence between Amerindian lineages and those of eastern Eurasia seemed too old to be accounted for by recent migration a bit more than 10,000 years ago (the old “Clovis first” hypothesis).

How old? This paper suggests that the portion of Native American ancestry which indicates an affinity to East Asians stopped exhibiting gene flow from that source around ~25,000 years ago, after diverging around ~36,000 years ago. This points to the fact that after modern humans came to dominate eastern Eurasia they began to diversify rapidly after 40,000 years ago, but gene flow between different populations did not always allow them to drift apart…at least initially. The ancestors of Native Americans and East Asians may have been in extremely separate locations by ~25,000 years ago, whether it be on the fringes of eastern Siberia, or somewhere in southern China (there is no reason that the modern Chinese have to have had ancestors resident on the North China plain before the Last Glacial Maximum).*

One aspect here I want to emphasize is that our image of a world thickly populated with humans may mislead us in our intuition about how patchy occupation was ~25,000 years ago. Yes, humans may have left artifacts all over the world, but that doesn’t mean that there weren’t centuries or millennia of no occupation, or, that meta-population dynamics were such that extinctions were common. For decades in population genetics there has been talk of “clines vs. clusters,” but if human population densities were far lower, or occupation patchier, then clines may have become much more important recently with high density than in the past.

Finally, back to the Australo-Melanesian issue. Either there is a lot of population structure in ancient Beringia to be explored, with diverse quasi-Asiatic groups, or there was an Australo-Melanesian group already in South America.

* Ancient North Eurasian ancestry came into Beringians ~20,000 years ago: two groups merged during the middle of the Last Glacial Maximum.

December 20, 2017

Natural selection in humans (OK, 375,000 British people)

Filed under: Natural Selection,Population genetics,Population genomics,Selection — Razib Khan @ 10:41 pm

 


The above figure is from Evidence of directional and stabilizing selection in contemporary humans. I’ll be entirely honest with you: I don’t read every UK Biobank paper, but I do read those where Peter Visscher is a co-author. It’s in PNAS, and it’s a draft which is not open access. But it’s a pretty interesting read. Nothing too revolutionary, but it confirms some intuitions one might have.

The abstract:

Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.

The stabilizing selection part is probably the most interesting part for me. But let’s hold up for a moment, and review some of the major findings. The authors focused on ~375,000 individuals who matched their sample criteria (white British individuals old enough that they are well past their reproductive peak), and the genotyping platforms had 500,000 markers. The dependent variable they’re focusing on is reproductive fitness. In this case specifically, “rLRS”, or relative lifetime reproductive success.

With these huge data sets and the large number of measured phenotypes they first used the classical Lande and Arnold method, which leverages regression to measure directional and stabilizing selection. Basically, how does change in the phenotype impact reproductive fitness? So, it is notable that shorter women have higher reproductive fitness than taller women (shorter than the median). This seems like a robust result.
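Here is a stripped-down sketch of the Lande and Arnold idea as I understand it: regress relative fitness on a standardized trait to get the directional gradient, then add a squared term and double its coefficient to get the quadratic gradient (negative suggests stabilizing selection). The function names and the five-person data sets are invented, and the paper’s actual analysis handles many traits and covariates that this ignores.

```python
def solve(matrix, rhs):
    """Gauss-Jordan elimination for a small, non-singular linear system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def selection_gradients(trait, offspring_counts):
    """Return (beta, gamma) for one trait."""
    n = len(trait)
    mean_t = sum(trait) / n
    sd_t = (sum((t - mean_t) ** 2 for t in trait) / n) ** 0.5
    z = [(t - mean_t) / sd_t for t in trait]          # standardized trait
    mean_w = sum(offspring_counts) / n
    w = [c / mean_w for c in offspring_counts]        # relative fitness
    ones = [1.0] * n
    _, beta = least_squares([ones, z], w)             # linear model only
    _, _, half_gamma = least_squares([ones, z, [v * v for v in z]], w)
    return beta, 2.0 * half_gamma                     # gamma = 2 * quadratic term


# Invented data: intermediate trait values leave the most offspring.
beta_stab, gamma_stab = selection_gradients([-2, -1, 0, 1, 2],
                                            [1.0, 2.5, 3.0, 2.5, 1.0])
# Invented data: offspring number rises steadily with the trait.
beta_dir, gamma_dir = selection_gradients([-2, -1, 0, 1, 2],
                                          [1.0, 1.5, 2.0, 2.5, 3.0])
print(beta_stab, gamma_stab, beta_dir, gamma_dir)
```

In the first toy data set the linear gradient is zero and the quadratic one is negative; in the second it is the other way around, which is the distinction between stabilizing and directional selection.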

The results using phenotypic correlations for directional (β) and stabilizing (γ) selection are shown below. The abbreviations are the same as above.

 

There are many cases where directional selection seems to operate in females, but not in males. But they note that that is often due to near zero non-significant results in males, not because there were opposing directions in selection. Height was the exception, with regression coefficients in opposite directions. For stabilizing selection there was no antagonistic trait.

A major finding was that compared to other organisms stabilizing selection was very weak in humans. There’s just not that much pressure against extreme phenotypes. This isn’t entirely surprising. First, you have the issue of the weirdness of a lot of studies in animal models, with inbred lines, or wild populations selected for their salience. Second, prior theory suggests that a trait with lots of heritable quantitative variation, like height, shouldn’t be subject to that much selection. If it were, the genetic variation which is the raw material of the trait’s distribution wouldn’t be there.

Using more complex regression methods that take into account confounds, they pruned the list of significant hits. But, it is important to note that even at ~375,000, this sample size might be underpowered to detect really subtle dynamics. Additionally, the beauty of this study is that it added modern genomic analysis to the mix. Detecting selection through phenotypic analysis goes back decades, but interrogating the genetic basis of complex traits and their evolutionary dynamics is new.

To a first approximation, the results were broadly consonant across the two methods. But, there are interesting details where they differ. There is selection on height in females, but not in males. This implies that though empirically you see taller males with higher rLRS, the genetic variance that is affecting height isn’t correlated with rLRS, so selection isn’t occurring.

~375,000 may seem like a lot, but from talking to people who work in polygenic selection there is still statistical power to be gained by going into the millions (perhaps tens of millions?). These sorts of results are very preliminary but show the power of synthesizing classical quantitative genetic models and ways of thinking with modern genomics. And, it does have me wondering about how these methods will align with the sort of stuff I wrote about last year which detects recent selection on time depths of a few thousand years. The SDS method for example seems to be detecting selection for increasing height the world over…which I wonder might be some artifact, because there’s a robust pattern of shorter women having higher fertility in studies going back decades.

December 14, 2017

A genetic map of the world

Filed under: Population genetics — Razib Khan @ 4:46 pm


The above map is from a new preprint on the patterns of genetic variation as a function of geography for humans, Genetic landscapes reveal how human genetic diversity aligns with geography. The authors assemble an incredibly large dataset to generate these figures. The orange zones are “troughs” of gene flow, basically barriers to gene flow. It is no great surprise that so many of the barriers correlate with rivers, mountains, and deserts. But the aim of this sort of work seems to be to make precise and quantitative the intuitions which are normally expressed verbally.

To me, it is curious how the borders of the People’s Republic of China are evident on this map (an artifact of sampling?). Additionally, one can see Weber’s line in Indonesia. There are the usual important caveats of sampling, and caution about interpreting present variation and dynamics back to the past. But I believe that these sorts of models and visualizations are important nulls against which we can judge perturbations.

As I said, these methods can confirm rigorously what is already clear intuitively. For example:

Several large-scale corridors are inferred that represent long-range genetic similarity, for example: India is connected by two corridors to Europe (a southern one through Anatolia and Persia ‘SC’, and a northern one through the Eurasian Steppe ‘NC’)

We still don’t have enough ancient DNA to be totally sure, but it’s hard to ignore the likelihood that “Ancestral North Indians” (ANI) actually represent two different migrations.

India also illustrates the contingency of these barriers. Before the ANI migration, driven by the rise in agricultural lifestyles, there would likely have been a major trough of gene flow on India’s western border. In fact a deeper one than the one on the eastern border. And if the high genetic structure statistics from ancient DNA are further confirmed then the rate of gene flow was possibly much lower between demes in the past. Perhaps that would simply re-standardize equally so that the map itself would not be changed, but I suspect that we’d see many more “troughs” during the Pleistocene and early Holocene.

Because there are so many geographically distributed samples for humans, and frankly some of the best methods developers work with human data (thank you NIH), it is no surprise that our species would be mapped first. But I think some of the biggest insights may be with understanding the dynamics of gene flow of non-human species, and perhaps the nature and origin of speciation as it relates to isolation (or lack thereof).

December 10, 2017

Visualizing intra-European phylogenetic distances

Filed under: Europe,European genetics,Population genetics,Population genomics — Razib Khan @ 4:53 pm
Neighbor-joining tree of genetic distances between populations

 

In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.

Roughly this is used to correlate with genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more and more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.
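
To make the “how much of the variation is between groups” idea concrete, here is a minimal sketch of FST at a single biallelic locus for two populations, using the textbook heterozygosity form FST = (H_T - H_S) / H_T. The function names and the example allele frequencies are my own illustrative assumptions, not anything from Cavalli-Sforza’s book, and real estimators (which correct for sample size and combine many loci) are more involved.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """FST = (H_T - H_S) / H_T for two equally weighted populations.

    H_T is the heterozygosity expected if both groups were one pooled
    breeding population; H_S is the average heterozygosity within groups.
    """
    p_total = (p1 + p2) / 2.0
    h_total = heterozygosity(p_total)
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic overall, so there is no variation to partition
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies, from identical to fully differentiated.
    for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.1, 0.9), (0.0, 1.0)]:
        print(p1, p2, round(fst_two_populations(p1, p2), 4))
```

Identical frequencies give an FST of zero (one breeding population in the sense above), while populations fixed for alternative alleles give an FST of one.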

Today we have other techniques, Structure, Treemix, fineStructure, and various local ancestry packages.

But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there weren’t that many samples).

Using results from the 2015 paper Massive migration from the steppe was a source for Indo-European languages in Europe, I visualized pairwise genetic distances for European populations, ancient and modern (Han Chinese as an outgroup), on a tree. What the results illustrate is that

  1. Ancient populations were very distinct in Europe from modern ones.
  2. Many modern groups are clustered close together.

The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.

Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….

December 8, 2017

The Saxon Panmixia

Filed under: Population genetics — Razib Khan @ 9:09 pm


One reason I quite like Norman Davies’ book The Isles is that it is a history of Britain and Ireland which explicitly aims to not privilege the story of the English inordinately. As the most powerful and numerous people of the British Isles the English loom large, but in the period between Gildas and Bede things were very different. In the early 600s the Welsh king Cadwallon ap Cadfan conquered and held Northumbria for a period, northern England from the Irish Sea to the North Sea. But this was the last time that a Celtic monarch held land in eastern England, unless you count the Tudors.

In The Isles, written at the turn of the century, Davies promotes the view dominant among historians at that time that the transition from British Celtic to Anglo-Saxon occurred through diffusion of elite culture. He notes that in the year 700 the law code of Wessex states explicitly that the weregild paid for the death of a Saxon was many-fold greater than that paid for a Briton (of the same class status). This suggests that many Britons were still resident in the Anglo-Saxon kingdoms. The contrasting view, which was dominant in the early 20th century, was that the English replaced the Celts in toto. The Irish, Welsh, and to some extent the Scots, were viewed as racially distinct from the Germanic English.

2015’s The fine scale genetic structure of the British population answered many of these questions. It turns out the maximal positions were incorrect. The authors estimate that 10-40% of the ancestry in eastern and southern England (the red positions on the map) derives from Germanic peoples which we might term Saxons, Angles, and Jutes. Even if the fraction is as low as 10% that is not trivial. If we take a value closer to ~25%, unless there were massive reproductive advantages for elites, it could not have just been diffusion from the elite. Archaeologists also see wholesale changes in agricultural patterns in eastern England, indicative of a transfer of a whole folkway.

All that being said it is likely that the majority of the ancestry of the population of England proper descends from Britons. In fact, once the Anglo-Saxon cultural hegemony was established it seems that some elite Britons may also have changed their identity. It is always a curious fact that the names of the first kings in the genealogy of the House of Wessex are distinctively Celtic. Just as Romano-Gallic aristocrats began aping the styles and mores of the Frankish elite in the 6th century, so perhaps some British warlords became Saxons.

Using similar methods many of the same authors have now put out a preprint on Ireland, Insular Celtic population structure and genomic footprints of migration. Unlike the earlier work on Britain, they’ve acknowledged the ancient DNA results which have reshaped our understanding of population turnover in Ireland. That being said, they are focused on more recent events, as well as spatial structure in the modern era.

Though they don’t have access to as detailed a regional data set as in the earlier work on Britain, in this case, the authors managed to detect a lot of regional population structure within Ireland. Why? Though the Irish are relatively homogeneous, as all Northern Europeans are, looking at long tracts of the genome and the patterns therein can squeeze out more information.

The figure at the top of this post shows how well they can cluster individuals geographically: they’ve basically recapitulated the “map of the British Isles.” There aren’t too many surprises. Western Ireland seems to exhibit greater genetic differences as a function of distance. Probably because it’s less developed, and perhaps because it has been less impacted by outsiders. Ulster and southern Scotland are strongly connected genetically. There are two issues going on here. First, the famous migration of Protestants into this region of Ireland from Scotland and northern England that occurred after the conquest of the 16th century. And second, the earlier migration of Irish to Scotland, which resulted in the creation of the Dal Riata kingdom.

Additionally, the authors detect more admixture in several parts of Ireland from Norse than they had anticipated. The mixing of Scandinavians and Irish created a hybrid culture, the Norse-Gaels, which was highly influential around the Irish Sea. So it would not be exactly surprising if there was a greater Scandinavian contribution to Irish ancestry than had been anticipated.

Of greater interest to me is the impact of social-political institutions on the genetic structure or lack thereof. Both Britain and Ireland have homogenized modal clusters. In Britain, this is associated with the expanding cultural zone of Anglo-Saxon rule, and later became the core of England. In Ireland, it seems to be the Pale, where Anglo-Norman rule was dominant for many centuries. Rapid cultural change seems to induce a state of panmixia. Genetic distinctiveness in the British Isles seems to have persisted in populations which were geographically isolated, or politically insulated, from expansive, assimilative, and integrative cultures. The modal cluster in Ireland is far smaller than in England, which nicely correlates with the much more limited impact of the Anglo-Norman ascendency of the medieval period.

November 24, 2017

Soft selection for gentleness in Puerto Rican African Honeybees

Filed under: Population genetics,Population genomics,Soft Selection,Soft Sweep — Razib Khan @ 3:07 pm


When I was a kid “killer bees” were a major pop culture thing. There were movies about the bees, and we would get updates about their march northward in the news. They were a cautionary tale of our species’ hubris.

Today we have a little bit more perspective. These bees were actually just African honeybees, the ancestral population to European honeybees, which were introduced to the New World with Europeans centuries earlier than the African honeybees. African honeybees were not that different from European honeybees, but they were more aggressive and tended to outcompete European honeybee colonies. They are a major problem for the beekeeping industry, but not a major threat to human life.

Today the African and European populations in the United States seem to have stabilized in their ranges, with a hybrid zone between them. African bees’ migratory behavior makes them less competitive with European bees in colder climates.

A friend of mine once mentioned to me that if he had to do it all over again he would do research on the evolutionary genomics of Hymenoptera, and in particular bees. People care about bees. So it’s no surprise that I noticed this paper out in Nature Communications, A soft selective sweep during rapid evolution of gentle behavior in an Africanized honeybee:

Highly aggressive Africanized honeybees (AHB) invaded Puerto Rico (PR) in 1994, displacing gentle European honeybees (EHB) in many locations. Gentle AHB (gAHB), unknown anywhere else in the world, subsequently evolved on the island within a few generations. Here we sequence whole genomes from gAHB and EHB populations, as well as a North American AHB population, a likely source of the founder AHB on PR. We show that gAHB retains high levels of genetic diversity after evolution of gentle behaviour, despite selection on standing variation. We observe multiple genomic loci with significant signatures of selection. Rapid evolution during colonization of novel habitats can generate major changes to characteristics such as morphological or colouration traits, usually controlled by one or more major genetic loci. Here we describe a soft selective sweep, acting at multiple loci across the genome, that occurred during, and may have mediated, the rapid evolution of a behavioural trait.

Come for the bees, but stay for the soft selection! If you talk to anyone in evolutionary and population genomics you know that the future is in understanding patterns of soft selection and polygenic selection from standing variation. Though these are related phenomena which are associated with each other, they are all distinct.

Standing variation just refers to the diversity which is segregating in the population at any given time. At any given moment many loci exhibit polymorphism. This polymorphism can be a target of natural selection if it is correlated with heritable variation and differentials in fitness. Though soft selection can be quite woolly, its inverse, hard selection, is clear: in genetic terms hard selection can be seen in allele frequency changes at a single variant in a locus, going from the point where it is a novel mutation to nearly fixed in the population. In Haldane’s original conception hard selection involved excess deaths, and imposed a limit on the rate of evolution as well as the amount of variation you could expect within a given population. This model was convenient in the pre-genomic and early genomic era because empirical selection tests had to focus on large allele frequency changes around singular loci. Researchers didn’t have large numbers of whole-genome samples available (nor the computational ability to analyze them).
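
As a rough illustration of what “from a novel mutation to nearly fixed” looks like, here is a minimal sketch of a deterministic hard sweep. It assumes the simplest possible model, haploid selection with a constant advantage s, where the frequency updates as p' = p(1 + s) / (1 + sp), so the odds p / (1 - p) grow by a factor of (1 + s) each generation. The function name, the starting frequency, and the value of s are all made up for illustration; this is not Haldane’s calculation and it ignores drift entirely.

```python
def sweep_trajectory(p0, s, threshold=0.99, max_generations=100_000):
    """Allele frequency per generation under deterministic haploid selection.

    p0 is the starting frequency (1/N for a brand new mutation), s is the
    selective advantage, and the loop stops once the allele is nearly fixed.
    """
    freqs = [p0]
    p = p0
    while p < threshold and len(freqs) <= max_generations:
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers: a new mutation in 1,000 gene copies, 10% advantage.
    trajectory = sweep_trajectory(p0=0.001, s=0.1)
    print("generations to reach 99%:", len(trajectory) - 1)
    for generation in range(0, len(trajectory), 20):
        print(generation, round(trajectory[generation], 4))
```

The point of the toy is just the shape: one variant at one locus goes from rare to nearly fixed, which is the large single-locus frequency change that the older tests were built to find.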

Today this is not a limitation. In the analysis above the authors had 30 individuals of the 3 populations sequenced at high quality (20x). They ended up with millions of genetic variants they could analyze.

The plot to the left shows that “gentle African honeybees” (gAHB) tend to be closer to the African honeybee populations (AHB) overall (though with some hybridization with European honeybees, EHB). This is not surprising.

But the key observation was that over 12 generations the African honeybees of Puerto Rico became progressively less aggressive, despite maintaining overall morphological similarities to the mainland Mexican African bees from which they likely derive. Though buried in the discussion, there is a rationale for why this behavioral change may have occurred: the Puerto Rican bees are subject to a lot of negative selection against aggression because of the density of the island, as well as the reality that aside from humans there aren’t many other species where their aggressive tendencies are beneficial. Basically, if you are an aggressive colony, it’s harder to make a go in densely settled areas (the implication here then is that there are probably “gentle” African honeybee populations across Latin America, they just are never disaggregated from the broader meta-population).

Credit: Phillip Messer and Nandita Garud

It’s the genomics where the real evolutionary insight comes in: they found that there were multiple soft sweep events around genetic regions implicated in behavior. In their overall genome the gAHB of Puerto Rico resembled mainland AHB, but in this subset of genetic loci they resembled EHB. Many of these loci had also been known to be targets of selection when the original European bee population diverged from the ancestral African population. Basically this is a genomic illustration of convergent evolution.

Regular readers of this blog will recognize the ways they detected selection. They used a modified form of EHH, which is reasonable since the selection event was recent enough to have been associated with distinct haplotype blocks. Also, standard Fst analysis showed that these were outliers in relation to the broader genetic pattern of relatedness (these loci were more like EHB than AHB, while most loci were more like AHB than EHB).
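
The Fst outlier logic can be sketched in a few lines. This is only a toy under loud assumptions: the allele frequencies below are invented, the locus names are hypothetical, and the rule (flag a locus when gAHB is less differentiated from EHB than from AHB) is a simplification of my own, not the authors’ pipeline, which also relied on a modified EHH statistic.

```python
def fst(p1, p2):
    """Single-locus FST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = p1 * (1.0 - p1) + p2 * (1.0 - p2)  # mean of the two 2p(1-p) terms
    return 0.0 if h_total == 0.0 else (h_total - h_within) / h_total


# Invented allele frequencies, for illustration only.
TOY_FREQS = {
    "locus_1": {"gAHB": 0.20, "AHB": 0.22, "EHB": 0.70},
    "locus_2": {"gAHB": 0.55, "AHB": 0.50, "EHB": 0.10},
    "locus_3": {"gAHB": 0.80, "AHB": 0.15, "EHB": 0.85},
    "locus_4": {"gAHB": 0.35, "AHB": 0.40, "EHB": 0.90},
}


def ehb_like_loci(freqs):
    """Return loci where gAHB is closer (lower FST) to EHB than to AHB."""
    flagged = []
    for locus, f in freqs.items():
        if fst(f["gAHB"], f["EHB"]) < fst(f["gAHB"], f["AHB"]):
            flagged.append(locus)
    return flagged


if __name__ == "__main__":
    print(ehb_like_loci(TOY_FREQS))
```

In the toy data most loci look AHB-like, as the genome-wide pattern does, and only one stands out as EHB-like. A real scan would do this in windows across millions of variants and ask which windows sit in the tail of the genome-wide distribution.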

So this is a form of polygenic selection. Remember, natural selection only knows genes through the phenotype (with intra-genomic selection being an exception). A behavior like aggression is probably subject to the fourth law of behavior genetics. That is, variation won’t be defined around a single genetic locus. Rather, variation across the genome will be correlated with variation in the phenotype. As selection favors a particular value of the phenotype across the distribution the allele frequencies across many genetic loci will shift, but they will not necessarily fix. Polygenic selection operates on the dispersed standing genetic variation which explains much of the variation of the phenotype in question. Instead of total sweeps to fixation due to large fitness differences between a given allele and its alternative form, the selection impact is distributed and diffused across the genome.

Though most of the genetic variants seem to recapitulate the evolution of the less aggressive phenotype that occurred with the original migration north of African honeybees, some of the selection signatures were novel. This points to the reality that when you have soft selection on standing variation you may have similar phenotypes which evolve via different means. Additionally, the authors noted that these results were in contrast to controlled breeding experiments in mammals where selection for gentleness (“domestication”) often targeted a few loci and exhibited strong pleiotropic effects (due to the genetic correlation). These results point to the limitations of inferences made from human-directed selection.
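
Here is a minimal sketch of that “many small shifts, no fixation” picture. It assumes a purely additive trait, 200 hypothetical loci with equal effects, and a deterministic per-generation update of strength × effect × p(1 - p) at each locus. All of the numbers are made up (the 12 generations is just borrowed from the honeybee timescale above), so treat it as a cartoon rather than a model of the bee data.

```python
def mean_phenotype(freqs, effects):
    """Mean of an additive trait: sum over loci of 2 * effect * allele frequency."""
    return sum(2.0 * a * p for p, a in zip(freqs, effects))


def shift_frequencies(freqs, effects, strength, generations):
    """Nudge every trait-increasing allele upward in proportion to its effect."""
    for _ in range(generations):
        freqs = [p + strength * a * p * (1.0 - p) for p, a in zip(freqs, effects)]
    return freqs


# Hypothetical setup: 200 loci of standing variation, equal small effects.
N_LOCI = 200
start = [0.1 + 0.8 * i / (N_LOCI - 1) for i in range(N_LOCI)]
effects = [0.1] * N_LOCI
end = shift_frequencies(start, effects, strength=0.2, generations=12)

if __name__ == "__main__":
    print("mean phenotype before:", round(mean_phenotype(start, effects), 2))
    print("mean phenotype after: ", round(mean_phenotype(end, effects), 2))
    print("largest frequency shift:", round(max(e - s for s, e in zip(start, end)), 3))
    print("any locus fixed?", any(p >= 1.0 for p in end))
```

The mean phenotype moves noticeably while no single locus changes by more than a few percent and nothing fixes, which is why a test built around single-locus sweeps would miss this kind of selection.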

Soft selection is probably ubiquitous. Consider the evolution of skin color in humans. There are lots of variants and lots of variation, and most of the variation seems to be ancestral. Only at the locus SLC24A5 do you have a perfect illustration of a hard selective sweep, probably from a de novo mutation that emerged around the Last Glacial Maximum.

From a geneticist’s perspective evolution is basically conceived of as changes in allele frequencies over time. Much of this is due to natural selection. Now that the world of soft selection is opening up, I suspect that we’ll understand a lot more of what we see around us, at least in the generality.

Citation: A soft selective sweep during rapid evolution of gentle behaviour in an Africanized honeybee.

October 22, 2017

Machine learning swallowing population genetics = understanding patterns in population genomics

Filed under: Machine Learning,Population genetics — Razib Khan @ 1:09 pm

Dan Schrider and Andy Kern have a new review preprint out, Machine Learning for Population Genetics: A New Paradigm. On Twitter there has already been a little snark to the effect of “oh, you mean regression?” That’s fair enough, and the preprint would probably benefit from a lower key title, though that’s really the sort of title journals seem to love.

I would recommend this preprint to two large groups of my readers. First, there are those with strong computational skills who are curious about biology; it makes it clear why population genomics benefits from machine learning methods. Second, there are those who are interested or trained in genetics but have less of a computational and pop gen background.

Yes, all models are wrong. But some give insight, and some are just not salvageable. In population genomics some of the model-building is obviously starting to yield really fragile results.

September 16, 2017

Carving nature at its joints more realistically

Filed under: Admixture,construct,phylogenetics,Population genetics,Structure — Razib Khan @ 10:23 pm

If you are working on phylogenetic questions on a coarse evolutionary scale (that is, “macroevolutionary,” though I know some evolutionary geneticists will shoot me the evil eye for using that word) generating a tree of relationships is quite informative and relatively straightforward, since it has a comprehensible mapping onto what really occurred in nature. When your samples are different enough that the biological species concept works well and gene flow doesn’t occur between nodes, then a tree is a tree (one reason Y and mtDNA results are so easy to communicate to the general public in personal genomics).

Everything becomes more problematic when you are working on a finer phylogenetic scale (or in taxa where inter-species gene flow is common, as is often the case with plants). And I’m using problematic here in the way that denotes a genuine substantive analytic issue, as opposed to connoting something that one has moral or ethical objections to.

It is intuitively clear that there is often genetic population structure within species, but how to summarize and represent that variation is not a straightforward task.

In 2000 the paper Inference of Population Structure Using Multilocus Genotype Data in Genetics introduced the sort of model-based clustering most famously implemented with Structure. The paper illustrates limitations with the neighbor-joining tree methods which were in vogue at the time, and contrasts them with a method which defines a finite set of populations and assigns proportions of each putative group to various individuals.
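To make the “proportions of each putative group” idea concrete, here is a minimal sketch of the kind of likelihood that sits underneath admixture-style clustering. It assumes biallelic loci, genotypes coded as 0/1/2 copies of a reference allele, and known allele frequencies in each of K putative populations. The function names and the toy numbers are mine, purely for illustration; this is not the code of Structure or any other package, which also have to estimate the frequencies and proportions rather than take them as given.

```python
import math

def allele_prob(q, p_locus):
    """Probability that one allele copy is the reference allele,
    given ancestry proportions q and per-population frequencies p_locus."""
    return sum(qk * pk for qk, pk in zip(q, p_locus))

def genotype_log_likelihood(genotypes, q, pop_freqs):
    """Log-likelihood of 0/1/2 genotypes under a simple admixture model.

    genotypes: reference-allele counts, one per locus
    q: ancestry proportions over K populations (should sum to 1)
    pop_freqs: pop_freqs[l][k] is the reference allele frequency at
               locus l in population k (assumed strictly between 0 and 1)
    """
    ll = 0.0
    for g, p_locus in zip(genotypes, pop_freqs):
        f = allele_prob(q, p_locus)
        # Two independent allele draws per locus: Binomial(2, f).
        ll += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll

# Toy data: three loci, two putative populations (hypothetical frequencies).
pop_freqs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
genotypes = [1, 1, 1]  # heterozygous at every locus

for q in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(q, round(genotype_log_likelihood(genotypes, q, pop_freqs), 3))
```

In this toy case the individual who is heterozygous everywhere is best explained by a 50/50 mix of the two groups. Note that the model will always hand back some set of proportions, whether or not discrete source populations ever existed.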

The model-based methods were implemented in numerous packages over the 2000s, and today they’re pretty standard parts of the phylogenetic and population genetic toolkits. The reason for their popularity is obvious: they are quite often clear and unambiguous in their results. This may be one reason that they emerged to complement visualization methods like PCA and MDS, which make fewer a priori assumptions.

But of course, crisp clarity is not always reality. Sometimes nature is fuzzy and messy. The model-based methods take inputs and will produce crisp results, even if those results are not biologically realistic. They can’t be utilized in a robotic manner without attention to the assumptions and limitations (see A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots).

This is why it is exciting to see a new preprint which addresses many of these issues, Inferring Continuous and Discrete Population Genetic Structure Across Space*:

A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure….

The whole preprint should be read by anyone interested in phylogenomic inference, as there is extensive discussion and attention to many problems and missteps that occur when researchers attempt to analyze variation and relationships across a species’ range. Basically, the sort of thing that might be mentioned in peer review feedback, but isn’t likely to be included in any final write-ups.

As noted in the abstract the major issue being addressed here is the problem that many clustering methods do not include within their model the reality that genetic variation within a species may be present due to continuous gene flow defined by isolation by distance dynamics. This goes back to the old “clines vs. clusters” debates. Many of the model-based methods assume pulse admixtures between population clusters which are random mating. This is not a terrible assumption when you consider perhaps what occurred in the New World when Europeans came in contact with the native populations and introduced Africans. But it is not so realistic when it comes to the North European plain, which seems to have become genetically differentiated only within the last ~5,000 years, and has likely seen extensive gene flow.

The figure below shows the results from the conStruct method (left), and the more traditional fastStructure (right):

There are limitations to the spatial model they use (e.g., ring species), but that’s true of any model. The key is that it’s a good first step to account for continuous gene flow, and not shoehorning all variation into pulse admixtures.

Though in beta, the R package is already available on github (easy enough to download and install). I’ll probably have more comment when I test drive it myself….

* I am friendly with the authors of this paper, so I am also aware of their long-held concerns about the limitations and/or abuses of some phylogenetic methods. These concerns are broadly shared within the field.

September 14, 2017

After agriculture, before bronze

 

The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s ~10% of the variation).
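For readers who want to see the arithmetic, here is a minimal sketch of Fst as “the proportion of variation partitioned between groups.” It assumes a single biallelic locus and two equally weighted populations, and uses the textbook definition Fst = (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. The function names and allele frequencies are made up for illustration; real estimators correct for sample size and combine information across many loci.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_two_pops(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    # Average within-population heterozygosity.
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    # Heterozygosity expected if the two populations were pooled.
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # locus is fixed everywhere; no variation to partition
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two populations.
print(round(fst_two_pops(0.50, 0.50), 3))  # identical frequencies
print(round(fst_two_pops(0.34, 0.66), 3))  # roughly continental-scale, about 0.10
print(round(fst_two_pops(0.20, 0.80), 3))  # much stronger differentiation
```

Even a value around 0.10 at this toy locus requires the two populations to differ quite a bit in allele frequency (0.34 versus 0.66), which gives some feel for why values of that size are noteworthy.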

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

  Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but rather peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.

September 10, 2017

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain utilizes ~20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allow for a signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods for detecting selection at the phenotypes of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table

June 27, 2017

Why you should learn some population genetics

Filed under: Population genetics — Razib Khan @ 10:03 pm

From reader surveys I know a substantial portion of the people who will see this post are financially well off (of those who aren’t, a large number are students). Therefore, you can invest in some books.

Often people ask me questions related to population genetics in the comments (sometimes I get emails). That is all well and good. But it is always better to be able to fish than have to ask for fish. Additionally, learning some population and quantitative genetics allows you to develop some tacit schemas through which you can process information coming at you, and through which you can develop some general intuition.

If you have a modest level of mathematical fluency and the disposable income, here are three indispensable books which are like the keys to the kingdom:

* Elements of Evolutionary Genetics
* Principles of Population Genetics
* Introduction to Quantitative Genetics.

If you don’t have the cash to spare, there are online notes which are pretty good:

* Graham Coop’s Population Genetics notes
* Joe Felsenstein’s Theoretical Evolutionary Genetics

There are other online resources, but they are not as comprehensive. John Gillespie’s Population Genetics: A Concise Guide is good as very gentle introductions go, but if you are going to spend money, I think just plumping down for a more comprehensive textbook (which will have more genomics in it) is better over the long run.

The goal of getting these books isn’t to make you a population geneticist, but, if you are interested in evolutionary questions it gives you a powerful toolkit. Really nothing in evolutionary process makes sense except in the light of population genetics.

April 25, 2017

Dost thou know the equilibrium at panmixia?

Filed under: Genetics,Population genetics — Razib Khan @ 3:58 pm

If you read a blog about Biblical criticism from a Christian perspective it would probably be best if you were familiar with the Bible. You don’t have to have read much scholarly commentary, rather, just the New Testament. Barring that, at least the synoptic gospels!

At this point, with over 400 individuals responding to the reader survey, it is strange to consider that more people believe they have a handle on what Fst is than the Hardy-Weinberg Equilibrium. First, Fst is a more subtle concept than people often think it is. And second, the HWE is so easy, important, and foundational to population genetics. I mean, p^2 + 2pq + q^2 = 1. Could it be simpler???
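For anyone who would rather see it than read about it, here is a minimal sketch of the Hardy-Weinberg expectation. It assumes one locus with two alleles, A at frequency p and a at frequency q = 1 - p, in a randomly mating population; the expected genotype frequencies are then p^2 (AA), 2pq (Aa), and q^2 (aa). The function name and the example frequency are mine, just for illustration.

```python
def hwe_genotype_frequencies(p):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium
    for a biallelic locus where allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

freqs = hwe_genotype_frequencies(0.7)
for genotype, freq in freqs.items():
    print(genotype, round(freq, 4))

# The three genotype frequencies always sum to 1 (up to floating-point error).
print(round(sum(freqs.values()), 10))
```

The useful part in practice is the comparison: if observed genotype counts depart strongly from these expectations (say, too few heterozygotes), that hints at something like population structure, inbreeding, selection, or genotyping error.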

So a quick ask. If you are one of the people who doesn’t understand HWE or why it is important, please get yourself a copy of John Gillespie’s Population Genetics: A Concise Guide. I understand that not everyone has the time, interest, or money for Principles of Population Genetics, or any of the more “hardcore” texts. But Population Genetics: A Concise Guide will surely suffice to follow anything on this blog.

Or, barring that, please review the online resources which you have available. Two examples:

Graham Coop’s Notes on Population Genetics or Joe Felsenstein’s unpublished textbook Theoretical Evolutionary Genetics.
