# Razib Khan: One-stop-shopping for all of my content

## April 16, 2018

### What did modern humans look like during the “Out of Africa” event?

Recently I was having an email exchange with a friend (a prominent public intellectual who is not a scientist), and we were thinking about what “ancestral Africans” looked like. More precisely, the populations which were resident around ~100,000 to ~200,000 years before the present. These are the people who are depicted in paleoanthropology documentaries. Here were some of my major contentions:

1) We don’t know what they looked like
2) They probably were more likely to look like modern Africans than non-Africans
3) But modern Africans are diverse in their looks and we could expect that ancient Africans were too

The neighbor-joining tree above is generated with a naive model of successive bifurcation.

1) Khoisan split off 200,000 years ago
2) Mbuti split off 150,000 years ago
3) Mende split off 100,000 years ago
4) Japanese about 50,000 years ago
5) While Pathan and Basque only 15,000 years ago
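As a rough illustration of what "successive bifurcation" means here, the split times in the list can be written out as a ladder-shaped tree in Newick format. This is only a sketch: the function name `build_newick` and the choice to measure branch lengths in years are my own assumptions, and it is not how the neighbor-joining tree itself was generated.

```python
# Split times (years before present) taken from the list above.
# Each entry: (population that branches off, split time).
splits = [
    ("Khoisan", 200_000),
    ("Mbuti", 150_000),
    ("Mende", 100_000),
    ("Japanese", 50_000),
]
# The final pair splits from each other at 15,000 years.
last_pair = ("Pathan", "Basque", 15_000)


def build_newick(splits, last_pair):
    """Return a Newick string for a ladder-like, successively bifurcating tree.

    Branch lengths are in years and all tips sit at the present (time 0).
    """
    a, b, t = last_pair
    subtree = f"({a}:{t},{b}:{t})"
    subtree_age = t
    # Work from the most recent split back to the oldest one.
    for name, age in sorted(splits, key=lambda s: s[1]):
        internal_branch = age - subtree_age  # stem leading down to the younger clade
        subtree = f"({name}:{age},{subtree}:{internal_branch})"
        subtree_age = age
    return subtree + ";"


newick = build_newick(splits, last_pair)
print(newick)
```

The resulting string can be pasted into any tree viewer that reads Newick. Every population hangs off a single backbone, which is exactly the simplification that the next paragraph calls wrong in the details.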

The model is wrong in the details. Pathan and Basque have some ancestry which diverged recently, and much that is deeply diverged. The 15,000 year value is just an average. Similarly, the Khoisan have some Eurasian ancestry. But in the broad sketch it illustrates that some African populations diverged a very long time ago from other groups.

The common ancestors of all the modern populations, from Khoisan to Japanese, date to ~200,000 years before the present. You could probably use phylogenetic character reconstruction methods to attempt to infer what ancient Africans looked like…but I’m not sure that it would be useful since modern humans have spread over so many ecologies over such a short span of time.

Outside of Sub-Saharan Africa perhaps on the order of 95% of the ancestry derives from an expansion from a small founder group between 60 and 80 thousand years ago. Removing the “Basal Eurasian” component, groups as diverse as Native Americans, Oceanians and East Asians probably derive their ancestry from a common group which flourished between 50 and 60 thousand years ago (this pulse is the majority of the ancestry of Europeans and South and West Asians as well).

The point here is to illustrate that 50,000 years is definitely sufficient for a great deal of diversity to have emerged in human physical variation. And yet the Khoisan are ~200,000 years diverged from their ancestors within Africa. We actually know that indigenous southern Africans have been selected for lighter pigmentation. We also know that loci associated with pigmentation in modern humans exhibit a lot of variation in Africans, and this variation is likely an ancestral feature of our species.

In sum, the number of generations between ancestral Africans and all modern descendant populations is great enough that I’m not certain that we can predict what they looked like in anything except their skeletal features. Additionally, most of the history of anatomically modern humans was likely highly structured within Africa. That’s another way of saying that ancient Africans themselves were probably physically diverse.

With all that being said, all things equal ancient Africans were probably more likely to look like modern Africans than modern non-Africans. The main reason is simply that modern Africans occupy the same broad ecological landscape as ancient Africans, and many of our features, from our build to our complexion, seem dependent upon environmental pressures. There’s a lot of evidence that very light skin is probably a derived characteristic of our species (there are consistent signatures of sweeps around pigmentation loci). And there is also evidence that some of the archaic introgression into non-Africans may have consequences in our morphology and external physical characteristics. For example, Eurasians seem to have very high frequencies of Neanderthal variants of the keratin gene. This is implicated in hair, skin and nail development.

Addendum: Note that even if we have ancient genomes, polygenic characteristics are still hard to predict. Even today common SNPs only explain a minority of the variation in hair color in Europeans.

## February 28, 2018

### Who We Are and How We Got Here, a book worth reading

Filed under: Human Evolution,Human Genetics — Razib Khan @ 7:18 am

Yesterday I talked to a friend who has a review copy of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. They gave me a preview (their overall assessment was positive).

I haven’t personally asked to get a copy because, to be honest, I thought there wouldn’t be anything new in it. If you “read the supplements” what more could there be in 368 pages? So I was waiting until the end of the month to buy the book and read it in my own sweet time as due diligence.

Well, this morning I asked a publicist to send me a copy. I will be getting it next week. The reason is that I’m told the latter portions of the book are quite challenging and candid as to what genetics may tell us in the 21st century. Who We Are and How We Got Here is a 21st-century revision and update of The History and Geography of Human Genes. But it’s apparently a lot more.

Also, I make a small cameo in the book, as do Eurogenes and Dienekes. I have always appreciated how David Reich, Nick Patterson, and their whole lab have taken people outside of the halls of the academy seriously. They didn’t need to as a matter of professional necessity but often engage as a matter of decency and seriousness.

## October 15, 2017

### Another great-great-great…great-uncle in Asia

Filed under: Ancient DNA,Human Genetics,Tianyuan — Razib Khan @ 4:48 pm

The paper which surveys the relationship of the 40,000 year old Tianyuan sample is finally out in Current Biology, 40,000-Year-Old Individual from Asia Provides Insight into Early Population Structure in Eurasia. There isn’t anything too surprising here. Here is the part of the abstract that presents the new findings:

…we generated genome-wide data from a 40,000-year-old individual from Tianyuan Cave, China…We find that he is more related to present-day and ancient Asians than he is to Europeans, but he shares more alleles with a 35,000-year-old European individual than he shares with other ancient Europeans, indicating that the separation between early Europeans and early Asians was not a single population split. We also find that the Tianyuan individual shares more alleles with some Native American groups in South America than with Native Americans elsewhere, providing further support for population substructure in Asia [8] and suggesting that this persisted from 40,000 years ago until the colonization of the Americas. Our study of the Tianyuan individual highlights the complex migration and subdivision of early human populations in Eurasia.

The Tianyuan sample lived ~40,000 years ago in China, and it does not seem to have been the direct ancestor of modern East Eurasians. It also seems to have had some relationship to the Australo-Melanesian affiliated population which contributed ancestry to the indigenous peoples of South America. Additionally, it shares more ancestry than you’d expect with a 35,000 year old Paleolithic European, the GoyetQ116-1 sample, which is found in an Aurignacian context.

There are some direct conclusions that one can infer from this paper. First, as known beforehand the divergence between East Eurasians and West Eurasians has to predate 40,000 years before the present since this sample already shares drift with East Eurasians far more than West Eurasians. In the paper, the authors give an interval of 40,000 to 80,000 years before the present, which seems advised. Remember that “Basal Eurasians” separated before the divergence of East and West Eurasians.

Second, “ghost” populations were common. There are at minimum two ancient Eurasian populations, represented by the Oase1 sample in Romania from 40,000 years ago, and the 45,000 year old Ust’-Ishim from Siberia, who were not closely related to any populations which left descendants today.

Third, the human “family tree” looks more like a human “family bramble.” One of the interesting points in this paper is that Tianyuan shares drift with Goyet, but does not share drift with El-Miron, which seems to be descended in large part from a population like Goyet. The key here is to note that Goyet is the closest proxy to some of the ancestors of El-Miron, but it may not be the ancestor at all. So if Goyet-like populations were heterogeneous in relation to East Eurasians, then El-Miron may descend from a group which never mixed with East Eurasians.

This is clear when you read many of these ancient DNA papers closely. The Mal’ta boy was representative of a population which contributed to both Northern Europeans (via Eastern Hunter-Gatherers) and Amerindians, but the deeper results also indicated that the common contributor to these populations was not the Mal’ta population, but related to them. That is, there is no expectation that the sparse sampling of ancient DNA in many regions and epochs will find the ancestral populations, as opposed to groups related to the ancestral populations.

This is a looking-through-the-glass-darkly situation. The true pattern of population relationships of the past needs to be inferred from a finite set of individuals randomly drawn from those populations. If most of those populations left no descendants due to common and repeated local extinction events, then it may be that most of the time we’re going to have to triangulate to the “true” ancestral groups, who left descendants simply due to luck.

Finally, this should really put the nail in the coffin of the idea that we can think of ancient populations as algebraic recombinations of modern populations. Modern groups almost certainly sample only a small part of the distribution of ancient populations.

## October 12, 2017

### Attendance at ASHG Meetings since 1981

Filed under: American Society for Human Genetics,ASHG,Human Genetics — Razib Khan @ 6:18 pm

## October 11, 2017

### The architecture of skin color variation in Africa

Filed under: Human Genetics,Human Genome,Human Genomics,Pigmentation — Razib Khan @ 3:20 pm

Very interesting abstract at the ASHG meeting of a plenary presentation, Novel loci associated with skin pigmentation identified in African populations. This is clearly the work that one of the comments on this weblog alluded to last summer during SMBE. There I was talking about the likely introduction of the derived SLC24A5 variant to the Khoisan peoples and its positive selection in peoples in southern Africa.

Below is the abstract in full. Those who follow the literature on this see the usual suspects in relation to genes, but also new ones:

Despite the wide range of variation in skin pigmentation in Africans, little is known about its genetic basis. To investigate this question we performed a GWAS on pigmentation in 1,593 Africans from populations in Ethiopia, Tanzania, and Botswana. We identify significantly associated loci in or near SLC24A5, MFSD12, TMEM138…OCA2 and HERC2. Allele frequencies at these loci in global populations are strongly correlated with UV exposure. At SLC24A5 we find that a non-synonymous mutation associated with depigmentation in non-Africans was introduced into East Africa by gene flow, and subsequently rose to high frequency. At MFSD12, we identify novel variants that are strongly correlated with dark pigmentation in populations with Nilo-Saharan ancestry. Functional assays reveal that MFSD12 codes for a lysosomal protein that influences pigmentation in cultured melanocytes, zebrafish and mice. CRISPR knockouts of murine Mfsd12 display reduced pheomelanin pigmentation similar to the grizzled mouse mutant (gr/gr). Exome sequencing of gr/gr mice identified a 9 bp in-frame deletion in exon two of Mfsd12. Thus, using human GWAS data we were able to map a classic mouse pigmentation mutant. At TMEM138…we identify mutations in melanocyte-specific regulatory regions associated with expression of UV response genes. Variants associated with light pigmentation at this locus show evidence of a selective sweep in Eurasians. At OCA2 and HERC2 we identify novel variants associated with pigmentation and at OCA2, the oculocutaneous albinism II gene, we find evidence for balancing selection maintaining alleles associated with both light and dark skin pigmentation. We observe at all loci that variants associated with dark pigmentation in African populations are identical by descent in southern Asian and Australo-Melanesian populations and did not arise due to convergent evolution.
Further, the alleles associated with skin pigmentation at all loci but SLC24A5 are ancient, predating the origin of modern humans. The ancestral alleles at the majority of predicted causal SNPs are associated with light skin, raising the possibility that the ancestors of modern humans could have had relatively light skin color, as is observed in the San population today. This study sheds new light on the evolutionary history of pigmentation in humans.

Much of this is not surprising. Looking at patterns of variation around pigmentation loci researchers suggested years ago that Melanesians and Africans exhibited evidence of similarity and functional constraint. That is, the dark skin alleles date back to Africa and did not deviate from their state due to selection pressures. In contrast, light skin alleles in places like eastern and western Eurasia are quite different.

This abstract also confirms something I said in a comment on the same thread, that Nilotic peoples are the ones likely to have been subject to selection for dark skin in the last 10,000 years. You see above that variants on MFSD12 are correlated with dark complexion, in particular in Nilo-Saharan groups. The model Nyakim Gatwech is of South Sudanese nationality and has a social media account famous for spotlighting her dark skin. Gatwech and the San Bushman child above are so different in color that I think it would be clear these two individuals come from very distinct populations.

The fascinating element of this abstract is the finding that most of the alleles which are correlated with lighter skin are very ancient and that they are the ancestral alleles more often than the derived! We’ll have to wait until the paper comes out. My assumption is that after the presentation Science will put it on their website. But until then here are some comments:

• There is obviously a bias in the studies of pigmentation toward those which highlight European variability.
• The theory of balancing selection makes sense to me because ancient DNA is showing OCA2 “blue eye” alleles which are not ancestral in places outside of Western Europe. And in East Asia there are their own variants.
• Lots of variance in pigmentation not accounted for in mixed populations (again, lots of the early genomic studies focused on populations which were highly diverged and had nearly fixed differences). Presumably, African research will pick a lot of this up.
• This also should make us skeptical of the idea that Western Europeans were necessarily very dark skinned, as now we know that human pigmentation architecture is complex enough that sampling modern populations expands our understanding a great deal.
• Finally, it’s long been assumed that at some stage early on humans were light skinned on most of their body because we had fur. When we lost our fur is when we would have needed to develop dark skin. This abstract is not clear on how long ago light and dark alleles coalesce to common ancestors.

## October 7, 2017

### The Tibeto-Burman and Austro-Asiatic ancestry of Bengalis

Filed under: Bangladeshi,Bengali,Bengali Genetics,Human Genetics,Human Genomics — Razib Khan @ 11:53 am

When I first got my father’s 23andMe results the Y and mtDNA were an interesting contrast. He, and therefore myself, carried Y lineage R1a1a, the lord of the paternal lineages. That was not that great a surprise. In the 1000 Genomes results for the Bangladeshi sample 20% of the men were direct paternal descendants of the R1a1a progenitor.

The mtDNA was a surprise. It was G1a2. This was curious to me since Bangladesh has some of the highest frequencies in the world of haplogroup M, the subhaplogroups in question being mostly restricted to South Asia. I wasn’t surprised that I was R1a1a, but I was even more confident that my maternal lineage was going to be an M, as would my father’s (my own mtDNA is U2b, not common, but not so surprising). As you can see from the map 23andMe places my father’s maternal lineage somewhere in Northeast Asia. The only information I could get about the geography was for G1a, “G1a has been found in samples from China (Daur, Hui, Kazakh, Korean, Manchu, and a sample of the general population of the city of Shenyang), Japan, Korea, Vietnam, and Siberia (Yakut).”

The biggest sample of mtDNA results from Bangladesh I could find, at N = 240, does not find any G at all, let alone G1a2. So this is clearly a rare haplogroup in the region. But the authors do classify 13% of the Bangladeshis as carrying an “East Eurasian” haplogroup. Haplogroup A is found among Southeast Asians and in Southern China, though not among Austronesians. Haplogroup F seems to have a similar distribution, as do D and B. The other haplogroups also seem “correctly” assigned in terms of modal distribution. They are all mostly East Asian.

Looking at the Y chromosome haplogroups in the 1000 Genomes there are two each of O2 and O3, and one of C3, which are clearly of Southeast Asian origin. With N = 5 out of 44 samples that is ~10%. O2 is interesting because it is found at very high frequencies among the Austro-Asiatic populations in South Asia, whether it be the Khasi or Munda groups (generally O2a). O3 seems associated with Tibeto-Burman populations, and C3 with East Asia more generally.

If you know much about the ethnolinguistics of South Asia you know that the two major language families are Indo-Aryan and Dravidian. But there are other groups. In the northwest you have various other Indo-European speaking populations, and along the northern and northeast fringe you have Tibeto-Burman languages being spoken. But most anomalous is the distribution of Austro-Asiatic languages. The most numerous Austro-Asiatic language in the world today is Vietnamese, followed by the language of the Khmers.

But there are numerous other Austro-Asiatic languages in Southeast and South Asia. The indigenous people of the deep forests of the Malay peninsula, including the Negritos, speak Austro-Asiatic languages. As one moves west there are Austro-Asiatic languages in Burma, such as Mon, which used to be far more common. And in India there are two groups, the language of the Khasi of the northeast, which seems to share some affinity with the Palaungic dialects of interior Burma and southern China, and the Munda languages farthest west which seem very distinct from all the other branches.

The genetics seems to suggest that the Munda tribes do have East Asian ancestry, but it is almost totally male-mediated. Their Y chromosomal lineages are very distinctive, with high proportions of O2a, but their mtDNA lineages are overwhelmingly South Asian macro-haplogroup M. The Khasi of the hills north of Bangladesh occupy a different position, with both maternal and paternal East Asian heritage, as well as much higher genome-wide ancestry that is not South Asian. At this point, I am convinced that the Austro-Asiatic language groups came into South Asia from the east to the west.

The other language family with East Asian connections in South Asia is that of the Tibeto-Burmans. Unlike the Austro-Asiatic group, these peoples tend to occupy only the periphery of South Asia, the far north and east.

Finally, there are historically attested Tai peoples who migrated into South Asia. The most famous of these are the Ahoms of Assam. These were part of the same migrations ~1,000 years ago that led to the shift of Thailand from being a zone dominated by Mon and Khmer Austro-Asiatic peoples, to Tai peoples. In Burma, the Tai migrations resulted in the Shan states of the uplands, though the Burman and Mon polities were able to fight off the attempts at takeover.

Ultimately the Ahom became totally Indianized. Their traditional language became relegated to ritual, and they adopted the Indo-Aryan Assamese language. Additionally, at some point, they converted to orthodox Hinduism. This became so much a part of their identity that by the 17th century they were checking Islamic expansion to the east by defeating the Mughals.

All of this ultimately goes back to the question: how did my father get his mtDNA? If you read my post from a few years back, How did Bengalis get East Asian?, you will know that it is probably a mix of Austro-Asiatic and Tibeto-Burman ancestry. Can we say any more at this stage?

Some Austronesian data sets have come online. So I thought I’d give it another shot. Additionally, I spent several hours removing outliers and combining populations to generate a full data set. The number of markers was 195,000 SNPs.

| Label | N | Notes |
|---|---|---|
| AA | 17 | Munda (outliers removed) |
| BD | 74 | Bangladesh, 1K BEB (outliers removed) |
| Borneo | 31 | Orang Asli tribes (outliers removed) |
| Burmese | 20 | Bamar ethnicity |
| Cambodians | 39 | Outliers removed |
| Dai | 40 | |
| Han_C | 47 | Pooled Han from HGDP and 1K |
| Han_N | 28 | Pooled Han from HGDP and 1K |
| Han_S | 29 | Pooled Han from HGDP and 1K |
| Japanese | 28 | |
| Malay | 21 | |
| Miao | 10 | |
| Phil | 16 | Luzon and Visaya |
| Phil_Highland | 15 | Igorot tribesman Luzon (outliers removed) |
| Telugu | 34 | 1K STU (outliers removed) |
| Viet | 18 | |

I ran ADMIXTURE at K = 4 on the full data set. Please click on the image if you want details, but the results are straightforward:

yellow = South Asian (modal in Telugu)

green = Northeast Asian (modal in Japan and northern Han)

navy = Southeast Asian/Austro-Asiatic (modal in Cambodians)

red = Austronesian (modal in Igorot tribesman from the highlands of the Philippines)

The two bottom population groups are Bangladeshis and Munda. You can see that all are mostly yellow. That is, they’re mostly South Asian. But the Munda have a much lower South Asian proportion than the Bangladeshis. This is not surprising. The Munda language and mythology is very distinct from other South Asians. Clearly, they have ancient East Asian connections, and this shows in their genome-wide ancestry.

But notice a difference between Bangladeshis and Munda: most of the Bangladeshis have a green component, which is common among Northeast Asians, while none of the Munda do. The total fractions are 38% navy (Austro-Asiatic) for the Munda, and 7% each for navy and green (Northeast Asian) for the Bangladeshis.

The two components also exhibit a negative correlation in the Bangladeshis of -0.47. Why? My own suspicion is that some population structure and clinal variation exists within Bangladesh. As I’ve noted before my parents are among the most East Asian of Bangladeshis I’ve ever analyzed…and it is no surprise that we are from the east of eastern Bengal. In contrast when I’ve looked at genotypes from West Bengalis, they tend to have less East Asian ancestry, though still an appreciable amount in a broader South Asian context (in fact, even Bengali Brahmins have East Asian ancestry, though at smaller fractions).

This seems to be a pretty clear rejection of the model where Bangladeshis are a two population mix of Munda tribesmen and a more conventional South Asian group.

Here are the average percentages by population:

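| Group | Austro-Asiatic | Austronesian | South Asian | Northeast Asian |
|---|---|---|---|---|
| AA | 38% | 0% | 62% | 0% |
| BD | 7% | 2% | 84% | 7% |
| Borneo | 61% | 38% | 0% | 0% |
| Burmese | 29% | 0% | 23% | 48% |
| Cambodians | 73% | 1% | 15% | 11% |
| Dai | 49% | 7% | 0% | 44% |
| Han_C | 16% | 5% | 0% | 79% |
| Han_N | 1% | 1% | 2% | 96% |
| Han_S | 27% | 7% | 0% | 66% |
| Japanese | 0% | 1% | 2% | 97% |
| Malay | 64% | 16% | 13% | 7% |
| Miao | 24% | 3% | 0% | 73% |
| Phil | 34% | 37% | 6% | 22% |
| Phil_Highland | 0% | 100% | 0% | 0% |
| Telugu | 0% | 3% | 96% | 0% |
| Viet | 45% | 7% | 0% | 48% |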

I’m 99% sure that “South Asian” is in some of these cases a proxy for anything that’s not East Asian. But the Malay and Cambodian results are probably South Asian. And the Burmese certainly are.

Click to enlarge the PCA plot to the left, but PC1 is South Asian to East Asian, and PC2 is Northeast Asian to Southeast Asian.

Both the Malays and the Burmese exhibit a “South Asia cline.” This is due to admixture. But the Burmese project toward the position of the central Han, while the Malays are shifted toward a Southeastern Asian population.

Both the Bangladeshis and Munda samples are East Asia shifted, but the Munda sample clearly skews toward the Southeast Asian populations. The Bangladeshi samples do not seem to exhibit this clear pattern.

Then I ran Treemix with blocks of 1000 SNPs and no migration edges as well as global rearrangements turned on and rooted with the Telugu.

The results are absolutely unsurprising. Unfortunately adding migration edges doesn’t really add much value with so many populations, as there is a great deal of complex population history in Southeast Asia.

Removing many of the populations and setting the migration edges to 3, you get:

The Austro-Asiatic connection between Cambodians and Munda is always clear no matter what you do. The Bangladeshis tend to have more complex relationships, but often the edges are toward the Burmese, who are a compound between South Asian, Austro-Asiatic, and Northeast Asian.

At this point I ran a “three population test.” Basically, you take an outgroup, and compare it to a clade of two other populations, and see how good the fit of the data to the model is. If there is “complex population history” you’ll get a negative f3 statistic. Complex population history means that there is almost certainly gene flow between the outgroup and one of the ingroups.
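To make the logic concrete, here is a minimal sketch of the quantity behind the test. It assumes the usual definition, f3(C; A, B) as the average over SNPs of (c - a)(c - b), where c, a and b are allele frequencies and C is the population I am calling the outgroup. The allele frequencies and the mixing fraction `alpha` below are hypothetical values chosen for illustration, not numbers from my data set, and this naive average leaves out the finite-sample correction and block-based standard errors that real software applies.

```python
def f3(target, source1, source2):
    """Naive f3(target; source1, source2): mean of (c - a) * (c - b) over SNPs."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all populations need the same number of SNPs")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at five SNPs in two source populations.
pop_a = [0.10, 0.80, 0.50, 0.30, 0.95]
pop_b = [0.60, 0.20, 0.55, 0.90, 0.40]

# Hypothetical admixed population: 85% from pop_a, 15% from pop_b, no drift.
alpha = 0.85
admixed = [alpha * a + (1 - alpha) * b for a, b in zip(pop_a, pop_b)]

f3_admixed_as_target = f3(admixed, pop_a, pop_b)  # negative: signal of mixture
f3_source_as_target = f3(pop_a, admixed, pop_b)   # positive: no such signal

print(f3_admixed_as_target, f3_source_as_target)
```

When the target really is a mixture, its frequencies sit between those of the two sources at most SNPs, so the two differences have opposite signs and the product averages out negative. Drift in the target after admixture pushes the value back up, which is why a negative f3 is strong evidence of mixture while a positive one is not evidence against it.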

Below are results where the Bangladeshis are the outgroup, and f3 statistics are negative (sorted most negative to least).

| Outgroup | Pop1 | Pop2 | f3 | f3-error | Z-score |
|---|---|---|---|---|---|
| BD | Telugu | Miao | -0.00240554 | 6.21107e-05 | -38.7298 |
| BD | Telugu | Han_S | -0.00238905 | 5.49332e-05 | -43.4901 |
| BD | Telugu | Dai | -0.00238103 | 5.73977e-05 | -41.4831 |
| BD | Telugu | Han_C | -0.00237904 | 5.74148e-05 | -41.4359 |
| BD | Telugu | Viet | -0.0023151 | 5.63663e-05 | -41.0725 |
| BD | Telugu | Han_N | -0.00229979 | 5.55838e-05 | -41.3752 |
| BD | Telugu | Japanese | -0.00225745 | 5.65642e-05 | -39.9095 |
| BD | Telugu | Phil_Highland | -0.00225153 | 6.87595e-05 | -32.745 |
| BD | Telugu | Borneo | -0.00219619 | 5.91978e-05 | -37.0992 |
| BD | Telugu | Phil | -0.00209752 | 5.97396e-05 | -35.1111 |
| BD | Telugu | Cambodians | -0.00198719 | 4.88719e-05 | -40.6613 |
| BD | Telugu | Malay | -0.00195706 | 5.32466e-05 | -36.7547 |
| BD | Telugu | Burmese | -0.00183415 | 4.79121e-05 | -38.2816 |
| BD | AA | Telugu | -0.000744786 | 4.17995e-05 | -17.818 |
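The Z-score column is simply the f3 estimate divided by its standard error, which is easy to check against a few rows of the table above. The snippet below is only a sanity check on the reported numbers. The cutoff of |Z| > 3 is a common rule of thumb that I am assuming here, not something the output itself specifies.

```python
# (Pop1, Pop2, f3, f3-error, reported Z) for four rows of the table above,
# all with Bangladeshis (BD) as the target.
rows = [
    ("Telugu", "Miao", -0.00240554, 6.21107e-05, -38.7298),
    ("Telugu", "Han_S", -0.00238905, 5.49332e-05, -43.4901),
    ("Telugu", "Burmese", -0.00183415, 4.79121e-05, -38.2816),
    ("AA", "Telugu", -0.000744786, 4.17995e-05, -17.818),
]

Z_THRESHOLD = 3.0  # assumed rule-of-thumb cutoff for significance

z_scores = {}
for pop1, pop2, f3_value, f3_error, z_reported in rows:
    z = f3_value / f3_error  # Z = estimate / standard error
    z_scores[(pop1, pop2)] = z
    significant = abs(z) > Z_THRESHOLD
    print(f"BD; {pop1}, {pop2}: Z = {z:.2f} "
          f"(reported {z_reported}), significant: {significant}")
```

All of these are far beyond any reasonable threshold, so the question is not whether the f3 values are negative but how the size of f3 differs between the pairs.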

The model where Bangladeshis are a combination of Austro-Asiatic populations and conventional South Asians is not crazy. But observe that there is a jump in the f3 statistics between that row and the previous row. Bangladeshis almost certainly have non-Austro-Asiatic ancestry, which is why the scores are more extreme for cases such as (Bangladesh(Telugu, Vietnamese)).

What I’ve established then are:

• Bangladeshi East Asian ancestry is not sufficiently explained by Munda ancestry.
• A minority of Bangladeshi Y and mtDNA lineages have East Asian connections, and this cannot be explained exclusively by Munda ancestry.
• Some of these Y and mtDNA lineages seem to be of Tibeto-Burman affinity.
• Admixture analysis genome-wide indicates ancestry from non-Munda populations of East Asian origin.
• The fraction of Austro-Asiatic ancestry is balanced with more “northern” elements, while in Burma the northern element is a greater proportion than in Bangladesh.
• There is a moderate negative correlation between Austro-Asiatic ancestry and Northeast Asian ancestry in the Bangladeshi sample.
• Bangladeshis seem to have moderate signatures of gene flow from a wide range of East Asian populations.
• In contrast, the Mundas seem to have a connection most strongly with Cambodians.

A paper from several years ago looking at the patterns of genetic ancestry in the Bangladeshi population found that a single pulse of admixture around 500 AD from an East Asian population was a good fit for the origins of the variation they saw. A two-pulse model with more ancient and more recent admixture events did not improve the fit.

I assume that there is a true signal there. But the model may still be too parsimonious.

My own predictions are as follows:

• There will be an east-west cline of Tibeto-Burman ancestry.
• There will be a more constant fraction of Austro-Asiatic ancestry.
• The ratio of Austro-Asiatic ancestry will be reversed from the Tibeto-Burman cline.
• Two admixture events will eventually be detected. A strong sex-balanced pulse at 500 AD and later. And an older continuous event that will be more male skewed, as it will involve absorption of Munda substrate.
• The Padma river will turn out to be a major differentiator, with much more Tibeto-Burman ancestry to the east (Bengali dialects from east of the Padma show more Tibeto-Burman influence).

Note: a separate issue that I did not want to explore is that the South Asian ancestry of the Munda seems to show almost no Indo-Aryan influence. The Bengali population does have a small, but consistent, “Indo-Aryan” signature that you can not find in the Telugu sample. Naturally this will bias the statistics a touch.

## October 6, 2017

### Genetic variation and disease in Africa

Filed under: Africa Genetics,Africa Genomics,African Genetics,Human Genetics — Razib Khan @ 4:17 am

Very readable review, Gene Discovery for Complex Traits: Lessons from Africa. It’s open access, so I recommend it. The summary:

The genetics of African populations reveals an otherwise “missing layer” of human variation that arose between 100,000 and 5 million years ago. Both the vast number of these ancient variants and the selective pressures they survived yield insights into genes responsible for complex traits in all populations.

The main issue I might have is I’m not sure that focusing on 5 million year time spans is particularly useful. Rather, looking at the last major bottleneck for modern humans before the “Out of Africa” event would be key, since that’s when a lot of the common variation would disappear, and very rare variants probably don’t have deep time depth in any case. With all that being said, the qualitative analysis is on point.

One of the major issues in the “SNP-chip” era has been that ascertainment of variation has been skewed toward Europeans. Though more recent techniques have tried to fix this…this review points out that if you by necessity constrain the SNPs of interest to those that vary outside of Africa (most of the world’s population), you are taking many alleles private to Africa off the table. This is relevant because the “Out of Africa” bottleneck ~50,000 years ago means that African populations harbor a lot more genetic variation than non-African populations do.

The move to high-quality whole genome sequencing obviates these concerns. As a matter of course African variation will be “picked up” since the marker set is not constrained ahead of time.

Importantly the authors focus on South Africa and the Xhosa population. This group has ~20% Khoisan genetic ancestry, which is very diverse and very distinct from the remaining ~80% of its ancestry. With its large African immigrant population and highly diverse native groups, some of them quite admixed, South Africa could actually provide some hard-to-substitute value in biomedical genetics.

## September 28, 2017

### Khoisan may not have diverged ~300,000 years ago

Filed under: Human Evolution,Human Genetics — Razib Khan @ 10:23 pm

A few years ago I contributed to an op-ed which defended the utility of the race concept in biology in USA Today (which by the way prompted a quite patronizing email from a famous doyen of population genetics who wished to correct my ignorance; here’s a clue: “Out of Africa again & again”).

In my initial draft, I had stated that the Khoisan diverged from other human populations ~200,000 years ago. The fact-checker came back and said that this didn’t seem to be a supportable claim. The reason I gave the ~200,000 figure is that I’d button-holed people who looked at these genomes, and they were coming to the conclusion that the divergence between Khoisan and non-Khoisan was further back than we’d presupposed. And that was the number given to me.

Ultimately I compromised and allowed them to change the divergence value to 150,000 years before the present.

Today we’re in a different landscape. The above figure is from the Science paper, Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago, which was earlier a biorxiv preprint (which I mentioned last spring). In concert with the North African find, the media is running with the idea that the origin of modern humans goes back very far indeed. This piece in ScienceNews is actually pretty good in my opinion at staying under control, though not all write-ups have been so measured.

So in a span of two years we’ve gone from me pushing and compromising on a value of ~150,000 years, to researchers suggesting that the Khoisan/non-Khoisan divergence is about two-fold older than that!

Well, I’m here to tell you that a prominent geneticist who is very conversant with these issues is simply incredulous about the likelihood of this particular value. I brought up this preprint to them over lunch and they just didn’t buy it. That is, they are skeptical that the amount of admixture would have skewed the earlier inferences to the magnitude that they seem to have in these results.

The authors in the paper used G-PhoCS and their own ingenious method to come to these inferences of split dates. The problem with these methods is that the inferences generated aren’t nearly as straightforward as an admixture estimate (which can be checked by something as simple as a PCA). I don’t want to get into the details, but I remember seeing models in the 2000s which inferred that East Asians and Europeans diverged ~25,000 years ago, or that there was no Neanderthal admixture in Europeans (to a high degree of confidence). Models can come out with a lot of values.

More importantly, look at the dates of divergence of non-Africans (Sardinians here) from their closest African relatives.

• 115,000 years before the present (Dinka-Sardinian) for G-PhoCS
• 76,000 years before the present for their TT-method

In light of the likelihood that the closest population to non-Africans may have been an East African population represented by the Ethiopian Mota individual (along with the modern Hadza), we can probably drop that estimate down a bit. But G-PhoCS in particular just gives too old an estimate. There are ways it makes sense (lots of old structure within Africa) of course. I’m just speaking in terms of possibilities. The diversification of extant modern populations seems to have occurred around ~50,000-60,000 years before the present. This aligns with the archaeology, and the ancient genomes which we have on hand.

Of course the methods in this paper might be right. And the fossil from North Africa does add some plausibility to that. But really the whole field is somewhat unsettled now, and we should be cautious of reporting of definitive truths in the media.

## September 23, 2017

### Africa, the churning continent

Martin Meredith’s The Fortunes of Africa glosses very quickly over one of the major reasons that the “great scramble” for the continent occurred in the late 19th century, the discovery of the usefulness of quinine as an anti-malarial agent. Perhaps because I’ve read Plagues and Peoples and The Retreat of the Elephants: An Environmental History of China, I have always been conscious of the role of disease in discouraging conquest and migration (malaria in Italy was also a way to limit the extent of long-term occupation).

The coastal regions of Africa had been subject to the trade and depredations of European actors for nearly 400 years when the Berlin Conference partitioned the continent amongst European powers. Despite the fact that much of the interior was not charted, there had long been a colonial presence. Accra, the modern capital of Ghana, was originally a 16th-century Portuguese fort, but for several centuries between the 17th and 19th centuries, it was actually a possession of Scandinavian powers, Sweden and Denmark! (before passing on to the British)

For all these centuries the heart of Africa was unknown to Europeans, in part because there were native powers blocking their way, but also because the mortality rates were so high for outsiders, as indicated above. It is no surprise that the main European settlement in Africa which was more than a simple trading fort was at the southern tip of the continent, where the climate was Mediterranean and so the disease burden low.

But once quinine, and machine guns, came into the equation the interior was accessible. It all happened rather quickly in a few decades, though in some cases European ‘colonialism’ involved little more than nominal allegiance of tribal chieftains.

Now a new paper in Cell may herald the beginning of a great genomic scramble to understand the history of Africa. Carl Zimmer in The New York Times has a piece up, Clues to Africa’s Mysterious Past Found in Ancient Skeletons. It begins:

It was only two years ago that researchers found the first ancient human genome in Africa: a skeleton in a cave in Ethiopia yielded DNA that turned out to be 4,500 years old.

On Thursday, an international team of scientists reported that they had recovered far older genes from bone fragments in Malawi dating back 8,100 years. The researchers also retrieved DNA from 15 other ancient people in eastern and southern Africa, and compared the genes to those of living Africans.

The general results of the paper, Skoglund et al.’s Reconstructing Prehistoric African Population Structure, were presented at the SMBE meeting this summer. So in broad sketches I was not surprised at the results, though the details require some digging into.

The Bantu Expansion repatterned the population structure of Africa

Between 1000 BC and 500 AD the expansion of iron-wielding agriculturalists from the environs of modern-day southern Cameroon reshaped the cultural and genetic landscape of Sub-Saharan Africa. The relatively late date of this expansion should give us a general sense of how careful we need to be about making assertions about “prehistoric Africa.” When Egypt’s New Kingdom was expanding southward along the Nile and into the Levant, Sub-Saharan Africa was qualitatively very different from what we see today in both culture and genetic structure. The continent’s contemporary human geography does not have a deep time depth.

In any case, anyone who has worked with genetic data from Africa is struck by how similar Bantu-speaking populations are genetically. So these results are not surprising. South African Zulus occupy positions far closer to Kenyans and Congolese than they do to Khoisan peoples to the west of them facing the Kalahari. The Xhosa people on the cultural frontier of the Bantus in South Africa exhibit substantial admixture from Khoisan (to the point where they have even integrated clicks into their language!), but even they are preponderantly non-Khoisan.

By sampling ancient genomes from South Africa across a geographical transect which runs up the Rift Valley to Ethiopia, Skoglund et al. show that before the Bantu Expansion there was a north-south genetic relatedness cline. When this result was presented at SMBE a few friends were quite excited that they were being presented a cline, as some researchers have felt that this particular lab group has a tendency to model everything as pulse admixtures between distinct groups. But the reasonably deep time transect in Malawi exhibited no variance in admixture fractions, which is indicative of the likelihood that its “mixed” status at a particular K cluster is simply an artifact (see this post for what’s going on).

One particular aspect of the results from Malawi is that they found no continuity between contemporary populations, Bantu agriculturalists, and these ancient hunter-gatherers. That is, hunter-gatherers were replaced in toto. This is not entirely surprising, as many researchers who have worked with European ancient DNA believe that hunter-gatherers in many areas left no descendants at all (the “hunter-gatherer” fractions in modern groups in a particular region are believed to be due to migration of mixed populations who obtained “hunter-gatherer” ancestry at another locale).

But the Bantus were not the first “intrusive” population

These results also have some moderate surprises. A Tanzanian sample from 1100 BC from a pastoralist context exhibits an ancestral mix which is Sub-Saharan African and West Eurasian/North African. More precisely, about 38 percent of this individual’s ancestry resembles that of the Pre-Pottery Neolithic culture of the Levant, and the rest of the genome most resembles a 4500 year old sample from Ethiopia.

This date is before the initiation of the Bantu Expansion. The genetic results in this work, and earlier publications, strongly point to the likelihood that this population(s) mediated the spread of pastoralism to the south and west. In particular, all Khoisan groups of southern Africa seem to have admixture from this group, more (Khoi) or less (San).

But a curious aspect of this result is that these early pastoralists do not carry any evidence of admixture from ancient eastern farmers from the Zagros region. That is, the West Eurasian gene flow into the Tanzanian pastoralists predates the great exchange/admixture in the Middle East between western and eastern lineages. Since that reciprocal gene flow seems to have occurred at least 2,000 years before the Tanzanian pastoralist’s time, it suggests that this West Eurasian element was in Africa for thousands of years.

The second important point to emphasize is that the Iranian-like component is found among Cushitic speaking Somali and Afar samples, at 15-20% clips. Looking at the supporting tables a wide range of East African populations have the Tanzanian pastoralist ancestry but do not show evidence of the Iranian-like ancestry, which is now ubiquitous in the Middle East, and presumably in the highlands of Ethiopia as well (which usually show somewhat higher levels of Eurasian ancestry than is the case on the coast, especially among Semitic language speakers).

This fact is important because many of the Nilotic peoples are reputed to have absorbed Cushitic groups relatively recently in the past. This is also true for Bantu speaking groups according to these and other data. Finally, the Sandawe, who speak a language with clicks, and so may have some affinity to Khoisan, are often stated to have Cushitic affinities (looking at the data they clearly have West Eurasian ancestry). But their Eurasian ancestry seems to lack the Iranian-like component as well.

None of the populations with putative Cushitic ancestry, but who lack Iranian-like ancestry, speak a Cushitic language (most speak Nilotic languages, but East African Bantus have mixed with these Nilotic groups, so they have the same ancestry). Therefore I wonder if these pastoralists spoke an Afro-Asiatic language in the first place.

A patchy landscape

The phylogenetic tree illustrates the relationships of various African populations without much recent Eurasian ancestry. In The New York Times article David Reich indicates that the Hadza people of Tanzania are the closest Sub-Saharan Africans to the lineage ancestral to non-Africans. This is actually a simplification of what you see in the paper, and is illustrated in the tree to the left. The 4500 year old Ethiopian sample, which does not have Eurasian ancestry, nevertheless is the closest of all Sub-Saharan groups to Eurasians. The Hadza have the highest fraction of this ancestral component of all Sub-Saharan Africans in their data set, but many other populations also carry this ancestry (the Tanzanian pastoralist combined the PPN ancestry with this element).

This was a patchy landscape of inhabitation, because though the Tanzanian pastoralist ancestry, a combination of PPN and proto-Ethiopian, spread all the way to the Cape, there were populations, such as the Hadza and a 400 year old individual sampled from the Kenya island of Pemba, which lacked this genetic variation. Indeed, they are also not on the north-south (proto-Ethiopian to Khoisan) cline that featured so prominently above.

The sampling of ancient individuals is not very dense yet, so we can’t say much. But I think it does indicate we need to be cautious about assuming that gene flow dynamics work as-the-crow-flies, simply as a function of distance. Ecological suitability no doubt plays a strong role in how populations expand. The Bantus, for example, were stopped in South Africa by the fact that their agricultural toolkit was not suitable for the western half of the country. So when Europeans arrived in the 16th century the residents of the Cape were Khoi pastoralists.

The presence of the Hadza in Tanzania, or an individual of unmixed proto-Ethiopian ancestry on Pemba 400 years ago, indicates that the ethnic geography of East Africa has long been fluid and dynamic. There is no reason to suppose that the Hadza are not themselves migrants from further north, perhaps easily explaining why they are not on the north-south cline so evident from the ancient DNA.

The rise of Basal Humans

Several years ago researchers discovered that the first farmers of Europe, who descended from an Anatolian population, were in part derived from a group which split off very early from other Eurasian populations. This group was termed “Basal Eurasian” (BEu) because it was an outgroup to all other Eurasians, including European hunter-gatherers, East Asians, Oceanians, and the natives of the New World. Subsequent work has shown that the early Neolithic farmers of the Near East, whether they’re from the Levant or the Zagros, had about half their ancestry from this population.

No ancient genomes which are predominantly BEu have been discovered yet. The fact that populations on the cusp of the Holocene seem to have Basal Eurasian ancestry across the Middle East suggests that the admixture with hunter-gatherers related to those of Europe must have occurred during the Pleistocene. But Basal Eurasian is arguably the most parsimonious explanation of the shared drift patterns that we see.

Skoglund et al. suggest that there may be the necessity of a similar construct in Africa. They are not the first: Schlebusch et al. also suggested the necessity of this lineage in the supplements of their preprint on ancient South Africans. Within Skoglund et al. the authors see variation between the far West African Mende and the eastern West African Yoruba, where the latter exhibits closer affinity to East African populations than the former (this includes those such as the proto-Ethiopian with no Eurasian admixture). Additionally, the authors found that Khoisan groups share more alleles with populations in East Africa than they do with those in West Africa even when you account for admixture.

One model that can explain this variation is long range gene flow, so that there would be connections between various regions as a function of their distance. Another explanation is that West African populations are the product of a Basal Human (BHu) population which separated first, before the bifurcation of Khoisan from other human populations. This would reorder our understanding of who the most basal humans are. Additionally, it would align with long-standing work of deep lineages within Africa contributing a minor component of the continent’s ancestry.

As should be clear due to the tree above, BHu postdates the separation of African humans from Neanderthals. One does wonder about the relevance of the Moroccan “modern” human to these models.

Understanding culture from genetics and genetics from culture

The spread of the Bantus over 1500 years from one end of the continent to the other is perhaps one of the most important dynamics we can use to understand the spread of farming more generally. The linguistic unity of the Bantus, or at least their affinity, suggests to us that the first farmers of Europe, who spread across much of the continent in 2500 years, probably exhibited the same pattern. The low levels of gene flow between hunter-gatherers and farmers, despite living in the same regions for thousands of years, can be illustrated with African examples (e.g., the Hadza vs. their Bantu neighbors).

We are rather in the early phase of understanding these dynamics. There are more remains to be found, perhaps in the dry fastness of the Sahara or Sahel? (though unfortunately political considerations may prevent excavation due to danger to archaeologists) The genetics will give us a general idea about the nature of genetic variation and how it arose, but robust cultural models also need to be developed which illustrate how these genetic patterns arose.

Citation: Reconstructing Prehistoric African Population Structure, Skoglund, Pontus et al. Cell, Volume 171, Issue 1, 59–71.e21

## September 18, 2017

### Population structure in Neanderthals leads to genetic homogeneity

Filed under: Human Genetics,Paleoanthropology — Razib Khan @ 11:26 pm

The above tweet is in response to an article which reports on a finding published this past month in PNAS, Early history of Neanderthals and Denisovans. It’s open access, you should read it. I don’t think I’ve reviewed it because I haven’t dug through the supplements. To be frank this is a paper where you pretty much have to read the supplements because they’re introducing a somewhat different model here than is the norm.

I talked to Alan Rogers at SMBE about this paper. Broadly, I think there might be something to it, and it’s because of what David says above. It is simply hard to imagine that Neanderthals could be extremely successful with such low genetic diversity as we see, and spread so thin. Now, the Quanta Magazine piece tries to emphasize that the effective population is not the true census population, but I wish it would have explained it more clearly. Basically, the size that is relevant for breeding is obviously not going to be the same as a head count. And, because effective populations are highly sensitive to bottlenecks, you can get really small numbers even when the extant population at any given time may be large.
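To make the bottleneck point concrete, here is a minimal sketch. It leans on the standard textbook approximation that long-term effective size is the harmonic mean of the per-generation breeding sizes. The population history below is invented purely for illustration and has nothing to do with the actual Neanderthal estimates.

```python
def long_term_effective_size(sizes_per_generation):
    """Harmonic mean of per-generation breeding sizes (textbook approximation)."""
    return len(sizes_per_generation) / sum(1.0 / n for n in sizes_per_generation)

# HYPOTHETICAL history: 99 generations at 10,000 breeders, one crash to 100.
sizes = [10_000] * 99 + [100]

arithmetic_mean = sum(sizes) / len(sizes)
effective = long_term_effective_size(sizes)

print(round(arithmetic_mean), round(effective))  # 9901 5025
```

A single bad generation out of a hundred cuts the effective size roughly in half, even though the average head count barely moves.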

The PNAS paper makes some novel inferences, and I’ll set that to the side until I read the supplements. But I don’t think it’s crazy that population structure within Neanderthals could be leading to lower total genetic diversity.

### Release the UK Biobank! (the prediction of height edition)

Filed under: Genomic prediction,Human Genetics,Human Genomics,UK Biobank — Razib Khan @ 9:25 pm

There’s so much science coming out of the UK Biobank it’s not even funny. It’s like getting the palantír or something.

Anyway, a preprint, submitted for your approval. A vision of things to come? Accurate Genomic Prediction Of Human Height:

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ~20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
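The two headline numbers for height in the abstract are really the same statement: a correlation of ~0.65 squared is about 40 percent of variance. A quick sketch of that arithmetic follows. The 7 cm standard deviation is my own illustrative assumption, not a figure from the preprint.

```python
import math

r = 0.65                     # correlation between predicted and actual height (from the abstract)
variance_explained = r ** 2  # fraction of variance captured by the predictor

# ASSUMPTION: a within-sex height standard deviation of 7 cm (illustrative only).
height_sd_cm = 7.0
residual_sd_cm = height_sd_cm * math.sqrt(1 - variance_explained)

print(f"variance explained: {variance_explained:.0%}")
print(f"residual SD: {residual_sd_cm:.1f} cm")
```

Under that assumed spread, the leftover error is on the order of 5 cm, which is at least in the same ballpark as "within a few cm of the prediction."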

A scatter-plot is worth a thousand derivations.

You know what’s better than 500,000 samples? One billion samples! A nerd can dream….

## September 17, 2017

### Massive genomic sample sizes = detecting evolution in real time

Filed under: Human Genetics — Razib Khan @ 7:22 pm

The recent PLOS BIOLOGY paper, Identifying genetic variants that affect viability in large cohorts, seems to have triggered a feeding frenzy in the media. For example, Big Think has put up Researchers Find Evidence That Human Evolution Is Still Actively Happening.

I wasn’t paying close attention because of course human evolution is still happening actively. From a genetic perspective, evolution is just change in allele frequencies. Populations aren’t infinite, so even if there wasn’t any selection stochastic forces would shift allele frequencies. But of course selection is probably happening. For adaptation by natural selection to occur you need heritable variation on a trait where there are fitness differences as a function of variation within the population. It seems implausible that these conditions don’t still apply. There’s plenty of fitness variation in the population, and it’s unlikely to be random as a function of heritable variation.
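The point about finite populations is easy to see in a toy simulation. Below is a minimal Wright-Fisher sketch with no selection at all: each generation simply resamples allele copies from the previous one. The function name, population size, and seed are all hypothetical choices for illustration.

```python
import random

def wright_fisher(p0, n_diploid, generations, seed=0):
    """Neutral drift: resample 2N allele copies each generation, no selection."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each copy in the next generation is drawn from the current frequency.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

# HYPOTHETICAL parameters: 50 diploid individuals, 100 generations.
traj = wright_fisher(p0=0.5, n_diploid=50, generations=100)
print(f"start={traj[0]:.2f} end={traj[-1]:.2f}")
```

Even with every individual equally fit, the frequency wanders away from its starting value, and in small populations it eventually hits 0 or 1.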

But the devil is in the details. And last year Field et al. used the modern genomic tools available to detect selection occurring over the past 2,000 years. It is not credible that it would have magically stopped a few centuries ago.

So why is this new paper such a big deal? (note that it’s in PLOS BIOLOGY, not PLOS GENETICS) Because the method they use is ingenious and simple. Basically, they’re looking at changes in allele frequencies as a function of age in huge populations. It’s a little more complicated than that, they used a logistic regression to control for some of the other variables. But they found some biologically plausible hits with their data set of 50,000-150,000. And, they replicated their hits from a European sample to a non-European one.
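A crude version of the idea can be sketched in a few lines. This is not the authors' logistic regression, just a binned comparison of allele frequency by age, and the tiny data set is made up. If carriers of a variant tend to die earlier, its frequency should decline in the older bins.

```python
from collections import defaultdict

def allele_freq_by_age(samples, bin_width=10):
    """samples: iterable of (age, alt_allele_copies) with copies in {0, 1, 2}.
    Returns {age_bin_start: alt allele frequency}."""
    totals = defaultdict(lambda: [0, 0])  # bin start -> [alt copies, total copies]
    for age, alt_copies in samples:
        start = (age // bin_width) * bin_width
        totals[start][0] += alt_copies
        totals[start][1] += 2  # diploid: two allele copies per person
    return {start: alt / n for start, (alt, n) in sorted(totals.items())}

# HYPOTHETICAL toy cohort: (age, copies of the variant).
cohort = [(45, 1), (48, 2), (52, 1), (71, 0), (75, 1), (78, 0)]
freqs = allele_freq_by_age(cohort)
print(freqs)
```

With real cohorts you would need the covariate controls the paper uses, since ancestry and birth-cohort effects can also shift frequencies across age groups.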

This does bring me back to a discussion I observed a while back. An evolutionary geneticist who works with Drosophila mentioned offhand that in his field there really wasn’t that much of a need for more data. They could spend all their time doing analysis. A prominent human geneticist whose work focused on biomedicine piped up that that wasn’t true at all for their field. There are some differences in the scientific questions, but there are also differences in terms of what you can do with humans as a model organism.

In the paper they look forward to the day of increasing sample sizes an order of magnitude beyond where it is now. At some point in the near future, large fractions of entire nations will be sequenced at medical grade level (30x coverage).

Anyway, you should read Identifying genetic variants that affect viability in large cohorts. It’s pretty straightforward.

## September 14, 2017

### After agriculture, before bronze

The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s about ~10% of the variation).
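For a feel of what those numbers mean, here is a minimal sketch of Wright's Fst at a single biallelic marker for two equally weighted populations, computed as the shortfall of within-group heterozygosity relative to pooled heterozygosity. The allele frequencies are hypothetical, and real estimators (which average over many markers and correct for sample size) are more involved.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-group heterozygosity
    return (h_total - h_within) / h_total if h_total > 0 else 0.0

# HYPOTHETICAL allele frequencies in two populations.
print(round(fst_two_pops(0.50, 0.50), 3))  # 0.0
print(round(fst_two_pops(0.34, 0.66), 3))  # 0.102
print(round(fst_two_pops(0.00, 1.00), 3))  # 1.0
```

So a continental-scale value of ~0.10 corresponds, at a typical marker, to frequency differences of about 30 percentage points, not to fixed differences.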

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but rather peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.

## September 11, 2017

### Inbreeding causing issues in Osama bin Laden’s family

Filed under: Human Genetics,Osama bin Laden — Razib Khan @ 5:07 pm

I didn’t figure I would have to say much about 9/11 really that others could not say (aside from perhaps you should read Marc Sageman’s Understanding Terror Networks if you want an ethnography of the Salafi jihadist movement which led to al-Qaeda). But The Daily Beast has a profile of one of Osama bin Laden’s sons:

Moreover, by this time, bin Laden already had two wives. But Najwa, the first of them, encouraged him to pursue Khairia, believing that having someone with her training permanently on hand would help her son Saad and his brothers and sisters, some of whom also suffered from developmental disorders.

Osama bin Laden had approximately two dozen children. But it was strange to me to see mention of several children with developmental disorders. Inbreeding is a major burden for Arab Muslim societies. And sure enough, Osama bin Laden’s first wife was his first cousin. She gave birth to around 10 children. Her father was Osama bin Laden’s mother’s brother. With the possibility of several generations of cousin marriage their relatedness may have been closer than that of normal half-siblings.
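As a back-of-the-envelope check on that comparison, the standard path-counting rule gives the coefficient of relationship as a sum of (1/2) raised to the number of parent-child links through each shared ancestor. The sketch below assumes the shared ancestors are not themselves inbred, and the pedigrees are textbook cases rather than a reconstruction of any actual family.

```python
def coefficient_of_relationship(paths):
    """paths: list of (links_up_from_person_a, links_up_from_person_b),
    one entry per shared ancestor. Assumes non-inbred shared ancestors."""
    return sum(0.5 ** (a + b) for a, b in paths)

half_siblings = coefficient_of_relationship([(1, 1)])               # one shared parent
first_cousins = coefficient_of_relationship([(2, 2), (2, 2)])       # two shared grandparents
double_first_cousins = coefficient_of_relationship([(2, 2)] * 4)    # four shared grandparents

print(half_siblings, first_cousins, double_first_cousins)  # 0.25 0.125 0.25
```

Ordinary first cousins sit at 1/8, half of what half-siblings share. Each additional loop of shared ancestry from earlier cousin marriages adds more paths, which is how a nominal "first cousin" pair can climb toward or past 1/4.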

Note: Osama bin Laden’s father was from Yemen and his mother from Syria. So he was most certainly not inbred.

## August 29, 2017

### Why do percentage estimates of “ancestry” vary so much?

Filed under: Genetics,Human Genetics — Razib Khan @ 10:36 pm

When looking at the results in Ancestry DNA, 23andMe, and Family Tree DNA my “East Asian” percentage is:

– 19%
– 13%
– 6%

What’s going on here? In science we often make a distinction between precision and accuracy. Precision is how much your results vary when you re-run an experiment or measurement. Basically, can you reproduce your result? Accuracy refers to how close your measurement is to the true value. A measurement can be quite precise, but consistently off. Similarly, a measurement may be imprecise, but it bounces around the true value…so it is reasonably accurate if you get enough measurements to cancel out the errors (which are random).
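The distinction is easy to show with made-up numbers. In this minimal sketch, bias (mean minus truth) stands in for accuracy and the standard deviation of repeated measurements stands in for precision. The "true value" and both measurement series are hypothetical and unrelated to my own results.

```python
from statistics import mean, stdev

def summarize(measurements, true_value):
    bias = mean(measurements) - true_value  # accuracy: how far off on average
    spread = stdev(measurements)            # precision: how repeatable
    return bias, spread

TRUE_VALUE = 12.0  # HYPOTHETICAL ground truth

precise_but_off = [19.1, 18.9, 19.0, 19.2, 18.8]
noisy_but_centered = [8.0, 16.0, 10.0, 14.0, 12.0]

bias_a, spread_a = summarize(precise_but_off, TRUE_VALUE)
bias_b, spread_b = summarize(noisy_but_centered, TRUE_VALUE)
print(f"precise but off:    bias={bias_a:+.2f} spread={spread_a:.2f}")
print(f"noisy but centered: bias={bias_b:+.2f} spread={spread_b:.2f}")
```

The first series repeats beautifully and is wrong by 7; the second is all over the place but averages out to the truth.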

Each of the three percentages above is precise. That is, if you got re-tested on a different chip, the results aren’t going to be much different. The tests are using as input variation on 100,000 to 1 million markers, so a small proportion will give different calls than in the earlier test. But that’s not going to change the end result in most instances, even though these methods often have a stochastic element.

But what about accuracy? I am not sure that old chestnuts about accuracy apply in this case, because the percentages that these services provide are summaries and distillations of the underlying variation. The model of precision and accuracy that I learned would be more applicable to the DNA SNP array which returns calls on the variants; that is, how close are the calls of the variant to the true value (last I checked these arrays are around 99.5% accurate in terms of matching the true state).

What you see when these services pop out a percentage for a given ancestry is the outcome of a series of conscious choices that designers of these tests made keeping in mind what they wanted to get out of these tests. At a high level here’s what’s going on:

1. You have a model of human population history and dynamics with various parameters
2. You have data that varies that you put into that model
3. You have results which come back with values which are the best fit of that data to the model you specified

Basically you are asking the computational framework a question, and it is returning its best answer to the question posed. To ask whether the answer is accurate or not is almost not even wrong. The frameworks vary because they are constructed by humans with different preferences and goals.

Almost, but not totally wrong. You can for example simulate populations whose histories you know, and then test the models on the data you generated. Since you already know the “truth” about the simulated data’s population structure and history, you can see how well your framework can infer what you already know from the patterns of variation in the generated data.
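Here is a minimal sketch of that simulate-then-infer check. It assumes a deliberately simple two-population model: allele frequencies are drawn independently for each source (real populations are far more correlated), every allele in the individual is drawn from the mixed frequency, and the mixing fraction is recovered by a grid-search maximum likelihood. All names and parameter values are hypothetical, and this is not the method any particular company uses.

```python
import math
import random

def simulate_individual(alpha, freqs_a, freqs_b, rng):
    """Diploid genotypes (0, 1 or 2 copies) for a person with fraction alpha from A."""
    genotypes = []
    for pa, pb in zip(freqs_a, freqs_b):
        p = alpha * pa + (1.0 - alpha) * pb
        genotypes.append(int(rng.random() < p) + int(rng.random() < p))
    return genotypes

def log_likelihood(alpha, genotypes, freqs_a, freqs_b):
    """Binomial log-likelihood of the genotypes (constant terms dropped)."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        p = alpha * pa + (1.0 - alpha) * pb
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

def estimate_alpha(genotypes, freqs_a, freqs_b, steps=100):
    """Grid search over alpha in [0, 1] for the best-fitting mixing fraction."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, genotypes, freqs_a, freqs_b))

rng = random.Random(42)
n_markers = 3000
# Frequencies kept away from 0 and 1 so the logarithms are always defined.
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]

true_alpha = 0.15  # the "truth" we built into the simulation
person = simulate_individual(true_alpha, freqs_a, freqs_b, rng)
estimated_alpha = estimate_alpha(person, freqs_a, freqs_b)
print("true:", true_alpha, "estimated:", estimated_alpha)
```

Because the truth is known by construction, the gap between `true_alpha` and `estimated_alpha` is a direct measure of how well the framework performs under its own assumptions.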

Going back to my results, why do my East Asian percentages vary so much? The short answer is that one of the major variables in the model alluded to above is the nature of the reference population set and the labels you give them.

Looking at Bengalis, the ethnic group I’m from, it is clear that in comparison to other South Asian populations they are East Asian shifted. That is, it seems clear I do have some East Asian ancestry. But how much?

The “simple” answer is to model my ancestry as a mix of two populations, an Indian one and an East Asian one, and then see what the values are for my ancestry across the two components. But here is where semantics becomes important: what is Indian and East Asian? Remember, these are just labels we give to groups of people who share genetic affinities. The labels aren’t “real”, the reality is in the raw read of the sequence. But humans are not capable of really getting anything from millions of raw SNPs assigned to individuals. We have to summarize and re-digest the data.

The simplest explanation for what’s going on here is that the different companies have different populations put into the boxes which are “Indian/South Asian” and “East Asian.” If you are using fundamentally different measuring sticks, then there are going to be problems with doing apples to apples comparisons.

My personal experience is that 23andMe tends to give very high percentages of South Asian ancestry for all South Asians. Because “South Asian” is a very diverse category when tests come back that someone is 95-99% South Asian…it’s not really telling you much. In contrast, some of the other services may be using a small subset of South Asians, who they define as “more typical”, and so giving lower percentages to people from Pakistan and Bengal, who have admixture from neighboring regions to the west and east respectively.*

Something similar can occur with East Asian ancestry. If the “donor” ancestral groups are South Asian and East Asian for me, then the proportions of each are going to vary by how close the donor groups selected by the company are to the true ancestral group. If, for example, Family Tree DNA chose a more Northeastern Asian population than Ancestry DNA, then my East Asian percentage would vary between the two services because I know my East Asian ancestry is more Southeast Asian.
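The effect of the reference choice can be sketched with a toy calculation. Everything here is made up for illustration: six markers, invented allele frequencies, and a simple least-squares estimator (`admixture_fraction`) standing in for whatever the companies actually run. The individual is constructed as exactly 15% from the "true" donor.

```python
def admixture_fraction(individual, ref_base, ref_donor):
    """Least-squares fraction of ref_donor in a two-way mix with ref_base."""
    num = sum((x - b) * (d - b) for x, b, d in zip(individual, ref_base, ref_donor))
    den = sum((d - b) ** 2 for b, d in zip(ref_base, ref_donor))
    return min(1.0, max(0.0, num / den))

# Hypothetical allele frequencies at six markers.
south_asian_ref = [0.10, 0.80, 0.30, 0.60, 0.50, 0.20]
true_donor_ref = [0.60, 0.30, 0.70, 0.20, 0.50, 0.60]   # stands in for a Southeast Asian donor
other_donor_ref = [0.70, 0.20, 0.60, 0.40, 0.90, 0.30]  # stands in for a Northeast Asian reference

# Expected allele frequencies for someone who is 15% true donor, 85% South Asian.
true_fraction = 0.15
individual = [true_fraction * d + (1.0 - true_fraction) * b
              for b, d in zip(south_asian_ref, true_donor_ref)]

with_true_donor = admixture_fraction(individual, south_asian_ref, true_donor_ref)
with_other_donor = admixture_fraction(individual, south_asian_ref, other_donor_ref)
print("using the true donor as reference:", round(with_true_donor, 3))
print("using a different donor as reference:", round(with_other_donor, 3))
```

With the true donor as the reference the estimate comes back at 15%; swapping in a reference that is related to, but not the same as, the true donor pulls the estimate down to roughly 12% for the very same individual.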

The moral of the story is that the values you obtain are conditional on the choices you make, and those choices emerge from the process of reducing and distilling the raw genetic variation into a manner which is human interpretable. If the companies decided to use the same model, they would come out with the same results.

* I helped develop an earlier version of MyOrigins, and so can attest to this firsthand.

## July 25, 2017

### Ancient Europeans: isolated, always on the edge of extinction

Filed under: Europe,Human Genetics,Scandinavia — Razib Khan @ 12:19 am

A few years ago I suggested to the paleoanthropologist Chris Stringer that the first modern humans who arrived in Europe did not contribute appreciable ancestry to modern populations in the continent (appreciable as in 1% or more of the genome).* It seems I may have been right according to results from a 2016 paper, The genetic history of Ice Age Europe. The very oldest European ancient genome samples “failed to contribute appreciably to the current European gene pool.”

Why did I make this claim? Two reasons:

1) 40,000 years is a long time, and there was already substantial evidence of major population turnovers across northern Eurasia by this point. You go far enough into the future and it’s not likely that a local population leaves any descendants. So just work that logic backward.

2) There was already evidence of low population sizes and high isolation levels between groups in Pleistocene and Mesolithic/Neolithic Europe. This would again argue in favor of a high likelihood of local extinctions given enough time.

This does not apply only to modern humans, descendants of southern, likely African, populations. Neanderthals themselves show evidence of high homogeneity, and expansions through bottlenecks over the ~600,000 years of their flourishing.

The reason that these dynamics characterized modern humans and earlier hominins in northern Eurasia is what ecologists would term an abiotic factor: the Ice Age. Obviously humans could make a go of it on the margins of the tundra (the Neanderthals seem less adept at penetrating the very coldest of terrain in comparison to their modern human successors; they likely frequented the wooded fringes, see The Humans Who Went Extinct). We have the evidence of several million years of continuous habitation by our lineage. But many of the ancient genomes from these areas, whether they be Denisovan, Neanderthal, or Mesolithic European hunter-gatherer, show indications of being characterized by very low effective population sizes. Things only change with the arrival of farming and agro-pastoralism.

For two obvious reasons we happen to have many ancient European genomes. First, many of the researchers are located in Europe, and the continent has a well developed archaeological profession which can provide well preserved samples with provenance and dates. And second, Europe is cool enough that degradation rates are going to be lower than if the climate was warmer. But if Europe, as part of northern Eurasia, is subject to peculiar exceptional demographic dynamics we need to be cautious about generalizing in terms of the inferences we make about human population genetic history. Remember that ancient Middle Eastern farmers already show evidence of having notably larger effective population sizes than European hunter-gatherers.

Two new preprints confirm the long term population dynamics typical of European hunter-gatherers, Assessing the relationship of ancient and modern populations and Genomics of Mesolithic Scandinavia reveal colonization routes and high-latitude adaptation. The first preprint is rather methods heavy, and seems more of a pathfinder toward new ways to extract more analytic juice from ancient DNA results. Those who have worked with population genomic data are probably not surprised at the emphasis on collecting numbers of individuals as opposed to single genome quality. That is, for the questions population geneticists are interested in “two samples sequenced to 0.5x coverage provide better resolution than a single sample sequenced to 2x coverage.”

I encourage readers (and “peer reviewers”) to dig into the appendix of Assessing the relationship of ancient and modern populations. I won’t pretend I have (yet). Rather, I want to highlight an interesting empirical finding when the method was applied to extant ancient genomic samples: “we found that no ancient samples represent direct ancestors of modern Europeans.”

This is not surprising. The ‘hunter-gatherer’ resurgence of the Middle Neolithic notwithstanding, Northern Europe was subject to two major population replacements, while Southern Europe was subject to one, but of a substantial nature. Recall that the Bell Beaker paper found that “spread of the Beaker Complex to Britain was mediated by migration from the continent that replaced >90% of Britain’s Neolithic gene pool within a few hundred years.” This means that less than 10% of modern Britons’ ancestry is a combination of hunter-gatherers and Neolithic farmers.

And yet if you look at various forms of model-based admixture analyses it seems as if modern Europeans have substantial dollops of hunter-gatherer ancestry (and hunter-gatherer U5 mtDNA and Y chromosomal lineages I1 and I2, associated with Pleistocene Europeans, are found at ~10% frequency in modern Europe in the aggregate; though I suspect this is a floor). What gives? Let’s look at the second preprint, which is more focused on new empirical results from ancient Scandinavian genomes, Genomics of Mesolithic Scandinavia reveal colonization routes and high-latitude adaptation. From early on in the preprint:

Based on SF12’s high-coverage and high-quality genome, we estimate the number of single nucleotide polymorphisms (SNPs) hitherto unknown (that are not recorded in dbSNP (v142)) to be c. 10,600. This is almost twice the number of unique variants (c. 6,000) per Finnish individual (Supplementary Information 3) and close to the median per European individual in the 1000 Genomes Project (23) (c. 11,400, Supplementary Information 3). At least 17% of these SNPs that are not found in modern-day individuals, were in fact common among the Mesolithic Scandinavians (seen in the low coverage data conditional on the observation in SF12), suggesting that a substantial fraction of human variation has been lost in the past 9,000 years (Supplementary Information 3). In other words, the SHGs (as well as WHGs and EHGs) have no direct descendants, or a population that show direct continuity with the Mesolithic populations (Supplementary Information 6) (13–17). Thus, many genetic variants found in Mesolithic individuals have not been carried over to modern-day groups.

The gist of the paper in terms of archaeology and demographic history is that Scandinavian hunter-gatherers were a compound population. One component of their ancestry is what we term “Western hunter-gatherers” (WHG), who descended from the late Pleistocene Villabruna cluster (see paper mentioned earlier). Samples from Belgium, Switzerland, and Spain all belong to this cluster. The second element is “Eastern hunter-gatherers” (EHG). These samples derive from the Karelia region, to the east of modern Finland, bound by the White Sea to the north. EHG populations exhibit affinities to both WHG as well as Siberian populations who contributed ancestry to Amerindians, the “Ancestral North Eurasians” (ANE). There is a question at this point whether EHG are the product of a pulse admixture between an ANE and WHG population, or whether there was a long existent ANE-WHG east-west cline which the EHG were situated upon. That is neither here nor there (the Tartu group has a paper addressing this leaning toward isolation-by-distance from what I recall).

Explicitly testing models against the genetic data, the authors conclude that there was a migration of EHG populations with a specific archaeological culture around the north fringe of Scandinavia, down the Norwegian coast. Conversely, a WHG population presumably migrated up from the south and somewhat to the east (from the Norwegian perspective).

And yet the distinctiveness of the very high quality genome, as inferred from its unique SNPs, suggests to the authors that very little of the ancestry of modern Scandinavians (and Finns to be sure) derives from these ancient populations. Very little does not mean none. There is a lot of functional analysis in the paper and supplements which I will not discuss in this post, and one aspect is that it seems some adaptive alleles for high latitudes might persist down to the present in Nordic populations as a gift from these ancient forebears. This is no surprise, not all regions of the genome are created equal (a more extreme case is the Denisovan derived high altitude adaptation haplotype in modern Tibetans).

Nevertheless, there was a great disruption. First, the arrival of farmers whose ultimate origins were Anatolia ~6,000 years ago to the southern third of Scandinavia introduced a new element which came in force (agriculture spread over the south in a few centuries). A bit over a thousand years later the Corded Ware people, who were likely Indo-European speakers, arrived. These Indo-European speakers brought with them a substantial proportion of ancestry related to the hunter-gatherers because they descended in major fraction from the EHG (and later accrued more European hunter-gatherer ancestry from both the early farmers and likely some residual hunter-gatherer populations who switched to agro-pastoralism**).

For several years I’ve had discussions with researchers whose daily bread & butter are the ancient DNA data sets of Europe. I’ve gotten some impressions implicitly, and also from things they’ve said directly. It strikes me that the Bantu expansion may not be a bad analogy in regards to the expansion of farming in Europe (and later agro-pastoralism). Though the expanding farmers initially mixed with hunter-gatherers on the frontier, once they got a head of steam they likely replaced small hunter-gatherer groups in totality, except in areas like Scandinavia and along the maritime fringe where ecological conditions were such that hunter-gatherers were at an advantage (War Before Civilization seems to describe a massive farmer vs. coastal forager war on the North Sea).

But this is not the end of the story for Norden. At SMBE I saw some ancient genome analysis from Finland on a poster. Combined with ancient genomic analysis from the Baltic, along with deeper analysis of modern Finnish mtDNA, it seems likely that the expansion of Finno-Samic languages occurred on the order of ~2,000 years ago, after the initial expansion of Corded Ware agro-pastoralists.

The Sami in particular seem to have followed the same path along the northern fringe of Scandinavia that the EHG blazed. Though they herd reindeer, they were also Europe’s last indigenous hunter-gatherers. Genetically they exhibit the same minority eastern affinities in their ancestry that the Finns do, though to a greater extent. But their mtDNA harbors some distinctive lineages, which might be evidence of absorption of ancient Scandinavian substrate.

I’ll leave it to someone else to explain how and why the Finns and Sami came to occupy the areas where they currently dominate (note that historically Sami were present much further south in Norway and Sweden than they are today). But note that in Latvia and Lithuania the N1c Y chromosomal lineage is very common, despite no language shift, indicating that there was a great deal of reciprocal mixing on the Baltic.

Overall the story is of both population and cultural turnover. This should not surprise when one considers that northern Eurasia is on the frontier of the human range. And perhaps it should temper the inferences we make about other areas of the world.

* You may notice that this threshold is lower than the Neanderthal admixture proportions in the non-African genome. Why is this old admixture still detectable while modern human lineages go extinct? Because it seems to have occurred when non-African humans had a very small effective population, and was mixed thoroughly. Because of the even genomic distribution this ancestry has not been lost in any of the daughter populations.

** Haplogroup I1, which descends from European late Pleistocene populations, exhibits a star phylogeny of similar time depth as R1b and R1a.

## July 17, 2017

### Castes are not just of mind

Filed under: Caste,Human Genetics,India — Razib Khan @ 8:31 pm

Before Nicholas Dirks was a controversial chancellor of UC Berkeley, he was a well regarded historian of South Asia. He wrote Castes of Mind: Colonialism and the Making of Modern India. I read it, along with other books on the topic in the middle 2000s.

Here is the Amazon summary from Library Journal:

Is India’s caste system the remnant of ancient India’s social practices or the result of the historical relationship between India and British colonial rule? Dirks (history and anthropology, Columbia Univ.) elects to support the latter view. Adhering to the school of Orientalist thought promulgated by Edward Said and Bernard Cohn, Dirks argues that British colonial control of India for 200 years pivoted on its manipulation of the caste system. He hypothesizes that caste was used to organize India’s diverse social groups for the benefit of British control. His thesis embraces substantial and powerfully argued evidence. It suffers, however, from its restricted focus to mainly southern India and its near polemic and obsessive assertions. Authors with differing views on India’s ethnology suffer near-peremptory dismissal. Nevertheless, this groundbreaking work of interpretation demands a careful scholarly reading and response.

The condensation is too reductive. Dirks does not assert that caste structures (and jati) date to the British period, but the thrust of the book clearly leaves the impression that this particular identity’s formative shape on the modern landscape derives from the colonial experience. The British did not invent caste, but the modern relevance seems to date to the British period.

This is in keeping with a mode of thought flourishing today under the rubric of postcolonialism, with roots back to Edward Said’s Orientalism. As a scholar of literature Said’s historical analysis suffered from the lack of deep knowledge. A cursory reading of Orientalism picks up all sorts of errors of fact. But compared to his heirs Said was actually a paragon of analytical rigor. I say this after reading some contemporary postcolonial works, and going back and re-reading Orientalism.

To not put too fine a point on it, postcolonialism is more about a rhetorical posture which aims to destroy what it perceives as Western hegemonic culture. In the process it transforms the modern West into the causal root of almost all social and cultural phenomena, especially those that are not egalitarian. Anyone with a casual grasp of world history can see this, which basically means very few can, since so few actually care about details of fact.

Castes of Mind is an interesting book, and a denser piece of scholarship than Orientalism. Its perspective is clear, and though it is not without qualification, many people read it to mean that caste was socially constructed by the British.

This seems false. It has become quite evident that even the classical varna categories seem to correlate with genome-wide patterns of relatedness. And the Indian jatis have been endogamous for on the order of two thousand years. From The New York Times, In South Asian Social Castes, a Living Lab for Genetic Disease:

The Vysya may have other medical predispositions that have yet to be characterized — as may hundreds of other subpopulations across South Asia, according to a study published in Nature Genetics on Monday. The researchers suspect that many such medical conditions are related to how these groups have stayed genetically separate while living side by side for thousands of years.

This is not really a new finding. It was clear in 2009’s Reconstructing Indian Population History. It’s more clear now in The promise of disease gene discovery in South Asia.

Unfortunately, though, the science is not well known in any depth by the general public. The ascendancy of social constructionism is such that a garbled and debased view that “caste was invented by the British” will continue to be the “smart” and fashionable view among many elites.

## July 10, 2017

### The great Bantu expansion was massive

Filed under: History,Human Genetics,Punt — Razib Khan @ 12:01 am

Lots of stuff at SMBE of interest to me. I went to the Evolution meeting last year, and it was a little thin on genetics for me. And I go to ASHG pretty much every year, but there’s a lot of medical stuff that is not to my taste. SMBE was really pretty much my style.

In any case one of the more interesting talks was given by Pontus Skoglund (soon of the Crick Institute). He had several novel African genomes to talk about, in particular from Malawi hunter-gatherers (I believe dated to 3,000 years before the present), and one from a pre-Bantu pastoralist.

At one point Skoglund presented a plot showing what looked like an isolation by distance dynamic between the ancient Ethiopian Mota genome and a modern day Khoisan sample, with the Malawi population about $\frac{2}{3}$ of the way toward the Khoisan from the Ethiopian sample. Some of my friends from a non-human genetics background were at the talk and were getting quite excited at this point, because there is a general feeling that the Reich lab emphasizes the stylized pulse admixture model a bit too much. Rather than expansion of proto-Ethiopian-like populations and proto-Khoisan-like populations they interpreted this as evidence of a continuum or cline across East Africa. I’m not sure if this is the right interpretation of the plot presented, but it’s a reasonable one.

Malawi is considerably to the north of modern Khoisan populations. This is not surprising. From what I have read Khoisan archaeological remains seem to be found as far north as Zimbabwe, while others have long suggested a presence as far afield as Kenya. Perhaps more curiously: the Malawi hunter-gatherers exhibit no evidence of having contributed genes to modern Bantu residents of Malawi.

Surprising, but not really. If you look at a PCA plot of Bantu genetic variation it really only starts showing evidence of local substrate (Khoisan) in South Africa. From Cameroon to Mozambique it looks like the Bantu simply overwhelmed local populations, they are clustered so tight. Though it is true that African populations harbor a lot of diversity, that diversity is not necessarily partitioned between the populations. The Bantu expansion is why.

Of more interest from the perspective of non-African history is the Tanzanian pastoralist. This individual is about 38% West Eurasian, and that ancestry has the strongest affinities with Levantine Neolithic farmers. Specifically, the Pre-Pottery Neolithic (PPN), which dates to between 8500-5500 BCE. More precisely, this individual was exclusively “western farmer” in the Lazaridis et al. formulation. Additionally, Skoglund also told me that the Cushitic (and presumably Semitic) peoples to the north and east had some “eastern farmer.” I immediately thought back to Hodgson et al., Early Back-to-Africa Migration into the Horn of Africa, which suggested multiple layers. Finally, Pagani et al. (2012) suggested that admixture in the Ethiopian plateau occurred on the order of ~3,000 years ago.

Bringing all of this together, it suggests to me two things:

1. The migration back from Eurasia occurred multiple times, with an early wave arriving well before the Copper/Bronze Age east-west and west-east gene flow in the Near East (also, there was backflow to West Africa, but that’s a different post….).
2. The migration was patchy; the Mota sample dates to 4,500 years ago, and lacks any Eurasian ancestry, despite the likelihood that the first Eurasian backflow was already occurring.

Skoglund will soon have the preprint out.

## July 9, 2017

### SLC24A5 is very important, but we don’t know why

The golden age of pigmentation genetics started in 2005 with SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Prior to that pigmentation genetics was really to a great extent coat color genetics, done in mice and other organisms which have a lot of pelage variation.

Of course there was work on humans, mostly related to melanocortin 1. But more interesting were classical pedigree studies which indicated that the number of loci controlling variation in pigmentation was not that high. Thus, it was a mildly polygenic trait insofar as some large effect quantitative trait loci could be discerned in the inheritance patterns.

From The Genetics of Human Populations, written in the 1960s, but still useful today because of its comprehensive survey of the classical period:

Depending on what study samples you use, variance on a locus of SLC24A5 explains less than 10% or more than 30% of the total variance. But it is probably the biggest effect locus on the whole in human populations when you pool them all together (obviously it explains little variance in Africans or eastern non-Africans since it is homozygous ancestral by and large in both groups).

One aspect of the derived SNP in this locus is that it seems to be under strong selection. In a European 1000 Genomes sample there are 1003 copies of the derived variant, and 3 of the ancestral. Curiously this allele was absent in Western European Mesolithic hunter-gatherers, though it was present in hunter-gatherers on the northern and eastern fringes of the continent. It was also present in Caucasian hunter-gatherers and farmers from the Middle East who migrated to Europe. It seems very likely that these sorts of high frequencies are due to selection in Europe.

The variant is also present at appreciable frequencies in many South Asian populations, and there seems to have been in situ selection there too, as well as in the Near East. In Ethiopia it also seems to be under selection.

It could be something due to radiation…but the Near East and South Asia are quite high intensity in that regard. As are the highlands of Ethiopia. About seven years ago I suggested that rather than UV radiation as such, the depigmentation that has occurred across the Holocene might be due to agriculture and changes in diet.

But a new result from southern Africa presented at the SMBE meeting this year suggests that this cannot be a comprehensive answer. Meng Lin in Brenna Henn’s lab uses a broad panel of KhoeSan populations to find that the derived allele on SLC24A5 reaches ~40% frequency. Being generous, a high estimate of the West Eurasian admixture fraction in these groups is probably around ~10%. Where did this allele come from? The results from Joe Pickrell a few years back are sufficient to explain: there was a movement of pastoralists with distant West Eurasian ancestry who brought cattle to southern Africa, and so resulted in the ethnogenesis of groups such as the Nama people (there is also Y chromosomal work by Henn on this).
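The mismatch between ~10% admixture and ~40% allele frequency is easy to quantify. The sketch below assumes, purely for illustration, that the incoming West Eurasian ancestry carried the derived allele at the frequency seen in the European 1000 Genomes sample quoted earlier, and that the allele was absent in KhoeSan before admixture. Both are my simplifying assumptions, not results from Lin’s talk.

```python
def expected_frequency(admixture_fraction, source_freq, recipient_freq):
    """Allele frequency expected from admixture alone (no selection, no drift)."""
    return admixture_fraction * source_freq + (1.0 - admixture_fraction) * recipient_freq

# Counts quoted for the European 1000 Genomes sample.
derived_copies = 1003
ancestral_copies = 3
european_freq = derived_copies / (derived_copies + ancestral_copies)

west_eurasian_fraction = 0.10  # the generous admixture estimate
khoesan_baseline = 0.0         # assumption: allele absent before admixture
observed = 0.40                # approximate frequency reported for KhoeSan

expected = expected_frequency(west_eurasian_fraction, european_freq, khoesan_baseline)
excess = observed / expected

print("European derived allele frequency:", round(european_freq, 3))
print("expected from admixture alone:", round(expected, 3))
print("observed / expected:", round(excess, 1))
```

Under these assumptions admixture alone predicts a frequency of about 10%, so the observed value is roughly four times higher than expected, which is the kind of excess one would attribute to selection after the allele arrived.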

Lin reports that the haplotype around SLC24A5 is the same one as in Western Eurasia. Iain Mathieson (who is now at Penn if anyone is looking for something to do in grad school or a post-doc) has told me that the haplotypes in the Motala Mesolithic hunter-gatherers and in the hunter-gatherers from the Caucasus are the same. It seems that this haplotype was widespread early in the Holocene. Curiously, the Motala hunter-gatherers also carry the East Asian haplotype around their derived EDAR variant.

I don’t know what to make of this. My intuition is that if a haplotype like this was so widespread ~10,000 years ago, recombination would have broken it apart into smaller pieces, so that haplotype structure would be easier to discern. As it is that doesn’t seem to be the case.
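To put a rough number on that intuition: along a single line of descent, the distance from a focal site to the nearest crossover on each side is approximately exponential, with a mean of 1/g Morgans after g generations. The sketch below assumes a uniform recombination rate and a 29-year generation time. Both are my assumptions, and `expected_span_cm` is a hypothetical helper.

```python
def expected_span_cm(years, generation_time=29.0):
    """Expected intact span (in cM) around a focal site on one lineage.

    Assumes crossovers fall uniformly along the chromosome, so the intact
    stretch on each side of the site has mean 1/g Morgans after g generations.
    """
    generations = years / generation_time
    one_side_morgans = 1.0 / generations
    return 2.0 * one_side_morgans * 100.0  # both sides, Morgans to centimorgans

for years in (1000, 5000, 10000):
    print(years, "years:", round(expected_span_cm(years), 2), "cM")
```

On these assumptions a lineage tracing back ~10,000 years should retain only a fraction of a centimorgan of the original haplotype around the site, which is why seeing the same haplotype across such distant groups is puzzling.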

And we also don’t know what’s going on with SLC24A5. Obviously it impacts skin color. It has been shown to do so in admixed populations. But it is hard to believe that that is the sole target of natural selection here.

## June 14, 2017

Filed under: Diet,FADS,Genetics,Human Genetics — Razib Khan @ 7:21 pm

Food is a big deal for humans. Without it we die. Unlike some animals (here’s looking at you pandas) we’re omnivorous. We eat fruit, nuts, greens, meat, fish, and even fungus. Some of us even eat things which give off signals of being dangerous or unpalatable, whether it be hot sauce or lutefisk.

This ability to eat a wide variety of items is a human talent. Those who have put their cats on vegetarian diets know this. After a million or so years of being hunters and gatherers with a presumably varied diet, for thousands and thousands of years most humans at any given time ate some form of grain based gruel. Though I am sympathetic to the argument that in terms of quality of life this was a detriment to median human well being, agriculture allowed our species to extract orders of magnitude more calories from a unit of land, though there were exceptions, such as in marine environments (more on this later).

Ergo, some scholars, most prominently Peter Bellwood, have argued that farming did not spread through cultural diffusion. Rather, farmers simply reproduced at much higher rates because of the efficiency of their lifestyle in comparison to that of hunter-gatherers. The latest research, using ancient DNA, broadly confirms this hypothesis. More precisely, it seems that cultural revolutions in the Holocene have shaped most of the genetic variation we see around us.

But genetic variation is not just a matter of genealogy. That is, the pattern of relationships, ancestor to descendant, and the extent of admixtures across lineages. Selection is also another parameter in evolutionary genetics. This can even have genome-wide impacts. It seems quite possible that current levels of Neanderthal ancestry are lower than might otherwise have been the case due to selection against functional variants derived from Neanderthals, which are less fit against a modern human genetic background.

The importance of selection has long been known and explored. Sickle-cell anemia only exists because of balancing selection. Ancient DNA has revealed that many of the salient traits we associate with a given population, e.g., lactose tolerance or blue eyes, have undergone massive changes in population wide frequency over the last 10,000 years. Some of this is due to population replacement or admixture. But some of it is due to selection after the demographic events. To give a concrete example, the frequency of variants associated with blue eyes in modern Europeans dropped rapidly with the expansion of farmers from the Near East ~10,000 years ago, but has gradually increased over time until it is the modal allele in much of Northern Europe. Lactase persistence in contrast is not an ancient characteristic which has had its ups and downs, but something new that evolved due to the cultural shock of the adoption of dairy consumption by humans as adults. The region around lactase is one of the strongest signals of natural selection in the European genome, and ancient DNA confirms that the ubiquity of the lactase persistent allele is a very recent phenomenon.
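How quickly can selection take a new variant like lactase persistence from rare to common? A minimal deterministic sketch, assuming simple genic selection in which the favored allele has relative fitness 1 + s, an infinite population, and no drift. The starting frequency and selection coefficients below are hypothetical values chosen only for illustration.

```python
def next_frequency(p, s):
    """One generation of genic selection favoring an allele with fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)

def generations_to_reach(target, p0, s, max_generations=100000):
    """Number of generations until the allele frequency first reaches target."""
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_frequency(p, s)
    return None  # target not reached within max_generations

start_frequency = 0.001  # hypothetical: a rare new variant
for s in (0.02, 0.05):   # hypothetical selection coefficients
    n = generations_to_reach(0.5, start_frequency, s)
    print("s =", s, "-> generations to reach 50%:", n)
```

Even a modest advantage of a few percent per generation is enough, in this idealized model, to carry a rare allele to majority frequency within a few hundred generations, which is the timescale of the Holocene.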

But obviously lactase is not going to be the only target of selection in the human genome. Not only can humans eat many different things, but we change our portfolio of proportions rather quickly. In A Farewell to Alms the economic historian Gregory Clark observed that English peasants ate very differently before and after the Black Death. As any ecologist knows, populations are resource constrained when they are near the carrying capacity, and in England during the High Medieval period there was massive population growth due to gains in productivity (e.g., the moldboard plough) as well as intensification of farming and utilization of all the marginal land.

After the Black Death (which came in waves repeatedly) there was a massive population decline across much of Europe. Because institutions and practices were optimized toward maintaining a much higher population, European peasants lived a much better lifestyle after the population crash because the pie was being cut into far fewer pieces. In other words, centuries of life on the margins just scraping by did not mean that English peasants couldn’t live large when the times allowed for it. We were somewhat pre-adapted.

Our ability to eat a variety of items, and the constant varying of the proportions and kind of elements which go into our diet, mean that sciences like nutrition are very difficult. And, it also means that attempts to construct simple stories of adaptation and functional patterns from regions of the genome implicated in diet often fail. But with better analytic technologies (whole genome sequencing, large sample sizes) and some elbow grease some scientists are starting to get a better understanding.

A group of researchers at Cornell has been taking a closer look at the FADS genes over the past few years (as well as others at CTEG). These are three nearby genes, FADS1, FADS2, and FADS3 (they probably underwent duplication). These genes are involved in the metabolization of fatty acids, and dietary regime turns out to have a major impact on variation around these loci.

The most recent paper out of the Cornell group, Dietary adaptation of FADS genes in Europe varied across time and geography:

Fatty acid desaturase (FADS) genes encode rate-limiting enzymes for the biosynthesis of omega-6 and omega-3 long-chain polyunsaturated fatty acids (LCPUFAs). This biosynthesis is essential for individuals subsisting on LCPUFA-poor diets (for example, plant-based). Positive selection on FADS genes has been reported in multiple populations, but its cause and pattern in Europeans remain unknown. Here we demonstrate, using ancient and modern DNA, that positive selection acted on the same FADS variants both before and after the advent of farming in Europe, but on opposite (that is, alternative) alleles. Recent selection in farmers also varied geographically, with the strongest signal in southern Europe. These varying selection patterns concur with anthropological evidence of varying diets, and with the association of farming-adaptive alleles with higher FADS1 expression and thus enhanced LCPUFA biosynthesis. Genome-wide association studies reveal that farming-adaptive alleles not only increase LCPUFAs, but also affect other lipid levels and protect against several inflammatory diseases.

The paper itself can be difficult to follow because they’re juggling many things in the air. First, they’re not just looking at variants (e.g., SNPs, indels, etc.), but also the haplotypes that the variants are embedded in. That is, the sequence of markers which define an association of variants which indicate descent from common genealogical ancestors. Because recombination can break apart associations one has to engage with care in historical reconstruction of the arc of selection due to a causal variant embedded in different haplotypes.
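To make the point about recombination concrete, here is a minimal sketch using the standard textbook result that the association between two loci (linkage disequilibrium, D) decays by a factor of (1 - r) each generation under random mating, where r is the recombination fraction between them. The function names and the numbers are my own illustrative assumptions, not anything taken from the paper.

```python
import math


def ld_after(d0, r, generations):
    """LD remaining after some generations of random mating.

    d0: starting linkage disequilibrium (D) between two loci
    r:  recombination fraction between the loci, per generation
    Standard result: D_t = D_0 * (1 - r) ** t
    """
    return d0 * (1.0 - r) ** generations


def generations_to_halve(r):
    """Generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - r)


# Illustrative (made-up) values: two markers with r = 0.01 between them.
print(ld_after(0.25, 0.01, 100))
print(generations_to_halve(0.01))
```

With a recombination fraction of 1% the association halves in roughly 70 generations, which is why a causal variant can end up sitting on several different haplotype backgrounds over the timescales at issue here.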

But the great thing about this paper is that in the case of Europe they can access ancient DNA. So they perform inferences utilizing whole genomes from many extant human populations, but also inspect change in allele frequency trajectories over time because of the density of the temporal transect. The figure to the left shows variants in both an empirical and modeling framework, and how they change in frequency over time.

In short, variants associated with higher LCPUFA synthesis actually decreased over time in Pleistocene Europe. This is similar to the dynamic you see in the Greenland Inuit. With the arrival of farmers the dynamic changes. Some of this is due to admixture/replacement, but some of it cannot be accounted for by admixture and replacement. In other words, there was selection for the variants which synthesize more LCPUFA.
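The logic of separating admixture from selection can be sketched simply: under admixture alone, the expected allele frequency in the mixed population is the ancestry-weighted average of the source frequencies, and anything beyond that needs another explanation. The code below is only an illustration of that reasoning, not the authors' actual method, and every number in it is hypothetical.

```python
def expected_freq_under_admixture(source_freqs, proportions):
    """Ancestry-weighted average allele frequency, assuming no selection or drift.

    source_freqs: allele frequency in each source population
    proportions:  ancestry fraction from each source (must sum to 1)
    """
    if len(source_freqs) != len(proportions):
        raise ValueError("need one ancestry proportion per source population")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(f * p for f, p in zip(source_freqs, proportions))


# Hypothetical numbers, for illustration only:
# hunter-gatherer frequency 0.10, farmer frequency 0.40,
# descendants with 30% hunter-gatherer and 70% farmer ancestry.
expected = expected_freq_under_admixture([0.10, 0.40], [0.30, 0.70])
observed = 0.60  # hypothetical later frequency
excess = observed - expected  # the part admixture alone does not explain
print(expected, excess)
```

An observed frequency well above the admixture expectation is the kind of residual that points toward selection, though in practice drift and sampling noise have to be modeled too.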

This is not just limited to Europe. The authors refer to other publications which show that the frequency of alleles associated with LCPUFA production is high in places like South Asia, notable for a cultural preference for plant-based diets, one reinforced by the reality that animal protein was in very short supply. In Europe they can look at ancient DNA because we have it, but the lesson here is probably general: alternative allelic variants are being whipsawed in frequency by protean shifts in human cultural modes of production.

In War Before Civilization Lawrence Keeley observed that after the arrival of agriculture in Northern Europe, the advance of farming halted rather abruptly for centuries in a broad zone to the northwest of the continent, facing the Atlantic and North Sea. Keeley then recounts evidence of organized conflict between the two populations across a “no man’s land.”

But why didn’t the farmers just roll over the old populations as they had elsewhere? Probably because they couldn’t. It is well known that marine regions can often support very high densities of humans engaged in a gathering lifestyle. Though not farmers, these peoples are often also not nomadic, and occupy areas at high density. The tribes of the Pacific Northwest, dependent upon salmon fisheries, are classic examples. Even today much of the Northern European maritime fringe relies on the sea. High density means they had enough numbers to resist the human wave of advance of farmers. At least for a time.

Just as cultural forms wane and wax, so do some of the underlying genetic variants. If you dig into the guts of this paper you see much of the variation dates to the out of Africa period. There were no great sweeps which expunged all variation (at least in general). Rather, just as our omnivorous tastes are protean and changeable, so the genetic variation changes over time and space in a difficult-to-reduce manner. The flux of lifestyle change is probably usually faster than biological evolution can respond, so variation-reducing optimization can never complete its work.

The modern age of the study of natural selection in the human genome began around when A Map of Recent Positive Selection In the Human Genome was published. And it continues with methods like SDS, which indicate that selection operates to this day. Not a great surprise, but solidifying our intuitions. In the supplements to the above paper the authors indicate that the focal alleles that they are interrogating exhibit coefficients of selection around 0.5%. This is rather appreciable. The fact that fixation has not occurred indicates in part that selection has reversed or halted, as they noted. But another aspect is that there are correlated responses; the FADS genes are implicated in many things, as the authors note in relation to inflammatory diseases. But I’m not sure that the selection effects of these are really large in any case. I bet there are more important things going on that we haven’t discovered or understood.
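To get a feel for what a selection coefficient of about 0.5% means, here is a back-of-the-envelope sketch. It assumes the simplest deterministic textbook model (genic selection, constant s, no drift, no dominance), which is my simplification and not the model fitted in the paper. The starting and ending frequencies are arbitrary illustrative values.

```python
import math


def next_freq(p, s):
    """One generation of genic selection: the favored allele has fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)


def generations_between(p_start, p_end, s):
    """Generations for the favored allele to move from p_start to p_end.

    Under this model the odds p / (1 - p) are multiplied by (1 + s)
    every generation, so the answer has a closed form.
    """
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


s = 0.005  # about 0.5%, the order of magnitude mentioned in the text
gens = generations_between(0.10, 0.90, s)  # arbitrary illustrative endpoints
print(round(gens))
```

Going from 10% to 90% takes on the order of 880 generations under these assumptions, which at a nominal 25 years per generation is over 20,000 years. Appreciable selection, then, but slow enough that a change in diet can easily arrive before the sweep is done.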

Obviously genome-wide analyses are going to continue for the foreseeable future. Ten years ago my late friend Mike McKweon predicted that at some point genomics was going to have to be complemented by detailed follow-up through bench-work. I’m not sure if we’re there yet, but there are only so many populations you can sequence, and only so much coverage you can add, before you stop obtaining more information. Some selection sweeps will be simple stories with simple insights. But I suspect many more like FADS will be more complex, with the threads of the broader explanatory tapestry assembled publication by publication over time.

Citation: Ye, K., Gao, F., Wang, D., Bar-Yosef, O. & Keinan, A. Dietary adaptation of FADS genes in Europe varied across time and geography. Nat. Ecol. Evol. 1, 0167 (2017).
