December 1, 2012

Africa’s hidden people hold the keys to the past

I mentioned this in passing in my post on ASHG 2012, but it seems useful to make explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. To a great extent the forthcoming paper offers a likely answer to the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture on the order of 2-3,000 years before the present.* If the admixture was so recent we should find many of the “first people,” no? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.

Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were of a quality that I could accept that these populations were not being well modeled by the variation which dominated their data set. Though Cushitic in language, the Sanye had far less of the West Eurasian element present among other Cushitic-speaking populations of the Horn of Africa. Neither were their African ancestral components quite like those of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).

Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe this is a population which seems to have been hunter-gatherers very recently, and to some extent still engages in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full-featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.

* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.

July 9, 2012

The privileges and burdens of a novel inversion

There are two papers in Nature Genetics on the 17q21.31 region, and on the variation of inversion haplotypes in worldwide populations. Here’s a part of the discussion from the first paper:

In conclusion, we propose that the ancestral H2′ haplotype arose in eastern or central Africa and spread to southern Africa before the emergence of anatomically modern humans…Approximately 2.3 million years ago, the inversion rearranged to what we now refer as the direct orientation haplotype (H1′). This haplotype spread throughout the Homo ancestral populations in the African continent, virtually replacing the H2′ haplotype and becoming the predominant haplotype. We note that both the Denisova and Neandertal sister groups are predicted to have H1′ haplotypes…These early haplotypes were much simpler in their duplication architecture, similar to the patterns seen in great apes. We find that the more complex duplication architectures are particularly enriched in populations that migrated out of Africa. On the basis of sequence at the duplication loci, we estimate that the H2-specific duplication event occurred approximately 1.3 million years ago. Independent of the H2 duplication, the H1-specific duplication event occurred much more recently, approximately 250,000 years ago. Notably, we did not observe this haplotype in any of the African or Asian ...

January 16, 2012

The Fulani have an old “Berber” (?) element

After the second Henn et al. paper I did download the data. Unfortunately there are only 62,000 SNPs intersecting with the HGDP. This is somewhat marginal for fine-grained ADMIXTURE analyses, though sufficient for PCA from what I recall. That being said, the intersection with the HapMap data sets runs from ~190,000 SNPs, to the full 250,000 SNPs (this makes sense since the Henn et al. #2 data set has some HapMap populations in it). So I’ve been experimenting a fair amount in the past few days, and I thought I would post on one issue which was clear in the original paper, but which I have replicated.
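The thinning that comes from intersecting marker sets is easy to picture in code. Below is a minimal sketch, assuming each data set is reduced to a plain list of SNP identifiers; the rsIDs and list names are made up for illustration and do not come from the actual files.

```python
def intersect_markers(dataset_a, dataset_b):
    """Return the SNP IDs present in both data sets, sorted for reproducibility."""
    return sorted(set(dataset_a) & set(dataset_b))


# Toy marker lists with hypothetical rsIDs (not real data).
henn = ["rs1", "rs2", "rs3", "rs4", "rs5"]
hgdp = ["rs2", "rs4", "rs6"]
hapmap = ["rs1", "rs2", "rs3", "rs4", "rs7"]

shared_hgdp = intersect_markers(henn, hgdp)
shared_hapmap = intersect_markers(henn, hapmap)

print(len(shared_hgdp), "markers shared with the HGDP-like list")
print(len(shared_hapmap), "markers shared with the HapMap-like list")
```

The point is only that the usable marker count is set by the overlap, not by the size of either data set on its own.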

The Fulani (Fula) people of the western Sahel seem to have a relatively old West Eurasian component which has distinct affinities with the “Maghrebi” element discerned by Henn et al. In fact, the non-Sub-Saharan African ancestry of the Fulani is almost exclusively of this origin. To me this serves as a peculiar mirror of what you see in the Cushitic and Ethiopian Semitic peoples of the far east of the Sahel-Sudan latitudinal region. These populations also seem to be compounds of a Sub-Saharan Africa element with a West Eurasian one, but in their case the admixture is almost exclusively from a Southwest Eurasian (Arabian) component. Geographically these two symmetric admixture events make sense, but the exclusivity is still a bit surprising. Additionally, in both the case of the Fulani and the Ethiopian and Cushitic groups the admixture is widely distributed and even enough to imply that they are old events. I also assumed this because in some admixture runs a “pure” Fulani cluster partitions out, which is not unexpected for stabilized hybrid populations (all human populations are stabilized hybrids if you go back far enough).

To give you a flavor of what I’m talking about, here are some screen shots of a run which is currently going. It has 180,000 markers. I removed Tunisians and many African populations from the Henn et al. data set, and included the Utah whites from the HapMap. The individual plots show the ancestral proportions for each Fulani in the data set:

So what can we see here? First, let’s reiterate something: as in the case of the populations of the Horn of Africa the West Eurasian element in the Fulani is difficult to find in “pure” form in the populations from which it putatively derived. What does that imply? I think this means that the Fulani have an origin in relatively recent historic time, on the order of 2,000, not 10,000, years. That is because I am skeptical that the Fulani would be able to maintain genetic distinctiveness for ~10,000 years from other populations around them. In contrast, the last 2,000 years have seen the rise of various cultural institutions, from trans-Saharan nomadism to Islam, which might slow down admixture sufficiently to maintain the differences between the Fulani and their neighbors. It also implies to me that the non-Maghrebi “Near Eastern” element which Henn et al. discerned is a relatively recent phenomenon in northwest Africa, else the Fulani should also carry it. How recent? Probably from Classical Antiquity down to the Muslim period. Observe that many North African groups have a red “European” element. This may be from Near Eastern populations, but I suspect that the fraction here is just too high to be explained by that. Also, you can see above that some groups in Morocco have nearly as much of this as Egyptians, but far less of the more genuine Near Eastern components.

In all likelihood the West Eurasian component came to the Fulani via the Tuareg or a related or antecedent population. So if you typed the Tuareg you would probably get a better sense of the “pure” “Maghrebi” genetic profile. These genetic results also can serve as fodder to understanding the ethnogenesis of the landscape of the Sahel. In the map above it is interesting to observe that the Hausa speak an Afro-Asiatic language, even though their West Eurasian component is far lower than the Fulani, who speak Niger-Congo dialects. What gives? I suspect that the difference here is that the Hausa are a case of elite emulation of a cultural complex which was much more integrated and elaborated by the time it arrived on the West African scene. This explains how there could be language shift, while in the case of the Fulani there was none. Another hypothesis is that Afro-Asiatic derives from Sub-Saharan Africa itself, and the Chadic (Hausa) group are basal to the phylogeny. I’ll let readers explore the implications of that. A final aspect, I put the quotations in the title because perhaps the Berber dialects spread via elite emulation, and the original Maghrebi ancestors of the Fulani spoke a different language, which has been lost? As they say, for every answer there bloom a thousand questions….

Image credit: Wikipedia, Wikipedia.

January 13, 2012

Between the desert and the sea

Zinedine Zidane, a Kabyle

There is a new paper out in PLoS Genetics which purports to characterize the ancestry of the populations of northern Africa in greater detail. This is important. The HGDP data set does have a North African population, the Mozabites, but it’s not ideal to represent hundreds of millions of people with just one group. The first author on this new paper is Brenna Henn, who was also first author on another paper with a diverse African data set. Importantly, the data was posted online. Unfortunately, though, most of the populations didn’t have too many markers. This isn’t an issue in and of itself, but it becomes a big deal when trying to combine it with other data sets. If you limit the markers to those which intersect across two data sets you start to thin them down a lot, to the point where they’re not useful. Though the results of the paper are worth talking about, the authors claim that they’ll be putting the data online. This is important because they used a large number of markers, so the intersections will be nice (I can, for example, envisage exploring the relationship between the North Africans and the IBS Iberian sample in the near future).

As for the paper itself, Genomic Ancestry of North Africans Supports Back-to-Africa Migrations:

Proposed migrations between North Africa and neighboring regions have included Paleolithic gene flow from the Near East, an Arabic migration across the whole of North Africa 1,400 years ago (ya), and trans-Saharan transport of slaves from sub-Saharan Africa. Historical records, archaeology, and mitochondrial and Y-chromosome DNA have been marshaled in support of one theory or another, but there is little consensus regarding the overall genetic background of North African populations or their origin and expansion. We characterize the patterns of genetic variation in North Africa using ~730,000 single nucleotide polymorphisms from across the genome for seven populations. We observe two distinct, opposite gradients of ancestry: an east-to-west increase in likely autochthonous North African ancestry and an east-to-west decrease in likely Near Eastern Arabic ancestry. The indigenous North African ancestry may have been more common in Berber populations and appears most closely related to populations outside of Africa, but divergence between Maghrebi peoples and Near Eastern/Europeans likely precedes the Holocene (>12,000 ya). We also find significant signatures of sub-Saharan African ancestry that vary substantially among populations. These sub-Saharan ancestries appear to be a recent introduction into North African populations, dating to about 1,200 years ago in southern Morocco and about 750 years ago into Egypt, possibly reflecting the patterns of the trans-Saharan slave trade that occurred during this period.

The model outline here is straightforward:

- A population of West Eurasian provenance migrated across the fringe of the southern Mediterranean >10,000 years B.P. (Maghrebi)

- This was later overlain by a later West Asian migration (Near Eastern)

- A third major element here seems to be Sub-Saharan African admixture, which these authors claim is rather new (post-Roman)

Two of the methods used will be familiar to readers of this weblog. They used ADMIXTURE to generate barplots which fractionate putative ancestral components given K number of components. Second, they also use PCA to visualize the largest components of genetic variation within the samples on a plane.

As you “move up” the K’s you note that Maghrebi populations “split” from the Near Eastern reference, the Qataris. This is supported by the PCA, which shows that there is a dimension of variation which separates Near Easterners & Europeans from Maghrebis. The authors note that this dimension is orthogonal to the Sub-Saharan African vs. Eurasian component. That suggests that the putative Maghrebi component is likely to be part of the set of “Out of Africa” populations, rather than an African population which simply experienced continuous gene flow with West Eurasians.

They also estimate Fst, a statistic which partitions genetic variation within and between groups. The value between Sub-Saharan Africans and Europeans is ~0.15 using HGDP SNP data, and between Europeans and East Asians ~0.10. Using the Tuscans and Qataris as European and West Asian references against the North African populations along their east-west cline they estimate Fsts from ~0.03 to ~0.06. The higher end values are from populations which are less admixed with Near Eastern elements, and the colored polygons illustrate the domain generated by ADMIXTURE Fsts across inferred ancestral components. You also see in the chart estimated time of divergence. I won’t get into the assumptions in the model, but the authors do note that ~12,000 years B.P. seems to be the low bound estimate for when the Maghrebis diverged from other West Eurasians. This is important, because it predates agriculture.
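For readers who want a feel for what Fst measures, here is a minimal sketch of one textbook form of the statistic for two populations: the between-population share of expected heterozygosity, computed as a ratio of sums across loci. This is an illustration under simple assumptions (biallelic loci, equal weighting of the two populations, no sample-size correction), not the estimator used in the paper, and the allele frequencies are invented toy values.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Wright-style Fst for two populations from per-locus allele frequencies.

    Fst = sum(H_T - H_S) / sum(H_T), where H_T is the expected heterozygosity
    of the pooled population and H_S the mean within-population value.
    Assumes biallelic loci and equal population weights (a simplification).
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


# Toy frequencies (made up): one pair of similar populations, one divergent pair.
similar = fst_two_pops([0.6, 0.5, 0.3], [0.4, 0.5, 0.3])
divergent = fst_two_pops([0.9, 0.8, 0.1], [0.2, 0.3, 0.7])
print(round(similar, 3), round(divergent, 3))
```

Identical frequencies give 0 and fixed differences give 1, which is the sense in which the statistic "partitions" variation within and between groups.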

The final set of methods outlined in this paper looked at ancestry on a more fine-grained genomic scale. To the left you see a plot where each horizontal bar represents an individual’s chromosome 1 (among a set of North Africans). Each color in that bar indicates a component of ancestry (except the black, which are centromeres). This sort of information is important, because saying someone is 50% X and 50% Y summarizes information to the point of eliding it. An individual who is a first generation product of a Chinese-European marriage is going to have the same ancestral proportions as someone who is a Uyghur for those respective populations. But a fine-scale mapping of the genomic ancestry would look very different, because the history of the admixture is very different.
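The contrast between the first-generation hybrid and the anciently admixed individual can be made concrete with a toy calculation. The sketch below assumes each chromosome copy has already been painted with one ancestry label per equally sized window (the strings are invented, with "X" and "Y" standing in for the two source populations). It reports the overall proportions and the number of ancestry switches, which is a crude stand-in for the tract-length information the fine-scale methods use.

```python
from collections import Counter


def summarize_individual(haplotypes):
    """Return (ancestry proportions, number of ancestry switches).

    haplotypes: one string per chromosome copy, one ancestry label per window.
    """
    counts = Counter()
    switches = 0
    for hap in haplotypes:
        counts.update(hap)
        # A switch is any boundary where adjacent windows differ in ancestry.
        switches += sum(1 for left, right in zip(hap, hap[1:]) if left != right)
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()}
    return proportions, switches


# Hypothetical paintings: an F1 carries one unbroken chromosome from each
# parent, while old admixture has been chopped up by recombination.
first_generation = ["XXXXXXXX", "YYYYYYYY"]
old_admixture = ["XYYXXYXY", "YXXYYXYX"]

f1_props, f1_switches = summarize_individual(first_generation)
old_props, old_switches = summarize_individual(old_admixture)
print(f1_props, f1_switches)
print(old_props, old_switches)
```

Both individuals come out 50/50, but only the switch count distinguishes a recent event from an old one.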

There are many inferences in the paper which I won’t address. Rather, let me focus on this one assertion:

After accounting for putative recent admixture (Figure 1), the indigenous Maghrebi component (k-based) is estimated to have diverged from Near Eastern/Europeans between 18–38 Kya (Figure 3), under a range of Ne and k values. We hence suggest that the ancestral Maghrebi population separated from Near Eastern/Europeans prior to the Holocene, and that the Maghrebi populations do not represent a large-scale demic diffusion of agropastoralists from the Near East.

This is not implausible on the face of it. The component of ancestry modal in the Mozabite HGDP sample tends to have a relatively high Fst in relation to other West Eurasian groups. I had wondered if this was due to ancient Sub-Saharan African admixture which had produced a particular stabilized hybrid, but these results indicate that the component is no closer than other West Eurasians. What I’m confused and skeptical about is the range of divergence times which different papers are producing, which seem somewhat implausible taken together.

There are papers which posit that East Asians separated from Europeans ~25,000 years B.P. This is in the same range as the divergence between Maghrebis and West Eurasians, but the Maghrebi genetic distance (Fst) is about 1/2 as great. Also, these sets of results which generate a “bunching” together of the separation of many extant non-African lineages in the 20-40,000 year range imply very rapid differentiation after the “Out of Africa” event, if that event did occur ~50,000 years ago (at least for most Eurasians, even assuming a revised model whereby Australian Aboriginals derive from an earlier wave). One at a time any given divergence estimate may be broadly plausible, but the literature is just not particularly coherent on this matter, and it often seems archaeologically implausible.
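One way to see why the same divergence time paired with half the Fst is awkward is the standard drift-only approximation, Fst ≈ 1 − exp(−t / 2Ne), which rearranges to t = −2·Ne·ln(1 − Fst) generations. The sketch below applies that approximation; it assumes constant effective size, no migration after the split, and a generation time of 25 years. The Ne values are arbitrary illustrations, and I am not claiming this is exactly the calculation any particular paper performed.

```python
import math


def divergence_generations(fst, ne):
    """Generations since a split under pure drift: t = -2 * Ne * ln(1 - Fst).

    Assumes constant effective size Ne and no gene flow after divergence.
    """
    if not 0 <= fst < 1:
        raise ValueError("Fst must be in [0, 1)")
    return -2 * ne * math.log(1 - fst)


def divergence_years(fst, ne, years_per_generation=25):
    """Convert the drift-only estimate to years (generation time is an assumption)."""
    return divergence_generations(fst, ne) * years_per_generation


# Illustrative only: Fst values of 0.10 and 0.05 under the same assumed Ne,
# and the lower Fst again under a doubled Ne.
for fst, ne in [(0.10, 5000), (0.05, 5000), (0.05, 10000)]:
    print(fst, ne, round(divergence_years(fst, ne)))
```

Under this model a halved Fst with an unchanged date requires roughly double the effective population size, so the divergence dates are only as coherent as the Ne assumptions behind them.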

Citation: Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, et al. (2012) Genomic Ancestry of North Africans Supports Back-to-Africa Migrations. PLoS Genet 8(1): e1002397. doi:10.1371/journal.pgen.1002397

Image Credit: Raphaël Labbé

August 30, 2011

Tutsi genetics, ii

In my post below, Tutsi probably differ genetically from the Hutu, there were many comments. Some I did not post because they were rude, though they did ask valid questions. I will address those issues, but let me quote one comment:

That’s an interesting possibility, but this admixture run didn’t split the non-hunter-gatherer Africans that well. In one of your previous analyses on East Africa you managed to get a pretty accurate ‘Afro-Asiatic/Cushitic’ and ‘Nilotic’ cluster. Is it possible that you could run this Tutsi sample using the same admixture settings as in the ‘Flavors of Afro-Asiatic’ blog post to see if he carries a significant Nilotic component or is mainly Bantu & Cushitic derived?

So I replicated ADMIXTURE runs for many of the same populations as I did in my post, Flavors of Afro-Asiatic. I also pared down the population set and generated a PCA with EIGENSOFT. Before I get to those results, let me tackle the questions.

1) “Are the Luhya suitable proxies for the Hutus?”

Probably. The reason is that Bantu-speaking populations, from the Congo to South Africa, are surprisingly similar. Not only that, but these populations are very distinctive from groups which are close to them ...

June 15, 2011

The Cape Coloureds are a mix of everything

A Cape Coloured family

I’ve mentioned the Cape Coloureds of South Africa on this weblog before. Culturally they’re Afrikaans in language and Dutch Reformed in religion (the possibly related Cape Malay group is Muslim, though also Afrikaans speaking traditionally). But racially they’re a very diverse lot. In this way they can be analogized to black Americans, who are ~75% West African and ~25% Northern European, with the variance in ancestral proportions being such that ~10% are ~50% or more European in ancestry. The Cape Coloureds though are much more complex. Some of their ancestry is almost certainly Bantu African. This element is related to the West African affinities of black Americans. And, they have a Northern European element, which likely came in via the Dutch, German, and Huguenot settlers (mostly males). But the Cape Coloureds also have other contributions to their genetic heritage. Firstly, they have Khoisan ancestry, whether from Bushmen or Khoi. This is well known in their oral memory. The hinterlands of the Cape of Good Hope are beyond the ecological range of the Bantu agricultural toolkit, so the region was still dominated ...

May 9, 2011

Pygmies are short because nature made them so

Aka Pygmies

The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.

In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many, genes. This may seem a simple ...

April 9, 2011

The Sandawe: after the demographic flood

Over the past few days I’ve been trying to read a bit on the Sandawe. Most of the stuff I’ve been able to find is in the domain of linguistics, and is basically unintelligible to me in any substantive manner. The crux of the curiosity here is that the Sandawe, like their Hadza neighbors, have clicks in their language, and so have been classified with the Khoisan. Here’s some background:

The most promising candidate as a relative of Sandawe are the Khoe languages of Botswana and Namibia. Most of the putative cognates Greenberg (1976) gives as evidence for Sandawe being a Khoesan language in fact tie Sandawe to Khoe. Recently Gueldemann and Elderkin have strengthened that connection, with several dozen likely cognates, while casting doubts on other Khoisan connections. Although there are not enough similarities to reconstruct a Proto-Khoe-Sandawe language, there are enough to suggest that the connection is real.

I can’t speak to the validity of this at all, obviously. Some scholars do argue that the clicks in the Sandawe language were only acquired through interaction with peoples such as the Hadza, making an analogy to Xhosa, a Bantu language which has been strongly influenced by Khoi dialects. ...

April 6, 2011

The men of Africa

Khoikhoi on the move….

Dienekes mentioned today a new paper, Signatures of the pre-agricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages. Because of the recent comments in this space on the genetic history of Africa I was curious, but after reading it I have to say I can’t make much sense of the alphabet soup of haplogroups. Remember, there are different ways to capture and analyze the variation in one’s genes. A common activity is to sweep over the whole genome and focus on single nucleotide polymorphisms, variation at the base pair level. So my own analyses using ADMIXTURE focus on tens or hundreds of thousands of such markers. But there are other types of genomic variation, such as copy number, microsatellites, and minisatellites.

Additionally, much of the older human phylogeographic literature focused on mtDNA and Y chromosomal variance. For mtDNA it was partly a function of how easy it was to extract the genetic material (it’s copious on the cellular level). But perhaps more importantly these two types of variance aren’t subject to recombination. This means ...

March 8, 2011

Where in the world did anatomically modern humans come from?

The Pith: I review a recent paper which argues for a southern African origin of modern humanity. I argue that the statistical inference shouldn’t be trusted as the final word. This paper reinforces previously known facts, but does not add much that is both novel and robust.

I have now read the paper which I expressed a touch of skepticism toward yesterday. Do note, I did not dispute the validity of their results. They seem eminently plausible. I was simply skeptical that we could, with any level of robustness, claim that anatomically modern humans arose in southern vs. eastern, or western, Africa. If I had to bet, my rank order would be southern ~ eastern > western. But my confidence in my assessment is very low.

First things first. You should read the whole paper, since someone paid for it to be open access. Second, much props to whoever decided to put their original SNP data online. I’ve already pulled it down, and sent off emails to Zack, David, and Dienekes. There are some northern African populations which allow us to expand beyond the Mozabites, though unfortunately there are only 55,000 ...

January 10, 2011

The genetic affinities of Ethiopians

In the open thread someone asked: “Any recent stuff on the genetics of Ethiopians.” That prompted me to look around, because I’m curious too. Poking around Wikipedia I couldn’t find anything recent. A lot of the studies are older uniparental lineage based works (NRY and mtDNA). Ethiopia is interesting because unlike almost all other Sub-Saharan African nations it has a long written history. Culturally and linguistically it has both Sub-Saharan African, and non-Sub-Saharan African, affinities. The languages of highland Ethiopia are clearly Semitic. Those of lowland Ethiopia are Cushitic, a branch of the broader Afro-Asiatic language family concentrated around the Horn of Africa (Somali is a Cushitic language, though most Ethiopian nationals who speak a Cushitic dialect are of the Oromo group).

From a human evolutionary genetic perspective, Ethiopia also has specific interest. It is likely that the main recent pulse of humans Out of Africa traversed this region. Additionally, there is some evidence of deep time connections between the groups ancestral to Ethiopians and the Khoisan of southern Africa. It may be that Ethiopians and Khoisan are reservoirs of ancient genetic variation in Sub-Saharan Africa which has been overlain by Bantu in most other regions outside of West Africa. Finally, Ethiopians are known to have high altitude adaptations. This could be due to long term residence in the region, or, assimilation of favorable alleles from the long term residents by later populations.

Fortunately we can get a sense of the genetic affinities of Ethiopians thanks to a paper published last spring, The genome-wide structure of the Jewish people. The focus was clearly on Jews, but they surveyed Amhara & Tigray (Semitic speaking highlanders), Ethiopian Jews (similar ethnically to the Amhara & Tigray, but religiously non-Christian), and Oromo. In the PCA the Oromo and Semitic speaking populations are pretty obviously distinct clusters.

This just means that when you take worldwide genetic variation, and pull out the biggest independent dimensions, and then visualize individuals on the two largest dimensions in terms of how they explain variance, the Oromo and other Ethiopians don’t really intersect. Interestingly the Amhara and Tigray are almost indistinguishable, but the Ethiopian Jews are in their own cluster. There are, for the record, 7 Oromo, 7 Amhara, 5 Tigray, and 13 Ethiopian Jews in the sample.
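To make "pull out the biggest independent dimensions" less abstract, here is a minimal, dependency-free sketch of finding the first principal component of a genotype matrix by power iteration. In practice one would use EIGENSOFT or a linear algebra library; this is only an illustration, and the genotype matrix below is invented (rows are individuals, columns are SNPs coded as 0/1/2 allele counts).

```python
import random


def first_pc(genotypes, iterations=200, seed=0):
    """Return (PC1 score per individual, fraction of variance explained by PC1)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column on its mean.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual similarity (Gram) matrix.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    if total == 0:
        return [0.0] * n, 0.0  # no variation at all
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    scores = [c * eigenvalue ** 0.5 for c in v]
    return scores, eigenvalue / total


# Toy data (made up): three individuals from each of two divergent groups.
toy = [
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
]
scores, explained = first_pc(toy)
print([round(s, 2) for s in scores], round(explained, 2))
```

Individuals from the same toy group land on the same side of the first axis, which is all a "distinct cluster" on a PCA plot means. The sign of the axis itself is arbitrary.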

Now let’s look at the genetic variation in ADMIXTURE. Remember this assigns the genomes of individuals in proportions to K ancestral units. As an example, if you had African Americans, Yoruba, and White Americans, in a total pool, and did K = 2, you might have a tendency where Yoruba and White Americans are in two totally different ancestral populations of K, while African Americans are 80% in one ancestry and 20% in another. The interpretation of this is straightforward, but when it comes to populations whose backgrounds we don’t know as well, one should be careful. The selection of a particular value for K is going to be really important, and we shouldn’t confuse the method with the reality which the method is trying to plumb.
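The K = 2 intuition can be sketched with a much simpler calculation than what ADMIXTURE actually does. ADMIXTURE infers the ancestral allele frequencies and the individual proportions jointly by maximum likelihood. The toy below instead assumes the two source populations' allele frequencies are already known and fits a single mixing proportion by least squares. All frequencies are invented, and the population labels are just placeholders echoing the example above.

```python
def estimate_admixture(p_source1, p_source2, p_mixed):
    """Least-squares estimate of alpha in p_mixed ~ alpha*p1 + (1 - alpha)*p2.

    Assumes known source frequencies; clamps the result to [0, 1].
    """
    numerator = sum((pm - p2) * (p1 - p2)
                    for p1, p2, pm in zip(p_source1, p_source2, p_mixed))
    denominator = sum((p1 - p2) ** 2 for p1, p2 in zip(p_source1, p_source2))
    if denominator == 0:
        raise ValueError("source populations are indistinguishable at these loci")
    return min(1.0, max(0.0, numerator / denominator))


# Made-up allele frequencies at four loci for two hypothetical sources.
source_a = [0.9, 0.1, 0.7, 0.3]
source_b = [0.2, 0.8, 0.1, 0.6]
# Construct a population that is 80% source A and 20% source B.
mixed = [0.8 * a + 0.2 * b for a, b in zip(source_a, source_b)]

alpha = estimate_admixture(source_a, source_b, mixed)
print(round(alpha, 3))
```

When the sources are well chosen the proportion is recovered cleanly. The caution in the text is about what happens when they are not, since the fit will still return a number.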

First, K = 8 from Behar et al. I’ve reedited to highlight populations which might inform the variation of Ethiopians.

Now let’s look at a series of K’s. Note the changes.

Luckily for us, we don’t need to stop here. Dienekes included Behar’s Ethiopians (non-Jews) for Dodecad. Additionally, he included the Masai population from the HapMap. This turns out to be important because he found that Ethiopian Sub-Saharan ancestry is similar to that of the Masai, not the other African groups.

Dienekes also provided individual outputs. I’ve stitched together Ethiopians with Egyptians and Saudis. The color coding is the same as above.

You should be able to tell where the three groups start and stop pretty easily. I’m 99% sure that the six individuals with more East African and less Southwest Asian ancestry are all Oromo. Ethiopians, in particular highland Ethiopians, seem to me likely an ancient stabilized hybrid population between a population from Arabia, and a local Sub-Saharan population. This population seems unlikely to have been related to the peoples of West-Central Africa, who are associated with the Bantus across eastern and southern Africa. The Bantu agricultural toolkit runs into ecological constraints in various regions, and it is in those regions that non-Bantu populations have persisted. Ethiopia, with its unique climate and topography, naturally remains non-Bantu (as well as the Horn of Africa as a whole). The possible connections between Khoisan and Ethiopia may be a function of the fact that these areas harbor genetic variants which have disappeared in the intervening regions because of the Bantu expansion. I have a hard time accepting that the Bantu expansion was particularly eliminationist, but I am starting to suspect that outside of Ethiopia population densities were very, very, low.

The antiquity of this ancient hybridization event to me is attested by the fact that Ethiopians lack any of the other Middle Eastern components besides the one modal in Saudi Arabia. There is a great deal of intra-population variance in the Saudi data set. Why? Part of this must be the slave trade, as well as pilgrims who remained in places like Mecca. But, I think part of the untold story here is that there may have been a larger genetic impact on Arabia after the rise of Islam from the Levant than vice versa! Probably the gene flow precedes Islam, as Arabia was hooked into worldwide trade and population movements, which Ethiopia was relatively insulated from. The Saudi data set has several people who are “pure” Southwest Asian, but also several who have a great deal of West Asian + South European. These seem likely to be people who have some background in the Fertile Crescent.

August 22, 2010

Genetic variation within Africa (and the world)

Last year a paper came out in Science which made a rather large splash, The Genetic Structure and History of Africans and African Americans by Tishkoff et al. Since it’s more than a year old I recommend that those of you who are curious about the details of the paper and don’t have academic access go through the free registration, as you can then read it in full. Unlike Reich et al. the Science paper didn’t unveil a new method of analysis. It was the standard bread & butter, with PCAs & STRUCTURE plots & phylogenetic trees. But the coverage of populations within Africa was massive. They had a lot of results and relationships to cover, and ended up with a 100-page supplement.

I commend the whole paper to you. But there are two elements I want to highlight. First, a three dimensional PCA plot. It has the first, second and third principal components of variation. In other words, the three largest independent dimensions in terms of explanatory power of genetic variation. Panel A includes all world populations, and panel B just Africans.


For panel A, PC1 = 20% of the variance, PC2 = 5%, and PC3 = 3.5%. For panel B the PCs didn’t drop off quite so much, PC1 = 11%, PC2 = 6%, PC3 = 5% and PC4 = 4%. In case you don’t know, the Hadza are Africa’s last obligate hunter-gatherers, and speak a language with clicks in it, just as the Bushmen do. The big division highlighted in this paper is that between the “indigenous” relict populations, the Hadza, Sandawe, Bushmen and Pygmies, and those who belong to the more widespread agriculturalist and pastoralist societies of Africa. Implicit within the paper is the model of a Bantu Expansion of farmers, as well as a possible later Nilotic expansion of herders (which brought the Tutsi and Maasai), in a north-south direction. In the process they assimilated and/or displaced the indigenous populations, of whom the aforementioned peoples are relict islands persisting in ecologically isolated or unfavorable domains.

The map to the left shows the population coverage within this paper of African groups. The pie graphs simply show ancestral quanta as inferred by STRUCTURE. You can read the paper for the blow-by-blow. But ultimately it seems there will be a need for finer-grained coverage to the south of the equator. If the Bantu expansion is as recent as archaeologists and linguists assume, on the order of ~2,000 years ago, then the gradients of genetic signals should persist. From what I can tell it is assumed on both genetic and phenotypic grounds that the Xhosa have a higher load of Khoisan ancestry than the Zulu or Tswana. The Bantu Expansion is recent enough that the semi-legendary Phoenician circumnavigation of Africa would have encountered many Khoisan peoples along the eastern coast.

Below is a selection of figures from the above paper. After selecting an image it is probably best to hit F11 for “Full Screen” if you aren’t on a very big monitor (you can copy image location and view it in a separate window as well).

August 21, 2010

Desmond Tutu, Spaniards, and genetic distance

Since we’ve been talking about Fst a fair amount, I thought it might be nice to put it in some concrete graphical perspective. First, to review: Fst in the genetic context measures the proportion of genetic variation which can be attributed to between-population differences. To give a “toy” example, if you randomly divided the population of a large Swedish village into two groups, and calculated their Fst, it should be ~0, because if you randomly select from an unstructured population by definition there shouldn’t be noticeable between-population differences. In contrast, if you compare a Swedish village to a Japanese village, a large fraction of the genetic variation is going to be distinct to each population. Around 10% of the genetic variation in fact will be between the two groups. Many of the genes will be extremely informative, so that if you know the allelic state from a given individual you can predict with a high degree of certitude which population that individual was from (e.g., SLC24A5 and EDAR). A small set of ancestrally informative alleles would produce a sequence of conditional probabilities of extremely high certitude (on the order of 10 genes for these two populations should suffice, perhaps three for “government work”).
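The toy example can be checked with a small simulation. What follows is only a sketch under stated assumptions: it uses the simple Wright-style ratio (H_T - H_S) / H_T summed over loci with no sample-size correction, the villages and allele frequencies are entirely made up, and the function names (`multilocus_fst`, `split_village_fst`, `divergent_villages_fst`) are hypothetical. It is not the estimator used in the papers cited in this post.

```python
import random

def multilocus_fst(freqs_a, freqs_b):
    """Wright-style F_ST for two equally weighted populations.

    Computes (H_T - H_S) / H_T with heterozygosities summed over loci;
    no correction for sample size is applied.
    """
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s = pa * (1 - pa) + pb * (1 - pb)   # mean within-population 2pq
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)         # 2pq of the pooled population
        num += h_t - h_s
        den += h_t
    return num / den if den else 0.0

def sample_freq(true_p, n_people, rng):
    """Observed allele frequency among n_people diploid individuals."""
    copies = 2 * n_people
    return sum(rng.random() < true_p for _ in range(copies)) / copies

def split_village_fst(n_people=1000, n_loci=200, seed=1):
    """One unstructured village, randomly divided into two halves."""
    rng = random.Random(seed)
    half = n_people // 2
    freqs_a, freqs_b = [], []
    for _ in range(n_loci):
        p = rng.uniform(0.1, 0.9)
        # Allele count (0, 1 or 2) carried by each villager at this locus.
        genotypes = [(rng.random() < p) + (rng.random() < p) for _ in range(n_people)]
        rng.shuffle(genotypes)
        freqs_a.append(sum(genotypes[:half]) / (2 * half))
        freqs_b.append(sum(genotypes[half:]) / (2 * (n_people - half)))
    return multilocus_fst(freqs_a, freqs_b)

def divergent_villages_fst(n_people=500, n_loci=200, seed=1):
    """Two villages whose true allele frequencies were drawn independently."""
    rng = random.Random(seed)
    freqs_a = [sample_freq(rng.uniform(0.1, 0.9), n_people, rng) for _ in range(n_loci)]
    freqs_b = [sample_freq(rng.uniform(0.1, 0.9), n_people, rng) for _ in range(n_loci)]
    return multilocus_fst(freqs_a, freqs_b)

print(f"random split of one village: Fst = {split_village_fst():.4f}")
print(f"two divergent villages:      Fst = {divergent_villages_fst():.4f}")
```

The random split should come out very close to zero, with only a little sampling noise, while the two independently drawn villages give a clearly positive value. The divergent case is arbitrary and is not calibrated to any real pair of populations.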

But to put this in perspective, and show how genetic variation differs from locale to locale, I thought I would compare continental-scale Fst values with those in a small region, southern Africa. The Fst values for the first I obtained from Investigation of the fine structure of European populations with applications to disease association studies, and the second, Complete Khoisan and Bantu genomes from southern Africa. The Bantu in this case is Desmond Tutu, who is from the Xhosa tribe, and has substantial admixture from the non-Bantu populations which were resident in South Africa prior to the arrival of the Bantus.

First, in tabular format:

          Spain    Sweden   Russia   Japan
France    0.0008   0.0023   0.0037   0.1116
Spain              0.0047   0.0059   0.1118
Sweden                      0.0025   0.1095
Russia                               0.1057

       KB1    NB1      TK1      MD8     Desmond Tutu
KB1           0.021    0.024    0.022   0.08
NB1                   -0.007    0.006   0.091
TK1                             0.016   0.088
MD8                                     0.061

Second, two adjacent bar graphs. In the foreground I’ve simply taken the Spain vs. other Eurasian population comparisons, while in the background Desmond Tutu is the reference for the four Bushmen.
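The same comparison can be sketched as a rough text chart straight from the tabulated values. The numbers are copied from the two tables above; the helper names (`text_bars`, `mean`) and the bar scale are just assumptions for illustration.

```python
# F_ST values copied from the two tables above.
SPAIN_VS = {"France": 0.0008, "Sweden": 0.0047, "Russia": 0.0059, "Japan": 0.1118}
TUTU_VS = {"KB1": 0.08, "NB1": 0.091, "TK1": 0.088, "MD8": 0.061}

def text_bars(title, values, scale=400):
    """Render one '#' per 1/scale of F_ST as a crude horizontal bar chart."""
    lines = [title]
    for name, fst in values.items():
        lines.append(f"{name:>7} {fst:7.4f} " + "#" * round(fst * scale))
    return "\n".join(lines)

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

print(text_bars("Spain vs. other Eurasian populations", SPAIN_VS))
print()
print(text_bars("Desmond Tutu vs. four Bushmen", TUTU_VS))
print()
print(f"mean Tutu-Bushmen Fst: {mean(TUTU_VS.values()):.3f}")
```

Every Tutu vs. Bushman value is about ten times larger than any of the within-Europe comparisons, and not far short of the Spain vs. Japan figure.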


In some ways this comparison is an exaggeration of the variation in African genes. The Bushmen and Bantu populations are of very distinct origins, as the latter spread over eastern and southern Africa only in the last 2,000 years. The Bushmen-Bantu cultural gap is one of sharp discontinuity, and despite gene flow it is still to some extent a genetic one as well. But there are other factors dampening Fst in this case. First, Tutu is himself of partial Khoisan ancestry (of whom there are other groups besides the Bushmen), so his genetic distance is likely to be smaller than that of someone from the Zulu tribe, which has presumably had less admixture with the indigenous populations, being a bit farther from the edge of the demographic “wave of advance.” Second, the gene chips are geared toward Eurasian populations, and presumably missed African-specific, and particularly Bushmen-specific, variants because they didn’t go looking.

My own confusion on these issues the past week illustrates, I suppose, the difficulty in mapping these abstruse and yet materially concrete patterns onto human categories. But quite often wrestling with the difficulties is the surest path to illumination.

