Razib Khan One-stop-shopping for all of my content

December 1, 2012

Africa’s hidden people hold the keys to the past

I mentioned this in passing on my post on ASHG 2012, but it seems useful to make explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. To a great extent in the paper which is forthcoming there is the likely answer to the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture on the order of 2-3,000 years before the present.* If the admixture was so recent we should find many of the “first people,” no? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.

Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were of a quality that I could accept that these populations were not being well modeled by the variation which dominated their data set. Though Cushitic in language the Sanye had far less of the West Eurasian element present among other Cushitic-speaking populations of the Horn of Africa. Neither were their African ancestral components quite like that of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).
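To make the "linear combination" point concrete, here is a minimal sketch of fitting a target population as a mix of two source populations by least squares. This is not how STRUCTURE works internally (it is a Bayesian clustering method); it is a simplified stand-in, and every allele frequency below is made up for illustration. The function name `fit_two_source_mixture` is hypothetical.

```python
def fit_two_source_mixture(target, source_a, source_b):
    """Least-squares share of source_a in target, plus the mean squared residual.

    All arguments are lists of allele frequencies at the same markers.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    num = sum((t - b) * d for t, b, d in zip(target, source_b, diff))
    den = sum(d * d for d in diff)
    # Clamp the mixing proportion to the valid range [0, 1].
    alpha = min(1.0, max(0.0, num / den))
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    mse = sum((t - f) ** 2 for t, f in zip(target, fitted)) / len(target)
    return alpha, mse


# Made-up allele frequencies at five markers (assumptions, not real data).
pop_a = [0.9, 0.1, 0.8, 0.2, 0.7]
pop_b = [0.1, 0.9, 0.2, 0.8, 0.3]

# A population that really is a 30/70 mix of the two sources.
true_mix = [0.3 * a + 0.7 * b for a, b in zip(pop_a, pop_b)]
# A population carrying variation that neither source accounts for.
unmodeled = [0.5, 0.5, 0.9, 0.9, 0.1]

alpha_mix, mse_mix = fit_two_source_mixture(true_mix, pop_a, pop_b)
alpha_un, mse_un = fit_two_source_mixture(unmodeled, pop_a, pop_b)
print(alpha_mix, mse_mix)
print(alpha_un, mse_un)
```

The genuine mixture is recovered with essentially no residual, while the second population still gets a mixing proportion but with a large leftover error. That leftover error is the toy analogue of a clustering algorithm "doing a bad job" on a group whose ancestry is not represented among the reference populations.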

Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe this is a population which seems to have been hunter-gatherers very recently, and to some extent still engage in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full-featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.

* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.

January 16, 2012

The Fulani have an old “Berber” (?) element

After the second Henn et al. paper I did download the data. Unfortunately there are only 62,000 SNPs intersecting with the HGDP. This is somewhat marginal for fine-grained ADMIXTURE analyses, though sufficient for PCA from what I recall. That being said, the intersection with the HapMap data sets runs from ~190,000 SNPs, to the full 250,000 SNPs (this makes sense since the Henn et al. #2 data set has some HapMap populations in it). So I’ve been experimenting a fair amount in the past few days, and I thought I would post on one issue which was clear in the original paper, but which I have replicated.
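The marker counts above come from intersecting the SNP lists of different data sets. A minimal sketch of that bookkeeping follows. The rsIDs are invented placeholders, and in practice the merge would be done with a tool such as PLINK rather than by hand; the helper name `shared_markers` is hypothetical.

```python
def shared_markers(*marker_sets):
    """Return the SNP IDs that are present in every data set given."""
    sets = [set(markers) for markers in marker_sets]
    return set.intersection(*sets)


# Hypothetical marker lists; real ones would hold hundreds of thousands of IDs.
henn = {"rs1", "rs2", "rs3", "rs4", "rs5"}
hgdp = {"rs2", "rs4", "rs6"}
hapmap = {"rs1", "rs2", "rs3", "rs4", "rs7"}

henn_hgdp = shared_markers(henn, hgdp)
henn_hapmap = shared_markers(henn, hapmap)
print(len(henn_hgdp), len(henn_hapmap))
```

Only markers typed in every data set survive the merge, which is why combining with a panel genotyped on a different chip (here the toy `hgdp` set) shrinks the usable marker count so sharply.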

The Fulani (Fula) people of the western Sahel seem to have a relatively old West Eurasian component which has distinct affinities with the “Maghrebi” element discerned by Henn et al. In fact, the non-Sub-Saharan African ancestry of the Fulani is almost exclusively of this origin. To me this serves as a peculiar mirror of what you see in the Cushitic and Ethiopian Semitic peoples of the far east of the Sahel-Sudan latitudinal region. These populations also seem to be compounds of a Sub-Saharan Africa element with a West Eurasian one, but in their case the admixture is almost exclusively from a Southwest Eurasian (Arabian) component. Geographically these two symmetric admixture events make sense, but the exclusivity is still a bit surprising. Additionally, in both the case of the Fulani and the Ethiopian and Cushitic groups the admixture is widely distributed and even enough to imply that they are old events. I also assumed this because in some admixture runs a “pure” Fulani cluster partitions out, which is not unexpected for stabilized hybrid populations (all human populations are stabilized hybrids if you go back far enough).

To give you a flavor of what I’m talking about here are some screen shots of a run which is currently going. It has 180,000 markers. I removed Tunisians and many African populations from the Henn et al. data set, and added in the Utah whites from the HapMap. The individual plots show the ancestral proportions for each Fulani in the data set:

So what can we see here? First, let’s reiterate something: as in the case of the populations of the Horn of Africa the West Eurasian element in the Fulani is difficult to find in “pure” form in the populations from which it putatively derived. What does that imply? I think that means that the Fulani have an origin in relatively recent historic time, on the order of 2,000, not 10,000, years. That is because I am skeptical that the Fulani would be able to maintain genetic distinctiveness for ~10,000 years from other populations around them. In contrast, the last 2,000 years have seen the rise of various cultural institutions, from trans-Saharan nomadism to Islam, which might slow down admixture sufficiently to maintain the differences between the Fulani and their neighbors. It also implies to me that the non-Maghrebi “Near Eastern” element which Henn et al. discerned is a relatively recent phenomenon in northwest Africa, else the Fulani should also carry it. How recent? Probably from Classical Antiquity down to the Muslim period. Observe that many North African groups have a red “European” element. This may be from Near Eastern populations, but I suspect that the fraction here is just too high to be explained by that. Also, you can see above that some groups in Morocco have nearly as much of this as Egyptians, but far less of the more genuine Near Eastern components.

In all likelihood the West Eurasian component came to the Fulani via the Tuareg or a related or antecedent population. So if you typed the Tuareg you would probably get a better sense of the “pure” “Maghrebi” genetic profile. These genetic results also can serve as fodder to understanding the ethnogenesis of the landscape of the Sahel. In the map above it is interesting to observe that the Hausa speak an Afro-Asiatic language, even though their West Eurasian component is far lower than the Fulani, who speak Niger-Congo dialects. What gives? I suspect that the difference here is that the Hausa are a case of elite emulation of a cultural complex which was much more integrated and elaborated by the time it arrived on the West African scene. This explains how there could be language shift, while in the case of the Fulani there was none. Another hypothesis is that Afro-Asiatic derives from Sub-Saharan Africa itself, and the Chadic (Hausa) group are basal to the phylogeny. I’ll let readers explore the implications of that. A final aspect, I put the quotations in the title because perhaps the Berber dialects spread via elite emulation, and the original Maghrebi ancestors of the Fulani spoke a different language, which has been lost? As they say, for every answer there bloom a thousand questions….

Image credit: Wikipedia, Wikipedia.

August 30, 2011

Tutsi genetics, ii

In my post below, Tutsi probably differ genetically from the Hutu, there were many comments. Some I did not post because they were rude, though they did ask valid questions. I will address those issues, but let me quote one comment:

That’s an interesting possibility, but this admixture run didn’t split the non-hunter-gatherer Africans that well. In one of your previous analyses on East Africa you managed to get a pretty accurate ‘Afro-Asiatic/Cushitic’ and ‘Nilotic’ cluster. Is it possible that you could run this Tutsi sample using the same admixture settings as in the ‘Flavors of Afro-Asiatic’ blog post to see if he carries a significant Nilotic component or is mainly Bantu & Cushitic derived?

So I replicated ADMIXTURE runs for many of the same populations as I did in my post, Flavors of Afro-Asiatic. I also pared down the population set and generated a PCA with EIGENSOFT. Before I get to those results, let me tackle the questions.

1) “Are the Luhya suitable proxies for the Hutus?”

Probably. The reason is that Bantu-speaking populations, from the Congo to South Africa, are surprisingly similar. Not only that, but these populations are very distinctive from groups which are close to them ...

April 9, 2011

The Sandawe: after the demographic flood

Filed under: African Genetics,African Genomics,Genetics,Genomics,History,Sandawe — Razib Khan @ 9:21 pm

Over the past few days I’ve been trying to read a bit on the Sandawe. Most of the stuff I’ve been able to find is in the domain of linguistics, and is basically unintelligible to me in any substantive manner. The crux of the curiosity here is that the Sandawe, like their Hadza neighbors, have clicks in their language, and so have been classified with the Khoisan. Here’s some background:

The most promising candidate as a relative of Sandawe are the Khoe languages of Botswana and Namibia. Most of the putative cognates Greenberg (1976) gives as evidence for Sandawe being a Khoesan language in fact tie Sandawe to Khoe. Recently Gueldemann and Elderkin have strengthened that connection, with several dozen likely cognates, while casting doubts on other Khoisan connections. Although there are not enough similarities to reconstruct a Proto-Khoe-Sandawe language, there are enough to suggest that the connection is real.

I can’t speak to the validity of this at all, obviously. Some scholars do argue that the clicks in the Sandawe language were only acquired through interaction with peoples such as the Hadza, making an analogy to Xhosa, a Bantu language which has been strongly influenced by Khoi dialects. ...

April 6, 2011

The men of Africa

Khoikhoi on the move….

Dienekes mentioned today a new paper, Signatures of the pre-agricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages. Because of the recent comments in this space on the genetic history of Africa I was curious, but after reading it I have to say I can’t make much sense of the alphabet soup of haplogroups. Remember, there are different ways to capture and analyze the variation in one’s genes. A common activity is to sweep over the whole genome and focus on single nucleotide polymorphisms, variation at the base pair level. So my own analyses using ADMIXTURE focus on tens or hundreds of thousands of such markers. But there are other types of genomic variation, such as copy number, microsatellites, and minisatellites.

Additionally, much of the older human phylogeographic literature focused on mtDNA and Y chromosomal variance. For mtDNA it was partly a function of how easy it was to extract the genetic material (it’s copious on the cellular level). But perhaps more importantly these two types of variance aren’t subject to recombination. This means ...

August 22, 2010

Genetic variation within Africa (and the world)

Filed under: African Genetics,African Genomics,Genetics,Genomics,Variation — Razib Khan @ 12:16 pm

Last year a paper came out in Science which made a rather large splash, The Genetic Structure and History of Africans and African Americans by Tishkoff et al. Since it’s more than a year old I recommend that those of you who are curious about the details of the paper and don’t have academic access go through the free registration, as you can then read it in full. Unlike Reich et al. the Science paper didn’t unveil a new method of analysis. It was the standard bread & butter, with PCA’s & STRUCTURE plots & phylogenetic trees. But the coverage of populations within Africa was massive. They had a lot of results and relationships to cover, and ended up with a 100 page supplement.

I commend the whole paper to you. But there are two elements I want to highlight. First, a three dimensional PCA plot. It has the first, second and third principal components of variation. In other words, the three largest independent dimensions in terms of explanatory power of genetic variation. Panel A includes all world populations, and panel B just Africans.
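For readers who want to see where "percent of variance per PC" figures come from, here is a minimal sketch on simulated genotypes. It assumes plain mean-centering and NumPy's SVD; packages such as EIGENSOFT also scale each marker by its allele frequency, which this toy version skips. The two populations and all allele frequencies are randomly generated, not real data, and `variance_explained` is a hypothetical helper name.

```python
import numpy as np


def variance_explained(genotypes):
    """Fraction of total variance captured by each principal component.

    genotypes: (individuals x SNPs) array of allele counts (0, 1 or 2).
    """
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)

# Two made-up populations whose allele frequencies have drifted apart.
freqs_a = rng.uniform(0.1, 0.9, size=200)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.2, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freqs_a, size=(20, 200))
pop_b = rng.binomial(2, freqs_b, size=(20, 200))

fractions = variance_explained(np.vstack([pop_a, pop_b]))
print(fractions[:3])  # share of variance on PC1, PC2 and PC3
```

The fractions always sum to one and come out in descending order, so the first few numbers tell you how much of the total variation the plotted axes actually summarize.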


For panel A, PC1 = 20% of the variance, PC2 = 5%, and PC3 = 3.5%. For panel B the PCs didn’t drop off quite so much, PC1 = 11%, PC2 = 6%, PC3 = 5% and PC4 = 4%. In case you don’t know, the Hadza are Africa’s last obligate hunter-gatherers, and speak a language with clicks in it, just as the Bushmen do. The big division highlighted in this paper is that between the “indigenous” relict populations, the Hadza, Sandawe, Bushmen and Pygmies, and those who belong to the more widespread agriculturalist and pastoralist societies of Africa. Implicit within the paper is the model of a Bantu Expansion of farmers, as well as a possible later Nilotic expansion (which brought the Tutsi and Maasai) of herders, in a north-south direction. In the process they assimilated and/or displaced the indigenous populations, of whom the aforementioned peoples are relict islands persisting in ecologically isolated or unfavorable domains.

The map to the left shows the population coverage within this paper of African groups. The pie graphs simply show ancestral quanta as inferred by STRUCTURE. You can read the paper for the blow-by-blow. But ultimately it seems there will be need for a finer-grained coverage to the south of the equator. If the Bantu expansion is as recent as archaeologists and linguists assume, on the order of ~2,000 years ago, then the gradients of genetic signals should persist. From what I can tell it is assumed on both genetic and phenotypic grounds that the Xhosa have a higher load of Khoisan ancestry than the Zulu or Tswana. The Bantu Expansion is recent enough that the semi-legendary Phoenician circumnavigation of Africa would have encountered many Khoisan peoples along the eastern coast.

Below is a selection of figures from the above paper. After selecting an image it is probably best to hit F11 for “Full Screen” if you aren’t on a very big monitor (you can copy image location and view it in a separate window as well).
