August 21, 2012

The Sardinian meter

I cropped the image above from the paper Inference of Population Structure using Dense Haplotype Data. The main reason was to emphasize the distinctiveness of the Sardinian cluster, on the bottom right. As you can see, this population exhibits a lot of coancestry across individuals. This isn’t too surprising: Sardinia is an island, and islands are often genetically distinctive. Gene flow prevents populations from diverging through random genetic drift, but water is a major impediment to gradual isolation by distance dynamics. The original Sardinians are naturally going to diverge from mainlanders over time, and begin to share the same set of common ancestors in the recent past, because their space of reasonable mating possibilities is constrained. The other population which is similar in the heat map above is the residents of the Orkneys, off the north coast of Scotland (the Orkneys have a much smaller population than Sardinia, but they are also much closer to the mainland).
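To make the drift versus gene flow intuition concrete, here is a minimal sketch of a Wright-Fisher island population that receives migrants from a large mainland. Everything in it is an illustrative assumption rather than anything taken from the paper: the function names, the population size of 50, the migration rates, and the mainland allele frequency fixed at 0.5.

```python
import random

def simulate_island(pop_size, migration_rate, generations,
                    mainland_freq=0.5, seed=0):
    """Island allele frequency after Wright-Fisher drift with migration."""
    rng = random.Random(seed)
    freq = mainland_freq
    n_copies = 2 * pop_size  # diploid individuals carry two allele copies
    for _ in range(generations):
        # Gene flow pulls the island frequency back toward the mainland value.
        expected = (1 - migration_rate) * freq + migration_rate * mainland_freq
        # Drift: the next generation is a binomial sample of 2N allele copies.
        count = sum(1 for _ in range(n_copies) if rng.random() < expected)
        freq = count / n_copies
    return freq

def mean_divergence(migration_rate, replicates=100, pop_size=50,
                    generations=100, mainland_freq=0.5):
    """Average absolute gap between island and mainland allele frequency."""
    total = 0.0
    for rep in range(replicates):
        island = simulate_island(pop_size, migration_rate, generations,
                                 mainland_freq, seed=rep)
        total += abs(island - mainland_freq)
    return total / replicates

if __name__ == "__main__":
    print("isolated island:", mean_divergence(0.0))
    print("5% migrants per generation:", mean_divergence(0.05))
```

In this toy setup the isolated island wanders much further from the mainland frequency than the island that keeps receiving migrants, which is the sense in which water, by cutting off gene flow, lets drift do its work.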

February 28, 2012

Are Sardinians like Iberians?

Dienekes asks:

In terms of autosomal DNA, the Iceman clearly clusters with modern Sardinians, and also appears slightly more removed than them compared to continental Europeans. Interestingly, at least as far as the PC analysis shows, Sardinians appear to be intermediate between the Iceman and SW Europeans, rather than Italians. Perhaps, this makes sense if the Paleo-Sardinian language is indeed related to languages of Iberia.

This trend aroused a little curiosity in me too. I’m sure Dienekes & company will be probing these issues a lot in the near future, but I couldn’t wait. I took the IBS data set, which includes a lot of individuals from various areas of Spain, the Sardinians, French and French Basque from the HGDP, and the Tuscans from the HapMap, and threw them together into a pot. I added HGDP Russians & Orcadians (the latter a British group) to make sure there was a North European “outgroup.” In terms of technical details the combined data set had ~220,000 SNPs, not too shabby. Additionally, I decided to run a PCA, where this number of SNPs is more than sufficient.

On a technical note, the Sardinians were swamped in raw numbers by Iberians and Tuscans (over 100 and around 80 respectively). This means that the peculiarities of the Sardinian genetic heritage didn’t show up; rather, what you see are the Sardinians as they arrange themselves in relation to the genetic variation of these more numerous groups. I used SmartPCA to generate the 10 largest independent dimensions of variation. To make a long story short, there really wasn’t much variation added from the second dimension on in this relatively homogeneous sample. So below are PC 1 and 2 (E1 and E2).
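For readers who want a feel for what a PCA of genotypes does, here is a minimal NumPy sketch. It is not SmartPCA itself, only an assumed simplification: the function name genotype_pca, the toy allele frequencies, and the group sizes of 30 and 10 are all made up for illustration. Each SNP is centered on the pooled allele frequency and scaled by its expected standard deviation under Hardy-Weinberg, which is close in spirit to the normalization used by EIGENSOFT-style tools.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # pooled allele frequency per SNP
    keep = (p > 0) & (p < 1)            # drop SNPs with no variation
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its Hardy-Weinberg standard deviation.
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]   # per-individual scores
    explained = s ** 2 / np.sum(s ** 2)               # share of variance per PC
    return coords, explained[:n_components]

# Toy data (assumed, not real samples): a large group and a small one
# whose allele frequencies have drifted apart.
rng = np.random.default_rng(0)
n_snps = 2000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(30, n_snps))
group_b = rng.binomial(2, freq_b, size=(10, n_snps))

coords, explained = genotype_pca(np.vstack([group_a, group_b]))
print("PC1 mean, large group:", coords[:30, 0].mean())
print("PC1 mean, small group:", coords[30:, 0].mean())
```

Because the frequencies used for centering come from the pooled sample, the more numerous group sits closer to the origin and has more say in how the axes are drawn, which is one way to see why a small sample can end up described mostly in terms of its larger neighbors.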

I’d be curious if someone could replicate this. I’m rather surprised that the Tuscans form such a tight cluster, but then again the IBS sample is very geographically distributed across Spain. The analogy to the HapMap Tuscans might be if Spain was represented by just Galicians. So what you’re really seeing is a lot of Spanish variation, and of course the north-south range in Europe (which is really a southwest to northeast cline). I don’t see a very strong affinity between Basques and Sardinians, but repeated trials indicated that the Sardinians do not cluster with Tuscans when it comes to their position within the Iberian genetic spectrum.


Ötzi the Iceman and the Sardinians

Well, the paper is finally out, New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. In case you don’t know, Ötzi the Iceman died 5,300 years ago in the alpine region bordering Austria and Italy. He seems to have been killed. And due to various coincidences his body was also very well preserved. This means that enough tissue remained that researchers have been able to amplify his DNA. And now they’ve sequenced it to the point where they can make some inferences about his phenotypic characteristics, and his phylogenetic relationships to modern populations.

The guts of this paper will not be particularly surprising to close readers of this weblog. The guesses of some readers based on what the researchers hinted were correct: Ötzi seems to resemble most closely the people of Sardinia. This is rather interesting. One reason is prosaic. The HGDP sample used in the paper has many Northern Italians (from Bergamo). Why is it that Ötzi does not resemble the people from the region that he was indigenous to? (We know that he was indigenous because of the ratio of isotopes in his body.) A more abstruse issue is that it is interesting that Sardinians have remained moored to their genetic past, enough so that a 5,300 year old individual clearly can exhibit affinities with them. The distinctiveness of Sardinians jumps out at you when you analyze genetic data sets. They were clearly set apart in L. L. Cavalli-Sforza’s The History and Geography of Human Genes, 20 years ago. One reason that Sardinians may be distinctive is that Sardinia is an isolated island. Islands experience reduced gene flow because they’re surrounded by water. And sure enough, Sardinians are especially similar to each other in relation to other European populations.


But Ötzi’s affinities reduce the strength of this particular dynamic as an explanation for Sardinian distinctiveness. The plot to the left is a PCA. It takes the genetic variation in the data set, and extracts out the largest independent components. PC 1 is the largest component, and PC 2 the second largest. The primary cline of genetic variation in Europe is North-South, with a secondary one going from West-East. This is evident in the plot, with PC 1 being North-South, and PC 2 being West-East. The “Europe S” cluster includes northern, southern, and Sicilian Italians. Now notice the position of Ötzi: he is closest to a large cluster of Sardinians. Interestingly there are also a few others. Who are they? I do not know because I do not have access to the supplements right now. The fact that the Sardinians are shifted closer to the continental populations than Ötzi is also striking. But totally intelligible: Sardinia has had some gene flow with other Mediterranean populations. This obviously post-dates Ötzi; Roman adventurers and Genoese magnates could not be in his genealogy because Rome and Genoa did not exist 5,300 years ago. These data strongly point to the possibility of rather major genetic changes in continental Europe, and in particular Italy, since the Copper Age. Juvenal complained that the “River Orontes has long flowed into the Tiber,” a reference to the prominence of easterners, Greek and non-Greek, in the city of Rome. The impact of this is not to be dismissed, but I do not think that it gets to the heart of this matter.

The second panel makes clear what I’m hinting at: Ötzi is actually closer to the “Middle Eastern” cluster than many Italians! In fact, more than most. Why? I suspect that rather than the Orontes, the Rhine and the Elbe have had more of an impact on the genetic character of Italians over the past ~5,000 years. Before Lombardy was Lombardy, named for a German tribe, it was Cisalpine Gaul, after the Celts who had settled it. And before that? For that you have to ask where Indo-Europeans came from. I suspect the answer is that they came from the north, and therefore brought northern genes.

A Sardinian

And what of the Sardinians? I believe that the “islanders” of the Mediterranean are a relatively “pristine” snapshot of a particular moment in the history of the region. This is evident in Dienekes’ Dodecad Ancestry Project. Unlike their mainland cousins both the Sardinians and Cypriots tend to lack a “Northern European” component. Are the islanders in part descendants of the Paleolithic populations? In part. Sardinians carry a relatively high fraction of the U5 haplogroup, which has been associated with ancient hunter-gatherer remains. But it is also possible that the preponderant aspect of Sardinian ancestry derives from the first farmers to settle the Western Mediterranean.  I say this because the Iceman carried the G2a Y haplogroup, which has of late been strongly associated with very early Neolithic populations in Western Europe. And interestingly some scholars have discerned a pre-Indo-European substrate in Sardinian which suggests a connection to the Basque. I wouldn’t read too much into that, but these questions need to be explored, as Ötzi’s genetic nature makes Sardiniaology more critical to understanding the European past.

Image credit: Wikipedia

October 29, 2011

Unfrying the egg

Dienekes has a long post, the pith of which is expressed in the following:

If I had to guess, I would propose that most extant Europeans will be discovered to be a 2-way West Asian/Ancestral European mix, just as most South Asians are a simple West Asian/Ancestral South Indian mix. In both cases, the indigenous component is no longer in existence and the South Asian/Atlantic_Baltic components that emerge in ADMIXTURE analyses represent a composite of the aboriginal component with the introduced West Asian one. And, like in India, some populations will be discovered to be “off-cline” by admixture with different elements: in Europe these will be Paleo-Mediterraneans like the Iceman, an element maximally preserved in modern Sardinians, as well as the East Eurasian-influenced populations at the North-Eastern side of the continent.

This does not seem to be totally implausible on the face of it. But it seems likely that any “West Asian” component is going to be much closer genetically to an “Ancestral European” mix than they were to “Ancestral South Indians,” because the two former elements are probably part of a broader West Eurasian diversification which post-dates the separation of those groups from Southern and Eastern Eurasians. In other words, pulling out the distinct elements in Europeans is likely a more difficult task because the constituents of the mixture resemble each other quite a bit when compared to “Ancestral North Indians” vs. “Ancestral South Indians.”

The bigger issue which this highlights, though, is the reality that many of these clustering methods are temporally sensitive. Given enough time a “hybrid” population is no longer a hybrid, but rather a new distinctive population which itself can be a “parent.” Recombination breaks apart the long range physical associations along the genome which are the hallmarks of distinctive admixed ancestry on the genomic scale. That is why clustering methods easily generate a pure “South Asian” component. After at least ~3-4,000 years of continuous admixture the synthesis is now far less coarse, and the elements much more de facto miscible. And yet via other clustering techniques, such as principal components analysis, you get different results. The peculiar position of the “South Asian” individuals between Europeans and East Asians in direct proportion to their caste and regional origins becomes highly indicative of some sort of admixture event in different proportions as a function of geography and social context. The technique in Reconstructing Indian population history allowed for a resolution of this paradox by sifting through the variation and extracting out the ancestral components. The recent papers which came out on Australian Aboriginal genetics do something similar, in terms of making sense of somewhat puzzling results which are found when generating inferences from aggregate genomic variation.
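A back-of-the-envelope sketch shows how quickly recombination chops up ancestry segments. It rests on loud simplifying assumptions that are mine and not from any of the papers mentioned: a single pulse of admixture rather than continuous mixing, roughly one crossover per Morgan per generation, and an assumed generation time (28 years by default). The function name mean_segment_cm is likewise just illustrative.

```python
def mean_segment_cm(years_since_admixture, years_per_generation=28.0):
    """Rough mean distance between recombination breakpoints, in centimorgans.

    Assumes a single admixture pulse; the generation time is an assumption.
    """
    generations = years_since_admixture / years_per_generation
    # One Morgan (100 cM) sees on average one crossover per generation,
    # so breakpoints pile up at roughly `generations` per Morgan.
    return 100.0 / generations

if __name__ == "__main__":
    for years in (250, 1000, 3500):
        print(years, "years:", round(mean_segment_cm(years), 2), "cM")
```

Under these assumptions, segments from a mixture a few centuries old still run to several centimorgans, while after 3,000 to 4,000 years they are typically shorter than one centimorgan, which is the sense in which an old hybrid stops looking like a hybrid.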

Imagine how much more difficult the task would have been if the ancestral components were much closer! I suspect that’s what’s going on in Europe. I’m not privy to any big secrets, but I have heard whispers of research groups using Sardinians as a “pure” outgroup to model the changing demographics of Europe since the arrival of agriculture. What David Reich stated at the conference was not particularly surprising to me in light of that possibility. Sardinia regularly pops out as a weird outlier in many analyses. One simple possibility here is that that’s simply a function of the fact that it’s an island, and therefore has diverged from mainland populations due to isolation from conventional village-to-village mate exchange. Another possibility, mooted by Dienekes, is that it may be a repository of European genetic variation from earlier periods, relatively unaffected by later perturbations due to demographic changes. The main reason that I can give some credit to Dienekes’ thesis has less to do with Sardinians than Basques. The French Basques in the HGDP are less atypical than the Sardinians, but in some runs they do lack a component which is most obviously classed as “West Asian,” and which other French have. In Dienekes’ own runs with a diverse array of Iberian populations this same distinction emerges.

All of this reminds us that clustering methods give us great insights into how populations are related to each other, but they don’t tell us about the details of how that relatedness came to be. It makes a great difference if an element is the outcome of relatively recent (<10,000 years) hybridization events, as opposed to having deeper roots. For example, admixture between Polynesians and Melanesians brings together two components, whatever their own prior origins, diverged on the order of 50,000 years before the present. And yet if the two groups mentioned earlier are correct, then the Melanesian component itself must be decomposed into two fractions, one of which is much closer to the Polynesians than the other, and our understanding of the past changes.

As I implied earlier today, I think the era of wild hypothesis generation in the area of the settling of Europe over the last 10,000 years is coming to an end. The combination of more powerful analytic techniques and the emergence of ancient DNA samples with which to calibrate, peg, and check inferences from those techniques will probably clarify our understanding of the past to a great extent.

Image credit: yomi955

