Razib Khan One-stop-shopping for all of my content

December 18, 2012

Buddy, can you spare some ascertainment?

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility of and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only a bit more than 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above, Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra markers may not give you much bang for the buck (not to mention the biases that this may introduce in your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and they look rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest, as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. I have also found 100,000 to 200,000 SNPs sufficient to elucidate relationships even in genetically homogeneous regions such as Europe (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.
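To make the point about merging concrete, here is a minimal sketch of why a merged data set shrinks: only SNPs present on both panels, and passing quality control in both, survive. The SNP IDs, missing-call rates, and the 5% threshold are all made up for illustration, and `merge_panels` is a hypothetical helper, not any particular tool's behavior.

```python
def merge_panels(panel_a, panel_b, max_missing=0.05):
    """Return the sorted SNP IDs shared by both panels that pass QC in each.

    Each panel is a dict mapping SNP ID -> missing-call rate (illustrative).
    """
    shared = panel_a.keys() & panel_b.keys()  # SNPs typed on both chips
    return sorted(
        snp for snp in shared
        if panel_a[snp] <= max_missing and panel_b[snp] <= max_missing
    )


# Toy panels: hypothetical SNP IDs and missing-call rates.
panel_a = {"rs1": 0.01, "rs2": 0.10, "rs3": 0.00, "rs4": 0.02}
panel_b = {"rs1": 0.02, "rs2": 0.01, "rs3": 0.03, "rs5": 0.00}

merged = merge_panels(panel_a, panel_b)
print(len(panel_a), len(panel_b), len(merged))
```

Each panel starts with four SNPs, but the merged set is smaller than either, because one SNP is unique to each chip and one shared SNP fails the missingness filter.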

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs, its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many-fold greater. On the other hand, Geno 2.0 may be more useful in the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

July 25, 2012

The Genographic Project: on to the autosome!

Filed under: Geno 2.0,Genomics — Razib Khan @ 10:20 am

The Genographic Project is now moving beyond uniparental lineages with Geno 2.0. Spencer Wells kindly invited me to a conference call last month where he outlined a lot of the details, so I’ll hit the salient points for readers of this weblog:


* They’re unveiling a new SNP-chip and a new project which moves beyond the Y and mtDNA to the autosome. But they’re also expanding their coverage of uniparental markers.

* Though there are “only” 130,000 autosomal markers, Wells and his collaborators have selected a subset of markers which are highly informative of population structure (e.g., high Fst). Their SNPs are biased toward those with moderate levels of polymorphism across many populations to maximize the power to diagnose differentiation.

* They tried really hard to get rid of ascertainment bias. Many previous chips tend to work off the polymorphism found in Europeans, and then examine worldwide variation using this ruler. The problems with this method are obvious. One of the scientists on this project outlined how they worked to look for SNPs which are very informative for populations where ascertainment bias is a particular problem, Oceanians and Amerindians. I was impressed ...
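For readers who want a feel for what “high Fst” means in the marker-selection point above, here is a minimal sketch. It uses the textbook Wright's Fst for a single biallelic SNP, computed from per-population allele frequencies with equal population weights. This is an assumption for illustration only, not the selection procedure the Genographic team used, and the SNP names and frequencies are invented.

```python
def wright_fst(freqs):
    """Wright's Fst for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally (a simplifying assumption).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    h_within = sum(2 * p * (1 - p) for p in freqs) / n  # mean within-population
    return (h_total - h_within) / h_total


# Toy data: hypothetical SNPs with allele frequencies in three populations.
snps = {
    "snp_a": [0.50, 0.52, 0.48],  # nearly identical everywhere
    "snp_b": [0.10, 0.50, 0.90],  # graded difference
    "snp_c": [0.05, 0.05, 0.95],  # one population stands apart
}

# Rank SNPs from most to least informative about population structure.
ranked = sorted(snps, key=lambda s: wright_fst(snps[s]), reverse=True)
for name in ranked:
    print(name, round(wright_fst(snps[name]), 3))
```

A SNP whose frequency barely differs between populations scores near zero, while one that is nearly fixed for different alleles in different populations scores close to one. A chip built from the top of such a ranking packs more population-structure signal into fewer markers.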
