Razib Khan One-stop-shopping for all of my content

September 14, 2011

Poll on personal genomics

Filed under: DTC personal genomics,Genomes Unzipped,Personal genomics — Razib Khan @ 9:08 pm

Genomes Unzipped points me to a Nature survey on personal genomics for scientific researchers. With price points down to $200 or so, many scientists have at least been genotyped, though it varies by domain. Many molecular biologists seem intrigued by the novelty of personal genotyping services. In contrast, in a room of a dozen or so population geneticists and the like, nearly half are liable to have already gone through some service.

All that being said, I haven’t heard from people who want to make their genotype public in a long, long time. Has the steam run out of that project? You might hear from me again with a subtle twist on this in the near future.

February 10, 2011

My genetic odyssey

Filed under: Genetics,Genomes Unzipped,Personal genomics — Razib Khan @ 9:49 am

I have a guest post at Genomes Unzipped, summarizing what I’ve found via ancestry analysis over the past 6 months with the results from 23andMe. It is in many ways a brief overview of the detailed posts which you’ve seen in this space.

December 15, 2010

“Genome blogging”

Nature profiles Dodecad, the Pickrell Affair, and the emergence of amateur genomicists in a new piece. Interestingly, David of BGA is going to try to get something through peer review, specifically on the relationship of Assyrians and Jews.

So we have Genomes Unzipped, Dodecad, and BGA. What next? Who next? I hope Dienekes doesn’t mind if I divulge the fact that the computational resources needed to utilize ADMIXTURE as he has are within the theoretical capability of everyone reading this post. Rather, the key is getting familiar with PLINK and writing some code to merge data sets. After you do that, to really add value you’d probably want to get raw data from more than what you can find in the HGDP, HapMap and other public resources.
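To give a sense of what "writing some code to merge data sets" involves, here is a minimal sketch of the core idea: keep only the SNPs that both data sets share, then pool the samples. The SNP ids, sample names, genotypes and the `merge_on_shared_snps` helper are all hypothetical, made up for illustration. A real merge would go through PLINK and would also have to reconcile strand and allele coding between sources, which this toy ignores.

```python
def merge_on_shared_snps(dataset_a, dataset_b):
    """Merge two genotype tables, keeping only SNPs present in both.

    Each dataset maps SNP id -> {sample id -> genotype string}.
    """
    shared = sorted(set(dataset_a) & set(dataset_b))
    merged = {}
    for snp in shared:
        combined = dict(dataset_a[snp])   # copy so the inputs are untouched
        combined.update(dataset_b[snp])   # add the second data set's samples
        merged[snp] = combined
    return merged


# Toy data: hypothetical SNP ids, sample names, and genotypes.
public_panel = {
    "rs0001": {"ref_1": "AA", "ref_2": "AG"},
    "rs0002": {"ref_1": "CC", "ref_2": "CT"},
    "rs0003": {"ref_1": "GG", "ref_2": "GG"},
}
personal_file = {
    "rs0002": {"me": "TT"},
    "rs0003": {"me": "GA"},
    "rs0004": {"me": "CC"},
}

merged = merge_on_shared_snps(public_panel, personal_file)
print(sorted(merged))  # only the SNPs typed in both sources survive
```

The practical consequence is that every extra source tends to shrink the usable marker set, since different chips type different SNPs.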

But here I make an open offer: if you start a blog or a project which replicates the methods of Dodecad and BGA I’ll link to you and promote you. When Dienekes began Dodecad I actually started to play around with the data sets in ADMIXTURE, but I’ve personally held off until seeing what he and David find. What their pitfalls and successes might be. Here’s to 2011 being more interesting than we can imagine!

Update: Already had a friend with a computational background contact me about doing something on South Asian genomics. So again: if you get a site/blog set up, and start pumping out plots, I will promote you. In particular, if you need 23andMe raw data files of geographical region X it might be useful to try and get the word out via blogs and what not.

November 23, 2010

Eurogenes 500K SNP BioGeographicAncestry Project

Filed under: Admixture,Ancestry,BGA,Genetics,Genomes Unzipped,Genomics — Razib Khan @ 12:11 am

Since I have been promoting the Dodecad Ancestry Project, it seems only fair to bring to your attention Eurogenes 500K SNP BioGeographicAncestry Project. The sample populations are a bit different from Dodecad, but again ADMIXTURE is the primary tool. But the author also makes recourse to other methodologies to explore more than simply population level variation. For example, his most recent post is Locating and visualizing minority non-European admixtures across our genomes:

Imagine, for example, a white American carrying a couple of tiny segments of West African origin, from an ancestor who lived 250 years ago, and an eastern Finn with no Asian ancestors in the last 4000 years or more. If we run an inter-continental ADMIXTURE analysis with these two, it’s very likely the American will score 100% European, while the eastern Finn will probably come out around 9% North and East Asian due to really old Uralic influence.

That sort of thing isn’t a huge problem when comparing the genetic structure of populations. Obviously, overall, eastern Finns rather than white Americans are genetically closer to North Asians, and that’s basically what ADMIXTURE picks up. However, if the focus is also on individuals, this certainly can become an issue. Our hypothetical American might be aware of that African ancestor, with solid paperwork backing up their genealogical connection, but he’s pulling his hair out because nothing’s showing up via genetic tests.

So let’s take a look at a real life example of how RHHcounter can pick up segments of potentially recent Sub-Saharan African origin…

Olivia Munn & Uyghur woman

The basic issue here is that in terms of genomic variation old admixture looks different from new admixture. Someone who is a first generation Eurasian, with a Chinese and European parent, may be about the same ancestral mix proportionally as a Uyghur. They would resemble a Uyghur on STRUCTURE and be placed within that cluster on a PCA chart (this is what happens in 23andMe). But the Uyghur “Eastern” and “Western” genetic heritage has been reshuffled to a great extent by recombination over the past 1,000-2,000 years. In contrast, a first generation Eurasian will have huge swaths of their genome which are Eastern or Western on alternating strands (from their respective parents). In population genetic language, a group of first generation hybrids would exhibit a lot of linkage disequilibrium (LD). In a panmictic hybrid population LD will decay due to recombination, which breaks apart the distinctive allelic associations inherited from the parental populations.
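The decay of LD can be put in numbers with the textbook result for a randomly mating population: if two markers recombine with probability r per generation, the association between them shrinks by a factor of (1 - r) each generation. The sketch below applies that formula. The starting value of 0.25, the recombination fraction of 0.01, and the generation counts are illustrative assumptions, not estimates for Uyghurs or anyone else.

```python
def ld_after(d0, r, generations):
    """LD remaining after some generations of random mating.

    d0: initial LD between two markers (assumed value)
    r: recombination fraction between them per generation
    """
    return d0 * (1.0 - r) ** generations


# Illustrative inputs only: markers with r = 0.01, starting LD of 0.25.
for t in (1, 3, 40, 80):
    print(t, round(ld_after(0.25, 0.01, t), 4))
```

At an assumed 25 years per generation, 1,000-2,000 years is roughly 40 to 80 generations, which is why an old mixed population and a first generation hybrid can share the same proportions and still look very different along the chromosome.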

This is the key to differentiating between the old “Asian” ancestry which sometimes falls out of the genetic variation of Finns at low frequencies, and more recent Asian ancestry. For example, the paleoanthropologist Vance Haynes is apparently a great-grandson of one of the original “Siamese Twins,” Chang Bunker. Chang Bunker was a Chinese Thai, so presumably Vance Haynes would come out to be ~10% Asian, and would be shifted toward the Asian cluster in relation to other Europeans. On the other hand, a closer look at his genome would indicate differences from a Turk who was ~10% Asian, because Vance Haynes’ Asian ancestry has only had three generations for recombination to break apart the original allelic associations which were passed down from Chang Bunker. After only these few generations the genome would still show many segments of clustered ancestry with distinctive sets of markers characteristic of Han Chinese.
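The "~10%" figure follows from simple halving: each generation passes on, in expectation, half of an ancestor's contribution. A quick check, using a hypothetical helper name:

```python
def expected_ancestry_fraction(generations_back):
    """Expected share of the genome inherited from one ancestor."""
    return 0.5 ** generations_back


# A great-grandparent is three generations back.
print(expected_ancestry_fraction(3))
```

This is only the expectation. Because DNA is inherited in large blocks, the realized share for any one great-grandchild can land somewhat above or below it.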

Let’s make this more concrete. Below are two “ancestry paintings” from 23andMe. One is of a reference example, a Uyghur woman, and another is of a Eurasian individual. The difference is pretty obvious:


For the record, 23andMe says that the Eurasian man is 50% Asian, 50% European. For the Uyghur woman, 52% European, 48% Asian. As I indicated above, Eurasian individuals who are projected onto the variation of the HGDP sample tend to cluster with the Uyghurs. In the image to the left the black mark indicates the Eurasian man. The Uyghurs are green. The purple rectangles are Hazaras.

But obviously this is a trivial example. What’s the point of sniffing around for non-European ancestry in individuals whose non-European ancestry is 1) visible, and 2) recent and immediate? No, the bigger question here concerns claims and suggestions by some white Americans that they have significant non-European ancestry. Usually this is Native American. But in the case of one of the European-origin samples which “Polako” (the principal behind the BGA Project) analyzed it seems there is a suggestion of West African ancestry.

This individual is Dr. Don Conrad of Genomes Unzipped. In particular, Polako found that there were two nearby segments on two chromosomes which exhibited a pattern of population atypical heterozygosity in Dr. Don Conrad’s genome. Look at chromosomes 7 and 13. Contrast the pattern with my distant paternal cousin, Dr. Daniel MacArthur. He also exhibits points of heterozygosity, but they’re randomly distributed across the genome. It’s old admixture or just noise.

Polako doesn’t make much of Dr. Don Conrad’s results, and neither do I (presumably as Dr. Don Conrad is a member of Genomes Unzipped it’s easy to talk about his results without any of the ethical or moral hassles about confidentiality). On the other hand, unlike Dr. Dan MacArthur, a little utilization of the powers of the interwebs indicates that Dr. Don Conrad is an American. In particular, of recent Midwestern background. Though I’m not a total creep, so I didn’t start poking around Ancestry.com. But after the Pickrell affair I am probably just a touch more hesitant to laugh off peculiar results from these sorts of analyses as simply algorithms-gone-meshugana.

Image Credit: Colegota

November 11, 2010

The layers and fault-lines of genes


At Genomes Unzipped Luke Jostins elaborates on how the genetic facts he now has about his paternal lineage change how he views his own personal history:

… my father’s father is Latvian, and the N1 haplogroup is not rare in the Baltic regions. In fact, the subgroup, N1c1, is more common in parts of Eastern Europe than it is in Asia.

Initially, this seemed to play nicely into a part of our ancient family history. There is a folk history, relayed to me by my Dad and my uncle Johnny, that Jostins blood may contain traces of Mongolian. The justification for this is that in around 1260, just before the civil war caused the Mongol Empire to die back in Europe, the Empire extended all the way to the Baltic States. It was at this point, my fellow N1c1-bearers hypothesise, that Mongolian DNA entered the Jostins line.

Unfortunately on closer inspection this tale is not really supported by the DNA evidence. The famous Mongol Expansion haplogroup is actually C3, which is the modal haplogroup of Mongolians. In contrast, N1c1 has existed in Europe for thousands of years, and is far too old and too wide-spread to represent a recent expansion.

To the left is a frequency map of the concentration of N1c1. Based on the current distribution, and the diversity being modal in the East Baltic, one has to be skeptical of a simple east-west model. Interestingly the frequency difference of this haplogroup between Finland and Sweden is very high. Also, a branch of N1c1 seems to be found among the Rurikids of Russia. This was the ruling dynasty of the Rus, a people who originally seem to have been ethnic Scandinavians from Sweden. Eventually they ruled over a polyglot state of Finns, Slavs and Scandinavians, and submerged their own identity with that of the Slavic peasants. In this they followed the example of the Bulgars, who were ethnically distinctive from their Slavic subjects, but were totally absorbed excepting that their ethnonym persisted. There is some evidence that the Serbs are a similar case, an Iranian group which was eventually absorbed into the South Slav substrate.

Going back to northern Europe, let’s try to get some more perspective. Luke Jostins’ personal history is after all a slice of population history, and what we know about the background of the population impacts how Luke views his own personal history. To do that I thought I’d quickly poke around a few older papers on Baltic genetics which I had stashed away. It didn’t turn out to be so quick. But here are some figures. First, from Genome-Wide Analysis of Single Nucleotide Polymorphisms Uncovers Population Structure in Northern Europe:


From Genetic Structure of Europeans: A View from the North–East:


Finally, from Migration Waves to the Baltic Sea Region (N3 = N1c1):


Also see my recent posts on Northern European genetics, as well as the argument about agriculturalists vs. farmers. Ten years ago we had a few simple models, but now it gets more confusing and complicated. Confounders:

- Different reproductive skew parameters for males and females. In short, high fertility of “super-males” as well as dominance of patrilocality can produce different patterns in Y and mtDNA

- Selection on mtDNA. The markers which we think of as “neutral” may not be neutral

- Poor correspondence between inferences of the past based on contemporary patterns of variation and what ancient DNA has discovered. Our assumptions are faulty, or we’re just too stupid to extract the real patterns

- Persistent problems with dating and typing some uniparental lineages. Consider the debate over the pan-Eurasian haplogroup R1a1a* (Dan MacArthur and I both carry this Y lineage, but what’s in a few letters?)

- Reality is complicated. This may be the most intractable issue over the long term

I have used the analogy of a palimpsest to describe the flow of genetic variation over time and space. I think that perhaps that is misleading in some fundamental ways. Demographic patterns are characterized by different dynamics, persistent and long standing “flows,” as well as punctuated “explosions.” Rather than a palimpsest, a better analogy might be the layering of geological strata. Although there are long periods of gentle wearing and layering, volcanism and earthquakes periodically erupt to disrupt the smooth accumulations. Sequences of catastrophic events can produce inversions.

Consider three dynamics:

- Isolation-by-distance. This is the conventional band/village-to-band/village process of gene flow. This may be analogized to sedimentary accumulation (mutations) and erosion (drift)

- Demic diffusion. The rapid demographic expansion into virgin territory by a culture which introduces a more efficient mode of production. One of the most recent occurrences of this was the rapid multiplication of New England Puritans from ~30,000 circa 1640 to over 700,000 150 years later. Not only did these New Englanders “fill up” their home territory, in the early years of the republic they burst out of the northeast and populated many regions of the Great Lakes. Demic diffusion is like an earthquake, a rapid and ordered shift of the local geology

- The leap frog. The settlement of Europeans in the southern cone of Latin America or Australia, or of Mongols in eastern Iran, are instances of leap frogs. We have clear textual evidence of these leap frogs, but without that we wouldn’t know what to make of them. Leap frogs are like volcanic eruptions, reordering the layers beneath and also depositing new material from above

At least with Luke’s hypothesis about descent from Rurik he can test his own N1c1 profile against other Rurikids. Presumably the modal haplotype and its near relations are those of the original Rurik.

October 12, 2010

The naked geneticists

Filed under: Biology,Genomes Unzipped,Genomics,Personal genomics — Razib Khan @ 1:44 am

John Hawks, Genomes unzipped, unzipped:

What I wonder is, how much will personal genomics be like nude beaches? I mean, it’s been a long time since the first nude beaches, but most people don’t take advantage of the opportunity. Clearly, there’s variation in different countries! But most people neither feel compelled to see others’ data nor feel comfortable sharing their own.

Well, they used the word unzipped, not me!

Obviously John had his tongue firmly planted in cheek, but I have wondered about this. How deep is the impact of personal genomics going to be for individuals? If a person gets their genome sequenced and has a list of odds ratios in front of them, are they going to bone up on the statistical genetic subtleties behind the face-value numbers?

That is where genetic counselors come in. The necessity of interpretative experts highlights the difference between nude beaches and personal genomics: personal genomics has more potential societal impact. I know of the nudist/naturist phenomenon only tangentially, but it strikes me as similar to the broader New Age health movement. The focus is on individual health returns. A colon cleanse simply does not have much of a broader social effect. Yes, lest my nudist readers strike me down, I do understand that there are purported positive social externalities, but set next to personal genomics nudism still strikes me as a fundamentally more individual activity whose benefits redound to the naked individuals, and not the broader clothed society. It does not pick my pocket nor break my leg if my neighbor is a weekend nudist. It is of no concern of mine (in contrast, my experience with public nudity is that it is generally disruptive when unexpected).

I conceive of the social returns to personal genomics, as a function of the proportion of the population utilizing it, as being defined by an s-curve. When only a few people have been genotyped your understanding of population-wide variation is still spotty. But as you increase your coverage you get a better sense of the variance within the population…but soon enough you enter the phase of diminishing returns.
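One way to make the s-curve concrete is a logistic function. This is only a sketch of the shape being described: the `social_return` name, the midpoint of 0.3 and the steepness of 12 are arbitrary assumptions chosen to draw a curve, not values fitted to any data.

```python
import math


def social_return(proportion, midpoint=0.3, steepness=12.0):
    """Logistic s-curve: social return (0 to 1) vs. share genotyped.

    midpoint and steepness are hypothetical shape parameters.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (proportion - midpoint)))


def marginal_gain(low, high):
    """Extra return from raising coverage from low to high."""
    return social_return(high) - social_return(low)


# Same ten-point increase in coverage, at three stages of adoption.
for low in (0.0, 0.25, 0.85):
    print(low, round(marginal_gain(low, low + 0.10), 4))
```

The same ten-point increase in coverage buys little at the very start, a lot in the middle, and almost nothing once most of the population is covered, which is the diminishing-returns phase.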

Here’s a concrete example. In Reconstructing Indian History Reich et al. indicated that it is likely that South Asian castes are endogamous groups which will carry their own recessive risk alleles. In other words, Kayasthas from Bengal may have their own suite of recessive diseases, while Nadars from Tamil Nadu may have a totally different set of risk alleles. In a world with infinite NIH funding there would be studies of Kayasthas and Nadars, and doctors and genetic counselors would be aware of what to look for in each group. We don’t live in a world with infinite NIH funding. Let’s assume that 1% of Indian Americans are Kayasthas from Bengal. That’s 30,000 people. If 5% of them make extensive use of personal genomics, then you have 1,500 people with a deep individual knowledge of their personal genomic profile, as well as a set of possible diagnoses or suites of symptoms.

We’re still at the individual level. How does this matter on a social scale? Because with modern technology people can form communities online and get a sense of the nature of things from the “bottom-up.” Granted, the information gleaned isn’t going to go through peer review, and “irrational herds” can no doubt emerge. The bigger point is that the sum can become more than the parts as motivated individuals pool information in a coordinated fashion. Once there’s some general insight extracted then that will flow to the rest of the group because of interpersonal networks.

This already happens with genealogy. A few highly motivated individuals dig deep into the archives to learn about their own personal history, and once they’ve retrieved the information they freely distribute what they’ve found to their relatives. From one perspective you could say that others are “free riding” on the passion and labor of a few, but you could also characterize this as a spillover effect or positive externality. The consequences of personal genomics are arguably much more substantive than traditional genealogy because of their potential health import.

Note that I’m emphasizing the social good here. Your sample size only needs to be so large to get a good sense of population-wide dynamics. More prosaically, as I noted before the Genomes Unzipped bloggers have opened a window into the genetics of their extended families. If Dienekes’ analysis is correct Joseph Pickrell and Vincent Plagnol likely have half-Jewish parents. Not that there’s anything wrong with that, though some people are still somewhat reluctant to acknowledge their Jewish heritage even today.

I assume that there will be individual utility as whole populations are sequenced over time. There’s more you can potentially learn by getting yourself sequenced even after ethnic-group or family level risks are ascertained (e.g., what are your distinctive alleles which are de novo mutations?). But this would simply be a classic increase of well being through summing of the parts. And here the analogy to the nude beach would be valid.

Image Credit: Pradeur
