Razib Khan: One-stop-shopping for all of my content

July 11, 2017

23andMe ancestry only is $49.99 for Prime Day

Filed under: 23andMe,D.T.C. Personal Genomics,Personal genomics — Razib Khan @ 11:10 am


23andMe has gone below $50 for “Prime Day”! For those of us who bought kits (albeit more fully featured ones) at $399 or even more this is pretty incredible. But from what I understand these sorts of SNP-chips can now be purchased from Illumina for well under $50, so this isn’t an act of charity.

At minimum it’s a way to get a raw genotype you can bank for later.

June 27, 2017

Genome sequencing for the people is near

Filed under: Genomics,Personal genomics — Razib Khan @ 7:22 am

When I first began writing on the internet genomics was an exciting field of science. Somewhat abstruse, but newly relevant and well known due to the completion of the draft of the human genome. Today it’s totally different. Genomics is ubiquitous. Instead of a novel field of science, it is transitioning into a personal technology.

But life comes at you fast. For all practical purposes the $1,000 genome is here.

And yet we haven’t seen a wholesale change in medicine. What happened? Obviously a major part of it is polygenicity of disease. Not to mention that a lot of illness will always have a random aspect. People who get back a “clean” genome and live a “healthy” life will still get cancer.

Another issue is a chicken-and-egg problem. When a large proportion of the population is sequenced and phenotyped we’ll probably discover actionable patterns. But until that moment the yield is not going to be too impressive.

Consider this piece in MIT Technology Review, DNA Testing Reveals the Chance of Bad News in Your Genes:

Out of 50 healthy adults [selected from a random 100] who had their genomes sequenced, 11—or 22 percent—discovered they had genetic variants in one of nearly 5,000 genes associated with rare inherited diseases. One surprise is that most of them had no symptoms at all. Two volunteers had genetic variants known to cause heart rhythm abnormalities, but their cardiology tests were normal.

There’s another possible consequence of people having their genome sequenced. For participants enrolled in the study, health-care costs rose an average of $350 per person compared with a control group in the six months after they received their test results. The authors don’t know whether those costs were directly related to the sequencing, but Vassy says it’s reasonable to think people might schedule follow-up appointments or get more testing on the basis of their results.

Researchers worry about this problem of increased costs. It’s not a trivial problem, and one that medicine doesn’t have a response to, as patients often find a way to follow up on likely false positives. But it seems that this is a phase we’ll have to go through. I see no chance that a substantial proportion of the American population in the 2020s will not be sequenced.

June 12, 2017

10 million DTC dense marker genotypes by end of 2017?


Today I got an email from 23andMe saying that they’d hit the 2 million customer mark. Since reaching their goal of 1 million kits sold the company seems to have taken its foot off the pedal of customer base growth to focus on other things (in particular, how to get phenotypic data from those who have been genotyped). In contrast, Ancestry has been growing at a faster rate of late. After talking to Spencer Wells (who was there at the birth of this sector) we estimated that the direct-to-consumer genotyping kit business is now north of 5 million individuals served. Probably closer to 6 or 7 million, depending on the numbers you assume for the various companies (I’m counting autosomal tests only).

This is pretty awesome. Each of these firms genotypes customers at somewhere in the range of 100,000 to 1 million variant markers, or single nucleotide polymorphisms. 20 years ago this would have been an incredible achievement, but today we’re all excited about long-read sequencing from Oxford Nanopore. SNP-chips are almost ho-hum.

But though sequencing is the cutting edge, the final frontier and terminal technology of reading your DNA code, genotyping in humans will be around for a while because of cost. At ASHG last year a medical geneticist was claiming that bulk price points for high density SNP-chips are in the range of the low tens of dollars per unit. A good high coverage genome sequence is still many times more expensive (perhaps an order of magnitude or more depending on who you believe). It can also impose more data processing costs than a SNP-chip, in my experience.

Here’s a slide from Spencer:

I suspect genotyping will go S-shaped before 2025, after a period of explosive growth. Some people will opt out. A minority of the population, but a substantial proportion. At the other extreme of the preference distribution you will have those who will start getting sequenced. Researchers will begin to talk about genotyping platforms the way they talk about microarrays (yes, I know at places like the Broad they already talk about genotyping like that, but we can’t all be like the Broad!).

Here’s an article from 2007 on 23andMe in Wired. They were excited about paying $1,000 for genotyping services…the cost now of the cheapest high quality (30x) whole genome sequences. Though 23andMe has a higher price point for its medical services, many of the companies are pushing their genotyping+ancestry below $100, a price point at which the product had stabilized for a few years. Family Tree DNA has a Father’s Day sale for $69 right now. Ancestry looks to be $79. The Israeli company MyHeritage is also pushing a $69 sale price (the CSO there is advertising that he’s hiring human geneticists, just so you know). It seems very likely that a $50 price point is within sight in the next few years as SNP-chip costs become trivial and all the expense shifts to data storage/processing and visualization. I think psychologically for many people paying $50 is not cheap, but it is definitely not expensive. $100 feels expensive.

Ultimately I do wonder if I was a bit too optimistic that 50% of the US population will be sequenced at 30x by 2025. But the dynamic is quite likely to change rapidly because of a technological shift as the sector goes through a productivity uptick. We’re talking about exponential growth, which humans have weak intuition about….

Addendum: Go into the archives of Genomes Unzipped and read the older posts. Those guys knew where we were heading…and we’re pretty much there.

April 7, 2017

Direct-to-consumer genomics, it’s back on!

Filed under: 23andMe,DTC,Genetics,Personal genomics — Razib Khan @ 8:11 am

For the past three and a half years, and arguably longer, there has been something of a dark night passing over direct-to-consumer (DTC) personal genomics. The regulatory situation has ranged from unclear to unfavorable. If you have read this blog you know 23andMe‘s saga with the Food and Drug Administration.

It looks like in 2017 DTC is finally turning a regulatory corner, with some clarity and freedom to operate. FDA Opens Genetic Floodgates with 23andMe Decision:

Today, the U.S. Food and Drug Administration told gene-testing company 23andMe that it will be allowed to directly tell consumers whether their DNA puts them at higher risk for 10 different diseases, including late-onset Alzheimer’s disease and Parkinson’s.

The decision to allow these direct-to-consumer tests is a big vindication for 23andMe, which in 2013 was forced to cease marketing such results after the FDA said they could be inaccurate and risky to consumers, and that they required regulatory approval.

I still agree with my assessment in 2013: this won’t mean anything in the long run. DTC is here to stay, and if the decentralization of medical testing and services doesn’t happen in the USA, it’ll happen elsewhere, and at some point medical tourism will get cheap enough that any restrictions in this nation won’t be of relevance. But this particular decision alters the timeline in the grand scheme of things, and matters a great deal for specific players.

It’s on!

March 23, 2017

Ancestry inference won’t tell you things you don’t care about (but could)

Filed under: Anthroplogy,Genetics,Genomics,Personal genomics — Razib Khan @ 5:59 pm

The figure above is from Noah Rosenberg’s relatively famous paper, Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. The context of the publication is that it was one of the first prominent attempts to use genome-wide data on a variety of human populations (specifically, from the HGDP data set) and attempt model-based clustering. There are many details of the model, but the one that will jump out at you here is that the parameter K defines the number of putative ancestral populations you are hypothesizing. Individuals then shake out as proportions of each of the K elements. Remember, this is a model in a computer, and you select the parameters and the data. The output is not “wrong,” it’s just the output based on how you set up the program and the data you input yourself.

These sorts of computational frameworks are innocent, and may give strange results if you want to engage in mischief. For example, let’s say that you put in 200 individuals, of whom 95 are Chinese, 95 are Swedish, and 10 are Nigerian. From a variety of disciplines we know that, to a first approximation, non-Africans form a monophyletic clade in relation to Africans. In plain English, all non-Africans descend from a group of people who diverged from Africans more than 50,000 years ago. That means that if you imagine two ancestral populations, the first division should be between Africans and non-Africans, to reflect this historical demography. But if you skew the sample sizes, the program, which looks for the maximal amount of variation in the data set, may decide that a division between Chinese and Swedes as the two ancestral populations is the most likely model given the data.

This is not wrong as such. As the number of Africans in the data converges on zero, obviously the dividing line is between Swedes and Chinese. If you overload particular populations within the data, you may marginalize the variation you’re trying to explore, and the history you’re trying to uncover.
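To make the sampling issue concrete, here is a toy simulation in Python. It is a minimal sketch, not anything a company actually runs: the Balding-Nichols drift values are invented round numbers chosen only so that the African/non-African split is the deepest one, and PCA stands in for model-based clustering, since both respond to how much of the total variation each axis of the data explains.

```python
import numpy as np

rng = np.random.default_rng(42)

def drift(p, F):
    """Drifted allele frequencies from ancestral frequencies p
    (Balding-Nichols model with drift parameter F)."""
    return rng.beta(p * (1 - F) / F, (1 - p) * (1 - F) / F)

def simulate(n_swede, n_chinese, n_nigerian, n_snps=5000):
    p_root = rng.uniform(0.05, 0.95, n_snps)   # ancestral frequencies
    p_nigerian = drift(p_root, 0.03)           # modest drift on the African branch
    p_ooa = drift(p_root, 0.12)                # out-of-Africa bottleneck drift
    p_swede, p_chinese = drift(p_ooa, 0.05), drift(p_ooa, 0.05)

    def geno(freqs, n):
        return rng.binomial(2, freqs, size=(n, n_snps)).astype(float)

    G = np.vstack([geno(p_swede, n_swede),
                   geno(p_chinese, n_chinese),
                   geno(p_nigerian, n_nigerian)])
    labels = np.array(["Swedish"] * n_swede + ["Chinese"] * n_chinese
                      + ["Nigerian"] * n_nigerian)
    return G, labels

def pc1_means(G, labels):
    X = G - G.mean(axis=0)                     # center each SNP
    pc1 = X @ np.linalg.svd(X, full_matrices=False)[2][0]
    return {pop: round(float(pc1[labels == pop].mean()), 1)
            for pop in ("Swedish", "Chinese", "Nigerian")}

# Balanced sampling: the deepest split (African vs. non-African) dominates
# the leading axis, so the Nigerian sample sits at one extreme.
print("balanced 60/60/60:", pc1_means(*simulate(60, 60, 60)))

# Skewed sampling (95/95/10): the Swedish vs. Chinese contrast now accounts
# for more of the total variance, so it tends to take over the leading axis.
print("skewed   95/95/10:", pc1_means(*simulate(95, 95, 10)))
```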

I’ve written all of this before. But I’m writing this in the context of the earlier post, Ancestry Inference Is Precise And Accurate(Ish). In that post I showed that consumers drive genomics firms to provide results where the grain of resolution and inference varies a lot as a function of space. That is, there is a demand that Northern Europe be divided very finely, while vast swaths of non-European continents are combined into one broad cluster.

Less than 5% Ancient North Eurasian

Another aspect though is time. These model-based admixture frameworks can implicitly traverse time as one moves up and down in the number of K‘s. It is always important to explain to people that a given set of K clusters may not correspond to real populations which all existed at the same time. Rather, they’re just explanatory instruments which illustrate phylogenetic distance between individuals. In a well-balanced data set for humans K = 2 usually separates Africans from non-Africans, and K = 3 then separates West Eurasians from other populations. Going across K‘s it is easy to imagine that one is traversing successive bifurcations.

A racially mixed man, 15% ANE, 30% CHG, 30% WHG, 30% EEF

But today we know that it’s more complicated than that. Three years ago Pickrell et al. published Toward a new history and geography of human genes informed by ancient DNA, where they report the result that more powerful methods and data imply most human populations are relatively recent admixtures between extremely diverged lineages. What this means is that the origin of groups like Europeans and South Asians is very much like the origin of the mixed populations of the New World. Since then this insight has become only more powerful, as ancient DNA has shed light on massive population turnovers over the last 5,000 to 10,000 years.

These are to some extent revolutionary ideas, not well known even among the science press (which is too busy doing real journalism, i.e. the art of insinuation rather than illumination). As I indicated earlier, direct-to-consumer genomics firms use national identities in their cluster labels because these are comprehensible to people. Similarly, they can’t very well tell Northern Europeans that they are an outcome of a successive series of admixtures between diverged lineages from the late Pleistocene down to the Bronze Age. Though Northern Europeans, like South Asians, Middle Easterners, Amerindians, and likely Sub-Saharan Africans and East Asians, are complex mixes between disparate branches of humanity, today we view them as indivisible units of understanding, to make sense of the patterns we see around us.

Personal genomics firms therefore frame their output in historically comprehensible terms. As a trivial example, the genomic data makes it rather clear that Ashkenazi Jews emerged in the last few thousand years via a process of admixture between antique Near Eastern Jews and the peoples of Western Europe. After the initial admixture this group became an endogamous population, so that most Ashkenazi Jews share many common ancestors in the recent past with other Ashkenazi Jews. This is ideal for the clustering programs above, as Ashkenazi Jews almost always resolve into their own cluster with ease. Assuming there are enough Ashkenazi Jews in your data set you will always be able to find the “Jewish cluster” as you increase the value of K.

But the selection of a K which satisfies this comprehensibility criterion is a matter of convenience, not necessity. Most people are vaguely aware that Jews emerged as a people at a particular point in history. In the case of Ashkenazi Jews they emerged rather late in history. At certain K‘s Ashkenazi Jews exhibit mixed ancestral profiles, placing them between Europeans and Middle Eastern peoples. What this reflects is the earlier history of the ancestors of Ashkenazi Jews. But for most personal genomics companies this earlier history is not something that they want to address, because it doesn’t fit into the narrative that their particular consumers want to hear. People want to know if they are part-Jewish, not that they are part antique Middle Eastern and Southwest European.

Perplexity of course is not just for non-scientists. When Joe Pickrell’s TreeMix paper came out five years ago there was a strange signal of gene flow between Northern Europeans and Native Americans. There was no obvious explanation at the time…but now we know what was going on.

It turns out that Northern Europeans and Native Americans share common ancestry from Pleistocene Siberians. The relationship between Europeans and Native Americans has long been hinted at in results from other methods, but it took ancient DNA for us to conceptualize a model which would explain the patterns we were seeing.

An American with recent Amerindian (and probably African) ancestry

But in the context of the United States shared ancestry between Europeans and Native Americans is not particularly illuminating. Rather, what people want to know is whether they exhibit signs of recent gene flow between these groups; in particular, many white Americans are curious if they have Native American heritage. They do not want to hear an explanation which involves the fusion of an East Asian population with Siberians 15,000 to 20,000 years ago, and then the emergence of Northern Europeans through successive amalgamations between Pleistocene, Neolithic, and Bronze Age Eurasians.

In some of the inference methods Northern Europeans, often those with Finnic ancestry or relationship to Finnic groups, may exhibit signs of ancestry from the “Native American” cluster. But this is almost always a function of circumpolar gene flow, as well as the aforementioned Pleistocene admixtures. One way to avoid this would be to simply not report proportions which are below 0.5%. That way, people with higher “Native American” fractions would receive the results, and the proportions would be high enough that it was almost certainly indicative of recent admixture, which is what people care about.
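A minimal sketch of that reporting rule, in Python (the labels, proportions, and the renormalization step are illustrative only, not any company’s actual pipeline):

```python
def report_ancestry(proportions, floor=0.005):
    """Drop components below the reporting floor and renormalize, so that
    trace signals (circumpolar gene flow, deep Pleistocene sharing) never
    show up as a 'Native American' sliver in the consumer report."""
    kept = {pop: p for pop, p in proportions.items() if p >= floor}
    total = sum(kept.values())
    return {pop: round(p / total, 4) for pop, p in kept.items()}

print(report_ancestry({"Northern European": 0.962,
                       "Finnish": 0.035,
                       "Native American": 0.003}))
```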

Why am I telling you this? Because many journalists who report on direct-to-consumer genomics don’t understand the science well enough to grasp what’s being sold to the consumer (frankly, most biologists don’t know this field well either, even if they might use a barplot here and there).

And the reality is that consumers have very specific parameters for what they want in terms of geographic and temporal information. They don’t want to be told true but trivial facts (e.g., that they are Northern European). But neither do they want to know things which are so novel and so far removed from their interpretative frameworks that they simply can’t digest them (e.g., that Northern Europeans are a recent population construction which threads together very distinct strands with divergent deep time histories). In the parlance of cognitive anthropology, consumers want their infotainment the way they want their religion: minimally counterintuitive. Some surprise, but not too much.

November 28, 2013

The total information world

Filed under: Personal genomics — Razib Khan @ 1:46 pm

Credit: Cryteria

Happy Thanksgiving (if you are an American)!

It’s been a busy few days in the world of personal genomics. By coincidence I have a coauthored comment in Genome Biology out, Rumors of the death of consumer genomics are greatly exaggerated (it was written and submitted a while back). If you haven’t, please read the FDA’s letter, and 23andMe’s response, as much as there is one right now. Since Slate ran my piece on Monday a lot of people have offered smart, and more well informed, takes. On the one hand you have someone like Alex Tabarrok, with “Our DNA, Our Selves”, which is close to a libertarian cri de coeur. Then you have cases like Christine Gorman, “FDA Was Right to Block 23andMe”. It will be no surprise that I am much closer to Tabarrok than I am to Gorman (she doesn’t even seem to be aware that 23andMe offers a genotyping, not sequencing, service, though fuzziness on the details doesn’t discourage strong opinions from her). An interesting aspect is that many who are not deeply in the technical weeds of the issue are exhibiting politicized responses. I’ve noticed this on Facebook, where some seem to think that 23andMe and the Tea Party have something to do with each other, and the Obama administration and the FDA are basically stand-ins. In other words, some liberals are seeing this dispute as one of those attempts to evade government regulation, regulation which they support on prior grounds. Though Tabarrok is more well informed than the average person (his wife is a biologist), there are others on the right who are taking 23andMe’s side on normative grounds as well. Ultimately I’m not interested in this argument, because it’s not going to have any significant lasting power. No one will remember it in 20 years. As I implied in my Slate piece, 23andMe the company now is less interesting than personal genomics the industry sector in the future. Over the long term I’m optimistic that it will evolve into a field which impacts our lives broadly. Nothing the United States government can do will change that.

Yet tunneling down to the level of 23andMe’s specific issues with the regulatory process, there is the reality that it has to deal with the US government and the FDA, no matter what the details of its science are. It’s a profit-making firm. Matt Herper has a judicious take on this, 23andStupid: Is 23andMe Self-Destructing? I don’t have any “inside” information, so I’m not going to offer the hypothesis that this is part of some grand master plan by Anne Wojcicki. I hope it is, but that’s because I want 23andMe to continue to subsidize genotyping services (I’ve heard that though 23andMe owns the machines, the typing is done by LabCorp. And last I checked the $99 upfront cost is a major loss leader; they’re paying you to get typed). I’m afraid that they goofed here, and miscalculated. As I said above, it won’t make a major difference in the long run, but I have many friends who were waiting until this Christmas to purchase kits from 23andMe.


Then there are “the scientists,” or perhaps more precisely the genoscenti. Matt Herper suggested that the genoscenti have libertarian tendencies, and I objected. In part because I am someone who has conservative and/or libertarian tendencies, and I’m pretty well aware that I’m politically out of step with most individuals deeply involved in genetics, who are at most libertarian-leaning moderate liberals, and more often conventional liberal Democrats. Michael Eisen has a well thought out post, FDA vs. 23andMe: How do we want genetic testing to be regulated? Eisen doesn’t have a political ax to grind, and is probably representative of most working geneticists in the academy (he is on 23andMe’s board, but you should probably know that these things don’t mean that much). I may not know much about the FDA regulatory process, but like many immersed in genomics I’m well aware that many people talking about these issues don’t know much about the cutting edge of the modern science. Talk to any geneticist about conversations with medical doctors and genetic counselors, and they will usually express concern that these “professionals” and “gatekeepers” are often wrong, unclear, or confused on many of the details. A concrete example: when a friend explained to a veteran genetic counselor how my wife used pedigree information combined with genomic data to infer that my daughter did not have an autosomal dominant condition, the counselor asserted that you can’t know whether there were two recombination events within the gene, which might invalidate these inferences. Though my friend was suspicious, they did not say anything, because they were not a professional. As a matter of fact there just aren’t enough recombinations across the genome for an intra-genic event to be a likely occurrence (also, recombination likelihood is not uniformly distributed, and not necessarily independent, insofar as there may be suppression of very close events). And this was a very well informed genetic counselor.
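For concreteness, here is the back-of-the-envelope version of that argument, with assumed round numbers (an average recombination rate of ~1 cM per Mb and a gene spanning ~100 kb) rather than anything specific to the condition in question:

```python
# How likely are two recombination events *within* a single gene in one meiosis?
cM_per_Mb = 1.0          # rough genome-wide average recombination rate
gene_length_Mb = 0.1     # a fairly large gene, ~100 kb

gene_cM = cM_per_Mb * gene_length_Mb   # ~0.1 cM across the gene
p_single = gene_cM / 100.0             # ~1 in 1,000 meioses see a crossover in the gene
p_double = p_single ** 2               # ~1 in 1,000,000, before crossover interference

print(f"one crossover within the gene: ~{p_single:.0e} per meiosis")
print(f"two crossovers within the gene: ~{p_double:.0e} per meiosis, "
      "and interference suppresses close double crossovers even further")
```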

Additionally, there are the two major objections to 23andMe’s service which some on Twitter have been pointing me to. First, they return results which are highly actionable. The FDA explicitly used the example of a woman who goes and gets a mastectomy due to a 23andMe result. I don’t think this is a very strong objection. No doctor would perform a mastectomy based on 23andMe results. So that’s not an issue. Then there are those who allude to psychological harm. This could be a problem, but 23andMe has multiple steps which make it so that you have to proactively find out information on these sorts of diseases. Call me a libertarian if you will, but I object on principle to the idea that medical professionals necessarily have to be involved in the dissemination of information about my own genome as a regulatory matter. Obviously when it comes to a course of treatment they will be consulted, and no doubt there will be replications of any actionable results. But I don’t trust the medical sector to be populated by angels. To illustrate why I don’t trust medical professionals to always behave out of the goodness of their hearts, consider that deaths from hospital infections started dropping sharply when Medicare stopped paying for treating these infections. Workers in the health care sector do care about patients, but even here incentives matter, and the human cognitive budget is such that outcomes can shift greatly when nurses and doctors are reminded that washing hands is going to impact the bottom line (the reality is that hospitals probably instituted much stricter measures). What does this have to do with personal genomics? You are your own best advocate, and one of the major reasons that those in higher socioeconomic strata have better health outcomes is that they are so much less passive as patients. The more detailed the information you have on your own health, the better you can advocate for yourself and be involved in the decision-making process. And the reality is that with dropping prices in sequencing, and the ability to design software to do interpretation, without draconian measures there’s almost nothing the United States government will be able to do to prevent anyone with a moderate amount of motivation from getting this sort of information.

A second objection is that the SNPs returned are of small and very probabilistic effect. This is embedded in issues regarding the “missing heritability” and the reality that most complex diseases are due to many factors. Because of the small effect size, and until recently, small sample sizes, this literature has been littered with false positives, which passed arbitrary statistical thresholds. The argument then boils down to the reality that 23andMe in many cases is not really adding any informative value. If that’s the case though then why the urgency to regulate it? Horoscopes and diet books do not add informative value either. This problem with small effect SNPs is widely known, so bringing it up as if it is revelatory is rather strange to me. Additionally, as Eric Lander and others have pointed out the locus which helped us discover statins is of very small effect. As long as they’re not false positives, small-effect SNPs are likely a good way to go in understanding biological pathways for pharmaceutical products. But that doesn’t speak to the risk prediction models. I think there the possibilities are murkier even in the long run because complex traits are complex. Even if we have massive GWAS with sample sizes in the millions and 100x whole-genome coverage (this will happen), the environmental factors may still be significant enough that researchers may balk at definitive risk predictions.

Ultimately where I think personal genomics is going is alluded to in the Genome Biology piece: toward being part of a broader suite of information services, probably centralized, filtered, and curated by a helper artificial intelligence. What cognitive science and behavioral economics are telling us is that individuals operate under mental budget constraints. Dan MacArthur is probably right that personal genomics enthusiasts overestimated how involved the average person on the street was going to want to get in terms of their own interpretations of returned results. The reality is that even genetic counselors can barely keep up. Someday the field will stabilize, but this is not that day. But overall the information overload is going to get worse and worse, not better, and where the real upside, and game-changer, will be is in the domain of computational tools which help us make decisions with a minimum of effort. A cartoon model of this might be an artificial intelligence which talks to you through an ear-bud all day, and takes your genomic, epigenomic, and biomarker status into account when advising you on whether you should pass on the dessert. But to get from here to there is going to require innovation. The end point is inevitable, barring a collapse of post-industrial civilization. The question is where it is going to happen. Here in the United States we have the technology, but we also have cultural and institutional road-blocks to this sort of future. If those road-blocks are onerous enough it doesn’t take a genius to predict that high-tech lifestyle advisement firms, whose aim is to replace the whole gamut of self-help sectors with rationally designed applications and appliances, will simply decamp to Singapore or Dubai.

Personal genomics is a small piece of that. And 23andMe is a small piece of personal genomics. But they are not trivial pieces.


November 25, 2013

The FDA and 23andMe

Filed under: Personal genomics — Razib Khan @ 10:30 am

First, download your 23andMe raw results now if you have them. If you don’t know what’s going on, the FDA has finally started to move aggressively against the firm. Unfortunately this is not surprising, as it was foreshadowed years ago. And 23andMe has been moving aggressively to emphasize its medical, as opposed to genealogical, services over the past year. But this isn’t the story of one firm. This is the story of government response to very important structural shifts occurring in the medical delivery system of the United States. The government could potentially bankrupt 23andMe, but taking a step back, that would still be like the RIAA managing to take down Napster. The information is coming, and if there’s one thing that can overpower state planning it is consumer demand. Unless the US government wants to ban its citizens from receiving their own genetic data it is just putting off the inevitable outsourcing of various interpretation services. Engagement would probably be the better long term bet, but I don’t see that happening.


November 7, 2013

The future always advances

Filed under: DTC personal genomics,Personal genomics — Razib Khan @ 12:56 am

The last week has seen a lot of chatter about the slapping down of Sequenom’s diagnostic patent, Judge Invalidates Patent for a Down Syndrome Test:

A federal judge has invalidated the central patent underlying a noninvasive method of detecting Down syndrome in fetuses without the risk of inducing a miscarriage.

The ruling is a blow to Sequenom, a California company that introduced the first such noninvasive test in 2011 and has been trying to lock out competitors in a fast-growing market by claiming they infringe on the patent.

Sequenom’s stock fell 23 percent on Thursday, to $1.92.

The judge, Susan Illston of the United States District Court in Northern California, issued a ruling on Wednesday that the patent was invalid because it covered a natural phenomenon — the presence of DNA from the fetus in the mother’s blood.

The justification for intellectual property is a utilitarian one. That is, these are institutions which are meant to further the cause of creativity and innovation. With this patent invalidated, will the push toward technological innovation in this domain be abandoned? Coincidentally, in the last week of October Sequenom put out a press release which heralded some advances in its panel:

…The MaterniT21 PLUS test will begin reporting additional findings for the presence of subchromosomal microdeletions and autosomal trisomies for chromosomes 16 and 22, in addition to the previously announced additional findings for sex chromosome aneuploidies involving an abnormal number of the X or Y chromosomes. These additional findings complement the MaterniT21 PLUS test core identification of trisomies for chromosome 21, chromosome 18 and chromosome 13. With this expansion, the MaterniT21 PLUS test is the first-of-its-kind noninvasive prenatal technology (NIPT) to provide these comprehensive results from a maternal blood draw.

Sequenom Laboratories will begin reporting on these select, clinically relevant microdeletions, including 22q11.2 deletion syndrome (DiGeorge), Cri-du-chat syndrome, Prader-Willi/Angelman syndrome, 1p36 deletion syndrome, as well as trisomies 16 and 22 the last week of October. Results from a method validation study….

It seems that the firm’s main path to profit and riches is going to be to innovate faster, gain market share, brand recognition, and economies of scale. This seems as if it is a greater good for the public than its rents extracted through intellectual property monopolies.


December 18, 2012

Buddy, can you spare some ascertainment?

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, a bit over 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above, Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra markers may not give you much bang for the buck (not to mention the biases that this may introduce into your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and it looks rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. 100,000 to 200,000 SNPs is also sufficient to elucidate relationships even in genetically homogeneous regions such as Europe in my experience (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many-fold greater. On the other hand, Geno 2.0 may be more useful in the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

December 17, 2012

Buyer beware in ancestry testing!

Filed under: Personal genomics — Razib Khan @ 10:20 pm

Over at Genomes Unzipped Vincent Plagnol has put up a post, Exaggerations and errors in the promotion of genetic ancestry testing, which to my mind is an understated and soft-touch old-fashioned “fisking” of the pronouncements of a spokesperson for an outfit termed Britain’s DNA. The whole post is worth reading, but this is a particularly grave aspect of the company’s response:

…The main reason is that listening to this radio interview prompted my UCL colleagues David Balding and Mark Thomas to ask questions to the Britain’s DNA scientific team; the questions have not been satisfactorily answered. Instead, a threat of legal action was issued by solicitors for Mr Moffat. Any type of legal threat is an ominous sign for an academic debate. This motivated me to point out some of the incorrect, or at the very least exaggerated, statements made in this interview. Importantly, while I received comments from several people for this post, the opinion presented here is entirely mine and does not involve any of my colleagues at Genomes Unzipped.

From what I can gather this firm is charging two to three times more than 23andMe for state-of-the-art scientific genealogy, circa 2002. So if you can’t be bothered to read the piece: it looks like Britain’s DNA is threatening litigation against researchers who had the temerity to point out that the firm is providing substandard services at above-market costs. Plagnol’s critique lays out a point-by-point refutation of the company’s assertions, but the interpretation services on offer seem to resemble nothing more than genetically rooted epic fantasy. A triumph of marketing over science.


In other scientific genealogy news, a friend recently sent me results for his family from Ancestry.com’s AncestryDNA service. Looking at the pie-charts, I can say one thing: they were whack! But the question then is whether they are truly just whack, or whether their peculiarity indicates real genetic insight. I have no way to judge, because they still aren’t providing raw data downloads, though they promise to soon. I actually talked to a scientist from Ancestry.com for a little while at ASHG 2012, and he claimed that they were tweaking the algorithms even as we were speaking. Nevertheless, bizarre results still seem to abound. It would be nice to figure out the method to this madness.

Finally, the genomic angle to the Dan MacArthur → Dan MacCurry saga is approaching closure. My friend Zack Ajmal promises to put up his analysis before he goes on vacation. I asked Zack to look into the matter because he has a very large database of South Asians, and I want to see if he could find the best match to Dan’s chromosome 10. If it does turn out that it is highly probable that Dan’s South Asian ancestry is Bengali, then I’ll have to make sure he’s introduced to the aloo bhorta which his ancestors no doubt relished (and which is unpalatable to people of other South Asian ethnic groups because of the mustard oil).

December 11, 2012

$99 for 1 million markers

Filed under: 23andMe,Personal genomics — Razib Khan @ 8:18 am

Looks like 23andMe has a new $99 price point. If so, that’s roughly 100 markers per penny! (here’s the press release)

1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.

2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.


December 10, 2012

Is Daniel MacArthur ‘desi’?

My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense given their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms toward assigning ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!

But in regard to omissions I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with the very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:

As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! On first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.

I did two things (a rough sketch of the commands is below):

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran an MDS, which visualized the genetic variation in multiple dimensions
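Concretely, the workflow looks something like the sketch below, assuming PLINK 1.x and ADMIXTURE binaries on the PATH and hypothetical file names (macarthur.* for the converted 23andMe genotype, hapmap.* for the combined CEU+GIH+CHD reference); this is a reconstruction of the kind of commands involved, not the exact ones used here.

```python
import subprocess

def run(args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# 1) Merge the personal genotype with the reference panel, keeping only
#    chromosome 10; SNPs absent from either panel drop out of the merge.
run(["plink", "--bfile", "macarthur",
     "--bmerge", "hapmap.bed", "hapmap.bim", "hapmap.fam",
     "--chr", "10", "--make-bed", "--out", "chr10_merged"])

# 2) Unsupervised ADMIXTURE at K = 3; writes chr10_merged.3.Q (per-individual
#    ancestry proportions) and chr10_merged.3.P (component allele frequencies).
run(["admixture", "chr10_merged.bed", "3"])

# 3) Pairwise IBS/IBD estimates, then classical MDS in two dimensions;
#    the coordinates end up in chr10_merged.mds.
run(["plink", "--bfile", "chr10_merged", "--genome", "--out", "chr10_merged"])
run(["plink", "--bfile", "chr10_merged",
     "--read-genome", "chr10_merged.genome",
     "--cluster", "--mds-plot", "2", "--out", "chr10_merged"])
```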

Before I go on, I will state what I found: these methods supported the inference from 23andMe; on chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the median values in tabular format, with MacArthur and his wife presented for comparison.

ADMIXTURE results for chromosome 10

                     K1     K2     K3
CEU                  0.04   0.02   0.93
GIH                  0.87   0.05   0.08
CHD                  0.01   0.97   0.01
Daniel MacArthur     0.29   0.07   0.64
Ilana Fisher         0.01   0.06   0.94

You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:
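(For the record, pulling that distribution out of the ADMIXTURE output is just a matter of matching rows of the Q matrix to population labels. A sketch, continuing the hypothetical file names above and assuming the family IDs in the .fam file encode the population:)

```python
import numpy as np

# Rows of the .Q file are in the same order as the .fam file; column 0 is
# taken to be the GIH-like component, as in the table above.
q = np.loadtxt("chr10_merged.3.Q")
fids = [line.split()[0] for line in open("chr10_merged.fam")]
pop = np.array([fid.split("_")[0] for fid in fids])   # hypothetical FID scheme

ceu = q[pop == "CEU", 0]
print("CEU 'South Asian' component: median %.3f, max %.3f"
      % (np.median(ceu), ceu.max()))
```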

And here’s the MDS with the two largest dimensions:

Again, it’s evident that his chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of for white Britons to have South Asian ancestry dating back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity? Perhaps the issue is that everyone was looking at the genome-wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution across all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.

Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
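The expected-fraction arithmetic behind that estimate, as a quick sketch (2012 as the reference year and the 25-year generation time are the assumptions from the paragraph above; the realized fraction that many generations back is highly variable, as noted):

```python
# Expected autosomal contribution of a single ancestor k generations back
# is (1/2)**k; a great-great-great-great grandparent is k = 6.
for k in range(5, 8):
    frac = 0.5 ** k
    year = 2012 - 25 * k   # counting back from when this was written
    print(f"k = {k}: expected {frac:.2%} ancestry, born around {year}")
```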

November 30, 2012

Can your genes be patented?

Filed under: Intellectual property,Myriad Genetics,Personal genomics — Razib Khan @ 7:30 pm

Court to Decide if Human Genes Can Be Patented. So it seems a group of middle-aged to very aged lawyers will decide the decades-long Myriad Genetics saga. My position on this issue is simple: if you are going to award patents, they must be awarded for acts of engineering, not discoveries of science. See Genomics Law Report for more well informed commentary.

November 22, 2012

Back to crunching personal genomic data

Filed under: Personal genomics — Razib Khan @ 12:54 am

Many months ago I told some of my friends that I’d run analyses of their 23andMe data and report the results back to them. A year ago I made the same promise to some of my readers. But life got in the way, and I’ve been very busy. I’m working on scripts to make the whole process efficient for me (if you want to know, I’m trying to get the output to be easy to merge across many runs with CLUMPP and then produce DISTRUCT-type outputs; I’ve done this with other ADMIXTURE outputs, but for various reasons the labeling gets messed up with my ‘personal’ project). But I’ve decided to at least start pushing some of the results live. I won’t be putting them in this space; they’ll probably go up at razib.com. But I thought I would get your attention first. I know a lot of IDs are missing, but I’ll add them later as I find them. And yes, I need to get back to African Ancestry too (that site was infested with a backdoor, so I had to yank it). This is all rather basic stuff, but I just don’t have the time to do things in a manual fashion, and the scripts I have for population sets don’t transfer over when I want to give individual friend results as well as population results.
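The labeling headache mentioned above is the label-switching problem: each ADMIXTURE run numbers its K components arbitrarily, so before runs can be merged (which is what CLUMPP does) the columns of each Q matrix have to be permuted to match a reference run. A minimal sketch of that alignment step, with hypothetical file names and not the actual scripts in question:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_to_reference(q_ref, q_run):
    """Permute the columns of q_run to line up with q_ref by solving an
    assignment problem on column-to-column squared differences."""
    K = q_ref.shape[1]
    cost = np.array([[np.sum((q_ref[:, i] - q_run[:, j]) ** 2)
                      for j in range(K)] for i in range(K)])
    _, col_order = linear_sum_assignment(cost)
    return q_run[:, col_order]

# Usage sketch: average several aligned replicates into one consensus matrix.
# ADMIXTURE writes one <prefix>.<K>.Q file per run; file names here are made up.
# runs = [np.loadtxt(f"run{i}/friends.11.Q") for i in range(10)]
# aligned = [runs[0]] + [align_to_reference(runs[0], q) for q in runs[1:]]
# consensus = np.mean(aligned, axis=0)
```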

The results in tabular format are here. And all individual results are here. In terms of the tech details: ~140,000 SNPs, ~3,000 total individuals in the data set, at K = 11. I will probably be reporting K = 12 to K = 25 from now on (I’m just going to run >10 replicates and merge them).

November 5, 2012

It takes a village, and guidelines

Filed under: Personal genomics — Razib Khan @ 11:46 am

A week ago I posted on a rather scary case of medical doctors withholding information from a family because they felt that it was in the best interests of the family. I objected mostly because I don’t have a good feeling about this sort of paternalism. Laura Hercher has a follow up. She’s not offering just her opinion, but she actually made some calls to people who were involved in the case. From what I can gather in her post the issue that triggered this outrage (in my opinion, it’s an outrage) is that for these particular tests informed consent was simply not mandatory. Since they didn’t have the consent a priori, the doctors had to go with their judgement.

The reality here is that there isn’t a good solution. That’s because we’re not talking about science, we’re talking about values. The behavior of the medical doctors, withholding information which has serious life consequences, is still objectionable and unacceptable to me. But that’s me. I have a strong bias toward more information, and from all the social science data I’ve seen most people do too. And yet not everyone. Doctors are not mind readers, and they couldn’t consult the ...


October 27, 2012

A golden age of sibling comparisons

Filed under: Genomics,Personal genomics — Razib Khan @ 6:09 pm

Image credit: Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings

I really love the paper Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. I first read it about six years ago. The result is rather straightforward, but the problem is empirically a moderately deep one. Modern analytic genetics, as the fusion of Mendelism and biometrics, began with R. A. Fisher’s The Correlation between Relatives on the Supposition of Mendelian Inheritance in 1918. But note that that paper assumed the expected relatedness between relatives. As highlighted in the above paper, the expected values for most categories of relatedness always had a variance component which was unaccounted for, and this reduced the power of the methodology to ascertain the extent of heritability. The relatedness you can expect between any two siblings is ~0.50, and that is also the average across all siblings. But the reality is that in most cases two given siblings will not share exactly half their genes identical by descent.
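A toy simulation of the estimator, under assumed values (a true heritability of 0.6, an SD of realized full-sib IBD sharing of roughly 0.038, and no shared environment), just to show the logic of regressing sib-pair phenotypic differences on realized sharing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, h2 = 200_000, 0.6        # assumed true heritability for the toy data
sd_ibd = 0.038                    # rough SD of realized IBD sharing in full sibs

# Realized genome-wide sharing varies around the expected 0.5.
ibd = np.clip(rng.normal(0.5, sd_ibd, n_pairs), 0.25, 0.75)

# Additive genetic values of the two sibs are correlated in proportion to
# their realized sharing; environments are independent here for simplicity.
z1, z2 = rng.standard_normal(n_pairs), rng.standard_normal(n_pairs)
g1 = z1
g2 = ibd * z1 + np.sqrt(1 - ibd ** 2) * z2
e_sd = np.sqrt((1 - h2) / h2)     # sets Var_A / Var_P = h2 (Var_A is 1 here)
y1 = g1 + rng.standard_normal(n_pairs) * e_sd
y2 = g2 + rng.standard_normal(n_pairs) * e_sd

# Haseman-Elston style regression: E[(y1 - y2)^2] = const - 2 * Var_A * IBD,
# so -slope / (2 * Var_P) recovers heritability. It takes large samples,
# because realized sib sharing only varies by a few percent; that is exactly
# the empirical difficulty the paper turns into a virtue.
slope, _ = np.polyfit(ibd, (y1 - y2) ** 2, 1)
var_p = np.var(np.concatenate([y1, y2]))
print("estimated h2 ~", round(-slope / (2 * var_p), 2))
```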


Before the genomic era the precise assessment of relatedness in a given case was laborious, if at all feasible. But not so today. Because modern ...

October 21, 2012

One girl, one exome

Filed under: Open science,Personal genomics — Razib Khan @ 11:22 pm

Interesting story in The San Jose Mercury News, Open-source science helps San Carlos father’s genetic quest:

“We used materials that are public, freely available,” said Rienhoff, a physician and scientist, as Beatrice frolicked nearby. “And everything we’ve learned we’ve put back out there, in the public domain. It’s for the patient’s good, and the public good.”

Born with small, weak muscles, long feet and curled fingers, Beatrice confounded all the experts.

No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.

Rienhoff — a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle — launched a search.

He combed the publicly available medical literature, researching diseases, while jotting down each new clue or theory. Because her ailment is so rare, he knew no big labs or advocacy groups would be interested.


In the end, basically he compared his daughter’s exome to those of everyone else in the family. By comparing in such a fashion he managed to zero in on a possible causal mutation. This is awesome (the fact that now they know, not the mutation itself). In the near ...

