Razib Khan One-stop-shopping for all of my content

November 28, 2013

The total information world

Filed under: Personal genomics — Razib Khan @ 1:46 pm

Credit: Cryteria


Happy Thanksgiving (if you are an American)!

It’s been a busy few days in the world of personal genomics. By coincidence I have a coauthored comment in Genome Biology out, Rumors of the death of consumer genomics are greatly exaggerated (it was written and submitted a while back). If you haven’t, please read the FDA’s letter, and 23andMe’s response, as much as there is one right now. Since Slate ran my piece on Monday a lot of people have offered smart, and better informed, takes. On the one hand you have someone like Alex Tabarrok, with “Our DNA, Our Selves”, which is close to a libertarian cri de coeur. Then you have cases like Christine Gorman, “FDA Was Right to Block 23andMe”. It will be no surprise that I am much closer to Tabarrok than I am to Gorman (she doesn’t even seem to be aware that 23andMe offers a genotyping, not sequencing, service, though fuzziness on the details doesn’t discourage strong opinions from her). An interesting aspect is that many who are not deeply in the technical weeds of the issue are exhibiting politicized responses. I’ve noticed this on Facebook, where some seem to think that 23andMe and the Tea Party have something to do with each other, and the Obama administration and the FDA are basically stand-ins. In other words, some liberals are seeing this dispute as one of those attempts to evade government regulation, something they support on prior grounds. Though Tabarrok is better informed than the average person (his wife is a biologist), there are others from the right-wing who are taking 23andMe’s side on normative grounds as well. Ultimately I’m not interested in this argument, because it’s not going to have any significant lasting power. No one will remember in 20 years. As I implied in my Slate piece, 23andMe the company now is less interesting than personal genomics the industry sector in the future. Over the long term I’m optimistic that it will evolve into a field which impacts our lives broadly. Nothing the United States government can do will change that.

Yet tunneling down to the level of 23andMe’s specific issues with the regulatory process, there is the reality that it has to deal with the US government and the FDA, no matter what the details of its science are. It’s a profit-making firm. Matt Herper has a judicious take on this, 23andStupid: Is 23andMe Self-Destructing? I don’t have any “inside” information, so I’m not going to offer the hypothesis that this is part of some grand master plan by Anne Wojcicki. I hope it is, but that’s because I want 23andMe to continue to subsidize genotyping services (I’ve heard that though 23andMe owns the machines, the typing is done by LabCorp. And last I checked the $99 upfront cost is a major loss leader; they’re paying you to get typed). I’m afraid that they goofed here, and miscalculated. As I said above, it won’t make a major difference in the long run, but I have many friends who were waiting until this Christmas to purchase kits from 23andMe.


Then there are “the scientists,” or perhaps more precisely the genoscenti. Matt Herper stated to the effect that the genoscenti have libertarian tendencies, and I objected. In part because I am someone who has conservative and/or libertarian tendencies, and I’m pretty well aware that I’m politically out of step with most individuals deeply involved in genetics, who are at most libertarian-leaning moderate liberals, and more often conventional liberal Democrats. Michael Eisen has a well thought out post, FDA vs. 23andMe: How do we want genetic testing to be regulated? Eisen doesn’t have a political ax to grind, and is probably representative of most working geneticists in the academy (he is on 23andMe’s board, but you should probably know that these things don’t mean that much). I may not know much about the FDA regulatory process, but like many immersed in genomics I’m well aware that many people talking about these issues don’t know much about the cutting edge of the modern science. Talk to any geneticist about conversations with medical doctors and genetic counselors, and they will usually express concern that these “professionals” and “gatekeepers” are often wrong, unclear, or confused, on many of the details. A concrete example, when a friend explained to a veteran genetic counselor how my wife used pedigree information combined with genomic data to infer that my daughter did not have an autosomally dominant condition, the counselor asserted that you can’t know if there were two recombination events within the gene, which might invalidate these inferences. Though my friend was suspicious, they did not say anything, because they were not a professional. As a matter of fact there just aren’t enough recombinations across the genome for an intra-genic event to be a likely occurrence (also, recombination likelihood is not uniformly distributed, and not necessarily independent, insofar as there may be suppression of very close events). 
And this was a very well informed genetic counselor.
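
A rough back-of-the-envelope version of the recombination argument can be sketched in Python. The numbers here are assumptions for illustration, not figures from the conversation: a genome-wide average of roughly 1 centimorgan per megabase, a hypothetical 100 kb gene, and crossovers treated as independent (a Poisson model). Since suppression of very close events would make a double crossover even rarer, this sketch probably overstates the probability.

```python
import math

# Assumptions (not from the post): an average recombination rate of
# about 1 centimorgan per megabase, and a hypothetical 100 kb gene.
CM_PER_MB = 1.0
GENE_LENGTH_BP = 100_000


def expected_crossovers(length_bp, cm_per_mb=CM_PER_MB):
    """Expected crossovers per meiosis in a region.

    1 centimorgan corresponds to 0.01 expected crossovers.
    """
    return (length_bp / 1_000_000) * cm_per_mb * 0.01


def prob_at_least_two(lam):
    """P(N >= 2) for N ~ Poisson(lam), ignoring crossover interference."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)


lam = expected_crossovers(GENE_LENGTH_BP)
print(f"expected crossovers within the gene: {lam:.4f}")
print(f"P(two or more within the gene): {prob_at_least_two(lam):.2e}")
```

Under these assumptions the chance of two crossovers inside the same gene in one meiosis is on the order of one in a few million, which is why the counselor's objection does not carry much practical weight.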

Additionally, there are the two major objections to 23andMe’s service which some on Twitter have been pointing me to. First, they return results which are highly actionable. The FDA explicitly used the example of a woman who goes and gets a mastectomy due to a 23andMe result. I don’t think this is a very strong objection. No doctor would perform a mastectomy based on 23andMe results. So that’s not an issue. Then there are those who allude to psychological harm. This could be a problem, but 23andMe has multiple steps which make it so that you have to proactively find out information on these sorts of diseases. Call me a libertarian if you will, but I object on principle to the idea that medical professionals necessarily have to be involved in the dissemination of information about my own genome as a regulatory matter. Obviously when it comes to a course of treatment they will be consulted, and no doubt there will be replications of any actionable results. But I don’t trust the medical sector to be populated by angels. To illustrate why I don’t trust medical professionals to always behave out of the goodness of their hearts, consider that deaths from hospital infections started dropping sharply when Medicare stopped paying for treating these infections. Workers in the health care sector do care about patients, but even here incentives matter, and the human cognitive budget is such that they can shift the outcomes greatly by reminding nurses and doctors that washing hands is going to impact the bottom line (the reality is that hospitals probably instituted much stricter measures). What does this have to do with personal genomics? You are your own best advocate, and one of the major reasons that those in higher socioeconomic strata have better health outcomes is that they are so much less passive as patients. The more detailed the information you have on your own health, the better you can advocate and be involved in the decision-making process.
And the reality is that with dropping prices in sequencing, and the ability to design software to do interpretation, without draconian measures there’s almost nothing the United States government will be able to do to prevent anyone with a moderate amount of motivation from getting this sort of information.

A second objection is that the SNPs returned are of small and very probabilistic effect. This is embedded in issues regarding the “missing heritability” and the reality that most complex diseases are due to many factors. Because of the small effect size, and until recently, small sample sizes, this literature has been littered with false positives, which passed arbitrary statistical thresholds. The argument then boils down to the reality that 23andMe in many cases is not really adding any informative value. If that’s the case though then why the urgency to regulate it? Horoscopes and diet books do not add informative value either. This problem with small effect SNPs is widely known, so bringing it up as if it is revelatory is rather strange to me. Additionally, as Eric Lander and others have pointed out the locus which helped us discover statins is of very small effect. As long as they’re not false positives, small-effect SNPs are likely a good way to go in understanding biological pathways for pharmaceutical products. But that doesn’t speak to the risk prediction models. I think there the possibilities are murkier even in the long run because complex traits are complex. Even if we have massive GWAS with sample sizes in the millions and 100x whole-genome coverage (this will happen), the environmental factors may still be significant enough that researchers may balk at definitive risk predictions.

Ultimately where I think personal genomics is going is alluded to in the Genome Biology piece: as part of a broader suite of information services, probably centralized, filtered, and curated, by a helper artificial intelligence. What cognitive science and behavioral economics are telling us is that individuals operate under mental budget constraints. Dan MacArthur is probably right that personal genomics enthusiasts overestimated how involved the average person on the street was going to want to get in terms of their own interpretations of returned results. The reality is that even genetic counselors can barely keep up. Someday the field will stabilize, but this is not that day. But overall the information overload is going to get worse and worse, not better, and where the real upside, and game-changer, will be is in the domain of computational tools which help us make decisions with a minimum of effort. A cartoon model of this might be an artificial intelligence which talks to you through an ear-bud all day, and takes your genomic, epigenomic, and biomarker status into account when advising you on whether you should pass on the dessert. But to get from here to there is going to require innovation. The end point is inevitable, barring a collapse of post-industrial civilization. The question is where it is going to happen. Here in the United States we have the technology, but we also have cultural and institutional road-blocks to this sort of future. If those road-blocks are onerous enough it doesn’t take a genius to predict that high-tech lifestyle advisement firms, whose aims are to replace the whole gamut of self-help sectors with rationally designed applications and appliances, will simply decamp to Singapore or Dubai.

Personal genomics is a small piece of that. And 23andMe is a small piece of personal genomics. But they are not trivial pieces.

The post The total information world appeared first on Gene Expression.

November 25, 2013

The FDA and 23andMe

Filed under: Personal genomics — Razib Khan @ 10:30 am

First, download your 23andMe raw results now if you have them. If you don’t know what’s going on, the FDA has finally started to move aggressively against the firm. Unfortunately this is not surprising, as this was foreshadowed years ago. And, 23andMe has been moving aggressively to emphasize its medical, as opposed to genealogical, services over the past year. But this isn’t the story of one firm. This is the story of government response to very important structural shifts occurring in the medical delivery system of the United States. The government could potentially bankrupt 23andMe, but taking a step back that would still be like the RIAA managing to take down Napster. The information is coming, and if there’s one thing that can overpower state planning it is consumer demand. Unless the US government wants to ban its citizens from receiving their own genetic data it is just putting off the inevitable outsourcing of various interpretation services. Engagement would probably be the better long term bet, but I don’t see that happening.

The post The FDA and 23andMe appeared first on Gene Expression.

November 7, 2013

The future always advances

Filed under: DTC personal genomics,Personal genomics — Razib Khan @ 12:56 am

The last week has seen a lot of chatter about the slapping down of the diagnostic patent held by Sequenom, Judge Invalidates Patent for a Down Syndrome Test:

A federal judge has invalidated the central patent underlying a noninvasive method of detecting Down syndrome in fetuses without the risk of inducing a miscarriage.

The ruling is a blow to Sequenom, a California company that introduced the first such noninvasive test in 2011 and has been trying to lock out competitors in a fast-growing market by claiming they infringe on the patent.

Sequenom’s stock fell 23 percent on Thursday, to $1.92.

The judge, Susan Illston of the United States District Court in Northern California, issued a ruling on Wednesday that the patent was invalid because it covered a natural phenomenon — the presence of DNA from the fetus in the mother’s blood.

The justification for the existence of intellectual property is a utilitarian one. That is, these are institutions which are meant to further the cause of creativity and innovation. Is there going to be an abandonment in this domain of the push toward technological innovation? Coincidentally in the last week of October Sequenom put out a press release which heralded some advances in its panel:

…The MaterniT21 PLUS test will begin reporting additional findings for the presence of subchromosomal microdeletions and autosomal trisomies for chromosomes 16 and 22, in addition to the previously announced additional findings for sex chromosome aneuploidies involving an abnormal number of the X or Y chromosomes. These additional findings complement the MaterniT21 PLUS test core identification of trisomies for chromosome 21, chromosome 18 and chromosome 13. With this expansion, the MaterniT21 PLUS test is the first-of-its-kind noninvasive prenatal technology (NIPT) to provide these comprehensive results from a maternal blood draw.

Sequenom Laboratories will begin reporting on these select, clinically relevant microdeletions, including 22q11.2 deletion syndrome (DiGeorge), Cri-du-chat syndrome, Prader-Willi/Angelman syndrome, 1p36 deletion syndrome, as well as trisomies 16 and 22 the last week of October. Results from a method validation study….

It seems that the firm’s main path to profit and riches is going to be to innovate faster, gain market share, brand recognition, and economies of scale. This seems as if it is a greater good for the public than its rents extracted through intellectual property monopolies.

The post The future always advances appeared first on Gene Expression.

March 31, 2013

Genes are not a mirror upon our souls

Filed under: Genomics,Personal genomics — Razib Khan @ 3:09 pm

I have put 1 million markers (from a combination of Illumina SNP-chips) of mine online. I’m also going to put my sequence online when I get it done. Why? What do I gain from this? Hopefully I don’t gain anything from it. By this, I mean that the only major information that is actionable in a life altering sense is likely to be disease related. Though I’ve been contacted about possible loss of function mutations through imputation, so far my genotype has not illuminated any more risk susceptibilities. Rather, I am trying to make it clear by my openness that your genetic information has more power when pooled together with that of others, and one small step in creating that vast pool of information is demystifying the sharing of it, and practicing what you (that is, me) preach. My soul is not in my genes, and certainly my genotype reflects me with far less obvious fidelity than a photograph would. By this, I mean that there are many traits that one could predict about me, but many one would be at a loss to predict.

To me this is a coordination problem. The more genetic and phenotypic information researchers and analytic software have, the more correlational juice one might squeeze out of the vast cloud of data. But the temptation here will be to free ride, and keep one’s own genome private, while one benefits from the openness of others (to some extent, this is what happens when you have medical research subjects, whose results are used for the gain of the public, which pays, but does not participate). I can see why someone would not want to divulge their health information. A list of venereal diseases may be a source of shame, whether you find it justifiable or not. There is a reasonable ground for privacy, because communicable diseases are reflective often of life choices one has made. When it comes to genes if you have a major loss of function mutation or a disease which is likely to develop at some point before your death (e.g., early onset Alzheimer’s disease), then there are clear grounds for keeping that information close to the vest. But you don’t bear any responsibility for your genotype, nor do you gain any merit or demerit from your genotype. It is a contingent accident of history, and the information is not who you are, it is just the loading of the die you were given by dint of your birth.

This is not just about genetics. It is about life. We already have many private firms, from credit rating agencies to Facebook, to marketers, to the government, monitoring our movement, and attempting to anticipate our choices. One can opt out from this information ecology, but unless one goes off the grid and lives in a subsistence lifestyle it can be a part-time job to do this. Mind you, there are gains to this information ecology, as you are solicited for products and services which previous choices suggest you would be predisposed toward. Similarly, there are upsides and downsides toward an open health and genetic information ecology. If you have a risk allele for a disease with a diet interaction effect, then there is a clear course of action. My own hunch is that this world is coming, no matter our wish, and we need to act in its early phases to grasp the shape of the future and set the parameters of the game. We can’t be passive. The information cloud is going to be there, and someone will parcel and claim it.

It may be trite, but when I push for open genomics on my friends and family I’m not telling them what they may gain. Rather, I’m arguing that the world may gain, and therefore downstream we may gain.

March 29, 2013

On genetic privacy

Filed under: Health,HeLa,Personal genomics — Razib Khan @ 5:35 pm

Larry Moran has a post up, Who Owns Your Genome?, where he mentions me apropos of the HeLa genome disclosure:

In my opinion, there is no excuse for publishing this genome sequence without consent.

Razib Khan disagrees. He thinks that he can publish his genome sequence without obtaining consent from anyone else and I assume he feels the same way about the sequence of the HeLa genome [Henrietta Lacks’ genome, and familial consent].

In response to Larry, I don’t have a definitive opinion about the HeLa genome disclosure in terms of whether it was ethical to release it or not. “Both sides” have positions which I see the validity of. I think ultimately the root issues really date to the 1950s, not today, and they don’t have to do with personal genomics as such. Also, I’d recommend Joe Pickrell’s post, Henrietta Lacks’s genome sequence has been publicly available for years.

Larry also has a question in the comments:

Let’s try a thought experiment. Everyone is free to answer. I’d prefer a simple “yes” or “no” followed by an explanation.

Imagine that you have paid to have your entire genome sequenced. You announce that you intend to upload it to a public site so that anyone can see it. Your parents, your siblings, and your children, all object, saying that this would violate their privacy.

Do you upload it anyway? (“Yes” or “no.” Please respect the rules of a thought experiment and don’t try to quibble about the scenario.)

Yes, I’d upload it, and I will (since I’ve committed to uploading it when I have it done). Also note that my daughter is too young right now to give consent, and she probably will be too young when I upload the genome, so I’m going to do something which might impact her as a child.

One nuance I would like to add though is that decisions may vary given circumstance. For example, if you have one of the high penetrance BRCA mutations, you may not want to expose your family’s information for pragmatic reasons. But my question would be: why do people talk about their highly heritable illnesses in public forums already? I’ve seen media profiles of women with a BRCA mutation, with female relatives. By talking about this they’re exposing their family’s genetic information implicitly. Therefore, I suspect many of the pragmatic concerns are moot, because though there is privacy in regards to health information, there isn’t a taboo about discussing one’s health status in public. Most of the time people who have these diseases want their story put out there to aid in medical advancement and consciousness raising (though obviously there are exceptions).

And that is a subset of the primary issue I have with many worries about privacy and policy and the genome. Just transpose the structure of worries into other fields, and you wonder where the analogous concern is elsewhere. For example, in regards to health I’d argue diet is a much larger issue than genomics, at least for non-aged morbidity. But there is a huge industry of diet books, and very few people see licensed nutritionists. The point here isn’t to argue for paternalism or anti-paternalism, it’s to suggest that genetics isn’t special. It is important, it is cool, and it is fascinating. But so are many other things.

March 24, 2013

Henrietta Lacks’ genome, and familial consent

Filed under: Personal genomics — Razib Khan @ 12:07 am

Rebecca Skloot has an op-ed in The New York Times, The Immortal Life of Henrietta Lacks, the Sequel. I’ve read it a few times now and I’ll be honest and say I’m not totally clear on some of the points she’s trying to make, so I didn’t have a strong reaction to it. This is in contrast to Michael Eisen, who has a post up, The Immortal Consenting of Henrietta Lacks. He told me on Twitter that he had some exchanges with Skloot (on Twitter) which informed his response, so he probably has more context than I do. Eisen says:

I find the way Skloot’s NYT piece moves back and forth between the historical transgressions against Henrietta Lacks and the contemporary threat to her relatives’ privacy incredibly misleading. I doubt this was intentional – rather I think it reflects muddled thinking on her part about these issues. But either way, by juxtaposing the entirely justifiable empowering of the Lacks family to grant individual consent on Henrietta’s behalf with the desire of the same family to protect its genetic privacy, Skloot is implying that these are one and the same – that we should give ANY family the right to veto the publication of a relative’s genome.

I don’t know if Skloot is actually implying this.* If so, then I disagree with her on this, as I’ve stated in the past. But I do want to emphasize I feel that the op-ed overemphasizes the power of genes. For example: “Imagine if someone secretly sent samples of your DNA to one of many companies that promise to tell you what your genes say about you. That report would list the good news (you’ll probably live to be 100) and the not-so-good news (you’ll most likely develop Alzheimer’s, bipolar disorder and maybe alcoholism).” It’s an op-ed in The New York Times, so I don’t hold this against the author.

But much of the controversy about this stuff would be defused if people were more straightforward about the comparative advantage of genetic information over any other sort of information. If, for example, you tell people that you have a health condition which is known to run in families (e.g., breast cancer), then you’re also telling people about your family. To my knowledge there isn’t a tacit social contract that you need to check with your family before you ever discuss your health status, though I could be out of the loop on this.

* Skloot disputes Eisen’s characterization of what she meant to imply. See her response in Eisen’s comments.

March 17, 2013

Those genius Chinese babies

Filed under: Personal genomics — Razib Khan @ 8:28 pm

I’ve gotten several emails about the Vice interview of Geoffrey Miller on BGI’s Cognitive Genetics Project. It’s a sexy piece, and no surprise given Miller’s fascination with the future of China and science (something I share to a moderate extent). But for the love of God please watch this Steve Hsu video first before reading that.


The problem that seems to crop up with this project, which has been in the works for years, is that any public mention blows up into extreme hyperbole. And yet when I’ve talked to Steve about it he’s often much more modest about the possibilities, even if the ambitions of the people involved in the endeavor are rather grand. I’m also moderately worried that the likely low probability of implementation of the sort of embryo screening for a quantitative trait like intelligence is going to confuse people as to the more probable and ubiquitous application: focusing on large effect deleterious mutations. As a new father I find the frequency of congenital defects just staggering,* and I am highly motivated by this issue. In a China where family size is likely to stay small for the medium term future there will surely be a powerful demand side pressure for children without life altering diseases or abnormalities. Whether that is right or wrong, I am willing to bet that it will be routine reality for the affluent Chinese before the decade is out.

Also, I think Miller overdoes it on how China is overtaking the West in genetics (genomics). There is some caution in some cutting edge domains which might have unpalatable ideological implications, but American universities, and places like the Sanger or Max Planck Institute have a huge store of human capital. In fact I would hazard to predict that for the short to medium term future most of the “blue sky” biological research will still be done outside of China, while the Far East will focus on squeezing as much efficiency and insight as possible out of the basic science pioneered in the West.

* Yes, I know that it’s a 1 out of 30 probability. That’s really, really, high for most expectant parents. Think of a congenital defect as tail risk. Unlikely, but totally devastating when it does occur.

March 7, 2013

High likelihood that my daughter does not have an autosomal dominant condition

Filed under: D.T.C. Personal Genomics,Personal genomics — Razib Khan @ 5:21 pm

After my previous post my wife started doing research online. The autosomal dominant condition that I have is almost certainly localized to one particular chromosome (there is a large effect QTL there that is strongly associated with my condition). Additionally, I inherit this condition from my mother. My daughter has her whole pedigree genotyped, thanks to 23andMe. My wife went into the Family Inheritance feature, and compared the identity by descent blocks shared between my mother and my daughter. And, it turns out that on that chromosome the only segments inherited from me, her father, come from my father. Ergo, she can not have inherited the autosomal dominant condition from my mother, since she did not inherit those alleles from her!

We are very happy right now. This is one reason I don’t really care about what the F.D.A. thinks about direct-to-consumer personal genomics. We’re talking about commodity technology. And no one is going to stand between you and your health, if you are motivated.

Addendum: With hindsight I could have figured this out myself a year ago. It just hadn’t crossed my mind.
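
The logic of the inference in this post can be sketched in Python. This is only an illustration under stated assumptions: the segment coordinates, the chromosome, and the names `Segment`, `overlaps`, and `could_carry_allele` are all hypothetical, and it assumes that the identity by descent (IBD) segments are called accurately and that the causal variant lies inside the stated interval. It is not a description of how 23andMe's Family Inheritance feature works internally.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # base-pair position, inclusive
    end: int    # base-pair position, inclusive


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def could_carry_allele(ibd_with_carrier, locus: Segment) -> bool:
    """True if any IBD segment shared with the carrier grandparent covers the locus."""
    return any(overlaps(seg, locus) for seg in ibd_with_carrier)


# Hypothetical interval around the large effect QTL (made-up coordinates).
locus = Segment("7", 40_000_000, 45_000_000)

# Hypothetical IBD segments shared between the child and each paternal grandparent.
ibd_with_grandmother = [
    Segment("1", 10_000_000, 60_000_000),
    Segment("12", 5_000_000, 30_000_000),
]
ibd_with_grandfather = [Segment("7", 1_000_000, 150_000_000)]

print("shares locus with grandmother:", could_carry_allele(ibd_with_grandmother, locus))
print("shares locus with grandfather:", could_carry_allele(ibd_with_grandfather, locus))
```

If no segment shared with the carrier grandparent overlaps the locus, the child cannot have received that grandparent's copy of the region, which is the whole argument.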

March 3, 2013

Confidence in inference in phylogenetic data sets

Filed under: Personal genomics,phylogenetics — Razib Khan @ 11:20 am

A few weeks ago I put up a new data set into my repository. As is my usual practice now the populations can be found in the .fam file. But I’ve added more into this. I have to rewrite my ADMIXTURE tutorial soon, so I thought I would bring up an important issue when interpreting these data sets using clustering methods: one has to understand that conclusions can not rest on one single result. Rather, one must attempt to ascertain the statistical robustness of the results. If you arrive at an expected result this is obviously not as important a consideration, but if you arrive at a novel and surprising result, then you have to make sure that it isn’t simply a fluke.

To do this I have been running my PHYLOCORE data set with cross-validation (regular 5-fold). In theory you should be able to see where the value is minimized, and that is your “best” K. But, my personal experience with running ADMIXTURE and STRUCTURE is that the inferred plausibility of a given K derived from the statistic can itself be quite volatile. In other words, it is best to run replicates of a data set when attempting to assess robustness. I’m going to run PHYLOCORE 50 times, but I already have 10 runs.

The results are plotted below

It seems that the best fit to these data is in the K = 10 to 15 range. But notice that the results for K < 10 are not very volatile. There are 10 points, but at K = 5 for example they totally overlay. The higher the number of populations that the algorithm attempts to infer, the more volatile the cross-validation results are.

Zooming in on the plot you notice that not only does K = 13 have the minimum cross-validation error, but seems to exhibit the least volatility. I suspect that this result will hold, but you never know. The point is not to establish hard and fixed rules. It is to be explicit in the guidelines of how to interpret results, which can be quite varied depending upon the input parameters you begin with.
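
For those who want to try the same kind of check, here is a minimal Python sketch of summarizing cross-validation error across replicate runs. It assumes each run's log contains lines of the form `CV error (K=13): 0.5490`, which is my understanding of what ADMIXTURE prints when cross-validation is enabled; verify that against your own logs. The numbers below are made up for illustration and are not the PHYLOCORE results.

```python
import re
import statistics
from collections import defaultdict

# Assumed log line format: "CV error (K=13): 0.5490"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def collect_cv_errors(log_texts):
    """Gather CV errors per K from the text of several replicate logs."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return errors


def summarize(errors):
    """Return (K, mean CV error, standard deviation) rows sorted by K."""
    rows = []
    for k in sorted(errors):
        vals = errors[k]
        spread = statistics.stdev(vals) if len(vals) > 1 else 0.0
        rows.append((k, statistics.mean(vals), spread))
    return rows


# Made-up output from three replicate runs with different random seeds.
logs = [
    "CV error (K=12): 0.5510\nCV error (K=13): 0.5490\nCV error (K=14): 0.5530",
    "CV error (K=12): 0.5530\nCV error (K=13): 0.5492\nCV error (K=14): 0.5500",
    "CV error (K=12): 0.5500\nCV error (K=13): 0.5488\nCV error (K=14): 0.5560",
]

rows = summarize(collect_cv_errors(logs))
best_k = min(rows, key=lambda row: row[1])[0]

for k, mean, sd in rows:
    print(f"K={k}: mean CV error {mean:.4f}, sd {sd:.4f}")
print("lowest mean CV error at K =", best_k)
```

Looking at the standard deviation next to the mean is the point: a K with a slightly lower mean but a much wider spread across replicates is a weaker candidate than the mean alone would suggest.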

Addendum: The seed is random, for those who are curious.

February 28, 2013

What is wrong with some ancestry testing

Filed under: Personal genomics — Razib Khan @ 11:27 am

This is an example of the type of question I receive all the time:

Here is some genetic analysis of Somalis from yours truly. I don’t necessarily blame the public here, as the marketing of Y and mtDNA lineages has really gotten out of control recently. The problem is that the fine print that Y and mtDNA follow only one direct line of descent is usually there. But, it is accompanied by rich visual and narrative media that tells a story about that marker, and it is this that is salient for most, not the fact that the story being told is only a very small part of the overall epic cycle that is your genealogy.

(Also, in population genetics using the word “Caucasian” is really confusing. G2 can often be thought of as a Caucasian haplogroup, but I don’t think that that’s what my correspondent meant)

February 25, 2013

The abuse of ancestry testing is bad for personal genomics

Filed under: Genomics,Personal genomics — Razib Khan @ 12:20 pm

There is very little with which I can disagree in this Mark Thomas piece, To claim someone has ‘Viking ancestors’ is no better than astrology. His conclusion:

Exaggerated claims from the consumer ancestry industry can also undermine the results of serious research about human genetic history, which is cautiously and slowly building up a clearer picture of the human past for all of us.

Many of the commercial companies plant stories in the media that sound exciting and seem scientific. But very often they are trivial or wrong, are not published in peer-reviewed scientific journals, and just serve as disguised PR for the company.

The only caveat I would offer is that the sort of confusions and misrepresentations that occur with Y and mtDNA phylogeography are dampened when you are looking at a million markers throughout the whole genome. This does not mean there are still no confusions and misrepresentations (e.g., the reference populations matter a great deal when you present someone as a linear combination of X populations, and that summary is still not reality as such, but an informative model). One alarming aspect of the trade in Y and mtDNA is that I’ve met several people who somehow believe that only these lineages are ancestrally informative. That is probably a function of the ease with which you can say someone is “descended from Niall of the Nine Hostages.”

Addendum: I actually asked Jim Wilson on Twitter if I could get a look at the raw results (not even raw data) for the claims made. One major problem when scientists have a go-to-media-first strategy is that things get out of hand very quickly.

February 20, 2013

The age of non-invasive pre-natal testing is here

Last summer Neuroskeptic posted on The Coming Age of Fetal Genomics. It seems likely to me that this “age” won’t be ushered in with a bang, but we’ll be there before we know it. After all, most people aren’t thinking about having children at any given moment, and don’t track biomedical advances in genetic disease screening until they’re crossing that bridge. Over at Xconomy Luke Timmerman has a post up, Natera Joins Quest in Four-Way Battle for Prenatal Genetic Tests. Here are some important details:

…The company uses the same basic instrument to analyze DNA—Illumina’s HiSeq—but it has been tailored differently to look for about 20,000 different single nucleotide polymorphisms (SNPs) in the mom’s blood sample. The workflow is pretty simple—a clinician takes a blood sample from the mom, it gets shipped to Natera’s centralized lab in San Carlos, CA, and results are sent back to the physician and patient in two weeks.

After meeting with executives from all four major players in the non-invasive prenatal genetic testing market, and many physicians at the Society for Maternal-Fetal Medicine meeting, JP Morgan analyst Tycho Peterson wrote that the market appears poised to take off.

“In stark contrast to just a few years ago, NIPT [non-invasive prenatal genetic testing] is now widely understood and used by the maternal-fetal medicine community, which tends to be an early adopter of new technology and often sees high-risk patients,” Peterson wrote in a note to clients Feb. 18. Still, he cautioned there are “widely divergent views” among physicians about the appropriate use of the technology, particularly on whether it should be expanded beyond high-risk pregnancies and into more mainstream usage.

For readers of this weblog who are interested in this sort of thing I think the key thing to note is that it doesn’t matter what your doctor’s views are; just find a doctor who will align themselves with your views. You’re paying for it, and you are going to raise your children, not them.

And by the way, 20,000 SNPs is a decent amount to work with if you got the raw data. It would be good enough probably for inferring the identity-by-descent from various grandparents and such.

February 18, 2013

1 out of 100 benefit from whole genome sequencing

Filed under: Personal genomics,Whole Genome Sequencing — Razib Khan @ 3:10 pm

Perhaps. The New York Times has a piece out reviewing the vogue for sequencing the genomes of children who have mysterious diseases. The numbers are what matters here I think:

A few years ago, this sort of test was so difficult and expensive that it was generally only available to participants in research projects like those sponsored by the National Institutes of Health. But the price has plunged in just a few years from tens of thousands of dollars to around $7,000 to $9,000 for a family. Baylor College of Medicine and a handful of companies are now offering it. Insurers usually pay.

Demand has soared — at Baylor, for example, scientists analyzed 5 to 10 DNA sequences a month when the program started in November 2011. Now they are doing more than 130 analyses a month. At the National Institutes of Health, which handles about 300 cases a year as part of its research program, demand is so great that the program is expected to ultimately take on 800 to 900 a year.

Experts caution that gene sequencing is no panacea. It finds a genetic aberration in only about 25 to 30 percent of cases. About 3 percent of patients end up with better management of their disorder. About 1 percent get a treatment and a major benefit.

It seems this is a floor in terms of outcomes for these children, as some of them may receive better or more effective treatments in the future, because the specific nature of their disease is already known. Since most medical treatments today are marginal in effect these outcomes don’t surprise or depress me, and the price point is sure to come down. In the near future I imagine that everyone will have a whole genome sequence, and relevant information about your specific genetic profile in relation to the sea of biomedical literature constantly coming out may be sent to you in a drip-drip fashion by a phone or web app.

January 27, 2013

The importance of open data in genomics (and in everything!)

Filed under: Personal genomics — Razib Khan @ 4:22 pm

Yesterday a friend of mine who happens to be of doughty German and Scandinavian upper Midwest stock messaged me on Facebook and explained that her father’s results for 23andMe had come in…and he was 43 percent Sub-Saharan African! Her mother’s results came in a few hours later, and she was 35 percent Sub-Saharan African. I went to my account, and my parents were also in the same range. Oh my, overnight I became an underrepresented minority! Obviously this was a bug. The key clause is obviously. There are people who receive results suggesting that they are 5 percent Sub-Saharan African and such. Or someone like Dan MacArthur, who has likely South Asian ancestry, but in the 1-2 percent range.

How do you know that these results are not bugs? You analyze the raw data. Those who have skills with Plink or Admixture can double check easily, as I did with Dan MacArthur. Even if you don’t have that particular skill set, just use a service like Interpretome or GEDmatch. In this way you can use a range of statistical analyses to see if they reproduce concordant signals. Replication of this sort is essential. Methods don’t give you truth, they give you results which you can use to assess the likely shape of the truth.
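As a rough illustration of what checking for concordant signals could look like, here is a minimal Python sketch. It assumes the ancestry proportions for one person have already been collected by hand from several independent analyses. The tool labels, component names, numbers, and the 5-point tolerance are all hypothetical and chosen only for illustration; nothing here is the output format of any real service.

```python
from statistics import mean

def concordance(estimates, tolerance=0.05):
    """Compare ancestry proportions reported by several analyses.

    estimates: dict mapping analysis name -> dict of component -> proportion.
    Returns a dict mapping component -> (mean, spread, concordant),
    where spread is the largest minus the smallest reported proportion.
    """
    components = set()
    for props in estimates.values():
        components.update(props)
    report = {}
    for comp in sorted(components):
        # A component an analysis does not report is treated as 0.
        values = [props.get(comp, 0.0) for props in estimates.values()]
        spread = max(values) - min(values)
        report[comp] = (mean(values), spread, spread <= tolerance)
    return report

# Hypothetical numbers: one analysis disagrees sharply with the other two.
runs = {
    "vendor_report": {"European": 0.57, "Sub-Saharan African": 0.43},
    "own_admixture_run": {"European": 0.99, "Sub-Saharan African": 0.01},
    "third_party_tool": {"European": 0.98, "Sub-Saharan African": 0.02},
}

for comp, (avg, spread, ok) in concordance(runs).items():
    flag = "concordant" if ok else "DISAGREEMENT, look closer"
    print(f"{comp}: mean={avg:.2f} spread={spread:.2f} {flag}")
```

A large spread does not say which analysis is wrong, only that the signal failed to replicate and the raw data deserve a closer look.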

This is why I was so hard on Ancestry.com and its lack of raw data downloads. You can’t just trust one particular firm to give you perfect analytic accuracy, they are not as gods. Your genetic information is too important to outsource to one company in terms of interpretation. If you have the skills there is no excuse to not go DIY. If you don’t have the skills you need a diverse portfolio of opinions, inferences, and assessments.

January 15, 2013

Do you want your genotype in a public data set?

Filed under: Personal genomics — Razib Khan @ 11:54 pm

In the near future one of my projects is revising and expanding the “PHYLO” pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100-200,000) with a wide population coverage. Additionally, I am going to do a few things like rename the family ids to populations, and also release it with scripts to help in running Admixture (for example, shell scripts which will automate replication and later analysis of replicates). Finally, I’m planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the “best” K’s. The reality is that most people probably are only interested in the “most informative” K, +/- 1, so there’s no need for everyone to run K = 2 to K = 20. The time saved should be used on running replicates, and then CLUMPP to merge the results.
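The replicate plan above could be organized along the following lines. This is only a sketch in Python, not the scripts mentioned in the post. It assumes that Admixture is invoked as `admixture --cv=10 -s SEED input.bed K` and that each log contains a line of the form `CV error (K=3): 0.55`; both should be checked against the Admixture manual for the installed version. The file name `PHYLO.bed`, the seed scheme, and the log naming are hypothetical. The sketch only builds the command lines and summarizes log text, it does not run anything.

```python
import re
from collections import defaultdict
from statistics import mean

def admixture_commands(bed_file, k_values, n_replicates, cv_folds=10):
    """Yield (log_name, command) for every K and replicate.

    Assumption: outputs are named after the input file and K, so each
    replicate should be run in its own directory to avoid overwriting.
    """
    for k in k_values:
        for rep in range(1, n_replicates + 1):
            seed = 1000 * k + rep  # hypothetical scheme: a distinct seed per run
            cmd = ["admixture", f"--cv={cv_folds}", "-s", str(seed),
                   bed_file, str(k)]
            yield f"K{k}_rep{rep}.log", cmd

# Assumed format of the cross-validation line in an Admixture log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_texts):
    """Return (K with lowest mean CV error, dict of K -> mean CV error)."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    mean_err = {k: mean(v) for k, v in errors.items()}
    return min(mean_err, key=mean_err.get), mean_err

plan = list(admixture_commands("PHYLO.bed", range(2, 21), 50))
print(len(plan), "runs planned; first command:", " ".join(plan[0][1]))
```

Once a few replicates have finished, passing their log contents to `best_k` gives a first guess at the most informative K, and later replicates can be concentrated on that K and its neighbors.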

I would say that this is for ‘amateurs’ only, but I don’t think it’s betraying confidence to observe that several academic researchers at prominent institutions have ended up asking me how to get good public data sets. This sort of information still hasn’t percolated to the general public, including scientists who don’t work on population genomics. After a few trial runs with public data sets people with academic access could move to things like the POPRES data set.

But the ultimate point of this post is to ask: do you want to be in this data set? If so, I need the file (23andMe format is fine, otherwise, pedigree files only), your name, and some minimal ethnic information. I’m not going to add everyone. I just want to diversify the public data set a little. But I am going to put names in the sample sheet, so you won’t have anonymity. As you know I don’t particularly care about this personally, but your mileage may vary. Researchers might need to contact or check that people are who they are.

Email: contactgnxp -at- gmail -dot- com

Laura Hercher convinces me there is no non-self interested case for genetic paternalism

Filed under: Personal genomics — Razib Khan @ 10:56 pm

Over at David Dobbs’ weblog Laura Hercher has a guest post up with the heading The Case for Selective Paternalism in Genetic Testing. Here are some relevant sections:

Which brings me back to this issue of paternalism. I agree that it makes no sense to put up obstacles for inquisitive and motivated individuals who wish to query their genome for information, however laced with uncertainty or peril. But forgive us if our first thoughts are often about how to help (yes, and to protect) the patients we see, in the medical setting. Science literacy is rare. The desire to use web-based tools to analyze their own DNA sequence is vanishingly rare. And a sentence like “Your risk of type II diabetes is decreased by the allele that you carry, in a gene that accounts for an estimated 1.5% of the heritability of the disease” is regularly interpreted as “You will not get type II diabetes.” So we worry about the effect that getting this information may have on the people who live where the sky is blue and the sun is yellow. Sue us.

So, yes – more information, not less, is the way of the future, for so many reasons. But I will throw in a plea for understanding that sometimes the opposition is not merely protecting an information fiefdom, but responding to their own previous experience. Sometimes, I get a little protective. I guess that’s paternalism. I plead guilty – guilty, with an explanation.


Actually, I don’t think that’s paternalism. Hercher’s whole post is bathed in a sentiment of protective paternalism, but there’s really little concrete response and rejection of the positions of those of us who lean toward greater information dissemination. In fact I’m even more convinced that genuine defenders of a paternalistic position are acting out of arrogance and self-interest. Their arguments are qualitatively different from the ones that Hercher offers above.

Unlike many people I am willing to cop to the reality that most people are quite stupid and ignorant. In fact, I would class all of the human race aside from analytical engines like Ed Witten in this category. But on a finer scale 25% of Americans don’t read any books in a given year, and individuals with undergraduate degrees in genetics can be under the mistaken impression that ancestral information is encoded only in the Y and mtDNA, rather than the whole genome (true story!). But this widespread ignorance and stupidity is one reason that I favor open access to genetic information: many ‘gatekeepers’ are no more well informed. Additionally, in a world of probiotics and nutritional supplements holding genetic information back in the interests of the public’s health due to their credulity strikes me as like holding back the waters of a cracking dam with one’s fingers. Rather than censor or cordon the information off, better ways have to be developed to appropriately communicate the information to the public. Understanding is an important good, but more critically in the area of health people just need to make the choices most optimal for them.

The example of Huntington’s disease is I think an important one, as it reiterates the importance of professionals who can facilitate the flow (or lack thereof if the possible patients so choose!) of information. As Americans we can buy and sell our own stocks, but often people rely on intermediaries and specialists. There’s no mandate, but it is often a mark of seriousness to consult experts. Granted, there will always be those who foolishly plunge headlong into adventures without any competence, but so it has always been.

Addendum: I am actually more open to paternalism than you might think. But paternalism has to be expressed and implemented on a macrosocial level. Until that happens I am curious why people would want to make exceptions for the ideal of self-actualizing individualism in cases where DNA is at issue.

December 18, 2012

Buddy, can you spare some ascertainment?

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility of and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only >100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and they look rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. 100-200,000 SNPs is also sufficient to elucidate relationships even in genetically homogeneous regions such as Europe in my experience (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.
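The point about merging panels can be made concrete with a small sketch. It assumes each panel is described by a PLINK-style .bim file, where the second whitespace-separated column is the variant ID, and it simply intersects the IDs. The toy lines below are invented for illustration, and a real merge would also need quality control plus strand and allele checks, which shrink the overlap further.

```python
def snp_ids(bim_lines):
    """Collect variant IDs (second column) from PLINK .bim-style lines."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids

def shared_snps(*panels):
    """Return the variant IDs present in every panel."""
    id_sets = [snp_ids(panel) for panel in panels]
    return set.intersection(*id_sets) if id_sets else set()

# Toy panels with invented IDs; real chips carry hundreds of thousands of lines.
panel_a = ["1\trs1\t0\t1000\tA\tG", "1\trs2\t0\t2000\tC\tT", "2\trs3\t0\t500\tG\tA"]
panel_b = ["1\trs2\t0\t2000\tC\tT", "2\trs3\t0\t500\tG\tA", "3\trs4\t0\t750\tT\tC"]

common = shared_snps(panel_a, panel_b)
print(len(common), "SNPs survive the merge:", sorted(common))
```

With real files, each panel could be passed as an open file handle, since the functions only iterate over lines.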

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many-fold greater. On the other hand, Geno 2.0 may be more useful on the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

December 17, 2012

Buyer beware in ancestry testing!

Filed under: Personal genomics — Razib Khan @ 10:20 pm

Over at Genomes Unzipped Vincent Plagnol has put up a post, Exaggerations and errors in the promotion of genetic ancestry testing, which to my mind is an understated and soft-touch old-fashioned “fisking” of the pronouncements of a spokesperson for an outfit termed Britain’s DNA. The whole post is worth reading, but this is a very grave aspect of the response of the company:

…The main reason is that listening to this radio interview prompted my UCL colleagues David Balding and Mark Thomas to ask questions to the Britain’s DNA scientific team; the questions have not been satisfactorily answered. Instead, a threat of legal action was issued by solicitors for Mr Moffat. Any type of legal threat is an ominous sign for an academic debate. This motivated me to point out some of the incorrect, or at the very least exaggerated, statements made in this interview. Importantly, while I received comments from several people for this post, the opinion presented here is entirely mine and does not involve any of my colleagues at Genomes Unzipped.

From what I can gather this firm is charging two to three times more than 23andMe for state-of-the-art scientific genealogy, circa 2002. So if you can’t be bothered to read the piece, it looks like Britain’s DNA is threatening litigation for researchers having the temerity to point out that the firm is providing substandard services at above-market costs. Plagnol’s critique lays out a point-by-point refutation of assertions, but the interpretation services on offer seem to resemble nothing more than genetically rooted epic fantasy. A triumph of marketing over science.


In other scientific genealogy news, a friend recently sent me results for his family from Ancestry.com’s AncestryDNA service. Looking at the pie-charts, I can say one thing: they were whack! But the question then is are they truly just whack, or does their peculiarity indicate real genetic insight? I have no way to judge, because they still aren’t providing raw data downloads, though they promise to soon. I actually talked to a scientist from Ancestry.com for a little while at ASHG 2012, and he claimed that they were tweaking the algorithms as we were speaking. Nevertheless, bizarre results still seem to abound. It would be nice to figure out the method to this madness.

Finally, the genomic angle to the Dan MacArthur → Dan MacCurry saga is approaching closure. My friend Zack Ajmal promises to put up his analysis before he goes on vacation. I asked Zack to look into the matter because he has a very large database of South Asians, and I wanted to see if he could find the best match to Dan’s chromosome 10. If it does turn out that it is highly probable that Dan’s South Asian ancestry is Bengali, then I’ll have to make sure he’s introduced to the aloo bhorta which his ancestors no doubt relished (and which is unpalatable to people of other South Asian ethnic groups because of the mustard oil).

December 11, 2012

$99 for 1 million markers

Filed under: 23andMe,Personal genomics — Razib Khan @ 8:18 am

Looks like 23andMe has a new $99 price point. If so, that’s 100 markers per cent! (here’s the press release)

1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.

2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.
