Razib Khan One-stop-shopping for all of my content

April 2, 2017

The future shall, and should, be sequenced

Filed under: Genomics,GWAS,Human Genetics — Razib Khan @ 10:32 pm

Last fall I talked about a preprint, Human demographic history impacts genetic risk prediction across diverse populations. It’s now published in AJHG, with the same informative title, Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Even though I talked about this before, I thought it would be useful to highlight it again.

To recap, GWAS is a pretty big deal, but only in the last 15 years or so. With genome-wide data researchers began to explore associations between diseases and population genetic variation. In some cases they discovered strong associations between characteristics and genetic variants, but in many cases it turned out that though a trait is highly heritable (e.g., schizophrenia) the causal variants are either not common or do not explain much of the variation in the population (or both).

But as the second decade of GWAS proceeds the sample sizes are getting larger, and researchers are moving from SNP-chips, with their various biases, to high quality whole-genome sequences. In the minds of many people, rare variants are one of the major sorts of low-hanging fruit. Basically SNP-chips are geared toward finding common variations within large populations, since they have a finite number of markers they are going to interrogate. Sequencing though is a comprehensive catalog of the genome in a relative sense. If you have high coverage (so you sample the site many times) you can easily discover rare mutations within an individual genome that make them distinct from almost all the rest of the human race (these may be de novo mutations, or they could be mutations private to their extended pedigree).

But context matters. Martin et al. find that confirmed GWAS hits in Europeans tend to exhibit decreased portability as a function of genetic distance. This isn’t entirely surprising, especially if rarer variants are part of the explanation. Rare variants usually emerged later in history, after the differentiation between geographic races.
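To make "portability" concrete, here is a minimal sketch of how a polygenic risk score is usually put together: for each variant, multiply the number of copies of the effect allele by the effect size estimated in the GWAS, then add everything up. The variant names, effect sizes, and genotypes below are made up for illustration; none of them come from Martin et al. The portability problem is that the effect sizes are estimated in one population (mostly Europeans) and then applied to genotypes from another.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of (effect-allele dosage x per-allele effect size) over shared variants.

    dosages: dict mapping variant ID -> 0, 1, or 2 copies of the effect allele.
    effect_sizes: dict mapping variant ID -> effect size from a discovery GWAS.
    Variants missing from either dict are skipped.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(dosages[v] * effect_sizes[v] for v in shared)


# Hypothetical effect sizes from a discovery GWAS (illustrative numbers only).
effect_sizes = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 1.0}

# Hypothetical individual genotype, coded as effect-allele dosage.
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

print(polygenic_score(person, effect_sizes))
```

Nothing in the arithmetic knows about ancestry. If the discovery variants are rare, absent, or tag the causal variant differently in the target population, the same sum simply carries less information.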

A solution would be to have a diverse panel of populations in your studies. For many reasons this was not to be. Northwest Europeans are enormously enriched in current data sets. Martin et al. observe that recently this has diminished somewhat, from 95% European to less than 80%. But they observe that this is mostly due to the inclusion of “Asian” samples, as opposed to Africans and Native Americans, who remain as underrepresented as they were several years ago.

The African and Native American samples present somewhat different problems. The Native American groups are quite drifted due to bottlenecks. Likely they have their own variants due to the combined effects of mutation and selection through 15 to 20,000 years of isolation from other human populations. In contrast, the African groups have lots of diversity with a high time depth due to their ancestral histories, which are less subject to bottleneck effects. The predictive ability of current GWAS in Africans looks to be rather pathetic. This is reasonable because their diversity is poorly captured in Eurocentric study designs, and they are more genetically diverged from Europeans than Asians are.

Ultimately I think, and hope, this portability question will be of short term utility. As sequencing gets cheap, and studies become more numerous, we’ll fill in the gaps of understudied populations. Finally, ethics is above my paygrade, but I do hope those who demand a strenuous bar on consent keep in mind that that will result in slower growth of these study populations. Academics want to do a good job, but they also want to stay on the good side of IRB.

Citation: Martin, Alicia R., et al. “Human demographic history impacts genetic risk prediction across diverse populations.” bioRxiv (2016): 070797.

November 4, 2012

Inflammatory bowel syndrome is nature’s side effect

Last week Luke Jostins (soon to be Dr. Luke Jostins) published an interesting paper in Nature. To be fair, this paper has an extensive author list, but from what I understand this is the fruit of the first author’s Ph.D. project. In any case, you may know Luke because I have used his loess curve on hominin encephalization for years. His bread & butter is statistical genetics, and it shows in this Nature paper. God knows how he managed to cram so much density into ~5.5 pages of plain text. Luke is also a contributor to Genomes Unzipped, and has put up a post over there on one implication of the paper, Dozens of new IBD genes, but can they predict disease? The short answer is that for individual prediction complex traits are going to be a hard haul over the long term.*

They are subject to what Jim Manzi would term “high causal density.” A simple way to state this is that outcome X is dependent on a host of variables, and if you capture only a small number of variables, you aren’t going to be explaining much in a general ...

January 19, 2011

The rise of genetic architecture

In science, like most things, one prefers simple over complex whenever possible. You keep adding variables until the explanatory juice starts hitting diminishing marginal returns. So cystic fibrosis is due to a mutation at one gene, and the disease expresses recessively at that locus. The reality is that one mutation accounts for ~65-70% of cystic fibrosis cases around the world, and there are nearly 1,400 known mutations on the CFTR locus. How about skin color? Mutations on a dozen genes can probably explain ~90% of the variance in the trait value across the world between populations. In fact, one single mutation on one base pair can explain ~30-40% of the trait value difference between Europeans and Africans. This is a more complex story than cystic fibrosis; you have not just many mutations, but many mutations across many genes. But the numbers of genes and mutations are manageable. You can keep track of most of them in your head (e.g., I can tell you that SLC24A5, SLC45A2, KITLG, and HERC2 can explain most of the trait value difference between Africans and Europeans without looking it up).


January 18, 2011

Synthetic associations and all that

Filed under: Genetics,Genomics,GWAS,synthetic associations — Razib Khan @ 2:28 pm

PLoS Biology has four items of great interest out today:

1. Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results
2. Synthetic Associations Are Unlikely to Account for Many Common Disease Genome-Wide Association Signals
3. The Importance of Synthetic Associations Will Only Be Resolved Empirically
4. Common Disease: Are Causative Alleles Common or Rare?

These are a response to last year’s paper on synthetic associations from the Goldstein lab. Here’s a critique of that paper. I plan on reviewing the first in the list above soon. #3 is a response to #1 and #2 from David Goldstein, while #4 is a summation more aimed at the general audience.

September 29, 2010

Every variant with an author!

I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits1, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait2, 3. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on though if you can’t get beyond the paywall.


In this study they pooled together several studies into a meta-analysis. One thing not mentioned in the abstract: they checked their GWAS SNPs against a family-based study. This was important because in the latter population stratification isn’t an issue. Family members naturally overlap a great deal in their genetic background. Also, if I read it correctly they’re focusing on populations of European origin, so this might not capture larger effect alleles which impact between population variance in height but don’t vary within a given population (note that if you explored pigmentation genetics just through Europeans you would miss the most important variable on the worldwide scale, SLC24A5, because it’s fixed in Europeans). In any case, as you can see, what they did was extrapolate out the number of loci which their methods could capture to explain variation, with the predictor being the sample size. At 500,000 individuals they’re at ~700 loci, and around 20% of the heritable variation. My initial thought is that I’m not seeing diminishing returns here, but since I haven’t read the supplements I’ll let that pass since I don’t know the guts of this anyhow. They do assert that they are likely underestimating the power of these methods because there may be smaller effect common variants which can top off the fraction.

But even they admit that they can go only so far. Here are some sections from the conclusion that lay it out pretty clearly:

By increasing our sample size to more than 100,000 individuals, we identified common variants that account for approximately 10% of phenotypic variation. Although larger than predicted by some models26, this figure suggests that GWA studies, as currently implemented, will not explain most of the estimated 80% contribution of genetic factors to variation in height. This conclusion supports the idea that biological insights, rather than predictive power, will be the main outcome of this initial wave of GWA studies, and that new approaches, which could include sequencing studies or GWA studies targeting variants of lower frequency, will be needed to account for more of the ‘missing’ heritability. Our finding that many loci exhibit allelic heterogeneity suggests that many as yet unidentified causal variants, including common variants, will map to the loci already identified in GWA studies, and that the fraction of causal loci that have been identified could be substantially greater than the fraction of causal variants that have been identified.

In our study, many associated variants are tightly correlated with common nsSNPs, which would not be expected if these associated common variants were proxies for collections of rare causal variants, as has been proposed27. Although a substantial contribution to heritability by less common and/or quite rare variants may be more plausible, our data are not inconsistent with the recent suggestion28 that many common variants of very small effect mostly explain the regulation of height.

In summary, our findings indicate that additional approaches, including those aimed at less common variants, will likely be needed to dissect more completely the genetic component of complex human traits. Our results also strongly demonstrate that GWA studies can identify many loci that together implicate biologically relevant pathways and mechanisms. We envisage that thorough exploration of the genes at associated loci through additional genetic, functional and computational studies will lead to novel insights into human height and other polygenic traits and diseases.

The second to last paragraph takes a shot at David Goldstein’s idea of synthetic associations.

We’re still where we were a few years back though: old-fashioned Galtonian quantitative genetics, a branch of statistics, is the best bet to predict the heights of your offspring. As with intelligence, “height genes” are not improvements upon common sense. But if you’re going into the 10-20% range of variation explained it’s certainly not trivial, and the biological details are going to be of interest.
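The percentages in this post are easy to mix up, because some are fractions of phenotypic variation and others are fractions of heritable variation. A quick sketch of the conversion, using only the figures quoted from the paper (10% found, 16% projected, and the estimated 80% contribution of genetic factors to height); the function name is mine, and this is just division, not anything from the authors’ methods:

```python
def share_of_heritable(phenotypic_fraction, heritability):
    """Convert a fraction of phenotypic variance into a fraction of heritable variance."""
    return phenotypic_fraction / heritability


heritability = 0.80  # estimated genetic contribution to height, as quoted from the paper
found_now = 0.10     # phenotypic variation explained by the identified variants
projected = 0.16     # projected figure including unidentified common variants

print(round(share_of_heritable(found_now, heritability), 3))
print(round(share_of_heritable(projected, heritability), 3))
```

So the current hits cover about an eighth of the heritable variation, and the projected 16% of phenotypic variation corresponds to the abstract’s “approximately 20% of heritable variation.”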

July 19, 2010

Genome-wide association for newbies

Filed under: genome-wide association,Genomics,GWAS — Razib Khan @ 2:20 pm

It looks like Genomes Unzipped has their own Mortimer Adler, with an excellent posting, How to read a genome-wide association study. For those outside the biz I suspect that #4, replication, is going to be the easiest. In the early 2000s a biologist who’d been in the business for a while cautioned about reading too much into early association results which were sexy, as the same had occurred when linkage studies were all the vogue, but replication was not to be. Goes to show that history of science can be useful on a very pragmatic level. It can give you a sense of perspective on the evanescent impact of some techniques over the long run.
