Razib Khan One-stop-shopping for all of my content

August 10, 2011

Crohn’s disease is about barely keeping you alive

The Pith: Natural selection is a quick & dirty operator. When subject to novel environments it can react rapidly, bringing both the good and the bad. The key to successful adaptation is not perfection, but being better than the alternatives. This may mean that many contemporary diseases are side effects of past evolutionary genetic compromises.

The above is a figure from a recent paper, Crohn’s disease and genetic hitchhiking at IBD5, which just came out in Molecular Biology and Evolution. You have probably heard of Crohn’s disease before; hundreds of thousands of Americans are afflicted with it. It’s an inflammatory bowel ailment, and it can be debilitating even in very young people. Its prevalence also varies quite a bit by population. Why? It could be something in the environment (e.g., a different diet), genetic predisposition, or some combination. What the figure above purports to illustrate is the correlation between Crohn’s disease and the expansion of the agricultural lifestyle.

But don’t get overexcited, Paleos! There are many moving parts to this story, and I need to back up to the beginning. The tens of thousands of ...

June 12, 2011

You are a mutant!

The Pith: You are expected to have 30 new mutations which differentiate you from your parents. But there is wiggle room around this number, and you may have more or fewer. The number may vary across siblings, and may help explain differences between them. Additionally, previously used estimates of mutation rates may have been too high by a factor of 2. This may push the “last common ancestor” of many human and human-related lineages back in time by a factor of 2.

There’s a new letter in Nature Genetics on de novo mutations in humans which is sending the headline writers in the press into a natural frenzy trying to “hook” the results into the X-Men franchise. I implicitly assume most people understand that they all have new genetic mutations specific and identifiable to them. The important issue in relation to “mutants” as commonly understood is that they have salient identifiable phenotypes, not that they have subtle genetic variants which are invisible to us. Another implicit aspect is that phenotypes are an accurate signal or representation of high underlying mutational load. In other words, if you can see that someone is weird in ...
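The two numbers in the Pith can be sanity-checked with a minimal sketch. If de novo mutations are treated as independent rare events, the count per person is roughly Poisson, so an expectation of 30 comes with a standard deviation of about 5.5. And under a strict molecular clock the per-site divergence between two lineages is about 2μT, so halving the mutation rate μ doubles the inferred time T. The Poisson model, the divergence value, and both mutation rates below are illustrative assumptions, not figures from the Nature Genetics letter.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def prob_in_range(lo: int, hi: int, lam: float) -> float:
    """P(lo <= X <= hi) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(k, lam) for k in range(lo, hi + 1))


def divergence_time(divergence_per_site: float, mu_per_site_per_gen: float) -> float:
    """Generations since two lineages split, assuming a strict molecular clock.

    Both lineages accumulate mutations independently, so the observed
    per-site divergence d is about 2 * mu * T, giving T = d / (2 * mu).
    """
    return divergence_per_site / (2 * mu_per_site_per_gen)


if __name__ == "__main__":
    expected = 30  # the headline number of new mutations per person
    print(f"SD under a Poisson model: {math.sqrt(expected):.1f}")
    print(f"P(25 to 35 new mutations): {prob_in_range(25, 35, expected):.2f}")

    # Hypothetical divergence and mutation rates, chosen only to show the scaling
    d = 0.01
    old_mu, new_mu = 2e-8, 1e-8
    print(divergence_time(d, new_mu) / divergence_time(d, old_mu))  # halving mu doubles T
```

Because T is inversely proportional to μ, any factor-of-2 revision in the rate carries straight through to the dates.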

February 14, 2011

Who are those Houston Gujus?

The figure to the left is a three-dimensional representation of principal components 1, 2, and 3, generated from a sample of Gujaratis from Houston and Chinese from Denver. When these two populations are pooled together the Chinese form a very homogeneous cluster: they don’t vary much across the three top explanatory dimensions of genetic variance. In contrast, the Gujaratis do vary. This is not surprising. In the supplements of Reconstructing Indian population history it was notable that the Gujaratis did tend to shake out into two distinct clusters in the PCAs. This is a finding you see over and over when you manipulate the HapMap Gujarati data set. In reality, there aren’t two equivalent clusters. Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set, and another cluster, “Gujarati_A,” which really just consists of all the individuals who are outside the Gujarati_B cluster. Even when compared to other South Asian populations these two distinct categories persist in the HapMap Gujaratis.

Zack has already identified a major difference between the two clusters: Gujarati_A has some individuals with much more “West Eurasian” ancestry. ...
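For readers curious about what goes into a plot like the one above, here is a minimal PCA sketch using NumPy. It standardizes a 0/1/2 genotype matrix SNP by SNP and takes the top components from a singular value decomposition. The two simulated populations and every parameter value are hypothetical; real analyses also prune SNPs in linkage disequilibrium and apply quality filters, which this skips.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Return each individual's coordinates on the top principal components.

    genotypes: individuals x SNPs matrix of allele counts (0, 1 or 2).
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                     # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                     # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_population(rng, freqs, n_individuals):
    """Draw diploid genotypes under Hardy-Weinberg from per-SNP allele frequencies."""
    return rng.binomial(2, freqs, size=(n_individuals, freqs.size))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_snps = 2000
    # Two hypothetical populations drifting away from a shared ancestral frequency
    ancestral = rng.uniform(0.1, 0.9, n_snps)
    freqs_a = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
    freqs_b = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
    genotypes = np.vstack([simulate_population(rng, freqs_a, 50),
                           simulate_population(rng, freqs_b, 50)])
    pcs = genotype_pca(genotypes)
    print("PC1 group means:", pcs[:50, 0].mean(), pcs[50:, 0].mean())
```

A population that is genetically homogeneous collapses into a tight cloud on these axes, while one with internal structure spreads out, which is the contrast described for the Chinese and Gujarati samples.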

February 8, 2011

Health care costs and ancestry

Filed under: Ancestry,Genetics,Genomics,Health,Medical Genetics,race — Razib Khan @ 1:07 am

The Pith: In this post I examine the relationship between racial ancestry and cancer mortality risks conditioned on particular courses of treatment. I review research which indicates that the amount of Native American ancestry can be a very important signal as to your response to treatment if you have leukemia, as measured by probability of relapse.

If you are an engaged patient who has been prescribed medication, I assume you’ve done your due diligence and double-checked your doctor’s recommendations (no, unfortunately an M.D. does not mean that an individual is omniscient). Several times, when I’ve been prescribed a medication and did further research, I have seen a note about different recommended dosages by race. Because of my own personal background I am curious when it says “Asian.” The problem with this term in medical literature is that “Asian” in the American context is derived from a Census category constructed in 1980 for bureaucratic and political purposes. It amalgamates populations which are genetically relatively close, East and Southeast Asians, with more distant ones, South Asians (when my siblings were born I remember that my parents listed their race as “Asian” ...

September 29, 2010

Every variant with an author!

I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits1, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait2, 3. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on, though, if you can’t get beyond the paywall.

In this study they pooled together several studies into a meta-analysis. One thing not mentioned in the abstract: they checked their GWAS SNPs against a family-based study. This was important because in the latter population stratification isn’t an issue, since family members naturally overlap a great deal in their genetic background. Also, if I read it correctly, they’re focusing on populations of European origin, so this might not capture larger-effect alleles which impact between-population variance in height but don’t vary within a given population (note that if you explored pigmentation genetics just through Europeans you would miss the most important variable on the worldwide scale, SLC24A5, because it’s fixed in Europeans). In any case, as you can see, what they did was extrapolate the number of loci their methods could capture, and the variation those loci would explain, as a function of sample size. At 500,000 individuals they’re at ~700 loci, and around 20% of the heritable variation. My initial thought is that I’m not seeing diminishing returns here, but since I haven’t read the supplements and don’t know the guts of this, I’ll let that pass. They do assert that they are likely underestimating the power of these methods because there may be smaller-effect common variants which can top off the fraction.
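The “fraction of variation explained” figures rest on a simple additive formula: a SNP with allele frequency p and per-allele effect β contributes 2p(1 - p)β² to the additive genetic variance. Summing over loci and dividing by the phenotypic variance gives the fraction explained, assuming Hardy-Weinberg proportions and no linkage disequilibrium between the SNPs. This is a minimal sketch; the inputs in the example are invented for illustration and are not the paper’s effect sizes.

```python
def variance_explained(freqs, effects, phenotypic_variance):
    """Fraction of phenotypic variance explained by additive SNP effects.

    A biallelic SNP with allele frequency p and per-allele effect beta adds
    2 * p * (1 - p) * beta**2 to the additive genetic variance under
    Hardy-Weinberg. Summing over SNPs assumes they are independent (no LD).
    """
    if len(freqs) != len(effects):
        raise ValueError("freqs and effects must have the same length")
    additive = sum(2 * p * (1 - p) * beta ** 2 for p, beta in zip(freqs, effects))
    return additive / phenotypic_variance


if __name__ == "__main__":
    # Made-up inputs: 180 loci, each at frequency 0.3 with a 0.25 cm per-allele
    # effect, and a height variance of 49 cm^2 (an SD of 7 cm).
    n_loci = 180
    print(round(variance_explained([0.3] * n_loci, [0.25] * n_loci, 49.0), 3))
```

With those made-up numbers it prints 0.096, about 10%: effects of a fraction of a centimeter only add up to something noticeable when there are many loci.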

But even they admit that they can go only so far. Here are some sections from the conclusion that lay it out pretty clearly:

By increasing our sample size to more than 100,000 individuals, we identified common variants that account for approximately 10% of phenotypic variation. Although larger than predicted by some models26, this figure suggests that GWA studies, as currently implemented, will not explain most of the estimated 80% contribution of genetic factors to variation in height. This conclusion supports the idea that biological insights, rather than predictive power, will be the main outcome of this initial wave of GWA studies, and that new approaches, which could include sequencing studies or GWA studies targeting variants of lower frequency, will be needed to account for more of the ‘missing’ heritability. Our finding that many loci exhibit allelic heterogeneity suggests that many as yet unidentified causal variants, including common variants, will map to the loci already identified in GWA studies, and that the fraction of causal loci that have been identified could be substantially greater than the fraction of causal variants that have been identified.

In our study, many associated variants are tightly correlated with common nsSNPs, which would not be expected if these associated common variants were proxies for collections of rare causal variants, as has been proposed27. Although a substantial contribution to heritability by less common and/or quite rare variants may be more plausible, our data are not inconsistent with the recent suggestion28 that many common variants of very small effect mostly explain the regulation of height.

In summary, our findings indicate that additional approaches, including those aimed at less common variants, will likely be needed to dissect more completely the genetic component of complex human traits. Our results also strongly demonstrate that GWA studies can identify many loci that together implicate biologically relevant pathways and mechanisms. We envisage that thorough exploration of the genes at associated loci through additional genetic, functional and computational studies will lead to novel insights into human height and other polygenic traits and diseases.

The second to last paragraph takes a shot at David Goldstein’s idea of synthetic associations.

We’re still where we were a few years back, though: old-fashioned Galtonian quantitative genetics, a branch of statistics, is the best bet to predict the heights of your offspring. As with intelligence, “height genes” are not improvements upon common sense. But if you’re getting into the 10-20% range of variation explained, it’s certainly not trivial, and the biological details are going to be of interest.
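The Galtonian approach can be sketched as a regression on the midparent value: the predicted deviation of the offspring from the population mean is the midparent deviation shrunk by the narrow-sense heritability h². The function below assumes the mother’s height has already been rescaled to the male scale (Galton’s own adjustment multiplied female heights by 1.08), and every number in the example, including h² = 0.8, is illustrative rather than an estimate from the height paper.

```python
def predict_offspring_height(father_cm: float, mother_scaled_cm: float,
                             population_mean_cm: float, h2: float) -> float:
    """Predict adult offspring height (male scale) from the midparent value.

    mother_scaled_cm: mother's height already converted to the male scale.
    h2: narrow-sense heritability, the slope of offspring on midparent.
    """
    midparent = (father_cm + mother_scaled_cm) / 2
    # Regression toward the mean: only a fraction h2 of the parental deviation carries over
    return population_mean_cm + h2 * (midparent - population_mean_cm)


if __name__ == "__main__":
    print(predict_offspring_height(185.0, 175.0, 175.0, 0.8))
```

A 185 cm father and a male-scaled 175 cm mother, with a 175 cm mean and h² = 0.8, give a prediction of 179 cm. No individual genotype enters the calculation, which is why this old method remains hard to beat.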

June 18, 2010

The “how” of cystic fibrosis through the “why”

Filed under: CFTR,cystic fibrosis,Genetics,Genomics,Medical Genetics — Razib Khan @ 5:39 am

It’s just a fact that contemporary human evolutionary genetics has relied upon its potential insights into disease to generate funding, support and interest. I don’t think that this is much of a silver lining when set next to the suffering caused by disease, but it’s a silver lining nevertheless. As a result, findings which would be of interest in and of themselves are able to push to the front of the line because of possible medical relevance. A new paper in PLoS Genetics illustrates the relationship between what seem like esoteric evolutionary insights and diseases of importance to the medical community. It looks at CFTR, the gene whose disruption results in the horrible illness cystic fibrosis, and uncovers some interesting genetic patterns of possible evolutionary relevance. The paper is The CFTR Met 470 Allele Is Associated with Lower Birth Rates in Fertile Men from a Population Isolate. From the author summary:

Cystic fibrosis (CF) is the most common lethal recessive disorder in European-derived populations and is characterized by clinical heterogeneity that involves multiple organ systems. Over 1,600 disease-causing mutations have been identified in the cystic fibrosis transmembrane regulator (CFTR) gene, but our understanding of genotype–phenotype correlations is incomplete. Male infertility is a common feature in CF patients; but, curiously, CF–causing mutations are also found in infertile men who do not exhibit any other CF–related complications. In addition, three common polymorphisms in CFTR have been associated with infertility in otherwise healthy men. We studied these three polymorphisms in fertile men and show that one, called Met470Val, is associated with variation in male fertility and shows a signature of positive selection. We suggest that the Val470 allele has risen to high frequencies in European populations due a fertility advantage but that other genetic and, possibly, environmental factors have tempered the magnitude of these effects during human evolution.

The high frequency of alleles which result in cystic fibrosis is something of a mystery. Basic population genetic theory tells us that lethal (at least in the pre-modern era) recessive traits should persist only at very low frequencies, at which most of the deleterious alleles are “masked” by normal copies in heterozygous carriers. The ΔF508 mutation is carried by about 1 in 30 people of Northern European descent (you see somewhat different ratios, but all in the same ballpark). If 1 in 30 people are carriers, the ΔF508 allele frequency is roughly 1 in 60, so assuming random mating and Hardy-Weinberg Equilibrium about 0.03% of offspring (on the order of 1 in 3,500) would exhibit the disease due to inheriting two copies of ΔF508. That is not a trivial proportion when you consider that the fitness of these individuals converges upon zero.
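The Hardy-Weinberg arithmetic is easy to check. This minimal sketch assumes random mating and treats “1 in 30 people” as the heterozygote (carrier) frequency 2q(1 - q), solving for the allele frequency q; it also prints the naive result of treating 1/30 as q itself. With carriers at 1 in 30, q is about 0.017 and homozygotes make up about 0.03% of births; with q = 1/30, homozygotes would be about 0.11%. The code ignores other CF alleles, compound heterozygotes, and selection, so it is only an order-of-magnitude check.

```python
import math


def allele_freq_from_carriers(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the rarer allele frequency q (Hardy-Weinberg)."""
    if not 0 <= carrier_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    # Smaller root of 2q^2 - 2q + c = 0
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


def affected_fraction(q: float) -> float:
    """Expected homozygote frequency q^2 under random mating."""
    return q * q


if __name__ == "__main__":
    c = 1 / 30
    q = allele_freq_from_carriers(c)
    print(f"1 in 30 as carrier frequency: q = {q:.4f}, q^2 = {affected_fraction(q):.4%}")
    print(f"1 in 30 as allele frequency:  q^2 = {affected_fraction(c):.4%}")
```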

In this paper they don’t get at ΔF508 and the other disease-causing alleles directly. Rather, they find that one particular SNP has a strong effect on fertility, as well as a relationship in some contexts to disease-implicated alleles. This is not too surprising, considering that cystic fibrosis is associated with infertility. I presume that the overarching logic is that understanding the genetics of CFTR in detail will give us a better picture of its internal architecture and the various networks and pathways which result in its proper, or improper, function.

CFTR spans ~200,000 base pairs, but in the paper the authors focus on a few regions of interest within a sample from the American Hutterite community. In particular there is the 5-thymidine (5T) repeat allele at the 3′ splice site of intron 8, a variant which interferes with the proper splicing of exon 9. Then there is a TG repeat (TG) in intron 8 and a SNP in exon 10, rs213950. In the latter case the two alleles result in the amino acids methionine and valine, respectively, at the 470th position (Met470 and Val470). Both of these variants have an effect on the 5T allele, increasing its penetrance with respect to cystic fibrosis. The molecular consequences of Met470Val are double-edged: Val470 results in a CFTR protein which matures more quickly, but with lower activity compared to the Met470 allele. Since 5T reduces splicing efficiency, one can intuit why the presence of Val470, with its lower protein activity, might have a deleterious effect when the two are found in conjunction.

The paper approaches cystic fibrosis sideways because the focus on Met470Val means that they’re looking at a secondary variant from a medical perspective; a modifier, not the primary agent. But from an evolutionary perspective there’s a lot to dig into! First, let me jump to the discussion, where they seem to admit the modest current medical relevance of this paper:

Lastly, there has been a long-standing debate as to whether disease-causing CF mutations, such as ΔF508, confer a fertility advantage to healthy carriers…Unfortunately, the results we report here do not provide insight into this question. The most common CF causing mutations in Europeans (i.e. ΔF508, G542X, N1303K, W1282X) and the most common mutation in the Hutterites, M1101K…all reside on haplotypes carrying the ancestral, Met470 allele in exon 10…the 9T allele at the polyT locus, and (by inference) the TG10 or TG11 alleles…Therefore, any positive fertility effects of the Val470 allele would not be expected to affect the frequencies of the common CF disease-causing mutations in European populations.

A haplotype just refers to a sequence/correlation of alleles along the genome. You know that DNA consists of a string of base pairs, AGCGCTGAGCGCAA…. If there is variation at the first and last positions in the sequence above, and if the alternative variants at the two loci do not associate randomly but exhibit high correlations along a physical sequence, then there may be a haplotype of the variants. In the case of this paper the three regions of mutations combine to form the haplotypes. Tables 1 & 2 show the frequencies of alleles and haplotypes within their Hutterite sample.
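To make the idea concrete, here is a minimal Python sketch for two biallelic loci with alleles coded 0 and 1 on phased haplotypes. It tabulates haplotype frequencies, the kind of thing Table 2 reports, and computes the standard linkage disequilibrium measures D = p_AB - p_A·p_B and r² = D² / [p_A(1 - p_A)p_B(1 - p_B)]. The example haplotypes are made up, and the paper’s haplotypes combine three variable regions, two of them repeat polymorphisms, so this two-SNP version is a simplification.

```python
from collections import Counter


def haplotype_freqs(haplotypes):
    """Relative frequency of each distinct two-locus haplotype."""
    n = len(haplotypes)
    return {hap: count / n for hap, count in Counter(haplotypes).items()}


def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) for phased two-locus haplotypes coded as (0/1, 0/1) tuples."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n               # freq of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n               # freq of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # freq of the (1, 1) haplotype
    d = p_ab - p_a * p_b                                  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    linked = [(0, 0)] * 50 + [(1, 1)] * 50                 # alleles always travel together
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25       # every combination equally common
    print(haplotype_freqs(linked), linkage_disequilibrium(linked))
    print(haplotype_freqs(unlinked), linkage_disequilibrium(unlinked))
```

In the first set only two of the four possible haplotypes appear and r² is 1; in the second every combination is equally common and D is 0.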



Table 1 lays out the frequencies of each allele within the sample, while Table 2 gives the frequencies of combinations of these alleles, that is, the haplotypes.

The next two figures show the major finding, the association between Val470 and higher fertility in Hutterite men (not women). Remember that p = 0.05 is the conventional bar for statistical significance. The ticks in the second figure mark 95% confidence intervals.



Do I need to emphasize how important it is that the alleles are correlated with reproductive outcomes? Changes in gene frequencies are driven by variation in reproductive outcomes, whether random or systematically correlated with phenotypes: drift or selection. Traits strongly tied to reproduction often have low heritabilities because the heritable variation in such traits quickly disappears under selection’s homogenizing power. It is interesting that in this case the authors are implying that there’s heritable variation in reproductive outcomes, even though one would expect a priori that selection should have expunged such variation, all else being equal.

Here’s a starker figure which illustrates the association between haplotype and fertility in a more stepwise fashion:


OK, so how does this vary across populations? The next figure comes straight out of the HGDP browser:


The variation on Met470Val exhibits an African/non-African difference. I assume that the variation within the non-African segment (compare the Tuscans to the Russians, for example) is mostly noise due to the small sizes of some of the HGDP sample groups. The 0.10 frequency in the San sample is intriguing. I’ve never heard anyone assert that the HGDP San likely had non-African admixture, so the existence of Val470 in this southern African group suggests to me that its appearance among non-Africans is not simply a random act of history (i.e., the outcome of the Out of Africa event and bottleneck). There may be common relaxations of ecological constraints on novel adaptation as one moves away from the tropics, or new selective pressures.
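To put a rough number on that sampling noise: an allele frequency estimated from n diploid individuals has a binomial standard error of sqrt(p(1 - p) / 2n). This minimal sketch uses the 0.10 San frequency as p and loops over hypothetical sample sizes; the actual HGDP group sizes are not reproduced here. The clipped Wald interval is a crude approximation that behaves poorly for small samples and rare alleles, and a Wilson or exact interval would be better in practice.

```python
import math


def allele_freq_se(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele frequency from n diploid individuals (2n chromosomes)."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


def wald_interval(p: float, n_individuals: int, z: float = 1.96):
    """Rough 95% interval, clipped to [0, 1]; unreliable for very small samples or rare alleles."""
    half = z * allele_freq_se(p, n_individuals)
    return max(0.0, p - half), min(1.0, p + half)


if __name__ == "__main__":
    # Hypothetical sample sizes, not the real HGDP group sizes
    for n in (5, 10, 25, 100):
        lo, hi = wald_interval(0.10, n)
        print(f"n = {n:3d}: SE = {allele_freq_se(0.10, n):.3f}, interval {lo:.2f} to {hi:.2f}")
```

With only a handful of individuals the interval around 0.10 runs from zero to roughly 0.3, so modest frequency differences between small samples say little on their own.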

I wanted to highlight the nature of the haplotype variation earlier because the authors assess the possibility that natural selection drove Val470 up in frequency among non-Africans using haplotype-based tests of natural selection. In the figure below, panel A shows the haplotype blocks. The short of it is that Val470 sits on a much longer haplotype than Met470, which stands to reason if Met470 was the ancestral state around which a lot of variation had crept in through drift (LCT, the gene with a derived variant that confers lactase persistence, has a very long haplotype on the selected allele because that allele rose in frequency faster than recombination and mutation could break apart the distinctive genetic profile of the original copy). Panel B shows extended haplotype homozygosity (EHH), while panel D shows iHS (integrated haplotype score). The latter is to some extent an elaboration of the former, able to detect selective sweeps that have not come as close to fixation as those best detected by EHH. Panel C shows Fst between African and non-African populations. Fst is a statistic which summarizes the proportion of genetic variance that lies between populations. It is 0.43 for Met470Val, while genome-wide it’s 0.11. Both the Fst and iHS values for the SNP fall in the 5% tails of their distributions, as illustrated by panel E.
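The two statistics can be sketched in a few lines, as a rough illustration rather than the paper’s pipeline. EHH for a core allele is the probability that two randomly chosen carriers are identical over every SNP from the core out to a given position; a recently swept allele keeps EHH high over long distances. The Fst function is a simple Nei-style estimator (Gst) for one SNP in two equally weighted populations; the paper’s exact estimator may differ. The haplotypes below are hypothetical, and iHS, which integrates EHH over distance for ancestral versus derived alleles and standardizes within frequency bins, is not implemented here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """Extended haplotype homozygosity among haplotypes carrying `allele` at index `core`.

    Haplotypes are equal-length strings of "0"/"1" alleles at ordered SNPs. EHH is
    the probability that two randomly chosen carriers are identical at every SNP
    from `core` out to `end` (inclusive).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    lo, hi = sorted((core, end))
    segments = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in segments.values()) / comb(len(carriers), 2)


def fst_two_pops(p1: float, p2: float) -> float:
    """Nei-style Fst (Gst) for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population heterozygosity
    if h_total == 0:
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical haplotypes: four derived ("1" at index 3) carriers share one
    # long background, while four ancestral ("0") carriers are diverse.
    haps = ["1101011"] * 4 + ["0100110", "1000100", "0010001", "1100111"]
    for end in range(3, 7):
        print(end, ehh(haps, 3, "1", end), round(ehh(haps, 3, "0", end), 3))
    print(fst_two_pops(0.2, 0.8))
```

In the toy data the derived allele’s EHH stays at 1.0 across the whole region while the ancestral allele’s EHH drops from 1.0 to 0 within three SNPs, the qualitative contrast described for Val470 and Met470.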


The Fst differences, along with the extended haplotype homozygosity around Val470, the allele associated with higher reproductive fitness, strongly point to the possibility of natural selection. But the reproductive differences they found were large, so why is Met470 still around? In the discussion they throw out some possibilities:

In fact, given the large fertility effects observed in the Hutterites, it is surprising that the Val470 allele has not gone to fixation in non-African populations. However, there might be several reasons why this has not occurred. First, the combined data on fertility effects of the Val470 allele indicate that this allele can be associated with both increased and decreased fertility, depending on genetic background. In the presence of the 5T allele at the intron 8 polyT locus, Val470 increases the risk of CBAVD and male infertility…In the absence of the 5T allele (as in the Hutterites), the Val470 allele is associated with increased male fertility relative to Met470. Although the mechanism of this interaction is obscure, it provides one example of counteracting variation that could increase the time to fixation of the Val470 allele. Second, as mentioned above, the Val allele could also be deleterious in certain environments, such as in the presence of specific pathogens or the 5T allele, as a result of its pleiotropic effects in other organ systems. Third, the fertility advantage we observed is restricted to males; we found no such association in Hutterite women…This would further slow the spread of the allele as there would be no selection advantage in half of all Val carriers. Lastly, this study was conducted in a population living under optimal conditions for reproductive success, including excellent nutrition and abundant food, access to modern health care, and negligible maternal mortality. Thus, estimates of fitness effects based on Hutterite fertility rates are likely inflated compared to the effects in human populations throughout most of evolutionary history, when competing selective pressures were likely more prevalent. 
Taken together, the lack of fixation of the Val470 alleles in populations outside of African may not be inconsistent with the fertility effects observed in the Hutterites, but rather suggestive of antagonistic effects of other genetic variations or environment factors that tempered these effects during most of human evolution.
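The third point, that a male-only advantage slows the spread, can be illustrated with a deliberately crude deterministic model. It assumes an infinite, randomly mating population and genic (haploid-style) selection in which each Val470 copy raises a male’s transmission by a hypothetical factor (1 + s) while females are unaffected. Because every offspring gets one autosomal copy from each parent, the next generation’s frequency is the average of the male and female gamete pools, so at any frequency the male-only change is exactly half the both-sexes change. The value s = 0.05 is invented for illustration and is not derived from the Hutterite estimates.

```python
def next_freq(p: float, s: float, male_only: bool) -> float:
    """One generation of genic selection on an autosomal allele at frequency p."""
    male = p * (1 + s) / (1 + p * s)     # allele frequency among successful male gametes
    female = p if male_only else male    # no selection in females when male_only
    return (male + female) / 2           # each offspring gets one allele from each parent


def generations_to_reach(p0: float, target: float, s: float, male_only: bool,
                         max_gens: int = 1_000_000) -> int:
    """Count generations until the allele frequency first reaches `target`."""
    p = p0
    for gen in range(max_gens):
        if p >= target:
            return gen
        p = next_freq(p, s, male_only)
    raise RuntimeError("target frequency not reached")


if __name__ == "__main__":
    s = 0.05  # hypothetical per-copy transmission advantage in males
    both = generations_to_reach(0.05, 0.95, s, male_only=False)
    males = generations_to_reach(0.05, 0.95, s, male_only=True)
    print(f"both sexes: {both} generations; males only: {males} generations")
```

Roughly doubling the time to near-fixation does not by itself stop a sweep; it simply leaves more room for the countervailing genetic and environmental effects the authors list.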

Remember that for a while now we’ve seen that loci which exhibit signatures of positive natural selection are often not fixed at 100%. Why not? Many explanations have been offered, and the ones above fall into the general categories mooted. Looking at a relatively isolated population in snapshot form may not give us a full impression of what’s going on. On the other hand, Hutterite genetic uniformity presumably eliminates many of the confounding signals which might otherwise obscure associations, so there are pluses and minuses to this sample. And of course evolution occurs over time, and peeking at slices tells us what it tells us, no more, no less. This is a place to start, but I bet it will make more sense once we have a better grasp of the distribution of dynamics across the genome. Scientific understanding often proceeds in a piecewise fashion, but the sum is greater than the parts, as the sum often exhibits a structure of variation which allows us to squeeze more juice from the parts.

Citation: Kosova G, Pickrell JK, Kelley JL, McArdle PF, Shuldiner AR, Abney M, & Ober C (2010). The CFTR Met 470 allele is associated with lower birth rates in fertile men from a population isolate. PLoS Genetics, 6 (6) PMID: 20532200

March 5, 2010

Heterozygote advantage in resistance to tuberculosis

Filed under: Genetics,Heterozygote advantage,Medical Genetics,TB — Razib @ 2:34 am

The lta4h Locus Modulates Susceptibility to Mycobacterial Infection in Zebrafish and Humans:

Exposure to Mycobacterium tuberculosis produces varied early outcomes, ranging from resistance to infection to progressive disease. Here we report results from a forward genetic screen in zebrafish larvae that identify multiple mutant classes with distinct patterns of innate susceptibility to Mycobacterium marinum. A hypersusceptible mutant maps to the lta4h locus encoding leukotriene A4 hydrolase, which catalyzes the final step in the synthesis of leukotriene B4 (LTB4), a potent chemoattractant and proinflammatory eicosanoid. lta4h mutations confer hypersusceptibility independent of LTB4 reduction, by redirecting eicosanoid substrates to anti-inflammatory lipoxins. The resultant anti-inflammatory state permits increased mycobacterial proliferation by limiting production of tumor necrosis factor. In humans, we find that protection from both tuberculosis and multibacillary leprosy is associated with heterozygosity for LTA4H polymorphisms that have previously been correlated with differential LTB4 production. Our results suggest conserved roles for balanced eicosanoid production in vertebrate resistance to mycobacterial infection.

Figure 6C shows mortality curves for patients with meningeal TB:

Interestingly, heterozygote advantage against tuberculosis has been offered as an explanation for the high frequency of the cystic fibrosis allele in Europeans. TB has been around for at least 10,000 years.
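The textbook overdominance model behind this kind of argument fits in a few lines. With genotype fitnesses AA: 1 - s, Aa: 1, aa: 1 - t, the frequency of allele a settles at q* = s / (s + t), so for a recessive lethal (t = 1) even a small heterozygote advantage s holds q* close to s. This is a minimal sketch assuming an infinite, randomly mating population; the values of s and t are hypothetical and are not estimates for cystic fibrosis, tuberculosis, or LTA4H.

```python
def overdominance_equilibrium(s: float, t: float) -> float:
    """Equilibrium frequency of allele a when fitnesses are AA: 1 - s, Aa: 1, aa: 1 - t."""
    return s / (s + t)


def iterate_selection(q: float, s: float, t: float, generations: int) -> float:
    """Deterministic change in the frequency q of allele a under random mating."""
    for _ in range(generations):
        p = 1 - q
        w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
        q = (p * q + q * q * (1 - t)) / w_bar                   # share of a alleles passed on
    return q


if __name__ == "__main__":
    s, t = 0.02, 1.0  # hypothetical: small carrier advantage, lethal homozygote
    print(overdominance_equilibrium(s, t), iterate_selection(0.001, s, t, 3000))
```

With s = 0.02 and t = 1, iterating from q = 0.001 approaches q* = 0.02 / 1.02, about 0.0196, which matches the closed-form equilibrium.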

ScienceDaily covers this paper, and a few other TB-related ones, from the most recent issue of Cell.

Citation: Tobin et al. The lta4h Locus Modulates Susceptibility to Mycobacterial Infection in Zebrafish and Humans. Cell, 2010; 140 (5): 717-730 DOI: 10.1016/j.cell.2010.02.013

