Razib Khan One-stop-shopping for all of my content

January 16, 2012

The milkmen

Dienekes and Maju have both commented on a new paper which looked at the likelihood of lactase persistence in Neolithic remains from Spain, but I thought I would comment on it as well. The paper is: Low prevalence of lactase persistence in Neolithic South-West Europe. The location is on the fringes of the modern Basque country, while the time frame is ~3000 BC. Table 3 shows the major result:

Lactase persistence is a dominant trait. That means any individual with at least one copy of the T allele is persistent. As Maju noted, a peculiarity here is that the genotypes are not in Hardy-Weinberg Equilibrium (HWE). Specifically, there is an excess of homozygotes. Using the SJAPL location as a potentially random mating scenario you should expect ~7 T/C genotypes, not 2. Interestingly, the persistent individual in the Longar location is also a homozygote.
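To make the Hardy-Weinberg arithmetic concrete, here is a minimal Python sketch of how expected genotype counts follow from observed ones. The counts are made up for illustration and are NOT the values from Table 3 of the paper; the function name `hwe_expected` is also just a label chosen here.

```python
def hwe_expected(n_tt, n_tc, n_cc):
    """Expected genotype counts under Hardy-Weinberg for a biallelic locus."""
    n = n_tt + n_tc + n_cc
    p = (2 * n_tt + n_tc) / (2 * n)  # frequency of the T allele
    q = 1 - p                        # frequency of the C allele
    return {"TT": p * p * n, "TC": 2 * p * q * n, "CC": q * q * n}


# Hypothetical counts for illustration only (not the SJAPL data).
observed = {"TT": 4, "TC": 2, "CC": 14}
expected = hwe_expected(observed["TT"], observed["TC"], observed["CC"])

# A positive value means fewer heterozygotes than random mating predicts.
het_deficit = 1 - observed["TC"] / expected["TC"]

for genotype in ("TT", "TC", "CC"):
    print(genotype, observed[genotype], round(expected[genotype], 2))
print("heterozygote deficit:", round(het_deficit, 3))
```

With these invented counts the T allele frequency is 0.25, so random mating predicts 7.5 heterozygotes where only 2 are observed. With samples this small a formal test would be an exact test rather than a chi-square, but the direction of the deviation is already visible in the raw expectation.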

HWE makes a few assumptions. For example, no selection, migration, mutation, or assortative mating. Deviation from HWE is suggestive of one of these dynamics. The sample size here is small, but the deviation is not to be dismissed. Recall that lactase persistence has dominant inheritance patterns. If the trait was being positively selected for you would only need one copy. The enrichment of homozygotes is unexpected if selection in situ is occurring here. It can not be ruled out that one is observing the admixture of two distinct populations. One generation of random mating would generate HWE, but when populations hybridize in realistic scenarios this is not always a plausible assumption. Rather, assortative mating often persists over the generations, slowing down the diminishing of population substructure.
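The admixture argument can be sketched numerically. The snippet below pools two populations that are each in HWE but differ in T allele frequency, which produces a heterozygote deficit in the pooled sample, and then applies one generation of fully random mating. The frequencies and mixing proportion are hypothetical, chosen only to make the effect visible.

```python
def pooled_genotypes(p1, p2, w):
    """Genotype frequencies when two HWE populations are sampled together
    without having interbred. w is the share drawn from population 1."""
    q1, q2 = 1 - p1, 1 - p2
    tt = w * p1 * p1 + (1 - w) * p2 * p2
    tc = w * 2 * p1 * q1 + (1 - w) * 2 * p2 * q2
    cc = w * q1 * q1 + (1 - w) * q2 * q2
    return tt, tc, cc


def one_generation_random_mating(tt, tc, cc):
    """Genotype frequencies after a single generation of random mating."""
    p = tt + tc / 2  # pooled frequency of T
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Hypothetical: 30% of the sample comes from a group where T is at 0.9,
# 70% from a group where T is absent.
before = pooled_genotypes(p1=0.9, p2=0.0, w=0.3)
after = one_generation_random_mating(*before)

print("heterozygotes before mixing:", round(before[1], 4))
print("heterozygotes after random mating:", round(after[1], 4))
```

If mating stays assortative, the population sits somewhere between these two states for generations, which is the scenario described above.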

Stepping back from speculation in this case what can we say? First, the LCT locus has a large mutational target. The trait of lactase persistence has arisen multiple times via different mutational events across the Old World. But, there does seem to be one particular variant which is found from Spain to Northern India. There is some circumstantial evidence that the allele had its origin somewhere in Central Eurasia, but currently its modal frequency is in Northern Europe, Scandinavia and Germany. The region in the genome around this mutation is characterized by a very long haplotype. It is one of the most definitive loci as a candidate for natural selection in the human genome. There is now a fair amount of ancient DNA evidence that lactase persistence in Europe is a feature of the last ~5,000 years or so. Among the modern Basques the frequency of the allele is 66 percent.

For me the key issue is teasing apart the role of migration and selection in each specific case. It does not seem to be correct that the frequency of the -13910T LCT allele in Basques and Punjabis is reflective of the frequency of recent common ancestry. That implies that natural selection is at work at this locus. On the other hand, the haplotype which is present in both the Basques and Punjabis is likely to be descended from a common set of individuals, implying that there is a genealogical chain connecting these two very distinct and distant Eurasian populations. Therefore, we can potentially make some inferences about the power of migration in spreading distinctive alleles. Often we partition selection from genealogical information, because selection so often serves to distort the signal. But the genealogical patterns may lie at the heart of the distribution of different natural selective events at the LCT locus.

Overall, I would say that the results from ancient DNA are disordering and clouding simple elegant models. One hopes and presumes that as sample sizes increase in this domain we’ll start to see more clarity as new paradigms crystallize.

Citation: European Journal of Human Genetics, 10.1038/ejhg.2011.254

December 7, 2010

One diabetes gene to explain it all?

President William Howard Taft

It is the best of times, it is the worst of times. On the one hand the medical consequences of human genomics have been underwhelming. This is important because this is the ultimate reason that much of the basic research is funded. And yet we’ve learned so much. The genetic architecture of skin color has been elucidated, and we’ve seen a clarification of patterns of natural selection in the human genome. The finding last spring of Neandertal admixture in modern human populations is perhaps the most awesome pure science finding of late, coming close to resolving a decades-old debate in anthropology. This doesn’t cure cancer, but it does connect the dots about the human past, and that’s not trivial. We are a species haunted by our memories, so we might as well get them right!

But all hope is not lost. Research continues. And one area which general surveys of genomic variation have usually shown to be a target of natural selection, and which also has clear and immediate biomedical relevance, is that of metabolism. How we eat, and how we process and integrate the food we eat, is of obvious fitness relevance in the evolutionary and medical senses. It turns out that there is even variation in our saliva which is probably due to natural selection. The combination of diversity in human cuisine and susceptibility to the diseases of modern life indicates possibilities as to the relationship between past selection pressures and contemporary patterns of genetic variation. Of course one has to tread softly in this area: there are the inevitable confounds of environment, as well as the unfortunate probability of any given locus being of small effect size in its influence on any given trait.

A new paper in Genome Research reports a SNP which seems to have been subject to natural selection in Eurasians within the last 10,000 years. This variant is located within an exon on a gene, GIP, which produces peptides critical in the regulation of various metabolic pathways, in particular insulin response. A possible biomedical relevance to risk susceptibility is then explored subsequent to the evolutionary genomic preliminaries. Adaptive selection of an incretin gene in Eurasian populations:

Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP locus was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity compared with GIP55S, which is derived from the ancestral allele. Haplotype structure analysis suggests that the derived allele at rs2291725 arose to dominance in East Asians ∼8100 yr ago due to positive selection. The combined results suggested that rs2291725 represents a functional mutation and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both the enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.

This is a paper with several moving parts.

-There is genomics (the broad sweep of the genome)

-Genetics (a focus on a few genes and their consequences)


-And some allusion to epidemiology, as befits a paper which comes out of a medical department

The first observation is that rs2291725 differs a great deal across populations. As I said, it’s a SNP on an exon in GIP. Not only that, it’s nonsynonymous, which means that it’s in a position to change the structure and therefore function of the biochemical which the sequence is ultimately coding for. The T allele is the ancestral variant, while the C allele is the derived one. That means that C arose as a mutation against the background of T. There is a figure which shows the geographical distribution of the variation on this SNP from the HGDP data set in the paper, but I think the HGDP browser produces a crisper display, so here it is:


As you can see the ancestral allele is dominant in Africa. In several populations it is fixed. In contrast among non-African populations there’s quite a bit of variation. In East Asia the derived variant is at a high frequency, though not fixed. In West Eurasia and North Africa the two variants are at rough balance, more or less. Finally, in the New World the derived variant is found in appreciable proportions, but the ancestral variant of the SNP is found at much higher proportions than in other non-African populations. Seeing as how Amerindians derive from a branch of East Eurasians, common descent from an ancestor with the derived allele can not explain the frequency discrepancy. Interestingly the HGDP Melanesians have amongst the highest frequencies of the derived allele in the data set.

In any case, most of the analysis was not done with the HGDP sample, but with the first two phases of the HapMap. The marker density is richer in this sample, and obviously it is easier to compare a few populations than dozens. So the primary populations of comparison in this study were the Chinese + Japanese (ASN), Utah Whites (CEU), and Yoruba from Nigeria (YRI). It was immediately noticeable that when doing pairwise comparisons between two populations in the HapMap data set, the SNP of interest in GIP was exceptional in between-population difference when set against other nonsynonymous SNPs. The chart below shows the SNP in red, with the full distribution curve of Fst (proportion of between-population difference) illustrated by the bars in blue. rs2291725 is in the top 0.5% of Fst difference between ASN and YRI.
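For readers who want the Fst idea in concrete terms, here is a small Python sketch for a single biallelic SNP in two equally weighted populations, plus a helper that asks how extreme a focal SNP is against a background set. This is the textbook heterozygosity-based definition, not necessarily the estimator used in the paper, and every number below is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP, two equally weighted
    populations with derived-allele frequencies p1 and p2."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_t - h_s) / h_t


def fraction_at_least(focal, background):
    """Share of background Fst values that are at least as large as the focal one."""
    return sum(1 for value in background if value >= focal) / len(background)


# Hypothetical allele frequencies and background values, for illustration only.
focal_fst = fst_two_pops(0.8, 0.1)
background_fst = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.55]
tail_share = fraction_at_least(focal_fst, background_fst)

print("focal Fst:", round(focal_fst, 3))
print("share of background at or above focal:", round(tail_share, 3))
```

A real analysis would use thousands of nonsynonymous SNPs as the background, which is what makes a statement like "top 0.5%" meaningful.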


The expected Fst between continental races is on the order of ~0.15. The ASN vs. YRI difference is far greater than that, and even more exceptional when you note the skew of the distribution. As it happens there’s HapMap3 data on this SNP as well. It doesn’t add much value to the HGDP, but does confirm the general findings:


Population descriptors:
ASW (A): African ancestry in Southwest USA
CEU (C): Utah residents with Northern and Western European ancestry from the CEPH collection
CHB (H): Han Chinese in Beijing, China
CHD (D): Chinese in Metropolitan Denver, Colorado
GIH (G): Gujarati Indians in Houston, Texas
JPT (J): Japanese in Tokyo, Japan
LWK (L): Luhya in Webuye, Kenya
MEX (M): Mexican ancestry in Los Angeles, California
MKK (K): Maasai in Kinyawa, Kenya
TSI (T): Tuscan in Italy
YRI (Y): Yoruban in Ibadan, Nigeria

Now that they’ve established between population variation at the SNP, what about the structure around the SNP? Remember, the SNP is one base pair. T in the ancestral state, C in the derived. The patterns of variation flanking the SNP in GIP can tell us a lot. What they found was this:

- Africans have several different haplotypes around the T allele. A haplotype is just a set of correlated markers

- The C allele in East Asians seems to be embedded within one haplotype, or set of markers

- There was a lot of linkage disequilibrium around the C allele in East Asians

In East Asians both EHH and iHS were consistent with, if not necessarily suggestive of, selection. A plausible scenario is that the C allele was subject to a powerful bout of natural selection recently, and the allele rose so rapidly in frequency that a selective sweep dragged along the flanking regions of the genome. This would homogenize the variance in that genic region within the population in question (East Asians), as the numerous other haplotypes would decline in proportion. To show the relationships of the various haplotypes within the three HapMap populations being analyzed here they produced an unrooted tree. Observe that the haplotype in which the derived variant is embedded has only Asians and Europeans, and is on a separate branch by itself:


I noted above that just because there is a lot of linkage disequilibrium and haplotype block structure in this region of the genome, it doesn’t necessarily mean that it was a target of natural selection. There may have been stochastic phenomena which produced these results, and so our inference would be a false positive. To check for this they ran several models and simulations which varied demographic parameters under neutral (non-selective) conditions, and for the Asian sample the iHS scores were generally not as low as those for the SNP of interest. This does not “prove” that demography can not explain these results, but it does shift the probability more toward natural selection than before.
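The intuition behind EHH (extended haplotype homozygosity) can be shown with a toy calculation: among chromosomes carrying a given core allele, what is the chance that two drawn at random are identical across a window around the core? The sketch below uses invented haplotypes written as strings, with `a`/`b` as flanking marker states and `C`/`T` at the core. It is a simplified illustration, not the pipeline used in the paper, and iHS (which integrates EHH over distance and standardizes it) is not computed here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, reach):
    """EHH among chromosomes carrying `allele` at index `core`, measured over
    the window of markers from core - reach to core + reach (inclusive)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # need at least two chromosomes to compare
    lo = max(0, core - reach)
    hi = min(len(carriers[0]), core + reach + 1)
    classes = Counter(h[lo:hi] for h in carriers)
    # Probability that two randomly chosen carriers share the same window.
    return sum(comb(k, 2) for k in classes.values()) / comb(n, 2)


# Invented toy haplotypes: derived C sits on a nearly uniform background,
# ancestral T sits on several different backgrounds.
toy = [
    "aaCaa", "aaCaa", "aaCaa", "aaCab",
    "abTba", "baTaa", "bbTab", "aaTbb",
]

for reach in (1, 2):
    print(reach, ehh(toy, 2, "C", reach), ehh(toy, 2, "T", reach))
```

In this toy set the C-bearing chromosomes stay homogeneous further from the core than the T-bearing ones, which is the qualitative pattern described for East Asians above.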

The circumstantial evidence presented above is that the derived allele rose to frequency relatively recently (in general LD decays rapidly over time, so these tests detect more recent selective or demographic events). They ran a simulation under neutral parameters, and for the frequency of the derived haplotype it would take 100,000-500,000 years for the various populations to reach the values which we see (starting from the initial mutant gene copy). The latter figure is outside the bounds of modern humanity, while the former probably pre-dates the “Out of Africa” event. It is implausible that so much haplotype structure could be preserved over that much time, because recombination over the generations breaks apart associations between markers. Using the recombination rates, which would slowly degrade long haplotypes in the genome, the authors inferred that the C allele and its haplotype began to rise in frequency on the order of 12,000-2,000 years before the present.
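The logic of dating a sweep from haplotype decay can be sketched with a back-of-the-envelope model: if a flanking segment has recombination fraction c per generation, the chance that it is still intact after t generations is roughly (1 - c)^t, so an observed intact fraction can be inverted to give t. This is only an illustration of the reasoning, not the authors' estimator, and the intact fraction, recombination fraction and 25-year generation time below are all assumptions picked for the example.

```python
import math


def generations_since_sweep(intact_fraction, recomb_fraction):
    """Solve (1 - c) ** t = intact_fraction for t, the number of generations."""
    if not 0 < intact_fraction <= 1:
        raise ValueError("intact_fraction must be in (0, 1]")
    if not 0 < recomb_fraction < 1:
        raise ValueError("recomb_fraction must be in (0, 1)")
    return math.log(intact_fraction) / math.log(1 - recomb_fraction)


# Hypothetical inputs: 70% of derived chromosomes still carry the original
# flanking haplotype across a segment with c = 0.001 per generation.
intact = 0.7
c = 0.001
years_per_generation = 25  # assumed generation time

gens = generations_since_sweep(intact, c)
print("generations:", round(gens))
print("years:", round(gens * years_per_generation))
```

The point of the exercise is the scaling: a long intact haplotype at appreciable frequency is hard to reconcile with hundreds of thousands of years of recombination, which is why the neutral timescale looks implausible.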

Why would an allele rise to frequency within the past 10,000 years? The authors gave the game away in the abstract: humans shifted to different modes of primary production after the rise of agriculture. This is where the role of GIP in producing peptides which have a role in regulating our biochemistry is relevant. GIP is of a class of hormones found in the intestine called incretins:

Incretins are a group of gastrointestinal hormones that cause an increase in the amount of insulin released from the beta cells of the islets of Langerhans after eating, even before blood glucose levels become elevated. They also slow the rate of absorption of nutrients into the blood stream by reducing gastric emptying and may directly reduce food intake. As expected, they also inhibit glucagon release from the alpha cells of the Islets of Langerhans….

Increased insulin reduces blood sugar. Diabetes is a malfunction of the insulin release mechanism, and so blood sugar begins to rise as individuals don’t take up their glucose. Glucagon has the opposite effect, increasing blood sugar. But just because there is a change in a nonsynonymous position in an exonic region of a gene of relevance to the pathway, it doesn’t mean that that necessarily impacts the pathway which is illustrated to the left. And for natural selection to have any traction it needs to have an impact on some sort of concrete biological process (unless we’re talking intra-genomic competition of some sort).

It turns out that rs2291725 is actually just outside the primary coding region for the GIP peptide. For it to be a functional variant there needs to be more to the story. As it turns out, there are other less common variants which are modified by changes at this SNP, GIP55S and GIP55G. The first is produced by the ancestral T allele, and the second by the derived C allele. GIP55S and GIP55G are also found in the intestine, though they only constitute a few percent of the total GIP.

But here’s where it gets really interesting: GIP55G exhibits more bioactivity over the long term. In other words it seems to be more potent than the generic GIP or GIP55S, the ancestral variant. They’ve gone from supposition based on the functional significance of the broader gene, to a connection between the T→C transition over the last 10,000 years and a measurable difference in bioactivity. As it turns out it may be that those with GIP55G would have a stronger insulin response, and so reduce blood sugar faster, than those without.

It doesn’t take a genius to figure out where they’re going with this. The relationship between insulin response and carbohydrates in our day and age is fraught. But we already suspect that carbs have reshaped the human genome through copy number variation in the amylase gene. It is interesting though that the derived variant has not fixed. That is, it hasn’t replaced the ancestral variant. This may be due to dominance, so that one copy is almost as efficacious as two, or, it may be due to balancing selection of some sort, which the authors suggest in the text. At this point it’s time to jump to the discussion and let the authors speak for themselves. They start out well:

Based on the gene age estimation and biochemical analyses, our study revealed a functional mutation that is associated with the selection of the GIP locus in East Asian populations ~8100 yr ago and the presence of a cryptic GIP isoform. Specifically, we showed that the inventory of human GIP peptides has recently diverged and that individuals could express three different combinations of GIP isoforms (GIP, GIP55S, and GIP55G) with distinct bioactivity profiles. Future study of how this phenotypic variation affects glucose and lipid homeostasis in response to different diets and of which physiological variations in humans can be attributed to prior gene–environmental interactions at the GIP locus is crucial to a better understanding of human adaptations in energy-balance regulation.

As I observed above many of the researchers have a biomedical background, and the NIH is funding this. The evolutionary anthropological findings, cautious as they are, are fascinating and of deep interest. But I don’t think this is going to go anywhere:

It was hypothesized by Neel almost 50 yr ago that mismatches between prior physiological adaptations and contemporary environments can lead to health risks because the ancestral variants that have been selected for the organism’s fitness or reproductive success may not be optimal for the individual’s health in the new environment…In support of this thrifty genotype hypothesis, a number of genes in humans and house mice have been implied to have coevolved with the emergence of agricultural societies…and a rapid shift in diets is associated with the detrimental effects on human survival in a number of human populations…Conceptually, the serum-resistant GIP55G carried by the GIP103C haplotype may have been beneficial for individuals who have unconstrained access to the food supply in many agricultural societies by preventing severe hyperglycemia. As selection pressure changed in these societies, the ancient GIP103T haplotype could have become a liability and conferred a loss of fitness in the new environment. In addition, we speculate that the selection of GIP in East Asians may contribute to the heterogeneity in the risk of diabetes among major ethnic groups at the present time….

Do you believe that the Han Chinese have had a surfeit of food compared to Africans over the past 10,000 years? Or compared to Europeans? Indians have had more food than Africans? The populations of the New World are in a food-poor environment? This doesn’t make any sense as an evolutionary explanation because the stable state for most of human history has been one of Malthusianism. A few people had a lot of food, ergo, the association of wealth with corpulence. Additionally, one can imagine that societies transitioning between modes of production would have a period when land would be in surplus and there was a lot of food. But for most of history life was grinding. This is simply an unbelievable story. Additionally, this SNP can’t explain most of the variation in diabetes. South Asians have the highest rates in the world, but they have appreciable proportions of the derived variant. I am of the CC (derived-derived) genotype myself (I just checked on 23andMe), and I have a family risk of diabetes, so I know to ignore the relevance of these findings for myself when it comes to personal risk assessment.

There is probably not going to be one gene that explains diabetes, or obesity, etc. We already knew that, but there is a strange kabuki theater which goes on whereby research groups pretend as to the high significance of one locus, because how is it going to look to a granting agency that you’re out to explain ~1% of the variance in a trait for trivial predictive value? And yet usually they’re honest enough in the discussions to suggest that one finding needs to be integrated into a broader picture…as in the hundreds of other genes of interest!?!?!

This paper is fascinating as a work of human evolutionary history. They don’t have a good story, but they have results which need to be integrated into the bigger framework. But the paper is also a story of the culture of science today, driven by biomedical relevances which are often simply phantoms.

Citation: Chang CL, Cai JJ, Lo C, Amigo J, Park JI, & Hsu SY (2010). Adaptive selection of an incretin gene in Eurasian populations. Genome research PMID: 20978139
