Razib Khan One-stop-shopping for all of my content

September 17, 2020

The genomic landscape of Brazil in 1950

Filed under: Admixture,Human Population Genetics,Human Variation,race — Razib Khan @ 12:12 am


A new whole-genome analysis out of Brazil has some interesting ancestry information. The preprint, Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil):

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases….

Admixed populations are useful for a lot of reasons. But let’s observe some things about this Brazilian population.

First, it’s old. The average age is 72, so these are people born around 1950. In many ways, these are the genetic characteristics of Brazil in 1950, not today. This is why you see so many individuals who self-identify as Asian who are nearly 100% Asian. These individuals are the children of Japanese immigrants. In 1950 the endogamy of the community was high. Today the youngest generation of Japanese Brazilians is 60% mixed.

Second, the ancestry of self-identified Brazilian whites in this sample is mostly European. Like the Japanese, a large number of these individuals are probably the children of European immigrants. I suspect this accounts for many of the 20% of the “white” sample that has no trace of non-European ancestry. But observe that around another 20% has trace proportions (~1%) of non-European ancestry, mostly African. My supposition, in this case, is that these are “old stock” white Brazilians. That is, one or both of their parents descend from Portuguese Brazilians who settled in overwhelmingly European areas and retain some non-European admixture due to long-term residence in Brazil. The remainder are white Brazilians who have substantial non-European ancestry, with a small minority whose proportions are quite high from a North American perspective.

A point of comparison is probably useful. About 95% of non-Hispanic whites in the United States seem to have almost no detectable non-European ancestry using this sort of model-based clustering. This illustrates the massive demographic difference between the USA and Latin American nations. The vast majority of white Latin Americans look quite Iberian, but the majority also have far more non-European ancestry than 95% of North American whites. This partly reflects the smaller population sizes of native peoples in North America, and partly the practice of hypodescent for people of any African ancestry in the United States, under which mixed individuals were absorbed into the African American population.

Third, the people who are “mixed” and black in Brazil are more European than you might expect. All the estimates of European ancestry I’ve seen for self-identified black Brazilians (a somewhat protean category due to social changes over the past few generations) indicate a higher European ancestry fraction than among African Americans (~20% median in the latter). Self-identified “mixed” Brazilians have more European ancestry than anything else.

The native category is interesting because most of these people have only a minor component of that ancestry. Additionally, a huge number of white, mixed, and black Brazilians have native ancestry. This is not surprising from previous work. Ancestry deconvolution indicates this is an old admixture, and mtDNA lineages are more native than Y chromosomes. There was a sex asymmetry in the early settlement, and native women married into the settler population. Both black and white Brazilians (and mixed) have lots of native ancestry.

Finally, though there is some overlap between these groups (despite their average differences), I assume that the overlap is much greater in contemporary cohorts in terms of genomic ancestry. It will be interesting, once we get temporal transects in Brazil, to see how assortative mating does, or doesn’t, work.

Looking forward to more of this from Latin America. So many opportunities for admixture mapping!

July 20, 2020

Solute carrier family genes are important…but how?

Filed under: Human Genetics,Human Population Genetics — Razib Khan @ 10:57 pm

Over the last ten years David Reich and other researchers have been constructing what is basically an atlas of human demographic history. Taking the genealogies written in our DNA, mapping them onto population bifurcations and admixtures, and synthesizing that back together with what we know from history and archaeology.

To a great extent, this is a project of human phylogenomics. Taking genome-wide data and constructing phylogenies out of it (or, perhaps more precisely, graphs, as this is on an intra-species time scale mostly and characterized by lots of gene flow across the “tips” of the tree). But there’s another thing you can do with modern human genomics and evolution: look at patterns of selection within the genome.

The Reich group has already started doing this. For example, they have adduced that the CCR5 delta 32 mutation seems to have emerged out of the Yamnaya horizon.

Last fall, a paper came out in MBE, Ancestry-Specific Analyses Reveal Differential Demographic Histories and Opposite Selective Pressures in Modern South Asian Populations, which I gave a cursory read at the time, but which I’ve since looked at more closely. It takes a “natural experiment,” the emergence of Indian subcontinental populations from a massive admixture between lineages which diverged 40,000 years ago, and looks to see which genetic regions deviate from what you would expect based on the genome overall.

The method is simple: imagine that “Ancestral North Indians” are fixed for an allele at a gene in one state and “Ancestral South Indians” are fixed in the other state. Indian populations are about 50:50 (with a range). If the frequency today in Indian populations is 95% for the allele that is from the “Ancestral North Indians”, one might be suspicious as to what’s going on. Or, vice versa.
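To make that logic concrete, here is a minimal Python sketch. It is not the paper's pipeline: the per-locus ANI fractions are made-up numbers (real ones would come from local ancestry inference), and `ancestry_deviation_z` is a hypothetical helper that simply asks how far one locus sits from the rest of the genome.

```python
from statistics import mean, stdev


def ancestry_deviation_z(local_ani, focal_index):
    """Z-score of the ANI fraction at one locus relative to all other loci.

    local_ani: per-locus fraction of haplotypes assigned to the ANI source
               (hypothetical values, for illustration only).
    """
    # Use every locus except the focal one as the genome-wide background.
    background = [p for i, p in enumerate(local_ani) if i != focal_index]
    mu = mean(background)
    sigma = stdev(background)
    return (local_ani[focal_index] - mu) / sigma


# Toy genome: most loci hover near the 50:50 expectation, one sits at 95% ANI.
loci = [0.48, 0.52, 0.50, 0.47, 0.53, 0.95, 0.51, 0.49]

z_outlier = ancestry_deviation_z(loci, 5)
z_typical = ancestry_deviation_z(loci, 2)
print(f"outlier locus z = {z_outlier:.1f}, typical locus z = {z_typical:.1f}")
```

A locus with a large positive score is over-represented for ANI ancestry, a large negative one for ASI ancestry. Whether that reflects selection or just drift is exactly what the paper tries to address by asking that the signal recur across many populations.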

In the paper, they used whole genomes to reconstruct the ancestral steppe/Iranian population without any residual “Ancient Ancestral South Indian” (AASI) ancestry; the AASI, in turn, have no West Eurasian ancestry. They did the same for the AASI. These reconstructions are always dicey, but they made a good-faith effort to check their work. On the whole, that section was impressive. The authors seem to be roughly aligned with the results in Narasimhan et al. 2019. First, the AASI seem to be homogeneous, except when the authors attempted to model them from Munda or Burusho donors, both groups with deep East Asian admixture (illustrating the problem with deconvolution). Second, they show that the AASI do not cluster with the Andamanese, which makes sense since these groups diverged closer to 40,000 years ago. Finally, the steppe/Iranian group looks most like Armenian middle-to-late Bronze Age people: a synthesis of steppe and some Iranian-like ancestry.

But this isn’t the most interesting part of the paper. That would be the selection. Here are the top candidates:

Component | # of Pops with Sig Value | Genes (±50-kb Region)
ANI | 22 (percentile = 99.9949) | THUMPD3, SETD5
ANI | 21 (percentile = 99.9814) | SNAP91, RIPPLY2, CYB5R4, MRAP2, CEP162, TBX18
ANI | 21 (percentile = 99.9814) | TRIM31, TRIM40, TRIM10, TRIM15, TRIM26, HLA-L
ANI | 19 (percentile = 99.9383) | Intergenic
ANI | 18 (percentile = 99.9195) | ZNF681, ZNF726, ZNF254
ASI | −21 (percentile = 0.0057) | RXFP3, SLC45A2, AMACR, C1QTNF3, ADAMTS12
ASI | −16 (percentile = 0.038) | SRXN1, SCRT2, SLC52A3
ASI | −16 (percentile = 0.038) | Intergenic
ASI | −15 (percentile = 0.0757) | Intergenic
ASI | −14 (percentile = 0.1268) | ATP6V1H, RGS20, TCEA1, LYPLA1, MRPL15

I’ll quote the authors at length from the “Discussion”:

We also show that the interaction between alleles that were highly polarized between the two ancestry sources that admixed in South Asia caused patterns of admixture imbalance across the majority of sampled groups, hence unlikely explainable by population specific random drift, and perhaps due to positive or negative environmental pressures. Interestingly, we report how loci that include genes involved with diabetes (SETD5), diet (ZNF) and the immune response (HLA) show West Eurasian (N) haplotypes to be significantly more represented compared with the South Asian (S) counterparts. This might be a stark contrast to what is expected, given the long-term history of local adaptation of S haplotypes in local environment. We speculate that the diet-related signal may be linked with post-Neolithic dietary shifts that might have followed the arrival of the West Eurasian component in the area, whereas the overrepresentation of West Eurasian HLA haplotypes might have some similarity, although at a different time scale, with what has happened in Native American populations after recent colonization likely caused by European borne epidemic (Lindo et al. 2016).

On the other hand, the top region for significant enrichment of South Asian ancestry includes the rs16891982-G allele of SLC45A2 gene (associated with light skin pigmentation in West Eurasians), suggesting purifying selection at this locus following admixture…the overall abundance of these West Eurasian alleles is drastically reduced in 21 out of 25 South Asian populations analyzed here…Such a strong negative pressure against a light pigmentation allele may be explained by the high ultraviolet (UV) radiation at South Asian latitudes and this result seems to be further corroborated by similar N ancestry deficiencies in TYRP1 and BNC2 genes for as many as 11 South Asian populations (supplementary table 4, Supplementary Material online). However, purifying selection against maladaptive light pigmentation alleles in high UV environment is not observed for all pigmentation alleles; in fact, the rs1426654-A allele of the SLC24A5 gene…shows instead an increase of frequency in South Asian…Taken together, our results point to opposite pressures on some West Eurasian alleles involved in skin and eye pigmentation. On one hand, SLC45A2 seem to have undergone some selective pressure that removed most of West Eurasian alleles that arrived in the area after the admixture event. Conversely, the SLC24A5 (rs1426654-A) West Eurasian allele seems to have escaped such a negative pressure perhaps thanks to its apparent neutral role with respect to susceptibility to skin carcinoma caused by UV radiation…

As I said, in the phylogenomic analysis above the authors suggest that the AASI population was homogeneous. I think this suggests that a single ancestral population was absorbed into expanding Iranian-related-farmers in NW South Asia. The prevalence of deeply diverged haplogroup M on the mtDNA in subcontinental peoples points to female mediated admixture. The positive selection for various “lifestyle” alleles indicates to me that expanding Iranian-related-farmers absorbed AASI tribes, in particular the women, and assimilated them to the new lifestyle.

The results from pigmentation are surprising, but not shocking. Knowing what I know about the ancestral frequency distribution of the various alleles, it was clear that the derived fraction of SLC24A5 was enriched. Many of the other alleles responsible for variation in Europeans either looked selected against, or the ancestral Indo-Aryans et al. were not quite like modern Europeans. These data point to in situ selection.

But why selection for some pigmentation alleles and not others? First, I don’t think cancer is a major selective pressure. That happens late in life. Rather, I think SLC24A5 in the derived variant does something that has nothing to do with pigmentation. It was positively selected among the Khoisan people of Southern Africa and looks to have been selected in Ethiopia as well after the admixture event. In Europe itself its frequency is so high that there has clearly been lots of positive selection since the “great admixture.”

As far as the other alleles, perhaps it is pigmentation. But perhaps it is something else?

Round and round we’ve been going with these genome-wide studies, but in the 2020s I think biologists who know the molecular pathways in a way that plumbs the depths of pleiotropy need to get involved.

May 19, 2020

Correlated response is a big story of selection

Filed under: Human Population Genetics — Razib Khan @ 10:55 pm

Adaptation is clearly one of the most important processes in understanding how evolution occurs. In a classical sense, it’s easy to understand. Parallel adaptations in body plans make dolphins and swordfish shaped the same. It’s physics.

But with the emergence of DNA, a lot of the focus on adaptation has been displaced to the signatures of natural selection on the molecular level. Phenotypes are controlled by variation in genotypes, and instead of description and hypothesizing, researchers can actually infer from the genetic patterns the history and arc of adaptation. 

At least that’s the theory.

The initial tests for signatures of natural selection focused on adaptation between species. For example, Tajima’s D. Usually this took the form of comparing variation across two lineages of Drosophila. In the 2000s with genome-wide data new methods predicated on looking at ‘haplotype structure’ (variation across sequences of genes) emerged. Instead of between species, these methods focused on the selection within species (e.g., why are some humans adapted to malaria?). These methods were good at picking up strong signals at a few genes where the selective sweeps were recent.
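For readers who have never seen one of these classical statistics written out, here is a minimal Python sketch of Tajima's D. The statistic itself is computed from the polymorphism in a single sample of aligned sequences: it compares the mean number of pairwise differences with the number of segregating sites. The function name and the toy 0/1 haplotypes below are my own illustration, not data from any study.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length strings of 0/1 (1 = derived allele).

    Returns None when the statistic is undefined (fewer than four sequences
    or no segregating sites).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Count derived alleles per site and keep only the segregating sites.
    counts = [sum(int(h[j]) for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None

    # Mean pairwise differences: each site adds k(n - k) / C(n, 2).
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Only rare variants (every site is a singleton): D comes out negative.
singletons = ["1000", "0100", "0010", "0001", "0000"]
# Only intermediate-frequency variants: D comes out positive.
balanced = ["1111", "1111", "0000", "0000"]

print(tajimas_d(singletons), tajimas_d(balanced))
```

An excess of rare variants (negative D) is the pattern you expect after a sweep or a population expansion, and an excess of intermediate-frequency variants (positive D) is what balancing selection or a contraction tends to produce, which is part of why a single statistic like this is hard to interpret on its own.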

But as datasets and genomics got bigger and better researchers focused on more fundamental patterns and analyses, such as looking at ‘site frequency spectra.’ Ultimately the goal was to go beyond selection at a single locus (e.g., lactase persistence), and understand polygenic characteristics (e.g., height). Obviously, this is much harder because polygenic characters are distributed across many genetic loci, and issues of statistical power are always going to loom large (and there is the soft vs hard sweep issue too!).

A new preprint is an excellent introduction to this wild world, Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies:

We present a full-likelihood method to estimate and quantify polygenic adaptation from contemporary DNA sequence data. The method combines population genetic DNA sequence data and GWAS summary statistics from up to thousands of nucleotide sites in a joint likelihood function to estimate the strength of transient directional selection acting on a polygenic trait. Through population genetic simulations of polygenic trait architectures and GWAS, we show that the method substantially improves power over current methods. We examine the robustness of the method under uncorrected GWAS stratification, uncertainty and ascertainment bias in the GWAS estimates of SNP effects, uncertainty in the identification of causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, fully controlling for pleiotropy even among traits with strong genetic correlation (|rg| = 80%; c.f. schizophrenia and bipolar disorder) while retaining high power to attribute selection to the causal trait. We apply the method to study 56 human polygenic traits for signs of recent adaptation. We find signals of directional selection on pigmentation (tanning, sunburn, hair, P=5.5e-15, 1.1e-11, 2.2e-6, respectively), life history traits (age at first birth, EduYears, P=2.5e-4, 2.6e-4, respectively), glycated hemoglobin (HbA1c, P=1.2e-3), bone mineral density (P=1.1e-3), and neuroticism (P=5.5e-3). We also conduct joint testing of 137 pairs of genetically correlated traits. We find evidence of widespread correlated response acting on these traits (2.6-fold enrichment over the null expectation, P=1.5e-7). We find that for several traits previously reported as adaptive, such as educational attainment and hair color, a significant proportion of the signal of selection on these traits can be attributed to correlated response, vs direct selection (P=2.9e-6, 1.7e-4, respectively). 
Lastly, our joint test uncovers antagonistic selection that has acted to increase type 2 diabetes (T2D) risk and decrease HbA1c (P=1.5e-5).

There’s a lot going on here. This is my favorite passage:

To address these issues, we recently developed a full-likelihood method, CLUES, to test for selection and estimate allele frequency trajectories. The method works by stochastically integrating over both the latent ARG using Markov Chain Monte Carlo, and the latent allele frequency trajectory using a dynamic programming algorithm, and then using importance sampling to estimate the likelihood function of a focal SNP’s selection coefficient, correcting for biases in the ARG due to sampling under a neutral model.

Alrighty then! Someone’s a major-league nerd.

The preprint is fine, but ultimately this is something you get a “feel” for by working with models, data, and general analyses in the field. And I don’t have a strong feel since I don’t work with these sorts of data and questions myself. So what do I know? That being said, I like the preprint because it satisfies an intuition I’ve long had: correlated response is a big part of the story of polygenic selection.

Basically, you have to remember that complex traits are subject to variation at a host of genetic positions. And genetic variants rarely have singular effects. That is, one locus usually exhibits pleiotropy. The genetic effect shapes a lot of characteristics. Therefore, if there is a strong selection on a gene, more traits than simply the target of selection will be impacted. In animal breeding making huge, meaty, fast-growing lineages can render them infertile if selection is taken too far. That’s a bad correlated response.
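The textbook way to write this down is the multivariate breeder's equation, where the change in trait means is the additive genetic (co)variance matrix G times the vector of selection gradients. Here is a minimal Python sketch of that equation. It is standard quantitative genetics, not the preprint's method, and the numbers in `G` and `beta` are hypothetical.

```python
def response_to_selection(G, beta):
    """Multivariate breeder's equation: delta_z = G * beta (matrix-vector product)."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


# Hypothetical additive genetic (co)variance matrix for two traits.
# The off-diagonal 0.6 is the genetic covariance created by shared loci.
G = [
    [1.0, 0.6],
    [0.6, 1.0],
]

# Directional selection acts on trait 1 only; trait 2 is not a target.
beta = [0.5, 0.0]

delta = response_to_selection(G, beta)
# delta[0] is the direct response, delta[1] the correlated response.
print(delta)
```

Trait 2 shifts even though nothing is selecting on it, purely because it shares genetic variance with trait 1. Set the off-diagonal terms to zero and the correlated response disappears.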

After correcting for the genetic correlation the authors note that some traits, such as EDU and hair color, are not really selected directly at all. This is like the fact that we know EDAR is associated with hair thickness and is a strong target of selection. We have no idea what the trait of interest is. But it’s a pretty big deal. All these quantitative traits controlled by variation across the genome are being reshaped by adaptation on other traits. What are those traits? This preprint doesn’t answer that really.

Hopefully, we’ll make some headway in the 2020s because we’re definitely looking through the mirror darkly.

May 17, 2020

Knanaya & Kerala: perhaps there is something different down south?

Filed under: Human Population Genetics — Razib Khan @ 2:03 am


Over the past few months I have been getting together some samples from people from Kerala, with a focus on Knanaya Christians. They are a subset of the broader St. Thomas Christian community, and two things have jumped out in my analyses:

– they are quite endogamous

– they are shifted off the ‘India-cline’

More precisely, like Cochin and Mumbai Jews, they are often shifted toward Middle Eastern populations. This is relevant because the Knanaya believe themselves, like most St. Thomas Christians, descended in part from Jews or Christians from the Middle East.

All that being said, looking more deeply into the data I’m not quite as sure. One of the reasons is that Kerala may not be as “structured” as other parts of India. Some of this is well known. The Nair samples I have are shifted toward South Indian Brahmins, which is plausible in light of connections between Nairs and Brahmins. The Brahmin-adjacent Ambalavasi seem quite similar to Brahmins. These are not surprising. But the Kerala samples I have as a whole seem notably shifted on the India cline more toward the “north” than I would have expected. This could be due to gene flow from without and within Kerala, in a way that is not typical in other parts of the subcontinent.

I say this because even the Ezhava, who were basically what we’d call a Dalit community (no longer today), show a shift.

Instead of talking, let me post some admixture plots (unsupervised):

Now, supervised:

Now TreeMix:

Here is an admixturegraph (using the Narasimhan et al. right-populations):

Test_Pop | Steppe | AHG | IndusValley
Bengali | 0.152 | 0.413 | 0.435
Cochin_Jews | 0.212 | 0.188 | 0.6
K_Bunt | 0.197 | 0.307 | 0.496
K_Ezhava | 0.147 | 0.281 | 0.572
K_Iyer | 0.271 | 0.164 | 0.565
K_Knanaya | 0.149 | 0.186 | 0.665
K_Mapilla | 0.141 | 0.293 | 0.565
K_Nair | 0.172 | 0.244 | 0.584
K_Nambudiri | 0.248 | 0.183 | 0.569
K_Nasrani | 0.134 | 0.27 | 0.596
K_Poduval | 0.2 | 0.276 | 0.524
K_Vaniya | 0.215 | 0.139 | 0.646
K_Varma | 0.213 | 0.116 | 0.67
Brahui | 0.236 | -0.181 | 0.945
Mumbai_Jews | 0.271 | -0.134 | 0.863
Patel | 0.167 | 0.27 | 0.562
Pulliyar | 0.05 | 0.575 | 0.375
TamilBrahmin | 0.191 | 0.262 | 0.547
UP_Brahmin | 0.293 | 0.223 | 0.484
Velamas | 0.101 | 0.298 | 0.601
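One quick way to read the table above is to check which rows make sense as mixture proportions. The Python sketch below takes a handful of rows copied from the table and flags any population with a weight outside the 0 to 1 range. The helper `check_fit` and the rounding tolerance are my own assumptions. A negative weight is generally read as a sign that the three-source model does not fit that population, rather than as a real proportion.

```python
# (population, Steppe, AHG, IndusValley) rows copied from the table above.
rows = [
    ("Cochin_Jews", 0.212, 0.188, 0.6),
    ("K_Knanaya", 0.149, 0.186, 0.665),
    ("K_Nasrani", 0.134, 0.27, 0.596),
    ("Brahui", 0.236, -0.181, 0.945),
    ("Mumbai_Jews", 0.271, -0.134, 0.863),
]


def check_fit(steppe, ahg, indus, tol=0.005):
    """Return (sums_to_one, in_range) for one row of mixture weights.

    tol allows for rounding in the published three-decimal values
    (an assumed tolerance, not something stated in the post).
    """
    weights = (steppe, ahg, indus)
    sums_to_one = abs(sum(weights) - 1.0) <= tol
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    return sums_to_one, in_range


# Populations whose weights cannot be read as proportions.
flagged = [name for name, *w in rows if not check_fit(*w)[1]]
print(flagged)
```

On these rows the check flags the Brahui and the Mumbai Jews, both of which get a negative AHG weight, so their Steppe and IndusValley numbers should not be over-interpreted.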

I ran f3 statistics. Here are the results, filtered to drop any Z-scores greater than -2.

Thoughts?
