Open Thread, 11/24/2014


Been busy.

The above are a set of images generated from 23andMe genotype data courtesy of Mark Shriver. Comments? I’ll post on this at some point shortly.

Also, do long-time readers have any thoughts on posts of note that I’ve put up over the years? You can see some at the bottom of this page.

Finally, this link to the reader survey results should work. With ~380 respondents, it looks like most of the core audience has followed me from Discover.

Putting IBD to bed


IBD plays a big part in my understanding of inheritance. I don’t mean inflammatory bowel disease. Nor do I mean isolation by distance. I’m talking identity by descent. Assuming your parents are “unrelated,” you are identical by descent with your sibling across some portion of your genome. You and your sibling inherit segments from the same parents, though due to recombination the segments you each receive will usually differ across at least some part of each chromosome. Because of the law of segregation you should, on average, overlap with your full sibling on half of the copy of the genome inherited from your mother and half of the copy inherited from your father: that is 25% of your diploid genome via each parent, and summing the two gives 50%. But this is an expected value. As it happens many siblings are not exactly 50% (e.g., I know of full siblings who share 40% of their genomes identical by descent from their parents). In the pre-genomic age this variation was elided because usually you couldn’t precisely estimate identity by descent. Rather, you just assumed that you shared 1/2 your genome with your full sibling, 1/4 with a half sibling or aunt/uncle or grandparent, 1/8 with your first cousin, and so forth.
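To make the spread around 50% concrete, here is a minimal Python simulation of identity by descent between full siblings. It is a sketch under loudly simplifying assumptions, not a real genetic map: 22 autosomes of a hypothetical 1.6 Morgans each, crossovers placed by a Poisson process (one per Morgan on average, no interference), and unrelated parents. For each chromosome and each parent it tracks which grandparental copy each sibling received at every position, then sums the stretches where the two siblings match. The helper names (`transmission`, `shared_length`, `sibling_sharing`) are illustrative.

```python
import random
import statistics
from bisect import bisect_right

# Simplifying assumption (not real map lengths): 22 autosomes of 1.6 Morgans
# each, roughly in line with a ~35 Morgan human autosomal map.
CHROMS = [1.6] * 22

def transmission(length, rng):
    """One meiosis: the grandparental copy at the chromosome start, plus
    crossover positions from a Poisson process (1 per Morgan on average)."""
    start = rng.randint(0, 1)
    breaks = []
    pos = rng.expovariate(1.0)
    while pos < length:
        breaks.append(pos)
        pos += rng.expovariate(1.0)
    return start, breaks

def origin_at(x, start, breaks):
    """Grandparental origin (0 or 1) at position x; it flips at each crossover."""
    return (start + bisect_right(breaks, x)) % 2

def shared_length(a, b, length):
    """Length of chromosome over which two siblings got the same grandparental copy."""
    points = sorted([0.0, length] + a[1] + b[1])
    shared = 0.0
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2.0
        if origin_at(mid, *a) == origin_at(mid, *b):
            shared += hi - lo
    return shared

def sibling_sharing(chroms, rng):
    """Fraction of the diploid genome IBD between two full siblings."""
    shared = 0.0
    for length in chroms:
        for _parent in range(2):  # maternal copy, then paternal copy
            sib1 = transmission(length, rng)
            sib2 = transmission(length, rng)
            shared += shared_length(sib1, sib2, length)
    return shared / (2.0 * sum(chroms))

rng = random.Random(2014)
samples = [sibling_sharing(CHROMS, rng) for _ in range(2000)]
print(f"mean {statistics.mean(samples):.3f}, sd {statistics.stdev(samples):.3f}")
print(f"range {min(samples):.3f} to {max(samples):.3f}")
```

The mean lands near 0.5, but individual sibling pairs scatter by several percentage points in either direction. That scatter is exactly what a single pedigree-based number hides.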

Genomics has changed that. I can tell you, for example, that my son is ~20% identical by descent with one of his grandfathers. And, more surprisingly, he’s 18.9% identical by descent with one of his great-aunts! If expectation held, his great-aunt would be half as related to him as his grandfather, but expectation did not hold. The figure above is from a review, Relatedness in the post-genomic era: is it still useful?:

Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.

If you have academic access, you should read it. If you don’t: they seem to be proposing that we move beyond the confusing concept of identity by descent and simply think in terms of a coalescent framework. It does strike me that classical IBD-thinking is a historical contingency, a product of genetics having emerged in part in an age when pedigrees were the dominant tools for interrogating patterns of inheritance. All for the good. But for non-geneticists I would suggest that these new methods, which can pinpoint patterns of genetic variation across pedigrees with fine precision, will allow us to explore in much more detail the nature of the heritability of many quantitative traits.
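For comparison with the ~20% and 18.9% figures above, the pedigree expectations can be computed with the standard recursive kinship calculation. This is a minimal Python sketch assuming a hypothetical pedigree in which the grandfather and great-aunt are full siblings, all founders are unrelated, and no one is inbred. Relatedness here is twice the kinship coefficient, i.e. the expected fraction of the genome shared IBD. The dictionary layout and function names are illustrative choices, not any particular package’s API.

```python
from functools import lru_cache

# Hypothetical pedigree: name -> (father, mother). Parents are listed before
# their children, so insertion order is a valid generation ordering.
PEDIGREE = {
    "great_grandfather": (None, None),
    "great_grandmother": (None, None),
    "grandfather": ("great_grandfather", "great_grandmother"),
    "great_aunt": ("great_grandfather", "great_grandmother"),
    "grandmother": (None, None),
    "parent": ("grandfather", "grandmother"),
    "other_parent": (None, None),
    "son": ("parent", "other_parent"),
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that alleles drawn at random from a and b are IBD."""
    if a == b:
        father, mother = PEDIGREE[a]
        if father is None:
            return 0.5  # non-inbred founder
        return 0.5 * (1.0 + kinship(father, mother))
    if ORDER[a] < ORDER[b]:
        a, b = b, a  # recurse on the later individual, who cannot be the other's ancestor
    father, mother = PEDIGREE[a]
    if father is None:
        return 0.0  # founders are assumed unrelated to everyone else
    return 0.5 * (kinship(father, b) + kinship(mother, b))

def relatedness(a, b):
    """Expected fraction of the genome shared IBD: twice the kinship coefficient."""
    return 2.0 * kinship(a, b)

print(relatedness("son", "grandfather"))  # 0.25
print(relatedness("son", "great_aunt"))   # 0.125
```

The expectations are 1/4 and 1/8. The measured values show how far a single realization of inheritance can drift from them.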

The X/(7 billion)-Men

Interesting piece in MIT Tech Review by Antonio Regalado, The Search for Exceptional Genomes: They walk among us. Natural experiments, living ordinary lives, unaware that their genes may hold the clue to the next superdrug. As you certainly know by now, a lot of the hype over the Human Genome Project turns out to have been unwarranted. But one thing about technology is that people often overestimate the short-term windfall and underestimate the long-term consequences. Here’s the science & tech:

Ten years ago, scientists discovered that some people are naturally missing working copies of a gene known as PCSK9. The consequences of the mutation were extraordinary. These people, including a Texas fitness instructor, a woman from Zimbabwe, and a 49-year-old Frenchman, had almost no bad cholesterol in their blood. Otherwise, they were perfectly normal.

Drug companies pounced on the clue. To lower cholesterol, they would also try to block PCSK9. Now two separate drugs that disable the gene’s activity are nearing FDA approval. People taking the medications have seen their cholesterol levels plummet dramatically, sometimes by 75 percent.

Most large-scale genetic research is a search for the causes of disease, not the nature of health. But in 2008, Daniel MacArthur, a computational geneticist now at the Massachusetts General Hospital, became interested in how frequently genes are completely dysfunctional in healthy people. Along with collaborators, he scrutinized the genomes of 185 people.

MacArthur’s analysis, completed in 2012, found that each of us has, on average, one entirely defective copy of about 80 genes, and another 20 genes for which neither copy works. In other words, everyone’s genome is a little dysfunctional. (Most genes are present in matching pairs—one inherited from your mother, and one from your father.)

But here’s a fascinating personal twist. Just a heads up, I met Eric and Sonia at ASHG.

That turns out to be a question of urgent importance to husband and wife scientists Eric Minikel and Sonia Vallabh, who have been working alongside MacArthur at Massachusetts General Hospital. Vallabh’s mother died of fatal familial insomnia, an extraordinarily rare disease in which a misfolded protein builds up in the brain, causing dementia and early death. Vallabh has inherited the gene mutation that causes FFI, and has a 100 percent chance of developing the illness, unless some kind of treatment is developed.

Before her diagnosis three years ago, Minikel was an urban planner and Vallabh had gone to law school. But they switched careers and became scientists in order to try to cure Vallabh before she falls ill.

Vallabh’s mutation is the opposite of a knockout—it adds an unwanted function, causing her prion protein to fold in a way that it shouldn’t. This month she switched to another Boston laboratory to explore whether an advanced form of gene therapy, called genome editing, might allow her to eliminate the prion gene from her brain cells altogether.

But would doing so be dangerous? Knockout mice that have been genetically engineered to lack the prion gene seem to be mostly normal, but that’s no guarantee that the same is true of humans. For instance, the knockout surveys carried out by MacArthur’s lab have found more than 40 healthy people with mutations known to prove fatal to mice. Vallabh says she worries that if she were to succeed in eliminating her prion gene it could cause another disease, perhaps equally grave.

In the compressed time frame Vallabh faces—she has perhaps 20 years to cure herself—finding a living person without the prion gene would be one important clue. This year, she and Minikel carried out such a search across DNA sequences of more than 60,000 people as part of MacArthur’s Knockout Project.

They turned up three individuals missing one copy of the prion gene—but, so far, no one who is missing both copies.

Minikel says it may mean that people can’t live without the gene. Or it could be that their database isn’t yet big enough. The gene is small and therefore less likely to be affected by mutations. Working quickly with a pad and paper, with Vallabh looking over his shoulder, Minikel roughly estimated it might take a database of a billion people to know for sure.

If Sonia has 20 years, is a database of a billion people doable? I haven’t done the math, but the way the technology is advancing it seems plausible, and the eventual rate-limiting step is going to be sociological.
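Here is a back-of-the-envelope version of that estimate as a Python sketch. It assumes Hardy-Weinberg proportions (random mating) and treats the three carriers in ~60,000 people as a direct estimate of the knockout allele frequency. It is not necessarily how Minikel did his pad-and-paper calculation.

```python
import math

# Figures from the article: three people missing one copy of the prion gene
# among roughly 60,000 sequenced individuals.
carriers = 3
sample_size = 60_000

# For a rare allele, heterozygotes occur at ~2q, so q ~ carriers / (2N).
q = carriers / (2 * sample_size)

# Hardy-Weinberg assumption: people missing both copies occur at q^2.
homozygote_freq = q ** 2

# Database size at which we would expect to see one homozygous knockout.
n_for_one = 1 / homozygote_freq

# Chance of seeing at least one in a billion people (Poisson approximation).
n_db = 1_000_000_000
p_at_least_one = 1 - math.exp(-n_db * homozygote_freq)

print(f"q ~ {q:.1e}, homozygote frequency ~ {homozygote_freq:.2e}")
print(f"people needed to expect one: ~{n_for_one:.2e}")
print(f"P(at least one in a billion): ~{p_at_least_one:.2f}")
```

Under these assumptions q is about 2.5 × 10⁻⁵, so a homozygous knockout would be expected in roughly 1 of every 1.6 billion people, the same order of magnitude as Minikel’s billion. Consanguineous unions raise the rate of homozygosity for rare alleles, so real populations could depart from this estimate.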

Genes are a concept and a thing


Quantitative Genetics

A new study in Psychological Medicine, Genome-wide scan demonstrates significant linkage for male sexual orientation, is getting breathless coverage in the press. Representative: “A genetic analysis of 409 pairs of gay brothers, including sets of twins, has provided the strongest evidence yet that gay people are born gay.” As a matter of fact I don’t think this is the strongest evidence that people are “born gay.” The study is decent, and better than what has come before, but the authors themselves acknowledge issues of statistical power in the text. These results could be right, but I doubt this is going to end up being a robust signal.* That being said, at some point in the next ten years I’m pretty sure we’ll localize the genes which carry variants that result in a higher than typical likelihood of an individual exhibiting homosexual orientation. It’s a matter of when, not if. Behavioral genomics was way too optimistic in the interval 2000 to 2010. I suspect we’re starting to become too pessimistic in the interval 2015 to 2025.


Molecular Genetics

But the bigger point is that we already know homosexuality has a heritable component. We don’t need to know which genes are involved; we already know that related individuals exhibit a propensity for the trait in direct proportion to their relatedness. Heritability is just the proportion of the variation of the trait (e.g., homosexual vs. heterosexual) within the population that can be explained by the variation of the genes in the population. Heritability of homosexuality is modest, but it is there nevertheless, so there is some biological component.** We’ve known this for a long time. A modest linkage study doesn’t really shift the needle much at all. It’s asking and exploring somewhat different questions. It assumes heritability, and is looking to uncover its genetic architecture.
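The logic of “related individuals resemble each other in proportion to their relatedness” is what the classical twin design exploits. A minimal Python sketch of Falconer’s formula follows. It assumes the simple ACE model (additive genetic effects only, equal shared environments for identical and fraternal twins), and the input correlations are hypothetical numbers chosen purely to show the arithmetic.

```python
def falconer(r_mz, r_dz):
    """Falconer's approximation of variance components from twin correlations.

    Assumes the classic ACE model: additive genes only, and equal shared
    environments for MZ and DZ twins. For a binary trait, r_mz and r_dz should
    be correlations on the underlying liability scale, not raw concordances.
    """
    h2 = 2 * (r_mz - r_dz)  # MZ twins share all additive variance, DZ twins half
    c2 = 2 * r_dz - r_mz    # similarity left over after removing genetic sharing
    e2 = 1 - r_mz           # whatever makes identical twins differ
    return h2, c2, e2

# Hypothetical correlations for illustration only; these are not estimates
# for sexual orientation or for any other real trait.
h2, c2, e2 = falconer(r_mz=0.5, r_dz=0.3)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")  # h2 = 0.40, c2 = 0.10, e2 = 0.50
```

No genes are identified anywhere in this calculation. Heritability falls out of the pattern of resemblance among relatives alone.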

Mendelian Genetics

The problem here is that the public and the press conflate the concrete biophysical instantiation of genes with the abstract concept of the gene. The latter predates the former by about 50 years. For two generations geneticists developed their field without a precise understanding of the biophysical mechanism of inheritance. That was possible because all that Mendelian, and evolutionary, genetics requires is that the units of inheritance follow regular laws across the generations. Quantitative genetics, arguably a branch of applied statistics, is even less tied to the concrete unit of genetic transmission in the form of the DNA molecule.
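The point that Mendelian genetics needs only regular rules of transmission can be shown directly: the classic ratios fall out of pure combinatorics, with no reference to DNA. Below is a minimal Python sketch of a monohybrid cross, assuming a single locus with a hypothetical dominant allele `A` and recessive allele `a`.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Enumerate equally likely offspring genotypes from two two-allele parents.

    Each parent passes on one of its two alleles with equal probability
    (Mendel's law of segregation). Nothing about DNA is assumed.
    """
    offspring = Counter()
    for a, b in product(parent1, parent2):
        offspring["".join(sorted(a + b))] += 1  # "aA" and "Aa" are the same genotype
    return offspring

genotypes = cross("Aa", "Aa")
phenotypes = Counter("dominant" if "A" in g else "recessive" for g in genotypes.elements())
print(genotypes)   # Counter({'Aa': 2, 'AA': 1, 'aa': 1})
print(phenotypes)  # Counter({'dominant': 3, 'recessive': 1})
```

The 1:2:1 genotype ratio and 3:1 phenotype ratio that Mendel observed follow from the abstract rule alone, decades before anyone knew what a gene was made of.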

Concrete physical locations of genes as structures in the material world are important data. In a field like biomedicine they have changed the whole game. Genomics as an enterprise wouldn’t really be possible in a practical sense without our understanding of the physical basis of inheritance in DNA. But that doesn’t necessarily make DNA a game changer in understanding whether a trait is heritable or not. Rather, it adds detail and specificity to how a trait is heritable. For applied science the “how” is essential. But for basic research it is not the be-all and end-all.

* Two reasons that I’m skeptical. First, large effects like this often don’t pan out for behavioral traits. Second, I doubt it’s so simple as a common large effect variant because homosexuality almost certainly decreases fitness directly. For a variant to get moderately common with this sort of effect it had to have another outcome which was strongly favored.

** Note that genetics does not include all biological factors. E.g., developmental stochasticity or some early environmental perturbation in utero with lasting consequences.
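The argument in the first footnote can be illustrated with the textbook one-locus selection recursion. This is a minimal Python sketch with hypothetical genotype fitnesses, assuming random mating, an infinite population, and no mutation. Heterozygote advantage is used here only as one simple example of “another outcome which was strongly favored”; it is not a claim about the actual mechanism for any real variant.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """One generation of selection on allele A (random mating, no drift or mutation)."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def run(p, fitness, generations):
    """Iterate the recursion for a number of generations from starting frequency p."""
    for _ in range(generations):
        p = next_freq(p, *fitness)
    return p

# Hypothetical genotype fitnesses (AA, Aa, aa), for illustration only.
cost_only = (0.90, 0.95, 1.00)      # each copy of A costs ~5% fitness, nothing in return
with_benefit = (0.90, 1.05, 1.00)   # same homozygote cost, but carriers gain some other advantage

p0 = 0.2
p_cost = run(p0, cost_only, 500)
p_bal = run(p0, with_benefit, 500)

# Analytic equilibrium under heterozygote advantage.
w_AA, w_Aa, w_aa = with_benefit
p_star = (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)

print(f"cost only:    p after 500 generations = {p_cost:.2e}")
print(f"with benefit: p after 500 generations = {p_bal:.4f} (predicted p* = {p_star:.4f})")
```

With a pure cost the allele is driven toward loss; only when something offsets that cost does it settle at an intermediate frequency.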

SciReader, more bookmarks you might not get to….

The Pritchard Lab at Stanford is beta testing a new tool to help you sort through the tsunami of publications coming at you, SciReader. Registration is easy enough, and I just imported my library from PubChase, which does something very similar. Right now the recommendations from SciReader aren’t really relevant, despite the fact that I’ve put in topics, authors, and a rather large PubChase library. So I assume it’s waiting for me to “like” more papers. Fair enough. (Or there might be some latency before the engine responds.)

But one thing that has come to mind is what I use these tools for. The number of people I follow on Twitter generally does not go above 300 (I prune inactive/dormant accounts as I add people), and the list is heavily skewed toward those with a disciplinary focus similar to mine (evolutionary genomics, broadly). I’ve noticed that PubChase is usually a day to a week behind Twitter in pointing me to papers of specific interest to me.* So why are these tools even useful? First, they’re a good way to maintain a personal library that one can use for references. But second, they also point me to papers which are of interest, but just outside my core domain of focus. Basically they make sure I don’t get too snug on my optimum adaptive peak and ignore goings-on outside the ghetto.

Update: Jonathan Pritchard leaves a comment, which I think is very clarifying as to why a prominent research lab is developing a tool which seems more up the alley of the private sector:

Hi Razib

Thanks for this shout-out!

To clarify about the recommendations, right now we have these running overnight, so you should get recommendations tomorrow. (In the near future we will hopefully provide these within a few minutes, but we need to rewire some code for this.)

I hope that as we develop the site further, SciReader will be a more tunable and more flexible recommendation system than the other current systems. You mentioned Twitter–I agree that this is a great source of papers. We are now scraping Twitter for papers that are being discussed on Twitter. Right now we present this as a separate Twitter summary, and we will also be incorporating this into the recommendations.

One of our long-term goals is to encourage the community to adopt post-publication recommendation and peer-review in a unified platform:
although we have not yet implemented much in that direction. These functions (finding and discussing papers) should really be core activities for all scientists, so I think there’s value in having a variety of tools in this space trying to figure out how to really make this work.

Finally, this is currently a beta release and we very much welcome bug reports and suggestions on how to make this tool more useful.


* Google Scholar now has a recommendation service, and though it’s less frequent in telling me to notice a new paper, its recommendations tend to be laser-targeted.

Liberal science denialism at the ballot box

Golden Rice

The two major issues on which liberals in the United States get tagged as “denialist” or “anti-science” are vaccination and GMOs. A major problem with this thesis, though, is that in aggregate the social science doesn’t support it. I’ve used the GSS to check on GMO attitudes, and education/intelligence (or lack thereof) are the strongest predictors of skepticism, not ideology. And the best social science doesn’t seem to indicate a strong political valence to anti-vaccination sentiment at the grassroots.

But sometimes looking at aggregates misses the important dynamics. I’d argue that the reason people keep thinking that there is a correlation between anti-vaccination and anti-GMO opinions and the Left is that the most vocal elite expositors of these positions hail from the cultural Left. Policy positions that start out non-ideological can quickly become polarized when elites lead in a particular direction.

The state of Oregon had a ballot measure on genetically modified organisms and labeling. Oregon also legalized marijuana. We have county-by-county results for both, as well as results for the governor’s race. I brought them together and generated some scatter plots. As you can see below:

1) There is a strong correlation at the county level between support for legalization of marijuana and support for GMO labeling (R² is just the square of the correlation coefficient, and is the proportion of variation in Y explainable by variation in X).

2) There is a strong correlation at the county level between support for Democratic candidates and support for GMO labeling.

I am aware that not all of those who support GMO labeling are denialists. Some of them are scientists. But my personal experience with those who support GMO labeling (there was a measure in California a few years back) is that their rationales are inchoate, and often not “reality based” (i.e., they are more about fear than anything else). Though there is no strong political valence at the grassroots at this point, I predict that if GMO labeling keeps coming up over and over, and it becomes a social movement, you’ll see it become Left-tinged as people like Michael Pollan start polarizing opinions. Of course in some places, such as Europe, the anti-GMO position has swept society to become the dominant one.


Raw data:

County, Yes on marijuana (%), Yes on GMO labeling (%), Democrat for governor (%)
Baker, 41, 32, 27
Benton, 60, 52, 59
Clackamas, 51, 47, 46
Clatsop, 57, 50, 46
Columbia, 53, 45, 43
Coos, 53, 50, 42
Crook, 41, 31, 29
Curry, 56, 52, 41
Deschutes, 51, 46, 46
Douglas, 45, 41, 34
Gilliam, 41, 23, 32
Grant, 35, 32, 25
Harney, 34, 26, 24
Hood River, 57, 54, 59
Jackson, 53, 55, 43
Jefferson, 44, 32, 34
Josephine, 50, 49, 35
Klamath, 44, 36, 28
Lake, 38, 29, 23
Lane, 60, 57, 57
Lincoln, 62, 53, 54
Linn, 47, 38, 35
Malheur, 31, 32, 25
Marion, 48, 42, 41
Morrow, 34, 27, 28
Multnomah, 71, 62, 70
Polk, 47, 42, 41
Sherman, 38, 23, 28
Tillamook, 58, 45, 47
Umatilla, 37, 32, 29
Union, 41, 33, 31
Wallowa, 39, 35, 28
Wasco, 49, 40, 43
Washington, 55, 48, 52
Wheeler, 36, 32, 29
Yamhill, 50, 41, 41
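The correlations in points 1) and 2) can be recomputed from the county table with a few lines of Python. This is a sketch: it computes an ordinary Pearson correlation across counties, weighting every county equally regardless of population (so a county of a few thousand voters counts as much as Multnomah). It may therefore differ from whatever settings produced the original plots.

```python
import math

# County results from the table: (yes on marijuana, yes on GMO labeling,
# Democrat for governor), all in percent.
DATA = {
    "Baker": (41, 32, 27), "Benton": (60, 52, 59), "Clackamas": (51, 47, 46),
    "Clatsop": (57, 50, 46), "Columbia": (53, 45, 43), "Coos": (53, 50, 42),
    "Crook": (41, 31, 29), "Curry": (56, 52, 41), "Deschutes": (51, 46, 46),
    "Douglas": (45, 41, 34), "Gilliam": (41, 23, 32), "Grant": (35, 32, 25),
    "Harney": (34, 26, 24), "Hood River": (57, 54, 59), "Jackson": (53, 55, 43),
    "Jefferson": (44, 32, 34), "Josephine": (50, 49, 35), "Klamath": (44, 36, 28),
    "Lake": (38, 29, 23), "Lane": (60, 57, 57), "Lincoln": (62, 53, 54),
    "Linn": (47, 38, 35), "Malheur": (31, 32, 25), "Marion": (48, 42, 41),
    "Morrow": (34, 27, 28), "Multnomah": (71, 62, 70), "Polk": (47, 42, 41),
    "Sherman": (38, 23, 28), "Tillamook": (58, 45, 47), "Umatilla": (37, 32, 29),
    "Union": (41, 33, 31), "Wallowa": (39, 35, 28), "Wasco": (49, 40, 43),
    "Washington": (55, 48, 52), "Wheeler": (36, 32, 29), "Yamhill": (50, 41, 41),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

marijuana = [v[0] for v in DATA.values()]
gmo_label = [v[1] for v in DATA.values()]
democrat = [v[2] for v in DATA.values()]

r_mj = pearson(marijuana, gmo_label)
r_dem = pearson(democrat, gmo_label)
print(f"marijuana vs GMO label: r = {r_mj:.2f}, R^2 = {r_mj ** 2:.2f}")
print(f"Democrat vs GMO label:  r = {r_dem:.2f}, R^2 = {r_dem ** 2:.2f}")
```

Squaring r gives the R² quoted in the plots, the share of county-to-county variation in GMO-labeling support that a straight line through the other variable accounts for.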

Open Thread, November 16th, 2014

I’ve updated the raw data (csv, Excel) for the survey, which has nearly 340 respondents now. You can see the results so far here. Interestingly, 75 percent of readers claim to have read The Selfish Gene, vs. 65 percent who’ve read The Origin of Species. More have read Principles of Population Genetics than Molecular Biology of the Gene or Molecular Biology of the Cell. Not surprising. What is surprising is the reader who claims to have read me for 20 years! I assume that this individual means 2 years. Most of the results align with what has always been the case since the beginning of the blog. Mostly male. Mostly atheist. Mostly white. Politically diverse. Socioeconomically skewed toward higher income and more education (~15% of readers do not have a university degree).

Ideological profile of GNXP readers

Below I’ve taken the survey results, plotted them along two dimensions, and smoothed them out. No surprises: readers are about equally divided between libertarians, liberals, and conservatives, with numbers declining slightly in that order. There are very few “populists,” understood to be people with Left economic views and Right social views. A good majority of readers are anti-interventionist, but there’s a small minority that is internationalist. There are very few “liberal internationalist” types among my readers. Rather, the tendency for internationalism to correlate with economic, and to a lesser extent social, conservatism suggests that these are probably non-libertarian conservatives in the readership.

Note: The charts’ titles have “conservatism” and “interventionism” in them because higher values on the x or y axis indicate higher values in that direction.


Beyond the cartoon in understanding the world

In The Atlantic, Shadi Hamid has an interesting article, The Roots of the Islamic State’s Appeal, which is basically a precis of his recent book Temptations of Power: Islamists and Illiberal Democracy in a New Middle East. This is on my “to-read” list, so I’ll get to it at some point, though Hamid has been expressing his views for years now, so I don’t anticipate any new big-picture analyses. Since my post on ISIS this summer he’s been pointing to some of my posts of interest to him now and then (e.g., ISIS’ Willing Executioners). I don’t always agree with Hamid, but he is a serious thinker. In contrast, most of the public discussion is performed in a manner where it is clear that the interlocutors have in mind only idealized cartoons. The sort of multiculturalist Left liberalism which fixates upon Islamophobia reduces Islamic civilization to a colonized adjunct of the Western experience. On the other side you have the type of intellectual whose comprehension of Islamic civilization does not extend much beyond the latest bombings. To truly grasp issues and affairs across geographic space and the vast spans of history requires some modicum of scholarly learning, which most who offer their opinions do not have. This is why I often dismiss readers who “explain” to me their understandings gleaned from a few books here and there: if I agree, their opinions are irrelevant, and if I disagree, why exactly would I take the opinions of those far less informed than me on anything? Everyone has a right to their opinion. What concerns me is when the uninformed are the ones who are influencing policy decisions.

More to the point in relation to Hamid’s piece, one of the implications is that this anti-Islamist phase in the Arab world is a correction, but that the arrow of history will probably lead to a second rise of Islamism and illiberalism. In other words, it will get worse before it gets better (if it gets better; Hamid seems skeptical of taking the Western arc of history as anything but a specific contingency). What immediately comes to mind, then, are the Copts. It seems clear that much of the Fertile Crescent excluding Israel and Lebanon will be cleansed of its ancient communities. The numbers work, insofar as these are minorities on the order of percents in populations of millions. But the Copts of Egypt number millions in a population of tens of millions. The second Islamist age in Egyptian politics and society will not be pretty for this minority, who will experience repression and exclusion as a matter of ideological commitment from the powers that be.

2014 Gene Expression reader survey

Over the years I’ve realized that since I regularly verbally bludgeon readers, people think I’m a severe and overly serious person. Apparently the headshot which I have on Twitter also seems a bit dickish (it was taken in Florence in 2010). To compensate for that I had a friend take this picture of me recently. I’m smiling. So I’m capable of that.

Second, it’s been a while since I posted a reader survey. I’ve been doing them every few years since 2005. I expect that since I moved to Unz Review there has been some change in the readership, but I also have the same people who have been following me across platforms (speaking of this issue, just subscribe to my total content feed).

Here is the link for this year’s survey. There are 33 questions, many of them pretty quick (e.g., age, sex, number of children). I’ll be posting an update, and the raw data (csv format), later.

Finally, old reader survey posts.

Update: Nearly 300 responses in. Past experience tells me that the numbers won’t go much beyond 500, and that will take a long time. I’ve put the results so far in csv and Excel format. I’ll keep the file name the same as I generate updated reports. No big surprises so far, as the respondents pretty much fit the profile of earlier results. The only major surprises to me are the high support levels for maintaining blue collar wages through government intervention, and the overwhelming acceptance (~75%) of anthropogenic climate change given the somewhat libertarian bias of the readers.

Is Jonah Lehrer “one of the most gifted nonfiction writers of his generation”?

I wish Jonah Lehrer success in his life. I’ve told him so personally and privately, though that was easier for me than most since I don’t think of myself as a science writer, so his betrayal did not strike as close to home. When I read How We Decide in 2009 I actually thought it was a pretty good book, but a friend who was a Ph.D. student in cognitive psychology told me to be very careful of Jonah, because he cut corners. A few years ago another friend recounted to me the story of how she recommended one of Jonah’s books for a book club in her graduate program, and a colleague offered that though he found Jonah’s work interesting, whenever it touched something he knew about it seemed either superficial or error-filled.

So we’ve established that heretofore Jonah has a history of not being exceedingly punctilious toward the source material, fabrication and plagiarism aside. A contrast might be with Carl Zimmer. Sorry to pick on Carl, but if I found that he had done something sensationalist I’d have a heart attack, I’d be so shocked. Rather, the issue is whether Jonah is “one of the most gifted nonfiction writers of his generation.” Are his storytelling ability and writing style so exceptional as to warrant this appellation? Many people are expressing a lack of surprise. Jonah fits the expectation of a “boy genius” writer on paper. He looks the part, he speaks the part. I used to joke that he was the “boy king of cognitive neuroscience.” To be plain about it, Jonah is a young white male, so he’ll be given particular breaks in this world. I am generally averse to this kind of reductive thinking, but it is hard in this case to avoid concluding that there is something to this. He rose fast, he rose high, and he fell far. And now he’s back where few could ever aspire to be, all within a few years.

The issue is simple for me, and it has to do with numbers. Many, many people want to be science writers. That’s why there are now professional programs to train you to do this. But very few make a good living in this area. One issue that immediately comes to mind is that you probably need some financial buffer to really take this risk as a career choice. It could be family money, or it could be that your partner has a more conventional job which can allow for income smoothing over time. I also happen to know that Jonah had some powerful and influential mentors, so it wasn’t hard for him to become a public intellectual, and so bring to the table the requisite synergy that agents are looking for. Every now and then literary agents contact me, and one issue that comes up is that they want me to increase my public profile so that I will be able to push copies of anything I publish using the resources of my own personal fame. I have not forged that path; rather, I’d like to think I’m a much more eccentric character who has tracked himself into much more exotic territory, career-wise. But back to the numbers: the vast majority of people who aspire to be science writers will not become science writers. Jonah was one of the few who had made it, and spectacularly so. He then flamed out, again, spectacularly so. Now he’s back, seemingly on his way to success. Is he such an exceptional talent that he deserves this? Are there no other Jonah Lehrers in the world who haven’t been given a chance and who happen not to have Jonah’s baggage? It is hard for me to believe that.

That is why I wish Jonah and his family success in the world, but I’d have hoped he would have moved on to another line of work, and allowed others to step into the glory and fame. There are many people in the trenches who I think could actually succeed in doing what he did. To me it’s a matter of just deserts. To become a writer who can buy a million dollar house is a once in a lifetime opportunity. Jonah blew it. It is now rightfully the turn of others. But perhaps Jonah is just so good, such an incredible talent, that they had to snap him up again, justice be damned! Honestly, of that I’m skeptical. Despite his transgressions, my personal interactions with Jonah have been cordial. He seems cool. But just because he’s a good guy is not enough to warrant a second chance in his chosen career as a writer. There are many people out in the world who don’t have the privilege of choosing their careers. Perhaps Jonah needs to experience that more pedestrian life too.

A feline genome in full


Best friends forever

As I mentioned yesterday, I’m a contributor to a paper which made a big splash in PNAS, Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. It’s been pretty widely covered in the media. One thing that hasn’t gotten that much play, because most people don’t work with whole genomes, is that the feline reference genome needed some work, and the group at Washington University’s Genome Institute really pushed it much further along the way to being useful. Much respect to Wes Warren and his team. This is not an uncommon issue. We may live in the “post-genomic era,” but that really applies to humans and a few particular model organisms for now. For many lineages there is the requisite genome-of-the-week paper, a hastily assembled reference, and then the group moves on to greener pastures. To get a sense, the original “cat genome” paper had 1.9-fold coverage. That means you expect that each position in the genome (including each SNP) will be covered by ~2 reads. The problem with this is that that’s an average, and with variation there will be lots of gaps (leaving aside repetitive regions which are hard to span normally). And, with a ~1% error rate, it will be hard to be confident about whether the variation you see is “real” or just error. To get a sense of how much better this paper’s data are, consider that they got 58-fold coverage out of pooled samples (n=22) from a wide range of domestic cats from different lineages (as opposed to just Cinnamon the Abyssinian). They also got 7-fold coverage of the wildcat samples, essential for comparative purposes.
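Why 1.9-fold leaves so many gaps can be seen with the idealized Lander-Waterman picture, in which reads land uniformly at random so that per-site depth is Poisson-distributed with mean equal to the coverage. The Python sketch below assumes that idealization (real data, with repeats and GC bias, are worse). The 3-read cutoff is an arbitrary illustrative threshold for telling a real variant from ~1% sequencing error, not a standard from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Probability that a site is covered by exactly k reads."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def frac_below(min_reads, coverage):
    """Expected fraction of sites with fewer than min_reads reads,
    assuming reads land uniformly at random (Lander-Waterman idealization)."""
    return sum(poisson_pmf(k, coverage) for k in range(min_reads))

# Coverage figures from the text: original assembly, wildcats, pooled domestic cats.
for coverage in (1.9, 7, 58):
    print(f"{coverage:>4}x: uncovered {frac_below(1, coverage):.3%}, "
          f"fewer than 3 reads {frac_below(3, coverage):.3%}")
```

At 1.9-fold, e^(-1.9) ≈ 0.15, so roughly 15% of sites would get no read at all even before repeats are considered, and most of the rest would be too thin to separate real variation from error. At 58-fold those fractions are negligible.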

To give you some quick background, F. silvestris catus diverged from its wildcat ancestors 5,000 to 10,000 years ago. This is in contrast to the dog, which seems to have been domesticated at least 15,000 years ago. The mitochondrial profile of Egyptian cats ~2,500 years ago was already similar to what you see in Egypt today. Over the past few thousand years domestic cats have expanded across a wide range in Eurasia. Breeds are relatively new for domestic cats, and tend to be relatively inbred lineages developed over the past few hundred years at most. In contrast, feral cats exhibit population genetic diversity in the same range as humans.

Citation: Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication

So what did this paper find? First, I think the biggest aspect, which has been picked up by the media, is that cats are subject to the “domestication syndrome” due to selection on development of neural crest cells. This is not entirely surprising. Domestic cats have a reputation as being marginally tame and lacking in the servile sycophantic affect of the dog. But in comparison to the wildcats F. silvestris catus is actually very tolerant of coexistence with humans. In addition, they exhibit behavioral patterns which are not found in wildcats, such as residing in colonies. The practical reason for this is pretty obvious, as cats residing within Neolithic villages would be living cheek-by-jowl in comparison with their ancestors.

With regard to selection, because there were numerous samples, comparisons could be made across lineages using a sliding-window method. Areas with high Fst and sharply reduced heterozygosity are telltale signs of selection events. Everyone has their particular genes of interest. What always makes a mark for me is how often I recognize genes which are targets of selection in domestic mammals, considering that there are ~20,000 genes (granted, some of these selection events sweep across many genes, and the ones listed are often selected based on functional considerations). Evolutionary processes are substrate-neutral, but across a particular phylogenetic depth they tend to rework the same ‘raw material’ over and over again. As we expand the post-genomic empire outward it seems likely that animals and plants closely associated with humans will get the earliest treatment. And I think that will yield some very definite insights into the nature of genomic constraint and convergence conditional on being wrapped up in the same ‘ecosystem’.
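The sliding-window logic can be sketched in a few lines of Python. This is a generic illustration, not the paper’s pipeline: it uses a simple ratio-of-averages Fst, (H_T − H_S)/H_T, computed from allele frequencies, on made-up synthetic data with one hypothetical swept block. The window size, step, and thresholds are arbitrary choices for the example.

```python
import random

def expected_het(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def window_stats(dom, wild, start, end):
    """Mean domestic heterozygosity and a ratio-of-averages Fst over sites [start, end)."""
    dom_het = hs = ht = 0.0
    for p1, p2 in zip(dom[start:end], wild[start:end]):
        h1 = expected_het(p1)
        dom_het += h1
        hs += (h1 + expected_het(p2)) / 2.0   # mean within-population heterozygosity
        ht += expected_het((p1 + p2) / 2.0)   # heterozygosity of the pooled frequency
    fst = (ht - hs) / ht if ht > 0 else 0.0
    return dom_het / (end - start), fst

def scan(dom, wild, size, step):
    """Yield (window start, domestic heterozygosity, Fst) for each sliding window."""
    for start in range(0, len(dom) - size + 1, step):
        het, fst = window_stats(dom, wild, start, start + size)
        yield start, het, fst

# Synthetic, made-up data: 2,000 sites; domestic frequencies drift slightly from
# wild ones, except a hypothetical swept block (sites 1000-1099) that is fixed.
rng = random.Random(7)
wild = [rng.uniform(0.05, 0.95) for _ in range(2000)]
dom = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.05))) for p in wild]
dom[1000:1100] = [1.0] * 100

# Arbitrary illustrative thresholds: high differentiation AND low diversity.
flagged = [start for start, het, fst in scan(dom, wild, size=100, step=50)
           if fst > 0.2 and het < 0.1]
print("candidate sweep windows start at sites:", flagged)
```

Requiring both signals matters: a window that only partly overlaps the swept block can show elevated Fst while retaining substantial heterozygosity, so it is not flagged.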

What the internet can be to intellectual discourse

I’m having a discussion on Twitter about the value of journals, etc., in this age. You’ll hear more from me on that topic in the near future. But right now I want to tell a quick story about how novel distribution and communication channels speed up everything. A few years back I had some discussions with Peter Ralph while he and Graham Coop were putting together their manuscript for The Geography of Recent Genetic Ancestry across Europe. Peter told me that once the manuscript was put on a preprint server he’d email me so I could check it out. What happened is that 1) the preprint went up 2) within one hour people were talking about it on Twitter 3) within two hours I had put up a blog post about it. Peter emailed me to laugh about the fact that he was about to tell me that the preprint was up when he saw that I had already written a blog post about it.

Obviously not all aspects of the academic production process can be accelerated in this manner. But there are now steps in the reaction where there is very little friction, and the latency can be pretty much abolished. The internet introduced us to “Netscape time”, but it doesn’t seem that many aspects of science have changed much since the universal penetration of the internet….

The K14 paper, an author speaks

In the post below Martin Sikora, an author on the K14 ancient DNA paper, has responded. The whole thing is worth reading:

Hi Razib,

after reading your post I thought it would not hurt to chime in with a bit of perspective from my side, as I don’t entirely agree with some of your criticisms. Some of the reactions to our paper have caught me a little by surprise, but in retrospect it probably reflects the complexity of the story, which is something I also struggled with (and still am!).

Part of the confusion seems to be the assumption that, since we find that K14 somehow relates to all three European ancestral populations proposed by Lazaridis et al., it necessarily also contributed these components to modern Europeans. In your post you also seem to imply that, i.e. we don’t “acknowledge the possibility that K14 did not leave modern descendants, and was part of an early population which did not end up flourishing”. I actually agree with the early population part, and we also acknowledge that in our suggested model in Figure 2, which does not have a K14-related population directly contributing to modern Europeans. What one can say with reasonable certainty though is that K14 does share a substantial amount of ancestry with Mesolithic hunter-gatherers (and therefore modern Europeans by extension), but at the same time appears less close to East Asians than all Western Eurasians, so things are complex. Therefore if you take the Lazaridis et al. model as a backbone, you need some extra gene flow to account for that, be it from Basal Eurasian into K14, or some sort of basal gene flow between East Asia and early West Eurasians, post-K14 but pre-ANE/HG split. While we don’t have the resolution to be sure, our results do suggest that K14 was close to, or already somewhat down, the HG branch of the ANE/HG split, which implies that those proposed components would not only have to be already somewhat differentiated by 36 kya, but also to have already mixed to a certain extent.

Regarding your take on the PCA results, I would disagree and say that these are very much what you would expect for an individual of that age. K14 is after all ~36,000 years closer to the East Asia / West Eurasia split, so it lacks a substantial amount of drift on the European branch. It is nevertheless shifted towards Europe on PC1 from the origin as expected (a bit more so than MA1 actually). Pontus Skoglund had a nice recent paper in MBE that demonstrates the same effect (see Figure 9 in doi:10.1093/molbev/msu1920). As you say, using modern variation to infer affinities of ancient samples has limitations, and PCs are often hard to interpret. In the same spirit I would also not interpret the different admixture components in K14 as evidence that K14 itself was admixed with all those components, but rather as reflecting its ancestral relationship with the modern populations represented by these components. The same is obviously true for the “Middle East” component, but it still implies that K14 somehow relates ancestrally to those populations whereas all other HGs including MA1 do not.

Overall, I do think that migrations played an important role, e.g. I don’t think that “Basal Eurasian” came with K14 to Central Europe or was already present back then in another way, that seems pretty clear. I would also not say that our results are necessarily a refutation of the Lazaridis et al model, but I do think they show that it seems to have been already quite complicated in the Upper Paleolithic. If you need a new migration/component for every new individual, to me this questions at least to some extent whether one can really talk about three or any other number of discrete ancestral populations for all modern Europeans. Personally I would expect ancient samples from the Caucasus or Central Asia to yet again spring some surprises. The cool thing is that we’ll probably know soon, since many groups are adding more and more samples to the picture.

Anyways, I just wanted to share my thoughts, hope this clears up things a bit.

Btw regarding your subsequent ANE post, I can confirm that those are the Kalash. Interesting also that, correspondingly, the Kalash ADMIXTURE component shows up in MA1, but is almost absent in K14 (see our Figure S20).


Population genetics is a precondition for understanding evolutionary process

Since I’ve moved to Unz Review I’ve attracted a set of readers who are used to the level of discourse on evolutionary topics which is the norm on “HBD blogs.” Let me be clear that I don’t tolerate uninformed speculation, because I don’t care to listen to it and I don’t gain any value from it. This is in response to a long and bizarre hectoring rant about my lack of credentials, the nature of heredity, etc. It reminded me of the moron who accused me of not understanding Lewontin’s Fallacy at Inducivist a few years ago (a further idiot also decided to “explain” epistasis to me). A buzz word or two does not sagacity make. Naturally this person was banned. But in any case this is as good a place as any to suggest that someone who wants to engage with me in a manner where I will take them seriously should be at least somewhat familiar with population genetics, and hopefully genomics. This naturally curtails communication with most of the human race, and that’s the point. I will die at some point unless the Singularity arrives, so I do not wish to waste my time talking to most of the human race about things they know nothing of.

With the pleasantries out of the way I am here to offer a way to meet the threshold of knowledge which will make you fluent in leaving comments here.

The water is warm and not too deep, wade in

- You can read Principles of Population Genetics.

- Read the UConn population genetics notes.

- Read Graham Coop’s population genetics notes.

- Read Joe Felsenstein’s population genetics text.

All of these are pretty easy, and three of them are free. You don’t need to derive all the formalisms. God knows I haven’t. But you need a basic algebraic framework to think about the process quantitatively. Additionally, it is probably useful to get at least some genomics background since that’s the empirical data that is really relevant for much of the commentary on this weblog.
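As one example of the kind of basic algebra in question, the simplest result in the field: under pure drift in an idealized diploid Wright-Fisher population of size N, expected heterozygosity decays as H_t = H_0(1 - 1/(2N))^t. The sketch below sets that formula beside a direct simulation. It assumes no selection, mutation, or migration, and the function names are hypothetical.

```python
# Neutral drift in an idealized diploid Wright-Fisher population:
# closed-form expectation versus a brute-force simulation.
import random


def expected_heterozygosity(h0, n_diploid, generations):
    """H_t = H_0 * (1 - 1/(2N))^t under pure drift."""
    return h0 * (1 - 1 / (2 * n_diploid)) ** generations


def simulate_wright_fisher(p0, n_diploid, generations, rng):
    """Allele-frequency trajectory under drift alone (no selection, mutation, migration)."""
    copies = 2 * n_diploid
    p = p0
    traj = [p]
    for _ in range(generations):
        # Each of the 2N gene copies in the next generation is an independent draw
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        traj.append(p)
    return traj


if __name__ == "__main__":
    rng = random.Random(1)
    N, T, reps = 50, 100, 200
    finals = [simulate_wright_fisher(0.5, N, T, rng)[-1] for _ in range(reps)]
    mean_h = sum(2 * p * (1 - p) for p in finals) / reps
    print(f"simulated mean H at t={T}: {mean_h:.3f}")
    print(f"expected H at t={T}:       {expected_heterozygosity(0.5, N, T):.3f}")
```

The point of the comparison is that the one-line formula already tells you what thousands of random draws will average out to, which is the sense in which the algebra lets you reason about the process without simulating it.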

I hope I’m clear that any rude, annoying, and hectoring comments are going to result in immediate banning.

Purring in the post-genomic era

I am an author on this paper, Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication:

We present highlights of the first complete domestic cat reference genome, to our knowledge. We provide evolutionary assessments of the feline protein-coding genome, population genetic discoveries surrounding domestication, and a resource of domestic cat genetic variants. These analyses span broadly, from carnivore adaptations for hunting behavior to comparative odorant and chemical detection abilities between cats and dogs. We describe how segregating genetic variation in pigmentation phenotypes has reached fixation within a single breed, and also highlight the genomic differences between domestic cats and wildcats. Specifically, the signatures of selection in the domestic cat genome are linked to genes associated with gene knockout models affecting memory, fear-conditioning behavior, and stimulus-reward learning, and potentially point to the processes by which cats became domesticated.

I’ll have more to say, but here is a write-up in Science. Super happy to be part of the team that got this published, especially Mike Montague, who did the heavy lifting.

Different ways to color a cat

Credit: CISC

Early last year an ancient genomics paper came out with the title Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. The point here is that the light-pigmentation-associated alleles common in Europeans seem to be relatively new derived mutations from the ancestral state, which is associated with Africans. An Ewen Callaway write-up highlighted the fact that one of the inferences made from these genomes is that these hunter-gatherers had light eyes (blue) and dark(er) skin. At the time I pointed out to Callaway on Twitter that we need to be careful here, as ancient Europeans may have had different variants, and these traits are not monogenic but exhibit dependencies on multiple loci. In light of my post below Graham Coop suggested a similar issue, that there could have been convergence. In other words, just because modern Europeans have particular derived alleles which confer a particular trait, it does not entail that ancient peoples who lived in Europe had to have the same alleles to confer the same phenotype. Alicia Martin observed that OCA2 is a locus where fast evolution occurred in both East Asians and West Eurasians (especially Europeans), but at different SNPs. In other words, the same gene is modified, but the mutational event is distinct.

Pigmentation in humans seems to be a trait we have a pretty good grasp of. Because most of the genetic variation between populations seems to be localized at loci of relatively large effect, GWAS has been good at picking up the signals. Tests of selection which look at haplotype structure also detect these loci because many of them seem to have swept up in frequency relatively recently. This is consonant with what ancient DNA is telling us, as a substantial proportion of modern European ancestry does derive from peoples who have been resident at high latitudes for tens of thousands of years, but new variants, possibly from the Middle East or elsewhere, have increased in frequency within these admixed populations (in South Asia the same pattern is evident, as the Ancestral North Indians likely introduced West Eurasian variants into the hybrid daughter populations).

But let’s think through some of the implications of the alternative scenarios. One model is implicitly the dominant one, that the modern skin lightening alleles which are derived in contemporary populations are due to new pressures for de-pigmentation. Though some de-pigmentation likely occurred early on, perhaps even in Neandertals, the full suite is recent. Another model is that there were other variants segregating in the older populations, and that new populations brought new variants which swept to fixation. My question is simple: if the indigenous populations of Europe were already relatively light skinned, why did the new alleles rise in frequency so rapidly?

Let’s unpack what I’m getting at. OCA2 and SLC24A5 are two loci implicated in de-pigmentation in Europeans. The regions around the selective events are highly homogenized so that there’s a long haplotype around them. This means that the causal variant was targeted by such strong selection that the flanking regions of the genome were swept upward in frequency faster than recombination could break apart the association. SLC24A5 in particular seems to have been under very strong selection, to the point where almost all variation has been purged from European populations at this locus. In India SLC24A5 is also at a higher frequency than might be predicted by simple contribution of ANI ancestry. The issue that I’m getting at, assuming that modern continental populations such as Europeans are admixed, is why these skin lightening alleles swept up in frequency so rapidly, and in the case of SLC24A5 nearly to fixation. It’s framed by the analysis presented by this paper, Parallel Adaptation: One or Many Waves of Advance of an Advantageous Allele?:

Models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new environment by multiple mutations of similar phenotypic effect that arise in parallel, at the same locus or different loci. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally, leading to a “soft” sweep in the population. Here we study various models of parallel mutation in a continuous, geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele due to limited dispersal can allow other selected alleles to arise and start to spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically and so initially form a geographic patchwork, a random tessellation, which could be mistaken for a signal of local adaptation. This spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs are captured by a single compound parameter, a characteristic length, which reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. While our knowledge of these parameters is poor, we argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. Thus, we predict that as more data become available, many more examples of intraspecies parallel adaptation will be uncovered.
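The “long haplotype” argument for OCA2 and SLC24A5 can be made concrete with extended haplotype homozygosity (EHH), the statistic that haplotype-based scans such as iHS build on: the probability that two randomly drawn chromosomes carrying a given allele at a core SNP are identical over the whole stretch out to some distance. The sketch below is illustrative only. It assumes phased haplotypes written as strings of 0/1 alleles, extends only in one direction from the core, and uses hypothetical function names and toy data.

```python
# Extended haplotype homozygosity (EHH) around a core SNP, on phased haplotypes
# written as strings of '0'/'1' (an illustrative encoding).
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """Probability that two random carriers of `allele` at `core` are identical from core to end."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in carriers)  # distinct extended haplotypes
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


def ehh_decay(haplotypes, core, allele):
    """EHH at each site moving rightward from the core SNP."""
    length = len(haplotypes[0])
    return [ehh(haplotypes, core, allele, j) for j in range(core, length)]


if __name__ == "__main__":
    # Toy sample: carriers of '1' at the core share one long haplotype (sweep-like),
    # while carriers of '0' quickly diverge from one another.
    haps = ["11111", "11111", "11111", "11110", "00101", "01010", "00011", "01100"]
    print("derived  :", ehh_decay(haps, 0, "1"))
    print("ancestral:", ehh_decay(haps, 0, "0"))
```

The signal of a recent strong sweep is the contrast: EHH stays high far from the core on the selected background while decaying quickly on the other, because recombination has not yet had time to break the selected haplotype apart.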

Basically, if the ancient North Eurasian populations had lighter skin due to their own alleles, why are the new light skin alleles sweeping up in frequency so strongly after admixture? (for Europeans, I’m thinking SLC45A2 and SLC24A5 in particular). Perhaps the selective sweeps were not driven by light skin at all? Or, perhaps the ancient North Eurasians didn’t have their own variants.
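To put a rough number on “sweeping up so strongly”, here is a deterministic sketch assuming genic (haploid-style) selection with coefficient s, no drift, and an allele that enters a population at some starting frequency through admixture. In this model the odds p/(1 - p) are multiplied by exactly (1 + s) every generation, which gives a closed form for the time between two frequencies. The parameter values and function names are illustrative assumptions, not estimates for SLC24A5 or SLC45A2.

```python
# Deterministic allele-frequency change under genic selection after admixture.
# Assumptions: selection coefficient s, no drift, no dominance effects.
import math


def trajectory(p0, s, generations):
    """Iterate p' = p(1 + s) / (1 + p*s) for the given number of generations."""
    p = p0
    traj = [p]
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
        traj.append(p)
    return traj


def generations_between(p0, p1, s):
    """Closed form for the same model: odds p/(1-p) grow by a factor (1+s) each generation."""
    odds0 = p0 / (1 - p0)
    odds1 = p1 / (1 - p1)
    return math.log(odds1 / odds0) / math.log(1 + s)


if __name__ == "__main__":
    t = generations_between(0.10, 0.95, 0.05)
    print(f"10% -> 95% at s = 0.05: {t:.1f} generations")
    print(f"frequency after {math.ceil(t)} generations: {trajectory(0.10, 0.05, math.ceil(t))[-1]:.3f}")
```

With these toy values, going from 10% to 95% takes roughly a hundred generations, a few thousand years at human generation times. Drift and dominance would change the details, but the sketch shows why an allele that was strongly favored after admixture could rise quickly even if the resident population already carried variants with similar effects, provided the new allele had the larger selective advantage.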

Addendum: The 2007 Neandertal red hair paper offers up a possible solution toward phenotype reconstruction: test the ancient genetic variants in cell lines to check for expression.

Insects are a pretty big deal


Citation: Misof, Bernhard, et al. “Phylogenomics resolves the timing and pattern of insect evolution.” Science 346.6210 (2014): 763-767.

There’s another paper in Science which I don’t have much intelligent to say about, but which I want to point to because it seems really cool, Phylogenomics resolves the timing and pattern of insect evolution. Earlier work in phylogenetics tended to use a few characters or genetic markers. As noted in the abstract they used nearly 1,500 protein coding genes to construct this phylogeny. Some of my friends who know particular organisms have objected to specific branching patterns near the tips, but what I’d like to emphasize is how ancient insect lineages seem to be. Our own mammalian branch of the tree of life really only diversified over the last ~100 million years or so. Most of the big groups of insects had already started to coalesce by 300 million years ago! As far as land animals go, insects are incredibly diverse and ancient. Near the end of the paper they state: “The almost linear increase in interordinal insect diversity suggests that the process of diversification of extant insects may not have been severely affected by the Permian and Cretaceous biodiversity crises.” There will always be insects….

Prehistoric “Europeans” did not look like Ayla

Credit: Alan Light

As you may know the actress Daryl Hannah depicted Ayla, the protagonist from Jean Auel’s Clan of the Cave Bear, in the film version. Unlike many castings Hannah was an inspired choice, as she does look like the description of Ayla in the novels: tall, blonde, and with a high forehead (remember, there’s a lot of contrast with Neandertals in these books). Auel depicts human-Neandertal interactions to such an extent that there is hybridization. In the 1980s, when her series first gained traction, this was not a particularly popular angle. This was the age of “mitochondrial Eve”, when replacement was a more fashionable idea (ask Milford Wolpoff about it). Even though Auel may have been wrong in the details of how this process played out (admixture seems to have occurred earlier on in the modern human migration out of Africa, not in northern Eurasia), overall she has admitted feeling vindicated by the work of people like Svante Paabo.

But there’s one area that is pretty important where Auel was wrong: it seems that during the Ice Age anatomically modern European humans did not fit the Nordic ideal of tall, blonde, and gracile. One reason I posted the image of the skull of K14 in the post below is that even without professional background in analysis of skeletal morphology it is visually obvious that this individual was rather robust. There’s a reason that it was apparently termed “Australoid” by earlier anthropologists. The native people of Australia and Papua are among the most robust humans alive today. In contrast other populations have gone through a great deal of gracilization, especially over the last 10,000 years. What about the coloring? I couldn’t find a reference in Seguin-Orlando et al. to any analysis of the functions of the genome, but in Anne Gibbons’ piece in Science she states that K14 was “a short, dark-skinned, dark-eyed man.” I doubt she would say this unless she knew from the research team what the genotype of this individual was. Perhaps there is a later paper coming out on population genomics rather than phylogenomics, but these results would be consistent with other findings.

One story that ancient DNA is unraveling is that of the complexity of human demographic history. There are lots of surprises in store. But a second no less important angle is that humans have adapted and changed functionally over the last 100,000 years, to the point where salient physical traits vary a great deal across both time and space.

Ancestral North Eurasians about the world

Citation: Seguin-Orlando, Andaine, et al. “Genomic structure in Europeans dating back at least 36,200 years.” Science (2014): aaa0114.

The above is a plot of shared drift (ergo, history) between Mal’ta, the 24,000 year old Siberian boy, and various world populations. As per Lazaridis et al. you see a north to south gradient in Europe. As per Raghavan et al. you see the evidence of a lot of contribution to Native American ancestry. The rest of the world tells an interesting story. Recall that the highest fraction of Ancestral North Eurasian (ANE) ancestry outside of the New World is among the peoples of the North Caucasus. Their shared drift statistic is depressed in comparison to Europeans because of their high fraction of Basal Eurasians (BEu). What I want you to focus on is a secondary mode of shared drift in northwest South Asia. The reddish-tinged circles are the Kalash, I am rather sure, from what I have heard/seen. The Ancestral North Indian (ANI) ancestors of South Asians seem to resemble the people of the South Caucasus (Georgians/Armenians) from what I have read/seen (I’ve run a few f-stats and D-stats myself). If this means that they have a fair share of BEu, then, as with the North Caucasus, their yellowish shading might understate their total ANE ancestry.
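For readers unfamiliar with “shared drift” maps: they are commonly built from an outgroup f3 statistic, f3(O; MA1, X), where O is a distant outgroup (often an African population) and X is each test population; larger values mean MA1 and X share more drift since splitting from O. That this particular figure uses exactly this form is an assumption here. The sketch computes only the point estimate from population allele frequencies, with no sample-size bias correction and no block-jackknife standard errors, and the function names are hypothetical.

```python
# Outgroup f3 as a "shared drift" statistic: f3(O; A, B) = mean over SNPs of (o - a)(o - b).
# Point estimate only; assumes population allele frequencies at the same SNPs in each group.


def outgroup_f3(freq_out, freq_a, freq_b):
    """Average of (o - a)(o - b) across SNPs; larger means A and B share more drift."""
    n = len(freq_out)
    if n == 0 or not (n == len(freq_a) == len(freq_b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((o - a) * (o - b) for o, a, b in zip(freq_out, freq_a, freq_b)) / n


def rank_by_shared_drift(freq_out, freq_target, panel):
    """Rank populations in `panel` (name -> frequencies) by f3(O; target, X), highest first."""
    scores = {name: outgroup_f3(freq_out, freq_target, f) for name, f in panel.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy frequencies at four SNPs; real analyses use hundreds of thousands of sites.
    outgroup = [0.1, 0.5, 0.2, 0.6]
    ma1_like = [0.6, 0.9, 0.7, 0.1]
    panel = {
        "pop_close": [0.5, 0.8, 0.6, 0.2],
        "pop_far": [0.2, 0.5, 0.3, 0.5],
    }
    for name, score in rank_by_shared_drift(outgroup, ma1_like, panel):
        print(f"{name}: {score:.4f}")
```

A population with substantial Basal Eurasian ancestry shares less drift with MA1 per unit of ANE ancestry, which is why a lower f3 value on such a map does not by itself mean lower ANE ancestry.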

Just something to think about.

Addendum: Everything I’ve seen suggests that there were two movements into South Asia from the north/west. The Brahui/Baloch are very distinctive in comparison to the Kalash/Pathan/Burusho. This might be a function of continuous gene flow from distinct regions to the west as well, especially in the case of the Brahui/Baloch, who have had associations as far afield as Oman due to their geographic proximity.
