Razib Khan One-stop-shopping for all of my content

May 1, 2019

Privacy in a social genomic age

Filed under: Genetics,Genomics,Privacy — Razib Khan @ 10:56 am

I recently had a long conversation with Veritas Genetics’ Rodrigo Martinez for an episode of The Insight, our podcast on genetics and evolution. One of his major arguments is that we are entering into the age of the social genome.

And the numbers don’t lie. There are more than 30 million Americans who have been genotyped in the consumer sector as of this writing, and Rodrigo contends that within two years his company alone will have sequenced more than one million Americans!

Sequencing is different from genotyping. Instead of looking at hundreds of thousands of markers in a genome of billions, you read the entire sequence.

We are fast approaching total information awareness in the genomic space for a large fraction of Americans. This brings me to a new story in Wired, The US Urgently Needs New Genetic Privacy Laws:

The rise of DNA data has legal experts increasingly concerned that the United States is not effectively protecting consumers from the many privacy risks that now loom before them. “What in heaven’s name is the law in genomics? That is not that easy to answer,” Susan M. Wolf told an audience gathered last Thursday at the University of Minnesota, where Wolf is a professor of law and health policy. “We’ve got 50 states. We’ve got multiple federal agencies involved.” The patchwork of laws means that in practice genetic anonymity is almost never guaranteed. But the legal landscape is so fractured that to fix this situation, the first issue is to resolve what rules apply to what data.

The piece discusses the broader scientific, policy, and current-affairs angles of genomics and how they relate to our personal information. Though the current wave of discussion was set off by the forensic revolution that followed the identification of the Golden State Killer with public databases, people working within genomics have warned for years that the exponential growth of the field was going to necessitate a reckoning.

A major problem in understanding genomic privacy and dealing with it through legislation is that the United States of America has a patchwork of federal and state laws, and genomics as a field is changing so fast that it is hard to keep up with the areas that legislation might need to address. The federal Genetic Information Nondiscrimination Act of 2008 was useful in its time, but it clearly did not anticipate the reality of a world where everyone can be identified through matches within relative databases.

In the Wired piece, one issue that crops up is that it is important not to treat DNA as if it is special or distinct from general privacy considerations. DNA is important, but it is not magic.

The ubiquity of genomic information must be accompanied by its demystification, and integration into the panoply of personal information which is informative, but not determinative.

Rather than specific laws tailored to the genome, Americans need to focus on the broader issue of privacy. Your credit score is as important to your life as your genome at the end of the day, after all.


Privacy in a social genomic age was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 30, 2019

Surfing into the genomic future

Filed under: Genetics,Genomics,Human Genome,human-genome-project — Razib Khan @ 7:16 am
The decline in cost per genome

Within genomics circles, the chart above illustrating the crash in sequencing costs since the year 2000 is famous. The reason it is famous is that it shows that genomic technology began to outrun the famous “Moore’s Law”, that computing power doubles every 18 months, around 2008.

The genomic revolution is arguably like no other information revolution of the 21st-century.

In 1983 there were 800 known genes with locations within human chromosomes. This is for all humans. The field of genetics had existed since the first decade of the 20th-century by this point. But the methods of the 20th-century were laborious, and not well suited to human genetics (we are a slow reproducing organism that one cannot experiment upon).

20th-century human genetics

Since the turn of the century, the sequencing of hundreds of thousands of human genomes has transformed our understanding of the landscape of inheritance. In 1983 scientists had no sense of how many genes humans had. They guessed 100,000, a suspiciously round number.

We now know that humans have 19,000 genes. We have also cataloged, more or less, all 3,000,000,000 positions in the genome. Genomics has finally allowed scientists to grasp the scale of variation among humans on the DNA level, to put a specific number to that variation (~100,000,000 polymorphisms), and to assign it to specific places within the genome.

A “map” filled with “Here Be Dragons” has been transformed into something that one can perform a scientific GPS upon.

21st-century human genomics

Fundamentally the discipline of genetics has always been one of the transmission of information across the generations, but the emergence of “next-generation sequencing” (NGS) resulted in such a gusher of data that genomics and computer science have developed symbiotically in this century. For many researchers, the size of genomes is measured as often in computer memory as it is in base-length or classical recombination map-length.

But all of this fancy technology wasn’t developed so that computer scientists could work on interesting algorithms and data-mining techniques.

Though genomics has applications to basic science, as well as animal breeding and crop development, the original rationale for the Human Genome Project was to further the goal of human health and longevity. This promise has arguably been a disappointment. We have not won the war on cancer, nor has healthcare at the point-of-delivery been transformed.

But how likely were the promises in the first place? What could a single human genome tell us? Rather than be a pot-of-gold at the end of the rainbow, the first genome is more likely a map that points us to the future.

The growth of “personal” genomics

Many futurists contend that the transformative power of technology is often overestimated in the short-term, but underestimated in the long-term. Humans don’t anticipate what the broader market will respond to through reflection. Rather, there needs to be a trial and error process.

The internet exploded on the scene in the 1990s. It certainly transformed communication, but initial attempts to provide services and goods failed as the “dot-com bubble” collapsed. But the ashes of such failures gave rise to a whole new economy, which has transformed our lives in ways we can’t fully account for. Remember anonymity before Google and social media?

The first decade after the first human genome saw little progress because it was fundamentally a blue-sky technology restricted to academic laboratories. The second decade of genomics has seen the explosion of the consumer market, from 100,000 users to 30,000,000 as of 2019.

Millions of consumers and dozens of companies means that the dynamic and adaptive power of the market will shape the future of genomics. The current “killer app” is genealogy, for fun and forensics. But as the whole American population gets whole-genome sequenced in the next generation the opportunities for personalization will open up. Instead of a single sequence, one can imagine consumers getting sequenced repeatedly over their lifetime, from different tissues, as healthcare professionals track the mutational arc of one’s life. And from the genome, consumer firms will explore the microbiome, the epigenome, and the transcriptome.

Information science will flood genetic science.

And once science becomes a technology, breaking out of the laboratory, the outcomes and changes can be unpredictable. Even protean.

In the years after 2000 what we would call “smartphones” existed, but they were luxury goods in the age of the “candy bar” and “flip” phone. The rivals to Apple were skeptical about the iPhone when it came out. But it turned out that the iPhone created an industry, and transformed many others (remember cheap digital cameras and paper maps?). Nevertheless, even Steve Jobs did not anticipate the proliferation of “apps”, and their centrality to the modern smartphone experience.

Apple’s ecosystem of applications developed organically, and its magnitude and importance were not anticipated. Jobs and his executive team clearly viewed the iPhone as a phone that had deluxe music functionality. As it turns out, Jobs had unveiled the next iteration of the computer! Today, a smartphone is really a computer with some phone functionality.

The genomic future likely will exhibit the same arc.

Genomics will transform our lives. That I can state with confidence. But how it will transform our lives, if I hazarded a guess I’d surely be wrong. History is more surprising than anything our imaginations can come up with.


Surfing into the genomic future was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 8, 2019

In search of the missing heritability

Filed under: Genetics,Genomics,heredity,Heritability — Razib Khan @ 9:45 pm

We’ve always known that parents resemble their offspring. An intuitive understanding of how traits are passed down in families is probably as old as our species and its ability to reflect on the world around us. The ancient Romans would often observe an association between a characteristic, for example, red hair, and a particular aristocratic family. And today, it is common to notice how a particular child resembles a particular grandparent. An interest in heredity is part of human nature.

But it has only been within the last 100 years that this intuition was transformed into a quantitative and rigorous science.
Resemblance within a dynasty

And believe it or not, this began before we understood the Mendelian genetic basis of inheritance. In the 19th century, Charles Darwin’s cousin Francis Galton developed the concept of correlation to explore the relationship of characteristics between parents and offspring. Some traits, such as the height of fathers and sons, turn out to exhibit a very strong correlation between the generations. Other traits, such as hairstyle, don’t (this is probably a good thing).

Heritable traits are those variable characteristics where parents and offspring resemble each other due to heredity. Those traits where parents and offspring show no correlation due to relatedness across the population are not heritable. Or, more precisely in the language of statistical genetics, their heritability is very low. In cases where there is a strong correlation between parents and offspring, the heritability is very high. Heritability is evaluated in the range of 0 to 1, with moderate heritability being ~0.5.

A heritability of 0.5 means that 50% of the variation in the trait in the population is due to variation in the genes.
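To make that definition concrete, here is a minimal Python sketch. It assumes the simplest possible model, in which a trait value is the sum of an independent genetic contribution and an environmental contribution. The function name and the simulated variances are illustrative assumptions, not estimates from any real dataset.

```python
import random
import statistics


def simulate_heritability(var_genetic, var_environment, n=20000, seed=1):
    """Estimate heritability as Var(G) / Var(P) in a toy additive model.

    Each simulated person's trait is P = G + E, with G and E drawn
    independently. All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, var_genetic ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, var_environment ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


if __name__ == "__main__":
    # Equal genetic and environmental variance: heritability close to 0.5.
    print(round(simulate_heritability(1.0, 1.0), 2))
```

Raising the environmental variance while holding the genetic variance fixed pushes the ratio down, which is one way to see that heritability describes a population in a particular setting rather than a fixed property of a trait.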

This understanding of heritability was decoupled from an analysis of Mendelian inheritance. Though theoretically fused in the early 20th century thanks to the work of R. A. Fisher, the manner in which heritable polygenic characteristics expressed themselves genetically meant that they were beyond the power of Mendelianism to examine. The genetic effects of any particular gene were very small.

A Mendelian trait, such as cystic fibrosis, is passed through a pedigree and expresses a particular genotype. That is, most of the variation in the expression of cystic fibrosis is due to mutations at a particular gene. Mendelian analysis develops around the insight of genes encoding characteristics, but in the early 20th century the methods had the power to only detect associations at traits where a single gene, or perhaps a few, influenced the variation. In contrast, very polygenic characteristics were only understood through statistical analysis.

One of the distinctions of most heritable traits is that their causes are complex, and they are often quantitative or continuous. Though some people are ‘tall’ or ‘short’, the reality is that we measure people to obtain a single number on a numerical range. Similarly, though we can divide people into ‘extroverts’ and ‘introverts’, we understand that this disposition is really on a spectrum.

This is in contrast to Mendelian traits, which are of the form where there are clear discrete differences between people with different genotypes. You have cystic fibrosis. Or you don’t.

DNA

Historically, for quantitative traits, the goal was to assess the total genome-wide heritability. In domains such as agricultural genetics, this was very important. The more heritable the trait, the more artificial selection could change a characteristic of a line of plants or stock of animals.

In humans, the understanding of heritability had different implications. In the middle decades of the 20th century, there were many theories of environmental triggers for illnesses such as schizophrenia, often focusing upon a child’s mother and her treatment of her offspring. The reality is schizophrenia is a highly heritable trait, so the likelihood of manifesting the illness is strongly dependent on one’s genes, rather than details of upbringing. For decades clinicians were looking at the wrong primary causes.

We may not have known the genes responsible, but we knew it was genes.
The pacifiers are not heritable

One of the most common methods used to understand heritability in humans utilized patterns in twins and their siblings. Scientists realized that identical twins share 100% of their genes, while siblings share about 50% of their genes. The heritability of a trait can then be expressed in terms of the difference between the correlation on the trait between identical twins and full siblings.

Identical twins tend to be rather close in height. Siblings are closer in height than you would expect from two random individuals selected from a population, with a correlation of ~0.50. But that is far less than the correlation between identical twins. That’s because siblings share only 50% of their genes, and it turns out that height is a very heritable trait (estimates are 0.8 to 0.9). If the correlation between full siblings and the correlation between identical twins on a trait are the same, it is quite likely genes have little to do with variation of the trait in the population (as opposed to the environment).

And yet these analyses are very sensitive to broader environmental conditions. Within a family in a developed society, it is unlikely one sibling would get more nutritional resources than another. But this is not true in a pre-modern society, or in the developing world. If an older sibling is born in the midst of a famine, it would not be surprising if there was some permanent stunting later in life. The shorter adult height of this sibling in comparison to their younger brothers and sisters would then be due to the environment.
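The logic of comparing the two correlations can be sketched in a few lines of Python. This uses the textbook Falconer-style approximation, which is an assumption on my part about the kind of calculation involved; the function name is made up for illustration, and the identical-twin correlation of 0.9 is a hypothetical value, not a figure from any particular study.

```python
def falconer_heritability(r_identical, r_siblings):
    """Falconer-style estimate: twice the gap between the two correlations.

    Identical twins share ~100% of their genes and full siblings ~50%,
    so the extra resemblance of twins reflects the extra 50% of sharing.
    """
    return 2.0 * (r_identical - r_siblings)


if __name__ == "__main__":
    # r_identical = 0.9 is an illustrative value; r_siblings = 0.5 is the
    # sibling correlation for height mentioned in the text.
    print(falconer_heritability(0.9, 0.5))
    # Equal correlations imply genes explain little of the variation.
    print(falconer_heritability(0.6, 0.6))
```

The approximation leans on the same assumption that makes twin studies work at all: that identical twins and ordinary siblings experience similarly shared environments.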

So another feature of complex heritable traits is that there is an environmental component to the variation of outcome across the population, at least for any trait where the heritability is less than 1. And, that environmental component is going to vary from society to society. Heritability is not a fixed statistic, but a dynamic one, dependent on conditions.

All of these complexities for heritable traits make it very clear why conventional Mendelian genetics did not attempt to tackle them. Pedigrees and experiments with linked physical characteristics were never going to get very far.

This landscape of indirect inference only began to change at the end of the 20th century, with the revolution in molecular biology which transformed genetics from an abstract field of pattern recognition to a concrete one where scientists began to hunt for specific genes in the physical genome.

A good understanding of genome-wide variation in human populations has only been available for the last ten years.

Until modern sequencing and genotyping technology emerged, we did not even know the number of genes in the human genome! Twenty years ago the number was estimated to be ~100,000. A ballpark guess based on intuition and hunches more than anything else. Fifteen years ago, after the first human genome had been published, the number was reduced to 40,000 genes. Today, the best estimate is 19,000 genes.

Within any given human genome there are about 3 billion DNA base pairs. Of these base pairs, only about 1% are functional in terms of coding proteins. In the vast majority of cases, differences between individuals in traits are due to differences in these regions of the functional genome. That’s about 30 million base pairs. So it is within these 30 million base pairs that the search for the biophysical basis of heritability will occur.

Genetic relatedness of full siblings

In the past, estimates of heritability rested upon “good enough” assumptions, such as relatedness between individuals. Today, genomic methods allow researchers to look at the truth beneath assumptions. Statistical methods assumed that the relatedness between full siblings is 50%. But this is the expected relatedness. Geneticists have always known that due to Mendelian segregation the real value is often different between any two siblings. But they had no direct way of assessing this.

That is, until genomic techniques became more advanced and cheaper. In 2006 researchers confirmed that twin studies were correct in their estimate of the heritability of height by looking at how differences between full siblings varied in relation to how truly genetically similar they were. Simply due to the rules of chance, some full siblings shared more than 60% of their genome, while others shared less than 40%.

Full siblings who were genetically more similar turn out to be more similar in height, to the extent that the inferred heritability to explain the pattern was exactly the same as twin studies.
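Why the realized sharing between siblings scatters around 50% can be illustrated with a toy simulation. The Python sketch below treats the genome as a set of independently inherited blocks, which is a deliberate simplification; the block count of 80 is a hypothetical stand-in chosen for illustration, not a measured quantity, and the function names are my own.

```python
import random
import statistics


def sibling_sharing(n_segments, rng):
    """Fraction of the genome shared by one simulated pair of full siblings.

    Toy model: the genome is n_segments independently inherited blocks.
    In each block siblings share 0, 1, or 2 parental copies with
    probability 1/4, 1/2, 1/4, i.e. 0%, 50%, or 100% of that block.
    """
    total = 0.0
    for _ in range(n_segments):
        total += rng.choice([0.0, 0.5, 0.5, 1.0])
    return total / n_segments


def simulate_pairs(n_pairs=5000, n_segments=80, seed=1):
    # n_segments=80 is a hypothetical number of effectively independent
    # blocks; real chromosomes do not divide up this neatly.
    rng = random.Random(seed)
    return [sibling_sharing(n_segments, rng) for _ in range(n_pairs)]


if __name__ == "__main__":
    shares = simulate_pairs()
    print("mean:", round(statistics.mean(shares), 3))
    print("min:", round(min(shares), 3), "max:", round(max(shares), 3))
```

The average comes out near one half, as the classical assumption says, but individual pairs land noticeably above or below it. That spread is the raw material the sibling-based heritability estimate relies on.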

Researchers were also able to do more than just refine their older methods of assessing heritability: they finally had the tools to begin discovering the specific genes which underlie the heritability of complex traits. For a century scientists understood and assumed that there were genes, physical entities responsible for the biology underlying their statistical results. But finally, they would be able to zero in on candidates!

The earliest attempts to understand heritability with genomic technologies and modern computational methods were somewhat disappointing.

For example, for height, work in the late 2000s discovered 40 genetic positions that correlated with height in humans, but these positions explained only 5% of the total heritability estimated from earlier studies using classical methods.

Why were these state-of-the-art methods only detecting a small proportion of the statistically inferred heritability? Some possibilities presented themselves:

  • Perhaps statistical geneticists used flawed methods, and the environmental component was greater than they understood
  • Perhaps single base changes, SNPs, were not responsible for much of the variation. Perhaps it was copy number variation, for example.
  • Perhaps the sample sizes were too small to detect the effects of single genes because most of the effects were too small
  • Perhaps the “SNP chips” did not have enough markers to detect the effects
  • Perhaps many of the variants are at very low frequency and were not typed on the SNP chips

For about a decade this issue of the “missing heritability” hung over the new synthesis between genomics and quantitative genetics. But recently a research group has presented results which suggest that they have solved the missing heritability problem for height. Using whole-genome analysis, which was prohibitively expensive ten years ago, but on the margin of the feasible today, the researchers captured almost all of the missing heritability genomically.

With 20,000 individuals and 50 million markers, the authors argue that rare variation accounts for most of what was missed in earlier studies.

Will these results hold up? Possibly. But the bigger take-home message is to reflect on how far we’ve come in our understanding of heritability in the past century. In the beginning, heritability was understood by looking at similarities across families. This sounds simple, but this straightforward design required a great deal of statistical ingenuity. And, the reality is that with the discovery of DNA and the molecular understanding of the gene, researchers could not satisfactorily answer the question until they connected the statistical causes to molecular processes.

Illumina Sequencing Machine

Today whole-genome sequences, which may have cost $100,000 ten years ago, can be had for less than $1,000. This is the total sequence information of human genetic variation. Whereas decades ago researchers didn’t even know the total number of human genes or the full genetic map of our species, today we can count and locate 19,000 genes.

It is no surprise that many of the genes associated with height in humans turn out to be related to bone development. In this way, the statistical wizardry has produced results in keeping with our expectations of standard biology.

In another ten years, it seems likely that the search for the missing heritability will be a footnote in the history of genetics. But, that footnote will have been fruitful in generating a great deal of science, and helping us solve the mysteries of complex traits.

In search of the missing heritability was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 4, 2019

The growth of human genomics

Filed under: Genomics — Razib Khan @ 11:40 pm
Citation: Aylwyn Scally

The above figure is from Aylwyn Scally, or as I like to think of him, the Irish Matt Hahn. I’m not going to add any comments as the chart speaks for itself, doesn’t it?

Also, looks like my son is about the 10,000th person in the history of the human race who was whole-genome sequenced. That’s not a shabby record. First prenatal whole-genome sequence of a healthy born individual, and in the first ~0.000125% of the human race alive today to be sequenced.

March 14, 2019

The Ubiquitous Sequencing Age

Filed under: Genomics — Razib Khan @ 10:46 pm

Several years ago Yaniv Erlich published A Vision for Ubiquitous Sequencing. We’re inching in that direction. In The Atlantic Sarah Zhang has a piece, An Abandoned Baby’s DNA Condemns His Mother, while The New York Times just came out with, Old Rape Kits Finally Got Tested. 64 Attackers Were Convicted:

Still, even with such successes, the problem of untested rape kits persists. Advocates for rape victims estimate that about 250,000 kits remain untested across the country.

Unfortunately, until recently, the ‘forensic genetics’ employed rather primitive 1990s technology. But that’s changing, though both money and expertise need to be brought to bear. Companies such as Gencove and Othram are bringing that expertise to a broader market, with the latter company focusing specifically on the forensic market.

So ubiquitous sequencing is happening. Soon. What does that mean? We need to think about privacy. We need to think about data. We need to reflect on the broader implications of this world beyond specific targeted tasks such as forensic identification.

February 6, 2019

Dreaming of billions of genomes

Filed under: Genetics,Genomics,Privacy — Razib Khan @ 8:43 pm

In the year 2000 scientists finished the draft of the complete human genome. The “reference” for what came after. Even ten years earlier some researchers were questioning the feasibility of any such project! In the early 1990s, many assumed it would be many decades before the first human genome was mapped. What changed?

Technology invaded science. The first human sequence cost three billion dollars. Today one can be had for $1,000. In other words, a genome was three million times more expensive just 20 years ago.
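The arithmetic behind that comparison is easy to check, and it is instructive to set it beside what a Moore's Law pace would have delivered over the same period. The Python sketch below uses the two cost figures from the paragraph above; the 18-month doubling time is the common popular statement of Moore's Law and is an assumption here, as are the function names.

```python
def fold_change(start_cost, end_cost):
    """How many times cheaper something became."""
    return start_cost / end_cost


def moores_law_fold_change(years, doubling_time_years=1.5):
    """Improvement expected if performance doubled every 18 months."""
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    genome = fold_change(3_000_000_000, 1_000)  # figures quoted in the text
    moore = moores_law_fold_change(20)
    print(f"Sequencing became {genome:,.0f}x cheaper")
    print(f"Moore's Law pace over 20 years: about {moore:,.0f}x")
```

On these assumptions the drop in sequencing cost is a few hundred times larger than a steady 18-month doubling would produce over two decades.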

An Illumina sequencer

Instead of the laborious process of tracing inheritance patterns through visible markers, modern genomics utilizes the molecular nature of DNA to enable automation and computation to “read” the full sequence. In less than 20 years we’ve gone from a single human genome sequence to hundreds of thousands of whole genome sequences, and tens of millions of samples which have undergone high-density genotyping using “SNP-array” technology.

Though the human genome is three billion bases, only a small proportion of it codes for genes, and an even smaller proportion holds any variation of interest in a population genetic sense. The millions of genotypes in the databases of private consumer genomic firms may only capture a small number of genetic positions, between 100,000 and one million, but this small number is enough to draw many important conclusions. In particular, what common diseases you are at risk for, and what part of the world your family is likely from, and who your relatives are.

In other words, probably 90% of the things you would want to know about your genetics can be inferred from 0.03% of your whole genome! Today private companies are sitting atop a pot of potential gold because the genome doesn’t change over your lifetime. It is only an appreciating asset as time progresses, as more research unveils details of mechanism and associations.
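For readers who want to see where a figure like 0.03% comes from, here is a short Python check. It simply divides the number of genotyped positions by the three billion bases mentioned above; the function name is illustrative.

```python
GENOME_BASES = 3_000_000_000  # "three billion bases", as stated in the text


def percent_of_genome(n_markers, genome_size=GENOME_BASES):
    """Percentage of genome positions covered by n_markers genotyped sites."""
    return 100.0 * n_markers / genome_size


if __name__ == "__main__":
    for markers in (100_000, 1_000_000):
        share = percent_of_genome(markers)
        print(f"{markers:>9,} markers = {share:.4f}% of the genome")
```

The 0.03% figure corresponds to the upper end of the range, about one million markers; a 100,000-marker panel covers roughly a tenth of that.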

You are being watched!

Within twenty to thirty years it is likely that a billion human genomes will be sequenced. The field will have fully transitioned from basic science to information technology. And as with any information technology, privacy and data sharing will be important things to consider. It is likely that some governments, like that of China, will have total access to their citizens’ data, while others, such as those of the European Union, will limit access.

But even without top-down invasion of privacy, the proliferation of databases and sequences will mean that one’s genetic information will be shared like credit scores across vendors. And just like with credit scores and histories, there will be data breaches. And while credit scores are ephemeral, your sequence is permanent.

Total strangers may have access to your disease risks, your relatives, and your heritage. Things which today are guarded privately may become totally transparent to anyone who wants to look unless precautions are taken.

The decisions we make today will have consequences for future generations. This applies to individuals, corporations, and the government.

Interested in learning where your ancestors came from? Check out Regional Ancestry by Insitome to discover various regional migration stories and more!


Dreaming of billions of genomes was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

January 16, 2019

To understand Neanderthals we need to understand ourselves

Filed under: Evolution,Genetics,Genomics,Neanderthals — Razib Khan @ 1:56 pm

In 2010 researchers sequenced the whole genome of a single Neanderthal. From comparing this genome to that of humans alive today they concluded, to their surprise, that many modern human populations had Neanderthal ancestry! More specifically, all populations outside of Africa seem to have some Neanderthal ancestry.

Over the last decade, researchers have come to agree that this finding is a true one. That is, modern humans do have Neanderthal ancestry. In fact, most scientists now believe that there is also ancestry from a third human population, Denisovans, across eastern Eurasia and Oceania.

But there is more to the story than we understood in 2010. Then, researchers argued there was a single admixture as humans left Africa. Today, some researchers contend there were multiple Neanderthal admixtures, with East Asians having more Neanderthal ancestry than Europeans. Others argue that European Neanderthal ancestry is diminished by later mixing with a human population without Neanderthal ancestry! Finally, some researchers have suggested these differences can be explained by natural selection, whereby Neanderthal genes are removed from the population due to their selective effects, which had different power across different populations (the larger the population, the stronger natural selection is).
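The parenthetical point about population size can be made concrete with a standard population-genetic rule of thumb: selection on a variant tends to overpower random drift only when twice the effective population size times the selection coefficient is larger than about one. The Python sketch below applies that rule; the selection coefficient and population sizes are hypothetical values chosen for illustration, not estimates for any real population, and the function names are my own.

```python
def scaled_selection(effective_size, selection_coefficient):
    """Return 2 * Ne * s, the usual yardstick for selection versus drift."""
    return 2 * effective_size * selection_coefficient


def selection_dominates(effective_size, selection_coefficient):
    """Rule of thumb: selection is effective when 2 * Ne * |s| exceeds 1."""
    return abs(scaled_selection(effective_size, selection_coefficient)) > 1


if __name__ == "__main__":
    s = -0.0001  # hypothetical small cost of carrying a Neanderthal variant
    for ne in (1_000, 10_000):  # hypothetical effective population sizes
        print(ne, scaled_selection(ne, s), selection_dominates(ne, s))
```

Under this rule the same slightly harmful variant drifts more or less freely in the smaller population but is steadily removed in the larger one, which is the mechanism the selection hypothesis relies on.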

Despite the widespread agreement about Neanderthal ancestry in modern humans, the discovery has triggered many more questions.

A new paper in PNAS, The Limits of long-term selection against Neandertal introgression, aims to resolve this muddle by two primary means:

  1. Use multiple Neanderthal samples of different relatedness to modern humans to obtain a better estimate of proportions.
  2. Add complexity to our understanding of the interactions between various ancestral human populations, and see how that affects estimates of Neanderthal ancestry.

The authors conclude that the decline in Neanderthal admixture discovered in Europeans may, in fact, be an artifact. First, it neglects gene-flow between African populations, as well as from Eurasian populations back into Africa (in particular, from European and Near Eastern groups). Second, using two Neanderthal samples, one much more closely related to the group which contributed to modern humans, allows for more precise direct estimates.

Though the authors found some evidence for natural selection, these results suggest that this force is not necessary to explain the differences between modern human populations.

This is unlikely to be the final word. The moral of the story is that moving beyond simple models and a few samples adds complexity and nuance to our understanding of how our Neanderthal ancestry fits into the broader narrative of our ancestors’ tales.

Discover your Neanderthal story today!


To understand Neanderthals we need to understand ourselves was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

January 2, 2019

Onward to 2019!

Filed under: Genomics,Podcast — Razib Khan @ 11:57 pm

This was a big year for Insitome. Our three flagship products, Regional Ancestry, Neanderthal, and Metabolism, have been present in the Helix store for over a year. Over the next few months, we plan on upgrading and rolling out changes. One of the advantages of running a science-based service on dynamic technology is that as the field evolves, we can update along with the science. Watch for new “Traits” on our “Insights” over the next few months!

The Insight podcast produced more than 40 episodes. With more than 150 ratings on Apple, it is now one of the premier science-themed podcasts on the platform.

With 100+ posts on the blog, we are pushing forward with the project of complementing personal genomic products with information resources. The experience of receiving genetic results should be more than a one-time event. Insitome aims to provide continuous updates and revelations based on the latest science.

We’ll also be posting more updates to our YouTube website.



Onward to 2019! was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

December 19, 2018

The Insight Show Notes — Season 2, Episode 10: 2018 in genomics

Filed under: Genomics,science — Razib Khan @ 10:34 pm


This week we reviewed the “big stories” in 2018 in genomics. There were a lot of possibilities, but we narrowed down the list.

First, we discussed Neanderthal art and its ramifications for the “Great Leap Forward” in behavioral modernity.

The second story involves the post-Roman barbarian migrations. With the relative cheapness of ancient DNA techniques, researchers are expanding the purview of their topics to more recent periods, in particular domains where written and archaeological evidence are not clear.

Next, we talked about the sequencing of an F1 (first-generation) offspring of a Denisovan male and a Neanderthal female. That is, this individual had a mother from one group of humans, Neanderthals, and a father from another, Denisovans. The probability of discovering such an individual seems low, and yet the researchers stumbled upon this! No one like this is present today. But was it different in the past?

The fourth story was the discovery of the Golden State Killer through DNA pedigree analysis. Serial killers beware!

Then we discussed the polygenic risk score of educational attainment. With more than 1 million samples, 2018 saw the next step in the intersection of genetics and traits.

Citation: Khan, Razib, and David Mittelman. “Consumer genomics will change your life, whether you get tested or not.” Genome biology 19.1 (2018): 120.

2018 was the year that consumer genomics went mainstream, with more than 20 million customers. The sector has finally broken out of nerd culture into pop culture.

It was also the year that DNA came to politics, as Elizabeth Warren released genetic results that indicated some amount of Native American ancestry.

Finally, the year in DNA ended with the “CRISPR babies” story.


The Insight Show Notes — Season 2, Episode 10: 2018 in genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

December 18, 2018

The year personal genomics got personal

Filed under: Genetics,Genomics,Technology — Razib Khan @ 1:12 am

The data for the above chart was assembled from press reports of various personal genomic companies with a public profile. So the values act as lower bounds. Additionally, the total numbers are from a comment in Genome Biology that I coauthored in the middle of 2018. Since then many more millions of people have been genotyped.

When Spencer Wells began The Genographic Project in 2005, genetic genealogy was an obscure and niche product. Today, consumer genomics has become a meme. Between January 1st of 2016 and January 1st of 2019, the total number of individuals who have purchased genotyping array kits increased 10-fold, from 2.5 million to closer to 25 million! Extrapolating that three-year pace would give a figure of 250 million individuals genotyped by January 1st of 2022.
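As a rough check on this kind of extrapolation, here is a minimal sketch that assumes the growth rate stays constant from year to year, which is a strong assumption for any consumer market. The function names are my own, and the only inputs are the 2.5 million and 25 million figures above. A tenfold rise over three years works out to a bit more than a doubling each year.

```python
def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years)


def project(value, factor, years):
    """Project `value` forward `years` years at a constant yearly multiplier."""
    return value * factor ** years


# Figures from the text: ~2.5 million (Jan 2016) to ~25 million (Jan 2019).
factor = annual_growth_factor(2.5e6, 25e6, 3)
print(f"implied yearly growth factor: {factor:.2f}x")

# Naive projection; real adoption curves usually flatten out.
for year in range(2019, 2023):
    millions = project(25e6, factor, year - 2019) / 1e6
    print(f"January {year}: ~{millions:.0f} million genotyped")
```

The point is less the specific numbers than how quickly a constant multiplier runs into the size of the actual population, which is why a projection like this is best read as an upper bound.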

This is at root a story of science and technology. In 2005 it was rare for researchers to have access to the data of individuals who had tens of thousands of markers typed. By 2015 millions of Americans had purchased genotyping arrays that typed them on hundreds of thousands of markers. From barely in the ivory tower, to the mass market. It has now become a business and cultural story. As 2018 ends the consumer genomics industry is maturing and expanding outward into the broader culture, as families talk about their ancestry testing results over the holidays.

What does 2019 have in store? It is not implausible to imagine that the 50 million mark will be surpassed. At that point what was once a tool of the hobbyists will have left its stamp on the mainstream, and the public will begin to dictate what the “killer apps” of the sector are going to be.


The year personal genomics got personal was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

October 27, 2018

Laws of engineering are meant to be broken

Filed under: Genomics — Razib Khan @ 8:54 pm

A reader pointed out a very interesting passage in Richard Dawkins’ The Greatest Show on Earth: The Evidence for Evolution on the future possibilities of genome sequencing. Since the book was published in the middle of 2009, it is quite possible the passage was written in 2008, or even earlier.

Unfortunately for Dawkins’ prognostication track-record, but fortunately for science, he was writing at the worst time to make a prediction:

…the doubling time [data produced for a given fixed input] is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For the genome the size of yeast’s, we need to wait only till about 2020. For a new mammal genome…the estimated date is just this side of 2040

Obsolete plot from The Greatest Show on Earth

The cost for a sequence here is somewhat fuzzy. The first assembly of a genome sequence of an organism is much more difficult than subsequent alignments of later organisms (though more in computation than in the sequencing). But, the upshot is that Dawkins was writing when “Hodgkin’s Law” was collapsing. From 2008 to 2011 Moore’s Law was destroyed by the sequencing revolution pushed forward by Illumina.

Though you can get a $1,000 consumer human sequence today, the reality is that this is for 30× coverage. For lower coverage, which means you aren’t as sure of the validity of any given variant, the price drops rapidly. And for the type of evolutionary questions Dawkins is interested in, the coverage needed is far lower than 30× (you probably want to get a larger number of samples than a single high-quality sample).

October 24, 2018

The crash of the cost of genome sequencing

Filed under: Genomics — Razib Khan @ 10:24 am

It’s been a wild 10 years. There’s a reason that data compression companies are a big thing in genomics now.

October 23, 2018

Reflections on ASHG Meeting 2018

Filed under: ASHG,Genetics,Genomics,Illumina — Razib Khan @ 10:17 pm

Another meeting of the American Society of Human Genetics has come and gone. I’ve been going since 2012, and so want to post some observations of how things have changed. This is a big conference. From fewer than 1,000 people in the late 1970s to nearly 10,000 today.

First, more genomics, less genetics.

The meeting dates to the late 1940s, and originally focused on the classical genetic analysis of human characteristics. Consider the pedigree one might find in a medical text.

Over the past generation more and more of the presentations and posters have focused on genomics, surveys of the totality of our DNA sequence. This is where medicine and human genetics more generally are moving in any case.

Vendors such as Illumina loom large, but the firehose of data is so powerful that compression companies also arrive at ASHG. In other words, ASHG is a combination science, medical, and tech conference.

Second, a major shift in focus outside of traditional European study populations.

ASHG foregrounded the focus on Africa and other non-European regions to highlight the importance of capturing global genetic variation. A fair number of presentations and posters were on this topic, as well as a series of plenary talks.

One thing I’ve noticed is that many talks and posters now present data and results which have been posted as preprints. In past years a lot of novel results were first presented at the conference, but now the meetings seem to be more like a halfway point between posting the preprint and the publication of the final paper. This means that networking and career development have become as important as the science itself.

Probably the most notable result that hasn’t been posted as a preprint was the first robust signals of association between genetic variations and homosexual orientation in men. Though there is a history of these reports, this one is clearly a case where the authors went through all the statistical checks to make sure these are true hits. Some in the audience reacted negatively, but the research group was really careful.

Exciting times in the world of genetics and genomics. Very excited for what 2019 brings.


Reflections on ASHG Meeting 2018 was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

September 18, 2018

On the whole genomics will not be individually transformative…for now

Filed under: Crispr,Genomics,Personal Genetics,Personal Genome,Personal genomics — Razib Khan @ 4:51 pm

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, falls into one of the two major genres in writing on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two optimum peaks in this field of journalism are the banal and the sinister.

In response to this I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. These account for a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals the outcomes will be quite modest. For a small minority of individuals there will be real and important medical consequences. In these cases the outcomes are a big deal. But for most people genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI, or body-mass-index. As an individual predictor and statistic it leaves a lot to be desired. But, for public health scientists and officials aggregate BMI distributions are critical to get a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

September 14, 2018

Sequence them all and let God sort it out!

Filed under: Genomics — Razib Khan @ 11:14 am

Researchers reboot ambitious effort to sequence all vertebrate genomes, but challenges loom:

In a bid to garner more visibility and support, researchers eager to sequence the genomes of all vertebrates today officially launched the Vertebrate Genomes Project (VGP), releasing 15 very high quality genomes of 14 species. But the group remains far short of raising the funds it will need to document the genomes of the estimated 66,000 vertebrates living on Earth.

The project, which has been underway for 3 years, is a revamp and renaming of an effort begun in 2009 called the Genome 10K Project (G10K), which aimed to decipher the genomes of 10,000 vertebrates. G10K produced about 100 genomes, but they were not very detailed, in part because of the cost of sequencing. Now, however, the cost of high-quality sequencing has dropped to less than $15,000 per billion DNA bases…

Funding remains an obstacle. To date, the VGP has raised $2.5 million of the $6 million needed to sequence a representative species from each of the 260 major branches of the vertebrate family tree. To reach the goal of all 66,000 vertebrates will require about $600 million, Jarvis says.

Though a lot of the details are different (sequencing vs. genotyping, vertebrates vs. humans), many of the general issues that David Mittelman and I brought up in our Genome Biology comment, Consumer genomics will change your life, whether you get tested or not, apply. That is, to some extent this is an area of science where technology and economics are just as important as science in driving progress.

I remember back in graduate school that people were talking about sequencing hundreds of vertebrates. But even in the few years since then, the landscape has shifted. I’m so little a biologist that I actually didn’t know there were only ~66,000 vertebrate species!

And yet this brings up a reasonable question from many scientists who came up in an era of more data scarcity: what are the questions we’re trying to answer here?

Science involves people. It’s not an abstraction. Throwing a whole lot of data out there does not mean that someone will be there to analyze it, or that we’ll get interesting insights. To be frank, the original Human Genome Project should probably tell us that, as its short-term benefits were clearly oversold.

In relation to how cheap data storage is and the declining price point of sequencing, I think my assertion that a genome, a sequence, is not a depreciating asset still holds. There is the initial cost of sequencing and assembling and the long term cost of storage, but these are small potatoes. The bigger considerations are the salaries of scientific labor and the opportunity costs. Sequencing tens of thousands of genomes may not get us anywhere, but really we’re not going to lose that much.
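To put the “small potatoes” claim in perspective, here is a back-of-the-envelope sketch using only the budget figures quoted from the news piece above. It simply divides each budget by the number of genomes it is meant to cover; the variable and function names are my own, and it ignores the labor and opportunity costs that, as noted, are the bigger considerations.

```python
# Figures quoted in the news piece above (US dollars).
FULL_BUDGET = 600e6        # ~$600 million to reach all vertebrates
N_SPECIES = 66_000         # estimated living vertebrate species
PHASE_ONE_BUDGET = 6e6     # $6 million for one species per major branch
N_MAJOR_BRANCHES = 260     # major branches of the vertebrate family tree


def cost_per_genome(budget, n_genomes):
    """Average budget per genome, ignoring labor and opportunity costs."""
    return budget / n_genomes


per_species = cost_per_genome(FULL_BUDGET, N_SPECIES)
per_branch = cost_per_genome(PHASE_ONE_BUDGET, N_MAJOR_BRANCHES)

print(f"all vertebrates: ~${per_species:,.0f} per species")
print(f"phase one:       ~${per_branch:,.0f} per representative species")
```

On these figures the average per-genome budget is on the order of ten to twenty thousand dollars, which is modest next to the salary of even one person hired to analyze the result.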

Ultimately I side with those who believe that the existence of the data itself will change the landscape of possible questions being asked, and therefore generate novel science. But it’s pretty incredible to even be debating this issue in 2018 of sequencing all vertebrates. That’s something to reflect on.

May 9, 2018

The “X” in the sex chromosome

Filed under: Genetics,Genomics,mothers-day,science — Razib Khan @ 3:48 pm

There are ~3 billion base pairs in the human genome. Of those, ~5% are in the X chromosome. The X is fully functional, unlike the famously hamstrung Y. It harbors one of the longest genes in the human genome, DMD, at 2,300,000 base pairs. In contrast, the human Y chromosome only has 72 protein-coding genes! (It’s perhaps no surprise that, aside from sex determination, many of these genes are involved in things such as spermatogenesis.)

And yet it is the Y chromosome which gets full treatment in popular science books. Like the C student who receives praise for a B-, the Y chromosome is given high marks simply for doing a few things here and there, most especially its role in driving the emergence of biological males. But the reality is that males would not be viable if it wasn’t for the X.

Can you see that it says 74?

Because the Y chromosome is so handicapped, filled with repetitive “junk DNA,” the heavy-lifting is shifted onto the single X that males carry. Though the Y is what makes males male, the X is what keeps males alive.

Anyone familiar with sex-linked characteristics knows this. Red-green color blindness is found in 8 percent of human males and 0.6 percent of human females. Many more women are carriers of color blindness than are color blind themselves.

The genes responsible for detection of some colors are found on the X chromosome, and are subject to high mutation rates. If a female has a broken copy she usually has a fallback in a functional second copy. She’s a carrier. In contrast, because males have only one X chromosome (inherited from their mother), they don’t have a backup. If a color-vision gene on the X chromosome is broken, then they’re out of luck when it comes to perceiving the full vibrancy of the world.

In other words, on the male X chromosome no trait behaves as recessive. Every trait is expressed according to the state of the single copy of the gene determining it. Every mutation on the X chromosome can potentially produce a mutant that will be exposed to natural selection.
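The 8 percent and 0.6 percent figures for color blindness fit together under a simple model. The sketch below assumes a single X-linked recessive allele at frequency q and random mating (Hardy-Weinberg proportions). That is a simplification, since red-green color blindness involves more than one gene, and the function name is my own. Under those assumptions males are affected at frequency q, females at q², and female carriers occur at 2q(1 − q).

```python
def x_linked_recessive(q):
    """Expected frequencies for an X-linked recessive allele at frequency q.

    Assumes one locus and random mating; real color blindness is more complex.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return {
        "affected_males": q,                 # one X, so the allele is always exposed
        "affected_females": q * q,           # need two broken copies
        "carrier_females": 2 * q * (1 - q),  # one broken copy, one working copy
    }


# Take the male rate from the text (8 percent) as the allele frequency.
freqs = x_linked_recessive(0.08)
for label, value in freqs.items():
    print(f"{label}: {value:.2%}")
```

With q taken from the male rate, the predicted female rate comes out close to the 0.6 percent quoted above, and carriers are far more common than affected women, which is the pattern described in the text.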

Neanderthal-modern human hybrid

This results in some interesting evolutionary quirks when it comes to how natural selection shapes the genome and drives adaptation within populations and speciation between them. Crosses between different species can leave hybrids infertile. In mammals this often happens in males because mutations on the X chromosome can interfere with proper reproductive development. Selection against the genes of other species then happens because males can’t produce offspring.

Studies of Neanderthal admixture confirm this — there is far less Neanderthal ancestry on the X chromosome than across the rest of the genome. There is strong selection against Neanderthal variants in males, because these genes work less well with the rest of the modern human genome.

A wife of Genghis Khan

But the X chromosome is not distinctive just in terms of natural selection. As two out of three X chromosomes in any population are found in females, its genetic history will be biased toward that sex. Differences between the X chromosome and the non-sex genome can tell us differences in the histories of men and women.

For example, historically many more of the female ancestors of admixed people of the New World tended to be non-European, whether indigenous or African. As such, the genetic profile of the X chromosome in terms of similarity to worldwide variation would be different from the non-sex chromosomes, because those come equally from the father and mother. This is exactly what we see. There is less European ancestry on the X chromosome.

More generally, mating systems such as polygyny (men having multiple female partners) result in far fewer males than females who contribute to future generations. Among Mongols during the era of Genghis Khan, a small number of males descended from Genghis and his Mongol horde had children with numerous women. Because X chromosomes tend to be found in women, more of whom are reproducing, they will be more diverse than non-sex chromosomes (where a few men contribute half the genes), while the Y chromosome will be the least diverse of all (where only a few men contribute genetic variation).
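The ordering described here (X most diverse, then non-sex chromosomes, then Y) can be illustrated with the standard textbook approximations for effective population size when the numbers of breeding males and females differ. This is a minimal sketch under those idealized formulas. The counts of 100 breeding males and 1,000 breeding females are hypothetical numbers chosen for illustration, not estimates for any real population, and the function name is my own.

```python
def effective_sizes(n_males, n_females):
    """Textbook effective population sizes for autosomes, X, and Y.

    n_males and n_females are the numbers of breeding males and females.
    """
    nm, nf = float(n_males), float(n_females)
    return {
        "autosomes": 4 * nm * nf / (nm + nf),
        "x": 9 * nm * nf / (4 * nm + 2 * nf),
        "y": nm / 2,
    }


# Hypothetical polygynous population: few breeding males, many breeding females.
polygynous = effective_sizes(100, 1000)
# Equal sex ratio for comparison: the X falls to 3/4 of the autosomal value.
balanced = effective_sizes(500, 500)

for name, sizes in (("polygynous", polygynous), ("balanced", balanced)):
    print(name, {k: round(v, 1) for k, v in sizes.items()})
```

In the balanced case the X sits below the autosomes, as expected from there being three X chromosomes for every four autosomal copies. It is only when breeding males are much scarcer than breeding females that the X overtakes the autosomes, while the Y stays far below both.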

Men have only one X chromosome, but the one they have is genetically essential to them. X chromosomes are not exclusive to women, but for all males they are the singular legacy of their mothers. Because of this bias the X can shed light on the history of the women of our species, while the uniqueness of the X chromosome’s inheritance may even extend to driving the emergence of our species.

Explore your Neanderthal story today.


The “X” in the sex chromosome was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 25, 2018

DNA, from genetics to genomics

Filed under: DNA,Genetics,Genomics,science — Razib Khan @ 11:59 am

In the early 1950s scientists established that the molecular structure of DNA was a double helix. They had discovered the physical substrate of heredity. With this discovery the field of molecular genetics was born (and eventually a Nobel Prize given!).

And yet we also know that Gregor Mendel discovered the laws of heredity, the “law of segregation” and the “law of independent assortment”, nearly a century before the discovery of DNA.

It was literally the product of a garden.

The mature field of genetics itself developed fifty years before the discovery of the structure of DNA, as a host of scientists stumbled upon Mendelian insights simultaneously. Most were biologists who worked with plants, flies, or even algebra — no need for a powerful microscope or structural models of molecules.

Though DNA has been the key to many of the discoveries of the past fifty years, it is important to remember that the field of genetics is predicated on an abstract understanding of how inheritance works across pedigrees, as opposed to the biophysical basis of that transmission. Before DNA, before chromosomes, what Mendel and his heirs understood is that inheritance occurs through a process where discrete units of heredity, “genes”, are passed down from generation to generation.

These genes usually come in two copies, ‘alleles,’ for many organisms.

Recessive expression patterns of a trait, where parents do not express a characteristic found in their offspring, become comprehensible when a Mendelian model is adopted. Prior to this many had an intuitive “blending” understanding of inheritance, where the characteristics of the parents mixed together to produce offspring. The ultimate problem with blending inheritance is that it had difficulty in explaining how variation persisted over time. A problem solved by the Mendelian insight that genetic variation never disappeared…it simply rearranged itself every generation!

Genetics was born on the backs of Drosophila

Between the reemergence of Mendelian thought around 1900 and the discovery of DNA in the 1950s much research occurred in the field of genetics. The Neo-Darwinian Synthesis built upon the mathematical foundations of population genetics, which took the Mendelian framework and formalized and extended it, to create a model of evolutionary biology for the 20th century. Medical geneticists began to understand the patterns of inheritance of rare diseases in humans with the aim of preventing illness. Those researchers working with fruit flies discovered many of the phenomena which define modern genetics, such as recombination. Finally, biochemists established that heredity and nucleic acids were intimately connected.

Just as an understanding of the discrete basis of inheritance in a Mendelian framework opened up the systematic scientific study of heredity, so the understanding of the double helical structure of DNA paved the way for the molecular revolution of the second half of the 20th century, and the genomic revolution of the 21st. An understanding of DNA as the mode of inheritance allowed for the development of techniques that traced transmission of variation at the level of genes themselves, as opposed to expressed traits.

Illumina sequencing machine

And while in the 20th century we spoke of genetics, and specific genes, today we speak of genomes and the whole set of genes organisms possess. That revolution can not be understood without the knowledge of DNA as the mode of inheritance. If classical Mendelian genetics is pattern recognition across pedigrees, 21st century genomics is a synthesis of classical genetics, post-DNA era biophysics, and cutting-edge computing. Genomics is as much engineering as it is science; and “big data” as much as information theory.

The understanding of DNA created the world where genetics transformed itself from an esoteric science of probabilities, to a mass market product of possibilities.

Classical genetics tells you that your relatedness to your brother or sister is expected to be 0.50. Modern genomics might tell you that you and your brother or sister share 46.24% of your genomes. A fuzzy probability becomes a crisp reality. As a science, genetics can be imagined without DNA. It was born and matured decades before we understood the importance of the double helix, but as a part of our lives, one can’t imagine genetics without DNA.

Learn more about where your traits for food tolerance fall on the spectrum and explore your Metabolism story today.


DNA, from genetics to genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

October 28, 2017

Apes just being apes

Filed under: Genomics — Razib Khan @ 12:10 am

A while back I made fun of bonobos and chimpanzees for being kind of losers for looking across at each other on either side of the Congo river for ~1.5 million years, the time elapsed since their divergence. I finally ended up reading the paper from last year, Chimpanzee genomic diversity reveals ancient admixture with bonobos, which reported complex population history between these two species. In other words, “they got it on”.

The key was a reasonable sample size of N=40 and high coverage genomes (>20x), to give them the amount of information necessary to have the power to detect admixture. If you aren’t human and have a reasonable size genome, and all mammals do, get to the back of the line. But Pan’s turn finally arrived.

The paper’s primary result is that over the past few hundred thousand years there have been reciprocal gene flow events of small, but detectable, magnitude between chimpanzees and bonobos. Naturally, there was some geographic specificity here, in that chimpanzees from far West Africa lack much evidence of this while those from Central Africa have a great deal. The admixture is directly proportional to proximity to the bonobo range.

To obtain the result they initially focused on high-frequency bonobo-derived alleles that were at low to moderate frequencies in chimpanzees. There was a notable excess of this class among Central African chimpanzees. And these alleles seem to have introgressed recently.

I suppose the major takeaway is that hominids do it like they do it on the Discovery Channel.

October 22, 2017

Selection swimming against the genomic tide

Filed under: Africa Genetics,Africa Genomics,Genetics,Genomics — Razib Khan @ 1:32 pm

One of the major issues that confuses people is that the distribution of a trait or gene is often only weakly correlated with overall phylogeny and the rest of the genome.

To give a strange but classic example, the MHC loci are subject to strong balancing selection. This means that novel alleles do not substitute and replace ancestral alleles. Substitution of this sort results in “lineage sorting,” so that when you look at chimpanzees and humans you can see many polymorphic loci where all humans carry one variant and all chimpanzees the other. In contrast at the MHC loci there is frequency-dependent selection for rare variants, so the normal cycling process does not occur. Humans and chimpanzees overlap quite a bit on MHC, and any given human may have a more similar profile to a given chimpanzee than another human.

There are 19,000 human genes. Of 3 billion base pairs, only ~100 million are polymorphic on a worldwide scale (using some liberal definitions). There are lots of unique stories to tell here.

A new preprint, Inferring adaptive gene-flow in recent African history, illustrates how certain genes with functional significance may differ from genome-wide background. The authors find that among the Fula (Fulani) people of West Africa there has been introgression of a Eurasian mutation that confers lactase persistence. The area of the genome around this gene is much more Eurasian than the rest of the genome. In contrast, the area around the Duffy allele is much less Eurasian. The variation in this locus is related to malaria resistance. Finally, in other African populations, they found gene flow of MHC variants.

None of this is entirely surprising, though the authors apply novel haplotype-based methods which should have wider utility.

September 10, 2017

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain utilizes ~20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allows for a signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods for detecting selection at the phenotypes of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table
