Razib Khan One-stop-shopping for all of my content

August 2, 2018

Ancient pigmentation pathways and modern genomics

Filed under: Forensics,Genetics,skin — Razib Khan @ 1:02 am
Piebald horses emerge out of common pigmentation pathways found in humans

Unlike most mammals, humans are highly dependent on our sense of sight. This is due to the diurnal nature of many primates. Our ancestors foraged for bright fruit, and so we developed stereoscopic color vision. But eventually the human lineage left the forests of our ancestors and ventured out to the savanna. We turned our eyes to uses other than detecting fruit, from hunting to developing a keen eye for art.

Humans are pre-adapted toward color vision

It is not surprising then that humans have had a fixation on the color of our skin and the pelage of our domesticates. Skin is our largest organ, and our complexion is one of the best indicators of ill health.

Additionally, humans have utilized the skin as a canvas upon which to apply tattoos and other coloration so as to indicate group membership. And, as humans from very different geographic regions began to meet each other, any differences in pallor were salient indicators of difference and distinction. Whole peoples were defined by their color!

In the ancient Near East the Egyptians termed themselves red, while their neighbors to the south were black, and West Asians from the Levant were yellow. Greeks and Arabs distinguished between the ruddy peoples of the north, and the black and brown peoples to the south, with their own ethnicity often defined as being at some sort of equipoise.

Nubians were depicted accurately by the ancient Egyptians

And yet for such an important trait, the genetic elucidation of skin color, and pigmentation more generally, has evaded us until very recently. To be fair, the genetic elucidation of most traits in humans evaded us until the last decade or so, because we did not have genomic tools to explore the whole range of possible genetic sites.

In 2003 the evolutionary biologist Armand Leroi wrote in the afterword of his book Mutants that it was surprising that geneticists were still unclear about what underlay normal variation on the trait of human skin color. This passage was written at an opportune moment. In 2006 a review paper was published, A golden age of pigmentation genetics, which reflected the fact that much had changed since Leroi had written that passage just three years before.

Through analysis of British mixed-race pedigrees geneticists in the 1950s concluded that skin color was controlled by many genes, but that much of the variation was localized to only a few loci. That is, variation on a few genes had a large impact. This means that genomic methods pioneered in the 2000s were well placed to discover the genetic basis of the variation of the trait. If the impact of the mutation was large, then you didn’t need a large sample size to detect it.

75% of the variation in eye color in Europeans is due to one gene

And so they have. Researchers now know that about half the variation in skin color across populations is due to variation on ten or so genes. The other half is mostly distributed across the genome. Additionally, they know that the gene that is correlated with blue eye color also affects skin color. Similarly, the gene that causes much of the blondness in Northern Europe is also correlated with skin color. Pigmentation characteristics are usually correlated with one another. Skin, hair and eyes are all often controlled by the same set of genes.

Though East Asians and Europeans achieve light skin through different mutations, it is also the case that those mutations are found on an overlapping set of genes. Pigmentation pathways are highly conserved in human populations. The wheel is always reinvented in the same way. In fact, the same genes show up over and over across vertebrates. The genetic mutation that results in blonde hair causes the piebald pelage in horses. The mutations associated with red hair in humans are found in the gene that is important in mouse coat color. The gene responsible for much of the difference in pigmentation between Europeans and Africans also has a lightening effect in zebrafish.

There is a great deal to be done to understand the genetic basis of many diseases and complex behavioral traits. But with pigmentation, genomics has yielded incredible results, producing forensic applications with utility in a wide range of contexts. This is because tens of thousands of years of evolution have produced humans who come in all colors, but only through simple fine-tuning of the pigmentation pathways which vertebrates have utilized for hundreds of millions of years.

Skin color is a complex topic with numerous historical and anthropological layers. But when it comes to genetics it’s actually surprisingly simple.

You can see your skin, but are you curious about what your genes say about your pigment? Check out Neanderthal by Insitome to learn more!

Ancient pigmentation pathways and modern genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

May 2, 2018

The genetics of forensic identification

Filed under: criminology,dna-sequencing,Forensics,Genetics — Razib Khan @ 4:03 pm

The arrest of a suspect in the infamous “Golden State Killer” case was notable for how DNA evidence was used to identify him. The media attention the case has garnered means that forensic genetics has come to public attention again in 2018. Not that the public has been unaware of the power of genetics in legal and criminal contexts: The Innocence Project famously leveraged DNA results to show that some individuals had been falsely convicted, eliminating them because their DNA did not match samples from the crime scene.

But the recent illustration of the power of 21st century genomics, with researchers digging through public databases to search for relatives of the potential suspect, was revelatory for much of the world, which had not kept up with the breakneck pace of change in genetics.

The first human genome cost $3,000,000,000 and took more than a decade to complete. Today a good quality human genome sequence can be had for less than $1,000, and generated in around a day in a pinch. The field has been subject to massive changes in the last 10 years, crashing through Moore’s Law and transforming what geneticists are capable of in the present.

The arrest of a suspect in killings that date back four decades has awakened the public to the reality that geneticists have already been living in during the 21st century. It’s like Clark Kent transforming into Superman.

Obviously using genetics to resolve legal disputes is not new at all. Blood group inheritance patterns were understood early in the 20th century, and brought to bear in cases such as paternity disputes. But blood groups are only a small number of traits, with a limited amount of variation. In a huge number of cases inheritance patterns wouldn’t resolve anything. If ~25% of the population had blood group A, then finding that a sample was group A wouldn’t narrow things down much across a broad cross-section of the population, even if it would be useful in specific cases.

But even techniques as primitive as blood group inheritance illustrate the power of genetic techniques in the 20th century: they could eliminate a large number of possibilities. ~75% of the population does not have blood group A, so if you are looking at a large number of suspects then removing three out of four possibilities might be worth it.
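The elimination arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, the 25% figure is the one used in the text, and the second marker frequency is invented purely to show how independent markers multiply.

```python
from math import prod


def expected_remaining(num_suspects: int, marker_frequency: float) -> float:
    """Expected number of suspects who still match after testing one marker."""
    return num_suspects * marker_frequency


def combined_frequency(marker_frequencies: list[float]) -> float:
    """Share of the population matching on every marker, assuming the
    markers are inherited independently of one another."""
    return prod(marker_frequencies)


# Blood group A at ~25% (figure from the text): three in four suspects drop out.
print(expected_remaining(100, 0.25))

# Hypothetical second marker carried by half the population (invented value).
print(combined_frequency([0.25, 0.5]))
```

The second function hints at why adding markers matters: each extra independent marker shrinks the pool of people who match on all of them.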

By the latter decades of the 20th century, forensic genetics took this to the next step. With the molecular revolution in biology, geneticists didn’t have to focus on blood groups — rather, they could look at variation at specific genes that they obtained from various types of biological samples. With the development of new techniques of amplifying DNA from infinitesimally small samples in the 1990s, the amount of genetic material needed declined greatly, making it feasible to revisit cases where DNA analysis was previously deemed impossible.

The combination of molecular biology and genetics in the late 1990s was a forensic “killer app”, but there was still the problem that geneticists needed to target loci that had enough variation that they could differentiate individuals. If, for example, scientists tackled a genetic position where 99% of the population has one variant, and 1% the other, in most cases there wouldn’t be much novel information that one could use.

Because forensic labs could only focus on a limited number of genes, they quickly realized that the biggest “bang-for-the-buck” was in highly variable regions. In particular they looked at “short tandem repeats” (STRs). These are regions of the genome subject to expansion or contraction in the number of repeat units during DNA replication, thus generating usable repetitive variation. Where “single nucleotide polymorphisms” (SNPs) are limited to four different bases (A, C, G, and T, and typically only two of the four at any given site), STR loci can differ over many different copy number variants. Because STR loci mutate rapidly, they are more polymorphic and vary a great deal even across families.
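To make the STR-versus-SNP contrast concrete, here is a minimal sketch that counts how many distinct genotypes a person could carry at one locus. It assumes a diploid genome and counts unordered pairs of alleles; the figure of 10 alleles for an STR locus is a made-up illustrative value, not a property of any particular marker.

```python
def genotypes_per_locus(num_alleles: int) -> int:
    """Number of distinct unordered allele pairs (diploid genotypes) at a locus."""
    return num_alleles * (num_alleles + 1) // 2


def profiles_across_loci(num_alleles: int, num_loci: int) -> int:
    """Distinct multi-locus profiles if every locus has the same allele count."""
    return genotypes_per_locus(num_alleles) ** num_loci


snp_genotypes = genotypes_per_locus(2)    # typical SNP: two alleles
str_genotypes = genotypes_per_locus(10)   # assumed STR locus with 10 repeat lengths

print(snp_genotypes, str_genotypes)
print(profiles_across_loci(10, 20))       # 20 loci, as in today's CODIS
```

This counts possible profiles only. How often two people actually match depends on how common each allele is, which this sketch ignores.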

All this is why they are at the heart of CODIS, the Combined DNA Index System, a governmental database used by law enforcement, and centralized at the federal level since 1998. Originally starting with 13 markers, today CODIS uses 20. Because of the high level of variation in these markers, random matches are rare. Though some geneticists dispute the statistics, the FBI estimates that a random CODIS profile should appear in about 1 in 10 million cases. That means that there should be more than 30 matches to a profile just based on chance in the United States. Obviously not all of these individuals would be suspects. All but one would be false positives.
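The chance-match arithmetic is simple enough to check directly. The sketch below takes the 1-in-10-million figure quoted above and a round population of 300 million (an assumption; a slightly larger population gives the "more than 30" in the text). The function names are invented for this example, and treating every person as an independent draw is a simplification.

```python
import math


def expected_chance_matches(population: int, one_in: int) -> float:
    """Expected number of people who match a profile purely by chance."""
    return population / one_in


def prob_any_chance_match(population: int, one_in: int) -> float:
    """Probability that at least one person matches by chance,
    treating each person as an independent draw."""
    # 1 - (1 - 1/one_in) ** population, computed in a numerically stable way.
    return -math.expm1(population * math.log1p(-1.0 / one_in))


print(expected_chance_matches(300_000_000, 10_000_000))
print(prob_any_chance_match(300_000_000, 10_000_000))
```

With about 30 expected coincidental matches, the probability that at least one exists somewhere in the country is very close to 1, which is why a match alone is a probability rather than a certainty.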

With limited markers, false positives — or more precisely the inability to distinguish between individuals — are always going to be an issue. Just by chance some people will match others within a subset of the genome, even at these highly variable positions. In contrast, the lack of the match eliminates someone from the pool of suspects.

This is why CODIS was useful for exonerating people: if one did not match the DNA sample, one knew that this was not a statistical fluke. A negative match gives a certain conclusion: the individuals are different.* A positive match gives a probability: the individuals are likely the same.

But CODIS is 1990s genetics. The apprehension of the suspect in the rapes and killings from the 1970s and 1980s in California was done with state-of-the-art genetics. While CODIS focuses on 20 markers at most, by 2010 tens of thousands, and today tens of millions, of people were getting large swaths of their genome genotyped, usually at 500,000 to 1,000,000 SNP positions. CODIS relied on STRs because of the expense of genotyping genetic positions in the 1990s.

But today “SNP-chips” cost less than $50 and return nearly a million markers. Data constraints are no longer an issue, and aligning patterns of SNPs across each chromosome allows for highly accurate assessment of relationships between people. Instead of returning the result that two individuals are probably siblings or parent-offspring, one can now conclude that two individuals are siblings who share 46.5% of their genome, and identify which segments of each chromosome they share!

With individual DNA data no longer being in short supply, what was needed was a database. CODIS may have about a million profiles, but those are not genotyped on modern DNA technology. Consumer genomics firms such as Ancestry, 23andMe, and Family Tree DNA do have SNP databases of more than a million (Ancestry has more than 10 million), but these are not accessible to law enforcement without a subpoena. However, there are public databases available with SNP genotype profiles. GEDMatch is one of those, with ~1 million entries.

The combination of hundreds of thousands of genetic markers across millions of individuals is powerful. Bringing these together unleashes the ability to look into the pedigrees of thousands of individuals who weren’t tested, using just a single sample. There are ~300 million Americans. If GEDMatch has ~1 million samples in its database it is likely that the vast majority of Americans will have matches. Obviously the vast majority of people will not have a perfect match, but because modern methods use hundreds of thousands of variable positions a perfect match is not just a probability anymore, but a surety (barring identical twins there will be only one perfect match in the database per person at most). Matches with 2nd cousins and closer are also ones that can be made with very high confidence. That means people who descend from common great-grandparents. Even without relatives that close in the database, many people can be matched with more distant relatives; the suspect in the case above shared common great-great-grandparents with people in the GEDMatch database.
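One rough way to see why a database covering well under 1% of the country can still reach most people is to ask how likely it is that at least one of a person's many cousins is in it. The sketch below assumes the database is a random sample of the population (it is not, in practice) and uses placeholder counts of relatives that are invented for illustration only.

```python
def prob_relative_in_database(db_size: int, population: int, num_relatives: int) -> float:
    """Probability that at least one of `num_relatives` people is in the
    database, assuming each is included independently with probability
    db_size / population."""
    coverage = db_size / population
    return 1.0 - (1.0 - coverage) ** num_relatives


# ~1 million profiles and ~300 million Americans (figures from the text).
# The relative counts are hypothetical placeholders, not measured values.
for label, count in [("2nd cousins", 30), ("3rd cousins", 200), ("4th cousins", 1000)]:
    p = prob_relative_in_database(1_000_000, 300_000_000, count)
    print(f"{label}: {p:.2f}")
```

The point of the sketch is the shape of the curve: the number of relatives grows quickly with each degree of cousinship, so the chance of at least one hit climbs fast even when coverage is low.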

Genetic genealogists have become adept at looking at patterns of probabilistic matches that are quite distant, and triangulating them with other pieces of data to establish high confidence genealogical connections. Once those connections are made, obtaining DNA from suspects would yield a result that law enforcement could have near-perfect confidence in.

Law enforcement, the media, and the public are living in the genetic 1990s. The future is actually happening in the present, led by consumer genomics databases and “citizen scientists.” The lesson we can distill from the headlines is that genetic privacy may, in many ways, now be a 20th century novelty in the eyes of the law.

Explore your Regional Ancestry story today.


* There are exceptions to this when it comes to genetic mosaicism.

The genetics of forensic identification was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 26, 2018

The genetic future is here when it comes to finding relatives of suspects

Filed under: Forensics,Personal Genetics — Razib Khan @ 11:05 pm

You may have heard that a suspect was arrested who is alleged to be the “Golden State Killer.” DNA played an important role: “Relative’s DNA from genealogy websites cracked East Area Rapist case, DA’s office says.”

I think Alexander Kim’s supposition is probably right. It wasn’t a direct-to-consumer company that you know of that uses a genome-wide analysis, but probably old-fashioned Y STR matching which allowed the researchers to converge on the suspect. The public databases for this are extensive enough now that they might yield something, and law enforcement is comfortable with STR tests. This is really a preview of what’s to come. If researchers routinely extract DNA from remains that are tens of thousands of years old, it seems clear that a lot more material will come out of old rape kits.

That’s one dimension. The other dimension is that we have many more markers to work with now. Even without whole-genome analysis, you can identify relatives with reasonable precision out to 2nd cousins (it gets a little dicier beyond that).
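A quick sketch of why things get dicier past 2nd cousins: the average share of the genome that full cousins have in common halves twice with each extra degree. The function name is made up for this example, and these are averages only; actual sharing varies randomly around them, and distant cousins can share no detectable segments at all.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of the autosomal genome shared by full k-th cousins,
    using the standard expectation (1/2) ** (2k + 1)."""
    return 0.5 ** (2 * cousin_degree + 1)


for k in (1, 2, 3, 4):
    print(f"cousin degree {k}: {expected_shared_fraction(k):.4%}")
```

First cousins average 12.5% and 2nd cousins about 3.1%, while 3rd cousins are already under 1%, which leaves much less signal to work with.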

But the most important variable happens to be with numbers. If you read Alon Keinan’s piece, Crowdsourcing big data research on human history and health: from genealogies to genomes and back again, you know that probably nearly 20 million people have taken advantage of genome-wide consumer testing. Assuming 10 million are in the United States, a substantial number of “cold cases” could probably be closed by just looking for matches within these databases and establishing the pedigrees which suspects come from.

Of course, the genomics companies are not just going to open their databases to law enforcement. But I’m not sure that that will be necessary. There are enough genealogy enthusiasts that public forums and services to facilitate matches will probably suffice. If only a few percent of the American population is in these forums, then that might get us 90% of the way there.

Addendum: There has been some work in forensic genetics “predicting” physical appearance. A lot of this is not ready for primetime, but there is one area where a lot could be done: fine-scale ancestral analysis. Using haplotype-based methods and looking for matches within public datasets, one could probably narrow down the ethnic background of a suspect pretty well from DNA. If the test tells you someone in Minnesota is Northern European that might not help, but if it tells you that they are around half Lithuanian, that might be very useful…
