Imputation and the Discontinuation of Illumina's OmniExpress Chip

+17 votes
1.4k views

Roberta Estes' latest blog entry sheds a great deal of light on what the discontinuation of Illumina's OmniExpress chip in favor of their new Global Screening Array (GSA) chip may mean for autosomal DNA testing for genealogy.

https://dna-explained.com/2017/09/05/concepts-imputation/

There is only about 20% overlap between the SNP locations previously tested by genetic genealogy companies and the SNPs tested by the GSA chip. I think Roberta's post is an important read for everyone interested in autosomal testing.

asked in The Tree House by Edison Williams G2G6 Pilot (178k points)
retagged by Ellen Smith
How reliable do you think the use of imputation is?  It sounds a little scary--like a guessing game that could give false positives--which could really make a mess of things.

Like in MyHeritage's case.  They gave me a batch of DNA matches right out of the gate, but later took most of them back.  One of them was an adoptee who had her hopes dashed when our apparent close relationship turned out not to be one.
So pretty much we have discrete testing pools which do not reliably cross compare and, as all the companies eventually upgrade their chips, the millions of current test results become deprecated. You'll have to retest to compare against new testers and hopefully the database of old testers sticks around so you can still work against that.

I guess that's technology for you.
You can't retest if you're dead.  I think we're on notice that there isn't much point in being tested in the hope of the results being useful to future generations.
Grrr!  Technology should be smarter than that in this day and age!
I suppose this explains the advent of GEDmatch Genesis -- it's for people who tested with the new chip.
Thanks for the interesting thread.

3 Answers

+7 votes

This may be a WikiTree G2G faux pas (my apologies to our inestimable moderators) but I'm going to "answer" my own "question" just so that this post--because it contains a number of "for further reading" links--has its own little niche rather than being in the chain of comments.  :-)

"How reliable do you think the use of imputation is?"

Well, it's been used for many years in SNP array testing, but as with all things that depend on DNA genotyping and interpretive algorithms, it really all depends on the size, validity, and thoroughness of the sample, and on the accuracy and effectiveness of the algorithms used.

In this regard, in her blog Roberta makes a statement that may be a bit misleading: "Illumina has encouraged vendors to utilize the process called imputation to infer DNA results for their customers that are common in populations, but has not been directly tested in customer's DNA..." Some might read this as implying imputation is something brand new with autosomal DNA. We have, in fact, been using forms of it for quite a while. A well-known example is the BEAGLE routine that AncestryDNA uses. Too, if you look at the link Roberta provides to Illumina's explanation of imputation, you'll find that the datasheet is not new: it was written in 2013. There's even a post yesterday on Anthrogenica that states: "Until recently, the word imputation wasn't a part of the vocabulary of genetic genealogy..." So some incorrect assumptions are developing.

Debbie Kennett also blogged about the change to GSA on her site. She and Roberta both included links to this ISOGG chart that shows a comparison of the overlap of autosomal SNPs on the microarray chips in use by genealogy testing companies: https://isogg.org/wiki/Autosomal_SNP_comparison_chart. The new GSA chip is represented by 23andMe v5 and Living DNA.

I honestly have no basis to form an opinion on the reliability of the GSA chip. And despite the angle from which I view it, I have to recognize that genetic genealogy is not where Illumina makes its money, and is not near the top of its priority list. A quick look through Illumina's press releases tells us that. So it's little wonder that this news didn't make a bigger splash earlier, and that the genealogy testing companies likely carry little or no clout in the matter. Adapt or perish. And I'm certain they will all adapt.

What I do admit--in my uninformed opinion--is that the most troubling thing to me is the fractional overlap of SNP positions tested with OmniExpress compared to GSA. Comparing the GSA chip (23andMe v5) to the current AncestryDNA v2 product (OmniExpress chip), they overlap at only 149,394 SNPs, or 23.3%. With about 3 billion base pairs in a human genome, the 600K to 700K SNPs we've been testing represents only a minuscule fraction anyway. And if we reduce that fraction to an overlap of only 23%...well, I'm sure comparisons can be done; but my small brain can't fathom the statistical and mathematical hoops that will have to be jumped through to do it accurately.

And if we thought "matching" on very small segments was an iffy prospect before....

Well, fudgesickle. Looks like all the links made it run afoul of the 8,000 character limit. I'll try breaking this in two.

Edited 14 months later: Because this post came up on a search regarding Living DNA's new Axiom (Sirius) chip from Thermo Fisher Scientific, and I saw I'd written "12andMe" instead of "23andMe."  Arrgh; sorry Anne. Not the first time I've done that, but normally I catch-and-correct.

answered by Edison Williams G2G6 Pilot (178k points)
edited by Edison Williams

Here is a "for further reading" section for those who want to dig a bit deeper:

Illumina website: https://www.illumina.com/

Very brief overview of human genotyping, "High-throughput arrays for identifying nucleotide and structural changes in the human genome," from Illumina; webpage:
https://www.illumina.com/techniques/microarrays/human-genotyping.html

An equally brief overview of targeted genotyping, "Targeted arrays and sequencing solutions for focused genotyping studies," from Illumina; webpage:
https://www.illumina.com/techniques/popular-applications/genotyping/targeted-genotyping.html

Catalog of Illumina's Human Infinium DNA Microarrays; this was taken from their website 5 Sep 2017. Note that the previous/current chip used by the major genetic genealogy companies is the Infinium OmniExpress-24 v1.2; the new chip is the Infinium Global Screening Array-24 v1.0. PDF file.
https://casestone.com/threlkeld/assets/DNA/Illumina-catalog_human-commercial.pdf

Datasheet for the Illumina Global Screening Array-24 v1.0; PDF file:
https://casestone.com/threlkeld/assets/DNA/infinium-commercial-gsa-data-sheet-370-2016-016.pdf

Datasheet for the Illumina OmniExpress-24 v1.2; PDF file:
https://casestone.com/threlkeld/assets/DNA/datasheet_human_Illumina-omni-express.pdf

Datasheet: the Affymetrix UK Biobank Axiom genotyping arrays, by ThermoFisher Scientific. Affymetrix (website here) https://www.thermofisher.com/us/en/home/life-science/microarray-analysis.html is really the only other game in town besides Illumina, but offhand I don't know of any (significant) genetic genealogy labs using their chips. PDF file:
https://casestone.com/threlkeld/assets/DNA/uk_axiom_biobank_genotyping_arrays_datasheet.pdf

"Imputation-Based Genomic Coverage Assessments of Current Genotyping Arrays: Illumina HumanCore, OmniExpress, Multi-Ethnic global array and sub-arrays, Global Screening Array, Omni2.5M, Omni5M, and Affymetrix UK Biobank," a preprint article (not yet peer reviewed; May 2017) by Sarah C. Nelson and Cathy C. Laurie, University of Washington; Jane M. Romm, Kimberly F. Doheny, and Elizabeth W. Pugh, Johns Hopkins University School of Medicine. A worthwhile bits and bites look into the subject for the interested. PDF file:
https://casestone.com/threlkeld/assets/DNA/Imputation-Based-Genomic-Coverage-Assessments-of-Current-Genotyping-Arrays_Nelson-et-al_May-2017.pdf

"Imputation-Based Genomic Coverage Assessments of Current Human Genotyping Arrays." Looks like same title, doesn't it? By many of the same authors, this article is from G3: Genes, Genomes, Genetics (October 1, 2013 vol. 3 no. 10 1795-1807) and not only gives us a look at similar comparisons state-of-the-art circa the time the enormous autosomal DNA testing boom began, but goes into more detail about combining SNP genotyping with imputation of untyped variants. Webpage:
http://www.g3journal.org/content/3/10/1795

Also, there are over 3 billion base pairs, but about 99.5% is the same for all humans, so about 15 million may differ between two persons. Of those 15 million, about 4% is tested.

I agree that 23% overlap between the tests isn't much at all. That is surprisingly low, since you'd expect that both tests would aim at the most useful/interesting SNPs among those 15 million, whether the main purpose is medical or genealogical.
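
For anyone who wants to check the back-of-the-envelope figures in the comment above, here is a minimal sketch of the arithmetic. The inputs are the round numbers quoted in this thread (3 billion base pairs, 99.5% shared, roughly 600K SNPs on a chip), not measured values, and the variable names are only for illustration.

```python
# Round figures quoted in the thread; treat them as approximations.
GENOME_BASE_PAIRS = 3_000_000_000   # "about 3 billion base pairs"
SHARED_FRACTION = 0.995             # "about 99.5% is the same for all humans"
SNPS_ON_CHIP = 600_000              # low end of "600K to 700K SNPs"

# Positions that may differ between two people.
variable_positions = GENOME_BASE_PAIRS * (1 - SHARED_FRACTION)

# Share of those variable positions that a single chip actually tests.
tested_share = SNPS_ON_CHIP / variable_positions

print(f"Variable positions: {variable_positions:,.0f}")   # 15,000,000
print(f"Share tested by one chip: {tested_share:.1%}")    # 4.0%
```

Using 700K instead of 600K only moves the tested share to a bit under 5%, so the conclusion is the same either way.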
+4 votes

As usual, I see things differently. 

The primary objective of auDNA is medical.  Genetic genealogy has been a secondary concern.  For me, if the medical community believes that this change is beneficial to medical research, then I am all for it.  If the accuracy is good enough for medical research, it should be good enough for genealogy.

There has been resistance in the past to uploading all the DNA kits to GEDmatch, but I do it specifically because of the differences between the number of SNPs used for comparison.  I have never subscribed to the idea that uploading all kits is somehow a bad practice. This particular change only reinforces my belief that we all should be doing this.

There have always been differences in chips, and from what I have seen on Genesis, the differences in predictions are minor. I admit only looking at the closer matches.

It seems going forward, the differences between vendors will be minor, which is a positive. As I understand it, the cost of processing each kit should be about half of the current cost, which is also a benefit.

answered by Ken Sargent G2G6 Mach 5 (56.6k points)

Hiya, Ken. Actually, I can't tell if you see things differently or not.  :-)

So far, I haven't seen anyone argue that moving to the Illumina Global Screening Array chip is a bad thing. I also pointed out above that genetic genealogy is way down on the list of financial drivers for Illumina (and included links), and that we've been using imputation--in one form or another--for years; that, heck, AncestryDNA's BEAGLE is an imputation algorithm.

And in the field there really are no vendors, as in plural. I know of no commercial genetic genealogy company using anything but Illumina products. So I think the impact on genealogy will be what it will be, regardless.

You noted that there have always been differences in chips. Yep, there have. But never a difference this large, and I believe that's where some concern reasonably lies. I don't know if you checked the numbers. The overlap of SNPs tested between 23andMe v4 and v5, with the GSA chip, is only 18.7%; between AncestryDNA v2 and 23andMe v5, 23.3%; between FTDNA and the GSA chip, also 23.3%.
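
As a rough illustration of what an "overlap" percentage like the ones above means, here is a minimal sketch using Python sets. The SNP identifiers are made up, and the choice to divide by the size of the second chip is my assumption; I don't know which denominator the ISOGG chart uses, and the percentage shifts a little depending on that choice.

```python
def overlap_percent(chip_a, chip_b):
    """Percent of chip_b's SNPs that are also tested by chip_a."""
    shared = chip_a & chip_b          # SNPs present on both chips
    return 100 * len(shared) / len(chip_b)

# Hypothetical toy chips with 10 SNPs each; real chips have 600K or more.
old_chip = {f"rs{i}" for i in range(1, 11)}    # rs1 .. rs10
new_chip = {f"rs{i}" for i in range(8, 18)}    # rs8 .. rs17

print(sorted(old_chip & new_chip))             # ['rs10', 'rs8', 'rs9']
print(overlap_percent(old_chip, new_chip))     # 30.0
```

Only the SNPs in the shared set can be compared directly between two kits; everything outside it has to be either ignored or imputed.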

The GSA chip is testing 630,132 SNPs. Around the same number as always.

But about 77% of the SNPs the GSA chip is testing are different from those in any of our previous genetic genealogy tests.

Puts the whole concept of an imputed-by-genotype algorithm equals accuracy in a whole 'nuther light. As I said, I'm confident the folks with the really big brains can work it out. But if you were to draw me a Venn diagram with a 23% overlap and tell me that you could--with a very high degree of accuracy--tell me precisely the values of all the 650,000 points in Circle B by knowing only 23% of the shared points plus the 500,500 unique and different points in Circle A...I think it would be reasonable for me to be a tiny bit skeptical.

"There have always been differences in chips, and from what I have seen on Genesis, the differences in predictions are minor."

Just an aside, 23andMe is the only major player to have moved to the GSA chip as yet. And they only transitioned in August. They issued no press release that I could find, but I believe the change was made almost one month ago to the day. LivingDNA essentially launched with the GSA chip, but I imagine an infinitesimally small number of their rather new customer base had already tested with one of the Big 3 for comparison.

It would be interesting to see a statement from GEDmatch about this. I'm sure they're on top of it, but I can't imagine there was a significant database available with which to compare and contrast OmniExpress and GSA results for genealogical matching purposes much before a month or two ago. If one is even available now. And some serious number-crunching is going to have to be done if ever OmniExpress and GSA results can be accurately compared one-to-one.

+2 votes
Is there a known reason why GEDmatch could not merge kits together, especially ones from the old chips with ones from the new chips?  That seems ideal, as the merged kit would be able to match kits from either chip family, with very high numbers of common SNPs.  And there's less need for imputation.  And a few no-calls might be replaced with good calls from the other.  Seems like a big opportunity for GEDmatch, and easier for us with only one kit ID to use.  I suspect they have thought of this already.

I'd merge my Ancestry kit with my Living DNA kit, and be able to compare well with anyone else.
answered by Rob Jacobson G2G6 Pilot (103k points)

I have no insider knowledge about GEDmatch (even though some of the planners and programmers are close enough for me to take to lunch...hm, there's a thought...), but I have to think the announcement of the discontinuation of the OmniExpress chipset is one reason GEDmatch Genesis has been in beta for so long. Genesis debuted, if my memory isn't a sieve, at the beginning of June. Six months is a hefty amount of time for a technology product to be in public beta mode.

I gotta believe a lot of the gray-matter cycles at GEDmatch are being expended on this very issue.

> "even though some of the planners and programmers are close enough for me to take to lunch..."

I'm envious!
Rob

In terms of merging results together from multiple testing companies - absolutely no reason why it can't be done. I've done it myself for my tests from FTDNA and Ancestry. No idea why GEDmatch don't offer it as an extra service.

The issue with the new V5 chip remains the same though - just what are you going to compare the results to? If you want to compare two V5 datasets together, then that's no problem. But if you want to compare two datasets with only a 20% overlap, then you can only actually compare the 20% overlap, as Edison points out above with his Venn diagram visualisation.

The suggestion that it is reasonable to infer the point values for the other 80% is just silly.

I see no reason why you cannot determine the population (ethnic background) data from the new dataset and then use that to infer with reasonable accuracy the population data for the "missing" overlap. Unfortunately that goes nowhere to determining the last 5% of the data which is what the genetic genealogy community actually needs to work with.
The point that I was trying to make though was that if I merge my Living DNA kit with my Ancestry kit, then I can compare well with anyone.  I'll have 600,000-odd SNPs to compare with a v5-produced kit, and a mostly different 600,000-odd SNPs to compare with kits from older chips.  I no longer have to worry about the small overlap.
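
The kit-merging idea discussed above could be sketched roughly as follows. This is not how GEDmatch or any testing company does it; it is a toy version that assumes each kit has already been read into a dictionary mapping an SNP ID to a two-letter genotype, with "--" standing for a no-call. The `merge_kits` function, the sample IDs, and the rule of turning conflicting calls into no-calls are all assumptions for illustration. A real merge would also have to deal with strand and genome-build differences between chips, which this sketch ignores.

```python
NO_CALL = "--"   # assumed marker for a position the lab could not read

def normalize(genotype):
    """Sort the two alleles so that "GA" and "AG" compare as equal."""
    return "".join(sorted(genotype))

def merge_kits(kit_a, kit_b):
    """Combine two kits from the same person into one.

    Takes the union of tested SNPs, fills a no-call in one kit with a
    good call from the other, and marks disagreements as no-calls.
    """
    merged = {}
    for rsid in kit_a.keys() | kit_b.keys():
        a = normalize(kit_a.get(rsid, NO_CALL))
        b = normalize(kit_b.get(rsid, NO_CALL))
        if a == NO_CALL:
            merged[rsid] = b
        elif b == NO_CALL or a == b:
            merged[rsid] = a
        else:
            merged[rsid] = NO_CALL   # conflicting calls: trust neither
    return merged

# Hypothetical kits from an older chip and a newer chip.
old_kit = {"rs1": "AG", "rs2": "--", "rs3": "CC"}
new_kit = {"rs2": "TT", "rs3": "CT", "rs4": "GA"}

merged_kit = merge_kits(old_kit, new_kit)
for rsid in sorted(merged_kit):
    print(rsid, merged_kit[rsid])
```

In this toy run the merged kit covers four SNPs instead of three: rs2 picks up the good call from the newer kit, and rs3 is dropped to a no-call because the two kits disagree.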
