A 30X Full-Genome Sequence for $199 (€169 £150)?

Debbie Kennett brought this to my attention this morning (and Andreas West's email was exactly 30 minutes behind hers). Debbie explains it fully here: https://cruwys.blogspot.com/2018/11/a-30x-whole-genome-sequence-from-dante.html.

In short, it looks like the real deal...but that deal ends at midnight Pacific Time (9:00 p.m. Eastern) on Monday November 26. It can also be purchased via Amazon in the U.S. only.

More in a bit...

What you are describing is how DNA is sequenced, whether for genotyping or full-genome sequencing. While there is progress on reading larger and larger pieces of DNA (various university teams are competing over who currently holds the record for the longest read), the mass-market technology is currently restricted to this.

I’ve read a blog comment (not here) that described the share of SNPs that are read at least 4x (it was over 96%).

Quick update. If I'm correct that mine was the first kit sold by Amazon.com in the U.S., then they've sold 55% of the available, allocated tests.

And going on sale starting about 20 minutes ago, Veritas Genetics is jumping in and following Dante Labs' marketing lead. Their Twitter announcement: https://twitter.com/i/web/status/1064290631051284480, and the $199 sale price went live as of 9:00 a.m. Eastern this morning: https://www.veritasgenetics.com/.

More developments happening on the artifact testing front, as well. I did not see all of this coming for this holiday season. Fun times to be a DNA nerd...

Edited to add: Blaine Bettinger's new blog post today, "Testing Artifacts to Obtain DNA Evidence for Genealogical Research." It examines the current state-of-the-art, looks at what might be around the bend, and talks about the value of that ancestral DNA and what might be done with it for genealogy.

Just a quick update for those in the U.S. My Amazon Prime delivery from Dante Labs arrived about 10:00 a.m. Wednesday morning. The collection kit itself is an Oragene-Dx saliva sample tube. The return shipping--which I assume to be the case for all U.S. orders--is FedEx, not DHL (I find that personally more convenient; there's a nearby FedEx Office location I use regularly for shipping and printing) prepaid to a New York destination. Undoubtedly the samples are logged, anonymized, and batched from there to the lab. I'll have it on its way tomorrow or Monday.

Fun times for a DNA nerd.

I'm inviting myself to this party now that I know some of the people who will be attending. I ordered from Amazon, which currently doesn't have any "X kits left" mentioned. It has one negative review based on the long time to get results, so I'm prepared to wait even longer than advertised.

I have been thinking for a long time about how WGS might work in the genealogy domain. My proposal would involve extraction of a core set of SNPs currently used by the genetic genealogy companies, followed by closer examination of WGS data for just a matching segment. If the two parties match on novel or rare variants, then they would have a more recent MRCA than a match without those variants. I think this would be a more tractable approach than trying to compare whole genomes.
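A minimal sketch of that second step, assuming each person's WGS calls have already been reduced to a set of (chromosome, position, alternate allele) tuples and that some population-frequency lookup is available. The function name, the data layout, and the 1% rarity threshold are all hypothetical, chosen only to illustrate the idea:

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (chromosome, position, alternate allele)
Variant = Tuple[str, int, str]
# Hypothetical representation: (chromosome, start, end) of a matching segment
Segment = Tuple[str, int, int]


def shared_rare_variants(
    person_a: Set[Variant],
    person_b: Set[Variant],
    segment: Segment,
    pop_freq: Dict[Variant, float],
    max_freq: float = 0.01,  # assumed cutoff for "rare"
) -> Set[Variant]:
    """Return variants both people carry inside the segment that are rare or novel."""
    chrom, start, end = segment

    def in_segment(v: Variant) -> bool:
        return v[0] == chrom and start <= v[1] <= end

    def is_rare(v: Variant) -> bool:
        # A variant missing from the frequency table is treated as novel.
        return pop_freq.get(v, 0.0) <= max_freq

    return {v for v in person_a & person_b if in_segment(v) and is_rare(v)}


# Toy data, invented for illustration only
person_a = {("7", 1000, "T"), ("7", 2500, "G"), ("7", 9000, "A")}
person_b = {("7", 1000, "T"), ("7", 2500, "G"), ("8", 2500, "G")}
freq = {("7", 1000, "T"): 0.35, ("7", 2500, "G"): 0.001}

print(shared_rare_variants(person_a, person_b, ("7", 500, 5000), freq))
```

A match that returns one or more shared rare variants would, under this proposal, point to a more recent MRCA than a match on the same segment that returns none.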

I am sort of following this model for the novel mutation responsible for the hearing impairment that runs in my family. I've traced it back to John Riley, born about 1813 in Stafford County, VA.  It's an easily observable phenotype, so I send a query to anyone who matches me there. So far, none of my matches have the trait, so I know that the match must be earlier than 1813.




Woo hoo! You can't see it--and you should be happy about that--but I'm jumping up and down having Ann at the party! I would have sent her a gilded invitation had I known she was interested. And I'm in complete agreement about how we may see WGS data actually used in genealogy.

(To Gaile: I was being facetious with the earlier start-a-business comment; I have about three new ideas for a business every year, and only one was ever successful enough to go public in a sizable way; so the track record ain't, uh, exactly what you might loosely term "good"...)

I'd seen some general, very basic (and probably dated) info from Strand Life Sciences regarding WGS BAM size and computational requirements dealing with FASTQ GZIPped files. At about 30X, the BAM should come in at around 80-90GB. They state, "...assuming whole genome samples are done at read lengths of 75 or above, the size of each whole genome sample [compressed FASTQ] can be rounded off to about 150 GB," that to accommodate up to 40X. For a dedicated machine operating 16 cores at 2.7GHz with 32 GB RAM, they estimate generating aligned DNA reads from a FASTQ file takes about 6.5 hours each.

Per the SAM format specs, we should see each read per pass, plus a per-base-read quality string, read ID, flag, tags, and what not. On a bioinformatics board I read one person state that, at their institution, after alignment to GRCh38 the output SAM files from 30X tests were usually 250-350GB in size.
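As a rough sanity check on that SAM figure, here is a back-of-the-envelope estimate. The genome size (about 3.1 billion base pairs), the 150-base read length, and the 150 bytes of per-read overhead (read ID, flag, position, CIGAR, tags, and so on) are my own assumptions, not numbers from Strand or from the board post:

```python
GENOME_BP = 3_100_000_000   # assumption: approximate human genome size
COVERAGE = 30               # the 30X depth discussed in this thread
READ_LENGTH = 150           # assumption: bases per read
PER_READ_OVERHEAD = 150     # assumption: bytes for read ID, flag, position, CIGAR, tags


def estimate_sam_bytes(genome_bp: int, coverage: int, read_length: int, overhead: int) -> int:
    """Estimate uncompressed SAM size: one text line per read."""
    reads = genome_bp * coverage // read_length
    # One byte per base plus one byte per quality character, plus the other fields.
    bytes_per_read = 2 * read_length + overhead
    return reads * bytes_per_read


size = estimate_sam_bytes(GENOME_BP, COVERAGE, READ_LENGTH, PER_READ_OVERHEAD)
print(f"{size / 1e9:.0f} GB")  # 279 GB under these assumptions
```

Under these guesses the estimate lands inside the 250-350GB range quoted above; changing the read length or the overhead moves it around quite a bit, so treat it as a ballpark only.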

I've seen some amazing advances in computing and communications tech in my lifetime, but I'll never see a GEDmatch-style operation for complete WGS data. I'm also dog-paddling way over my pay-grade here. I'm eager to get my hands on the data and start learning, but on this subject I know not whereof I speak.

If I had to guess, though, I'd say Ann is spot-on. 'Cause she does know what she's talking about. What will interest genealogists are essentially the same data that interest population geneticists. They (we) pretty much don't care about exomic data (well, mostly); that's for medical researchers. But there are somewhere around 10+ million SNPs identified and cataloged. If we consider current (some customized) genotyping microarray testing, including Living DNA's new Thermo Fisher Scientific chip, we're looking at less than 10% of those, or about 900K SNPs.

I'll betcha some big-brained population geneticists have already prioritized many more of those 10 million SNPs. (In fact, I know one I think I'll ask about that.) In other words, Goldilocks theory: for genealogy, 10 million SNPs may be overkill, but 900K may actually be too few. With greater SNP density should come less imputation/inference about segment sizes: greater accuracy and less guessing. Maybe there's a sweet-spot in there of, say, 5 or 6 million SNPs that are both stable and ancestral/population indicative. Dunno; I'm clueless.

If we had 6x greater SNP coverage to compare than we have access to today, we may not need to do much more with the other 99.8% of the base pairs. Meaning real-time, GEDmatch-style compare-and-report. Extracted databases of those sizes would probably be manageable.
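The arithmetic behind those figures can be checked in a few lines. The 10 million and 900K counts come from the posts above; the genome size of roughly 3.1 billion base pairs is my assumption:

```python
CATALOGED_SNPS = 10_000_000   # "somewhere around 10+ million SNPs"
CHIP_SNPS = 900_000           # typical current microarray, per the post above
GENOME_BP = 3_100_000_000     # assumption: approximate human genome size

chip_share = CHIP_SNPS / CATALOGED_SNPS       # fraction of cataloged SNPs on a chip
denser_panel = 6 * CHIP_SNPS                  # "6x greater SNP coverage"
untouched = 1 - denser_panel / GENOME_BP      # base pairs the panel never looks at

print(f"chip covers {chip_share:.0%} of cataloged SNPs")   # 9%
print(f"6x panel = {denser_panel:,} SNPs")                 # 5,400,000
print(f"rest of genome = {untouched:.1%}")                 # 99.8%
```

So a 6x panel falls in the 5-to-6-million sweet spot guessed at earlier, and still leaves about 99.8% of the base pairs out of the comparison.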

Novel and unique autosomal SNPs may have less genealogical relevance than do the ones we're seeing almost weekly in the Y-chromosome. But maybe they could be handled similarly to the way Alex Williamson does for The Big Tree...but automated and batch-compared. And I imagine that yDNA and mtDNA WGS data would always be split off into their own databases. Oh, and speaking of Alex, I learned that the Y-DNA Data Warehouse can already accept VCFs from Dante Labs, but of course that endeavor only includes yDNA in Haplogroup R.

And as Ann noted, I'll bet we start to see some patterns where we've been incorrectly inferring unbroken segments based on matching SNPs, probably in SNP-poor chromosomal regions (some centromeres, telomeres, etc.). Not real-time matching stuff, but researchers having access to volumes of WGS data could start to tell us far more about our accuracy in working with SNPs (and lead to big-time refinements of imputation and matching accuracy), as well as helping us do much more in positively identifying pile-up regions.

Regardless of what happens with the tech and our ability to use the data for genealogy, if we're seeing a $200 WGS become an actual thing it's going to be time to start figuring out how to store and preserve--and grant permission for research purposes to--those vast amounts of data. Just in the 15 years that I've been messing with DNA for genealogy I've seen test-takers pass away and their family members then have no access to--or interest in--the DNA tests or any communication about them. Blaine Bettinger has written about and discussed the situation, and maybe we need to make it a public-service planning priority for 2019.

When drafting a will, password information and a directive of what to do with a DNA test may cross almost no one's mind. But I'd be hard-pressed to think of anything as unique and irreplaceable as my DNA information. We're all a one-off. My beneficiaries can use or sell property, can scan and archive photos and documents. But my DNA is unique and, once they lose access to (or interest in) it or the management of it, it may be irretrievable. We might be able to sample artifacts like stamps or envelopes, but there will never be a way to get back entire WGS results.

Hm. Off-topic, but food for thought...

My collection kit was handed to FedEx today. And I have a strange feeling that the companies offering these WGS tests for 200 bucks--and maybe the other DTC companies, as well--might just have underestimated the level of interest. Blaine reported that the Veritas $199 sale offer, which was valid for only the first 1,000 orders, sold out in about 6 hours.

The genetic genealogists I know of so far who have bought this Dante Labs 30X test include Ann and Andreas, of course, plus Blaine, Louis Kessler, Randy Whited, Stiofan Perkins, Greg Liverman, Alan McHughen, Gene Sweetser, and Elizabeth Overbeck Balkite. At the very least, we'll be able to stage a killer WGS testing party next year!

It's now 14 days since I ordered (and paid for) my kit. I still haven't even received the email that my spit kit is on the way, nor have I received the kit itself.

I know from Ed that his is back to them but what about others? Am I the only guy who's getting the worst service ever from this DNA testing company?

I mean how can you not send the spit kit? That's nothing complicated and I ordered way before the masses of people went for the Amazon offer.

I was about to say I'm in the same boat, but I just got a notice from Amazon an hour ago that my order has shipped. It's coming from Dante Labs, not Amazon.

Ann: I'll bet that was the reference I made earlier about Amazon displaying a "number available" count. Bet that Dante Labs gave them 20 to stock in-house for fulfillment, and I was just lucky enough to nab one of those. FedEx shows that mine was delivered in New York today. At least I'm glad you got notice that yours has shipped!

Andreas: Not so glad that you've heard nothing. If your kit is coming directly from Dante, I'd use their website's contact form and ask for a status. I had good response when I was over-eager and wrote them within minutes of receiving my kit...there was no shipping label; turns out that Amazon sent me a printable label the next morning anyway. But Blaine Bettinger had even better response: he asked them a question and they got back to him in only a couple of hours.

I fully expect this $200 deal was only dabbling a toe to test the water, to test the response. I believe they got their answer: at that price-point, there is demand. On the plus side, they know that this initial offering gleaned notable marketplace influencers, like you, Ann, Blaine, Louis Kessler, Randy Whited, and a number of others. I'm prepared to be patient and don't expect the 12-week turnaround shown on their website, but if they totally botch the handling of these initial few hundred, they could be excoriated on the interwebs and create an uphill climb for themselves. Potential market leader to battlefield triage.

Since it's 2:45 Saturday morning where you are, I'm hoping you'll wake up to find a note that your kit has shipped. I'm forever an optimist (I just sound like a curmudgeon most of the time).

It's our plan to eventually not only offer matching and comparison on the current raw DNA data size (650-750k SNPs) but also on NGS data.

So the URL to note is Your DNA family - please check out our feature list (though NGS isn't explicitly mentioned there).

To go ahead with exploring it I have ordered Dante's test myself. It has the benefit of giving me more detail on the mtDNA (where I'm currently stuck on H4a1a via 23andMe) and hopefully on Y-DNA as well (stuck on E-V13). With FTDNA I would have to pay a lot more for their specific mtDNA and Y-DNA tests and wouldn't even get a full sequenced genome of my autosome on top!

So IMO this is a great offer (Disclaimer: I don't have any financial interest in Dante Labs or any relation to them other than now being their customer) and they've got excellent reviews (albeit those reviewers got the test for free):

DNA Sequencing Reviews for Dante Labs

Dante Labs Full Genome Sequencing: A Medgadget Review

The future for DNA genealogy is bright. The massive amount of data will bring new challenges (in that I agree with you, Ed), but it will also solve the problem that we have now: low overlap between different versions of DNA tests or between different vendors (e.g. 44.3% for FTDNA vs Ancestry, or 14%/13.3% between 23andMe vs FTDNA or Ancestry)!
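For anyone who wants to check overlap on their own raw data files, a minimal sketch follows. It assumes overlap means shared SNPs divided by all SNPs seen on either chip, which may not be how the percentages above were calculated, and the rsIDs are invented:

```python
from typing import Set


def overlap_percent(chip_a: Set[str], chip_b: Set[str]) -> float:
    """Shared SNP IDs as a percentage of all SNP IDs on either chip (assumed definition)."""
    union = chip_a | chip_b
    if not union:
        return 0.0
    return 100.0 * len(chip_a & chip_b) / len(union)


# Toy rsID sets, invented for illustration only
vendor_a = {"rs1", "rs2", "rs3", "rs4"}
vendor_b = {"rs3", "rs4", "rs5"}

print(f"{overlap_percent(vendor_a, vendor_b):.1f}%")  # 40.0%
```

With WGS data every vendor's SNP list becomes a subset of what has been read, so this kind of mismatch between chips stops being a limiting factor.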

