Woo hoo! You can't see it--and you should be happy about that--but I'm jumping up and down about having Ann at the party! I would have sent her a gilded invitation had I known she was interested. And I'm in complete agreement about how we may see WGS data actually used in genealogy.
(To Gaile: I was being facetious with the earlier start-a-business comment; I have about three new ideas for a business every year, and only one was ever successful enough to go public in a sizable way; so the track record ain't, uh, exactly what you might loosely term "good"...)
I'd seen some general, very basic (and probably dated) info from Strand Life Sciences regarding WGS BAM sizes and the computational requirements for working with gzipped FASTQ files. At about 30X, the BAM should come in at around 80-90GB. They state, "...assuming whole genome samples are done at read lengths of 75 or above, the size of each whole genome sample [compressed FASTQ] can be rounded off to about 150 GB," that figure meant to accommodate up to 40X. For a dedicated machine running 16 cores at 2.7GHz with 32GB RAM, they estimate that generating aligned reads from a FASTQ file takes about 6.5 hours per sample.
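Purely as a sanity check on those numbers, here's some back-of-envelope Python; the bytes-per-base constants are simply back-derived from Strand's own figures, so treat the outputs as ballpark only:

    # Rough scaling of WGS file sizes with coverage (my arithmetic, not Strand's).
    GENOME_BP = 3.1e9                              # approx. haploid human genome length

    def sequenced_bases(coverage):
        # Total bases sequenced at a given average depth of coverage.
        return GENOME_BP * coverage

    # Strand's ~150GB compressed-FASTQ figure at up to 40X implies roughly:
    bytes_per_base_fastq_gz = 150e9 / sequenced_bases(40)    # ~1.2 bytes/base
    # and their ~85GB BAM at 30X implies roughly:
    bytes_per_base_bam = 85e9 / sequenced_bases(30)          # ~0.9 bytes/base

    print(f"~{sequenced_bases(30) * bytes_per_base_fastq_gz / 1e9:.0f} GB FASTQ.gz at 30X")
    print(f"~{sequenced_bases(40) * bytes_per_base_bam / 1e9:.0f} GB BAM at 40X")

In other words, the sizes scale pretty much linearly with coverage, which is why the jump from a 30X test to a 40X test matters for storage planning.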
Per the SAM format spec, every read from every pass gets its own record, along with a per-base quality string, read ID, flag, optional tags, and whatnot. On a bioinformatics board, one person wrote that at their institution, the SAM files output after aligning 30X tests to GRCh38 usually ran 250-350GB.
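For anyone who hasn't peeked inside one, each alignment in a SAM file is a single tab-delimited line with eleven mandatory columns followed by optional tags. A toy illustration (the read and all of its values are invented):

    # One made-up SAM alignment record, split into its mandatory columns and tags.
    FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

    seq = "CCCTAA" * 12 + "CCC"        # 75bp of telomeric repeat, purely illustrative
    qual = "F" * len(seq)              # per-base quality string, same length as SEQ
    sam_line = "\t".join(["read_001", "99", "chr1", "10468", "60", "75M",
                          "=", "10642", "249", seq, qual,
                          "NM:i:0", "AS:i:75"])          # last two are optional tags

    cols = sam_line.split("\t")
    record = dict(zip(FIELDS, cols[:11]))                # the 11 mandatory columns
    tags = cols[11:]                                     # optional TAG:TYPE:VALUE fields
    print(record["QNAME"], record["FLAG"], record["POS"], record["CIGAR"], tags)

Multiply a record like that by a billion-plus reads and the 250-350GB figure stops sounding crazy.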
I've seen some amazing advances in computing and communications tech in my lifetime, but I'll never see a GEDmatch-style operation for complete WGS data. I'm also dog-paddling way over my pay-grade here. I'm eager to get my hands on the data and start learning, but on this subject I know not whereof I speak.
If I had to guess, though, I'd say Ann is spot-on. 'Cause she does know what she's talking about. What will interest genealogists are essentially the same data that interest population geneticists. They (we) pretty much don't care about exomic data (well, mostly); that's for medical researchers. But there are at least 10 million SNPs identified and cataloged. Current genotyping microarrays (some of them customized), including Living DNA's new Thermo Fisher Scientific chip, cover less than 10% of those: about 900K SNPs.
I'll betcha some big-brained population geneticists have already prioritized many more of those 10 million SNPs. (In fact, I know one I think I'll ask about that.) In other words, Goldilocks theory: for genealogy, 10 million SNPs may be overkill, but 900K may actually be too few. With greater SNP density should come less imputation/inference about segment sizes: greater accuracy and less guessing. Maybe there's a sweet-spot in there of, say, 5 or 6 million SNPs that are both stable and ancestral/population indicative. Dunno; I'm clueless.
If we could compare 6x the SNP coverage we have access to today, we might not need to do much at all with the other 99.8% of the base pairs. Meaning real-time, GEDmatch-style compare-and-report. Extracted databases of that size would probably be manageable.
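To make concrete what I mean by compare-and-report, here's a minimal sketch of a half-identical segment scan over a shared SNP panel; all of the names and thresholds are mine and purely illustrative, and real matchers would also weigh segment length in cM, tolerate no-calls and genotyping errors, and so on:

    # Minimal sketch of a GEDmatch-style half-identical segment scan.
    def half_identical(g1, g2):
        # Unphased genotypes (e.g. 'AG' vs 'GG') are half-identical if they share an allele.
        return bool(set(g1) & set(g2))

    def matching_segments(panel, kit_a, kit_b, min_snps=500):
        # panel: list of (chrom, pos, snp_id) sorted by chromosome, then position.
        # kit_a, kit_b: dicts of snp_id -> genotype string, e.g. {'rs123': 'AG'}.
        segments, run, run_chrom = [], [], None

        def flush():
            if len(run) >= min_snps:
                segments.append((run_chrom, run[0], run[-1], len(run)))

        for chrom, pos, snp in panel:
            ok = snp in kit_a and snp in kit_b and half_identical(kit_a[snp], kit_b[snp])
            if ok and chrom == run_chrom:
                run.append(pos)
            else:
                flush()
                run, run_chrom = ([pos], chrom) if ok else ([], None)
        flush()
        return segments    # (chrom, start_pos, end_pos, snp_count) per candidate run

The core loop really is about that simple, and a 5-6 million SNP extract per kit is small enough that this kind of scan stays fast; it's the full 3-billion-base comparisons that would be the problem.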
Novel and unique autosomal SNPs may have less genealogical relevance than the ones we're seeing almost weekly on the Y-chromosome, but maybe they could be handled much the way Alex Williamson handles them for The Big Tree, only automated and batch-compared. And I imagine that yDNA and mtDNA WGS data would always be split off into their own databases. Oh, and speaking of Alex, I learned that the Y-DNA Data Warehouse can already accept VCFs from Dante Labs, though of course that endeavor only covers yDNA in Haplogroup R.
And as Ann noted, I'll bet we start to see patterns where we've been incorrectly inferring unbroken segments from matching SNPs, probably in SNP-poor chromosomal regions (some centromeres, telomeres, etc.). Not real-time matching stuff, but researchers with access to volumes of WGS data could start to tell us far more about how accurately we're actually working with SNPs (and drive big-time refinements to imputation and matching), as well as help us do much more to positively identify pile-up regions.
Regardless of what happens with the tech and our ability to use the data for genealogy, if a $200 WGS is becoming an actual thing, it's going to be time to start figuring out how to store and preserve--and grant research access to--those vast amounts of data. Just in the 15 years I've been messing with DNA for genealogy, I've seen test-takers pass away and their family members left with no access to--or interest in--the DNA tests or any communication about them. Blaine Bettinger has written about and discussed the situation, and maybe we need to make it a public-service planning priority for 2019.
When drafting a will, almost no one thinks to include password information or a directive about what to do with a DNA test. But I'd be hard-pressed to think of anything as unique and irreplaceable as my DNA information. We're each a one-off. My beneficiaries can use or sell property, and can scan and archive photos and documents. But once they lose access to my DNA data (or interest in managing it), it may be irretrievable. We might be able to sample artifacts like stamps or envelopes, but there will never be a way to get back entire WGS results.
Hm. Off-topic, but food for thought...