Which Family Tree DNA test do I take for the y haplogroup?

+7 votes
WikiTree profile: Stephen Paine
in Genealogy Help by Anonymous Paine G2G Crew (370 points)

2 Answers

+8 votes
Best answer

Adding to the excellent comments from C.R., Randy, and Chase, a tangential answer that may or may not be of any help. If a male has taken an autosomal microarray test from AncestryDNA or MyHeritage (23andMe and Living DNA already provide the info, so they don't really apply), you can use a handy utility developed by Thomas Krahn (of YSEQ) and Hunter Provyn to determine a yDNA haplogroup to the degree possible based upon what Y-SNPs were originally tested: https://cladefinder.yseq.net/.

Some Y chromosome markers have been included in the default microarray tests since the start; the raw data is there in varying amounts, but some companies choose not to report on it. For example, the AncestryDNA v1.0 test looked at 885 Y-SNP markers; their version 2.0 tests (there was more than one iteration, all lumped together under that one version) looked at from 1,668 to 1,803 Y-SNPs. The MyHeritage v1.0 test included only 482 Y-SNPs, but their v2.0 test included a more robust 3,524 Y-SNPs. That needs to be tempered with the fact that the Big Y test examines over 23 million base pairs, and the current Y-haplotree has over 46,000 SNP-defined branches.

Note that the Clade Finder tool is free, but does require that you upload your raw data file. They don't keep your data on file, but the raw file does have to be uploaded. That might be a deal-breaker for some. For those who've had whole genome sequencing done, you can also upload a VCF file from those results (maximum 50MB in size) and the utility can use that to return a haplogroup. You'll get a deeper result from a WGS VCF than from a basic microarray test, of course.

A final note: utilities like Clade Finder--and there is more than one--do a very good job with the data they are given. However, autosomal microarray tests aren't a true substitute for purpose-built Y-SNP testing. A male's haplogroup or "terminal" SNP (we'll just say deepest tested...it's more correct) isn't based on only a single SNP. To be an accurate assessment, the complete hierarchy of SNPs in a particular haplogroup needs to be evaluated. For example, if you're R-P310 positive, you also need to know that you're positive for its parent SNPs, L151, L23, and M269.

Based on personal experience only, higher up in the Y-haplotree I'd say the single SNP test is, oh, 99.99% effective. It starts to get very slightly more iffy the deeper--or more recent--you go in the tree. For example, if you're R-M269, there are (currently) as many as 35 branches deeper than that on the haplotree, and M269 itself is already six branches below the seminal M207 of the R haplogroup. It's remotely possible there might be a mutation in a single SNP very deep in the hierarchy that is an outlier: if some or several of its parent SNPs test negative, then your haplogroup wouldn't actually be represented by that SNP; your deepest evaluated SNP would be a step or two higher up the tree.
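
To make the "step or two higher up the tree" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Clade Finder or FTDNA actually work: the function name, the "+"/"-" encoding, and the shortened four-step chain are all assumptions made for the example, and a real haplotree has many more intermediate branches.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily shortened root-to-tip chain used only for illustration.
PATH: List[str] = ["M269", "P312", "DF27", "FGC29721"]


def deepest_supported(path: List[str], calls: Dict[str, str]) -> Optional[str]:
    """Return the deepest SNP whose whole upstream chain tested positive.

    `calls` maps a SNP name to "+" (derived) or "-" (ancestral).
    A SNP missing from `calls` is treated as untested, and the walk
    stops there too, which is a deliberately conservative assumption.
    """
    deepest = None
    for snp in path:
        if calls.get(snp) != "+":
            break  # chain is broken here; nothing deeper is trusted
        deepest = snp
    return deepest


# An outlier: the deep SNP reads positive, but a parent reads negative.
outlier_calls = {"M269": "+", "P312": "+", "DF27": "-", "FGC29721": "+"}
print(deepest_supported(PATH, outlier_calls))  # P312
```

The lone positive for the deepest SNP is ignored because DF27 above it is negative, so the reported haplogroup falls back to P312.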

Last, and related to the deepest SNP bit, a common "trick" used by some FTDNA project administrators is to have project members selectively test for a single Y-SNP (or a small panel of them) based on the results of others. FTDNA offers some single-SNP testing, and YSEQ offers a somewhat broader range at a slightly lower cost.

An example scenario: a surname project has, say, a half dozen men all solidly categorized in a genetic, ancestral group, and they all have taken the Big Y test. Often, there will be other test takers grouped with them from their STR test results only. With a fairly large dataset like that for comparison, a new test taker who has done only a 67 STR test would readily be placed in the same group in the project results. And based on those correlations, it's very probable that the new member of the project will test out deep in the haplotree identical to the Big Y test takers. So if cost is a factor, sometimes the new member can be recommended to test a single SNP at a point at which the Big Y results start to branch; it may be one or several steps above the deepest Big Y results. But a single SNP tested in that context can return very good evidence about the new test taker's position on the haplotree, and verify a deep haplogroup subclade with a very inexpensive test.

by Edison Williams G2G6 Pilot (316k points)
selected by H. Williams

Edison; I thought I was getting the hang of this but suddenly you say "For example, if you're R-P310 positive, you also need to know that you're positive for its parent SNPs, L151, L23, and M269". My BIG TREE surname group is momentarily FT32960 (6 people) then upstream to FT33736 <FGC29721 <DF27 <P312 <M269 <R1b. My maternal male 1st cousin is L21<P312<M269 maybe MRCA at 2600 BCE. Our small FT33736 group developed from 15 Y-35 matches all upgrading to BigY (the 16th did not upgrade).

The cat among the pigeons is that you imply these sequences are not cast in bronze, or even wax? If the P312 mutation occurs under a different haplogroup, say H not R, isn't it given a different number - and so on, right down to "terminal"? Are we all R1b?

If you can explain how it works in layman's terms and within 2000 words, it would be most appreciated. Coffee ground and ready to go.

Alan! Fewer than 2,000 words? How am I supposed to operate under that kind of constraint?  

Seriously, though, trying to figure out where to start. The easiest place is probably up high on the haplotree. My P310 and L151 example wasn't a great one; but I didn't want to dive down to levels as deep as your FGC29721.

With the rapid changes in just the last few years to the haplotree (in March 2020 there were 26,862 branches, or subclades, in the FTDNA yDNA haplotree; this morning there are 46,165), we're getting a much more refined picture, but things aren't as simple as they were back in February 2002 when the Y Chromosome Consortium proposed its nomenclature system (R1b1, I1a1, E2b1, etc.) for yDNA haplogroups. At the time, they had identified exactly 153 haplogroups.

The term "basal haplogroup" used to mean founder haplogroups designated by a single letter only. Cast in bronze. Then we got more and more data and the term "basal" shifted deeper to clades like R1b, or E1b, or J2a. Still cast in bronze. And all that metal casting proved to run even deeper than that. R-M269 is an excellent example, which in the long-form notation is R1b1a1b.

There's a whole lotta bronze casting going on. But...a SNP is still a SNP. It's a single allele, a single nucleotide polymorphism, and is subject to possible mutation at any time. However, all those expanded-connotation "basal" haplogroups have continued to show that their defining SNPs are very, very stable, not mutating for well over a millennium or more.

The only thing, really, that differentiates a SNP from a SNV (single nucleotide variant; with the unfortunate pronunciation "sniv") is that a SNP has been determined to appear in at least 1% of the global population (yeah; not an auditable rule, really) and that, typically, it has been submitted to the NCBI's dbSNP database (https://www.ncbi.nlm.nih.gov/snp/) for cataloging. A SNP, like a SNV, is defined by its precise locus on the chromosome, and the allele value at that locus.

For example, your FGC29721 is (under the GRCh38 assembly, or genome map) located at locus 22,286,478. It tests as positive (or "derived") if the nucleotide is guanine rather than the reference adenine. It is cataloged as the rsID ("Reference SNP cluster ID") rs770607610. And it is, technically, still classified as a SNV and not a SNP.
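
If it helps to see "a locus plus an allele value" as a data structure, here is a small Python sketch. The class and field names are made up for illustration; the values are the ones quoted above, and treating the reference allele as the ancestral one is an assumption of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YVariant:
    """A named Y-chromosome variant: a position and the alleles seen there."""
    name: str             # e.g. the lab-assigned name
    rsid: str             # dbSNP reference cluster ID
    position_grch38: int  # locus under the GRCh38 assembly
    ancestral: str        # assumed here to equal the reference allele
    derived: str          # the allele that counts as "positive"

    def call(self, observed: str) -> str:
        """Return "+" for derived, "-" for ancestral, "?" for anything else."""
        if observed == self.derived:
            return "+"
        if observed == self.ancestral:
            return "-"
        return "?"


FGC29721 = YVariant("FGC29721", "rs770607610", 22_286_478, "A", "G")

print(FGC29721.call("G"))  # +
print(FGC29721.call("A"))  # -
```

Nothing in the record says which haplogroup the variant belongs to; that only comes from placing it on a tree.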

I know; nitpicking terminology. But even though it's a lot easier for us to call everything a SNP, that's not always the case, even if the variant has been cataloged in dbSNP and has been given a name based on the person or institution submitting it for naming ("FGC" stands for Full Genomes Corporation; you'll see a boatload of Y-SNPs with names starting with "BY"...you guessed it: "Big Y"). Bottom line, though, that's what "FGC29721" refers to: a locus on the chromosome and a specific nucleotide value at that locus. It isn't associated with any haplogroup at all...until data collection and analysis positions it on the haplotree.

I suppose the net message is that a single allele result deep in the haplotree, in vacuo, isn't bankable. That's the inherent problem with some--not all, but some--results obtained from autosomal microarray testing. One variant in isolation doesn't necessarily confirm a subclade or branch on the haplotree. For solid accuracy, all the SNPs in the hierarchy need to be known. That said, in my example of spot-testing in yDNA projects (and in your own of having 15 Big Y test-takers) accumulated testing data plus a genealogical tree can go a long, long way as verifiable evidence of haplogroup membership.

And even with the Big Y test, not every allele can be positively identified. In microarray testing, a value is either there or it isn't. With full sequencing (NGS) a bunch of reads are made and stacked on top of each other (virtually speaking) and an accurate determination is attempted. Works almost all the time. But as an example, I'm R-BY35083 < BY35076 < BY22194 < BY22166 < BY33322 < ZZ12_1 < DF27 < ZZ11, etc. In my Big Y results, I verifiably test positive for all those SNPs...except one, ZZ12_1. In that instance, I'm a no-call because the test didn't provide enough evidence to accurately name the allele. But FTDNA considers ZZ12_1 as presumed positive in that instance since all the other hierarchical SNPs for R-BY35083 are consistent.
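
The "presumed positive" logic can be sketched in a few lines of Python. To be clear, this is a guess at the general idea and not FTDNA's actual rule: the function name and the "?" marker for a no-call are assumptions, and the chain is simply the one quoted above written root-to-tip.

```python
from typing import Dict, List

# The chain from the post, ordered from the oldest SNP down to the tip.
CHAIN: List[str] = [
    "ZZ11", "DF27", "ZZ12_1", "BY33322",
    "BY22166", "BY22194", "BY35076", "BY35083",
]


def presume_no_calls(path: List[str], calls: Dict[str, str]) -> Dict[str, str]:
    """Mark a no-call ("?") as "presumed +" when a SNP below it is positive."""
    resolved: Dict[str, str] = {}
    positive_below = False
    # Walk from the tip upward so we know what has been seen downstream.
    for snp in reversed(path):
        call = calls.get(snp, "?")
        if call == "?" and positive_below:
            resolved[snp] = "presumed +"
        else:
            resolved[snp] = call
        if call == "+":
            positive_below = True
    return resolved


my_calls = {snp: "+" for snp in CHAIN}
my_calls["ZZ12_1"] = "?"  # the single no-call

print(presume_no_calls(CHAIN, my_calls)["ZZ12_1"])  # presumed +
```

A no-call at the very tip would stay "?" under this rule, since nothing downstream vouches for it.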

I have a trademark on the term World's Worst Analogies™, so I'll close with a very bad but simple one. An individual SNP is like a street address. Let's say it's 13822 Purloined Pigeon Court. Pretty specific. But it isn't impossible for some other place to also have a 13822 Purloined Pigeon Court. If we add a town or city name to that, we're probably 98% sure we've got the unique location. Further add a zip code or postal code and, voila!, almost a certainty that we've put a pin in a particular place with extreme precision.

Haplogroups are kinda like that. A given SNP (or SNV, or grouping of synonymous SNPs) is like a street address: it defines that haplogroup branch but, by itself, isn't enough to be completely certain we're at the right place. The collection of SNPs that define the hierarchy above and sometimes below (like my no-call) helps do that.

Some random male somewhere in the world could have a G instead of an A at locus 22,286,478 on his Y chromosome and not be related to your patrilineal family group. But if he's also positive for BY3190, CTS10029, and FGC15710, that you can take to the bank.

Edison and all other contributors,
This thread has really helped my understanding. Thank you.

A special prize, Edison, for 'laymanship', a rare skill in these jargon-infested waters. An NZ Pinot Noir or Savvy suggested.

A supplementary question related to your last paragraph: I get FGC29721 because I have a G at 22,286,478 and I'm positive for the R1b1 sequence. My neighbour also has G at 22,286,478 but he is positive for Haplogroup H or M69. Does he also get the FGC29721 notation, or does it change because he is positive for the H sequence?

Why, thank you, Alan! Ya know, I think I'll take a Savvy; normally a red guy and have seen some snobbery over NZ Sauvignon Blanc, but I have yet to try one that I didn't like very much. But I haven't tried enough of them, so I should keep testing...

FGC29721 would stay FGC29721 because that naming assignment is associated with the polymorphism itself, not a haplogroup. FGC29721 makes for an interesting example because seldom do FTDNA and YFull completely agree on a SNP/SNV; they disagree in a lot of places and, if I had to guess why, I think it would be because YFull prefers to use names that don't start with "BY," and FTDNA prefers to use names that do start with "BY" when it's practical. In this case they both use the Full Genomes Corp. FGC29721 as the haplotree branch name.

Two other quick items that I didn't think to mention earlier. First is about those SNP/SNV names. The names themselves, and the same goes for an rsID number, will always be unique: FGC29721 will never be reused as a name and will always refer to the same polymorphism, and rs770607610 will always refer to the same polymorphism. The actual locus on the chromosome, though, can and does change with modifications to the genome reference assembly. I didn't check, but its singular, positional address may have changed from Build 37 to Build 38, and with so much advancement in recent years with the Y chromosome, I can almost guarantee some loci numbers will be shifting around come the next major assembly publication.

However, there's no administrative constraint anywhere that I'm aware of that controls the initial naming. NCBI and dbSNP control the rsID designation, but if a handful of researchers submit a previously uncataloged SNP, they may each be given a unique rsID. After the fact the dbSNP database will be reconciled so that, eventually, duplicate rsIDs will be deprecated and we'll end up with only one per polymorphism. As an aside, this can cause issues also when trying to compare autosomal SNPs in our raw microarray data. Results from, say, 2011 may use a different rsID for the same locus as a test taken in 2021.

The Y Chromosome Consortium used to have a hand in the naming of Y-SNPs but, while I don't know the exact dates, the YCC was founded circa 1993 and suspended activity circa 2009. There's no single administrative authority for these names like there is with dbSNP management. This is the primary reason we see synonymous names for exactly the same SNP/SNV, why those names are never really deprecated (at least not formally), and why different haplotrees may use different branch naming. Not ideal. It's hard enough to follow along with this stuff without that complication. But moving a couple of steps higher up the tree from FGC29721, the FTDNA tree doesn't show it but FGC15737 is synonymous with Y10800 (the "Y" prefix designating Adamov and team at YFull). Another couple of steps higher and FGC15710 is synonymous with Y8717.
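
When comparing branch names across trees, a simple alias table is one way to cope with synonyms. This Python sketch uses only the two pairs mentioned above; which name counts as "canonical" is an arbitrary choice made for the example, and the function names are hypothetical.

```python
from typing import Dict

# Synonym -> name chosen as canonical for this example (arbitrary choice).
SYNONYMS: Dict[str, str] = {
    "Y10800": "FGC15737",
    "Y8717": "FGC15710",
}


def canonical(name: str) -> str:
    """Map a known synonym to its canonical name; leave other names alone."""
    return SYNONYMS.get(name, name)


def same_variant(a: str, b: str) -> bool:
    """True when two names refer to the same SNP/SNV per the alias table."""
    return canonical(a) == canonical(b)


print(same_variant("Y10800", "FGC15737"))  # True
print(same_variant("Y8717", "FGC15737"))   # False
```

A real table would be far larger and would need updating as trees change, since no single authority maintains these names.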

The last item is about how likely it is to find an outlier polymorphism, meaning something like FGC29721+ (A>G) in, say, someone in the E haplogroup. It can happen because SNPs/SNVs aren't tied down; they're free to roam! Well, free to mutate to one of four nucleotides like any polymorphism. But...

We had some discussion here on G2G recently about FTDNA (hopefully soon) coming out with their version of branch dating on the haplotree. Initially, in 2019, they'd said that the new date feature would be something that was added to customers' Big Y results, that a localized TMRCA would be added. I don't think the latter will happen, at least not until after the branch dating, a la YFull, is complete. We'll see.

A lot of people had been guessing that, using 32 or 33 years as the average generational interval along the patrilineal line, we'd see somewhere around 85 years per SNP. And Iain McDonald's excellent new paper (in a special Statistical Genetics issue of Genes, June 2021) pretty much sealed that deal. I personally believe that FTDNA postponed its branch dating until Iain published, and his generalized number came in at 83 years. However...

That's not about any one, single polymorphism. That's taking into account the whole shootin' match. The Big Y-700 seeks to examine a little over 23 million base pairs, and Iain's paper worked with the assumption that we'd be looking at about 15 million meaningful SNPs/SNVs (see section 2.2.1 in his paper). To arrive at that figure of 83 years, Iain used a single polymorphism mutation rate of 8×10^-10 per base pair per year as an overall average. As a real number that's 0.0000000008. We're currently at a global population of 7.88 billion, with about 85 million births this year so far. So the odds of a single, specific polymorphism like FGC29721 showing up in a male who is not R1b1a1b are not high--not by a long shot--but they aren't zero.
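
For anyone who wants to check the arithmetic behind the 83-year figure, here is a short Python sketch. It treats the quoted 8×10^-10 rate as a per-site, per-year figure, which is an assumption on my part but is the reading that makes the numbers line up; the function name is made up for the example.

```python
def years_per_snp(rate_per_site_per_year: float, sites: int) -> float:
    """Average number of years between new mutations across all sites."""
    mutations_per_year = rate_per_site_per_year * sites
    return 1.0 / mutations_per_year


RATE = 8e-10        # quoted mutation rate, assumed per site per year
SITES = 15_000_000  # roughly 15 million usable positions

print(round(years_per_snp(RATE, SITES), 1))  # 83.3
```

Across 15 million positions that is about 0.012 mutations per year for the whole stretch, while any one named position stays put for a very long time.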

I lied. This is the last thing. There's another term when dealing with polymorphisms: UEP. UEP stands for unique event polymorphism and, essentially, it's intended to represent a mutation that's so infrequent that it's probable every individual who has the SNP will have inherited it from the same ancestor in the same mutation event. Uptopic I talked about basal haplogroups, not just the top-level single-letter designations, but also some deeper clades like R1b1, I1a1, E2b1, etc. Now we're talking mutation rates of once in tens of thousands of years. UEPs are really the basis for the Y-haplotree and how we've been able to define it in a hierarchical structure. We still know much less about recent markers, the ones very deep in the tree, SNVs (single nucleotide variants) like FGC29721 for example. So the higher you go in the tree, the more stable--mutationally, speaking relatively--the SNPs are. And saying that I should note that most of the depth in the tree has come so far from our R1b clade, or M343. Currently, M343 has 18,432 named branches and the entire haplotree has 46,213 branches.

And to think that all Mr. Paine wanted to know in his original question was which FTDNA test to take to find out his haplogroup. <cough, cough> I'll call it Value-Add instead of way Too Much Information...

Deep reds are my favourite. Perth (Margaret River) or Melbourne (Limestone Coast) suggested but Spain, Portugal and S.France have some beauties.

A rain cheque on that Statistical Genetics subscription if you don't mind. Like other deep thinkers, 42 is the best I can do, so it's up to you and Iain to advise, as he so ably demonstrates with Big and Block Trees.

A new acronym for my yDNA aim is MDCSA: most distant common surname ancestor. At the moment his name is FGC32960 and he probably lived in Wales, but with 7 undefined SNPs below it there is plenty of room for other surnames to appear; but Wales is looking reasonably secure. My lot spent 10-12 generations in Armagh/Down before heading to NZ.

Our small group debated the generation issue and 32-33 is OK for MDCSA, but I'm with Hobbes for the distant past - say death at 40 and first child close to puberty, so for most of H. sapiens history generations were half that? And in case you have a few more words to spare, is haplogroup A the definition or type specimen of H. sapiens, or is this a new question?

Thanks to Mr Paine for getting the ball rolling. The rule of unintended consequences applies.

"Is haplogroup A the definition or type specimen of H.sapiens or is this a new question?"

You'd think that would be a simple yes or no, wouldn't you? But...it isn't.

I think the short answer--but don't quote me--is that we can, for day-to-day operations, think of haplogroup A as being "yDNA Adam," but there's a difficulty with that if we want to be strict about it.

Part of the reason has to do with our ability, from ancient DNA, to positively distinguish what in the Y chromosome can be verified as a unique stick in the sand that says, "This is exactly where, in the long history of family Hominidae, Homo sapiens began." Working with ancient DNA isn't easy. We've made great technological strides in the last decade, but there's still the matter of locating a wide range of samples, sample integrity (in an Antarctic deep-freeze for 200,000+ years would be best; unfortunately the cradle of humanity, Sub-Saharan Africa, generally doesn't offer the best environmental conditions for the preservation of nuclear DNA), and the ability to obtain fully sequenced DNA from a sample (it's only the mitochondria that remain pretty stable, and until around 2013-14 we didn't yet have the ability to even really think about working with the broad sequencing of nuclear DNA).

For example, a study by Mendez, et al., published April 2016 in the American Journal of Human Genetics, made a determination that (to a 95% CI) the time to the yDNA most recent common ancestor for both Homo sapiens and Neanderthals was about 588 thousand years ago. And opinions still differ considerably on when modern humans first appeared; definitions vary as to "modern," and estimates range from ~140,000 years ago to as long as ~350,000 years ago.

The flip side to that, which really is what our Y-haplotree attempts to categorize, is to determine the earliest known patrilineal ancestor of what we think of as modern man, specifically the somewhat faulty analogy of "yDNA Adam": the signature of the Y chromosome from which modern man's Y chromosome derives.

The sticky wicket with that is: yDNA lines die out. They become extinct. Simple example: we know our "yDNA Adam" had at least two sons, and that one of those sons had a yDNA polymorphism that the other son did not. Otherwise "yDNA Adam" wouldn't be "yDNA Adam." But then suppose one of the sons dies before having male offspring. Voila; an extinction point. Now there is only one haplogroup/haplotype because only one son had sons of his own. So that son has now become "yDNA Adam." Massive population bottlenecks have happened throughout history...as if survival before fire or stone tools or agriculture or smartphones was all that easy anyway. So our yDNA time to MRCA could only have moved forward in time, not backward. We may never be able to accurately differentiate the yDNA signature that signals the start of Homo sapiens sapiens.
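
The "extinction point" idea is easy to play with in code. Here is a toy Python sketch with a made-up five-man family tree: all the names are hypothetical, and the helper functions are just one simple way to find the most recent common patrilineal ancestor of whoever is still alive.

```python
from typing import Dict, List

# child -> father, for a tiny invented patriline.
FATHER: Dict[str, str] = {
    "son1": "adam", "son2": "adam",
    "g1": "son1", "g2": "son1",
    "g3": "son2",
}


def patriline(father: Dict[str, str], man: str) -> List[str]:
    """Return the man followed by his father, grandfather, and so on."""
    chain = [man]
    while man in father:
        man = father[man]
        chain.append(man)
    return chain


def patrilineal_mrca(father: Dict[str, str], living: List[str]) -> str:
    """Most recent man who appears in every living man's patriline."""
    lines = [patriline(father, m) for m in living]
    shared = set(lines[0]).intersection(*lines[1:])
    # Walking upward from one living man, the first shared name is the MRCA.
    return next(a for a in lines[0] if a in shared)


print(patrilineal_mrca(FATHER, ["g1", "g2", "g3"]))  # adam
print(patrilineal_mrca(FATHER, ["g1", "g2"]))        # son1
```

Once son2's only male descendant drops out of the living set, the common ancestor jumps forward a generation from adam to son1, and it can never jump back.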

The haplotree was figuratively turned on its head as recently as early 2013 when our friends Michael Hammer, Thomas Krahn, Astrid-Maria Krahn, and others published a paper, also in the American Journal of Human Genetics, titled "An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree." A descendant of a South Carolina man named Albert Perry, born about 1819, had a Y chromosome that proved to predate anything on the haplotree at the time. This new haplogroup was labeled A00 and what had been the previously oldest "starter branch" of the haplotree was renamed A0. The earliest and topmost part of the haplotree looked very different just eight years ago.

At the time, A00 had not been found anywhere in Africa, which would certainly have been expected. Research subsequent to the discovery of A00 found that its living population epicenter seems to be in the Bangwa and Mbo peoples of Cameroon.

Maybe the best visualization of what those topmost branches look like right now is the tree at YFull: https://www.yfull.com/tree/. The haplotree at FTDNA is describing the same thing, but for those topmost basal clades, the 500-foot view at YFull provides, I believe, a better at-a-glance overview.

Kinda puts it in perspective, though. ISOGG defines the genealogical timeframe as: "the period in which it is possible to find genealogical records relating to individual ancestors which allow the researcher to construct family trees." FTDNA says it is "the most recent one to fifteen generations." Other definitions will include the earliest widespread adoption of consistent surname practices, which in Europe began as early as the 11th century, but it really wasn't until the 15th century that surnames began to be used for purposes of legal inheritance. To be lenient, call it 1,000 years. Which makes it a very good thing that we don't have to try to trace our anthropological relatives from 100,000 years ago.

Thanks for the very interesting explanation. The brief YFull tree is easy to follow but the '500 footer' might be more of a challenge. Big/Block Tree at least gives us the downstream end. Presumably Y haplogroups can be reverse engineered, in that all males (who have tested to date) have the A00 SNP, but you would not find any that died out.

Are coding mutations called SNPs? Is it reasonable to say that mutations favourable to sapiens gradually appeared in (say) Neanderthal genes, maybe compressed into a small group by an ice-age type of event to become dominant, and Eve eventually appeared? In a small group, her genes would quite quickly become dominant and start to spread. Y DNA was fortunately dragged along as an adjunct and her son gave us haplogroup A (number of zeros unknown)?

The ECONOMIST not too long ago said it was the TAX MAN that really got surnames moving in the 1400s; the Florentines needed a handy way to differentiate people to extract more money. No surprises there.

+10 votes

Y111 will only give you a predicted haplogroup. If you were in haplogroup R1, for instance, you could do no better than M269 as a prediction. There are tools available that would make a prediction further down the Big Tree than that using Y111. But it won't be confirmed until SNP testing is done.
SNP testing is included in the BigY kit along with the STRs (Y111 is an STR test). BigY or SNP testing would give you a confirmed haplogroup.
Alternatively, I hear 23andMe does limited "confirmed" Y haplogroup testing in their autosomal kit. That might take you a bit further down the Big Tree than 10,000 years ago +/- (M269) :)
by CR Campbell G2G1 (1.8k points)

CR Campbell is mostly correct. Let's add some refinement.

FTDNA FamilyFinder is the only microarray test to NOT report ySNPs to determine a haplogroup. They actually remove it from the lab result before reporting the result to you. And FTDNA could predict to the (near) leaf haplogroup with a y67 STR test but chose not to.

Among microarray tests, 23andMe tests the most Y-SNPs, and this has the chance to get you the deepest in the tree. MyHeritage tests the least (not surprising, given the test is from FTDNA), with Ancestry and LivingDNA in the middle. A big benefit of 23andMe is that mtDNA SNP testing is included as well. Microarray testing is not as deep as sequencing, though.

If you do an STR test, use nevgen.org to likely get a near-leaf haplogroup prediction (that is never wrong). They are essentially doing the same thing you can: looking at your closest STR matches that also tested BigY SNPs and finding the deepest common haplogroup among the results.
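
The do-it-yourself version of that last step, finding the deepest haplogroup your Big Y-tested matches all share, can be sketched in Python. This is only the manual approach described above, not a description of how nevgen.org works internally; the function name is made up, and the paths are shortened chains borrowed from earlier in this thread.

```python
from typing import List, Optional


def deepest_common(paths: List[List[str]]) -> Optional[str]:
    """Return the deepest branch shared by every root-to-tip path."""
    common = None
    # zip stops at the shortest path, so we never read past any match's tip.
    for level in zip(*paths):
        if len(set(level)) != 1:
            break  # matches diverge here
        common = level[0]
    return common


# Hypothetical closest STR matches who have also done Big Y.
matches = [
    ["M269", "P312", "DF27", "FGC29721", "FT33736", "FT32960"],
    ["M269", "P312", "DF27", "FGC29721", "FT33736"],
    ["M269", "P312", "DF27", "FGC29721", "FT33736", "FT32960"],
]

print(deepest_common(matches))                              # FT33736
print(deepest_common(matches + [["M269", "P312", "L21"]]))  # P312
```

The result is a prediction only; as noted above, it isn't confirmed until the SNP itself is tested.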

There are cheaper sequencing tests than BigY that are deeper but do not get you the closed FTDNA match database or tree. Nebula Genomics and YSEQ are always cheaper. Big benefit is they are WGS tests. So you get the deepest microarray result for matching as well as a full mtDNA sequence for the deepest haplogroup.

With all that said, not sure why you would want the haplogroup only. That is like doing a microarray test and only looking at the ethnicity report. There is so much more you can do if you include deep STR testing and matching. The testing also comes with WGS sequencing but FTDNA is king for matching (like Ancestry is for Microarray segments) due to having the largest match DB.

"not sure why you would want the haplogroup only" - My thought exactly.
