Did you see that Family Tree DNA has made public the world's largest yDNA Haplotree?

+30 votes
1.8k views

FTDNA has been, by far, identifying, collating, and cataloging more Y-chromosome SNPs than anyone else. This is thanks to the volume of Big Y-500 tests taken over the past few years, a success in terms of quantity that I personally believe surprised even them, because the cost is not entry-level.

Currently, FTDNA's phylogenetic tree of yDNA haplogroups contains over 16,000 branches, over 118,000 variants and more than 160,000 confirmed SNPs. "Confirmed" indicates that the SNPs have been determined to be ancestral rather than only derived. Full sequence yDNA testing will typically reveal that almost every male has at least one novel or unique SNP or indel; to be added to the phylotree means that an identified SNP has been found in multiple men and is considered to be ancestral: in general, having been inherited for more than one generation.

Previously, the tree was available only to those who have taken the Big Y-500 or SNP panel testing with FTDNA. Now we can all look at it.

To view the tree, be certain not to log into your existing FTDNA account, and you'll find the link at the bottom of the page under "Community." Here is a direct link launching at the top of the tree: https://www.familytreedna.com/public/y-dna-haplotree/A.

There is a pull-down menu where you can choose to view by countries, surnames, or variants. If you're used to dealing with a yDNA Haplotree elsewhere, you will undoubtedly be most comfortable navigating by "variant." If you search by "Branch Name," you'll need to use the full top-level haplogroup indicator, e.g., "R-M269" rather than just "M269." Unless you're familiar with the hierarchy of the SNPs you're looking for, a search will almost always be in order. With so many SNPs cataloged, surfing through them is unlikely to lead you to the "terminal" branch you might be seeking.

in The Tree House by Edison Williams G2G6 Pilot (434k points)

A quick follow-up note. FTDNA uses what most of us consider to be the "standard" taxonomy for Y-SNPs. However, a sizable number of SNPs have more than one alias (or appellation). An example: R-U106, R-M405, and R-S21 all refer to the same SNP, but you won't find R-M405 or R-S21 in the FTDNA phylotree. These naming differences arose based upon the organization that first discovered and cataloged the SNP (e.g., the "L" series by FTDNA, the "M" series by Stanford University, the "CTS" series by the Sanger Institute in Cambridge, England, and so on).

If you are unable to locate via search a SNP that you believe should be in the tree, have a look at the yDNA haplogroup tree that ISOGG maintains: https://isogg.org/tree/. That tree does not dive as deeply into the subclades as the FTDNA tree is able, but it keeps up with significant developments and includes the aliases where known.

Due simply to the volume and pace of identified subclades and SNPs, the naming convention of using a long string of letters and numerals to identify subclades is slowly going away, replaced by the designation of the subclade's deepest identified SNP. For example, R-U106 (and R-M405 and R-S21) is also known as R1b1a1b1a1a1. With over 160,000 SNPs now cataloged, you can understand why the older nomenclature is phasing out. ISOGG is still your source to find the aliases of a SNP as well as the older naming convention.
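For anyone who keeps their own notes on SNP names, the alias problem described above can be handled with a small lookup table. This is only a minimal sketch: the table holds just the one example from this note (U106, M405, S21, and the long-form R1b1a1b1a1a1), and the names ALIASES, LONG_FORM, and canonical_snp are hypothetical, not anything FTDNA or ISOGG provides.

```python
from typing import Optional

# Hypothetical alias table, filled only with the example from this note.
# Key = the name used in the FTDNA tree, value = other names for the same SNP.
ALIASES = {
    "U106": {"M405", "S21"},
}

# Older long-form subclade name for the same branch (also from this note).
LONG_FORM = {
    "U106": "R1b1a1b1a1a1",
}

# Reverse index: every known name (primary or alias) -> primary name.
_INDEX = {
    name: primary
    for primary, aliases in ALIASES.items()
    for name in aliases | {primary}
}


def strip_haplogroup_prefix(name: str) -> str:
    """Turn 'R-U106' into 'U106'; a bare 'U106' passes through unchanged."""
    name = name.strip().upper()
    return name.split("-", 1)[1] if "-" in name else name


def canonical_snp(name: str) -> Optional[str]:
    """Return the primary SNP name, or None if the name is not in the table."""
    return _INDEX.get(strip_haplogroup_prefix(name))


if __name__ == "__main__":
    for query in ("R-S21", "M405", "R-U106", "R-M269"):
        primary = canonical_snp(query)
        long_form = LONG_FORM.get(primary) if primary else None
        print(query, "->", primary, "/", long_form)
```

A name that is not in the table (R-M269 here) simply comes back as None, which is the cue to go look it up at ISOGG.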

Edited to add: a rather important "not" that was omitted.  :-/

Yet another follow-up note. I rushed to G2G to post about this development as soon as I accidentally discovered it by going to the FTDNA website; FTDNA group project administrators were not separately notified.

But Roberta Estes had already beaten me to it! She posted about it on her blog a couple of hours ago: https://dna-explained.com/2018/09/27/family-tree-dnas-public-y-dna-haplotree/.

One other resource for identifying SNPs is www.yfull.com.

The problem is that, using the R-U106 method, one does not know whether a SNP falls under the R1a or R1b branch, and I am sure this problem exists for other haplogroups.

ISOGG, while good, does not cover all SNPs; neither does YFull or FTDNA.

'Tis a shame that the ISOGG method (e.g., R1a1a or R1b1 or E1 or E2) is going away, because the layman (most of us) will look at a classification like R-5578 and R-M269 and not understand that they represent different branches of the R phylotree and are separated by 47,000 years.

Even fewer will grasp the significance of a terminal SNP like R-YP6373 and realize that it is a subclade of YP5905, and that its bearer shares the same rather recent common ancestor with other YP5905s.

This is a problem that needs to be addressed and solved. Alas, the only organization that cares is the International Society of Genetic Genealogy, because it says so in their name; alas, their own phylotree is not up to date and not inclusive of all known and newly discovered SNPs.

They only update about once or twice a year, and even then do not incorporate all newly discovered SNPs.

"They only update about once or twice a year, and even then do not incorporate all newly discovered SNPs."

Well...yes and no. ISOGG is actually doing a much, much better job in updating their tree. In fact, it has been updated 234 times so far in 2018: https://isogg.org/tree/ISOGG_YDNA_Version_History.html. Disclaimer: I'm an ISOGG member and an editor of their Wiki so my view is a tiny bit biased.

But ISOGG does no direct testing or evaluation of newly-discovered SNPs, so there's no API-style pipeline that gets all the newest additions from multiple sources into their tree. Ray Banks busts his tail to keep up, though. Another new addition this year for ISOGG is its Y-SNP index: https://isogg.org/tree/ISOGG_YDNA_SNP_Index.html.

The 800-pound gorilla is FTDNA simply because they do more Y-SNP testing than anyone in the world, by an order of magnitude. Yfull gets a large number of BAM files to analyze, but it's still a fraction of the NGS testing at FTDNA...and YSEQ continues to grow in the marketplace, too. Yfull doesn't do testing, so it's reliant on data submitted from other companies. That's not to say they don't have the expertise to analyze things on Vadim Urasin's team.

I honestly don't know if we'll get to a consolidated, one-stop-shop yDNA phylotree anytime in the near future. Developments are coming in fast and furious, making it extraordinarily difficult to keep up, and the largest databases are commercial in nature. That, to me, is why it's such a big deal that FTDNA has made their tree public.

You're right about the nomenclature. I'm not certain of a reasonable way to deal with it. Clearly, the string of letters and numerals can still work just fine for mtDNA haplogroups because there are only a fraction of the number and new additions are at a snail's pace: the last phylotree.org update was February 2016 and there were over 5,400 nodes (or haplogroups) compared to nearly 200,000 now with yDNA. Naming my deepest known Y-SNP with the old naming convention would necessitate a string of a couple hundred letters and numerals.

Very cool tool. I wish they would incorporate a search in the Surname Report. My group (R-L21) has 275 pages of surnames. I don't find mine in the first few pages, but the group is dominated by kits in the UK, which accords with my understanding of my paternal line.

Thanks, Edison. I knew you were a reliable resource.

I understand the problem with FTDNA and its virtually daily discovery of new SNPs. To me the solution would be some kind of program that would automatically link to and update ISOGG.

YFull also has a problem keeping up with FTDNA, but seems to do a better job.

This R-Xnnn method of portraying a tree sucks. I hope that a solution is found.

I just spent half an hour on the FTDNA phylotree https://www.familytreedna.com/public/y-dna-haplotree/R

trying to run down SNPs. Were it not for www.isogg.org and its search function, I would not have been able to find what I was looking for.

Thanks be to ISOGG..the home of my genetic genealogy heart.

According to this https://www.familytreedna.com/groups/r-l21/about/results

L21 is concentrated in Brittany, Ireland and Scotland.

Its presence in England can be explained by two things.

The Roman governor Agricola invaded Scotland and hauled thousands of slaves back to London.

The Bretons formed the left flank of William's army at Hastings and, as victors, would have spread their DNA throughout the land, especially in those areas not immediately sought as fiefdoms by the Normans, who were the core of William's army.

Yes. The Elliott DNA project has shown that the main line of Scottish Elliots, as well as the Cornwall Eliots, is L21. This is a big part of the clan historian's thesis that the Elliotts were originally Brittons.

Hi Jesse.

Indeed, it must be a chore trying to identify and localize the various R1b1s found in the British Isles. One would think that Ireland would be easier, but apparently not.

Because of your website (and I think it is yours), I bought and read the excellent The Steel Bonnets: The Story of the Anglo-Scottish Border Reivers. I especially enjoy the hypothesis that the core of the border reivers are descendants (genetic or cultural) of the Sarmatian auxiliaries who guarded Hadrian's Wall, were stationed out of York, and retired to Ribchester (Veteranorum Bremetenacum).

I once entertained the notion that one of those auxiliaries was an ancestor, but have since discarded that notion. Yet as an aficionado of history, I devour this stuff like a kid with an ice cream cone.

That's awesome. The Eliots being a border reiver clan is what got me going down this road. Interestingly, although I'm an Elliott, my DNA came out E-M35. So my ancestor, it seems, would have literally been a Roman auxiliary.

Thank you! This ties into a question I just asked about the list of Y-test takers and their haplogroups.

I'm hoping that if I find a Y-tested male Gregg autosomal cousin for Mom's Gregg dad (and for other surnames), I should be able to use his haplogroup to find or confirm what branch my grandfather, etc., came from.

2 Answers

+8 votes
I agree that it's great that FTDNA has made this yDNA phylogenetic tree public.

One question though... Unless I'm missing something, the FTDNA tree does not contain the sort of "year formed" and TMRCA estimates that the YFull.com tree contains. YFull lists detailed calculations for each SNP. I find this information very useful. Am I just not finding it for the FTDNA tree or is it not there?

I'd also urge caution about ascribing SNPs to countries. It appears that this association between country and SNP is taken from the country that the FTDNA customer enters for the home of their oldest known patrilineal ancestor. If that information is incorrect, then the country associated with the SNP will be incorrect.

I am Q-YP4549. As of the time that I'm writing this post, the FTDNA tree shows Q-YP4549 associated with three people for England, one for the U.S., one for France and one for Unknown. I am the reason that a French flag is shown. I have always believed that Alsace (France) was the home of my oldest known patrilineal ancestor (great-grandfather). This is supported by a solid paper trail. But DNA testing has revealed that my patrilineal great-grandfather is not the person shown on paper. My genetic patrilineal great-grandfather was born in England. I have changed my account information to show this new information, but the yDNA phylogenetic tree still shows the French flag. I'm curious whether the tree will be updated to reflect removal of the French flag and an increase from 3 to 4 for the English results. This hasn't happened yet. In other words, it seems that the SNP-country association is only as accurate as the ancestry data entered by FTDNA's customers.
by Mardon Erbland G2G1 (1.7k points)

Hi, Mardon. You're right: I don't believe estimated bifurcation dates are available at the FTDNA tree, but maybe someone who works there can come along and comment.

I wholly agree about the country designation. Not sure why FTDNA even bothered to include that except, well, they have the user-entered data on file and...a big and...what's selling in the direct-to-consumer DNA marketplace is ethnicity, lederhosen or a kilt, not genealogy. That's the best reason I can think of. I saw the ability to sort by country and immediately ignored it.

BTW, I'd personally take Yfull's TMRCA estimations with a grain of salt, as well. They're based on a single March 2015 published study (here's a direct link to a PDF on my server) by Dmitry Adamov, Vladimir Gurianov, Sergey Karzhavin, and the owner of Yfull, Vadim Urasin. It's a developed method that uses essentially a static computation and employs only three ancient yDNA sequences for calibration. In general, my bet is that the estimates are pretty good, but for the most part we have nothing to compare them to for benchmarking. Some could be spot-on, others might prove to be inaccurate.

Yfull's estimations are actually a range over hundreds of years.

There does seem to be some kind of validity to their estimations, at least in one instance.

R-YP5905 is found solely in the male descendants of Councillor William Farrar, born 1583, who came to the New World in August 1618. It is found in no others.

Then again, to drill down that deep one has to test Big Y, and to date it is an expensive process and thus out of reach of most. One has to be either really dedicated and curious or rich enough to consider $500 as change.

+3 votes
I observe that the YFull m-tree includes more subclades than FTDNA, and they disagree with Genetic Homeland.

When do you suppose that we can expect a new y-tree and m-tree from FTDNA with more named subclades?
by Murray Maloney G2G6 Mach 3 (38.5k points)

Hi, Murray. The FTDNA yDNA haplotree changes pretty much monthly, sometimes semimonthly. When this question was posted back in September 2018, there were 16,361 branches on FTDNA's Y-Tree; three years later, on 11 Sep 2021, there were 47,785 branches; as of today, it's up to 67,153.
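To put those three branch counts in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it treats September 2018 to 11 Sep 2021 as 36 months (an approximation), and it cannot compute a rate for the latest figure because "today" isn't pinned to a date above. The function names are made up for illustration.

```python
# Branch counts quoted in the reply above.
BRANCHES_SEP_2018 = 16_361
BRANCHES_SEP_2021 = 47_785
BRANCHES_LATEST = 67_153  # "as of today" in the reply; exact date not given

# Assumption: September 2018 to 11 Sep 2021 is treated as 36 months.
MONTHS_2018_TO_2021 = 36


def growth_factor(start: int, end: int) -> float:
    """How many times larger the tree is at `end` than at `start`."""
    return end / start


def average_monthly_additions(start: int, end: int, months: int) -> float:
    """Average number of branches added per month over the period."""
    return (end - start) / months


if __name__ == "__main__":
    print(f"2018 -> 2021 growth factor: "
          f"{growth_factor(BRANCHES_SEP_2018, BRANCHES_SEP_2021):.2f}")
    print(f"2018 -> latest growth factor: "
          f"{growth_factor(BRANCHES_SEP_2018, BRANCHES_LATEST):.2f}")
    print(f"Average branches added per month, 2018 -> 2021: "
          f"{average_monthly_additions(BRANCHES_SEP_2018, BRANCHES_SEP_2021, MONTHS_2018_TO_2021):.0f}")
```

In round numbers, the tree nearly tripled in those first three years and is now roughly four times its September 2018 size.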

The mtDNA tree is a different animal. YFull--where I've also had my own yDNA and mtDNA analyzed--is a bit of a rogue element for mitochondria. Not necessarily correct or incorrect, just non-standard. FTDNA has stayed with (at least for the most part) Phylotree, which is still on Build 17 from February 2016.

There are simultaneous efforts underway right now with a goal of improving the haplotree and updating the branches. In fact, two different entities are using the term "Mitotree" for the result: both FTDNA and work underway by Nicole Huber, Walther Parson, and Arne Dür in Europe, evidently using the forensic EMPOP database (https://empop.online/) as its repository of reference. The International Nucleotide Sequence Database Collaboration (INSDC, https://www.insdc.org/)--a joint initiative among the U.S.'s NCBI (part of the National Institutes of Health), the DNA Data Bank of Japan (DDBJ), and the EMBL-EBI (the European Molecular Biology Laboratory's European Bioinformatics Institute)--has a stated position that scientific journals should, for published papers dealing with mtDNA, submit the full sequences to NCBI's GenBank, DDBJ, or ENA (the European Nucleotide Archive).

Part of the overall complexity with the mtDNA haplotree is that the mitochondrial DNA molecule is amazingly tiny. Unlike yDNA where we have over 23 million base pairs and their possible variants to work with--and with the Y-Tree being much cleaner in that variants tend to line up chronologically for us in ancestral/derived delineation--with mtDNA we're looking at only about 16,569 total base pairs, and it's more about the overall collection of variants. Not a one-to-one but a one-to-many. The same variant at the same locus might be included in over a half-dozen subclade designations, some of which might not even have the same basal, top-level clade.

Too, care has to be given in making sure that the designated haplogroups represent germline DNA, the DNA that's passed along in the ova...which for obvious reasons we don't typically examine. What we test with the typical cheek swab is somatic DNA, body DNA: we test a bunch of mitogenomes in epithelial cells that are some of the roughly 4 quadrillion mitochondria that are in our bodies at any given time...and they're constantly making copies of themselves because mitochondria have a replication half-life of as brief as 8-11 days for things like skin cells, and 20-30 days for long-lived cells like neurons. We don't directly test the germline mtDNA because they are already formed in the oocytes while the mother herself is still a fetus.

From experimental studies we know that over 61% of us carry mtDNA heteroplasmies, instances where two different mitogenomes are in one organism, even different genomes inside single cells. The reality is likely that very nearly all of us are heteroplasmic: no copy machine is good enough to replicate and replace 4 quadrillion items every four weeks or fewer and not have copy errors. That's the only way the mitochondrial DNA changes. And it's only when those replication "errors" become so prevalent that they're passed along into daughters, and then their daughters, that they begin to work their way into becoming germline stable. At FTDNA, for example, they consider concentrations of a minor allele--meaning the DNA "letter" that is the least common at a given position in a sample's DNA--of less than 20% to be a degree of heteroplasmy not worth identifying.
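To make the 20% threshold concrete, here is a minimal sketch of how a minor-allele fraction at one position could be computed from sequencing read counts. The read counts are invented for illustration, the function names are hypothetical, and this is not FTDNA's actual pipeline; it also assumes the "minor allele" is the second most common base seen at the position.

```python
from typing import Mapping

# Threshold mentioned above: below 20%, the heteroplasmy is not reported.
REPORTING_THRESHOLD = 0.20


def minor_allele_fraction(base_counts: Mapping[str, int]) -> float:
    """Fraction of reads carrying the second most common base at a position.

    `base_counts` maps a base ("A", "C", "G", "T") to the number of reads
    showing that base. Returns 0.0 if only one base was observed.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / total


def is_reported_heteroplasmy(base_counts: Mapping[str, int]) -> bool:
    """True if the minor allele reaches the reporting threshold."""
    return minor_allele_fraction(base_counts) >= REPORTING_THRESHOLD


if __name__ == "__main__":
    # Invented read counts at a single mtDNA position.
    low_level = {"A": 90, "G": 10}    # 10% minor allele
    high_level = {"A": 70, "G": 30}   # 30% minor allele
    print(is_reported_heteroplasmy(low_level))
    print(is_reported_heteroplasmy(high_level))
```

Under these assumptions the 10% case would go unreported even though two different mitogenomes are clearly present in the sample, which is the point being made above.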

I'm rambling but, yep, the little mtDNA genome isn't straightforward to deal with in terms of haplogroups. But we definitely need an update that is going to be acceptable by consensus...meaning that, unlike the work YFull has been doing, the updated haplogroups and their specific variants are agreed to by all the major players, including forensic databases, academic institutions, and scientific journals.

At RootsTech 2023 last March, FTDNA announced that they would be focusing on updates to the mtDNA haplotree and hoping to provide additional tools of the sort they've recently added for yDNA. A more accurate TMRCA calculation for one would be very welcome, because the one in use is extremely broad and, in fact, overly optimistic when compared with the plethora of mtDNA research papers out there on germline mtDNA mutation rates. Last year, Roberta Estes posted about the Million Mito Project. That might be worth a read for some background. And every day that I see a new Group Project Administrator newsletter from FTDNA, I check to see if we have an update on current mtDNA efforts.

Thank you, Edison. I appreciate your insights.
