What is WikiTree's stance and direction toward SNP-type Y-DNA testing?

+21 votes
845 views
Hi,

I'm pretty new here in terms of being an active contributor, so I'm still adding my personal ancestry. During that process I've already had opportunity to experience WikiTree's DNA entry tools and policies.

While what exists is very good, the things that exist are clearly geared towards STR testing as regards Y-DNA. This is entirely different than SNP testing, which is a completely separate (and in my opinion, more useful) tool that essentially uses entirely different data than STR testing.

Knowing this, I was wondering what the DNA folks here have in mind for the future regarding integrating SNP testing? Has the Discovery beta at FTDNA been considered? How about SNP time estimates?

What I'd find most useful is some way to document a modification of SNP time estimates by documented genealogy. I have a couple of examples of this in my family's DNA testing. That is, I can demonstrate an essentially precise date for certain SNPs since they coincide with documented genealogical information. Currently there's no other place that documents these cases, and I think it would be an extremely valuable tool and selling point for WikiTree.

Anyway, I'd love to hear what the DNA folks here think about Y-DNA SNP testing and its application here.

Thanks!

Greg Lamberson
in Policy and Style by Greg Lamberson G2G6 Mach 1 (12.3k points)

11 Answers

+10 votes
 
Best answer
Maybe I am missing something, but I think the answer lies in the existing trees already. What you are really asking is how can my genealogical tree overlay the haplogroup tree when there is concordance? I think the answer is simple.

Simply label the testers with their haplogroup. When differing haplogroups meet, you now label that father with the next haplogroup up in the tree where those two earlier haplogroups meet. Just clearly distinguish propagated haplogroups from tested ones. Much like you do for DNA labels now.

This also handles when people have not tested as deep as others. One branch will propagate a haplogroup higher up the tree than the other. So that becomes the MRCA haplogroup. Or for that matter, ySTR predictions of haplogroups can be entered by testers and propagate up. If ever two haplogroups meet in the wikitree that cannot be resolved by an MRCA haplogroup (that is below some threshold like single letter haplogroups or some other list), then you clearly have an NPE that needs to be resolved. Do not specify the MRCA with a new haplogroup that propagates up. Note that this clearly allows for crude accuracy ySNP tests like from Ancestry, 23andMe, etc to be entered and propagate where appropriate.

The only hassle I can think of is if a lower resolution test gets entered, it will negate when two higher resolution tests would otherwise meet further up and propagate. For example, two cousins BigY-700 test and match. But a sibling of one enters a 23andMe test result. The 23andMe would propagate up instead of the BigY. But that could be handled, I think.

Key is, you are not trying to label individual SNP changes. Simply figure how to overlay the haplogroup tree on the wikitree. And as haplogroup entries are added or changed, recalculate the propagation up.
by Randy Harr G2G3 (3.7k points)
selected by Greg Lamberson
So I think the propagation algorithm is already in hand. This is assuming you have a haplogroup tree (or trees or merged trees) that contains all the haplogroups that may be specified for an individual that tested. You may need some conversion table or service otherwise.

If a father has two or more sons with the same haplogroup, assign that haplogroup to him and propagate it up.

If a father has two or more sons with different haplogroups in each son, then assign the father's haplogroup as follows:

(a) if one son's haplogroup is on the path to the other's, then use the more refined haplogroup for the father. Propagate up.

(b) if the two sons' haplogroups are different, determine the most recent common haplogroup (MRCH) in the phylogenetic tree of haplogroups. Then:

 (b1) If that MRCH is above a certain threshold in the tree (single letters, root / adam, etc), then do not specify any haplogroup for the father. Maybe mark the father as in conflict but do not propagate that conflict.

(b2) if the new MRCH is below the threshold, then use that MRCH for the father and propagate it up.

(b3) if there are more than two sons with haplogroups and there is a conflict between any two that is not resolved by a refinement of one to the other, then simply consider it in conflict and the father gets no haplogroup. Even if four sons match and one does not.
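For anyone who wants to see rules (a) through (b3) in code, here is a minimal Python sketch. It is only an illustration under stated assumptions: the tree is a toy with intermediate branches left out, the function names are made up for this post, and `min_depth` is my stand-in for the "threshold" (depth 0 is a single-letter root).

```python
from typing import Optional

# Toy haplogroup tree as child -> nearest included ancestor.
# Intermediate branches are omitted; this is NOT a real tree export.
PARENT = {
    "R": None,
    "R-M269": "R",
    "R-L21": "R-M269",
    "R-U106": "R-M269",
    "R-DF13": "R-L21",
    "I": None,
    "I-M253": "I",
}


def path_to_root(hg: str) -> list:
    """Return the haplogroup followed by its ancestors, root last."""
    path = []
    while hg is not None:
        path.append(hg)
        hg = PARENT[hg]
    return path


def depth(hg: str) -> int:
    """Number of steps from the haplogroup up to its root."""
    return len(path_to_root(hg)) - 1


def mrch(a: str, b: str) -> Optional[str]:
    """Most recent common haplogroup of a and b, or None if they share no root."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


def father_haplogroup(sons: list, min_depth: int = 1) -> Optional[str]:
    """Haplogroup to assign to the father, or None when he is left unlabeled."""
    distinct = set(sons)
    if not distinct:
        return None
    deepest = max(distinct, key=depth)
    # Same haplogroup everywhere, or rule (a): every value lies on one path.
    if distinct <= set(path_to_root(deepest)):
        return deepest
    # Rule (b3): more than two sons and an unresolved conflict.
    if len(sons) > 2:
        return None
    common = mrch(*distinct)
    # Rule (b1): MRCH missing or above the threshold.
    if common is None or depth(common) < min_depth:
        return None
    # Rule (b2): MRCH is acceptable.
    return common
```

With this toy tree, two sons labeled R-L21 and R-DF13 give the father R-DF13, while R-L21 and R-U106 give him R-M269. Marking the conflict and propagating the result up the genealogical tree are left out to keep the sketch short.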

Anytime a haplogroup changes for a father, propagate the new haplogroup up the genealogical tree. This will allow a refined haplogroup to propagate up when introduced.

Here is the tricky one: if a haplogroup changes on a father AND it has sons with no haplogroup specified, then walk down the genealogical tree of male descendants. If you encounter a father with sons that have haplogroups specified that are in conflict, AND the MRCH you walked down the tree with would meet the criteria to assign those son haplogroups, then assign the father the refined haplogroup and propagate it back up the genealogical tree as normal. Mark the son that is different as an NPE but leave his haplogroup assigned.

Note that this self repairs and pushes the NPE down the genealogical tree when a descendant test is added that conflicts with another descendant's test. Where those descendants meet, if there is a difference that causes the father to not be marked, then remove the father's haplogroup, and propagate the removal up as long as no other son on the way up has a haplogroup marked.

Either use only one haplogroup tree and only allow its haplogroups to be entered, OR create a merge of haplogroup trees to allow different haplogroups. If one haplogroup tree and list of haplogroups is specified, and what the user gave is not in it, then use the SNPs in the alternate haplogroup tree's haplogroup and its ancestors to find the corresponding haplogroup in the approved haplogroup tree being used.

FTDNA has the most refined Y tree but does not readily provide all the SNPs of the haplogroups (not in their API at least; but you may be able to scrape it from the public tree set to show variants -- if they allow that use and do not consider it a copyright infringement).  The yFull tree is more public and available; but less detailed in most cases.

The second advantage is if you can assign an SNP to an ancestor, then that means two have taken deep SNP tests and show that SNP different. So both can be loaded to the yFull tree, no matter where they tested, to cause the new haplogroup formation.

The one caveat is the trees usually require two testers to match in some but not all of the leaf (or paragroup) haplogroup to cause a split in haplogroups. If you are trying to make a determination with only single testers, then none of these solutions will work, as you want a haplogroup tree more refined than any public tree will be.

Note you can use a tool like Cladefinder, which is currently based on the latest yFull tree, to get a properly named haplogroup from microarray tests like Ancestry, 23andMe, etc. You can use a tool like NevGen or YSEQ's new one to get a predicted haplogroup from STRs to enter as a "tested" haplogroup. Maybe mark it as predicted so that it is not used to resolve conflicts or propagated as if it were the "more" refined haplogroup.
The problem with labelling testers with their reported SNP haplogroups is that this will inevitably change and become obsolete as more testing is done by others in the family. There needs to be a way to exploit genealogical data and SNP DNA testing data through analysis. The kind of labelling you're suggesting is the very thing that differentiates SNP testing from STR testing and makes SNP testing so much more valuable.
+10 votes
One thought was to have a “terminal SNP” migrate back each generation until it met an interruption (similar to how DNA propagation currently stops in WikiTree when a parent / child relationship is marked nonbiological). The interruption could be placed on the profile of the SNP’s progenitor by the person who knows when the SNP formed. Perhaps the interruption could then switch to that SNP’s immediate ancestral SNP.

However, how often do surname project admins know with certainty in which generation a new SNP formed?  How difficult would it be to implement something similar to the above?  Could most users use the tool correctly?  Unfortunately a good number of Y DNA test takers are not able to correctly enter their haplogroup (e.g: M-269 or R1b-M269 {for R-M269} or using their mtDNA haplogroup as a Y haplogroup).
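As a rough illustration of how an entry form might catch the mistakes listed above, here is a small Python sketch. The pattern is purely an assumption made for this example (one major-haplogroup letter A to T, a hyphen, then a short letter prefix and digits); it is deliberately narrow and would reject some legitimate SNP names, so treat it as a starting point and not as anyone's official rule.

```python
import re

# Assumed short-form shape: major haplogroup letter, hyphen, SNP name
# made of 1-4 capital letters followed by digits (e.g. R-M269).
SHORT_FORM = re.compile(r"^[A-T]-[A-Z]{1,4}\d+$")


def looks_like_y_short_form(text: str) -> bool:
    """Return True when the entry resembles a short-form Y haplogroup."""
    return bool(SHORT_FORM.match(text.strip()))


for entry in ["R-M269", "M-269", "R1b-M269", "H1a1"]:
    print(entry, looks_like_y_short_form(entry))
```

Only the first entry passes; the other three are the kinds of input described above (a misplaced hyphen, a mixed long/short form, and an mtDNA-style label).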
by Peter Roberts G2G6 Pilot (706k points)
edited by Peter Roberts
It's certainly not common for admins to know when a SNP occurred in an actual ancestor, but this is certainly the goal. There's simply not enough data available yet in most lineages.

There's also absolutely NOWHERE for anyone to document this sort of information once it becomes available. It would be wonderful if WikiTree were prepared to be the place to document such occurrences. As I said, I have a couple of examples in my own lineages due to the many years I've sought to use Y-DNA and have therefore worked on getting cousins tested who could help prove the relationships I've had as my project goals.

Most people don't understand Y-DNA is actually tested using two completely different methods which don't actually have an exact correlation. STR markers and SNP markers don't exactly correspond to one another. It's possible, even likely, to have an STR mutation where no SNP mutation occurs and vice versa. They correspond closely enough a lot of times to be useful and even used as analogues, but when you get to the minutiae of pinpointing a mutation they're actually entirely different and unrelated.

However, as I said, this is the goal of this stuff. It is possible. When people start seeing it happen, I think the use of BigY tests will explode. Some of us have stuck with it to see the enormous benefits possible, but not that many have, and FTDNA doesn't make it particularly easy to understand their own tests' data.
Another thought was to manually enter relevant information in profile biographies under === DNA ===. A consistent format would need to be agreed upon.

Using a WikiTree Template was also explored: https://www.wikitree.com/wiki/Project:Templates

https://www.wikitree.com/wiki/Space:WikiTree_Templates

Perhaps a version of a succession box could be developed? https://www.wikitree.com/wiki/Template:Succession_box
Yes, well, what you suggest would be a valuable first step. I'm currently writing a book in which I have one example of a pinpointed SNP to an ancestor. The problem I face is that there's no objective way to document this. That is, the equivalent process WikiTree uses for DNA confirmation has no Y-DNA equivalent, at least not for what I'm talking about. It would be extremely valuable to have a site such as WikiTree to have a way to objectively confirm this data.
A problem I have, as an FTDNA admin, is getting results on WikiTree because of the fact that each person needs to do their own. Some of my guys are so computer illiterate that it takes a lot of work to get each one on board. I have nearly 100 tests at FTDNA. One Haplotype cluster has 29 tests. I got 11 of those to upgrade to Big Y. I'm trying to get them all on WikiTree. They would let me do it but I'd have to "cheat" the WikiTree rules. If I do more BigY tests, then there is the possibility of new SNPs being found which means I need to get the tests updated on WikiTree for guys who can't do it.
And, BTW... the 11 SNP tests I was able to do were VERY helpful in that I ended up with some recent SNPs that gave a great look at the connections of the lineages.
See my answer. It is not a terminal SNP that propagates. Only haplogroups get entered and they propagate. If an SNP is known where it changes, that should cause a haplogroup change. Unless you are basing it off a tree where not all testers can exist. Then you run into issues.
Yes, it's certainly true that WikiTree's stringent standards regarding the entry of DNA data is an obstacle in some ways. However, I understand this and can deal with it for now as a policy that, while strict, does result in good (but limited) DNA data on WikiTree.

My biggest concern is that since WikiTree doesn't deal with Y-DNA SNP testing really at all nor does it recognize this sort of testing's difference from the cheaper and older STR testing, there's no way to exploit the power of this platform or make good use of SNP testing data in a systematic way.
Replying to Douglas and others... I have 10 FT testers connected at WT on our Mitchell YDNA line and since our surname was a "pioneer project" for the formation of YDNA at familytreedna.com (we have 2 of kits 700-709 that were allocated for "Mitchell"), many of our initial testers have passed. So for those deceased members, most tested to 67 level, there is no way to do advanced testing, SNP testing, big Y. I can imagine this has happened to many of the initial projects.

I was fortunate to have been in close contact with our group but now that so many have passed the opportunity to upgrade their kits is not possible. And, for those living who listed their tests there doesn't seem to be a way to upgrade their testing levels in their DNA info. If anyone can suggest a workaround please let me know!
+13 votes
Isn't this complicated by the fact that SNPs are shifting goalposts? FamilyTree created a whole new SNP when I took my test, because it bucketed me and another tester into a documented group that didn't exist until I replicated his result. We now have three whole members, but your measure of "how long between mutations" would be clouded by the fact that it's only showing the mutations in the people who have tested.

That universe is continually expanding, and the number of SNPs is continually expanding. Doing it within your specific lineage may be one problem, but doing it with a tool that could be applied to the public at large may be an entirely different problem.
by Jonathan Crawford G2G6 Pilot (280k points)
It's not that SNPs are shifting goalposts. It's that in most cases not enough data is available. To be sure, pinpointing in whom a mutation (aka SNP) occurred is the ultimate goal regarding Y-DNA and genealogy. It just hasn't happened yet for most lineages.

FTDNA didn't create a new SNP. They discovered a common one. That is, when you took your test, it found that you and another previous tester shared a mutation, so now that mutation could be named according to the standards of ISOGG and other genetic authorities. It just means that you're that much closer to being able to get SNPs identified within the time frame that includes your documented genealogy along your paternal line.

The issue you're describing will absolutely impact the recording of Y-DNA haplogroups no matter what. It's important to understand precisely why this is, though. Until enough testers take tests, this simply isn't an issue most people have to deal with yet. But it's absolutely the whole point.
I agree. That is not a moving goalpost. You formed a new child subclade of your haplogroup because you had a closer MRCA with your match, exposing one (or more) of his Private SNPs. As more men test this can happen again until you have 1 or no Private SNPs.

You can accelerate the process and expose all of your Private SNPs by testing your father, son, or brother and in most cases a 1st or 2nd cousin. Then your haplogroup will never change again.  But that would be a waste of money, in my humble opinion.

Russ Carter

Administrator, FTDNA Carter Surname Project

FTDNA gives your Private SNPs names and reports them to the ISOGG after your test results have been through manual review. You can look up your Private SNP names on the ISOGG website. SNPs are named by the reporting company. All SNP names starting with "FT" originated from FTDNA tests. The numbers in an SNP name are sequential according to when they are discovered.

The most recent ISOGG Y Tree update was 11 July 2020. The additions are now so frequent it may have become too burdensome for the volunteers to manually keep up.

If your Y DNA is added to mitoYDNA.org, then your haplogroup automatically links to FTDNA’s Y haplogroup tree via Scaled Innovations’ SNP Tracker.

To expand on Peter's comment, ISOGG has never been an arbiter of SNP identification. SNPs weren't reported to ISOGG; their haplotree was maintained manually by a very few volunteers, notably Ray Banks, by accessing and reviewing the publicly available data from FTDNA and others.

We had a conversation about the status of the ISOGG haplotree last December in the private ISOGG group following questions that arose on Anthrogenica. The ISOGG haplotree isn't officially retired, at least not yet. It's really the only repository that continued using the YCC's long-form nomenclature, so if it's retired I don't expect that the last version, v15.73, will be taken down, but will remain for reference. When a decision is made, I fully expect a statement to be published both on the ISOGG tree main page and on the ISOGG Wiki.

But we have an even more troubling conundrum with Y-SNPs. Since the demise of the YCC there has been no governing body to take control of vetting and curation of named Y-SNPs. One result is that we now have two dominant haplotrees, one at FTDNA and one at YFull, and they are different. Particularly under the R clade, as we get deeper there are differences of opinion about bifurcation and identifying SNPs. A third tree is at YDNA-Warehouse, ydna-warehouse.org/tree. I've had both my Big Y BAM and that of a WGS analyzed by YFull and YDNA-Warehouse, and I have slightly different haplotree structures and terminal SNPs reported at FTDNA, YFull, and YDNA-Warehouse. It isn't terribly difficult to sort them out...but it would no doubt be bafflingly confusing to someone new to yDNA for genetic genealogy.

Another problem that we've brought about is that we've used the term "SNP" more loosely than the academic/research community. By definition, a SNP is something that can be found in at least 1% of the global population. A predominance of what we see as recently-appearing haplotree sub-branches are not SNPs at all; they're SNVs, Single Nucleotide Variants. They haven't been found in large enough proportions of the population to be SNPs, and the deeper we go in the haplotree the less likely it is that they will ever be.

That doesn't lessen their value for population genetics or genealogy, of course. It's just that we use the term incorrectly.

The other repercussion of having no governing body managing identification and naming is that we're seldom actually discovering truly new yDNA SNPs/SNVs. Independent researchers (and that includes FTDNA, YFull, and YDNA-Warehouse) are, basically, finding more than one consistent, correlated variant whose locus they don't see as having been given a name (at least, the kind of names we're familiar with), so they name it. The numbering sequence is only internally consistent within their own naming prefix (e.g., FGC for Full Genomes Corporation; BY, BZ, FT for FTDNA; Y for YFull). That's how we end up with synonymous names that identify the same locus and same polymorphism.

Something that is curated is the dbSNP database, maintained by the National Institutes of Health. I haven't pulled recent data for the FTDNA haplotree, but as of December 2021 they note that they have over 460,000 variants named in their haplotree; it's probably well over a half-million by now. At dbSNP, there are currently almost 2.7 million SNVs and SNPs cataloged for the Y chromosome. That's what I meant when I said that, for the most part, we aren't discovering new polymorphisms, but rather categorizing them in haplotrees based on ancestral/derived relationships.

Every entry at dbSNP is assigned an rsID. Standing for "reference SNP cluster ID," if you've looked at the raw data from an autosomal microarray test you've seen tons of them because that naming convention extends across the genome. However, the valuable aspect of constant curation is the very reason we can't really use rsIDs for naming yDNA variants. The NIH is receiving newly reported data about polymorphisms all the time. When a new one comes in, the vetting is minimal and an rsID is assigned. Then they go back and do a deeper investigation. If the same locus and same polymorphism has been referenced by a new report, the data is reconciled and the new rsID is merged back into the earliest rsID representing that particular variant.

In the long run, that's the way to go...it's the same approach WikiTree takes with duplicate profiles. But it could play havoc with haplotrees. It would mean the haplotree publisher wouldn't be in control of their own tree. If they had established, for example, "rsID987654" as the defining SNV/SNP for a parent branch with a few bifurcations beneath it, and months later dbSNP determines that should be folded under previously cataloged "rsID123456," then the tree would need to change dynamically and our reported position on the tree and terminal SNPs would stay less stable than they are now.
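To picture the merge-back behaviour described above, here is a minimal Python sketch of an alias lookup. The two IDs are the made-up examples from this post, and the `merged_into` table and `resolve` function are hypothetical names for illustration; this is not how dbSNP itself exposes merge history.

```python
# Hypothetical merge table: later ID -> earlier ID it was folded into.
merged_into = {
    "rsID987654": "rsID123456",
}


def resolve(snp_id: str, merges: dict) -> str:
    """Follow merge records until reaching an ID that was never merged away."""
    seen = set()
    while snp_id in merges and snp_id not in seen:
        seen.add(snp_id)  # guard against an accidental loop in the table
        snp_id = merges[snp_id]
    return snp_id


print(resolve("rsID987654", merged_into))
print(resolve("rsID123456", merged_into))
```

Both calls end at rsID123456. A haplotree keyed on the later ID would have to run every branch label through a lookup like this each time the table changed, which is the instability being described.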

This is all far afield of Greg's initial question, but I thought it worth level-setting the information about ISOGG and some of the challenges we face with yDNA haplotree structures...not just trying to deal with it here at WikiTree, but in general.

Otherwise summarized as.....shifting goalposts...
Yeah, no. SNPs are the very opposite of shifting goalposts. They're the only things standing still! lol
Be careful in the discussion. I think there are three separate things being labeled ISOGG above. The ISOGG phylogenetic tree of haplogroups that is curated and thus far behind other trees, the submission by FTDNA to the site of their latest list of named SNPs, and Thomas' near daily update of the ybrowse database (that includes the FTDNA as well as other sources of SNP names).

An SNP name database and a haplogroup tree which contains named SNPs are two distinct functions that happen to both be on the ISOGG website. People should clarify which they are talking about.
True enough, but the real issue I'm trying to address is WikiTree's lack of policy or procedure to document or exploit Y-DNA SNP testing as the powerful tool it is.
+7 votes
Be careful, and think through, what you come up with here.  I reported my (current) 'terminal' named-mutation Y haplogroup on my profile page, and found hundreds of pages now show up with it in Google search results for wikitree profile pages with it in their text as "suggested DNA matches" for other individuals, making the mutation name no longer a good search keyword.  I am interested in this.
by Barry Gates G2G2 (2.6k points)
+9 votes
From a computer programming perspective, this is easy.  In fact, although normal ancestry is not a tree but rather a directed acyclic graph, patrilineal ancestry is a true tree (albeit unbalanced) and easy to implement.

From a WikiTree platform perspective, it gets a bit trickier. Although test takers can have WT accounts and one can access DNA information for profiles of non-living persons, the information is meant to be actual tests taken and not simply the imputed haplogroup (there is also the policy directive that only the manager of the test kit is allowed to add the information). Even if this were an issue, the problem could easily be solved by using categories.

From a DNA perspective, there are certainly many issues to be considered. The idea of mapping to FTDNA and/or mitoYDNA could have some benefits.

Combining all of these considerations, it crosses my mind that there is no single haplogroup for a test taker, rather an array from the most ancient to the most recent. Were categories used to create descent trees for each haplogroup, comparison of two test-takers could easily compare the two arrays to find the most recent common haplogroup (if any). This haplogroup can be used to query the haplogroup information (number of years back, geographic location, etc.) and it can be used to generate a list of profiles with that haplogroup (or, optionally, just those who have tested). If, on the rare occasion, there is sufficient information to document the generation where a mutation occurred, a link should be made from the category information to the WikiID of the relevant profile.
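The array comparison can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the two arrays are shortened toy paths with intermediate haplogroups left out.

```python
from typing import Optional


def most_recent_common(path_a: list, path_b: list) -> Optional[str]:
    """Walk two most-ancient-first arrays in step; return the last shared entry."""
    common = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common = a
    return common


# Toy arrays, most ancient first (intermediate branches omitted).
tester_1 = ["R", "R-M269", "R-L21", "R-DF13"]
tester_2 = ["R", "R-M269", "R-U106"]

print(most_recent_common(tester_1, tester_2))
```

Here the two testers diverge after R-M269, so that is the value printed. If the arrays differ at the very first entry the function returns `None`, which corresponds to the "(if any)" case above.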
by Living Anderson G2G6 Mach 7 (79.5k points)
I think my more recent answer is concordant with yours. Just explained differently. I should have made mine a comment to yours.
Having seen some of the discussions my comment has generated here, I do see the value in the use of categories for the time being. However, I don't think this is the best long term solution. Certainly category use would ideally be some sort of super-category that could propagate patrilineally.

I haven't had much time to keep up with these comments, but as soon as I get a couple projects off my plate I'll devote more time to my thoughts here.
+10 votes
Wow. Well, what this question clearly shows is we have a huge range of experience levels and understandings of Y-DNA. Indeed most people don't seem to realize that Y-DNA is something completely different than what AncestryDNA is testing. In contrast, some of the truly experienced folks here are bringing up issues that I hadn't really considered and may not understand, either.

So the question is: How do we harness the power of Y-DNA SNP testing evidence in a practical way here at WikiTree? Let me give a real example I'm very interested in:

In the [https://www.familytreedna.com/public/carroll?iframe=ycolorized Carroll FTDNA Surname Project] there's a subgroup with the ponderous title "R-L21, DF13, DF21, S971, Z3000, Z16270, 511=9, 425=0, 505=9, 441=12 -- Clan Colla cousins." The link above shows the STR results for the ~30 members of this group. Among this group are 11 or so testers who have also taken a BigY SNP discovery test, and that's the real data I'm interested in.

Among these testers I can demonstrate that those that are positive for one of the current terminal SNPs (namely, R-FTB30189) descend from [https://www.wikitree.com/wiki/Carroll-14757 John Carroll, Jr.] .

Now, can you tell that from anything you see on FTDNA? Of course not. First, the Earliest known ancestors are incorrect. Second, there's no way to document what the correct lineages are.

The latter problem is easily solved already on WikiTree. But how can I document that SNP R-FTB30189 equals the above-mentioned John? I could of course assert it, but I could also assert this John had travelled to Mars and back in 1770. I'd like to be able to put the data here and allow people to independently verify this. FTDNA certainly doesn't do anything like that, and they're not likely to anytime soon.

Anyway, this is my meager attempt to redirect this discussion somewhat back to my original question, or what drives it. Thoughts?

(Pardon my ignorance of Forum and wiki formatting please!)
by Greg Lamberson G2G6 Mach 1 (12.3k points)
edited by Greg Lamberson

But how can I document that SNP R-FTB30189 equals the above-mentioned John?

That's what your lineage work on WikiTree does: it shows that all the testers under that branch can be traced to that common ancestor. Now if they cannot, maybe one is from his uncle's family, then you could infer that the SNP indicates descent from his grandfather, and so on, continually adjusting until you had no outliers. I think the tool you would want for that would be the DNA descendants from John, or multiple runs of the Relative Spiderwebs app, and I have requested that Greg set it up to allow it to run for commonality among multiple users for just this reason.

I'm just not familiar enough with the apps you're talking about to respond intelligently, but I'm certainly interested!
+7 votes
Not sure if that would be practical in most cases. The last mutation known could be 1000 years ago or more. I think it's important to list the haplogroup as far back as a well known mutation (R1b > R-M269) as well as your terminal SNP. Just putting R-BY123455 or such doesn't mean much to 99% of people.
by Jesse Elliott G2G6 (7.3k points)
I would comment that most ancestors in a tree do not mean much to most people.  And you would not want to mark the EKA in the descendants either (usually). Which seems to be what this advocates (as a comparison).

More directly, I would not get into labeling haplogroup tree paths.  Only distinct haplogroups which are nodes in a haplogroup tree. Otherwise you get into an update battle trying to keep the haplogroup tree duplicated and current in the wikitree; which is not what you want to be in the business of doing.

There have been splits at major letter levels of the haplogroup tree of late. In fact, because they use an SNP to name a haplogroup, and splits can cause that SNP to get pushed to a lower haplogroup, there can be some update issues when even recording and using the haplogroup name.

IMHO, the goal here is to label the wikitree with the overlaid haplogroup tree.  And propagate up (and down) values to help others know what they should expect from a test they may buy.  Or to help identify where NPE conflicts likely exist. If I had a dollar for the number of times I have had to convince people that unequivocally a particular ancestor is a biological NPE (independent of what records say) due to DNA Match analysis: either Y, Mito or Auto/X segment analysis. ...

To help users understand propagated haplogroup test labels, you could always make the haplogroup name a URL link back to the public haplogroup tree in use.  Or a link to the eupedia, wikipedia or similar page on some major haplogroup on the path down to the specified haplogroup.  Or even a link to the FTDNA discover tool page (if FTDNA retains public availability and you use FTDNA's tree). The URL link may help unfamiliar users understand propagated haplogroup test labels.
Yes, Randy, I agree there would be value in somehow overlaying FTDNA or other haplotree data onto WikiTree. However, in practical terms, it's very rare for such an overlay to be useful since most haplogroups haven't reached a genealogical timeframe. That's of course what I'm interested in (and what I think most people will eventually desire), but how can that SNP data be mapped in the meantime in such a way as to not become obsolete or inapplicable as more testing occurs?
+5 votes
Would it be possible to solve by using Categories for SNPs?
by Jean Skar G2G6 Mach 2 (27.1k points)
That is what haplogroups are.
What has that got to do with using a category? I know a SNP defines the haplogroup.
I do not know how categories propagate and are handled in WikiTree. Categories to me usually imply grouping and classification. And that is what haplogroups are already doing. Hence my likely cryptic response.

I do know you do not want to use SNPs but haplogroups. See my answer in that regard.
Using categories has some utility but it's not a systematic SNP solution. I'm hoping WikiTree will develop a more holistic approach. The genealogical community as a whole is aching for a good way to understand, display and exploit SNP testing data.
+4 votes
I don't think Y-12 through Y-37 marker tests are quite concise enough to be worth it. I'm R-U106, Germanic, and at some of the lower marker levels, I find myself matching exactly R-L21 and R-DF27, Atlantic and Iberian Celtic respectively. Things improve at the 67 marker level where I've got a Big Y match listed I don't have at the 111 marker level. At the 111 marker level you find all 5 of the people who match the SNP given by FTDNA as my haplogroup, while none of the others in this group have done the Big Y. I'd say that must be getting awfully close to identification by STRs.

by Frank Blankenship G2G6 Pilot (130k points)
In my opinion you're badly muddling STR and SNP testing data and analysis. They are completely separate things, and while I don't fault your analysis per se I do strongly contend that one must properly and precisely document each type of testing before one can properly use the two datasets in combination.

It sounds like you understand it well, but I'll say for others' benefit: STR testing data has the limitation of continuously changing in each generation down to the actual test taker, without the possibility of easily differentiating in what generation a mutation occurred. With SNP testing, determining when a mutation occurred and exactly which SNPs any descendant will have is possible, whereas STR testing makes this very difficult without a huge number of test takers.
STRs are subject to continuous mutations; SNPs, not so much. Okay, 5 of us under the same SNP, 2 of the 5 on a subsidiary branch, and the only ones in this group to have done the Big Y, on the same page, are matches at the 111 STR marker level. This is 5 out of 23. Those are facts. If I've muddled data and analysis by pointing at the facts, I'd like to know how. The situation regardless is still as it was, the facts are thus: 5 of us under the same SNP who have done the Big Y are matches with 18 people who haven't done the Big Y. I've got only one exact match at that level, but the distance runs from exact to 9 steps. The distance between those who have done the Big Y runs from 3 steps to 6 steps. I'd reckon some of those 18 who haven't done the deeper testing would be listed under the same SNP if they did. Muddled data and analysis aside, wouldn't it be interesting to get a few more people among this group Big Y tested to see if they fell under the same SNP or not? Sheer coincidence? I doubt it.
+5 votes
Hi Greg,

You've gotten a lot of good answers here so far. My only 2 bits is that you can't rely on WT to add features here quickly. What's more, WT seems to be on a path to push as much new functionality as possible into Browser Extensions (or the WT Browser Extension specifically). Which means that whatever you want to do, or whichever way you want to go, the best course would be to use the existing WT functionality. I believe this is why others keep mentioning the use of Categories as one possible way of implementing a solution. From my limited knowledge, Randy Harr's answer and proposed solution sound like they could work.
by Eric Weddington G2G6 Pilot (520k points)
+3 votes
Please let me know five pairs of Big Y-700 test takers in WikiTree with a known shared patrilineal line ancestor (in WikiTree). Each in the pair should be distant cousins and have a matching SNP (consistent with their shared patriarch).

For example their shared patriarch was born in “1680” and Family Tree DNA estimates their shared SNP formed about 1680.

What if the shared ancestor is believed to have been born in 1680 but Family Tree DNA estimates their shared SNP formed about 1100?
by Peter Roberts G2G6 Pilot (706k points)
edited by Peter Roberts
Mutations do not necessarily happen every generation. It may take two or three generations (or more) before a SNP mutation occurs. On average such a mutation is thought to occur about every 2 generations, but that's not always the case.

FTDNA's time estimates are wide ranges based upon statistics. These estimates may encompass a period of nearly 500 years. That said, they're just estimates. A documented lineage trumps a statistical estimate.
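As a rough sketch of how a documented date might be set against an estimate range, something like the following could work. The class, the function name, and the year ranges are all assumptions made up for illustration; they are not FTDNA output or a WikiTree feature.

```python
from dataclasses import dataclass


@dataclass
class SnpEstimate:
    """A statistical age estimate for a SNP, as a range of calendar years."""
    snp: str        # hypothetical SNP label
    earliest: int   # earliest year in the estimated range
    latest: int     # latest year in the estimated range


def compare_with_documented(estimate, documented_year):
    """Describe where a documented birth year sits relative to the estimate."""
    if documented_year < estimate.earliest:
        gap = estimate.earliest - documented_year
        return f"documented year is {gap} years before the estimated range"
    if documented_year > estimate.latest:
        gap = documented_year - estimate.latest
        return f"documented year is {gap} years after the estimated range"
    return "documented year falls inside the estimated range"


# Hypothetical ranges, each roughly 500 years wide.
print(compare_with_documented(SnpEstimate("SNP-A", 1450, 1900), 1680))
print(compare_with_documented(SnpEstimate("SNP-B", 850, 1350), 1680))
```

In the first case the documented birth year narrows a wide range to a single point. In the second, the gap flags a case worth a closer look, for instance a shared SNP that formed well before the documented ancestor.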
Please let me know a documented lineage in WikiTree for two Big Y tested cousins whose most recent direct paternal line ancestor was born over 300 years ago -- and they both have their "terminal" haplogroup in WikiTree. Thanks.
@Peter - I know of a case where 2 distant cousins share a direct common ancestor from 300 years ago, and they found each other through Big Y SNP haplogroups. These lines are on WikiTree but there is no place to post terminal haplogroup data that I know of with the profiles.  Please advise.

Hello Leake,

Please place a "YDNA 700 Earliest Known Ancestor" sticker on the earliest known ancestor's profile with the haplogroup of the earliest known ancestor. If a category for that haplogroup does not yet exist, the profile will be placed into a DNA Maintenance category, and a DNA Project member can create the appropriate haplogroup category (or categories).

Instructions can be found at the following page: https://www.wikitree.com/wiki/Template:YDNA_700_Earliest_Known_Ancestor

But in short, if the earliest known ancestor's haplogroup was R-M222, you would place the following text on a line under the == Biography == heading: 

{{YDNA 700 Earliest Known Ancestor|R-M222}}

A couple of us were just asked to work on the DNA Categorization Team, so to avoid being lost in the existing backlog, please send me a direct message with the EKA's profile ID and I will help you with that!

...