What do you think of our DNA confirmation instructions and possible automation?

Hi WikiTreers,

This message is aimed especially at DNA Project members and others with extensive experience with both DNA and WikiTree. I'd also be interested to hear input from members who have recently tried to follow our DNA Confirmation instructions.

I have been working on making the instructions simpler. My recent changes to Help:DNA_Confirmation and related pages are not final and need to be discussed by the community here.

The goal, of course, is to make the instructions easier to follow so that it's easier for members to do confirmation correctly. But I also think we are laying a foundation for future automation. At some point we could replace the need to manually mark relationships as Confirmed with DNA and manually create the source citations to justify them. It could all be done automatically based on information you enter about your matches.

For close cousin matches between WikiTree members I think it could be pretty simple. We already store information about the DNA tests members have taken. We would probably just need the member to select one of their tests and a test from another member, then enter the predicted relationship given to them by the test company or comparison site. If the members' relationship on WikiTree corresponds to the DNA-predicted relationship (and it's third cousins or closer) we could mark the appropriate relationships as DNA-confirmed, right? We could do something similar with yDNA and maybe mtDNA.
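A minimal Python sketch of the one-to-one check described above. Everything here is an assumption for illustration, not WikiTree's actual design: the function and class names, encoding cousins as a plain degree number (ignoring "removed" generations), and accepting the company's prediction as a range rather than a single relationship.

```python
from dataclasses import dataclass

# Hypothetical encoding: 1 = first cousin, 2 = second cousin, and so on.
MAX_ONE_TO_ONE_DEGREE = 3  # "third cousins or closer", as proposed in the post


@dataclass(frozen=True)
class PredictedRange:
    """Relationship range reported by a test company or comparison site."""
    closest: int       # e.g. 2 for "2nd cousin"
    most_distant: int  # e.g. 4 for "4th cousin"


def can_auto_confirm(tree_degree: int, predicted: PredictedRange) -> bool:
    """Return True if the tree relationship is close enough for one-to-one
    confirmation and falls inside the DNA-predicted range."""
    if tree_degree > MAX_ONE_TO_ONE_DEGREE:
        return False  # more distant matches would need triangulation
    return predicted.closest <= tree_degree <= predicted.most_distant
```

For example, `can_auto_confirm(3, PredictedRange(2, 4))` accepts third cousins whose prediction was "2nd to 4th cousin", while a fourth-cousin tree relationship is rejected regardless of the prediction.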

It would get more complicated if the match is not on WikiTree and/or we want to allow for triangulation. And I do think we want to allow for both. Otherwise the utility of the new system would be limited. If you can use the system to track all your significant DNA matches, I think more people would. I would.

But before I get too far into this, I want to get feedback on the changes to the DNA Confirmation page. What do you think of it? Any questions or complaints?

There are two things I would rather not rehash here.

First, the use of the word "confirmed." Like any word, its meaning can be debated. Its meaning on WikiTree is explained on the Confirmed with DNA page that's linked from the icon.

Second, whether we should require public verification. In the past, we have considered and sometimes specified that DNA-confirmed conclusions should be verifiable on Y Search, Mitosearch, GEDmatch, etc. But I believe that building our requirements on top of sites like these isn't a solid foundation. We can and will work on building our relationships with other tools to facilitate comparisons (and there has been some recent progress on this) but our policies and tools shouldn't depend entirely on any one of them. Therefore, some sources that justify some DNA-confirmed conclusions will not be publicly verifiable. Educated genealogists will learn to put less stock into DNA sources that can't be verified, like with other sources.

Back to our confirmation instructions. If the DNA Confirmation page is agreeable, do you have thoughts on the Triangulation page? That has not been edited as significantly, but its content was controversial for a while. I don't know where people are at with it these days. (And I need to emphasize, I am not an experienced genetic genealogist. I haven't tried to mark any of my own ancestors as confirmed using triangulation.)

Triangulation is inevitably more complicated than one-to-one comparison, but we need to keep our instructions as simple as possible. What conclusions can users confidently draw from what they're getting from AncestryDNA, MyHeritage, etc? If we can boil things down to the barest essentials that we are comfortable with as a community, we can think about automation.

Another tangent: I am still unsure if we might want to store start and stop points for segment matches. This would enable the DNA conclusion to be automated, rather than entered. I think. But maybe we'd just want to depend on the conclusion given to the user by the testing company or third party. It's tempting to do it ourselves, but complicated. (We actually came close to starting to track segments last year, but we pulled back because of the GDPR and the privacy implications. Our hope was to create chromosome maps for ancestors. But we could not make these segment matches public in any way. Maintaining them privately is less beneficial to WikiTree's mission, so I have been inclined to leave it to third parties. But it's still tempting to think about it.)
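To make the idea of stored start and stop points concrete, here is a small Python sketch of how two segment matches could be compared. The `Segment` record and `overlap` function are hypothetical names, not an existing WikiTree feature, and the positions are assumed to be base-pair coordinates on the same reference build.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    chromosome: str
    start: int  # base-pair position where the match starts
    end: int    # base-pair position where the match ends


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by both segments, or None if they
    are on different chromosomes or do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)
```

Note that this only gives the overlap in base pairs. Turning that into centimorgans requires a genetic map, which is part of why relying on the testing company's or third party's conclusion is simpler.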

We should probably consider automating simple one-to-one confirmation first. Then expand on that. I keep getting ahead of myself.

The question for now is whether there is input on the help pages. What are your thoughts?

Thanks!

Onward and upward,

Chris

asked Mar 16, 2019 in The Tree House by Chris Whitten G2G Astronaut (1.5m points)

well I read through the page again and if I'm following correctly I can mark my father as confirmed by DNA based on my brother's Y DNA test, so I did that. Previously I had thought that could only be posted on my brother's profile as the tester. My Mom and I are tested au DNA but my brother has not tested his au DNA. The visual does help.

The triangulation instructions seem fine to me, it does take some time to do those triangulation statements. I've only added to one line where there are large segments showing. But I was able to do a statement for my Mom's relationship to these 2 cousins also, she has the same as my relationship to these 2 cousins, but picks up an additional chromosome for her separate statement. To me it's important to be able to identify the DNA associated with a particular surname.

On another note, there's been a lot of discussion about reducing values, and to chime in there, here's an example, removing names, but all are wikitree members..... 7th cousins so of course tiny snippets shared with my kit T474191:

A915220, 8, 139,185,487, 140,339,538, 3.6, 309 and A042389 , 8, 139,373,435, 140,513,557, 3.6, 343.

A915220, 18, 6,631,427, 7,845,868, 6.0, 330 and A371231, 18, 6,959,761, 8,183,657, 5.8, 297

commented Apr 5, 2019 by Sherrie Mitchell G2G6 Mach 5 (52.2k points)
edited Apr 5, 2019 by Sherrie Mitchell

1. "I would love to confirm a lot of my cousins/direct line but I feel it is very time consuming." What is time consuming is the format that must be used. As long as all the elements are there, such as with any other source citation, what Wikitree problem could possibly be caused by the style used?

2. As far as adding GEDmatch account numbers? That requirement begs lawsuits from people not even born yet.

3. The testing and comparison companies 'should' be producing source citations, just like familysearch.org does. An example: FTDNA match at 7 cM or greater between 'this person' and 'that person,' with 'this many' shared cM and a predicted relationship of 'this.' Then just the 'this person' and 'that person' would be changed to the Wikitree identification. Another example: GEDmatch between 'this person' and 'that person' at 7 cM and greater, with 'this many' shared cM and a predicted length to MRCA at 'this number.'
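A rough Python sketch of what such a generated citation could look like. The function name and parameters are hypothetical, and the wording simply follows the FTDNA example in point 3. It is not an official WikiTree or FTDNA format.

```python
def format_match_citation(
    company: str,
    person_a: str,
    person_b: str,
    shared_cm: float,
    predicted_relationship: str,
    min_segment_cm: float = 7.0,  # threshold taken from the example wording
) -> str:
    """Fill in the citation template from point 3 with match details."""
    return (
        f"{company} match at {min_segment_cm:g} cM or greater between "
        f"{person_a} and {person_b}, with {shared_cm:g} shared cM "
        f"and a predicted relationship of {predicted_relationship}."
    )
```

The two person arguments could be a WikiTree link such as `[[Example-1|A. Example]]` (a made-up ID) or initials for a match who is not a member.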

ok...back to my own.

B.

commented Feb 13, 2020 by Living Britain G2G6 Mach 2 (28.7k points)

I'll have to give a private answer, so as not to encourage those who are character-challenged.

After (after) I posted here, I got an email telling of how 'my location' via my IP address must be given to GEDmatch so they can verify that I'm not a part of the EU and their new privacy law. Until I do that, my GEDmatch information has been ported-out of their database and will not be available.

All they really had to do was show me a spot where I could opt-in for moving my data to Verogen.

So the public answer to your question is this: (a paraphrase) Don't give the Chr #, start, and end locations on Wikitree. I'll call that the 'folder contents.' Just give the Gedmatch #, or, the key to the whole folder.

There are all kinds of responses on the internet following a search for "Should you give people your GEDmatch #."

commented Feb 14, 2020 by anonymous

Sherrie Mitchell, you wrote:

On another note, there's been a lot of discussion about reducing values, and to chime in there, here's an example, removing names, but all are wikitree members..... 7th cousins so of course tiny snippets shared with my kit T474191:

A915220, 8, 139,185,487, 140,339,538, 3.6, 309 and A042389 , 8, 139,373,435, 140,513,557, 3.6, 343.

A915220, 18, 6,631,427, 7,845,868, 6.0, 330 and A371231, 18, 6,959,761, 8,183,657, 5.8, 297

A 3.6 cM segment isn't a valid size for DNA triangulation to a 7th cousin.

That's the problem when the tools are provided but users apply them wrongly. As per Itsik Pe'er's presentation, which I've linked here several times before (see the source at the bottom of this post), 3 - 4 centiMorgan triangulating segments (like in your case) date back as far as 300 - 799 CE.

That's 1220 - 1720 years ago, so a lot more generations. I hope you don't mind me pointing this out. It's not meant to blame or target you; it's simply a good example of how quickly DNA confirmation goes wrong with GEDmatch, which gives you the freedom to do anything with its parameters and results.

While I agree with some of Chris's views, I don't think that giving users the autonomy to enter any values (like in Sherrie's example above) helps WikiTree in any way to become a reputable website for DNA confirmation of relationships that require DNA triangulation.

I can guarantee that the majority of DNA confirmations requiring DNA triangulation entered manually by users will be wrong as they don't have enough knowledge about genetic genealogy.
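As an illustration of the kind of guard an entry form could apply, here is a minimal Python sketch that flags user-entered segments below a minimum size. The 7 cM default is an assumption based on the figure discussed elsewhere in this thread, and the function name is hypothetical.

```python
MIN_TRIANGULATION_CM = 7.0  # assumption: the minimum discussed in this thread


def undersized_segments(segments_cm, minimum_cm=MIN_TRIANGULATION_CM):
    """Return the segment sizes (in cM) that fall below the minimum."""
    return [cm for cm in segments_cm if cm < minimum_cm]


# Segment sizes quoted in the example above
quoted = [3.6, 3.6, 6.0, 5.8]
rejected = undersized_segments(quoted)
```

With the four sizes quoted above, every segment ends up in `rejected`, so a form using this check would refuse the triangulation before it is saved.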

Source:

Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet. 2014 Aug;46(8):818-25. doi: 10.1038/ng.3021. Epub 2014 Jun 29. PMID: 24974849.

commented Apr 12, 2021 by Andreas West G2G6 Mach 7 (76.1k points)

34 Answers

Best answer

I think visual diagrams would help some readers.
- For example showing a third cousin relation and which ancestors get marked as confirmed.
- Also, on the how to mark as confirmed instructions, snippets of the page would be helpful.
Why did we go back to putting the citation in the Sources section? I thought it was moved out into a DNA section several months ago. Now we are putting it back, possibly in addition to a DNA section? I'd prefer to see all the DNA source information in a single location, not spread across two.
The triangulation page dropped the requirement (not clearly defined before) that no two of the three triangulation members be close cousins. Was this an intentional relaxation? I believe you wouldn't want two siblings and a fifth cousin to be your group. Even two first cousins and a fifth cousin is probably not sufficient.
- Perhaps the criteria should be dynamic, e.g., since fifth cousin is two away from third (point that triangulation is needed), two away from sibling would be second cousin, and thus a fifth cousin with two second cousins would work?

I recommend the following addition to the Triangulation instructions (either as a footnote since an advanced topic, or at end of the section as shown):

Which Relationships to Mark as Confirmed

Using WikiTree's Relationship Finder, find the most recent common ancestral couple that they all share.

Each child-to-parent relationship back to but not including the ancestral couple can be marked as Confirmed with DNA if the requirements are met and a proper source citation is included in each child's profile.

Note: in the case of a half-cousin relationship, where the most recent common ancestor is not a couple but just the father or just the mother of half-siblings, the final ancestor can also be marked as Confirmed with DNA.

I already added "(except in the case of half-cousins that only share a mother or a father of half-siblings)" to the MRCA page.
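The proposed rule can be sketched in a few lines of Python. This is only an illustration of the wording suggested above for triangulation; the function name and the profile IDs in the usage note are made up.

```python
def relationships_to_mark(lineage, shared_ancestor_is_couple=True):
    """List the (child, parent) links that could be marked Confirmed with DNA.

    lineage: profile IDs from the tester up to a most recent common
    ancestor, inclusive, one generation per entry.
    """
    pairs = list(zip(lineage, lineage[1:]))
    if shared_ancestor_is_couple:
        # Stop before the link to the ancestral couple itself.
        pairs = pairs[:-1]
    # Half-cousin case: the single shared ancestor is included.
    return pairs
```

For a lineage like `["Me-1", "Parent-1", "Grandparent-1", "Ancestor-1"]`, the default call returns the first two links only, while `shared_ancestor_is_couple=False` also returns the link from `Grandparent-1` to `Ancestor-1`.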

answered Mar 16, 2019 by William Foster G2G6 Pilot (122k points)
selected Dec 20, 2019 by Mariha Henle

Answer 1 · 2019-03-16T19:51:40+0000

The new instructions seem reasonably easy to follow, but I do have a few comments:

1. Under "Which relationships to mark as confirmed" the first cousin example only seems to mention one side of the tree. I assume it is also still acceptable to confirm the relationship of the first cousin's parent (whichever one is appropriate) to both their parents.

2. Under "Source requirements if your match is on WikiTree" it then says "If your match has a WikiTree profile" - shouldn't that be "If your match is a member of WikiTree"? Pre-GDPR I created profiles for several of my matches, which are now unlisted, but I assume this section should not apply to them. If this line is changed then the equivalent change should also be made to the first line of the following section "Source requirements if your match is not on WikiTree"

3. The AncestryDNA and MyHeritage examples don't seem to follow the new guidelines. The guidelines say that you should include the initials (or another anonymous identifier) as well as the relationship, but these 2 examples just state the relationship.

Answer 2 · 2019-03-16T20:33:22+0000

Chris,

I think this goes a long way towards making the process more user friendly. I'm looking forward to more automation too.

My comments:

In section (3) -- you might add "Family Tree DNA Family Finder" to the list of autosomal DNA tests, since it is pretty popular.

In section (4) -- I've seen the DNA testing company say "2nd to 4th cousin" for a 3rd cousin relationship. This might be confusing to some.

It might also be helpful to mention the DNA Ancestor Confirmation Aid as a way for people to see what has already been confirmed.

The "How to Add Source Citations" section is fine for the first one, but you might point out that additional confirmations should be added when they confirm a different MRCA, even if there is already one confirming the same parent. Incidentally, I find that sources start to become somewhat cumbersome as more confirmations get added to a tester's profile. Perhaps a way can be found to simplify this.

John

Answer 3 · 2019-03-16T23:09:53+0000

Chris, this may be too wordy, but here goes:

FTDNA Family Finder Triangulation

When you find that two of your documented distant cousins are matches on Family Finder, they may be candidates for distant cousin triangulation. Family Finder's Chromosome Browser can help you determine on which chromosome all of you match each other and whether the matching DNA meets WikiTree's requirements for DNA confirmation. Here's how.

1. On your FTDNA Family Finder home page, click on the "Chromosome Browser" and select the two distant cousins with whom you want to compare DNA. Click on the "Compare" button to see the individual chromosomes and look for overlaps with both cousins on the same chromosome. If you move your cursor over a colored overlapping segment, a pop-up box will give you the details of the overlap. Note: screenshots can be helpful for capturing the details.
2. Once you have found and verified that your selected cousins match you on the same chromosome and the overlap with each other is enough to meet WikiTree's requirements for confirmation, continue on to step 3.
3. Contact your selected cousins and ask them if they would be willing to help with the triangulation and, if so:
   a. Send them a copy of what you see on the Chromosome Browser about the matches, including the details.
   b. Ask each of them to do step 1 on their own FTDNA account, selecting you and the other cousin, and to send you a copy of the details of what they see on the Chromosome Browser.
4. Compare the details from their results with yours and determine the amount of DNA that the three of you have in common on the designated chromosome.
5. If the details from at least one of the selected cousins confirm that the three of you have a sufficient amount of DNA in common, you can use this triangulated group for confirmation.
6. Share your findings with all the selected and responding cousins and mark parental relationships as confirmed as stated under "Which Relationships to Mark as Confirmed."

Here is an example DNA confirmation statement:

* Maternal relationship is confirmed by a triangulated group on FTDNA consisting of [[Kingman-271|John Kingman]], [[Brooks-4984|Denny Brooks]], and JG, who share a 13.73 cM segment on chromosome 15. These matches have been independently verified by John Kingman and Denny Brooks via the Family Finder Chromosome Browser. John and Denny are 4C1R; John and JG are 3C1R; JG and Denny are 4C2R. The most-recent common ancestors shared by all three are [[Brooks-4989|Joseph Brooks]] and [[Basinger-161| Dorothy Basinger]].

commented Mar 23, 2019 by John Kingman G2G6 Mach 6 (63.4k points)

Answer 4 · 2019-03-17T00:13:56+0000

Chris,

Don't look for me to be confirming that the woman I loved and knew as my mother for 55 years is genetically my mother.

I am a simple minded creature, not an astronaut.

I am mechanical minded. I can take a computer apart and put it back together, but I cannot write a program to run on that computer. I can change the brake pads on my car, but I cannot explain in words how those brakes stop that car. I can draft (draw) sugar cane equipment parts or bridge trusses, but I cannot write a thesis on how physics or whatever it is, comes into play to hold that sugar cane harvester together, or how those trusses hold up that bridge.

That being said, I cannot write out a source to prove that my mother and I are related by DNA.

I understand the need for sources, but geez KISS.

When I uploaded my Ancestry.com raw DNA data, it automatically put a statement on my profile and the profiles of all my ancestors stating that my DNA connects me to them. I love to look at their profiles and know that I actually do belong in that family.

If I click on that box that says DNA confirmed, why can't it put a source that says {{Ancestry.com DNA confirmed}}?

Vicki

commented Apr 6, 2019 by Vicki Chicola G2G Crew (500 points)

Answer 5 · 2019-03-17T01:00:30+0000

I think the DNA confirmation help page is much improved. It has already gotten significantly better over the past year, and is in a pretty good state right now. I agree with the use of "confirmed with DNA", and think a reasonable compromise has been found on public verification. I do wish we could add DNA tests for people who are not ourselves - I know there is a GDPR issue here but given that there's no actual DNA data shared it seems pretty harmless, especially since any living people will be unlisted.

The triangulation page has also improved, but I still think there are a couple key items there that make it unclear exactly what wikitree's requirements are.

1. The minimum segment size required for a valid triangulation was 7 cM, then it was updated to 12 cM, and now it's back to 7. What is it going to be - and why does it keep changing?
2. It's also still somewhat ambiguous as to exactly what relationships allow triangulation. The first iteration of the page vaguely said "three cousins with MRCA", and the more recent page said these must all be > 3C to each other. The current page has removed this wording, which leads to more ambiguity. I constantly see questions on G2G about whether triangulations are valid, and a lot of them (including several of my own) arise from this confusion. I think arriving at a precise formula which will enable anyone to clearly assess whether three relationships are "triangulatable" is needed for this to be useful. The ">3C" rule was a start, but it was somewhat unclear as well; I received conflicting advice on whether a triangulation between two 3Cs and me (being a 3C1R) was valid: taking the previous rule at face value, it wasn't, but that doesn't seem to really be in the spirit of what DNA confirmation is meant to achieve. I don't know exactly what the solution is, but I'm thinking something along the lines of "all members of triangulation must be within x% degrees of your own relationship to the MRCA" - which would allow for much needed flexibility in reasonably close scenarios but also extend to those triangulations with more distant cousins.

Anyway, I'd like to see more discussion about #2 - it's a pretty significant shortcoming in Wikitree's triangulation guidelines in my opinion.

Answer 6 · 2019-03-17T01:27:39+0000

I like the explanations - thank you for working on those! I too was surprised to see it mentioned that a DNA section was optional. I think keeping them together makes it easier for the DNA team to check them, easier for cousins to jump quickly to that section, and less likely that a DNA source will be inadvertently damaged while someone is editing the profile (like I see with regular sources and Ref tags).

I think it would be good to point out that Relationship Finder is a great way to see exactly which ancestors you can mark. The final step for each cousin can be marked for both parents (son or daughter of MRCA).

And a question - If the cousin isn't on WikiTree can't we still use an ancestor above them? Such as Mindy Silva and the daughter/granddaughter of (deceased person)?

Otherwise I love it! I'm all for making it easier for people. And an automated system would do wonders!

On a side note, as the GDPR interfered with some of our previous activities - what if you mapped the chromosomes with the ancestors names, such as you can do on dnapainter? Instead of using the cousins you match, put the MRCA in those segments. Just a thought.

Answer 7 · 2019-03-17T03:19:32+0000

Nice work, although I expect that many members will always find DNA confirmation to be hopelessly confusing.

One comment that seems nitpicky but that I think might be more consequential:

Can we please revise the wording in "Maternal relationship is confirmed by a {{company name}} test match between [[Member-1]] and [[Member-2]]" to replace "maternal relationship" with words that more explicitly describe what we are talking about? That is, use words like "mother" and "father" and give the names of the people involved. So, instead of "maternal relationship," say "Relationship of [[Child Name]] to her mother [[Mother's Name]]".

Getting personal about whose relationship is confirmed would make the text friendlier, and use of names instead of the antiseptic words "maternal" and "paternal" would make it easier for members and other readers to understand what relationships are being discussed. Additionally, if for some unusual reason the identification of an ancestor/relative gets changed it would be easier to recognize that the DNA "confirmation" needs to be reexamined.

Also, looking at some of my DNA confirmations, I am thinking that it might be a good idea to create guidance on streamlined ways to document the multiple matches that confirm a particular relative. For example, my father's relationship to his mother is confirmed by four sets of second-cousin-level DNA matches between two of his children and two grandchildren of his mother's sister (plus additional matches on more distant relationships). I think it is worthwhile to document the matches in case questions arise in the future (for example, say that ES and DW match on 252 cM over 12 segments, NS and DW match on 297 cM over 13 segments, ES and MP match on 254 cM over 11 segments, and NS and MP match on 267 cM over 14 segments), but that seems like it may be too much information to shove into the source citation on my father's profile -- and to repeat on the profiles of other family members whose relationships these matches also confirm. I'd like to encourage people to put details like these on a free-space page that can be cited from individual profiles, and suggest some formats for the free-space page(s) and citations to those pages.

Answer 8 · 2019-03-17T14:11:48+0000

The “DNA Confirmation” and “Confirmed with DNA” pages are both much improved, but could be improved further. The “Triangulation” page still has a way to go.

I’m still not sure if the aim of these pages is to educate or to inform. They do a pretty good job of the latter (this is wikitree, and this is how we do things here) but not much of the former (pros and cons of DNA, why we stop at G3, where it’s good evidence, where it can fall down, why triangulation is necessary past G3).

Confirmed with DNA:

“DNA is not absolute proof.” A true statement but perhaps better phrased as what it is rather than what it isn’t. "A DNA match is just one piece of evidence to be considered alongside of all the other evidence you have gathered."

The section on Genealogy supported but not confirmed by DNA is a little worrying. It could be read to suggest that it is OK to insert imaginary links and mark them as confident just because ancestry/ftDNA/gedmatch or whoever says we’re related in a particular way. I’m sure that’s not what is intended, but not sure what this section actually adds.

The piece that is in that section about sources could also lead to some issues for Data Doctors. As I understand the system at the moment if a relationship is tagged as DNA confirmed and the bio section does not contain the phrase “maternal relationship” or “paternal relationship” it throws up an error. Same works vice versa. You could get a lot of errors generated by following this advice on sources. Absolutely agree with putting in DNA observations in a separate section but worth checking with Ales first on how his system is designed.

DNA Confirmation:

The content is fine but the first section is just begging to be formatted as a flow chart. It would make it so much easier to follow (and hence more likely to be read and used).

I would like to see more emphasis being placed on matches who are not on wikitree. The section is there, but a bit buried. I think that I am far from unique in that only about 10% of my known matches are with people who are on wikitree. The other 90% have no interest in wikitree and are never going to have their own profile pages here.

A point for you to ponder – what if neither match is on wikitree? May seem a little obscure, but I am busy trying to get other people who run one-name studies to join in at wikitree. Many of them have atDNA studies that could add loads of information to wikitree. The donors rarely want to be bothered by yet another website, but are quite happy for their trees to be verified and published by someone else.

Triangulation:

This really needs a piece on why triangulation is important and what makes for a good triangulation and what makes a poor one. In teaching this to groups I’ve found it a really difficult concept to get over whatever words I use, but showing a picture of a properly balanced three legged stool next to one with uneven legs tends to be a light bulb moment for most everyone.

A couple of pictures would really help on this page.

I echo what John Trotter says about the actual limit. It does seem to have bounced around quite a bit. If 7cM is the base limit taken for the commercial companies declaring a valid match of any sort, then it logically becomes the base limit for declaring a triangulation as well. It’s either valid for both or valid for neither.

The page is now silent about the MRCA pair and how to mark them. Previously the advice was to mark them as being confident but not confirmed. They should be recognized somehow and that seems to me to be good logic.

The page is also silent about how (if) to include triangulations where the donor is not present on wikitree. I don’t think we should ignore this as there are many of them now and likely to be many more of them in the future.

On triangulation using just ftDNA, John Kingman makes a very valid point. With co-operation it is quite possible to do triangulations within ftDNA. I know this reflects into the argument about independent verification, but it is not something that should be ignored.

Automation:

It depends so much on what degree of automation you opt for.

A system in which a user identifies two profiles and has a form fill to record the detail of the matches would be great. The background system could then add all the right DNA tags and the right sourcing in one go.

I don’t see why the same system could not also work for triangulations – it then becomes easier to build in limitations such as “The MRCA identified is a G7 grandparent, using first cousins for this triangulation is insufficient”.

Some form of AI that picked up potential matches from gedmatch and plugged them into wikitree has the potential for being a disaster.

Answer 9 · 2019-03-17T15:28:51+0000

Looks good. Kay Wilson helped me out when I first started and I see it evolved a bit since then. I've updated almost all of my DNA confirmations since the change. I left the ones where the matches were people already registered on Wikitree.

You guys do great work. =D

Though, I was thinking....Could we add that "proven" could be backed up with research and other things? Like say I put in my first cousin. What proof would you need for something like that or your parents' cousins if you know them and they tested. Or even second cousins who tested?

Just random questions, Chris. Keep up the good work, guys!

Answer 10 · 2019-03-17T16:00:15+0000

First of all, THANK YOU for your efforts. I consider myself computer- and DNA-literate, but I still have struggled with each confirmation I've done in my tree. The instructions are much clearer now and I like the flowchart feel to them.

Automation of straightforward, close matches would be a great thing, and would probably encourage people to go further, taking the effort to put just a few more pieces in place to confirm slightly more distant matches.

Answer 11 · 2019-03-17T16:17:59+0000

Hi Chris,

Thanks to you and any others that are involved in working to improve the DNA confirmation instructions. I really like the step-by-step approach. Many WikiTreers request step-by-step instructions when they ask questions on G2G, and this will be beneficial to them and the WikiTreers that try to help answer their questions.

On step one of the DNA Confirmation instructions, should the question "If a DNA testing company has provided you with a match, continue" be expanded to include a match from a "non-testing" 3rd party comparison site like GEDmatch? I believe that updating the question verbiage would help to clarify that we should also "continue" with a match from GEDmatch (not just matches from testing companies). There is already a later section and source citation confirmation statement example for using a one-to-one DNA match at GEDmatch. Also the verbiage/link in the GEDmatch section incorrectly points to MyHeritageDNA instead of GEDmatch/Genesis.

For step four of the DNA Confirmation instructions, where do matches that are 2nd cousins, twice removed fit in? Is that a one-to-one match confirmation scenario or would that relationship distance require triangulation for confirmation?

One thing that's been confusing for me (and others) is whether or not we should be confirming up to both individuals of an MRCA couple for 3C or closer one-to-one matches. My understanding is that at some point in the past (maybe prior to GDPR implementation), the confirmation instructions did provide guidance to confirm to both MRCAs. But more recently the instructions appeared to have been silent on that particular aspect. The currently proposed updates appear to include guidance to confirm to both MRCAs: "All the relationships that connect you — up to and including your most-recent common ancestor or ancestral couple — can be marked as confirmed." There have been some G2G discussions on the topic, and I just want to make sure that I understand the intended guidance going forward.

For the triangulation instructions, previously there was a requirement that the members of a triangulation group all be distant cousins (>3C) to each other. If that's still the intent, I'd suggest specifying that in the related verbiage on the page, such as:

in the triangulation requirements section: "Three or more distant cousins (3rd cousins, once removed or more distant) need to all match each other on a single segment of DNA"
in the GEDmatch triangulation section: "you share (i.e. overlap) with two or more people who are all distant cousins (3rd cousins, once removed or more distant) to each other"

In both of the help pages, when documenting non-WikiTree members in a DNA confirmation source citation, it is suggested to use initials or "another anonymous identifier". My impression is that a GEDmatch ID or testing company kit#/ID would not be considered as an anonymous identifier, but would be considered private info that shouldn't be disclosed for non-members. Is this correct?

I agree that automation of the one-to-one close cousin matches between WikiTree members should be pursued to attempt to automatically mark the confirmation indicators and generate appropriate source citations.

Thanks again for working to improve the DNA confirmation instructions!

Answer 12 · 2019-03-17T16:19:47+0000

I’m happy to see that there’s progress again on the DNA confirmation process. I think there will always be a conflict between “keeping it simple” and keeping it accurate/precise/valid/sourced/etc. Confirmation of close relationships can be made simple, but it’s necessarily somewhat complicated for distant relationships just because of the many issues involved. I don’t know how many people click the DNA confirmed button just because they’ve had a DNA test, without comparing it to anyone else, but I suspect it’s common.

Regarding Privacy:

Using initials for a DNA tester was a great advance to increase the possibility of DNA confirmation on WT. The current examples don’t include initials though which is a leftover from previous guidance. As another option, what about using the WT ID of unlisted testers? Nothing more can be seen than what WT thinks is allowable to comply with privacy regulations. The advantage is that it allows the WT user with access to that profile to quickly see relationships and relationship trails. Over time, I start to forget who the initials represent. Would that be acceptable as “another anonymous identifier”?
“…whether we should require public verification.” Short of posting screenshots, I believe it’s literally impossible on sites other than GEDmatch for others to verify a DNA match, so I completely agree with “some sources that justify some DNA-confirmed conclusions will not be publicly verifiable.”
What is the privacy issue in specifying start/stop positions?
For GEDmatch IDs, could we use a pseudonymized version for non-WT members such as AB####567? The actual ID can’t be guessed, but it may be possible to decode on the GEDmatch site where the user has given explicit permission to have their GEDmatch ID visible.
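As a rough sketch of the pseudonymized ID idea above, the masking could look something like this. The function name `mask_kit_id`, the choice to keep two leading and three trailing characters, and the sample ID are all assumptions made for illustration; none of this is an existing WikiTree or GEDmatch feature.

```python
def mask_kit_id(kit_id: str, keep_prefix: int = 2, keep_suffix: int = 3) -> str:
    """Replace the middle characters of a kit ID with '#' characters."""
    hidden = len(kit_id) - keep_prefix - keep_suffix
    if hidden <= 0:
        raise ValueError("kit ID is too short to mask safely")
    suffix = kit_id[len(kit_id) - keep_suffix:]
    return kit_id[:keep_prefix] + "#" * hidden + suffix


# "AB1234567" is a made-up ID used only for illustration.
print(mask_kit_id("AB1234567"))  # AB####567
```

Note that a mask like this only hides the ID from casual readers. Whether it is hard enough to guess would depend on how many characters are hidden and how IDs are assigned.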

Regarding triangulation:

I’d like to associate myself with John Trotter’s comments regarding triangulation. It was silly that it was literally impossible to confirm a 3C1R/3C1R/3C triangulation with the previous guidelines. However, the current draft that says “three or more cousins need to all match each other” is insufficient. It would allow for triangulation using a first cousin. I like the analogy that the three matches should be like three legs of a stool that support the seat, the MRCAs. The legs need to come together at about the same level. Perhaps the branches should be within one or two generations of the MRCAs?
The guidance that requires describing the relationship between all of the testers and with the MRCAs seems unnecessarily wordy. If two testers are third cousins, it’s not also necessary to describe the MRCAs as 2nd great grandparents, and vice versa. It gets more complicated with triangulated matches, but perhaps there’s a compromise somewhere. I’ve started using abbreviations like 3GGP, but that might not be universally understood.
John Kingman’s comment that it is possible to triangulate a match on FTDNA is valid, but the guidance does need to be explicit regarding the cooperation of one of the other testers.
The 23andMe language is insufficient. Just seeing that three testers overlap on the chromosome browser doesn’t mean they triangulate. The third person either needs to appear in the “Relative in common table” as “Shared DNA: Yes” or the ‘compared’ person in the chromosome browser needs to be swapped with one of the others to make sure they all match each other.
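The point that every tester must match every other tester (not just the first one) can be sketched as a pairwise check. This is only an illustration: the function name, the tester labels, and the way matches are stored are assumptions, not how any testing company exposes its data.

```python
from itertools import combinations


def all_match_each_other(testers, matches):
    """Return True only if every pair of testers appears in `matches`.

    `matches` is a set of frozensets, each holding two tester labels
    that are known to share DNA on the segment in question.
    """
    return all(frozenset(pair) in matches for pair in combinations(testers, 2))


testers = ["A", "B", "C"]

# A matches B and A matches C, but B vs. C has not been checked.
overlap_only = {frozenset(("A", "B")), frozenset(("A", "C"))}
print(all_match_each_other(testers, overlap_only))  # False

# Once B vs. C is also confirmed, the group triangulates.
full_group = overlap_only | {frozenset(("B", "C"))}
print(all_match_each_other(testers, full_group))  # True
```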

In the previous version, there was a statement requiring a sourced lineage for each tester. I don’t see it now. I thought that was important as a means to avoid just relying on someone’s family tree without validation. Each lineage should be developed on WT, at least to the most recent non-private ancestor.

For third cousin and closer matches, it’s necessary to state “paternal and/or maternal” or add a 2nd confirmation statement to include both MRCAs. Perhaps the term parental could be allowed to mean both parents.

Answer 13 · 2019-03-17T20:02:28+0000

Great job Chris! Simpler is definitely better. As someone who knows just enough about DNA to be dangerous, I think this goes a long way to making "the rules" easier to understand.

As a former instructor with the military, I realized that while some people can easily understand the written word and pick things up the first time they see it, many others can benefit from simplified or explanatory information. You may have to explain something two or three different ways before the lightbulb comes on.

I offer the following suggestions (in no particular order) to make things easier for those that don't have a strong understanding of DNA (which is a lot of folks).

Provide up front why 3rd cousin is the cutoff for non-triangulated DNA matching and equivalents to that 3rd cousin (i.e., 2nd cousin 1X removed, nieces, nephews, etc.). This would go a long ways in eliminating many of the 3rd cousin 1X removed questions.
Explain why 7 CMs is the cutoff for confirmation matching. Keep this as simple as possible so people can understand.
Explain how SNPs relate to the 7 CM cutoff. I have a known cousin who doesn't match at GEDMatch unless I back off the SNPs to 300, then I get over 9 CMs. Does this qualify for matching?
Explain how to convert DNA segment locations into CMs. I have a five way match to one spot on my Chromosome, but each match has a different start/stop point. While I can figure out the overlapping start/stop points, how does the resulting range of numbers equate to an unknown number of CMs?
Use plain language wherever possible. With all of the nationalities and native languages here on WikiTree it only makes sense.
Use visualization to help get your point across. The use of graphics or diagrams would definitely help folks' understanding of the DNA confirmation process. Keep these graphics as simple as possible so they don't make things worse. Also, I would recommend having a person with a limited understanding of DNA read these instructions. If they require a further explanation, this might be a good place to use a graphic of some type.
Add links to aid in understanding. While this has been done for many situations, there are many sources and tools available online that are very basic and could easily help. I'm sure if we used these with permission it could really help. One such link is here at Autosomal DNA Table.
Use plain language whenever possible. Not everyone has a PhD in Genetic Genealogy, let alone a high school diploma. As mentioned earlier with all the native languages involved here on WikiTree, simpler is better in many cases.
Keep in mind the "why" when writing the "how to" portion of the instructions/help pages. While many can easily follow the "how to" step by step if it's written down somewhere, adding the "why" makes it rememberable.

Hopefully these suggestions can help others understand the DNA Confirmation idea a little easier.

Answer 14 · 2019-03-17T23:13:40+0000

When the rules become "written in concrete", an attempt should be made to delete or modify all those old G2G questions which now have erroneous answers. Otherwise the confusion will continue.

Answer 15 · 2019-03-18T03:34:13+0000

I thought the page was pretty good and easy enough to follow for most people.

I too was surprised to see 7cMs there. I had thought it was 15cMs and then reduced, but couldn't quite remember what to - maybe 10cMs but not sure. I think 15cMs was too high, but 7cMs might be too low given the large numbers of IBP segments in that range. Maybe 10cMs is a better number if it's still under review.

There have been some great comments already about visual aids which I would support.

From the triangulation page: I am a bit concerned about not quoting the chromosome segment 'overlapping segment area' (it says for privacy, but I don't see these raw numbers being an issue), as this is the essence of triangulation. It is also the only way you will eventually build a chromosome map for the ancestor, so they are vitally important. I know it says it in the first paragraph, but it isn't restated in the later sections.

The instructions as they read at present might be misinterpreted by a novice as needing matches on the same chromosome only as they don't need to quote this detail. By having to state the 'overlapping segment' area, eg 20.1 - 30.2 (or the longer way of writing it), it ensures we are actually sourcing a 'true' triangulation. At the very least the source should say something like 'who share a 10.8cMs overlapping segment on chromosome 1'.

IMO it would be of more benefit to quote all three segment areas, if you were to go down the automation path, with the resulting overlapped area, eg

* Match 1 and Match 2 share 19cMs on Chromosome 3 between 20.1-39.5

* Match 1 and Match 3 share 21cMs on Chromosome 3 between 18.8-39.5

* Match 2 and Match 3 share 14cMs on Chromosome 3 between 25.3-39.5

Triangulated and overlapping segment area = C03 25.3-39.5, approx 10.8cMs.

Automation of the source from this would be a great innovation down the track. Meantime I think the source should say Chromosome 3, 'overlapping segments' of approx 10.8cMs between 25.3-39.5.
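To make the overlap step concrete, here is a minimal sketch that takes the three pairwise ranges listed above and finds the region common to all of them (latest start, earliest end). The `Segment` class and the function name are made up for illustration, and the positions are assumed to be in the same unit for every segment. The cM length is not calculated here: cM is a genetic distance that does not scale evenly with position, so it would need a genetic map or the figure reported by the comparison site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: int
    start: float  # start position, same unit for every segment (assumed)
    end: float    # end position


def triangulated_overlap(segments):
    """Return the region shared by all segments, or None if there is none."""
    if len({s.chromosome for s in segments}) != 1:
        return None  # no segments given, or not all on one chromosome
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    if start >= end:
        return None  # the ranges do not all overlap
    return Segment(segments[0].chromosome, start, end)


pairs = [
    Segment(3, 20.1, 39.5),  # Match 1 and Match 2
    Segment(3, 18.8, 39.5),  # Match 1 and Match 3
    Segment(3, 25.3, 39.5),  # Match 2 and Match 3
]
overlap = triangulated_overlap(pairs)
print(overlap)  # Segment(chromosome=3, start=25.3, end=39.5)
```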

Great work Chris (and others), continuing to lead the way with DNA innovation!

Answer 16 · 2019-03-18T17:44:08+0000

I recently ran afoul of the DNA confirmation suggestion that now REQUIRES the word CONFIRMED with either MATERNAL or PATERNAL. I must have missed that change. I had written an extensive explanation of my DNA confirmations and had even posted them a while back to G2G where they were approved. So when I saw 2 suggestions pop up I was surprised. I do not think most people who have already confirmed DNA know that is a new requirement. I mention this because some of the suggestions above want to reword that part of the write up and if they do then the suggestion will also need to be reworked.

I prefer things that make it simple for users to do. So perhaps a fill in the blank form that guides the users to a result that will be accepted.

For users who have matches already on WT it might be something like:

My DNA with my (select mother or father) is confirmed through DNA matches with the following WT members: (insert their profile IDs). This would allow the system to automatically figure out the common ancestor (just as it does now with the Relationship Tool).

For users who have matches outside of WT it might be something like:

I match at (list testing lab) with (initials of match) at total CMs of (fill in number) with the largest segment of (fill in number) CMs. Beginning at SNP (fill in number) and ending at SNP (fill in number).

For users who are using GedMatch there is a report called People Who Match 1 or 2 Kits that allows you to visualize Triangulation. This is in the free group, not Tier One. So for that an example might be:

Using GedMatch Report People Who Match 1 or 2 Kits visualized for Triangulation, I match with (initials of match) at total CMs of (fill in number) with the largest segment of (fill in number) CMs. Beginning at SNP (fill in number) and ending at SNP (fill in number).

For each example there will need to be a way of adding another line to support the matches... 2 lines to start with for the triangulation should be standard.

As to an explanation of what kind of relatives need to be used, I am confused because the old direction was if you had matches closer than 3rd cousins you did not need to do triangulation. Has that direction changed? So in my examples above we would need to add the line: I relate to this person as (fill in the blank as to level of relationship such as sibling, cousin, aunt/uncle, 2nd cousin, etc)
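A fill-in-the-blank form like the ones described above could be turned into a statement with very little code. The sketch below is only an illustration: the `MatchForm` fields, the function name, and the output wording are assumptions that follow the examples in this answer, not the official WikiTree citation format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchForm:
    lab: str                    # testing lab or comparison site
    match_initials: str         # initials of the match
    total_cm: float             # total shared cM
    largest_segment_cm: float   # largest shared segment in cM
    relationship: str           # e.g. "2nd cousin"
    start: Optional[int] = None  # optional segment start
    end: Optional[int] = None    # optional segment end


def build_statement(form: MatchForm) -> str:
    """Assemble a statement from the filled-in blanks."""
    text = (
        f"I match at {form.lab} with {form.match_initials} "
        f"at total cMs of {form.total_cm:g} with the largest segment "
        f"of {form.largest_segment_cm:g} cMs."
    )
    if form.start is not None and form.end is not None:
        text += f" Beginning at {form.start} and ending at {form.end}."
    text += f" I relate to this person as {form.relationship}."
    return text


# All values here are invented for the example.
form = MatchForm("GEDmatch", "J.D.", 212.5, 48.0, "2nd cousin")
print(build_statement(form))
```

Extra lines for triangulation could be supported by collecting several `MatchForm` entries and joining their statements.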

Finally, why are we using 12cms for triangulation? Please read what ISOGG says at https://isogg.org/wiki/Triangulation. They say: "Triangulation can be used going back many generations. However, well documented pedigrees are necessary for all the matching parties in order to rule out the possibility that the match is not on a more distant line which has not yet been researched. Caution still needs to be exercised when reviewing matches with smaller segments under 15 cMs in size, and especially segments under 10 cMs in size, as many of these are false positive matches.

"The process of triangulation is greatly facilitated by the use of third-party tools such as those available from GedMatch and DNAGEDCOM (eg, Don Worth's Autosomal DNA Segment Analyser)."

This does not rule out lower than 10cm with an associated paper trail that matches. Aren't we being more restrictive than even the testing companies?

Answer 17 · 2019-03-19T23:17:18+0000

General... I like it. It's an improvement.

Changes to the DNA Confirmation Guide:

Step 5:

Add a link to here: https://dnapainter.com/tools/sharedcmv4

For now, I think that predictions of the testing companies OR the Shared cM Project tool could both be used. See the final section for a comment on how this might ultimately be improved.

Step 6:

I would add a 6th step, even if slightly redundant: "6. Is the match a member of WikiTree?", in order to explain the anonymity requirement and to improve it (note A). Additionally, the privacy requirement should be changed with respect to matches who are members of WikiTree (see note B):

If they are not members of WikiTree, their identity must be kept anonymous (A), simply stating the relationship: e.g. 2nd cousins once removed, sharing the common ancestors [WT ID Ancestor 1] and [WT ID Ancestor 2]
If the match is a member of WikiTree, their identity should only be disclosed with their consent. Otherwise, the relationship can be stated as described in the guide for non-members. (B)

A. With regard to the anonymization, the guide as currently written is problematic. "Your WikiTree ID and the initials (or another anonymous identifier) for your match. You can include your full name but do not publicly reveal the identity of your match." Given the ability of the typical genealogist to Facebook stalk, initials provide nearly all of the information required to correctly identify the individual in question, thus it is not an "anonymous identifier" (see: oxymoron, noun). That their name is not attached on WikiTree means very little if one can simply side-step that in most instances. The only anonymous identifier would be the biological/genetic relationship, e.g. 2nd cousins once removed, sharing the common ancestors [WT ID Ancestor 1] and [WT ID Ancestor 2]. That is the standard which has been provided in the past and it seems entirely sufficient.

If there was a private, non-publicly-viewable field where such initials could be stored, that may be acceptable. If we had a system where sources, including DNA confirmations, were separate from the biography, this might be feasible, provided such private, creator-only-viewable fields were available.

B. We should not be operating on presumed consent with regard to DNA confirmations involving other WikiTree members. Perhaps that could be automated such that for 2 WT members, for a DNA link to be made, there should be a recorded consent of both parties to that linking process. Although point 5 in the Honour Code states, "We privacy-protect anything we think our family members might not want public. If that's not enough for someone, we delete their personal information.", we need to realize that once information is placed on WikiTree, we cannot guarantee that WT's deletion of it will remove it from being publicly accessible as it could have been recorded prior to deletion. Because of this, the consent should be obtained prior to posting of the linked citation.

Make Sources Separate (So we can do cool things with them)

If we are going to the work of automating this, you may wish to consider the idea of keeping citations separate from the biography. If we had a separate data system for citations, it would likely be much easier to record this affirmation, as well as to propagate the citation:

Wter-1 and Wter-2 are both members. Wter-1 goes to Wter-2's profile page and clicks "We're DNA Matches", then enters a few details (what test, how many cM for each shared segment or total cM and # of segments). Wter-2 gets a notification and clicks "Confirm"... automatically those citations climb the tree to their common ancestor(s).

That would be nice.
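A minimal sketch of that two-step flow, with consent recorded before anything is published, might look like this. The class and method names are hypothetical and only show the idea that a claim stays pending until the other member confirms it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"


@dataclass
class MatchClaim:
    claimant: str   # member who clicked "We're DNA Matches"
    other: str      # member who must agree
    test: str       # which test was used
    total_cm: float
    status: Status = Status.PENDING

    def confirm(self, member_id: str) -> None:
        """Record consent; only the other member may confirm."""
        if member_id != self.other:
            raise PermissionError("only the other member can confirm this claim")
        self.status = Status.CONFIRMED

    def can_publish(self) -> bool:
        """Citations should be generated only after both parties agree."""
        return self.status is Status.CONFIRMED


claim = MatchClaim("Wter-1", "Wter-2", "AncestryDNA", 230.0)
print(claim.can_publish())  # False
claim.confirm("Wter-2")
print(claim.can_publish())  # True
```

Propagating the citation up to the common ancestor(s) would be a separate step that runs only when `can_publish()` is true.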

Making the separate data system for sources could also be something gradual. If we made one for DNA Confirmation sources, with other application and/or future expansion in mind, it could gradually be expanded to address other kinds of sources as well.

Such a system could also include the confidence of the relationship (using Shared cM Project data, as used on DNAPainter). It could be interesting for edge cases, e.g. 3C vs half 3C, where the ancestral couple being confirmed (especially with automated, separate-source citations) might have a note that there's a possibility (including % probabilities) that the terminal relationship might not be as described (1 vs 2 shared ancestors). I think that would help to balance the over-confidence sometimes produced with DNA confirmations (and take a little heat off from the use of the word "confirmation"). Perhaps instead of a "DNA Confirmed" checkmark, we might be able to have a % probability or a qualifier ("High", "Likely", "Possible", etc...)
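For illustration only, a qualifier in place of a checkmark could be derived from a probability like this. The cut-off values and the extra "Unlikely" label are invented for the example; real thresholds would need to be agreed on and based on data such as the Shared cM Project.

```python
def confidence_label(probability: float) -> str:
    """Map a relationship probability (0.0 to 1.0) to a qualifier.

    The thresholds below are placeholders, not agreed values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.95:
        return "High"
    if probability >= 0.75:
        return "Likely"
    if probability >= 0.50:
        return "Possible"
    return "Unlikely"


print(confidence_label(0.97))  # High
print(confidence_label(0.60))  # Possible
```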

Additionally, as more knowledge is gained, we might be able to have the system recognize if 2 or more DNA Confirmation citations are present and then introduce a statement of odds into the Research Notes section (or as a separate, non-editable part) of the profile. This could be useful for multiple matches with distant ancestors (> 3C). Again, this would be best executed by linking WT IDs and a separate citation section. Having multiple 1-to-1 matches (as a simple, "We're DNA Matches" button/link), recorded with automated citations would in my view be ideal. And I think that it's something that could ultimately be built on for more robust distant confirmations in the future.

In the instance where two people are DNA matches and have WT profiles, having that profile-to-profile link that exists separately from a known shared ancestor, could allow for a narrowing down of how two or more WT users are likely related. Forget ThruLines™ and Theory of Family Relativity™. Now introducing, Algorithmic Speculation by WikiTree™ ... but joking aside, this could be a serious possibility. DNA networks are incredibly powerful, especially as one increases the network size, and because this is a global family tree (where DNA tests could rule out certain connections and suggest others), the predictive power becomes much more precise.

If the DNA Confirmation inputs were well tracked, one could create McGuire charts, which might help users to evaluate testing options and the data (along with odds or probability calculations).

If we can start doing these kinds of relationship analyses, we can, I think, significantly expand our answer to "What conclusions can users confidently draw from what they're getting from AncestryDNA, MyHeritage, etc?" Rather than having questionable suggestions imputed from tangled and unsourced trees via algorithms that are a black box, we can open the box to show how the certainty is determined and provide users with a verifiable level of certainty. I believe that there's a real parallel here between the push for open source code in science vs closed source "black box" code. WT can be the open source option - trustworthy and verifiable - in its code more than in the source data.

Although Chris isn't interested in re-hashing the "whether we should require public verification" question, there is a relevant aspect here. We have a better opportunity to be open and verifiable: provide an analysis that is open and a platform for the data. As more people contribute relationship data to answering questions, outliers will become apparent, and those can be marked as such and viewed with a skeptical eye. This, for the doubters, is actual verification... not linking the fake kit I may have uploaded to GEDmatch! The value to users continues as the dataset grows. Moreover, being cross-platform (as far as DNA data goes), there's an opportunity to reach conclusions that can't currently be accessed on other platforms.

The Tangent on Segments:

I am still unsure if we might want to store start and stop points for segment matches.

Regarding the tangent on storage of segment position data, I'm against the storage aspect. As outlined above, I believe that there is much more that we have the potential to do without getting into segment positions.

As medical inferences could be made from accessible start and stop points for segment matches, segment position storage should not be part of WikiTree. It creates both regulatory and privacy risks. For good reason, many jurisdictions redact or withhold cause of death from public inspection on death records. Such information links living individuals, especially those closely related, with a greater likelihood of certain genetic conditions. Still, some provide such information. (Such may depend on local healthcare and/or privacy laws.) So if a segment contains a SNP with a causal relation to the cause of death, then medical information can be inferred with even higher certainty about any others with that cited segment. I would swerve to avoid that landmine.

That said, there could be a middle ground approach here. WT could have a calculator input using segment information which could (1) calculate and (2) evaluate relationship confirmations which does not store the segment position data. Perhaps what could be recorded, not for individuals, but for analysis purposes, is the number of cM for each segment. Such could be done collaboratively with the Shared cM Project to help grow the database using WT to help crowdsource further collection. I could see a synergy between WT and the Shared cM project there - it would certainly "increase the world's common store of knowledge." This could produce useful info for endogamous populations, more complex relationships, and even for simple relationships — as the Shared cM Project does not currently take segment number or size distribution into account.

34 Answers

Related questions