Another update to DNA Confirmation app

+32 votes
1.0k views

I posted another update to the DNA Confirmation app this afternoon; it is now at version 2.06.  Most of the updates in this version patch some holes, give some more guidance, and add some perks to the look and feel and to the resulting citations.

Here is the list of some of the more interesting changes:

  • Added "Geneanet" as a DNA Company option for Simple DNA Matches
  • Automatically load mitoYDNA, GEDmatch and FTDNA kit #s for DNA test takers, if available
  • Added links to mitoYDNA comparison page if mitoYDNA kit #s included for both test takers
  • Added links to GEDmatch comparison page if GEDmatch kit #s included for both test takers
  • Reconfigured X-DNA Match Intro screen to be more explicit about 2 options
  • Revised wording for mtDNA Matches (included Coding Region in citation)
  • "How to Add Source Citations" - brief outline added to opening page
  • Triangulation mini tree : closer branches drawn beside each other
  • Triangulation no longer allows you to continue if any pair of 3 test takers are 3rd cousins or closer. (Will suggest a Simple DNA match, and provide a button to create that citation)
  • Simple DNA mini-tree tweak - child/grandchild of DNA test takers greyed out

Not new in this latest version, but worth repeating:  There are two places where you can customize the citation if additional / alternate wording is needed for your specific usage:

  • In the second last step, there is an ADD NOTE option if you need to add any clarifying information about the DNA matches / test results
  • In the final step, at the top, there is a CUSTOMIZE CITATIONS option that can be used to determine the name used for each DNA Test Taker in the citation (where you can substitute an Alias or shortened version for living people or non-WikiTreers) and also to choose how you wish to format cousins in the citation (e.g. "2nd cousin once removed" vs "2C1R")

As always - please let me know if any of these added features (or previous ones) have introduced new bugs into the system.

Thanks to all who help me with the testing and refinement of this and other apps. For this latest set of updates in particular, a huge thanks to John Kingman for his suggestions along the way.

in The Tree House by Greg Clarke G2G6 Pilot (114k points)
Thank you! This DNA Confirmation tool has been a great help to me. :)
Yes thanks for this app! Has helped me confirm / deny validity for simple DNA match of some more complicated relations (e.g. half 2nd cousin once removed)

3 Answers

+12 votes
 
Best answer
Love this app and have been using it for all of my DNA Statements. Kudos and thank you, Greg!
by Patty LaPlante G2G6 Pilot (183k points)
selected by Valorie Zimmerman
+10 votes
Nifty tool!  However, is it intended that the triangulation module will only accept WikiTree members as matches?  I don't have that many WT matches, so I was trying to use initials of matches from elsewhere.  That generated an error message that my input (say, "EB") was not recognized as a WT ID.  Thanks.
by K. Nichols G2G6 (9.7k points)
Excellent question, K!  And the answer is no, it is not for WikiTree members only. ONE of the DNA test takers has to be a WikiTree member - and you have to put his/her WikiTree ID in the first slot - but for triangulation or any of the other options, you can use an Alias, like a nickname or just initials, for the other test takers.

A warning will then come up to tell you that you have entered a non-WikiTree userid - just in case it was a typo or a mistake - BUT - if it is NOT a mistake, then the next step is to indicate the WikiTree ID of the first ancestor of the non-WikiTree person who IS on WikiTree, and then specify the relationship.

For example, say you have a match with a cousin with whom you share a set of great-great-grandparents. If your cousin is not on WikiTree but her grandmother is, then you'd enter ABC for your cousin, enter her grandmother's WikiTree ID, and indicate the grandmotherly relationship. The DNA confirmation app would then stitch the rest of the information together for you.

Hmmmm ... I think in one of the videos I did an example of this - but - probably pretty quickly.  With some of the latest changes, perhaps it's time to do another How To video.

Hope this helps!

 - Greg
Thanks for the clarification.  I don't think I have watched a video on this.  I'm still somewhat of a newbie.
+6 votes
This is great, thank you.

I'm not sure if it is addressed, but if I could raise one thing. There is a gap between being able to do a straight match between two people up to a common descent from great great great grandparent, but then an inability to create a triangulation with another person added. I have not managed to successfully establish a triangulation yet, because more often than not, it may be my great great great great grandparent, but both the other people in the triangulation are either ggg or gg, with a result that it tells me to do a simple match between those two people. The problem is, I don't have authority to act on behalf of either of those people, but am constantly denied the ability to do either a simple match, or a triangulation. I'm not sure if this has been raised before. I hope it makes sense and isn't too complicated. I do love what is being done with this app.

Actually, another quick point: I'm not entirely sure what is being said in relation to the paternal connection or the maternal connection. I understand what paternal and maternal are, but the blurb isn't defining who the connection is with, whether it is describing the descendant's connection to the profile, or the profile's connection to their parent, or higher ancestors. Get back to me if I don't make sense. Thank you. Ben.
by Ben Molesworth G2G6 Pilot (163k points)
Hi Greg

 I think Edison has made the point that having more individuals in a triangulation strengthens the result, which means 3rd cousin triangulations are of higher value than a simple match of 2 3rd cousins.

No-one has cited a WikiTree passage banning 3rd cousin (or lower order) triangulations, so the view that such triangulations are banned appears to be the result of an honest misinterpretation of the DNA confirmation pages. I am therefore requesting that you restore the ability of the citation app to create citations involving two 3rd cousins and any other relation permitted by the 3 legged stool principle.

I'm also asking if the app could be modified to include a check box on each citation block, so we can mark that citation off when it has been entered on WikiTree. It's a pain to find out months later that I've missed one.

Also, is it technically feasible to modify the app to handle more than 3 in a citation? I think I could modify a 3 way citation made by the app to a 5 or 6 way, but it will be a lot of work, so it would be nice to know if it is technically feasible to automate that.
Hi Gary - thanks for your questions.

I've initiated a discussion within the DNA team that I've been working with to consider these points - good ones - that have been brought up by this discussion, including considering changes to the DNA confirmation app.  I'll let you know (in this thread, and eventually a new G2G post) what we decide, and when the next version of the app is available.

Your request for checkboxes on the final page (next to each of the long list of citations) is an excellent one - someone else requested that earlier on, but I hadn't gotten to doing it, but it was on my list of possible things ... I was just wondering if more people would find it helpful ... looks like it!  I'll add that to the next version for sure.

Your question about the feasibility of > 3 DNA test takers in a citation is a good one.  Currently, the way the algorithm works is that it first finds all the ancestors of Person 1 (up to a limit).  Then it finds all the ancestors of Person 2 and tries to find an MRCA between them (at this point it stops once it finds one ... after checking to see if it's actually an MRCA couple or not).  Then it does the same with Person 3 - but for Person 3 we have to find the MRCA in common with Person 1 and then with Person 2 ... and see if all those paired MRCAs result in a single MRCA (individual or couple) that all 3 descend from.   Throwing in an additional 4th, 5th, or 6th person is possible, but the logic and the time it takes to compute could be off-putting for impatient WikiTreers.  HOWEVER ... if the first 3 chosen by a wise DNA confirming sleuth were the ones to establish THE definitive MRCA, then those additional peeps would only need to verify that they are also direct line descendants ...   SO ... bottom line ... not impossible, but it will take some work, and some thinking that this early in the morning is hurting my brain a bit ... but is likely going to keep me up at night.
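
The approach described above can be sketched roughly as follows. This is only an illustration and not the app's actual code: the `PARENTS` dictionary, the IDs, and every function name are made up for the example. Unlike the description, the sketch collects all shared ancestors and then picks the nearest ones rather than stopping at the first hit, and it treats an MRCA couple simply as a set of two IDs.

```python
from collections import deque

# Hypothetical toy pedigree: child ID -> (father ID, mother ID).
PARENTS = {
    "P1": ("F1", "M1"),
    "P2": ("F2", "M2"),
    "P3": ("F3", "M3"),
    "F1": ("GF", "GM"),
    "M2": ("GF", "GM"),
    "F3": ("GF", "GM"),
}

def ancestors(person, max_generations=10):
    """Return {ancestor_id: generations_back}, searched breadth-first up to a limit."""
    found = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth >= max_generations:
            continue
        for parent in PARENTS.get(current, ()):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found

def mrca(person_a, person_b):
    """Return the nearest shared ancestors of two people (one ID, or two for a couple)."""
    anc_a, anc_b = ancestors(person_a), ancestors(person_b)
    shared = set(anc_a) & set(anc_b)
    if not shared:
        return set()
    nearest = min(anc_a[x] + anc_b[x] for x in shared)
    return {x for x in shared if anc_a[x] + anc_b[x] == nearest}

def triangulation_mrca(p1, p2, p3):
    """Return the MRCA only if all three pairwise MRCAs agree, else an empty set."""
    pairs = [mrca(p1, p2), mrca(p1, p3), mrca(p2, p3)]
    if all(pairs) and pairs[0] == pairs[1] == pairs[2]:
        return pairs[0]
    return set()

def also_descends(person, mrca_set):
    """Cheaper check for a 4th, 5th, ... test taker: do they descend from the MRCA?"""
    return bool(mrca_set) and mrca_set <= set(ancestors(person))

print(triangulation_mrca("P1", "P2", "P3"))
```

In this toy data the three test takers are first cousins, so the result is the couple GF and GM. The `also_descends` helper reflects the shortcut mentioned above: once three people have fixed the MRCA, additional test takers only need a descent check rather than a full pairwise comparison.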

Thanks for that!

PS - I've also suggested to the team that maybe a revisit to the wording on the DNA help pages might be in order ... so .. again ... thanks to all who have participated in this discussion - good points raised, good questions asked.

"Gary might have achieved the seemingly impossible task of writing a post longer than Edison's."

Kerry, I get the vague feeling that could be an implication I'm overly wordy in my posts here. Surely you jest... :)

"I do think your proposals throw the baby out with the bathwater."

They well might, Gary, but I believe it comes down to the purpose of the "Confirmed with DNA" status. That word "confirmed" has been debated extensively before, however, and it's been made clear that it will remain as-is for WikiTree.

From a standpoint of genetic genealogy, not of WikiTree policy, my personal opinion is that the propagation in the "DNA Connections" panel to ancestor profiles as potentially sharing autosomal DNA as far back as 6g-grandparents is sufficient to serve as "FYI" information and/or a research hint.

If the goal is accuracy in the use of DNA as evidence, I think we need to be informed and careful enough to discern, if there are many babies in the same bathwater, which baby is really ours and to let the rest go. And now that has to rate as my worst follow-on metaphor ever. :(

I won't take this too far off-topic; this is about Greg's super-handy app and not a general genetics discussion. But it's still germane to whatever guidelines evaluation might be done by the WikiTree DNA Project, so I'm going to take up appreciable screen space. Surprise.

We had a recent conversation about some of the difficulties with using autosomal DNA for distant relationships three weeks ago here on G2G. I deferred responding to some of Frank's specific issues because I'm currently preparing a presentation on the subject.

My own arbitrary and personal definition of what determines whether a given segment of "matching" autosomal or xDNA is suitable for use in genealogy has two parts:

1. Is the segment very likely to be a valid segment, or is there sufficient possibility that it is a false-positive?

2. With a high degree of confidence, can the segment be identified as having originated with a single, specific ancestor?

Item one is about the physical, chromosomal segment. Every testing and reporting company uses its own methodology for this evaluation; the same DNA "match" will be somewhat different as reported by each company; and none of the companies disclose the actual base-pair data. A matching segment should be a continuous set of half-identical nucleotide values, unbroken by mismatched loci and having no excessively long region of no matches. Our common microarray tests look only at about 1 in every 4,800 base pairs, so long regions of no half-identical matches might imply that two distinct segments have been conflated in the reporting. Compounding this is the routine practice of assuming any no-calls (loci where one or the other set of test data was unable to determine a value at either base in the pair) are matches.

The centiMorgan itself--by definition and computational estimation--becomes increasingly imprecise as an evaluation tool the smaller segments become. Add to that, our microarray tests simply cannot tell us the actual start and stop points of a purported segment.

How the 7cM value became so commonly cited in genealogy as some de facto threshold in determining segment validity I've never quite figured out. My best guess is that's simply what GEDmatch first decided to use as its default minimum. But almost 10 years ago Dr. Tim Janzen reported on the results of comparing traditionally phased trios using what reporting detail we have available, and he found that fully 58% of all 7cM segments were false, this based solely on the phasing results, not a physical examination of the base-pair detail.

The most common problem in the interpretation of the actual base-pair data deals with the fact that no DNA test can differentiate between which nucleotide values come from the maternal chromosome in a pair, and which from the paternal. This is commonly called "haplotype switching": as ISOGG explains it, "...matching alleles zig-zagging backwards and forwards between the maternal side and the paternal side." In a paper in Molecular Biology and Evolution (Durand, et al., 2014), researchers found that this was the reason for false-positive matching between small segments in up to 67% of the instances.

GEDmatch uses no additional comparison refinement tools like computational phasing or genotype imputation; their matching is arithmetic only, calculating the number of continuously matching SNPs, an allowance for a small number of mismatches, and the distance in base pairs between the matching SNPs. Given that, the greater the number of SNPs in a kit's data, hypothetically the more accurate the matching results should be.
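
To make the "arithmetic only" idea concrete, here is a minimal sketch of that style of matching. It is not GEDmatch's code: the function names, the data layout, and the `min_snps` and `max_mismatches` values are all assumptions chosen for illustration, and the base-pair distance check between SNPs is left out. Each kit is a list of (position, genotype) pairs for the same SNPs in the same order, two genotypes count as half-identical when they share at least one allele, and a no-call (written here as "--") is treated as a match, per the common practice described earlier.

```python
def half_identical(genotype_a, genotype_b):
    """True if two genotypes, e.g. "AG" and "GG", share at least one allele."""
    if "-" in genotype_a or "-" in genotype_b:
        return True  # assumption: a no-call is counted as a match
    return bool(set(genotype_a) & set(genotype_b))

def find_segments(kit_a, kit_b, min_snps=4, max_mismatches=1):
    """Scan SNPs in order and return (start_pos, end_pos, matching_snp_count) runs.

    min_snps and max_mismatches are illustrative values, not real defaults.
    """
    segments = []
    start = end = None
    count = mismatches = 0
    for (pos, ga), (_, gb) in zip(kit_a, kit_b):
        if half_identical(ga, gb):
            if start is None:
                start = pos
            end = pos
            count += 1
        elif start is not None:
            mismatches += 1
            if mismatches > max_mismatches:
                # Too many mismatches: close the run at the last matching SNP.
                if count >= min_snps:
                    segments.append((start, end, count))
                start = end = None
                count = mismatches = 0
    if start is not None and count >= min_snps:
        segments.append((start, end, count))
    return segments

kit_a = [(100, "AA"), (200, "AG"), (300, "GG"), (400, "CT"), (500, "CC"), (600, "TT")]
kit_b = [(100, "AG"), (200, "GG"), (300, "AG"), (400, "TT"), (500, "TT"), (600, "CC")]
print(find_segments(kit_a, kit_b))
```

With so few rules, the result depends heavily on how many SNPs the two kits have in common, which is the point being made above about SNP count and accuracy.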

In an informal check using a baseline kit composed of over 2.08 million SNPs extracted from whole genome sequencing data, and employing GEDmatch's Tier 1 one-to-many tool using its default settings, I compared the results from that superkit to 11 different tests/versions of our common microarray results using the same DNA sampled at the same time. In the worst performing instance, that of a 23andMe v5 test (and mind you this has nothing to do with the accuracy of that test, only with the way that GEDmatch uses its data for comparisons), at a threshold of ≥ 10cM for every kit in the database shown as a match to the superkit, in round numbers 3 times as many kits showed as matching the 23andMe test (28,846 versus 9,242), presenting the real possibility that over 19,600 of the matches attributed to the 23andMe kit were false. Interestingly--though purely coincidental--that would be a 67.96% false-positive rate compared to the 67% reported in the Durand study. It wasn't possible to go lower than the 10cM threshold: the GEDmatch server would consistently deliver a "memory exceeded" error and, since that time, they have modified the available search parameters to eliminate the possibility of that volume of matches; 7,500 is the new maximum. No, I did not break GEDmatch...at least I don't think I did.
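
For anyone wanting to check the arithmetic behind those figures, a short sketch follows. The variable names are made up here, and the calculation rests on the same assumption as the text: that every match to the superkit also appears among the 23andMe kit's matches, so the surplus is what might be false.

```python
superkit_matches = 9_242     # kits matching the WGS-derived superkit at >= 10 cM
microarray_matches = 28_846  # kits matching the 23andMe v5 kit at >= 10 cM

surplus = microarray_matches - superkit_matches
ratio = microarray_matches / superkit_matches
possible_false_rate = surplus / microarray_matches

print(surplus)                       # 19604
print(round(ratio, 2))               # 3.12
print(f"{possible_false_rate:.2%}")  # 67.96%
```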

Item two--with a high degree of confidence, can the segment be identified as having originated with a single, specific ancestor--is an even stickier and more difficult criterion to evaluate.

Contrary to popular opinion, recombination (crossing over)--the process that occurs during Prophase I of meiosis and the function that creates our DNA segments--is not a random operation. There are several biological mechanisms in play that make that so, from genetic linkage to crossover interference to the centromere effect to linkage disequilibrium to crossover hotspots and their shifting due to something called deamination as males age.

One result of these various mechanisms is that, as we step back in time generation by generation, we reach a point where we will never be able to accurately attribute a given segment to a given ancestor. Just as we Europeans carry about 2% Neanderthal DNA, we carry DNA segments from our various founder populations and even from entire populations at a broad, continental level. If that weren't true, none of the "ethnicity estimates" we see would be possible.

This is one of many reasons that numerous voices have been urging the scientific community to do away with the single-genome-as-reference concept and move to a pangenomic model that better considers attributes specific to diverse, global populations. The National Institutes of Health have, for now, agreed and put the release of GRCh39 on indefinite hold until a determination is made how to proceed. (A reminder that all our genetic genealogy comparisons of autosomal DNA are still being done against GRCh37, a reference assembly first published in 2009 and retired in 2013; considering that this is the basis for both base pair numbering and calculating centiMorgans, genealogy is already a decade behind the curve).

In doing research for the mentioned presentation, I came across what I believe is a simple and important summary of the situation (Mathieson and Scally, PLOS Genetics, March 2020):

"Another source of confusion is that three distinct concepts--genealogical ancestry, genetic ancestry, and genetic similarity--are frequently conflated. We discuss them in turn, but note that only the first two are explicitly forms of ancestry, and that genetic data are surprisingly uninformative about either of them. Consequently, most statements about ancestry are really statements about genetic similarity, which has a complex relationship with ancestry, and can only be related to it by making assumptions about human demography whose validity is uncertain and difficult to test."

Gary, you encapsulated part of that very problem when you wrote: "This MCRA couple is Cornish, and I've noticed consistently higher cM results on my Cornish branch." We all have so-called pile-up regions of DNA in our genomes. These are chunks of autosomal or xDNA that display far greater frequencies of sharing than should be expected, and the causal elements here can range from occurrence at the level of continental populations, to regional and founder populations, to haplotypic pile-ups that can be associated with tribal/clan groups and even individual families.

The social practice of endogamy isn't necessary to create this. Every population bottleneck in each of our long genetic histories has resulted in a compression of the mating population, whether from disaster, pandemic, migration, or geography. At each of those intervals our genetic history has undergone a narrowing, with the resultant downstream funnel a commingling of DNA that becomes difficult to evaluate, requiring knowledge, effort, and detailed investigation to accurately attribute to a specific ancestor at several generations, and downright unlikely much further. For example, at 5th cousins we're already looking at 12 meiosis events between two test-takers; as many as 24 among three test-takers; 36 among four test-takers, and so on. The odds of finding a segment shared by two 5th cousins aren't great even at a 4g-grandparent MRCA--about 1 in 6--and they drop precipitously from there. Very roughly (and admittedly inaccurately; since triangulation has never been scientifically studied no one has published the computed probabilities), the odds of finding three 7th cousins who share the same segment of DNA from the same ancestor would be on the order of 1 in 166.
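
The two-person meiosis count used above follows from simple counting, sketched below. The function name is made up for illustration, and the sketch covers only a pair of full cousins descending from an MRCA couple; it does not attempt the sharing probabilities.

```python
def meioses_between_cousins(degree, removed=0):
    """Meiosis events separating two cousins via their shared ancestors.

    An nth cousin descends n + 1 generations from the shared ancestors,
    so two nth cousins are separated by 2 * (n + 1) meioses; each
    "removed" adds one more generation on one side.
    """
    return 2 * (degree + 1) + removed

for degree in (2, 3, 5, 7):
    print(f"{degree}C: {meioses_between_cousins(degree)} meioses")
```

For 5th cousins this gives 12, matching the figure in the text.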

Part 2

Wait. Did Kerry call me wordy? I can't remember. <cough cough>

In summation, I think the goal for the WikiTree "Confirmed with DNA" status with regard to autosomal triangulation can be one of either entertainment or accuracy, but not both. Since I believe the "DNA Connections" panel on ancestral profiles does a good job with the entertainment and research hints aspects, I'm left thinking that the goal of "Confirmed with DNA" should be accuracy, should be the result of careful and studied examination resulting in a realistic conclusion.

WikiTree quite obviously can't make the requirements so rigorous that everybody would have to go back and take a refresher course in microbiology. A ton of complexity simply wouldn't fly for a genealogy site like WikiTree: nobody would take the time to figure it out and Greg certainly wouldn't want to try to write an app that drove folks through all the analytic considerations. Just a few of the questions an app might need to ask, and then require explanatory input to record in the resulting conclusion statement:

  • Are any of the triangulation group test-takers working with trio-phased data? Whether yes or no, explain how you have taken that into account.
  • Have you created haplotypic pile-up region charts for all members of a triangulation group and down-weighted the evidence of those segments appropriately?
  • Have you down-weighted any segmental portions that overlap an area of identified population-level pile-ups?
  • Have you looked at the match lists for test-takers sharing the segment in question for whom the genealogies don't show a correlation (in other words, actively sought evidence that would call the hypothetical MRCA into question)?
  • Have you examined the actual base-pair data from the raw DNA results of the test-takers to identify any mismatched or too-distant markers that might indicate the segment is not continuous?
  • Does the actual base-pair data, when compared to the respective genealogies, provide any insight into regions of potential haplotype switching?
  • Have you calculated the centiMorgan values for both male and female genomes (females undergo crossover at a frequency about 70% higher than males) rather than just a sex-averaged value, and considered the number and sequence of males and females in each inheritance chain accordingly?
  • Have you determined which of the compared SNPs are part of protein-coding, non-phenotyping genes and excluded those?
  • Have you analyzed the number of in-common SNPs compared among the different tests to determine if the comparison density is adequate? Do the resulting locations of in-common SNPs leave any large regions with no one-to-one comparisons?
  • Have you established that the purported crossover location(s) is not in a region that would biologically prohibit it (e.g., proximal to a centromere or in a location approaching densely heterochromatic regions)?
  • Have you considered running a tool like liftOver for each set of raw data to convert the results to GRCh38.p14 in order to eliminate known errors and omissions in Build 37?
  • Have you carefully examined the genealogy four or more generations prior to the hypothetical MRCA to rule out any possibility of consanguinity or any possibility of conflated inheritance chains?

And so on. Yeah. I'm giving Greg nightmares now and will go quickly and quietly away...

Ugh.  I really hate it when Edison knocks me off my comfortable perch on that left peak of the Dunning-Kruger Effect graph and puts me in freefall.  It seems to happen every time he posts.

 

Holy Moley Edison! ... as I run screaming from the room ...

Kerry, I've done a 360° search and I can't find you anywhere near me up here on the left D-K mountain. :(

Greg, sorry to elicit the famous Edvard Munch painting (maybe G2G should have that as a standard emoji).

That partial laundry list was sorta just to indicate, per my earlier reply, that I don't believe WikiTree could ever attempt to codify the analytic depth that's truly necessary to build solid evidence for a distant-cousin autosomal (or xDNA) triangulation. This marks my 21st year messing around with genetic genealogy (of course, autosomal testing first appeared more recently), and I've never been willing to accept a triangulation in my own research that was farther back than a 3g-grandparent of mine (4C2R was the most distant, to me, of 15 individuals in that triangulation group). None of the evaluations to 4g-grandparents or farther have been able to stand up to rigorous scrutiny. But, if we're placing a premium on accuracy, putting some generational distance restrictions and tightening up the criteria (e.g., increasing the 7cM minimum; allowing data from 3C relationships to be included; requiring more than three people in the triangulation group past a 2g-grandparent MRCA) could be done while still keeping the WT triangulation guidelines an either/or, step-by-step proposition rather than a couple dozen "Did you check this?" items.

BTW, I got a PM concerning that post related to the bullet-point about areas on the chromosomes where some biological mechanisms typically prevent crossing over from happening during meiosis. Because that's not something discussed much in using DNA for genealogy, I put together a quick reference back in 2021. A deeper dive in this thread than needed, but I figured I could provide the link to the PDF in case someone else is wondering about it.

Greg, I sincerely apologize for creating the post!! Could we just remove the lower end for triangulation, so we can do so with 2C? We definitely still need to point out that triangulation works best if the 3 or more people descend from different children of the common ancestor.
Hi Ben

I thank you for making the original post. It revealed a problem with triangulation, and that there are two diametrically opposite interpretations of the triangulation page as to whether a triangulation can cross the third cousin level barrier.

 You've stimulated a discussion on those issues, and perhaps a few more, and that hopefully will lead to an improvement, even if that's only a better written and easier to understand page.

My apologies to all for taking a backseat these last few days. I've been dealing with a family medical emergency.
You're alright Gary. I was being tongue in cheek. I didn't expect so much discussion to come from it.
