DNA Confirmation statement to disprove lineage

+12 votes
491 views
Is there (or can there be) a standard DNA confirmation statement if you want to say "these two people definitely cannot be related because <reasons>"? That can be as useful as a positive confirmation, but I do not see any guidelines around that.
WikiTree profile: David Crawford
in Policy and Style by Jonathan Crawford G2G6 Pilot (275k points)
retagged by Mags Gaulden

4 Answers

+8 votes
 
Best answer

Hi, Jonathan. I think I understand the end result you're asking about, and my opinion would be a qualified, "No; there generally cannot be a concise and accurate statement made, based on DNA, that 'these two people can't be related.'"

But as G2G's most notorious EWCS (Excessive Word-count Scofflaw), of course there are qualifications! And explanations! And examples! <We pause now for a brief moment of sonorous groaning>

Before I get into it, first I want to note that we buried and blessed the "confirm" etymology and definition controversy a few years ago. Ain't gonna bring that up again, as such. So this isn't about that. I'll try not to use the word "confirm" unless I'm specifically addressing it as a WikiTree "Confirmed with DNA" policy or guideline. Suffice to say in that context it does not mean "establish the truth or correctness of."

Second, DNA is still very much a growing and evolving science. For example, the version of the human genome map that all the major testing and reporting companies currently use for autosomal DNA was superseded in 2013. So not only is the current reference genome different, but what we've learned about DNA in just those intervening eight years--from technology to techniques to application--has been enough to, quite literally, fill multiple books.

In terms of genealogy, DNA is very, very seldom binary: not a yes/no, either/or. Some genealogists think it is--or at least want it to be--but it almost never is, relatively speaking (pun intended). So it requires evidence analysis just like any other potential piece of evidence...though I've offered the opinion before that DNA introduced a type of evidence that was new to genealogy and that required a different type of knowledgebase and skillset to effectively perform.

I'll rephrase your question a bit, just to make it a little more specific for my own purposes: "Can we verify that these two test takers, based on a given set of actual DNA results, are not related in the genealogical timeframe?"

As Judy Russell has written, "Negative evidence is the hardest type of evidence to understand or use in genealogical research. By definition, [it is] a 'type of evidence arising from an absence of a situation or information in extant records where that information might otherwise be expected...'"

That "might otherwise be expected" part is key. An example of simple deductive use of negative evidence that both Judy and Elizabeth Shown Mills have used is the Arthur Conan Doyle short story, "Silver Blaze." In it, a pivotal moment comes when Holmes realizes that a guard dog hadn't barked an alert at the scene of the crime:

   "Is there any point to which you would wish to draw my attention?"
   "To the curious incident of the dog in the night-time."
   "The dog did nothing in the night-time."
   "That was the curious incident," remarked Sherlock Holmes.

In the lab, most of what's done with DNA is inductive, based on testing and observation...though things like imputation can cross over to deduction. With deductive reasoning we're making an inference based on expectations or what we believe are widely accepted facts. Negative evidence--which is a perfectly valid form of evidence--is always deductive. We can't actually measure something that isn't there...just ask Schrödinger's cat. Ergo the old trope: absence of evidence is not evidence of absence.

Almost all types of evidence, either positive or negative, live on a sliding scale from "meaningless" to "wow, super-important." To make that a bit more meaningful, call it from 0 to 1, and all the fractions in between. Not meaning to drive things down a rabbit hole, but all the items of evidence can interact with each other. The evaluation of the weight of a piece of evidence is dynamic: its importance is relative to all the other bits of evidence available, and can change with the introduction or removal of a different piece (or pieces) of evidence.

Simple genealogy example. John Smith's will left a tract of land to his son, James Smith. If that's the only document we have, we rate it as a 1, "wow, super-important," and record James as John's son. Then a couple of years later we discover custodial papers from a county court that show John Smith adopting a James Jones, age 6, parents deceased. Very much changes the evidence weighting of John's will.

Still percolating on two of my lonely, remaining glial cells is the notion of adapting Bayesian reasoning and the statistical concept of "likelihood ratio" to assign dynamic, quantitative values to evidence weighting in genealogy. To describe levels of confidence in traditional genealogy, terms like "apparently," "possibly," and "probably" are used, and especially as DNA enters the picture those become a bit too ambiguous to be very useful, IMHO.
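For what it's worth, the likelihood-ratio idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_odds` and every number below are hypothetical, and multiplying the ratios together assumes the pieces of evidence are independent of each other, which real genealogical evidence often is not.

```python
def update_odds(prior_prob, likelihood_ratios):
    """Apply each likelihood ratio to the prior odds; return the posterior probability.

    A likelihood ratio (LR) is P(evidence | hypothesis true) divided by
    P(evidence | hypothesis false). LR > 1 supports the hypothesis, LR < 1
    counts against it. Assumes the items of evidence are independent.
    """
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical values, chosen only to mirror the John Smith example above.
prior = 0.5          # no opinion yet on "James is John's biological son"
lr_will = 9.0        # assumed: a will naming James as son is 9x likelier if he is
lr_adoption = 0.02   # assumed: an adoption record is far likelier if he is not

after_will = update_odds(prior, [lr_will])
after_both = update_odds(prior, [lr_will, lr_adoption])

print(f"after the will alone:      {after_will:.3f}")
print(f"after the adoption papers: {after_both:.3f}")
```

With these made-up inputs the will alone puts the hypothesis at 0.9, and adding the adoption record drops it to roughly 0.15. The point is only that the weight of the will is not fixed; it moves as other evidence arrives.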

"Negating evidence" is a legal term (considering tax day in the U.S. is next Monday, it might be worth knowing that the IRS uses the term in its Internal Revenue Manual, Part 9, "Criminal Investigation," Chapter 5, Section 9; ahem). But in the physical and life sciences--just like there is no ultimate "1" weighting designation because "when it comes to science, proving anything is an impossibility" (Ethan Siegel, writing for Forbes)--there's no "0" for impossible, either.

Now, I've used "negating evidence" as distinguished from "negative evidence" when it comes to genetic genealogy, but infrequently...and even then it's more like a value of 0.001 rather than 0. Heck, even actual forensic testing is almost never absolute; there are still deduction and preponderance of evidence considerations involved, despite what we see in reruns of Law and Order. An example of something that can put the kibosh on a full "0" with DNA is an allogeneic bone marrow transplant. There are ways to get around it in the lab, but our little microarray tests don't take those measures and a parent who has received bone marrow and T cells from another person may have an AncestryDNA report that shows, incorrectly, they aren't biologically related to the tested children.

In terms of "negating evidence," I reserve that for mismatches in high-level haplogroups. If two people take mtDNA tests and we see haplogroups of H and U, then we can negate the hypothesis that they share a matrilineal line ancestor in the genealogical timeframe. Likewise if one man tests as a R1b Y-haplogroup, and another is G: no patrilineal line ancestor in the genealogical timeframe.

But we have to be careful even with haplogroups when it comes to evaluating a common ancestor. Here's an ongoing, real-world example from one of my FTDNA Group Projects. Persons A, B, and C are thought, on paper, to descend from three different sons of the same man who was born circa 1650. Person A takes the Big Y test and the results show R1b with the following (partial) SNP-positive hierarchy: R-L151 > U106 > Z381 > Z301 > S1688 > FGC51534. Person B takes a Big Y test and matches A on all of those SNPs with three private variants; A and B are an STR genetic distance of 3 at 111 markers, 1 at 37 markers.

Person C then tests a single SNP, R-Z301. It tests negative. YFull estimates TMRCA for Z301 at 4,600 YBP. So done deal, right? C is not patrilineally related to A and B. Negating evidence.

But wait... C also took a 37-marker STR test. At that level he is a GD of 3 with B, and 4 with A. TiP report says over a 90% likelihood of CA at 12 generations. Note that this level of matching is perfectly acceptable per WikiTree guidelines to mark all three men as "Confirmed with DNA" back to that 1650 ancestor. And at 37 markers, C also has GD 2 and 3 matches with 5 other men who are also STR matches at that level to A and B. In fact, all of C's 37-marker matches are shared among A, B, and C.

Unfortunately, C passed away and we can't order another upgrade against his stored sample. But is it possible he could still be related to A and B just as the paper trail says? That the lab test for the single Z301 SNP was in error, or that C has a back-mutation at Z301 and that all the other SNPs would test positive? We don't know, but the possibility absolutely exists.

On the autosomal DNA side, the most recent data comes from Amy Williams at Cornell University. The most distant cousin that is expected to share DNA with you 100% of the time is 1C1R (also equivalent to a half 1st cousin). A 2nd cousin should share DNA with you 99.98% of the time...but that doesn't quite reach 100%. A half 1st cousin 1x removed, 99.97%; a 2C1R, 99.1%. So genealogically speaking, the farthest you can go with an Accuracy = almost "1" for negative evidence is the grandparents; fairly close to "1" is great-grandparents. However, we can't actually get to "1" because, again, we can never be 100% certain of the laboratory accuracy.
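To put rough numbers on how strong a "no match" result is, here is a minimal Python sketch using the detection rates quoted above. The function name, the 0.9 prior, and the simplification that unrelated people never show a match are all assumptions made for illustration; the sketch also ignores lab error entirely, which is exactly the caveat raised above.

```python
# Detection rates as quoted in the paragraph above: the share of true
# relatives of each type expected to show a detectable match.
DETECTION_RATE = {
    "2C": 0.9998,
    "half 1C1R": 0.9997,
    "2C1R": 0.991,
}


def posterior_related_given_no_match(prior, detection_rate,
                                     p_no_match_if_unrelated=1.0):
    """Bayes' rule for P(related | no detectable match).

    prior: probability, before testing, that the paper relationship is real
           (a hypothetical input, not something the DNA tells us).
    p_no_match_if_unrelated: assumed to be 1.0, i.e. unrelated people
           never match; lab error is ignored.
    """
    miss_rate = 1.0 - detection_rate
    numerator = prior * miss_rate
    denominator = numerator + (1.0 - prior) * p_no_match_if_unrelated
    return numerator / denominator


for relationship, rate in DETECTION_RATE.items():
    posterior = posterior_related_given_no_match(0.9, rate)
    print(f"{relationship}: P(related | no match) = {posterior:.4f}")
```

Even starting from a strong 0.9 belief in the paper trail, a non-match at 2C leaves well under 1% for the relationship under these assumptions, while at 2C1R it leaves several percent. That gap is why the weight of negative evidence falls off so quickly with distance.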

It isn't as difficult for positive evidence. If you share a couple of reasonably-sized segments for a total of, say, about 25cM, you're almost certainly related. Probably around 3C1R or 4C. Lab error isn't as much a factor because to arrive at that there have been a couple of thousand unique SNPs tested that matched along those segments.

Negative evidence needs to be evaluated much more stringently, and the constraints to those evaluations don't allow a great deal of leeway. In no small part because for genealogy the paper trail always has to drive the hypotheses. Back to the example of John Smith's will and the discovery that son James was adopted: what if five of James's descendants had been yDNA tested--thus seeming to show a consistent preponderance of evidence--and a sole descendant of John's other son Abraham had tested, but that descendant of Abraham was a different high-level haplogroup from the other five test-takers? If the adoption record had never been discovered, we might treat the test results of Abraham's descendant as negating evidence...when in fact he was the only one of John's biological descendants ever tested.

by Edison Williams G2G6 Pilot (434k points)
selected by Jonathan Crawford
Thank you Edison, selecting this as a best answer not to disparage all the other excellent responses, just that you do such a thorough job of it. Especially as you bring up the point of adoption (or other NPE) that would be unknown and yet explain the difference exactly, and without full testing on all branches really couldn't be confirmed.

For what it's worth, I tried to ask Schrodinger's cat, but for some reason it passed away when I did. Schrodinger seemed to think it was your fault for asking me to ask the cat.

I shall then try to collect the preponderance of evidence and note my methodology so that future researchers can deride my decision-making skills to my posthumous detriment.

Ha ha, Jonathan!

But Edison!  Of course you deserve the star, but I'll assume you had not seen Jonathan's recent comment that indeed his question was about a Y-DNA test.  So...well, I just wonder...is your answer still totally applicable, and/or could you maybe distill it for us?

Does Amy Williams really say 1C1R are guaranteed to match? I wouldn't think there is a biological mechanism that totally precludes this happening. Is it not just much, much closer to 100% than the 99.98% for 2C?

My understanding is that nobody has yet confirmed a case of certified biological 2C "in the wild" that fail to match. But I will be excited to read the first study that carefully does this.

Jonathan: Thanks for the best-answer star. Diving off on tangents like likelihood ratios and the IRS investigative manual doesn't help, but that's partly why I have fun on G2G.

Oh. And Erwin (as in Schrödinger) emailed me, said you two had spoken, and he implied I had indirectly caused the demise of his cat. I peeked inside the cat carrier, and the furball meowed at me. I gave Werner Heisenberg's cell phone number to Erwin and said he could take up any additional issues with Werner.

Julie: Um... Only 10% of that morass addressed autosomal DNA. I actually did a word count to check because I was wondering how it left the impression it was about atDNA. Others had mentioned autosomal DNA, and I felt it was a valid part of the discussion.

I did try to distill it. I'm trying to get better at providing a summary/answer in the first paragraph. But the question isn't exclusive to DNA, and there's no "this would be correct and that wouldn't be" answer. Negative evidence can be tricky and can get all epistemological and stuff. Judy Russell's 2016 webinar about negative evidence was 50 minutes long, not counting the Q&A period. I came in way under that.

Barry: Yeah; the splitting of hairs was endeavored. It was really just to make the point that negative evidence, for our purposes, has to be evaluated more stringently than positive evidence. Can't really apply a Bayesian approach that uses the number of false positive results as quantification because in most cases we can't know what's a false positive and what isn't...until we make a final decision about which bits of negative evidence are valid or not.

Amy's post, the one for popular consumption, considered only segments that were 7cM or larger from the simulations. Of note, too, is that up to 1C1R/h1C all matches would be expected to consist of five or more separate segments, while at 2C we start to see some matches with fewer segments: 3.13% with four segments, and 1.45% with three segments, for example. The paper this was based on (Williams, Caballero, et al. "Crossover Interference and Sex-Specific Genetic Maps Shape Identical by Descent Sharing in Close Relatives." PLOS Genetics 15, no. 12, December 20, 2019: e1007979) counted all segments, regardless of length.

Earlier research by Brenna Henn that I've referenced often (Henn, et al. "Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples." PLOS ONE 7, no. 4, April 3, 2012: e34267) arrived at slightly different numbers, with 2nd cousins showing a detectable match 100% of the time. That paper didn't break things down to 1x relationships, but pegged 3rd cousins at an 89.7% chance of detectable sharing, while Williams's simulations, with the 7cM threshold, show a slightly higher chance at 91.7%.

So as a practical matter, is 2C the realistic cutoff for atDNA as solid negative evidence? Probably. Maybe even 2C1R. All that said, even 3C1Rs not matching is negative evidence. But it's evidence I personally wouldn't weight heavily enough to base a decision upon without additional information.

At the core of all this, really, is that "Confirmed with DNA" status and its formulaic accompanying statement. Genetics simply isn't a binary yes/no in the vast number of instances, and approaching it that simplistically flies in the face of the Genealogical Proof Standard. Any DNA testing presents evidence in one form or another...well, any credible DNA testing. And that evidence--positive or negative--can and should be presented as part of the genealogical research and analysis.

I fully understand and appreciate the need to have an easily understood, repeatable DNA "proof" statement for use on WT profiles. But that doesn't make it accurate, or right, in any but a limited number of instances.

+6 votes
I would place information like that in the research notes section on the profile. That way someone else who decides to look into the person/family can see that people were incorrectly added to be related to each other and exercise more caution while researching.
by Paul Kerbow G2G6 Mach 1 (15.3k points)
I think I am looking for a specific way to say "here's the proof". We have a very specific DNA Confirmation statement format, because it has to state the accurate information the right way. There has to be a way to say "X is confirmed to NOT be related to Y because <some sort of DNA triangulation statement similar to the DNA confirmation>"

What kind of evidence would be needed in order to do that? Confirmation to how many other documented relatives, at what distance, sharing what centiMorgans or whatnot in common with each other, but NOT with the person in question.

Have a look here: https://dnapainter.com/tools/sharedcmv4

I agree that you could write something in a Research notes subsection if something could be proved in your case.

"What kind of evidence would be needed in order to do that? Confirmation to how many other documented relatives, at what distance, sharing what centiMorgans or whatnot in common with each other, but NOT with the person in question."

Creating such a statement would be much more complicated than a statement proving relationships, especially for autosomal DNA.  Absence of triangulations might be strongly indicative that a relationship might not exist, but it wouldn't be enough to make the statement that two people definitely couldn't be related.

We have a similar situation in our Mitchell Y-DNA tested line: 10 of us are connected by Y-DNA and our paper trails. We have an 11th who has some faulty research and is deceased, and is connected incorrectly. I've connected his DNA to the line he shows in and have written to the PM to let her know, but have had no response. I think posting in the Research Notes section may be a good idea.
There are multiple accounts with my name and DNA on them that are not run by me, so this could be the case with this person.
+5 votes
It seems to me that type of statement would have very limited applicability.  You've linked the profile of David Crawford, who was born in 1662.  What are you hoping to demonstrate?

I'm trying to imagine cases where DNA could disprove a relationship.  It could prove or disprove a parent, if the parent and child had both tested.  It could disprove a paternal line through a Y-DNA test or maternal line through an mtDNA test, but I'm not sure how easily you could identify the point in the line where the NPE occurred.  

One thing that might work is to use a DNA confirmation (the positive version) to "prove" a relationship, thus demonstrating that an alternative relationship was impossible.  Presumably, you'd disconnect the impossible parent(s), and add a research note explaining why.
by Living Kelts G2G6 Pilot (545k points)
Thanks Julie, "disprove a paternal line through a Y-DNA test" is exactly what I'm thinking of. Documented line at the moment goes to David, but I'm the only Y-tester on my leg of that stool, and I'm haplogroup R as opposed to David's other descendant testers who are I2. If I can get lucky and find another tester in the generations below I can figure out where in the interim an NPE would have occurred, or where the lineage needs to be disconnected and realigned.

I also like the idea of finding the confirmation of the alternative relationship, and using that as an argument, but of course that involves finding that relationship.
Hey, Jonathan.  I see you have a long answer from Edison, so let's read that and see what we think!
+5 votes

That depends on the number of generations that the person is away from the DNA tested people.

It's very easy to disprove lineage at the close family level. Say you expect your parent to match 50% but you find 0%. Or for your Grandfather (who shares 0% instead of about 25%) or for your 1st cousin (with whom you share a Grandparent) who shares 0% when it should be 7-15% or thereabouts.
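As a rough check on those percentages, the average expected sharing halves with each generational step. A small Python sketch, with function names that are purely illustrative; these are textbook averages for autosomal DNA, and real pairs vary around them (which is why first cousins show up as a range and not as exactly 12.5%):

```python
def expected_share_ancestor(generations):
    """Average fraction shared with a direct ancestor.

    generations=1 is a parent, 2 a grandparent, and so on.
    """
    return 0.5 ** generations


def expected_share_cousin(degree, removed=0, half=False):
    """Average fraction shared between cousins.

    degree=1 is first cousins, degree=2 second cousins; `removed` is the
    number of generations removed; half=True means only one shared
    ancestor instead of an ancestral couple.
    """
    share = 0.5 ** (2 * degree + 1 + removed)
    return share / 2 if half else share


print(f"parent:       {expected_share_ancestor(1):.2%}")
print(f"grandparent:  {expected_share_ancestor(2):.2%}")
print(f"1st cousin:   {expected_share_cousin(1):.2%}")
print(f"2nd cousin:   {expected_share_cousin(2):.2%}")
```

Seeing 0% where the average is 50%, 25%, or 12.5% is a very large departure from expectation, which is what makes the close-family cases easy.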

But for anything further away than 2C, I think you can't make such a statement. For a 2C, the reported centiMorgans are 49-592 as per the Shared cM Project. Treat that range with care, as it's based on user-reported numbers, but it still means there must be DNA shared.

As per Graham Coop's estimates, for 6 generations there should be some shared DNA; see "How much of your genome do you inherit from a particular ancestor?"

But from 7 generations onwards (actually it looks even possible for 6 generations if it was along the all female lines in both DNA matches) it's clear that 0% shared DNA at the 7 cM minimum threshold can happen.

Beyond that, it can (and will) happen that you no longer see any shared DNA above the usual 7 cM minimum threshold that most DNA companies use.

by Andreas West G2G6 Mach 7 (74.8k points)


...