Can we get relaxed DNA Confirmation rules when using at least one phased kit?

I'd like to see the DNA confirmation guidelines updated to give a more relaxed standard when using phased DNA.

For example, right now, you have to be third cousin or closer to not require triangulation to proceed.

It is my belief that if one of the two kits involved has been phased (e.g., my mom is tested, so I've phased my DNA to get a paternally phased kit that I'm trying to match a paternal cousin with) then we should be able to relax this to at least third cousin once removed, possibly to fourth cousin or closer before triangulation is required.

in Policy and Style by William Foster G2G6 Pilot (111k points)

Wow William,

Sorry that it's been so long since you asked this and so long to get someone to answer you.

I understand completely what you are saying and why you are saying it.

Can you come up with a suggested wording for this? How would we add it to our help pages? Would it be in a more advanced section? How would we keep members who are not familiar with phasing from accidentally using this method?

I am going to ask Emma to step in to help with this discussion as she is the Project Coordinator for the DNA Educators...

by Mags Gaulden G2G6 Pilot (574k points)
I would like to know how small the matching segment(s) could be and remain reasonably sure it is a real match? (IBD).

Amy has phased results using both parents

Bertha has phased results using only one parent

Charles has phased results using both parents

David has phased results using only one parent

I feel confident that if Amy and Charles were known 6th cousins and shared a matching segment then you would not need triangulation. But how small could that shared segment be? 3 cM?

How much would the confidence level change if you were looking at a match between Bertha and David, or Bertha and Charles?

Two comments. First, running computational phasing with one parent tested can significantly help research matches and align them to the correct maternal or paternal lines. However, we have no good data that indicate computational phasing can or should eliminate any need for triangulation, or that very small segments can be taken at face value even with both parents phased.

Perhaps the best numbers we have for this come from a compilation by Tim Janzen of data collected by John Walden. The information looked at segment "survivability" from phasing, meaning whether or not phasing showed a segment to be invalid because it did not match the parents.

  • 9cM segments: proved false 15% of the time when one parent was phased; with full trio phasing they proved false 20% of the time.
  • 8cM segments: proved false 22% of the time when one parent was phased; with full trio phasing they proved false 38% of the time.
  • 7cM segments: proved false 37% of the time when one parent was phased; with full trio phasing they proved false 58% of the time.
  • 6cM segments: proved false 58% of the time when one parent was phased; with full trio phasing they proved false 74% of the time.
  • 5cM segments: proved false 71% of the time when one parent was phased; with full trio phasing they proved false 86% of the time.
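For anyone who wants to line these figures up side by side, here is a minimal Python sketch that restates the list above as a lookup table and derives the share of segments that were not proved false. The percentages are copied straight from the list; the names `FALSE_RATE_PCT` and `not_proved_false` are made up for illustration. Keep in mind that "not proved false" is not the same thing as "proved IBD."

```python
# Percent of segments proved false by phasing, copied from the list above.
# Key: segment size in cM -> (one parent phased, full trio phased).
FALSE_RATE_PCT = {
    9: (15, 20),
    8: (22, 38),
    7: (37, 58),
    6: (58, 74),
    5: (71, 86),
}


def not_proved_false(size_cm: int) -> tuple[int, int]:
    """Return the percent of segments that survived phasing (one parent, trio)."""
    one_parent, trio = FALSE_RATE_PCT[size_cm]
    return 100 - one_parent, 100 - trio


if __name__ == "__main__":
    print("cM  one-parent survived  trio survived")
    for size in sorted(FALSE_RATE_PCT, reverse=True):
        one, trio = not_proved_false(size)
        print(f"{size:>2}  {one:>18}%  {trio:>12}%")
```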

Second, with segment size, we again come to the fact that centiMorgans are mathematical estimates of genetic relationship based upon assumptions of gametogenic crossover points along each chromosome. The cM isn't a physical measurement, and what's reported is never exact: in part because of the nature of the linear equations involved; in part because of the objective of indicating a 0.01 probability of crossover at any given section of any chromosome; in part because the tested SNPs amount to only about 0.02% of the base pairs in the genome, so the physical segment start and end points are determined by the tested SNPs, not by the base pairs that might, in fact, match or not match; in part because some level of SNP mismatch is typically allowed by the reporting entity (e.g., the 1:300 error rate allowed by 23andMe or the SNP mismatch bunching limit allowed at GEDmatch), and in SNP-poor chromosomal regions the possibility exists that there may be many thousands of base pairs within a reported segment that don't actually match; and in part because the female and male genomes do not look at all the same when it comes to centiMorgan computation.

Since females experience crossover about 40% more frequently than males during gametogenesis, their genome maps in terms of centiMorgans look markedly different. All we see reported from the testing companies is a sex-averaged value; that's really all they can tell us. But in the realm of very small segments, that breaks down to a point where a sex-averaged evaluation is basically useless. Here's one real-world example that I happen to have at hand:

Chromosome 9 from bp 32,216,761 to 53,574,578. The sex-averaged value is 7.56cM. On the female genome, it works out to 13.6cM. But on the male genome, that segment is only 1.52cM. I doubt anyone feels we should be working with 1.5cM segments.
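The three figures in that example are consistent with the sex-averaged value being the plain mean of the female and male map lengths. Here is a quick Python check using only the numbers quoted above. The function name is hypothetical, and treating the reported figure as a simple mean is an assumption, not a statement about how any testing company computes it.

```python
def sex_averaged_cm(female_cm: float, male_cm: float) -> float:
    """Simple mean of the female and male genetic map lengths for one segment."""
    return (female_cm + male_cm) / 2


# Chromosome 9, bp 32,216,761 to 53,574,578: figures from the example above.
female_cm = 13.6
male_cm = 1.52

average = sex_averaged_cm(female_cm, male_cm)
ratio = female_cm / male_cm

print(f"sex-averaged: {average:.2f} cM")
print(f"female map is about {ratio:.1f}x the male map for this segment")
```

The mean comes out at 7.56 cM, matching the reported sex-averaged value, while the female figure is roughly nine times the male one for the same stretch of base pairs.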

I don't believe this is simply a matter of revising text on a Help page.


You know I like those awkward sort of questions and there is something in this work that is another itch that needs to be scratched. Picking just on the lowest set of "matches" from the Janzen/Walden list, 71% of matches are proved false by single parent phasing.

This does not mean that 29% are true, but that 29% have not been proved false.

Of those 29% how many are actually true matches?

To my mind, in a generally related population with a generally related background genome, the number of false matches in that 29% is likely to be some number less (but maybe not much less) than 71%. That may of course be quite different if two of the grandparents were from different population backgrounds.

Have there been any 3 generation studies that would either confirm or refute this notion?

I agree with you, Derrick. At least to my knowledge, there have been no three-generation studies--yet--regarding segment phasing survivability. And that's all the numbers from Tim Janzen represent: phasing survivability, not suitability for matching or genealogical usefulness. An absolutely valid IBD segment may prove to be impossible to positively source to a specific MRCA in the genealogical timeframe. The limited number of population-level pile-up regions we know about today are correctly referred to as "regions of excess IBD sharing." The DNA segments are quite valid; they're just shared by umpteen people of a similar continental ancestry. But "pile-up region" is a lot easier to write than "region of excess IBD sharing."

When Tim reported those results in 2014 at the annual FTDNA conference in Houston, fewer than 3 million DTC autosomal DNA tests had been purchased. As of last April, we were at about 17 to 18 million. Big difference. We can only hope that increased time since DTC testing began and the massive increase in number of tests taken will give us that kind of multi-generational data in the future. Where's the fingers-crossed emoticon...

BTW, if you go looking for Tim's 2014 presentation as described in the source citation on the ISOGG Wiki page titled "Identical by Descent," you'll find that the URL is no longer valid. I've dropped Tim an email to see if we might find a solution to get his important presentation back online so I can correct that citation at ISOGG.

Gotta give kudos to Dr. Janzen for amazing responsiveness. I've already corrected the citation at the ISOGG Wiki with a new URL to his 2014 presentation:

Love it Edison and thank you! Mags
WikiTree’s GEDmatch ID field would need to allow for two more characters (it currently allows for 8-character Genesis IDs). See for example

Phased GEDmatch ID

  • Paternal Phased ID - PA414581P1
  • Maternal Phased ID - PA414581M1

My Evil Twin

  • Paternal Phased ID - PA414581P2
  • Maternal Phased ID - PA414581M2

If a DNA tester says they have phased results, then instead of one GEDmatch ID field they should have two ten-character fields (one for Paternal Phased ID and another for Maternal Phased ID).

If a DNA tester says they have also created a My Evil Twin, then they should have two additional ten-character fields.

On second thought, some people may (for some reason) want to also keep their GEDmatch ID.
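To make the field proposal concrete, here is a rough Python sketch of what such a record might look like. Everything in it is hypothetical: the class and field names are invented, it is not WikiTree's actual data model, and the only validation is the character counts mentioned above (8 for the existing ID, 10 for the phased and My Evil Twin IDs).

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_ID_LEN = 8   # length the existing field allows, per the post above
PHASED_ID_LEN = 10    # length of the phased example IDs, e.g. PA414581P1


@dataclass
class GedmatchIds:
    """Hypothetical set of GEDmatch ID fields for one DNA tester."""
    standard: Optional[str] = None            # the tester may want to keep this
    paternal_phased: Optional[str] = None
    maternal_phased: Optional[str] = None
    evil_twin_paternal: Optional[str] = None
    evil_twin_maternal: Optional[str] = None

    def problems(self) -> list[str]:
        """Return one message per filled-in field whose length is unexpected."""
        expected = {
            "standard": STANDARD_ID_LEN,
            "paternal_phased": PHASED_ID_LEN,
            "maternal_phased": PHASED_ID_LEN,
            "evil_twin_paternal": PHASED_ID_LEN,
            "evil_twin_maternal": PHASED_ID_LEN,
        }
        messages = []
        for name, length in expected.items():
            value = getattr(self, name)
            if value is not None and len(value) != length:
                messages.append(
                    f"{name}: expected {length} characters, got {len(value)}"
                )
        return messages


ids = GedmatchIds(paternal_phased="PA414581P1", maternal_phased="PA414581M1")
print(ids.problems())
```

Leaving every field optional reflects the point that a tester might have phased IDs, My Evil Twin IDs, a plain GEDmatch ID, or any mix of them.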

One additional issue is a female’s X-DNA.  auDNA tested females who have tested a parent would/could discover which of their two X’s was from their father, but the auDNA tested female’s X-DNA ancestor tree would be for a female (e.g. )  and not a male (since males have fewer ancestors who could have contributed to their X).

by Peter Roberts G2G6 Pilot (592k points)
edited by Peter Roberts
This is an interesting question indeed. I’m not sure how to word it, but I’d be interested in this being implemented.
by Anonymous Younger G2G6 Mach 1 (13.5k points)
I don't think it is that simple, because the triangulation performs two quite different functions:

  1. It improves the likelihood of rejecting non-IBD segment matches. I have not seen any calculation comparing the probability of a false positive from triangulation versus a false positive from comparing a phased DNA result with an unphased one.
  2. Triangulation provides additional validation of the direction to the MRCA. Phased DNA, on the other hand, offers no help to reduce the chance of unnoticed pedigree collapse.

My feeling is that there is still the need to have a lower-grade "supported by DNA" rating. Then there would be a clear case for using two-way, or multiple two-way, matches and setting different limits for phased and unphased DNA.

by Cameron Davidson G2G5 (5.1k points)
The current system we have is already widely misinterpreted and often used without full understanding. I think it likely that a two tier system would be even more liable to misinterpretation, much as I like the concept that most of what we have is supported by the DNA evidence rather than being confirmed by it (already voiced in a separate G2G discussion).

Is this a case where the degree of support given by the DNA should be noted in the biographical text or does that just get to be too cumbersome?
Certainly the same evidence should be placed in the DNA section irrespective of any labelling.  I just think a two-tier system might make people pause to consider which of two options to choose. It is an open question whether there would be enough improvement to justify the programming effort.
Perhaps the two-tier system could be hidden from most folks, with the second-tier edit capability revealed in the interface only after a member has demonstrated understanding of the system, determined either by someone awarding a badge or by some number of edits?
That's an interesting idea. Perhaps it should be amplified to include any DNA certification?

Maybe no more than in the same way that we handle pre-1700 entries and pre-1500 entries, but even that might just help.

May seem a bit elitist, but if we require such rudimentary self-certification for early data entry, then maybe we should have something similar for DNA.
I have asked this question before and feel wholeheartedly that the cM for phased matches can be lower for a given level of certainty. That said, it is still difficult to be certain that a match - phased or not - is through a particular MRCA and not some other ancestor common to all test takers.  Therefore, we need to move forward carefully with this question.

This question, and some other DNA confirmation questions should be collected, discussed with the broader genetic genealogy world, and incorporated, refined, or rejected in accordance with their respective merits.

Please give us your name, anonymous, and join WikiTree if you haven't already!

"This question, and some other DNA confirmation questions should be collected, discussed with the broader genetic genealogy world, and incorporated, refined, or rejected in accordance with their respective merits."

I've been wavin' that particular flag for over a year...and expended thousands of words in the effort. Would love to have some company. I'll even buy you a virtual glass of wine...

