Math doesn't work on Ancestry DNA calculations of shared cMs

+15 votes
724 views

As you know, Ancestry is now giving subscribers more information about the longest segment shared between matches.

I have seen a few matches now that go something like this one:

Predicted relationship: 5th–8th Cousin

Shared DNA: 6.7 cM across 1 segments

Longest Segment: 17 cM

Is it the effect of Timber, or what? I'm scratching my head over how there can be a longest shared segment of 17 cM when the shared DNA is only 6.7 cM.

Cheers

Shirlea

in The Tree House by Shirlea Smith G2G6 Pilot (304k points)
edited by Ellen Smith

Hmmm, they have this kinda high-handed explanation:

In some cases, the length of the longest shared segment is greater than the total length of shared DNA. This is because we adjust the length of shared DNA to reflect DNA that is most likely shared from a recent ancestor. Sometimes, DNA can be shared for reasons other than recent ancestry, such as when two people share the same ethnicity or are from the same regions.

2 Answers

+22 votes
 
Best answer

I'm with ya, Shirlea. Ancestry could have vastly improved that wording; I have no doubt that everyone who's looking at the new display of "Longest Segment" for the first time is thoroughly baffled. Seems to me like the simple addition of a couple of words would help. Maybe like:

  • Genealogically Significant Shared DNA: 6.7 cM across 1 segments
  • Longest Overall Segment: 17 cM

May still not be clear enough, but you're absolutely correct: when you see that odd disparity between "Shared DNA" and "Longest Segment," it's the result of Ancestry's application of its proprietary Timber algorithm to "assess informativeness of matches for relationship estimation." Their new July 2020 "AncestryDNA Matching White Paper" is here if you want to browse through it (see section 4 for their description of Timber; and BTW, this is the most forthcoming they've ever been about the calculation method itself): https://www.ancestrycdn.com/support/us/2020/07/2020whitepaper.pdf.

So, yeah: in essence, what the numbers you noted mean is, "Hey! Good news! We found a single 17cM IBD segment in common between the two of you, but we've decided that only 6.7cM of it is valid when we're talking about genealogy. The rest of the 17cM segment occurs too frequently among our other samples of a similar haplotype (which means a global-population or haplotypic pile-up region), or it may include areas of protein-coding genes that don't change much from one generation to another, or it's where we see way more DNA sharing than we would otherwise expect, showing up in areas where there isn't a lot of usable SNP density, like some areas at the chromosomal ends (the telomeres) or close to where the pair of chromosomes join together (the centromere). So your important, genealogically meaningful match is only 6.7cM of what, in the results, otherwise looks like a 17cM segment."

Which is just a tad too long to show on your matching page. But there really is a method to the head-tilting confusion. And it's one that, from the standpoint of genealogical accuracy, I can't really quibble with. I do, though, still sorely wish that Ancestry would display all that detail to us: actual segment start/stop loci; which portions of a segment determined to be IBD were excluded from matching and why (at least a few general "why" categories); and the ability to compare, with that detail, more than two individuals at a time. A chromosome browser would be nice, but I personally never bother much with the graphics anyway.
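To put the 17 cM versus 6.7 cM arithmetic in concrete terms, here's a toy sketch in Python. It is not Ancestry's actual Timber code, which is proprietary. The window boundaries, the weights, and the `adjusted_length` helper are all made up for illustration. The only idea borrowed from the description above is that parts of a detected segment get discounted before the "Shared DNA" figure is reported.

```python
# Toy illustration only. The real Timber weights and windows are not public;
# every number below is hypothetical.

def adjusted_length(windows):
    """Sum each window's length in cM scaled by its weight in [0, 1]."""
    return sum(length * weight for length, weight in windows)

# One detected 17 cM segment, split into (length_cM, weight) windows.
segment = [
    (6.7, 1.0),  # ordinary region, counted in full
    (7.3, 0.0),  # hypothetical pile-up region, fully discounted
    (3.0, 0.0),  # hypothetical low-SNP-density region, fully discounted
]

raw_cm = sum(length for length, _ in segment)   # what "Longest Segment" would report
shared_cm = adjusted_length(segment)            # what "Shared DNA" would report

print(f"Longest Segment: {raw_cm:.0f} cM")
print(f"Shared DNA: {shared_cm:.1f} cM across 1 segment")
```

Fractional weights between 0 and 1 would behave the same way: the adjusted total can only shrink relative to the raw segment length, which is why the "longest segment" can end up larger than the "shared DNA."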

Ancestry does genotype-based computational phasing, and they attempt to eliminate blocks of DNA that would be useful for anthropology and population genetics but that aren't meaningful within the reach of autosomal DNA for genealogical matching. They're the only company that does that, and as a result I tend to trust their final matching more than elsewhere.

That's one failing we as a community currently have: GEDmatch does no culling of SNPs (other than, I believe, avoiding a very few SNP-poor areas on certain chromosomes) and they do no imputation as a method to improve validation of IBD segments. Which is fine, except...that leaves everyone to their own devices to collate and analyze the resultant data. And among all the matching and autosomal triangulations we see, my bet is that very, very few people take the time to compile personal haplotypic pile-up charts and research the chromosomal start and stop loci to identify exonic areas, areas we know contain a significant proportion of protein coding genes (about 20% of the SNPs examined by our current tests are specifically targeting clinically-informing portions of the exome) and exclude or at least de-prioritize those pile-up and coding sections from evaluations.

Ancestry does do that for us, but they won't show us the details under the hood so that we can use their data for our own research and triangulations.

Edited: One little typo between "IBS" and "IBD" makes a big difference. <grumble>

by Edison Williams G2G6 Pilot (465k points)
selected by Darlene Athey-Hill
Are they filtering out segments under 7.9 cM, or matches under 7.9 cM? I am under the impression that you also need to save someone who matches at, say, 14cM on two segments.

What they said was “... you’ll no longer see matches (or be matched to people) that share less than 8 cM with you.”

Ah, okay, then Roberta Estes was being overly cautious in her original description of her plan for dealing with the changes.

Well, I'm not entirely certain. What Darlene quoted comes straight from Ancestry's "DNA match updates coming in August" banner info that's been in place over a month now. In fact, here's a rundown of every single place in those FAQs that they mention the centiMorgan change:

  • "...You'll no longer see matches or be matched to people who share 7.9 cM or less DNA with you..."
  • "Very distant matches--those who share 7.9 cM or less DNA with you--will no longer appear in your DNA match list or in ThruLines..."
  • "We are waiting until the end of August to remove very distant matches who share 7.9 cM or less DNA with you..."
  • "If you've added any notes about distant DNA matches who share less than 8.0 cM of DNA with you, those DNA matches and their notes will remain in your DNA match list."
  • "We've changed the amount of DNA you need to share to be considered a match with another individual to 8.0 cM."

Not a peep about matching segment versus total matching amount, is there? Everything is worded, to me, as implying the total match.

However...

If you check Ancestry's July 2020 white paper, which is the precursor for "The Culling" (yeah; I wouldn't trademark that even if I could) and look at their five-step process description on page 12, it shows that they go through both BEAGLE and Timber first, then:

  • "4: Calculate the length of the candidate matching segment in terms of genetic distance, measured in centimorgans (cM)."
  • "5. If the segment is longer than 8 cM, we retain the segment to store as a match in our database, unless we dismiss it as identity by state."

That sure as shootin' reads to me that any segment smaller than 8cM is not retained.
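The two readings give different answers for exactly the kind of match Barry asked about, so here's a small Python sketch of the difference. The function names and the sample matches are hypothetical, and whether Ancestry applies the cutoff per segment, per total, or both is precisely the open question; this only shows what each reading would imply.

```python
THRESHOLD_CM = 8.0  # cutoff quoted in the FAQ and the white paper

def kept_by_total(segments_cm, threshold=THRESHOLD_CM):
    """Reading 1 (FAQ wording): keep the match if the total cM reaches the cutoff."""
    return sum(segments_cm) >= threshold

def kept_by_segment(segments_cm, threshold=THRESHOLD_CM):
    """Reading 2 (white paper step 5): keep only segments longer than the cutoff,
    so the match survives only if at least one such segment exists."""
    return any(seg > threshold for seg in segments_cm)

# Hypothetical matches, each given as a list of segment lengths in cM.
examples = {
    "two 7 cM segments (14 cM total)": [7.0, 7.0],
    "one 9 cM segment": [9.0],
    "one 6.7 cM segment": [6.7],
}

for label, segs in examples.items():
    print(f"{label}: total rule={kept_by_total(segs)}, segment rule={kept_by_segment(segs)}")
```

The 14 cM match on two 7 cM segments is the only one of the three where the rules disagree: it stays under the total rule and vanishes under the segment rule.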

I have a couple of ThruLines paths that obviously are the result of fictitious, fabricated trees, but otherwise I just marched through the very distant ones and placed everyone in a Group named "Uncertain Small Matches." I "grouped" around 50. Offhand I don't remember any of them being more than a single segment.

But... Now that Barry's brought it up, I might just mosey back over there--assuming "The Culling" hasn't happened yet--and check a couple of levels higher, like 3rd and 4th cousins, where multiple segments are more likely.

Clearly, though, the fact that we're still talking about this is evidence that Ancestry hasn't done the world's best job of communicating all the details. We all know that 95% of their subscribers won't even know it happened and won't care. But that's no excuse not to provide accurate, unambiguous, and thorough communication to everyone. Their most evergreen subscriber revenue stream is firmly seated in that other 5%. I figure that Darlene and I, just the two of us over the past decade, represent almost $10,000 in revenue.

Hopefully--I know it's a big ask, but still--maybe Blackstone will do some housecleaning and institute a shift in the management culture at Ancestry.

Come to think of it, I wonder if my "Ancestry Insider" invitation questionnaire bombed out where it did only because that's when their software finally identified me as a genetic genealogy public rabble-rouser?

I'm telling you, Edison, you not only are extremely informative but immensely entertaining!

Well, thank you very much! I'll be here all week. No cover charge for the 7:00 p.m. weekday shows. And please be sure to tip the wonderful waitstaff!

I got the Insider invitation, too, and was deemed unqualified after they asked if I had ever worked for the news media.

Well, I guess now we know what to say if we want to be selected. Tell 'em we're newbies and clueless!

The Culling has happened to my account.

Well, I'm sure confused.

I can see that I no longer have any matches shown with a total below 8 cM. However, one of my recent new matches has a total of 8.2 cM in 2 segments, with a longest segment of 9 cM.

Did the programmers not read the white paper?

+8 votes
If a single segment of 17 cM is significant, then it's only rational to include that entire segment in the total DNA calculation. That 17-cM segment can't be significant for one calculation but not for the other.

Ancestry's DNA algorithms are completely non-transparent and unjustifiable. Their recent revision to show the longest segment is a poor substitute for a real chromosome browser, such as those provided by 23andMe and FamilyTreeDNA.

Moreover, Ancestry still does not report X-DNA matches, which can be quite illuminating and even critical at times.

Anyone who has tested their DNA with Ancestry should upload their raw DNA data to GEDmatch.com and possibly to FamilyTreeDNA.com, where they can find a chromosome browser and view their actual DNA data.
by Bill Vincent G2G6 Pilot (178k points)


...