Math doesn't work on Ancestry DNA calculations of shared cMs

+13 votes
573 views

As you know, Ancestry is now giving subscribers more information about the longest segment shared between matches.

I have seen a few matches now that go something like this one:

Predicted relationship: 5th–8th Cousin

Shared DNA: 6.7 cM across 1 segments

Longest Segment: 17 cM

Is it the effect of Timber, or what? I'm scratching my head over how there can be a longest shared segment of 17 cM when the shared DNA is only 6.7 cM.

Cheers

Shirlea

in The Tree House by Shirlea Smith G2G6 Pilot (284k points)
edited by Ellen Smith

Hmmm, they have this kinda high-handed explanation:

In some cases, the length of the longest shared segment is greater than the total length of shared DNA. This is because we adjust the length of shared DNA to reflect DNA that is most likely shared from a recent ancestor. Sometimes, DNA can be shared for reasons other than recent ancestry, such as when two people share the same ethnicity or are from the same regions.

2 Answers

+20 votes
 
Best answer

I'm with ya, Shirlea. Ancestry could have vastly improved that wording; I have no doubt that everyone who's looking at the new display of "Longest Segment" for the first time is thoroughly baffled. Seems to me like the simple addition of a couple of words would help. Maybe like:

  • Genealogically Significant Shared DNA: 6.7 cM across 1 segments
  • Longest Overall Segment: 17 cM

May still not be clear enough, but you're absolutely correct: when you see that odd disparity between "Shared DNA" and "Longest Segment," it's the result of Ancestry's application of its proprietary Timber algorithm to "assess informativeness of matches for relationship estimation." Their new July 2020 "AncestryDNA Matching White Paper" is here if you want to browse through it (see section 4 for their description of Timber; and BTW, this is the most forthcoming they've ever been about the calculation method itself): https://www.ancestrycdn.com/support/us/2020/07/2020whitepaper.pdf.

So, yeah: in essence, what the numbers you noted mean is, "Hey! Good news! We found a single 17cM IBD segment in common between the two of you, but we've decided that only 6.7cM of it is valid when we're talking about genealogy. The rest of the 17cM segment occurs too frequently among our other samples of a similar haplotype (which means a global-population or haplotypic pile-up region), or it may include areas of protein-coding genes that don't change much from one generation to another, or it falls where we see way more DNA sharing than we would otherwise expect, in areas where there isn't a lot of usable SNP density, like some areas at the chromosomal ends (the telomeres) or close to where the pair of chromosomes join together (the centromere). So your important, genealogically meaningful match is only 6.7cM of what, in the results, otherwise looks like a 17cM segment."

Which is just a tad too long to show on your matching page. But there really is a method to the head-tilting confusion. And one, from the standpoint of genealogical accuracy, I can't really quibble with. I do, though, still sorely wish that Ancestry would display all that detail to us: actual segment start/stop loci; which portions of a segment determined to be IBD were excluded from matching and why (at least a few general "why" categories); and the ability to compare, with that detail, more than two individuals at a time. A chromosome browser would be nice, but I personally never bother much with the graphics anyway.

Ancestry does genotyped, computational phasing, and they attempt to eliminate blocks of DNA that would be useful for anthropology and population genetics but that aren't meaningful for the reach of autosomal DNA for genealogical matching. They're the only company that does that, and as a result I tend to trust their final matching more than elsewhere.

That's one failing we as a community currently have: GEDmatch does no culling of SNPs (other than, I believe, avoiding a very few SNP-poor areas on certain chromosomes) and they do no imputation as a method to improve validation of IBD segments. Which is fine, except...that leaves everyone to their own devices to collate and analyze the resultant data. And among all the matching and autosomal triangulations we see, my bet is that very, very few people take the time to compile personal haplotypic pile-up charts and research the chromosomal start and stop loci to identify exonic areas, areas we know contain a significant proportion of protein coding genes (about 20% of the SNPs examined by our current tests are specifically targeting clinically-informing portions of the exome) and exclude or at least de-prioritize those pile-up and coding sections from evaluations.

Ancestry does do that for us, but they won't show us the details under the hood so that we can use their data for our own research and triangulations.

Edited: One little typo between "IBS" and "IBD" makes a big difference. <grumble>

by Edison Williams G2G6 Pilot (441k points)
selected by Darlene Athey-Hill

What is the genealogical use of knowing the longest segment? My understanding is that it is mainly useful for people from highly endogamous populations. But for them, I would assume there needs to be a higher bar in general for pile-up regions. Is Timber customized to recognize people from highly endogamous populations and set that bar higher? I am not from a recent highly endogamous population. While it is hard for me to see a reason to want this information, it seems even more dubious that I would want to know it if they have left in stuff that is not genealogically useful.

My only thought about a genealogical reason to present the longest segment unTimbered is this: it helps us see if Timber cut out the middle of a long segment instead of cutting off the end. If you have a total of 12cM over 2 segments, that's all we'd have known before. But now it might show the longest segment as 8cM, say because there was an 8cM and a 6cM and Timber had cut out 2cM from the 8cM segment. In this case, it is likely that the two segments were even from different chromosomes. But if your 12cM over 2 segments is reported with a longest segment of 14cM, you know that the 2 segments were really part of one long segment. Then if I happen to find a match who shares lines going back to different sets of recent common ancestors, in the former case I would know it is possible I received one segment down one line and one down the other, while in the latter I would know the single segment could only have come from one of the pairs. That seems to me of such infrequent utility that it would seem not to be worth the clutter and confusion their new reporting is causing.

Yep. Agree wholeheartedly. And with closer cousinships, those with several longish segments anyway, the "longest un-Timbered segment" really can't tell us anything at all.

When Ancestry first announced they were doing this, I thought they were going to show us the longest segment in centiMorgans that was actually used in the ultimate determination for matching. So Shirlea's example would then become:

  • Shared DNA: 6.7 cM across 1 segments
  • Longest Segment: 6.7 cM

Apples to apples, and no resultant confusion...well, of course until someone takes the comparison to, say, GEDmatch, finds a 17cM segment, and doesn't know why.
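
For anyone who wants to see the arithmetic behind that gap, here is a minimal Python sketch using only the two numbers from Shirlea's example. It assumes the single reported segment and the "Longest Segment" are the same stretch of DNA, which the display implies but Ancestry doesn't state outright. The function name is made up for illustration and has nothing to do with Ancestry's actual code.

```python
def timber_discount(longest_segment_cm, shared_dna_cm):
    """Return (cM discounted, percent discounted) for a one-segment match.

    Assumption: the one reported segment is also the "Longest Segment",
    so the difference is the part that was down-weighted.
    """
    discounted = round(longest_segment_cm - shared_dna_cm, 1)
    percent = round(100 * discounted / longest_segment_cm)
    return discounted, percent


# Shirlea's match: Longest Segment 17 cM, Shared DNA 6.7 cM
discounted_cm, discounted_pct = timber_discount(17, 6.7)
print(f"{discounted_cm} cM discounted, about {discounted_pct}% of the segment")
```

In other words, well over half of that 17cM segment was set aside before the "Shared DNA" figure was reported.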

With Ancestry providing us with no real detail, giving just an iota more detail may prove to be a mistake for them. I guarantee they're being inundated with paid subscribers asking the same question Shirlea did, and they probably have a canned answer they provide.

Would make sense to me that they change the reporting and go with the largest Timbered segment. For that matter, how much more difficult would it be to list all the Timbered segments by chromosome and size in cM? Without specifying the relevant loci there really can't be any practical privacy issues. And at least that would give us a little more info to work with.

Thanks, Edison and Barry!  

I wish they wouldn't decide for me what is 'genealogically significant' to me.

There is a place for all the information, depending on our goals.  But to inform me that it was a much larger segment, but they decided some of it didn't count, but it counted enough to tell me it didn't count....

Apropos of nothing, a few minutes ago I went to Ancestry to take a last look at my ThruLines to see if there was anyone else with <8cM results I wanted to place into a Group to save, and I got a pop-up asking if I would consider becoming an "Ancestry Insider"; a questionnaire followed to evaluate my suitability to provide information, comments, and opinions to Ancestry.

Among the questions were whether or not I was a subscriber (yes; annual); how long, in months, have I been a subscriber (167 months); what kind of subscription ("All Access"); not counting today, last time I logged into my account (more recently than 1 month ago); have I purchased an AncestryDNA test (yes); not counting today, last time I logged into my AncestryDNA account to view/work with results (more recently than 1 month ago)...

Then I get the brief and abrupt message that I do not qualify to continue with the questionnaire. I can only conclude that they aren't looking for anything like "Ancestry Insiders"; they're really looking only for new or uninformed users for focus group-type marketing purposes. They don't want curmudgeons who have been paid subscribers for 14 years and actually use the site...

Thanks Edison! I guess you could tell that I have been looking through my small-segment matches, and that's how I noticed this counter-intuitive math thing. I'm doing some anticipatory grieving...not just for what was available and will be gone, but also for the matches that we will now never see...

Sorry that they didn't deem you the kind of Insider they were looking for! You sure seem like an Insider to me!

They’ve done that to me with surveys before. I’ve never qualified, and I’m about the same as you, Edison. On another note, another user wrote code to automatically save your matches of 7.9cM or less. Send me a PM if you’re interested in using it...

Thanks for the shiny best-answer star, Darlene!

With the somewhat unusual number of small-segment matches I have at Ancestry...well, trying to preserve them all would mean about 150,000 more that I'd never get around to looking at. Roberta Estes also wrote about a utility to automate adding small matches to Groups, but I never looked at it. Don't know if it's the same tool.

Speaking of, however, a possible cautionary note in keeping with Shirlea's question. For those of us who don't employ an automated utility and instead choose just to preserve only the matches showing in our ThruLines: In ThruLines 8cM doesn't necessarily mean 8cM.

Ancestry rounds to the nearest whole number when they display your ThruLines charts. At the bottom of each possible connection they show the evaluated relationship, the shared DNA in centiMorgans, and the number of segments. If you see 8cM there, don't assume that you're free and clear, that the individual will remain there after they cull the small matches.

As Darlene noted, any matches showing a Timber-adjusted 7.9cM or less will be removed. Those 8cM matches you see in the ThruLines charts may actually be matching at anywhere from 7.5cM to 8.4cM. So if you're making a last pass through ThruLines, be sure to click on all those showing 8cM because they might be some of the ones you don't want to lose.
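
To make the rounding trap concrete, here is a small Python sketch. It assumes ThruLines rounds half-up to a whole number (consistent with the 7.5cM to 8.4cM range above, but not something Ancestry documents) and that the cull keeps anything at 8.0 cM or more. Both function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per Ancestry's FAQ wording: matches sharing 7.9 cM or less are removed.
CUTOFF_CM = Decimal("8.0")


def thrulines_display(shared_cm: str) -> int:
    """Whole-number cM as ThruLines would show it (half-up rounding is an assumption)."""
    return int(Decimal(shared_cm).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def survives_cull(shared_cm: str) -> bool:
    """True if the Timber-adjusted total is at or above the 8.0 cM cutoff."""
    return Decimal(shared_cm) >= CUTOFF_CM


for cm in ["7.5", "7.9", "8.0", "8.4"]:
    print(f"actual {cm} cM -> shown as {thrulines_display(cm)} cM, "
          f"survives: {survives_cull(cm)}")
```

Under those assumptions, every value in the loop is shown as 8 cM in ThruLines, but only the ones at 8.0 cM or above would survive, which is why it pays to click through to the match page and read the one-decimal figure.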

Are they filtering out segments of 7.9cM or less, or whole matches? I am under the impression that you also need to save someone who matches at, say, 14cM on two segments.

What they said was “... you’ll no longer see matches (or be matched to people) that share less than 8 cM with you.”

Ah, okay, then Roberta Estes was being overly cautious in her original description of her plan for dealing with the changes.

Well, I'm not entirely certain. What Darlene quoted comes straight from Ancestry's "DNA match updates coming in August" banner info that's been in place over a month now. In fact, here's a rundown of every single place in those FAQs that they mention the centiMorgan change:

  • "...You'll no longer see matches or be matched to people who share 7.9 cM or less DNA with you..."
  • "Very distant matches--those who share 7.9 cM or less DNA with you--will no longer appear in your DNA match list or in ThruLines..."
  • "We are waiting until the end of August to remove very distant matches who share 7.9 cM or less DNA with you..."
  • "If you've added any notes about distant DNA matches who share less than 8.0 cM of DNA with you, those DNA matches and their notes will remain in your DNA match list."
  • "We've changed the amount of DNA you need to share to be considered a match with another individual to 8.0 cM."

Not a peep about matching segment versus total matching amount, is there? Everything is worded, to me, as implying the total match.

However...

If you check Ancestry's July 2020 white paper, which is the precursor for "The Culling" (yeah; I wouldn't trademark that even if I could) and look at their five-step process description on page 12, it shows that they go through both BEAGLE and Timber first, then:

  • "4: Calculate the length of the candidate matching segment in terms of genetic distance, measured in centimorgans (cM)."
  • "5. If the segment is longer than 8 cM, we retain the segment to store as a match in our database, unless we dismiss it as identity by state."

That sure as shootin' reads to me that any segment smaller than 8cM is not retained.
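
The difference between the two readings can be sketched in a few lines of Python. This is only an illustration of the ambiguity, not Ancestry's code: the function names are invented, the 7 + 7 split of Barry's 14cM example is hypothetical, and the 8 cM figures are taken from the FAQ and white-paper wording quoted above.

```python
THRESHOLD_CM = 8.0


def kept_by_total(segments_cm):
    """FAQ reading: keep the match if the total shared DNA is at least 8.0 cM."""
    return round(sum(segments_cm), 1) >= THRESHOLD_CM


def kept_by_segment(segments_cm):
    """White-paper reading: keep the match only if some single segment is longer than 8 cM."""
    return any(seg > THRESHOLD_CM for seg in segments_cm)


examples = {
    "14 cM on two segments (hypothetical 7 + 7)": [7.0, 7.0],
    "single 6.7 cM segment": [6.7],
    "single 8.5 cM segment": [8.5],
}

for label, segments in examples.items():
    print(f"{label}: total rule keeps={kept_by_total(segments)}, "
          f"segment rule keeps={kept_by_segment(segments)}")
```

The two rules only disagree on multi-segment matches where no individual segment clears the bar, which is exactly the case Barry asked about.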

I have a couple of ThruLines paths that obviously are the result of fictitious, fabricated trees, but otherwise I just marched through the very distant ones and placed everyone in a Group named "Uncertain Small Matches." I "grouped" around 50. Offhand I don't remember any of them being more than a single segment.

But... Now that Barry's brought it up, I might just mosey back over there--assuming "The Culling" hasn't happened yet--and check a couple of levels higher, like 3rd and 4th cousins, where multiple segments are more likely.

Clearly, though, the fact that we're still talking about this is evidence that Ancestry hasn't done the world's best job of communicating all the details. We all know that 95% of their subscribers won't even know it happened and won't care. But that's no excuse not to provide accurate, unambiguous, and thorough communication to everyone. Their most evergreen subscriber revenue stream is firmly seated in those other 5%. I figure that Darlene and I, just the two of us over the past decade, represent almost $10,000 in revenue.

Hopefully--I know it's a big ask, but still--maybe Blackstone will do some housecleaning and institute a shift in the management culture at Ancestry.

Come to think of it, I wonder if my "Ancestry Insider" invitation questionnaire bombed out where it did only because that's when their software finally identified me as a genetic genealogy public rabble-rouser?

I'm telling you, Edison, you not only are extremely informative but immensely entertaining!

Well, thank you very much! I'll be here all week. No cover charge for the 7:00 p.m. weekday shows. And please be sure to tip the wonderful waitstaff!

I got the Insider invitation, too, and was deemed unqualified after they asked if I had ever worked for the news media.

Well, I guess now we know what to say if we want to be selected. Tell 'em we're newbies and clueless!

The Culling has happened to my account.

Well, I'm sure confused.

I can see that I no longer have any matches shown shorter than a total 8cM.  However, one of my recent new matches has a total of 8.2cM in 2 segments - longest segment 9cM.

Did the programmers not read the white paper?

+6 votes
If a single segment of 17 cM is significant, then it's only rational to include that entire segment in the total DNA calculation. That 17-cM segment can't be significant for one calculation but not for the other.

Ancestry's DNA algorithms are completely non-transparent and unjustifiable. Their recent revision to show the longest segment is a failed substitute for a real chromosome browser, such as those provided by 23andMe and FamilyTreeDNA.

Moreover, Ancestry still does not report X-DNA matches, which can be quite illuminating and even critical at times.

Anyone who has tested their DNA with Ancestry should upload their raw DNA data to GEDmatch.com and possibly to FamilyTreeDNA.com, where they can find a chromosome browser and view their actual DNA data.
by Bill Vincent G2G6 Pilot (173k points)
