Trying to understand what sharing segments on chromosomes really means

+8 votes
Using GEDmatch I have found 5 individuals who all match me on the same segment of chromosome 2. One of them, with the largest matching segment (28cM), is my 4th cousin (by traditional genealogy) and is my second-closest match on GEDmatch and in the top 10 on Ancestry. A second one is my closest match on GEDmatch and in the top 10 on Ancestry (sharing 12cM), but I do not know who he is. Nor do I know any of the other 3 who share the same segment. What can I reasonably assume about these individuals and their relationship to me?
asked in Genealogy Help by Lindis Elliott G2G2 (3k points)
retagged by Peter Roberts

Adding the tag "DNA" to your question may help you get an answer faster, as it will draw more attention from people who deal with the genetic genealogies on a regular basis.

Hope this helps,

3 Answers

+5 votes
Best answer

(Am I the only one who sometimes runs up against G2G's 8,000-character limit? Oh. Don't answer that. I guess I am. But I wrote this, ain't gonna delete it, so will break it into two parts.)

I agree with RJ but might state it slightly differently. DNA is a form of evidence just like any other form of genealogical evidence. As such, any given piece of evidence fits somewhere along a sliding scale ranging from "useless; discard" to "prima facie; very strong." To apply the genealogical proof standard, we create hypotheses about our tree, and then weigh and evaluate the evidence.

DNA is no different. It's about the most exciting thing since sliced bread for genealogists, but far, far too many believe that, because it's "science," the results are always going to be binary, either a "yes" or a "no." Doesn't work that way. Biology is all random and squishy to start with, and our current technological state of DNA testing and interpreting requires an amount of assumptive math, modeling, and probabilities that I think would shock most genealogists. Where RJ said "cautious approach" I might substitute "careful approach." Your hypotheses don't necessarily have to be overly-cautious, but if you want the outcome to be correct and not some confabulated addition to the many thousands of fantasy trees floating around the Internet, you have to be stringent in the evaluation of the evidence and strict in the application of the genealogical proof standard.

And, quite frankly, too many genealogists are trying to evaluate DNA evidence without ever taking the time to learn enough about the science and the math to be qualified to do so. It ain't rocket brain surgery, but it's a very, very different--learned and studied--skill set than is traditional white-glove genealogical research. Mistaken conclusions are being drawn every day, left and right, outside of WikiTree as well as within. And, like the fantasy trees that flooded the Internet confusing so many of our collective genealogies, one mistaken assumption about DNA is made, and then someone else comes along and takes that as "scientific proof" of the relationship and away we go again. If I had a dollar for every DNA determination I've seen in the past year that had weak, questionable, unsupportable, or even almost certainly false assumptions behind it, well...

Lindis, your question is an absolutely great one, and it's the correct one. It's fundamental to this whole arena. Because it's fundamental, though, there's no quick answer to it, at least not beyond Peter's irrefutable statement that shared segments come from shared ancestors.  :-)  The not-always-simple trick then becomes identifying what really is the shared segment; determining if the segment is valid or at least likely to be valid; and deciding if and how it can be used in your DNA matching.

If I had to concoct some quick advice for dealing with autosomal DNA (mind you, the non-recombinant forms yDNA and mtDNA have to be treated much differently), it would be--after self-education--to start high and work low. Start with yourself and the most immediate family you have. Does it add immediately to your genealogy? Nope. But it does set necessary baselines and help give you hands-on practice with available tools. At GEDmatch, do a quick "Are your parents related?" pass. If this shows your parents share any significant segments of DNA, it means you're dealing with pedigree collapse and it changes the hypotheses you apply to all other matches you find.

If you have grandparents or parents alive, test them...immediately and in that order. Then test siblings and half-siblings. Few of us get the luxury of having both parents tested but, boy, if you do it's a game changer; you can approach the DNA matching thing with a whole 'nother level of available tools and confidence. Even having one parent tested is a big deal.

answered by Edison Williams G2G6 Pilot (170k points)
selected by Lindis Elliott


Starting high to low also means not getting sucked into exploring small segments unless you have an immediate reason to do so. There have been zero, zilch, nada scientific, peer-reviewed studies validating or quantifying the use of small segments in genealogical research. Or even what a "small segment" means. Some not-scientifically-tested but empirical data suggests that 6cM segments are false 74% of the time; 7cM segments are false 58% of the time; and 8cM segments are false 38% of the time. Not "difficult to validate"; false. At the level of 4th cousins, your theoretical amount of shared DNA is 13.3cM; for 5th cousins it's 3.3cM. Moreover, only 14.9% (roughly 17:3 odds against) of your 5th cousins are going to share any detectable DNA with you at all, so it takes a lot of luck as well as work to use autosomal DNA back to 3g- or 4g-grandparents.
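The 13.3cM and 3.3cM figures can be reproduced with a short calculation. This is only a sketch under stated assumptions: the roughly 6800cM total (both parental copies of the autosomes) is a commonly quoted approximation that is not given in the post, and the name `expected_shared_cm` is made up for illustration. It gives the theoretical average for full cousins who share an ancestral couple; real matches vary widely around it.

```python
# Assumption: about 6800 cM of autosomal DNA in total, counting both
# parental copies. This figure is an approximation, not from the post.
TOTAL_CM = 6800.0


def expected_shared_cm(cousin_degree: int) -> float:
    """Theoretical average cM shared by full nth cousins."""
    # Each of the two shared ancestors (a couple) contributes a path of
    # 2 * (n + 1) meioses, and every meiosis halves the expected sharing.
    fraction = 2 * 0.5 ** (2 * (cousin_degree + 1))
    return TOTAL_CM * fraction


for degree in range(1, 6):
    print(f"{degree} cousin(s) removed from siblings: "
          f"{expected_shared_cm(degree):.1f} cM expected")
```

Each extra generation back divides the expected sharing by four, which is why the average drops so quickly between 4th and 5th cousins.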

Too, often overlooked, is that what we see when GEDmatch or FTDNA or 23andMe reports a segment length is not only computed via a mapping function originated by an Indian mathematician named Kosambi in 1944 (refining work by a gentleman named Haldane who coined the term centiMorgan way back in 1919)--meaning that it isn't a physical measurement at all, it's an estimated one; there is actually no centiMorgan "length"--but the value in cMs is what is called "sex averaged." All us men know that females are more complex than we are, and the female genome does see about 30% more DNA crossover at gamete production than males. The result is that the cM computation for a male and a female at precisely the same range of DNA base pairs can be very different, sometimes as much as 10cM or 15cM. Working with segments as low as 7cM means we're using a sex-averaged value that may see the actual gender value above the threshold for males, but below the threshold for females.

For example, here is an actual match between a cousin and me on chromosome 18 starting at base pair 72,323,427. The three different centiMorgan computations by gender are:

  • Male: 11.1cM
  • Female: 4.4cM
  • Sex-averaged: 7.9cM

We have to use the sex-averaged value in this case because I'm male and she's female, but also because that's all that GEDmatch and the testing companies can realistically report to us. But is using the 7.9cM average value valid if we're talking about two female cousins who descend from two different daughters of a shared grandmother? The gray area is simply too broad when trying to work with small segments, IMHO.
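To make the threshold problem concrete, here is a minimal sketch that checks the three values quoted for the chromosome 18 segment against a 7cM cutoff. The cutoff is the one mentioned in the discussion of small segments; the variable names are hypothetical and nothing here reproduces how GEDmatch computes its numbers.

```python
# The three cM values quoted for the same chromosome 18 segment.
segment_cm = {"male": 11.1, "female": 4.4, "sex-averaged": 7.9}

# Assumed cutoff for illustration: the 7 cM level discussed in the post.
THRESHOLD_CM = 7.0

# True where the value clears the cutoff, False where it does not.
passes_threshold = {
    genetic_map: cm >= THRESHOLD_CM for genetic_map, cm in segment_cm.items()
}

for genetic_map, ok in passes_threshold.items():
    verdict = "above" if ok else "below"
    print(f"{genetic_map}: {segment_cm[genetic_map]} cM is {verdict} the cutoff")
```

The same stretch of base pairs clears the cutoff on the male and sex-averaged maps but falls short on the female map, which is the gray area described above.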

The Shared cM Project's crowd-sourced data shows 35cM average sharing among 4th cousins; the mathematical average is 13cM. I personally would split the difference and not bother trying to investigate matches that show less than about 20cM or 25cM unless you absolutely feel you know what you're doing.

Last, RJ said, "There's a process misnamed triangulation..." And he's correct. That Latin root "triangulus" is an artifact that causes a lot of trouble. Three legs on the stool is a minimum and most often isn't enough. With autosomal DNA--I'm now speaking as a method (which is not proven or standardized, BTW, in the genetic genealogy community), not of WikiTree guidelines--using just three people can work fine at the 2nd cousin level, but probably needs an additional tester or two to validate at 3rd cousins, or 2g-grandparents. When you get to 4th cousins, it rapidly becomes much more complex. In fact, one of the researchers who is often pointed to as saying 7cM segments are likely valid is Jim Bartlett. What Jim actually wrote is this:

"However, my triangulation process involves a lot of work... I highly recommend starting with 15cM as a threshold (or even higher, if you don't have the time or inclination). Setting a personal threshold is a good way control the amount of work you are willing to put in."

Jim's point is also that working to establish valid evidence using small segments is very complex and difficult...if it can even be done. When he says his process involves a lot of work, Jim has said that it's typical for him to work with triangulation groups that have upwards of 20 people in them before he's willing to accept a 7cM segment as valid: "If 1 or 2 of those triangulated shared segments turns out to be IBS [identical by state, or not able to be validated to a specific ancestor...think, "identical by chance"], it's not harmful in the grand scheme."

Lindis, I know this does little to address the five matches you've found on GEDmatch and Ancestry. But it's refreshing to see someone asking, "How does this work?" rather than "How do I mark these 3g-grandparents 'confirmed with DNA?'" It's about understanding the process and the fundamentals, not about leapfrogging straight to an end result that may or may not be valid. I'm now putting my soapbox away for the day....  :-)

Thanks for the best answer selection, Lindis! First, I can't believe you read it. That's a bigger "thanks" than even the best answer.  ;-)  Second, I know I was being too general to really help with those cousin matches you found. DNA for genealogy isn't the most complicated subject in the world--it sure ain't quantum mechanics--but it's also not the easiest thing, least, not when we're trying to make sure we're doing it as accurately as possible. But questions here are absolutely free to ask! When you run into specific stuff that's confusing or doesn't seem to make sense, ask away. We even have well-known genetic genealogists like Blaine Bettinger, Debbie Kennett, and Ann Turner who peek in from time to time.
+7 votes
Shared segments come from shared ancestors.  Compare your ancestral tree with the ancestral tree of your match.

Sincerely, Peter
answered by Peter Roberts G2G6 Pilot (428k points)
So if I can find the MRCA for person A and myself, and the other related people share the same segment, they also descend from that MRCA?
Peter, that's a great way of putting it, I may quote you!

Lindis, that's the most likely implication, but it's possible that segment came from a parent of the MRCA, which would mean it's possible for the same segment to pass down through any of the MRCA's siblings too.
Lindis, you would also need to do a 1-1 comparison between each pair to ensure that they also match each other. They may all be from the same side or you may have two groupings, covering that segment from your paternal side and from your maternal side.
I can and will do this. Thanks
+9 votes
The first problem is vocabulary.  When they say chromosome, they mean chromosome-pair.  So a "segment of chromosome 2" is actually 2 strings of DNA, one from your father, one from your mother.

So when they say match, they mean half-match.

This means you can "match" a segment with cousin A on your father's side, and "match" the same segment with cousin B on your mother's side.  But A and B aren't related and don't show a match at all.

It's also possible that A and B do match on that segment, but by coincidence.  They share a 3rd common ancestor who isn't yours.

There's a process misnamed triangulation which takes the test data of 3 matching people and establishes that they are all half-matching each other on the same DNA, so they do get their matching DNA from a single common ancestor.
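The pairwise check described here can be sketched in a few lines. This is only an illustration: the names, the base-pair positions and the `common_overlap` helper are all hypothetical, and the data would in practice come from one-to-one comparisons of each pair. An overlap among all pairs is a necessary condition for triangulation, not proof of a shared ancestor.

```python
from itertools import combinations


def common_overlap(people, pairwise_segments):
    """Return the (start, end) region shared by every pair, or None."""
    start, end = None, None
    for a, b in combinations(people, 2):
        segment = pairwise_segments.get(frozenset((a, b)))
        if segment is None:
            return None  # this pair does not match each other at all
        seg_start, seg_end = segment
        # Narrow the candidate region to what all pairs seen so far share.
        start = seg_start if start is None else max(start, seg_start)
        end = seg_end if end is None else min(end, seg_end)
    if start is None or start >= end:
        return None  # no pairs given, or the segments do not overlap
    return (start, end)


# Hypothetical one-to-one results on chromosome 2 (base-pair positions).
pairwise = {
    frozenset(("me", "A")): (10_000_000, 40_000_000),
    frozenset(("me", "B")): (15_000_000, 35_000_000),
    frozenset(("A", "B")): (12_000_000, 38_000_000),
}
all_three = common_overlap(["me", "A", "B"], pairwise)

# Same data, but A and B do not match each other (e.g. A is paternal
# and B is maternal), so the group fails the check.
del pairwise[frozenset(("A", "B"))]
two_sides = common_overlap(["me", "A", "B"], pairwise)

print(all_three, two_sides)
```

The second case is the one where A and B each match you on the same stretch but sit on opposite sides of your tree.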

In your case, you're probably assuming that you and your 4th cousin got your shared DNA from your known shared 3rd great-grandparent.  Probably - but maybe you could also share a different common ancestor.  How confidently can you rule that out?

Try to trace descendants of the assumed common ancestor.  If you can find a 3rd paper descendant and get a 3-way DNA match, the odds are much better.

There's a good chance of tracing a line down to one of the people who is already showing up as a match.

It's less good if you locate the testers and then work on the trees.  If you force a connection to the assumed common ancestor (claiming the DNA as evidence), then it all becomes circular, and the case is no stronger than the original assumption.

That's the cautious approach.  But of course nobody bothers with that, people just jump straight to the most optimistic conclusion.
answered by RJ Horace G2G6 Pilot (391k points)
Yes, so if I am understanding correctly, the answer is: don't make assumptions from the DNA; use it to confirm traditional genealogy.

I will try to follow through on all the suggestions as far as I can.

Thanks to all who answered.
Thanks for the specifics on this RJ!

