What do GEDmatch Matching Segment Search color groups signify?

+4 votes
592 views
GEDmatch is a wonderful site, unless you have a question. The self-help forum is useless, unlike this one. Does anyone here know what the groupings by color signify on the Matching Segment Search? I found one GEDmatch forum query with an answer: "The change in color indicates a change in matching. The colors themselves don't mean anything. So every that is red should be related." I have no idea what "change in matching" means.

My first assumption was that all matches identified with the same color came from a common ancestor and where the colors overlap must indicate paternal or maternal origins. That would be nice, but I have cases such as this where the match ranges on chromosome 7 include:

Color Group A from 72,857,444 to 113,164,624
Color Group B from 86,173,525 to 129,443,112
Color Group C from 105,359,419 to 146,422,146

Notice that all three color groups overlap. Color B overlaps Color A from 86,173,525 to 113,164,624 and Color C overlaps Color A from 105,359,419 to 113,164,624.
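
For anyone who wants to check the arithmetic, here is a minimal Python sketch that computes the pairwise overlaps from the three ranges listed above. The `overlap` helper and the variable names are my own, for illustration only; nothing here comes from GEDmatch.

```python
from itertools import combinations

# Start/end positions (base pairs) on chromosome 7, copied from the ranges above.
groups = {
    "A": (72_857_444, 113_164_624),
    "B": (86_173_525, 129_443_112),
    "C": (105_359_419, 146_422_146),
}

def overlap(a, b):
    """Return the (start, end) shared by two ranges, or None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

# Overlap for every pair of color groups.
pairwise = {
    (x, y): overlap(groups[x], groups[y])
    for x, y in combinations(groups, 2)
}
for (x, y), shared in pairwise.items():
    print(f"{x} and {y}: {shared}")

# Region shared by all three groups at once.
common = overlap(overlap(groups["A"], groups["B"]), groups["C"])
print("All three:", common)
```

The stretch common to all three groups comes out the same as the A and C overlap, 105,359,419 to 113,164,624.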

It gets stranger. There are people in each group who have the same identified common ancestor and, yes, each matches the others in the one-to-one comparison so it's triangulated across three color groups. To make it stranger, there is a mother in Color Group B and her daughter in Color Group A.

What am I missing?
asked in The Tree House by Bennet George G2G4 (4.9k points)

1 Answer

+8 votes
 
Best answer

Hi, Bennet. I'm not 100% sure I'm answering the right question, but I'll give it a go. My first assumption is you're not referring to the colors you can set for "Tag Group Management" under your profile. I find that feature really handy for flagging triangulation groups once I've identified ones I'm highly confident in. But in that case, you're setting the color and then adding kits to the groups manually. So we're not talking about that.

If it's the Tier 1 "Matching Segment Search" (actually titled "GEDmatch DNA Segment Search" once you get to it), I think that seemingly cryptic answer you found was correct...if not stated very well: "The change in color indicates a change in matching. The colors themselves don't mean anything."

If you choose to include graphics in the display, you get little colored bands positioned approximately where the specific segment would be relative to the beginning and end of the chromosome. Here's a link to one of Kitty Cooper's blog posts written shortly after the feature was introduced. Scroll down to the second screen capture. Below it, Kitty notes: "The colors in the graphic section just indicate where the logical breaks are in the overlaps, they are not otherwise significant." I believe the key is "logical breaks."

How exactly GEDmatch determines that, I don't know. My suspicion is that it simply has to do with the sequential proximity of the SNPs tested. All our current autosomal DNA tests make assumptions about a segment's start and end points, and its unbroken continuity, based solely on the SNPs sampled. On a given segment you might have 700 SNPs that match contiguously but are spread out over 1 million base pairs, which works out to roughly one SNP every 1,400 base pairs. So some math modeling goes into determining the base pair numbers of the start and end points. In most cases, neither the length in actual base pairs nor the evaluation in centiMorgans is completely accurate. The reason is that only those 700 SNPs were tested; each SNP aligns with a specific base pair, or reference cluster, but matching base pairs may well continue fore and aft...or maybe even break in the middle somewhere. We're assuming that a sequence of 700 SNPs that match identically means that all 1 million base pairs the SNPs encompass also match identically. Until full-genome testing becomes affordable, we'll never really know for certain.

And the density of the SNPs tested varies all over the place, not just from one chromosome to another but with significant differences along the same chromosome. On some little stretches of a chromosome, the SNPs will be packed in there like sardines, while other stretches are virtual SNP deserts where the nearest neighbor SNP is a long way up the road.

All that to say the seeming overlaps in the coloring of the segments may have more to do with the clustering of identical SNPs than the actual overlap of the base pairs. For example, a segment might begin in a SNP dense area, then trail off into a SNP desert. A second segment might begin near the end of that desert, but continue through another SNP-dense area. In terms of the matching SNPs, the conclusion might be that there is a "logical break" between the two segments in that SNP desert; that the two segments likely belong in two different groups because they don't share the same nearby areas of high SNP density.
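
To make the "SNP desert" idea concrete, here is a small Python sketch. To be clear, this is only my guess at the kind of logic involved, not GEDmatch's actual method, which I don't know. The positions are made up, and `find_deserts` and the `min_gap` threshold are hypothetical names I picked for illustration.

```python
def find_deserts(positions, min_gap):
    """Return (left, right) pairs of adjacent SNP positions more than min_gap bp apart."""
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]

# Made-up SNP positions (base pairs): a dense run, a desert, then another dense run.
snps = [1_000, 1_400, 2_100, 2_500, 3_000, 250_000, 250_600, 251_200]

# Assumed threshold: treat any gap over 100,000 bp as a desert.
deserts = find_deserts(snps, min_gap=100_000)
print(deserts)
```

With these toy numbers the only gap flagged is the one between 3,000 and 250,000, which is where a "logical break" between two groups could plausibly be drawn.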

What I can say in support of Kitty's statement that "they are not otherwise significant" is that the color scheme is a spectrum based on the segment's location along the chromosome, from start of the chromosome to the end. The color groupings aren't otherwise tied to specific test kits in any way. They have (unfortunately) nothing to do with matrilineal/patrilineal lines of ancestry because none of the genealogy testing companies can supply the raw data in a way that distinguishes mother from father. Dang it.

Last, the "Matching Segment Search" can't be set below a threshold of 5cM and 500 SNPs. Also kinda unfortunate, but you could imagine what would happen to their servers if people started asking for 200 SNP and 2cM thresholds and to show all chromosomes. But 5cM still works.

I'd do it one chromosome at a time, but it's still a rather useful way of seeing if there are localized pile-up regions that might not be phylogenetic (shared by very large segments of the population), but are specific to your own haplotype. It isn't that these are necessarily false positives, but if there are a whole boatload of matches sharing the same smallish segment, there's a reasonable possibility that the segment is deeply ancestral...meaning that it's stuck around over many generations. I'd then consider those segments to be dubious in genealogical matching. With time and effort in building triangulation groups with 10 or 20 people in them, a widely-shared segment can become meaningful. But without that substantiation, it's possible that segment can't be matched to a MRCA because the couple is simply too many generations back. These small segments widely shared among matches are perhaps the greatest pitfall of triangulation attempts; they're why simply finding two other people who share a small segment with you will never an accurate triangulation make.
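
If you export the start and end positions for one chromosome, spotting a pile-up is just a matter of counting how many segments cover the same stretch. Here's a rough Python sketch of that counting, assuming each segment is a simple (start, end) pair in base pairs. The `pileup_depth` function is my own illustration and not a GEDmatch feature, and the sample input is just the three chromosome 7 ranges from Bennet's question. A real pile-up would involve many more segments.

```python
def pileup_depth(segments):
    """Return (max_depth, start, end): the first region covered by the most segments."""
    events = []
    for start, end in segments:
        events.append((start, 1))   # a segment opens here
        events.append((end, -1))    # a segment closes here
    # Sort by position; at ties, handle openings before closings
    # so that segments that merely touch still count as overlapping.
    events.sort(key=lambda e: (e[0], -e[1]))

    depth = best = 0
    best_start = best_end = None
    for pos, delta in events:
        if delta == 1:
            depth += 1
            if depth > best:
                best, best_start, best_end = depth, pos, None
        else:
            # The first closing at maximum depth marks the end of the deepest region.
            if depth == best and best_end is None:
                best_end = pos
            depth -= 1
    return best, best_start, best_end

# The three chromosome 7 ranges from the question.
segments = [
    (72_857_444, 113_164_624),
    (86_173_525, 129_443_112),
    (105_359_419, 146_422_146),
]
result = pileup_depth(segments)
print(result)
```

A depth of 3 is nothing to worry about, of course. It's when dozens of matches stack up on the same smallish segment that I'd start treating it as a likely pile-up region.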

answered by Edison Williams G2G6 Pilot (178k points)
selected by Peter Roberts
Thanks, Edison; you interpreted my question correctly.

I always read your answers and always learn something. Thanks for taking the time to write such detailed comments.
On each chromosome the colors group your matches who have overlapping SNP sequences in a portion of the chromosome.  They will include both your paternal and maternal match possibilities as they reflect at each sampled base pair location all sequence combinations found for 2 of the 4 possible results.
Excellent explanation. I had a similar question. Just to restate my understanding: are you also saying that the small overlaps across different colors do not link to a common ancestor, and that the overlaps are just small breaks?

Thanks again


...