Well, I said I was a curmudgeon about this stuff, so I might as well continue to live up to that cantankerous moniker. If someone tells you that you can go deep into your genealogy using autosomal triangulation, ask them a very specific question:
"Can you provide me a citation to a study published in an academically-respected, peer-reviewed journal that expressly examines the use of distant cousins and microarray tests to establish an MRCA via autosomal DNA triangulation and indicates it to be a valid, accurate, and broadly applicable methodology?"
If you don't get an outright "no," what you'll usually get instead is one of three logical fallacies.
"My own family research shows over and over that autosomal triangulation using distant cousins is an accurate methodology" (petitio principii; the most common form of this fallacy is using an unproven conclusion as the very evidence upon which which a claim is made that the conclusion itself is valid; i.e., a form of circular logic).
"If you read material by Popular Blogger ABC you'll find that autosomal triangulation using distant cousins is an accurate methodology" (argumentum ad verecundium; literally, "argument from that which is improper"; this fallacy capitalizes on feelings of respect or familiarity with a well-known individual who might actually know very little about the topic).
"Experienced genetic genealogists know that autosomal triangulation using distant cousins is an accurate methodology because it's used all the time" (argumentum ad populum; a favorite rhetorical device of propagandists and advertisers, this is basically an "everybody knows it's true" failure in critical thinking...like, "everybody knows the sun orbits around the earth").
At the end of the day, though, what's important is your specific goal. If the goal is accuracy and the ability to use DNA evidence in keeping with the analytical guidelines of the Genealogical Proof Standard, that's one thing. It's an entirely different thing if it's only a hobby and, as with the proverbial horseshoes and hand grenades, "Close enough will do; we don't need to overthink it." That, though, I would argue, is the same way we end up with such vast numbers of blatantly incorrect public trees stuck rather permanently in the Raiders of the Lost Ark-like warehouse of the internet.
I went into typically lengthy detail in a G2G thread last month about what I consider the two--seemingly simple but actually quite difficult--criteria to determine whether "matching" DNA segments can be useful for genealogy, the concept of genetic similarity, and some reasons why we see so many small, false-positive matches.
Using the GEDmatch free one-to-one autosomal tool, if we lower the threshold to 3cM and leave the SNP window size at the floating default, then depending upon the two test versions used in the comparison, just about anyone with the same continental population origins will show as if they have matching segments. But a very large majority of those tiny segments in the 3cM to 6cM range, if not all of them, are false. And there's no way to determine which are real and which aren't; as Blaine Bettinger has pointed out, triangulation can't actually do that for you.
Amanda, as a simple experiment for the Mary Glendinning problem, since you have American Southern Colonies roots as do I, try giving a look at some random people among the 165 WikiTree members who have the Southern Colonies Project member badge. Locate some who have their GEDmatch kit number on their profiles. Give a run at GEDmatch doing a one-to-one comparison with the centiMorgan threshold set down to 5cM, and then 3cM.
I just did that using my 23andMe v5 kit rather than my WGS superkit, and the first five totally random WT members showed this:
#1
At 5cM: 21.9cM over three shared segments, largest 9.6, 288 SNPs
At 3cM: 31cM over five shared segments, largest 9.6, 288 SNPs
#2
At 5cM: 15.3cM over two shared segments, largest 8.9, 236 SNPs
At 3cM: 23.2cM over four shared segments, largest 8.9, 236 SNPs
#3
At 5cM: 11.6cM over two shared segments, largest 6.2, 298 SNPs
At 3cM: 20.9cM over four shared segments, largest 6.2, 298 SNPs
#4
At 5cM: no match
At 3cM: 8.3cM over two shared segments, largest 4.6, 263 SNPs
#5
At 5cM: 19.5cM over three shared segments, largest 7, 210 SNPs
At 3cM: 19.5cM over three shared segments, largest 7, 210 SNPs
Notice the very low in-common SNP counts. Our tests can't look at about 8% of the genome, leaving us with about 2.85 billion base pairs. The average microarray test examines roughly 650K markers, so across the genome we're seeing only about 1 in every 4,400 base pairs. Highly rounded, one cM represents very roughly 1 million base pairs. Ergo, that should mean somewhere around 225 SNPs per centiMorgan. Across all our DTC microarray tests, the median overlap of same-to-same SNPs tested is 45.6% (the lowest is 17%). If we drop the 225-per-cM average down by that value, we get about 103 SNPs per cM: this should be what we consider a baseline SNP density for comparisons. Any density lower than that means we have, proportionately, more gaps than average between the SNPs compared. So we should see about 310 SNPs for a 3cM segment; 515 SNPs for a 5cM segment; 720 SNPs for a 7cM segment; and of course 1,030 for a 10cM segment.
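For anyone who wants to check that arithmetic against their own matches, here is a rough Python sketch. It uses the rounded figures from the paragraph above (225 raw SNPs per cM, 45.6% median overlap). The function and variable names are mine, not anything GEDmatch uses, and I'm assuming each SNP count in the five comparisons above belongs to the largest segment listed.

```python
# Rounded figures from the post; names are illustrative, not GEDmatch terms.
RAW_SNPS_PER_CM = 225    # ~1,000,000 bp per cM divided by ~4,400 bp per marker
MEDIAN_OVERLAP = 0.456   # median share of SNPs two different tests have in common
BASELINE_SNPS_PER_CM = round(RAW_SNPS_PER_CM * MEDIAN_OVERLAP)  # about 103


def expected_snps(segment_cm):
    """Baseline SNP count for a segment of the given length in cM."""
    return BASELINE_SNPS_PER_CM * segment_cm


def density_ratio(segment_cm, snp_count):
    """Observed SNP count as a fraction of the baseline (1.0 = at baseline)."""
    return snp_count / expected_snps(segment_cm)


# (largest segment in cM, SNP count) for comparisons #1 to #5.
# ASSUMPTION: each SNP count refers to the largest segment.
observed = [(9.6, 288), (8.9, 236), (6.2, 298), (4.6, 263), (7.0, 210)]

for cm, snps in observed:
    print(f"{cm:>4} cM: {snps} SNPs observed, "
          f"~{expected_snps(cm):.0f} at baseline, "
          f"ratio {density_ratio(cm, snps):.2f}")
```

Under that assumption, every one of the five largest segments comes in well under the baseline density. A ratio below 1.0 only flags wider-than-average gaps between compared markers; it says nothing on its own about whether the segment is real.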
If you remember, GEDmatch used to have a default of 700 as the minimum SNP window size. With the Genesis version moving into production that changed to a dynamic variable of between 200 and 400. Recently, the default has been loosened yet again: now it simply says "about 2/3 of segments will have between 185 and 214 SNPs." Those values are way too low and are one of the leading causes of so many false-positive results at small segment sizes.
Even so, while that SNP density benchmark can help in quickly eliminating some segments as unlikely because there are large gaps between compared markers, of itself it is no assurance that a small segment is actually valid. That sounds counterintuitive, but one of the things to keep in mind is that as many as 18.8%--almost one in five--of the SNPs targeted by our current version tests are examined specifically for clinical research purposes. They are not ancestry informative markers. If a matching segment happens to encompass or overlap an area that includes protein-coding genes (and their flanking areas) that are of interest to clinical researchers, the result can be sets of SNPs that are counted and compared but that actually have very little to do with genealogy. In the exome--the regions of the genome that comprise the coding genes and some of their regulatory systems--we're all almost entirely identical.
Without some of the sophisticated genotyping algorithms used by most of the testing companies, GEDmatch has no way, other than simply counting, to attempt to validate that a displayed segment is really a shared segment at all. However, as shown on the ISOGG Wiki, even the major testing companies have a significant false-positive rate with small segments. On that same page, data gathered by John Walden and Tim Janzen indicate--based on trio-phasing only and not close examination by WGS--that a reported 7cM segment will be false 58% of the time; a 5cM segment will be false 86% of the time; and a 3cM segment will be false at least 99% of the time. My own informal results comparing data derived from whole genome sequencing to those from 11 of our common microarray tests would indicate that, at GEDmatch, those numbers are conservative: even at 10cM and using the other default settings, for some microarray test results as many as 70% of the reported segment matches may be false.
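To put the Walden/Janzen percentages quoted above into concrete terms, here is a small Python sketch. The names are hypothetical, and the second function assumes the pairwise matches in a three-person group are statistically independent. That is a simplification I'm making only for illustration; shared population background means it won't hold in practice.

```python
# False-positive rates quoted above from the Walden/Janzen data on the ISOGG Wiki.
# The 3cM figure is "at least 99%", so 0.99 is a floor, not a point estimate.
FALSE_RATE_BY_CM = {7: 0.58, 5: 0.86, 3: 0.99}


def expected_valid(segment_cm, reported):
    """Expected number of real segments among `reported` segments of this size."""
    return reported * (1.0 - FALSE_RATE_BY_CM[segment_cm])


def all_pairs_valid(segment_cm, pairs=3):
    """Chance that every pairwise match in a group is a real segment.

    ASSUMPTION: the pairwise matches are independent of one another,
    a simplification made only for this illustration.
    """
    return (1.0 - FALSE_RATE_BY_CM[segment_cm]) ** pairs


for cm in (7, 5, 3):
    print(f"{cm}cM: ~{expected_valid(cm, 100):.0f} of 100 reported segments real; "
          f"all three pairs in a trio real: {all_pairs_valid(cm):.4f}")
```

Even with that generous simplification, the chance that all three pairwise matches in a small-segment group are real shrinks quickly as the segment size drops.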
Step one of triangulation has to be to determine the likelihood that a reported matching segment is actually a valid, continuous segment. For small segments, a lot of analytical work has to go into it and, even then, it may be impossible to make a decision with the limited microarray data we have. I described some of the considerations in another G2G post last month.
And despite what we often hear, triangulation itself doesn't validate a given segment. Working that way puts us back at the petitio principii logical fallacy. Several reported matches on a segment that is otherwise false, or that could be better explained by genetic similarity (e.g., linkage disequilibrium resulting from in-common population subsets) than by an identifiable ancestral relationship, do not make the segment and the triangulation valid. Garbage in, garbage out.
For another perspective, Blaine Bettinger wrote a piece in August 2022 titled, "An In-Depth Analysis of the Use of Small Segments as Genealogical Evidence." It's definitely worth a read.
Now I'll take my soapbox and go annoy people on a different street corner...
