(Waving at Debbie) But, for those who don't know you, you're representing yourself way too...lightly. I think a research associate in the Department of Genetics, Evolution and Environment at University College London, Education Ambassador for the International Society of Genetic Genealogy, and a co-founder of the ISOGG Wiki (and I can confirm its primary contributor) qualifies in our circles as a background in genetics. Plus, two published books dealing with genealogy, a writer on DNA for all the UK's major family history magazines, and a frequent and sought-after speaker at conventions and conferences, including the massive "Who Do You Think You Are? Live" conference.
(Note: I clearly start rambling to the world at large often...well, almost all the time. But because I might first address someone by name, it can seem I'm still speaking to that specific person. So here's notification that I'm now climbing on my well-worn soapbox on the street corner; just the crazy man talking to any and all passersby.)
I really want autosomal triangulation to be a valid methodology. I really do. But the concept originated with the non-recombining Y chromosome, and I think it was merely extrapolated from there as, "Hey! This should work for autosomal DNA, too." Which is comparing apples and oranges. The STRs and SNPs we test on the Y change only via mutation; outside the pseudoautosomal regions they're untouched by meiotic crossover and independent assortment.
There are a number of factors I believe we can point to as hypotheses for why autosomal triangulation shouldn't be consistently accurate beyond the most recent generations. And frankly, other than experiential (not experimental) findings, usually gleaned from an individual's own family research, which perforce means he or she is seeing only very isolated samplings and only narrow haplotypic ranges, there isn't much evidence to indicate that triangulation can be consistently accurate.
One critical factor that I believe is often overlooked is precisely what Debbie described in Part 2 of her 2016 blog post on the subject: basic probability. We talk a lot about how much DNA sharing we can or should expect between test-taking cousins. And we all know that the expected sharing drops precipitously with each step in cousinship, by a factor of 4 at each full "C," a multiplier of 0.25. That's simply because each additional degree of full cousinship adds two more meiosis events to the DNA's trip from Test-Taker One, back to the most recent common ancestor, and forward in time again to Test-Taker Two.
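To put rough numbers on that factor-of-4 drop, here's a quick back-of-the-envelope sketch in Python. It assumes roughly 6,800 cM of autosomal DNA and the textbook coefficient-of-relationship expectations; real-world sharing scatters widely around these averages, so treat them as illustrations only.

```python
# Back-of-the-envelope expectations for autosomal sharing between full cousins.
# Full 1st cousins average ~1/8 of their DNA in common; each additional degree
# of cousinship adds two meioses, multiplying the expectation by 0.25.
# Actual sharing varies widely around these averages.

TOTAL_CM = 6800  # rough autosomal total in centimorgans (assumption)

def expected_shared_fraction(cousin_degree: int) -> float:
    return (1 / 8) * 0.25 ** (cousin_degree - 1)

for degree in range(1, 6):
    frac = expected_shared_fraction(degree)
    print(f"full cousins, degree {degree}: {frac:.4%} expected (~{frac * TOTAL_CM:.0f} cM)")
```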
With triangulation, it isn't about how much DNA you would be expected to share with that MRCA, or how much with a test-taking cousin. In order for triangulation to be valid, all the test-takers in a triangulation group must share some significant amount of the exact same DNA from that MRCA. So it's like rolling dice: the odds against a match lengthen significantly with each die you add. The chance of rolling two dice and having them match is 1 in 6. Roll three dice, and the chance that all of them match drops to 1 in 36.
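The dice arithmetic is easy to check for yourself; here's a quick Python sketch (analogy only, for the reasons the next paragraph explains).

```python
import random

# Chance that k fair dice all show the same face: 6 / 6**k = 1 / 6**(k - 1).
# Every extra die lengthens the odds six-fold, the same compounding effect
# each extra test-taker adds to a triangulation group (analogy only).

def p_all_match(k: int) -> float:
    return 1 / 6 ** (k - 1)

def simulated_p(k: int, trials: int = 200_000) -> float:
    hits = sum(
        len({random.randint(1, 6) for _ in range(k)}) == 1
        for _ in range(trials)
    )
    return hits / trials

for k in (2, 3, 4):
    print(f"{k} dice: exact {p_all_match(k):.4f}, simulated {simulated_p(k):.4f}")
```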
Unlike dice, meiosis doesn't consist of strictly independent random events. There's a method to the randomization, and there are structural differences among the chromosomes...even differences between male and female crossover "hotspots." I built a table once that used only the coefficient of relationship to try to illustrate what the probabilities looked like for three (and, in another column, four) distant cousins all sharing some of the same DNA as a distant common ancestor. And I decided that going only by the CoR wouldn't work, that I lacked the knowledge to come close to an accurate calculation. Suffice it to say, though, that like continually adding another die to the roll, the odds against the same-same DNA outcome increase dramatically with each test-taking cousin you add to the mix.
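For what it's worth, here is the general shape of that kind of naive, CoR-only table, sketched in Python. It treats each descendant's chance of still carrying one specific small ancestral segment as an independent 1/2 per meiosis, which is exactly the over-simplification that made me abandon the exercise: linkage, segment length, hotspots, and detection thresholds are all ignored. Note too that these figures are for one specific, pre-chosen segment; the Ancestry chart in the next paragraph asks the much easier question of sharing any segment somewhere, which is presumably why its percentages are far higher. Take the shape, not the numbers.

```python
# Naive, CoR-style illustration only: probability that every member of a group
# of nth full cousins still carries one specific small segment from the common
# ancestor, assuming an independent 1/2 chance of transmission per meiosis.
# Linkage, segment length, crossover hotspots and detection thresholds are all
# ignored, so the absolute numbers are not to be trusted; the point is how
# fast they collapse as the group grows.

def naive_p_all_carry(group_size: int, cousin_degree: int) -> float:
    meioses_per_line = cousin_degree + 1  # common ancestor down to each nth cousin
    p_one_line = 0.5 ** meioses_per_line
    return p_one_line ** group_size

for degree in (3, 4, 5):
    for group in (3, 4):
        p = naive_p_all_carry(group, degree)
        print(f"{group} cousins at degree {degree}: ~1 in {1 / p:,.0f}")
```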
The chart Debbie provided us from AncestryDNA shows this. At the 3rd-cousin level, Ancestry indicates that any three matches can be expected to share a meaningful portion of the same segment(s) from a given ancestor about 14% of the time. Meaning that, by Ancestry's data, triangulations among three 3rd cousins should succeed only about once in every 7.14 attempts. For three 4th cousins, it's a fraction over 1%: you would need about 99 triangulation attempts to find three 4th cousins who match on the same segment that came from a specific one of the 16 ancestral couples at the 3g-grandparent level.
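Those attempt counts are just reciprocals of the success rates: if a triangulation succeeds with probability p, you expect to need about 1/p tries (the mean of a geometric distribution). A quick check, using the rates quoted above:

```python
# Expected number of triangulation attempts = 1 / success rate (geometric mean).
# Rates are the ones quoted from the AncestryDNA chart above; the 4th-cousin
# figure of 0.0101 stands in for "a fraction over 1%".
for label, p in {"three 3rd cousins": 0.14, "three 4th cousins": 0.0101}.items():
    print(f"{label}: success rate {p:.2%}, ~{1 / p:.1f} attempts expected")
```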
I don't believe we'll get a solid answer--or any answer--on autosomal triangulation until we have a rigorous research study that can use NGS testing, not our common microarrays, and that can find a way to accurately compare generations of descendants against the actual sequenced DNA of one or more distant ancestors. My hope there is that this may come from exhumation samples obtained from notable, historic individuals.
But I believe it's extremely difficult, if not in some instances impractical, to perform genealogical triangulations to distant cousins using our inexpensive microarray tests. The problem is that, from day one, these chips have never exclusively tested any specific subset of the 5 million or so SNPs that would be most genealogically informative. In fact, a past president of the Open Genomes Foundation told me some time ago that when Illumina introduced the first OmniExpress chip, consideration was given to samplings in the exome, but that the remainder of the SNPs selected were pretty much random, with the concern being evenly-spaced genomic coverage more than any particular ancestral relevance. I can't confirm that, but as early as v1.2 of the OmniExpress-24 chip we already saw (if we look at the specs for sampling of RefSeq exons, ADME genes, SNPs in the Gene Ontology dataset, etc.) that a little over 20% of the SNPs targeted were in the coding-gene, exome region. Starting with the GSA chip, though, Illumina has given us nice breakdowns of the tested SNP categories; the figures that follow are from the GSA v3.0 chip.
Clinically relevant SNPs comprise about 20% of all markers tested by the default GSA v3.0 configuration. Further, the company specifies that 262,173 intronic markers (the genomic territory we'd be most concerned with for genealogy) are included, or about 40% of the 654,027 total. But even the intronic markers targeted aren't necessarily the most valuable ones for ancestral genetics.
Bottom line here is that I believe we see large numbers of reported triangulations that can't stand up to scrutiny. I think one of the first steps in triangulation needs to be to see whether any meaningful portion of the mutually shared segment(s) lies in the exome, and whether it includes multiple known coding genes. As an absurd example, the fact that neither I nor my cousin is lactose intolerant isn't genealogically very informative; the lactase-persistence mutation behind that spread widely through our European ancestors within roughly the last several thousand years, as dairying took hold. But trying to eliminate exonic SNPs from triangulations is not that easy to do. If we have loci detail, we can at least refer to the GRCh37.p13 version of the 1000 Genomes Browser to explore the segment manually, but I don't know many who include that in their process.
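To illustrate the kind of check I mean, here's a minimal sketch. It assumes you've exported a list of exon intervals for the right build (e.g. a BED-style file pulled from a genome browser) and simply reports how much of a shared segment those intervals cover. The file name, the example segment, and the 10% threshold are all illustrative assumptions, not anyone's published tool.

```python
# Minimal sketch: how much of a shared segment is covered by known exon
# intervals? Assumes exons.bed holds "chrom  start  end" lines (0-based,
# half-open, GRCh37 coordinates, pre-merged so intervals don't overlap).
# The file name, segment and threshold are illustrative assumptions.

def load_intervals(path: str, chrom: str) -> list[tuple[int, int]]:
    intervals = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            c, start, end, *_ = line.split()
            if c == chrom:
                intervals.append((int(start), int(end)))
    return sorted(intervals)

def covered_bases(seg_start: int, seg_end: int, intervals: list[tuple[int, int]]) -> int:
    total = 0
    for start, end in intervals:
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap > 0:
            total += overlap
    return total

if __name__ == "__main__":
    # Arbitrary example segment on chromosome 2.
    chrom, seg_start, seg_end = "chr2", 135_500_000, 137_000_000
    exons = load_intervals("exons.bed", chrom)
    exonic = covered_bases(seg_start, seg_end, exons)
    share = exonic / (seg_end - seg_start)
    print(f"{exonic:,} of {seg_end - seg_start:,} bases exonic ({share:.1%})")
    if share > 0.10:  # arbitrary illustrative threshold
        print("Treat this triangulation with extra caution.")
```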
Second step, for me, would be comparison to haplotypic pile-up regions. We know of only a handful of global, population-level pile-ups so far, so that lookup is fast and easy. Since we're often comparing matches against cousins who have tested at different companies and/or on different microarray chip versions, I think it helpful to try--if the data can be accessed--to compile haplotypic pile-up charts for the different companies. Ancestry tries to winnow out some of these for us with their Timber algorithm, but we don't get to see any of the detail there. For more about haplotypic pile-ups, see this January 2018 post by Debbie.
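The same interval-overlap logic as the sketch above works for this step; here's a tiny version in which the pile-up coordinates are placeholders you'd swap for whatever published or company-specific regions you've managed to compile.

```python
# Flag a shared segment if a large share of it falls inside known haplotypic
# pile-up regions. The regions below are placeholders, not a real list; in
# practice you'd substitute whatever pile-up coordinates you have compiled
# (GRCh37 assumed).

PILEUP_REGIONS = {
    "chr15": [(20_000_000, 25_000_000)],  # placeholder interval
    "chr6":  [(25_000_000, 35_000_000)],  # placeholder interval
}

def pileup_fraction(chrom: str, start: int, end: int) -> float:
    covered = 0
    for r_start, r_end in PILEUP_REGIONS.get(chrom, []):
        overlap = min(end, r_end) - max(start, r_start)
        covered += max(overlap, 0)
    return covered / (end - start)

frac = pileup_fraction("chr6", 28_000_000, 33_000_000)
print(f"{frac:.0%} of the segment sits in a known pile-up region")
```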
Third, my wish list includes being able to understand the SNP continuity within a segment. Various reports give the SNP density, or how many SNPs were used in the comparison, but today we can't get a report of the actual continuity and positions of those SNPs. In other words, is the proclaimed segment largely a virtual SNP desert, with most of the SNPs clustered in one or two narrow bands? Are the SNPs matching in a mostly one-after-the-other sequence, or is the mismatch-allowance algorithm permitting a relatively high percentage of non-matching SNPs across the overall segment length (e.g., ignoring one or two mismatches in every 50 may be permissible to the reporting utility)? This has become more of an issue with the introduction of the GSA chip and its lack of same-SNP overlap with previous tests. That's why GEDmatch had to walk back their earlier matching defaults and move to a less rigorous threshold of a floating 200-400 matching SNPs, and why MyHeritage introduced imputation in the form of what they call "stitching": using modeled genotype information to fill in the gaps between disparate microarray data and predict that two otherwise very small segments might actually be one modest one. But it's still guesswork.
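Here, for illustration, is roughly what I'd want such a report to compute, sketched in Python over hypothetical per-SNP comparison detail (positions plus match/mismatch flags), which is exactly the detail the vendors don't currently expose.

```python
# Sketch of a segment "continuity" report over per-SNP comparison data:
# a list of (position, matched) pairs for the SNPs inside the reported
# segment. All inputs here are hypothetical.

def continuity_report(snps: list[tuple[int, bool]]) -> dict:
    positions = [pos for pos, _ in snps]
    matches = sum(1 for _, ok in snps if ok)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    span = positions[-1] - positions[0]
    return {
        "snps_in_segment": len(snps),
        "matching_snps": matches,
        "mismatch_rate": 1 - matches / len(snps),
        "snps_per_mbp": len(snps) / (span / 1e6),
        "largest_gap_bp": max(gaps),   # a huge gap = a SNP desert inside the "segment"
        "median_gap_bp": sorted(gaps)[len(gaps) // 2],
    }

# Hypothetical example: 5 SNPs across ~2 Mb with one mismatch tolerated.
example = [(10_000_000, True), (10_050_000, True), (10_060_000, False),
           (11_900_000, True), (12_000_000, True)]
print(continuity_report(example))
```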
Ultimately, hopefully, one day we can start using whole-genome sequencing data for genealogy and rely less on predictive genotyping. Back when I did my first yDNA test in 2003, we were ecstatic when the 37-marker panel came along to help refine our results. A 37-marker match was a lot more solid than a 12-marker one. And the more yDNA testing has progressed, the more refined those matches have become. However, some folks coming first from the world of autosomal testing tend to expect that more markers tested equals more matches. It's just the reverse. At 12 markers I might match 5% of the whole of the UK. But with the Big Y test I match, within the genealogical timeframe, a dozen of the 36 men in our Williams subproject, most of them specifically recruited to participate.
I think the same is going to be true of the stage we're at with autosomal triangulation: the more research advances and the more hard data we get about our genomes, the more likely it becomes that most of the distant-cousin triangulations we see now will be negated. The biology and the probabilities just aren't on our side.