Richard, I've owed additional information to this thread all week. I've just been too busy (even putting in work hours on Easter Sunday; what follows is what I'm doing with my lunch break). But I feel I need to comment on the example you provided, because nothing about our direct-to-consumer microarray autosomal tests can remotely recommend the use of segments that small with any hope of accuracy. In fact, even the notion of reporting centiMorgan values to two decimal places is rather pointless. Some testing providers do it--probably because that's simply the way the decades-old cM formula presents, but also because it looks more "sciency" from a marketing perspective--but the centiMorgan is itself an imprecise estimate to begin with, and working with sex-averaged values, as we all do for genealogy, means that kind of definitiveness is about as close to impossible as it gets.
Since the example used was Chromosome 9, here's a real-world case using a segment on that chromosome to illustrate the impact of sex-averaged calculations. The shared segment runs from a start position of 29,216,761 to an end position of 70,574,578 (which, to be clear, no microarray test can establish with precision; none of our start/stop points are accurate), for a total length of 41,357,817 base pairs. Under the outdated GRCh37 genome map that we still use universally for our genealogy microarray tests, the centiMorgan calculations (per Rutgers University) are:
- Female genome: 20.5cM segment
- Male genome: 2.8cM segment
- Sex-averaged value: 11.5cM segment
The differences are not minor. The female genome undergoes crossing over during meiosis at a frequency approximately 70% greater than that of males. Working with generational commingling and large segments means that sex-averaged values can be used for like-to-like comparisons, suitable for reaching back in time a few generations. But the variances are greatly magnified if we try to deal with tiny segments.
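For anyone who wants to check the Chromosome 9 arithmetic, here is a minimal sketch in Python. It uses only the positions and cM values quoted above; the variable names are mine, and the "Mb per cM" column is just physical length divided by each cM figure, not something the Rutgers tool reports.

```python
# Figures quoted in the post for the Chromosome 9 example (GRCh37 positions).
start_bp = 29_216_761
end_bp = 70_574_578
cm_by_map = {"female": 20.5, "male": 2.8, "sex-averaged": 11.5}

length_bp = end_bp - start_bp
print(f"Physical length: {length_bp:,} bp")

# Same physical span, very different genetic lengths depending on the map used.
for label, cm in cm_by_map.items():
    mb_per_cm = length_bp / 1_000_000 / cm
    print(f"{label:>12}: {cm:5.1f} cM  ({mb_per_cm:.1f} Mb per cM)")

# How far apart the female and male values are for this one segment.
female_to_male = cm_by_map["female"] / cm_by_map["male"]
print(f"Female/male ratio for this segment: {female_to_male:.1f}x")
```

For this particular segment the female value is a bit more than seven times the male value, which is far wider than the genome-wide difference.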
You will find no respectable literature, peer-reviewed or blogged, that will indicate segments resulting from microarray tests on the order of 2cM can ever realistically be used as a form of genealogical evidence.
FTDNA, which does report its cM calculations to two decimal places and which will display very small segment sizes, states in no uncertain terms what it professes the reach of autosomal DNA testing to be: "Thus, the autosomal DNA admixture for any given individual roughly comprises the DNA of all of their ancestors within five generations." The bolded emphasis is mine. Five generations equals a set of shared 3g-grandparents, or contemporary 4th cousins.
Part of the reason for that--ignoring the biology for now--is simple numbers. Approximately 8% of the human genome is inaccessible to our microarray tests, which equates to about 248 million base pairs, leaving us with roughly 2.852 billion that might be accessible.
Our microarray tests examine, on average, about 650,000 SNPs. Current versions of the microarrays used by 23andMe, FTDNA, and MyHeritage have over 18% of those SNPs targeted expressly for clinical research purposes. Some of those will still be relevant to genealogy--including SNPs in genes that affect phenotype like eye/hair color--so a conservative estimate is that 10% of the 650K SNPs tested are not useful for genealogy, leaving us with about 585K loci tested.
The typical microarray test will have up to 1% no-calls (loci where the synthetic probe wasn't able to bind with DNA from the prepared solution); a more likely median would be about 0.6% no-calls. That places us at roughly 581K SNPs.
This means that we're looking at a maximum average of one marker out of every 4,900 base pairs, and a cumulative total of approximately 0.02% of testable base pairs, and only about 0.087% of the SNPs cataloged in the NIH's dbSNP database.
There is no direct correlation across chromosomes because the centiMorgan isn't a physical measurement, only a fixed estimate of recombination based on a reference model of a single genome; but a figure often used for rough estimates is 1 million base pairs per centiMorgan. Using that, the best-case, averaged scenario is that our microarray tests will contain data on about 204 markers per million base pairs, or per (very roughly averaged) centiMorgan.
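The marker-density figures above can be reproduced with a short sketch. The inputs are the numbers given in this post (2.852 billion accessible base pairs, 650K SNPs, 10% not useful for genealogy, 0.6% no-calls); the variable names are hypothetical, and the 1 Mb per cM equivalence is the rough rule of thumb, not a measurement.

```python
# Inputs taken from the post; names are illustrative only.
accessible_bp = 2_852_000_000   # genome minus the ~8% microarrays cannot reach
snps_on_chip = 650_000          # average SNPs examined per test
genealogy_fraction = 0.90       # conservative share useful for genealogy
no_call_rate = 0.006            # likely median no-call rate

usable_snps = round(snps_on_chip * genealogy_fraction)
called_snps = round(usable_snps * (1 - no_call_rate))

# Average spacing between tested markers, and the density per million bp
# (treated here as roughly one centiMorgan).
bp_per_marker = accessible_bp / called_snps
markers_per_mb = 1_000_000 / bp_per_marker
coverage_pct = 100 * called_snps / accessible_bp

print(f"Usable SNPs after clinical-only loci removed: {usable_snps:,}")
print(f"SNPs actually called: {called_snps:,}")
print(f"About one marker per {bp_per_marker:,.0f} bp")
print(f"About {markers_per_mb:.0f} markers per million bp")
print(f"Share of accessible base pairs tested: {coverage_pct:.2f}%")
```

The unrounded no-call result is 581,490 SNPs, which is why the spacing comes out a few base pairs under the figure you get from the rounded 581K.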
However, in the relatively short history of direct-to-consumer autosomal testing, not only different manufacturers but also different iterations of the same version of a specific chip have had different SNPs that they targeted. In the worst comparative instance, only 17% of the same markers were examined between different tests, with the average overlap of the most popular tests about 20% (Lu, et al., 2021).
If we can compare only 20% of the SNPs between two given tests, then we're looking not at 581K SNPs but 116,200. That moves the one-to-one comparison to one marker in every 24,544 base pairs, and the count per 1 million base pairs to about 41 per (very roughly averaged) centiMorgan.
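The same arithmetic, applied to the overlap percentages, looks like this. It is a sketch using the rounded 581K figure and the 20% and 17% overlap values cited above; the function name `shared_density` is my own invention for illustration.

```python
accessible_bp = 2_852_000_000   # accessible base pairs, per the post
called_snps = 581_000           # rounded SNP count from the post

def shared_density(overlap_fraction):
    """Return (shared SNPs, bp per shared marker, shared markers per million bp)."""
    shared = round(called_snps * overlap_fraction)
    bp_per_marker = accessible_bp / shared
    markers_per_mb = 1_000_000 / bp_per_marker
    return shared, bp_per_marker, markers_per_mb

# Average overlap between popular tests (20%) and the worst case cited (17%).
avg_shared, avg_spacing, avg_per_mb = shared_density(0.20)
worst_shared, worst_spacing, worst_per_mb = shared_density(0.17)

print(f"20% overlap: {avg_shared:,} SNPs, one per {avg_spacing:,.0f} bp, "
      f"about {avg_per_mb:.0f} per million bp")
print(f"17% overlap: {worst_shared:,} SNPs, one per {worst_spacing:,.0f} bp, "
      f"about {worst_per_mb:.0f} per million bp")
```

On a nominal 2cM segment, that average works out to somewhere around 80 comparable markers, under the same rough 1 Mb per cM assumption.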
To further illustrate the imprecision of centiMorgan calculations, I've written elsewhere here on G2G about a quick comparison I did using the AncestryDNA raw data for two known 2nd cousins who both tested on the same iteration of the chip, about one month apart. A database comparison of the raw data files showed that the two kits targeted the same SNPs. These raw data files were then uploaded to FTDNA, MyHeritage, and GEDmatch for comparison.
There were 11 shared segments. In no instance did any of the three companies report the same start or stop loci for any of the segments. Likewise, none of the three companies reported the same centiMorgan values for any of the segments.
The segment that was most similar among the companies was on Chr 1 where FTDNA calculated a 39.38cM [sic] segment; MyHeritage 40.2cM; and GEDmatch 40.5cM. Providing the GEDmatch inferred start/stop loci to the Rutgers University map interpolator yielded: male genome, 26.5cM; female genome, 54.6cM; sex-averaged, 40.5cM.
The proportionately most dissimilar result was on (just a coincidence) Chr 9. FTDNA calculated a 7.58cM [sic] segment; MyHeritage 24.9cM; and GEDmatch 16.3cM. Providing the GEDmatch inferred start/stop loci to the Rutgers University map interpolator: male genome, 21.7cM; female genome, 10.7cM; sex-averaged, 16.3cM.
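To put a number on "most similar" versus "most dissimilar," here is a small sketch that compares the largest and smallest reported cM value for each of the two segments. The values are the ones quoted above; the max/min ratio is simply my choice of yardstick, not something any of the companies report.

```python
# Reported cM values for the same two segments, same raw data, three companies.
reported_cm = {
    "Chr 1": {"FTDNA": 39.38, "MyHeritage": 40.2, "GEDmatch": 40.5},
    "Chr 9": {"FTDNA": 7.58, "MyHeritage": 24.9, "GEDmatch": 16.3},
}

spread = {}
for chrom, values in reported_cm.items():
    smallest = min(values.values())
    largest = max(values.values())
    # Ratio of the largest reported value to the smallest for this segment.
    spread[chrom] = largest / smallest
    print(f"{chrom}: {smallest} to {largest} cM, "
          f"largest is {spread[chrom]:.2f}x the smallest")
```

On Chr 1 the three reports are within about 3% of each other; on Chr 9 the largest is more than three times the smallest.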
The net message here is that we can upload the same data to different testing/reporting companies and the returned information will disagree to a degree that rules out the reasonable use of very small segments...even if we could verify physical matching of the SNPs tested and even if we had adequate SNP density to draw a conclusion about IBD matching.