Hi, guys. Just a comment from the cheap seats up here in the balcony. I'm not certain your aggregate anticipated mutation rates are correct.
Each individual STR mutation (except for palindromic multi-copy marker duplications and the pretty rare recLOH event) is an entirely independent event from a probability perspective. Again with exceptions, no STR mutation has any relationship to any other. Two mutations can happen in a single generation, or a collection of men can all show zero mutations at 111 markers back for at least eight generations (I have both of those in two of my FTDNA projects). So you wouldn't calculate an aggregate additively, or by using compound probability of independent events (each generation represents a clean slate, so to speak, so probabilities don't compound). That said, certain haplogroups and even haplotypes have empirically displayed differing mutation rates (note that the numbers shown by Iain McDonald at http://dna.cfsna.net/HAP/Mutation-Rates.htm look only at the U106-S21 subclade of M269). Establishing usable data for haplogroup/haplotype variances requires a significant sample size combined, importantly, with confident paper-trail information in order to determine which are actual, unique mutation events and when in the inheritance chain they occurred.
To use published results from others to aggregate an estimated mutation rate for a combination of STRs, you'd simply sum the per-marker rates and then divide by the number of markers. (Also note quickly that the Heinila/McDonald data linked above do not use the infinite allele model that FTDNA switched to in 2016 to evaluate genetic distance for the multi-copy markers.) I slapped the Heinila/McDonald data into a spreadsheet to see what came out.
For Iain's results, I get a different sum at 111 markers than Gary did: 0.261853 instead of 0.2948. That would result in an aggregate 111-marker mutation rate of 0.002359, or 0.236% per generation. For the 67-marker panel, it works out to be 0.002089791, and for 37 markers, 0.00264573. The Heinila numbers are 111: 0.002321676; 67: 0.002088015; 37: 0.002747054.
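For anyone who wants to check the 111-marker arithmetic without a spreadsheet, here is a minimal Python sketch. It starts from the two sums quoted above rather than the individual per-marker rates (which are not reproduced in this post), and the function and variable names are made up purely for illustration.

```python
MARKERS = 111

# Sums of per-marker mutation rates at 111 markers, as quoted in the post.
sum_mcdonald = 0.261853  # the sum I get from Iain McDonald's table
sum_gary = 0.2948        # the sum Gary reported

def aggregate_rate(per_marker_sum, n_markers):
    """Average per-marker, per-generation mutation rate for a panel."""
    return per_marker_sum / n_markers

rate_mine = aggregate_rate(sum_mcdonald, MARKERS)
rate_gary = aggregate_rate(sum_gary, MARKERS)

print(f"From 0.261853: {rate_mine:.6f} ({rate_mine:.3%} per generation)")
print(f"From 0.2948:   {rate_gary:.6f} ({rate_gary:.3%} per generation)")
```

The same function works for the 67- and 37-marker panels if you feed it the corresponding sums and marker counts.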
These are all fairly consistent with other estimations of aggregate Y-STR mutation rates. Back in 2001 when yDNA direct-to-consumer testing was just getting started, 0.002 was offered as the benchmark for the aggregate mutation rate. Most other compilations I'm familiar with trended upward of that number, but none by a massive amount.
At FTDNA's 1st International Conference of Genetic Genealogy in Houston in 2004, a presentation showed these cumulative rates:
- Markers 1-12: 0.00399
- Markers 13-25: 0.00481
- Markers 26-37: 0.00748
Comparing Iain McDonald's numbers, respectively: 0.00202, 0.00298, and 0.00488. Iain's findings are lower at each panel, but not astronomically so.
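To put a number on "lower, but not astronomically so," a quick sketch like the following divides each 2004 Houston figure by the matching McDonald figure. The values are copied straight from the lists above; the variable names are only illustrative.

```python
panels = ["1-12", "13-25", "26-37"]
houston_2004 = [0.00399, 0.00481, 0.00748]  # 2004 conference presentation
mcdonald = [0.00202, 0.00298, 0.00488]      # Iain McDonald's figures

# Ratio of the Houston rate to the McDonald rate for each panel.
ratios = [h / m for h, m in zip(houston_2004, mcdonald)]

for panel, ratio in zip(panels, ratios):
    print(f"Markers {panel}: Houston rate is {ratio:.2f}x the McDonald rate")
```

Each ratio comes out somewhere between 1.5 and 2, so the two sets differ by less than a factor of two at every panel.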
In 2006 in the Journal of Genetic Genealogy (which, alas, ceased operation that same year), John Chandler published a piece titled "Estimating Per-Locus Mutation Rates." The paper includes detail of his computational models that can be duplicated if you have a large enough sample size to work with. Chandler used haplogroup-nonspecific per-locus STR mutation rates taken from data at Ysearch and arrived at:
- Markers 1-12: aggregate mutation rate of 0.00187, with a margin of error of ±0.00028
- Markers 13-25: aggregate mutation rate of 0.00278, with a margin of error of ±0.00042
- Markers 26-37: aggregate mutation rate of 0.00492, with a margin of error of ±0.00074
From 2005 through 2009 Charles Kerchner conducted a study covering 55 FTDNA surname projects in an attempt to refine average Y-STR mutation rates. In the list below, the number of markers tested is followed by the estimated combined mutation rate and the standard deviation. The final figure, in parentheses, is what Charles terms the Marker Mutation Opportunities (MMO): the total number of discrete generational steps evaluated in calculating the mutation rates.
- 12(1-12): 0.0025 ±0.0003 (28,728)
- 25(1-25): 0.0028 ±0.0002 (58,925)
- 37(1-37): 0.0042 ±0.0002 (84,249)
- 67(1-67): 0.0031 ±0.0004 (19,296)
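One way to get a feel for how the rate, the ± figure, and the MMO count hang together is to treat each marker mutation opportunity as a yes/no trial and compute a simple binomial standard error. This is purely my assumption for illustration; it is not necessarily how Kerchner derived his standard deviations, and the names in the sketch are hypothetical.

```python
import math

# (markers tested, reported rate, reported +/- value, MMO) from the list above
panels = [
    (12, 0.0025, 0.0003, 28_728),
    (25, 0.0028, 0.0002, 58_925),
    (37, 0.0042, 0.0002, 84_249),
    (67, 0.0031, 0.0004, 19_296),
]

def binomial_se(rate, opportunities):
    """Standard error of a proportion, assuming independent yes/no trials."""
    return math.sqrt(rate * (1.0 - rate) / opportunities)

results = []
for markers, rate, reported, mmo in panels:
    implied_mutations = rate * mmo  # mutation count implied by rate x MMO
    se = binomial_se(rate, mmo)
    results.append((markers, implied_mutations, se, reported))
    print(f"{markers:>3} markers: ~{implied_mutations:.0f} implied mutations, "
          f"binomial SE {se:.5f} vs reported {reported:.4f}")
```

Under that assumption the computed values land close to the reported ± figures for all four panels, which at least suggests the published uncertainties are driven mainly by the MMO counts.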
Kerchner summarized the observed cumulative mutation rates broken down by haplogroups (tested or FTDNA predicted):
- I1: 0.0030 ±0.0005 (10,027)
- R1b: 0.0043 ±0.0003 (44,585)
- J2: 0.0042 ±0.0009 (4,551)
- G2: 0.0048 ±0.0008 (7,104)
- R1a: 0.0077 ±0.0008 (8,954)
So across the per-panel figures we swing from the very highest rate of 0.00748 (markers 26-37; 2004 presentation in Houston) to the lowest of 0.00187 (markers 1-12; John Chandler, 2006, Journal of Genetic Genealogy). The truth is likely within that range, which is far lower than the probabilities that have been mentioned in the last few posts. That comes, again, with the understanding that opposite ends of the bell curve can really throw a wrench into the works when you examine individual, small-sample cases.
We as yet have no idea what the values might look like for STRs 112 through about 450. These will be from the new Big Y-500 testing from FTDNA and, from the looks of it, there will be a significant volume of no-calls in those tests, so the results are likely to be highly haplotype-dependent. Time will tell if anyone proceeds with analyzing aggregate mutation rates for those STRs.