Request for Expert Evaluation of DNA Evidence

Along with a few others, I am working on a project to collect and analyze autosomal and Y-DNA from descendants of John Price-11137, John Nicholson-2071 and Benjamin Nicholson-2072, as well as other Nicholson and Price families who may be related.  The ancestors of the families we are interested in lived in 1750 in the area of Hawksbill Creek, in what is now Page County, Virginia.

We have collected, compared and analyzed 170 GEDmatch kits.  We have collected and compared 15 Y-DNA results. 

A website that shows the family relationships and the DNA results is https://www.garynick.com/genealogy/nicholson/index.php

We are looking for DNA expertise to help us evaluate the results of the study.

Thank you

Gary Nicholson  

WikiTree profile: Benjamin Nicholson
in Genealogy Help by Gary Nicholson G2G1 (1.3k points)

3 Answers


I wouldn't call myself an expert, just a learner, but I believe I can offer a few comments, strictly concerning the Y-DNA evidence for the Nicholsons and Prices that you have gathered in your PDF.

  • I wouldn't draw any geographical conclusions from the haplogroups.  They are both very old, and both are spread widely across the same European continent.  There may be more of one than the other in some regions, but both are common enough.  About all you can say is that both are common in Europe, more likely western than eastern, and more likely northern than southern.  As an example, I'm R-M269, and my paternal ancestor is from Norway, where there's a sizeable minority of R-M269s.
  • Nicholson is a patronymic, so long ago it likely originated in a Scandinavian country or the Netherlands.  That could have been much longer ago than you're interested in, though.
  • The 37 marker test is still somewhat coarse for comparisons, and it's possible to have a perfect match yet the common ancestor be 200 to 500 years ago, possibly before there were surnames where your ancestors were from.  If the common ancestor is too far back, then differing surnames are meaningless.
  • I noticed that 2 I-M253 persons have 67 marker tests - Mike Price and Ronald Nicholson (I don't know the relationships).  Their comparison could be useful, to see if they are still a perfect match with 67 markers, still a GD of 0.  If not, that could help indicate how much farther back the common ancestor is.
  • The fact that you found your potential ancestors living close together is interesting, but it's not conclusive that the unknown union happened there, at that time.  When people emigrate, they often emigrate *from* the same areas *to* the same areas.  So it's very possible that Nicholsons and Prices have been living nearby each other in England for generations, before they came to the same part of America.
  • This is all to say that I have to add a few more possibilities to your list of 3 explanations - while it could be they had the same father or grandfather, they could instead have had the same great grandfather, or the same great great grandfather, or the same great great great grandfather, etc.  I am sorry if that's discouraging.
  • I do have one suggestion, but it's not cheap (I can't afford it myself).  If you were to take a Big Y test, and the descendant of John Price did also, and possibly the Abbott descendant too, then you are likely to get a much more accurate picture of how you're related to each other.  Then you could see how many SNPs apart you are.  And if there are enough testers on your branch, there are age estimates available that could be informative on how far back the common ancestor was.
by Rob Jacobson G2G6 Pilot (137k points)
selected by Gary Nicholson

Thank you for your thoughts.  I'm glad you took the time to read the material on our website:  

https://garynick.com/genealogy/nicholson/index.php

Your comment about the coarseness of 37-marker Y-DNA comparisons was quite helpful.  I found the Y-DNA markers for the two men you mentioned as having 67-marker tests.  They match 66/67, a genetic distance of 1.  According to the FTDNA TIP report for a 66/67 match, that gives a 98.16% probability that they share an ancestor within 7 generations.

The paper-trail genealogy for Ron is solid back to Benjamin Nicholson.  The paper-trail genealogy for James is solid back to John Price.  Those two men were neighbors in 1750 when Benjamin was born.  If John Price was the father of Benjamin Nicholson, the number of generations dividing Ron and James would be 7.5.  

Does it still look like more testing would help narrow down the distance to the shared ancestor?  We might consider upgrading Ron's test to 111, because James has already tested to 111.  

Your comments were confined to the Y-DNA analysis.  Did you look at our autosomal comparisons?  I've been concentrating on autosomal DNA, so am still looking for feedback on that aspect of our project.

 

by Gary Nicholson G2G1 (1.3k points)
A 66/67 is a pretty good match, which means you're almost certainly on the right track.  A 111 upgrade wouldn't be conclusive, just a refinement of your estimate of how far back the common ancestor is.  Your 3 possibilities are the most likely, although you can't rule out an older generation or 2.  If you're interested, I'd wait for a sale on 111 upgrades, like the one last year when I got mine.  However, if you're at all thinking of the Big Y, I'd put all my money there.  Last year (I can't remember whether it was April or August), FTDNA had a huge discount on the Big Y *plus* a free 111 upgrade (from 37 or 67).

If you do upgrade, I'd expect to see another mutation (~109/111).  If there isn't one and you get 110/111, then I'd be concerned the connection could have happened even more recently than 1750.  If instead there are 3 or more additional mutations, then it likely occurred in even earlier generations, and the 1700s become much less likely.  Others here have more experience with this than I do, though.  Certainly, I'd rely on FTDNA's estimates, as you did above.  It's my understanding that STRs mutate at differing rates, and FTDNA scientists take that into account.

Thank you for your thoughts.  I've added a few pictures to the PDF article:

https://garynick.com/genealogy/nicholson/Nicholson%20and%20Price%20Y-DNA.pdf

We think the shared paternal-line ancestor for the two men who match at 66/67 is beyond 7 generations in the past.  That changes the probability curve a lot.  I put a figure in the PDF to show that.  The data came from the FTDNA TIP report for men sharing 66/67, with the first 7 generations excluded.  The curve approaches 100% pretty rapidly.  But I think you were right that we can't exclude the shared ancestor being 9, 10 or more generations back.  We probably won't do an upgrade for any of us any time soon, but we will keep an eye out for sales.
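For anyone who wants to reproduce that kind of curve, here is a minimal Python sketch of one way to do it, assuming the TIP output can be treated as a cumulative probability by generation: keep the values beyond the excluded generations and rescale them so the remaining probability adds up to 1. Only the 98.16% figure at 7 generations comes from this thread; the other curve values and the helper name `condition_beyond` are hypothetical, and I don't know whether FTDNA's tool works this way.

```python
# Hypothetical cumulative curve: P(shared ancestor within g generations).
# Only the 7-generation value (0.9816) comes from the thread; the rest are made up.
tip_cumulative = {7: 0.9816, 8: 0.9900, 9: 0.9950, 10: 0.9976, 11: 0.9989, 12: 0.9995}


def condition_beyond(cumulative: dict[int, float], excluded: int) -> dict[int, float]:
    """Rescale a cumulative curve to P(G <= g | G > excluded)."""
    base = cumulative[excluded]  # probability mass we are ruling out
    return {
        g: (c - base) / (1.0 - base)
        for g, c in cumulative.items()
        if g > excluded
    }


for g, p in condition_beyond(tip_cumulative, 7).items():
    print(f"within {g} generations: {p:.1%}")
```

Because the excluded mass is removed and the rest is divided by what remains, the conditioned curve still ends up approaching 100%, just starting from zero at generation 7.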

As you can see from the figures in the article, three descendants of Benjamin have Y tests.  Two descendants of John Price have been tested.  If you pair up the three Nicholsons with the two Prices, you have six pairs.  For each of those pairs you have a probability curve.  Is it possible to combine those six curves somehow, mathematically, logically to improve the estimate?  Five of the curves would represent comparison of 37 markers while only one curve would represent comparison of 67 markers.  I don't know intuitively if the resultant probability would be arrived at by adding or multiplication of the six functions, or if the combination approach makes no sense.  Just looking to optimize using all of the data we have at hand.  

That's really nice work!  Looks good!

About trying to combine the estimations, it sounds like the old lab problem of trying to combine measurements of differing precisions - you really can't do it.  But what you can do is extrapolate lower precision to higher precision, and then average them once you have measurements of the same precision (assuming a reasonable method of extrapolation).  If you assume that the number of mutations scales linearly with the number of STRs compared, and I think that's very roughly reasonable, then you could do it.  From what I have seen, roughly, if you match badly at 37, you'll match badly at 67 and 111.  If you match closely at 37, you'll probably match well at 67 and 111.  (It would be nice to see some studies on this!)  So assuming linear behavior, if you have one mutation at 37, you should have about 1.8 at 67 (67/37), and 3 at 111 (111/37).  This allows lower and higher comparisons at 37 to pull your computed average at 67 lower or higher.  Of course, combining lower precision numbers into it means you also have a larger plus/minus error, corresponding to more generations before and after your average point: a wider curve.
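A rough Python sketch of the linear extrapolation described above. The genetic distances for the six pairs are made-up placeholders (five 37-marker comparisons and one 67-marker comparison), and `scale_gd` is a hypothetical helper; the linear scaling itself is the assumption stated in the paragraph above, not an established model.

```python
def scale_gd(gd: float, markers_from: int, markers_to: int) -> float:
    """Linearly rescale a genetic distance to a different panel size (an assumption)."""
    return gd * markers_to / markers_from


# Hypothetical (genetic distance, markers compared) for the six Nicholson/Price pairs.
pairs = [(1, 37), (0, 37), (2, 37), (1, 37), (0, 37), (1, 67)]

# Put every comparison on a 67-marker basis, then take a plain average.
scaled = [scale_gd(gd, markers, 67) for gd, markers in pairs]
average_gd_at_67 = sum(scaled) / len(scaled)
print(f"average 67-marker-equivalent GD: {average_gd_at_67:.2f}")
```

A plain average treats all six pairs equally; as noted above, the extrapolated 37-marker values carry more uncertainty, so the result should be read as a wider curve, not a sharper one.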

I ordered a test upgrade to 111 markers.  Results show that I match the Price descendant 108/111.  

Based on the new results and the previous data, I built a couple of web pages to show the raw comparisons.  They are 

https://garynick.com/genealogy/nicholson/price_nicholson_strs.php

and 

https://garynick.com/genealogy/nicholson/person_strs.php

I also devised a scheme showing how the STR changes might hypothetically have occurred over all the generations, from the first man with the I1a1b2b haplotype, to the shared ancestor of the Prices and Nicholsons, to the living men who have had their Y-DNA tested.  Being able to put the pieces together into a consistent scheme supports our contention that Benjamin Nicholson, my ancestor, was closely related to John Price (d. before 1784).

The treatment of the scheme is laid out in the newly revised PDF article at 

https://garynick.com/genealogy/nicholson/download_pdf.php

The probability that Hawksbill John Price was the father of Benjamin Nicholson is 37% based on current match data.  Not close enough to a certainty to call it a done deal.  

Gary

Gary, I tried to look at your autosomal pages, but for some reason what came up on my screen was distorted and I couldn't really see what you were doing...

Basically, for a match to really exist you need a minimum of 7 cM and at least 700 SNPs overlapping on the same chromosome.  Anything less than that you can safely ignore.
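As an illustration only, here is a Python sketch of that rule of thumb applied to a list of matching segments. The `Segment` fields and the sample values are hypothetical, not GEDmatch's actual output format; the 7 cM / 700 SNP thresholds are the ones suggested above.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int    # start position of the matching segment
    end: int      # end position of the matching segment
    cm: float     # segment length in centimorgans
    snps: int     # number of SNPs in the segment


def meaningful(segments, min_cm=7.0, min_snps=700):
    """Keep only segments that meet both thresholds."""
    return [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]


# Hypothetical one-to-one comparison results.
segments = [
    Segment("5", 10_000_000, 25_000_000, 18.4, 2_900),
    Segment("12", 40_000_000, 43_000_000, 6.1, 1_100),  # too short in cM
    Segment("20", 5_000_000, 9_000_000, 8.3, 650),      # too few SNPs
]
print(meaningful(segments))
```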

I would use the People who Match 1 or 2 kits report.  Enter the two people who have the largest total cM.  Then look to see if any of the other people you are tracking are on that report.  Select them and run the report to get to the chromosome and SNP ranges.

To prove (not really an accurate word; it's more like affirming) a relationship, you would like to see at least 3 people sharing at least 700 SNPs on the same chromosome across a shared SNP range.
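A minimal Python sketch of that triangulation check, assuming each tester's shared segment with the same reference kit is known by chromosome and start/end position. It only confirms that at least three segments sit on the same chromosome and overlap; it does not count the SNPs inside the overlap, which you would still check against the 700-SNP threshold. The names, positions and the `shared_range` helper are hypothetical.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    person: str
    chromosome: str
    start: int
    end: int


def shared_range(segments: list[Segment], min_people: int = 3) -> Optional[tuple[str, int, int]]:
    """Return (chromosome, start, end) common to all segments, or None."""
    if len(segments) < min_people:
        return None
    if len({s.chromosome for s in segments}) != 1:
        return None  # different chromosomes cannot triangulate
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    if start >= end:
        return None  # same chromosome, but no common stretch
    return segments[0].chromosome, start, end


# Hypothetical segments three testers share with the same reference kit.
trio = [
    Segment("A", "5", 10_000_000, 25_000_000),
    Segment("B", "5", 12_500_000, 30_000_000),
    Segment("C", "5", 8_000_000, 21_000_000),
]
print(shared_range(trio))
```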

You can select up to 14 people at a time to run at the chromosome and SNP level, so you might need to run several of these...

Hope that helps.

Laura,  Thanks for your suggestions and for looking at my site.  I am sorry you had problems.  When I checked the page with triangles, I had no problems.  

I was approached about my own paternal lineage's Y-DNA results within the past year or two, studied up on the subject, and even "went to town" with a statistical analysis which was quite insightful, if I do say so myself. Applying that experience to your situation, I think I can interpret the data a little differently, and you might find it eye-opening.

But first I'll observe that there are 11 men in the results table, but only 7 in the relationship diagrams. Did I miss something? Or do we just not know about the other 4 (095, 253, 296, 298)?

More importantly, the diagrams mention "300" & "314". I'm going to assume they really mean "301" & "318" from the table, but it's not all that crucial.

(1) Getting on to the analysis, the main concept I want to throw out there (and I don't even know if other people do this) is that number 038 (on the Price tree) almost certainly carries completely UNMUTATED Y-DNA thru the first 67 markers. If you could reincarnate John Price and give him a Y67 test, his results would be the same as 038's.

For a start, one thing people don't seem to know about this is that the chance of getting ZERO mutations, over 6 generations, in the 38th thru 67th markers is - by my calculations - 79%. Notice that for the 4 tests in your table that go that far, ALL FOUR match exactly among these markers.

It's not that surprising. We have virtually the identical situation as your John Price, with our distant Standlee relatives - 3 tests, all three descended from a different son of the top guy. Only in ours, all three were in the same generation (I think all were 6 generations down), and all were tested out to Y67. All three matched exactly, in the 38th thru 67th markers.

Really, I don't see what Y67 does for you, over Y37, and I kind of hate to say that. Maybe a use will present itself to me someday.

But consider if 038 was NOT unmutated. If 038 had a mutation, then to get the results we're seeing, ALL THREE branches of the tree would have to - COINCIDENTALLY - have gotten the exact same mutation! OR, if the original was 100's DYS570 value, for example, then 038 & 318 would BOTH have to have gotten the exact same mutation, independently. The odds are remote - 038 has John Price's Y67.

So 314 was unmutated over 7 generations. What's the chance of THAT happening? The answer is "27%". So that could easily happen.

But 100 was mutated TWICE over 8 generations. What's the chance of THAT? Answer is "25%". Again, that could easily happen.

(2) 002 probably has Benjamin Nicholson's Y67, too. This one is actually less clear, because you have the same CDYa mutation for two guys that are as distantly related as possible. It's possible that Ben had that mutation, and then it was reversed during the two steps from 006's branch to before 002 & 301 branch off.

But there's ANOTHER thing people don't seem to know, and that's the fact that the CDYs are - by far - the most likely to mutate. After 7 generations, you're almost 3x more likely to see a CDY mutation than one in DYS464, and a DYS464 mutation is almost 3x more common than one in any of the next highest 5 (which are all about the same, and which include DYS570).

So having CDY mutate the same way in 2 independent places isn't so surprising - we have that with the Standley YDNA if I'm not mistaken.

With 002 being the same as Ben, 301 matching him exactly isn't too impressive, since he only has a 12 marker test (it was a 93% chance). Chance of 006 getting 1 mutation over 5 generations is 37%. Chance of 232 getting 2 mutations over 7 generations is 23%. Nothing unusual going on here.
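Several of the percentages above (27%, 25%, 37% and 23%) can be approximated with a simple Python sketch. This is not necessarily how they were calculated: it assumes Y37 comparisons along a single line of descent, treats mutations as a Poisson process, and derives the per-generation rate from the 83.22% father-to-son no-mutation figure for Y37 quoted later in this thread. With those assumptions the results land within about a percentage point of the quoted figures.

```python
import math

# Assumption: a Y37 father-to-son transmission has no mutation 83.22% of the time.
Y37_P0_PER_GENERATION = 0.8322
# Expected Y37 mutations per generation implied by that figure (Poisson model).
RATE_PER_GENERATION = -math.log(Y37_P0_PER_GENERATION)


def prob_mutations(k: int, generations: int, rate: float = RATE_PER_GENERATION) -> float:
    """Poisson approximation to P(exactly k mutations over `generations` transmissions)."""
    lam = rate * generations
    return lam ** k * math.exp(-lam) / math.factorial(k)


# (kit, mutations observed, generations from the common ancestor), from the text above.
cases = [("314", 0, 7), ("100", 2, 8), ("006", 1, 5), ("232", 2, 7)]
for kit, k, n in cases:
    print(f"{kit}: {k} mutation(s) over {n} generations ~ {prob_mutations(k, n):.1%}")
```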

If I had to guess, I'd say 253 (missing from the diagrams) is most closely related to 232, and 296 (also missing) is more closely related to 232 and 253 than to the others. I figure the DYS570=24 mutation happened early on in a branch containing all 3 of them; 296 had no further mutation, but 232 and 253 are on separate branches with different single mutations in addition to the DYS570=24 one.

(3) So I'd say that Benjamin Nicholson & John Price had identical YDNA, out to Y37, & probably Y67. That doesn't necessarily make them father-and-son, but it's as consistent with that theory as can be, and the circumstances seem to point to that. That's about all you're going to be able to do with Y-DNA, as far as I know (but I don't know anything about BigY, or much about Y111, either). On average, you only see a Y67 mutation once in about 6 generations, so it's just not a precise tool.

But just think about how remarkable it is that you were able to pick up on how Ben was really in the Price bloodline. You've figured out exactly where the "wrong turn" occurred, and noticed there was a Price literally next door.
by Living Stanley G2G6 Mach 9 (91.1k points)
edited by Living Stanley

When I did my probability number-crunching last summer, I only did the first 37 markers because that's as far as I could find data for. I just used the numbers on Wikipedia, but I think there were other places you could also find them. I'm not sure every source is exactly the same on this, and I'm not sure how well-established they are. Anyway, they were on:

https://en.wikipedia.org/wiki/List_of_Y-STR_markers

I came back to it some months later, and that time I found a site with numbers thru 67 markers:

http://www.rogersdna.com/geddna/mutate.php

But it doesn't work now. I looked at just http://www.rogersdna.com, and it told me why. We can relate! It says:

"NOTICE RogersDNA.Com has been deactivated.

This was necessary because of the challenges we faced in meeting European GDPR privacy requirements.

We apologize for having to take this action."

Those per-generation mutation rates for the 38th thru 67th markers were as follows ("(2)" marks a two-copy marker, which counts twice toward the 30):

DYS531    0.00037
DYS578    0.00008
DYS395S1  0.00031 (2)
DYS590    0.00054

DYS537    0.00057
DYS641    0.00018
DYS472    0.00001
DYS406S1  0.00154
DYS511    0.00128

DYS425    0.00018
DYS413    0.00202 (2)
DYS557    0.00321
DYS594    0.00029

DYS436    0.00018
DYS490    0.00019
DYS534    0.00832
DYS450    0.00020
DYS444    0.00321

DYS481    0.00544
DYS520    0.00245
DYS446    0.00095
DYS617    0.00042
DYS568    0.00053

DYS487    0.00097
DYS572    0.00212
DYS640    0.00034
DYS492    0.00042
DYS565    0.00087

The results show average rates of about one mutation every 45 generations for Y12, every 6.5 generations for Y37, every 26 generations for the extra 30 markers that get you to Y67 (treated separately), and every 5.6 generations for Y67 as a whole.

Usually, the cases you see have guys that go back about 6 generations. For 6 generations, the Y37 results show a probability of 33% for NO mutations, 37% for 1 mutation, 20% for 2 mutations, 7% for 3 mutations, and 2% for 4 mutations. So you usually see 0, 1, or 2 mutations.

The "extra 30" table shows that the probabilities of getting mutations in the extra 30 markers that get you up to Y67, after 6 generations, are 79% for ZERO, 19% for 1, and 2% for 2. So it's not unusual to see no mutations at all in this set of markers, between test-takers with a paper trail.
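The 79% and 19% figures can be checked directly against the rate list above. This Python sketch counts the two-copy markers twice, assumes every marker and every generation mutates independently, and computes the exact chances of zero and of exactly one mutation over 6 generations of a single line of descent. The dictionary and function names are my own, hypothetical choices.

```python
import math

# Per-generation rates for markers 38-67 as listed above; the second number is
# how many copies count toward the 30 markers ("(2)" in the list).
EXTRA_30 = {
    "DYS531": (0.00037, 1), "DYS578": (0.00008, 1), "DYS395S1": (0.00031, 2),
    "DYS590": (0.00054, 1), "DYS537": (0.00057, 1), "DYS641": (0.00018, 1),
    "DYS472": (0.00001, 1), "DYS406S1": (0.00154, 1), "DYS511": (0.00128, 1),
    "DYS425": (0.00018, 1), "DYS413": (0.00202, 2), "DYS557": (0.00321, 1),
    "DYS594": (0.00029, 1), "DYS436": (0.00018, 1), "DYS490": (0.00019, 1),
    "DYS534": (0.00832, 1), "DYS450": (0.00020, 1), "DYS444": (0.00321, 1),
    "DYS481": (0.00544, 1), "DYS520": (0.00245, 1), "DYS446": (0.00095, 1),
    "DYS617": (0.00042, 1), "DYS568": (0.00053, 1), "DYS487": (0.00097, 1),
    "DYS572": (0.00212, 1), "DYS640": (0.00034, 1), "DYS492": (0.00042, 1),
    "DYS565": (0.00087, 1),
}


def expanded(table):
    """One rate per counted marker copy."""
    return [rate for rate, copies in table.values() for _ in range(copies)]


def zero_or_one(rates, generations):
    """Exact P(0) and P(exactly 1) mutations over a line of transmissions."""
    p0_gen = math.prod(1 - p for p in rates)                  # no marker mutates
    p1_gen = p0_gen * sum(p / (1 - p) for p in rates)         # exactly one mutates
    p0 = p0_gen ** generations
    p1 = generations * p0_gen ** (generations - 1) * p1_gen   # one bad generation
    return p0, p1


rates = expanded(EXTRA_30)
p0, p1 = zero_or_one(rates, 6)
print(f"{len(rates)} markers; P(0) = {p0:.0%}, P(1) = {p1:.0%}, P(2+) = {1 - p0 - p1:.0%}")
```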

Frank,

Thank you for your work on mutation rates.  In the meantime, I found a link on isogg.org that pointed to a republication of some data.  Here is the linked page:  http://dna.cfsna.net/HAP/Mutation-Rates.htm  Maybe similar to what you provided.  I chose to go with the second set of values:  Iain McDonald, University of Manchester, Unpublished average of rates reported by Heinila (2012), Burgarella et al. (2011) and Willems et al. (2016) [Yahoo Group: R1b1c_U106-S21/2017-09]  I figured that averages would be less speculative.

I did a SWAG based on the numbers in the table.  Adding up all the frequencies for all 111 STRs, you get 0.2948 for the average number of mutations per generation.  So, on average, about one mutation every 3.39 generations.  Between the I1a1b2b haplotype and the Price haplotype, there are seven differences.  So, on average, that would take about 24 generations.  But I can see a problem with that estimate, because some of the mutations will not show up as net differences.  So, I would think, in order to get 7 net changes out of 111 STRs, you would need to have more than 24 generations.  What with all the times CDY changes in one direction and changes back, leaving no trace.  Not sure how to account for that.  
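Gary's point about back-mutations can be illustrated with a small simulation. This Python sketch assumes a simple symmetric model in which each mutation moves the repeat count up or down by one step with equal chance; the 0.03 rate and 200 generations are hypothetical values for a fast-mutating marker, not measured CDY rates. Comparing the number of mutation events with the net change from the starting value shows how reversals hide some of the events.

```python
import random


def simulate_marker(rate, generations, rng):
    """Symmetric single-step model (an assumption): each generation the repeat
    count moves up or down by one with probability `rate`."""
    value = 0
    events = 0
    for _ in range(generations):
        if rng.random() < rate:
            events += 1
            value += rng.choice((-1, 1))
    return events, abs(value)  # total mutations, net change from the ancestor


rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical fast marker (rate 0.03 per generation) followed for 200 generations.
trials = [simulate_marker(0.03, 200, rng) for _ in range(2000)]
mean_events = sum(e for e, _ in trials) / len(trials)
mean_net = sum(n for _, n in trials) / len(trials)
print(f"mutation events: {mean_events:.2f}, net change: {mean_net:.2f}")
```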

Donor 38 is seven generations from HBJP.  If HBJP was the father of Benjamin Nicholson, the distance from HBJP to donor 2 would be seven generations.  There are three differences in the 111 markers between donors 2 and 38.  And, hypothetically, 14 generations.  At the rate of one mutation every 3.39 generations, the prediction for 14 generations would be 4.13.  Close to the observed value of 3 differences.  
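The arithmetic in the two paragraphs above, as a short Python sketch. It uses only the 0.2948 total from above and, as Gary notes, ignores mutations that later reverse, so it treats every mutation as a visible difference.

```python
# Gary's sum of the 111 per-STR rates: expected mutations per transmission.
RATE_111 = 0.2948

generations_per_mutation = 1 / RATE_111                       # about 3.39
generations_for_7_differences = 7 * generations_per_mutation  # about 24
expected_differences_over_14 = 14 * RATE_111                  # about 4.13

print(f"{generations_per_mutation:.2f} generations per mutation")
print(f"{generations_for_7_differences:.1f} generations for 7 differences")
print(f"{expected_differences_over_14:.2f} expected differences over 14 generations")
```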

What do you think?

Gary  

BTW, I made some tweaks to my article at this location. Nothing major, just refinements.

Hi, guys. Just a comment from the cheap seats up here in the balcony. I'm not certain your aggregate anticipated mutation rates are correct.

Each individual STR mutation (except for palindromic multi-copy marker duplications and the pretty rare recLOH event) is an entirely independent, mutually-exclusive element from a probability perspective. Again with exceptions, no STR mutation has any relationship to any other. Two mutations can happen in a single generation, or a collection of men can all show zero mutations at 111 markers back for at least eight generations (have both of those in two of my FTDNA projects). So you wouldn't calculate an aggregate additively, or by using compound probability of independent events (each generation represents a clean slate, so to speak, so probabilities don't compound). That said, certain haplogroups and even haplotypes have experientially displayed differing mutation rates (note that the numbers shown by Iain McDonald at http://dna.cfsna.net/HAP/Mutation-Rates.htm look only at the U106-S21 subclade of M269). Establishing usable data for haplogroup/haplotype variances does require significant sample size combined, importantly so, with confident paper-trail information in order to determine what are actual, unique mutation events and when in the inheritance chain they occurred.

To use published results from others to aggregate an estimated mutation rate for a combination of STRs, you'd simply sum and then average the results. (Also note quickly that the Heinila/McDonald data linked above do not use the infinite allele model that FTDNA switched to in 2016 to evaluate genetic distance for the multi-copy markers). I slapped the Heinila/McDonald data into a spreadsheet to see what came out.

For Iain's results, I get a different sum at 111 markers than Gary did: 0.261853 instead of 0.2948. That would result in an aggregate 111-marker mutation rate of 0.002359, or 0.236% per generation. For the 67-marker panel, it works out to be 0.002089791, and for 37 markers, 0.00264573. The Heinila numbers are 111: 0.002321676; 67: 0.002088015; 37: 0.002747054.
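A short Python sketch of the distinction between the two kinds of numbers: dividing a sum of per-STR rates by the number of STRs gives the average per-STR rate Edison quotes, while the sum itself is the expected number of mutations per father-to-son transmission (by linearity of expectation), which is the kind of figure Gary used. The 0.261853 sum is Edison's; the variable names are mine.

```python
# Edison's sum of Iain McDonald's per-STR rates over 111 markers.
SUM_OF_RATES_111 = 0.261853
MARKERS = 111

average_rate_per_str = SUM_OF_RATES_111 / MARKERS      # per STR, per generation
expected_mutations_per_generation = SUM_OF_RATES_111   # all 111 STRs together
generations_per_mutation = 1 / expected_mutations_per_generation  # rough waiting time

print(f"average per-STR rate: {average_rate_per_str:.6f}")
print(f"expected mutations per transmission: {expected_mutations_per_generation:.4f}")
print(f"roughly one mutation every {generations_per_mutation:.2f} generations")
```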

These are all fairly consistent with other estimations of aggregate Y-STR mutation rates. Back in 2001 when yDNA direct-to-consumer testing was just getting started, 0.002 was offered as the benchmark for the aggregate mutation rate. Most other compilations I'm familiar with trended upward of that number, but none by a massive amount.

At FTDNA's 1st International Conference of Genetic Genealogy in Houston in 2004, a presentation showed these cumulative rates:

  • Markers 1-12: 0.00399
  • Markers 13-25: 0.00481
  • Markers 26-37: 0.00748

Comparing Iain McDonald's numbers, respectively: 0.00202, 0.00298, and 0.00488. Iain's findings are lower at each panel, but not astronomically so.

In 2006 in the Journal of Genetic Genealogy (which, alas, ceased operation that same year), John Chandler published a piece titled "Estimating Per-Locus Mutation Rates." The paper includes detail of his computational models that can be duplicated if you have a large enough sample size to work with. Chandler used haplogroup-nonspecific per-locus STR mutation rates taken from data at Ysearch and arrived at:

  • Markers 1-12: aggregate mutation rate of 0.00187, with a margin of error of ±0.00028
  • Markers 13-25: aggregate mutation rate of 0.00278, with a margin of error of ±0.00042
  • Markers 26-37: aggregate mutation rate of 0.00492, with a margin of error of ±0.00074

From 2005 through 2009 Charles Kerchner conducted a study consisting of 55 FTDNA surname projects in an attempt to refine average Y-STR mutation rates. In the list below, the number of markers tested is followed by the estimated combined mutation rate, the standard deviation, and the last numeral (in the tens of thousands) indicates what Charles terms the Marker Mutation Opportunities (MMO): the total number of discrete generational steps evaluated in calculating the mutation rates.

  • 12(1-12):  0.0025 ±0.0003 (28,728)
  • 25(1-25):  0.0028 ±0.0002 (58,925)
  • 37(1-37):  0.0042 ±0.0002 (84,249)
  • 67(1-67):  0.0031 ±0.0004 (19,296)

Kerchner summarized the observed cumulative mutation rates broken down by haplogroups (tested or FTDNA predicted):

  • I1: 0.0030 ±0.0005 (10,027)
  • R1b: 0.0043 ±0.0003 (44,585)
  • J2: 0.0042 ±0.0009 (4,551)
  • G2: 0.0048 ±0.0008 (7,104)
  • R1a: 0.0077 ±0.0008 (8,954)

So we swing from the very highest rate of 0.00748 (markers 26-37; 2004 presentation in Houston), to the lowest of 0.00187 (markers 1-12; John Chandler, 2006, Journal of Genetic Genealogy). The truth is likely within that range, which is far lower than the probabilities that have been mentioned in the last few posts. Again, that is with the understanding that opposite ends of the bell curve can really throw a wrench into the works when you examine individual, small-sample cases.

We as yet have no idea what the values might look like for STRs 112 through about 450. These will be from the new Big Y-500 testing from FTDNA and, from the looks of it, there will be a significant volume of no-calls in those tests, so the results are likely to be highly haplotype-dependent. Time will tell if anyone proceeds with analyzing aggregate mutation rates for those STRs.

Edison, it might take me a while to go through all that really carefully (it seems like it would be worth it), but I can comment on some basics right away:

Perhaps the problem here is that we're not talking about the same things. What IS meant by an "aggregate mutation rate" and an "estimated combined mutation rate"? Because the numbers you're quoting for those obviously aren't in concert with reality, if they're supposed to mean the same thing we're talking about. The "1-67" number not only appears to suggest that a mutation only occurs in Y67 every 300 generations on average, but also that there are fewer mutations in Y37 vs Y67, which is impossible. So, like I said, these numbers must mean something other than what we mean to be talking about (because I doubt the people coming up with them are complete idiots).

As to calculating probabilities, consider if there were two STRs in a "Y2" test, with per-generation mutation probabilities of p1 & p2:

The probability of ZERO mutations after 1 generation would be (1-p1)*(1-p2) = 1-(p1+p2)+p1*p2. The probability of ONE mutation is p1*(1-p2)+p2*(1-p1) = (p1+p2)-2*p1*p2. The probability of TWO mutations is p1*p2. Simple stuff.

Now, when p1&p2 are something less than 0.01, the p1*p2 terms are negligible, and the probabilities are approximately: P0=1-(p1+p2), P1=p1+p2, P2=0. I, personally, prefer to do the full math (on a spreadsheet) but obviously you can do pretty well by just adding them up, as a rough (not so rough, really) estimate, and I think that's what Gary is talking about having done.

Calculations for multiple generations can only be based on those for the single generation case. The probability of ZERO mutations over TWO generations is just P0*P0, using the single generation P0 (as calculated above). The probability of ONE mutation becomes P0*P1+P1*P0. The probability of TWO mutations becomes P0*P2+P1*P1+P2*P0. So I don't know what you mean when you're saying "each generation represents a clean slate" - it doesn't seem relevant. It's just easiest to calculate each successive probability distribution for each generation by using the 1 generation distribution, and the previous one, and that doesn't violate any principles of independent events.
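The generation-by-generation recipe above can be written as a short Python sketch. It assumes each STR mutates independently with its own per-generation probability, builds the exact single-generation distribution of mutation counts, and then convolves it once per generation. The rates in the example are hypothetical, and the function names are mine.

```python
def one_generation(rates):
    """Distribution of the number of mutations in a single father-to-son
    transmission: entry k is P(exactly k of the listed STRs mutate)."""
    dist = [1.0]
    for p in rates:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)   # this STR stays the same
            nxt[k + 1] += q * p     # this STR mutates
        dist = nxt
    return dist


def convolve(a, b):
    """Distribution of the sum of two independent mutation counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def after_generations(rates, generations):
    """Apply the single-generation distribution once per generation."""
    single = one_generation(rates)
    dist = [1.0]
    for _ in range(generations):
        dist = convolve(dist, single)
    return dist


# The "Y2" example with hypothetical rates p1 = 0.01 and p2 = 0.02.
single = one_generation([0.01, 0.02])
two = after_generations([0.01, 0.02], 2)
print("one generation:", single)
print("two generations:", two)
```

The two-generation entries are exactly the sums written out above: P0*P0, P0*P1+P1*P0, and P0*P2+P1*P1+P2*P0 for two mutations.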

Don't take my word for it.  Folks have been analyzing this Y-STR mutation rate stuff for almost 20 years now.

In particular, I think you would enjoy looking at the Chandler paper to see the calculation models employed: http://www.jogg.info/pages/22/Chandler.pdf.

It's not a matter of "taking your word for it". It's a question of apples and oranges.

This last paper you cite may give a clue about that. In the abstract it gives values for what it calls "calibrated average mutation rates" for Y12, Y25, & Y37, which are numbers on the same basic order of magnitude as the numbers you're quoting. THAT suggests simply an AVERAGE number among the mutation rates for the STRs, which they seem to be calling "loci". Maybe the numbers you're quoting are something similar.

But Table 1 in that paper gives all the INDIVIDUAL mutation rates for each STR/locus. It gives the EXACT SAME numbers I used for my calculations! So this paper may be where the Wikipedia numbers come from. It's from 2006, so that doesn't sound very up-to-date, but that's what I used. The "mutation rate" for a given STR is simply the probability of it mutating when passed from father to son.

If you take (1-p) for each STR in that table (with each "p" being right off the table), and multiply them all together (including 2 or 4 factors for the ones with asterisks, as appropriate), then you should get 0.8322 (83.22%), the probability of ZERO mutations from father to son for Y37. That leaves 0.1678 (16.78%) as the probability of getting AT LEAST ONE mutation. Which is the roughly 1-in-6 number I've stated.

The detailed calculations divide that "AT LEAST ONE" number into a .1542 (15.42%) probability of ONE mutation, 0.0129 (1.29%) probability of TWO mutations, 0.0007 (0.07%) probability of THREE mutations, and increasingly infinitesimal probabilities of higher numbers of mutations.

Really, THESE are the numbers - these probabilities for 0, 1, 2, and 3 mutations are what is actually USEFUL for a genealogist, so I don't know why they never seem to be talked about. I have a table (derived from these numbers) that tells you what it all adds up to, for various numbers of generations, and it explains what I've seen pretty well. Maybe it's too "mathy" for people.

The numbers that people seem to focus on are kind of the "reverse problem" - determining the probability of a given number of generations, given a number of mutations (called the "genetic distance", for some reason). I can't imagine any practical way to reasonably calculate such a thing - you would have to make a number of assumptions, or do an impossible amount of research. So I assume that such numbers are simply hokey - derived by quacks under pressure from marketing people to tell the customers SOMETHING.

Such an analysis also seems to focus on comparing ONE result to ONE other result. It's a simple-minded - not holistic - way to look at it. As I described for Gary's case, it's pretty plain what the markers for the top guys are, and that certain mutations clearly occurred on certain branches (in numbers that make sense, given the probabilities), but it seems like nobody ever considers even thinking about that. Not that I've seen all that much discussion on it, but if anybody DID consider such things, somebody on here would probably know about it, and hopefully would say something.

So maybe you heard it here first. Math meets genealogy - a new way of looking at these things is born? My own Standley line is the first-ever application? Wow, this field is NEW!

Frank and Edison,  thank you for your lively and illuminating exchange.  I'm not sure I want to comment on all the details, but I do want to contribute a few clarifications and affirmations.

Edison, when I added up my 111 mutation rates, I accounted for the multi-marker STRs by adding them the appropriate number of repeated times.  That might account for the difference between your and my totals.    

Yes, I think there was some confusion between total and average calculations.  Frank, thanks for clarifying that.  Chandler used the term "aggregate" which to my mind is ambiguous.  An aggregate could be a total or an average.  It seems obvious that he meant an aggregated average.  "Cumulative" is also excessively vague.  

The Chandler paper is excellent.  It seems that his method would only have accounted for net mutations, not for cases where an STR mutated in one direction and then mutated back in a later generation.  Do we know if he was aware of that?  If not, the reported mutation rates would be lower than the actual events would have revealed, had we access to that information.  Comparing the I1a1b2b haplotype to the Price haplotype, seven slowly changing STRs showed differences.  That signals tens of generations of distance between them, yet CDY, a known rapidly changing STR, was the same in both.  That tells me that the value in the earlier haplotype and the value in the later haplotype were both the average values for CDY: the resting point, so to speak.  STR values tend to regress to the mean.

Frank, I agree with your take on the charts of probability of generations separating two donors.  What we want is an algorithm that tells us, based on matches and mismatches, how many generations from the shared ancestor that state would take to arise.  An algorithm should probably account for the fact that the rapidly changing STRs have more weight in the short term and not less weight in the long term.  

The reportedly different mutation rates for the different haplogroups is intriguing.  I wonder if Chandler's sample sizes were large enough for those differences to be accurate.  Is there a way to estimate that?

How frequently each STR changes needs more study it seems to me.  The progress from Y12 to Y111 to Y450 is encouraging.  We don't yet have enough data and experience to really see all the patterns, it seems.  I'd like to be able to resolve one degree of separation going back ten or twenty generations.  Maybe that will ultimately be possible.

I've modified a page I posted previously.  The page shows the STR differences among men descended from Price ancestors, all belonging to the I1a1b2b haplogroup.  By default, the page shows the STRs in the order they are reported by FTDNA.  The alternate view of the page sorts the STRs by mutation rate, showing the most slowly changing STRs first.

https://garynick.com/genealogy/nicholson/person_strs.php?sorted=1

The group's relatedness is shown by the differences accumulating at the bottom of the page. 

Apologies.  I haven't been rigorous about exploring the best values for the mutation rates.  More accurate rates might give different results but I think the general presentation would be similar.  
