if my brother's Y DNA is so common, why aren't there any matches???

+5 votes
585 views
My brother's Y DNA is R-M269.  He has tested to 111 markers.  He has no matches at all at 111 markers, no matches at all at 67 markers, and only 3 matches at 37 markers, with none of them being 0 genetic distance.

He has 32 matches at 25 markers, with one actually being 0 genetic distance and having the same surname (but it is Smith, after all..and the matching Smith's paternal ancestor was apparently born in Virginia in 1798 and was an important person, whereas my father was born in England and his paternal line is ag labourers and the odd carter or farmer). Of the other 31 matches, many have tested beyond 25 markers so they aren't any kind of a match by that point.  There are nearly 2800 matches at the 12 marker level, but they likewise don't even make the list for the next marker level.

So what I'm left wondering is this: since this haplogroup is so common, why aren't there more matches?

Just wondering

Shirlea
WikiTree profile: Riane Smith
in The Tree House by Shirlea Smith G2G6 Pilot (181k points)

Another idea to consider: if 7,000 men have had their R-M269 tested, that's one thing, but if there are, as stated, 110 million and 109,993,000 have not had theirs tested, you might be fortunate to discover a match.

That haplogroup M269 is like 7000 years old. However, the Y111 STR test will be specific to matches "only" 500 or so years old, or much closer to the terminal haplogroup of the tested man. That greatly reduces potential candidates for matching. If descendants of the common paternal ancestor, in whatever world region they are from, haven't tested yet, there's nothing to match.

In other words Y111 will only match an extremely tiny subgroup of men with a root haplogroup as old as M269. The MRCA needs to have been born about 500-600 years ago or later to get a 7 GD or lower match using Y111.
I think the cost of Y-DNA testing is prohibitive for many people, and until the cost comes down, the masses won’t be willing to test. Sadly, that keeps those who do test from finding matches. I’ve wanted someone in my male line to get tested, but that’s not likely to happen until the cost comes down. I don’t understand why Y testing costs so much more than autosomal testing. Is it just because fewer people do it, or is it truly a more costly test?
I'm willing to fully sponsor Y700 tests for anyone in my direct paternal line with MRCA after 200 years ago. However, nothing but cricket chirps from DM solicitations on ancestry.com. Most people are just not interested in doing that type of test.

It's a much more costly test, Jodi. The type of testing--and what is actually tested--in the common 37-, 67-, and 111-marker panels at FTDNA is like the autosomal DNA testing done for forensic and paternity cases, not at all the same type or process of testing as our cheap atDNA microarray tests. At-home tests for paternity start at around $200...and those usually aren't admissible in court; that requires a typically more expensive test.

Our inexpensive microarray tests can't examine the STR (short tandem repeat) markers at all, whether on the Y-chromosome or the autosomes. In a nutshell, the microarrays look like oversized microscope slides, are a glass plate encased in plastic, and are "programmable": the chip contains tens of thousands of "probes" that are tiny, synthetic bits of DNA that will attract specific bits of the test-taker's DNA when the prepared DNA solution is washed over it and allowed to stand a while. The microarray chips are cheap and can even be reused a few times. In the lab, it's a once-and-done process over a period of a couple of days. The DNA is prepped (amplified, fragmented, precipitated) and then "hybridized" to the synthetic DNA on the chip (i.e., the lab worker walks away and lets it sit there). The amount of manual time involved is pretty insignificant, which is why hundreds of samples can be run through a lab each day.

With the microarrays, bits of prepared DNA either find and stick to the probes, or they don't. In every test run there will be "no calls," places where the probes couldn't identify properly "stuck" DNA. That's usually well below 1% of the locations examined, but even a 0.5% no-call rate represents over 3,000 markers that are just skipped over in the results because no value was returned.
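For a sense of scale, that no-call arithmetic can be sketched in a few lines of Python. The 600,000-locus chip size is only an assumed round number for a consumer microarray, and `expected_no_calls` is a made-up helper name, not anything a testing company publishes.

```python
def expected_no_calls(total_loci: int, no_call_rate: float) -> int:
    """Return the approximate number of loci skipped as "no calls"."""
    if total_loci < 0 or not 0.0 <= no_call_rate <= 1.0:
        raise ValueError("total_loci must be >= 0 and no_call_rate within [0, 1]")
    return round(total_loci * no_call_rate)


if __name__ == "__main__":
    # Assumed chip size of 600,000 loci and a 0.5% no-call rate.
    print(expected_no_calls(600_000, 0.005))
```

Any chip larger than the assumed 600,000 loci pushes the count past 3,000 at the same rate.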

Conversely, STR testing can't look for a simple bit of DNA that matches and adheres to a microarray probe. STRs don't represent different alleles--the A, C, G, T of DNA--but an identified set of (usually) two to seven base pairs in length that actually replicate themselves in succession on the chromosome. To look for the number of exact repeats, the DNA can't be fragmented like it is for our autosomal tests. If the chromosome strands are cut into arbitrary small pieces, there's no way to count the number of repeats. Instead, a process much more like traditional Sanger sequencing is employed. A method called polymerase chain reaction is used along with fluorescent staining in order to evaluate the number of repeats at those specific positions along the chromosome. And multiple runs have to be performed in order to arrive at an accurate count. It isn't an either/or with STR testing; it's a "how many." So multiple passes will be run and the fluorescence results analyzed manually to determine the most accurate count of the number of repeats for each STR tested.

I know it's more than you ever wanted to know. But it's a very valid question that comes up from time to time. There are procedural reasons why we can test 600,000 loci (well, minus the no-calls) for $59, but are looking at double that price to examine just 37 different STRs on the Y-chromosome.

Edited: And what Mike's talking about with the Big Y-700 test is yet a whole 'nother ballgame. That's next-gen full sequencing of the Y-chromosome and goes into coverage depths well beyond what's considered medically valid for whole genome sequencing. If money isn't an issue, however, that's absolutely the way to go for anyone interested in yDNA. To the extent current testing technology permits it, that's a one-time expense: the Big Y test will reveal everything we can know about a Y-chromosome, and the data is continually evaluated and adjusted against things like new haplotree branches and private variants that become named SNPs.

Someone in WikiTree authority needs to make this answer permanently available for all to read at all times!!!

Thank you Edison!
If your brother did autosomal as well, copy it across to all the other major testers such as Ancestry, MyHeritage, also GEDmatch. This will give you lots more matches and you can approach some of the closer matches to see if they have males willing to test with Family Tree at say Y-37, which is a good indicator. MyHeritage has a nice triangulation tool which will give close matches very easily and you can filter for Smith.

Thank you Edison for your very thorough and informative reply. I still suspect that the price could come down if demand were to increase. (Nothing like dollar incentives to prompt further innovation. The "inexpensive microarray" tests weren't exactly cheap and easy to develop initially either.) But I'm not holding my breath since yDNA testing will never be as fun or useful to the average customer.

I learn so much from your posts. Thanks!
Edison, would you consider turning your comment into a reply so we can upvote it?

7 Answers

+7 votes
 
Best answer
In short, because the mutation that identifies people in haplogroup R-M269 appeared 10,000 years ago. Most of the other men in that group share a most recent common ancestor with your brother who lived thousands of years ago.

My uncle had few matches -- none at 37, and when I upgraded him to 111 he obtained a few at a genetic distance of 10, which are worthless for genealogy. When I upgraded him to Big-Y, which tests SNPs rather than the STRs done by the 111 test, he was placed on the haplotree. His current terminal mutation is still a couple thousand years old, but still better than before, and if you wait long enough, you get moved down the tree to a new, more recent terminal SNP. He has matches closer on the haplotree than his STR matches, and it became clear that even though I can only trace my line back to near Aberdeen, his line at some point came there from the Irish sea.
by Barry Smith G2G6 Pilot (219k points)
selected by Ole Selmer
+6 votes

> R-M269 is of particular interest for the genetic history of Western Europe, being the most common European haplogroup. It increases in frequency on an east to west gradient (its prevalence in Poland estimated at 22.7%, compared to Wales at 92.3%). It is carried by approximately 110 million European men (2010 estimate).[4] 

Haplogroup R-M269 - Wikipedia 

by Susan Smith G2G6 Pilot (497k points)
Thanks Susan -- so where are all the hundreds of matches that we should have, some of whom should be at a relatively close genetic distance!

The stats used here at WT say things about percentages of DNA inherited, like 1/2, 1/4, 1/8, 1/16, 1/32, 1/64th meaning basically the farther away in time between two people presumed to be genetically related, the more diffuse the DNA distribution becomes ... and those percentages come with a plus and minus range 

Frequently mentioned is that DNA is not uniformly distributed, it is more like scatter shot from a shotgun .... one son could easily get more of the paternal DNA than the sum of the other two sons ... in short it is randomized  

DNA Project 

"Frequently mentioned is that DNA is not uniformly distributed, it is more like scatter shot from a shotgun .... one son could easily get more of the paternal DNA than the sum of the other two sons ... in short it is randomized"

Shirlea and Susan.

An important fact to recognize is that the above quoted statement is okay when discussing autosomal DNA.  It does not apply to paternal Y-DNA test matching.   Your brother inherited his father's Y intact - no randomizing involved at all  with rarely a marker or two experiencing a mutation at conception.

Actually, his losing matches as the number of test markers increases is normal and a good thing to have happen.  Your task now is to sort through your grandfather's male relations, i.e., other sons or his brothers (your uncles) or even further back in time, and then work forward in their family tree until you find a living direct line male cousin who can be tested for his Y-DNA.  Sometimes they will agree to testing - sometimes not.  If not - back to the search you go!  Meanwhile, some guy somewhere may decide to test and show up as a match in your brother's Y111 bin.  This can be a very frustrating "waiting" game.

As you are doing all this you also need to read and learn about Y-DNA research.  This science is exploding with new advances as we write.  My suggestion to get started in this education is to subscribe to Roberta Estes' blog at https://dna-explained.com/author/robertajestes/   Guaranteed to explain it all far better than the above.

Thank YOU, William, -- I had hopes SOMEONE who understood DNA stuff would show up and take charge of this (and I now know autosomal is randomized in distribution, Y DNA is not)

How did they arrive at their estimate of how many men have a specific haplogroup   --  test 100 men and extrapolate into the millions?  Sounds like there is a lot of chance for a big margin of error.

100 test subjects are not enough to create a reliable statistical result ... basic count for a reliable sample poll for instance of how the voters in a region would vote to legalize marijuana would, last I heard, be something like 5000+ registered voters and if the poll takers can obtain 8000 or 10,000 so much the better ...

 each phone call interview = x amount of income and outlay so there's always an eagle eye on the accounting 

Companies that "do" the testing of the swabs and spit tubes or whatever else and the techs who do the work and the equipment also have someone keeping an eagle eye on the income and outlay

Shirlea, a few articles for you to go along with the wikipedia article about the DNA R-M269  

>What is genetic ancestry testing? - Genetics Home Reference - NIH

>Genealogical DNA test - Wikipedia

>What does it mean to have Neanderthal or Denisovan DNA? - Genetics Home Reference - NIH

And if you are using "DNA genealogy" you may have to take some things on faith, really, if you have not invested in some study -- I keep thinking well, maybe someday I might perhaps read up on this DNA stuff, but I never get around to it ... my own "doing genealogy" is just based on my faith that there are people at WT who DO understand it .   

I myself rely on the paper trail of records (BDM) and census -- and common sense -- and judgment calls in some cases - do have a lot of cousins at WT who have done their DNA testing 

+4 votes
The haplogroup isn't all that is used for matching. It is just part of it so while the haplogroup can be common, the rest of the markers added in have to be factored as well. At 111 markers, you can have quite a range of variants.
by Doug McCallum G2G6 Pilot (425k points)
+3 votes

I have no matches at any level, Y-12, -25, -37, -67, or Y-111. My earliest known paternal line ancestor is from what is now Germany. 

by Lincoln Lowery G2G6 Mach 2 (27.6k points)
I am R-Z326.
Lincoln - That's how I started out in 2004 when I started the Harvey Y-DNA project.  I then found a Harvey 4th cousin 1xR b. in NH a couple of months later through comparing family trees with the surname in New England.  Then a long dry spell until 2009 when family tree comparing again found a 5th cousin 1xR b. in MN whose son did the testing so actually 6th cousin tested .  The well went dry again until 2013 when I was able to build a suspected match's family line back to where an unknown NPE had occurred - this event let me find a 6th cousin 1xR b. in Kentucky.  We are all genealogically and genetically related to a Harvey b. 1711 in MA through three of his sons.  To this day I have no other surname matches from 12 markers through Y700.

We are all of the E haplogroup, while another major Harvey line that was believed to be related has been proved to be of the R haplogroup back to 1617 in MA.  Evidently, down through the millennia of mankind's existence, our related cousin lines have become extinct.

Almost certainly there are some men, somewhere, who are related but I will probably not be here to see their test results.  Goes to show Lincoln - don't give up on your quest.
+3 votes
I realize I'm late to this party but as administrator of the FTDNA Tryon surname Y-DNA project I've seen some real-world results within the R-M269 group. We have spent many years trying to get living Tryon males tested and are now up to 12. It isn't easy. The good news is we are all good matches at 37 markers and we've learned a bit from those results. 37 is the sweet spot. The 12 and 25 marker levels are of no value because there are too many matches of clearly unrelated people (in the genealogical sense). More than 37 you get fewer matches, but that can be useful in special cases. The other issue is: What makes a match? FTDNA uses genetic distance which is a very simplified measure. At 37 markers, all of the Tryons match at a genetic distance of 1-4. But that does not translate into relatedness. A mutation at DYS439 has 10 times the probability as one at DYS438. My father and I are genetic distance 1 because of the common mismatch at 439 while a well documented relative whose line diverged from mine more than 10 generations ago has a genetic distance 1 mismatch at 438. So, in short, genetic testing isn't a plug-n-play, instant gratification means of extending your family tree. It takes a combination of old fashioned documentation, an understanding of the nature of the tests and results, and LOTS of patience.
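To make the "same genetic distance, different meaning" point concrete, here is a minimal Python sketch. It assumes a plain stepwise count (the sum of repeat-count differences per marker); FTDNA's real calculation may treat some markers differently. The repeat values and the per-generation rates are invented for illustration, chosen only so the faster marker (usually written DYS439) is ten times more mutable than the slower one (DYS438).

```python
from typing import Dict

Haplotype = Dict[str, int]  # marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Simple stepwise genetic distance over the markers both men tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical repeat counts, not real test results.
me = {"DYS438": 12, "DYS439": 12, "DYS393": 13}
father = {"DYS438": 12, "DYS439": 13, "DYS393": 13}
distant_cousin = {"DYS438": 11, "DYS439": 12, "DYS393": 13}

# Hypothetical per-generation mutation rates with a 10:1 ratio.
ASSUMED_RATE = {"DYS439": 0.005, "DYS438": 0.0005}

if __name__ == "__main__":
    print(genetic_distance(me, father))          # one step, on the fast marker
    print(genetic_distance(me, distant_cousin))  # one step, on the slow marker
    for marker, rate in ASSUMED_RATE.items():
        # Expected wait for a mutation on a single line of descent.
        print(marker, "about", round(1 / rate), "generations per mutation")
```

Both comparisons come out at the same distance, yet under the assumed rates the mismatch on the slow marker would be expected to take roughly ten times as long to arise.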
by Michael Tryon G2G2 (2.8k points)
I have 9 matches at 37 markers; all genetic distance 4.

With 67 markers though, I have 21 matches.

So, depending on how your markers mutate, it is possible to have more matches at the higher markers.

I have only 7 matches at the 111 level, but they all share my surname.  At 67, there are a few other surnames.  At 37, there are some wild matches, so I think there must have been some backward mutations or whatever.

I think in general, there are fewer matches at the higher levels, but there are exceptions too.
Sure, I simplified things. You can certainly have more matches at greater levels. But not usually. But more importantly, what constitutes a match? With FTDNA, at level 12 only perfect matches are matches, I don't recall what it is at 25, and at 37 anything up to distance 4 is a match. None of this is easy or straightforward. My message was simply that there is much more to it than matches.

Well, since the topic was restarted and I've already jumped in elsewhere with my unsolicited two cents, might as well here, too.

With over 60,000 Big Y results now in the database, we've been learning that lower resolution STR panels can lead to some deceptive results. Just to fill in Michael's correct statement about FTDNA matching, here's the rundown of what they will display as a "match":

  • At 12 markers: up to a genetic distance value of 1*
  • At 25 markers: up to a genetic distance value of 2
  • At 37 markers: up to a genetic distance value of 4
  • At 67 markers: up to a genetic distance value of 7
  • At 111 markers: up to a genetic distance value of 10

* For 12 STRs only, a GD of 1 will be shown only if both men are members of the same Group Project; otherwise, they have to be an exact match.
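Those display rules are simple enough to write down as code. The following Python sketch only encodes the thresholds as listed above; `is_displayed_match` and its `same_project` flag are hypothetical names for illustration, not part of any FTDNA tool or API.

```python
# Maximum genetic distance shown as a "match" at each panel size,
# taken from the list above.
MAX_GD = {12: 1, 25: 2, 37: 4, 67: 7, 111: 10}


def is_displayed_match(markers: int, gd: int, same_project: bool = False) -> bool:
    """Return True if a comparison at this panel size would be listed as a match."""
    if markers not in MAX_GD:
        raise ValueError(f"unknown panel size: {markers}")
    if gd < 0:
        raise ValueError("genetic distance cannot be negative")
    if markers == 12 and not same_project:
        # Footnote rule: at 12 markers, non-project members must match exactly.
        return gd == 0
    return gd <= MAX_GD[markers]


if __name__ == "__main__":
    print(is_displayed_match(37, 4))    # at the limit for 37 markers
    print(is_displayed_match(37, 5))    # one step past the limit
    print(is_displayed_match(12, 1))    # GD 1 at 12 markers, no shared project
    print(is_displayed_match(12, 1, same_project=True))
```

Note that passing this check says nothing about whether the match falls within a genealogical timeframe.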

In the four FTDNA projects I admin, we've seen some large numbers of Big Y test takers among some family clusters. In one of those clusters, we have 35 members, 24 of whom are Big Y tested. With enough data to map and reasonably date a genetic tree based on both deep SNP and STR results, we've found some interesting things.

At 37 markers, we have men with multiple (up to five per person) "matches" at the maximum GD of 4. The extended data show these men cannot have shared a patrilineal ancestor since circa 900-1000 AD.

At 67 markers it gets even more muddied due to the GD limit increasing to 7. At that level, the same ca. 900-1000 AD situation exists for up to 12 "matches," though the most common value among the group of 35 is 9 pre-genealogical matches. The lowest GD of these misleading matches is again 4 of 67.

At 111 markers that lowest GD moves to 5, and the average number of ca. 900 matches is 11, with one kit having 14. Of interest is that at GD 5 we show both valid, genealogical timeframe matches as well as the thousand-year-old matches.

This shouldn't be taken as an effort to dissuade folks from taking STR tests. But FTDNA hasn't adjusted its maximum GD values in over a decade; the same thresholds have been there since shortly after the 111-marker test was introduced. And it very much seems that the thresholds--at least when trying to apply them as across-the-board parameters--are significantly too lenient. But then we have to keep in mind that FTDNA is in business to sell DNA tests; if we pulled the GD maximums back to the point where we could have better confidence in genealogical timeframe matches, the number of matches displayed might drop by a significant percentage. Imagine the uproar and sales decline.

Since 2019 FTDNA has been promising that some version of a TMRCA tool will be coming soon for Big Y test takers. It looks now as if that will be applied to the FTDNA haplotree as a whole--a la what Urasin, Adamov, et al. did at YFull--and hopefully will be showing up soon.

We (well, I) now think that will be based on recent work by Iain McDonald...a familiar name to folks who delve into yDNA. Michael mentioned the greatly varying average mutation rates among STR markers, and much of that research probably came from Iain. There's been a lot of discussion the past two weeks (in an unnamed private Groups.io forum) about some of the underlying maths and assumptions in this paper, but I believe this is what FTDNA's new date calculations will be based on and, for us DNA dweebs, it's simply a cracking good read: McDonald, Iain. "Improved Models of Coalescence Ages of Y-DNA Haplogroups." Genes 12, no. 6 (June 2021): 862. https://doi.org/10.3390/genes12060862. The paper is Open Access.

Well, I said it's a good read but Iain is, by day, an astrophysicist. So if you're like me and have to count using your fingers and toes, there are parts of the paper that require some serious concentration. But he's been at this genetic genealogy thing almost since it was a thing, and we're fortunate we have people with brains like that interested in the subject.

I finally relocated this discussion! Thanks for the detailed answer! I just spent the better part of 45 minutes on a reply only to have WikiTree tell me I'm not logged in, and of course the entire reply is gone. The gist was that my only copy of mutation rates seems to have come from Burgarella and is easily over a decade old now. Searching for something more current, I seemed to find there aren't many even trying to keep up with that anymore, so I decided to wait for SNP-based age estimates to come forward. The only real question I had that I recall is this: if the 83 years is based on the ~23 million base pairs, and Big Y is now only looking at about 1 million of those, I thought perhaps an adjustment of some sort would be in order. But then the ~1 million are included in the ~23 million. Intuitively I would think it wouldn't make too large of a difference: ~81 years vs. ~83. Oh, the other question was from an FTDNA Activity Feed discussion I was reading and cannot relocate. I thought someone on there was advocating cutting the 83 years in half, but I can't recall why or what the reason was. I have the impression it was something to do with two kits being compared as opposed to analyzing just one kit? Any idea? I could be completely off base, lol. I did try to calculate in my own spreadsheet using 40 vs. the 83 and things seemed to suddenly line up. Odd. Thanks for the detailed answers!

Heya, CR. Yeah; I've been caught by the auto-logoff thing before; I believe the WikiTree authentication cookie has a hard expiration each time it's set. But then, since I tend to ramble infinitely on here, I figured no one would have any sympathy over my losing word count.

In fact, the Y has about 57.2 million base pairs and the Big Y-700 attempts to test about 23.6 million of them. Most tests will approach approximately that number. FTDNA has by far the largest database for yDNA sequencing results, and they add to that knowledge base monthly (pretty much literally: on 16 June their haplotree numbered 43,815 defined branches; this morning it was 45,901).

Prior to Iain's paper a month ago, the only previous published research about general Y-SNP mutation rates that I'm familiar with was Adamov et al. from March 2015. I'm uncertain if that was ever peer reviewed (I kinda think it wasn't), but that's what the YFull.com clade branching dates are based on...and those results compare to Iain's numbers surprisingly closely. So--just my opinion--if someone proposed a 40-year cumulative average I'd have to think they were making things up as they went along; no research data to back it up.

Like you, I'm eagerly awaiting FTDNA's overdue publication of Y-SNP dating.

+1 vote

His R-M269 haplogroup is common (and old), but his haploTYPE is not common.

Unless Y-DNA testees can afford Big Y-700 test, they should focus on sufficiently matching Y haploTYPEs (the 37 or 67 etc, STR markers) to determine direct paternal line relatedness.

Y haplotypes should be uploaded to mitoYDNA.org (it’s free) and their mitoYDNA ID added to WikiTree.  To see for yourself, register at mitoYDNA.org https://m.youtube.com/watch?v=I0WWnDdWKKw

and then click on the [compare] links under Dna Connections at https://www.wikitree.com/wiki/Thorpe-1946

Sincerely, Peter

mitoYDNA.org, team member

by Peter Roberts G2G6 Pilot (562k points)
edited by Peter Roberts

Just a quick addition here so that folks don't get confused. For genealogy testing and the Y chromosome, a decade or more ago there was a bright line between haplotype and haplogroup. That's because there was no deep testing or full sequencing available, so the only haplotype, perforce, came from STR testing. Today, a man will often have one or more individual variants from a Big Y test--polymorphisms that no one else tested yet has--that, when combined with the SNPs that tested positive, constitute a haplotype.

A haplotype is, essentially, any set of genetic data that can provide a "DNA signature," something specific enough to help distinguish one individual from another. Our autosomal SNP tests provide a haplotype, as do forensic autosomal STR tests, or the combination of SNP and STR testing in the Big Y-700, or a Y Chromosome VCF (Variant Call Format) file--which can come from a Big Y test, other full yDNA sequencing, or a whole genome sequencing--that doesn't take STRs into account at all.

It's still common to think of a yDNA haplotype as being derived only from STR testing, but that's no longer the case. The ISOGG Wiki entry for "Haplotype" says:

"A haplotype (also known as a signature, a DNA signature, or a genetic signature) is a set of markers (polymorphisms) on a single chromosome that tend to be inherited together. A haplotype can refer to a combination of alleles, to a set of short tandem repeats (STRs), or to a set of single nucleotide polymorphisms (SNPs). Haplotype is a contraction of the term haploid genotype.

"In genetic genealogy the term is normally applied to the letters or numbers obtained from the results of a genealogical DNA test. Haplotypes can consist of varying numbers of markers depending on the test taken and therefore exist at different resolutions."

Thank you Edison.  I wish to revise my answer to "...his STR haploTYPE is not common. Unless Y-DNA testees can afford Big Y-700 test, they should focus on sufficiently matching Y STR haploTYPEs (the 37 or 67 etc, markers) to determine direct paternal line relatedness."

+1 vote
A case could be made that you only need 12 markers to "predict" a haplogroup of R-M269. You stated you have 2800 matches at the 12 marker level.

The simpler answer is other "matches" have not tested yet. You can wait, or actively recruit potential matches to test.

If you really dig into the 25, and 37 you can still find out quite a bit of useful information. And remember, sometimes knowing who you do *not* match can have the effect of focusing your research on particular lines while safely ignoring others.

Forget looking for "zero" matches. With Y-DNA, the last I knew is FTDNA stated they were seeing a SNP mutation every 82 to 98 years. STRs (like y111) mutate much faster than SNPs.

I would caution against dismissing a genetic distance of 10.  I have two "known on paper" 8th cousins that are a genetic distance of 10, and at least 5 more at a greater distance than that. After investigating, it turns out my particular branch mutated more than the rest of the family over the same time period: 5 more mutations since 1700. Investigating further, it turns out my branch lived on a lake that started being heavily polluted in the early 1900s. The pollutants have been found in the water table. The family business in the area was well drilling, which meant more exposure than the average person simply drinking the water. With that kind of exposure to such a concentration of mutagens, the extra mutations become quite understandable. So, while a GD of 10 is usually not a good match, it "can" be. We call these people "outliers". SNP testing can usually answer the question definitively, as was the case among Clan Stewart and SNP s781.
by CR Campbell G2G1 (1.7k points)
thanks!  very interesting!

With Y-DNA, the last I knew is FTDNA stated they were seeing a SNP mutation every 82 to 98 years. STRs (like y111) mutate much faster than SNPs.

Hiya, C.R. Those statements are correct, but they're in mixed context.

When references are made to a SNP mutation rate of ~83 years, the consideration is for any point mutation, any individual allele change across all of the ~23 million (out of 57 million) base pairs tested by the Big Y-700. And I believe the calculation includes all variants, whether a SNP has been named at a given locus or not: with so few nucleotides in the Y allocated to protein coding genes, and the avoidance of crossing over, just about any variant--at least within that 23.6 Mbp (megabase pair, or one million base pairs) target area--that hasn't yet met the basic criterion to be given a SNP name has an excellent possibility of being so labeled in the future. Since Big Y testing began, we've gone from 16,361 yDNA haplotree cataloged branches in September 2018 to 45,366 today. The dbSNP database currently has 2,548,155 named SNPs on file for the Y Chromosome.

The Iain McDonald paper I referenced uptopic does a pretty good analysis to arrive at 33 years as a blanket generalization for the generational interval along patrilineal lines. So if we assume 83 years is going to be roughly where the estimate settles for Big Y data, then we'd be talking about roughly 2.5 generations per any single mutation among any of the ~23 million nucleotides. SNPs will mutate much slower than STRs individually, but if you're looking at an aggregated ~23 million independent events, the odds go way up that one will happen in any N period of time, or rolls of the dice.
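The years-per-mutation and generation-interval figures can be combined in a short Python sketch. The Poisson model used for the per-generation probability is an assumption added here for illustration (mutations treated as independent events at a constant rate); it is not taken from McDonald's paper or from FTDNA, and the function names are made up.

```python
import math


def generations_per_mutation(years_per_mutation: float, years_per_generation: float) -> float:
    """Average number of generations between mutations in the tested region."""
    return years_per_mutation / years_per_generation


def prob_at_least_one_mutation(years_per_mutation: float, years_per_generation: float) -> float:
    """Chance of one or more new mutations in a single father-to-son step.

    Assumes mutations arrive as a Poisson process at a constant rate.
    """
    expected = years_per_generation / years_per_mutation
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    YEARS_PER_MUTATION = 83.0    # aggregate figure quoted in the thread
    YEARS_PER_GENERATION = 33.0  # patrilineal interval quoted in the thread
    print(round(generations_per_mutation(YEARS_PER_MUTATION, YEARS_PER_GENERATION), 2))
    print(round(prob_at_least_one_mutation(YEARS_PER_MUTATION, YEARS_PER_GENERATION), 3))
```

Under those assumptions, any one father-to-son transmission has roughly a one-in-three chance of carrying at least one new variant somewhere in the tested region, even though each individual position almost never changes.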

The 1000 Genomes Project included 702 Y-STRs in their data, not far off what FTDNA now targets with the Big Y-700, but there are a total of around 4,500 known STRs on the Y...many in the 33.6 Mbp the Big Y doesn't test and that are just chock-full of tons of repetitions, or that are part of the two pseudoautosomal regions.

If we could look at data to give us an idea of how often--bad analogy again, but hey--rolling the dice 4,500 times would turn up a single STR count change, then we might have closer to an apples-to-apples comparison to an 83 years value across 23 million SNPs.

We're unlikely to ever get a decent value from even that exercise, though, because STRs can be flat-out quirky. In part that's because dad's germline DNA changes as he ages: unlike mom's ova, which are all generated via meiosis while she's still in the womb, dad's gametes are an on-demand operation, and as he ages his DNA goes through a process called deamination, a result of methylation, with the result that a child born when dad is 20 is likely to have genetic differences--especially in the Y since it doesn't recombine and so much of it doesn't have to "repair" itself at meiosis--compared to a child born when he's 40 or 50. That deamination process escalates as we age.

Biology rabbit hole. Sorry. Bottom line, though, is that STRs see over time, not uncommonly, back mutations, parallel mutations, and multi-step mutations...the latter where more than one repeat of the STR's sequence of alleles is added or subtracted in a single generation.

Three other fun STR facts. In addition to the father's age throwing us a curve-ball in accurately estimating a mutation rate, mutation rates of STRs seem to strongly correlate to the length of their repeat sequence (e.g., an STR that has only two or three repeated alleles will tend to mutate more slowly than one with, say, five or six alleles). Also, the number of repeats seems to make a difference: STRs with a greater number of repeats will tend to mutate more quickly than STRs with a small number of repeats (e.g., two of the STRs with rapid mutation rates are DYS710 with up to 36 repeats, and DYS714 with up to 25 repeats).

Last up on the STR Trivia Tour, that latter effect--more repeats, more tendency to mutate--finally substantiated what a lot of us volunteer FTDNA project admins have been seeing for almost 20 years: the same STR seems to mutate at different rates in different haplogroups. Because different haplogroups can have signature repeat values for certain STRs, we now know that those different repeat numbers can lead to an experienced mutation rate difference among haplogroups of the same STR of up to around 20% (Claerhout, et al. "Determining Y-STR Mutation Rates in Deep-routing Genealogies: Identification of Haplogroup Differences." Forensic Science International, Genetics (May 2018) 34:1–10).

So with STRs, the difference in mutation rates between any two of them can be orders of magnitude. Most of my FTDNA projects are R-M269-centric, and to try to help project members make some sense of individual STRs (beyond FTDNA's TiP report, which isn't a terrible place to start, but is about as accurate as pinpointing a street address in San Diego by saying "Southern California") I maintain a spreadsheet that derives individual STR mutation rates from several sources (McDonald; Heinila; Burgarella; Willems; Ravid-Amir; and the NIST's STRBase: https://strbase.nist.gov). Number one in speed may be open to some debate because it's probably the most variable one tested, and that's the palindromic CDY with what I show as a rate of 0.03531 per generation, or a 3.5% chance of mutation per generation. Rounding out the top three are DYS710 (0.018279) and DYS712 (0.016378). At the tortoise end of the speed curve are DYS632 (a glacial 0.00007), DYS436 (0.000204), and DYS426 (0.000216).
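To see what that spread means over a genealogical timeframe, here is a small Python sketch using the six rates quoted above. It assumes each generation is an independent trial with a constant rate and ignores back mutations and multi-step changes, so it is a rough illustration only; `prob_mutated` is a hypothetical helper name.

```python
# Per-generation mutation rates as quoted in the post above.
RATES = {
    "CDY": 0.03531,
    "DYS710": 0.018279,
    "DYS712": 0.016378,
    "DYS426": 0.000216,
    "DYS436": 0.000204,
    "DYS632": 0.00007,
}


def prob_mutated(rate: float, generations: int) -> float:
    """Chance that a marker mutates at least once over the given generations.

    Simplification: independent generations, constant rate, and no
    accounting for back mutations or multi-step changes.
    """
    if generations < 0:
        raise ValueError("generations cannot be negative")
    return 1.0 - (1.0 - rate) ** generations


if __name__ == "__main__":
    GENERATIONS = 10  # roughly 330 years at 33 years per generation
    for marker, rate in RATES.items():
        print(f"{marker}: {prob_mutated(rate, GENERATIONS):.4f}")
    print("fastest/slowest ratio:", round(RATES["CDY"] / RATES["DYS632"]))
```

Over ten generations the fastest marker has changed at least once in about three lines out of ten, while the slowest has changed in fewer than one line in a thousand, which is why a raw count of mismatches says little without knowing which markers differ.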

Net message, I suppose, is that there's really no good way to lump all the known STRs together in order to arrive at an aggregate mutation rate estimation the same way we can using tens of millions of single nucleotide polymorphisms which, individually, are far more stable and mutate far more slowly. And generational estimates for genealogy using only Y-STRs can look far different once Y Chromosome full-sequence data can be included in the comparisons.
