Help Understanding Y-DNA 111 Marker Mystery Connection

+3 votes


Three of us have tested our Y-DNA to 111 markers, and I have also taken the Big Y-700. Two of us are believed to be descended from sons of Jacob Pfau, born about 1721: Isaac Faw, born about 1773, and Jacob Faw, born about 1771. The third tester has verified his oldest ancestor only as far back as Jacob Poe, born about 1795 or so.

The page for Jacob Pfau lists the current research and questions regarding the accuracy of this common ancestor, especially the age at birth of sons and possible missing generations to account for this.

Concerning Jacob Poe, we are at a loss to explain the connection, given the wide range of Y-DNA confidence levels and possible MRCA generations in the TiP reports.

The breakdown is as follows:

Faw Tester 1: 

  • Genetic Distance of 6 from Faw Tester 2, Genetic Distance of 3 from Poe Tester
  • Jacob Pfau would be 7th great-grandfather, barring issues with age
  • Jacob Faw II is somewhat sourced, with a clear and sourced path to descendants

Faw Tester 2:

  • Genetic Distance of 6 from Faw Tester 1, Genetic Distance of 3 from Poe Tester
  • Jacob Pfau would be 5th great-grandfather, barring issues with age
  • Isaac Faw is somewhat sourced, with a clear and sourced path to descendants

Poe Tester:

  • Genetic Distance of 3 from both Faw Testers
  • Jacob Poe is the 3rd great-grandfather
  • Jacob Poe is somewhat sourced, with a clear and sourced path to descendants

The paper-trail genealogy mostly lines up with the expected results, and for the two Faw testers a genetic distance of 6 does show a confidence level slightly over 60% for a common ancestor within about 8 generations. Barring "missing" generation(s) between Jacob Pfau and his purported sons, this is a possible confirmation.

The Poe tester is where the mystery comes in, and I am having trouble reconciling it. One complicating factor is that I cannot see the Poe tester's marker values (he has not joined any projects), so I cannot tell on which markers he matches the Faw testers.


  1. How can the Poe tester match each of us at a GD of 3 while the Faw testers match each other only at a GD of 6?
    • I do understand that mutations can occur (or not) at any generation, but it is an interesting coincidence to see this.
  2. Is there a way to rule out additional generations between Jacob Pfau and his purported sons?
    • Other than finding a mystery son with living male descendants, I suppose.
  3. If I could see the Poe tester's markers, is there anything I could glean from how they match each of the Faw testers?
  4. Are the GD of 6, the paper-trail genealogy, and the confidence levels together enough to count as DNA confirmation (it is currently listed as such, but noted under research and investigation)?
  5. Am I on the right track here? What should I be asking and/or looking for next toward my research goal of proving Jacob Pfau as the common ancestor for this line?

WikiTree profile: Jacob Pfau
in Genealogy Help by Stormy Faw G2G4 (5.0k points)

2 Answers

+5 votes
Best answer

Hi, Stormy. It certainly isn't unusual to see confusion when trying to correlate paper trails--deep paper trails that are generally beyond the scope of autosomal DNA comparisons--and Y-STR values. In fact, I'll be so bold as to recommend weighting the value of TiP reports quite low. I'm not as averse to TiP as some FTDNA project admins are (besides, all info is useful), but the representation of likely generational levels can be off by quite literally centuries.

Short Tandem Repeats (STRs) on the Y can give us a very good indication of whether or not two males share a common patrilineal ancestor. They can also help with determining near-generation relatedness, because the paper trail is often quite robust for the last 150 years or so. But yDNA as a whole can't be as predictive as autosomal DNA. By that I mean there's no recombination to work with (outside the small pseudoautosomal regions, the Y passes from father to son essentially unshuffled), so there's no accurate way to estimate generational relatedness.
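To see why a GD alone pins down generations so loosely, here is a minimal sketch of one textbook-style approach. It is not FTDNA's TiP algorithm, which isn't reproduced here. It treats the mutations separating two men as Poisson-distributed with mean 2 × generations × markers × an average per-marker rate, puts a flat prior on the number of generations, and looks at the resulting distribution. The average rate of 0.0025 per marker per generation and the 60-generation cap are illustrative assumptions, not measured values, and using one rate for every marker is a big simplification, since real STR rates differ widely.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k mutations when lam are expected (Poisson)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def generation_posterior(gd: int, markers: int = 111, rate: float = 0.0025,
                         max_gen: int = 60) -> list[float]:
    """Posterior P(MRCA lived t generations ago | GD) for t = 1..max_gen.

    Illustrative assumptions only:
      * every marker mutates at the same average `rate` per generation;
      * the two testers are separated by 2*t father-to-son transmissions;
      * each mutation adds exactly 1 to GD (no back-mutations);
      * before seeing the DNA, every t in 1..max_gen is equally likely.
    """
    likelihoods = [poisson_pmf(gd, 2 * t * markers * rate)
                   for t in range(1, max_gen + 1)]
    total = sum(likelihoods)
    return [x / total for x in likelihoods]

def prob_within(posterior: list[float], generations: int) -> float:
    """Cumulative probability that the MRCA lived within `generations` generations."""
    return sum(posterior[:generations])

post = generation_posterior(gd=6)
mode = post.index(max(post)) + 1  # list index 0 corresponds to t = 1
print("most likely generation:", mode)
for n in (8, 12, 16, 24):
    print(f"P(MRCA within {n} generations) = {prob_within(post, n):.2f}")
```

Changing `rate` shifts the whole curve, which is one reason different calculators can disagree; the width of the distribution, rather than any single percentage, is the more honest summary.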

With enough data, though, we can reliably determine generational sequence, i.e., which patrilineal family branch came before which. You can try to estimate that with STRs using genetic network phylogram utilities, like Fluxus or SAPP, but they're still only guesstimates. The problem with STRs is two-fold. 

First, the mutation rates are independent. While there is some good experiential evidence that the mutation rates of certain STRs may be related to one another within certain haplogroups, generally speaking they're freewheeling lil' guys. They change as often as they feel like it. In the data I've compiled for my projects (all principally R haplogroup), the rates range from CDY, moving at a positively highway-like speed of 0.03531 per generation, to DYS632, crawling along at 0.00007 per generation. The TiP reports, contrary to what is sometimes believed, do not go into that kind of per-STR detail. In fact, they really can't, because even that kind of rudimentary per-marker rate calculation would be thrown off by a genetic distance greater than one on any given marker.
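The two rates quoted above can be put in more intuitive terms with a little arithmetic. This sketch uses only those two figures from the author's project data; the 10-generation MRCA is an arbitrary example, and the result is an expected count of mutation events, which can be larger than the GD actually observed.

```python
# Per-generation mutation rates quoted above (one admin's R-haplogroup project data).
RATES = {"CDY": 0.03531, "DYS632": 0.00007}

def generations_per_mutation(rate: float) -> float:
    """Average number of father-to-son transmissions between mutations at one marker."""
    return 1.0 / rate

def expected_differences(rate: float, mrca_generations: int) -> float:
    """Expected mutation events separating two men whose MRCA lived
    `mrca_generations` ago; each line contributes that many transmissions."""
    return 2 * mrca_generations * rate

for marker, rate in RATES.items():
    print(f"{marker}: about one mutation every {generations_per_mutation(rate):,.0f} "
          f"generations; expected events at a 10-generation MRCA = "
          f"{expected_differences(rate, 10):.4f}")
```

At these rates a single fast marker like CDY is expected to differ between two 10-generation cousins more often than not, while DYS632 would almost never move; lumping them into one GD hides that difference.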

Second, the sneaky STRs don't mutate in just one direction. Back-mutations can happen at any time (e.g., a grandfather might be CDY 38-39, the father 38-40, and the grandson back to 38-39). Even sneakier, but rarer, is a process called convergence, in which the STR signatures of different lines have evolved to look very similar purely by chance (again, within a given haplogroup). You can read a piece Dr. Maurice Gleeson wrote about it, and the bad news is that STR testing alone won't reveal this.
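Back-mutation is easy to see in a toy simulation. This sketch assumes a simple stepwise model (each mutation moves the repeat count one step up or down with equal chance), borrows CDY's quoted rate for a single fast marker, and starts from a hypothetical 38-repeat ancestor. It is an illustration of the principle, not a realistic model of CDY, which is a multi-copy marker.

```python
import random

def simulate_marker(start: int, transmissions: int, rate: float,
                    rng: random.Random) -> tuple[int, int]:
    """Walk one STR marker down `transmissions` father-to-son steps.

    Simplified stepwise model (an assumption): at each transmission the repeat
    count mutates with probability `rate`, moving +1 or -1 with equal chance.
    Returns (final repeat count, number of mutation events that occurred).
    """
    value, events = start, 0
    for _ in range(transmissions):
        if rng.random() < rate:
            value += rng.choice((-1, 1))
            events += 1
    return value, events

rng = random.Random(2020)  # fixed seed so the run is repeatable
hidden = 0
trials = 10_000
for _ in range(trials):
    # Two lines, each descending 30 generations from the same ancestor.
    a, ea = simulate_marker(38, 30, 0.03531, rng)
    b, eb = simulate_marker(38, 30, 0.03531, rng)
    observed_gd = abs(a - b)
    assert observed_gd <= ea + eb      # GD can never exceed the true event count
    hidden += (ea + eb) - observed_gd  # events masked by back or parallel mutations
print(f"average mutation events hidden per comparison: {hidden / trials:.3f}")
```

Any nonzero "hidden" count means the GD understated how many mutations really occurred, and STR values alone give no way to recover those events.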

Really, the only way to get to the bottom of it is via SNP testing. You've had a Big Y-700 done, and that's exactly where someone in the project has to start. It ain't free, but I can't say enough good things about FTDNA's whole-Y sequencing. It looks at only about 41% of the Y chromosome, because the remainder is either in the PAR regions or within a huge, highly repetitive stretch on the chromosome's long arm that makes the data there questionable. FTDNA also continuously updates results to keep them consistent with the current yDNA haplotree. For all those reasons, and because FTDNA confirms many of the STRs with traditional Sanger sequencing, the Big Y can't, at least right now, be replaced even by 60x whole genome sequencing.

The other good news is that the verified SNPs are accurate enough to be used for chronological estimates. Back-mutations are probably not impossible, but I haven't seen evidence of one yet. Combined, the results from SNP and STR testing let you start to draw a realistic picture of that patrilineal genetic tree. Here's an anonymized example from my own line's subproject. But a warning: it's a beeeg JPEG. It'll look like only squiggles until you zoom in.

As an example of how the TiP report, or any generational estimate using STRs only, can lead you astray: you can see from that graphic that we know fairly solidly that some differing surnames originated before surnames came into common use in Britain; in fact, our patrilineal tree links up circa 900-1000 AD. At 111 STR markers, I'm GD 5 to three men and GD 7 to two, all of whom we know can't share a patrilineal ancestor with my line more recently than around the time of the Battle of Hastings. For those three GD 5s, TiP hits a 92.47% probability at 11 generations. Sounds accurate, doesn't it? All those decimal places. But using an average of 32 years for the male generational interval, a common ancestor couldn't be closer than 29 or 30 generations. Way off.
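The generation arithmetic above is easy to reproduce. This sketch uses the answer's 32-year interval and assumes 2020 as the "present" (roughly when the thread was written); both are adjustable parameters rather than fixed facts.

```python
GENERATION_YEARS = 32  # average male generational interval used in the answer above

def generations_since(year_then: int, year_now: int,
                      interval: float = GENERATION_YEARS) -> float:
    """How many generations separate two calendar years."""
    return (year_now - year_then) / interval

def year_of_generation(generations: float, year_now: int,
                       interval: float = GENERATION_YEARS) -> float:
    """Approximate calendar year of an ancestor `generations` back."""
    return year_now - generations * interval

NOW = 2020  # assumption: roughly when this thread was written
print(f"Hastings (1066) -> {generations_since(1066, NOW):.1f} generations")
print(f"TiP's 11 generations -> about {year_of_generation(11, NOW):.0f}")
```

The gap between roughly 30 generations and the mid-1600s date implied by 11 generations is the centuries-wide error the answer describes.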

So my best recommendation would be: 1) Try to talk at least one of the other men involved into taking the Big Y. Having two Big Y testers gives you a solid baseline, because you can examine all in-common SNPs, even if they aren't yet a branch on the haplotree. 2) Spot snipping. Testing individual SNPs is inexpensive, though tedious, time-consuming, and less informative overall. But the STR results indicate a high level of confidence in the sharing of recent SNPs, so you can start with your terminal SNP on the haplotree. Since yours is a de novo branch, you might want to move up a level to R-FGC72195, or another level to R-FGC71677 or R-S1211, and see if the other two men will test there. Deep-level SNPs may not be available to purchase as individual tests; the haplotree display you see when logged in will indicate whether or not a given SNP can be purchased from FTDNA. If FTDNA doesn't offer it, YSEQ in Germany might, and it is another solid testing option. It's kind of a hit-or-miss approach.

Oh, forgot: and 3) Finding others to test, as well. My little patrilineal yDNA project has been going since 2003, and it really wasn't until we had about a dozen Big Y test-takers that enough data began to fall into place that we could start to see how some of the post-1600 brick walls lined up genetically. But we had several brick walls. And they're still brick walls in the paper trail, but at least we now understand the lines of descent. And if a new test-taker comes along who absolutely descends from a common male ancestor in the last thousand years or so, it takes us just minutes to place him on the correct branch of the tree.

Good luck!

by Edison Williams G2G6 Pilot (257k points)
selected by Stormy Faw
I am quite literally in awe at the amount of work and results you and others have been able to accomplish.

I also appreciate the directed comments and advice on moving forward.

As luck would have it, the other Faw tester has already purchased the Big Y; however, I am not yet at a point where I can use those results confidently, as I lack the direction and understanding to put together something similar to the example you provided. What resources are you using for this analysis? I would love to create a basis for future cousins and family, as you have, as I find more cousins to test. We have joined the Pfau, Farr, and Poe project groups on FTDNA, but I don't believe they are very active at this point; as I understand it, it can take years for new people to join.

Thanks again for your detailed reply!

Stormy, thanks for the best answer star. But, shoot; I never came back and answered your follow-up question. And nothing close to awe is warranted. It's more like having a really big jigsaw puzzle and, after almost 18 years, you finally have the corner pieces and most of the pieces on each side. It's tedious, not impressive...and we were kinda slow on the uptake.

No special tools are required. Network phylogram utilities like SAPP and Fluxus were mentioned uptopic, and Gephi is another. These can be quite useful, but they're only one piece of the aforementioned puzzle. And while they serve as good indicators, they can't be fully relied upon. Utilities like that don't know about actual mutation rates or the exigencies involved (like STR back-mutation, or SNPs in palindromic regions where the precise position may be ambiguous; this is one reason the Big Y test skips such a large percentage of the chromosome).

I'm pretty much inactive on Facebook, but I did join a private group a couple of years ago that may be of interest to you regarding phylograms and genetic networking:

The two requisites for your own yDNA project are having the highest quality genealogies possible of the participants (paper trail rules!), and the full set of derived (positive) SNPs and all tested STR values. For the latter, it definitely helps if a group project admin at FTDNA is involved, and if all participants set their rights to at least "typical" and to allow data to be published on the group project page. FTDNA will update test takers' results based on new discoveries (novel variants moved to named SNPs), new alignments to the haplotree (new branches created or branches realigned), and of course new matching reports. The rights-permitted admin can see all of that.

Barring that, each participant involved would need to download all their data (SNPs, STRs, matches...and of course family trees) and send them to you. Then you just start reconstructing that jigsaw puzzle!

Since you aren't R-P312 upstream, unfortunately you can't make use of Alex Williamson's "Big Tree" (uploading to YFull is an option, though). But the Big Tree is where FTDNA got the look and feel for the "Block Tree" it recently adopted for the Big Y results area. I'm really glad they did that, because it's a good go-by and FTDNA keeps it up to date with the full haplotree. You can see my little area of the Big Tree here. We have 19 Big Y kits uploaded there and about half again as many in the subproject that aren't. But you can see how it lines up with that big JPG I linked to uptopic, how the branch SNPs descend hierarchically (each bounding box is a hierarchical step), and where bifurcation (splitting) occurs. In your own private project records, of course, you'll also keep track of individuals' novel variants, because new tests matching those could pop up at any time.

A quick word here: if you're dealing exclusively with Big Y results, you're basically golden, because you're comparing apples to apples. If whole genome sequencing (WGS) really starts to take hold for genealogy (as I hope, though I incorrectly predicted it would a couple of years ago), you'll have to get more sophisticated. Getting the Y chromosome's SNP data out in a VCF (Variant Call Format) file isn't all that difficult, but it shows you only the SNPs that differ from the reference genome in use, so you'll likely have to do some fill-in-the-blank work of your own to get the full picture. WGS tests also show data from all along the chromosome, even in the area FTDNA doesn't test because it's so full of repetitive and indeterminate information. And you'll be dealing with a rather cryptic-looking format in the VCF files: the results show only positions on the reference genome in use, so you'll have to cross-reference those positions with the SNP names and values FTDNA uses. Manually checking STRs within the data (those are in the BAM and FASTA files, not the VCFs) is, for now, mostly out of the question.
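Here is a minimal sketch of that cross-referencing step, assuming a plain-text VCF with the standard tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...). The positions, SNP names, and lookup table are all hypothetical placeholders, not real FTDNA or ISOGG assignments. Real files differ by provider (the chromosome may be named "chrY" or "Y"), and positions depend on which reference build was used, so a real lookup table must match that build.

```python
# Hypothetical VCF excerpt (tab-separated). Positions and SNP names are made up.
VCF_TEXT = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE
chrY\t1000101\t.\tC\tT\t220\tPASS\tDP=40\tGT\t1
chrY\t1000555\t.\tG\tA\t35\tLowQual\tDP=4\tGT\t1
chr1\t12345\t.\tA\tG\t99\tPASS\tDP=30\tGT\t0/1
"""

# Hypothetical lookup table: position -> (SNP name, derived allele).
SNP_TABLE = {
    1000101: ("SNP_ALPHA", "T"),
    1000555: ("SNP_BETA", "A"),
    1000999: ("SNP_GAMMA", "G"),
}

def y_variants(vcf_text: str) -> dict[int, str]:
    """Return {position: ALT allele} for PASS-filtered Y-chromosome records."""
    variants = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, _ref, alt, _qual, filt = line.split("\t")[:7]
        if chrom in ("chrY", "Y") and filt == "PASS":
            variants[int(pos)] = alt
    return variants

def classify(vcf_text: str) -> dict[str, str]:
    """Label each SNP in the lookup table from the VCF's PASS calls."""
    calls = y_variants(vcf_text)
    result = {}
    for pos, (name, derived) in SNP_TABLE.items():
        if pos in calls:
            result[name] = "derived" if calls[pos] == derived else f"other allele {calls[pos]}"
        else:
            # A VCF usually lists only differences from the reference, so a missing
            # position may mean "matches reference" OR "not called"; the VCF alone
            # can't tell you which. This is the fill-in-the-blank work.
            result[name] = "no PASS call (reference, filtered, or not called)"
    return result

for name, status in classify(VCF_TEXT).items():
    print(name, "->", status)
```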

Okay. Sorry; too much information. Net message: just be aware that if someone says they'll do a WGS rather than the Big Y, be happy...but know that you won't be able to easily or immediately incorporate that data into your project. For the near future, for the Y chromosome, Big Y rules.

Then it's just a matter of working the jigsaw puzzle backward. John, Joe, and Jim Smith all descend from Hezekiah Smith, one of four sons of Absalom. All three test. You take their in-common data and align them to Hezekiah in the genetic tree you're creating. If John isn't a good match with Joe but is with Jim, then there may be a problem with Joe. At some point--before or after Hezekiah and Absalom--you'll see consistencies in both SNPs and STRs. The consistencies allow you to spot and catalog divergence. After a while it also becomes relatively easy to tell where different surnames begin to fit, and to make reasonable determinations about whether there is a suspected non-paternal event or if you've cataloged a surname simply adopted a long time ago by that patrilineal branch.
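The first step of that jigsaw work, finding what John, Joe, and Jim have in common, is set arithmetic on each man's derived SNPs. The SNP labels and memberships below are invented for illustration; in this made-up data Joe is the odd one out, which is the kind of situation described above.

```python
# Hypothetical derived-SNP sets for three testers said to descend from Hezekiah.
testers: dict[str, set[str]] = {
    "John": {"SNP_A", "SNP_B", "SNP_C", "SNP_H1"},
    "Joe":  {"SNP_A", "SNP_B", "SNP_X9"},
    "Jim":  {"SNP_A", "SNP_B", "SNP_C", "SNP_H1", "SNP_J2"},
}

# SNPs every tester carries: candidates for branches at or above their MRCA.
shared_by_all = set.intersection(*testers.values())

def private_snps(groups: dict[str, set[str]]) -> dict[str, set[str]]:
    """SNPs found in exactly one tester (possible novel variants)."""
    result = {}
    for name, snps in groups.items():
        others = set().union(*(s for n, s in groups.items() if n != name))
        result[name] = snps - others
    return result

# Pairwise overlap beyond what everyone shares: who clusters with whom?
names = sorted(testers)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        extra = (testers[a] & testers[b]) - shared_by_all
        print(a, b, sorted(extra))
print("shared by all:", sorted(shared_by_all))
print("private:", {n: sorted(s) for n, s in private_snps(testers).items()})
```

Here John and Jim share two SNPs that Joe lacks, so in this invented example Joe's placement under Hezekiah would be the one to re-examine.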

Dating the hierarchy is still more art than science. Almost a year ago, FTDNA told group project admins that there were plans to start including SNP dating estimates with the Big Y results in our individual control panels. It hasn't happened yet. A lot of folks use the dates as calculated at YFull and displayed on its haplotree (which is undergoing a version update right now; it's still available, but look for revised information to propagate shortly after the first of the year). That can be a place to start. But if you dig a bit, you'll see that their estimates are based on a paper the YFull folks self-published on ResearchGate; it isn't peer reviewed or validated. Still, their dates don't seem too extreme; the rumor was that FTDNA's dating wouldn't be orders of magnitude away from what YFull is doing.
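The basic idea behind SNP-based dating can be sketched in a few lines: average the SNPs each descendant line has accumulated since a split, then multiply by an assumed number of years per SNP. This is a naive illustration only. The `years_per_snp` value used below is a placeholder chosen for the example, not YFull's or FTDNA's calibration, and real methods also adjust for how much of the chromosome each test actually covers and report confidence intervals.

```python
def naive_split_years(private_snp_counts: list[int], years_per_snp: float) -> float:
    """Rough age of a branch point from the SNPs accumulated below it.

    Hypothetical assumptions: every line below the split was sequenced with
    comparable coverage, and SNPs accrue at a constant `years_per_snp`.
    """
    if not private_snp_counts:
        raise ValueError("need at least one descendant line")
    mean_count = sum(private_snp_counts) / len(private_snp_counts)
    return mean_count * years_per_snp

# Two lines with 5 and 7 SNPs since their split, at a placeholder 80 years per SNP:
print(naive_split_years([5, 7], years_per_snp=80))
```

Because the answer scales linearly with `years_per_snp`, the choice of calibration dominates the result, which is part of why dating remains more art than science.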

That said, the more full-sequence data you gather within your own project, the more accurate near-generation information will become. For example, say John, Joe, and Jim all match nicely under Hezekiah, but you have an additional SNP defined for two descendants of Hezekiah's brother. As the data come in, you can start to confidently note that the split happened no earlier than Hezekiah's father, Absalom. Voila! A pretty precise bifurcation date.

It's a great time to dive into such a project. The data are available now. I joke that we were kinda dense because it took us over 17 years to get to where we are in our Williams subproject, but it truly wasn't until the appearance of the Big Y test circa 2014 that we had more than piecemeal STR information to work with. It's been leaps and bounds since then.

+4 votes

FTDNA's GD calculation is rather crude. It would be better to load the STR data into a program such as SAPP.

Then a least-squares fit can be performed with all the STR data, and the relationships between individual testers established. It works best with 111-marker data; 67 or fewer markers can be misleading.
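As a small illustration of how a GD 3 / GD 3 / GD 6 pattern like the one in the question can arise, here is a pairwise comparison of three hypothetical 10-marker haplotypes. The values are made up, and the distance is a simple stepwise sum of repeat differences, not FTDNA's exact GD method (which treats some markers differently). In this arrangement Poe sits at the ancestral value on every marker where one of the Faw lines mutated, so the Faw testers' independent differences add up. Other arrangements could give the same totals, which is why seeing the actual markers matters.

```python
from itertools import combinations

# Hypothetical 10-marker haplotypes (made-up values, not the real testers' data).
haplotypes = {
    "Poe":  [13, 24, 14, 11, 11, 14, 12, 12, 12, 13],
    "Faw1": [14, 25, 15, 11, 11, 14, 12, 12, 12, 13],
    "Faw2": [13, 24, 14, 11, 11, 15, 13, 11, 12, 13],
}

def stepwise_gd(a: list[int], b: list[int]) -> int:
    """Sum of absolute repeat-count differences across markers (simple stepwise GD)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))

def differing_markers(a: list[int], b: list[int]) -> list[int]:
    """Indices of the markers where two haplotypes disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

for (n1, h1), (n2, h2) in combinations(haplotypes.items(), 2):
    print(n1, n2, "GD", stepwise_gd(h1, h2), "markers", differing_markers(h1, h2))
```

Because Faw1 and Faw2 differ from Poe on completely separate markers, their distance to each other is the sum of their distances to Poe.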

If you can point out where the STR data is publicly available, I can do it for you.

by Andrew Ross G2G6 Mach 2 (26.1k points)
Update: I found the public Pfau project on FTDNA. However, I see only 2 kits that are close to each other.

When I ran the program, their common ancestor was calculated to be 10 generations ago (9-11), or about 1700 (1650-1700).
Thanks for that. Both Faw kits are part of the Pfau project, so that's probably us. At a GD of 6, that estimate does somewhat align with the FTDNA TiP estimates.

