Citing Y DNA Proof - Improvements to the standard required

+5 votes


In looking at the template for Y DNA proof citations, I see two major flaws in it that need to be improved.

I have talked to several experts and they advise for very good reason to NEVER use the TIP feature. 

The proper way is to say that a GD-0, GD-1, GD-2 and GD-3 will match within a range (an agreement would have to be made on which chart convention to use).  The reason for this is probably best explained in the vernacular in the article below.

Quite clearly strong documentary evidence will always supersede the calculation for the reasons, particularly when the intersection is found in less than the outer range shown and a GD-3 is as good as a GD-1 in these cases simply because mutations are entirely random.

Additionally, as I work heavily in statistical analysis, I think the article illustrates that a mean average is a complete illusion and rarely actually occurs.

Also the actual Haplogroup is highly indicative as well - quite a different scenario when you are dealing with an R as compared to a J.

While exact reasons for mutation vary, I have noted among those who migrate considerably from the originating ancestor seem to mutate more often than those who remained stationary.   

Nevertheless, when a "science" overstates its quantification abilities, it shreds its own credibility and places it in the position where the exception will be the rule.

The second issue concerns the citation of kit numbers.  While people may be happy with it, it is clear that FTDNA opts to hide them for good reason.  Given the concern about privacy, this needs to be rethought.  Certainly some may provide it voluntarily, but it should never be a requirement, or FTDNA wouldn't hide them.


in Policy and Style by Lloyd de Vere Hunt G2G6 Mach 2 (26.8k points)
edited by Lloyd de Vere Hunt

Hello Lloyd,

Family Tree DNA cites kit numbers for all public surname projects.  Here is the Smith surname DNA project

FTDNA kit numbers are the first column on the left.

Sincerely, Peter

So I am glad I asked the question.

I see the word public in the url but the general entry point for most of us is logged in and joining a group.  So I logged back out and still could access it.  It is an interesting wrinkle.  I would suspect, then, that the public would find this link via Google, because they don't seem to have a public interface (unless I missed it).

I asked the question because as a user, I see that you "waive" or "reset" (you can choose as you wish) some of the inherent privacy for full participation in the group.

Anyhow, I was a bit concerned on behalf of those concerned about privacy (I'm not personally) because it is easy to assume that your data on the site remains part of the community of testers...and that in FTDNA's case, the kit happens to coincide with the login and if you want to combine your Y and autosomal kit, it takes a long time as it is.

So, I felt this was a solicitation for users to share beyond the scope of what FTDNA appears to protect.

So I trust you know then why I asked the question....
I don’t understand what harm can arise by publishing a FTDNA ID (or GEDmatch ID).
So the difference with GEDmatch (and, I guess you would say, WikiTree) is that your kit or profile designation is different from your login, whereas it isn't for FTDNA.

With an application or intranet, I can see why login and profiles are generally inextricably linked for all its advantages.

But on the general internet, particularly where they are to be shared and given prominence, a better practice would be to keep them separate.
I'm not a member, but I often Google those tables.

But I can't link the kit numbers to people.
Hello Lloyd,

My Family Tree DNA kit number is 8867 and my GEDmatch ID is T412069. They are linked with my name here and on my profile  

Kit 8867 can be found at

Please let me know why it would be a better practice to keep them separate?

Thanks and most sincerely,

"I'm not a member, but I often Google those tables. But I can't link the kit numbers to people."


"Please let me know why it would be a better practice to keep them separate?"

2 step authentication is compromised where you bring the two together and you have the links to create an entire profile. 

These things don't bother the genealogical hounds like you and me - but they might the average user.

Just a word to the wise.  I don't think that we need to belabour the issue.  Just to be aware.


So my compromise is to provide a link to the Hunt family FTDNA site and point them to the right group with a footnote link.  In this respect, they actually have better information without the clutter.

My number is provided voluntarily so they can find me - but here the rest of the names are not connected to the profile (though you can do a 50/50 in the J world lol).  I like to show it because it shows the startling reality that there are some 80 different Hunt families.

By the way, poor late Tom - he was tested way back by a sponsor and waited alone as no one who wanted to did match him until Dave, another sponsored match came along toward the end of his life.  So I can't very well have him set up because his legacy was given in terms of his results to Peter who put them on Y-search and well..............

This issue will cause problems later as many people don't have a legacy plan for when they are gone - even though FTDNA provides for it (not sure about the other sites).

Meanwhile I found the perfect autosomal match to Dad and me, which further attests to a recent connection, and have added an annotated entry to explain its significance and will continue to add as new information comes forward.

2 Answers

+11 votes

Hi, Lloyd. I'm not quite certain where to begin regarding your question, but I'll keep it to DNA matters only, and I'll have a couple of questions for you, as well. The first of these:

"I have talked to several experts and they advise for very good reason to NEVER use the TIP feature."

What experts have you spoken with? The FTDNA TiP feature, while imperfect, is not atrociously so. My biggest difficulty with it has always been the number of significant digits used for its presentation of possible generations-to-MRCA. I would prefer to see the percentages rounded: carrying them out to 1/100th of a percent implies a precision that simply isn't possible.

"The proper way is to say that a GD-0, GD-1, GD-2 and GD-3 will match within a range..."

Which is what the TiP utility does, while comparing two sets of data and making some allowances for individual marker mutation rates. But you may have been led to believe that all estimations of genetic distance are equivalent. They aren't. Not only is there more than one model for arriving at genetic distance for Y-STRs (FTDNA switched to the infinite allele model in summer 2016), but in the two decades that we've been recording and compiling Y-STR mutation rate data--and continue to do so--we've learned that individual markers differ from each other by an order of magnitude or more in the observed average mutation rates.

For example, the palindromic marker CDY is perhaps the most notorious for being volatile and fast-moving. Some observed mutation rates have placed it as high as 0.0353 per generation. The slowest marker I've seen so far is DYS632, estimated to have a 0.00007 mutation rate per generation. Yet a copy count difference of one at either of those markers would constitute a GD of 1. In this regard, TiP actually does a mediocre to fair job, given the complexities. I have frequently compared STR marker data side by side and seen equivalent genetic distances evaluated differently by TiP as confidence-to-MRCA based on which STR markers differed, which is the only accurate way to approach such gross estimations.
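To put rough numbers on why the same GD of 1 can mean very different things, here is a minimal sketch using the two mutation rates quoted above. It assumes a constant per-generation rate and independent father-to-son transmissions, which is a simplification and is not claimed to be how TiP works internally. The function name and the choice of 8 generations are illustrative only.

```python
# Per-generation mutation rates as quoted in the post above.
RATES = {"CDY": 0.0353, "DYS632": 0.00007}


def prob_any_mutation(mu: float, generations: int) -> float:
    """Chance of at least one mutation at a single marker somewhere on the
    two lines descending from a common ancestor `generations` back.

    Assumption: a constant, independent rate per transmission.
    """
    transmissions = 2 * generations  # one line of descent to each tested man
    return 1.0 - (1.0 - mu) ** transmissions


if __name__ == "__main__":
    for marker, mu in RATES.items():
        print(f"{marker}: {prob_any_mutation(mu, 8):.4f}")
```

Under these assumptions, with a common ancestor 8 generations back, CDY has a bit over a 40% chance of having mutated at least once, while DYS632 sits at roughly 0.1%. A one-step difference at the slow marker is therefore far more informative than the same difference at the fast one.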

"Quite clearly strong documentary evidence will always supersede the calculation for the reasons, particularly when the intersection is found in less than the the outer range shown and a GD-3 is as good as a GD-1 in these cases simply because mutations are entirely random."

For genealogy, DNA evidence of any type always has to work in lockstep with the paper-trail. The closest we can come to DNA serving as evidence sans paper-trail are the autosomal results of twins, parents/children, and full siblings. That said, the blanket statement that a Y-STR genetic distance of 3 is as good as 1 based only on the paper-trail simply cannot be made.  

First, like anything else, the evidence has to be closely evaluated and--both a benefit and drawback of yDNA, being male-specific, haploid, and escaping crossover--yDNA data easily reaches back into timeframes beyond any genealogical evidence. Second, any summary of genetic distance can only be considered against the number of markers tested. A GD of 3 at 37 markers can be vastly different than a GD of 3 at 111 markers: the two individuals might in fact be GD0 at 25 and 37 markers, and the differences only appear after DYS438 or even DYS565. For example, 25 STR markers are commonly used to determine possible descendancy from Niall of the Nine Hostages. A GD of 1 among those 25 markers has markedly more impact than does a cousin to me who is a GD of 1 at 111 markers at the relatively fast-moving DYS710.

I'm assisting right now with one group of five matches where the earliest known ancestor from one of the trees shows as born c. 1705. They had a paper-trail hypothesis about how four of the five lines connected. The results indicated that two of the lines probably converge earlier than thought, and the two lines thought to have the most consistent paper-trail were off by at least two generations to their MRCA, who now looks to have been born in the mid-1600s or earlier. To do a reasonable job in a comparison for that purpose requires use of phylogram modeling: you have to try to understand which mutations happened in what order framed against the paternal-line chart. Rare occurrences like recLOH events and back-mutations aside, developing that sort of dynamic model--using the actual mutations weighted for individual mutation rate probabilities--is really the only way to arrive at a more granular look at the picture than the high-level grouping in DNA projects or the overview of generations-to-MRCA that TiP provides.

An aside: Y-STR mutations are not entirely random. The aforementioned recLOH events are one example, as are certain null values and copy counts more common to some haplogroups than others, and the propensity of DYS464 to see additive or subtractive multiples. Markers also vary in copy-count diversity. For example, DYS454 will always have a repeat count of either 10, 11, or 12; DYS464 will typically contain four to eight copies, or sub-markers, each of which can range from 9 copies to 20 copies. Some markers simply have more room to move than others.

Speaking of the haplogroup, unless we're talking NextGen Y-chromosome full sequencing, really the only thing a haplogroup is genealogically useful for is negating the possibility of two men sharing a common ancestor. A matching haplogroup--yDNA or mtDNA--at any clade or high-level subclade is zero evidence of relatedness in the genealogical timeframe. Not long ago I had communications regarding someone who was adamant that his tested SNP of M222 was "proof" of descendancy from a minor branch of English royalty...never mind that M222 bifurcated from DF23 sometime around 4,500-3,900 YBP and pre-dates any English genealogy by at least a millennium. Oh, and a note that without specific Y-SNP testing, Y-STR values can only predict a haplogroup; they do that well at high levels, but not definitively. And that prediction can't be assumed to be refined via STR matching; in other words, if my predicted haplogroup is, say, DF29, or I1a, I can't presume I am deeper in the phylotree, at Z74, only because I have an STR match to someone who did have SNPs tested to that level.

"While exact reasons for mutation vary, I have noted among those who migrate considerably from the originating ancestor seem to mutate more often than those who remained stationary."

Do you have any sources of empirical data for that? I've been at this for a fair number of years and have never heard of any correlation between geography and Y-STR mutation rates. Age of the father at conception of the son, yes; but not geography.

To sum it up, without detailed evaluation like phylogram modeling, TiP does a reasonable job for the constraints it has. For a snapshot overview, it's about the best we have. To simply look at a comparison by genetic distance only, without consideration for which markers may have mutated, we can view FTDNA's charts for 37 markers, 67 markers, and 111 markers. Considering Y-STR matching that way is mostly useless to genealogists unless dealing with GD0 or GD1. For example, for GD3 at 37 markers, we see that the two males are related, but the most it can be tightened down is: "The relationship is likely within the range of most well-established surname lineages in Western Europe." If a paper-trail shows a much closer TMRCA than would be indicated by the TiP report, close scrutiny of both the DNA and the paper-trail is called for.

by Edison Williams G2G6 Pilot (309k points)
Edison, though I didn’t understand all of what you wrote, I learned more reading your answer than I have reading much online material. Thanks!

Edison, as Pip Sheppard said this is some well-explained information. I have enjoyed some of your previous discussions as well.

One thing I don't think you addressed was Genetic Distance from the modal. When more tests are tabulated, the modal becomes clearer. In our largest group, known as Blue Group, we have 28 matching tests of 26 Beasleys (including 3 of the most common spellings) and 2 non-Beasley, some individual variances are as much as GD4. But none of the tests are beyond GD2 compared to the modal. 

Our two earliest documented MRCA are William and John, both born in the 1680s. They are co-located in Baltimore, Maryland in the early 1700's. By 1730, William migrated to Northern VA and subsequent generations scattered west to the Pacific and mostly northern US, and John to Craven Co NC subsequent generations to SE states. There are NO mutations unique to one of those two branches. My take on this is that, barring some rare circumstance, the two of them would have been GD0 and possibly brothers or likely near cousins. (Please let me know if my interpretation is mistaken.) There are several lineages where the patriarch has not been connected but migrational and naming patterns suggest that they are likely to have descended from John.

Perhaps there are better ways of doing this, but I have made some charts of the lineages. I invite anyone interested to take a look. I would be interested to learn Edison's take on the modal GD and my efforts to represent this information. Blue Group Charts. These charts do not include the two non-Beasleys.

Douglas, sorry for the belated reply; my online "me" time is limited to very early in the mornings and late at night right now. But I'm with you on the value of determining the modal haplotype for a grouping.

One downside--which you don't have in the Beasley BlueGroup, and that you noted in your post--is that the modal can be pure guesswork if there are only a few people in the family grouping. Unless there is a clear winner early with a GD0 pairing or triplet, it's kind of a chicken-and-egg thing until more test-takers join the project. It's exacerbated a bit in small projects by the FTDNA colorized results: they always display min/max/modal values, but the modal may not be the modal yet because there are only two or three men in the group. But it makes it look like there's a defined modal.

Technically, the modal would be the most common haplotype in the grouping, one typically that at least 75% of the men share a strong correlation with. Which, theoretically, should also make it the oldest. But it doesn't always work out that way. Before its GDPR-demise last May, one thing I really liked about was a term Terry Barton coined: Apparent Ancestral Profile, or AAP. The idea was, as you worked your way through the data gathering and detail of a project grouping, you may end up with no clear modal haplotype in the beginning, so you relied on paper-trail plus early results to hypothesize an Apparent Ancestral Profile, a patriarch--even if a construct--as a place to put an anchor in getting a 500-foot view of the biological lineage. 'Course, it was dynamic and intended to shift as more data came in and, hopefully sooner rather than later, the AAP and the modal haplotype would meld and you'd have a pretty fair idea of the paternal MRCA of the group. I still like to use that concept when data is scarce.

I end up with something very similar to your BlueGroup charts. Maurice Gleeson does, too, but his are far more detailed than any of mine have been. I really like Maurice's output, but it's a boatload of work since he doesn't use a for-purpose tool; he builds them manually in Excel.

Love it or hate it, I still start with Fluxus Network for the phylogram (which some prefer calling a cladogram, but I associate clades with haplogroups, so I stick with phylogram) after I get at least four in a group. It's really pointless with only three, and still sorta pointless with only four. The default renderings are not what you'd call attractive, but nodes (kits and nexus--or bifurcation--points) can be moved around, and you can always then pop the result into Visio or Photoshop for tweaking. As more kits with differing mutations are added, the renderings become much more interesting. Too, its input is a plain ASCII file (so you can alter that a bit, if you like, like switching the kit that's defined as the modal) and it allows weighting of the Y-STRs before rendering so, as I mentioned in the last post, you can down-weight fast-movers like CDY which can change their position in the phylogram. Here's a simple, but modified, example from the ISOGG Wiki:

However, that only takes into account STRs. With deeper sub-clade SNP testing, ya gotta find a way to include that in evaluations, I feel, in order to make better informed judgments. Having multiple Big Y testers in a group can be priceless. It can provide a more objective reference for positioning in the tree, and even help identify instances of Y-STR convergence, where a mutation occurred then, generations later, a back mutation brought it back in a circle making the match look far closer than it really is.

SNP considerations can go in between phylogram and charting, of course, which is what I do to a less refined extent than Maurice Gleeson. SplitsTree from Universität Tübingen is another option for diagramming, but one I've started experimenting with, and that looks promising, is Dave Vance's SAPP. Dave writes: "Why did I write SAPP? Because it was the tool I needed (frankly, I'm tired of trying to make sense of Fluxus charts and wanted something a little easier to read)."

That said, I've been slow getting around to putting SAPP through its paces with bigger, more complicated files--like my Williams clan--because it's been rumored that our own Chase Ashley (Ashley-1950; yer ears burnin', Chase?) is working on a phylogramming tool of some sort. Chase did a bang-up job on his grouping app for FTDNA yDNA project admins and I'd really like to see what he develops before I give myself over to learning SAPP.

Oh, and thanks, Pip! If nothing else, I can ramble endlessly at a keyboard with the best of 'em. <cough, cough>  wink

What isn't mentioned is that GD, or Genetic Distance, varies between two or more people as they test more YDNA STR's. At Y12 a person can have hundreds of matches, maybe even thousands, but FTDNA only shows matches with a GD-0 to GD-1, and those matches are all over the geographic and genealogical map. The case I'm looking at has GD-0 at Y12 with persons with whom he does not share a recent common ancestor, but one that lived thousands of years ago, and strangely those with whom he does share a recent common ancestor within the last 750 - 150 years are at a GD of 1 at Y12.

At Y25, he has 59 matches with cousins, with a GD of 1 to 2. These are the very same people that at Y12 were reported as GD of 1.

At Y37 he has 67 matches, but GD has expanded to 4.

At Y67 he has 39 matches and GD has expanded to 7 (the reason for decreased matches is a decrease in persons upgrading to Y67 from Y37; also, all 39 matches share a common ancestor born about 750 years ago or less).

At Y111, because even fewer of the Y67's have upgraded to Y111, he has 19 matches, ranging from a GD of 2 to 10.

12 of those, with a GD of 2 - 5, share a common ancestor born in America 1637. The other nine share with the 12 a common ancestor born in Northern England about 750 years before the present or less.

Of those 19, 16 have tested Big Y, and of those 12 share the same ancestor, per paper trail, that was born 1583 and migrated in 1618.


Yep. That was sorta touched on upstream, slightly, but it's an important point for yDNA testers. When an individual goes to his personal profile to look at his Y-STR matches, FTDNA will only display "matches" that are:

  • 12 markers: 0-1 GD (and GD1 will only be displayed if both men belong to the same group project)
  • 25 markers: 0-2 GD
  • 37 markers: 0-4 GD
  • 67 markers: 0-7 GD
  • 111 markers: 0-10 GD
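
For anyone who wants to apply the display cutoffs listed above to their own spreadsheet of results, here is a small sketch. The numbers are taken straight from the list; the function name and the `same_project` flag are my own illustrative choices, and FTDNA's actual matching code is not public, so treat this as a lookup table and nothing more.

```python
# Largest GD that is displayed as a "match" at each panel size,
# taken from the list above.
MATCH_CUTOFF = {12: 1, 25: 2, 37: 4, 67: 7, 111: 10}


def is_displayed_match(markers: int, gd: int, same_project: bool = False) -> bool:
    """Return True if a pair at this panel size and GD would be listed.

    `same_project` models the note above that a GD of 1 at 12 markers is
    shown only when both men belong to the same group project.
    """
    if markers not in MATCH_CUTOFF:
        raise ValueError(f"unknown panel size: {markers}")
    if gd < 0 or gd > MATCH_CUTOFF[markers]:
        return False
    if markers == 12 and gd == 1:
        return same_project
    return True


if __name__ == "__main__":
    print(is_displayed_match(37, 4))   # at the 37-marker cutoff
    print(is_displayed_match(67, 8))   # one past the 67-marker cutoff
```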

I'm not certain yet precisely how they're handling STR matches that fall into the 112-marker-plus range that comes with the Big Y-500 test.

Our buddy Chase Ashley will correct me if I'm wrong, but I believe his project matching utility uses the same criteria at first pass, but places no one into any grouping of 37-markers or higher that shows greater than GD4 at 37.

(An aside but similar: with mtDNA some testers are confused about their HVR1/HVR2 results and difference to the rCRS or RSRS standards. FTDNA only shows exact matches as matches in the hypervariable regions. Because mtDNA mutates so slowly (and unusually: heteroplasmic only), exact matches are already likely to be related only several to many hundreds of years ago. For the full sequence test of the tiny molecule, up to three differences are allowed and they include heteroplasmies (although FTDNA QC procedures screen out any heteroplasmies that show less than a 20% occurrence), and two high-frequency insertion/deletion reference clusters, 309 and 315, are ignored completely. Back to yDNA...)

Project administrators have the ability to manually sort groupings within a project (as you know). Back before FTDNA began automatically including an STR upgrade with the Big Y, some folks would test at 12 markers only just to get their kits on file so that they could upgrade to Big Y. Or, similarly, I've seen some (though I never recommended it) test to 12 in order to order a specific SNP or SNP panel as predicted by other matches.

Bottom line there, though, is that SNP testing can tell a different side of the same story. Someone who's tested only 12 or 25 STRs can have deep-clade testing on file that, along with highly correlative STRs, place them solidly into a project grouping. On the other hand, if STR convergence is making a match look much stronger than it really is following a back-mutation somewhere along the line, deep SNP testing can uncover that and allow the kits to be sorted more accurately.

And my belated apologies to all the WikiTreers familiar with autosomal testing who found themselves in here wondering, "What the heck are they talking about?"  smiley

Great responses. FYI I copy paste your posts to a geneticist, he agrees but expands.

But I think you misspoke here:

12 markers: 0-1 GD (and GD1 will only be displayed if both men belong to the same group project).

Not true, I am looking at a kit with 729 Y12 matches, and most of the GD-1 are not in the project, and their names are Russian, Slavic, a few Arab, a few Scandinavian, Turks. As you expand the STR testing, you also expand the SNP's and narrow the number of matches.

At least in this project, when you get to Y67, everyone there shares a rather recent common ancestor (defining recent as within 750 ybp).

Some matches are not in the project, and don't answer an invite to join.

I speculate that because their surnames differ from that of the project, and that GD is so close at even Y37, that they fear exposure of an NPE in their ancestry, and possibly invalidation of their researched and documented family tree and myth.

On the other hand some members with the same surnames, have proved to be of three different haplogroups I2, R1b and R1a1 of Viking Flavor.

But the original English spelling was occupational (ferror), and that of a maker of iron, as opposed to one who works with iron, a smith.

And there were significant iron ore deposits in England, and the surname and its variations (due to region, community, class accents) make their appearance in those regions with iron deposits, hence an iron producing industry dating even from Roman times, perhaps Celtic times.

Thanks, Jennifer. As to the 12-marker GD0 or GD1 thing, I was going by this: Expanding the item titled "On the Y-DNA - Matches page, are only exact matches shown?" is:

"For Y-DNA12 matches, 11 out of 12 matches are only shown with both customers belong to the same group project. However, to best serve our customers who are adopted, we provide at the Y-DNA12 level both 11/12 and 12/12 matching to the entire Family Tree DNA database to those in the Adoptee Project. This is because they cannot know the best Surname project to join in advance of testing."

Some of those reference pages have gone a long time without update by FTDNA, so I can't say whether that bit is correct or not. I admit I seldom use 12-marker matching for much of anything.

On the invited-matches-won't-join-the-project thing, I feel ya. One of my projects is for a surname about which a beefy tome was written in 1932 about the lineage in America. Many with that surname take that book to be gospel...with the result that few have been interested in yDNA testing and two with the surname who did test and who do strongly match men in the project have never joined the project after being invited. Or communicated much at all with their matches. Particularly frustrating is that one of those men, based purely on the GD and at which level of marker panels the differences occur, is I believe a possible key to understanding how two of those in the project might connect (their GD contradicts the paper-trail), and take a step toward defining a modal haplotype for that grouping. We keep workin' at it....

Vis a vis FTDNA updating. I don't know if you are involved in their Big Y testing, or track it, but they upgraded the chip from hg 19 to hg  38 and nothing has been the same.

Testing is done by computer, no humans involved, assigning or recognizing SNP's is a mix of computer and human, and they have not got things sorted out yet. The job is bigger than at first thought, IMHO.

However Y12 most certainly is not people of the same group, at any of their published GD's.  

FTDNA has a lot of work to do, and it is a Herculean task to clean up and update all pages, FAQ's, procedures.

There are constantly new glitches showing up. I could once click on Matches under Big Y, Now they have Big Y 500 (500 STR's), when I click on Matches, I get Results (there is a results button as well) to get to matches I have to click on the Matches tab, under Results.

They are working on the problem (I think).

But of all the testing companies, FTDNA is the best, IMO; they offer separate full sequence mtDNA (which I consider to be of no value in genealogy, as mtDNA is such a slow mutator that TMRCA for matches has to be in the thousands of years).

AuDNA is OK, basically the same as Because of its limitations I take it with a grain of salt (it shows a relationship between two 2nd cousins, and a relationship between one of those and a 4c1r, but not between the 4c1r and the other 2nd cousin). The obvious answer is that a segment of the surname DNA fell out or was so small as not to be considered relevant.

YDNA is the best; as you know, men seldom change their names, and YDNA is a fast mutator comprised of at least 500, maybe 1,000, STR's or DNA Y Segments (DYS) and has SNP's that appear serially, such that a SNP lineage exists, and if one can ascertain a person's terminal SNP then one can with comfort trace the tree back, stopping at common ancestors along the way.

As you know SNP's are not really terminal or basal as is the other usage, but they keep appearing about every 150 years, but aren't named until at least two men test the same SNP.

In the project to which I refer, there are 12 men who can solidly trace, via DNA, their first American ancestor to a man who landed in 1618 - SNP name YP5905. But YP5905 didn't stop mutating: at least two line tested men who are closely enough related to have subclades; one was YP5905>YP27595, the others were three who tested YP5905>YP6373, which appeared between 1718 and 1777. And one of the YP6373's tested is a grandson, and there appeared BY30954.  We don't know at what point in time BY30954 appeared but I doubt it was with the member, more likely an ancestor born sometime in the mid 19th Century.

Thought I would jump in, since my name had been mentioned. A few comments on various points raised above:

1. TiP and other MRCMA calculators. It is important to realize that the percentages they output are calculated percentages based on a simplistic mathematical model of how STR mutations work; they are not percentages based on empirical evidence of whether actual people are related. For example, if the model says that there is a 50% chance that two men share a common male ancestor within 4 generations, that does NOT mean that there is any evidence that shows that 50% of the time two men with those STR profiles in fact share a common male ancestor within 4 generations. In my experience, TiP is OK in its estimates where 2 men do in fact share a common male ancestor in the genealogical profile, but it is incredibly inaccurate if they don't and will frequently suggest 70-90% probabilities for men who are false matches due to convergent mutation, something that happens a lot with R-M269 males.

2. Public FTDNA data - FTDNA admins can elect to make the project public or limit its viewability to project members. If the admin elects to make it public, however, only the results for members who have elected to share their data publicly will be viewable to someone who is not logged in as a member.

3. "FTDNA switched to the infinite allele model in summer 2016" - I believe I read that FTDNA uses the infinite allele model for TiP. However, for calculating genetic distance, it uses a blend of the step-wise method and infinite alleles method. It uses the step-wise method for most STRs, but uses the infinite alleles method for certain STRs with null values and for parts of multi-value markers. Since my app is based on gd calculation, I have tried hard to figure out, and match, FTDNA's algorithm, which isn't easy since they don't disclose it, at least not all in one place. For how (I believe) FTDNA calculates gd, see "How does the app calculate genetic distance between two kits?"
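
To make the difference between the two counting methods concrete, here is a minimal sketch of each in its pure form. It is not FTDNA's algorithm (which, as noted, is undisclosed and blends the two), it ignores null values and multi-copy markers entirely, and the marker values in the example are made up for illustration.

```python
def gd_stepwise(a: dict, b: dict) -> int:
    """Step-wise model: every one-repeat difference counts as one step."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def gd_infinite_alleles(a: dict, b: dict) -> int:
    """Infinite alleles model: any difference at a marker counts as one,
    no matter how many repeats apart the two values are."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])


# Hypothetical haplotypes: marker name -> repeat count.
KIT_A = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
KIT_B = {"DYS393": 13, "DYS390": 22, "DYS19": 15}

if __name__ == "__main__":
    print("step-wise:", gd_stepwise(KIT_A, KIT_B))
    print("infinite alleles:", gd_infinite_alleles(KIT_A, KIT_B))
```

Here the two-repeat gap at DYS390 counts as 2 under the step-wise method but only 1 under infinite alleles, so the same pair of kits comes out at GD 3 or GD 2 depending on the method.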

3. "Rare occurrences like recLOH events and back-mutations aside" - I don't think there is any reason to believe that a back-mutation in an STR is any less common than the original mutation. The mathematical models for step-wise mutations assume that the probability of mutation in either direction (+1 or -1) is the same. It's possible that mutation probabilities from a given value might be higher in one direction versus the other or may differ for an STR depending on the STR value, but there is no data on this (as far as I know).
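
As a rough illustration of what the equal-probability assumption implies, this sketch works out where a marker ends up after a given number of one-step mutation events. The symmetric +1/-1 model is the assumption described above, not an empirical finding, and the function name is mine.

```python
from math import comb


def net_change_distribution(n_events: int, p_up: float = 0.5) -> dict:
    """Distribution of the net repeat-count change after `n_events`
    one-step mutations, each +1 with probability `p_up`, otherwise -1."""
    return {
        2 * ups - n_events: comb(n_events, ups)
        * p_up ** ups
        * (1 - p_up) ** (n_events - ups)
        for ups in range(n_events + 1)
    }


if __name__ == "__main__":
    for change, prob in sorted(net_change_distribution(2).items()):
        print(f"net change {change:+d}: {prob:.2f}")
```

Under this model, when a marker has mutated twice along a line, half the time the second mutation simply undoes the first and the marker looks unchanged. That is one way a match can look closer than it is.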

4. Modals. I calculate group modals in my app, but I think it is important to realize that they may or may not be close to the STR values of the common ancestor. A modal value would only match the STR values of the common ancestor if (1) the STR values for each marker were represented in the descendants of the common ancestor in a "normal distribution" and (2) the persons whose STRs got tested and included in the project represented a random sample of the population of male descendants of the common ancestor. Re 1 - There won't be a random distribution because some male lines die out while others have large numbers of sons, which can create lumpy distributions with modals at different places than would occur in a normal distribution. Re 2 - This will almost never be true and, as a result, the modals will just represent the modals of the men who happen to be test takers in the project group, which will not be representative of all male descendants of the common ancestor.
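
For readers who have not computed one, a group modal is just the most common value at each marker among the kits in the group. The sketch below uses invented kits and a plain step-wise GD; it is not the code from my app, and the tie-breaking (first value encountered wins) is an arbitrary choice for illustration.

```python
from collections import Counter
from itertools import combinations


def modal_haplotype(kits: dict) -> dict:
    """Most common repeat count at each marker shared by every kit.
    Ties go to the value seen first, which is an arbitrary choice."""
    shared = set.intersection(*(set(h) for h in kits.values()))
    modal = {}
    for marker in sorted(shared):
        counts = Counter(h[marker] for h in kits.values())
        modal[marker] = counts.most_common(1)[0][0]
    return modal


def gd_stepwise(a: dict, b: dict) -> int:
    """Plain step-wise GD over the markers both haplotypes have."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


# Hypothetical kits: marker name -> repeat count.
KITS = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "B": {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    "C": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "D": {"DYS393": 14, "DYS390": 24, "DYS19": 14},
}

if __name__ == "__main__":
    modal = modal_haplotype(KITS)
    max_pair = max(gd_stepwise(KITS[x], KITS[y]) for x, y in combinations(KITS, 2))
    max_modal = max(gd_stepwise(h, modal) for h in KITS.values())
    print(modal, max_pair, max_modal)
```

In this toy group no kit is more than GD 1 from the modal even though some pairs are GD 2 from each other, which is the pattern Douglas described. The modal still only summarizes the four kits that happened to test.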

5. Phylogram/phylogenetic tree programs - Yes, I was starting to work on one, but I have stopped. Part of the reason is that I think they are only useful for projects that have a large number of members in a group. That is true for clan-based surnames, but not most other surnames. Another reason is that I am not particularly sold on their value. After thinking about algorithms to create one, I concluded that there are multiple possible trees and that no algorithm using STRs can ever accurately determine the "correct" tree. In addition, a phylogenetic tree may not be that useful to determine a genealogical tree because it omits any generation in which no mutation in the tested STRs has occurred, which can happen a lot. Lastly, I suspect that Dave Vance's SAPP program does a good job and also, by allowing you to input SNP values and genealogical info as constraints, it is likely to result in a more probable phylogenetic tree than you would get from STR values alone (which my app would have used).

6. GD cutoffs for matching and grouping. Based on my recollection, I think the gd cutoffs that Edison listed above for what "matches" FTDNA shows you are accurate. However, just because a kit has a gd that qualifies as a "match" does not mean that FTDNA thinks the kit has a gd that indicates it probably shares a common ancestor within the genealogical time frame (which FTDNA defines as 15 generations). FTDNA wants to ensure that people see lots of matches, because customers aren't happy if they don't have any matches, so, for higher STR tests, they use looser gd cutoffs for showing someone as a "match" than they use for "probably related" within the genealogical time frame. For FTDNA's guidelines for "very tightly related," "tightly related," "related," "probably related," "only possibly related," and "not related," see FTDNA's "Expected Relationships with Y-DNA STR Matches." Note that the gd cutoffs for a "match" and "probably related" are the same for 12, 25 and 37 STR tests, but for 67 STR tests, the "match" cutoff is 7 while the "probably related" cutoff is 6 and, for 111 STR tests, the "match" cutoff is 10 while the "probably related" cutoff is 7. My app (1) only looks at the gd at the greatest number of STRs that both kits have tested at (i.e. if one kit tested 67 STRs and the other 111 STRs, the app only looks at the gd at 67 STRs and ignores the gd at 12, 25 and 37 STRs) and (2) uses the "probably related" cutoffs to form groups -- a kit will only be included in a group if it has a gd with another kit in the group (based on greatest number of STRs both tested at) that suggests that they are "probably related".
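The two-tier cutoff logic can be sketched in a few lines of Python. This is an illustration only, not the app's actual code. The function and table names are made up, and the tables hold only the 67 and 111 marker cutoffs quoted in this post. The 12, 25 and 37 marker values are left out because they are not restated here, so asking for them raises a KeyError.

```python
# Cutoffs quoted in the post above (67 and 111 markers only).
MATCH_CUTOFF = {67: 7, 111: 10}
PROBABLY_RELATED_CUTOFF = {67: 6, 111: 7}


def comparison_level(markers_a, markers_b):
    """Greatest panel both kits tested: the smaller of the two panels."""
    return min(markers_a, markers_b)


def classify(level, gd):
    """Label a pair of kits from their gd at the shared panel level."""
    if gd <= PROBABLY_RELATED_CUTOFF[level]:
        return "probably related"
    if gd <= MATCH_CUTOFF[level]:
        return "match only"
    return "no match"


# One kit tested 67 markers, the other 111, so compare at 67.
level = comparison_level(67, 111)
print(level, classify(level, 7))  # shown as a match, but not used for grouping
print(111, classify(111, 7))      # within the grouping cutoff at 111
```

A pair labelled "match only" would appear on a match list but would not by itself pull a kit into a group under the rule described in (2).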

Excellent Chase. Absolutely excellent. Your post goes into my DNA Word file along with Edison's.

I pity R-M269's though. As you know, this is the most ubiquitous DNA in Europe and England, the Western Atlantic Modal. What is seriously needed is testing to reveal subclades of R-M269 via Big Y.

I understand, from the R1a project, that within a relative degree of confidence, that an analysis of STR's can lead to a prediction of subclades. Here is an example.

Set Markers to Y111 and the number to 5,000; after it loads, go to page 2.

Search for YP5578, you will find 16 kits, four of which have not tested Big Y but are predicted to be YP5578.

In fact, of the YP5578's (tested and predicted), three are predicted to be YP5905, which appeared 375 ybp.

When you check Y67's, it becomes more remarkable, because they were predicting SNP's before many of those that subsequently tested Big Y.

There were, and probably still are, erroneous predictions. I know of one, and at Y111 I see possibly two more. Only by convincing one of the two (one is deceased) to test Big Y will the truth be revealed.

A lot could be clarified by Big Y testing, especially with so many available R-M269's, alas Big Y is prohibitively expensive, $600.
Jennifer, you depress me. I’m R-M269! Ugh.
Apologies, not my intention to depress. I suggest that you go to the project, assuming you are a member of FTDNA, and join the project, then test Big Y, it will be worth it.

But R-M269 is the dominant DNA in Europe from the Balkans to Spain, through France, Germany into the British Isles. Those lands the Romans called Iberia, Gaul, Brythania especially.

But upgrade to Big Y and you will drill down and narrow your ancestral origins.

Meanwhile go to their Yresults page, first set page size to 5,000, then choose YDNA 67 (you get more results with Y37, but it takes up more pages).

There are three pages, 5,000 each, for 15,000 R-M269

The SNP's in red are those that are predicted to be R-M269 based on evaluation of STR's, but those in green apparently are confirmed either by testing of a SNP pack (cheapest) or testing Big Y. You can upgrade to SNP Pack M269 (advised)
Me too - I couldn't imagine being an R-M269 because it seems that you have only eliminated 10 percent of the testing population while the rest of us eliminate 99% or more even at 37 - also from discussion emerging at least at J, increasingly there appears to be a far greater tendency for mutation - for who knows why - meaning talking about Y may need to be segmented by group and rules tailored for each one of them
What SNP defines your haplogroup: J-M267? M172? I strongly recommend that you upgrade to Big Y.

Have you read the wiki article on hg J?

You mentioned some middle names. I am not familiar with moderns using the custom of middle names to perpetuate a genealogical line.

It was a custom, not routinely followed, to give a child the mother's maiden name as a middle name. Sometimes all children would be given that middle name. A good example is William Randolph Hearst. His mother was a Virginia Randolph; wealthy and influential were the Randolphs, self-styled one of the First Families of Virginia. Thomas Jefferson's grandfather was William Randolph; his brother Thomas bought up a lot of land along the James River (at bargain prices).

In any event, at least two of the names you mention could be French, and one of them, Fitzmaurice, Norman, for Fitz is the English version of fils de, meaning son of.
+3 votes
The specific difficulty when comparing men with the same surname is that they'll often have a common ancestor back in the 15th century or earlier.

So if the DNA doesn't put a confident tighter bound on how far back the common ancestor is, it's not evidence of a speculative closer connection (though it does tend to rule out an NPE).

Sometimes, descendants of two immigrants match, so it's claimed that the immigrants were brothers.  But they could have been 2nd cousins.  Too soon to confirm both lines straight back to the carefully-selected English father (whose wife was a princess).
by Anonymous Horace G2G6 Pilot (568k points)
RJ, two men with the same surname do not of necessity indicate a common ancestor back to the 15th century or earlier.

English surnames, per se, did not exist in a heritable form, passed on to son by father, until the Poll tax of 1377.

For ease of tracking populations a schema was devised.

Surnames were based on occupation (e.g. Cooper, Smith, Wright, Dye, Carpenter, Ferror)

Physical characteristic (Whitehead, Armstrong, Tall, Short, Wise)

Patronym (son of John, or Johnson if Saxon, Johns if Welsh/Scotch, Johnsson if Norse, Fitzjohn (fils de John) if of Norman descent)

In some areas predominantly Norse or Danish, a person perceived to be of Angle descent would be called English. Sometimes an immigrant might be named for their country of origin.

Locational (Forest, Hill, Warren, River, Whitby, Gouldsby, etc)

The "by" at the end of Whit and Goulds is Norse for farmstead, and denotes the original farmstead of Whit, which grew to become a vill, and continued to grow to become a town or village.
My name is precisely that - Hunts are occupational - according to FTDNA Hunt there are about 80 completely distinct families, only three like me are J2-M172 and the other two aren't even close in markers.

For this very reason, I have been trying in vain to get some of the fledgling Hunt relations to avoid combining them into one family.  

I would really like to leave a few for another family lol.

When I see someone else with the name Hunt, I assume that they aren't related unless I see the central name de Vere or others like Maunsell, Pfeilitzer, Urquahart or FitzMaurice etc. stuck somewhere in the middle.

My only other matches are two other proven de Vere Hunts beyond a shadow of a doubt and one other person with another surname who knows that it is incorrect
Oh yes, people with the same surname don't have to share a common ancestor - a horse I often flog myself, pointlessly.  But having said that, they often do.

Often enough that you can't use yDNA to prove genealogy, because there's a good chance that the DNA will match as expected in spite of the tree being all wrong.

For instance, we can't trace Robert E Lee's ancestry back to the origin of his surname.  Which means there are a lot of Lees who share the name, the common ancestor and the DNA, but can't be connected by a genuine paper trail.  But it is so easy to come up with a bogus paper trail and point to the DNA.
True enough you can't use DNA to prove genealogy, but by the same token you really can't use the so called paper trail either. As you said too many bogus paper trails, too much wishful thinking, too many forced outcomes.

And I have made my opinion on "paper" trails known.

I will say this about DNA: in some cases, when dealing with a rare or unique SNP or series of SNP's, which I can translate into an ISOGG hg such as R1a1a1a1b2h and subclades, one gets very close to identifying the common ancestor, if not the intermediates, who because of a lack of paperwork are brick walls.

For instance, in the aforesaid project we have people who have "solid" documentation that takes them back 11, 12 or 13 generations, and others who because of a lack of documentation can only go back 6, 7 or 8 generations. However, in this particular case, perhaps unique, by testing Big Y even those with missing branches can be identified with the ancestor whom others can trace back.

And then there is an unknown ancestor, who was born 750 ybp, who is an ancestor to all, even those that don't share his surname (which at the time he did not have).

The point is, and this is important for those who just have to have proof of ancestry so that they can join this or that organization, that (at least in some cases) a SNP can provide just that proof, even though the actual paper trail is lost to history and time.

As an aside, it has been brought to my attention that Americans are more obsessed with royalty than the British. Yes they make a big to do over the Queen as head of state, but they need some kind of political continuity for social cohesion and the Queen serves that purpose.

It sounds ridiculous to American ears to hear of Brits going into battle to fight and die for King (Queen) and Empire, or officers raising a toast to same.  But not to them.

So  I have been fortunate as have been my other matches to have a phenomenal paper trail.  Here is just a fraction and illustrates not only records but continuity, context, story and interaction often missing from them.

Our threesome has Burke's Landed Gentry where from our common ancestor,  Dave's (from son Henry) family is covered for 7 generations including his father, I am covered for 4 (son John- 2nd wife, Bowles) and Tom is covered for 2 (son John - 1st wife Hicks).

Additionally in the early 1800s, Aubrey Thomas de Vere (Hunt) successfully proved his descent from Lady Jane de Vere and Henry Hunt for the purpose of adopting the extant name as his last name (as descendants of Susan had done in England)

Burke's wasn't simply a kindly place where people could publish their family history but had a regulative and investigative function.  The only errors that I have ever found were on wives in two rare cases, where they weren't covered in their own family as well as the family into which they married and didn't always get their fair shake.

In my case, John's son Henry appears both in our account and that of his oldest daughter's husband, Kearney.  It notes that he was the State Apothecary.

Evidence shows that Henry was in fact the first State Apothecary in Ireland, appointed in 1783 to begin his term in 1784, and by Act of Parliament in 1791 became the first Governor of Apothecaries' Hall, an organization that continues to this day.

His son, James, my fourth great grandfather, succeeded him as State Apothecary on his death in 1796, and then his nephew on his death in 1918, George Kiernan, who writes his thanks for the appointment and the intention to enjoin uncle Thomas (James's other son) as partner and eventual successor.

I can trace all their family business history in the papers and wills, including addresses.  Sources include Find My Past and Nick Reddings clippings, wills and other documents, several court cases, and stray notes that socialize the family (this is the most critical aspect of research validation).  There is also a document in which Henry seems to know the names of his youngest sister's husbands.

James's son Charles, my third great grandfather, married a Baron Pfeilitzer's daughter (the only one in England), and I have all the information on him from his public record: parties, military and civil service career, his past in the Adels courtesy of my German and Russian cousins, pictures of the family crypt in Latvia, and full records socializing the family in St Kitts, Tobago and Prince George County, Maryland, supported by the records of the East India company.

Charles became a Resident Magistrate, so I can trace every appointment that he had and where he lived, which coincided with where his children were born (from here down all the baptism records are available for free on the Irish genealogy site), as I can follow Aubrey to all of his Bank Manager appointments, to where his children were born, his famous cases and retirement to France.  He also shared a best friend with cousin Aubrey de Vere who was still at Curragh Chase.

Many clergymen in the family from there who are among the easiest folks.  And of course whatever your interpretation, given the matches, no NPE - although that is another reason to believe the more recent relationship because with the Cecils around who knows who was the father of Lady Jane's children.

And if that were not all, I have Dave's 500 page book second edition as we work to collaborate on a third.

For these reasons, the chance of one of our records being wrong is extremely remote, and of two being wrong, completely impossible.


I haven't even mentioned the autosomal matches...including a small one for me and my Dad to Dave and others up the paternal line to family passing through the male line.

Currently, I just found a match for Dad and me to the Georgia branch of the family (noted in Burke's) and will continue to work on others (in this case, it is a singleton, so not confused with any other family).

This of course is a reminder that it depends on what you want to prove - sometimes triangulation is the best - other times a singleton is more powerful as it doesn't cloud the issue and leaves only one possible option.

 If you go back to the tip index, I have looked at it and yes the probability rate generational escalation happens to be highest on the closer of the two and they actually end up at the same probability.

But even more noticeable is that the break even and into more probable than improbable occurs in or before the first third of the generations and each additional generation adds a decreasing level of probability all the way to the top. 

Maybe that is the problem - is that the tip indicator has become a victim of the outliers which are inapplicable in times of documentation.  So to have the certainty for ALL existing and possible samples we place it outside - and strangely so outside the record of most families and drive back into the mystical past.  Hey, I am willing to believe in no male NPE in the last 10 generations for 3 gentlemen - but no way for 24 times 3 is 72 births 
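The "72 births" point is just compounding. Here is a small Python sketch of the arithmetic, assuming each birth carries the same independent chance of a misattributed father. The per-birth rates used are placeholders chosen only to show the shape of the calculation, not figures from this thread or from any study.

```python
def chance_of_no_npe(rate_per_birth, births):
    """Probability that none of `births` father-son links is an NPE,
    assuming each link fails independently at the same rate."""
    return (1.0 - rate_per_birth) ** births


lines = 3
for rate in (0.01, 0.02):            # placeholder rates, not measured values
    for generations in (10, 24):
        births = generations * lines  # 30 and 72 births respectively
        p = chance_of_no_npe(rate, births)
        print(f"rate {rate:.0%}, {births} births: {p:.2f} chance of no NPE")
```

Whatever rate is plugged in, the chance of a clean run falls with every added birth, so a claim covering 72 births is always harder to sustain than one covering 30.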


What needs to happen to encourage more interest in participation and get more verifiable information to move this science from its infancy is to get away from mere sophisticated and layered extrapolations is continual exposure to empirical   

So why do mutations occur.  Is it completely random?  Maybe or maybe not.  J thinking seems to be shifting from seeing us as the great untested to a group where mutation is happening faster and you would get that impression looking at the J2-M172 page

Why did I mention the possibility of environmental change as a factor - it tends to be the go-to.  For me and my matches, service to the empire or to the cloth was a big part of life and took them all over the world, with lots of travel and temperature swings, and our end points are completely different.

Ages of father at conception of child.   I can only find that in my line where I have some fathering children at over 48 (unless we were thinking 72 or something when that was brought up)

How about going bigger on Y - what would it bring me - decades of waiting around in J land for something new.  Poor Tom waited a long time, only lived briefly to see Dave arrive...and died long before I came on the scene.


You said "What needs to happen to encourage more interest in (YDNA) participation and get more verifiable information to move this science from its infancy is to get away from mere sophisticated and layered extrapolations is continual exposure to empirical "

I totally agree

You also asked:"So why do mutations occur.  Is it completely random?  Maybe or maybe not.  "

The same question that has been bugging me. Perhaps it is epigenetics, influenced by environment, a man's occupation, diet and/or habits.

I am puzzled by examples of GD-0's between supposedly 8th and 9th cousins and GD-2 between 2nd and 3rd cousins.

The way I see it is that a "copy machine" kicks in when the zygote starts dividing and becomes a blastocyst. At some point the copy machine stutters and an STR can mutate up or down; some will stutter twice and cause a double step mutation, and some will stutter on more than one STR.

But stutter it does. So what causes the "copy machine" to stutter? What causes lights to flicker and a computer to suddenly shut down? A fluctuation in the current.

Is that the answer? Don't know but one guess is as good as another.

yDNA will make sense when there's full sequencing.  You won't get a coherent picture so long as they only look at a few little snippets.

As regards royals - just translate Queen to Flag.  Brits think it's silly having flags in corners of rooms.
Tell us more about full sequencing please - sounds interesting
By full sequencing do you mean expanding STR's past 111 and SNP's?

I would hope so. STR's by themselves are misleading and need to be read in concert with SNP's, but apparently this is a very expensive process.

Not for processing or reading the processed DNA for that is done by computer, but translating and interpreting the results, identifying, grouping, sorting and then classifying variants is, as I understand it, time consuming involving human effort as well as computers.

Y-chromosome full sequencing (as full as it can get; some areas of a chromosome will never be functionally testable, like some of the extremes of the telomeres) has been a thing since the end of 2013. At that time, FTDNA began NextGen yDNA sequencing. Last year they modified the test, calling it the Big Y-500, to also include STR testing at identified loci beyond the 111-marker panel. Verdict is still out as to the value of the extra STRs; will probably require some time before sharing and mutation rates can be observed and folks can figure out what to do with them. There are also a lot of no-calls in the extra STRs, so the reality works out more like the "Big Y-450".

The pricing has come down, at least slightly, and NextGen Y testing is a big seller for FTDNA; FGC (Full Genomes Corp) offers it, as well, but there's no genealogical matching there. The FTDNA testing seems to come in at about 16mbp total coverage (14-23, from what I understand), and FGC's "Y-Elite" at about 22mbp. It's the full yDNA sequencing that's given us the explosion in the depth of the Y phylotree we've seen over the past couple of years. The estimate right now is that about 385,000 SNPs have now been identified as being shared by at least two or more men, with many more undoubtedly to come. We're testing 14-23 million base pairs in the Y, in a chromosome that has only twice as many protein-producing genes as the minuscule mtDNA molecule with a total of 16,569 base pairs, around 11K testable. Cataloging the expanded yDNA into published phylotrees is backlogged big-time as research continues.

And--something that I'm very much looking forward to--anthropology and population genetics studies are starting to catch up. On the technology side, one development that triggered it was improved HiSeq testing and the lowering of related costs so that it became realistic for universities and research institutions...not a ton of medical interest in the Y, so it's something of a stepchild when it comes to research funding. The HiSeq techniques are now allowing sample extraction and enhancement from ancient remains that wasn't possible up until just a few years ago. Most of the anthropological DNA data we've had prior was based on either a very small number of tested reference clusters or STRs, or on mitochondrial DNA, which is proving to be less useful than once believed (at least, less definitive: there isn't a great deal of differentiation in mtDNA in humans). I think most of what we'll see going forward for anthropology will be NextGen atDNA and yDNA testing. It'll be exciting to get much deeper detail about the DNA of ancient remains.

Thanks Edison.

While at it: are you familiar with the Capelli Census of Y Chromosome in the British Isles?

I stumbled upon it years ago, when it first was published on the interwebs. When the actual table of results was published, I downloaded it.

Capelli used grad students to take the DNA of over 2,000 men. Initially blood (this was 2003), then swabs. He chose communities that had little inward migration over the last 1,000 years (one site, York, surprised me, but not others like Penrith and Uttoxeter).

Some 13 sites in all, plus some samples from Northern Germany and Denmark.

He only published or compared six DYS markers (STR's), which were 393, 390, 19, 391, 388 and 392, chosen, I imagine, because at the time they were perceived to be the slowest mutators.

I was excited when I encountered it, and thought that it proved a lineage, but then I learned about SNP's, and realized that he needed more STR's to come to any kind of tentative conclusion.

The six repeats that grabbed my attention were found in Durness, the Shetlands, the Orkneys, the Western Isles and Morpeth, all of which were occupied by Norse Vikings. Three other locations which made sense to me were Uttoxeter, Sowerby and Faversham. Uttoxeter was part of a Norman fiefdom; the Normans, after the conquest, first set up their homes in and around Faversham; and Sowerby is in North Yorkshire. (Angle and Danish blood was expunged by William in 1069-1070 in his Harrying of the North; he then tried to replace the population with Normans, who would have none of it, eventually using Bretons and Saxon serfs.)

Norse DNA is R-Z282 and subclades (aka Young or New Scandinavian), while the DNA I was looking for is R-Z93>YP5585>YP5578, which is mutually exclusive of Scandinavian.

While it would appear on the basis of those six STR's alone that my DNA of interest was Scandinavian, boring down into the full sequence it turns out that it wasn't.  Also, Scandinavian R1a1 DNA is YCA II 19/21, whereas R-Z93>YP5585 is YCA II 19/23, along with Slavs, Hindus and Ashkenazi.

Conclusion: Not Norse Viking as one might conclude from Capelli.

Turns out it is Eurasian
