Does my DNA evidence ‘prove’ New England Dewey settlers are directly related to Dewey family in England?

+9 votes
740 views

In February last year I raised a G2G question (1529101) on endogamy and DNA matches, but my results were considered to be lacking in specific details. Since then, I have extended my research by looking for false positives (and found none) and examples of intermarrying (and found several). I have presented precise details of the DNA match results in what I hope is a clearer format; I’ve summarised my research in a ‘Case Study’ document on the specifics of my DNA results and my conclusions, which can be found here:- https://www.wikitree.com/index.php?title=Space:Thomas_Dewey_Revisions:_Draft_and_Discussion.

*** Putting my .pdf onto a 'Space' has not worked; please try here:-

https://www.deweywiltshireroots.org.uk/docs/CaseStudyDistantCousinDNA_Q415.pdf

Very specific, narrowly targeted searches were made, looking for descendants of Thomas Dewey The Settler and John Moore. For example, a search was made on my uncle’s DNA dataset looking for individuals with ‘Dewey’ and ‘Westfield MA’ in their trees. I believe that this technique could be used by other researchers to find distant cousins.

in Genealogy Help by Terry Dewey G2G2 (2.0k points)
edited by Terry Dewey

Terry, from other discussions, my understanding is that what you find convincing is that you have so many DNA matches to descendants of the American Colonial Dewey family that it can only be explained by shared inherited DNA from Thomas Dewye Snr (bef.1577-1636) (your documented direct ancestor). 

As others have pointed out we don't really use DNA to "prove" ancestors on Wikitree without a paper trail, and that Y-DNA is usually the type of testing used for distant ancestors.  

Perhaps another way to ask the question would be, what else could explain your Ancestry DNA search results?

Your understanding is correct, but it is not just the shared inherited DNA from Thomas Dewye, there is also that from his wife Mary Moore.  Section 3.2 of my case study details the search for ‘Moore’ + ‘Windsor’ whereby I have found 36 matches over a range of 8 to 36cM; the pattern of results is very similar to that of my search on ‘Dewey’ + ‘Westfield’, e.g. I have more matches than my son, and we have a lot of ‘double matches’, but my uncle only has 1, a ‘triple match’.  I feel that having 2 related, but independent targets giving similar patterns of results supports the validity of the technique.

The question you propose is certainly very relevant when the one word answer to my question is ‘No’.  However, if the answer is ‘Yes’ then there is another, very contentious, question to ask, ‘why does WikiTree not make more use of DNA?’  DNA  testing capability has moved on a lot over the last 5 to 10 years, perhaps time for a reevaluation?

5 Answers

+6 votes
 
Best answer
As a disclaimer, I do not in any way claim any expertise in DNA analysis, but I do have a thought that may be relevant here and that I don't think you've considered.

When you're looking at a connection this far back, even if all the testers are, in fact, descended from one person, there is some chance that you share other common ancestors as well, which could also give you a higher percentage match.  At the 10th cousin level, for instance, each person would have some 2,000 possible distinct ancestors. If many of both cousins' ancestors were in the same geographic area, the odds that you share multiple ancestors grow dramatically.

For example, there are several Wikitree cousins I've run across who share early New England ancestry. We will normally match on not just one immigrant couple, but several. I haven't done a full DNA match analysis with these people, but I suspect we would share far more common DNA than if we only shared a single ancestor at that genetic distance.

Obviously, I can't say if this affects your results, but I think it would be worth looking into how many other common ancestors (and possibly closer ancestors) you may share with your Dewey matches.
by Ashley Jones G2G6 Mach 2 (20.9k points)
selected by Jody Nave
+18 votes
Only Y-DNA would prove a connection that far back unless there is documentation proving the connections.
by Monica Pendleton G2G3 (3.3k points)
edited by Monica Pendleton
My apologies, you obviously were not able to access my Case Study document; please try the alternative link.  Section 3.1 details a match for Andrew Wilhelmi to my uncle, my son and to me - a 'triangulation' match!  Ancestry predict the relationship as ~8th cousin; they obviously believe their detected match does go back that far.
Terry, a DNA match to your uncle, your son, and yourself is not a triangulation. The match to your son is essentially the same as the match to yourself. And, if you are hypothesizing the most recent common ancestor of you and Andrew Wilhelmi is eight to 10 generations back, the matches to your uncle and yourself also are essentially the same.

A triangulation of DNA from a distant ancestor must come down through three different children of the MRCA. Your uncle, your son, and you all descend from the same child of the MRCA.

In addition, all the matches mentioned in your case study are quite small, with the match to Andrew Wilhelmi being only 9 cM. The chances of that being a false positive are rather high.

However, all this is immaterial for WikiTree because we do not "prove" relationships with DNA. On WikiTree DNA can only confirm a paper trail and you don't seem to have one.
OK, let's refer to it as a triple match then, rather than a triangulation; one advantage of using my son’s DNA dataset is as confirmation that the process of ‘sample collection/test of sample/present results’ by Ancestry has performed as it should; the results show that it has, for both of us.  My son’s data also shows that the ‘pattern’ of results is as expected; for example, from section 3.1 of my case study I have 15 matches, my son only has 7, about half of mine, as expected.  More interesting, we both have 4 of the same matches (4 ‘double’ matches); whereas my uncle has 13 matches, but he only has 2 double matches with me, i.e. he has almost the same number of matches as me, but his DNA is very different to mine, as expected.  So your statement “the matches to your uncle and yourself also are essentially the same” is not correct; my father and his brother could inherit very different DNA from their father, so it would be expected that the DNA I inherited from my father could be very different to my uncle’s.  The 35 matches detailed in my case study range from 8 to 17cM.  I believe Ancestry’s detection capability is a lot better than you suggest, mainly because they discard any result <8cM.  But, even if we assume their false positive rate is 1 in 2 for the 8 to 17cM band, then the probability of all 35 matches being false is about 1 in a billion! Technically only 1 ‘true positive’ is enough to prove my claim.

I am well aware of WikiTree’s restriction on usage of DNA results, but my question was seeking views from the DNA experts on WikiTree as to the validity of my DNA research.  The reason being is that WikiTree researchers could be encouraged to use the same technique to find potential distant cousins; once specific individuals are ‘found’, they could then cooperate with them to start at both ends of a paper trail and hopefully meet  somewhere in the middle.
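For anyone who wants to check the "all 35 matches false" arithmetic, here is a minimal sketch. It assumes, as the estimate itself does, that each match is an independent event with a 1-in-2 false-positive rate; the function name is made up for illustration. Because several of the 35 are double or triple matches within one family, independence is a simplification and the true figure would be less extreme.

```python
from fractions import Fraction


def prob_all_false(n_matches: int, false_positive_rate: Fraction) -> Fraction:
    """Probability that every match is a false positive, assuming independence."""
    return false_positive_rate ** n_matches


# 35 matches, each assumed to have a 1-in-2 chance of being false
p = prob_all_false(35, Fraction(1, 2))
print(f"1 in {p.denominator:,}")
```

Under those assumptions the result is 1 in 2^35, which is rarer than one in a billion, so the round figure quoted is on the cautious side.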
+9 votes
Terry, did you have Y 700 DNA done through FTDNA? If you used Ancestry, it is autosomal DNA and you can likely find matches to about the 5th generation. And even with Y 700 DNA there would still need to be additional matches, particularly if you have no strong paper trail.

You might want to check into FTDNA and see about having your Y 700 DNA done, then see if there is a Dewey Y DNA study in which a rather skilled person can 'categorize' your DNA with other Dewey men.

I paid to have my brother's Y 700 DNA run several years ago. At that time Donn Devine (a certified genealogist and a co-author on the ethics of genetics in genealogical research) organized my brother's DNA with several others who emigrated from England to CT and MA. They all had a common ancestor. After a long and rigorous paper trail (and I am still working on more paper trail information), we found that my brother and at least one other WikiTreer who has the same Y 700 haplogroup descend from the Baldwins of Buckinghamshire which goes back to 1500. Again, this would not have been found with autosomal DNA given the roughly 5 generation guidelines. It took Y 700 AND a paper trail that I and others have worked years on.

Donn also explained to me that he took on the Baldwin name on FTDNA because his wife was a Pennsylvania Baldwin. It appears as though the Baldwin men of PA have a differing Y 700 DNA haplogroup. What is further interesting about Y 700 is that it gives migration patterns that occur way before the onset of surnames. So, 1) your Dewey ancestors could come from one area of England while another group of Deweys could come from a second. 2) Even though the names may be the same, the haplogroups and migration patterns could differ. When it comes to genealogy, as the others have said, one cannot use DNA as 'proof.' The 'proof' comes in tracking down proper sources for your particular Dewey and DNA can only assist.
by Carol Baldwin G2G Astronaut (1.2m points)
edited by Carol Baldwin
I used 23AndMe for my Y-DNA test, but the matches I have found, as detailed in my Case Study document, cannot be achieved by using Y-DNA.  However, I have used Y-DNA as evidence for my theory as to the origin of the Dewey line; from my research it seems likely that we came from Wales, and moved to Wiltshire around the year 1100.  My haplogroup (R-M269) is shared by 80% of Welsh men.  For details see my website:- https://www.deweywiltshireroots.org.uk

Y-DNA has restricted capabilities because it only traces back males along the single paternal line.  If you go back 5 generations, then there are 63 ancestors, of which only 5 can be found by Y-DNA.  Also there is no indication of how far back in genealogical time the match is, it could be 3 or 300 generations.  So to use Y-DNA to find the MRCA, another reference must be used, e.g. a fully referenced paper trail.

Ancestry’s autosomal DNA test, on the other hand, can potentially find any ancestor, male or female, with cM data indicating (and this is important for my research) as to how far back the MRCA couple are.  My search for TDTS’s descendants gave me 35 matches in the 8-17cM range, i.e. all beyond 8th cousins; I know that the MRCA couple must have been living in England sometime between 1600 and 1640 because that is when the Dewey line split, with TDTS emigrating to America and the rest of the Dewey line staying in England.  This would give living descendants of TDTS as being about 10th cousins.  According to the ‘Shared cM Project’ on https://thegeneticgenealogist.com the range for an 8th cousin is 0-42cM, with an average of 11.  So my 35 matches are consistent with being about 10th cousins.  As detailed in my case study, the match of Andrew Wilhelmi is ‘special’ as he is a match to me, my uncle and my son; it is probably not a coincidence that Andrew has at least two DNA boosting events due to intermarrying, one involving Abigail Dewey-66 and the other Abigail Dewey-2745.

I used Ancestry’s search app, with very specific filters, to find my 15 matches out of 30,000 distant cousin matches, my son’s 7 out of 19,000 and my uncle’s 13 out of 22,000.  I cannot believe that all 35 DNA matches can be ‘false’.

I have used Ancestry to find near cousins out to 7 generations.  Currently I have about 120, ranging from 2nd to 6th cousins based on Ancestry prediction and my confirmation.

I have found only 1 DNA match having the surname Dewey, a distant cousin (5th to 8th); I messaged him twice in 2019, but received no reply, his profile on Ancestry shows “Last logged on over 1 year ago”.  I tried to contact him indirectly but no success, yet!
23andme’s “YDNA-test” is not the same thing as the y-DNA testing we are suggesting. It can be useful to know your y-haplogroup, so it’s not quite like apples and oranges, but essentially they are not comparable.
Terry, just a relatively quick comment and I will be done with this G2G thread as I am in Mexico getting ready to teach.

I had my mother's, brother's and kid sister's autosomal DNA done at Ancestry, 23andMe and My Heritage. It is still autosomal DNA. You get a bit of information from both parents and this DNA is only useful for about 4-5 generations. The results for each of us with each of the companies are equivalent. I seem to recall someone telling me at the time (about 2012) that two of these companies used the SAME company to run DNA. No matter what, all three companies did autosomal DNA.

Your R-M269 haplogroup is shared by roughly 80% of ALL Western European men (not just men in Wales, which is part of Western Europe). My brother's haplogroup is R-M269 and our male line came from Buckinghamshire, England. I mentioned in an earlier post that I paid for Y-111, the Y-500, then Y-700 when it became available to 'drill down' further into his haplogroup, which led to common ancestry with several Baldwin men on WikiTree. R-M269 only tells you that your male line is Western European.

As to finding one Dewey, it could be that this is the only Dewey who signed up and paid to have their DNA done. We have been encouraging Baldwin relatives to have their Y DNA done at a higher level than Y67.

The genetic genealogist site can be very helpful. What I found even more helpful was to take a semester-long genetic genealogy course through our local genealogical society. If Blaine Bettinger ever offers a seminar where you live, sign up quickly. I had this opportunity in 2020 and he was awesome and I learned a great deal about the pros and cons of autosomal, mitochondrial, Y and so much more. Blaine is also very good at explaining the cut off he uses for 'cousinhood', particularly with autosomal DNA.

The important thing is that you are working very diligently on your Dewey ancestral quest and my sincere best wishes to you in this endeavor. Ancestry/genealogy can be a love-hate relationship some days!

I knew I'd hit the character limit. <grumble grumble>

Part 1

Hi again, Terry. I attempted a "quick" reply to your question several days ago, but it was atrocious and I immediately hid it. I found myself trying to encapsulate in a couple of paragraphs some of the discussion we had here on G2G back in January 2023. I failed at that, so would ask those interested in the background to review that content.

In particular, I believe the terminology used in a 2020 paper by Mathieson and Scally is relevant. See Mathieson, Iain, and Aylwyn Scally. "What Is Ancestry?" PLOS Genetics 16, no. 3 (March 9, 2020): e1008624. https://doi.org/10.1371/journal.pgen.1008624. This describes three categories that are not the same things, but which do, among at least recent generations, overlap like a Venn diagram: genealogical ancestry, genetic ancestry, and genetic similarity. I like that term "genetic similarity" because I believe it connotes that we have all been genetically admixed for hundreds and thousands of years (most of the world's population displays between 1% and 2% Neanderthal DNA, for example), and it also seems to imply a sliding continuum rather than specific demarcations: we can be more similar or less similar, but we're still similar. And the farther we go back in time where meiotic mechanisms like independent assortment, crossover interference, genetic linkage, and linkage disequilibrium can do their things, the less likely we are to be able to accurately, causally ascribe a given DNA segment to a specific, identifiable ancestor.

This tripartite description was reinforced by publications in 2022 by Anna Lewis, et al., and Graham Coop. See Lewis, Anna C. F., et al. "Getting Genetic Ancestry Right for Science and Society." Science 376, no. 6590 (April 15, 2022): 250–52. https://doi.org/10.1126/science.abm7530; and Coop, Graham. "Genetic Similarity and Ancestry Groups." Individually published, Center for Population Biology and Department of Evolution and Ecology, University of California, Davis (July 2022). https://gcbias.org/2022/07/12/genetic-ancestry-groups-and-genetic-similarity/.

AncestryDNA themselves echo the conundrum in their support article, "How do we find DNA matches that are meaningful in genealogy?":

"But there are other reasons why two people's DNA could be identical. After all, the genomes of any two humans are 99.9 percent identical. (And the genome of a human is 50 percent identical to the genome of a banana.) Pieces of DNA could be identical between two people because they are both human, because they are of the same ethnicity or come from the same region, because they share some other more ancient shared history, or other reasons. We call these identical pieces of DNA identical by state (IBS), because the DNA is identical for a reason other than having a recent shared common ancestor."

Let's start there. You wrote:

"Ancestry's autosomal DNA test, on the other hand, can potentially find any ancestor, male or female, with cM data indicating (and this is important for my research) as to how far back the MRCA couple are."

That's not completely accurate. Ancestry themselves use the level of 5th cousins and farther to designate "distant cousins." Note this description from Ancestry's support article, "AncestryDNA® Match Categories":

4th cousin and more distant

Shared centimorgan range: 6–65 centimorgans

Enough DNA is shared with closer relatives that relationships can be determined with a higher degree of accuracy. But because we don't necessarily inherit DNA from ancestors in the exact percentages one might expect (25% from each grandparent, 12.5% from each great-grandparent, and so forth), and because our cousins don't receive exactly the same DNA as we do from our common ancestors, determining exact relationships via DNA becomes less feasible the more distant the genealogical relationship is. Percentages of DNA shared between relatives at the 4th cousin level and beyond may signify any number of distant relationships, but the genealogical relationships are unlikely to be closer than six degrees from the test taker.

Some additional items from the most recent "AncestryDNA Matching White Paper" (Ball, Catherine A., et al. "AncestryDNA Matching White Paper." Ancestry.com White Paper, last published version 15 July 2020. https://www.ancestrycdn.com/support/us/2020/07/2020whitepaper.pdf):

1.4. Assessing informativeness of matches for relationship estimation

In practice, however, the IBD we detect may reflect other factors, such as selective pressures (Albrechtsen et al., 2010), or more distant shared genealogy, in which case this IBD will confound the relationship estimates. An additional consideration is that since shorter IBD segments are difficult to identify accurately, a large proportion of shorter IBD segments that we detect could be false, and therefore could contribute errors to relationship estimation.

The unique Ancestry algorithm called Timber attempts to help in that regard, but the white paper states: "Timber improves relationship estimates for more distant relatives, such as 5th or 6th cousins, by downweighting the evidence from regions that are less likely to be informative of close relationships." Ancestry makes no claims about genealogical relevance beyond that 5th or 6th cousin level.

The results of the Timber match-culling can also be seen in AncestryDNA's presentation of matching statistics (see Table 1 at "Should Other Family Members Test with AncestryDNA®?"). Here's a quick summary of some of those numbers when compared to results published by Brenna Henn, et al. (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034267) and by the Williams Lab at Cornell University (https://hapi-dna.org/2020/11/how-often-do-two-relatives-share-dna-2/). The percentages indicate the likelihood of two cousins sharing any detectable DNA (with the exclusion of yDNA and mtDNA):

Relationship    AncestryDNA    Henn, et al.    Williams
5th cousins     32%            14.9%           15.9%
6th cousins     11%            4.1%            4.2%
7th cousins     3.2%           1.1%            1.09%
8th cousins     0.91%          0.24%           0.286%


Only the Henn paper attempted to go as far as 10th cousins: 0.002%. But even the AncestryDNA percentage for 8th cousins means only 1 in 110 tested would present as a match. And, again, Ancestry offers no accuracy estimates for reporting at that distance. The Henn and Williams numbers at 8th cousins are reasonably consistent at 1 in 400.
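The "1 in N" figures are just the reciprocals of the percentages quoted above. A minimal sketch of that conversion follows; the dictionary and function names are illustrative, and the values are the 8th-cousin percentages as given in this post.

```python
# Percent chance that two 8th cousins share any detectable autosomal DNA,
# as quoted in this post for each source.
eighth_cousin_pct = {
    "AncestryDNA": 0.91,
    "Henn, et al.": 0.24,
    "Williams": 0.286,
}


def one_in_n(percent: float) -> float:
    """Convert a percentage likelihood into the N of a '1 in N' statement."""
    return 100.0 / percent


for source, pct in eighth_cousin_pct.items():
    print(f"{source}: {pct}% is about 1 in {round(one_in_n(pct))}")
```

The Henn and Williams values land either side of 400, which is why "1 in 400" is a fair summary of both.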

What would assist your hypothesis--since there is no chromosomal detail to evaluate (thank you, Ancestry)--is considering a report of shared matches: someone who appears on both Person A's list of matches and Person B's. You did this partially with your own research, but it would help if you could make a direct comparison of multiple kits via Ancestry's own reporting. However, that now requires a subscription to the new AncestryDNA Plus service, and it's informative that it will display only 4th cousins or closer. This has to do with computational load as well as implied accuracy. At the level of 4th cousins, mutually shared matches will generally be relevant. That some of those on your lists show under all three of the test kits considered (you, your son, and your uncle) is not a direct indication that any specific segments are actually matching. We simply cannot tell.

On that front, you noted, "I used Ancestry's search app, with very specific filters, to find my 15 matches out of 30,000 distant cousin matches, my son's 7 out of 19,000 and my uncle's 13 out of 22,000. I cannot believe that all 35 DNA matches can be 'false'."

I'm assuming that you investigated the Ancestry trees for each of those matches to determine whether or not they were relevant to your hypothesis. In which case, your supporting criteria are dependent upon each match having a genealogically accurate family tree on Ancestry. There are no good data on the percentage of accurate, well-documented public trees on Ancestry, but I imagine we can all guess at a number for ourselves.

My assumption that you filtered out all matching results that did not indicate a relationship back to "TDTS" comes from me repeating, against my own AncestryDNA results, the searches you described in your PDF report. My results were interesting.

Part 2

I have no instance of the surname Dewey in my tree, but all 16 2g-grandparents have roots in the British Isles, and most of those lines were in North America by the first half of the 18th century.

Using the exact surname "Dewey" and a birth location of Westfield, Massachusetts, I get 10 4th-6th cousins, and 75 5th-8th cousins...over five times the total number of matches that your own DNA comparison garnered.

I do have the surname Moore in my tree; none dated earlier than 1830, and none that I know of as being connected to John Moore and his line. Running that search with a Windsor, Connecticut, birthplace yielded 5 2nd-3rd cousins, 3 3rd-4th cousins, and a staggering 536 4th-6th cousins. I didn't bother importing the 5th-8th cousin list into Excel, but at that point the individual sharing amounts hadn't dropped below 20cM yet.

Regarding that 20cM amount, I believe we need to be clear that our inexpensive microarray tests simply cannot be 100% accurate. In no small part because the chip technology itself advertises call rates of > 99%. When a small segment comparison may rely on only a few hundred tested markers, a call-rate failure of 1% is statistically significant. And as many as 19% of those tested markers are targeted for clinical/pharmacological purposes, most of which have little applicability to genealogy.

Plus, there is empirical evidence that even AncestryDNA's interpretation methods are wrong occasionally. Well-known genetic genealogist and honorary Research Fellow in the Department of Genetics, Evolution and Environment at University College London, Debbie Kennett, has the admirable advantage of having her own DNA and that of both parents tested. She can do actual trio-phasing comparisons. She's written that she has "three matches at AncestryDNA over 15 cM which don't match either of my parents. They share respectively 19 cM, 24 cM and 25 cM." This performance rate is much better than at other testing companies, but that segments at Ancestry of 19-25cM can be false-positives should provide a lens for viewing results regarding smaller segments.

Similarly, we can't really use Blaine Bettinger's Shared cM Project as a de facto scientific evaluation of DNA sharing ranges. It is a crowd-sourced, self-reported, and unvetted set of data. There is no way to validate which submitted data are correct and which are not. If you read the full report PDF, you see that Blaine himself highlights the issues with user-provided data. Blaine does the best he can to attempt to normalize the data, but it's by an indiscriminate, brute-force approach: he removes 0.5% of the reported submissions from both low and high ends of the centiMorgan values for each relationship.

If you look at the provided histograms, you'll see how rapidly they begin to diverge from a Gaussian distribution as the relationships grow more distant, which we would otherwise expect to see with a large enough--and accurate--sample size.

Also important to note is that Blaine attempts to offer an estimate of standard deviation only out to his "Grouping 10," where 4C1R is the most distant relationship. The data starts to become too unreliable after that point to make an attempt at SD.

It should also be noted that all the averages reported beyond 2C1R need to be taken with a grain of salt. By 3C we reach a point where roughly 8% of cousins will share no detectable DNA between them. One of the core issues with these kinds of crowd-sourced data is that they will always be underreported on the low side. The actual averages will be lower than presented.

That's why I always start with the simple Coefficient of Relationship numbers as a baseline in order to evaluate how much a reported amount might be skewed. By the CoR, 8th cousins would share on average 0.0008% of their DNA; 9th cousins 0.0002%; 10th cousins 0.00005%. For a 6800cM calculated genome, we'd be talking 0.05cM, 0.01cM, and 0.003cM respectively.
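For readers who want to reproduce those baseline numbers, here is a minimal sketch. It assumes the standard coefficient of relationship for full n-th cousins with no pedigree collapse, 2^-(2n+1), and the 6800cM genome length mentioned above; the function and constant names are illustrative and not taken from any tool.

```python
GENOME_CM = 6800  # calculated genome length used in the post


def cousin_cor(degree: int) -> float:
    """Coefficient of relationship for full n-th cousins, assuming two shared
    ancestors (a couple) and no pedigree collapse: 2 ** -(2n + 1)."""
    return 2.0 ** -(2 * degree + 1)


for n in (8, 9, 10):
    r = cousin_cor(n)
    print(f"{n}th cousins: {r * 100:.5f}% of DNA, about {r * GENOME_CM:.3f} cM")
```

As a sanity check on the formula, first cousins come out at 12.5%, the familiar textbook value.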

Following on that last point, we also need to keep in mind that, as I described in our 2023 conversation, the genetic effects of pedigree collapse dilute rapidly once the collapse ceases. If it did not, truly endogamous populations like the Rapa Nui or Ashkenazim would have been so severely affected by the lack of genetic diversity that they might not even have endured. I'll reference again my extreme--and admittedly rather silly--example from 2021 when I broke down what happened genetically with the deeply inbred Lannister family from Game of Thrones.

Jaime and Cersei Lannister were twins who had children. If the children and subsequently their children didn't again inbreed, Cersei's and Jaime's 2g-grandchildren would be down to the genetic difference between double 3rd cousins and regular 3rd cousins, and a distinction of about 106-117cM versus 53-59cM. Not insignificant, but we're nudging a level where the amount of shared DNA can't readily distinguish between the pedigree collapse scenario and one where none of the parents were genetically related.

The Dewey/Moore hypotheses evidently range from 8th to 10th cousins, or 9 to 11 generations ago. At 11 generations, two full siblings could have children together and if the pedigree collapse stopped there, by 7 generations ago there would be little or no genetic evidence of it. (By the way, "If you go back 5 generations, then there are 63 ancestors..." Actually, speaking genealogically and not genetically, at 5 generations we would have 32 ancestors, not 63; the simple formula to calculate the potential number of ancestors at any generation is 2^k where k is the number of generations; self is always generation 0.)
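The ancestor-count formula is easy to check with a few lines. This is a minimal sketch with illustrative function names; it counts pedigree slots, not distinct people, so pedigree collapse would reduce the real numbers. The reading that the quoted 63 comes from adding up every slot from self through generation 5 is an assumption, not something stated in the thread.

```python
def ancestors_at(k: int) -> int:
    """Potential ancestors in generation k alone (self is generation 0)."""
    return 2 ** k


def slots_through(k: int, include_self: bool = False) -> int:
    """Total pedigree slots from generation 1 through k, optionally plus self."""
    total = sum(ancestors_at(g) for g in range(1, k + 1))
    return total + 1 if include_self else total


print(ancestors_at(5))                       # generation 5 only
print(slots_through(5))                      # generations 1 to 5 combined
print(slots_through(5, include_self=True))   # same, counting self as well
print(ancestors_at(11))                      # the generation shared by 10th cousins
```

Generation 11 gives 2048 slots, which matches the "some 2,000 possible distinct ancestors" mentioned elsewhere in this thread for 10th cousins.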

That puts us back to the distinctions among genealogical ancestry, genetic ancestry, and genetic similarity. At AncestryDNA I show 85 matches to "Dewey" and "Westfield, Massachusetts" not because I have potentially identifiable ancestors that match those criteria, but because my roots are also in the British Isles and those 85 matches and I (assuming all are physically valid segments, which at least 90% probably are) carry chunks of DNA that have been passed down via regional, local, and even tribal/clan populations from many, many generations ago.

Without directly analyzing the segment detail (and preferably the raw data themselves) I can't make any assumptions about a shared, identifiable ancestor within the genealogical timeframe. And even deeper analysis may prove inconclusive: it may offer only a little more or a little less weight to the information as possible evidence. As broad-stroke examples, the closer the shared segment is to the ends of the chromosome, the more likely the crossover events were somewhat recent, making attribution potentially possible; if I've mapped my probable haplotypic pile-up regions and the segment falls into one of those, it's most likely from a much older source where attribution won't be possible; if the segment includes very few matching SNPs (or SNP mismatches), the more likely it's a false-positive; if the segment is small and spans an area of the chromosome where protein-coding genes are densely clustered, the more likely the match is genealogically irrelevant.

Last up for today: "DNA testing capability has moved on a lot over the last 5 to 10 years, perhaps time for a reevaluation?"

That is a correct statement. But interestingly enough, it's true far more for Y chromosome testing than for our typical autosomal tests...which technology hasn't appreciably changed in well over a decade. The microarray was invented--well, first published--in 1991 by Stephen Fodor and colleagues. They were with the Affymax Research Institute which, not coincidentally, became the origin of the name of one of the first microarray chips, the Affymetrix GeneChip (more rabbit-hole diving: Affymetrix is now Applied Biosystems, a brand of DNA microarray products sold by Thermo Fisher Scientific after they acquired Affymetrix; Living DNA is the only major genealogy testing company that today uses a version of the Affymetrix chip).

Speaking of over a decade, our autosomal DNA results are still being compared based on the GRCh37 reference genome, the last iteration of which was published in 2013. Even GRCh38 was scheduled to be replaced by GRCh39 a year ago, but the Genome Reference Consortium has deferred that with the possibility of moving to a pangenome model rather than a single reference (the majority of which, by the way, is from one man who lived in Buffalo, New York, years ago and responded to a newspaper ad about DNA testing). There are known errors and omissions in GRCh37 (which can affect cM calculation, among other things), but it's impractical (read: costly) for our DNA testing companies to switch to a different reference model. For more information about GRCh37 vs. GRCh38, see Guo, Yan, et al. "Improvements and Impacts of GRCh38 Human Reference on High Throughput Sequencing Data Analysis." Genomics 109, no. 2 (March 1, 2017): 83–90. https://doi.org/10.1016/j.ygeno.2017.01.005.

The advent of direct-to-consumer next generation sequencing of the Y chromosome--and specifically the FTDNA Big Y test--has meant a sea-change in how we can evaluate and use yDNA information. My first commercial yDNA test was back when 12 STR markers was state of the art. In the intervening 23 years I've upgraded every time a new offering came out, including when SNP panel testing was first offered. That, unfortunately for my pocketbook, added up to a lot of incremental dollars (I've done a total of nine yDNA tests at FTDNA).

Today, and most especially in the world's most tested subclade of the yDNA haplotree, R-M269, deep NGS testing allows us to use reliable TMRCA predictions as tightly as about 83 years, or roughly 2.5 generations if we use an average generation interval of 32 years. The 23andMe, Living DNA, and now FTDNA yDNA haplogroup report as derived from autosomal testing isn't really a for-purpose yDNA test. At best, it can determine a defining SNP somewhere fairly high in the haplotree, meaning a quite old date of first appearance. My own branch on the haplotree is currently 14 levels deeper than R-M269.

The Dewey/Moore hypotheses may very well be spot on. But in the lab, to attempt to avoid confirmation bias, it's incumbent upon us to actively seek to disprove the hypothesis. There, and in the Genealogical Proof Standard, that effort extends to objectively determining the merit and strength of the evidentiary information.

My personal opinion--and that's all it is, my opinion--is that in and of itself the AncestryDNA information is insufficient to support definitive identification of common ancestors who date back to the beginning of the 17th century.

Edited: Gave the Genome Reference Consortium an incorrect name. I can't live with that, so I had to correct it. :-)

Edison, thank you for the very informative analysis.  

Am I correct in saying that although these matches can't be proven to be from a given ancestor (or are even unlikely to be), distant matches still might be IBD segments?  While they can't be used as evidence of a relationship, one could use them as a way to target research.

For example, before I broke through my Johnson brick wall, I investigated any DNA match listing Johnson as a surname.  I was looking for a needle in a haystack, and the DNA matches helped focus my search.  Ultimately, none of them panned out, and I ended up making the connection through traditional research.  However, eventually, DNA matches for the correct line did show up, so had my other research not been fruitful, DNA clues would have led me there as well.

"Am I correct in saying that although these matches can't be proven to be from a given ancestor (or are even unlikely to be), distant matches still might be IBD segments?"

Oh, absolutely. Unfortunately, we work with rather slack definitions surrounding "IBD." At least slack when it comes to genealogy. Because the only qualifier for IBD is that it is "a copy of a single piece of DNA [from] some ancestral individual…" (statement from Elizabeth Thompson, Genetics 194:2; 2013). The ISOGG wiki says: "Identical by descent (IBD) is a term used in genetic genealogy to describe a matching segment of DNA shared by two or more people that has been inherited from a common ancestor without any intervening recombination."

That's great as far as it goes, but what's lacking for genealogical purposes, in my opinion, is a distinction between "some ancestral individual" and a specific, identifiable ancestor. From Henden, et al., PLoS Genetics 14:5 (May 2018):

"Two alleles are identical by state (IBS) if they have the same nucleotide sequence. These alleles can be further classified as identical by descent (IBD) if they have been inherited from a common ancestor. While a genomic region that is IBD must also be IBS, the converse of this statement is not true."

It's like an "all squares are rectangles, but not all rectangles are squares" thing.

IBD can technically include those numerous instances where small(ish) segments match due to broad genetic patterns derived from specific populations: they did originate with some ancestor at some point in time, but there is no realistic chance of our ever being able to identify that specific ancestor. That autochthonous ancestor might have lived a thousand or two thousand years ago and chunks of his or her DNA then spread broadly across a local or even regional population.

We can see population geneticists readily referring to segments as IBD that, for all practical purposes, are not useful for genealogy at all. Goes back to the example of Europeans carrying approximately 2% identifiable Neanderthal DNA. A chunk of Celtic DNA from some Chalcolithic Period Briton may be present in many people with British Isles roots to this day. It can be determined to be IBD, but we'll never be able to identify the individual source of the segment.

All valid autosomal and xDNA segments--meaning that they are, in fact, a continuous, unbroken series of matching nucleotides inherited from either the paternal or maternal haploid chromosome--are IBS (Identical by State).

Complicating the picture, years ago we had some non-scientist pundits coin the term "identical by chance," IBC. A truly unfortunate phrase, in my opinion. The inherent problem with "by chance" is that de novo nucleotide mutations happen infrequently. By infrequently I specifically mean at a per-generation frequency of about 1×10⁻⁸ (Mohiuddin, et al., Frontiers in Genetics, 13, 2022). Other than large structural deviations which carry serious clinical impact (e.g., trisomy), our DNA, for practical purposes, never happens "by chance." The only place I could have received my DNA was from my parents, and the only place they could have received their DNA was from their parents.

So an actual by-chance matching of segments should occur very, very seldom in genealogy, especially since our microarray tests are looking at a subset of only about 650K SNPs selected, in part, for their value as either an AIM (Ancestry Informative Marker) or for population or clinical purposes. If we apply that 10⁻⁸ rate to the 650,000 SNPs in our microarray tests overall, the probability per generation of seeing an autosomal de novo mutation in the results would be 0.0065: we'd be waiting about 154 generations to see one. In practice it would be more frequent than that since some SNPs are chosen specifically because of increased mutation rates, but still; not like the frequency would be every couple of dozen generations.
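For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It uses only the two figures quoted above (a rate of about 1×10⁻⁸ and roughly 650,000 SNPs); the variable names are purely illustrative, and it treats every SNP as having the same rate, which the paragraph above already notes is a simplification.

```python
# Back-of-the-envelope check of the de novo mutation estimate.
# Both inputs are the approximate figures quoted in the post, not measured values.
MUTATION_RATE = 1e-8      # per nucleotide, per generation
SNPS_ON_CHIP = 650_000    # approximate SNP count on a typical microarray

# Expected number of tested SNPs carrying a de novo mutation in one generation.
expected_per_generation = MUTATION_RATE * SNPS_ON_CHIP

# Average number of generations we would wait to see one such mutation.
generations_per_mutation = 1 / expected_per_generation

print(f"Expected de novo mutations per generation: {expected_per_generation:.4f}")
print(f"Generations per expected mutation: {generations_per_mutation:.0f}")
```

Changing either constant shows how sensitive the "154 generations" figure is to the assumed rate and chip size.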

In other words, I don't believe there is any value in the term "identical by chance" because what is really implied is "identical by error"…whether that error is via actual test results, digitization/interpretation of the results, or an error or omission in one or more algorithms that are attempting to identify matching segments. I'm sure the testing companies' marketing departments prefer the term "identical by chance" to "identical by error," though.

Different continental-level and subpopulations will have differing likelihoods of allele frequencies (this references that a SNP, by definition, is biallelic and has identified major and minor allele frequencies) when we get below some high percentage value of the major alleles in the global population. And for genealogy what I think we're after in most instances isn't simply IBD vs. IBS. It's something that might more appropriately be called Identical by Descent and Identifiable Pedigree, or maybe just Identical by Identifiable Pedigree--i.e., evidence strong enough to have a reasonable probability of identifying a specific ancestor or ancestral couple from whom a segment was inherited.

The trick is determining when an IBD segment can be classified as also being IBIP. There's no magic formula that I know of, and it certainly isn't the often-used threshold of 7cM. Not only has previous work indicated that segments that size will be false-positive over half the time, but considering the degree with which the testing and reporting companies differ on the calculation of segment size when using exactly the same raw data, 7cM can fall into the realm of being just a rounding error.

An oft-quoted 2014 study by Speed and Balding (Nature Reviews Genetics, 16: 33-44) gives us at least some initial insight by estimating the origin age by segment size. They found that approximately 40% of 20 Mbp segments (Mbp is mega-base-pairs, or millions of base pairs; 20 Mbp would, very roughly, equate to about 20cM) date back beyond 10 generations. In other words, at a centiMorgan level about where we might expect to find 4th cousins, the segment might actually have come to us down the inheritance chain from a source not our 3g-grandparents, but as far back as our 8g- or 9g-grandparents. Additionally, fewer than 40% of 10 Mbp segments came to us from within the last 10 generations.

This is just one of the factors that make genealogical use--accurate genealogical use--of autosomal DNA very unlikely beyond 5th cousins. In fact, in my years of messing with this stuff, I've never accepted a triangulation beyond 4C1R.

I would say that, in general, I'm fairly comfortable accepting segment sizes of over 20cM to be bankable. But then again when I ran a comparison of two 2nd cousins who both tested at Ancestry at the same time and then uploaded the raw DNA results elsewhere, I found some discrepancies that could bring even 20cM into question. The most extreme example from that little experiment was, for the same segment on Chr 9, having MyHeritage evaluate it as 24.9cM, GEDmatch showing it as 16.3cM, and FTDNA calling it as 7.58cM.
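To put a number on that discrepancy, here is a small Python sketch using only the three values quoted above for the Chr 9 segment. The function names and the two thresholds checked (20cM and 7cM) are chosen for illustration; nothing here reproduces how any of the companies actually calculate segment size.

```python
# cM values reported for the same Chr 9 segment, as quoted in the post.
reported_cm = {"MyHeritage": 24.9, "GEDmatch": 16.3, "FTDNA": 7.58}

def spread_ratio(values):
    """Ratio of the largest to the smallest reported size."""
    values = list(values)
    return max(values) / min(values)

def vendors_at_or_above(threshold_cm, sizes):
    """Names of the services whose reported size meets the threshold."""
    return [name for name, cm in sizes.items() if cm >= threshold_cm]

print(f"Largest / smallest: {spread_ratio(reported_cm.values()):.2f}x")
print("At least 20 cM:", vendors_at_or_above(20, reported_cm))
print("At least 7 cM:", vendors_at_or_above(7, reported_cm))
```

The same raw data clears a 20cM bar at only one of the three services, which is the point of the example.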

I think everything becomes a case-by-case instance requiring close analysis for any segment shown as being 20cM or less. It may not be as genealogically relevant as it first appears. And we can't assume we understand the inheritance chain in question unless we can nail down the genealogy not just to the hypothetical MRCA, but at least three or so generations farther back. We need reasonable assurance that the proposed MRCA is the only possible source for the DNA segment; we need to rule out that it might be present in tested individuals via different inheritance pathways.

Not an easy task.

Part 1

Edison, thank you very much for your detailed and extensive comment. I am still going through all the details, but I thought I should give you a ‘heads-up’ on my ideas as to my future investigations.

1. My search on Ancestry for ‘Dewey & Westfield’ gives me 35 matches. What evidence is there that my specific matches are invalid? It cannot be just “false positives”, because if the probability of a false positive is as high as 0.8, then the probability of all 35 being false is about 1 in 2400. If I include the 36 Moore matches as well, then the probability is 1 in just over 7 million. And I have confidence that Ancestry's ability to detect distant cousins is a lot better than a failure rate of 80%.

2. Your statement “The Henn and Williams numbers at 8th cousins are reasonably consistent at 1 in 400.” If we take the figure for finding 8th cousins as 1 in 400, then with my 30,000 distant cousins (6 to 20 cM) that gives me potentially 75 matches; I only have 15 from my specific search, which seems to me to be consistent.

3. Your statement “I get 10 4th-6th cousins, and 75 5th-8th cousins...over five times the total number of matches that your own DNA comparison garnered.” I need to investigate how that can be possible. As a starter I did Ancestry searches on ‘Williams & Westfield’: no matches! ‘Williams & Windsor’ gave 5 matches, and what is really interesting is that 2 of the matches (D.C. & Christopher Lord) I had already found by my ‘Dewey & Windsor’ search. Another 2 matches have several Deweys as well as Williams on their tree. Perhaps this is because, according to the Matthew Grant Record, there was at least one Williams family in Windsor?

4. Your statement “Actually, speaking genealogically and not genetically, at 5 generations we would have 32 ancestors, not 63”. My 63 figure refers to all of the individuals involved on the whole of the tree, i.e. it includes the subject and all of the generations, so I get 1 + 2 (2 parents, 1st generation) + 4 + 8 + 16 + 32 (32 of 3*G Grandparents, 5th generation) = 63.
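The figures in points 1, 2 and 4 above can be reproduced with a short Python sketch. The function names are illustrative only. Note that raising 0.8 to the power of the number of matches treats each match as statistically independent of the others; that is an assumption built into the calculation, not something the DNA results themselves establish.

```python
def prob_all_false(p_false, n_matches):
    """Probability that every match is a false positive,
    ASSUMING the matches are independent of one another."""
    return p_false ** n_matches

def tree_size(generations):
    """Subject plus all ancestors up to and including the given generation."""
    return sum(2 ** g for g in range(generations + 1))

# Point 1: 35 Dewey matches, then 35 Dewey + 36 Moore matches.
dewey_only = prob_all_false(0.8, 35)
dewey_and_moore = prob_all_false(0.8, 35 + 36)
print(f"35 matches all false: about 1 in {1 / dewey_only:,.0f}")
print(f"71 matches all false: about 1 in {1 / dewey_and_moore:,.0f}")

# Point 2: 30,000 distant matches at a rate of 1 in 400.
print("Potential 8th-cousin matches:", 30_000 / 400)

# Point 4: whole tree (subject included) versus the 5th generation alone.
print("Individuals in a 5-generation tree:", tree_size(5))
print("Ancestors in the 5th generation alone:", 2 ** 5)
```

The last two lines show why both 63 and 32 appear in the discussion: they count different things.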

Part 2

5. I have found a 5th cousin on my great-grandmother’s side, with a DNA match of 14cM. We have been in email contact and exchanged evidence regarding our ‘grey areas of evidence’ to the point that we are both content with our joint paper-trail. Definitely not a ‘false positive’.

6. According to the Shared cM Project, a 6th cousin has an average of 18cM; we have 6 matches of 15cM or over. Need to investigate “Coefficient of Relationship”.

7. What is the effect of endogamy? I have found a number of cases of intermarrying, but do not know the effect of boosted DNA on the probability of a match or any change to the cM figures.

8. Could the match results have been substantially improved by the fact that we 3 are on an unbroken line of Deweys up to the MRCA, and then down TDTS’s Dewey line, as well as all his other descendants? Also, each match will have some Dewey DNA and some Moore DNA, as the Early Settlers of Windsor include TDTS (whose DNA has Dewey & Moore) and John Moore, who I believe is TDTS’s uncle.

Part 4: Considerations for at least one ‘true positive’:-

9. Of the 35 DNA matches, some have public trees which between them go back to all 5 of TDTS’s children! The matches found of most interest for each child are:-

i) ‘fifeandcat’ (and ‘J.M.’, managed): 8cM, ‘McGrath-North Family Tree’ goes to TDTS’s son Jedediah b1647.

ii) ‘Murray_Milne’: 14cM, ‘Milne Family’ tree goes to TDTS’s son Israel b1645

iii) ‘Deborah Spadaro’: 11cM, ‘Kenney/Stein Family Tree’ goes to TDTS’s son Thomas b1640

iv) ‘rdecker2’: 9cM, ‘DECKER Family Tree’ goes to TDTS’s daughter Anna b1643

v) ‘Cid1956’: 9cM, ‘Davis/Black/Buckingham/Pinney Family Tree’ also goes to Thomas b1640

vi) ‘kevin_stevens_55’: 8cM, ‘Kevin Stevens Family Tree’ goes to Josiah b1641.

The trees on Ancestry are ‘light’ on sources and many have errors, as you have noted. However, it is adding to the unlikely string of coincidences that all 35 DNA matches are false positives, and all 5 trees that suggest otherwise are erroneous.

10. A couple of years ago I carried out a search on my DNA results with a simple filter of just the surname of any match as being ‘Dewey’; Ancestry came back with 4 hits: my son, my uncle, my nephew and Davis Dewey (shown as 5th - 8th cousin, 19cM). Davis only has a small tree linked to his DNA test, but he is on the ‘Thomas/Schuyler Family Tree’; this shows him as being descended from Jedediah Dewey (1647-1727), son of TDTS. Davis Dewey and I have 12 ‘Shared Matches’; all are predicted as 4th - 6th cousin, with range 22-33cM. One of Davis’s shared matches is ‘M.M.’, with the linked tree Mmatlock, which has no Dewey on it, but it does go back to Andrew Moore, the disputed son of John Moore b1614. So I match someone with the same surname as me and we share a Moore as a ‘Shared Match’. Surely this is highly likely to be a ‘true positive’?

11. The ‘special’ triple match of Andrew Wilhelmi’s is interesting because according to his tree his maternal grandmother is Mildred (Campbell) Dewey b1899, (Y-DNA would not be able to trace this!) who married a Porter Dewey b1897; 3 generations back from Porter is Roger Dewey b1785 who is on WikiTree as Dewey-1925 and Roger is 4 generations from TDTS. So, apparently, the 3 of us are a DNA match to someone 9 generations down from TDTS. Another match highly likely to be a ‘true positive’?

12. I have 30,000 ‘distant cousin’ DNA matches; of these I have filtered out 35 by name & place. If these 35 are false positives, then should it not follow that most/all of the 30,000 are false positives? And not just mine, but also other Ancestry DNA subscribers’ tens of thousands of DNA matches?

13. None of my 35 matches are closer than 6th cousin, which implies that the MRCA are at least 7 generations back from living descendants, up to a maximum of perhaps 12 generations, at which point any DNA match (even with boosting) is too small to be detected. The obvious splitting of the descendants’ paths is PGM, so the MRCA have to be TDTS’s parents or his grandparents, probably not his G-Grandparents, certainly not his 2*G-grandparents.

+6 votes

I would consider it worthwhile to upload your AncestryDNA test results to MyHeritage, FTDNA and GEDmatch and look for these identified cousins there, and for other cousins too. These three services give you segment data, and with that segment data you have the possibility of triangulating your DNA with known cousins.

Wikitree does have the capacity to record DNA triangulation and has a handy tool to help you identify matches who are an appropriate distance apart.

See Help:Triangulation
Three or more cousins need to all match each other on a single segment of DNA that is at least 7cM long. Seven cM is a bare minimum and presumes that the cousins' relationship back to their most recent common ancestor(s) is well-documented.

AncestryDNA unfortunately does not give you the tools to identify the segments you share with your cousins.
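As a rough illustration of the triangulation rule quoted above, here is a minimal Python sketch. Everything in it is hypothetical: the `Segment` type, the function names and the sample positions are made up for the example and are not the format exported by any testing company or used by the WikiTree tool. It only checks that each of the three pairs has a segment of at least 7cM and that those segments overlap on the same chromosome; it does not measure the cM length of the overlap itself (that needs a genetic map), and it does not check the paper trail the rule also presumes.

```python
from itertools import product
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: str
    start: int   # base-pair position where the shared segment begins
    end: int     # base-pair position where it ends
    cm: float    # size reported by the matching service

MIN_CM = 7.0  # bare minimum quoted from Help:Triangulation

def common_overlap(segs):
    """Return (chrom, start, end) of the region shared by all segments, or None."""
    if len({s.chrom for s in segs}) != 1:
        return None
    start = max(s.start for s in segs)
    end = min(s.end for s in segs)
    return (segs[0].chrom, start, end) if start < end else None

def triangulates(ab, ac, bc, min_cm=MIN_CM):
    """True if A-B, A-C and B-C each have a segment of at least min_cm
    and one such segment from each pair overlaps on the same chromosome."""
    big = [[s for s in pair if s.cm >= min_cm] for pair in (ab, ac, bc)]
    return any(common_overlap(combo) is not None for combo in product(*big))

# Hypothetical segment data for three cousins A, B and C.
ab = [Segment("9", 10_000_000, 30_000_000, 18.2)]
ac = [Segment("9", 15_000_000, 32_000_000, 14.0)]
bc = [Segment("9", 12_000_000, 28_000_000, 12.5)]
print(triangulates(ab, ac, bc))
```

If B and C matched each other only on a different chromosome, or only on a segment under 7cM, the function would return False even though each of them still matches A.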

by Anne Young G2G6 Mach 9 (97.4k points)
edited by Anne Young
+5 votes

You asked if the DNA evidence you have compiled proves that the New England Dewey line is related to your English Dewey line.  The answer quite simply is no.  It would be impossible to 'prove' a common ancestor of 400+ years ago using autosomal DNA.

The only way to prove a connection will be through Y-DNA testing.  A basic test (Y-37) won't work, although initially you could see if American Deweys share your extremely common R-M269.  If so, then more extensive Y-DNA tests will be necessary to narrow down the haplogroup.

Reviewing the descendants listed on Wikitree for [[Dewey-54|Thomas Dewey (abt.1613-1648)]], there are few, if any, sources (e.g. wills, land records) for the majority of the children and their children to connect them to their parents.  This leads to a possibility that the autosomal DNA matches you and your uncle have in common with American Deweys may not be from the Deweys.

by Darlene Athey-Hill G2G6 Pilot (552k points)
Darlene, I think you're understating the potential value of the Y-37 test for his situation.  For two test takers with the same surname it provides a lot more information than just haplogroup.

Terry, if you do decide to pursue Y-DNA testing (and I really hope you do), it would be best to post a new thread, or perhaps post in one of the Genetic Genealogy Facebook groups to help design a study based on your goals, budget and available testers.


...