Need help with a DNA conundrum

+7 votes
507 views
I am seeking someone willing to look at my tree and my DNA results from Ancestry.com DNA. I know enough to be dangerous, and this is out of my league. (Names have been obscured.) I have a cluster of more than 7 DNA matches who are clearly family to each other. They share DNA with my 1800s distant cousin, who was related to my 2-G grandma "E" via her maternal line. The 3 largest matches are:

1. Gee shares 205cm with me;

2. his 1c cousin Kay shares 182cm with me;

3. Gee's half-sibling H shares 132cm with me.

I thought I had these 3 placed as descendants of E's brother. However, that would make them my 3c1r, which can't be the whole story or is wholly incorrect; these numbers are too high for that. (No one in this cluster shows up in my 23andMe DNA, nor in my GEDmatch uploads of both files.) Thanks in advance for any help.
in Genealogy Help by Deb Gunther G2G6 Mach 2 (23.4k points)
edited by Deb Gunther
Deb, have those matches also uploaded to GEDmatch?  Have they tested with 23andMe?  (You say they didn't show up, but it's unclear to me whether that's because they're not there at all, or because, while there, they don't show as your matches.)
I'm not sure how to determine whether these relatives have placed data on 23andMe or GEDmatch. None of them have been responsive to my queries besides a brief acknowledgment or two. All I have to go on is their cryptic name(s) on Ancestry DNA, and in a few cases, they have attached their trees. That's how I built out their tree and found out that their one ancestor from the 1850s was in the same town as my ancestors. I suspect that man, M., may have been fathered by my 2g-grandma's brother.
I would contact them and ask them.
I have done so. They're not known relatives, and it's highly probable that their ancestor is a surprise NPE. Therefore, here we are chasing the DNA evidence.
I meant that you could ask them whether they had tested with 23andMe or uploaded to GEDmatch.  It would be useful to have the chromosome detail.  GEDmatch is free, so I wonder if you could get them to upload there.

To answer the question you asked, yes, for what it's worth I'd be willing to look at your tree and your DNA results.  Please send me a WikiTree private message if you'd like to pursue that.

I've spent a long time working with my own Ancestry DNA matches, but I will tell you from my sad history of overpromising help to other people that it is much, much harder to figure out what's happening when I have no familiarity with a family.  So I'd be happy to take a look, but may not be much help.

3 Answers

+4 votes
 
Best answer

This is a tangent--sort of--so I thought about making it a comment rather than an answer. But it doesn't seem to fit as a comment and, surprise!, it grew too long to allow it to muddle another conversation.

Valerie offered the good advice of checking the Shared cM Project tool at DNA Painter. Fairly recently, Jonny Perl, the author of DNA Painter, made the individual relationship boxes in that tool clickable in order to present a pop-up window of the histogram for that particular connection. I think those are very important to look at; Blaine Bettinger includes them in the full PDF report of the Shared cM Project, but it's easy just to consider the low-to-high value ranges presented and assume that they're all similarly accurate. The probabilities as shown from calculations by Leah Larkin (The DNA Geek) help a lot, but the data still are what they are.

The fact is, we have no experimental, peer-reviewed research to help substantiate what are and are not valid centiMorgan ranges, or even percentage sharing ranges...which would be somewhat more precise because centiMorgans can be calculated differently depending on at least three independent factors, and because the values we see from testing companies can start out calculating the whole genome differently, from a high of 7494cM to a low of 6800cM (some studies have calculated the male genome to 5618cM). If we get new information any time soon separate from the Shared cM Project, my bet is that it will come from the Williams Lab at Cornell University in the form of extended computer simulations.

But for now, the Shared cM Project is the best source there is, and even Blaine is very clear that, by its very nature, crowd-sourced data can be inaccurate and that there is no way to vet what input is and isn't valid. The reliance is on volume: with enough submissions by knowledgeable people, mistakes and errors can be sublimated because they should be far fewer by proportion.

I'm not entirely sure that's true when it comes to self-reporting of DNA and genealogy, but Blaine does take a manual route to help eliminate the most egregious problems: he removes, from the analysis of each relationship, 0.5% of submissions from the high end and 0.5% from the low end. This arrives at an approximation of a 99% confidence interval. For this last update, he also supplied an estimated standard deviation for each of the 10 major groupings he uses (down to 4C1R; he doesn't estimate farther removed than that). To me, this was just as important as Jonny making the histograms immediately available on the tool at DNA Painter. It's really rough, back-of-the-napkin stuff, but I took those standard deviations and made a table (a PDF file viewable here) by backing into tighter approximate confidence intervals of 95% and 68%.
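
For anyone wanting to reproduce that kind of back-of-the-napkin table, here is a minimal Python sketch. It assumes the sharing amounts are roughly normally distributed, which is only an approximation since the histograms for more distant relationships are visibly skewed. The mean and standard deviation in the usage line are hypothetical placeholders, not values taken from the Shared cM Project.

```python
def approx_intervals(mean_cm, sd_cm):
    """Return rough 68% and 95% ranges (mean +/- 1 SD and +/- 2 SD), floored at 0 cM."""
    def band(k):
        # Shared cM cannot be negative, so clamp the lower bound at zero.
        return (max(0.0, mean_cm - k * sd_cm), mean_cm + k * sd_cm)
    return {"68%": band(1), "95%": band(2)}

# Hypothetical inputs, for illustration only.
print(approx_intervals(48.0, 32.0))
```

Because the lower bound is clamped at zero, the wider interval for a distant relationship ends up lopsided, which is another reminder that a normal approximation is rough at that range.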

Other than the most pervasive errors in the data--people submitting information that is incorrect either because they misname the genealogical relationship (e.g., a 3C rather than a 1C2R) or believe they have identified a distant cousinship for which the DNA is invalid and can't be substantiated--the problem with crowd-sourced data is that we can't know what we don't know. Most testing companies will stop reporting, reasonably so, at segment lengths smaller than 6cM to 8cM, depending upon the company. With GEDmatch you can go smaller, but since they do no phasing or imputation even segments shown as large as 10cM can be false a significant percentage of the time. The result is that the lowest values will always be underrepresented in the Shared cM Project--at least in more distant relationships, and we can see this starting to happen when the histograms begin to show counts that are decidedly to the left in the graphs, as with 3C1R--and the averages skewed to a sharing amount that is artificially high.

Deb's 3C1R relations are showing as 132cM, 182cM, and 205cM. The projected 68% confidence interval shows a range of 16cM to 80cM for that relationship, with a theoretical average of 27cM. Looking at Blaine's histogram, 96.9% of all 3C1R reported matches were 125cM or less; 99.7% were 175cM or less. So, yes: the values are out of bounds for 3C1R.

Genealogy is really tough to mix with DNA because we come into it by default carrying a great big bag of confirmation bias. Over the years I've communicated with folks about DNA "evidence" in everything from low-resolution mtDNA tests to fairly outlandish items like autosomal DNA triangulations out to 10th cousins...things where the odds are about like being bitten by a shark while being struck by lightning. It's our family, so of course we tend to think that, since outliers do exist, we must be the outlier. But that obviously can't be the case or there would be no outliers. If something looks out of the ordinary, it's probably out of the ordinary.

Last up is that exceptionally large autosomal DNA sharing is often written off as pedigree collapse somewhere back in the tree. That definitely can have an effect, especially in populations that have been endogamous in the past couple of centuries. But pedigree collapse is in all our trees, and a few instances of it, even in the genealogical timeframe, don't necessarily show up to any great extent among current-generation test-takers. The genius of biology is that occasional pedigree collapse doesn't have much lasting impact...otherwise the species would never have survived extreme population bottlenecks that have occurred in history.

Speaking of extremes, think Game of Thrones. The Lannisters' Cersei and Jaime were fraternal twins...who share the same amount of DNA as full siblings. Their children, had Cersei and Jaime been unrelated, would normally have shared about half their DNA. But Joffrey and the other kids (can't remember their names) were also nieces/nephews of their parents. So Joffrey et al. would also be 1st cousins because their parents were brother and sister. The average 1C sharing is 12.5%, which means instead of sharing 50% with each other, the kids would have shared about 62.5%.

Say two of Cersei's and Jaime's children had kids of their own but, this time (thank goodness), the other two parents weren't related to the Lannisters or to each other. So the kids--Cersei's and Jaime's grandkids--would be double 1st cousins rather than plain ol' 1st cousins. Their genetic relationship would look a little more like a half-sibling than a 1C, and they'd match on about 23.4% of their autosomal DNA. In the imperfect world of centiMorgan calculations, that would mean somewhere around 1,592cM to 1,756cM rather than the 850cM to 937cM we would expect for 1st cousins.

Continuing that pedigree, if these grandchildren of Cersei's and Jaime's grandkids then had children of their own--again with unrelated parents; the pedigree collapse stopped with Cersei and Jaime--then those children would be double 2nd cousins to each other. Genetically, that's about the same amount of sharing we'd expect from 1C1R: about 6.25% instead of the 3.125% expected of regular 2nd cousins. In centiMorgans, we're looking at roughly 425cM to 468cM rather than the 212cM to 234cM expected of 2nd cousins.

When we get to Cersei's and Jaime's 2g-grandchildren in this same unrelated-parents progression, we're down to the genetic difference between double 3rd cousins and regular 3rd cousins, and a distinction of about 106-117cM versus 53-59cM. Not insignificant, but we're already to a point where the amount of shared DNA can't readily distinguish between the pedigree collapse scenario and the one where none of the parents were genetically related.

The key is that the pedigree collapse didn't continue, that half the DNA in each birth came from an unrelated source.

Repeated pedigree collapse, as in endogamous populations, can be extremely difficult to evaluate...which is why for recent instances of endogamy, as with the Rapa Nui people of Easter Island, autosomal DNA is generally useful only to a couple of generations previous. But the effects of even extreme instances of pedigree collapse, as with the Lannisters, are diluted in the gene pool fairly quickly if the subsequent parents are otherwise unrelated.

Net message here is that overlarge DNA sharing results, proportionately speaking, shouldn't be quickly written off by the possibility that some set of distant ancestors were related. In Deb's case, the sharing amounts are about six times greater than would be expected for a 3C1R relationship, or around three times greater than the top 63% of respondents to the Shared cM Project; two times greater than 84% of the respondents. If the most recent common ancestors really are the 2g-grandparents then, using the Lannister example, even if they had been brother and sister what we'd expect to see down at the 3C1R level would be a DNA sharing of 0.7813% instead of 0.3906%, or a difference of about 25cM.
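
As a quick check of the "about six times" figure, here is a small Python sketch. It uses Deb's three reported matches and the 27cM theoretical 3C1R average quoted above; the variable names are just for illustration.

```python
matches_cm = {"Gee": 205, "Kay": 182, "H": 132}  # shared cM as reported by Deb
EXPECTED_3C1R_CM = 27  # theoretical 3C1R average quoted in the answer

# Ratio of each observed match to the theoretical average.
ratios = {name: cm / EXPECTED_3C1R_CM for name, cm in matches_cm.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the theoretical 3C1R average")

average_ratio = sum(ratios.values()) / len(ratios)
print(f"average: {average_ratio:.1f}x")
```

The individual ratios run from a little under five to well over seven, so "about six times" is a fair summary of the three taken together.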

Edited: Crikey! A terrible, disconnected sentence structure, a poor choice of words, and a typo. I shouldn't be allowed near a keyboard today...

Edited Again: Some rightly pointed out that, in Game of Thrones, Jaime and Cersei Lannister were twins, and I had written that they were full siblings. Being male and female they were, of course, fraternal or dizygotic twins, not identical or monozygotic twins. The amount of DNA they shared would be no different than would any male/female set of full siblings.

Edited Again Again: Working on a new post made me realize I'd made a computational mistake when describing the descendants of Cersei and Jaime Lannister. Hm. I wonder if anyone has ever written a genetic genealogy summary of the entire Game of Thrones major families based upon the lineages as described in the books...

Not a complete workup of the families in GoT, but I did just find this article, from Stanford's The Tech Interactive. And Cersei's and Jaime's children were Joffrey, Tommen, and Myrcella...whose names I will now promptly forget again.

by Edison Williams G2G6 Pilot (442k points)
edited by Edison Williams
Sorry, Edison.  I'm confused.  Who the heck are Cersei and Jaime?  Just an example, right?

For once I didn't use an outdated media/pop-culture reference, and I still blew it. Drats.

George R.R. Martin's seemingly never-to-be-finished epic series of seven novels, A Song of Ice and Fire. Fantasy loosely based on the 15th century's Wars of the Roses. First book published in 1996. Big hit. Optioned by HBO and made into a wildly successful series titled Game of Thrones. Had 626 Emmy nominations and 382 wins. Ran from 2011 to 2019. Since I needed to look some of this up (no, my memory isn't that good): https://en.wikipedia.org/wiki/Game_of_Thrones.

Cersei and Jaime Lannister were the children of Tywin and Joanna, themselves 1st cousins. The books (and TV series) had not only political intrigue and epic action, but genetic genealogy!

Oh, a BTW since I mentioned Lord Tywin. Because he and his wife were 1st cousins, Cersei and Jaime started out sharing about 53.125% of their DNA: 50% for being siblings plus another 3.125% for also being 2nd cousins. In my answer I noted that the children of two siblings would share about 62.5% of their autosomal DNA (full siblings plus 1st cousins), but in the "real" case of Cersei's and Jaime's children, they would share about 63.3% because they would also have a pair of 2g-grandparents in common, adding another 0.8% (well, technically 0.7813% for average 3rd cousin sharing) to the mix.

Obviously we can't start at 106% to get a half that's 53%, but when comparing one person to another we typically look at half-identical regions; our 44 autosomes come in 22 pairs--one maternal and one paternal in each--so matching occurs if either of the two alleles from the chromosome in one person is the same as one of the two alleles in the other person. The extra sharing here comes in the form of fully-identical regions, where both alleles match, not just one.

Ain't pedigree collapse fascinating?

Edited: To include an answer to a question about how, if we get one autosome from mom and one from dad, the two combined can add up to over 100%. A more than reasonable observation I should have addressed earlier.

Edison, I think I follow. And the GOT example makes it more interesting. When you talk about adding in percentages from extra ancestors, is there an easy way to calculate that?

Hi, Lucas. The calculations aren't complicated, but they can get a bit tedious for complex cousinships. The math is derived from what's called the Coefficient of Relationship, something created many decades ago as an aid to animal breeders. For details about that, you can visit a website by F.M. Lancaster, retired senior lecturer at Harper Adams University College, Shropshire: http://www.genetic-genealogy.co.uk/Toc115570135.html.

A good general resource is the "Autosomal DNA statistics" page at the ISOGG Wiki: https://isogg.org/wiki/Autosomal_DNA_statistics. It provides a couple of different tables and charts, but also, quite importantly, an explanation of the two different ways the sharing between full siblings can be calculated. For that, hop down to the section titled "Distribution of shared DNA for given relationships" and read about Methods I and II for those computations.

I made a table several years ago that extends the theoretical expected sharing out to 7g-grandparents and 8C4R...unnecessary to go that far with autosomal DNA, but there you are. It doesn't show half-cousin relationships, but as with a 1x removed, you simply halve the full relationship value.
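
For plain full-cousin relationships with no pedigree collapse, the theoretical percentage follows a simple halving pattern, sketched below in Python. The function and its parameter names are hypothetical, and it deliberately does not handle siblings, direct ancestors, or double cousinships.

```python
def expected_share_percent(degree, removed=0, half=False):
    """Theoretical average autosomal sharing for cousins, as a percentage.

    degree:  1 for 1st cousins, 2 for 2nd cousins, and so on
    removed: number of generations removed (e.g. 1 for 3C1R)
    half:    True for half-cousins (one shared ancestor rather than a couple)
    """
    if degree < 1 or removed < 0:
        raise ValueError("degree must be >= 1 and removed >= 0")
    # Each extra cousin degree quarters the share; each removal halves it.
    share = 0.5 ** (2 * degree + 1 + removed)
    if half:
        share /= 2
    return share * 100

print(expected_share_percent(1))             # 1st cousins
print(expected_share_percent(3, removed=1))  # 3C1R
```

These are the same computational averages discussed above (12.5% for 1C, 3.125% for 2C, 0.3906% for 3C1R), not predictions of what any two real people will share.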

Now that I look at that 2017 table, it could stand a healthy overhaul. Excepting the oddity of full-siblings, as mentioned, the percentages are all correct. But despite some commonly held opinions, centiMorgans are a moving target.

I'll skip other reasons why centiMorgans are far from precision tools, but the main one that affects us when we try to convert percentages derived from the Coefficient of Relationship is that companies choose different models to represent a whole, sex-averaged genome. I believe the lowest whole-genome cM estimate we've seen among the major testing companies was used by FTDNA in its first iteration of the Family Finder test. At that time, the totals were based on a 6761cM genome. By contrast, 23andMe is the only company that includes xDNA values in the totals...provided that minimal autosomal thresholds are first met. At one point, 23andMe used 7494.8cM as the total for a female genome (but that's still translating it as a sex-average value). GEDmatch currently uses 7172.7cM as the basis for a whole genome, and prior to GEDmatch Genesis that was 6800cM.

The Shared cM Tool at DNA Painter was mentioned, and if investigating percentage sharing there be aware that Jonny uses 7,440cM as the calculation basis for translating percentages to centiMorgans.

One of the main things I should do with that 2017 chart I made is to go back and provide a range of centiMorgans for each percentage value, a range that runs from a 6761cM genome to one that calculates with 7494.8cM as the basis. This is as simple as multiplying the percentage figures by 67.61 and 74.95 to get the respective ranges.
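
That multiplication can be sketched in Python as follows. The two genome totals are the ones named above; the function name and the sample percentages are only illustrative.

```python
GENOME_LOW_CM = 6761.0   # early FTDNA Family Finder basis
GENOME_HIGH_CM = 7494.8  # 23andMe female-genome basis

def percent_to_cm_range(percent):
    """Return the (low, high) cM range for a theoretical sharing percentage."""
    return (percent * GENOME_LOW_CM / 100, percent * GENOME_HIGH_CM / 100)

for label, pct in [("1C", 12.5), ("2C", 3.125), ("3C1R", 0.390625)]:
    low, high = percent_to_cm_range(pct)
    print(f"{label}: {pct}% -> {low:.1f}cM to {high:.1f}cM")
```

The spread between the low and high figures is about 10% of the value at every level, so it matters far more in absolute terms for close relationships than for distant ones.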

Probably not needed, but it's important to note another caveat. The percentages are computational averages only, not real-world actuals, though I believe they're a good place to start because they're a baseline that's repeatable and that has stood the test of time. Benchmarks, on the other hand, can differ circumstantially based on things like the population in question (even the specific family lineage and pile-up regions associated with it) and whether the inheritance chains involved are, for the most recent generations, predominantly all male or all female (I won't dive into a separate tangent here, but the female genome produces roughly 46 DNA segments when forming a gamete compared to the male's, equally roughly, 27; since the centiMorgan calculation is based upon predicting an estimate of crossover locations and frequency, even though they're the same physical size in the 22 autosomes the female genome maps to a cM count about 70% higher than the male's).

Speaking very generally only, the variances from the baseline should be increasingly noticeable as the relationships grow more distant. The reason is simple: we begin dealing with smaller and smaller individual--and often singleton--segments, and the imprecision of the centiMorgan means that the analyses become more prone to error. You can see this in the crowd-sourced Shared cM Project data. If you check the histograms for Blaine's Groupings 1 through 3 (full siblings through 1st cousins) you see that the distribution looks much like a familiar bell curve, with the highest values in the center, and lower values tapering off to each side. Visible skewing begins with Grouping 4 and by Grouping 7 (3rd cousins) it's become starkly evident. Plus, the farther back we go the more the amounts of DNA contributed by any given ancestor will differ from the theoretical. Our parents can only give us 50% each, but grandparents aren't "restricted" to exactly 25% each. And by 8th great-grandparents the odds are a little over 50/50 that any given ancestor will have contributed no DNA to us (Coop, 2013).

Edited: Added a link to Graham Coop's post.

+5 votes

Hi Deb, you probably just share an additional distant relationship with these three matches. Your relationship with Gee is just outside of the normal range for a 3C1R, but your other two matches are on the high end of normal (see the Shared cM Tool). In your shared matches tab on Ancestry, do each of these matches share DNA with your other known matches from this particular line? Have you confirmed that your matches' trees/genealogies are accurate? If yes, then you don't need to be concerned.

by Valerie Penner G2G6 Mach 7 (77.4k points)
+2 votes
I'm not sure what you mean by some of this, but if I understand you correctly, you have 3 matches, all of whom are grandchildren of "John" and "Mary" (whose actual names you know), and they come up as 205cM, 182cM, and 132cM. We don't really need to know the names of the three, or even how they're related to each other, aside from the fact that "John" and "Mary" are their most recent common ancestors (MRCAs).

The bottom line is that obviously "John" and/or "Mary" are related to you. Normally, only one would be related to you, and the cM values would tell you that "John" is likely a brother of one of your own grandparents or "Mary" is a sister of one of your grandparents. In other words, that these three matches are 2nd cousins of yours. It sounds like you can tell that that's not the case, so apparently we don't have a "normal" situation.

Nonetheless, the focus should be on "John" and "Mary". Probably, they are deceased, and so it shouldn't be a big deal, I wouldn't think, if you simply say who they are, when they were born, where they were from, etc. You never know when a detail will help.

You've got a DNA confirmation for all of your gt-gt grandparents, so your Shared Matches lists for these people (there's a tab for that, but it sounds like you know about that stuff) should tell you exactly which gt-gt grandparents they are related through (and again, saying which one(s) doesn't really seem like a privacy issue).

Certain populations are known to have endogamy as an issue. That means that it has been a fairly small, isolated population where everybody is everybody else's 5th cousin in six different ways - the cM numbers for such a group are inflated, and tell you nothing, in such cases.

I looked at the DNA match list for a guy whose mother was from Puerto Rico, for example. It was a nightmare - it seems like practically every Puerto Rican has at least SOME sort of match to every other.

I also looked at the DNA matches for an adopted woman whose roots included a rural county in TN. It was a mess - apparently, practically everybody in the county is descended from just a handful of early settlers who had large families. The cM values were useless.

Eastern European Jews are especially well-known for having endogamy. I see a lot of Polish folks on your tree, but I didn't look to see if they might be Jewish. If not, it's possible that certain Polish communities might be endogamous (I just don't know). So it might be helpful to know which part of your tree we're talking about.

Another complication is if your matches are in a different generation vs you. Birth years for "John" and "Mary" might give us a decent indication whether you're in the same generation as your matches or not. Again, they are the key. The trick is to find some possible connection between "John" and/or "Mary", and your specific gt- or gt-gt grandparents that you're connected through.
by Living Stanley G2G6 Mach 9 (91.3k points)
