G2G: What do "DNA Connections" on the right panel of our profiles tell us

I am beginning a "more than curiosity seeking" effort to understand DNA's part in my husband's and my profiles, and I'd find it useful to take a "first step" while glancing at our histories. I see DNA Connections, which seem to show our DNA contribution to the family that we've made public. Posted are those in a given family group whom we can GEDmatch with if their information remains public.

1. Am I correct if I assume the comparisons I connect with in that group are autosomal and can go back 200-300 years?

2. What can I expect to see from the relevance of these names where the profiles they connect with are concerned?
WikiTree profile: Amanda Torrey
in WikiTree Tech by Amanda Torrey G2G6 Mach 1 (15.6k points)
retagged by Amanda Torrey

1 Answer

Best answer

Those are the names of other WikiTreers who have posted information about DNA tests they have taken to their profiles here and who, according to the information they have entered, are closely related enough to the profiled person to be expected to share measurable DNA with them (or their descendants). Of course, if the relationships aren't correct, then there won't be a match. To actually find out if anyone is a DNA match to you, you'll need to use a third-party site like GEDmatch.

The percentages you see are just a mathematical progression (50% with a parent, 25% with a grandparent, and so on), an estimate of what you'd be expected to share. Matches go out to 8 degrees of separation, up to 6th great-grandparents and out to 3rd cousins for autosomal DNA. Y-DNA and mtDNA matches are listed separately.
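As a rough illustration of that progression, here is a minimal Python sketch. It assumes the displayed figure is simply halved at each degree of separation, which matches the 50% and 25% examples above; the function name is hypothetical and this is not WikiTree's actual code.

```python
def expected_share_percent(degrees: int) -> float:
    """Expected share after `degrees` parent-child steps, halving at each step.

    Assumption: the figure is 100 * 0.5**degrees, per the 50%/25% examples.
    """
    if degrees < 0:
        raise ValueError("degrees must be non-negative")
    return 100.0 * 0.5 ** degrees


# Print the progression out to the 8-degree limit mentioned above.
for degrees in range(1, 9):
    print(f"{degrees} degrees: {expected_share_percent(degrees):.2f}%")
```

Under that assumption the figure at the 8-degree limit is about 0.39%. It is an expectation from the tree as entered, not a measurement of anyone's DNA.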

Here's a link to the Help page: Help:DNA Test Connections (wikitree.com)

by Kathie Forbes G2G Astronaut (1.0m points)
selected by Amanda Torrey

OK, now more specifically there are 2 profiles which are being contested. 

Re: Mary (Glendinning) Gillespie - WikiTree Profile

There are many mysteries around a family who settled in Augusta County, VA in the 18th century, mostly as a consequence of the French and Indian War.

We have four generations as part of this 18th-century profile. Mary married William, and this family was horribly confused with other families' genealogies. This one settled in Augusta County, Virginia: Mary, her husband and 6 children (maybe a 7th, which distracts from this point), with 2 more children after the family left the area. Mary's last name is absent from the documents we've found so far. Some opinions have attached her to parents who lived near here. Now there is some dispute.

So here's the question: 

Janice is a descendant and she appeared as the first of 40 other descendants on DNA Connections at 1.56%. 

She appears on DNA Connections for all of Mary's children at 0.78% except one, Elizabeth, their 5th child, where she's at 3.12%. I am assuming that means she descends from Mary and William's 5th child, Elizabeth.

Mary's parents are being disputed. Janice appears on Esther (Mary's "mother") at 0.78% (with a lot of other descendants). She appears on Archibald (Mary's "father") at 0.78% (also with a lot of other descendants). Janice is not listed on Esther's first husband's profile, nor is she listed on Archibald's first wife's profile.

Is it safe to assume that Mary, the mother of the children on all of whose profiles Janice appears as a DNA connection, is the daughter of Archibald and Esther?


Nothing in the DNA Connections list shows any actual relation of any of the people listed to anyone else.

All it is saying is that if each of those people's trees is correct, then that's the amount of DNA they would be expected to have from the profiled person; it's really just another way of saying how many degrees of separation each person is from the profiled person. The only way to know if any of those possible descendants are related to one another is to use something like GEDmatch to see if they actually match in the same locations, which would show that they descend from a common ancestor, but not necessarily the one they think it is. These descendants may be too far from one another to actually share any DNA.
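To make the "degrees of separation" reading concrete, here is a small Python sketch that converts the percentages quoted for Janice back into degrees. It assumes the halving rule (100 * 0.5**degrees); the function name is hypothetical and the percentages are the ones quoted in the question above.

```python
import math


def degrees_from_percent(percent: float) -> int:
    """Invert the assumed halving rule: percent = 100 * 0.5**degrees."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return round(math.log2(100.0 / percent))


# Percentages listed for Janice on each profile, as quoted in the question.
listed = {"Elizabeth": 3.12, "Mary": 1.56, "Esther": 0.78, "Archibald": 0.78}
degrees = {name: degrees_from_percent(pct) for name, pct in listed.items()}
print(degrees)
```

The result is 5 degrees to Elizabeth, 6 to Mary, and 7 to Esther and Archibald. That step from 6 to 7 only reflects that the tree as entered links Mary to those parents; it is a count of tree links, not DNA evidence for or against the link.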

So before I start, :D

actually I started that process a few years ago, gedmatching individuals with each other. This was particularly important because my ancestor was one of the from PA, and her profile led to this family. I couldn't figure out how that was possible, so I gedmatched myself with all those descendants. Additionally, I had a 3rd cousin follow me down and we gedmatched him with some of the PMs. It was a lot of work, and before I went deeper I decided I needed more expertise... so I hoped the DNA connections that I already did could give me reason to go further. Now I believe that in order to REALLY understand these 18th century profiles, I need a better understanding of segmentology.

But without getting too complicated, I'm trying to start at a beginning. And I'm not sure you've answered my question. If you did I apologize...I've been dense before and I need to understand this to understand interpreting a profile.

For example, my second profile I question: there are two people who came into Savannah in 1733 a couple of months apart, both named Peter Morel. They both have entries in Early Settlers of Georgia. One brought his wife and 4 unnamed children off the boat to Savannah, the other identified his wife Martine who died that year and two children as John and Mary Anne and settled in High Gate GA. The Will names 4 children besides John and Mary Anne and a wife. One would assume then he remarried and they had 4 children who are also named in the will and not off the boat together moving to Highgate.

Since that's not enough on its own, I looked at the fact that the DNA descendants are all on the High Gate profile and not at all on the Savannah one. There is a descendant on the Savannah profile, but we don't match -- at all. I need to see if he matches with any of the other descendants... that's probably the step I'm missing, yes? But would it be enough? Let's say I do a match with the descendants of the Savannah and High Gate profiles, do I need more? What else do I need?

Oh, and Kathie? Thank you -- I'm really giving you a workout tonight...

You need to confer with an actual DNA expert to determine whether the Gedmatch matching you have done shows that you and your potential matches here on Wikitree all descend from a particular common ancestor that far back.  If you add DNA as a tag to your original post someone will see it.

Thank you, Kathie, you're the best!

Hiya, Amanda! Being the DNA curmudgeon that I am, I have to rain on the parade a bit, I'm afraid.

By 3rd great-grandparents it is very difficult; beyond 4th great-grandparents it is quite unlikely; and beyond 5th great-grandparents highly improbable that autosomal DNA can be used as an accurate form of evidence indicative of a most recent common ancestor. You will hear many people claim otherwise--even some popular bloggers on the subject--but not one of them can point to a rigorous scientific study indicating that is not the case, while there are multiple factors in both the biology and the math that would indicate that it is.

That's why Family Tree DNA's v5.0 matching algorithms are based on simulations out to a maximum of eight generations, 6th cousins. In their help article, "Detecting Relatives And Predicting Relationships," 23andMe notes: "when we say that two individuals are unrelated in this help article, we mean that their common ancestor is 9 or more generations back." At AncestryDNA, the "match categories" end at "4th cousin and more distant," and on that help page they state: "Percentages of DNA shared between relatives at the 4th cousin level and beyond may signify any number of distant relationships, but the genealogical relationships are unlikely to be closer than six degrees from the test taker."

Bottom line is that autosomal DNA simply can't be used with presumed accuracy beyond a certain number of generations. The chance of two DNA-tested cousins sharing any measurable DNA at all drops off quickly: there's about a 46% chance that two 4th cousins will share DNA, but less than a 15% chance that two 5th cousins will. The odds drop even more dramatically when we consider finding three or more cousins who share some of the same DNA from the same identifiable ancestor (what we consider to be autosomal triangulation): finding three 5th cousins would be somewhere around a 0.075 probability; three 6th cousins, around 0.021.

It's the number of generations that matters--we can loosely consider them as meiosis events--and not really the years. But if we were to estimate a benchmark we might assume a contemporary test-taker was born in 1960, and the average generational interval to be 27 years. That would place the "unlikely verifiable" limit (4th great-grandparents, six generations back) at an ancestor's birth in 1798, and the "highly improbable" limit (5th great-grandparents, seven generations back) at 1771.
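The benchmark arithmetic can be sketched in a few lines of Python. The 1960 birth year and 27-year interval are the assumptions stated above; the function names are hypothetical.

```python
def generations_to_great_grandparent(nth: int) -> int:
    """An nth great-grandparent is nth + 2 generations back (1st great = 3)."""
    if nth < 1:
        raise ValueError("nth must be at least 1")
    return nth + 2


def benchmark_birth_year(nth: int, tester_birth_year: int = 1960,
                         generation_years: int = 27) -> int:
    """Estimated birth year of the tester's nth great-grandparent."""
    generations = generations_to_great_grandparent(nth)
    return tester_birth_year - generations * generation_years


for nth in (4, 5):
    print(f"{nth}th great-grandparent: born about {benchmark_birth_year(nth)}")
```

With those defaults the 4th great-grandparent lands at 1798 and the 5th at 1771. Changing either default shifts the years, which is why the generation count is the more useful figure.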

My last cautionary note is that I would very much recommend using larger segment thresholds at GEDmatch than you might at the major testing companies. The testing companies all use varying forms and degrees of computational phasing and/or genotype imputation to better estimate the validity of a reported segment, but GEDmatch does not. In order to accommodate the array of different direct-to-consumer tests, some of which use only 17% of the same markers, GEDmatch lowered some of the default minimum threshold values that they used to use before the "Genesis" version was moved into production.

I've done some informal comparisons of results obtained from popular tests to those derived from whole genome sequencing data, and below a minimum segment size in the teens, the number of seeming false-positives grows rapidly. For example, out of 69 segments examined as matching one of our current GSA chip tests at 10cM in Tier 1 one-to-one matching, only 14 of them, or 20.3%, also reported as a match against the WGS data. The 69 segments reported for the chip test averaged only 337 SNPs per segment; the 14 that were also in the WGS data showed an average of 1,033 SNPs per segment. Using a SNP density calculation that assumes the microarray test looks at approximately one base pair along our chromosomes out of every 4,800, the segments reported from the GSA chip test showed a SNP density of 0.17, while the WGS test data had a SNP density of 0.533. The implication being that the much larger number of markers in the WGS data yields more accurate comparisons, and that 79.7% of the segments from the GSA chip test were false-positive.

My recommendation at GEDmatch is to not go below a minimum segment size of 12cM for matching, and to manually use, where you can, a minimum "SNP Window Size Threshold" of at least 700. When using the free one-to-many limited version or the Tier 1 one-to-many full version, set the "Overlap Cutoff" value to at least 90,000 to help minimize the potential false positives arising from low in-common SNPs among different test versions.
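For anyone keeping match results in a spreadsheet or script, those two minimums could be applied as a simple filter. This is only a sketch: the `Segment` record and the sample values are hypothetical, and treating the SNP minimum as a check on each segment's reported SNP count is an assumption (at GEDmatch these are settings on the comparison form, not a separate tool).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    cm: float   # reported segment length in centiMorgans
    snps: int   # reported SNP count for the segment


MIN_CM = 12.0    # minimum segment size recommended above
MIN_SNPS = 700   # minimum SNP count recommended above


def passes(seg: Segment, min_cm: float = MIN_CM, min_snps: int = MIN_SNPS) -> bool:
    """True only if the segment meets both minimums."""
    return seg.cm >= min_cm and seg.snps >= min_snps


# Hypothetical reported segments, for illustration only.
reported = [Segment(15.2, 1650), Segment(12.4, 540), Segment(8.9, 900)]
kept = [seg for seg in reported if passes(seg)]
print(kept)
```

Only the first hypothetical segment survives: the second is long enough but has too few SNPs, and the third has enough SNPs but is too short.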


I truly appreciate what Edison Williams puts into his answers! Always have. I welcome answers like this because I am probably going to be delving into this in the near future. My husband's profile tagged Peter Roberts somehow, so he invited David to join mitoYDNA, which we did. I joined too as long as it was there.

I have worked with thousands of family profiles. When I started our trees, I had no idea it would be so extensive. Here's the thing: "Janice" appears in the DNA Connections with this one colonial family. I guarantee there is no way to make her appear anywhere in my husband's lines. I've looked for her on other familial profiles with the same names, but she's absent. I can see the percentage. Of course the percentages are small. Of course we know gedmatches will show false positives, but it's hard to reconcile repeated appearances of a given name down some family lines and not others. I have never seen this discussed anywhere.

I backed up the gedmatch to 3 to see where she goes. I found her same ratios for me on a whole other line that married into the same last name: one from my PA father's side and one from my GA mother's side. Two cousins had married two men in my families.

Before finding cousins who married into these two seemingly separate families those low gedmatches showed me there was something going on somewhere --

And although Janice is apparently a 4th cousin, the connections wouldn't be relevant to wiki, because the cMs are too low, even if she has been a strong marker when differentiating our family from another with the same last name.

It's not arbitrary. It's not random. It happens all the time, even with the pile-ups and false positives. When 40 curious descendants create a pattern under specific conditions, I have to pay attention. This is especially important when a church burned down in the area with all the vital records.

I just want to understand how this phenomenon happens or even works in spite of pile ups and false positives so I can have better management over family groupings and their genetic patterns...

Finally - I am JUST beginning to understand https://segmentology.org/2020/01/31/in-defense-of-small-segments/ which becomes so tantalizing for families that have a large group of curious descendants whose tests register under DNA Connections...I understand

copied from one of my responses on my comment on a family website which I use to BEGIN mapping out descendants and applied to earlier ancestors:

"you are 8 generations from Mary. If you and Ian share a common ancestor, the closest it could be would be Mary's grandfather (since her possible father was the immigrant on our side). That means at least 10 generations. Applying Jim's formula, we would need 10 matches (including you), all of whom are known or suspected descendants of Mary's Clendennin "grandfather", and all of whom match on the same one or two segments." And with this large a DNA Connection we might be able to do just that. BOOYAH lol or not.


Well, I said I was a curmudgeon about this stuff, so I might as well continue to live up to that cantankerous moniker. If someone tells you that you can go deep into your genealogy using autosomal triangulation, ask them a very specific question:

"Can you provide me a citation to a study published in an academically-respected, peer-reviewed journal that expressly examines the use of distant cousins and microarray tests to establish an MRCA via autosomal DNA triangulation and indicates it to be a valid, accurate, and broadly applicable methodology?"

If you don't get an outright "no," what you'll get instead, usually, will be one of three forms of logical fallacies in response.

"My own family research shows over and over that autosomal triangulation using distant cousins is an accurate methodology" (petitio principii; the most common form of this fallacy is using an unproven conclusion as the very evidence upon which a claim is made that the conclusion itself is valid; i.e., a form of circular logic).

"If you read material by Popular Blogger ABC you'll find that autosomal triangulation using distant cousins is an accurate methodology" (argumentum ad verecundiam; literally, "argument to modesty" or respect; this fallacy capitalizes on feelings of respect or familiarity with a well-known individual who might actually know very little about the topic).

"Experienced genetic genealogists know that autosomal triangulation using distant cousins is an accurate methodology because it's used all the time" (argumentum ad populum; a favorite rhetorical device of propagandists and advertisers, this is basically an "everybody knows it's true" failure in critical thinking...like, "everybody knows the sun orbits around the earth").

At the end of the day, though, what's important is your specific goal. If the goal is accuracy and the ability to use DNA evidence in keeping with the analytical guidelines of the Genealogical Proof Standard, that's one thing. It's an entirely different thing if it's only a hobby and, as with the proverbial horseshoes and hand grenades, "Close enough will do; we don't need to overthink it." That, though, I would argue, is the same way we end up with such vast numbers of blatantly incorrect public trees stuck rather permanently in the Raiders of the Lost Ark-like warehouse of the internet.

I went into typically lengthy detail in a G2G thread last month about what I consider the two--seemingly simple but actually quite difficult--criteria to determine whether "matching" DNA segments can be useful for genealogy, the concept of genetic similarity, and some reasons why we see so many small, false-positive matches.

Using the GEDmatch free one-to-one autosomal tool, if we lower the threshold to 3cM and leave the SNP window size at the floating default, then depending upon the two test versions used in the comparison just about anyone with the same continental population origins will show as if they have matching segments. But all those tiny segments shown in the 3cM to 6cM range are, a very large majority if not all of them, false. And there's no way to determine which are real and which aren't; as Blaine Bettinger has pointed out, triangulation can't actually do that for you. 

Amanda, as a simple experiment for the Mary Glendinning problem, since you have American Southern Colonies roots as do I, try giving a look at some random people among the 165 WikiTree members who have the Southern Colonies Project member badge. Locate some who have their GEDmatch kit number on their profiles. Give a run at GEDmatch doing a one-to-one comparison with the centiMorgan threshold set down to 5cM, and then 3cM.

I just did that using my 23andMe v5 kit rather than my WGS superkit, and the first five totally random WT members showed this:

#1
At 5cM: 21.9cM over three shared segments, largest 9.6, 288 SNPs
At 3cM: 31cM over five shared segments, largest 9.6, 288 SNPs

#2
At 5cM: 15.3cM over two shared segments, largest 8.9, 236 SNPs
At 3cM: 23.2cM over four shared segments, largest 8.9, 236 SNPs

#3
At 5cM: 11.6cM over two shared segments, largest 6.2, 298 SNPs
At 3cM: 20.9cM over four shared segments, largest 6.2, 298 SNPs

#4
At 5cM: no match
At 3cM: 8.3cM over two shared segments, largest 4.6, 263 SNPs

#5
At 5cM: 19.5cM over three shared segments, largest 7, 210 SNPs
At 3cM: 19.5cM over three shared segments, largest 7, 210 SNPs

Notice the very low in-common SNP counts. Our tests can't look at about 8% of the genome, leaving us with about 2.85 billion base pairs. The average microarray test examines roughly 650K markers, so across the genome we're seeing only about 1 in every 4,400 base pairs. Highly rounded, one cM represents very roughly 1 million base pairs. Ergo, that should mean somewhere around 225 SNPs per centiMorgan. Across all our DTC microarray tests, the median overlap of same-to-same SNPs tested is 45.6% (the lowest is 17%). If we drop the 225-per-cM average down by that value, we get about 103 SNPs per cM: this should be what we consider a baseline SNP density for comparisons. Any density lower than that means we have, proportionately, more gaps than average between the SNPs compared. So we should see about 310 SNPs for a 3cM segment; 515 SNPs for a 5cM segment; 720 SNPs for a 7cM segment; and of course 1,030 for a 10cM segment.
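The baseline arithmetic above can be reproduced with a short Python sketch. The constants are the rounded figures from the paragraph; the variable and function names are hypothetical. It also reads each "largest X, N SNPs" line in the five comparisons as the SNP count of that largest segment, which is an assumption about how those results were summarized.

```python
GENOME_BP = 2.85e9          # base pairs the tests can examine (from the text)
MARKERS = 650_000           # markers on an average microarray test
SNPS_PER_CM_ROUNDED = 225   # the rounded per-cM figure used in the text
MEDIAN_OVERLAP = 0.456      # median in-common SNP overlap between tests

bp_per_marker = GENOME_BP / MARKERS                       # about 1 in 4,400
baseline = round(SNPS_PER_CM_ROUNDED * MEDIAN_OVERLAP)    # SNPs per cM


def expected_snps(cm: float) -> int:
    """Baseline SNP count expected for a segment of `cm` centiMorgans."""
    return round(baseline * cm)


# (largest segment in cM, SNPs) from comparisons #1 to #5 above;
# assumes the SNP count belongs to the largest segment.
largest = [(9.6, 288), (8.9, 236), (6.2, 298), (4.6, 263), (7.0, 210)]
densities = [round(snps / cm, 1) for cm, snps in largest]

print(f"about 1 marker per {bp_per_marker:,.0f} bp; baseline {baseline} SNPs/cM")
print("expected SNPs:", {cm: expected_snps(cm) for cm in (3, 5, 7, 10)})
print("observed SNPs/cM:", densities)
```

On that reading, every one of the five largest segments falls well under the baseline of roughly 103 SNPs per cM, which is the point about low in-common SNP counts.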

If you remember, GEDmatch used to have a default of 700 as the minimum SNP window size. With the Genesis version moving into production that changed to a dynamic variable of between 200 and 400. Recently, the default has been loosened yet again: now it simply says "about 2/3 of segments will have between 185 and 214 SNPs." Those values are way too low and are one of the leading causes of so many false-positive results at small segment sizes.

Even so, while that SNP density benchmark can help in quickly eliminating some segments as unlikely because there are large gaps between compared markers, of itself it is no assurance that a small segment is actually valid. That sounds counterintuitive, but one of the things to keep in mind is that as many as 18.8%--almost one in five--of the SNPs targeted by our current version tests are examined specifically for clinical research purposes. They are not ancestry informative markers. If a matching segment happens to encompass or overlap an area that includes protein-coding genes (and their flanking areas) that are of interest to clinical researchers, the result can be sets of SNPs that are counted and compared but that actually have very little to do with genealogy. In the exome--the regions of the genome that comprise the coding genes and some of their regulatory systems--we're all almost entirely identical.

Without some of the sophisticated genotyping algorithms used by most of the testing companies, GEDmatch simply has no way, other than simply counting, to attempt to validate that a displayed segment is really a shared segment at all. However, as shown on the ISOGG Wiki, even the major testing companies have a significant false-positive rate with small segments. On that same page, data gathered by John Walden and Tim Janzen indicate--based on trio-phasing only and not close examination by WGS sequencing--that a reported 7cM segment will be false 58% of the time; a 5cM segment will be false 86% of the time; and a 3cM segment will be false at least 99% of the time. My own informal results comparing data derived from whole genome sequencing to those from 11 of our common microarray tests would indicate that, at GEDmatch, those numbers are conservative: even at 10cM and using the other default settings, for some microarray test results as many as 70% of the reported segment matches may be false.
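To put those rates in practical terms, here is a small Python sketch of how many reported segments of each size would be expected to be real. The rates are the ones quoted above; the function name is hypothetical, and applying each rate as a flat average across all reported segments is a simplifying assumption.

```python
# False-positive rates by segment size (cM), as quoted above.
FALSE_RATE_BY_CM = {7: 0.58, 5: 0.86, 3: 0.99}


def expected_valid(reported_count: int, segment_cm: int) -> float:
    """Expected number of genuinely shared segments among those reported."""
    return reported_count * (1.0 - FALSE_RATE_BY_CM[segment_cm])


for cm in (7, 5, 3):
    print(f"{cm}cM: about {expected_valid(100, cm):.0f} of 100 reported segments valid")
```

Out of 100 reported segments, that is roughly 42 valid at 7cM, 14 at 5cM, and at most 1 at 3cM, with nothing in the match list to indicate which ones they are.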

Step one of triangulation has to be to determine the likelihood that a reported matching segment is actually a valid, continuous segment. For small segments, a lot of analytical work has to go into it and, even then, it may be impossible to make a decision with the limited microarray data we have. I described some of the considerations in another G2G post last month.

And despite what we often hear, triangulation itself doesn't validate a given segment. Working that way puts us back at the petitio principii logical fallacy. Several reported matches on a segment that is otherwise false or could be better explained by genetic similarity (e.g., linkage disequilibrium resulting from in-common population subsets) than an identifiable ancestral relationship do not make the segment and the triangulation valid. Garbage in, garbage out.

For another perspective, Blaine Bettinger wrote a piece in August 2022 titled, "An In-Depth Analysis of the Use of Small Segments as Genealogical Evidence." It's definitely worth a read.

Now I'll take my soapbox and go annoy people on a different street corner...


Edison: I'd never call the person that gives what you do so generously, a "curmudgeon." Sensei maybe, Yoda perhaps...

just sayin'

later...
