Can we hold a meeting with the leaders of the DNA Project?

+31 votes
693 views
I have received several inquiries about the requirements to DNA confirm ancestors. I understand the importance of this, and I also understand the difference between a 3rd cousin (and closer) and beyond a 3rd cousin.

My questions are these: Is there a cM value we are looking for as of October 2018 to use in triangulation groups? What is this number and how did we come up with it? Can we use two third cousins and one 4th cousin in a TG? If not, why? A single segment should all lead to common ancestor(s) between the three cousins.

I believe a 3rd cousin can flip back and forth between a statement that requires only one other person who is 3rd (full) or closer and then a 3rd cousin (again full) with another 3rd and then their 4th. If no DNA is shared between 3rd cousins (happens 10% of the time) then that would prevent any statement between the two. However, if a single segment exists that matches all three cousins, then to me this defines a triangulation group (TG).

I don't feel a TG is necessary for cousins closer than 3rd, but essential for beyond. My vote is to set the TG threshold at 10 cM and then for every additional cousin in the TG beyond 3 people, we lower by 1 cM, no lower than 7 cM. So 3 people = 10 cM, 4 = 9 cM, 5 = 8 cM, and then 6+ = 7 cM.
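To make the sliding scale above concrete, here is a minimal Python sketch. The function name tg_threshold_cm is invented for illustration; it only encodes the numbers proposed in this post and is not any existing WikiTree rule.

```python
def tg_threshold_cm(group_size: int) -> int:
    """Proposed minimum shared-segment size (cM) for a triangulation group.

    Hypothetical helper: 10 cM for 3 people, minus 1 cM for each
    additional person, never lower than 7 cM.
    """
    if group_size < 3:
        raise ValueError("a triangulation group needs at least 3 people")
    return max(7, 10 - (group_size - 3))


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, "people:", tg_threshold_cm(n), "cM")
```

Anything from six people upward stays at the 7 cM floor.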

With six people who triangulate you could have two separate groups of 3rd cousins and then each group could be 4th cousins to each other. How much more evidence does one need than a single segment of 7 cM between 6 distantly related people?

I welcome all opinions and thoughts on this matter. I'm looking forward to a healthy discussion and hopefully within a few weeks, we can all come to an agreement on what steps we need to take in confirming lines of DNA. I don't have all the answers, but hopefully after this discussion, we will all be enlightened.

Genetic Genealogist
Christopher "Topher Blake" Sims
Facebook Topher Blake
asked in WikiTree Tech by Topher Sims G2G6 (6k points)
You said, "My vote is to set the TG threshold at 10 cM and then for every additional cousin in the TG beyond 3 people, we lower by 1 cM, no lower than 7 cM. So 3 people = 10 cM, 4 = 9 cM, 5 = 8 cM, and then 6+ = 7 cM."

While this sounds sensible to me, I would urge that we not make the use of DNA on WikiTree any more complicated. I suspect that many, if not most, WikiTreers find it too complicated, already.
Bennet George and Topher Sims,

You are correct! Those statements do take some time to do properly.

In one line/surname, I did a triangulation using my DNA and 2 known 3+ cousins, wrote the statement.

Then I added a second triangulation using my Mom's and the same 2 known 3+ a gen cousins (my results not included) to show that she picked up an additional chromosome with the cousins, wrote the statement.

My conclusion is I didn't get from my Dad the DNA my Mom had and shares with these cousins. That could be useful in future triangulations!

Sherrie
I'm with Bennet George, it is too complicated already!

8 Answers

+12 votes

Hey Topher,

It would be great to have your input into how we frame our DNA features on WikiTree.

From our DNA Help Pages: 

Triangulation

If an autosomal DNA test match is beyond a third cousin, confirmation gets complicated. You need at least one more match and all three of you need to match on the exact same segment of DNA.

We want to make sure that we are working with three separate legs to a triangulated MRCA.

The general, acceptable, talked about, used minimum threshold is 7cM. However the information on the auDNA triangulation Page states: "At least three distant cousins (each one related to each of the others more distantly than third cousin) must share the same segment of DNA that is at least 12 cM."

Which segment Threshold is correct? It depends on who you ask - for deeply endogamous groups it should be 20cM or more...

Be great to have some discussions on this to see how the community lines up.

Do you work with a threshold of 7cM?

Do you work with a threshold of 12cM?

What threshold do you use?

Answers from this post and the ISOGG poll (wish we could do polls here!):

7 to 10cM  68 votes
10 to 12cM  15 votes
5 to 7cM  13 votes
15 to 20cM  10 votes
20cM +  9 votes

Mags

answered by Mags Gaulden G2G6 Pilot (451k points)
edited by Mags Gaulden
Hi Mags,

not sure how you refer to an industry standard of 7 cM.

23andMe has matches from 5 cM onwards

FTDNA has matches from 1 cM onwards

Not sure about Ancestry or MyHeritage. On GEDmatch you can set it yourself.

So I don't think:

a) there is an industry standard

b) as each of the testing companies has their own proprietary algorithms we can use a standard
Thanks for that Andreas, I've never used 23andMe and wondered what their hurdle rate was.

Small addition to your comment about FTDNA. They seem to use a very weird double standard. You don't match with someone else unless you match at least 7cM, but then they add in all those teeny tiny matches down to about 1cM and report that you match 20cM or so (7 plus lots of bits). Either those little matches mean something or they don't - you can't have it both ways.

Ancestry goes down to 6 cM and does not include the irrelevant snippets.

There is also a discrepancy between the rubber bands being used to measure cM between some of the providers.

Example: a match between two of my project members is reported as 6.1 cM at ancestry but 8.3 cM at gedmatch - this may well vary from chromosome to chromosome, I haven't explored it thoroughly.

ftDNA seem to use almost the same algorithm as gedmatch, at least on the ones I've checked.

Industry standard ??????

I told myself I would sit this one out, but...

I wholeheartedly agree with Andreas and Derrick that the term "industry standard" has no place in this landscape, and is a misconception when applied to genetic genealogy in general. I'm going to try to convince myself not to write more about it. Meanwhile, a synopsis of how some of the testing companies state they perform matching, if they state it at all.

23andMe: 7cM with a minimum 700 SNPs for the first half-identical region (HIR); 5cM and a minimum 700 SNPs for each additional HIR. The error rate allowance seems to be pegged at roughly 1%: 1 opposite homozygote per 300 SNPs, and each opposite homozygote in an HIR must be separated by roughly 300 SNPs. For fully-identical regions (FIR), the threshold is 5cM and 500 SNPs.

Note also that 23andMe adds X-chromosome segment matching into the autosomal total count, and no other company does that. They do not presume a match if only the X-chromosome and none of the autosomes match, but if there is an autosomal match the criteria are:

  • Male-to-male: 1cM, minimum 200 SNPs 
  • Male-to-female: 6cM, minimum 600 SNPs
  • Female-to-female: 6cM, minimum 1,200 SNPs
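As a rough illustration of the first-HIR / additional-HIR rule described above, here is a minimal Python sketch. It is not 23andMe's code: the function name, the (cM, SNPs) tuple layout, and the sample segments are all assumptions, it ignores the error-rate allowance and the X-chromosome criteria, and the default numbers are simply the ones quoted in this post.

```python
def counted_segments(segments, first=(7.0, 700), additional=(5.0, 700)):
    """Return the segments that would count under a first/additional HIR rule.

    segments: list of (cM, SNP count) tuples for one pair of testers.
    first / additional: (min cM, min SNPs) for the first and for each
    further half-identical region. Hypothetical helper for illustration.
    """
    # Is there at least one segment large enough to be the "first" HIR?
    has_first = any(cm >= first[0] and snps >= first[1] for cm, snps in segments)
    if not has_first:
        return []
    # Once a first HIR exists, every segment meeting the lower bar counts.
    return [(cm, snps) for cm, snps in segments
            if cm >= additional[0] and snps >= additional[1]]


if __name__ == "__main__":
    sample = [(7.5, 800), (5.2, 710), (4.0, 900)]  # made-up segments
    print(counted_segments(sample))
```

The same helper could be pointed at another company's stated numbers by changing the two defaults, as long as the "first" bar is not lower than the "additional" one.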

FTDNA: Their formerly useful "Learning Center" sitemap (https://www.familytreedna.com/learn/sitemap/) was razed sometime in the past few days. One can only hope it's because FTDNA is correcting and expanding it, not abandoning it. That said, the previous information that I'd found shows a multiple-choice sort of criteria:

  • 1) 9cM with a minimum 500 SNPs for a single HIR, regardless of the total amount shared.
  • 2) If there is no segment of at least 9cM within 500 SNPs, the single-segment threshold is reduced to 7.69cM if there is a combined-segment total of at least 20cM, which total then includes very small HIRs between 1cM and 7cM (which kinda drives me nuts).
  • 3) Only for specific but undefined non-European populations, 5.5cM with a minimum 500 SNPs for the first HIR. This seems to be applicable to only 1% or less of the customer base.
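The first two criteria above can be sketched in a few lines of Python. This is only my reading of the list, not FTDNA's implementation: the function name is invented, the 500-SNP minimum is left out because only cM values are passed in, and criterion 3 is omitted because the populations it applies to are undefined.

```python
def ftdna_style_match(segment_cms):
    """True if a list of segment sizes (cM) meets criterion 1 or 2 above.

    Hypothetical helper; SNP counts are deliberately ignored.
    """
    longest = max(segment_cms, default=0.0)
    # Criterion 1: one segment of at least 9 cM, regardless of the total.
    if longest >= 9.0:
        return True
    # Criterion 2: longest segment at least 7.69 cM and a combined total of
    # at least 20 cM, where the total includes small segments down to 1 cM.
    total = sum(cm for cm in segment_cms if cm >= 1.0)
    return longest >= 7.69 and total >= 20.0


if __name__ == "__main__":
    print(ftdna_style_match([9.2]))                       # criterion 1
    print(ftdna_style_match([8.0, 3.0, 3.0, 3.0, 3.5]))   # criterion 2
    print(ftdna_style_match([8.0, 2.0]))                  # neither
```

The second call shows the behaviour complained about earlier in this thread: an 8 cM segment alone is not a match, but it becomes one once the small segments are added to the total.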

Unlike 23andMe, no known error rate estimation is reported. Also note that their Family Finder matching routines exclude certain regions of certain chromosomes (specifically some small areas at telomeres and centromeres) and as a result there may not be a one-to-one correlation when looking at the data in GEDmatch.

AncestryDNA: Here we can see the least of what actually goes on under the hood. The simple statement is that the minimum threshold to be considered a match is 6cM, but they do count 3cM segments. Now, that said, the count is done after the application of their proprietary phasing algorithm called Underdog, and a second proprietary algorithm called Timber that uses genotyped modeling to attempt to filter out small regions of excess IBD sharing which are not useful for genealogy, i.e., not indicative of recent ancestry.

One outcome here is precisely what Derrick reported: you'll sometimes see a smaller total matching amount reported by AncestryDNA (which, alas, is all we can get from them) than GEDmatch will show shared in a single segment. The good news is that Ancestry is designed to computationally phase the data (as opposed to traditional trio phasing where the parents' data are known) and "condition" it via population genotyping before reporting matches, so the matches should theoretically be more accurate...less chaff in the wheat, in other words. But Ancestry provides us no segment-level detail, so we really can't tell what exactly is taking place or how the numbers directly compare to what we see in third-party reporting using the Ancestry raw data.

MyHeritage: Is almost as mysterious as Ancestry. From what I can tell, their threshold is 8cM for a single HIR, and then at least 6cM for each additional HIR segment. They do not report what minimum SNP threshold is in use. However, they also indicate (as of February this year) that they perform some type of computational phasing; but I've never found any detail about what the phasing entails or how it affects results. They also indicate that they use something that might be called "anti-Timber." Where Ancestry's Timber looks to filter out small regions of excess IBD sharing, MyHeritage seems to use an imputation algorithm that they refer to as a "stitching" process. It evidently looks for very small HIR segments that are very close to each other and may have missed the 6cM threshold only because of a few mismatched SNPs separating them. If the algorithm--using whatever proprietary criteria programmed--determines that the two very small segments really look like they should be a single segment, the matching routine will report it as such. This will also explain some differences you may see when running MyHeritage raw data through the unfiltered GEDmatch.

MyHeritage, similar to FTDNA, also makes some adjustments for certain populations. If the test-taker's ancestry is at least 50% Ashkenazi, 12cM is required for the first HIR.

GEDmatch: This is where I hope relatively newish WikiTreer Aaron Wells will chime in. Maybe we could offer kolaches and coffee to the room?

To my knowledge, GEDmatch does zero "pre-treatment" of any of the raw data. I believe it's what-you-see-is-what-you-get. Which is a great thing for us DNA nerds, but removes all of the let-us-help-you-with-that services from the testing companies. No Underdog; no Timber; no imputation; no genotyping; no phasing; no "stitching"; no exclusion of SNP-poor areas at centromeres and telomeres; no exclusion of pile-up regions of excess IBD sharing. It's up to the genealogist to understand how to evaluate the raw data. You can set your own search criteria at the centiMorgan, SNP count, and mismatch bunching limit levels, but there are no other filters or conditions applied (generally speaking; some utilities have less granular control--like the test for runs of homozygosity--but there's still no hidden math going on).

I've had reason the last couple of weeks to work almost exclusively with GSA chip results in Genesis, and I've come away believing that GEDmatch has decided not to apply any behind-the-scenes imputation in trying to arrive at better OmniExpress-to-GSA chip comparisons (since the SNP overlap is only 23% or less), but to report the unfiltered results while giving us clear indication of the number of SNPs compared and a sliding scale towards a red flag (well, literally a red background on a field) if we need to start being concerned that we may be working with too few in-common SNPs for the comparison at hand. I'd really like Aaron's input on that.

Because of the extended Genesis beta, I'd thought the direction would be an imputational model with best-guesswork employed by GEDmatch to get the two disparate chip types to compare. But if the end result will continue to be a stance of "just the facts" and not to mess with manipulating the data behind the scenes, I'm way good with that. But it means that, as it always should be, it's incumbent on us as genealogists to learn enough to make our own well-informed evaluations, analyses, and decisions, and to weigh what we consider to be an acceptable level of accuracy versus quickly adding an icon somewhere that indicates a degree of verification that might be totally false.

>> To my knowledge, GEDmatch does zero "pre-treatment" of any of the raw data. I believe it's what-you-see-is-what-you-get.

We do filter certain SNP's that don't contribute to genealogical matching, for example, SNP's that contain excessively rare alleles.

Our experience working with kits that obviously contained imputed and/or exomic data is that imputation does not help genealogical matching, in fact it contributes to many false matches.  We certainly don't do imputation ourselves, and we have no plans to.  

We believe that 7 cM as a default is a reasonable one.  We believe that the SNP threshold can be used as low as 200 SNP's in most cases, depending upon the chromosome and the location on the chromosome.  In Genesis, we don't apply a fixed SNP threshold.  In the Genesis One-to-One, we allow the user to specify a hard SNP threshold, but the default is to use a dynamically determined SNP threshold (same as batch processing).

Thanks Edison and Aaron for your excellent responses. I thought I knew a lot of details but I've learned something new from Edison's response!

The 200 SNP's threshold is very interesting, Aaron. I think I've been shot at many times for telling people that I work with a lower SNP threshold (I usually try to keep the ratio between genetic distance and SNP's constant at a 100x factor). To me, 5 cM and 500 SNP's is the minimum I check, but if I have people who either descend from a known MRCA or triangulate with others in the group, and they match only at the beginning or end of the segment, I also go down to a 1 cM segment whilst keeping the SNP's up (which only works well if there is a good overlap between the DTC's). Just so everyone understands the last part: there has to be a TG already, and if the density (meaning how well everyone is matching everyone else) is less than 100%, then I go down with the cM, or I do so if I can see that someone is matching only at the end or beginning.

I have an example of such a complicated match from 23andMe here where I'd love to examine the raw data myself: https://www.facebook.com/yourDNAfamily/posts/445243965889595?__xts__%5B0%5D=68.ARA-qJI4D5y994pCLrT90dSYaBFJNCtdlD8HSoKOUaJpANwe7_2pD7TodeotYoR-PuLx0m8ymKohwYsq0DsaThua5VHGoXAt-6x2b71GsLwAbIWF_XppMJkdF7rMtsHXsQHZhUW3Ah1msZXKkvQo50pzwtMy8dhiEiSHTLwasQXv5bU-C1w3&__tn__=-R
So, to echo Mags question above, should the threshold for triangulation groups be 7, 10, 12 or something else? Is any particular threshold arbitrary or well-founded? And should we just accept the 12 cM standard and get to work?
Mags,

I use Gedmatch standard values when doing triangulations. It seems pure and adds the various companies as needed, however they are calculated. Plus I love the pictures of intersection!

Sherrie

At the risk of exposing my own ignorance, could someone explain to me why the threshold for triangulation needs be any different to the threshold for an ordinary match?

If it shouldn't be (and I fail to see any logic that says it should - probably just me being dumb) then to follow the industry "standard" the threshold for both an ordinary 1:1 match and for a triangulation should be 6 (oops, you can't do triangulation at ancestry), 7, 8 or maybe 5 or 6 if there are other matching segments present.

Seven-ish sounds like a good number to me.

Is there a study that has shown the likelihood (%) of matches (shared by auDNA testers) sharing between about 6 to about 13 cM on a segment that are IBS vs IBD?  It is my understanding that a 12.5 cM segment has close to a 100% likelihood of being IBD.  A 7 cM segment has more than a 50% likelihood of being IBD.

What changes in the percentages are there if 3 or more testers share the same segment?

Hey Peter,

I have asked this in the ISOGG FB group as a poll and am getting a lot of good answers. I'll give it some time and I'll add the tallies to this post. Edison- one answer even included the "Industry Standard" comment - thought you would enjoy that.

Here is the poll question - "We don't have an Industry Standard (we don't have an Industry standard producing commission, board or governing body - yet) in Genetic Genealogy, but what is your absolute minimum cM threshold for confirming/determining/defining a (non-endogamous) match by a segment? Please post your absolute min. cM threshold in the comments."

And since Blaine was the first to join in, I asked him this, "Without recreating a post elsewhere Blaine, could you draw any conclusions on this from your Shared cM Project? Did you have a minimum limit?"

From Blaine Bettinger - "No, not from the Shared cM Project. However, there's a lot of research in this area, namely the 23andMe Paper from 2014 looking at segment size and IBD. There are also the studies by many different genealogists comparing lists of child matches to parent matches, and finding the sizes of false segments."

Links from Debbie Kennett: 

ISOGG, Speed and Balding IBD Distribution and Steve on Genetics, Genetic Genealogy and the Single Segment 

Mags

Since the ISOGG Facebook Group is a closed group (anyone can ask for admission, but you can't see what's posted without being a member), and because I believe my reply there is germane to this discussion, here's what I wrote:


I believe the poll question, as phrased, may be a little bit confusing. We may be trying to draw a correlation between apples and oranges.

Minimum single-segment validity is a biological matter: at what threshold can we have reasonable belief that a segment, as calculated in centiMorgans, is IBD. And let's not forget that cM is an estimation only, and that what we see from the reporting companies are sex-averaged values: the female and male genome maps differ considerably in terms of computed centiMorgans due to average crossovers at gametogenesis, and the smaller the sex-averaged value, the more likely--generally speaking--that a differentiation in the male/female inheritance chain might zero-out a presumed segment size. These are the thresholds the testing companies set in their algorithms (pre- and/or post- genotyping, computational phasing, imputation, etc.). Also important is whether or not you have the luxury of working with trio-phased data: if both your parents are tested, your trust in certain small segments may slide a notch lower on the scale than would mine.

But these minimum thresholds, in and of themselves, have no relevance in assignment to a hypothetical MRCA in someone's genealogy. The latter is a genealogical matter, not a biological one. An IBD segment could be the result of population- or haplotype-level pile-up regions, or one resulting from pedigree collapse in the tree (in which case, as within truly endogamous populations, assignment to a distant MRCA may become extremely difficult if not impossible), or one that has simply proven ancestrally persistent and may originate many generations earlier than believed.

There's no hard-and-fast rule here: if each individual segment has a reasonable probability of being IBD, matching and genealogical assignment then becomes case-by-case situational. If I find someone with whom I share three 7cM segments, I'm going to be more interested than in another match with whom I share a single 10cM segment. If I've built up a solid triangulation group for a particular segment that has many contributors (Jim Bartlett averages over 20 segments mapped per TG, as an example), I'll be more inclined to include a smaller segment because, as Jim points out, if you've already got 20 segments of various lengths indicating the validity of the segment and its inheritance linkage, including a 7cM segment that might ultimately prove false isn't going to tip the scales regarding the validity of that TG.

Thanks very much for adding your post from the ISOGG FB group Edison. It's a general question (on purpose) and it is eliciting lots of great responses - including yours.

Mags

+12 votes
If we have known 1st or 2nd cousins, I think we should not exclude them from a triangulation when we are trying to confirm a distant cousin(s). I can understand that including a parent/child or sibling match in a triangulation WITH ONLY THREE PEOPLE would be not valid and just a strong match because parent/child and full sibling matches are legal proof of relationship. But, grandparent/grandchild, aunt/uncle/niece/nephew matches (and beyond) are not legal proof. IT'S LUCK OF THE DRAW!

So, it looks like the closeness (or distance) of the relationship in the matches is being used to determine if a triangulation is valid or not.

First and second cousins are more distantly related than parent/child or siblings so, first and second cousins matches in a triangulation are closer to being a valid triangulation and, third cousin matches in a triangulation are even closer to being valid.

I think it adds to our confidence when confirming distant cousins if we have more distant cousin matches included with 1st, 2nd or 3rd cousin matches. The more distant the cousin we are trying to confirm, the smaller the cM value is likely to be.

So, I think for valid triangulations, just keep it simple and set a 7 cM limit, allow first cousin and greater matches but require at least one 3rd cousin match or greater and, of course, your dna confirmation must comply with the relationship finder.
answered by James Dukes G2G Crew (580 points)
edited by James Dukes

James Dukes, I totally agree with this in your statement.

"I think it adds to our confidence when confirming distant cousins if we have more distant cousin matches included with 1st, 2nd or 3rd cousin matches. The more distant the cousin we are trying to confirm, the smaller the cM value is likely to be."

The farther back we get and yet come up with shared bits of DNA, the better the line is confirmed.

As mentioned in a previous comment, some of us who are working on our older lines and are at 5th-6th cousin realm are very excited to find tiny bits of overlapping DNA!

I almost agree with you, but not quite. Including 1st or 2nd cousin matches depends entirely on how far away they are from the top of the triangle you are trying to establish.

The analogy I use in teaching this concept is a traditional three legged stool.

If all three legs are the same length, then the stool is stable no matter how long the legs (number of generations) are.

If the legs are different lengths then the stool is less stable and more likely to fall over.

If you've only got two legs, but one of them splits in two near the ground (1st cousins) then the stool is decidedly wobbly - unless they are very short legs.
I do agree with you that a stool with three uneven legs will be wobbly and likely fall over.

I am working with a group of people who have pooled their dna matches so we can work together to find our common ancestors. We do have many three legged stools in our findings but we also have many stools with more than three legs.

Some of these stools with more than three legs might have parent/child or aunt/uncle/niece/nephew matches along with 4th cousin or greater matches.

I am presently looking at a five legged stool that involves 1st, 2nd, 3rd and 4th cousin once removed matches.  These five legs (people) all match each other on the same segment for about 32 million base pairs. This causes me to think we have found a 4th cousin once removed and a common ancestor because our tree and speculative mix of source information do not argue against it.

But, if I remove the 1st, 2nd and 3rd cousin legs of the stool, the stool will surely fall over.
+10 votes

I have a set of DNA matches involving a third cousin once removed that have made me think about similar issues. DNA match data (from 23andMe) are available for:

  • G, the 3rd cousin 1R
  • E (me)
  • N (my sibling)
  • T (our first cousin, and also G's 3C 1R)

Summary of the data:

  • G and E match on 75 cM over 7 segments at 5, 6, 9, 12, 13, 14, and 16 cM.
  • G and N match on 38 cM over 4 segments at 6, 9, 11, and 12 cM.
  • If we treat E and N as one person (EN), G and EN match on 84 cM over 8 segments at 5, 6, 9, 9, 12, 13, 14, and 16 cM.
  • T and G match on one segment of 16 cM.
  • The segment where T and G match has a very short 3-way-matching overlap (less than 6 cM) with the 16 cM segment where G and E match -- way too short to triangulate.
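The "too short to triangulate" check in the last bullet comes down to measuring the stretch that all of the pairwise segments have in common. Here is a minimal Python sketch of that idea. The positions below are made up for illustration (the post gives segment sizes, not coordinates), the function name is hypothetical, and the 7 cM cutoff is just one of the thresholds discussed in this thread.

```python
def common_overlap(segments):
    """Length of the region shared by every (start, end) interval, else 0.0.

    segments: list of (start, end) positions on one chromosome, here in cM.
    Hypothetical helper for illustration.
    """
    start = max(s for s, _ in segments)   # latest start
    end = min(e for _, e in segments)     # earliest end
    return max(0.0, end - start)


if __name__ == "__main__":
    # Invented positions in cM along one chromosome, NOT taken from the post.
    g_e = (40.0, 56.0)   # a 16 cM segment, G vs E
    g_t = (51.0, 67.0)   # a 16 cM segment, G vs T
    e_t = (30.0, 70.0)   # E vs T, first cousins sharing a long segment

    shared = common_overlap([g_e, g_t, e_t])
    print("three-way overlap:", shared, "cM")
    print("meets a 7 cM cutoff:", shared >= 7.0)
```

With these invented positions, two solid 16 cM matches leave only a 5 cM three-way overlap, which is the kind of situation described above.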

So I feel like my excellent DNA match with G, supplemented by my sibling's match, supported by our first cousin's match with G, ought to provide DNA confirmation for our and G's relationship with the common ancestors we share. How much data like ours should it take to be called "confirmation"?

answered by Ellen Smith G2G6 Pilot (852k points)

The whole problem with this is the very idea that DNA matches such as yours "confirms" the relationship.

It simply cannot "confirm" it

It is however completely, absolutely and totally consistent with the relationships you describe.

I would really like for us to drop the label of DNA Confirmed and have one that says DNA Consistent instead.

If two brothers tested and they matched as they should, then the DNA match confirms that they are brothers. It does not confirm that the person they believe to be their father was actually their father - but it is consistent with that belief.

If G was my third cousin or my 2nd cousin once removed, my 78-cM match with G would be treated as "confirmed with DNA." (It's basically the expected match for a third cousin.) But because G is only a 3C 1R, I'm supposed to triangulate the match in order to treat it as confirmation.

Also, T's 16-cM matching segment with G is kind of low for a 3C 1R (but remember that some third cousins have no match at all), but two things stand out: (1) T and G do have a significant match, and (2) the short 3-way match is in an area where our matching segments overlap (and one boundary of that segment is the address where my matching segments with T and G both terminate). I think this additional matching information confirms (it doesn't conclusively prove this, but there's no such thing as conclusive proof in this area) that the relationship is real and is on the ancestral line it's mapped on. Also, the distribution of cM values from the three sets of matches (E to G, N to G, and T to G) narrows the range of statistically probable relationship distances (indicated by https://dnapainter.com/tools/sharedcmv4) to 3C to 5C (or equivalents).

Couldn't agree more, I've had the self same issue in some of my work. The DNA matches are completely consistent with the relationships you describe.

If we got rid of this silly notion that autosomal DNA proves (confirms) relationships, and instead used the notion of consistency - then the whole issue goes away. We could then have a set of more rational guidelines than those we have now.
Hello Derrick,

In WikiTree confirm means to make more firm.   That is one of the standard meanings of that word.  The known relationship is made more firm because the DNA match is consistent with that relationship.

WikiTree does not use the word prove, and one should not believe confirm can only mean prove.

Most sincerely, Peter
Two peoples divided by a common language?

Webster: to make firm or firmer

OED: establish the truth or correctness of

If, in wikitree, it is meant to mean "is consistent with" then why not just say that?

Unfortunately, a lot of people think that a DNA match is proof of a relationship, and using terminology such as "confirmed by" tends to reinforce that view even if it is not meant to.
I've confirmed and reconfirmed airline flight reservations and my flight did not take off.  There are people with confirmed flight reservations who have been pulled off flights.   Most people understand words can have more than one meaning.  

If proved was the intended meaning then why not say proved?  There are very rare instances where DNA can prove a relationship.  If a child takes an autosomal DNA test and their parent takes an autosomal test then the expected results are about as close to proof as you will get.  However their parent could also have an identical twin who is the real parent ;-)
At the risk of seeming a little tedious and pedantic, why not use a term that is unambiguous and does what it says on the tin?

As you've ably demonstrated, confirm can mean different things to different people (it obviously meant something different to you and the airline).

Much easier IMHO to use words that mean the same to everyone.
And what is your word Derrick?

Mags
Mags

I think that "consistent" means the same to everyone, as also does "supported".

Both are appropriate terms for what is actually true about auDNA matches- they can support the tree as drawn or be consistent with it, what they can't do is confirm it as in "establish the truth of".
Thanks very much Derrick. this is a great suggestion,

Mags
+7 votes
I vote to allow 3rd cousin and beyond to be included in DNA data findings for family groups.
answered by P. Kelly G2G Crew (420 points)
+8 votes
I am not a leader but am a member of the DNA project and have done successful triangulations for one of my lines using the standard. However,  I would like to point out that DISTANT cousins (like 5th-6th) with a proved paper trail are very excited to be able to reduce comparative values at Gedmatch and come up with overlaps on chromosomes. Tiny bits of shared DNA and we know that doesn't fit for a statement but we are thrilled to see it. As the database grows, I hope this will be a consideration for the future.
answered by Sherrie Mitchell G2G6 (9.8k points)
Excellent point, but it begs the question of just how far do you go?

Case in point from my Kinman one-name study. I have two donors, both with well researched trees that connect at only one point (to the best of our knowledge). They match each other 8.3 cM 1189 SNPs in a non-pile up area, predicted generations to MRCA by gedmatch 7.1.

This would seem to fall just about inside what you are talking about.

Problem is, the intersection on the tree is 12 generations back.

So is this a real match with someone born in 1560, or is it an artefact of the experimental technique, or an indication of a more recent intersect that we don't know about, or...?
Hi Derrick,

Appreciate your point, especially because these way back lines do often involve endogamy in the time and place, but they are not 12 gens back for us, generally we are 5th-7th cousins.  So, though we know it's not considered proper proof we are sharing and encouraging others to test. Some of us capture bits of DNA in common and that is very exciting to think of this tiny bit of DNA we share and inherited from these known ancestors. And depending on who shares what specific chromosome, we may be able to see and map how the DNA transfer flowed.

Sherrie
+2 votes
I’d love to have a conversation about DNA confirmation too. I saw a post a while back about shaping the DNA part of WikiTree (can’t find it now) and would love to be involved.
answered by Janettee McCrary G2G6 Mach 1 (10.5k points)
Thanks Janettee,

This is the conversation on minimum segment size, what are your thoughts?

It would be great to have your voice added to shaping DNA confirmation and other concepts and features on WikiTree.

Mags
Sorry for the delayed response, I was offline after responding. Here are my thoughts on the minimum segment size...

My father was adopted so I had NO information to start with, until DNA. So once I got my kit, I found myself overwhelmed at first, until I figured out how things work.

So of course, to confirm close relatives, you need a bigger cM count, but when you are going further back to confirm lines, the amounts get smaller and smaller. Personally, I've been able to confirm my many-times-great-grandparents and cousins with cM counts as low as 6 cM. I think a cM count of 5-6 cM for relatives further back (ALONGSIDE matches with higher ranges) is not unreasonable.
+9 votes
Just to add my two cents!

First, just the disclaimer that my understanding of "confirm" here is to add evidence to, NOT to "prove" (a dirty 5-letter word).

Second disclaimer, just to be sure we are on the same page, there's of course no *requirement* for segment triangulation when using DNA evidence. There are plenty of other ways to utilize DNA evidence, even without segment data, and even for distant relationships. It's excellent when we CAN add segment triangulation, but it isn't and cannot be a requirement.

To the important stuff:

1. I'm hesitant to establish a "standard" segment size for triangulation, as it should be science that defines these minimums, not consensus (unless it is consensus based on science!). Any discussion of minimum segment size has to include the available information about the probability that a segment of size X is IBD vs. non-IBD. There's some peer-reviewed evidence out there, namely a 2014 paper from 23andMe as well as the oft-repeated analysis by genealogists of the lack of overlap between child & parent matches (and its relationship to segment size). A minimum segment size would have to account for this.

For example, if I write a paper using DNA evidence to support a genealogical conclusion that John and Jane Doe are my 3rd-great-grandparents, and that DNA evidence is based on a 5 cM triangulated segment, I have a serious problem there. I may have poisoned my conclusion with a non-IBD segment.

Unfortunately, there is NO evidence that triangulating a segment of X cM increases the likelihood that the segment is IBD. It makes perfect sense that it should, and I'm guessing that it will, but of course science doesn't work based on 'perfect sense' and 'guessing.'

And, of course, finding a genealogical connection with someone does not increase the likelihood that a segment of X cM is IBD. That's confirmation bias, again without any scientific basis.

2. Second, I believe that any standard for adding DNA evidence to a genealogical conclusion must account for the case-by-case nature of genealogical research. The amount of DNA evidence used in a conclusion will typically be dependent upon the amount of documentary evidence used in the conclusion. Although this is decided case by case, the more documentary evidence there is, the less DNA evidence may be necessary, while the less documentary evidence there is, the more DNA evidence may be necessary.

Blaine
answered by Blaine Bettinger G2G1 (1.7k points)
Thanks very much Blaine, always a pleasure to have you stop by and give your two cents to any discussions about our great big ole shared tree!

Mags
+2 votes

Some thoughts:

  • This is long, maybe even too crazy for some, so you're excused for not reading it
  • A triangulation as specified by WikiTree is a special case of matching, designed to create a sufficient level of confidence in proposed ancestral lines to confirm their relationships, but limited to specific cases where 3 independent testers match through 3 separate lineages of equal length back to one common person or couple.  This triangulation is required by WikiTree when the relationship distance between 2 testers is too great to provide a segment length essentially guaranteed to be IBD.  Because genealogy is so often messy, it's difficult to apply this simplistic set of triangulation rules to real world situations, so there is constant controversy over the many situations that don't fit, yet 'feel' just as provable.
  • I'd like to suggest we back up and look at the broader problem of what we are trying to achieve.  Currently, many people assume that matching is all about segment lengths - find a segment of an arbitrary length and you match, confirming the relationship.  But length is not the fundamental thing here, the statistics are.  What we want is a sufficiently high statistical probability, perhaps an arbitrarily determined number such as 99%, that provides sufficient confidence that we can say 'they match'.  Looking at it this way, we can see that length is an important factor, because it's proportional to the probabilities, but it's not the only factor!  Any other factor that raises the probability of the certainty of the match is also useful.  Why do we need a probability at all?  Because some segments are good and valid (IBD), and others are not, matching only by chance (IBC).  So any method that determines validity of a segment, regardless of its length, is valid and useful.
  • Phasing clearly identifies IBD segments, so any segment that has been 'proven' by phasing is a 100% match, therefore length rules no longer apply.  However, both testers would need to phase the segment at their end.
  • Smaller segments that are part of a proven longer segment should be valid, and not require a specific length.  For example, a great great passes a 15 cM segment down, and the great passes it down, then the grand, but the parent only passes a third of it.  If you have some way to prove the segment to the greats and grands, then you know the segment is IBD, and any part thereof is also IBD.  But see the following for one complication ...
  • Those very big segments you've been working with for matching ... because all big segments are collections of small segments, that big segment may actually be a set of both IBD segments and one or more IBC segments in the middle of them.
  • Another way to validate a segment, especially for a smaller one, is to find repeats of it, because while the chance a small segment is valid by itself is low, finding it repeated in multiple people increases its probability of IBD.  My rule for this is - it must occur in a significant number of people in a specific lineage, *and* it must not occur in *any other* lineage.  The shorter the segment the lower its probability of validity, therefore the shorter the segment the more occurrences of it you need to find.  What the minimum counts should be requires a good scientifically designed study, and I don't know what good numbers for it should be.  A starting number might be 2 testers plus an additional tester for each cM below 10 cM (e.g. 3 for 9cM, 5 for 7cM, 7 for 5cM, 11 for 1cM).  The point is, finding the same segment in multiple people down the same lineage (and only the same lineage) is sufficiently improbable that it adds to the statistical probability that the segment is valid.  Finding enough of them makes it an IBD segment.
  • While these suggestions may sound too new for acceptance, I believe to some extent, most have thought there must be some additional validity to certain segments they were finding.  I believe in Ellen Smith's example above, she was intuiting that those short legs may not meet the rules, but surely carry some weight toward an increased level of matching confidence.  And I believe she's right, shorter triangulation legs do carry added weight, not as much as a full length leg, but they are certainly worth something, even if it's harder to quantify.  If nothing else, they're additional repeats, which increases their probability of being IBD.
  • The quantification of these additive and subtractive factors is clearly a problem.  (By subtractive factors, I mean things such as determining degrees of common ancestry or endogamy, pile-ups, etc.)  I know some have already complained about the complications of the current Triangulation requirements, but unfortunately that's a simple case compared to the full problem.  A solution to determining a probability from a given set of relationships and testers will consist of a large equation with multiple terms, some additive and some subtractive, plus the correct coefficients to apply to each, based on relationships and segment lengths, and tables such as Blaine's Shared CM Project, numbers modified by his histograms (extrapolating within the ranges per box).  I'd love to see some mathematicians among us determining how to define and combine the various terms.
  • When I think practically how this could be implemented, I struggle, because it's going to be hard, and probably requires more resources than are available.  It requires a mathematician, considerable development effort, some integration with GEDmatch, and would benefit from Blaine's involvement.  So the whole idea seems very unlikely to happen, but since we're brainstorming here, I couldn't help thinking that the ideal project would involve the WikiTree staff getting together with the GEDmatch staff, plus perhaps Blaine as science adviser.  Who else but WikiTree has all of the relationships available and the GEDmatch ID's!  GEDmatch developers have much of the expertise, and combined with the WikiTree developers, could create an absolutely groundbreaking tool that analyses sections of the WikiTree, and automatically determines DNA confirmations!  It could mark as 'DNA Confirmed' any close or distant relationship it could determine a 99% probability for.
  • A huge side benefit is that all user determined 'DNA Confirmations' could be turned off, removing all user controversy.  They would be determined strictly under WikiTree control.
  • One last problem - DNA is only involved with matching testers, and has absolutely nothing to do with the ancestral trails between them.  All DNA can do for distant relationships is calculate the probabilities for that relationship given a specific set of ancestors between them.  There are multiple trails for every distant relationship, and DNA cannot prove which is right, so it is still imperative that the paper trail is as correct as possible.  For example, a trail could go through a certain father.  DNA cannot prove it goes through him any more than through his brother.  If the calculation indicates a 100% validity of a match through that father, then it will also indicate a 100% validity of a match through that father's brother.  In my view, relationships being marked 'DNA Confirmed' should require birth documents for every person on that relationship trail, that indicate the parent/child relationships.
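
The starting rule floated in the list above (2 testers plus an additional tester for each cM below 10 cM) can be written out as a small sketch. This only illustrates that arithmetic; it is not a validated threshold, and the post itself says the real numbers need a properly designed study. The function name `min_testers`, its parameters, and the choice to round a fractional shortfall up to the next whole tester are assumptions made for the example, not part of the suggestion.

```python
import math


def min_testers(segment_cm: float, base_testers: int = 2, full_cm: float = 10.0) -> int:
    """Return the suggested minimum number of testers sharing a segment.

    Hypothetical helper: encodes "2 testers plus one more for each cM
    below 10 cM". Rounding a fractional shortfall up is an assumption.
    """
    if segment_cm <= 0:
        raise ValueError("segment length must be positive")
    # Segments at or above the full length need no extra testers.
    shortfall = max(0.0, full_cm - segment_cm)
    return base_testers + math.ceil(shortfall)


if __name__ == "__main__":
    # Reproduces the examples given in the list: 9, 7, 5 and 1 cM.
    for cm in (9, 7, 5, 1):
        print(f"{cm} cM -> {min_testers(cm)} testers")
```

Under these assumptions, an 8.3 cM segment like the one in the Kinman example earlier in the thread would call for 4 testers, and anything of 10 cM or more stays at 2.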
answered by Rob Jacobson G2G6 Mach 7 (73.4k points)
