Can we hold a meeting with the leaders of the DNA Project?

I have received several inquiries about the requirements to DNA confirm ancestors. I understand the importance of this, and I also understand the difference between a 3rd cousin (and closer) and beyond a 3rd cousin.

My questions are these: Is there a cM value we are looking for as of October 2018 to use in triangulation groups? What is this number and how did we come up with it? Can we use two third cousins and one 4th cousin in a TG? If not, why? A single segment should all lead to common ancestor(s) between the three cousins.

I believe a 3rd cousin can flip back and forth between a statement that requires only one other person who is 3rd (full) or closer and then a 3rd cousin (again full) with another 3rd and then their 4th. If no DNA is shared between 3rd cousins (happens 10% of the time) then that would prevent any statement between the two. However, if a single segment exists that matches all three cousins, then to me this defines a triangulation group (TG).

I don't feel a TG is necessary for cousins closer than 3rd, but essential for beyond. My vote is to set the TG threshold at 10 cM and then for every additional cousin in the TG beyond 3 people, we lower by 1 cM, no lower than 7 cM. So 3 people = 10 cM, 4 = 9 cM, 5 = 8 cM, and then 6+ = 7 cM.
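The sliding threshold proposed above can be written down as a small function. This is only a sketch of the proposal as stated in this post; the function name `tg_threshold_cm` is made up for illustration and is not part of any WikiTree tool.

```python
def tg_threshold_cm(group_size: int) -> int:
    """Proposed minimum shared-segment size (in cM) for a triangulation group.

    Encodes the proposal in the post: 3 people = 10 cM, 4 = 9 cM,
    5 = 8 cM, and 6 or more = 7 cM.
    """
    if group_size < 3:
        raise ValueError("a triangulation group needs at least 3 people")
    # Start at 10 cM, drop 1 cM per person beyond three, never below 7 cM.
    return max(7, 10 - (group_size - 3))


for n in range(3, 8):
    print(f"{n} people -> {tg_threshold_cm(n)} cM")
```

Writing it this way makes the floor explicit: adding a seventh or eighth person does not lower the bar any further.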

With six people who triangulate you could have two separate groups of 3rd cousins and then each group could be 4th cousins to each other. How much more evidence does one need than a single segment of 7 cM between 6 distantly related people?

I welcome all opinions and thoughts on this matter. I'm looking forward to a healthy discussion and hopefully within a few weeks, we can all come to an agreement on what steps we need to take in confirming lines of DNA. I don't have all the answers, but hopefully after this discussion, we will all be enlightened.

Genetic Genealogist
Christopher "Topher Blake" Sims
Facebook Topher Blake

asked Oct 11, 2018 in WikiTree Tech by Topher Sims G2G6 Mach 1 (12.5k points)

8 Answers

Andreas West · Answer 1 · 2018-10-11T21:15:42+0000

Hey Topher,

It would be great to have your input into how we frame our DNA features on WikiTree.

From our DNA Help Pages:

Triangulation

If an autosomal DNA test match is beyond a third cousin, confirmation gets complicated. You need at least one more match and all three of you need to match on the exact same segment of DNA.

We want to make sure that we are working with three separate legs to a triangulated MRCA.

The general, acceptable, talked about, used minimum threshold is 7cM. However the information on the auDNA triangulation Page states: "At least three distant cousins (each one related to each of the others more distantly than third cousin) must share the same segment of DNA that is at least 12 cM."
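To make the "same segment" requirement concrete, here is a rough sketch of an overlap check for three pairwise matches (A-B, A-C, B-C). It assumes segment boundaries are already expressed as positions in cM on a genetic map, which is a simplification: the testing sites usually report base-pair positions plus a cM length. The `Segment` type and both function names are hypothetical, and `min_cm` is left as a parameter because the thread quotes both 7 cM and 12 cM.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start_cm: float  # assumed: position on a genetic map, in cM
    end_cm: float


def shared_overlap(a: Segment, b: Segment, c: Segment) -> Optional[Segment]:
    """Return the region common to all three segments, or None if there is none."""
    if not (a.chromosome == b.chromosome == c.chromosome):
        return None
    start = max(a.start_cm, b.start_cm, c.start_cm)
    end = min(a.end_cm, b.end_cm, c.end_cm)
    if end <= start:
        return None
    return Segment(a.chromosome, start, end)


def triangulates(ab: Segment, ac: Segment, bc: Segment, min_cm: float = 7.0) -> bool:
    """True if the three pairwise matches overlap by at least min_cm."""
    overlap = shared_overlap(ab, ac, bc)
    return overlap is not None and (overlap.end_cm - overlap.start_cm) >= min_cm


# Made-up example: the three-way overlap runs from 44 to 55, i.e. 11 cM.
ab = Segment("5", 40.0, 58.0)
ac = Segment("5", 44.0, 60.0)
bc = Segment("5", 42.0, 55.0)
print(triangulates(ab, ac, bc, min_cm=7.0))   # True
print(triangulates(ab, ac, bc, min_cm=12.0))  # False
```

The same made-up group passes at 7 cM and fails at 12 cM, which is exactly why the choice of threshold matters.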

Which segment Threshold is correct? It depends on who you ask - for deeply endogamous groups it should be 20cM or more...

Be great to have some discussions on this to see how the community lines up.

Do you work with a threshold of 7cM?

Do you work with a threshold of 12cM?

What threshold do you use?

Answers from this post and the ISOGG poll (wish we could do polls here!):

7 to 10cM	68 votes
10 to 12cM	15 votes
5 to 7cM	13 votes
15 to 20cM	10 votes
20cM +	9 votes
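For anyone who wants the tallies as shares of the total, a quick sketch using only the vote counts listed above (the band labels are just dictionary keys chosen for this example):

```python
# Vote counts copied from the tallies in this post.
poll = {
    "7 to 10 cM": 68,
    "10 to 12 cM": 15,
    "5 to 7 cM": 13,
    "15 to 20 cM": 10,
    "20 cM +": 9,
}

total = sum(poll.values())
shares = {band: round(100 * votes / total, 1) for band, votes in poll.items()}

print(f"total votes: {total}")
for band, pct in shares.items():
    print(f"{band:<12} {poll[band]:>3} votes  {pct:>5.1f}%")
```

That works out to 115 votes in all, with the 7 to 10 cM band taking roughly 59% of them.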

Mags

answered Oct 11, 2018 by Mags Gaulden G2G6 Pilot (642k points)
edited Oct 16, 2018 by Mags Gaulden

Thanks for that Andreas, I've never used 23andMe and wondered what their hurdle rate was.

Small addition to your comment about FTDNA. They seem to use a very weird double standard. You don't match with someone else unless you match at least 7cM, but then they add in all those teeny tiny matches down to about 1cM and report that you match 20cM or so (7 plus lots of bits). Either those little matches mean something or they don't - you can't have it both ways.

Ancestry goes down to 6 cM and does not include the irrelevant snippets.

There is also a discrepancy between the rubber bands being used to measure cM between some of the providers.

Example: a match between two of my project members is reported as 6.1 cM at ancestry but 8.3 cM at gedmatch - this may well vary from chromosome to chromosome, I haven't explored it thoroughly.

ftDNA seem to use almost the same algorithm as gedmatch, at least on the ones I've checked.

Industry standard ??????

commented Oct 14, 2018 by Derrick Watson G2G6 Mach 4 (48.9k points)

I told myself I would sit this one out, but...

I wholeheartedly agree with Andreas and Derrick that the term "industry standard" has no place in this landscape, and is a misconception when applied to genetic genealogy in general. I'm going to try to convince myself not to write more about it. Meanwhile, a synopsis of how some of the testing companies state they perform matching, if they state it at all.

23andMe: 7cM with a minimum 700 SNPs for the first half-identical region (HIR); 5cM and a minimum 700 SNPs for each additional HIR. The error rate allowance seems to be pegged at roughly 1%: 1 opposite homozygote per 300 SNPs, and each opposite homozygote in an HIR must be separated by roughly 300 SNPs. For fully-identical regions (FIR), the threshold is 5cM and 500 SNPs.

Note also that 23andMe adds X-chromosome segment matching into the autosomal total count, and no other company does that. They do not presume a match if only the X-chromosome and none of the autosomes match, but if there is an autosomal match the criteria are:

Male-to-male: 1cM, minimum 200 SNPs
Male-to-female: 6cM, minimum 600 SNPs
Female-to-female: 6cM, minimum 1,200 SNPs
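The 23andMe criteria as summarized in this comment can be sketched as two small checks. This is a reading of the numbers quoted here, not 23andMe's actual code; the function names are hypothetical, the opposite-homozygote error allowance is left out, and treating "first HIR" as "any HIR that meets 7 cM and 700 SNPs" is an assumption.

```python
from typing import List, Tuple

Hir = Tuple[float, int]  # (length in cM, SNP count) for one half-identical region

# X-chromosome thresholds quoted above, keyed by the sorted pair of sexes.
X_THRESHOLDS = {
    ("male", "male"): (1.0, 200),
    ("female", "male"): (6.0, 600),
    ("female", "female"): (6.0, 1200),
}


def counted_hirs(hirs: List[Hir]) -> List[Hir]:
    """Return the autosomal HIRs that would count, or [] if there is no match."""
    # Assumption: at least one HIR must reach 7 cM and 700 SNPs to open the match.
    if not any(cm >= 7.0 and snps >= 700 for cm, snps in hirs):
        return []
    # Once the match is open, every HIR of 5 cM and 700 SNPs is counted.
    return [(cm, snps) for cm, snps in hirs if cm >= 5.0 and snps >= 700]


def x_segment_counts(sex_a: str, sex_b: str, cm: float, snps: int,
                     autosomal_match: bool) -> bool:
    """True if an X segment would be added to the total for this pair."""
    if not autosomal_match:
        return False  # X-only sharing is not treated as a match
    min_cm, min_snps = X_THRESHOLDS[tuple(sorted((sex_a, sex_b)))]
    return cm >= min_cm and snps >= min_snps


print(counted_hirs([(8.0, 900), (5.5, 750), (4.0, 800)]))
print(x_segment_counts("male", "female", 6.5, 650, autosomal_match=True))
```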

FTDNA: Their formerly useful "Learning Center" sitemap (https://www.familytreedna.com/learn/sitemap/) was razed sometime in the past few days. One can only hope it's because FTDNA is correcting and expanding it, not abandoning it. That said, the previous information that I'd found shows a multiple-choice sort of criteria:

1) 9cM with a minimum 500 SNPs for a single HIR, regardless of the total amount shared.
2) If there is no segment of at least 9cM within 500 SNPs, the single-segment threshold is reduced to 7.69cM if there is a combined-segment total of at least 20cM, which total then includes very small HIRs between 1cM and 7cM (which kinda drives me nuts).
3) Only for specific but undefined non-European populations, 5.5cM with a minimum 500 SNPs for the first HIR. This seems to be applicable to only 1% or less of the customer base.
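The first two FTDNA criteria listed above can be sketched the same way. Again this only encodes the numbers as quoted in this comment, under a few assumptions: the function name is made up, criterion 3 is left out because the populations it applies to are not defined, and no SNP minimum is applied to criterion 2 because none is stated for it.

```python
from typing import List, Tuple

Hir = Tuple[float, int]  # (length in cM, SNP count)


def ftdna_style_match(hirs: List[Hir], min_snps: int = 500) -> bool:
    """Apply criteria 1 and 2 as described in the comment above."""
    # Criterion 1: a single HIR of at least 9 cM with at least 500 SNPs.
    if any(cm >= 9.0 and snps >= min_snps for cm, snps in hirs):
        return True
    # Criterion 2: longest HIR of at least 7.69 cM, plus a combined total of
    # at least 20 cM that includes the very small HIRs down to 1 cM.
    longest = max((cm for cm, _ in hirs), default=0.0)
    total = sum(cm for cm, _ in hirs if cm >= 1.0)
    return longest >= 7.69 and total >= 20.0


print(ftdna_style_match([(9.5, 600)]))
print(ftdna_style_match([(8.0, 600), (3.0, 300), (3.0, 300), (3.0, 300), (3.5, 300)]))
print(ftdna_style_match([(8.0, 600)]))
```

The second call is the "7 plus lots of bits" situation Derrick describes: one 8 cM segment only counts as a match because the small pieces push the total past 20 cM.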

Unlike 23andMe, no known error rate estimation is reported. Also note that their Family Finder matching routines exclude certain regions of certain chromosomes (specifically some small areas at telomeres and centromeres) and as a result there may not be a one-to-one correlation when looking at the data in GEDmatch.

AncestryDNA: Here we can see the least of what actually goes on under the hood. The simple statement is that the minimum threshold to be considered a match is 6cM, but they do count 3cM segments. Now, that said, the count is done after the application of their proprietary phasing algorithm called Underdog, and a second proprietary algorithm called Timber that uses genotyped modeling to attempt to filter out small regions of excess IBD sharing which are not useful for genealogy, i.e., not indicative of recent ancestry.

One outcome here is precisely what Derrick reported: you'll sometimes see a smaller total matching amount reported by AncestryDNA (which, alas, is all we can get from them) than GEDmatch will show shared in a single segment. The good news is that Ancestry is designed to computationally phase the data (as opposed to traditional trio phasing where the parents' data are known) and "condition" it via population genotyping before reporting matches, so the matches should theoretically be more accurate...less chaff in the wheat, in other words. But Ancestry provides us no segment-level detail, so we really can't tell what exactly is taking place or how the numbers directly compare to what we see in third-party reporting using the Ancestry raw data.

MyHeritage: Is almost as mysterious as Ancestry. From what I can tell, their threshold is 8cM for a single HIR, and then at least 6cM for each additional HIR segment. They do not report what minimum SNP threshold is in use. However, they also indicate (as of February this year) that they perform some type of computational phasing; but I've never found any detail about what the phasing entails or how it affects results. They also indicate that they use something that might be called "anti-Timber." Where Ancestry's Timber looks to filter out small regions of excess IBD sharing, MyHeritage seems to use an imputation algorithm that they refer to as a "stitching" process. It evidently looks for very small HIR segments that are very close to each other and may have missed the 6cM threshold only because of a few mismatched SNPs separating them. If the algorithm--using whatever proprietary criteria programmed--determines that the two very small segments really look like they should be a single segment, the matching routine will report it as such. This will also explain some differences you may see when running MyHeritage raw data through the unfiltered GEDmatch.

MyHeritage, similar to FTDNA, also makes some adjustments for certain populations. If the test-taker's ancestry is at least 50% Ashkenazi, 12cM is required for the first HIR.

GEDmatch: This is where I hope relatively newish WikiTreer Aaron Wells will chime in. Maybe we could offer kolaches and coffee to the room?

To my knowledge, GEDmatch does zero "pre-treatment" of any of the raw data. I believe it's what-you-see-is-what-you-get. Which is a great thing for us DNA nerds, but removes all of the let-us-help-you-with-that services from the testing companies. No Underdog; no Timber; no imputation; no genotyping; no phasing; no "stitching"; no exclusion of SNP-poor areas at centromeres and telomeres; no exclusion of pile-up regions of excess IBD sharing. It's up to the genealogist to understand how to evaluate the raw data. You can set your own search criteria at the centiMorgan, SNP count, and mismatch bunching limit levels, but there are no other filters or conditions applied (generally speaking; some utilities have less granular control--like the test for runs of homozygosity--but there's still no hidden math going on).

I've had reason the last couple of weeks to work almost exclusively with GSA chip results in Genesis, and I've come away believing that GEDmatch has decided not to apply any behind-the-scenes imputation in trying to arrive at better OmniExpress-to-GSA chip comparisons (since the SNP overlap is only 23% or less), but to report the unfiltered results while giving us clear indication of the number of SNPs compared and a sliding scale towards a red flag (well, literally a red background on a field) if we need to start being concerned that we may be working with too few in-common SNPs for the comparison at hand. I'd really like Aaron's input on that.

Because of the extended Genesis beta, I'd thought the direction would be an imputational model with best-guesswork employed by GEDmatch to get the two disparate chip types to compare. But if the end result will continue to be a stance of "just the facts" and not to mess with manipulating the data behind the scenes, I'm way good with that. But it means that, as it always should be, it's incumbent on us as genealogists to learn enough to make our own well-informed evaluations, analyses, and decisions, and to weigh what we consider to be an acceptable level of accuracy versus quickly adding an icon somewhere that indicates a degree of verification that might be totally false.

commented Oct 14, 2018 by Edison Williams G2G6 Pilot (441k points)

>> To my knowledge, GEDmatch does zero "pre-treatment" of any of the raw data. I believe it's what-you-see-is-what-you-get.

We do filter certain SNP's that don't contribute to genealogical matching, for example, SNP's that contain excessively rare alleles.

Our experience working with kits that obviously contained imputed and/or exomic data is that imputation does not help genealogical matching, in fact it contributes to many false matches. We certainly don't do imputation ourselves, and we have no plans to.

We believe that 7 cM as a default is a reasonable one. We believe that the SNP threshold can be used as low as 200 SNP's in most cases, depending upon the chromosome and the location on the chromosome. In Genesis, we don't apply a fixed SNP threshold. In the Genesis One-to-One, we allow the user to specify a hard SNP threshold, but the default is to use a dynamically determined SNP threshold (same as batch processing).

commented Oct 14, 2018 by Aaron Wells G2G Rookie (100 points)

Thanks Edison and Aaron for your excellent responses. I thought I knew a lot of details but I've learned something new from Edison's response!

The 200 SNP's threshold is very interesting, Aaron. I think I've been shot at many times for telling people that I work with a lower SNP threshold (I usually try to keep the ratio between genetic distance and SNPs constant at a factor of 100). To me, 5 cM and 500 SNPs is the minimum I check. But if I have people who either descend from a known MRCA or triangulate with others in the group, and who match only at the beginning or end of the segment, I also go down to a 1 cM segment whilst keeping the SNPs up (which only works well if there is a good overlap between the DTCs). Just so everyone understands that last part: there has to be a TG already, and I go down with the cM only if the density (meaning how well everyone matches everyone else) is less than 100%, or if I can visualize that someone is, as I wrote, matching only at the end or beginning.

I have an example of such a complicated match from 23andMe here, where I'd love to examine the raw data myself: https://www.facebook.com/yourDNAfamily/posts/445243965889595?__xts__%5B0%5D=68.ARA-qJI4D5y994pCLrT90dSYaBFJNCtdlD8HSoKOUaJpANwe7_2pD7TodeotYoR-PuLx0m8ymKohwYsq0DsaThua5VHGoXAt-6x2b71GsLwAbIWF_XppMJkdF7rMtsHXsQHZhUW3Ah1msZXKkvQo50pzwtMy8dhiEiSHTLwasQXv5bU-C1w3&__tn__=-R

commented Oct 14, 2018 by Andreas West G2G6 Mach 7 (75.9k points)

Hey Peter,

I have asked this in the ISOGG FB group as a poll and am getting a lot of good answers. I'll give it some time and I'll add the tallies to this post. Edison - one answer even included the "Industry Standard" comment - thought you would enjoy that.

Here is the poll question - "We don't have an Industry Standard (we don't have an Industry standard producing commission, board or governing body - yet) in Genetic Genealogy, but what is your absolute minimum cM threshold for confirming/determining/defining a (non-endogamous) match by a segment? Please post your absolute min. cM threshold in the comments."

And since Blaine was the first to join in, I asked him this, "Without recreating a post elsewhere Blaine, could you draw any conclusions on this from your Shared cM Project? Did you have a minimum limit?"

From Blaine Bettinger - "No, not from the Shared cM Project. However, there's a lot of research in this area, namely the 23andMe Paper from 2014 looking at segment size and IBD. There are also the studies by many different genealogists comparing lists of child matches to parent matches, and finding the sizes of false segments."

Links from Debbie Kennett:

ISOGG, Speed and Balding IBD Distribution and Steve on Genetics, Genetic Genealogy and the Single Segment

Mags

commented Oct 15, 2018 by Mags Gaulden G2G6 Pilot (642k points)
edited Oct 15, 2018 by Mags Gaulden

Since the ISOGG Facebook Group is a closed group (anyone can ask for admission, but you can't see what's posted without being a member), and because I believe my reply there is germane to this discussion, here's what I wrote:

I believe the poll question, as phrased, may be a little bit confusing. We may be trying to draw a correlation between apples and oranges.

Minimum single-segment validity is a biological matter: at what threshold can we have reasonable belief that a segment, as calculated in centiMorgans, is IBD. And let's not forget that cM is an estimation only, and that what we see from the reporting companies are sex-averaged values: the female and male genome maps differ considerably in terms of computed centiMorgans due to average crossovers at gametogenesis, and the smaller the sex-averaged value, the more likely--generally speaking--that a differentiation in the male/female inheritance chain might zero-out a presumed segment size. These are the thresholds the testing companies set in their algorithms (pre- and/or post- genotyping, computational phasing, imputation, etc.). Also important is whether or not you have the luxury of working with trio-phased data: if both your parents are tested, your trust in certain small segments may slide a notch lower on the scale than would mine.

But these minimum thresholds, in and of themselves, have no relevance in assignment to a hypothetical MRCA in someone's genealogy. The latter is a genealogical matter, not a biological one. An IBD segment could be the result of population- or haplotype-level pile-up regions, or one resulting from pedigree collapse in the tree (in which case, as within truly endogamous populations, assignment to a distant MRCA may become extremely difficult if not impossible), or one that has simply proven ancestrally persistent and may originate many generations earlier than believed.

There's no hard-and-fast rule here: if each individual segment has a reasonable probability of being IBD, matching and genealogical assignment then becomes case-by-case situational. If I find someone with whom I share three 7cM segments, I'm going to be more interested than in another match with whom I share a single 10cM segment. If I've built up a solid triangulation group for a particular segment that has many contributors (Jim Bartlett averages over 20 segments mapped per TG, as an example), I'll be more inclined to include a smaller segment because, as Jim points out, if you've already got 20 segments of various lengths indicating the validity of the segment and its inheritance linkage, including a 7cM segment that might ultimately prove false isn't going to tip the scales regarding the validity of that TG.

commented Oct 16, 2018 by Edison Williams G2G6 Pilot (441k points)

Answer 2 · 2018-10-11T23:55:19+0000

If we have known 1st or 2nd cousins, I think we should not exclude them from a triangulation when we are trying to confirm a distant cousin(s). I can understand that including a parent/child or sibling match in a triangulation WITH ONLY THREE PEOPLE would not be valid and would be just a strong match, because parent/child and full sibling matches are legal proof of relationship. But grandparent/grandchild, aunt/uncle/niece/nephew matches (and beyond) are not legal proof. IT'S LUCK OF THE DRAW!

So, it looks like the closeness (or distance) of the relationship in the matches is being used to determine if a triangulation is valid or not.

First and second cousins are more distantly related than parent/child or siblings so, first and second cousins matches in a triangulation are closer to being a valid triangulation and, third cousin matches in a triangulation are even closer to being valid.

I think it adds to our confidence when confirming distant cousins if we have more distant cousin matches included with 1st, 2nd or 3rd cousin matches. The more distant the cousin we are trying to confirm, the smaller the cM value is likely to be.

So, I think for valid triangulations, just keep it simple and set a 7 cM limit, allow first cousin and greater matches but require at least one 3rd cousin match or greater and, of course, your dna confirmation must comply with the relationship finder.

Answer 3 · 2018-10-12T02:51:10+0000

I have a set of DNA matches involving a third cousin once removed that have made me think about similar issues. DNA match data (from 23andMe) are available for:

G, the 3rd cousin 1R
E (me)
N (my sibling)
T (our first cousin, and also G's 3C 1R)

Summary of the data:

G and E match on 75 cM over 7 segments at 5, 6, 9, 12, 13, 14, and 16 cM.
G and N match on 38 cM over 4 segments at 6, 9, 11, and 12 cM.
If we treat E and N as one person (EN), G and EN match on 84 cM over 8 segments at 5, 6, 9, 9, 12, 13, 14, and 16 cM.
T and G match on one segment of 16 cM.
The segment where T and G match has a very short 3-way-matching overlap (less than 6 cM) with the 16 cM segment where G and E match -- way too short to triangulate.
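The totals in the summary above are easy to check, and it is also easy to see how much of each match survives a minimum segment size. The sketch below uses only the segment lengths listed in this post; the 7 cM cut-off is taken from the threshold discussion earlier in the thread, and the helper name is made up.

```python
# Segment lengths in cM, copied from the summary above.
matches = {
    "G-E": [5, 6, 9, 12, 13, 14, 16],
    "G-N": [6, 9, 11, 12],
    "G-EN": [5, 6, 9, 9, 12, 13, 14, 16],
    "G-T": [16],
}


def summarize(segments, min_cm=0):
    """Return (number of segments, total cM) for segments of at least min_cm."""
    kept = [cm for cm in segments if cm >= min_cm]
    return len(kept), sum(kept)


for pair, segments in matches.items():
    count, total = summarize(segments)
    count7, total7 = summarize(segments, min_cm=7)
    print(f"{pair}: {count} segments, {total} cM; "
          f"{count7} segments, {total7} cM at 7 cM or more")
```

The unfiltered totals come out at 75, 38, 84 and 16 cM, matching the figures in the post. With a 7 cM floor the G-E match drops to 64 cM over 5 segments and the G-N match to 32 cM over 3.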

So I feel like my excellent DNA match with G, supplemented by my sibling's match, supported by our first cousin's match with G, ought to provide DNA confirmation for our and G's relationship with the common ancestors we share. How much data like ours should it take to be called "confirmation"?

answered Oct 12, 2018 by Ellen Smith G2G Astronaut (1.5m points)

If G was my third cousin or my 2nd cousin once removed, my 78-cM match with G would be treated as "confirmed with DNA." (It's basically the expected match for a third cousin.) But because G is only a 3C 1R, I'm supposed to triangulate the match in order to treat it as confirmation.

Also, T's 16-cM matching segment with G is kind of low for a 3C 1R (but remember that some third cousins have no match at all). Still, (1) T and G do have a significant match, and (2) the short 3-way match is in an area where our matching segments overlap (and one boundary of that segment is the address where my matching segments with T and G both terminate). I think this additional matching information confirms (it doesn't conclusively prove this, but there's no such thing as conclusive proof in this area) that the relationship is real and is on the ancestral line it's mapped on. Also, the distribution of cM values from the three sets of matches (E to G, N to G, and T to G) narrows the range of statistically probable relationship distances (indicated by https://dnapainter.com/tools/sharedcmv4) to 3C to 5C (or equivalents).

commented Oct 13, 2018 by Ellen Smith G2G Astronaut (1.5m points)

Answer 4 · 2018-10-12T16:19:11+0000

I vote to allow 3rd cousin and beyond to be included in DNA data findings for family groups.

Answer 5 · 2018-10-15T19:06:26+0000

Just to add my two cents!

First, just the disclaimer that my understanding of "confirm" here is to add evidence to, NOT to "prove" (a dirty 5-letter word).

Second disclaimer, just to be sure we are on the same page, there's of course no *requirement* for segment triangulation when using DNA evidence. There are plenty of other ways to utilize DNA evidence, even without segment data, and even for distant relationships. It's excellent when we CAN add segment triangulation, but it isn't and cannot be a requirement.

To the important stuff:

1. I'm hesitant to establish a "standard" segment size for triangulation, as it should be science that defines these minimums, not consensus (unless it is consensus based on science!). Any discussion of minimum segment size has to include the available information about the probability that a segment of size X is IBD vs. non-IBD. There's some peer-reviewed evidence out there, namely a 2014 paper from 23andMe as well as the oft-repeated analysis by genealogists of the lack of overlap between child & parent matches (and its relationship to segment size). A minimum segment size would have to account for this.

For example, if I write a paper using DNA evidence to support a genealogical conclusion that John and Jane Doe are my 3rd-great-grandparents, and that DNA evidence is based on a 5 cM triangulated segment, I have a serious problem there. I may have poisoned my conclusion with a non-IBD segment.

Unfortunately, there is NO evidence that triangulating a segment of X cM increases the likelihood that the segment is IBD. It makes perfect sense that it should, and I'm guessing that it will, but of course science doesn't work based on 'perfect sense' and 'guessing.'

And, of course, finding a genealogical connection with someone does not increase the likelihood that a segment of X cM is IBD. That's confirmation bias, again without any scientific basis.

2. Second, I believe that any standard for adding DNA evidence to a genealogical conclusion must account for the case-by-case nature of genealogical research. The amount of DNA evidence used in a conclusion will typically be dependent upon the amount of documentary evidence used in the conclusion. Although it is a case-by-case matter, the more documentary evidence there is, the less DNA evidence may be necessary, while the less documentary evidence there is, the more DNA evidence may be necessary.

Blaine

Answer 6 · 2018-10-20T01:41:51+0000

Some thoughts:

This is long, maybe even too crazy for some, so you're excused for not reading it
A triangulation as specified by WikiTree is a special case of matching, designed to create a sufficient level of confidence in proposed ancestral lines to confirm their relationships, but limited to specific cases where 3 independent testers match through 3 separate lineages of equal length back to one common person or couple. This triangulation is required by WikiTree when the relationship distance between 2 testers is too great to provide a segment length essentially guaranteed to be IBD. Because genealogy is so often messy, it's difficult to apply this simplistic set of triangulation rules to real world situations, so there is constant controversy over the many situations that don't fit, yet 'feel' just as provable.
I'd like to suggest we back up and look at the broader problem of what we are trying to achieve. Currently, many people assume that matching is all about segment lengths - find a segment of an arbitrary length and you match, confirming the relationship. But length is not the fundamental thing here, the statistics are. What we want is a sufficiently high statistical probability, perhaps an arbitrarily determined number such as 99%, that provides sufficient confidence that we can say 'they match'. Looking at it this way, we can see that length is an important factor, because it's proportional to the probabilities, but it's not the only factor! Any other factor that raises the probability of the certainty of the match is also useful. Why do we need a probability at all? Because some segments are good and valid (IBD), and others are not, matching only by chance (IBC). So any method that determines validity of a segment, regardless of its length, is valid and useful.
Phasing clearly identifies IBD segments, so any segment that has been 'proven' by phasing is a 100% match, therefore length rules no longer apply. However, both testers would need to phase the segment at their end.
Smaller segments that are part of a proven longer segment should be valid, and not require a specific length. For example, a great great passes a 15 cM down, and the great passes it down, then the grand, but the parent only passes a third of it. If you have some way to prove the segment to the greats and grands, then you know the segment is IBD, and any part thereof is also IBD. But see the following for one complication ...
Those very big segments you've been working with for matching ... because all big segments are collections of small segments, that big segment may actually be a set of both IBD segments and one or more IBC segments in the middle of them.
Another way to validate a segment, especially for a smaller one, is to find repeats of it, because while the chance a small segment is valid by itself is low, finding it repeated in multiple people increases its probability of IBD. My rule for this is - it must occur in a significant number of people in a specific lineage, *and* it must not occur in *any other* lineage. The shorter the segment the lower its probability of validity, therefore the shorter the segment the more occurrences of them you need to find. What the minimum counts should be requires a good scientifically designed study, and I don't know what good numbers for it should be. A starting number might be 2 testers plus an additional tester for each cM below 10 cM (e.g. 3 for 9cM, 5 for 7cM, 7 for 5cM, 11 for 1cM). The point is, finding the same segment in multiple people down the same lineage (and only the same lineage) is sufficiently improbable that it adds to the statistical probability that the segment is valid. Finding enough of them makes it an IBD segment.
While these suggestions may sound too new for acceptance, I believe to some extent, most have thought there must be some additional validity to certain segments they were finding. I believe in Ellen Smith's example above, she was intuiting that those short legs may not meet the rules, but surely carry some weight toward an increased level of matching confidence. And I believe she's right, shorter triangulation legs do carry added weight, not as much as a full length leg, but they are certainly worth something, even if it's harder to quantify. If nothing else, they're additional repeats, which increases their probability of being IBD.
The quantification of these additive and subtractive factors is clearly a problem. (By subtractive factors, I mean things such as determining degrees of common ancestry or endogamy, pile-ups, etc.) I know some have already complained about the complications of the current Triangulation requirements, but unfortunately that's a simple case compared to the full problem. A solution to determining a probability from a given set of relationships and testers will consist of a large equation with multiple terms, some additive and some subtractive, plus the correct coefficients to apply to each, based on relationships and segment lengths, and tables such as Blaine's Shared cM Project, numbers modified by his histograms (extrapolating within the ranges per box). I'd love to see some mathematicians among us determining how to define and combine the various terms.
When I think practically how this could be implemented, I struggle, because it's going to be hard, and probably requires more resources than are available. It requires a mathematician, considerable development effort, some integration with GEDmatch, and would benefit from Blaine's involvement. So the whole idea seems very unlikely to happen, but since we're brainstorming here, I couldn't help thinking that the ideal project would involve the WikiTree staff getting together with the GEDmatch staff, plus perhaps Blaine as science adviser. Who else but WikiTree has all of the relationships available and the GEDmatch ID's! GEDmatch developers have much of the expertise, and combined with the WikiTree developers, could create an absolutely groundbreaking tool that analyses sections of the WikiTree, and automatically determines DNA confirmations! It could mark as 'DNA Confirmed' any close or distant relationship it could determine a 99% probability for.
A huge side benefit is that all user determined 'DNA Confirmations' could be turned off, removing all user controversy. They would be determined strictly under WikiTree control.
One last problem - DNA is only involved with matching testers, and has absolutely nothing to do with the ancestral trails between them. All DNA can do for distant relationships is calculate the probabilities for that relationship given a specific set of ancestors between them. There are multiple trails for every distant relationship, and DNA cannot prove which is right, so it is still imperative that the paper trail is as correct as possible. For example, a trail could go through a certain father. DNA cannot prove it goes through him any more than through his brother. If the calculation indicates a 100% validity of a match through that father, then it will also indicate a 100% validity of a match through that father's brother. In my view, relationships being marked 'DNA Confirmed' should require birth documents for every person on that relationship trail, that indicate the parent/child relationships.
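Going back to the repeat-count suggestion earlier in this answer (2 testers plus one more for each cM below 10 cM), the starting numbers can be generated with a short function. This is only a sketch of that suggested starting point, not a validated rule; the function name is hypothetical, and rounding fractional lengths up to the next whole tester is an assumption the answer does not address.

```python
import math


def min_testers(segment_cm: float) -> int:
    """Suggested starting count of testers sharing a segment of this length.

    Encodes "2 testers plus an additional tester for each cM below 10 cM".
    """
    if segment_cm <= 0:
        raise ValueError("segment length must be positive")
    if segment_cm >= 10:
        return 2
    # Assumption: a fractional shortfall is rounded up to a whole tester.
    return 2 + math.ceil(10 - segment_cm)


for cm in (10, 9, 7, 5, 1):
    print(f"{cm} cM -> {min_testers(cm)} testers")
```

For the whole-number lengths given in the answer this reproduces 3 testers for 9 cM, 5 for 7 cM, 7 for 5 cM and 11 for 1 cM.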
