General... I like it. It's an improvement.
Changes to the DNA Confirmation Guide:
Step 5:
Add a link to here: https://dnapainter.com/tools/sharedcmv4
For now, I think that predictions of the testing companies OR the Shared cM Project tool could both be used. See the final section for a comment on how this might ultimately be improved.
Step 6:
I would add a 6th step, even if slightly redundant: "6. Is the match a member of WikiTree?", in order to explain the anonymity requirement and to improve it (note A). Additionally, the privacy requirement should be changed with respect to matches who are members of WikiTree (see note B):
- If they are not members of WikiTree, their identity must be kept anonymous (A), simply stating the relationship: e.g. 2nd cousins once removed, sharing the common ancestors [WT ID Ancestor 1] and [WT ID Ancestor 2]
- If the match is a member of WikiTree, their identity should only be disclosed with their consent. Otherwise, the relationship can be stated as described in the guide for non-members. (B)
A. With regard to the anonymization, the guide as currently written is problematic. "Your WikiTree ID and the initials (or another anonymous identifier) for your match. You can include your full name but do not publicly reveal the identity of your match." Given the ability of the typical genealogist to Facebook stalk, initials provide nearly all of the information required to correctly identify the individual in question, thus it is not an "anonymous identifier" (see: oxymoron, noun). That their name is not attached on WikiTree means very little if one can simply side-step that in most instances. The only anonymous identifier would be the biological/genetic relationshiop, .e.g. 2nd cousins once removed, sharing the common ancestors [WT ID Ancestor 1] and [WT ID Ancestor 2]. That is the standard which has been provided in the past and it seems entirely sufficient.
If there was a private, non-publicly-viewable field where such initials could be stored, that may be acceptable. If we had a system of where sources, including DNA confirmations, were separate from the biography, this might be feasible: If such private, creator-only-viewable fields were provided.
B. We should not be operating on presumed consent with regard to DNA confirmations involving other WikiTree members. Perhaps that could be automated such that for 2 WT members, for a DNA link to be made, there should be a recorded consent of both parties to that linking process. Although point 5 in the Honour Code states, "We privacy-protect anything we think our family members might not want public. If that's not enough for someone, we delete their personal information.", we need to realize that once information is placed on WikiTree, we cannot guarantee that WT's deletion of it will remove it from being publicly accessible as it could have been recorded prior to deletion. Because of this, the consent should be obtained prior to posting of the linked citation.
Make Sources Separate (So we can do cool things with them)
If we are going to the work of automating this, you may wish to consider the idea of keeping citations separate from the biography. If we had a separate data system for citations, it would likely be much easier to record this affirmation, as well as to propagate the citation:
Wter-1 and Wter-2 are both members. WTer-1 goes to Wter-2's profile page and clicks, "We're DNA Matches", then enters a few details (what test, how many cM for each shared segment or total cM and # of segments). Wter-2 gets a notification and clicks "Confirm"... automatically those citations climb the tree to their common ancestror(s).
That would be nice.
Making the separate data system for sources could also be something gradual. If we made one for DNA Confirmation sources, with other application and/or future expansion in mind, it could gradually be expanded to address other kinds of sources as well.
Such a system could also include the confidence of the relationship (using Shared cM Project data, as used on DNAPainter). It could be interesting for edge cases, e.g. 3C vs half 3C, where the ancestral couple being confirmed (especially with automated, separate-source citations) might have a note that there's a possibility (including % probabilities) that the terminal relationship might not be as described (1 vs 2 shared ancestors). I think that would help to balance the over-confidence sometimes produced with DNA confirmations (and take a little heat off from the use of the word "confirmation"). Perhaps instead of a "DNA Confirmed" checkmark, we might be able to have a % probability or a qualifier ("High", "Likely", "Possible", etc...)
Additionally, as more knowledge is gained, we might be able to have the system recognize if 2 or more DNA Confirmation citations are present and then introduce a statement of odds into the Research Notes section (or as a separate, non-editable part) of the profile. This could be useful for multiple matches with distant ancestors (> 3C). Again, this would be best executed by linking WT IDs and a separate citation section. Having multiple 1-to-1 matches (as a simple, "We're DNA Matches" button/link), recorded with automated citations would in my view be ideal. And I think that it's something that could ultimately be built on for more robust distant confirmations in the future.
In the instance where two people are DNA matches and have WT profiles, having that profile-to-profile link that exists separately from a known shared ancestor, could allow for a narrowing down of how two or more WT users are likely related. Forget ThruLines™ and Theory of Family Relativity™. Now introducing, Algorithmic Speculation by WikiTree™ ... but joking aside, this could be a serious possibility. DNA networks are incredibly powerful, especially as one increases the network size, and because this is a global family tree (where DNA tests could rule out certain connections and suggest others), the predictive power becomes much more precise.
If the DNA Confirmation inputs were well tracked, one could create McGuire charts and which might help users to evaluate testing options and the data (along with odds or probability calculation):
If we can start doing these kinds of relationship analyses, we can, I think, significantly expand our answer to "What conclusions can users confidently draw from what they're getting from AncestryDNA, MyHeritage, etc?" Rather than having questionable suggestions imputed from tangled and unsourced trees via algorithms that are a black box, we can open the box to show how the certainty is determined and provide users with a verifiable level of certainty. I believe that there's a real parallel here between the push for open source code in science vs closed source "black box" code. WT can be the open source option - trustworthy and verifiable - in its code more than in the source data.
Although Chris isn't interested in re-hashing the "whether we should require public verification" question, there is a relevant aspect here. Because we have a better opportunity for how to be open and verifiable: Provide an analysis that is open and a platform for the data. As more people contribute relationship data to answering questions, outliers will become apparent, and those can be marked as such and viewed with a skeptical eye. This, for the doubters, is actual verification... not linking the fake kit I may have uploaded to GEDmatch!. The value to users continues as the dataset grows. Moreover, being cross-platform — as far as DNA data goes — there's an opportunity to reach conclusions that can't currently be accessed on other platforms.
The Tangent on Segments:
I am still unsure if we might want to store start and stop points for segment matches.
Regarding the tangent on storage of segment position data, I'm against the storage aspect. As outlined above, I believe that there is much more that we have the potential to do without getting into segment positions.
As medical inferences could be made from accessible start and stop points for segment matches, segment position storage should not be part of WikiTree. It creates both regulatory and privacy risks. For good reason, many jurisdictions redact or withhold cause of death from public inspection on death records. Such information links living individuals, especially those closely related, with a greater likelihood of certain genetic conditions. Still, some provide such information. (Such may depend on local healthcare and/or privacy laws.) So if a segment containing a SNP with a causal relation to the cause of death, then medical information can be inferred with even higher certainty about any others with that cited segment. I would swerve to avoid that landmine.
That said, there could be a middle ground approach here. WT could have a calculator input using segment information which could (1) calculate and (2) evaluate relationship confirmations which does not store the segment position data. Perhaps what could be recorded, not for individuals, but for analysis purposes, is the number of cM for each segment. Such could be done collaboratively with the Shared cM Project to help grow the database using WT to help crowdsource further collection. I could see a synergy between WT and the Shared cM project there - it would certainly "increase the world's common store of knowledge." This could produce useful info for endogamous populations, more complex relationships, and even for simple relationships — as the Shared cM Project does not currently take segment number or size distribution into account.