As WikiTree grows, how will we swiftly eliminate possible matches?

+8 votes
When we enter a person and similar profiles already exist, those "matches" are shown to us so that, if one is a true match, we can select the existing profile and avoid adding a duplicate to the tree. But as the tree grows, the number of matches (especially for the more common surnames) will keep growing. Complicating the problem, not all possible matches are viewable. Is it possible to add more data to the returned matches so that non-matches can be eliminated more quickly?
in Genealogy Help by Rick Young G2G Crew (880 points)

1 Answer

+3 votes
Just an idle thought: if we could all work together to match all the current duplicates, and keep them matched, there would be fewer to match to.

I do notice that when matches come up, even if a DOB is present, the possible matches can span 300 years. If DOB or DOD were used as criteria, many non-matches would be eliminated.
by Tom Bredehoft G2G6 Pilot (193k points)
I think it already takes the date of birth into consideration but, like "Find Matches" from the Watchlist, it also opts to include "century typos", so your profile for someone born in 1820 turns up matches born about 1810-1830, as well as people born 1610-1630, 1710-1730, etc. (Chris, is this how it works?)
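To make the "century typo" idea concrete, here is a rough Python sketch of that kind of date check. It is not WikiTree's actual code: the function name, the ten-year tolerance, and the two-century shift are assumptions read off the 1820 example above.

```python
def year_matches(target, candidate, tolerance=10, max_century_shift=2):
    """Return True if candidate is near target, allowing for a mistyped century.

    tolerance and max_century_shift are illustrative guesses, not known
    WikiTree settings.
    """
    for shift in range(-max_century_shift, max_century_shift + 1):
        # Compare against the target year moved by whole centuries.
        if abs(candidate - (target + 100 * shift)) <= tolerance:
            return True
    return False


# A profile born in 1820 matches 1815, and also 1715 or 1625 (century typos).
print(year_matches(1820, 1815))
print(year_matches(1820, 1715))
# Setting max_century_shift=0 keeps only the near-exact date window.
print(year_matches(1820, 1715, max_century_shift=0))
```

With the shift set to zero, the 300-year spread Tom describes would collapse to a single twenty-year window, at the cost of missing genuine century typos.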
I'd forgotten about that. I'll have to remember to eliminate typos when I search for matches. Thanks, Erin.
On the other hand, when entering a new profile, the matches that come up cannot be narrowed down; we are sometimes presented with screens full of names. I would hope this could be restricted to nearly exact matches.
I know that with a common last name such as mine, Young, there are thousands of people with that surname, and the match list can be intimidating. Not only can it take forever to examine each and every one, but I can be staring a match right in the eye and still be unable to confirm it: I have only the data from my side of the family (with no spouse for that person), while the other person has that individual only as a spouse in their family line. And, as said before, that applies only to matches whose details the poster can view. Theoretically, if the potential match is a locked individual (and there are often many such matches), one would have to contact the submitter of each one and request more details. This drastically slows down the process of growing a tree and causes a significant amount of discouragement.
I love this question. We're actually right in the middle of some good improvements to the FindMatches function -- and improvements are something we should always be making as WikiTree grows. I'd be working on them now (and explaining them here) but I'm at NGS in Ohio at the moment, and someone is coming up at the table ...
This is one of my favorite topics as well. I'm convinced that we should not allow GEDCOM imports without dates. If people want to manually create new profiles without dates, that is fine, but I'm sure you will all agree that the majority of matches don't have dates and all say they came from an automated GEDCOM import.

Looking forward to the improvements.
Hi Ed.

You mean in the pre-scan of a GEDCOM upload, don't process the file if no person in the file has any dates at all? Or if any person in the file is missing a single date? The latter would be too strict but the former is a good idea. It doesn't seem to me that there's a good reason to upload a file with no dates at all.

FWIW, the pre-300 prohibition works even if some dates are missing. Our code is sophisticated enough to know that the ancestors of someone over 300 years old are also over 300. But it ain't much more sophisticated than that.
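A rough sketch of that ancestor rule, for illustration only. This is not the actual WikiTree code; the data layout (`birth_year`, `parents`) and the function name are made up for the example, and "over 300" is assumed to mean born more than 300 years before the current year.

```python
def flag_over_300(people, current_year):
    """Return ids of people who are over 300, directly or by inference.

    people maps an id to {"birth_year": int or None, "parents": [ids]}.
    This layout is hypothetical, chosen only for the sketch.
    """
    cutoff = current_year - 300
    flagged = set()
    # Start from everyone whose own birth year is old enough.
    stack = [pid for pid, p in people.items()
             if p["birth_year"] is not None and p["birth_year"] < cutoff]
    while stack:
        pid = stack.pop()
        if pid in flagged:
            continue
        flagged.add(pid)
        # Ancestors of a flagged person must be at least as old, dated or not.
        stack.extend(par for par in people[pid]["parents"] if par in people)
    return flagged


people = {
    "child": {"birth_year": 1700, "parents": ["father"]},
    "father": {"birth_year": None, "parents": []},
    "modern": {"birth_year": 1950, "parents": []},
}
print(sorted(flag_over_300(people, 2016)))
```

The undated father is flagged only because his dated child is, which is the one inference described above; nothing is inferred in the other direction.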

My opinion is that every record should have at least a birth and death year, but if the software is sophisticated enough to check for at least one date, that would help. It just seems to me that many are uploading completely blank GEDCOMs hoping others will fill them in.
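The two policies being discussed (reject a file with no dates at all, or require a birth and death date on every record) could be checked in a pre-scan along these lines. This is a minimal sketch, not the WikiTree importer: the function names are hypothetical, and it only looks at the standard GEDCOM layout of `0 @I1@ INDI`, `1 BIRT` / `1 DEAT`, and `2 DATE ...` lines.

```python
def scan_gedcom_dates(text):
    """Map each individual's id to the set of events (BIRT, DEAT) that have a DATE."""
    found = {}
    current_id = None
    current_event = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # Individual records look like "0 @I1@ INDI": the id sits in the tag slot.
            current_event = None
            current_id = tag if value == "INDI" else None
            if current_id is not None:
                found[current_id] = set()
        elif current_id is not None and level == 1:
            current_event = tag if tag in ("BIRT", "DEAT") else None
        elif (current_id is not None and level == 2 and tag == "DATE"
              and current_event and value.strip()):
            found[current_id].add(current_event)
    return found


def has_any_date(found):
    """Lenient policy: at least one person has a birth or death date."""
    return any(found.values())


def all_have_birth_and_death(found):
    """Strict policy: every person has both a birth and a death date."""
    return all(events == {"BIRT", "DEAT"} for events in found.values())
```

A pre-scan could call `has_any_date` to stop completely blank files, and report the strict check as a warning instead of an error, since Chris notes above that requiring every date would be too strict.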
Hey Ed. Do you note the upload date when you see those junky GEDCOM-generated profiles? I hope that most of them are older, before many of the more recent policies, procedures, etc. were implemented. (Something awesome coming soon: FindMatches-like evaluations of GEDCOM after upload and before processing, so that the user can see how much likely duplication there will be and select certain lines not to import.)
I haven't been looking at the dates, to be honest. I've stuck with manually entering my tree information because of the number of 'false positives'. So far, no siblings, only direct descendants.

My future's so bright I've got to wear shades ;)
