Could MatchBot suggestions be sent only to the development team for the next five years or so?

+20 votes
149 views
Since I am on the SA Roots project, I see suggestions for matches. Those made by my fellow genealogists are generally well motivated, but even so I cannot always support them.

Today there was a merge suggested by something called MatchBot. This looks like the same piece of software that suggests dozens of totally irrelevant names when one creates a new profile.

In this case, it seems to have jumped to the conclusion that two persons with the very rare surname Smith born on the same very precise date ABT 1870, whose profiles do not contain any actual contradictions (except that one "died young" in prose and the other lived to 66), are worthy candidates for a merge.

Its counterparts at MyHeritage, Geni etc. are responsible for most of the half-baked if not totally raw profiles that get GEDCOM-imported into WikiTree.

WikiTree leadership: if you want to chase me off WikiTree, you've found the way to do it. Up to now I have seen WikiTree as a place where I can publish my family search subject to peer scrutiny, and I would like it to stay that way.

I can understand that there are people who enjoy the challenge of writing this kind of software, but I conservatively estimate that another five years of intensive development will elapse before anything useful comes out of it.

In the meanwhile, can those suggested merges please be only sent to those who work on MatchBot?
WikiTree profile: Charles Elmore Smith
asked in WikiTree Tech by Dirk Laurie G2G6 Mach 3 (32.5k points)
Fixing matchbot is already on the to-do list.

I was in the matchbot monitors group when it first started and had to give it up because matchbot made me angry lol.
I've never had a good match from matchbot in the few dozen it has suggested. The people it finds are not even remotely close in my experience. It doesn't find that many, so the nuisance factor is very low for me.
They are not even in the same century or on the same continent; they don’t have the same names or the same relationships. It’s a total mystery what matchbot thinks makes them match. I looked into a merge proposal where the other profile had last name unknown, and I saw that it had dozens of proposals with other profiles, all with different last names, made at the same time as the one with mine. I can’t help but wonder how many of these end up getting merged by some mishap, such as automatic 30-day approval and someone not paying attention while working merges. Plus, when I find a bad merge I often see it was a matchbot merge, so I know it does happen. I just wonder how often. It would be better if they were at least exempt from automatic approval. Bad merges cause a lot of problems.
I think maybe there must be too high a demand on the humans that have to review them because bad merges are getting through.

1 Answer

+6 votes
No matches suggested by MatchBot are automatically merged. They are all subject to review by a human prior to being merged.  If they aren't merged or rejected, they will just stay as a pending merge.

We do have a team that monitors the matches and rejects any that have a privacy level that allows us to review them.

We are aware that there are issues with the match criteria.  The match parameters still need to be updated to avoid the really "bad" matches that it makes - just name and date of birth for example. Until we can work with the developer to update the match criteria, you will see some that only have a match for Name and Year of birth.

However, we felt that it was important to have it resume as it does point out valid duplicates. It also allows us to tag old profiles with an 'unsourced' tag, which hopefully someone will then work on during a Sourcing Sprint.

I apologize for any inconvenience this may cause and appreciate your patience until we can update the match criteria.

Thanks,
Susan  
MatchBot MP
answered by Susan McNamee G2G6 Mach 3 (32.9k points)
Just because a piece of software sometimes produces "useful" matches does not mean it should continue if really bad false positive matches also result.

It would still be better to take the MatchBot offline, or out of the public eye, until it can be better tuned to reduce the false positive rate.

Honestly, it should not be that hard to program in better criteria. And if the software developer is not able to address this soon, then perhaps someone else can be found who can collaborate on it in a more timely manner.
I had two MatchBot matches this morning.  Granted, one of them (name and birthdate only) really needed a human to carefully investigate before rejecting.

But the other one? Same name, same year of birth - but one man was born in England, the other in Newfoundland.

Really?

So that's a - well, I'll be kind - 50% success rate.  That's just not good enough, surely.
Susan, Liz, Jamie - this kind of functionality is right up my alley, although I'm rusty. I'd love to help with the development, as I spent quite a bit of time in the past developing 'fuzzy matching' for merging library records: normalizing the different data types, then weighting and scoring the comparisons to produce a merge score of yes/no/maybe, all operator controlled (the scoring ranges need tweaking, for both individual data elements and overall totals). More data points are critical. Can I help? I'm happy to sign an NDA, and suggest small and large code changes.
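
To make the weighting-and-scoring idea above concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the `Profile` fields, the weights, the thresholds and the `min_fields` rule are assumptions made up for this example, and none of it reflects MatchBot's actual code or criteria. It normalizes each field, scores the fields that both profiles have, and turns the weighted average into yes/no/maybe. A "yes" is only possible when enough data points were compared.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

@dataclass
class Profile:
    # Hypothetical, simplified profile; real profiles carry far more data.
    first_name: str
    last_name: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None
    death_year: Optional[int] = None

# Operator-tunable settings (illustrative values only).
WEIGHTS = {"first_name": 2.0, "last_name": 1.0, "birth_year": 2.0,
           "birth_place": 3.0, "death_year": 3.0}
YES_THRESHOLD = 0.85   # at or above this, and enough fields: "yes"
NO_THRESHOLD = 0.65    # below this: "no"

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting does not affect scores."""
    return " ".join(text.lower().split())

def text_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def place_similarity(a: str, b: str) -> float:
    """Places either agree after normalization or they do not."""
    return 1.0 if normalize(a) == normalize(b) else 0.0

def year_similarity(a: int, b: int, tolerance: int = 5) -> float:
    """1.0 for the same year, falling linearly to 0.0 at `tolerance` years apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)

COMPARERS = {"first_name": text_similarity, "last_name": text_similarity,
             "birth_year": year_similarity, "birth_place": place_similarity,
             "death_year": year_similarity}

def merge_score(p: Profile, q: Profile) -> tuple[float, int]:
    """Weighted average over fields present in both profiles, plus the field count."""
    total = weight_sum = 0.0
    compared = 0
    for field, compare in COMPARERS.items():
        a, b = getattr(p, field), getattr(q, field)
        if a in (None, "") or b in (None, ""):
            continue  # missing data is neither for nor against a match
        total += WEIGHTS[field] * compare(a, b)
        weight_sum += WEIGHTS[field]
        compared += 1
    return (total / weight_sum if weight_sum else 0.0, compared)

def classify(p: Profile, q: Profile, min_fields: int = 4) -> str:
    """Return "yes", "no" or "maybe"; too few data points can never give "yes"."""
    score, compared = merge_score(p, q)
    if score < NO_THRESHOLD:
        return "no"
    if score >= YES_THRESHOLD and compared >= min_fields:
        return "yes"
    return "maybe"

england = Profile("John", "Smith", 1870, "England")
newfoundland = Profile("John", "Smith", 1870, "Newfoundland")
print(classify(england, newfoundland))  # prints "no"
```

With these made-up settings, two profiles that share only a name and a birth year come out as "maybe" and are left for a human to investigate, while a conflicting birth place or death year pulls the score down to "no".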


...