Would anyone be willing to help keep an eye on MatchBot?

+23 votes
588 views

Hi WikiTreers,

A lot of members have complained about MatchBot, our tool for automatically proposing merges. Because matching now includes surname variants, the complaints are almost certain to grow louder!

Even though the surname variants are only considered for pre-1800 profiles, MatchBot's activity has really ramped up since the change. You can see it here: http://www.wikitree.com/index.php?title=Special:Contributions&who=WikiTree-4

I think we have it limited to 100 merge proposals a day, so that's what it's doing.

Merge proposals are proposals, not merges. Any member can reject a merge proposal. And merges are never completed automatically, even when they're not rejected. But some members don't know this, and I assume some members actually complete bad merges because they don't know they should be rejected.

Would anyone be willing to monitor MatchBot's activity and help reject bad merge proposals?

Thanks!

Chris

WikiTree profile: MatchBot WikiTree
in Requests for Project Volunteers by Chris Whitten G2G Astronaut (1.2m points)
Sure - not online enough to be the only one keeping an eye on its proposals, but can be one of several.  Just did a look-see and many were spot on, some could be, but there was one that was absolutely not a match, and it had a prompt approval by one of the PMs.
A light-hearted observation - there does seem to be an ironic side to it all.  Matchbot is proposing merges ... many really bad ones, so Members are needed to watch Matchbot.  If Matchbot were a live person then after several ignored warnings he or she would undoubtedly be blocked for being such a nuisance!

12 Answers

+13 votes
 
Best answer
Hi Chris,

I wouldn't mind helping either. Though I'm new to the site, I'm on quite often during the day. Is there a way to filter out the already rejected ones?
by Rob Lenihan G2G4 (5k points)
selected by Ellen Smith
Wow, three good members stepping up already! Thank you, Rob!

I don't think there would be an easy way to see which have already been rejected. You, Matt, Jamie and anyone else willing to help would probably want to coordinate with each other and cover different days. Maybe as an Arborists sub-project?
On the "Pending Merge" page, there is a link to display all pending merges by "me". The URL has a number at the end of it. I'm assuming each member has a unique number. I can't figure out where that number comes from, but if we had Matchbot's number, then we could list all pending merge requests submitted by the bot.

for example, here is my url

http://www.wikitree.com/index.php?title=Special:BrowseMatches&type=pending&order=dateup&canAct=0&requested=13181041

Where does 13181041 come from?

Brilliant, Rob!

(N.B.: You're a new member. I'm the founder. You're the one who realized there is a way to only see the MatchBot merge proposals that have not already been rejected!)

http://www.wikitree.com/index.php?title=Special:BrowseMatches&type=pending&order=datedn&canAct=0&requested=1

I marked this "best answer" because that link that Rob found is the thing that makes this "check up on Matchbot" process feasible.
Thanks :)

You can also edit the URL so that you see 100 records at a time, so there is less bouncing between pages.

Just add &limit=100 as shown below, or change it to whatever number you want. (Then bookmark it.)

http://www.wikitree.com/index.php?title=Special:BrowseMatches&limit=100&type=pending&order=datedn&canAct=0&requested=1
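
For anyone who would rather script this than hand-edit the URL, here is a minimal Python sketch that assembles the same links. It only uses the query parameters that appear in the URLs in this thread (title, limit, type, order, canAct, requested). The function name `browse_matches_url` is made up for illustration, and what each parameter does on the server side is an assumption based on how the links behave.

```python
from urllib.parse import urlencode

BASE = "http://www.wikitree.com/index.php"


def browse_matches_url(requested, limit=None, order="datedn"):
    """Build a pending-merges URL for the member number `requested`.

    Hypothetical helper: parameter names are copied from the links in
    this thread; their exact meaning is assumed, not documented here.
    """
    params = {"title": "Special:BrowseMatches"}
    if limit is not None:
        params["limit"] = limit  # records per page, e.g. 100
    params.update(
        {"type": "pending", "order": order, "canAct": 0, "requested": requested}
    )
    # safe=":" keeps "Special:BrowseMatches" readable instead of "%3A"
    return BASE + "?" + urlencode(params, safe=":")


print(browse_matches_url(1, limit=100))  # MatchBot's list, 100 per page
print(browse_matches_url(1))             # same list, default page size
```

Passing 1 as `requested` reproduces the MatchBot links posted above. Any other member number should give that member's own pending list.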
+10 votes
Chris I can help keep an eye on it
by Matt Pryber G2G6 Mach 4 (48.6k points)
Thank you, Matt!
+9 votes
I wouldn't mind doing it. Since a lot of the proposed matches have LNABs that are different, is there an easy way to tell which direction the merge is going?
by Jamie Nelson G2G6 Pilot (289k points)

That's great, Jamie. Thank you!

Regarding the direction of merges, I think the second profile listed in the history item is the one in the second position of the merge, i.e. the default "merged-into" profile. For example, Rose-1493 would be the "merged-into" default here:

     MatchBot WikiTree proposed a merge of Williams-37911 and Rose-1493

Of course, the merge direction can be changed during the merge and the difference in LNABs will be highlighted.
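
If you are going through a lot of these history lines, the rule of thumb above can be sketched in a few lines of Python. This assumes the history text always reads "proposed a merge of X and Y" and that the second ID really is the default "merged-into" profile, which is only a belief stated in this thread and not confirmed behaviour. The function name and dictionary keys are made up for illustration.

```python
import re

# Assumed wording of a history item, taken from the example in this thread.
MERGE_LINE = re.compile(r"proposed a merge of (.+?-\d+) and (.+?-\d+)\s*$")


def default_direction(history_line):
    """Return the two profile IDs from a merge-proposal history line.

    The second ID is treated as the default "merged-into" profile,
    which is an assumption; the direction can be changed during the merge.
    """
    match = MERGE_LINE.search(history_line)
    if match is None:
        return None  # not a merge-proposal line
    return {"merged_away": match.group(1), "merged_into": match.group(2)}


line = "MatchBot WikiTree proposed a merge of Williams-37911 and Rose-1493"
print(default_direction(line))
```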

I'll help too, but I think the question should be whether there's an easy way to know which LNAB is correct, not which way the merge was set.

It can be an easy call if one of the profiles has documentation about the LNAB or if one LNAB has a typo or the difference is covered by a project's naming guidelines (for example, EuroAristo would say to go with LNAB Corbet when the choices are Corbet & de Corbet; Dutch Roots tends to van Dyke over Van Dyke while New Netherland Settlers is ok with Van Dyke, but neither would support "Dyke" over van Dyke or Van Dyke). If it's between two pre-1500 (even pre-1700) Welsh names, I'd recommend deferring to Cymru or Wales project via a G2G post.

If the MatchBot merge seems like a good idea, I think you should post a comment with that opinion & whether or not you have

* done the research on which LNAB should be used (and say why if you have)

* checked the name guidelines for any applicable projects (and which projects)

* searched for the lowest-numbered profile with that spelling/styling.

For links to projects' naming guidelines, see this page. If I missed any, please add the info (or let me know via a Private Message or a comment posted to that page or mine).

Cheers, Liz

I was asking about which way, so I could warn profile managers if the proposed merge direction was incorrect and needed to be reversed. But I think your suggestion of writing a comment for each one is a better idea.

I'll mention this as well, although it is not meant for the MatchBot MPs, because they only have to check for wrong merge proposals and reject them to prevent bad merges, or add a comment to the profiles that the merge is OK and why. See also the more detailed explanation from Liz below.

But normally, for new profile managers and WikiTree members who don't know what to do when a merge is proposed by MatchBot or anyone else:

If the profiles look very much the same but there are no sources, or if one or both of the parents are different, so that more research is needed to be sure whether they are the same person, we can of course set them as an unmerged match (that is, postpone the merge) and add a post with the reason why: so-and-so might be the same and perhaps should be merged, but more research and sources are needed.

Also, first check for duplicate parents, grandparents, etc. Sometimes they are duplicates as well, and if that's the case we start by proposing merges for the eldest and set all younger duplicates as unmerged matches.

And, thank you Robin for the reminder :) Always check whether there is another duplicate somewhere with a lower number. If there is, all duplicates should of course be merged into that one.
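
As a small illustration of the "lower number" check, here is a Python sketch that picks the lowest-numbered WikiTree ID for each surname spelling from a list of candidate duplicates. It assumes an ID is always "Surname-Number" with the number after the last hyphen. The function names are made up, and finding the candidates in the first place still has to be done by searching.

```python
def split_id(wikitree_id):
    """Split an ID such as 'Rose-1493' into ('Rose', 1493)."""
    surname, _, number = wikitree_id.rpartition("-")
    return surname, int(number)


def lowest_per_spelling(ids):
    """Map each surname spelling to its lowest-numbered ID in `ids`."""
    best = {}
    for wikitree_id in ids:
        surname, number = split_id(wikitree_id)
        if surname not in best or number < best[surname][0]:
            best[surname] = (number, wikitree_id)
    return {surname: wid for surname, (_, wid) in best.items()}


# IDs taken from examples elsewhere in this thread.
print(lowest_per_spelling(["Bjørnsen-38", "Bjørnsen-4", "Rose-1493"]))
```

Numbers are only compared within one spelling, because a lower number under a different LNAB says nothing about which LNAB is correct.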

So for each merge proposal you normally have three options/choices:

1. If they are clear duplicates (see above), there is no other duplicate with a lower number, and there are no duplicate parents, grandparents or great-grandparents: approve or merge, and clean up after the merge.

2. See above: set them as an unmerged match (postpone the merge).

3. If they are clearly different people, reject. If you reject a merge, you will automatically be asked to fill in a post with some explanation.

 

Matchbot MPs see the comment Liz added below :)

Actually, I'm not recommending that MatchBot MPs approve, complete, or postpone any merge on behalf of the project. I apparently wasn't clear on the project page and I'll look at it tomorrow with an eye on revisions. In the meantime, ...

For this project, we should just be looking at option 3: reject the merge if they're not duplicates. If they are duplicates, post to both profiles that they are and say why, but don't approve the merge. And unless you're going to undertake the research needed, please don't postpone the merge.

As I mentioned on WikiTree-4 when Bea posted this there, unless a MatchBot MP has a personal interest in one of the profiles, we only want to review the merges & reject bad merges or add a comment to the profiles that the merge is ok & why. See my post in this discussion about the launch of the project or the MatchBot Monitors Project page for details.

If the MatchBot MP has a personal interest in the profile, then that's a bonus and they can do whatever they need to do as an interested, involved WikiTree member. But the 1st and 2nd options Bea describes should be undertaken on behalf of the Arborists Project (or the Database Errors Project?), not the MatchBot Monitors Project.

Cheers, Liz

 

You're of course right Liz and I think it's clear...it's just me thinking like an arborist and we all have worked on the Nederlands Portaal for a few months, where we tried to explain things for all new members ;)

So corrected it a bit ;)
whew! There's a reason I'm no longer an Arborist... I think I am _the slowest_ Arborist ever to have the badge! It takes me at least a couple of hours to complete a merge, but usually about 6-8 hours!  Cheers, Liz
+5 votes
Are you sure that merges are never completed automatically? As an arborist, many times I have gone to complete a merge, only to be advised that it has already been completed by "Matchbot". It never cleans the bios though!

I had a proposal from Matchbot this morning that in no way resembled a merge.

Perhaps, the better solution would be to suspend its use, rather than monitor its mistakes!
by R W G2G6 Pilot (259k points)
MatchBot definitely only proposes merges. It never completes them, or even records approval for them. There must be some confusing or mistaken error message. Can you copy and paste it here next time you see it?
Yes, will do!
Perhaps Matchbot proposed a merge of two orphaned profiles?  I believe that type of merge would automatically be approved because there are no PMs to approve/disapprove.
I thought you'd cracked the code Star, but just saw a MatchBot-proposed merge that was pending & both profiles were orphans, so that's not it (unless something changed recently).
+9 votes
I'll help, Chris. I had been checking it on some mornings, but I'd let it lapse with the db errors project. I'll get back on it.
by Nan Starjak G2G6 Pilot (246k points)
That's great, Nan! Thank you.

It's looking like volunteers could coordinate and take one day per week.
Maybe a space page with a link to the non-rejected proposals where folks could make a note about any of those they're working on? (e.g., I'm trying to figure out what to say about a pair that's an obvious match but I don't know enough about Norwegian naming conventions to say whether -sen or -son is better).
+7 votes
I'll help, too, Chris, if I can do it on an intermittent basis and not be tied to a schedule.
by Star Kline G2G6 Pilot (537k points)
Thanks, Star. You're a, uh, star.

Maybe Liz will take on the coordination.

done. MatchBot Monitors Project is up & running.

+7 votes
Seems like MatchBot started a new day, which means another 100 merge proposals (if I understand correctly).

If you get a chance to look through them for clear rejects, please do - see the list here: http://www.wikitree.com/index.php?title=Special:BrowseMatches&limit=20&start=40&type=pending&order=datedn&canAct=0&requested=1
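
A note on the limit and start parameters in that link: if start is a zero-based offset and limit is the page size (an assumption, I haven't seen it documented), then limit=20 with start=40 opens the list on page 3, not page 1. Here is a minimal Python sketch for checking which page a link lands on and for dropping start to get back to the first page. The function names are made up for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def page_of(url):
    """Return the 1-based page a feed link opens on.

    Assumes `start` is a zero-based record offset and `limit` is the
    page size; the link must include `limit`.
    """
    query = parse_qs(urlsplit(url).query)
    limit = int(query["limit"][0])
    start = int(query.get("start", ["0"])[0])
    return start // limit + 1


def first_page_url(url):
    """Return the same link without `start`, so it opens on page 1."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k != "start"]
    return urlunsplit(parts._replace(query=urlencode(pairs, safe=":")))


feed = ("http://www.wikitree.com/index.php?title=Special:BrowseMatches"
        "&limit=20&start=40&type=pending&order=datedn&canAct=0&requested=1")
print(page_of(feed))         # 3, under the assumption above
print(first_page_url(feed))
```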

I'm working on a space page we can use to track ones that need research.

 

Thanks!
by Liz Shifflett G2G6 Pilot (402k points)
+5 votes

>> I think we have it limited to 100 merge proposals a day, so that's what it's doing.

If you just limit it to 100 profiles, maybe also restrict it to profiles with sources ==> it doesn't try to match profiles marked Unsourced.....

Profiles marked Unsourced have some homework to do before it's match time..... I hope we get errors in the Database Error project so we can find those profiles marked with the category Unsourced faster.

 

Example: a match of unsourced WikiTree profiles feels more like guessing than necessary, because the profiles are badly researched and one is also marked Unsourced ==> indicates lack of quality...

Both profiles lack:

  1. Exact birth dates
  2. Death dates
  3. Birth locations
  4. Death locations
  5. Sources
by C S G2G6 Pilot (273k points)
edited by C S
The gain is limited because the profiles that are actually flagged Unsourced are only a small fraction of all the unsourced profiles.

This is less true for new profiles.  It might be good for MatchBot to prioritize new profiles, so that people don't get too far with new duplicates before they get caught.

I'm in two minds about whether it should deprioritize private profiles.

>> The gain is limited because the profiles that are actually flagged Unsourced are only a small fraction of all the unsourced profiles.

181 355 profiles are marked Unsourced 

The concept of trying to match unresearched profiles with no sources is not genealogy...

I did a check today on the proposed match  Bjørnsen-38 and Bjørnsen-4

They're old Norwegian profiles, born 1563 ==> difficult to find sources, and you need to be highly skilled in genealogy to find proof...

So were these well-researched profiles imported to WikiTree?

  1. both profiles are part of gedcom imports
    1. one is 217 people, no sources (warning bell)
    2. the other is 5903 people, some sources
      1. this gedcom file had Norway mentioned 3108 times but I found no sources from Norway (warning bell)
    3. If you click "around" in the family tree "next" to the suggested match, you can see no sources have been added to any person after the import to WikiTree (warning bell)

Lesson learned: it will just be guessing whether those profiles match, or hundreds of hours of researching...

My suggestion is that WikiTree should focus on well-researched profiles, not unsourced uploaded gedcom files

  1. Start marking profiles Unsourced if sources are not there
  2. Don't try to match unsourced profiles as you can never prove that it's the same profile.... it will just be guessing....
  3. Create an Error in the Database Error Report for unsourced profiles so we find them faster...

As we have default approval one month after the merge has been suggested by the MatchBot, I guess that we will have many unsourced merges done....==> it's not good for the quality of WikiTree

I was hoping MatchBot doesn't approve its own proposals...

What I meant was, if MatchBot finds 100 matches, maybe 50 of them will be unsourced, and about 2 of the 50 will be tagged Unsourced, so dropping the 2 won't have much impact.

Default approval means that you just need one person who thinks matching names are enough to say that a profile from 1763 Norway is the same person....
 

  • If a proposed merge is not answered within one month, this account is also used to give default approval so that a Wiki Genealogist other than the Profile Manager can evaluate the match and proceed to reject or complete the merge.

2 less ==> 700 less in a year ==> 15 hours less work for someone to check and we start to focus on researched profiles....
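
To make the arithmetic in that line explicit, here is a small Python sketch. The inputs are the rough figures quoted in this thread (about 2 tagged-Unsourced proposals a day out of 100). The minutes-per-review figure is not stated anywhere and is only what the 15-hour estimate implies, so treat it as an assumption. The function name is made up for illustration.

```python
def yearly_savings(skipped_per_day, minutes_per_review, days_per_year=365):
    """Return (proposals skipped per year, review hours saved per year)."""
    skipped = skipped_per_day * days_per_year
    hours = skipped * minutes_per_review / 60
    return skipped, hours


# 2 skipped proposals a day is 730 a year, which the post rounds to 700.
skipped, hours = yearly_savings(skipped_per_day=2, minutes_per_review=1.0)
print(skipped, round(hours, 1))

# The quoted 15 hours for 700 proposals implies this many minutes each:
print(round(15 * 60 / 700, 2))
```

So the 15-hour figure works out to a bit over a minute per proposal, which is a quick glance and not any real research.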

I get a bitter taste of Ancestry green leaves when I see matches with no exact birth dates, birth location, just a year.... why not focus on profiles people care about....

Magnus,

The health of WikiTree needs to be tended at all levels, from well-developed, sourced profiles to unconnected orphans. The chances of the latter being improved increase greatly when all the potential duplicates are found and merged. MatchBot finds potential duplicates, and the goal of the new MatchBot Monitors Project is to do an initial review of those proposals to make sure they help rather than harm WikiTree. An initial review - not in-depth analysis or extensive research.

Cheers, Liz

Mark them unsourced and continue with genealogy is my humble advice (what I did).

The health of WikiTree is based on profiles with sources; the rest could be (or is) mythology... not having sources on WikiTree profiles is against the WikiTree Honor Code.

No one would think about trying to find matches in WikiTree for Snow White and the Seven Dwarfs, as they have a source, the Brothers Grimm story, and are fictional.

Trying to have an opinion on unsourced profiles in WikiTree is just a waste of time. Mark them Unsourced to warn other people (and the MatchBot ;-))

  • Quote: "You must include your sources when you put information on WikiTree. It's in our Honor Code."
     
  • WikiTree Honor code:
    • We cite sources. Without sources we can't objectively resolve conflicting information.
 
+6 votes
Hi Chris,

If there is no way to compare the information.. i.e. - one profile has a spouse, other doesn't list one. One lists children and parents, other doesn't, one doesn't list birth location, etc.   is it sufficient to reject the comparison if the last names do not match and based on no other information?

Like, how much research are we putting into rejections? I'm assuming we want to quickly attack most of the obvious rejections so as to keep numbers low.. would that be an accurate statement?

Rob

edit: lol, and.. i love how old some people are on this site.. those that have first hand knowledge as a source for relatives born in the 1700's  :)
by Rob Lenihan G2G4 (5k points)
good questions. I think that's a judgement call, but my inclination would be to reject with a comment that there's not enough info to tell if they're the same person or not.

A rejected merge is still attached to the profile & your reason will be posted as a comment. If someone comes along later with more info on the family, they can look at the rejected merge and re-assess it then.

I think every person looking at MatchBot-proposed merges will have a different definition of what an appropriate level of research is. So long as you communicate what you're doing/have done & why, I think we're good.

Cheers, Liz
+7 votes

And the project has launched. See http://www.wikitree.com/wiki/Space:MatchBot_Monitors_Project for details of the project’s goals, but in a nutshell:

* check the feed & reject bad merges

* for merges that look ok, post a comment on both profiles saying so & why you say so

The “why you say so” is where it gets tricky, but the two main points are

* which LNAB the final profile should have, for merges of duplicate profiles with different spelling/styling of LNAB, and

* whether or not the profiles are the lowest-numbered profile for that person with that spelling/styling of the LNAB

If you’re looking into a merge, list it under Working on the project page & post a note to both profiles.

OK'd merges should be listed under Pending Merges. I'll probably be making a separate page for that section, but I want to see how this system works first.

I currently don’t plan on having a schedule or shifts, but if Rangers include the feed on their rounds, that would be great!

Thanks everyone!

edit: tweaked the links to go to page 1.

by Liz Shifflett G2G6 Pilot (402k points)
edited by Liz Shifflett
Could you tweak the feed links?  They start at page 3
well dang - that explains why they weren't changing. I thought it was cuz MatchBot had hit 100. Thanks RJ!
+2 votes
why no compare button? it makes things unnecessarily longwinded
by Gillian Causier G2G6 Pilot (227k points)

Hi! The feed has a compare link you can click for each pair. And if you're on the profile page, the proposed merge has a clickable compare link too (it's listed in the matches section at the bottom of the profile if you're logged in... my experience is that the section doesn't show if you're not logged in). 

So I'm not sure where you're saying there needs to be a compare button?

Ah. Thanks. That page is well beyond my ability/authority to change. Since it's listing actions taken by the profile, I don't see how a compare link could (technically) be added, but I'll forward your suggestion. Before I do...what's an OP?

And as a workaround - click one of the two profiles in the proposed merge & then at the bottom of the profile page, click the compare button.

Or you could check the feed that has MatchBot-proposed merges that haven't been rejected or postponed, which does include a compare link, instead of the list of its contributions.

+5 votes
I'd be happy to keep an eye on them!
by Valerie Kerr G2G6 Mach 1 (12.3k points)
Thanks Valerie! Just need you to send me a private message so I can send you an invitation to the project's Google Group & you're all set :D
