Is there something wrong with the mechanism that suggests potential duplicates at profile creation?

+34 votes
677 views
This is not the first time I have inadvertently created a profile (actually several, in a family group) only to find out later that a profile for this person already existed.

The profiles have the same first name and last name, and the birth years are less than 2 years apart. Why wasn't the older profile suggested as a duplicate before I finished creating the new one? It should have been. I very often get suggested duplicates for profiles that are nowhere near a good match, so why not this one?

The same problem occurred with the wife and other family members. I had the same issue during the Connect-a-thon, when the duplicates were only picked up after I had already added several unintended duplicates. (In that first case the names were slightly off: Myers vs. Meyers. I can provide links if that is helpful.)
WikiTree profile: Benjamin Warner
in WikiTree Tech by Isabelle Martin G2G6 Pilot (566k points)
retagged by Jamie Nelson
Hi Isabelle. If you can recall exactly what names and dates you entered in the form, and let us know what profile(s) should have been suggested as a match, Jamie can try to replicate the problem. Thanks.
Thanks Chris. Wonsal-6 was created with Last Name at Birth Wonsal, Proper First Name Benjamin, Current Last Name Warner, Birth Date 1858. (and a birth place that is not taken into account by the matching algorithm). In hindsight, I would have expected it to suggest Wonsal-1 as a duplicate.
Also, a similar thing happened when I created Eichelbaum-5, which should have triggered the suggestion of Eichelbaum-3. The difference between the two is that one is entered as "Perel" Eichelbaum with a birth date of 1858, and the other as "Pearl" with no dates at all.
The identity of Perel Eichelbaum and Pearl Eichelbaum was not detected because the WikiTree name search function cannot recognize variant spellings of given names unless the variant spelling appears in one of the name data fields of the existing profile.
I believe Pearl was entered as Preferred First Name for Perel though. I'll check.

If Jamie would like another example, the one that just happened to me may have an easily identifiable cause. I created Bassett-5469 with the following details: First Name - Eliza, LNAB - Bassett, Current Last Name - Lamb, Birth date - about 1835. No matches were suggested, so I created the profile.

I then looked to see if she might have any relatives already on here and found she actually already had a profile. The details on that profile (Bassett-5389) were: First Name - Eliza, LNAB - Bassett, Current Last Name - Bassett, Birth date - before 25 Oct 1835.

So it seems that either the difference in Current Last Name or the difference in dates between about/before, or the combination of the two, caused the match to be missed.
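To illustrate the point, here is a hypothetical matching rule in Python that compares first name, LNAB and birth year (with a small tolerance) and ignores Current Last Name. It is not WikiTree's algorithm: the Entry record, the looks_like_duplicate function and the 2-year tolerance are all assumptions, and dates are reduced to a year, so qualifiers such as "about" and "before" are simply dropped. Applied to the two Eliza Bassett profiles above, it reports a match even though the current last names differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical record shape, not a WikiTree data type.
    first_name: str
    lnab: str                  # last name at birth
    current_last_name: str     # stored but deliberately not compared below
    birth_year: Optional[int]  # year only; "about"/"before" qualifiers dropped


def looks_like_duplicate(a: Entry, b: Entry, year_tolerance: int = 2) -> bool:
    """Hypothetical rule: same first name and LNAB (case-insensitive) and,
    when both birth years are known, years within year_tolerance.
    Current last name is ignored because it often differs (married name)."""
    if a.first_name.lower() != b.first_name.lower():
        return False
    if a.lnab.lower() != b.lnab.lower():
        return False
    if a.birth_year is None or b.birth_year is None:
        return True  # a missing date should not rule out a match
    return abs(a.birth_year - b.birth_year) <= year_tolerance


new = Entry("Eliza", "Bassett", "Lamb", 1835)          # Bassett-5469, "about 1835"
existing = Entry("Eliza", "Bassett", "Bassett", 1835)  # Bassett-5389, "before 25 Oct 1835"
print(looks_like_duplicate(new, existing))  # True
```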

Thanks Paul, that is helpful.
This has happened to me also. It drives me nuts, because I always go through the list carefully, or so I thought.
Isabelle, excellent that you've raised this issue. It is definitely a problem! Jamie, it would be amazing if you could tweak things a bit. Thanks, both of you.
Improving the duplicate detection has been at the top of my wishlist for a while.

We are currently working on introducing first name variants, which should help a lot. But I have a list of other tweaks that should be done as well.
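A minimal sketch, in Python, of how first-name variant groups could widen an exact-spelling comparison. The VARIANT_GROUPS table and the function names are hypothetical, and the groups contain only spellings mentioned in this thread; this is not WikiTree's implementation or data.

```python
# Hypothetical variant groups; a real table would come from curated data.
VARIANT_GROUPS = [
    {"pearl", "perel"},
    {"elizabeth", "elisabeth"},
]


def _variant_key(name: str) -> tuple:
    """Map a name to its variant group, or to its own spelling if unknown."""
    n = name.strip().lower()
    for i, group in enumerate(VARIANT_GROUPS):
        if n in group:
            return ("group", i)
    return ("name", n)  # unknown names only match their exact spelling


def first_names_match(searched: str, profile_names: list) -> bool:
    """True if the searched name matches any of the profile's name fields
    (e.g. proper first name, preferred name) directly or via a variant group."""
    key = _variant_key(searched)
    return any(_variant_key(n) == key for n in profile_names)


print(first_names_match("Perel", ["Pearl"]))        # True
print(first_names_match("Benjamin", ["Benjamin"]))  # True
print(first_names_match("Perel", ["Paula"]))        # False
```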

Thanks for this news, Jamie.

It also looks like something has been done recently to teach the system about variants for names that it did not recognize previously. For example, I am pleased to see that the surname genealogy page for TERPENNING now says:

About 97 TERPENNINGs. Related surnames: TERPENING (164) TEERPENNING (31) TURPENING (21) TARPENNING (18) TARPENING (11).

Is there anything we can or should be doing to help ensure that additional variants are recognized in the future?

We did get an updated last name variant list from werelate, so that might have been updated.

We do plan on updating the database with the werelate variant names data ~much~ more frequently (every few months? instead of every... 5 years), so we are going to encourage people to make corrections there.
Great!

I think that group of variant spellings for Teerpenning might be one that I had added over at werelate in hopes that it would get picked up here.

8 Answers

+22 votes

It happens to me quite often as well, and I like to think that I know how to check for these things. (Apparently very wrong of me!)

by Natalie Trott G2G Astronaut (1.3m points)
+13 votes
It happens to me sometimes; I even inadvertently made a duplicate of my own grandfather when I first joined WikiTree.
by Jessica Key G2G6 Pilot (315k points)
+18 votes
I have noticed the same thing. Because of this, I sometimes just search the name with no dates.
by Robin Lee G2G6 Pilot (859k points)
Me too!
+16 votes
I have a hunch that this might have happened because one profile had both a birth date and a death date, and the other profile had only a birth date.  We assume that the matching algorithm will find that kind of match (and it usually does), but for some reason it does not always seem to work that way.

Like Robin Lee, I often search for a person by name without any dates (and with variant spellings or wildcards for the first name) before I create a profile.
by Ellen Smith G2G Astronaut (1.5m points)
Yes. But if we must thoroughly search for a possible existing profile each time (even for modern profiles, which this one was), what is the point of having an algorithm which will sometimes suggest dozens of potential matches that don't even look like matches?
Ellen, how do you use wildcards in name fields? I did not even know it was possible.

Wildcards work in the name search fields. A ? is a wildcard that substitutes for a single letter and a * substitutes for multiple letters.

Some examples of using wildcards:

  • To search for Elisabeth or Elizabeth, enter Eli?abeth
  • To search for Elizabeth or Eliza, enter Eliza*
  • To search for any name that starts with Mar, enter Mar*
  • To search for Margaret, Margriet, Margot, or Margit, enter Marg*t
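To make the two wildcards concrete, here is a small Python sketch that reproduces the examples above using the standard library's fnmatch module, whose ? (exactly one character) and * (any run of characters, including none) follow the same convention. It is not WikiTree's search code, and the case-insensitive comparison is an assumption. Note that fnmatch also treats [ ] as a character class, which is unlikely to matter for names.

```python
from fnmatch import fnmatchcase


def name_matches(pattern: str, name: str) -> bool:
    """? stands for exactly one letter, * for any run of letters (possibly
    none). Both sides are lowercased, assuming a case-insensitive search."""
    return fnmatchcase(name.lower(), pattern.lower())


def filter_names(pattern: str, names: list) -> list:
    return [n for n in names if name_matches(pattern, n)]


names = ["Elisabeth", "Elizabeth", "Eliza", "Margaret", "Margriet",
         "Margot", "Margit", "Martha", "Mary"]

print(filter_names("Eli?abeth", names))  # ['Elisabeth', 'Elizabeth']
print(filter_names("Eliza*", names))     # ['Elizabeth', 'Eliza']
print(filter_names("Marg*t", names))     # ['Margaret', 'Margriet', 'Margot', 'Margit']
```

Because * can match zero letters, Eliza* finds Eliza itself as well as Elizabeth, which matches the second example in the list.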
That is something I didn't know; you have just saved me so much time. I have been working my way through an unindexed film, creating profiles for everyone who appears while trying my best not to create duplicates, but some have slipped past me because of first-name variants.
+14 votes
I always use the name search first when creating a new profile. I sort the results in date order and scan the ±10 years around the new person to see if there is anyone similar. That way it's easy to see the name variations and also which parents or spouse are connected. I advise new people to do this because they assume that the suggestions are complete (or at least I should speak for myself and say that I thought they were complete until I learned from experience).
by Cindy Cooper G2G6 Pilot (328k points)
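Cindy's approach (sort the name-search results by date and scan about ten years either side) can be sketched in Python. The Profile record, its field names, the example IDs and the choice to keep undated profiles at the end of the list are assumptions for illustration; the function works on a list you already have and does not call any WikiTree API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    wikitree_id: str           # hypothetical record shape, not a WikiTree API type
    first_name: str
    last_name: str
    birth_year: Optional[int]  # None when the profile has no birth date


def candidates_near(results: list, target_year: int, window: int = 10) -> list:
    """Keep results born within +/- window years of target_year, sorted by
    birth year. Undated profiles are kept too, listed last, because a
    profile with no dates can still be the same person."""
    dated = sorted(
        (p for p in results
         if p.birth_year is not None and abs(p.birth_year - target_year) <= window),
        key=lambda p: p.birth_year,
    )
    undated = [p for p in results if p.birth_year is None]
    return dated + undated


results = [
    Profile("Example-1", "Eliza", "Bassett", 1835),
    Profile("Example-2", "Eliza", "Bassett", 1790),
    Profile("Example-3", "Eliza", "Lamb", None),
    Profile("Example-4", "Eliza", "Bassett", 1841),
]
print([p.wikitree_id for p in candidates_near(results, 1835)])
# ['Example-1', 'Example-4', 'Example-3']
```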
+7 votes

I'm having this problem as well. I didn't realize there was a thread on this already and posted a question about it. I was directed here with the suggestion to add my example: 

I added a profile for Ray Nash Studt (Studt-73) several months ago via a GEDCOM upload. A couple of days ago, I discovered there is a duplicate Ray Nash Studt (Studt-39). The two have the same full name with the exact same spelling and the same birth year. I was surprised, because I try to be very careful about comparing profiles when I'm adding people. But there it is, so I clicked on matches from Studt-39 to propose a merge.

The only potential match that was offered from the Studt-39 profile page was to Ray Stitt (Stitt-39) with no dates. Then I tried to find the match from the Studt-73 page. I was offered matches to Stitt-39 again, Stout-1020, and Steed-1959. 

I ran another search for potential matches from the general "Find...Matches" page with very loose parameters and asking to include any rejected matches. They still don't come up as a potential match (and this confirms I didn't accidentally reject it).

With the same exact full name and birth year, these two should be showing up as potential matches--especially when profiles that don't match either the name or the birth year are being offered as possible matches. And when you click on matches from each profile, the list of potential matches should be the same!

(The merge has been proposed now. That is not the problem.)

by Regan Conley G2G6 Mach 4 (45.0k points)
+9 votes

I can document another example of this functionality not working.

I created a profile for Cora Abbott, born Dec 1866. First name Cora, LNAB Abbott, current last name Gifford.

Before I created the profile, I ran a name search for Cora Abbott. I did not find any profiles for my Cora, but I found several that were close enough to show up as possible matches:

When I created the profile for Cora, none of these showed up as possible matches.  (I was not shown any possible matches.)

by Ellen Smith G2G Astronaut (1.5m points)
Thanks Ellen. Those Coras show up if I don't type a Current Last Name, but don't show up if I've typed "Gifford" (although your new profile shows up as a match). This is helpful.
+7 votes

Jamie - another example:

It happened to me Jan 23, 2021.  I added a child for Hall-50530 with first name Annie, LNAB Reynolds, CLN Ellis, Female, Birth 1894-11-15, and Death 1977-01-13 and checked the option that her father was Reynolds-21399.

I don't recall what (if any) showed up as possible matches, but I can confirm that if there were any, I checked the list carefully and saw no potential matches already here.

After adding the profile (Reynolds-21570), I immediately proceeded to add seven sources and write a biography based on these records. Finally, I was ready to add her husband, but a possible match was shown for him. It turned out to be him, and he already had a wife.

The already existing profile Reynolds-5151 was created April 3, 2014 and is very minimal - only WikiTree's boilerplate for biography and sources.  No parents are shown and the only connections are to husband and one of their five children.  First name is Annie, Middle Name is Mae, LNAB and CLN are both Reynolds, and Birth Date is about 1895.

It seems like this should have shown up as a match when I was adding what is now a duplicate.  I proposed the merge as soon as I found the duplicate and it is still awaiting what will probably end up as default approval because the manager of the original one was last active nearly 2 years ago.

by Gaile Connolly G2G Astronaut (1.2m points)
Thanks Gaile!

I've been watching this issue too since the discussion last year, and my experience sounds similar to Gaile's. What I think I've been finding is that the more accurate the details you enter initially, the less likely it seems to pick up a potential match, compared with entering quite vague details.
