Should spaces in last names be ignored for most purposes on WikiTree?

Question


Hi WikiTreers,

We are in the middle of a change that would improve how our search functions handle last names with spaces and special characters.

To explain it briefly: Right now our systems interpret, for example, St. Laurent and St Laurent as different names. Soon they will be interpreted as the same name.

Ellen Smith has suggested that StLaurent, with no space, also be interpreted as the same name.

In her words: "please teach the search function to overlook the underscore character _ in last names like Van_Dyke and Du_Bois and van_der_Walt when searching for possible duplicates. Currently, if I'm looking to see whether there's an existing profile for an Abraham Van Arnhem, I can enter Abra* and Vanarnhem in the search boxes, and I'll see results for Abram or Abraham with last names of Vanarnhem, Vanaernum, Vanarnum, VanAernam, VanArnem, and more. However, if I want to find variant spellings for the two-word version of the name (and this was properly a two-word name), I have to search separately for Abra* with each possible variant spelling (Van_Arnhem, Van_Aernum, van_Aernam, van_Arnheim, etc.). How hard would it be to teach the search function to treat Van_Arnhem as equivalent to VanArnhem, for the purpose of finding possible duplicates?"

This would be relatively easy to do and now would be the time to do it, before we complete the change we already have in progress.

I'm inclined to think that Ellen is right (she usually is). For search purposes, Van Arnhem and VanArnhem, and O'Reilly and OReilly, etc., should be treated the same.
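To make the idea concrete, here is a minimal sketch of what "treated the same" could mean, assuming names are compared through a key that keeps only letters and digits. The function name `surname_key` is hypothetical and is not WikiTree's actual code.

```python
def surname_key(last_name: str) -> str:
    """Hypothetical matching key: drop everything except letters and digits, then uppercase."""
    # Spaces, periods, apostrophes, and underscores all fail isalnum() and are dropped.
    return "".join(ch for ch in last_name if ch.isalnum()).upper()


# All of these spellings collapse to one key for matching purposes.
variants = ["St. Laurent", "St Laurent", "StLaurent"]
keys = {surname_key(v) for v in variants}
```

Under this assumption the comparison is strictly equal or not equal: two names either share a key or they don't.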

Do you agree? Can you think of cases where the spaces are very significant for matching?

Ideally, Van Arnhem and VanArnhem would be treated as similar to each other but not exactly the same, e.g. so that a search for Abraham Van Arnhem would weight another Abraham Van Arnhem more highly than an Abraham VanArnhem. But I'm sorry to say that we have to be fairly black and white here. Either Van Arnhem equals VanArnhem or it doesn't.

Also, if we do this, Van Arnhems and VanArnhems would be lumped together in other contexts, not just search and matching. The change would affect almost all contexts where we group people by surname.

For example, there would be one surname index page: the one without the spaces. All the O'Reillys, O' Reillys, O Reillys, and Oreillys would be on https://www.wikitree.com/genealogy/OREILLY

As another example, the van der Walts would no longer be on https://www.wikitree.com/genealogy/VAN_DER_WALT; they would be on https://www.wikitree.com/genealogy/VANDERWALT even though nobody on WikiTree has the name Vanderwalt.
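The two examples above can be sketched as follows. This assumes the index page address is simply the base URL plus the surname with non-alphanumeric characters removed and uppercased; `index_key` and `index_url` are hypothetical names used only for illustration.

```python
BASE_URL = "https://www.wikitree.com/genealogy/"


def index_key(last_name: str) -> str:
    """Hypothetical grouping key: letters and digits only, uppercased."""
    return "".join(ch for ch in last_name if ch.isalnum()).upper()


def index_url(last_name: str) -> str:
    """Build the shared surname index address (an assumed URL scheme)."""
    return BASE_URL + index_key(last_name)


# The displayed names are left untouched; only the grouping page is shared.
displayed = ["O'Reilly", "O' Reilly", "O Reilly", "Oreilly", "van der Walt"]
pages = {name: index_url(name) for name in displayed}
```

Here every O'Reilly variant maps to the same page, and `van der Walt` maps to the VANDERWALT page, while the dictionary keys still hold each name as it was entered.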

To find orphaned and unconnected profiles, van der Walts would need to look at Vanderwalt. And van der Walts should start following and using the G2G tag VANDERWALT instead of VAN_DER_WALT.

Some van der Walts might not appreciate all this.

Some of these things can be mitigated, but not easily, and it wouldn't be a priority. Van der Walts would have to get used to seeing and using Vanderwalt a lot.

I don't mean that Philip van der Walt's profile would say Philip Vanderwalt and that's how he'd appear in search results or family trees, etc. Individuals' names would be respected where they are displayed as individuals. It's just that in group contexts, they'd be Vanderwalts.

I don't know how much this would upset the van der Walts, et al., or how heavily this should be weighted in the decision.

Do you have any thoughts?

Thanks!

Chris

asked Jun 2, 2017 in The Tree House by Chris Whitten G2G Astronaut (1.5m points)
retagged Aug 22, 2019 by Ellen Smith

Depending on where the name would be displayed without spaces and with only an initial cap, I think the benefit is not worth it.

Is the change of searching for space/no space something that could be implemented separately from looking for St. or St & surnames with more than one word? If it were separate, could it be easily implemented & reversed? In other words, would it be feasible to implement it temporarily?

With the past two changes in search parameters, there was a massive influx of MatchBot proposals (the first was surname variables, which lasted quite a while; the second just began - matching to Unknowns).

I think that the many complicated duplicates caused by the different styling of space/no space would all be found within a month or two. After that, finding duplicates would not be so onerous, and the change could be reversed, if that is technically feasible.

Cheers, Liz
MatchBot MP
Join the MatchBot Monitors Project! See [this G2G post].

commented Jun 2, 2017 by Liz Shifflett G2G6 Pilot (629k points)

2 Answers

Answer 1 · 2017-06-02T15:02:05+0000

I think this would be a very useful improvement, and I wonder if it can be expanded a bit further.

Can variations with a leading 'de' also be found? So for a name like de Normandie, a search for Normandie or De Normandie would return the same profiles? If so, I am sure we can come up with a handful more suggestions ('de la ', 'la ', 'fitz', 'of ', etc.).
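As a rough sketch of what I mean, assuming the search compared names after removing a leading particle: the particle list and the function name `strip_leading_particle` below are hypothetical. 'fitz' is left out because it is listed above without a trailing space, so it would need separate handling.

```python
# Hypothetical particle list, longest first so "de la" is tried before "de".
PARTICLES = ("de la", "de", "la", "of")


def strip_leading_particle(last_name: str) -> str:
    """Remove one leading particle when it appears as separate word(s)."""
    words = last_name.split()
    lowered = [w.lower() for w in words]
    for particle in PARTICLES:
        parts = particle.split()
        n = len(parts)
        # Only strip when at least one word remains after the particle.
        if len(words) > n and lowered[:n] == parts:
            return " ".join(words[n:])
    return " ".join(words)


def search_key(last_name: str) -> str:
    """Case-insensitive key used to compare names in this sketch."""
    return strip_leading_particle(last_name).lower()
```

With this, `search_key("de Normandie")`, `search_key("De Normandie")`, and `search_key("Normandie")` all agree, while a name that merely starts with the same letters, such as Delaware, is left alone.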

Answer 2 · 2017-06-02T17:51:47+0000

Thanks for considering this, Chris. My hope is that it would be possible to change the way spaces in names are handled in WikiTree search functions, without affecting the way these names are displayed or spelled in data fields.

The phenomenon of surnames with spaces in them is a major reason why the New Netherland Settlers project has faced difficulties with managing duplicate profiles. Historically, both families and genealogists have changed names with spaces to the space-free versions of the same names, so we always have to be mindful of the possibility of variant spellings with or without the space. WikiTree has lacked effective tools for helping deal with these variations. Surname lists for names like my family name of Van Aken don't identify any "related surnames" (see https://www.wikitree.com/genealogy/Van_Aken) and surname pages for the concatenated versions of the same names have "related surnames" lists that don't hint at the existence of the versions of the name that contain spaces (see https://www.wikitree.com/genealogy/Vanaken). And as I noted in my earlier comment, the name search function doesn't deal at all well with spaces in last names. If our search protocols could treat Van Aken and Vanaken as the same name, while retaining the distinction in the actual name fields, it would save a lot of time and energy -- and prevent some gnashing of teeth.
