Have you seen that searching and matching now includes surname variants?

+59 votes
738 views

Hi WikiTreers,

I have an exciting announcement.

Surname variants are now included in searches. For example, if I search for my own name the results now include a Chris Whiddon, a Chris Whiting, and a Chris Whitney.

This is similar to the "Soundex" searching that many genealogists are familiar with, but we're using the database from the Variant Names Project initiated by our friend Dallan Quass from WeRelate.
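For readers who have not met Soundex before, here is a minimal Python sketch of the classic American Soundex code. It is only an illustration of the general technique. It is not WikiTree's or WeRelate's matching code, and the function name `soundex` is made up for this example.

```python
# Classic American Soundex: first letter plus three digits.
# Illustrative only; not the code used by WikiTree or WeRelate.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and other letters have no code
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]


for surname in ("Whitten", "Whitney", "Whiddon", "Whiting"):
    print(surname, soundex(surname))
```

With this sketch, Whitten, Whitney and Whiddon all get the code W350, while Whiting gets W352, so plain Soundex alone would not link Whiting to the other three.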

In addition to the main Person Search tool, the variants are included in Find Matches (our tool for searching for matches for every person in your Watchlist, rather than searching for them one at a time) and the automatic matches that are suggested when you create a new profile.

They are also used in GEDCompare (our tool for searching for matches for every person in a GEDCOM, rather than searching for them one at a time) and MatchBot (our system for automatically proposing merges of likely duplicates). With these two, the surname variants are only considered if the people were born or died before 1800. Lots of bad matches are especially annoying in these cases and users can't turn them off.

You can turn off the surname variants with Person Search and Find Matches. You'll see it as an option beneath the search form.

For the automatic matches when you create a new profile, we'll have to wait and see if too many suggestions are now being made and if that annoys people. Post here if it does.

Onward and upward,

Chris

in The Tree House by Chris Whitten G2G Astronaut (1.5m points)
retagged by Ellen Smith
Hooray! Great news.
Excellent!! Thank you, Chris. I know we've been pushing for search improvements like this for a long time. Thanks!!
Just posted a question on this subject since I came across really annoying "similarities". I think that will need to be gone into more fully; it really diminishes the value of the search function when you get so much irrelevant data thrown in.

This is the question. I just went to look at WeRelate, and I think they need massive help. Just looking at what they have for Coutu, it's totally ridiculous.

http://www.wikitree.com/g2g/261746/search-function-similar-names-needs-some-adjustments?show=261830#c261830
THIS WAS A MIGHTY LEAP FORWARD!  WELL DONE, TECH TEAM -- HUZZAH! HUZZAH! HUZZAH!
While the "soundex" variant search may help when one is searching for an unknown, it may hinder a search for a known person when there are too many results to the search.

I can no longer find some profiles because there are so many variants that sorting will not display the profiles I know to exist.

Is there a way to bypass the variant "soundex" search?
Please add a switch to turn off variants on Find matches for this person and also for when you enter a name in the blank fields at the top of a profile page. These two search functions have unfortunately become rather useless with the variants added. Thank you.
I noticed that when I was just creating a new profile.  This is truly wonderful.  As many profiles are created under the variant names!!!! This will help to reduce the number of duplicate profiles hence reduce the number of merges!!! yeah!!!

Awesome!  Taylor
Ouch!  Sorry, but I have done searches for someone with a French name, and get all sorts of possibly similar names thrown up that are not in the least French.  So I have to limit the search to no-variations to find anything.  Who created these so-called variations anyways?  They went rather overboard.
It was a project on WeRelate. They got a list of all the surnames on Ancestry and grouped it by Soundex or some such.  Then they asked users to whittle it down by removing bad matches.

But the whittling down hasn't got very far.  It's very hard to do internationally.  Often you can be fairly sure that two names are distinct in England but you can't be sure they didn't get confused in America.  In other cases, variants that weren't significant in England became distinctive in America.
Well, they should have actually asked people who speak each language to verify their so-called "Soundex". Disaster control is needed.

15 Answers

+15 votes
 
Best answer
Hot diggity. Thanks Chris. I just checked it out. I got all my variants on Hunnicutt. My ability to find and merge and separate those little hummers just got much easier. And even a techno defunct little hummer like me can use it.
by Anonymous Roach G2G6 Pilot (198k points)
selected by David Penfold
+19 votes
Wonderful! I think I've wondered about how many more potential matches would appear if all variants were included, and now I guess I'll find out. For whatever reason, I don't get many suggested matches in my families, so I'm looking forward to seeing what happens.
by Dave Dardinger G2G6 Pilot (440k points)
+16 votes
Fantastic news - thank you Chris Whitten.
by Living Bowling G2G6 Mach 6 (63.8k points)
+15 votes
That's excellent news.
by Kyle Dane G2G6 Pilot (112k points)
+15 votes
Chris,

I was working on a family name that I knew little about and saw some profiles that were clearly duplicates, thanks to the new method. Seems like a nice improvement.
by Philip Smith G2G6 Pilot (339k points)
+13 votes
Best improvement ever.

It'll encourage silly variants in LNABs, but at least they won't be so disruptive.

Hopefully it'll no longer be necessary to string out minor spelling variants in OLNs.
by Living Horace G2G6 Pilot (632k points)
edited by Living Horace
+13 votes

Hurray and thanks, Chris!!

Before we all get drunk on the celebratory champagne, it's worthwhile to point out that this new arrangement will not detect all types of name variants, so we'll have to continue to search manually for some types of variants (or enter the variants in WikiTree's "Other Names" fields). Omissions I'm aware of:

  • It does not find variants for surnames that include spaces, such as Du Bois or Van Vechten or van der Walt. Also, if you enter one of these names without spaces, it will not return the form of that name that includes spaces.
  • It does not find variants for given names. (Chris' message did specify surnames, but I figure this bears repeating, because I overlook this kind of information sometimes.)
  • In the past, WeRelate would not create name variants for what it calls "rare names that have the same soundex code," because these are matched automatically at WeRelate (but not at WikiTree). As a result, many spelling variations are not in WeRelate's database, and searches here won't find them. For example, if I search for Willem Trophagen, the system doesn't return http://www.wikitree.com/wiki/Traphagen-8 (Willem Traphagen Jr.) or http://www.wikitree.com/wiki/Traphagen-51 (Willem Traphagen). It does return http://www.wikitree.com/wiki/Traphagen-2 (Willem Jansen Traphagen), but only because Trophagen is entered in the Other Last Names field for that WikiTree profile. It appears to me that WeRelate has changed its policy for surnames (yay!) and http://www.werelate.org/wiki/Special:Names now allows users to associate pairs of names like Traphagen and Trophagen, but unless and until "rare" spellings are manually entered into the WeRelate database, they won't show up in searches here. (And there will be a lag time between being entered at WeRelate and appearing here. Chris may be able to tell us what the delay will be.)
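The first omission in the list above (surnames with spaces) is the kind of thing a search can smooth over by comparing a normalized key instead of the raw spelling. A minimal Python sketch follows, assuming a hypothetical helper named `surname_key`; nothing here reflects how WikiTree actually stores or compares names.

```python
import re


def surname_key(surname):
    """Build a comparison key: drop spaces, hyphens and apostrophes, ignore case.

    Hypothetical helper for illustration; not part of WikiTree.
    """
    return re.sub(r"[\s\-']+", "", surname).casefold()


# Spaced and unspaced forms of the same surname end up with the same key.
for a, b in (("Du Bois", "DuBois"), ("Van Vechten", "Vanvechten"),
             ("van der Walt", "Vanderwalt")):
    print(a, "|", b, "|", surname_key(a) == surname_key(b))
```

Each pair compares as equal under this key. A real implementation would still need to decide whether prefixes such as "van der" should also match the bare surname, which this sketch does not attempt.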
by Ellen Smith G2G Astronaut (1.5m points)

Hmm... WeRelate still won't match all "rare" spellings.

When I went to http://www.werelate.org/wiki/Special:Names and tried to submit "Pooll" as a variant spelling of "Poole," I got the message "The following rare names have the same soundex code, so they will be matched automatically."

A nice dividend from my fooling around to test this nice new capability:

I found a profile for the father for one of my Unconnected people, so I've been able to connect about 18 of my Unconnecteds to the Big Tree!
Ellen,

Way to go girl!!!!

Taylor
+11 votes
Thank you!  This was one of the most-needed improvements!
by Nan Starjak G2G6 Pilot (382k points)
+9 votes

Being able to search surnames with variations is a great improvement, but I often search by first name only and refine with a birth date if I get too many hits, and that doesn't seem to be working for the past day or so.  It's great for trying to find EuroAristo pre-1500 profiles that might be duplicated under a very different LNAB or CLN that the surname variation search won't find.  I keep on getting the message below - is it related to the changes to the search function or is something else happening?  (The error message doesn't appear if I search just by surname or first name and surname)

Error

We're sorry, we have had an internal error.

Please e-mail info@wikitree.com right away to let us know you got an internal error. We want to make sure it's fixed. Be sure to include what you were doing at the time you got the error. Thank you.

We apologize for the inconvenience.

by John Atkinson G2G6 Pilot (618k points)
I think it still works if you put a * in the Last Name box.
Thanks RJ, that does work.
Thanks for reporting this, John. It has now been fixed. You shouldn't need the asterisk anymore.
Thanks Chris, that's great news.
+7 votes

I'm befuddled (not unusual!). I searched for some Thomas Whites from a rather tangled family that I've been researching for months and couldn't find them. Eventually I found that I could turn off the name variants, and then I found them.

A search for Thomas White without name variants ordered by birth date results in

  1. Thomas White 1 abt 1468 England - 1549 (1 is in the suffix field)
  2. Thomas White 11 1488 Dorset, England - 25 Dec 1556
  3. Thomas White 1490 Marriot, Somerset, England - 1549
  4. Thomas White 1500
  5. Thomas White III MP abt 1517 Poole, Dorset, England - 21 Dec 1590

Searching with variants, ordered by birth date (now the default):

Thomas White 1490 in Somerset is on the list, but Thomas White 11 and 111 aren't there. I thought perhaps that the Roman numeral in the suffix field was causing a problem. But Thomas White 1500 isn't in that list either.

Tried again with the name Robert Martin. (sorted into birth order)

Without name variants there are Robert Martins 11, 111 and 1V. Searching with name variants resulted in the disappearance of Robert Martin 11 and Robert Martin 111, but Robert Martin 1V headed the list.
by Helen Ford G2G6 Pilot (469k points)
Helen, strange things happen when there are more than 100 profiles that match a particular search string. See http://www.wikitree.com/g2g/241854/unexpected-behavior-of-wikitree-search-function -- where I reported a similar problem in searching on the common name "Mary Hall."

It seems that when there are more than 100 profiles (particularly if there are a lot more than 100 profiles) that match the search string, and you sort the results by birthdate, you don't necessarily get the 100 earliest birthdates. To ensure appropriate results, it's apparently necessary to  limit the search to reduce the number of potential matches to about 100 or less. For Mary Hall, that meant restricting the results by date. For Thomas White, eliminating variant surname spellings apparently was sufficient to get you the results you were expecting -- but I suspect that you might have missed some other people named Thomas White who lived in that same time period.
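One possible explanation for this behaviour (purely a guess, since the thread does not say how the search is implemented) is that the results are cut off at 100 before they are sorted, rather than sorted first and then cut off. The Python sketch below shows the difference using made-up data; the names `truncate_then_sort`, `sort_then_truncate` and `LIMIT` are hypothetical.

```python
LIMIT = 100  # assumed cap on displayed results


def truncate_then_sort(profiles, limit=LIMIT):
    """Take the first `limit` matches in storage order, then sort those."""
    return sorted(profiles[:limit], key=lambda p: p["birth_year"])


def sort_then_truncate(profiles, limit=LIMIT):
    """Sort every match by birth year, then keep the earliest `limit`."""
    return sorted(profiles, key=lambda p: p["birth_year"])[:limit]


# 300 made-up matches stored newest first: birth years 2000 down to 1701.
profiles = [{"name": f"Profile {i}", "birth_year": 2000 - i} for i in range(300)]

print(truncate_then_sort(profiles)[0]["birth_year"])  # 1901
print(sort_then_truncate(profiles)[0]["birth_year"])  # 1701
```

In the first approach the earliest profiles never reach the sorted list at all, which would match the symptom of known early profiles vanishing once the number of matches grows well past 100.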

As the database grows, this search limitation will become increasingly more of a concern.
Thank you, Ellen. I think I agree with RJ on the other thread: weird! But it's an important limitation to know about.
+8 votes
There’s something badly wrong with this new matching process. I was adding a new person, Elizabeth Baird, born 1891 in County Antrim, Northern Ireland, and this is the list of potential matches I had to go through:

If any of the following appear to be a match do not proceed to create the new profile. Connect the existing profile instead.
Mary Elizabeth (Ward) Frost March 25, 1893 Cambridge, Middlesex, Massachusetts
Elizabeth Baird August 18, 1889 Marshall, Alabama, USA
Elizabeth Baird 1889 Dundee, Angus, Scotland - ~ Managed by Maureen Rosenfeld. [view] [set as spouse]
Elizabeth E (Byrd) Eddingfield July 7, 1889 Wells County, Indiana
Mary Elizabeth (Jackson) Ward September 25, 1889 Brisbane, Queensland, Australia
Edith Elizabeth (Ward) Saunt June 6, 1889 Nottingham, England - December 23, 1947
Elizabeth Ward 1893 Barwell, Leicestershire, England -
Elizabeth (Ward) Clarke September 14, 1889 Wandsworth, London, England
Elizabeth Ella (Blood) Ward 1889 - November 1968
Elizabeth (Byrd) Craig January 4, 1892 Wachapreague, Va - August 9, 1936
Alice Elizabeth (Ward) Benson September 15, 1890 Fairfield, Jefferson, IA
Zettie Elizabeth (Holley) Ward December 5, 1890 Union Parish, Louisiana
Grace Elizabeth (Merritt) Bird April 24, 1892 -
Mary Elizabeth Ward April 27, 1890 Newberry, SC - August 3, 1987
Daisy Elizabeth (Crane) Ward October 25, 1890 Roughton, Norfolk, England - November 29, 1982
Pearl Elizabeth (Holmes) Baird September 2, 1890 Tiffin, Adams, Ohio, USA - March 10, 1972
Elizabeth M Baird March 18, 1890 -
Dora Elizabeth (Ward) Moses January 9, 1892 Keota, Haskell, Oklahoma, USA - December 1980
Flough Elizabeth (Bair) Doyle January 16, 1892 Shade Gap, Huntingdon County, PA - November 28, 1973
Jennie Elizabeth (Shesong) Ward February 1, 1890 Greenville, ME - August 16, 1986
Elsie Elizabeth (Bird) Smith March 12, 1892 Cumberland, Nova Scotia, Canada - 1967
Elizabeth (Lansdown) Ward October 12, 1892 Goulburn, New South Wales - December 21, 1971
Ada Elizabeth Ward 1891 - 1892
Sarah Elizabeth Ward 1891 Hinckley, Leicestershire, England
Mildred Elizabeth (Findlay) Ward January 21, 1891 Cambridge, Middlesex, Massachusetts - April 13, 1977
Mary Elizabeth Ward February 18, 1891 Oklahoma - June 5, 1976
Pearl Elizabeth Byrd September 9, 1891 Cleveland, Conway Co., AR
Mary Elizabeth (Howell) Byrd May 3, 1891 South Carolina, USA - March 3, 1970
Elizabeth Baird March 9, 1891 -
Sara Elizabeth Ward December 1891 Hinckley, Leicestershire, England - June 1975
Elizabeth M Bird 1891 Iowa, United States
Elizabeth Lennox Baird July 31, 1891 Falkirk, Falkirk, Scotland, United Kingdom - 1978
Elizabeth (Leach) Bird 1890's - 1960's
Minnie Elizabeth Byrd December 28, 1891 United States - November 1985

Not one of them was even from the same country. Surely this can be vastly improved.

Valerie
by Valerie Kerr G2G6 Mach 1 (16.1k points)
Since your visual review verified that none of these people even looked like a match, I trust that you ignored the whole lot -- and you didn't check any of the boxes to create a rejected match.
If I was so unconcerned that I would ignore the whole lot, I wouldn't have bothered making the comment in the first place.

My concern is the integrity of the tree.

I checked every box to create a rejected match, as I should as an Arborist.
Try doing Smiths.... I always used to tick every box but it was getting to ridiculous proportions with over 100 suggested matches so I ended up having to ignore most of them, in order to get anything worthwhile done. That was before this update so I shudder to think what it will be like. I only have one life but I'm chronically sick so I'm sure I'm not spending it on anything as futile.
+4 votes

Thanks for the description, Chris - -

This week's quiz! - - perhaps someone would like to have a go with my lot [from OPC : Births - pre-1841] - had to run the search a number of times to find them. The date is the first occurrence in their database. . .

Problem

AndrEwartha-12 & AndrAwartha-1 : are matches : they both m: Jane Hendra-209. [Sep 2016.]

The problem is spelling in pre-1841 Cornish records and names.

Andrewartha and Trewartha are the traditional surnames - see Trewartha-7, circa 1591. There are a number of alias and aka's in the hard copy books and MS :

Andrew is a common alias. . . Andrewartha in OPC circa : 1632, Gwithian.

Andraw, circa 1715, Illogan.

ANDREW WORTHA, circa 1731, Cowan.

ANDREWWORTHA, c.1738, Breage, and,

ANDREW-WORTHA, c.1747,

ANDREWORTHA. . .

ANDRAWARTHA, c.1751, Gwinear. . .

ANDREWWARTHA, c.1760, Breage, . and

ANDREAWARTHA, c.1763. . .

ANDRAWORTHA, c.1802, Germoe. . .  [to 1864 search]

Have a go folks - onwards for ever - - - john.a

by John Andrewartha G2G6 Pilot (114k points)
+4 votes
I like it. I have been filling in "also known as" (e.g. Stirling / Sterling); this will become less necessary.

I would also like there to be a filter whereby we could exclude red dots and yellow dots, which would exclude a number of probably irrelevant profiles.
by William Arbuthnot of Kittybrewster G2G6 Pilot (182k points)

If there were a date RANGE field [not a match +/- 30yr] for Birth/deaths - this could be useful for the volume of "Smith [s]" you are looking at - and not ticking "still living".

It would be good too, if the "Search Again: This will search WikiTree's database." section of the "http://www.wikitree.com/wiki/Special:SearchPerson" page also is included at the footer of the "http://www.wikitree.com/genealogy/"name" " page. - - tks and cheers.

+6 votes
Accent marks -

When I search for just the last name "Peche" I get both

John Peche
and
John (Pêche) de Pêche (several of them)

When I search for "John Peche" I get John Peche, and Peak and Pack and Peachey and so on.

But NOT Pêche.

Same problem with other accent marks.
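A common way for search tools to avoid this kind of mismatch is "accent folding": stripping the accent marks from both the search term and the stored name before comparing them. Below is a minimal Python sketch using the standard `unicodedata` module. The helper name `fold_accents` is made up for this example, and whether WikiTree does anything like this is not known.

```python
import unicodedata


def fold_accents(text):
    """Remove combining accent marks, e.g. 'Pêche' becomes 'Peche'.

    Hypothetical helper for illustration; not part of WikiTree.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Pêche"))                           # Peche
print(fold_accents("Pêche") == fold_accents("Peche"))  # True
```

This only handles letters that Unicode decomposes into a base letter plus an accent. Letters such as "ø" have no such decomposition and would need their own mapping.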
by Janet Gunn G2G6 Pilot (158k points)
+3 votes
Any chance that this will be expanded to first names any time soon?

It's great for us Scandinavian researchers that all the eight versions of Knut/Knud - s/_ - son/sen are a match, but we also have a lot of variants for recorded first names, Niels/Nils/Nels to take one well-known example.
by Bjørnar Tuftin G2G6 Mach 1 (13.3k points)
Completely agree!

Also Sjur, Siver, Siur, Sivert, Syvert, Sever, Severt, possibly Siguer ...

It will never even occur to most of us Americans that those names could actually be the same, which degrades matching and merging.

