Hi WikiTreers,
There have been a lot of recent discussions about our location matching and the style of place names in our location fields. I'm sorry I haven't been more involved in these. I do want to make sure that anyone discussing these issues knows what's going on behind the scenes.
There is a long history here. Since the beginning of WikiTree we've known that at some point we would need to "normalize" location names. That is, to enable sorting and searching profiles by location, we would need a system that knows when different strings of text for place names refer to the same place and, ideally, when one place is contained within another.
We could do simple text matching, e.g. "London" to "London", but that breaks down quickly when there is a London, England, and a London, Great Britain, and a London, Ontario, and a London, Ont., etc., etc.
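To make the problem concrete, here is a minimal Python sketch. The place strings are just the examples above, and naive_match is a hypothetical helper for illustration, not anything in WikiTree's code.

```python
def naive_match(a: str, b: str) -> bool:
    """Treat two place names as the same only if the text is identical,
    ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()

# Identical strings match, as expected.
same_text = naive_match("London", "London")

# Two ways of writing the same city in Ontario do not match.
same_place_missed = naive_match("London, Ontario", "London, Ont.")

# Worse, the bare name matches even when the profiles mean different cities.
england_profile = "London"  # entered by someone thinking of England
ontario_profile = "London"  # entered by someone thinking of Ontario
different_places_matched = naive_match(england_profile, ontario_profile)
```

Plain text comparison fails in both directions: it misses spellings that mean the same place, and it joins places that only share a name.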
In the past I was inclined to think that we would use Google's location API to match various place names with geographic coordinates. But they're not set up to handle historical place names.
We could create our own place names database. We have a start on this in our regional categories, but despite the immensity of this hierarchy and all the hours generously contributed by those who have worked on it, it would have a long way to go to be what we need. Developing this into an historical place names database with name alternates and corresponding geographic coordinates or other meta-data would be a huge undertaking, technically, and for the community. It would become a huge part of what we do at WikiTree.
A few years ago we talked to Dallan Quass about leveraging the historical place names database he started for the WeRelate wiki. Unfortunately, Dallan has moved on to other projects and isn't supporting that database any more.
There is another historical place names database that's designed for genealogy and made available for free: FamilySearch's Place Research Tool.
At this point I am inclined to think that we should use FamilySearch's database and not build our own. A massive number of programming and volunteer hours have already gone into their database, and they appear committed to continuing the work and keeping it free for others to use.
I think that we could still adhere to our most basic style rule: to use "their conventions instead of ours," where "their" means the people in our profiles, not FamilySearch. We have always aimed to use names that the people themselves would have used in their time and in their language.
This could get tricky. It would be much easier to use FamilySearch's database if we just wanted to use English names. But they do have place names in other languages and they're continuing to build on those.
Regarding how we would use their database, the first thing we'd aim to do would be to use their API to automatically suggest, and hopefully automatically fill in, location names when creating or editing profiles.
This would make it easier for WikiTreers to use FamilySearch's place names -- to avoid typos and use standardized versions.
We would probably also create a tool that makes it easy to search your Watchlist for non-standard location names that you might want to edit. (If Aleš doesn't beat us to it, which he probably would. See http://www.wikitree.com/g2g/249287/location-errors.)
It's unlikely that we would force you to choose one of FamilySearch's place names when entering a location. The place name you're entering might not exist in their database. If it doesn't, hopefully there would be (already is?) an easy mechanism for feeding it back to FamilySearch, but that's their domain, not ours.
But -- and I suppose this gets to the real question I'm asking the community -- we would be adopting their place name style as our official style.
For example, the FamilySearch place names database has "London, Middlesex, Ontario, Canada." That would become WikiTree's style for writing London, Ontario. If you entered "London, Ontario" or whatever else on a profile we would aim to someday change it to "London, Middlesex, Ontario, Canada." Not that it would be necessary in this case, because FamilySearch's database can match those two, but the latter would be the ideal way to write it.
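As a rough illustration of what matching the variants to one preferred form could look like, here is a small Python sketch. The lookup table is hand-written for this example and only stands in for FamilySearch's database; STANDARD_NAMES, standardize, and needs_edit are hypothetical names, not part of their API.

```python
from typing import Optional

# Hypothetical stand-in for the place names database: each known variant
# (lowercased) maps to the standardized form.
STANDARD_NAMES = {
    "london, ontario": "London, Middlesex, Ontario, Canada",
    "london, ont.": "London, Middlesex, Ontario, Canada",
    "london, middlesex, ontario, canada": "London, Middlesex, Ontario, Canada",
}

def standardize(entered: str) -> Optional[str]:
    """Return the standardized name for a known variant, or None if the
    place is not in the table (the entry would then be left as typed)."""
    key = " ".join(entered.lower().split())  # normalize case and whitespace
    return STANDARD_NAMES.get(key)

def needs_edit(entered: str) -> bool:
    """True when a standardized form exists and differs from what was typed."""
    standard = standardize(entered)
    return standard is not None and standard != entered
```

With this table, needs_edit("London, Ontario") is True, while a place that isn't in the table is never flagged. That fits the idea that nobody would be forced to pick a name from the database.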
For more on what style rules mean, see http://www.wikitree.com/wiki/Style_FAQ, e.g. "Is it forbidden to break the style rules?"
Back to the technical aspects: By using FamilySearch's standardized names, we could later use FamilySearch's database to index place names for searches and sorting. This means more than just better text matching. FamilySearch connects their location names to geographic coordinates, and has them organized in hierarchies. Theoretically we could do what FamilySearch does (and what Ancestry may be doing, using FamilySearch's database), e.g. match someone born in London, Ontario with someone born in Ontario, or just born in Canada, etc., or with someone born in London, Canada West, British Colonial America, etc.
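Here is a minimal Python sketch of the hierarchy idea, under some loud assumptions: the place IDs, the PARENT and NAME_TO_ID tables, and the related function are all made up for illustration. They are not FamilySearch's data model, and treating the Canada West name as the same place as modern London, Ontario is just how this example is set up.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy: place id -> id of the place that contains it.
PARENT: Dict[int, Optional[int]] = {
    1: None,  # Canada
    2: 1,     # Ontario
    3: 2,     # Middlesex
    4: 3,     # London (Ontario)
    5: None,  # England
    6: 5,     # London (England)
}

# Hypothetical name table: several spellings can point at one place id.
NAME_TO_ID: Dict[str, int] = {
    "Canada": 1,
    "Ontario, Canada": 2,
    "London, Ontario": 4,
    "London, Middlesex, Ontario, Canada": 4,
    "London, Canada West, British Colonial America": 4,
    "London, England": 6,
}

def ancestors(place_id: int) -> List[int]:
    """Return the place id followed by every place that contains it."""
    chain = []
    current: Optional[int] = place_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain

def related(name_a: str, name_b: str) -> bool:
    """True if both names are known and one place is the same as,
    or contained within, the other."""
    a = NAME_TO_ID.get(name_a)
    b = NAME_TO_ID.get(name_b)
    if a is None or b is None:
        return False
    return a in ancestors(b) or b in ancestors(a)
```

In this sketch, related("London, Ontario", "Canada") is True because London's chain of containing places reaches Canada, while related("London, Ontario", "London, England") is False even though the city names are the same.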
Thoughts?
Thanks!
Chris