DB Schema expansion: Name table

Hi! This post will be moderately technical in nature. Originally I was planning on posting it in the appropriate Google+ group, but [[Whitten-1]] suggested that I should do it here.

The problem: For many millennia our ancestors were not aware that English was going to become the lingua franca by the time WikiTree came around. So those ancestors didn't bother to write down their names in English in the format First Name Last Name. Consequently WikiTree doesn't work nicely with name formats used in different cultures.

Solution: Extend (not change) WikiTree to support names in virtually any format.

How?

High-level design: permit storage of names suitable for every culture, in every language.

Technical implementation details:

1. Add to existing WikiTree DB schema new [names] table with the following structure:
ProfileID - FK to the main profile table
Name - the actual name value being added
NameASCII - name value using English letters without extended char set
LanguageID - FK to languages table
NameTypeID - FK to name types table (First, Last, Patronym, Homestead/Farm, Father's, Clan, NamePrefix [Mr., Dr., Rev., King, etc.], NameSuffix [Jr., Sr., XVII, etc.])
DisplayOrder - the position in the array of the full formal name to show as Profile's header

2. Extend profile entry forms to allow unlimited amount of "name" fields in the order specified.

3. Extend search functionality to perform name searches on [names].[name] column which would return distinct ProfileID values, from which the result set will be constructed.

4. Begin extending functionality for full Unicode support
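
A minimal sketch of points 1 and 3, using Python's built-in sqlite3 module. WikiTree's actual database engine and table names are not given in this post, so the [languages] and [name_types] tables, the text ProfileID, and the choice to also match on NameASCII are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; all table names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages  (LanguageID INTEGER PRIMARY KEY, Label TEXT NOT NULL);
CREATE TABLE name_types (NameTypeID INTEGER PRIMARY KEY, Label TEXT NOT NULL);
CREATE TABLE names (
    ProfileID    TEXT    NOT NULL,  -- FK to the main profile table (not modeled here)
    Name         TEXT    NOT NULL,  -- the name as written in its own script
    NameASCII    TEXT    NOT NULL,  -- romanized form
    LanguageID   INTEGER NOT NULL REFERENCES languages(LanguageID),
    NameTypeID   INTEGER NOT NULL REFERENCES name_types(NameTypeID),
    DisplayOrder INTEGER NOT NULL DEFAULT 0  -- 0 means hidden
);
CREATE INDEX idx_names_name ON names(Name);
""")
conn.executemany("INSERT INTO languages VALUES (?, ?)", [(1, "English"), (2, "Russian")])
conn.executemany("INSERT INTO name_types VALUES (?, ?)", [(1, "First"), (2, "Last")])
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Lenin-2", "Lenin", "Lenin", 1, 2, 1),
        ("Lenin-2", "Ленин", "Lenin", 2, 2, 1),
    ],
)


def search_profiles(conn, term):
    """Return the distinct ProfileID values whose stored names match the term."""
    rows = conn.execute(
        "SELECT DISTINCT ProfileID FROM names "
        "WHERE Name = ? OR NameASCII = ? ORDER BY ProfileID",
        (term, term),
    )
    return [row[0] for row in rows]


print(search_profiles(conn, "Ленин"))
```

Because every spelling is a row of its own, a search in any stored script resolves to the same profile, and DISTINCT keeps a profile from appearing once per matching row.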

asked Jan 9, 2018 in WikiTree Tech by Patrick Munits G2G6 Mach 1 (12.6k points)
retagged Jan 9, 2018 by Ellen Smith

A clarification with regards to a form where these additional names can be entered by end users.

There really need to be just two text boxes: the actual name being added and its romanized version. Plus there would be three drop-down boxes:
1. Language
2. Name Type
3. Display order (with default option - Hidden, followed by 1,2,3,...)

Upon entry of a new name row a label control can right away show how everything would be shown as end result. I think this limited number of fields and instant feedback will be sufficiently easy to grasp by a user who already knows how to fill out the basic profile form.

Furthermore, limiting the values that describe name type and language will control what users can add. This would also allow gathering some basic usage statistics by language.

The order of the display of the name value can also be limited to some sane number (like ten).

This richtextbox web control has a lot more options than the new [names] user control/datagrid/section/iframe/whatever that I propose. Users figured out how to comment in G2G, so I think they can just as easily figure out how to use the expanded names options. I do agree that it absolutely must be very intuitive.

commented Jan 10, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

If WikiTree moved to a structure like this, I'd be ecstatic, and I'd be posting to every Hungarian genealogy board/list that I know of, suggesting that people switch.

A few thoughts in response to objections or thoughts raised in various other comments and answers:

The problem of users creating utter nonsense because they're given too much freedom and not enough knowledge could be addressed with suggested templates: "here's a set of fields and rules that work for many [NamingCultureX] names. If you need different fields, you can choose a different template or use the Customize button." (Or "We don't have a template yet for [NamingCultureX]. Here's a generic template that you can change as needed using the Customize button.") This would also address the objection that a totally open-ended structure fails to capture our collective knowledge about names.

The problem of multiple names, such as a hypothetical lady married five times (who changed her name each time just to spite her descendants), could be solved with multiple name entries, sequentially numbered. I don't know enough about database design to determine whether all of the names should count as a single "name" entry/entity, or as six separate entries/entities.

I think the ordering of elements should be entered as a function of the chosen display language. For example, you could tell it that if the display is set to English, it should show "Eugene Ormandy" or "András Schiff", and if it is set to Hungarian, it should show "Ormándy Jenő" and "Schiff András". Notice that this involves some knowledge about how each person's name was used: in Ormandy's time, it was still customary to translate given names, but by Schiff's time, this was no longer the case. One language/ordering could be set as "preferred" or "default", to be used when the display is set to a language not specifically addressed in the person's name.

My one worry with this scheme is in regards to searching: how would the search engine deal with name elements that can go in multiple categories, such as unmarked patronymics? If you searched for the surname Thomas, would all the people with Thomas as a given name royally screw you up?

commented Feb 16, 2018 by J Palotay G2G6 Mach 8 (87.3k points)

9 Answers

Best answer

An addition of [namesLocations] with many-to-many relationship can easily accommodate what you’re asking for. I suppose there’s no such table now, therefore it would be an extension that expands the functionality without breaking existing features.

Consider a scenario where a newly-minted genealogist searches for a platform to start building their tree. Let's say that enthusiast knows that she/he is distantly related to some famous person. They come across WikiTree. What would they do first? Probably plug in the name of their famous relative. Would you like to try it?

Princess Diana – fail, no results, even though she’s here: [[https://www.wikitree.com/wiki/Spencer-40| Diana Frances (Spencer) Princess of Wales]]

Чингис хаан (Mongolian) – fail, even though millions of people are his descendants, and of course he’s here: [[https://www.wikitree.com/wiki/Khan-12| Temüjin Khan]]

Ленин (Russian) – fail, however a billion people over 5 generations knew this person by this name – [[https://www.wikitree.com/wiki/Lenin-2| Влади́мир Ильи́ч Ilyich (Lenin) Ulyanov]]

Prince – fail, he’s not in the result set for search by first name! Billions have heard of him but have no clue about his LNAB – [[https://www.wikitree.com/wiki/Nelson-10329| Prince Rogers Nelson]]

There’s absolutely no way that the examples shown above should be accepted as the norm. WikiTree will not become universally accepted as The Family Tree site until the very basic need of entering names appropriate for every culture is supported. One can go to Wikipedia, type the above search terms and get exactly what they search for. WikiTree can’t yet do that. Let’s change the situation! What prevents us from making this functionality enhancement in 2018?

answered Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)
selected Jan 12, 2018 by Phil Grace

Answer 1 · 2018-01-09T05:46:56+0000

Database schemas were denormalized in the past because:

1. Users were local and there was rarely a requirement to support more than one language, and extremely rarely more than two. Needs could therefore be accommodated by adding another column or two to the existing table. WikiTree is a global platform, so it’s essential for it to support multiple languages for the same profile. I don’t believe Wikipedia's route, where separate articles (the equivalent of WikiTree’s profiles) exist for each entry, is suitable for WikiTree's needs.
2. It was a nightmare to attempt to design a GUI form that supports dynamic entry fields. Over the past decade it became infinitely easier to do so on web forms. We should embrace this ability for the needs of genealogists.

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

Answer 2 · 2018-01-09T13:26:19+0000

Hallelujah! All praise the great Whitten-1!

This is a bigger task than perhaps you realize. Could it form a project? (I mean for all the discussions, not the coding.)

I would begin by collecting base cases from various cultures and languages which don't fit into the (given name, surname) paradigm. I can easily think of a few off the top of my head but native speakers should really be involved in the conversation. Once you have an idea of all the wacky and wonderful ways names can be formed and expressed you should be able to design a sufficiently flexible framework. I'm already sure some edge-cases will have to be dropped but the objective should at least be to represent all standard forms.

Another difficult area is going to be nailing down precisely what is and what is not part of a name, especially as far as prefixes and suffixes are concerned. Some people here (only Americans) get very upset if you suggest that someone's ancestor's military rank is not actually part of their name. My personal dislike is the adding of post-nominal letters (for degrees or awards) in the suffix field. All these sorts of issues will have to be argued over.

answered Jan 9, 2018 by Matthew Fletcher G2G6 Pilot (132k points)

Matthew, Patrick's design is elegant, from a database architecture view. Perhaps, for G2G discussion, his description could stand some expansion of the data item descriptions to explain all the functionality that people fluent in technobabble immediately see there.

No cases need to be collected from any culture or languages. No definitions of name elements need to be defined. Let me see if I can make a start at an explanation for the non-geeks among us.

All the name elements would be stored separately from the rest of the profile data so that there could be any number of Name fields in any profile.
When editing a profile, you would be able to add as many name fields as you want and assign each field whatever label you want.
Patrick included an added enhancement of also being able to specify which name fields are used, and in what order, when the person's name is displayed.

Thus, each profile would be able to have its own composition of name fields, without any need whatsoever for any community debate about what fields are needed to cover what cultural requirements. This is absolutely brilliant!

commented Jan 9, 2018 by Gaile Connolly G2G Astronaut (1.2m points)

A names table with many to one relationship to ProfileID (no clue what the actual column name in the main table is) can accommodate most wacky name combinations.

For example, take Bill Gates. His formal name is William Henry Gates III. Here's how this info can be stored in the new names table:
"Gates-1183", "Henry", "Henry", English-FK, MiddleNameType-FK, 2
"Gates-1183", "III", "3", English-FK, GenerationalType-FK, 4
"Gates-1183", "William", "William", English-FK, FirstNameType-FK, 1
"Gates-1183", "Gates", "Gates", English-FK, LastNameType-FK, 3
"Gates-1183", "Bill", "Bill", English-FK, FirstNameType-FK, 0

So the above can then display name as such for users who selected English interface of WikiTree: William Henry Gates III
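
The display rule and the search behaviour described in this comment can be sketched together in a few lines. This is an illustration only: the NameRow type, the plain-text language and name-type labels (standing in for the foreign keys), and the token-by-token matching rule are assumptions, not WikiTree code.

```python
from typing import NamedTuple


class NameRow(NamedTuple):
    profile_id: str
    name: str
    name_ascii: str
    language: str      # stands in for LanguageID
    name_type: str     # stands in for NameTypeID
    display_order: int


GATES = [
    NameRow("Gates-1183", "Henry", "Henry", "English", "Middle", 2),
    NameRow("Gates-1183", "III", "3", "English", "Generational", 4),
    NameRow("Gates-1183", "William", "William", "English", "First", 1),
    NameRow("Gates-1183", "Gates", "Gates", "English", "Last", 3),
    NameRow("Gates-1183", "Bill", "Bill", "English", "First", 0),
]


def display_name(rows, language):
    """Join the visible rows (DisplayOrder > 0) of one language in order."""
    shown = [r for r in rows if r.language == language and r.display_order > 0]
    return " ".join(r.name for r in sorted(shown, key=lambda r: r.display_order))


def matches(rows, query):
    """True if every word of the query equals some stored name, hidden or not."""
    stored = {r.name.casefold() for r in rows} | {r.name_ascii.casefold() for r in rows}
    tokens = query.split()
    return bool(tokens) and all(tok.casefold() in stored for tok in tokens)


print(display_name(GATES, "English"))  # William Henry Gates III
print(matches(GATES, "Bill Gates"))    # True
```

Note that "Bill" has DisplayOrder 0, so it never appears in the header but still counts for search.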

Furthermore, any of the following search parameters would locate the appropriate WikiTree profile:
"Bill Gates"
"Henry Gates III"
"William Gates"
"Gates"

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)
edited Jan 9, 2018 by Patrick Munits

Yup, that's what I'm saying. "We" can encompass the WikiTree decision makers about system design, or a group of WikiTree members who offer input to it, but it can never encompass enough people to represent an omniscient body with regard to all languages ever in use or all styles of naming people utilized by all cultures since the start of humanity. Patrick's proposal allows for that - anyone can define a combination of name fields and populate them in any character set included in Unicode and, furthermore, this can be done on the granular level of each profile.

EDITED TO ADD:
Schemas are very efficient for very many things, but there could never be a single schema that is all inclusive of every naming convention used by every culture in the history of the world.

commented Jan 9, 2018 by Gaile Connolly G2G Astronaut (1.2m points)
edited Jan 9, 2018 by Gaile Connolly

Another example: Peter the Great. The ruler of Russia didn't even have an official surname as per Wikipedia. You can find him in Wikipedia, but on WikiTree you have to jump through hoops before you can locate him. Here's how the entries for him could appear in the [names] table:

"Romanov-63", "Peter", "Peter", English-FK, FirstName-FK, 1
"Romanov-63", "the", "the", English-FK, TitlePrefix-FK, 2
"Romanov-63", "Great", "Great", English-FK, Title-FK, 3
"Romanov-63", "Пётр", "Peter", Russian-FK, FirstName-FK, 1
"Romanov-63", "I", "1", Russian-FK, LastName-FK, 2
"Romanov-63", "Алексеевич", "Alexeyevich", Russian-FK, Patronym-FK, 0
"Romanov-63", "Романов", "Romanov", Russian-FK, LastName-FK, 0

Here's how the name would be shown to English speaking users of WikiTree: Peter the Great. And here's how it would appear to Russian users: Пётр I. Same profile would be shown correctly to either culture, in the way they expect the name to be shown.
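
As a rough sketch of that per-language selection: the rows below are reduced to (Name, Language, DisplayOrder), and the fallback to a default language when the reader's language has no visible rows is an assumption borrowed from the "preferred or default" idea raised elsewhere in this thread, not part of the schema itself.

```python
PETER = [
    # (Name, Language, DisplayOrder)
    ("Peter", "English", 1),
    ("the", "English", 2),
    ("Great", "English", 3),
    ("Пётр", "Russian", 1),
    ("I", "Russian", 2),
    ("Алексеевич", "Russian", 0),  # stored and searchable, but hidden
    ("Романов", "Russian", 0),     # stored and searchable, but hidden
]


def name_for_ui(rows, ui_language, default_language="English"):
    """Build the header for the reader's language, else for the default one."""

    def build(language):
        shown = sorted(
            (order, name)
            for name, lang, order in rows
            if lang == language and order > 0
        )
        return " ".join(name for _, name in shown)

    # An empty string means no visible rows exist for that language.
    return build(ui_language) or build(default_language)


print(name_for_ui(PETER, "Russian"))    # Пётр I
print(name_for_ui(PETER, "English"))    # Peter the Great
print(name_for_ui(PETER, "Hungarian"))  # Peter the Great (fallback)
```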

I would also support adding another table that links appropriate sources to the assigned names. Perhaps this will then calm down the discussions about how names of ancient European rulers (who dared to live in different countries) should be shown.

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

For backward compatibility there's no need to extract and parse names from the existing main profile table into the new table. If there are names in the [names] table then they can be used for the profile title. If no names have been added, then names from the main table will be used.
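
In code, the fallback could look roughly like this. The function and parameter names are hypothetical, and treating a profile whose [names] rows are all hidden the same as a profile with no rows is an assumption the comment does not settle.

```python
def profile_title(names_rows, legacy_first, legacy_last):
    """Prefer the [names] table; fall back to the main profile table.

    names_rows is a list of (Name, DisplayOrder) pairs for one profile
    and one language; legacy_first and legacy_last stand in for whatever
    name columns the main profile table already has.
    """
    shown = sorted((order, name) for name, order in names_rows if order > 0)
    if shown:
        return " ".join(name for _, name in shown)
    # No usable rows in [names]: keep today's behaviour untouched.
    return f"{legacy_first} {legacy_last}".strip()


print(profile_title([], "Diana", "Spencer"))
print(profile_title([("Twain", 2), ("Mark", 1)], "Samuel", "Clemens"))
```

Existing profiles keep rendering exactly as before until someone adds rows for them, which is what makes the change an extension and not a migration.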

The overall count of profiles is still very small, so an introduction of proposed functionality would only seem drastic for another year. I think if we can tackle the name issue then users will embrace WikiTree in ever greater numbers.

Initial roll-out can be limited to a few main forms, with gradual roll-out as time permits. Sure, there's the merger app, the enhanced search functionality, etc. However, as long as there's a workaround for existing web forms, the new functionality can be implemented and deployed.

Of course, I admit, I don't know how the feature roll out process works at WikiTree. But I'd like to learn that, if possible, and see whether I can be of assistance, limited unfortunately, but still I'd like to see this happen.

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)
edited Jan 9, 2018 by Patrick Munits

DisplayOrder column in the proposed [names] table would take care of our German friends.

Anything with a value of zero would not be shown in the preferred name display. Any other numeric value would tell the name-assembly algorithm where to place that element within the name for its LanguageID. Users would then be able to control in which languages WikiTree displays the name, which parts of the name appear in each language, and in what order. All of this without infringing on the original schema design or the names stored in the main profile table.

I've done quite a bit of development on dinosaur IT systems. Sure, I haven't worked on a mainframe that's been in production for fifty years, but I do have experience making twenty-year-old apps run like chickens with front-end web interfaces. There's no way WikiTree platform can be considered archaic and hard to modernize.

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

But are you an experienced genealogist? You seem to be real excited about the database design. But do you know the problems that come up when doing genealogy? Take this one part:

2. Extend profile entry forms to allow unlimited amount of "name" fields in the order specified.

If you give users too much free rein they will screw things up royally. They already try to enter junk, and the error program now catches some of it. That's why the GEDCOM designers kept it simple: prefix, given, surname, suffix. I can't see any way or reason to improve on that. Gaile seems to want a large textarea with no structure, but the whole purpose of databases over word processors is to impose structure on the data.

And with that I'm done - peace out.

commented Jan 9, 2018 by Living Anonymous G2G6 Mach 5 (51.7k points)

Mikey, your concern is very valid and would be addressed by the following:

The LanguageID value is a foreign key. Users can only select a value from a drop-down box prepopulated by admins.

The NameTypeID value is a foreign key. Once again, only a prepopulated value can be selected from a drop-down box.

Finally, as I have already proposed in an earlier reply, each entry should really be linked to a "source" entry. A name entry without a source can then be periodically flagged by a bot to alert the profile manager(s). Then, a month or so afterwards, the same bot can delete those entries. Isn't this already an existing process that could be extended to take care of the new table as well? A bot wouldn't even need to understand what's in the name entry. If it's unsourced for too long then it gets purged.
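
A rough sketch of such a bot pass, assuming each name entry is a dict with a "source" value and a "flagged_on" date. Both field names and the 30-day grace period (for "a month or so") are hypothetical; nothing here reflects how existing WikiTree bots actually work.

```python
from datetime import date, timedelta

GRACE = timedelta(days=30)  # assumed length of "a month or so"


def sweep(entries, today):
    """One bot pass: flag unsourced entries, purge those flagged too long ago.

    Returns the entries that survive; the input list is not modified.
    """
    kept = []
    for entry in entries:
        if entry["source"]:
            # Sourced entries are always kept and any old flag is cleared.
            kept.append({**entry, "flagged_on": None})
        elif entry["flagged_on"] is None:
            # First time seen without a source: flag it to alert the managers.
            kept.append({**entry, "flagged_on": today})
        elif today - entry["flagged_on"] < GRACE:
            # Still inside the grace period.
            kept.append(entry)
        # Otherwise the entry is dropped (purged).
    return kept


start = date(2018, 1, 10)
entries = [
    {"name": "Twain", "source": "1870 census", "flagged_on": None},
    {"name": "Xyzzy", "source": None, "flagged_on": None},
]
first_pass = sweep(entries, start)
later_pass = sweep(first_pass, start + timedelta(days=31))
print([e["name"] for e in later_pass])
```

The pass never inspects the name text itself, which matches the point that the bot does not need to understand the entry.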

As a developer I strongly believe that if a user is permitted to screw up something then it most definitely, absolutely guaranteed will happen. Therefore, I strive to address such possibilities at the design stage.

commented Jan 9, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

Answer 3 · 2018-01-09T21:00:03+0000

This proposal appears to address most of the concerns with respect to name display (I'm purposely cautious here, I don't see a situation that could not be covered right now, but who knows, maybe there is still something lurking out there that is not coming to me).

One question, though, remains with respect to name order: Let's say we take Mao Zedong, in Chinese he would be 毛澤東, in English Mao Zedong preserving the order family name first, given name last. However, Kung Hsiang-hsi in English, 孔祥熙 in Chinese, but after he moved to the USA he went by H. H. Kung. The former pro football player known in English as Dat Nguyen was born Nguyễn Tấn Đạt in Vietnamese.

A straightforward rule, such as "South-East Asian name order for South-East Asian languages and for English", does not work, as some people maintain the name order and others change to Western order. Can this be addressed on an individual profile basis?

answered Jan 9, 2018 by Helmut Jungschaffer G2G6 Pilot (604k points)

The following is for Kung Hsiang-hsi:
"孔-2", "Dr.", "Dr.", English-FK, PrefixName-FK, 1
"孔-2", "Kung", "Kung", English-FK, LastName-FK, 3
"孔-2", "H.H.", "H.H.", English-FK, FirstName-FK, 2
"孔-2", "Hsiang-hsi", "Hsiang-hsi", English-FK, FirstName-FK, 0
"孔-2", "孔", "Kung", Chinese-FK, LastName-FK, 1
"孔-2", "祥熙", "Hsiang-hsi", Chinese-FK, FirstName-FK, 2
"孔-2", "Kǒng", "Kung", pinyin-FK, LastName-FK, 1
"孔-2", "Xiángxī", "Hsiang-hsi", pinyin-FK, FirstName-FK, 2

The above would be then parsed and displayed as:
Dr. H.H. Kung (Chinese: 孔祥熙; pinyin: Kǒng Xiángxī)

Also note that since "Hsiang-hsi" is stored in the [names] table it would still return correct profile id among search results for "Hsiang-hsi" when used as First Name search parameter.
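
One way the parsing step might work is sketched below. The rows are reduced to (Name, Language, DisplayOrder); the explicit ordering of the secondary languages and the rule that Chinese characters are joined without spaces are assumptions added for the example, since the proposed schema stores neither.

```python
KUNG = [
    # (Name, Language, DisplayOrder)
    ("Dr.", "English", 1),
    ("Kung", "English", 3),
    ("H.H.", "English", 2),
    ("Hsiang-hsi", "English", 0),  # hidden, kept for search
    ("孔", "Chinese", 1),
    ("祥熙", "Chinese", 2),
    ("Kǒng", "pinyin", 1),
    ("Xiángxī", "pinyin", 2),
]

UNSPACED = {"Chinese"}  # assumption: these scripts join name parts directly


def build(rows, language):
    """Assemble the visible name parts of one language in DisplayOrder."""
    shown = sorted((order, name) for name, lang, order in rows
                   if lang == language and order > 0)
    joiner = "" if language in UNSPACED else " "
    return joiner.join(name for _, name in shown)


def full_display(rows, primary, others):
    """Primary-language name, followed by the other languages in brackets."""
    extras = "; ".join(f"{lang}: {build(rows, lang)}"
                       for lang in others if build(rows, lang))
    head = build(rows, primary)
    return f"{head} ({extras})" if extras else head


print(full_display(KUNG, "English", ["Chinese", "pinyin"]))
# Dr. H.H. Kung (Chinese: 孔祥熙; pinyin: Kǒng Xiángxī)
```

The caller has to pass the language order in, because nothing in the [names] rows says which language comes first.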

commented Jan 10, 2018 by Patrick Munits G2G6 Mach 1 (12.6k points)

Answer 4 · 2018-01-10T02:38:00+0000

Here are a few items that the schema as proposed doesn't address:

What determines the order of languages when displaying the names? English probably should always be first, if present. After that it could be the order entered, but retrieving names from the table will not necessarily return them in that order (unless there's also a ROWID for each row in [names]). Many Ashkenazi Jews who lived between 1915 and 1995 had official documents issued in as many as seven different languages throughout their lifetimes. Theoretically it would be great to show all these variations. However, with the proposed schema it is not possible to control the order of display. I don't know if it's a critical requirement though.
How to enter both preferred and official names in the same language while clearly distinguishing them as separate name sets? For example, Samuel Langhorne Clemens, also known as Mark Twain. The schema can show either Samuel Langhorne Clemens, or Mark Twain, or Samuel Langhorne Clemens Mark Twain. None of these is ideal. A possible workaround is to introduce a special name type such as "Name Separator" with the value "a.k.a.". An entry for Mark Twain would then look like:

"Clemens-1", "Samuel", "Samuel", English-FK, FirstName-FK,1
"Clemens-1", "Langhorne", "Langhorne", English-FK, MiddleName-FK,2
"Clemens-1", "Clemens", "Clemens", English-FK, LastName-FK,3
"Clemens-1", "a.k.a.", "a.k.a.", English-FK, AKA-FK,4
"Clemens-1", "Mark", "Mark", English-FK, FirstName-FK,5
"Clemens-1", "Twain", "Twain", English-FK, LastName-FK,6

The above can then be shown more properly as:
Samuel Langhorne Clemens a.k.a. Mark Twain

Answer 5 · 2018-01-10T12:30:17+0000

Today I decided to take a look at the de-facto standard way IT attempts to give structure to genealogy as a science. That glue is GEDCOM.

GEDCOM 5.5 is correct in defining that an Individual has PersonalName(s). However, that same GEDCOM version is fundamentally wrong in its rigid declaration that PersonalName element consists of FirstName, LastName, and Title elements. GEDCOM 5.5 is fundamentally wrong because a person doesn't require FirstName and LastName to be born and to exist.

A human who was born might have lived their life (however long or short it was) without ever having an official FirstName or LastName. This doesn't mean that a genealogical application shouldn’t be capable of having a profile for such a person.

GEDCOM X addresses this shortcoming in the following way. The Name element no longer makes a claim that a person must have FirstName and LastName. Instead it says that it can hold a value that represents some type of a name.

Consequently, I’m most certainly not the first IT person slash genealogist to declare that an individual doesn’t have to have formal first and last names, and by extension can actually have many variations of titles that constitute variations of names in various cultures.

The proposed schema extension for WikiTree addresses the need to correct this problem in GEDCOM 5.5 without breaking it, and at the same time likely makes WikiTree a lot more compatible with the various newer GEDCOM XSD proposals.

Answer 6 · 2018-01-10T13:13:06+0000

An excellent idea. Some random thoughts.

People change their names for a variety of reasons. Could we distinguish a name change with optional effective date range, reason and a field to hold the source. Honorific titles can be granted too and change how someone is known.

Romanisation. Not sure how well that works. Also somewhat biased to the Latin script. Different languages written in the Latin script may Romanise another script differently depending on how each letter in the Latin alphabet is pronounced in that language. It normally attempts to transcribe the pronunciation of the other script. Not the same as someone picking an English surname for themselves when they come into the USA for instance. (They might translate their native surname into English or arbitrarily pick a new one.)

Personal and family name order of East Asian people in English, can be either way around depending on the individual. At least one national leader currently has his name given family first in the Western media.

In some parts of the world a title can indicate someone's religion.

There can be clan names used in someone's name.

I've noticed some Arabian royalty use a string of patronymics which indicates a path of descent on the male line, rather than just their father.

In Spain the child gets their father's first surname as their first surname and their mother's first surname as their second surname.

People may have nicknames, nom de plume, nom de guerre, stage name, regal name. Murderers may get a popular name that they retain after their identity is discovered.

People may have their names translated if the literary language is not the same as the spoken language. E.g. English William might be Gulielmus in a legal document written in Latin.

Some spoken languages don't have a written form, though I imagine the number is decreasing. Use international phonetic alphabet if pronunciation is known?

Where names are often duplicated, the local community may add a tag to distinguish them. Jones the Steam, the Briton.

Regards,

Tim

Answer 7 · 2018-02-14T17:07:21+0000

Hi Patrick,

Great post. I have added a link to it on our Team to-do list so we can look at your suggestions when we next work on improvements on the profiles pages. Thanks!
