DB Schema expansion: Name table

+33 votes
743 views
Hi! This post will be moderately technical in nature. Originally I was planning on posting it in the appropriate Google+ group, but [[Whitten-1]] suggested that I should do it here.
 

The problem: For many millennia our ancestors were not aware that English was going to become the lingua franca by the time WikiTree came around. So those ancestors didn't bother to write down their names in English in the format First Name Last Name. Consequently WikiTree doesn't really work nicely with name formats used in different cultures.

Solution: Extend (not change) WikiTree to permit support of names in virtually any format.

How?

High-level design: permit storage of names suitable for every culture, in every language.

Technical implementation details:

1. Add to existing WikiTree DB schema new [names] table with the following structure:
ProfileID - FK to the main profile table
Name - the actual name value being added
NameASCII - name value using English letters without extended char set
LanguageID - FK to languages table
NameTypeID - FK to name types table (First, Last, Patronym, Homestead/Farm, Father's, Clan, NamePrefix [Mr., Dr., Rev., King, etc.], NameSuffix [Jr., Sr., XVII, etc.])
DisplayOrder - the position in the array of the full formal name to show as Profile's header

2. Extend profile entry forms to allow an unlimited number of "name" fields in the order specified.

3. Extend search functionality to perform name searches on [names].[name] column which would return distinct ProfileID values, from which the result set will be constructed.

4. Begin extending functionality for full Unicode support
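To make steps 1 and 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: I don't know the real WikiTree table or column names, so profiles, languages and nametypes (and their Label columns) are hypothetical stand-ins, and the exact-match search is the simplest thing that could work rather than a finished design.

```python
import sqlite3

# Hypothetical stand-ins: the real WikiTree table and column names are unknown.
SCHEMA = """
CREATE TABLE profiles  (ProfileID TEXT PRIMARY KEY);
CREATE TABLE languages (LanguageID INTEGER PRIMARY KEY, Label TEXT UNIQUE NOT NULL);
CREATE TABLE nametypes (NameTypeID INTEGER PRIMARY KEY, Label TEXT UNIQUE NOT NULL);
CREATE TABLE names (
    ProfileID    TEXT    NOT NULL REFERENCES profiles(ProfileID),
    Name         TEXT    NOT NULL,            -- the name as written, in any script
    NameASCII    TEXT    NOT NULL,            -- the same value in plain English letters
    LanguageID   INTEGER NOT NULL REFERENCES languages(LanguageID),
    NameTypeID   INTEGER NOT NULL REFERENCES nametypes(NameTypeID),
    DisplayOrder INTEGER NOT NULL DEFAULT 0   -- position in the displayed full name
);
CREATE INDEX idx_names_name  ON names(Name);
CREATE INDEX idx_names_ascii ON names(NameASCII);
"""

def open_db():
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def find_profiles(conn, term):
    """Step 3: distinct ProfileID values whose [names] rows match the term."""
    rows = conn.execute(
        "SELECT DISTINCT ProfileID FROM names "
        "WHERE Name = ? COLLATE NOCASE OR NameASCII = ? COLLATE NOCASE "
        "ORDER BY ProfileID",
        (term, term),
    )
    return [r[0] for r in rows]

conn = open_db()
conn.execute("INSERT INTO profiles VALUES ('Romanov-63')")
conn.executemany("INSERT INTO languages VALUES (?, ?)", [(1, "English"), (2, "Russian")])
conn.executemany("INSERT INTO nametypes VALUES (?, ?)", [(1, "First"), (2, "Last")])
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Romanov-63", "Peter", "Peter", 1, 1, 1),
        ("Romanov-63", "Пётр", "Peter", 2, 1, 1),
        ("Romanov-63", "Романов", "Romanov", 2, 2, 0),
    ],
)
print(find_profiles(conn, "Пётр"))     # ['Romanov-63']
print(find_profiles(conn, "romanov"))  # ['Romanov-63']
```

Note that SQLite's NOCASE collation only folds ASCII letters, which is one more reason to keep the NameASCII column next to the original spelling.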
in WikiTree Tech by Patrick Munits G2G6 Mach 1 (11.0k points)
retagged by Ellen Smith

As per a comment posted by Matthew Fletcher further down the thread, the column NameASCII in reality corresponds to <NAME_ROMANIZED_VARIATION>; I just didn't know what it's called. Having this column in the [names] table would permit future advanced search functionality using transliterated strings. Furthermore, if that can be linked to things like Daitch–Mokotoff Soundex, then the name search would be able to dig up all types of crazy wacky spellings without the need for resource-heavy Unicode string search engines.
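As a rough idea of how a default NameASCII value could be suggested to the user, here is a small sketch based on Python's standard unicodedata module. The function name fold_to_ascii is my own invention. It only strips diacritics from Latin-script names; names in other scripts (Cyrillic, Chinese, etc.) need real transliteration, which is why the romanized value would still be entered or corrected by a person.

```python
import unicodedata

def fold_to_ascii(name: str) -> str:
    """Suggest a plain-ASCII form by dropping diacritics (Latin script only)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep the base letters and drop the marks and anything else non-ASCII.
    return "".join(
        ch for ch in decomposed
        if ch.isascii() and not unicodedata.combining(ch)
    )

print(fold_to_ascii("Ormándy Jenő"))  # Ormandy Jeno
print(fold_to_ascii("Dieudonné"))     # Dieudonne
print(repr(fold_to_ascii("Пётр")))    # '' (nothing survives, so ask the user)
```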

A clarification with regards to a form where these additional names can be entered by end users.

There really need to be just two text boxes: the actual name being added and its romanized version. Plus there would be three drop-down boxes:
1. Language
2. Name Type
3. Display order (with default option - Hidden, followed by 1,2,3,...)

Upon entry of a new name row, a label control can immediately show how the full name would be displayed. I think this limited number of fields and instant feedback will be sufficiently easy to grasp for a user who already knows how to fill out the basic profile form.

Furthermore, limiting the values that describe the type of name and the language will control what users can add. This would also allow gathering some basic usage statistics by language.

The order of the display of the name value can also be limited to some sane number (like ten).

This richtextbox web control has a lot more options than the new [names] user control/datagrid/section/iframe/whatever that I propose. Users figured out how to comment in G2G, so I think they can just as easily figure out how to use the expanded names options. I do agree that it absolutely must be very intuitive.
This is a really elegant solution to a problem that is not easily solved, nor supported by most web applications, even if they have a global user base.

I'm also impressed by the obvious time and effort you put into this, including thinking through a variety of use cases.

This would be a welcome change to any web application. Lucky us if this enhancement idea becomes reality!
Thanks. I will wait until this is solved. It would be great to have the ability to search under the complete names and know that you will find the right person, and if not, that they're really not there. A lot of old Arabic names became standardized/romanized? in Spanish, such as el-Andalus or al-Andalus, for instance. I'm sure many Indigenous (globally) names will emerge that will require similar capabilities on WikiTree.
Please don't call them "last name" and "first name". It makes for an endless loop of confusion when dealing with surname-first languages: if I mark the language as Hungarian, does that make the "last name" label into a synonym of "given name", or does it stay with its English usage of "surname"?
My proposal would eliminate the need for rigid storage and presentation of names. Each part of the whole name would be stored individually, and the presentation layer can then be customized based on each culture's customs, while clearly and correctly identifying each individual part of the complete name with all the titles, prefixes and suffixes in as many languages as necessary.

My understanding is that such functionality is not within the immediate interests of the management of WikiTree. Since it is also not an open platform, the expansion of functionality cannot be initiated and completed without them buying into the idea.
If WikiTree moved to a structure like this, I'd be ecstatic, and I'd be posting to every Hungarian genealogy board/list that I know of, suggesting that people switch.

A few thoughts in response to objections or thoughts raised in various other comments and answers:

The problem of users creating utter nonsense because they're given too much freedom and not enough knowledge could be addressed with suggested templates: "here's a set of fields and rules that work for many [NamingCultureX] names. If you need different fields, you can choose a different template or use the Customize button." (Or "We don't have a template yet for [NamingCultureX]. Here's a generic template that you can change as needed using the Customize button.") This would also address the objection that a totally open-ended structure fails to capture our collective knowledge about names.

The problem of multiple names, such as a hypothetical lady married five times (who changed her name each time just to spite her descendants), could be solved with multiple name entries, sequentially numbered. I don't know enough about database design to determine whether all of the names should count as a single "name" entry/entity, or as six separate entries/entities.

I think the ordering of elements should be entered as a function of the chosen display language. For example, you could tell it that if the display is set to English, it should show "Eugene Ormandy" or "András Schiff", and if it is set to Hungarian, it should show "Ormándy Jenő" and "Schiff András". Notice that this involves some knowledge about how each person's name was used: in Ormandy's time, it was still customary to translate given names, but by Schiff's time, this was no longer the case. One language/ordering could be set as "preferred" or "default", to be used when the display is set to a language not specifically addressed in the person's name.
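Here is a small sketch of how that per-language ordering with a "preferred" fallback might look. It is only an illustration of the idea: the dictionary layout and the default_language key are hypothetical, and the rows are just the two names from my example.

```python
# Each row is (Language, Name, DisplayOrder); 0 would mean "stored but hidden".
ORMANDY = {
    "default_language": "English",  # hypothetical per-profile "preferred" setting
    "names": [
        ("English", "Eugene", 1), ("English", "Ormandy", 2),
        ("Hungarian", "Ormándy", 1), ("Hungarian", "Jenő", 2),
    ],
}
SCHIFF = {
    "default_language": "English",
    "names": [
        ("English", "András", 1), ("English", "Schiff", 2),
        ("Hungarian", "Schiff", 1), ("Hungarian", "András", 2),
    ],
}

def render(profile, display_language):
    """Show the name in the display language, else in the profile's default."""
    available = {lang for lang, _, _ in profile["names"]}
    lang = display_language if display_language in available else profile["default_language"]
    parts = sorted(
        (order, name)
        for row_lang, name, order in profile["names"]
        if row_lang == lang and order > 0
    )
    return " ".join(name for _, name in parts)

print(render(ORMANDY, "Hungarian"))  # Ormándy Jenő
print(render(SCHIFF, "English"))     # András Schiff
print(render(ORMANDY, "German"))     # Eugene Ormandy (falls back to the default)
```

The point is that the translated given name (Eugene vs. Jenő) is data entered per person, not something the code guesses.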

My one worry with this scheme is in regards to searching: how would the search engine deal with name elements that can go in multiple categories, such as unmarked patronymics? If you searched for the surname Thomas, would all the people with Thomas as a given name royally screw you up?
A separate advanced/extended search form can be easily created where users would be able to specify whether Thomas is first or last name when the search is done.

Furthermore, it would then be possible to write the search algorithm in a way where a query for "Ludovic XIV" would actually be able to find [[Bourbon-106]], which is not the case today!
When all parts of the name are stored separately and none are missing, because the database design is not lacking, it is possible to write the search algorithm in such a way that results that more closely match the entered search parameters are shown on top. So a search for Ludovic XIV would then show Louis Dieudonné (Bourbon) de France as the first result. Imagine the shock when one actually gets what they are looking for without knowing the "proper" name they are currently expected to enter.
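A minimal sketch of such a "closest match first" ranking, assuming the name rows have already been read from the [names] table. The rows below are illustrative guesses at what might be entered for Bourbon-106, and Example-1 is a made-up profile added only to show the ordering.

```python
# Illustrative rows: (ProfileID, Name, NameASCII). Example-1 is a made-up profile.
NAME_ROWS = [
    ("Bourbon-106", "Louis", "Louis"),
    ("Bourbon-106", "Dieudonné", "Dieudonne"),
    ("Bourbon-106", "Ludovic", "Ludovic"),
    ("Bourbon-106", "XIV", "14"),
    ("Example-1", "Ludovic", "Ludovic"),
    ("Example-1", "Smith", "Smith"),
]

def ranked_search(query, rows=NAME_ROWS):
    """Return profile IDs ordered by how many query words they match."""
    tokens = {word.casefold() for word in query.split()}
    matched = {}  # ProfileID -> set of query words found among its name rows
    for profile_id, name, ascii_name in rows:
        forms = {name.casefold(), ascii_name.casefold()}
        matched.setdefault(profile_id, set()).update(tokens & forms)
    scored = [(len(found), pid) for pid, found in matched.items() if found]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return [pid for _, pid in scored]

print(ranked_search("Ludovic XIV"))  # ['Bourbon-106', 'Example-1']
```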
I figure y'all are already familiar with the famous blog post "Falsehoods Programmers Believe About Names", right? Here's a version with examples: https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/

(I'm tempted to come up with a "Falsehoods Genealogists Believe about Names" version.)
All forty items from the article can be supported by the proposed schema. Yes, even #40 and #11 if there's a character that can be entered as a character.

Twenty years ago I wrote a name search component for BestBuy's system. I then designed and expanded existing Customer Information Systems at other companies in both USA and Canada. On a more personal front I have seen the names/surnames of my immediate relatives written in official documents in English, Russian, Old Russian, Latvian, Old Latvian, German, Old German, French, Hebrew, Yiddish, Ukrainian, and Lithuanian.

So, my proposal is backed by a bit of experience doing this stuff for a couple of decades.
Is there any progress on this idea? Is it even under consideration at all?
No progress yet, but it is on the suggestions list.

9 Answers

+17 votes
 
Best answer
An addition of [namesLocations] with many-to-many relationship can easily accommodate what you’re asking for. I suppose there’s no such table now, therefore it would be an extension that expands the functionality without breaking existing features.

Consider a scenario where a newly-minted genealogist searches for a platform to start building their tree. Let's say that enthusiast knows that she/he is distantly related to some famous person. They come across WikiTree. What would they do first? Probably plug in the name of their famous relative. Would you like to try it?

Princess Diana – fail, no results, even though she’s here: [[https://www.wikitree.com/wiki/Spencer-40| Diana Frances (Spencer) Princess of Wales]]

Чингис хаан (Mongolian) – fail, even though millions of people are his descendants, and of course he’s here: [[https://www.wikitree.com/wiki/Khan-12| Temüjin Khan]]

Ленин (Russian) – fail, however a billion million people over 5 generations knew this person by this name – [[https://www.wikitree.com/wiki/Lenin-2| Влади́мир Ильи́ч Ilyich (Lenin) Ulyanov]]

Prince – fail, he’s not in the result set for search by first name! Billions have heard of him but have no clue about his LNAB – [[https://www.wikitree.com/wiki/Nelson-10329| Prince Rogers Nelson]]

There’s absolutely no way that the examples above should be accepted as the norm. WikiTree will not become universally accepted as The Family Tree site until the very basic need of entering names appropriately for every culture is supported. One can go to Wikipedia, type the above search terms and get exactly what they searched for. WikiTree can’t yet do that. Let’s change the situation! What prevents us from making this functionality enhancement in 2018?
by Patrick Munits G2G6 Mach 1 (11.0k points)
selected by Phil Grace
+7 votes
I like (and understand) this idea!
by Richard Shelley G2G6 Pilot (220k points)
To whoever flagged this answer:

Many people don't understand the purpose of flags - they are to alert the sys-ops that something inappropriate has been posted so that they can take action and remove it before it does any (or any more) damage.

If you did not flag this because you don't think it belongs here then you can click the red flag to remove it.

WARNING - ONLY the person who flagged it can do that - if a second (different) person clicks the flag then the item will IMMEDIATELY be hidden from view.
+2 votes
GEDCOM 5.5 handles names fine; it has been around since the 90s, and every genealogy program supports the basics. It can be, and routinely is, converted back and forth to and from various db formats.

What would be a bigger boost IMO is getting location search combined with names and dates.
by M Anonymous G2G6 Mach 4 (47.2k points)
edited by M Anonymous
Gedcom 5.5 does not handle the use of multiple language names in the manner in which Patrick is suggesting.
An addition of [namesLocations] with many-to-many relationship would accommodate the data storage part of what you’re asking for. I suppose there’s no such table now, therefore it would be an extension that expands the functionality without breaking existing features.

Sorry Mikey but this isn't the case. I just looked up the Gedcom format for names. It's sufficiently rich that most names could be represented but it's very inadequate for a truly global system.

PERSONAL_NAME_STRUCTURE:=
n NAME <NAME_PERSONAL>
+1 TYPE <NAME_TYPE>
+1 <<PERSONAL_NAME_PIECES>>
+1 FONE <NAME_PHONETIC_VARIATION>
+2 TYPE <PHONETIC_TYPE>
+2 <<PERSONAL_NAME_PIECES>>
+1 ROMN <NAME_ROMANIZED_VARIATION>
+2 TYPE <ROMANIZED_TYPE>
+2 <<PERSONAL_NAME_PIECES>>

PERSONAL_NAME_PIECES:=
n NPFX <NAME_PIECE_PREFIX>
n GIVN <NAME_PIECE_GIVEN>
n NICK <NAME_PIECE_NICKNAME>
n SPFX <NAME_PIECE_SURNAME_PREFIX>

Great for IBM computers in the 1970s perhaps but an XML schema/object database? Not really.

Database schemas were denormalized in the past because:

1. Users were local and there was rarely a requirement to support more than one language, and extremely rarely more than two. Needs could therefore be accommodated by adding another column or two to the existing table. WikiTree is a global platform; it’s essential for it to support multiple languages for the same profile. I don’t believe the Wikipedia route, where separate articles (the equivalent of WikiTree’s profiles) exist for each entry, is suitable for WikiTree’s needs.
2. It was a nightmare to attempt to design a GUI form that supports dynamic entry fields. Over the past decade it became infinitely easier to do so on web forms. We should embrace this ability for the needs of genealogists.
I can have alternate names just fine in Gramps and export them out in GEDCOM. The problem is that WikiTree does not have a name table structure. Name is just some fields in the profile. I strongly recommend that WikiTree be fully compatible with the standards, especially when the problem is already solved in the standard. See:

https://www.wikitree.com/g2g/554535/gedcompare-ignores-alternative-names
+4 votes
Hallelujah! All praise the great Whitten-1!

This is a bigger task than perhaps you realize. Could it form a project? (I mean for all the discussions, not the coding.)

I would begin by collecting base cases from various cultures and languages which don't fit into the (given name, surname) paradigm. I can easily think of a few off the top of my head but native speakers should really be involved in the conversation. Once you have an idea of all the wacky and wonderful ways names can be formed and expressed you should be able to design a sufficiently flexible framework. I'm already sure some edge-cases will have to be dropped but the objective should at least be to represent all standard forms.

Another difficult area is going to be nailing down precisely what is and what is not part of a name, especially as far as prefixes and suffixes are concerned. Some people here (only Americans) get very upset if you suggest that someone's ancestor's military rank is not actually part of their name. My personal dislike is the adding of post-nominal letters (for degrees or awards) in the suffix field. All these sorts of issues will have to be argued over.
by Matthew Fletcher G2G6 Pilot (107k points)

Matthew, Patrick's design is elegant, from a database architecture view.  Perhaps, for G2G discussion, his description could stand some expansion of the data item descriptions to explain all the functionality that people fluent in technobabble immediately see there.

No cases need to be collected from any culture or languages.  No definitions of name elements need to be defined.  Let me see if I can make a start at an explanation for the non-geeks among us.

  1. All the name elements would be stored separately from the rest of the profile data so that there could be any number of Name fields in any profile.
  2. When editing a profile, you would be able to add as many name fields as you want and assign each field whatever label you want.
  3. Patrick included an added enhancement of also being able to specify which name fields are used, and in what order, when the person's name is displayed.

Thus, each profile would be able to have its own composition of name fields, without any need whatsoever for any community debate about what fields are needed to cover what cultural requirements.  This is absolutely brilliant!

You seem to be saying you don't want all our knowledge of how names actually work encapsulated by the framework. Without that it's very unrealistic to expect software to render names nicely or allow sophisticated interaction (searching).

Data serialization seems easy if a schema is nailed down. Doesn't it just come 'for-free' these days?
A names table with many to one relationship to ProfileID (no clue what the actual column name in the main table is) can accommodate most wacky name combinations.

For example, take Bill Gates. His formal name is William Henry Gates III. Here's how this info can be stored in the new names table:
"Gates-1183", "Henry", "Henry", English-FK, MiddleNameType-FK, 2
"Gates-1183", "III", "3", English-FK, GenerationalType-FK, 4
"Gates-1183", "William", "William", English-FK, FirstNameType-FK, 1
"Gates-1183", "Gates", "Gates", English-FK, LastNameType-FK, 3
"Gates-1183", "Bill", "Bill", English-FK, FirstNameType-FK, 0

So, for users who selected the English interface of WikiTree, the above would then be displayed as: William Henry Gates III
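For the technically inclined, here is a rough sketch of how the five rows could be assembled into that header, and how a search could still use the hidden "Bill" row. The field names, and the plain-text labels standing in for the foreign keys, are only for illustration.

```python
from typing import NamedTuple

class NameRow(NamedTuple):
    profile_id: str
    name: str
    name_ascii: str
    language: str       # stands in for the LanguageID foreign key
    name_type: str      # stands in for the NameTypeID foreign key
    display_order: int  # 0 = stored and searchable, but not displayed

ROWS = [
    NameRow("Gates-1183", "Henry", "Henry", "English", "MiddleName", 2),
    NameRow("Gates-1183", "III", "3", "English", "Generational", 4),
    NameRow("Gates-1183", "William", "William", "English", "FirstName", 1),
    NameRow("Gates-1183", "Gates", "Gates", "English", "LastName", 3),
    NameRow("Gates-1183", "Bill", "Bill", "English", "FirstName", 0),
]

def display_name(rows, profile_id, language):
    """Join the visible rows of one language in DisplayOrder."""
    shown = [r for r in rows
             if r.profile_id == profile_id and r.language == language
             and r.display_order > 0]
    shown.sort(key=lambda r: r.display_order)
    return " ".join(r.name for r in shown)

def matches(rows, profile_id, query):
    """True if every word of the query is one of the profile's stored names."""
    known = set()
    for r in rows:
        if r.profile_id == profile_id:
            known.update((r.name.casefold(), r.name_ascii.casefold()))
    return all(word.casefold() in known for word in query.split())

print(display_name(ROWS, "Gates-1183", "English"))  # William Henry Gates III
print(matches(ROWS, "Gates-1183", "Bill Gates"))    # True
```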

Furthermore, any of the following search parameters would locate the appropriate WikiTree profile:
"Bill Gates"
"Henry Gates III"
"William Gates"
"Gates"
Yup, that's what I'm saying.  "We" can encompass the WikiTree decision makers about system design, or a group of WikiTree members who offer input to it, but it can never encompass enough people to represent an omniscient body with regard to all languages ever in use or all styles of naming people utilized by all cultures since the start of humanity.  Patrick's proposal allows for that - anyone can define a combination of name fields and populate them in any character set included in Unicode and, furthermore, this can be done on the granular level of each profile.

EDITED TO ADD:
Schemas are very efficient for very many things, but there could never be a single schema that is all inclusive of every naming convention used by every culture in the history of the world.
Another example: Peter the Great. The ruler of Russia didn't even have an official surname as per Wikipedia. You can find him in Wikipedia, but on WikiTree you have to jump through hoops before you can locate him. Here's how the entries for him could appear in the [names] table:

"Romanov-63", "Peter", "Peter", English-FK, FirstName-FK, 1
"Romanov-63", "the", "the", English-FK, TitlePrefix-FK, 2
"Romanov-63", "Great", "Great", English-FK, Title-FK, 3
"Romanov-63", "Пётр", "Peter", Russian-FK, FirstName-FK, 1
"Romanov-63", "I", "1", Russian-FK, LastName-FK, 2
"Romanov-63", "Алексеевич", "Alexeyevich", Russian-FK, Patronym-FK, 0
"Romanov-63", "Романов", "Romanov", Russian-FK, LastName-FK, 0

Here's how the name would be shown to English speaking users of WikiTree: Peter the Great. And here's how it would appear to Russian users: Пётр I. Same profile would be shown correctly to either culture, in the way they expect the name to be shown.

I would also support the addition of another table that can help link appropriate sources to the assigned names. Perhaps this will then calm down the discussions about how names of ancient European rulers (who dared to live in different countries) should be shown.
For backward compatibility there's no need to extract and parse names from the existing main profile table into the new table. If there are names in the [names] table then they can be used for the profile title. If no names have been added, then names from the main table will be used.
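A sketch of that fallback rule, under the assumption that the existing profile record exposes something like FirstName and LastName fields (hypothetical names; I don't know the real columns):

```python
def profile_title(name_rows, legacy_profile, language="English"):
    """Use [names] rows when there are any, else the existing profile fields.

    name_rows: list of (Language, Name, DisplayOrder) tuples for one profile.
    legacy_profile: dict with hypothetical keys "FirstName" and "LastName".
    """
    shown = sorted(
        (order, name)
        for row_lang, name, order in name_rows
        if row_lang == language and order > 0
    )
    if shown:
        return " ".join(name for _, name in shown)
    # Nothing entered in the new table yet: keep showing what is shown today.
    legacy_parts = [legacy_profile.get("FirstName"), legacy_profile.get("LastName")]
    return " ".join(part for part in legacy_parts if part)

legacy = {"FirstName": "Peter", "LastName": "Romanov"}
new_rows = [("English", "Peter", 1), ("English", "the", 2), ("English", "Great", 3)]

print(profile_title([], legacy))        # Peter Romanov
print(profile_title(new_rows, legacy))  # Peter the Great
```

No migration script is needed with this approach; untouched profiles simply keep working as they do now.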

The overall count of profiles is still very small, so an introduction of proposed functionality would only seem drastic for another year. I think if we can tackle the name issue then users will embrace WikiTree in ever greater numbers.

Initial roll-out can be limited to a few main forms, with gradual roll-out as time permits. Sure, there's the merger app, the enhanced search functionality, etc. However, as long as there's a workaround for existing web forms, the new functionality can be implemented and deployed.

Of course, I admit, I don't know how the feature roll out process works at WikiTree. But I'd like to learn that, if possible, and see whether I can be of assistance, limited unfortunately, but still I'd like to see this happen.

"there could never be a single schema that is all inclusive of every naming convention used by every culture in the history of the world"

That seems very defeatist. I'm not proposing to cover Australian mother-in-law dialects or African click-language. I'm just suggesting collecting some base cases from a wide variety of common languages/cultures.

I can provide samples that cover 1/6 of the landmass for the period when Soviet Union existed as I was born there.

Matthew, I don't mind characterizing my statement as defeatist, but I still regard it as an observation on reality.  Your suggestion of having "some base cases from a wide variety of common languages/cultures" is, by definition, not inclusive of every naming convention used by every culture in the history of civilization.  I also neglected to include allowing for naming conventions of possible future cultures that we could not dream of guessing at.

No matter how many base cases are included in any schema, there will still be some that are not covered - Patrick's proposed architecture actually covers everything that ever was, as well as anything that ever could be.

Well I'm trying to solve a harder problem than just database storage.

Just on the proposed minimalist framework I would suggest doing away with 'FirstName' and just having an ordered list of GivenName(s). Germans seem unhappy with the first name and middle name(s) approach. The 'preferred' name can still be any of them - or something different - but in the absence of a preferred name the first given name would be used. I would also change the name of the Nickname field in wiki (and Gedcom) to something more formal like Cognomen. In English 'nickname' seems a very frivolous word when the field is used for something like a nun's devotional name or a king's regnal name.
DisplayOrder column in the proposed [names] table would take care of our German friends.

Anything with a value of zero would not be shown in the preferred name display. Any other numeric value would instruct the name parsing algorithm how to assemble the name per each provided LanguageID value. Users would then be able to control in what languages, in what order in each language, and what part of the name in each language WikiTree would display the name. All of this without infringing on the original schema design, and names stored in the main profile table.

I've done quite a bit of development on dinosaur IT systems. Sure, I haven't worked on a mainframe that's been in production for fifty years, but I do have experience making twenty-year-old apps run like chickens with front-end web interfaces. There's no way WikiTree platform can be considered archaic and hard to modernize.
But are you an experienced genealogist?  You seem to be real excited about the database design.  But do you know the problems that come up when doing genealogy?  Take this one part:

2. Extend profile entry forms to allow unlimited amount of "name" fields in the order specified.

If you give users too much free rein they will screw things up royally.  They already try to enter junk and now the error program catches some.  That's why the gedcom designers kept it simple: prefix, given, surname, suffix.  I can't see any way or reason to improve on that.  Gaille seems to want a large textarea with no structure but the whole purpose of databases over word processors is to impose structure on the data.

And with that I'm done - peace out.
Mikey, your concern is very valid and would be addressed by the following:

The Language ID value is a foreign key. Users can only select a value from a drop-down box prepopulated by admins.

The NameType ID value is a foreign key. Once again, only pre-populated value can be selected from a drop-down box.

Finally, as I have already proposed in an earlier reply, each entry should really be linked to a "source" entry. A name entry without a source can then be periodically marked up by a bot to alert profile manager(s). Then afterwards, a month or so later, the same bot can delete those entries. Isn't this already an existing process that can be extended to take care of the new table as well? A bot wouldn't even need to understand what's in the name entry. If it's not sourced for too long then it gets purged.
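Roughly, one pass of such a bot could look like the sketch below. Everything here is an assumption made for illustration: the source and flagged_on fields are not part of the proposed table, and the 30-day grace period simply stands in for "a month or so".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class NameEntry:
    profile_id: str
    name: str
    source: Optional[str] = None       # hypothetical link to a source entry
    flagged_on: Optional[date] = None  # set when the bot first warns the managers

GRACE = timedelta(days=30)  # "a month or so"; the exact period is an assumption

def bot_pass(entries, today):
    """Flag new unsourced entries and drop those unsourced past the grace period."""
    kept = []
    for entry in entries:
        if entry.source:
            entry.flagged_on = None    # sourced entries are never flagged
            kept.append(entry)
        elif entry.flagged_on is None:
            entry.flagged_on = today   # first warning to the profile manager(s)
            kept.append(entry)
        elif today - entry.flagged_on < GRACE:
            kept.append(entry)         # still within the grace period
        # otherwise the entry is left out of the result, i.e. purged
    return kept

entries = [NameEntry("Smith-1111", "Mary", source="S1"), NameEntry("Smith-1111", "Jones")]
entries = bot_pass(entries, date(2018, 1, 1))   # "Jones" gets flagged
entries = bot_pass(entries, date(2018, 1, 15))  # still kept
entries = bot_pass(entries, date(2018, 2, 5))   # 35 days unsourced: purged
print([e.name for e in entries])                # ['Mary']
```

As said above, the bot never looks at the name value itself, only at whether a source is attached.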

As a developer I strongly believe that if a user is permitted to screw up something then it most definitely, absolutely guaranteed will happen. Therefore, I strive to address such possibilities at the design stage.
+3 votes
This proposal appears to address most of the concerns with respect to name display (I'm purposely cautious here, I don't see a situation that could not be covered right now, but who knows, maybe there is still something lurking out there that is not coming to me).

One question, though, remains with respect to name order: Let's say we take Mao Zedong, in Chinese he would be 毛澤東, in English Mao Zedong preserving the order family name first, given name last. However, Kung Hsiang-hsi in English, 孔祥熙 in Chinese, but after he moved to the USA he went by H. H. Kung. The former pro football player known in English as Dat Nguyen was born Nguyễn Tấn Đạt in Vietnamese.

A straightforward rule ("South-East Asian name order for South-East Asian languages and for English") does not work, as some people maintain the name order and others change to Western order. Can this be addressed on an individual profile basis?
by Helmut Jungschaffer G2G6 Pilot (544k points)
Absolutely, Helmut.  The way Patrick proposed this, the profile editor controls the order in which the components of the name appear when it is displayed.  He specified 0 for something that is not to be included in the display and sequential numbers for the order.  There would probably be a default order to be used if the editor does not specify the order, but each profile could have the order set individually.  As near as I can tell, this proposal covers everything that anyone could ever, now or in the future, want to do.
This is where I'd fundamentally disagree. Surely the fact that in Chinese the surname comes first whereas in Westernised form it comes last should be captured in code (once) rather than in millions of individual profiles?

Helmut, thank you for the excellent exercises. Let's see if everything can be handled as expected.

Mao Zedong
"毛-1", "毛", "Mao", Chinese-FK, LastName-FK, 1
"毛-1", "澤東", "Zedong", Chinese-FK, FirstName-FK, 2
"毛-1", "Mao", "Mao", English-FK, LastName-FK, 1
"毛-1", "Zedong", "Zedong", English-FK, FirstName-FK, 2
The name parser responsible for the retrieval of names from the [names] table would then group records belonging to "毛-1" first by Language, and then by the order, which is specified in the last column.
Consequently, the following would then appear on the public form for profile "毛-1": Mao Zedong (Chinese: 毛 澤東)
 

The following is for Kung Hsiang-hsi:
"孔-2", "Dr.", "Dr.", English-FK, PrefixName-FK, 1
"孔-2", "Kung", "Kung", English-FK, LastName-FK, 3
"孔-2", "H.H.", "H.H.", English-FK, FirstName-FK, 2
"孔-2", "Hsiang-hsi", "Hsiang-hsi", English-FK, FirstName-FK, 0
"孔-2", "孔", "Kung", Chinese-FK, LastName-FK, 1
"孔-2", "祥熙", "Hsiang-hsi", Chinese-FK, FirstName-FK, 2
"孔-2", "Kǒng", "Kung", pinyin-FK, LastName-FK, 1
"孔-2", "Xiángxī", "Hsiang-hsi", pinyin-FK, FirstName-FK, 2

The above would then be parsed and displayed as:
Dr. H.H. Kung (Chinese: 孔祥熙; pinyin: Kǒng Xiángxī)

Also note that since "Hsiang-hsi" is stored in the [names] table it would still return correct profile id among search results for "Hsiang-hsi" when used as First Name search parameter.

The following is for Dat Nguyen. I didn't find his profile on WikiTree, so let's pretend his WikiTree ID is Nguyen-777.
"Nguyen-777", "Dat", "Dat", English-FK, FirstName-FK, 1
"Nguyen-777", "Tan", "Tan", English-FK, MiddleName-FK, 0
"Nguyen-777", "Nguyen", "Nguyen", English-FK, LastName-FK, 2
"Nguyen-777", "Nguyễn", "Nguyen", Vietnamese-FK, LastName-FK, 1
"Nguyen-777", "Tấn", "Tan", Vietnamese-FK, FirstName-FK, 2
"Nguyen-777", "Đạt", "Dat", Vietnamese-FK, FirstName-FK, 3
The above produces this output: Dat Nguyen (Vietnamese: Nguyễn Tấn Đạt)
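A rough sketch of the name parser described in these examples: group the rows by language, sort each group by DisplayOrder, then print the interface language first and the other languages in parentheses. Two details are my assumptions rather than part of the proposal: other languages are listed in the order their rows appear, and Chinese parts are joined without a space (the JOINER table).

```python
# Rows for the Kung example: (Language, Name, DisplayOrder).
KUNG = [
    ("English", "Dr.", 1), ("English", "Kung", 3), ("English", "H.H.", 2),
    ("English", "Hsiang-hsi", 0),  # hidden, but still stored for searching
    ("Chinese", "孔", 1), ("Chinese", "祥熙", 2),
    ("pinyin", "Kǒng", 1), ("pinyin", "Xiángxī", 2),
]

JOINER = {"Chinese": ""}  # assumption: no space between Chinese name parts

def assemble(rows, language):
    """Join the visible parts of one language in DisplayOrder."""
    parts = sorted((order, name) for lang, name, order in rows
                   if lang == language and order > 0)
    return JOINER.get(language, " ").join(name for _, name in parts)

def header(rows, ui_language="English"):
    """Interface-language name first, then every other language in parentheses."""
    others = []
    for lang, _, _ in rows:
        if lang != ui_language and lang not in others:
            others.append(lang)  # keep first-appearance order
    extras = [(lang, assemble(rows, lang)) for lang in others]
    extras_text = "; ".join(f"{lang}: {text}" for lang, text in extras if text)
    main = assemble(rows, ui_language)
    return f"{main} ({extras_text})" if extras_text else main

print(header(KUNG))  # Dr. H.H. Kung (Chinese: 孔祥熙; pinyin: Kǒng Xiángxī)
```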

Matthew, in recent years there's a strong push in application design to separate content and presentation layers.  Current WikiTree schema uses a conventional approach where database design closely reflects final presentation of data.

I propose to separate data and presentation. Each piece of data is important in itself because it's a piece of information that can be described separately by a genealogist. Each Title, Prefix, Surname, Middle Name, First Name, Suffix, etc. can have its own source, and be present in one language and absent in another.

A side table permits this to be achieved without fundamentally altering the base design of WikiTree engine.
Of course data and presentation should be separate! I'm just suggesting some debate over how you define your schema. What fields of the form PrefixName, FirstName, etc do we need? And I still think the best way to find out is to first collect a few hundred base cases.
+2 votes

Here are a few items that the schema as proposed doesn't address:

  1. What determines the order of languages when displaying the names? English probably should always be first, if present. Then it can be in the order entered, but names retrieval from the table will not necessarily retrieve in that order (unless there's also a ROWID for each row in [names]). Many Ashkenazi Jews who lived between 1915 and 1995 had official documents issued in as many as seven different languages throughout their lifetime. Theoretically it would be great to show all these variations. However, with the proposed schema it is not possible to control the order of display. I don't know if it's a critical requirement though.
  2. How to enter both preferred and official names in the same language while clearly distinguishing them as separate name sets? For example, Samuel Langhorne Clemens, also known as Mark Twain. The schema can show either Samuel Langhorne Clemens, or Mark Twain, or Samuel Langhorne Clemens Mark Twain. None of these is ideal. A possible workaround is the introduction of a special name type such as "Name Separator" with the value "a.k.a.". An entry for Mark Twain would then look like:

"Clemens-1", "Samuel", "Samuel", English-FK, FirstName-FK, 1
"Clemens-1", "Langhorne", "Langhorne", English-FK, MiddleName-FK,2
"Clemens-1", "Clemens", "Clemens", English-FK, LastName-FK,3
"Clemens-1", "a.k.a.", "a.k.a.", English-FK, AKA-FK,4
"Clemens-1", "Mark", "Mark", English-FK, FirstName-FK,5
"Clemens-1", "Twain", "Twain", English-FK, LastName-FK,6

The above can then be shown more properly as:
Samuel Langhorne Clemens a.k.a. Mark Twain

 

by Patrick Munits G2G6 Mach 1 (11.0k points)
Hi Patrick,

A woman that was married five times and each time she remarried her surname changed. How would that display?
Hi Louis,

The schema can certainly accommodate the storage and search needs for your case. However, the best way to display such scenarios should be up for debate.

Let's say we have Mary born Smith, married Jones, then McDonald, then Williams, then White, then Miller.

"Smith-1111", "Mary", "Mary", English-FK, FirstName-FK, 1
"Smith-1111", "Smith", "Smith", English-FK, MaidenName-FK, 3
"Smith-1111", "Jones", "Jones", English-FK, LastName-FK, 4
"Smith-1111", "McDonald", "McDonald", English-FK, LastName-FK, 5
"Smith-1111", "Williams", "Williams", English-FK, LastName-FK, 6
"Smith-1111", "White", "White", English-FK, LastName-FK, 7
"Smith-1111", "Miller", "Miller", English-FK, LastName-FK, 8

So the above would get displayed as Mary Smith Jones McDonald Williams White Miller. This is essentially gibberish. However, the schema itself is flexible enough to permit the introduction of new name types without modification of the user interface. So, after much debate, WikiTreers may agree that such a case should be handled by a new name type, "SeparatedName". Then the name parser module that displays names can be extended to format such values with a preceding blurb such as "formerly known as" or "divorced surname", etc.

Then the data could be entered in the following way, which would display as shown below the rows:
"Smith-1111", "Mary", "Mary", English-FK, FirstName-FK, 1
"Smith-1111", "Smith", "Smith", English-FK, MaidenName-FK, 7
"Smith-1111", "Jones", "Jones", English-FK, SeparatedName-FK, 3
"Smith-1111", "McDonald", "McDonald", English-FK, SeparatedName-FK, 4
"Smith-1111", "Williams", "Williams", English-FK, SeparatedName-FK, 5
"Smith-1111", "White", "White", English-FK, SeparatedName-FK, 6
"Smith-1111", "Miller", "Miller", English-FK, LastName-FK, 2

Mary Miller formerly known as Jones, formerly known as McDonald, formerly known as Williams, formerly known as White, nee Smith.
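
A rough Python sketch of such a name parser follows. It is only one possible reading of the idea: the PREFIXES mapping and the format_name helper are hypothetical, and the sketch assumes the current surname (Miller) is stored as a plain LastName so that it is shown without a blurb.

```python
# Hypothetical mapping from name type to the blurb shown before the value.
PREFIXES = {
    "SeparatedName": "formerly known as",
    "MaidenName": "nee",
}

# (name, name type, DisplayOrder); assumes Miller is the current LastName.
MARY = [
    ("Mary", "FirstName", 1),
    ("Smith", "MaidenName", 7),
    ("Jones", "SeparatedName", 3),
    ("McDonald", "SeparatedName", 4),
    ("Williams", "SeparatedName", 5),
    ("White", "SeparatedName", 6),
    ("Miller", "LastName", 2),
]


def format_name(rows):
    """Build a display string: plain parts first, then prefixed clauses."""
    ordered = sorted(rows, key=lambda r: r[2])
    plain = [name for name, kind, _ in ordered if kind not in PREFIXES]
    clauses = [
        f"{PREFIXES[kind]} {name}"
        for name, kind, _ in ordered
        if kind in PREFIXES
    ]
    # Skip empty pieces so a profile with only clauses has no leading space.
    return " ".join(p for p in (" ".join(plain), ", ".join(clauses)) if p)


print(format_name(MARY))
```

Changing the wording later (say, "divorced surname" instead of "formerly known as") would then only touch the mapping, not the stored rows.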

The thing is that while genealogists debate the proper way to display the data, it will already be stored in the database in a way that is flexible enough to allow future presentation changes, while already fully supporting search functionality that can be built to use individual name fields.
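
As an illustration of that search, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The column names follow the proposed [names] table; the numeric LanguageID and NameTypeID values, the extra profile "Miller-22", and the search_profiles helper are made up for the example and say nothing about how WikiTree's real database is laid out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE names (
        ProfileID    TEXT    NOT NULL,
        Name         TEXT    NOT NULL,
        NameASCII    TEXT    NOT NULL,
        LanguageID   INTEGER NOT NULL,
        NameTypeID   INTEGER NOT NULL,
        DisplayOrder INTEGER NOT NULL
    )
    """
)
# Hypothetical IDs: language 1 = English; types 1 = First, 2 = Last, 3 = Maiden.
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Smith-1111", "Mary", "Mary", 1, 1, 1),
        ("Smith-1111", "Miller", "Miller", 1, 2, 2),
        ("Smith-1111", "Smith", "Smith", 1, 3, 7),
        ("Miller-22", "Mary", "Mary", 1, 1, 1),
        ("Miller-22", "Miller", "Miller", 1, 2, 2),
        ("Clemens-1", "Mark", "Mark", 1, 1, 5),
        ("Clemens-1", "Twain", "Twain", 1, 2, 6),
    ],
)


def search_profiles(conn, term):
    """Return distinct ProfileIDs having any name part equal to `term`."""
    cur = conn.execute(
        "SELECT DISTINCT ProfileID FROM names "
        "WHERE Name = ? OR NameASCII = ? "
        "ORDER BY ProfileID",
        (term, term),
    )
    return [row[0] for row in cur]


print(search_profiles(conn, "Miller"))
print(search_profiles(conn, "Twain"))
```

Because every name part is its own row, the same query finds a profile by a married name, a maiden name, or a pen name without any special cases.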
Thanks Patrick,

You certainly have my support.
+4 votes
Today I decided to take a look at the de facto standard way IT attempts to give structure to genealogy as a science. That glue is GEDCOM.

GEDCOM 5.5 is correct in defining that an Individual has PersonalName(s). However, that same GEDCOM version is fundamentally wrong in its rigid declaration that the PersonalName element consists of FirstName, LastName, and Title elements. It is wrong because a person doesn't require a FirstName and a LastName to be born and to exist.

A human who was born might have lived their life (however long or short it was) without ever having an official FirstName or LastName. This doesn't mean that a genealogical application shouldn’t be capable of having a profile for such a person.

GEDCOM X addresses this shortcoming in the following way. The Name element no longer makes a claim that a person must have FirstName and LastName. Instead it says that it can hold a value that represents some type of a name.

Consequently, I’m most certainly not the first IT person slash genealogist to declare that an individual doesn’t have to have formal First and Last names, and by extension can actually have many variations of titles that constitute variations of names in various cultures.

The proposed schema extension for WikiTree addresses the need to correct this problem in GEDCOM 5.5 without breaking it, and at the same time likely makes WikiTree a lot more compatible with the various newer GEDCOM XSD proposals.
by Patrick Munits G2G6 Mach 1 (11.0k points)
Not only do you not need a name to exist, at one point in England and Wales you didn't need a name to have your birth registered. I have a certificate with no name entered at registration and no name entered later. He was given one when he was baptised. He spent his first census away from his family then joined the merchant navy and was abroad for many more. He did put his date of birth on an official form which let me confirm the birth cert was for him.

Tim
Without a subsequent document issued to a person, how does one identify to whom the original certificate was issued? Were there names of parents, clan, congregation, address, anything besides the date?
+2 votes
An excellent idea. Some random thoughts.

People change their names for a variety of reasons. Could we distinguish a name change with an optional effective date range, a reason, and a field to hold the source? Honorific titles can be granted too and change how someone is known.

Romanisation. Not sure how well that works. It is also somewhat biased towards the Latin script. Different languages written in the Latin script may Romanise another script differently, depending on how each letter in the Latin alphabet is pronounced in that language. Romanisation normally attempts to transcribe the pronunciation of the other script. That is not the same as someone picking an English surname for themselves when they come to the USA, for instance. (They might translate their native surname into English or arbitrarily pick a new one.)

The order of personal and family names of East Asian people in English can be either way around, depending on the individual. At least one national leader currently has his name given family-name first in the Western media.

In some parts of the world a title can indicate someone's religion.

There can be clan names used in someone's name.

I've noticed some Arabian royalty use a string of patronymics which indicates a path of descent on the male line, rather than just their father.

In Spain the child gets their father's first surname as their first surname and their mother's first surname as their second surname.

People may have nicknames, a nom de plume, a nom de guerre, a stage name, or a regnal name. Murderers may get a popular name that they retain after their identity is discovered.

People may have their names translated if the literary language is not the same as the spoken language. E.g. English William might be Gulielmus in a legal document written in Latin.

Some spoken languages don't have a written form, though I imagine the number is decreasing. Use the International Phonetic Alphabet if the pronunciation is known?

Where names are often duplicated, the local community may add a tag to distinguish them. Jones the Steam, the Briton.

Regards,

Tim
by Tim Partridge G2G6 Mach 3 (33.2k points)
Yes, this additional [names] table can cover all these scenarios. This is because it doesn't have a column for each type of a name, but instead has a column that identifies what type of a name it is.

I don't support the idea that the name type should be entered as free-form text. Instead, as the schema shows, it's an ID (foreign key) of a value that exists in another table called something like nameType.

The nameType table can have a structure that would permit storage of not just the name of the name type (i.e. First Name, Nom de Guerre, etc.) but also additional fields such as description, localities where such name type/form is used, as well as when it was introduced for use by WikiTree community, etc.

Admins would populate the values in the nameType table, and then WikiTreers would be able to select the value they need for a new record they add to the [names] table.

Structuring the data this way inherently supports building reports. It would be very easy then to locate all profiles of Kings, Counts, and revolutionaries that have a nom de guerre in their profile. All of this without the need to build new categories for every new such name/title type.
I'd be hesitant to add dates and localities right to the [names] table. These would most certainly have to be optional data values, with a very large portion of the records expected to have no data. It also doesn't cover situations where a title was granted and revoked multiple times.
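
Here is a short sketch of that kind of report, again using Python's sqlite3 with an in-memory database. The nameType columns (TypeName, Description), the numeric IDs, the placeholder profiles "Example-1" and "Example-2", and the profiles_with_name_type helper are all assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE nameType (
        NameTypeID  INTEGER PRIMARY KEY,
        TypeName    TEXT NOT NULL UNIQUE,
        Description TEXT
    );
    CREATE TABLE names (
        ProfileID    TEXT    NOT NULL,
        Name         TEXT    NOT NULL,
        NameTypeID   INTEGER NOT NULL REFERENCES nameType (NameTypeID),
        DisplayOrder INTEGER NOT NULL
    );
    """
)
# Admin-maintained list of name types (hypothetical values).
conn.executemany(
    "INSERT INTO nameType VALUES (?, ?, ?)",
    [
        (1, "First Name", None),
        (2, "Last Name", None),
        (3, "Nom de Guerre", "Name adopted for a conflict or movement"),
    ],
)
# Placeholder profiles; the name values are invented for the example.
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?)",
    [
        ("Example-1", "Anna", 1, 1),
        ("Example-1", "Example", 2, 2),
        ("Example-2", "Boris", 1, 1),
        ("Example-2", "Falcon", 3, 2),
    ],
)


def profiles_with_name_type(conn, type_name):
    """Report: distinct profiles holding at least one name of this type."""
    cur = conn.execute(
        """
        SELECT DISTINCT n.ProfileID
        FROM names AS n
        JOIN nameType AS t ON t.NameTypeID = n.NameTypeID
        WHERE t.TypeName = ?
        ORDER BY n.ProfileID
        """,
        (type_name,),
    )
    return [row[0] for row in cur]


print(profiles_with_name_type(conn, "Nom de Guerre"))
```

With the foreign key enforced, a row that refers to a name type the admins have not defined is rejected, which is what keeps the type list from drifting into free-form text.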

It would, however, be relatively easy to add another table that has a many-to-many relationship with the [names] table and is designed to hold just such data: dates, localities, etc. Such a separate table would permit a staged approach to functionality rollout, limit the required code development scope, and keep things simple for the end users.
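
One way such a side table could look is sketched below with Python's sqlite3. Everything beyond the [names] table itself is an assumption: the NameID key, the nameDetail and nameDetailLink tables, their columns, and the years are all invented to show the shape of the idea, including a title held during two separate periods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Cut-down [names] table; NameID is an assumed surrogate key.
    CREATE TABLE names (
        NameID    INTEGER PRIMARY KEY,
        ProfileID TEXT NOT NULL,
        Name      TEXT NOT NULL
    );
    -- Optional details, kept out of [names] so most rows carry no NULLs.
    CREATE TABLE nameDetail (
        DetailID  INTEGER PRIMARY KEY,
        StartYear INTEGER,
        EndYear   INTEGER,
        Locality  TEXT
    );
    -- Junction table giving the many-to-many relationship.
    CREATE TABLE nameDetailLink (
        NameID   INTEGER NOT NULL REFERENCES names (NameID),
        DetailID INTEGER NOT NULL REFERENCES nameDetail (DetailID),
        PRIMARY KEY (NameID, DetailID)
    );
    """
)
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Example-1", "Count"), (2, "Example-1", "John")],
)
# Made-up periods: the title is granted, revoked, then granted again.
conn.executemany(
    "INSERT INTO nameDetail VALUES (?, ?, ?, ?)",
    [(1, 1800, 1805, None), (2, 1810, None, None)],
)
conn.executemany(
    "INSERT INTO nameDetailLink VALUES (?, ?)",
    [(1, 1), (1, 2)],
)


def name_periods(conn, profile_id):
    """List each name with its periods; names without details still appear."""
    cur = conn.execute(
        """
        SELECT n.Name, d.StartYear, d.EndYear
        FROM names AS n
        LEFT JOIN nameDetailLink AS l ON l.NameID = n.NameID
        LEFT JOIN nameDetail AS d ON d.DetailID = l.DetailID
        WHERE n.ProfileID = ?
        ORDER BY n.NameID, d.StartYear
        """,
        (profile_id,),
    )
    return cur.fetchall()


for row in name_periods(conn, "Example-1"):
    print(row)
```

The LEFT JOINs mean the detail tables can stay empty at first, so the basic [names] functionality could ship before any date or locality entry forms exist.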
+4 votes
Hi Patrick,

Great post. I have added a link to it on our Team to-do list so we can look at your suggestions when we next work on improvements on the profiles pages. Thanks!
by Abby Glann G2G6 Pilot (469k points)
It is also worth reading the discussion in this older thread on the same subject:

https://www.wikitree.com/g2g/178270/use-of-non-roman-alphabets


...