We just released a round of improvements to our GEDCOM import system.
This has been requested by many members and has been on the to-do list forever. To be honest, I find working with GEDCOMs depressing, so there was a little procrastination involved. :-)
To rationalize my procrastination a bit: I kept thinking that the old GEDCOM standard was on its way out. I thought we should focus our very limited resources on the standards and systems of the future instead. But I was wrong. GEDCOMs aren't going away any time soon. They remain important for genealogy, and important for WikiTree. Not important for all members, but for many members, especially new members.
Profiles created through GEDCOM imports will never be beautiful. We need to balance a lot of different considerations. But they can and should be better than they've been in the past.
In the past, we operated with these principles:
- We never want to lose any information that's in a GEDCOM.
- We never want to misinterpret or mislabel information.
- More information is always better than less.
I'm sure many of you will agree that these principles sound correct. :-) But in practice they've created horrible, junky profiles that need extensive editing.
Now we are skipping a lot of information.
For example, many GEDCOMs contain ID numbers for each individual. They're unique identifiers for the exporting system. Often they're unique to the one user of the system and they're meaningless to everyone else. And 99% of the time they're meaningless to the one user too, because they're only used by their software in the background. But, according to our old thinking, they could theoretically be useful in some cases, even in a collaborative environment like ours. And they could. But those cases are likely to be such a small fraction of the total that the IDs aren't worth what it costs the community to include them.
Now we're just skipping these ID numbers. And a whole bunch of other stuff. Here's the complete list: http://www.wikitree.com/wiki/Skipped_Tags_in_GEDCOMs
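For the technically curious, here's a rough sketch of what "skipping a tag" means for a GEDCOM file. This is not our actual import code. It's a minimal illustration that assumes a hypothetical three-tag skip list (RIN, _UID and REFN, which are real GEDCOM ID-style tags, but the real list is on the page linked above). A skipped line is dropped along with everything nested beneath it, since GEDCOM uses the leading level number to show nesting.

```python
# Hypothetical subset of skipped tags for illustration only;
# see the Skipped_Tags_in_GEDCOMs page for the real list.
SKIPPED_TAGS = {"RIN", "_UID", "REFN"}

def filter_gedcom(lines):
    """Drop lines whose tag is skipped, plus any lines nested beneath them."""
    kept = []
    skip_below = None  # level of the skipped line whose children are being dropped
    for line in lines:
        if not line.strip():
            continue
        parts = line.strip().split(" ", 2)
        level = int(parts[0])
        # Records can carry a cross-reference ID before the tag, e.g. "0 @I1@ INDI".
        if len(parts) > 2 and parts[1].startswith("@"):
            tag = parts[2].split(" ", 1)[0]
        else:
            tag = parts[1] if len(parts) > 1 else ""
        if skip_below is not None:
            if level > skip_below:
                continue  # still inside the skipped structure
            skip_below = None
        if tag in SKIPPED_TAGS:
            skip_below = level
            continue
        kept.append(line)
    return kept
```

So an individual record like "0 @I1@ INDI / 1 NAME John /Smith/ / 1 RIN 12345 / 1 SEX M" would come through with the RIN line gone and everything else intact.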
We're also now leaving a lot of information unlabeled. Rather than putting "Address:", "City:", and "Country:" in an address, for example, we just print the address.
See http://www.wikitree.com/wiki/Skipped_Tags_in_GEDCOMs#Translated_Tags for the exact details. (Not that it'll be easy to understand from that, because what the translations mean is complicated by a lot of other things in the code.)
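To make the "no labels" idea concrete, here's a minimal sketch. It's not our real code: it assumes the address has already been pulled into a dictionary keyed by the standard GEDCOM 5.5 address sub-tags (ADR1, ADR2, CITY, STAE, POST, CTRY), and the print order and comma separator are assumptions made for the example.

```python
# Assumed print order for the GEDCOM 5.5 address sub-tags found under ADDR.
ADDRESS_ORDER = ["ADR1", "ADR2", "CITY", "STAE", "POST", "CTRY"]

def format_address(parts):
    """Join address parts into one plain line, with no "City:"-style labels."""
    # Skip any sub-tag that's missing or blank rather than printing an empty slot.
    values = [parts[tag].strip() for tag in ADDRESS_ORDER if parts.get(tag, "").strip()]
    return ", ".join(values)
```

With {"ADR1": "12 High Street", "CITY": "Edinburgh", "CTRY": "Scotland"} this gives "12 High Street, Edinburgh, Scotland" instead of three labeled lines.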
We've also changed how we format the information we do print in the text. We no longer make lots of subheaders. We do something closer to a plain-language paragraph structure.
We made a variety of other little changes, but most aren't worth mentioning. One that's important to our Dutch community: we better preserve the capitalization in a Last Name at Birth like van der Beek. We also do a better job of guessing the proper capitalization of a name like McClellan when it appears in a GEDCOM as MCCLELLAN.
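Here's a simplified sketch of that kind of guessing, again not our actual code. The rules are assumptions chosen for illustration: a name that already contains lowercase letters (like van der Beek) is left exactly as it is, and only an all-caps name gets recapitalized, with a special case for the "Mc" prefix and for parts separated by spaces, hyphens or apostrophes.

```python
import re

def guess_surname_case(name):
    """Guess capitalization for an all-caps surname; leave mixed-case names alone."""
    # Any lowercase letter (e.g. "van der Beek") means the capitalization was
    # probably intentional, so the name is returned unchanged.
    if name != name.upper():
        return name

    def fix_word(word):
        word = word.capitalize()  # "MCCLELLAN" -> "Mcclellan"
        # Capitalize the letter after a "Mc" prefix: "Mcclellan" -> "McClellan".
        if word.startswith("Mc") and len(word) > 2:
            word = "Mc" + word[2:].capitalize()
        return word

    # Split on spaces, hyphens and apostrophes while keeping the separators,
    # so "O'BRIEN-SMITH" becomes "O'Brien-Smith".
    pieces = re.split(r"([ \-'])", name)
    return "".join(fix_word(p) if p and p not in " -'" else p for p in pieces)
```

This also shows why it's only a guess: an all-caps VAN DER BEEK would come out as "Van Der Beek", because nothing in the capital letters tells us which parts should be lowercase.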
Here's an example of a profile created today under the new system: http://www.wikitree.com/wiki/Syme-151
Feel free to post here with additional suggestions, comments, questions, etc.
I have to warn you, though, that a suggestion that would make one GEDCOM import better might at the same time make other GEDCOMs worse. In fact, that's almost a guarantee. We have to balance the good that a change does for some imports with the harm it does to others. Judging this is incredibly difficult and time-consuming. So, now that I've made that excuse for why suggestions might not be implemented, feel free to fire away. :-)
Merry Christmas and happy holidays everybody!
P.S. I don't want my complicated explanations and excuses above to discourage posting suggestions. We do plan on continuing to make improvements. A complete rewrite of our code, to give us a cleaner foundation, is already on the to-do list for early next year.