A few thoughts on GEDCOM handling

+12 votes
218 views

Some time back, while trying to source and connect some isolated Slade profiles, I realised that they were part of a GEDCOM which had somehow gotten completely disconnected during the upload process. Then I learned that the same thing had happened to other GEDCOMs, so I started the Lost and Found Project. Up until today, all the disconnected GEDCOMs that we had identified were from several years ago (2010-2012, mostly), so I had assumed that whatever was causing the disconnection had been fixed, and that once we reconnected the existing disconnected GEDCOMs, there would be no more of them and we would be able to move on.

However, a new post caused me to realise that the problem has not, in fact, been fixed, so I suggested to the original poster that she add the "tech" tag to her post so Chris can see it, and hopefully troubleshoot why it is that GEDCOMs are (apparently still) getting disconnected on upload. I don't know whether the problem is due to some other site (or standalone software) not forming their GEDCOM files correctly, or whether the import tool isn't parsing the GEDCOMs correctly, or what. I haven't had this issue importing GEDCOMs from three or four different sources, but clearly other people have.

As of the last time I got a dump from Aleš's database for the Connectors Chat page, about 2.5 million profiles on WikiTree had no connections at all. That's about 5/6 of our total unconnected profiles, and nearly 1/6 of our total profiles. Granted, probably a number of those are people who sign up, decide that WikiTree isn't for them, and bail without doing anything. However, I'm guessing that a bunch of them are because people uploaded GEDCOMs that got disconnected, and they never managed to get them reconnected again. (Come to think of it, I wonder how many people have joined WikiTree, uploaded a GEDCOM, and then, when all the family members they uploaded never showed up in their family tree, assumed that WikiTree was useless and left, not realising what had happened to their upload.)

(There would also be a bunch of people with only one connection from those disconnected GEDCOMs, because for some reason, husband-wife connections manage to survive the upload process. It's the parent-child connections which get broken, and since siblings are connected through their parents, they get disconnected from each other, too.)

And, even though that is a discrete topic and may entail considerable work to fix, I can't help but add some more comments about GEDCOM handling that I've had simmering on the back of my brain for some time now:

When I uploaded my GEDCOMs, they didn't get completely disconnected like that, but when I learned about the Unconnected report, I discovered to my horror that a number of profiles (sometimes singles, sometimes small clusters) had gotten isolated because I skipped uploading profiles where the person already existed on WikiTree (as the GEDCOM import documentation recommends). Unfortunately, when I skipped importing those profiles, all their relationships got skipped, too, and thus I had to go back through my Unconnected report and reconnect those people manually. 

Recently, somebody wrote to my wife because he was preparing to import a GEDCOM which includes some people who already have profiles on her watchlist. He suggested (and I agree) that he should import those people anyway (because his GEDCOM contains a lot of information that she doesn't have) and then merge the profiles. 

I understand that merges are less than ideal, because too many redirects slow down the system. What I have long thought would be preferable would be a setup which, when a match is identified during the upload process, the information from the duplicate profile gets added to the existing profile, rather than either skipped or created as a new profile and then merged. (In a case where the existing profile isn't Open, then what I'd like to see is some kind of temporary holding record created, which would get added to that profile once a profile manager approves the addition.) Granted, there would still be editing cleanup to do, but at least no redirects, and the connections and data from the GEDCOM would be preserved.
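
To make the proposal concrete, here is a minimal sketch in Python of what "add to the existing profile, or hold for approval" might look like. Everything in it is hypothetical: the names `Profile`, `PendingAddition`, `handle_match`, and `approve`, the idea of filling only empty fields, and the example IDs are assumptions for illustration, not a description of how WikiTree's importer works or will work.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for an existing profile."""
    profile_id: str
    is_open: bool
    facts: dict = field(default_factory=dict)
    relatives: set = field(default_factory=set)


@dataclass
class PendingAddition:
    """Hypothetical temporary holding record awaiting manager approval."""
    target_id: str
    facts: dict
    relatives: set


def add_to_profile(profile, facts, relatives):
    # Assumption: only fill in gaps, never overwrite what is already there.
    for key, value in facts.items():
        profile.facts.setdefault(key, value)
    # The relationships carried by the GEDCOM record are preserved.
    profile.relatives |= relatives


def handle_match(existing, gedcom_facts, gedcom_relatives, holding_queue):
    """Called when a GEDCOM person matches an existing profile."""
    if existing.is_open:
        add_to_profile(existing, gedcom_facts, gedcom_relatives)
        return "added"
    # Profile is not Open: park the data until a profile manager approves it.
    holding_queue.append(
        PendingAddition(existing.profile_id, dict(gedcom_facts), set(gedcom_relatives))
    )
    return "pending"


def approve(pending, profiles):
    """A profile manager approves a held addition."""
    add_to_profile(profiles[pending.target_id], pending.facts, pending.relatives)
```

In this sketch no second profile is ever created, so there is nothing to merge and no redirect; the trade-off is that conflicting values (for example two different birth dates) are silently ignored here and would still need manual cleanup.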

in WikiTree Tech by Greg Slade G2G6 Pilot (410k points)
retagged by Keith Hathaway

Hi Greg,

FYI, we've been working for a long time on a complete revamp of our GEDCOM import process. Soon they won't be imported at all. They will just be used as the source to auto-fill forms for editing and creating profiles, similar to how WikiTree X works.

"What I have long thought would be preferable would be a setup which, when a match is identified during the upload process, the information from the duplicate profile gets added to the existing profile ...".

This is indeed what will happen. :-)

 

Yahoo!

3 Answers

+4 votes
 
Best answer
As a mentor, I have told several new members: please read all 11 help pages on GEDCOMs, https://www.wikitree.com/wiki/Category:GEDCOMs, and then, if you still want to do a GEDCOM load, you should "know" what to expect.

I think the issue is more that people go rushing into a load of their GEDCOM without thinking or reading the help pages. I have loaded several GEDCOMs and spent the time to go through the error report, clean up my profiles, attach them to existing profiles, etc. I know it is time consuming to do the cleanup afterwards, but it beats loading all that information manually.
by Robin Lee G2G6 Pilot (670k points)
selected by Anonymous Barnett
I just went through those docs again to be sure, but if there's anything on any of those pages that says anything like "If you do this (or don't do that), all the parent-child links in your GEDCOM will be broken when you import it, and all your imported profiles will end up disconnected", I still missed it.

It's not real clear, but it's in here: https://www.wikitree.com/wiki/Help:After_importing_a_GEDCOM

We offer this option in case you want to intentionally create a few duplicates to merge later. This [not skipping] might be necessary to keep family lines together in a convenient way.

If you skip yourself you'll need to recreate the relationships between your existing account profile and the GEDCOM-created profiles of your parents, siblings, spouse, and/or children. 

We do not automatically merge or connect any GEDCOM-created profiles with your existing account profile.

If you think about it, it makes sense. The gedcom import can only import individual records. If those records are linked to each other, then the links will be maintained as well. But if a linked record is skipped, then there's no [current*] way to maintain any links between the imported records and any existing WikiTree records.

* the gedcom import process is in the middle of a major revision. (there's a thread on it somewhere)
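
The reasoning above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the record layout, the `import_records` function, and the `@I1@`-style identifiers are made up for illustration and are not WikiTree's import code. It shows that a link can only be kept when both of its ends were imported, so skipping one person drops every link that touches them.

```python
def import_records(records, skipped):
    """Toy import: records maps an ID to {"name": ..., "parents": [IDs]}.

    Returns (imported records, kept parent-child links, dropped links).
    """
    imported = {rid: rec for rid, rec in records.items() if rid not in skipped}
    kept, dropped = [], []
    for child, rec in records.items():
        for parent in rec["parents"]:
            # A link survives only if both people were actually imported.
            if child in imported and parent in imported:
                kept.append((parent, child))
            else:
                dropped.append((parent, child))
    return imported, kept, dropped


# Three generations; the middle one already exists on the site and is skipped.
family = {
    "@I1@": {"name": "Grandfather", "parents": []},
    "@I2@": {"name": "Father", "parents": ["@I1@"]},
    "@I3@": {"name": "Child", "parents": ["@I2@"]},
}
imported, kept, dropped = import_records(family, skipped={"@I2@"})
print(sorted(imported))  # the grandfather and the child are both imported
print(kept)              # but no links survive
print(dropped)           # both links touched the skipped record
```

The grandfather and the child end up isolated even though neither of them was skipped, which matches the small isolated clusters described earlier in the thread.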

Robin, that's a really nice category! I don't believe I've ever seen that one before. I've only ever seen one or two or three disconnected help pages on gedcoms, but never a complete set.
Dennis,

Yes, skipping profiles can isolate other profiles. That happened with my own imports, where skipping importing a profile would isolate individuals or small groups.

However, on the GEDCOM which I'm working on putting back together (which is pushing 600 profiles now, and I haven't added them all to the category for it yet), I haven't found a single parent-child link that was left intact after the import. All of the husband-wife links were intact, but every single parent-child link was broken. (And, just to be sure, I went into the Changes tab to check, and nobody had edited the profiles to disconnect parents from children, so it's not a matter of somebody disconnecting them because of errors in the data.)

Also, when I accidentally isolated profiles by skipping profiles during import, the isolated families would still be connected to each other, just not to the rest of that GEDCOM or to the main tree. With the issue I'm talking about, most of the profiles aren't connected to anybody (except husbands and wives).

Of course, you could be dealing with a really old import where the import code hadn't matured yet?

Or the original gedcom file didn't include links either (though that wouldn't make much sense -- unless it was a fault of that program's export)

I've seen a few Gedcoms like the one Greg describes. Dozens, if not hundreds, of isolated profiles, mostly unsourced. Some old, at least one fairly recent. When a father and his 8 children are all created in the same gedcom, and none of them are connected to anybody, there's either a problem with the gedcom structure or a problem with the interface between that gedcom structure and Wikitree's import process. Either way, it's good to know about the plans for changes that ought to prevent new issues like these.
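
One way to test the "problem with the gedcom structure" possibility, for anyone who still has the original file, is to check whether its parent-child links are recorded on both sides. In standard GEDCOM a family (FAM) record lists its children with CHIL, and each child's individual (INDI) record points back with FAMC. The Python sketch below flags links that appear on only one side. It is a rough diagnostic, not a full GEDCOM parser, and whether one-sided links are what actually trips up any particular importer is an assumption, not something established in this thread.

```python
from collections import defaultdict


def parse_links(gedcom_text):
    """Collect CHIL pointers per family and FAMC pointers per individual."""
    fam_children = defaultdict(set)  # family ID -> child IDs (from CHIL)
    indi_famc = defaultdict(set)     # individual ID -> family IDs (from FAMC)
    current, rec_type = None, None
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level = int(parts[0])
        if level == 0:
            # Record header, e.g. "0 @I1@ INDI" or "0 @F1@ FAM".
            if parts[1].startswith("@") and len(parts) == 3:
                current, rec_type = parts[1], parts[2].split(" ")[0]
            else:
                current, rec_type = None, None
        elif level == 1 and current and len(parts) == 3:
            tag, value = parts[1], parts[2].strip()
            if rec_type == "FAM" and tag == "CHIL":
                fam_children[current].add(value)
            elif rec_type == "INDI" and tag == "FAMC":
                indi_famc[current].add(value)
    return fam_children, indi_famc


def one_sided_links(gedcom_text):
    """Return (child, family, problem) for links recorded on one side only."""
    fam_children, indi_famc = parse_links(gedcom_text)
    problems = []
    for fam, children in fam_children.items():
        for child in children:
            if fam not in indi_famc.get(child, set()):
                problems.append((child, fam, "CHIL without FAMC"))
    for child, fams in indi_famc.items():
        for fam in fams:
            if child not in fam_children.get(fam, set()):
                problems.append((child, fam, "FAMC without CHIL"))
    return sorted(problems)
```

If a file from an affected upload reports many one-sided links while its spouse pointers look complete, that would point toward the exporting program; a clean report would point back toward the import side.
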
+2 votes
I like this project.

I have a small group of unconnected profiles that were imported via someone's gedcom as connected to the wrong parents. (they became disconnected when I corrected the parents)
by Dennis Wheeler G2G6 Pilot (535k points)
0 votes
I agree with the person who wrote to your wife; that is a good idea, because some of mine got disconnected during the GEDCOM transfer. I also like these ideas:

Recently, somebody wrote to my wife because he was preparing to import a GEDCOM which includes some people who already have profiles on her watchlist. He suggested (and I agree) that he should import those people anyway (because his GEDCOM contains a lot of information that she doesn't have) and then merge the profiles.

I understand that merges are less than ideal, because too many redirects slow down the system. What I have long thought would be preferable would be a setup which, when a match is identified during the upload process, the information from the duplicate profile gets added to the existing profile, rather than either skipped or created as a new profile and then merged. (In a case where the existing profile isn't Open, then what I'd like to see is some kind of temporary holding record created, which would get added to that profile once a profile manager approves the addition.) Granted, there would still be editing cleanup to do, but at least no redirects, and the connections and data from the GEDCOM would be preserved.
by Anonymous Barnett G2G6 Pilot (465k points)
