Suggestion: do a GEDCOM sanity check before importing, using the DB_Error software

+46 votes
934 views

Lesson learned: GEDCOM import is the mother of all genealogy problems inside WikiTree ;-)

I checked the new errors at Space:Database_Errors_Project_2016-10-23#Errors; many of them were uploaded via GEDCOM files last week.....

Suggestion: Run DB_ERROR on the imported GEDCOM before it gets approved...

 

in The Tree House by Living Sälgö G2G6 Pilot (297k points)
retagged by Maggie N.
I think that would be an excellent idea, if it can be done.
This is one of the best ideas you've suggested, Magnus! Thank you; I concur!

This is the next logical step.  Yes!  Sorely needed!

We need to decide which errors, and how many, would send a GEDCOM back for revision, but remember they are never "rejected," only returned for correction.  Approving a GEDCOM

And Aleš has already been talking to me about it! Gotta love Aleš.

We will be working on this.

Hallelujah... Choir of angelic voices! Manna falling from heaven.

(or in my case: happy dance!!!)

I just fixed a hundred or so death locations of Y today from an import done this week. Hmm. Yea. This is a great deal.

Marty

6 Answers

+19 votes
Yes!

(didn't know comments had to exceed 10 chars)
by Chris Little G2G6 Mach 5 (52.3k points)
+14 votes
As a Data Doctor, I say this is a great idea. Now we just need to keep Aleš from coming up with new errors (just joking). I've seen the numbers for some errors go up even though I might have corrected a hundred in the group. But before running a GEDCOM through DB_ERRORS, a policy and a process should be established and published.
by Bob Keniston G2G6 Pilot (264k points)

Create a quality measurement

  • 0 errors and 5 sources per profile is rank 5
  • If not every profile has a source, rank = 0 --> please mark those profiles as not to import and try again
  • If more than 5% of the profiles have an error --> correct the profiles with errors or mark them not to be imported
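
A rough sketch of how the rules above could be checked. The Profile fields, the function name, and reading "5 sources per profile" as "at least 5" are all my assumptions; the proposal does not say what ranks 1 to 4 mean, so the sketch leaves them undefined.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Profile:
    # Hypothetical, simplified view of one profile in a proposed import.
    name: str
    source_count: int
    error_count: int


def assess_import(profiles: List[Profile],
                  max_error_share: float = 0.05) -> Tuple[Optional[int], str]:
    """Return (rank, advice). Rank is None where the proposal defines none."""
    if not profiles:
        return None, "nothing to import"
    # Rule: if not every profile has a source, the rank is 0.
    if any(p.source_count == 0 for p in profiles):
        return 0, "mark unsourced profiles as not to import and try again"
    # Rule: more than 5% of profiles with an error sends the file back.
    with_errors = sum(1 for p in profiles if p.error_count > 0)
    if with_errors / len(profiles) > max_error_share:
        return None, "correct profiles with errors or mark them not to be imported"
    # Rule: 0 errors and 5 sources per profile is rank 5.
    if with_errors == 0 and all(p.source_count >= 5 for p in profiles):
        return 5, "ok"
    return None, "rank not defined by the proposal"
```

The 5% threshold is a parameter so it can be tuned once real numbers are available.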

And WikiTree should have a quality statement like:

Better to share 10 well-researched and well-sourced profiles than 2000 unsourced profiles. WikiTree is a common family tree, and a profile with no sources is one we can't trust...

It will need to be fine-tuned to the types of errors.  For instance "unique name" should not block importation.
Step 1 is just to run db_error to give some indication to the person approving the import. Today I feel it is just guesswork and a check of whether it has some sources.
+14 votes
On the surface....this is a GREAT IDEA, but, I think we really need to think about the "errors" that would hold up a GEDCOM input.   Things like having USA on a location for someone born in 1710...do we really want to stop a load for that?   I understand stopping an upload because of impossible dates, and duplicates, lack of sources, etc.   But, unconnected profiles...that happens a lot because the parents already exist in Wikitree.  

  I think we really need to think this through.

I will be honest....if my GEDCOM load had to meet ALL the error criteria we have today....I probably would have gone somewhere else, manually typing in all that data would have overwhelmed me....
by Robin Lee G2G6 Pilot (862k points)
edited by Robin Lee
Are you satisfied with the quality inside WikiTree...

For me it feels odd that one person can upload a family tree that takes hundreds of hours for other people to clean....
But Robin's right, Magnus, that we would need to determine exactly which errors would send a GEDCOM back for correction.  We should decide which are the most important; I doubt that anyone's GEDCOM would pass them all.
And there's the additional complication that many people may not know how to actually correct their GEDCOMs -- that they would have to go back to their own tree/program, correct the errors there, then re-export a new GEDCOM to re-import into WikiTree.
Thanks, Nan

So some minimal set of criteria would have to be met; here's the beginning of a list:

  1. Empty fields (beyond first name last name)
  2. Completely disconnected [actually I'm not sure this is a priority]
  3. No sources
  4. The only sources are:
    • Ancestry.com trees (not sure the easiest way to filter for this, but there must be one)
Those would be my highest priorities. I'd also like to prevent Millennium File, US & International Marriages, and Family Data files (or whatever they're called). But only if those were the only sources. But we should really pick our highest concerns.
I did some statistics on the number of uploaded GEDCOMs that had sources, and I think it was 1 out of 250, during 40 hours, that had more than one source per profile...

---> Lesson learned people doing good genealogy are not as interested to upload to a site like Wikitree

I think we need numbers on those 249 badly sourced uploaded family trees: will someone work with the family tree and add sources, or is it just more unsourced profiles...

If we see that most people do it, then it's not a problem. If there is no activity after the first week, then there is a problem...

Dennis, minor problems can be corrected in a simple text editor (just don't use Word -- the formatting really screws up the file). I've done this to GEDCOMs I've uploaded -- example: the location fields in my personal database only go to the state level for the USA, since (so far) none of the people in it were foreign-born. But obviously, that's not nearly good enough for WikiTree purposes, so I used a text editor on the exported GEDCOM to replace all my state abbreviations with full names & "United States". Far easier to do it that way than to change every entry in my personal file...
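
For anyone who would rather script that kind of replacement than do it by hand, here is a minimal sketch. It assumes place lines in the export look like "2 PLAC Town, County, ST" and that the state abbreviation is always the last part; the abbreviation table is deliberately partial, and the function names are made up for illustration.

```python
import re

# Partial, illustrative table; a real one would list every state.
STATE_NAMES = {"OH": "Ohio", "VA": "Virginia", "WV": "West Virginia"}

# Matches a GEDCOM place line such as "2 PLAC Town, County, ST".
PLAC_LINE = re.compile(r"^(\s*\d+\s+PLAC\s+)(.*)$")


def expand_place(line: str) -> str:
    """Expand a trailing state abbreviation and append the country."""
    match = PLAC_LINE.match(line)
    if not match:
        return line  # not a place line, leave untouched
    prefix, place = match.groups()
    parts = [part.strip() for part in place.split(",")]
    if parts[-1] in STATE_NAMES:
        parts[-1] = STATE_NAMES[parts[-1]]
        parts.append("United States")
    return prefix + ", ".join(parts)


def expand_gedcom(text: str) -> str:
    """Apply expand_place to every line of an exported GEDCOM."""
    return "\n".join(expand_place(line) for line in text.splitlines())
```

Work on a copy of the export, and remember that adding "United States" blindly will be wrong for events dated before the country existed.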

And that example shows that, perhaps, the use of "USA" or "United States" on a too-early date should be a reason to not approve a GEDCOM import -- a case where it's easier to fix before-the-fact rather than after, since "after" might never happen.

That said, obviously some designated errors in the database would not apply to a new import -- we (as WikiTree enthusiasts) just need to determine which errors should:

  • notify a user of a potential problem with a proposed profile that still allows creation of the profile
  • notify a user of a problem with a proposed profile that prevents creation of that particular profile
  • prevent the approval of an import entirely (perhaps a maximum number of identified problems?)
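
The three levels above could be sketched roughly like this. The error codes, which level each one gets, and the limit of 50 problems are placeholders I made up; deciding those is exactly what the discussion is about.

```python
from enum import Enum
from typing import Dict, List


class Action(Enum):
    WARN = "warn"                  # notify, but still allow the profile
    SKIP_PROFILE = "skip_profile"  # notify and prevent this one profile


# Hypothetical assignment of error codes to levels.
ERROR_ACTIONS = {
    "unique_name": Action.WARN,
    "usa_too_early": Action.WARN,
    "death_before_birth": Action.SKIP_PROFILE,
    "no_dates": Action.SKIP_PROFILE,
}


def review_import(errors_by_profile: Dict[str, List[str]],
                  max_problems: int = 50) -> dict:
    """Sort profiles into warn/skip and decide whether the import may proceed."""
    warn, skip = [], []
    total = 0
    for profile_id, codes in errors_by_profile.items():
        total += len(codes)
        # Unknown codes default to a warning in this sketch.
        actions = {ERROR_ACTIONS.get(code, Action.WARN) for code in codes}
        if Action.SKIP_PROFILE in actions:
            skip.append(profile_id)
        elif codes:
            warn.append(profile_id)
    # Third level: too many identified problems blocks the whole import.
    return {"approved": total <= max_problems, "warn": warn, "skip": skip}
```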
Not something I would recommend to anyone who is not already somewhat computer literate, because it's too easy to create an unreadable data file.

Hm " the use of "USA" or "United States" "

Reality check: I don't think that is the level of problems that people uploading to WikiTree have. They have a mother dead before the child was born, etc....

An indication that they can't have checked one single source...

To add to Jillaine's list:

5.  the impossible date errors (married/gave birth before born/after died; children born when a parent was under a certain cut-off age)
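
A minimal sketch of such a check for one mother's dates. The function name, the messages, and the cut-off age of 12 are assumptions for illustration only; the real db_errors rules may differ.

```python
from datetime import date
from typing import Iterable, List, Optional


def impossible_date_errors(birth: Optional[date],
                           death: Optional[date],
                           marriage: Optional[date] = None,
                           child_births: Iterable[date] = (),
                           min_mother_age: int = 12) -> List[str]:
    """Return a list of impossible-date problems; missing dates are skipped."""
    errors = []
    if birth and death and death < birth:
        errors.append("died before born")
    if marriage:
        if birth and marriage < birth:
            errors.append("married before born")
        if death and marriage > death:
            errors.append("married after died")
    for child_birth in child_births:
        if birth:
            if child_birth < birth:
                errors.append("child born before mother")
            else:
                # Age in whole years on the day the child was born.
                age = child_birth.year - birth.year - (
                    (child_birth.month, child_birth.day) < (birth.month, birth.day))
                if age < min_mother_age:
                    errors.append(f"mother under {min_mother_age} at child's birth")
        if death and child_birth > death:
            errors.append("child born after mother died")
    return errors
```

The "after died" test only makes sense for mothers; a father can have a child born some months after his death.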
All this just points out that there are "errors" and there are "ERRORS"...

The easiest ones to correct before import are the "errors" such as incorrect place names; the harder ones, like impossible dates, are probably the ones that should flag a GEDCOM entry as "un-import-able", the way those without any dates are currently treated.

Magnus wrote "Lesson learned people doing good genealogy are not as interested to upload to a site like Wikitree"

This is what my business partners and I call "overlearning". Another possibility, Magnus, is that people doing good genealogy might prefer to create profiles manually. ;-)

That said, I don't necessarily disagree that many good genealogists are shying away from wikitree. None of the professionals I know of participate here. But I've also never seen any of them "dish" wikitree either. They may simply have other priorities for their time.

That said, there are many good genealogists here-- just a limited number. And I fear that the continued perpetuation of crap profiles will discourage any significant investment of time on their part. 

While I was initially very interested in the db_errors project, the sheer volume of bad data that it reveals disheartens me. 

I'm with you, Jillaine.  Some days I still enjoy working the db_errors, but more and more often I start thinking "I'm tired of fixing other people's crap when they don't care."

Maybe it's time for a db break.  :-)

>>  "I'm tired of fixing other people's crap when they don't care."

;-) crowdsourced genealogy is maybe not always a success

+5 votes
It might help, but how would they get corrected that way? Are you throwing all GEDCOMs with errors out?
by Jon Czarowitz G2G6 Mach 4 (44.7k points)

No you don't let them in....

As most imported GEDCOM files seem never to be corrected and are just a source of errors, my thought was that it's better for people to understand that they have problems with their family tree, and then see how they react.

If they come back with better quality, then they will be a good WikiTree member....

If WikiTree loves to have many profiles we can let them in, but we know they have problems..... or just tell them: please skip profiles with mothers born after their children, it's not genealogy....

If quality is important inside WikiTree, the lesson learned is that most GEDCOM profiles will never be good members of the WikiTree family, and GEDCOM-imported profiles with errors are the black sheep

+8 votes

Perhaps what we really need is a Help page to address what GEDCOM fields create valid WikiTree sources, and how to ensure that those fields are correctly populated when generating a GEDCOM.

I've been doing my own family's genealogy (well, I think) for almost 20 years now. My personal database includes many first-level sources -- county vital records books where I actually saw the records in person, rather than an online transcription. And yet, I'm not very happy with the way they translated when imported. (See Thomas Layfield [my g-grandfather].) In contrast, take a look at a profile I've done manually: Addie Belle (Sinnett) Stanley. The difference is so big I actually sometimes regret using an upload at all...

by Kitty Linch G2G6 Mach 4 (43.5k points)
@Kitty I feel you are speaking about how the GEDCOM import gets formatted

The discussion is about genealogy errors in the uploaded file....

I agree with you that with today's WikiTree the import is not good-looking, so doing it manually is better

Between bad formatting and unsourced profiles, I prefer bad formatting every day of the week... uploading unsourced profiles is just a waste of everyone's time....
I kind of got off-track, didn't I?
GEDCOM is always an interesting/sad subject to speak about, at least with WikiTree-addicted people...

I feel it's sad that most WikiTree profiles are not readable; they are more of a GEDCOM dump.....
The current GEDCOM conversion is far better than its predecessor. Kitty, I don't think Layfield looks very bad at all.
+3 votes
I wonder if there could be a mapping document.  When I import a GEDCOM from one program to another I get a screen where I have to correct errors or map fields that the new program does not understand (I run Legacy Deluxe).

Basically, I can see what is causing the problem and fix it before the system actually completes the import.  

Example: I can tell the import to take christening and put it into birth and add a note

Or I can map christening dates to notes instead of birth.  
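
The two mapping choices above might look something like this in code. This is only a sketch: the record layout (a plain dict with "birth", "christening" and "notes" keys) and the function name are invented for illustration and are not how Legacy or WikiTree actually store anything.

```python
def map_christening(record: dict, target: str = "birth") -> dict:
    """Return a copy of record with the christening date mapped as requested.

    target="birth": use the christening date as birth (if birth is empty)
                    and add a note saying so.
    target="notes": keep birth as is and record the christening in a note.
    """
    result = dict(record)
    christening = result.pop("christening", None)
    if christening is None:
        return result  # nothing to map
    notes = list(result.get("notes", []))
    if target == "birth" and not result.get("birth"):
        result["birth"] = christening
        notes.append(f"Birth date taken from christening ({christening}).")
    else:
        notes.append(f"Christened {christening}.")
    result["notes"] = notes
    return result
```

An existing birth date is never overwritten in this sketch; the christening then goes to the notes regardless of the chosen target.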

Things that fail the WikiTree style guides (rules like no hyphens, and only one name in each name field) would need to be thought through.  Those guides do not mirror real life, with its many hyphenated names and many accepted multiple names like Anne Marie, Jo Ann, Jean Paul, and similar...
by Laura Bozzay G2G6 Pilot (833k points)
