Suggestion: when importing a GEDCOM file, add a template TOBECLEANED

+29 votes
405 views

Today we have too many "GEDCOM profiles" that are more like a data dump than a genealogy profile, because:

  1. The profile is GEDCOM-imported
  2. Many users on WikiTree seem to upload and then do nothing more
  3. It's difficult to find which profiles I have "cleaned" and which I have not
  4. The import function in WikiTree generates more of a dump of the GEDCOM than a readable profile ==> it makes WikiTree look like a low-quality genealogy site
  5. Some people also get scared of changing this format, as other people get upset and say it contains valuable genealogical information
  6. Most people don't understand the different parts of the GEDCOM and think all of it is valuable

Suggestion: When a new profile is created using GEDCOM import, add a template saying "GEDCOM IMPORTED, shall be cleaned"

==> 

  1. It's easier to find out which profiles are GEDCOM-imported
  2. It's easier for db_error to identify the profile and flag it as an error because it has not been cleaned
  3. In the future we could have user profile statistics on the number of GEDCOM profiles still to clean
in The Tree House by Living Sälgö G2G6 Pilot (297k points)
retagged by Ellen Smith
Good idea, as 90% of the problems I find are on old GEDCOM-created profiles. It will not help the old ones already created, but it will certainly help us spot new ones.
And we would also get some statistics on whether GEDCOM imports increase and/or get cleaned

Please explain "add a template GEDCOM IMPORTED shall be cleaned". I don't understand where or how in the process a template is added. Thanks.

  1. Today
    1. I as a user upload a GEDCOM file to WikiTree
    2. I do some work and compare my file with what's in WikiTree and select what I think should be imported
    3. I approve my GEDCOM for import
    4. The file is reviewed by someone and approved
    5. The file becomes part of WikiTree
  2. My suggestion
    1. Add a new step before 5 that adds a template {{GEDCOM IMPORTED}} to all approved profiles
      1. This template displays some text, e.g.
        "This profile was imported from a GEDCOM, has a very machine-generated feel, and will need your human touch."
      2. With a template it will be easier to find profiles that have never been edited after import
      3. We could have functions like the Watchlist show the number of profiles not yet "cleaned"
      4. We could have user statistics on the number of imported profiles/number cleaned
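
The suggested steps could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the `Profile` class, its fields, and the three helper functions are hypothetical and are not WikiTree's actual data model or API. Only the template name {{GEDCOM IMPORTED}} comes from the suggestion.

```python
from dataclasses import dataclass

# Template name taken from the suggestion; everything else is hypothetical.
TEMPLATE = "{{GEDCOM IMPORTED}}"


@dataclass
class Profile:
    """Hypothetical stand-in for a WikiTree profile."""
    wikitree_id: str
    manager: str
    biography: str


def tag_on_import(profile: Profile) -> Profile:
    """The new step before step 5: prepend the template once."""
    if TEMPLATE not in profile.biography:
        profile.biography = f"{TEMPLATE}\n{profile.biography}"
    return profile


def needs_cleaning(profile: Profile) -> bool:
    """A profile counts as 'not cleaned' while the template is still present."""
    return TEMPLATE in profile.biography


def cleanup_stats(profiles: list[Profile], manager: str) -> tuple[int, int]:
    """Return (profiles still to clean, all profiles) for one manager."""
    mine = [p for p in profiles if p.manager == manager]
    return sum(needs_cleaning(p) for p in mine), len(mine)
```

In this sketch a profile is considered cleaned as soon as somebody removes the template from the biography, so a Watchlist-style counter would only need to look for the template text.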
Thanks for the suggestion. Most of the profiles imported through my GEDCOMS appear to apply the Biography quite well. Here is an example of a profile I uploaded yesterday as part of a GEDCOM. The Biography and sources are exactly as imported by WikiTree without any cleaning or fixing.

https://www.wikitree.com/wiki/Beezley-367

Perhaps there are some fixes to be made, but it would seem that it is good enough to warrant low priority. What do you think?

>> but it would seem that it is good enough to warrant low priority

Sorry, but I think it's a mess. I would like WikiTree to add a button for giving feedback to people who have created well-written profiles that are a pleasure to read and have good genealogical quality.

The example Beezley-367 is not easy to read... but that is my opinion. I almost never read profiles on WikiTree because of the quality.

  • References to a picture that is not there
  • All those span HTML links that make it unreadable
  1.  Source: #S58 Year: 1930; Census Place: Lincoln, Lancaster, Nebraska; Roll: 1285; Page: 21B; Enumeration District: 21; Image: 604.0; FHL microfilm: 2341020 Page Year: 1930; Census Place: Lincoln, Lancaster, Nebraska; Roll: 1285; Page: 21B; Enumeration District: 21; Image: 604.0; FHL microfilm: 2341020 File Format: jpg 1930 United States Federal Census Note: Year: 1930; Census Place: Lincoln, Lancaster, Nebraska; Roll: 1285; Page: 21B; Enumeration District: 21; Image: 604.0; FHL microfilm: 2341020 PHOTO Scrapbook: N
  2.  Source: #S113 Page
  3.  Source: #S10 Page
  4.  Source: #S5 Page
I understand and appreciate your opinion. I'm sure a careful one-by-one effort would be better. I do put priority to fixing much greater messes than this and adding information where none exists at all. It's not a matter of disagreement, it is a matter of priority of effort. As for "spam" links, you will see that in each case exhibited in your response, the link refers back to a fuller description of the source. And, yes, source listings are not fun reading, but they do show information about where the source can be found. As for the image not there? Sources do refer to where images are available and the fact that the image doesn't exist in the source note seems hardly a fault.

But I do appreciate your input. Anyone else?

I wrote "span" (the HTML links), not "spam". I prefer inline links using <ref>; you get a cleaner profile (in my opinion).

The problems I see with the GEDCOM import:

  1. The text is not formatted with carriage returns etc.
  2. You have "Source: #S10 Page" and then you need to look up #S10 to find the information, which is confusing
  3. A nice profile also has the images uploaded to WikiTree and included in the text (in my opinion), but I guess the JPG references above are Ancestry.com pictures
  4. I prefer 1 well-written profile over 2000 GEDCOM-imported ones
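
Point 2 could, for example, be handled by expanding each pointer into an inline <ref>. The sketch below is a guess at how that might look, not how the WikiTree import works: the function name `inline_sources` and the `sources` dictionary are hypothetical, and the pointer format "#S10" is simply taken from the example quoted above.

```python
import re

# Matches a pointer such as "#S10" as it appears in the quoted example.
SOURCE_POINTER = re.compile(r"#(S\d+)")


def inline_sources(text: str, sources: dict[str, str]) -> str:
    """Replace each '#S<n>' pointer with a <ref> holding the full source text.

    Pointers that have no entry in `sources` are left untouched.
    """
    def expand(match: re.Match) -> str:
        full = sources.get(match.group(1))
        return f"<ref>{full}</ref>" if full else match.group(0)

    return SOURCE_POINTER.sub(expand, text)
```

The reader then sees the source description where it is cited and no longer has to search the profile for the matching id.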

Yes, genealogy is a never-ending process, and trying to get structure into all profiles on WikiTree feels like a Sisyphean task...

6 Answers

+19 votes
 
Best answer
Sounds like it would be easy to implement.

Wish I could mark your question as the best answer ;-)
by Eva Ekeblad G2G6 Pilot (573k points)
selected by Living Sälgö
+4 votes
I know that I have a lot of GEDCOMs that need work, but I'm having a hard time finding them. I really liked the new error 831 (multiple duplicate lines). That was a way for me to do more research and get better sources. Then the next week the profiles that I had worked on the week before were shown as having duplicate lines, because I had several different years of census. Is there something I'm missing that would make it easier for me to find the profiles that I need to work on?
by Sherry Wells G2G6 Mach 1 (18.7k points)
On another thread

https://www.wikitree.com/g2g/319482/database-errors-are-out-of-control

(scroll all the way to the bottom) he says he is going to modify 831 so it is less likely to trigger on multiple census lines.

@Janet 

>> modify 831 so it is less likely to trigger on multiple census lines.

For Swedish censuses too?

I think this is an area that also needs to be changed, but it should not be done in the Database Error project. Instead, sources should be marked as sources in WikiTree. If you do your genealogy by the book, you always transcribe the source(s) ==> you will have more duplicates, and the Database Error project can't keep every possible transcription text as an exception.

Structured data is when you add things not as free text but as data with a meaning, i.e.

  • a birth date as a date
  • a census as a block of text with attributes such as
    • census year
    • quality
    • unique id of census
    • trustworthiness 
  • location not as text but as an object with a location and a name and maybe also between dates, GPS location....
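
As a rough illustration of the list above, structured data could look like the Python sketch below. All class and field names are hypothetical and chosen only to mirror the bullets; nothing here reflects how WikiTree actually stores profiles.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Location:
    """A place as an object rather than free text (fields are illustrative)."""
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    valid_from: Optional[date] = None  # the "between dates" idea
    valid_to: Optional[date] = None


@dataclass
class CensusRecord:
    census_id: str        # unique id of the census entry
    year: int             # census year
    transcription: str    # the transcribed block of text
    quality: str
    trustworthiness: str
    place: Optional[Location] = None


@dataclass
class Person:
    name: str
    birth_date: Optional[date] = None  # a birth date as a date
    censuses: list[CensusRecord] = field(default_factory=list)


def duplicate_census_ids(person: Person) -> list[str]:
    """Return census ids that are recorded more than once on one person."""
    counts = Counter(record.census_id for record in person.censuses)
    return [census_id for census_id, n in counts.items() if n > 1]
```

With data like this, a duplicate check can compare census ids instead of raw text, so transcriptions from several census years would not look like duplicate lines.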
+3 votes
I'm giving a vote of agreement to this suggestion too. I have over 30,000 people in my database and plan to just keep uploading a hundred or so at a time so I can keep on top of the cleanup. But it is so easy to miss profiles when I have to randomly search for which profiles I haven't yet cleaned. I've been using the date of upload in my watchlist to help me. But as my watchlist of people that I want to continue managing gets bigger, that doesn't work easily either. IMHO GEDCOM upload is still the best option on WikiTree; creating all those profiles individually is too daunting to contemplate. Happy to hear/discuss alternative suggestions, but if we want to improve the quality of profiles, while getting more data on WikiTree, we need to sort this out.
by Gillian Thomas G2G6 Pilot (266k points)
+3 votes
Definitely not a bad idea. Gives a chance to get them first hand with the provider of the Gedcom close by.
by Jon Czarowitz G2G6 Mach 4 (44.7k points)
+2 votes

The idea is a sound one, but for one thing: it does not reduce the amount of work to be done. Someone still has to do the cleaning.

The GEDCOM import function does a lot of things incredibly well, but the generated biography sucks (in spades). Some effort should be put in there too: it will make everybody's job easier.

Instead of trying to put everything that was in the GEDCOM file into the biography, there should be a form for a decent WikiTree biography where certain fields could be filled in. The rest of the GEDCOM should just be ignored.

For example, FamilySearch URLs are easy to spot. Keep them. User and profile numbers from mycromagnon.com should just be left out.
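
A minimal sketch of that kind of filter, assuming the URLs appear as plain text in the GEDCOM. The function names and the allow-list are hypothetical, and the real import code may work quite differently:

```python
import re
from urllib.parse import urlparse

# Anything that looks like a web address in the GEDCOM text.
URL = re.compile(r"https?://\S+")

# Hypothetical allow-list: only links to these sites are kept.
KEEP_DOMAINS = ("familysearch.org",)


def is_wanted(url: str) -> bool:
    """True if the URL points at an allow-listed site or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in KEEP_DOMAINS)


def useful_urls(gedcom_text: str) -> list[str]:
    """Return the URLs worth carrying over into the biography."""
    return [u for u in URL.findall(gedcom_text) if is_wanted(u)]
```

An allow-list keeps the rule simple: anything not explicitly named, such as the user and profile numbers mentioned above, is left out by default.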

by Dirk Laurie G2G6 Mach 3 (39.4k points)
+2 votes
I think a lot of these bad GEDCOM profiles can be found in errors 802, 803, 901 and 902.

But having a quick way to find them is always an improvement!
by Laura Bozzay G2G6 Pilot (833k points)


...