Challenge of the week: Clean up GEDCOM-generated data [closed]

+11 votes

Hi WikiTreers,

Will you join our "Data Doctor" Challenge of the week?

Once again, Aleš has come up with something to help us work together on a category of profiles that need our help. This time it's GEDCOM-generated data that needs to be reviewed and cleaned up.

Here is the list of profiles that could use some TLC.

Will you join us?

Every time you record a status update it will earn you a point. The member with the most points at 11:59pm EDT on Sunday night will get the Winner badge this week and the bragging rights. But we'll all benefit from a neater, cleaner shared tree.

If you're participating, please post here to let us know. It's nice to cheer each other on. Or post if you have any questions about how to participate.

Thanks for helping!

P.S. If you want to chat or coordinate what you are working on with others, in addition to this G2G post there is a handy spreadsheet courtesy of Steven Tibbetts. 

Real-Time Tracking Stats

Top 10

closed with the note: Challenge is finished
in The Tree House by Eowyn Langholf G2G Astronaut (2.7m points)
closed by Eowyn Langholf
Challenge is active.
I'll spend some time on this.

Scanning the "851 GEDCOM uncleaned Interpret date" list, I have a suggestion. There are dozens of death dates given as "DECEASED". Someone should train our GEDCOM import to interpret this as it's obviously meant. It seems like something that could also be done after the fact (i.e., on existing profiles).
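To make the suggestion concrete, here is a rough sketch of how an importer might treat a death "date" of DECEASED. This is not how WikiTree's GEDCOM import actually works; the names `DECEASED_MARKERS`, `DeathInfo`, and `interpret_death_date` are made up for illustration, and the only placeholder value assumed is the one mentioned above.

```python
from typing import NamedTuple, Optional

# Hypothetical set of placeholder values; the thread only mentions "DECEASED".
DECEASED_MARKERS = {"DECEASED"}


class DeathInfo(NamedTuple):
    is_deceased: bool          # True if the record says the person has died
    date_text: Optional[str]   # the date string to keep, or None if there is none


def interpret_death_date(raw: str) -> DeathInfo:
    """Treat a placeholder like DECEASED as 'died, date unknown'."""
    value = raw.strip()
    if value.upper() in DECEASED_MARKERS:
        return DeathInfo(True, None)
    if not value:
        # An empty value tells us nothing either way.
        return DeathInfo(False, None)
    # Anything else is passed through unchanged for normal date handling.
    return DeathInfo(True, value)
```

The idea is simply that the placeholder never ends up in the date field, while the "this person has died" fact is kept.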

The text DECEASED seems to be on a little over 100 profiles out of several million. That doesn't make it very common.

I'm also going to help.
I will do a few

23 Answers

+9 votes

Hello! I will work on some of these this week.

Missy :)

by Missy Berryann G2G6 Pilot (238k points)
+10 votes
This stuff is greatly annoying, so I'm ready to scrub profiles.
by Charles Avis G2G6 Mach 4 (43.6k points)
+9 votes
Cleaning junk? I'm in!
by Kathy Zipperer G2G6 Pilot (516k points)
+8 votes
Scottish ones again for me
by Sheena Tait G2G6 Pilot (145k points)
+7 votes
I will participate.
by Carolyn Adams G2G6 Mach 9 (97.9k points)
+5 votes
Will get some done as I can.
by Kandita Post G2G6 Mach 5 (50.4k points)
+5 votes
I'll chip in and get some done too.
by AM Hayes G2G6 Mach 2 (25.0k points)
+6 votes
I'll join in for a good clean up!
by Lyn Gulbransen G2G6 Mach 5 (51.6k points)
+5 votes
I'll tackle a couple of these
by Emily Holmberg G2G6 Pilot (174k points)
+5 votes
Hello, I am cleaning up the profiles my GEDCOM generated in 2012 whenever I get notice of them. The GEDCOM and the PAF it comes from no longer exist, and the trees in it have no sources, so all references to it should be removed.
by Edwin Reffell G2G6 (7.0k points)
+5 votes
Yes, I'd like to help.
by Anonymous Stadelbauer G2G3 (3.8k points)

I don't see myself on the participant list -- is there something else I should have done to sign up? Not that the competition aspect is important to me ;)

I think I just figured it out -- I didn't update the status of each profile. All's well.
+5 votes
I'll try to do a few. I worry that I might offend someone by deleting some GEDCOM user ID or something they thought was somehow valuable.
by Kelly O'Hair G2G4 (4.9k points)
Would we agree that GEDCOM user IDs are useless to WikiTree? I always delete them.

What I do leave on is the source of the original GEDCOM.  It helps us to know where it came from, and those with memberships in Ancestry can look at originals if open.

The GEDCOM UserID is worthless; however, be careful if you see FSFTID [FamilySearch Family Tree ID] or ANCESTRYID [Ancestry profile ID], as these are legitimate identification numbers for profiles that may have valid sources and family information you can transfer to WikiTree.

WikiTreeX is an app on WikiTree that you can use to update a WikiTree profile or add parents, children and spouse to WikiTree along with the facts from the sources on the profile from  It is a real time saver.

For this challenge, do not remove these IDs - you can move them to the source section.


Why not move to the research notes or acknowledgements?

The Sources section is the recommendation in the Help pages and the Data Doctors Project. The "Suggestion 853 GEDCOM Junk" page under Technical Stuff lists the headings that can be removed, as well as duplicative or "junk" info under those headings. There is also a video which is very good.

For example, if there is a heading ==Birth== and the next line says Birth, you can remove that since the heading is already there.
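For anyone who likes to see this spelled out, here is a small sketch of that one rule: drop the first non-blank line under a heading when it only repeats the heading text. It is an illustration, not a WikiTree tool; the function name `drop_repeated_heading_lines` is made up, and treating a trailing colon ("Birth:") as a repeat is an assumption on my part.

```python
import re

# Matches wiki-style headings such as ==Birth== or === Sources ===.
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def drop_repeated_heading_lines(text: str) -> str:
    """Remove the first non-blank line after a heading if it just repeats it."""
    out = []
    last_heading = None  # heading text still waiting for its first non-blank line
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            last_heading = match.group(2).strip().lower()
            out.append(line)
            continue
        stripped = line.strip()
        if last_heading is not None and stripped:
            is_repeat = stripped.rstrip(":").lower() == last_heading
            last_heading = None
            if is_repeat:
                continue  # skip the line that only repeats the heading
        out.append(line)
    return "\n".join(out)
```

Only the line directly under the heading is checked, so a later sentence that happens to start with "Birth" is left alone. Always compare the result by eye before saving a profile.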

Thank you Sheryl for putting the link for the video, it was very informative.
You are very welcome.  I am glad you liked it.  

Please feel free to email/PM me if you have any further questions.  It does help if you see another heading that isn't on the list to be reviewed and added.

The DD Project is working on documentation and a form for making requests, and suggestions are welcome. Email/PM me. I will be posting the form soon.
I am finding MH ID numbers also. I have been leaving them on profiles. I have not heard of what to do with the MH ID numbers, so I am treating them like the others.
Thank you for posting.  I will email you.
With something like MyHeritage numbers where there are no real sources, I suggest adding NO SOURCES above the biography. I forget what the format is.

Hi, Judy,

This post is more about GEDCOM junk and what can be removed.  The MHID number should stay on the profile as the profile [and a subscription] to MyHeritage may lead to reliable sources.

You are correct - if there are no sources, the {{Unsourced}} template would be put on the profile above the Biography heading.

+4 votes
I am in.  Donna
by Donna Michelstetter G2G6 (7.1k points)
+4 votes
I have already started helping with a Connelly tree, but didn't look for credit on it.

May find others to improve.
by Joanna Gariepy G2G6 Mach 1 (15.5k points)
+5 votes
Are there instructions on how to clean one? What should a cleaned one look like?
by Cheryl Cunningham G2G6 (9.9k points)
I may be stepping on toes, but I get rid of all the unnecessary headers leaving: Biography, Research Notes, Sources, and Acknowledgements.  Then I move the GEDCOM reference to the Acknowledgements, fill in the Bio from data available, put any contradictory data in Research Notes, and, if there are no sources other than the GEDCOM, add Category: Sources Needed.

How does that sound?

Please take a look at my reply to Judy's question:

That will give you where to look for instructions.  Please feel free to PM me if you need additional assistance.

Thank you Judy. There is so much to learn and I want to make these things look better. I appreciate your help.
Umm, I replied to you - Sheryl - I did put Judy's name in my reply so I apologize for the confusion.

Feel free to email me if you have questions.
Can I take advantage of this conversation to ask for general advice? I'm afraid of making mistakes so I just leave GEDCOM text where it is, but it would be better to learn more. The video helped a lot. But I don't see anything about Ancestry junk. Maybe it's not all junk? (Example below.) Are we supposed to keep the number in brackets? Are we supposed to keep the link? Sometimes an Ancestry record is not findable on FamilySearch, so in that case is it better to keep it? Thanks for any clues.

=== Source ===

: Source: [[#S1217571077]]

:: Page:  Ancestry Family Trees

:: Note:  

:: Data:  

::: Text:

=== Sources ===

: Source <span id='S1217571077'>S1217571077</span>

: Repository: [[#R1217476885]]

: Title:  Ancestry Family Trees

: Publication:  Online publication - Provo, UT, USA:  Original data:  Family Tree files submitted by Ancestry members.

: Note:  This information comes from 1 or more individual Ancestry Family Tree files. This source citation points you to a current version of those files.  Note:  The owners of these tree files may have removed or changed information since this source citation was created.

No REPO record found with id R1217476885.
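One thing in an example like this can be checked mechanically: whether each `[[#...]]` reference actually points at an anchor defined somewhere on the profile. Below is a rough sketch of that check, assuming anchors are written as `<span id='...'>` the way they are in the example. The function name `unresolved_refs` is made up, and this says nothing about whether the source itself is worth keeping.

```python
import re
from typing import List

# Internal references look like [[#S1217571077]]; anchors like <span id='S1217571077'>.
REF_RE = re.compile(r"\[\[#([A-Za-z0-9_]+)\]\]")
ANCHOR_RE = re.compile(r"<span\s+id=['\"]([A-Za-z0-9_]+)['\"]")


def unresolved_refs(text: str) -> List[str]:
    """Return referenced IDs that have no matching anchor, in order of appearance."""
    defined = set(ANCHOR_RE.findall(text))
    missing: List[str] = []
    for ref in REF_RE.findall(text):
        if ref not in defined and ref not in missing:
            missing.append(ref)
    return missing
```

Run on the example above, this would flag only the repository ID, which matches the "No REPO record found" line the import left behind.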
Will send you a PM.
+5 votes
I'll try to spend some time on this. I always enjoy seeing a cleaned up profile when I am done. I'm working on bad dates. Oops. I can't remember the code for NO SOURCES.

by Judy Bramlage G2G6 Pilot (267k points)
edited by Judy Bramlage
+5 votes
I'll have a go but time is very limited this weekend
by Anon Sharkey G2G6 Pilot (173k points)
+4 votes

I will work on cleaning up some of the many that I have created :(. It is a goal for me this year to clean up my profiles.

by Beth Blankenship G2G2 (2.6k points)
+3 votes

I watched the video and I can generate the list of profiles I manage that have GEDCOM junk, but how do I get the table of suggestions so that I can change the status of the ones I work on?

So I figured out how to get suggestions, but not for GEDCOM junk. I can't get any for the ones on my list :(. Can someone please point me to the instructions? Thanks! In the meantime, I'm finding lots of things to clean up . . .

by Beth Blankenship G2G2 (2.6k points)
edited by Beth Blankenship
+3 votes
I'll do a couple.
by Tracy Frayne G2G6 Mach 3 (37.3k points)

