Challenge of the week: Clean up GEDCOM-generated data

+9 votes

Hi WikiTreers,

Will you join our "Data Doctor" Challenge of the week?

Once again, Aleš has come up with something to help us work together on a category of profiles that need our help. This time it's GEDCOM-generated data that needs to be reviewed and cleaned up.

Here is the list of profiles that could use some TLC.

Will you join us?

Every time you record a status update it will earn you a point. The member with the most points at 11:59pm EDT on Sunday night will get the Winner badge this week and the bragging rights. But we'll all benefit from a neater, cleaner shared tree.

If you're participating, please post here to let us know. It's nice to cheer each other on. Or post if you have any questions about how to participate.

Thanks for helping!

P.S. If you want to chat or coordinate what you are working on with others, in addition to this G2G post there is a handy spreadsheet courtesy of Steven Tibbetts. 

Real-Time Tracking Stats

in The Tree House by Eowyn Walker G2G Astronaut (1.6m points)
reshown by Aleš Trtnik
Challenge is active.
I'll spend some time on this.

Scanning the "851 GEDCOM uncleaned Interpret date" list, I have a suggestion. There are dozens of death dates given as "DECEASED". Someone should train our GEDCOM import to interpret this as it's obviously meant. It seems like something that could also be done after the fact (i.e., on existing profiles).
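To make the suggestion concrete, here is a rough sketch of what "interpreting" the placeholder could mean. It is not WikiTree's import code; the function name, the `DeathInfo` type, and the idea of a separate "is deceased" flag are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class DeathInfo(NamedTuple):
    """Hypothetical result type, not a WikiTree data structure."""
    date_text: Optional[str]  # the date as written, or None if no date was given
    is_deceased: bool         # True when the field indicates the person has died


# Only the placeholder mentioned in this thread; others would need review.
PLACEHOLDERS = {"DECEASED"}


def interpret_death_field(raw: str) -> DeathInfo:
    """Treat the word DECEASED as 'died, date unknown' instead of a date."""
    value = raw.strip()
    if not value:
        return DeathInfo(None, False)  # empty field: nothing is known
    if value.upper() in PLACEHOLDERS:
        return DeathInfo(None, True)   # placeholder: died, but no date
    return DeathInfo(value, True)      # anything else is kept as the date text
```

The same function could in principle be run over existing profiles as well as new imports, since it only looks at the text of the death date field.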

The text DECEASED seems to be on a little over 100 profiles out of several million. That doesn't make it very common.

I'm also going to help.

22 Answers

+7 votes

Hello! I will work on some of these this week.

Missy :)

by Missy Berryann G2G6 Mach 6 (67.8k points)
+8 votes
This stuff is greatly annoying, so I'm ready to scrub profiles.
by Charles Avis G2G6 Mach 1 (12.1k points)
+7 votes
Cleaning junk? I'm in!
by Kathy Zipperer G2G6 Pilot (266k points)
+6 votes
Scottish ones again for me
by Sheena Tait G2G6 Mach 5 (54.8k points)
+5 votes
I will participate.
by Carolyn Adams G2G6 Mach 4 (41.1k points)
+3 votes
Will get some done as I can.
by Kandita Post G2G6 Mach 1 (12.2k points)
+3 votes
I'll chip in and get some done too.
by A Hayes G2G6 Mach 1 (10.8k points)
+4 votes
I'll join in for a good clean up!
by Lyn Sara Gulbransen G2G6 Mach 3 (37.1k points)
+3 votes
I'll tackle a couple of these
by Emily Holmberg G2G6 Mach 7 (75.6k points)
+3 votes
Hello, I am cleaning up the profiles my GEDCOM generated in 2012 whenever I get notice of them. The GEDCOM and the PAF it comes from no longer exist, and the trees in it have no sources, so all references to it should be removed.
by Edwin Reffell G2G5 (5.3k points)
+3 votes
Yes, I'd like to help.
by Kelly Stadelbauer G2G2 (2.1k points)

I don't see myself on the participant list -- is there something else I should have done to sign up? Not that the competition aspect is important to me ;)

I think I just figured it out -- I didn't update the status of each profile. All's well.
+3 votes
I'll try to do a few. I worry that I might offend someone by deleting a GEDCOM user ID or something else they thought was somehow valuable.
by Kelly O'Hair G2G2 (2.7k points)
Would we agree that GEDCOM user IDs are useless to WikiTree? I always delete them.

What I do leave on is the source of the original GEDCOM.  It helps us to know where it came from, and those with memberships in Ancestry can look at originals if open.

The GEDCOM UserID is worthless; however, be careful if you see FSFTID [FamilySearch Family Tree ID] or ANCESTRYID [profile ID], as these are legitimate identification numbers for profiles that may have valid sources and family information you can transfer to WikiTree.

WikiTreeX is an app on WikiTree that you can use to update a WikiTree profile or add parents, children and spouse to WikiTree along with the facts from the sources on the profile from  It is a real time saver.

For this challenge, do not remove these IDs - you can move them to the source section.


Why not move to the research notes or acknowledgements?

The Sources section is the recommendation in the Help pages and the Data Doctors Project. The Suggestion 853 GEDCOM Junk page, under Technical Stuff, lists the headings that can be removed, as well as duplicative or "junk" info under the headings. There is also a video, which is very good.

For example, if there is a heading ==Birth== and the next line says Birth, you can remove that since the heading is already there.
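For anyone who likes to see a rule spelled out, the ==Birth== example can be sketched in a few lines of Python. This is only an illustration of the idea, not a Data Doctors tool; the function name is made up, and skipping blank lines and a trailing colon are my own assumptions.

```python
import re

# Matches a wiki-style heading line such as "==Birth==" or "== Birth ==".
HEADING = re.compile(r"^==\s*(.+?)\s*==$")


def drop_repeated_heading_lines(text: str) -> str:
    """Remove a line that only repeats the heading directly above it."""
    cleaned = []
    last_heading = None  # lowercased title of the heading just seen, if any
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            last_heading = match.group(1).lower()
            cleaned.append(line)
        elif stripped == "":
            cleaned.append(line)  # assumption: blank lines do not reset the heading
        elif last_heading is not None and stripped.rstrip(":").lower() == last_heading:
            last_heading = None   # drop the duplicate line, once
        else:
            last_heading = None   # real content follows the heading; keep it
            cleaned.append(line)
    return "\n".join(cleaned)
```

Given a biography containing "==Birth==" followed by a line that just says "Birth", the function keeps the heading and drops the repeated line, leaving everything else alone.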

Thank you, Sheryl, for posting the link to the video; it was very informative.
You are very welcome. I am glad you liked it.

Please feel free to email/PM me if you have any further questions. It helps if you let me know about any heading you see that isn't on the list, so it can be reviewed and added.

The DD Project is working on documentation and a form for making requests; suggestions are welcome. Email/PM me. I will be posting the form soon.
I am finding MH ID numbers also. I have been leaving them on profiles. I have not heard what to do with the MH ID numbers, so I am treating them like the others.
Thank you for posting.  I will email you.
With something like MyHeritage numbers where there are no real sources, I suggest adding NO SOURCES above the biography. I forget what the format is.

Hi, Judy,

This post is more about GEDCOM junk and what can be removed. The MHID number should stay on the profile, as the MyHeritage profile [with a subscription] may lead to reliable sources.

You are correct - if there are no sources, the {{Unsourced}} template would be put on the profile above the Biography heading.
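As a rough illustration of where the template goes, here is a small Python sketch. It is not part of any WikiTree tool; the function name is made up, and putting the template at the very top when there is no Biography heading is my own assumption.

```python
def add_unsourced_template(profile_text: str) -> str:
    """Insert {{Unsourced}} on its own line above the Biography heading."""
    template = "{{Unsourced}}"
    if template in profile_text:
        return profile_text  # already tagged; avoid adding it twice
    lines = profile_text.splitlines()
    for index, line in enumerate(lines):
        # Accept both "==Biography==" and "== Biography ==".
        if line.replace(" ", "").lower() == "==biography==":
            lines.insert(index, template)
            return "\n".join(lines)
    # Assumption: with no Biography heading, put the template at the top.
    return "\n".join([template] + lines)
```

Running it on a profile that starts with "== Biography ==" gives the same text with {{Unsourced}} on the line above that heading.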

+2 votes
I am in.  Donna
by Donna Graves G2G1 (1.3k points)
+2 votes
I have already started helping with a Connelly tree, but didn't look for credit on it.

May find others to improve.
by Joanna Gariepy G2G4 (4.9k points)
+3 votes
Are there instructions on how to clean one? What should a cleaned one look like?
by Cheryl Cunningham G2G5 (5.9k points)
I may be stepping on toes, but I get rid of all the unnecessary headers leaving: Biography, Research Notes, Sources, and Acknowledgements.  Then I move the GEDCOM reference to the Acknowledgements, fill in the Bio from data available, put any contradictory data in Research Notes, and, if there are no sources other than the GEDCOM, add Category: Sources Needed.

How does that sound?

Please take a look at my reply to Judy's question:

That will show you where to look for instructions. Please feel free to PM me if you need additional assistance.

Thank you, Judy. There is so much to learn and I want to make these things look better. I appreciate your help.
Umm, I replied to you - Sheryl - I did put Judy's name in my reply so I apologize for the confusion.

Feel free to email me if you have questions.
+3 votes
I'll try to spend some time on this. I always enjoy seeing a cleaned-up profile when I am done. I'm working on bad dates. Oops, I can't remember the code for NO SOURCES.

by Judy Bramlage G2G6 Mach 9 (97.8k points)
edited by Judy Bramlage
+3 votes
I'll have a go but time is very limited this weekend
by Anon Cormack G2G6 (9.7k points)
+2 votes

I will work on cleaning up some of the many that I have created :( It is a goal for me this year to clean up my profiles.

by Beth Blankenship G2G1 (1.7k points)
+1 vote

I watched the video and I can generate the list of profiles I manage that have GEDCOM junk, but how do I get the table of suggestions so that I can change the status of the ones I work on?

So I figured out how to get suggestions, but not for GEDCOM junk. I can't get any for the ones on my list :( Can someone please point me to the instructions? Thanks! In the meantime, I'm finding lots of things to clean up . . .

by Beth Blankenship G2G1 (1.7k points)
edited by Beth Blankenship
+1 vote
I'll do a couple.
by Tracy Frayne G2G1 (1.2k points)
