Automatic GEDCOM Cleanup

Are you importing a GEDCOM or trying to clean up existing profiles created from a GEDCOM?

I have developed a free Chrome extension that will reformat the profile text created by GEDCOMpare (and some earlier GEDCOM import methods) into a chronological narrative.

It has been beta tested and currently has about 20 users. I am inviting anyone working on GEDCOM created profiles to give it a try.

I'm happy to hear suggestions for improvements :)

Details of the extension are here: https://www.wikitree.com/index.php?title=Space:WikiTree_AGC

Thanks!

WikiTree profile: Space:WikiTree_AGC
in The Tree House by Rob Pavey G2G6 Pilot (207k points)

Please see this problem that has just been reported. Is this actually something that was done by WikiTreeAGC, or was this a user-generated problem?

16 Answers

I'll give it a try.
by Marcie Ruiz G2G6 Mach 5 (59.8k points)
Looks interesting- will give it a try
by Shirley Gilbert G2G6 Mach 6 (66.4k points)

This is a nifty extension.

Here's an edit on a profile I adopted that was assisted by the extension (the edit also includes a couple of categories that I added manually).

https://www.wikitree.com/index.php?title=Webb-3023&diff=110019634&oldid=7501963

Thanks for building this, Rob!

by E. Compton G2G6 Pilot (194k points)

Thanks for sharing that. This is the older format, where the fact sections are "===" headings rather than bold titles. Your share actually showed me a bug: I was not recognizing "=== Burial ===" as a burial (the newer format uses '''Buried''', so I was looking for that word). If it had been recognized, it would have been put after the death. I will fix that in the next version.
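The fix described here comes down to mapping the heading words from both formats onto one event type before ordering the facts. A minimal Python sketch, assuming a hypothetical `HEADING_TO_EVENT` table and `EVENT_ORDER` ranking (neither is taken from the extension's source):

```python
# Hypothetical table: heading word in either GEDCOM format -> event type.
# Old imports use "=== Burial ===", newer ones use "'''Buried'''".
HEADING_TO_EVENT = {
    "birth": "birth", "born": "birth",
    "christening": "baptism", "baptism": "baptism",
    "marriage": "marriage",
    "residence": "residence",
    "death": "death", "died": "death",
    "burial": "burial", "buried": "burial",
}

# Assumed chronological rank used when building the narrative.
EVENT_ORDER = ["birth", "baptism", "marriage", "residence", "death", "burial"]


def event_type(heading):
    """Map '=== Burial ===' or "'''Buried'''" to an event type, or None."""
    word = heading.strip().strip("=' ").lower()
    return HEADING_TO_EVENT.get(word)


def chronological(events):
    """Sort event types by assumed rank; unrecognized types go last."""
    rank = {name: i for i, name in enumerate(EVENT_ORDER)}
    return sorted(events, key=lambda e: rank.get(e, len(EVENT_ORDER)))


print(event_type("=== Burial ==="))                 # burial
print(event_type("'''Buried'''"))                   # burial
print(chronological(["burial", "birth", "death"]))  # ['birth', 'death', 'burial']
```

Once both spellings resolve to the same event type, the burial lands after the death regardless of which import format produced it.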

That particular example doesn't have much in the way of sources so it doesn't demonstrate how those are handled. Here are the changes that the extension made on a fairly simple profile that I added from my GEDCOM today:

https://www.wikitree.com/index.php?title=Badger-1133&diff=110004928&oldid=110004893

"This is the older format where the fact sections are "===" headings rather than bold titles." - does this mean that we should not be using this style?

I'm talking about the older format that the GEDCOM import used to create before 2017 or so. 

The older format had headings like:

=== Birth ===
Birth facts here
=== Death ===
All death facts here
=== Marriage ===
All facts for all marriage here
=== Residence ===
All facts for all residences here
etc.

While the newer format (2017ish to present) uses:
'''Born'''
Birth facts here
'''Died'''
All death facts here
'''Marriage'''
All facts for all marriages here
'''Residence'''
All facts for all residences here
etc.
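Both layouts put one heading per line followed by its facts, so a single pass can split either one into sections. A rough Python sketch, assuming each heading sits alone on its own line as in the examples above (real imports may vary); `split_sections` is a hypothetical helper:

```python
import re

# Heading in the old format ("=== Birth ===") or the newer one ("'''Born'''").
HEADING_RE = re.compile(r"^\s*(?:===\s*(.+?)\s*===|'''(.+?)''')\s*$")


def split_sections(text):
    """Split GEDCOM-created bio text into (heading, [fact lines]) pairs.

    Lines before the first recognized heading are collected under None.
    """
    sections = []
    current_heading, current_lines = None, []
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            # Close off the previous section before starting a new one.
            if current_heading is not None or current_lines:
                sections.append((current_heading, current_lines))
            current_heading = (m.group(1) or m.group(2)).strip()
            current_lines = []
        elif line.strip():
            current_lines.append(line.strip())
    if current_heading is not None or current_lines:
        sections.append((current_heading, current_lines))
    return sections


old = "=== Birth ===\nBirth facts here\n=== Death ===\nAll death facts here"
print(split_sections(old))
# [('Birth', ['Birth facts here']), ('Death', ['All death facts here'])]
```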

Your question "does this mean that we should not be using this style?" is presumably about biographies you are writing by hand. I'm not trying to tell anyone how they should write their profiles. Personally, I do not think you should use either of those styles exactly, since they do not conform to the style guide for biographies: https://www.wikitree.com/wiki/Help:Biographies

I would use that as your guide to the style to use :)

Got it. I have seen both styles in GEDCOM imports that need cleaning. I will check out the link for writing biographies. Thank you!

I do like the new extension. Unfortunately, Chrome doesn't work very well for me. I use Firefox and would love to have an extension for that some day.
I vote for that as well. I don't like Chrome because of the usage tracking it does for Google.

I am also offended by popups that include the phrases: "For You" and "You Deserve."

rsl

Using headings in bold or in the ===XXX=== format produces a profile that looks similar, but using "===" on the headings also creates a table of contents at the top.

When you are editing the profile in the edit version, it also makes finding the sections easier.

I guess if you want to only work in the text editing mode for research, fine, but I personally prefer the look of the Public Profile (left green box) and try to tailor my biographies to look good in that format, with a table of contents on the top, and in-line references on the bottom with the other sources.

And I'm just the opposite: I'll do almost anything NOT to have a table of contents, which pushes the start of the biography beyond the fold (onto the 2nd screen). I especially hate it when a biography that contains only the vitals (BMD), and so could be written in one paragraph, has a table of contents.
I'm glad I could help with improving the extension, Rob.
Sounds and looks pretty cool. Will there be a version for Firefox as well? Maybe a bookmarklet would also be an idea...
by Florian Straub G2G6 Pilot (197k points)
Thanks for the suggestions. I just read up on bookmarklets, interesting!

Once I get to the point that I'm not doing an update every day or two I will investigate that. It might allow people using iPads to use it.
I'm very impressed with how well this AGC extension transforms ugly GEDCOM imports with its reams of redundant text and meaningless source numbers into something that is very readable without losing any of the essential information.

Rob is also very responsive to including new ideas and fixes. Originally this extension only handled Ancestry GEDCOM imports - now it seems to be coping with any type.
by Jo Fitz-Henry G2G6 Pilot (171k points)

HELP

I am new to using Chrome and don't know how to make this extension do its thing. I have it pinned to the toolbar.

by David Dodd G2G6 Mach 3 (34.1k points)

Hi David,

It sounds like you have it installed OK. There is actually no need to pin it to the toolbar, because this extension does not use the popup menu that pinning gives you access to.

What the extension does is make a new button appear in the WikiTree page itself, just above the biography text area, but only when you are in edit mode on the profile. See this page for more explanation.

https://www.wikitree.com/index.php?title=Space:WikiTree_AGC&public=1#What_does_the_extension_do.3F

Search in that page for this text "If the user goes into edit mode they will see a new button on left hand end of the toolbar above the biography".

I hope that helps.

All I can say is WOW.

If there are conflicts in data like birth, it creates a == Research Notes == section with 'Issues to be resolved'.

If you are a Data Doctor and work on GEDCOM junk errors, try it.
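The conflict check described above could work roughly like this: collect every value found for a fact and flag any fact with more than one distinct value. This Python sketch is hypothetical; the input dictionary and the exact wording of the generated section are assumptions, not the extension's actual output.

```python
def research_notes(facts):
    """Build a Research Notes section listing facts with conflicting values.

    `facts` maps a fact name (e.g. "Birth") to every value found for it.
    Returns "" when there is nothing to resolve.
    """
    issues = []
    for name, values in facts.items():
        # Keep first-seen order while dropping exact duplicates.
        distinct = list(dict.fromkeys(v.strip() for v in values))
        if len(distinct) > 1:
            issues.append(f"* {name}: " + " or ".join(distinct))
    if not issues:
        return ""
    return "\n".join(["== Research Notes ==", "Issues to be resolved:"] + issues)


print(research_notes({"Birth": ["1 Jan 1850", "1850", "1 Jan 1850"], "Death": ["1900"]}))
```

Here only the birth is flagged, because the death has a single value.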


Thank you Rob for making this extension.

I am working on 811 cleaning merges. So I cleaned the merge and then clicked that button. It was not a bad profile but this helped so quickly to get rid of the junk and make it easily readable. I am impressed. A real time saver!

Also, I am older and work from a desktop. I am not familiar with computer workings, extensions, apps, etc., and out of fear of messing up my computer, I almost didn't add the extension. But it was an easy click and done, with no download or run/save step. That is important for those of us who are not computer savvy to know.

Again Thanks

Terry

https://www.wikitree.com/wiki/Calvert-480

by Terry Fillow G2G6 Mach 8 (81.9k points)
Is this only for Ancestry? Not sure how to identify a Gedcom from Ancestry.
No it doesn't have to be from Ancestry. I initially developed it for my GEDCOM I am importing from Ancestry but it now works for most GEDCOM created profiles. If you find anything it doesn't handle well please let me know.
Will do!  Thanks

Space:WikiTree_AGC works great. All you have to do is a little profile cleanup after using this extension. It makes cleaning up GEDCOM-created profiles so much easier and less time-consuming. Everyone should try this extension out.

by Keith Mann Spencer G2G6 Mach 3 (31.3k points)

Well done Rob - tested on https://www.wikitree.com/wiki/Eddy-781 & all good

Next tried https://www.wikitree.com/wiki/Biddick-9 & after clicking on button - no reaction

The two profiles both started with 

== Biography ==

''This biography is a rough draft. It was auto-generated by a GEDCOM import and needs to be edited.''

The Eddy profile then had a Birth section; Biddick had a Christening.
Most of the other GEDCOM content was similar.
by Roger Davey G2G6 Mach 3 (36.2k points)
The second profile did not have any sources, only the user ID, so it will not remove them, as they are easy to remove if required.

It cleans up those with sources that show up several times as the same one is attached to name, birth, census, residence as an individual source.
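Collapsing a source that was attached separately to the name, birth, census and residence facts amounts to keeping only its first occurrence. A hypothetical Python sketch, assuming two citations count as the same when they differ only in case and whitespace (the extension's real matching rule is not described here):

```python
import re


def dedupe_sources(source_lines):
    """Keep the first occurrence of each source citation.

    Citations are compared ignoring case and runs of whitespace.
    """
    seen = set()
    unique = []
    for line in source_lines:
        key = re.sub(r"\s+", " ", line).strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(line.strip())
    return unique


print(dedupe_sources([
    "1881 England Census",
    "1881  england census",
    "Baptism register",
    "1881 England Census",
]))  # ['1881 England Census', 'Baptism register']
```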
Thanks for the examples. I will take a look at Biddick-9. As Hilary says it doesn't have any sources.

The reason WikiTree AGC decides to do nothing is that it looks for either a Birth, Death or Name section in order to find the start of the GEDCOM created part of the bio and this profile has none of those. I can probably make it handle this case though.
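A minimal Python sketch of that start-of-section search, assuming both the old (`=== Birth ===`) and newer (`'''Born'''`) heading styles count as starting points; returning `None` corresponds to the extension doing nothing. `find_gedcom_start` is a hypothetical name:

```python
import re

# Headings assumed to mark the start of the GEDCOM-created text, in either format.
START_RE = re.compile(
    r"^\s*(?:===\s*(?:Birth|Death|Name)\s*===|'''(?:Born|Died|Name)''')\s*$",
    re.IGNORECASE,
)


def find_gedcom_start(lines):
    """Return the index of the first Birth/Death/Name heading, or None."""
    for i, line in enumerate(lines):
        if START_RE.match(line):
            return i
    return None


# Simplified bio with only a Christening section, as described for Biddick-9.
no_start_heading = ["== Biography ==", "''This biography is a rough draft.''", "=== Christening ==="]
print(find_gedcom_start(no_start_heading))                     # None
print(find_gedcom_start(["== Biography ==", "=== Birth ==="]))  # 1
```

Handling a case like this would mean widening the set of accepted start headings, for example adding Christening.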
Cheers Rob & Hilary

In the latest version the extension now works for https://www.wikitree.com/wiki/Biddick-9

As discussed above there is very little in that profile once the GEDCOM junk is removed - just a baptism with no source.

The extension now adds the {{Unsourced}} template if there are no sources.
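One way to decide whether a bio is unsourced is to look for inline `<ref>` tags or bullet lines after `<references />`. This Python sketch is an illustration under those assumptions; putting the template at the very top of the text is also an assumption:

```python
import re

# Matches an inline <ref>, <ref name=...> or <ref/> tag but not <references />.
REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def has_sources(bio):
    """Assume a bio is sourced if it has inline refs or bullet lines after <references />."""
    if REF_RE.search(bio):
        return True
    _, found, after = bio.partition("<references />")
    return bool(found) and any(line.strip().startswith("*") for line in after.splitlines())


def add_unsourced_template(bio):
    """Prepend {{Unsourced}} to an unsourced bio that does not already have it."""
    if has_sources(bio) or "{{Unsourced" in bio:
        return bio
    return "{{Unsourced}}\n" + bio


bare = "== Biography ==\nBaptised 1790.\n\n== Sources ==\n<references />\n"
print(add_unsourced_template(bare).splitlines()[0])  # {{Unsourced}}
```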

This is an awesome tool.  Thanks for creating and sharing it!  I've been using it on a gedcom recently imported from Ancestry.  It does a great job of putting everything in order and separating a big "lump" of data into separate events.  Makes the final editing go much faster.  It also nicely highlights possible issues like multiple birth dates or names.
by Paige Kolze G2G6 Mach 5 (55.3k points)

Version 0.1.11 of WikiTree AGC was published on the Chrome store today. This implements some of the requests I received on this topic and from some Data Doctors.

You can see the release notes on https://www.wikitree.com/wiki/Space:WikiTree_AGC. I posted this topic when version 0.1.7 was the current version.

But here are the main changes:

  • Removes a lot more of the "GEDCOM Junk" as described in this help page: https://www.wikitree.com/wiki/Help:GEDCOM-Created_Biographies and this data doctors video: https://youtu.be/yNnIv9JvOQA
  • Works on more old format GEDCOM profiles
  • A few other improvements regardless of the original GEDCOM profile format:
    • Combine the death and burial narrative when there is no burial date
    • Add “See also:” if there are additional sources
    • Add options to add newlines before and within refs to make them easier to edit
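To illustrate the first of those improvements, here is a hypothetical Python sketch that folds the burial into the death sentence when no burial date is known. The sentence wording and the `death_narrative` helper are assumptions, not the extension's actual output:

```python
def death_narrative(name, death_date, death_place, burial_place=None, burial_date=None):
    """Build the death (and burial) narrative for a profile."""
    text = f"{name} died {death_date} in {death_place}"
    if burial_place and not burial_date:
        # No burial date: fold the burial into the same sentence.
        return text + f" and was buried in {burial_place}."
    text += "."
    if burial_place:
        # Dated burial: keep it as its own sentence.
        text += f" {name} was buried {burial_date} in {burial_place}."
    return text


print(death_narrative("John", "1 Jan 1900", "Leeds", "St Mary's churchyard"))
# John died 1 Jan 1900 in Leeds and was buried in St Mary's churchyard.
```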
Please let me know if you see any issues
by Rob Pavey G2G6 Pilot (207k points)

Hi Rob,

There is a slight cosmetic problem when running on Linux (and probably the Mac): the agc icon is not found, so we get a broken image icon. I took a look in the resources, and it is the result of attempting to load 'images/agc.png' or 'images/agc_undo.png' while the files have 'agc' in uppercase. Windows file systems ignore the difference; however, *nix-based file systems are case-sensitive.

Hope that helps.

Geoff
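A mismatch like this can be caught before publishing by checking each referenced path against the real directory listing, component by component. This is a hypothetical Python helper, not part of the extension. Because it compares against `os.listdir` rather than asking the file system, it flags the problem even on a case-insensitive Mac or Windows volume:

```python
import os


def exists_case_sensitive(root, rel_path):
    """True only if every component of rel_path exists under root with exactly
    the same case, as a case-sensitive (Linux) file system would require."""
    current = root
    for part in rel_path.replace("\\", "/").split("/"):
        if not part:
            continue
        try:
            entries = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return False
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True


def find_case_mismatches(root, referenced_paths):
    """Return the referenced paths that would fail to load on Linux."""
    return [p for p in referenced_paths if not exists_case_sensitive(root, p)]
```

For example, with a file stored as `images/AGC.png`, `find_case_mismatches(root, ["images/agc.png"])` would report `images/agc.png`.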

Not sure if this is intended or not, but on some profiles, like 

https://www.wikitree.com/wiki/Arseneau-156

the tool simply doesn't work. All it does is turn off the enhanced editor, nothing else.

Thank you again Rob! I'm using this all over the place.

Here's another one that isn't reformatting.

Here's another non-responsive profile.

Hi Rob, I applaud your efforts in this area.  I installed the extension today, then tried it on a randomly selected profile that contained gedjunk. 

https://www.wikitree.com/wiki/Mountz-21

Result:  Nothing.  I have the button in my menu, but it does not seem to do anything.  I'm using Chrome on MacOS.

I so want your extension to work.  I have strong desires to zap-away Wikitree gedjunk with one easy click!

Thanks for the feedback everyone.

I have a fix in the pipeline for these two:
https://www.wikitree.com/wiki/Arseneau-156
https://www.wikitree.com/wiki/Mountz-21
There was a bug in the parsing of the old format when there was no === Birth ===, === Death === or === Name === section.

Morrison-2820 is a bit more challenging. Not only are the sources between the == Sources == line and the <references /> line rather than after the <references /> line (which I do handle in recent versions), but there is also a === Notes === subheading in the sources that seems to be cross-referenced from the earlier === Note === section. I have never seen that before; it seems rather nonsensical. Removing the === Notes === subheading manually will allow the extension to work. I'm not sure that I will make the extension handle this case unless it crops up again.

I do have a task on my wishlist to do better error reporting in the case that it fails - so that the user has an idea what manual changes they could make to fix it.
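A rough Python sketch of both ideas: moving stray source lines from between `== Sources ==` and `<references />` to after the references line, and returning a short reason instead of silently doing nothing when the layout is not handled (such as a `=== Notes ===` subheading in that gap). The function name and messages are hypothetical:

```python
def normalize_sources(lines):
    """Move lines found between '== Sources ==' and '<references />' to after
    the references line. Returns (new_lines, problem); problem is None on
    success, or a short message telling the user what to fix by hand."""
    try:
        s = next(i for i, line in enumerate(lines) if line.strip() == "== Sources ==")
        r = next(i for i, line in enumerate(lines)
                 if i > s and line.strip() == "<references />")
    except StopIteration:
        return lines, "no '== Sources ==' heading followed by '<references />'"
    between = [line for line in lines[s + 1:r] if line.strip()]
    if any(line.strip().startswith("===") for line in between):
        return lines, "subheading between '== Sources ==' and '<references />'"
    return lines[:s + 1] + [lines[r]] + between + lines[r + 1:], None


print(normalize_sources(["== Sources ==", "* Source A", "<references />", "* Source B"]))
# (['== Sources ==', '<references />', '* Source A', '* Source B'], None)
```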

Geoff, thanks for reporting the filename case sensitivity issue. Strangely I wasn't seeing an issue on my Mac.

It should be fixed in v0.1.12 which I just submitted for review.
Aaron, the bug where it turned off the enhanced editor but did not turn it back on is fixed in the upcoming v 0.1.12.

Thanks for reporting the issue.
Hi Christina,

Morrison-2820 now reformats in version 0.1.14, which should be on the Chrome store soon.
You're awesome Rob! Thank you!
Hi Rob,

I saw Hilary Buckle demonstrate the beta version on a YouTube video last week and it was AWESOME! I will definitely give it a try when I come across GEDCOMs.

Thank you!
by Carol Baldwin G2G Astronaut (1.2m points)
I am happy to give this a shot....
by Staci Golladay G2G6 Mach 6 (62.6k points)
I like it! Thank you for developing this.
by B. W. J. Molier G2G6 Mach 9 (91.0k points)
The options tab shows up as "My Test Extension Options", this should be something like "WikiTree AGC options".

Great tool!
Well spotted! I never noticed that because I always have so many tabs open I can't see the titles :)

I will fix it in the next version (0.1.14)

Thank you again! Awesome tool. I'm going to use it a lot.

Sometimes, the button [A G C] is enabled but has no reaction. One example is this Dutch profile: Jan van Urk.

Any reason? This is a profile, imported in 2013 and changed once.

Thanks for pointing that one out. I have seen this issue before. It seems that, at one time, the GEDCOM import would put not just sources but whole === Notes === sections in BETWEEN the == Sources == and the references line. These days there is not supposed to be anything between these lines.

My parser currently doesn't handle that. I will put it on my list to fix.

Cheers,
Rob
I have submitted a new version (0.1.14) which works for Jan van Urk now. It should be available in the Chrome store in a day or so.
Thanks Rob! If I notice something, I'll let you know.
It only tells you the issues but does not clean geo for you.

I like what it does; it cleans up the bio nicely.
by Kimberly Becerra G2G3 (4.0k points)
I really like what you have done.

Thanks!

...but I have one request: could you please retain the info re the GEDCOM import and just move it to an Acknowledgements section at the end, instead of removing it completely?

For example:

== Acknowledgements ==

* Pike-123456 was created by [[Pike-5935 | Christine Pike]] through the import of SOMEONES tree.ged on Jun 19, 2021.
by Christine Pike G2G6 Mach 6 (61.5k points)
There is an option to do this on the options page.

Some teams prefer that it is removed once the profile is cleaned up and others like to keep it around.
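For illustration, the "keep it" behaviour Christine describes could be sketched in Python as follows. The regex assumes the import note has the shape shown in her example, and `move_import_note` is a hypothetical helper, not the option's actual implementation:

```python
import re

# Assumed shape of the import note, based on the example above.
IMPORT_RE = re.compile(r"^\*?\s*\S+ was created by .+ through the import of .+$")


def move_import_note(lines):
    """Move GEDCOM import notes into an Acknowledgements section at the end."""
    notes = [line.strip() for line in lines if IMPORT_RE.match(line.strip())]
    if not notes:
        return lines
    rest = [line for line in lines if not IMPORT_RE.match(line.strip())]
    # Normalize each note to a single "* " bullet.
    bullets = ["* " + note.lstrip("* ").strip() for note in notes]
    return rest + ["", "== Acknowledgements ==", ""] + bullets
```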
Thank you for pointing that out - I had missed it

...