Please clean up GEDCOM imported profiles first...

+55 votes
1.7k views

The WikiTree AGC (Automatic GEDCOM Cleanup) is a browser extension to remove GEDCOM formatting from a profile.

If you have not tried this extension, it is highly recommended.

To work on a profile, it is much easier if AGC is used first, before adding sources, manually removing GEDCOM cruft, or completing a merge. Making those changes before using AGC may make it impossible for AGC to clean the profile.

For merges especially, the profiles should be cleaned (AGC’d) before the merge. Post-merge cleanup is much easier if the GEDCOM portion has already been cleaned.

As an example, I was reviewing some profiles yesterday and those for which I could use AGC took about a minute, but those that had been merged before the GEDCOM was cleaned took about 30 minutes to clean.

And special thanks to the developer of this extension; he has made us all much more productive.

WikiTree profile: Space:WikiTree_AGC
in The Tree House by Kay Knight G2G6 Pilot (599k points)
edited by Kay Knight

I have started to use this tool and I think it is great. However, I noticed that it removes the sentence about who created the profile, such as "WikiTree profile <wikitree-id> created through the import of xxx.ged on <Date>".
I thought that those should be retained. So I usually edit the profile again to get it back. Should this tool retain that detail?

Susan,

Under AGC, go to Preferences and uncheck "Remove the GEDCOM import text that states which gedcom the profile was created from (only do this if the profile will be fully cleaned up and sourced)." On Firefox, this is under the Add-ons Manager, then Preferences.

It's up to you whether to keep the Acknowledgement. I recommend keeping it, since it's then easier to find the collection of profiles from the GEDCOM: WikiTree+ will find them when searching gedfile=gedcom name. We use this for the GEDI Challenge.
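As a rough illustration of why the import sentence matters, here is a minimal Python sketch that pulls the GEDCOM file name out of a biography and groups profiles by it. It assumes the sentence wording quoted above ("created through the import of xxx.ged on ..."). The function names and sample IDs are hypothetical, and this is not how WikiTree+ or AGC actually work.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumption: the import sentence follows the wording quoted in this thread,
# "... created through the import of <file>.ged on <date>".
IMPORT_SENTENCE = re.compile(
    r"created through the import of (?P<gedcom>.+?\.ged) on",
    re.IGNORECASE,
)


def gedcom_name(biography: str) -> Optional[str]:
    """Return the GEDCOM file name from the import sentence, or None if it is absent."""
    match = IMPORT_SENTENCE.search(biography)
    return match.group("gedcom") if match else None


def group_by_gedcom(profiles: Dict[str, str]) -> Dict[str, List[str]]:
    """Group profile IDs by the GEDCOM file named in each biography.

    Profiles whose import sentence was removed cannot be grouped and are skipped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for profile_id, biography in profiles.items():
        name = gedcom_name(biography)
        if name is not None:
            groups[name].append(profile_id)
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample_profiles = {
    "Example-1": "WikiTree profile Example-1 created through the import of xxx.ged on 1 Jan 2011.",
    "Example-2": "A cleaned biography with the import sentence removed.",
    "Example-3": "WikiTree profile Example-3 created through the import of xxx.ged on 2 Jan 2011.",
}
grouped = group_by_gedcom(sample_profiles)
```

In this sample, Example-2 drops out of the group because its import sentence is gone, which is the situation the preference setting is meant to avoid.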

Thank you, Kay, for the advice. I didn't know something like this existed.

I'll be installing the add-on shortly. 

Thanks Kay!

10 Answers

+29 votes
 
Best answer
Totally agree with what Kay is saying here. I have occasionally removed sources to a text file so that I can revert a profile and clean it without losing any original information. Then the sources can be returned, but the process is much longer. Using AGC is impossible if profiles have been merged. Profiles were sometimes created with errors because the GEDCOM could not be read properly if dates were in certain formats. The app can detect comments that were added, and sometimes these need to be removed before the app will work.

I remember the time before the app was available and how long it took to tidy a profile. The app is really helpful so please use it and support the developer.
by Hilary Gadsby G2G6 Pilot (316k points)
selected by Kathy Nava
+26 votes
Thanks Kay for the advice on this. We can all benefit from the apps that Rob Pavey has developed.

Thanks again to Kay Knight for bringing this valuable tip to our attention and also a huge shoutout (and maybe a cup of coffee) to Rob Pavey!!
by Brad Cunningham G2G6 Pilot (190k points)
+17 votes
I've just recently started using it, but I am having a hard time understanding everything it is trying to tell me, and what to do about it. What qualifies as 'gedcom junk'?
by Marty Franke G2G6 Pilot (791k points)
edited by Marty Franke
Thank You Kay

Sometimes it feels like there is so much information here that I get lost.
Marty, check the Technical Stuff on the DBE space page. That has all the GEDCOM junk identified that Ales looks for. Rob's app knows to just ignore those items.

If the AGC identifier is shown, just select that and check what is put in the Research Notes, as well as a section that might be just above that section, for the parts of the profile that the AGC app had trouble with.
+14 votes

Thank you so much for this post.  I've been a data doctor & cleaned junk by hand.  Then learned Wikitree's Gedcom upload & compare & thought that was easy until I looked into the AGC. Wow!!  Rob Pavey is way smart. (too much wikitree today, couldn't spell what I wanted to say, lol)

by Kathy Schleicher G2G6 Mach 1 (11.7k points)
+15 votes

The AGC app is extremely helpful whenever a Gedcom formatted profile is going to be improved. It is helpful to keep the Gedcom name in the Acknowledgements section. If it is left in the profile, then WikiTree+ can be used to find the profiles with that gedcom name, which can help to connect profiles together that were split apart during some Gedcom imports. If it is removed and only exists in the Changes Log, it cannot be found by others trying to work on that gedcom.

To keep the gedcom name in the Acknowledgements section during the AGC cleanup, you have to make sure that the last item is unchecked in the 'Biography main text' user options, which states:

Remove the GEDCOM import text that states which gedcom the profile was created from (only do this if the profile will be fully cleaned up and sourced)

by Linda Peterson G2G6 Pilot (780k points)
After reading your comment, this option should be REMOVED from the app.
Whether it is in the app as an option or not, some projects had told people to remove the gedcom name, so many people do that.
+12 votes
I read this post yesterday. Since I spend a great deal of time sourcing unsourced profiles, I encounter many that need GEDCOM cleaning. I have been hesitant to try the different apps because the instructions are often over my head as an older non-technical type. A couple months ago I tried Rob Pavey's Sourcers App. It was easy for me to use and I am a big fan of it.

Reading that Rob developed this GEDCOM cleanup app, I thought I would give it a try. Wow! Easy and so useful. I will now use it whenever I encounter a profile that needs some GEDCOM clean-up love.
by Nancy Thomas G2G6 Pilot (207k points)
+8 votes
Thank you for this! I was frustrated by the removal of the gedcom import information and I am very glad to know that can be addressed.
by Rae Davis G2G6 (8.7k points)
+4 votes
I've not used the app, and so can't comment on how well it works, but many people seem to like what it does. Would it be worth giving any consideration to running it regularly across the entire Wikitree platform as a clean-up tool?
by Gina Meyers G2G5 (5.3k points)

The app is an add-on to your browser and will appear when you are in edit mode if available for the profile you are working on. It appears as a colorful box to the left of the tools with big letters AGC.

The extension is free to install and use (except on the Apple App Store). It works in many different browsers.

  • For Chrome, Opera, Brave, Edge, Vivaldi and other Chromium-based browsers, install it from the Chrome Web Store.
  • For Firefox, install it from the Firefox Add-ons page.
  • For Safari on Mac or iOS, go to the App Store and search for "WikiTree AGC".
My question is whether it would be a better use of everyone’s time to have it automated across the platform, rather than having individuals run it on an ad hoc basis.
Gina,

The problem is that AGC may have difficulty with profiles that have been merged or otherwise edited. Although it does add research notes for issues to investigate, it still requires a person to review. Try it on a few profiles, and you'll see. It's similar to BioCheck - also needs a person to review.
+4 votes
Looks like an excellent tool, but I do have a couple of concerns that would impact what I am doing with the Dyer gedcom cleanup I have been working on for over a year:

1. Seems that the AGC could be a good starting point but it removes the span id from the source statements. I use the span ID to associate the gedcom loaded sources to the data they support. Many of these are decent sources, maybe not first quality, but they do include family histories, etc. that provide good hints. For example, a marriage in the text of the document may reference span id that points to a family history book, census, vital records, etc.

2. It does leave the sources in a format that does not match WikiTree standards, so I still have to make updates.

I see that there is a page where I can request updates, so I will head there with my suggestions.

Will use where I can!

Sally
by s Davenport G2G6 Mach 6 (66.1k points)
Sally,

Can you provide an example profile ID, from before AGC was used?

I haven't seen it lose sources - typically the span converts to an inline ref. I don't believe this is controlled by preferences.

I'm not sure about the source not meeting WikiTree standards, since the guidance is quite open. Again check the AGC preferences settings.

On edit - it could be a case where the profile has been edited or merged. Sometimes you can get AGC to work anyway, but it's tricky.

Thanks for the quick response. Will put on my developer hat and try to explain the issue clearly - I remember how complicated communication between user and developer can be! 

Here's a sample 

[[Burrage-3|William Champlin Burrage]]

(Note: I have to remove the statement: "While processing....." from the text before running AGC to avoid an error message. Within the DYER gedcom, the marriage date of the subject and of his/her parents always ends up in this section. There are about 30,000 entries in the DYER gedcom load, so it would be worth it to be able to run the tool).

Comment re: "I haven't seen it lose sources - typically the span converts to an inline ref. I don't believe this is controlled by preferences."

Agree that I see the inline reference

Date: 12 JUN 1906
Place: Providence, Providence, Ri
Source: #S28
Page: 20:298

but my concern is that the corresponding source statement in the sources section does not include the span id # to tie back: S28. See below:

* Title: Vital Records of Rhode Island, 1636-1930 (database online) Orem, UT: Ancestry, Inc. 2000. Abbreviation: Vital Records of Rhode Island, 1636-1930 (database online) Orem, UT: Ancestry, Inc. 2000. Note: Providence Births 1636-1920, Providence Marriages 1851-1920, Providence Deaths 1636-1930, Bristol County - Barrington, Bristol, and Warren, Kent County - Coventry, East Greenwich, Warwick, and West Greenwich, Richmond, South Kingstown, and Westerly. NS401323. Source Media Type: Electronic. Master Listing Source: Y

It's not really a problem in this particular profile, since there is only the one source. However, in cases where there are several, it's difficult to associate the sources with the facts. Many of them look more like this one:

[[Clough-26|Charles Osborne Clough (1820-1908)]]

Some are even more complicated!

The issue could be addressed by including the span id # in the source statement (maybe like this?):

Title: Vital Records of Rhode Island, 1636-1930 (database online) Orem, UT: Ancestry, Inc. 2000. Abbreviation: Vital Records of Rhode Island, 1636-1930 (database online) Orem, UT: Ancestry, Inc. 2000. Note: Providence Births 1636-1920, Providence Marriages 1851-1920, Providence Deaths 1636-1930, Bristol County - Barrington, Bristol, and Warren, Kent County - Coventry, East Greenwich, Warwick, and West Greenwich, Richmond, South Kingstown, and Westerly. NS401323. Source Media Type: Electronic. Master Listing Source: Y. Ref # S28.

That way a researcher could see which reference supports each fact.
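To make the suggestion concrete, here is a minimal Python sketch of the idea. It is only an illustration: I am assuming the imported source line carries an anchor that looks like <span id='S28'>S28</span>, and the function name is made up. It is not the AGC code.

```python
import re

# Assumption: an imported source line contains an anchor such as
# <span id='S28'>S28</span>; real imports may format this differently.
SPAN_ANCHOR = re.compile(r"<span id=['\"](?P<id>S\d+)['\"]>.*?</span>\s*")


def keep_ref_id(source_line: str) -> str:
    """Remove the span anchor but append its id as 'Ref # <id>.' to the source line.

    Lines without a span anchor are returned unchanged.
    """
    match = SPAN_ANCHOR.search(source_line)
    if match is None:
        return source_line
    # Drop the first anchor and any trailing whitespace left behind.
    cleaned = SPAN_ANCHOR.sub("", source_line, count=1).rstrip()
    if not cleaned.endswith("."):
        cleaned += "."
    return f"{cleaned} Ref # {match.group('id')}."


# Shortened, hypothetical version of the source statement above.
before = "* Source: <span id='S28'>S28</span> Title: Vital Records of Rhode Island, 1636-1930. Master Listing Source: Y"
after = keep_ref_id(before)
```

The point is only that the id is read before the anchor is thrown away, so the cleaned source statement still says which reference number it was.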

In regard to the source format question, the generated format is fine - everything is there. My question is related to my understanding that we were supposed to use Evidence Explained format. It's not a big deal. AGC does include all the proper information. If the extension could just include "ref #" in the source, that would help immensely.

Does that make any sense? 

(Background: I have been working on the DYER gedcom for over a year. What I have done is create a file that contains all the sources I have found so far, organized by the span id, which is pretty consistent throughout; e.g. S4 is always the same reference within any profile associated with the DYER gedcom. Then I just look it up in my DB and copy and paste it into the profile, usually as an inline reference instead of in the sources section. I prefer inline.)

Let me know if I am totally unintelligible. I have been retired from the IT world for a long time....

Hi Sally,

I am the developer of AGC. Thanks for reporting this issue.

I will take a look at this profile when next working on AGC (I'm working on my Sourcer extension right now). It is a 2010 import, which is a very early one; the format produced by GEDCOM imports kept changing.

I may be able to add some code to handle the "While processing relationships in the gedcom some additional information was found which may be relevant." section.

I may also be able to add a scan for any use of a span id (even in unrecognized text) before removing the span anchor.

Cheers,
Rob

Sally,

I took a look at the profiles. Yes, it's up to Rob to solve this. Both do seem to be related to the unrecognized marriage of unrecognized people. I would guess that maybe there is no 'fact' to tie the reference back to, so we don't get the ref. Maybe in this case it should be a "see also" under the references.

Kay
+4 votes
This is a great reminder, Kay. I discovered that I already have the extension but had never used it. So I checked and set all the preferences. Now to find some profiles to use it on. I have never imported a GEDCOM myself, but I have adopted profiles that were created this way. I'll try to clean them up!
by Cindy Cooper G2G6 Pilot (329k points)
So nice to first use this one, then use the auto bio! Wham bam! Big fan of both!
Love auto bio!
