
Cleaning up Ancestry GEDCOM imports with the 'References First' method

Privacy Level: Open (White)
Date: [unknown]
Surnames/tags: sources ancestry biographies
Profile manager: Rob Jacobson
Credit goes to Dale Byers for the idea, Gaile Connolly (love your bio!) for the presentation of it, Chase Ashley for being a strong proponent of it, someone else for the bulleted facts, and Deborah Pate for exposing it to me. All I'm doing here is running with their ideas!
First, some history - Deborah mentioned it here, and I quickly recognized its excellence, especially at cleaning up Ancestry imports. So I presented it here and in a G2G thread, ONLY to find out the idea behind it was well known and well discussed (embarrassing, no!?). Gaile had already presented it in the 'Ultimate Solution' thread, where it has been thoroughly discussed. More discussion is here and a great tutorial on tags here. I strongly recommend reviewing Gaile's work, and the discussions.


References First

Note: Whether you call the method References First or Ultimate Solution or Dale's system or something you like better does not matter. It's all about gathering the sources into one place, and only inserting named references in the biography. References First refers to putting them first, at the top before the Biography, whereas the Ultimate Solution puts them in the Sources section, before the <references /> line. Either way is fine, although putting them at the top allows you to put them in the order you like, while putting sources in the Sources section makes more sense to many.
Note: Sourcing is so important, but while it's easy for some, it's a tedious chore for many of us, especially those of us used to Ancestry and similar sites. In Ancestry, we could point and click on many different sources and easily add them to a person, with facts automatically linked to sources. In WikiTree, it's all manual and requires meticulous attention to punctuation and other details. It would be so nice if all our sourcing could transfer easily to WikiTree. But when an Ancestry GEDCOM file is uploaded to WikiTree, the result is pretty awful and requires a lot of handwork to make it presentable.
In WikiTree's Edit mode, embedded references make a bio look cluttered and hard to read, and unorganized even when it is well organized. The References First method is not perfect, but it almost completely cleans up the bio, makes sourcing much easier, and self-organizes the whole page. Applied to Ancestry GEDCOM imports, it cleans them up completely, more easily and quickly than any other method I've seen.
Basically, the difference is that instead of copying the citation into the text of the bio, you copy it to the top of the bio, before the "== Biography ==" line, then name each one, and refer to it whenever necessary by that name. This allows you to organize them chronologically at the top (or by any system you like), and removes all of the clutter from the bio text. You can still add items in the "Sources" section, but don't need to, as every single source can be grouped at the top. They won't display at the top, only at the bottom in the Sources section, just as they always have.
How do you form a reference at the top? Put <ref name=blah> before it, then the text of the citation (either you create it or copy it from somewhere else), then </ref> after it. The name part (<ref name=blah>) can be immediately before it, or on the line above, and most of us put it on the line above for better visual separation between the citations. An example - (see also the examples listed below, in Edit mode)
<ref name="1920Census">
Text of source citation about 1920 census ... blah blah blah....</ref>
<ref name='Census1930'>
Text of source citation about 1930 census ... blah blah blah....</ref>
<ref name=c1940>
Text of source citation about 1940 census ... blah blah blah....</ref>
How do you refer to that source within the Biography? Insert <ref name=blah/> wherever you want the source to be referenced. Note: the only difference between the opening tag that names a reference and the tag that uses it is the slash just inside the right angle bracket. The tag that uses it stands alone, with no citation text and no </ref> after it.
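Putting the two halves together, here is a minimal sketch of what a whole page might look like in Edit mode. The biography sentences are made up purely for illustration, and the citation text is placeholder text in the same style as the example above. The comment lines are only there to explain the parts and can be left out.

```wikitext
<!-- Named citations, gathered at the top (placeholder text) -->
<ref name=c1920>
Text of source citation about 1920 census ... blah blah blah....</ref>
<ref name=c1930>
Text of source citation about 1930 census ... blah blah blah....</ref>

== Biography ==
<!-- Made-up sentences; each one refers to a citation by its name -->
In 1920 he was living with his parents.<ref name=c1920/>
By 1930 he had married and moved away.<ref name=c1930/>

== Sources ==
<references />
```

The citations do not display at the top. They appear in the Sources section, where the <references /> line collects them.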
The names you choose don't really matter, as they are not visible outside of Edit mode. They do have to have a letter somewhere, so a 1930 census cannot be named 1930, but could be c1930 or 1930Census or C30. I often like using the year plus a leading letter. Marriages could be M or M2 or m1951 or Married or mRuth, etc. If the name you choose is only letters and digits (no punctuation or spaces), then you don't need single or double quotes around it. If you want a space within, then you need quotes around the full name (e.g. "1930 Census"). Single quotes work like double quotes unless you want an apostrophe, then you need double quotes (e.g. "Bannie's Notes").
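As a quick sketch of those quoting rules, the lines below pair each kind of name with the tag that refers back to it. The names come from the examples in the paragraph above, and the citation text is placeholder text.

```wikitext
<!-- Letters and digits only: no quotes needed -->
<ref name=c1930>Text of source citation ...</ref>
Refer to it with: <ref name=c1930/>

<!-- Contains a space: quotes needed (single or double) -->
<ref name="1930 Census">Text of source citation ...</ref>
Refer to it with: <ref name="1930 Census"/>

<!-- Contains an apostrophe: double quotes needed -->
<ref name="Bannie's Notes">Text of source citation ...</ref>
Refer to it with: <ref name="Bannie's Notes"/>
```

Whichever form you pick, use the same name in the tag that refers to the citation as in the tag that defines it.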
My naming scheme is based on the event years, plus a leading letter that categorizes it. It provides an easy chronology of the sources and their associated events. I use b for birth sources, c for census sources, m for marriage events, w for war or military events, and d for death sources, plus some fixed names for common undated sources - fag=FindAGrave page, anc=Ancestry public profile, fs=FamilySearch page, obit=obituary. Here's an example list of source names for one life - anc, fs, b1897, c1900, c1910, w1918, c1920, m1922, b1923, b1925, c1930, c1940, m1942, w1942, d1947, d1947-2, fag. See how it provides a built-in chronology, plus without even looking at the source, you know what it is.
How you order and format the references is up to you, as there are a number of ways to customize them. Deborah did them one way, I did them a little differently, and the examples show other styles. I liked Deborah's use of bold (see the profile that she did in Edit mode, for her naming and her use of bold), although I preferred a simplified version with less typing. See Mary Lavender's profile for a way to add the corresponding facts to them. And I am sure you can do better! I always enjoy seeing others' ideas.
If you have multiple but different references to the same source, such as facts on different pages of the same document, here are two ways:
  • You can treat them as separate sources.
  • Or you can make one source, but add the differing parts as bulleted lines under it (lines beginning with an asterisk, see the profiles of Mary and Lydia). You can also use extra bullets to add notes and other facts from the source. Just remember that the asterisk MUST be in the first column, even in Edit mode (not even spaces before the asterisk).
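The second option might look something like the sketch below. The page numbers and the wording of the bulleted lines are invented for illustration; only the shape matters, with every asterisk in the first column.

```wikitext
<ref name=c1930>
Text of source citation about 1930 census ... blah blah blah....
* page 4: the detail found on page 4 ...
* page 7: the detail found on page 7 ...
* Note: anything else worth recording from this source ...
</ref>
```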
Seeing is often more important than telling, so here are some examples. Make sure you view them in both normal and Edit mode!
In normal viewing mode, you may notice the line of references across the top, above the bio. That doesn't bother me at all, but if it bothers you, add <span class="hidden"> before the first reference, and </span> after the last reference, and the line of reference numbers will not be visible. You can see it in the examples above, in Edit mode.
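A minimal sketch of that wrapper, using placeholder citation text:

```wikitext
<span class="hidden">
<ref name=c1920>
Text of source citation about 1920 census ... blah blah blah....</ref>
<ref name=c1930>
Text of source citation about 1930 census ... blah blah blah....</ref>
</span>

== Biography ==
```

The opening span goes before the first reference and the closing span after the last one, so that everything between them is covered.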
References - top or bottom?
The Ultimate Solution method is basically the same as the References First method except for the placement of the citations - top vs bottom. But there's one behavioral difference, and here is a profile that demonstrates it. You can see that [1] refers to FS1, [4] refers to FS4, and [5] refers to FS5, but [3] refers to FS2, and [2] refers to FS3! And if you look at the Sources section, you'll see that they aren't in the order they were written in (in Edit mode); the second and third are reversed. That's because FS3 is referred to about 19 lines before FS2, and references are numbered in order of 'first mention'. How important is that? Not very! Unless you really want them in a certain order. That's what putting them at the top does for you, as that becomes the 'first mention', invisibly. Most of the time we don't really care what order the sources are in, but sometimes we do, plus it's nice for [2] to refer to our second source. You can see the difference yourself on Dorothea's profile by moving the section of references to the top, then Previewing (don't save it!). [2] will refer to FS2 then.
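Here is a cut-down sketch of the same effect with just three sources. The names FS1 to FS3 follow the profile described above, but the biography sentences and citation text are made up for illustration.

```wikitext
== Biography ==
Born in 1850.<ref name=FS1/>
Married in 1875.<ref name=FS3/>
Baptized in 1851.<ref name=FS2/>

== Sources ==
<!-- Written in the order FS1, FS2, FS3, but FS3 was mentioned before FS2 above -->
<ref name=FS1>Text of first citation ...</ref>
<ref name=FS2>Text of second citation ...</ref>
<ref name=FS3>Text of third citation ...</ref>
<references />
```

With the citations in the Sources section like this, FS3 becomes [2] and FS2 becomes [3], because FS3 is mentioned first in the Biography. Move the three citation lines above the Biography line and they become the first mentions, so the numbering follows the order they are written in: FS1 is [1], FS2 is [2], and FS3 is [3].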
I don't expect to change most minds of those already putting them in the Sources section, as that does make sense. But I have to say that putting them at the top is simpler and cleaner to me. It's Deborah's idea, and I like them there. It makes the profile in Edit mode so simple, all sources at the top before the Biography line, then the Biography, and last the two lines that say == Sources == and <references />, untouched. It's harder to get it wrong this way. But I'll fully understand those that prefer them in the Sources section.

Ancestry Cleanup

Ancestry imports are still hard, no way around that, but this method makes them manageable, and leaves both the normal Biography and the Edit mode clean and readable, presentable. (On completion, you can dress them up and rewrite them as you like, easier now without all the garbage.) Here's what I do, to apply this 'References First' method:
First, some background on Ancestry structure:
  • Every Ancestry citation starts with a Source ID: a long number, typically 10 digits, preceded by an S. The first part is the same for all of the citations, so ignore it, and just identify them by their last 3 or 4 digits (e.g. think 486 when you see S1214720486).
  • For many sources, Ancestry splits the citation into 2 parts - the general part (e.g. the 1930 Census) and the more specific part (e.g. the page in the census). It puts the general parts in the Sources section, and embeds the specific parts in the text, with a source ID linking to its general part (see Gilbert's profile in Edit mode for an example). Ancestry does this so you could refer to different pages of the same source. But this does not happen very often, so I rarely try to preserve the distinction between general and specific. I usually recombine them at the top. If you wish, you can apply the bulleted lines method above, to keep distinct parts of the same source together but separate (see the profiles of Mary and Lydia).
  • Actually, Ancestry splits them into 3 parts, adding a repository that each general part links to. But since this is almost always Ancestry.com, I drop it. How many times should a citation say 'Ancestry'? 3 times, 4 times, 5 times? For me, once is enough. I don't go out of my way to remove 'Ancestry', so citations usually still have 2 instances. If the repository is anything but Ancestry, then keep it and move it into the citation that referred to it.
Now the steps:
  • In Edit mode, find the first ref and copy (not move) the entire citation (including the <ref> and </ref>) to the top, several lines before the "== Biography ==" line. You could move it if you want, but for safety it's better to be sure it's correct in its new location before deleting it from the old spot.
  • Look for the source ID at the beginning of that citation, then look for it at the bottom in the Sources section. Highlight only the text, not the source ID or spanning info or repository info, and copy it over the source ID (replacing it, including any punctuation around it) in your citation at top. This should complete that citation. At the bottom where you copied from, mark it so you'll know you have used it, and can delete it later. DO NOT delete it until you've copied all citations out of your bio, as you may need it again. I place an x at the beginning of the text, so it's visible on Preview (lets me know I forgot it), but you can mark them any way you like.
  • Give the citation a name. To name it abc or c1930, change <ref> to <ref name='abc'> or <ref name='c1930'>. I like to press Enter after the ref, just for looks and visual separation.
  • To refer to those named citations at the top, you replace the entire embedded citation with the 'ref name' above plus a slash, just inside the right angle (e.g. <ref name='abc'/>, <ref name='c1930'/>). We're replacing the entire citation with a named reference, and we'll do that for every citation, even if it's only a source ID. Go back to the embedded citation and edit the <ref> to include the name and the slash. Then delete everything after the right angle to and including the ending </ref>. (e.g. <ref>Source S12345 blah blah blah</ref> will be replaced by <ref name='blah'/>)
  • Continue with the next citation, and do it the same way. Check first whether you already have copied it to the top, and if you have, just replace it with its named reference. Some sources aren't split, so all you'll see embedded is the source ID. They are done the same way as if they had text here. Their text comes entirely from the Sources section. In other words, they only have the general part, no specific part.
  • Once you arrive at the Sources section, you're basically done, and there should be no more citations or source IDs within the text of the bio. Now you can delete all of the sources in the Sources section that are marked as used (they have an x if you did it like I do). Ancestry often will still have another source left, not referenced above, possibly something about Ancestry Family Trees. You can leave it there (after removing any source IDs, spanning info, and unnecessary repository info). Or you can copy the text out of it to the top also, and delete it. Since it isn't referred to, the name doesn't matter. Unless you chose to leave such a source in place, there should now be nothing left under the "<references />" line. And there should not be any source IDs left, anywhere, on the entire page!
  • Now if you wish, you can reorder the citations at the top into whatever order you like, such as chronological. Their order has no effect on whether the references work; it only sets their numbering and the order in which they appear in the Sources section.
  • Style the citations if you like. I liked Deborah's use of bold (see the profile that she did, in Edit mode, for her naming and her use of bold), although I preferred a simplified version with less typing.
  • Lastly, I take one pass through the cleaned up text, and add asterisks, commas, a <br> when needed, and correct the spacing, all to make it presentable (see examples above). You can reformat it any way you like, simpler now without all the garbage.
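To make the steps above concrete, here is a rough before-and-after sketch for a single census citation. The exact layout of an imported profile varies, so treat the 'before' half as an assumption about what an import roughly looks like, not a copy of a real one. The labels (Source, Page, Repository, Title), the repository ID R1, the biography sentence, and the placeholder text in parentheses are all made up for illustration. The source ID is the example number from the background notes.

```wikitext
<!-- BEFORE: assumed layout of an import, for illustration only -->
== Biography ==
He was counted in the 1930 census.<ref>Source: S1214720486 Page: (specific part: place, roll, page ...)</ref>

== Sources ==
<references />
* Source: <span id='S1214720486'>S1214720486</span> Repository: R1 Title: (general part: title of the census collection ...)

<!-- AFTER: general and specific parts recombined at the top and named c1930 -->
<ref name='c1930'>
(general part: title of the census collection ...) (specific part: place, roll, page ...)</ref>

== Biography ==
He was counted in the 1930 census.<ref name='c1930'/>

== Sources ==
<references />
```

In the 'after' half the general text has taken the place of the source ID, the repository has been dropped because it was Ancestry, and no source ID or span remains anywhere on the page.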
That's a basic cleanup. If I care about the profile and have time, I'll also want to connect it to FamilySearch, then use WikiTree-X to add more citations, add direct URLs to the images of the originals on FamilySearch (censuses and other documents), and copy the actual facts from that source below the citation for that source. I strongly believe in associating facts with sources. For an example, see Mary Lavender's profile.

Status of the Method

This includes an appeal to the WikiTree staff for reconsideration!
If you read the previous discussions of this method, you will find that it is not currently approved for general use, although allowed for your own use. It's my opinion that the case for approving it was not well presented, at all. One person felt it was harder to learn, but I don't think anyone that has actually tried it ever thinks that, rather the opposite. It's easier to learn, particularly because the idea is intuitive. Another could not see any advantage of a new method over the current method (inline citations), but again, anyone familiar with inline citations, especially using named references, already knows how to do it and can easily appreciate the advantages of the new method. There's no WikiTree development needed, no features to add to support it, it already works great. And it doesn't force anyone to change anything. I really don't know of any downsides, and I do know of huge upsides.
I have to say that for myself, even if not fully approved, the way I feel now, I am totally sold on it! It's that good, that superior to any other method I've seen:
  • It gathers all sources into one place, where they are easier to manage, organize, reorder, and style. You can put them in the order you want, chronologically, grouped by type, whatever...
  • It removes almost all clutter from the biography, making it much easier to read, less confusing and error-prone, with only named references embedded. Named references are simpler and intuitive to use, therefore less prone to errors. Current biographies with inline citations are extremely daunting to many users wishing to edit them, especially new users.
  • It self-organizes the Edit page - sources in one place, bio in another, and no need to touch any other section.
  • Having the sources all together in one place makes it easy to apply the same styling and level of detail to all of them. Have it your way - fully traditional academic style or an abbreviated style, with little detail or a lot of detail, optionally with bulleted facts and links for each source.
  • Requiring embedded inline citations limits sourcing to those with an academic bent, and to those who will reluctantly try it for a little while before giving up, or will rarely do it because the task is just too cumbersome. Editing them requires more instruction and meticulous care, and is more prone to errors.
  • Using this new method instead opens up sourcing to the world, makes it much easier, much more intuitive, more fun to do, and therefore is likely to result in much more sourcing by everyone. There's less instruction needed, as it just makes sense. "Here's how you form a citation", "Here's how you refer to it", done!
  • The 'coup de grâce' - just look at what this method does to profiles imported from Ancestry GEDCOMs. Check the examples above, and look at a before and after conversion using this method, in Edit mode. Check Gilbert, then Anita and Frank. How can anyone say this method is not superior, and should not be approved for general use, especially on Ancestry imported profiles?
The advantages of this method are too important, too superior not to be approved for full use. I personally would go a step further and request future consideration for deprecating use of inline citations. That doesn't mean stop using them, if someone still prefers them, or remove any currently in use. Existing methods can stay approved and supported. But I would like to see some form of the new method not just approved but recommended for ongoing work. Once most users compare methods, they will clearly prefer this new method (in my view!). I can't see any downsides. And I really think you will see a lot more sourcing because of it, and fewer people leaving WikiTree.
I'd love to insert some flattery here, of our wonderful WikiTree developers, staff, and leaders! But that doesn't seem right, so I won't. You know how awesome you are, without me telling you!

Note: I'm not a good writer! If you see anything that could be improved, please do! Or let me know.

Comments: 5

What is the recommended order between the ref tags and the Categories? I've been putting the ref tags at the very top, then an empty line, then categories before the Biography header. This seems to work with the Category parsing, but I had someone try to rearrange the top of a profile I manage as "Minor corrections." https://www.wikitree.com/index.php?title=Rogers-41151&diff=163373669&oldid=163361015
posted by Andrew Zellman
- thank you for the page , Rob - great job - -

For the 'stand alone' ref numbers [1] [2] - I use the word 'index' , to help in their meaning -

= Sources

Index [2] [3] [4] [5] [6] [7] [8] [9][10] [11] [12] [13] [14] = - = https://www.wikitree.com/wiki/Keane-48 = cheers - john.a

posted by John Andrewartha
edited by John Andrewartha
Reference names don't have to start with a letter; they do have to include letters. I frequently use 1930Census as a ref name. It's a few more characters but leaves no doubt that I mean census rather than circa.
posted by Debi (McGee) Hoag
I do not like the sources listed first. I like them in Research Notes See Rolla Weaver
posted by Pat (Fuller) Credit
Sourcing is not a pain. Sourcing is easy.
posted by J. Crook