New dump is being finalized Biography preview


I received the first preview of the biographies from Chris and have some questions and examples. I got 1% of the complete database, and the profiles are all from the early days of WikiTree, so I think newer biographies will look better on average.

Longest profiles (longer than 100K):

http://www.wikitree.com/wiki/Brown-665
(a lot of duplicated sources)
http://www.wikitree.com/wiki/Turner-176
(a lot of duplicated sources, nothing visible unless in edit mode)
http://www.wikitree.com/wiki/Hart-69
(Nice extended profile)
http://www.wikitree.com/wiki/Mayo-13
(Nice extended profile)
http://www.wikitree.com/wiki/De_Forest-15
(Nice extended profile)
http://www.wikitree.com/wiki/Sisson-12
(Nice extended profile)

23655 profiles are shorter than 100 letters, like:

  • (all GEDCOM data imported)
  • This person was created on 09 March 2010 through the import of arie.ged.
  • ...

http://www.wikitree.com/wiki/Kosta-1

http://www.wikitree.com/wiki/Conn-5

 

With such a ratio, the Empty profile error will produce too many errors.
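A minimal Python sketch of such a length check, assuming the dump can be loaded as a mapping from WikiTree ID to biography wiki text (a hypothetical shape; the real dump layout isn't described here). The IDs and texts in the sample are made up. The resulting ratio is what decides whether an "Empty profile" style error would flood the report:

```python
# Sketch only: assumes bios is a dict mapping a WikiTree ID to the biography
# wiki text. This data shape is hypothetical, not the actual dump format.

SHORT_BIO_THRESHOLD = 100  # characters, the cut-off used in the post


def short_profiles(bios: dict, threshold: int = SHORT_BIO_THRESHOLD) -> list:
    """Return, sorted, the IDs whose stripped biography is shorter than threshold."""
    return sorted(pid for pid, text in bios.items() if len(text.strip()) < threshold)


def short_ratio(bios: dict, threshold: int = SHORT_BIO_THRESHOLD) -> float:
    """Fraction of profiles a simple length check would flag."""
    if not bios:
        return 0.0
    return len(short_profiles(bios, threshold)) / len(bios)


sample = {
    "Example-1": "This person was created on 09 March 2010 through the import of arie.ged.",
    "Example-2": "== Biography ==\n" + "Long, well-sourced narrative. " * 20,
}
print(short_profiles(sample))  # ['Example-1']
print(short_ratio(sample))     # 0.5
```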

 

Is this just a recommended form of bio, or an obligatory one?

 

== Biography ==

...

== Sources ==

...

Enough for a beginning.

WikiTree profile: Emily Turner
in WikiTree Tech by Aleš Trtnik G2G6 Pilot (804k points)
retagged by Maggie N.
The == Sources == section should also contain <references />:
== Biography ==
== Sources ==
<references />
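A rough Python sketch of checking a bio for the suggested headings and the `<references />` tag. The function name `missing_structure` is hypothetical, and the check only recognizes the English headings, so a bio that uses, say, `== Källor ==` would be reported as missing `== Sources ==`:

```python
import re

# Level-2 headings such as "== Sources ==", one per line.
HEADING_RE = re.compile(r"^==\s*(.+?)\s*==\s*$", re.MULTILINE)
# Accepts "<references />" and "<references/>".
REFERENCES_RE = re.compile(r"<references\s*/>", re.IGNORECASE)


def missing_structure(bio: str) -> list:
    """List which of the suggested English headings/tag are absent from a bio."""
    headings = {h.lower() for h in HEADING_RE.findall(bio)}
    missing = []
    if "biography" not in headings:
        missing.append("== Biography ==")
    if "sources" not in headings:
        missing.append("== Sources ==")
    if not REFERENCES_RE.search(bio):
        missing.append("<references />")
    return missing
```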

And I believe it's a style guideline, not an obligation.

"23655 profiles are shorter then 100 letters" -- unsourced, no doubt.  *deep breath*

What's your question? ;-)

A) With such a ratio, the Empty profile error will produce too many errors.

Is that a problem? I started by manually adding profiles to WikiTree, and just to test how GEDCOM import worked I took a small GEDCOM of a family tree. Lesson learned: it's a nightmare to find those profiles to edit them.... Any indication of where I can find my small GEDCOM profiles would be excellent.

My understanding is that one approach in this project is to define errors and then fine-tune them later if you get too many false errors?!

B) Is this just a recommended form of bio, or an obligatory one?

No opinion....

Suggestion A: maybe have a list of terms, at least one of which a profile should contain, since those terms indicate that the profile follows the Honor Code and uses sources, e.g.:

  1. Source
  2. Sources
  3. Footnotes
  4. Census
  5. Källor (Swedish for sources)
  6. ....
  7. ....

A Google search of WikiTree that excludes some keywords that should be on a profile gives, I feel, an indication of profiles where something is missing:
site:www.wikitree.com/wiki -Source -Sources -Footnotes -Census
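A minimal sketch of Suggestion A in Python, matching the listed terms case-insensitively as whole words (so "Resourceful" does not count as "Source"). The list is open-ended in the post; `SOURCE_TERMS` and `has_source_term` are illustrative names, and finding a term is only a heuristic, not proof that a profile is sourced:

```python
import re

# Terms from Suggestion A; the thread leaves the list open ("....").
SOURCE_TERMS = ["source", "sources", "footnotes", "census", "källor"]

# Whole-word, case-insensitive alternation; str patterns are Unicode-aware,
# so "Källor" is handled correctly.
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, SOURCE_TERMS)) + r")\b", re.IGNORECASE
)


def has_source_term(bio: str) -> bool:
    """True if the bio mentions at least one of the source-related terms."""
    return _TERM_RE.search(bio) is not None
```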

Suggestion B: check whether a profile contains CONT ==> if so, it is a GEDCOM-imported profile that hasn't been cleaned.

site:www.wikitree.com/wiki CONT ==> 72800 profiles (see also G2G)
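Suggestion B could be sketched as a case-sensitive, whole-word match on the GEDCOM continuation tag `CONT`. This assumes leftover GEDCOM text keeps the tag in upper case; an upper-case `CONT` used for some other reason would be a false positive. Function names are hypothetical:

```python
import re

# "CONT" as an upper-case whole word, so "continued" or "CONTINUED" don't match.
CONT_RE = re.compile(r"\bCONT\b")


def cont_count(bio: str) -> int:
    """Number of CONT tags left in the bio text."""
    return len(CONT_RE.findall(bio))


def looks_like_uncleaned_gedcom(bio: str) -> bool:
    """Heuristic: any CONT tag suggests an uncleaned GEDCOM import."""
    return cont_count(bio) > 0
```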

Suggestion C: is the term "firsthand knowledge" OK to have on profiles older than 100 years (counted from the death date)? See G2G.
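A sketch of the check as Suggestion C phrases it, with the 100 years counted from the death date. The function name and the explicit `current_year` parameter are hypothetical; passing the year in keeps the check deterministic:

```python
from typing import Optional

FIRSTHAND_YEARS = 100  # counted from the death date, as suggested


def flag_firsthand(bio: str, death_year: Optional[int], current_year: int) -> bool:
    """True if the bio cites firsthand knowledge for someone who died over 100 years ago."""
    if death_year is None:
        return False  # no death date: the rule cannot be applied
    return (
        "firsthand knowledge" in bio.lower()
        and current_year - death_year > FIRSTHAND_YEARS
    )
```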

Suggestion D: maybe have a list of terms indicating that more work needs to be done:

  1. "Prior to import, this record was last changed"
  2. "This person was created through the import of "

Suggestion E: short profiles are often created by volunteers who sign up and then don't do anything more. Maybe check for short profiles together with the member's badge:

  1. Volunteer 
  2. Wiki Genealogist
  3. Family Members
  4. Guest members
  5. .....

In this thread started by Magnus I posted this comment:

Just a couple of results:

Over 5,000,000 profiles with size < 498

Over 5,625,000 profiles with size < 566

Over 6,250,000 profiles with size < 631

Over 7,500,000 profiles with size < 817

Over 10,000,000 profiles with size < 1743

Over 11,000,000 profiles with size < 4040

I think it can be concluded that over 50% of the profiles are unsourced or insufficiently sourced.
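Cumulative counts like these ("over N profiles with size < S") can be reproduced from a list of bio sizes by sorting once and binary-searching each threshold. A minimal sketch; the unit of "size" isn't stated in the thread, and `counts_below` is a hypothetical name:

```python
from bisect import bisect_left


def counts_below(sizes, thresholds):
    """For each threshold t, count the profiles whose size is strictly less than t."""
    ordered = sorted(sizes)
    # bisect_left returns the index of the first element >= t,
    # which equals the number of elements < t.
    return {t: bisect_left(ordered, t) for t in thresholds}


print(counts_below([10, 500, 600, 2000], [498, 566, 1743]))  # {498: 1, 566: 2, 1743: 3}
```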

Another "unsourced" indication is the following text (variables enclosed by < >):

No sources. The events of <profile's first name>'s life were either witnessed by <profile manager's name> or <profile manager's first name> plans to add sources here later.

Example:

No sources. The events of Indore's life were either witnessed by Ludwig Kraayenbrink or Ludwig plans to add sources here later.

A Google search for site:www.wikitree.com/wiki "No sources. The events of " gives about 98,600 hits.
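Since the wording is fixed apart from the names, a regular expression can both detect this text and pull out the variables. A Python sketch built from the example above; accepting a typographic apostrophe is an assumption about how the text may appear, and the function name is hypothetical:

```python
import re
from typing import Optional, Tuple

NO_SOURCES_RE = re.compile(
    r"No sources\. The events of (?P<first>.+?)['’]s life were either witnessed by "
    r"(?P<manager>.+?) or (?P<manager_first>\S+) plans to add sources here later\."
)


def parse_no_sources(bio: str) -> Optional[Tuple[str, str, str]]:
    """Return (first name, manager's name, manager's first name), or None if absent."""
    m = NO_SOURCES_RE.search(bio)
    if m is None:
        return None
    return (m["first"], m["manager"], m["manager_first"])
```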

Footnotes (as a heading) and census are not required.

 

Suggestion C: is the term "firsthand knowledge" OK to have on profiles older than 100 years?.... 

 

My grandmother, who lived with us, was born over 120 years ago. I knew her really well and have lots of memories. [In my case, though, I do not post any profiles that recent.] You could count the 100 years from death instead of birth and probably be OK.

Suggestion E: short profiles are often created by volunteers who sign up and then don't do anything more. Maybe check for short profiles together with the member's badge

 

Nothing wrong with an empty profile of a living person. Anonymous members in particular don't put much, if anything, in our data or bios because we aren't here to build our ego. A better indication would be activity level or something along those lines. Even if you mean profiles of ancestors, sometimes they're short because that's all we know. A lot of common folk didn't make the history books, and families didn't do a good job of passing on written records.

Headings like Biography and Sources are not required either, since they can be written in other languages.

When I am working on a profile I will add the headings recommended in the Style Guide and then try to find at least one source for it. I am afraid that the majority of the over 4000 profiles I manage would have an "error" from this type of checking, because doing good genealogy and finding good sources is not a quick fix, and that has to be done before I could even think about writing a biography. In the early days the program did not put the same headings in, or even put them in the same place as it does now, and most of the profiles I work with are from that period, which also happens to be a couple of years before I joined WikiTree. Even today, newly created profiles do not contain all of the wording recommended in the Style Guide, so this is going to be a very tough error to work with even before you factor in all of the different languages you would have to deal with.
But that's ok, isn't it? The error report would be able to help you quickly identify which of your many profiles still need work.

The error report doesn't mean you have to fix the "errors" now.
Dennis, my point is that I do not need the report to identify them. I just work from the earliest edit date, because if they have not been edited since 2011 there is no way they would be in compliance, and that is the majority of the profiles on my watchlist. The vast majority of profiles on here could be listed, and that could make us look worse than we actually are. Before you could even try to fix them, we need guidelines for a minimal biography section, say for an infant who was born and died on the same day (such profiles are allowed on here), as well as for someone who was very famous and lived a long time.
Sure, I'm glad you have a system that works for you. And I don't mean to be argumentative, but I submit that your method may not be quite as reliable as you think. For example, what if a cousin or someone else makes a small edit to one or more of your profiles? That will get your dates out of order now. :)

And what's wrong with looking bad? We are what we are. The error report helps us get (and look) better :)

And identifying what might be considered an error (in terms of bio text), is still a work in progress. I don't think anything is set in stone yet (and I've found that headstones aren't always accurate either)
FYI: I'm editing Brown-665 ... can't stand it.

Julie, let me know if I should export a GEDCOM for Brown-665 from WikiTree and import it again so that you have merging and cleaning to do for the next few weeks....

At your service, or as we say in Sweden, "Kämpa på" (keep fighting).

 

I looked at that one, Julie.  You have my sympathy.
Magnus -- I can always count on you to have my back! ;-)

Nan -- Thank you. It's a sickness. I can't help myself.
I will miss that one. I am sure we will have better ones when all the data is available. The sample I got had 110K profiles, and that is only 1%.
Done.

It contained the same facts repeated over and over and over and over ... plus a duplicated profile with the same facts repeated over and over and over and over. I probably left more info than I need to.

If someone would like to track down some more accessible source citations, that would be awesome. I'm out of time now and have to go run a bunch of errands.

Cleaning GEDCOM 

I'm starting to feel that Talk pages would have been nice.

Showing just the nice information on the profile would be best.... The "ugly" GEDCOM text could be moved to the talk page, along with discussions like the one below about birth locations.

Birth location 

Maybe it's just the same location explained in 4 different ways.... why didn't they use GPS and smartphones?

  1. London, England
  2. Middx City, Middlesex, England
    1. Middlesex is greater London
  3. London City
  4. Aldersgate, Middlesex, England
    1. It looks like Aldersgate indicates whether it was inside or outside the city walls (see link)
    2. More about Aldersgate a gate in the London wall

 


Big picture: Aldgate is to the NE.

Magnus --

You should add your notes to the profile in a Research Notes section ;-)

Julie, this is pre-pre-research... but I will do it and be a good citizen in WikiTree land. I hope to pass through London in the next month, so maybe then I'll have more to add to the research section....

PS: I connected Brown-665 to this G2G topic... at least.

Nice!!
We need something like a separate research notes page where our entries cannot be deleted by anyone else, including the PM.
You mean something like the "Talk"-page on Wikipedia?
Not sure we should make notes that can't be deleted by anyone.  We've had some malicious members in the past, and some that get carried away sometimes and leave rude messages.
I'm not involved in Wikipedia, so I have no idea what the Talk page is like.

And I didn't mean the staff couldn't delete, just something like a forum where only moderators and admin can delete.

Re Julie: I did some checking about the London Wall, and it looks like Wikimedia/Wikipedia has a cool new function for annotating pictures. It would be magic to have it on WikiTree and be able to comment on old pictures....

Picture Link
Video

3 Answers

Best answer

Hi Aleš,

"Recommended vs. obligatory" ... that's actually a hard question to answer.

I think it's the same question as this from the style FAQ:

Is it forbidden to break the style rules?

We don't usually use the word "forbidden" when talking about style rules.

Things like pornography and spam are forbidden through our legal Terms of Service. The points of the Honor Code, such as those on courtesy and citing sources, are rules that all active members are expected to follow. Styles and standards are more like guidelines. Style rules are the community consensus for what should be done.

That said, we strongly recommend against using anything other than recommended styles, especially on Open profiles. If you do something that isn't specifically recommended on private or free-space profiles, you do so at your own risk. See below.

 

by Chris Whitten G2G Astronaut (1.5m points)
selected by Maryann Hurt
Duplicated headings could be an indicator that the bio was not edited after a merge.

== Sources ==

== Sources ==

http://www.wikitree.com/wiki/Clarke-93

http://www.wikitree.com/wiki/Kidd-37
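A sketch of detecting duplicated headings such as the doubled `== Sources ==` in these examples. It counts heading titles of any level (2 to 6 equals signs, matched on both sides) case-insensitively; the names are hypothetical:

```python
import re
from collections import Counter

# Headings like "== Sources ==" or "=== Notes ===" (same number of "=" on both sides).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def duplicated_headings(bio: str) -> list:
    """Lower-cased heading titles that appear more than once, sorted."""
    counts = Counter(title.strip().lower() for _, title in HEADING_RE.findall(bio))
    return sorted(title for title, n in counts.items() if n > 1)
```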
by Aleš Trtnik G2G6 Pilot (804k points)
Will the bio be compared with the dump?
I don't quite understand.

The bio will be included in the new dump that Chris is preparing, so we can create new errors based on the biography part of the profile.
I was thinking that another indicator of a bio not being revisited would be when a DOB is revised in the top part of the profile but not explained in the bio.
I understand what you mean.

No. Computers are still not smart enough for such things. Maybe in 20 years or more.
I could also validate all links on the profiles.

One error could check the DNS part of the URL.

The other would check the whole URL, but there might be a problem there, since some source links require a login to access data. Those could be identified and ignored for standard sites (Ancestry, Find a Grave, ...). For others, should they even be allowed?
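A minimal Python sketch of grouping links along these lines. `LOGIN_DOMAINS` is an assumed, incomplete list (only Ancestry here), the URL pattern is a simplification of wiki link syntax, and `host_resolves` performs a real DNS lookup, so it needs network access:

```python
import re
import socket
from urllib.parse import urlsplit

# Simplified: stops at whitespace, brackets, angle brackets, pipes and quotes.
URL_RE = re.compile(r"https?://[^\s\]\[<>|\"']+")

# Assumption for illustration: sites whose links often need a login.
LOGIN_DOMAINS = ("ancestry.com",)


def extract_urls(wikitext: str) -> list:
    return URL_RE.findall(wikitext)


def needs_login(url: str) -> bool:
    """True if the URL's host is one of LOGIN_DOMAINS or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in LOGIN_DOMAINS)


def host_resolves(host: str) -> bool:
    """DNS check: True if the host name resolves (does a network lookup)."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (OSError, UnicodeError):
        return False


def classify_links(wikitext: str) -> dict:
    """Split the links in a bio into login-required and other links."""
    groups = {"login": [], "other": []}
    for url in extract_urls(wikitext):
        groups["login" if needs_login(url) else "other"].append(url)
    return groups
```

The "other" group would then go through `host_resolves` (and, later, a full URL check), while the "login" group could be reported as its own category.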
by Aleš Trtnik G2G6 Pilot (804k points)
  1. Do you get the raw text or the template?!?!
     
  2. Do you get the categories? I feel we have some links in categories, but no bot checking whether they are valid
     
  3. Links
    1. One problem is dead links 
    2. Another problem is that too often, uploaded GEDCOM files create links to Ancestry that, when you follow them, have no genealogical value....
       
      1. Are empty
        Example: Sweden-29 links to an Ancestry family tree that is empty, of no use, and of no genealogical interest ==> it should be flagged as an error and the link deleted...

        Example of an "empty" Ancestry page of the kind created by many GEDCOMs uploaded to WikiTree; it adds no genealogical value
         
        1. Maybe an XPath can be used to see what, e.g.,
          //*[@id="fixed_div"]/div/div/div/div[1]
          contains.....
           
      2. Needs login ==> a non-preferred source inside WikiTree 
        Example: Eisenhart-39 links to a private Ancestry tree, which is not a preferred way of sourcing 
        1. It seems you get redirected to RequestTreeAccess, if that helps
1.) I get the wiki text: what you see in the biography editor.

2.) Categories are extracted separately by Chris into an additional table, so I also get categories added by templates.

3.2.1 Require login

3.2.2 Require login

We could group login links as a third error.

Ok Aleš 

The Ancestry links are rather depressing when you check them: a search for site:www.wikitree.com/wiki AMTCitationRedir gives 220,000 hits.

It feels like 9 out of 10 are useless....

My odd personal opinion is that all the sources from an external family tree such as Ancestry should be moved over to WikiTree, and the Ancestry family tree should just be listed in the See also section.

You're a rock star Aleš. Validating links is just one of the hundreds of things we should be doing, but never actually get done. You're ticking off item after item on WikiTree's to-do list (and items we'd never thought of before).

I sometimes feel like my biggest use of my Ancestry account is to determine which Ancestry links are dead and which still function. Displays like that one from Sweden-29 indicate that the person who made the Ancestry tree has abandoned the tree, and the content is permanently gone. All "AMTCitationRedir" URLs with that same Tree ID (in that instance tid=21525863) will be equally useless. Removing the dead links and the associated advertorial text is tedious, even if a person could quickly identify which profiles have the URL (and which of the multiple Ancestry URLs on that page are affected), but an automated search designed to find all instances of a particular confirmed-dead tid would make the removal process go more quickly.
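A sketch of the automated search suggested above: pull the `tid` query parameter out of every `AMTCitationRedir` link and list the profiles that cite a given dead tree. The URL shape in the usage example and the `bios` mapping (WikiTree ID to wiki text) are assumptions for illustration, as are the function names:

```python
import re
from urllib.parse import parse_qs, urlsplit

URL_RE = re.compile(r"https?://[^\s\]\[<>|\"']+")


def amt_tree_ids(wikitext: str) -> list:
    """Return the tid values of all AMTCitationRedir links in the wiki text."""
    tids = []
    for url in URL_RE.findall(wikitext):
        parts = urlsplit(url)
        if "AMTCitationRedir" in parts.path:
            tids.extend(parse_qs(parts.query).get("tid", []))
    return tids


def profiles_citing_tree(bios: dict, dead_tid: str) -> list:
    """IDs of profiles whose bio links to the given (confirmed dead) tree ID."""
    return sorted(pid for pid, text in bios.items() if dead_tid in amt_tree_ids(text))


# Hypothetical data; the exact Ancestry URL format is assumed.
bios = {
    "Example-1": "[http://trees.ancestry.com/AMTCitationRedir.aspx?tid=21525863&pid=42 Ancestry Family Trees]",
    "Example-2": "[http://trees.ancestry.com/AMTCitationRedir.aspx?tid=999&pid=1 tree]",
}
print(profiles_citing_tree(bios, "21525863"))  # ['Example-1']
```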

However, for WikiTree profiles that have little or no other indication of their information sources, I don't think that we should remove:

  1. Ancestry Tree links that require a subscription but are still working
  2. Ancestry Tree links that still work, but require permission from the content owner for access
  3. For instances where there is no working link to the Ancestry Tree, a brief indication that "Ancestry Family Trees" was the source of the content in the profile. I currently prefer to replace the lengthy citation to a dead Family Tree with *Ancestry Family Trees. Online publication - Provo, UT, USA: Ancestry.com.  Original data:  Family Tree files submitted by Ancestry members.

Note: People who don't have access to Ancestry should be aware that the string trees.ancestry.com in a URL does not necessarily indicate a link to an Ancestry Family Tree. That same link format also applies to other user-contributed content and records. Apparently when people save a link to an Ancestry record in their own Ancestry area (something I've never done), the link gets a trees.ancestry.com URL that carries forward when they later create a GEDCOM. These links generally continue working (with content quality that ranges from execrable to excellent); they don't go away when a Tree owner quits.
