Is Sourcing = Quality. Benchmark on Notables: WikiTree FindAGrave Genealogics

+12 votes
422 views

As we now connect with Wikidata, we can start benchmarking the quality of WikiTree against FindAGrave, Wikipedia and Genealogics.

Please compare and comment. 

  1. Are we good enough compared to FindAGrave?
  2. Do we work enough with quality measurements of profiles?  
  3. Should a WikiTree profile have to meet some minimum "genealogy" level?
    1. If yes, how do we define this level, and how do we start working in that direction?
    2. ?

I am not convinced that just chasing Unsourced profiles is the best way to add quality to WikiTree. The lesson learned is that if you don't measure quality, you don't get it...

 



Queries

  1. All profiles where all 4 sites have entries = 175 records
  2. Comparing WikiTree and FindAGrave = +800 records
    1. Same query with pictures
    2. Same query displayed on a map
    3. Just people with Swedish citizenship
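For anyone who wants to rebuild queries like the ones listed above, here is a minimal sketch that only assembles SPARQL text for the Wikidata query service. It is not the original queries. The property IDs (P2949, P535, P1819, P27) and the item Q34 for Sweden are quoted from memory and should be checked on Wikidata before use; `build_query` and its parameters are hypothetical names.

```python
# Wikidata property IDs, quoted from memory; verify them on wikidata.org.
PROPS = {
    "wikitree": "P2949",     # WikiTree person ID
    "findagrave": "P535",    # Find a Grave memorial ID
    "genealogics": "P1819",  # genealogics.org person ID
}


def build_query(sites, require_enwiki=False, citizenship=None, limit=1000):
    """Return SPARQL text selecting people that have an ID on every listed site."""
    lines = ["SELECT ?person ?personLabel WHERE {"]
    for site in sites:
        lines.append(f"  ?person wdt:{PROPS[site]} ?{site}Id .")
    if require_enwiki:
        # Require an English Wikipedia article about the person.
        lines.append("  ?article schema:about ?person ;")
        lines.append("           schema:isPartOf <https://en.wikipedia.org/> .")
    if citizenship:
        # P27 = country of citizenship (assumed, see note above).
        lines.append(f"  ?person wdt:P27 wd:{citizenship} .")
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }')
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# WikiTree vs FindAGrave, restricted to Swedish citizenship (Q34 assumed to be Sweden).
swedish_query = build_query(["wikitree", "findagrave"], citizenship="Q34")
# All four sources: three site IDs plus an English Wikipedia article.
all_four_query = build_query(
    ["wikitree", "findagrave", "genealogics"], require_enwiki=True
)
```

The resulting text can be pasted into the Wikidata query service; the picture and map variants would need extra properties that are left out here.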

  

     

    asked Aug 11, 2016 in The Tree House by Magnus Sälgö G2G6 Pilot (237,570 points)
    edited Aug 12, 2016 by Magnus Sälgö
    With this you are only checking if a person is really famous.

    I am trying to validate WikiTree data against Wikipedia data. I ran a comparison on about 4000 profiles.

    For gender, about 30 are missing and 1 was wrong.

    Birth dates show many more differences, about 800 of them, but I am still working on matching date ranges from Wikidata. For those I will create an error saying that the date needs to be checked. Wikidata may have a more or less accurate date than WikiTree, or the two dates may simply differ, in which case the correct one must be established. If the WikiTree date is wrong, it can be corrected in WikiTree. If it is right, the error should be marked as a false error after validation, since the mistake is on the Wikipedia side. It could also be corrected there, but I don't know if that is in our interest. There are also cases of two dates on Wikipedia. Another cause of this error is a wrong connection to the Wikidata profile, which must then be deleted or changed.

    I will also check other dates, locations, and maybe names and relations.

    I can also add an error for a relative who is present on Wikipedia but missing on WikiTree.
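The date-range matching mentioned above could look roughly like this. It is only a sketch, not the actual error-report code: it assumes each Wikidata date comes with a precision code (9 = year, 10 = month, 11 = day, as Wikidata time values are documented), and the function names and sample dates are made up for illustration.

```python
from datetime import date

# Wikidata precision codes for time values: 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11


def dates_agree(wikitree: date, wikidata: date, precision: int) -> bool:
    """True if the WikiTree date falls inside the range the Wikidata value covers."""
    if precision >= DAY:
        return wikitree == wikidata
    if precision == MONTH:
        return (wikitree.year, wikitree.month) == (wikidata.year, wikidata.month)
    if precision == YEAR:
        return wikitree.year == wikidata.year
    raise ValueError("precision coarser than a year is not handled in this sketch")


def classify(wikitree, wikidata, precision):
    """Label a profile for the error report (labels are hypothetical)."""
    if wikitree is None:
        return "missing on WikiTree"
    if dates_agree(wikitree, wikidata, precision):
        return "ok"
    return "needs check"


# Made-up sample dates: a year-only Wikidata value matches any day in that year.
year_only = classify(date(1850, 8, 30), date(1850, 1, 1), YEAR)
# A day-precise Wikidata value that differs is flagged for a human to check.
day_exact = classify(date(1850, 8, 30), date(1850, 9, 1), DAY)
```

A "needs check" result says nothing about which side is wrong; that still has to be settled from sources, and two dates on the Wikipedia side would need separate handling.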

    Sounds great Aleš

    I think all tools/reports that help us better understand and make the quality better should be used.... 

    The cool thing I think with your tools/reports is that you just by checking the data find all those errors.... 

    My vision:

    We also need to start checking "genealogy quality":

    • whether the conclusions drawn from the sources are right
    • whether we are using all the sources we can find
    • whether all the evidence from the sources has been correlated

    also see my answer below, but I have a huge problem with any comparison of Wikitree with Wikipedia.  There are some great well researched Wikipedia articles, and there are some that are just plain awful.  This is particularly the case with pre-1500 profiles.

    @John
    Other people do compare and the reaction is WikiTree is just linking to Wikipedia..... and I think most profiles are not well researched....

    A) If you compare all 4 do you agree?

    B) Should we change how WikiTree works or is it ok.....

    WikiTree should change but we (regular members) can't make the necessary changes.  I don't know of any genealogy site that enforces quality.  There is a need for it,  but only for a small number of serious researchers.  A quality site would not and could not ever be massively popular/mainstream.

    Mikey! 

    "A quality site would not and could not ever be massively popular/mainstream." 

    I completely disagree. JPVIV

    Hi

    I ran into the birthday problem yesterday with Katharine Faulkner (Faulkner-2078) and I do not know which birth date is correct; there is documentation for both.

    My mother has a similar problem, and in her case I know why. She was born on August 30. Her family lived in the country and registered her in September. When she signed up for her Social Security card, they told her that her birthday was listed as Sept 30 (early in the 20th century). They would not allow her to go outside the month of September to correct it, so she chose Sept 1st, but everyone in the family knows her birthday is really Aug 30. I can see how people might find documents that show her real birthday and, later, her Social Security birthday.

    I always try to find a book reference etc. Wikipedia bios normally have a bibliography. When you go to source a bio, the information may or may not be correct, even in a book. I descend from Shirleys. Before DNA testing, people assumed families living in the same area were connected, and information was written that way and made public. We now know that is not always true.

    4 Answers

    +8 votes
    Sorry Magnus, I'm not exactly sure what your question is, but sources such as FindaGrave, Wikipedia and Genealogics are all what I call secondary sources, and as such their quality can vary enormously.

    The aim of any good genealogy should be to use primary sources where at all possible, or secondary sources that at least quote primary sources.  A FindaGrave record that has a photograph of the gravestone, or an exact transcription of the gravestone might be a quality source, a FindaGrave record that tells me a complete family history based on seemingly nothing at all, is very suspect.

    The same goes for a Wikipedia article or a record from the Genealogics database.  If we are trying to equate quality comparing Wikitree to these websites, then I think there is a problem in recognising what makes up a quality profile.
    answered Aug 12, 2016 by John Atkinson G2G6 Pilot (206,810 points)

    The aim of any good genealogy should be to use primary sources where at all possible, or secondary sources that at least quote primary sources.

    Yes, that is the theory; the reality is what you have in the report above.

    Question John: Do you see that WikiTree has a quality problem? Other people do....

    I miss well-researched profiles of good quality. People check WikiTree and say we are just linking to FindAGrave and Wikipedia..... Hearing that once is once too often....

    If we are trying to equate quality comparing Wikitree to these websites, then I think there is a problem in recognising what makes up a quality profile....

    No, you define the quality, but the reality is the list above...

    Status today 2016-aug 

    • we never review profiles in WikiTree
    • never check if a profile is just links to Find A Grave or Wikipedia or if it has genealogy value
      • As we don't use Templates for sources we need to do this manually

    Do we agree that we would like to change our way of working, or don't we care?


    My feeling is that profiles in projects like US Presidents have some QA process but most profiles don't.... Is that the way forward, or should we add a Quality dimension to profiles?

     

    • just by knowing that someone will review a profile I think you get better quality from the writer....

      plus a review is also a way to transfer skills and a way to be a better researcher....

    My reaction after adding +3000 links between WikiTree and Wikipedia/FindAGrave is that WikiTree lacks a quality review of profiles.

    For profiles from the list above, I grade each of the following parameters from 1 to 5, where 1 is lowest and 5 is highest:

    Genealogy value: whether the WikiTree profile is well researched
    Genealogy Ancestors: whether we have some kind of family tree
    Primary sources: whether the profile cites birth records etc.
    Primary sources online: whether other people can easily check the sources
    Good reading: whether it has a section that is interesting to read
    Categorization: whether the profile is categorized in a good way


    Adams-10 Tree: I miss good genealogy links, such as links to birth records, estates and inventories, but it is at least not just links to Wikipedia. The source citations lack an ISBN or another easy way to find the cited work, e.g. Mary Fairchild, "Christian Quotes of the Founding Fathers.", which I guess is copied from this webpage.

    Genealogy value:  3
    Genealogy Ancestors: 4
    Primary sources: 0
    Primary sources online: 0
    Good reading:  4
    Categorization: 5

    Adams-12 Tree same quality as Adams-10

    Genealogy value:  3
    Genealogy Ancestors: 4
    Primary sources: ?
    Primary sources online: 0
    Good reading:  4
    Categorization: 5

    Adams-15 Tree is messy, and I also miss good genealogy qualities. It links to Find A Grave twice; my guess is that the author does not understand wiki formatting or how to use inline citations with the name attribute (see pdf).

    Genealogy value:  2
    Genealogy Ancestors: 4
    Primary sources: 0
    Primary sources online: 0
    Good reading:  2
    Categorization: 3

    Adams-19 Tree is just a link to Wikipedia with no added genealogy value on the profile, though it has a family tree. The links don't use wiki formatting and are just pasted URLs. Categorization could be better; it feels like this person didn't have a life of his own. The Wikipedia link is also wrong (better to link to Wikidata object Q75174 and redirect to Wikiprofile; see the new WikiData template).

    Genealogy value:  0
    Genealogy Ancestors: 4
    Primary sources: 0
    Primary sources online: 0
    Good reading:  0
    Categorization: 2

    Alcott-73 Tree has some genealogy value and a good family tree but is short on primary sources. The primary sources it has are not linked directly. Links are not done with wiki formatting, which looks messy. Categories are used only for being an author.

    Genealogy value:  0
    Genealogy Ancestors: 4
    Primary sources: 2
    Primary sources online: 1
    Good reading:  0
    Categorization: 2

    My guess is that profiles of the US presidents get some kind of review in Project:US_Presidents.
     


    Allen-1 Tree has some genealogy value and a good family tree but is short on primary sources. There is a family tree, but some profiles are GEDCOM imports that have not been cleaned.

    Genealogy value:  4
    Genealogy Ancestors: 4
    Primary sources: 3
    Primary sources online: 3
    Good reading:  4

    Anthony-14: good story, but the profile is not cleaned and has 2 bios. It is short on primary sources, with just URL links to Wikipedia and FindAGrave; it feels like skills in wiki formatting are missing ...

    Has a family tree but lacks sources on some profiles.

    Genealogy value:  0
    Genealogy Ancestors: 4
    Primary sources: 2
    Primary sources online: 2
    Good reading:  4

    Arthur-49 Tree: no story and no sources. There is a family tree, but it is not well researched.

    Genealogy value:  0
    Genealogy Ancestors: 3
    Primary sources: 0
    Primary sources online: 0
    Good reading:  1

    Astaire-5 Tree: GEDCOM-imported profile, not cleaned. Links to Ancestry that I can't follow. The sources follow the Ancestry "standard", so maybe you can find them. Messy impression.... Has a family tree, but it is not well researched and just links to Find A Grave.

    Genealogy value:  2
    Genealogy Ancestors: 3
    Primary sources: 0
    Primary sources online: 0
    Good reading:  1
     

    Attlee-6 Tree: readable, but just Wikipedia and FindAGrave; lacks primary genealogy sources. Has a family tree, but it is not well researched.

    Genealogy value:  0
    Genealogy Ancestors: 1
    Primary sources: 0
    Primary sources online: 0
    Good reading:  2
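If grades like the ones above were ever stored in a structured way, one possible shape is sketched below. This is an illustration only: the class and field names are hypothetical, a grade left out or marked "?" is stored as `None`, and the plain average is just one way to summarize (the thread does not define a combined score).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ProfileGrade:
    """One reviewer's grades for one profile; None means 'not graded'."""
    wikitree_id: str
    genealogy_value: Optional[int] = None
    ancestors: Optional[int] = None
    primary_sources: Optional[int] = None
    primary_sources_online: Optional[int] = None
    good_reading: Optional[int] = None
    categorization: Optional[int] = None

    def scores(self):
        values = [
            self.genealogy_value,
            self.ancestors,
            self.primary_sources,
            self.primary_sources_online,
            self.good_reading,
            self.categorization,
        ]
        # Skip parameters that were not graded.
        return [v for v in values if v is not None]

    def average(self):
        graded = self.scores()
        return round(mean(graded), 2) if graded else None


# Grades copied from the list above; "?" for Adams-12 primary sources becomes None.
adams_10 = ProfileGrade("Adams-10", 3, 4, 0, 0, 4, 5)
adams_12 = ProfileGrade("Adams-12", 3, 4, None, 0, 4, 5)
print(adams_10.average(), adams_12.average())  # prints 2.67 3.2
```

Note that an ungraded parameter raises the average compared with a 0, so a real rubric would have to say how missing grades are treated.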

    This is a great idea. We should make a template for it that would be added to a profile. We could have reviewers who grade profiles and add the template. Maybe even a bot could be made for that. It wouldn't be able to rate a profile that well, but it could rate the presence of links to other sites, the number of source citations, the length of the bio, GEDCOM cleanup, merge cleanup, the presence of categories and specific templates, ...
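A rough sketch of the kind of signals such a bot could count from a profile's wiki text is below. Everything here is an assumption for illustration: `auto_rate`, the chosen signals, and the sample text are made up, and the checks are simple text matches that say nothing about whether the sources are any good.

```python
import re


def auto_rate(wikitext: str) -> dict:
    """Count simple, machine-checkable signals in a profile's wiki text."""
    lowered = wikitext.lower()
    return {
        # Opening <ref> or <ref name=...> tags; </ref> and <references /> do not match.
        "inline_refs": len(re.findall(r"<ref[\s>]", wikitext)),
        "categories": len(re.findall(r"\[\[Category:", wikitext)),
        "links_findagrave": "findagrave.com" in lowered,
        "links_wikipedia": "wikipedia.org" in lowered,
        # Crude length measure: whitespace-separated tokens, markup included.
        "word_count": len(wikitext.split()),
        # Leftover text from an uncleaned GEDCOM import usually mentions GEDCOM.
        "gedcom_residue": "gedcom" in lowered,
    }


# Made-up sample profile text.
sample = """[[Category: US Presidents]]
== Biography ==
He was born in Massachusetts.<ref>Town birth record, vol. 1.</ref>
See also https://www.findagrave.com
== Sources ==
<references />
"""

signals = auto_rate(sample)
```

Counts like these could fill in the mechanical parameters (links, citations, categories), while the genealogy value would still need a human reviewer.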
    Adams is a hard line to follow. I have yet the determination & expertise to hack / connect this line of ancestors. lol :)
    I like the idea of a ratings system for the profiles covered by the Notables project. I don't think it is realistic to apply it to every Wikitree profile though.
    I like it Magnus.  Do you use a rubric to help grade each parameter?  Something that specifies the difference between a 2 and a 3 for example?

    Where does DNA fit in?

    >> I like it Magnus.  Do you use a rubric to help grade each parameter?  Something that specifies the difference between a 2 and a 3 for example?
    >>Where does DNA fit in?

    It's just a draft I did a year ago.... As always with genealogy, things will change. But just starting to read each other's profiles and give feedback is, I think, a big step forward....

    I get feedback all the time that my Swedish citations are too complex

    +13 votes
    I think that the statement "Most WikiTree Profiles Need Improvement" is completely non-controversial.  The question posed here is, "What Do We Do About It?"

    At this point because of our restrictions on Gedcoms and pre-1700 and pre-1500 profiles, my hunch is that we are improving bad profiles faster than we are adding new bad profiles.  If this is true (and I haven't a clue how to measure it!) then gradually WikiTree is getting better.  

    What will make WikiTree better is spending as much time as possible actually improving the profile.  Adding inadequate sourcing makes it better than having no sourcing at all, and it may provide clues for the next person who finds the primary sources.  Plus there is always that tricky business where my idea of improving your profile just undoes all the work you thought would improve the profile...  

    If there are simple digital ways of automatically putting a little box on the profile that identifies the absence of sources or broken links, that's nice, but I don't think it would be worth while spending much labor-intensive time identifying bad profiles for others to fix.  Time is precious, and the time should be spent actually fixing things rather than going here and there identifying things for others to do.  Perhaps when we get down to less than 10,000 bad profiles it will be time to start making lists of the things remaining to do before we achieve perfection, but I don't think we're there yet!
    answered Aug 12, 2016 by Jack Day G2G6 Pilot (144,930 points)
    I agree with Jack 100%
    I believe (when it comes to Notables especially but family as well) that there are two phases that can often be separated by large periods of time (even though there are exceptions). - Profile Initial Entry - and - Profile Improvement.

    It's tough, when you're first starting out, as you want to see your tree emerge, fully formed, and there is a period where profiles are either GEDCOM created or quickly entered one after the other in a rush to populate the tree. I get that. It's exciting to see new levels pop up from your tree and there is a sense of satisfaction looking at a tree view and knowing that you have profiles in each of the spots.

    However, I get your point - when do we return to improve said profiles that may have been quickly generated with no sources, not linked to the global tree, and whose story has truly yet to be told (no biography)? And how do we keep track of all these profiles? Even more interesting, with Notables at least, there's always a push to churn out another Notable into the tree (being famous in some way, there's always interest in so-and-so who I saw on TV, film, heard about in history class, etc.). And unfortunately, many of them do get created with a name, a few dates, and a wikipedia link.

    I've been treating this like the whole eating the elephant principle. I can only improve one profile at a time, and I start with family and sources and hope to one day return to do more. I try never to leave one behind that doesn't have at least one primary source, has the family identified and profiles entered including father, mother, siblings, spouses, and children. It's time consuming - yes - but it's so very important. We never manage to get these strays connected to the tree or sources established unless we make a concerted effort to keep improving our profiles. I think the more profiles we can establish that separate us from a simple wikipedia entry where we have established family links, true genealogical research being done, clear links to the global tree - then I feel I can sit back and feel that we've accomplished something great. And it starts with one profile at a time.
    I'll disagree with Jack, sort of.  I'd say the genealogical value of a profile lies in what you think you can rely on.  If you were writing an article, how much would you take from a WikiTree profile without checking it out?

    If nothing, the profile might as well not exist.  It doesn't need improving, it needs creating, from scratch.

    So really we have 10m+ people that have IDs allocated, but are waiting for a profile to be created.

    But then again, there are loads of people who don't have an ID yet, who might be more useful, interesting or worthy than the ones we happen to be carrying as ballast.  Should we create those profiles instead?

    Or should we be more focussed on something else, like merging the duplicates or fixing the bad genealogy?
    I agree with RJ 100% (playing off D.B.'s reply).  10 M figure may be similar to my own guesstimate based on what I see that ~90% are poor quality.  The thing is that discussion here doesn't get anything implemented (well, almost never).

    Chipping away at the mountain or taking bites off the elephant are fine for people who prefer dealing with things that way.  But any tech savvy person should know that is not the efficient way to handle large data problems.  Data entry standards/rules, data validation algorithms, data correction management protocols, etc. are needed to turn the tide in a reasonable time.

    There has to be a commitment to quality first before anything like that is worth pursuing.  But among other things this site puts not hurting anyone's feelings above having the data correct.  That distracts from and ultimately severely curtails data correction efforts.  To make any progress, it must be acceptable to point out or correct errors without worrying how someone may feel about it (data analysts and tech gurus are not psychological counselors).  If we have to accept all the wacky stuff people post the ship is already scuttled.
    Mikey and RJ, automation or database errors will never solve the problem without sources and someone who is willing to take the time to sort things out. As an experiment I searched just my name on only familysearch.org and came up with 42,410 results. By adding my birth state and year and searching only for males, the count went down to 29, but none of those results were for me. Genealogy requires careful attention to detail and good sources for any accuracy, and in my limited experience with the DB errors project, most of the "errors" they report about profiles I manage are proven false errors by good sources.
    +4 votes
    There are lots and lots and lots of profiles with no source and no dates. I was reminded of the extent of the problem as I reviewed the site looking for profiles of people who have US Counties named after them.

    I don't know if they all come from GEDCOM uploads, but plenty of them do. I have edited literally hundreds of profiles to move "created by GEDCOM upload" from Source to Acknowledgements, and adding the "Unsourced" program label.

    I've started considering whether it might be worth the time to try and add dates to some of those profiles. I'm curious how many of that kind of profile we have on WikiTree. Anybody know how to search on that?

    As to findagrave as a source, it really does vary. I've used it a lot for the LDS Mexican Colonias project. There's a huge body of connected and sourced memorials there. I don't download and copy the pictures and documents and text, because of copyright and permission issues. But I can say, confidently, that many of the findagrave profiles are better than WikiTree's with text from wills, diaries and obituaries, family photographs and tombstone pictures; many with attached images of sources such as death certificates & census records.

    One place to be careful on findagrave is with the relationships. I find errors there from time to time. But when you have enough information, you can also add relationships, mostly in shared burial plots. One finds errors here on WikiTree from time to time, too. I often submit corrections on Ancestry.com, too, mostly because of errors in transcription on the index. And when there's a source like the Hale Collection inventory of Connecticut graveyards conducted during the New Deal, I give it some credence. Iowa had a WPA inventory during the 1930s, too. Those are IMO a credible substitute when a stone has gone missing in the c. 80 years since those inventories were performed.
    answered Aug 12, 2016 by Elizabeth Winter G2G6 Mach 4 (44,920 points)

    You can check the numbers on http://wikitree.sdms.si/default.htm under group Statistics, item Database dump. Select Birth date in the list. The number is 1857888 and it is growing by 2000 per week. The percentage of such profiles is falling, so things are improving; it is now about 15%. With the new checks on entry, I think this will improve even faster.

    Thanks. Interesting. 1.8 million plus. Yeah, that's a LOT of them!
    +7 votes
    Find a Grave is a good secondary source. All the information is based on what is on a headstone. I've been working mainly pre-1800 profiles in Sourcing and DB errors. In the area of sourcing, I work exclusively in Massachusetts, which has a large number of published  vital records.
    answered Aug 13, 2016 by Bob Keniston G2G6 Pilot (122,370 points)
    There are too many findagrave memorials with no photo, no source, nothing but a memorial.

    Even my early contributions were entered without a source.  I know where I got the information about the burial (Allen County Public Library indexes of cemeteries) but nobody else looking at the memorial would know that. I'm fortunate in that another Findagrave user has photographed many of my early submissions to at least give them something to go on. At this point, do I feel like combing through several hundred profiles to add the source?

    And now with the error report including Findagrave errors, my error report is populated with Findagrave. Many of them are because I have researched and found no source for birth/death dates, but the person who created the FAG memorial has presented no source at all or the memorial is "burial unknown" with the dates, but no sources. I refuse to amend my WT profiles until I have a source to back up the dates.

    Frustrating!
    Natalie, I feel your pain, but in a different area. Profiles that I sourced early in the Sourcerer days don't fit today's criteria for citing. As I stumble across them, I fix them. I think the error report is going after FAG sources with no ID number. I know you don't want to just add an ID if the profile also needs a good source. Maybe just check some of the "offending" profiles every week and fix what you can, and mark the others as needing more research. That'll give you a list of profiles to try to add sources to. Ales has added some nice tools to the DB errors.

    >>There are too many findagrave memorials with no photo, no source, nothing but a memorial. 

    Why not see this as an opportunity to

    1. request grave photos using Find A Grave
    2. try to get the administrative documents
    I do request photos, but most requests are never filled.


    ...