More Wikitree Error statistics

+13 votes
306 views
Following is an analysis of the errors in the 11 June 2017 release of the Database Errors Report. The various identified errors have been grouped into the 12 categories listed in the left column.  The middle column is the percentage of errors in each category from the entire 11 June error report (about 1.95 million), while the right column is the percentage of errors in each category that are new errors as of the 11 June error report (about 14,000).  The similarity or difference shows how the types of currently generated errors compare with the existing errors in Wikitree.

Category                                All Errors   New Errors
Consistency error                       10%          8%
Gender issue                            6%           9%
Unconnected / Empty profile             8%           4%
Profile should be opened                1%           1%
Duplicate profile                       1%           5%
Wikidata mismatch                       1%           1%
Find a Grave mismatch                   9%           14%
USA too early in location               14%          11%
Uncleaned after merge                   10%          9%
All upper or lower case                 5%           2%
Unique name                             14%          14%
Other spelling / style / field issue    21%          23%
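
One way to read the two columns is to subtract them, which shows which categories are over- or under-represented among the new errors. The following is a minimal Python sketch using only the percentages listed above; the names `ERROR_SHARES`, `share_shift` and `approx_count` are illustrative, and the approximate counts assume the rounded totals quoted in the post (about 1.95 million and about 14,000).

```python
# Percentages copied from the post: category -> (all errors %, new errors %)
ERROR_SHARES = {
    "Consistency error": (10, 8),
    "Gender issue": (6, 9),
    "Unconnected / Empty profile": (8, 4),
    "Profile should be opened": (1, 1),
    "Duplicate profile": (1, 5),
    "Wikidata mismatch": (1, 1),
    "Find a Grave mismatch": (9, 14),
    "USA too early in location": (14, 11),
    "Uncleaned after merge": (10, 9),
    "All upper or lower case": (5, 2),
    "Unique name": (14, 14),
    "Other spelling / style / field issue": (21, 23),
}

# Rounded totals quoted in the post (assumed, not exact counts)
TOTAL_ALL = 1_950_000
TOTAL_NEW = 14_000


def share_shift(shares):
    """Return (category, new% - all%) pairs, largest increase first."""
    diffs = [(name, new - all_) for name, (all_, new) in shares.items()]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


def approx_count(percent, total):
    """Convert a rounded percentage into an approximate number of errors."""
    return round(total * percent / 100)


if __name__ == "__main__":
    for name, diff in share_shift(ERROR_SHARES):
        new_pct = ERROR_SHARES[name][1]
        print(f"{name:38s} {diff:+3d} pts  ~{approx_count(new_pct, TOTAL_NEW):,} new errors")
```

Because the published percentages are rounded to whole numbers, differences of one or two points should not be over-interpreted.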

 

in WikiTree Tech by Paul Gierszewski G2G6 Mach 8 (88.8k points)
edited by Ellen Smith
Thanks, Paul.  Great work!

I have been playing around with the genealogy wiki WeRelate, and on the page WeRelate_talk:Vision I found these numbers from 2014. It could be interesting to relate the number of WikiTree people from different time periods and locations to how many people have lived.

 

107,600,000,000 Number of people who have ever lived [1]
21,100,000,000 Number of people who have been born since 1650  
7,100,000,000 Number of people alive in 2013 [2]
4,000,000,000 Total profiles on ancestry.com (incl. Duplicates) [3]
1,000,000,000 MyHeritage [4]
780,000,000 GenesReunited [5]
640,000,000 Names on RootsWeb World Connect Project [6]
460,000,000 Community Indexed International Genealogical Index [7]
400,000,000 GeneaNet [8]
130,000,000 FreeBMD Births [9]
98,600,000 Geni [10]
60,000,000 FamilySearch Pedigree Resource Files [11]
12,493,344 FreeREG Baptisms [12]
6,815,024 WikiTree.com [13]
5,038,660 GEDCOMIndex Open Source library [14]
2,489,522 Number of people on WeRelate [15]
1,335,000 Thomson-Gale's Biography Resource Center [16]
800,000 Rodovid Wiki [17]
630,600 Biographies on English Wikipedia [18]
224,000 The Political Graveyard [19]
139,929 FamilyPedia Wikia [20]
 

 

2 Answers

+3 votes
That is a great summary. I like the categories you made.

Can you email me exactly which errors you put in each category?

About new errors, some of those are the result of merges and LNAB changes. But I can't track that.
by Aleš Trtnik G2G6 Pilot (804k points)
The new errors are annoying.  People work hard correcting existing errors, and as fast as they are corrected, new ones are made. It is soul destroying.
Are you suggesting we don't allow people to add new profiles? I think such action will not be welcomed. :-)

But I can assure you that the rate of new errors is much lower than it was a year ago.
No, sorry, it was just a comment. It is good to know that the number of 'new errors' is getting lower, which in a way must be down to you. Until you decided to find and analyse errors, I don't suppose anyone knew how many errors there were, and where they were, until they were stumbled upon. Now every week we get to see how many there are, and get the links to go and sort them.

I think that working on sorting errors, from your data, has made people (well me anyway) just that little bit more careful when entering data. From comments on g2g, I feel that members are now aiming to have a blank error report, which is a good thing.
Ales, I will send you the spreadsheet tonight after work.

With respect to new errors, from a previous error analysis, about 70% of the "new" errors are in new profiles and 30% in existing profiles that were merged or changed LNAB.
I think the main advantage in general is that people are shown what errors they make and will try to avoid them in the future.

Here you can see how the number of each error changed in the last year.

http://wikitree.sdms.si/default.htm?report=stat3
Paul, that sounds about correct. Do you want any more data from my servers? Some is available online, since I draw charts from it. And I can also add some that might be interesting. Example:

http://wikitree.sdms.si/function/WTStatsJSON/Stats.json?dataID=3102

http://wikitree.sdms.si/function/WTStatsJSON/Stats.json?dataID=13
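
For anyone who wants to look at these endpoints programmatically, here is a minimal Python sketch. It assumes only what the two example links show: a `Stats.json` URL that takes a `dataID` query parameter. The structure of the returned JSON is not described in this thread, so the sketch just reports the top-level shape rather than assuming any field names, and the server may no longer respond. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example links in the post
BASE_URL = "http://wikitree.sdms.si/function/WTStatsJSON/Stats.json"


def stats_url(data_id):
    """Build the URL for one data series, e.g. dataID=13."""
    return f"{BASE_URL}?{urllib.parse.urlencode({'dataID': data_id})}"


def fetch_stats(data_id, timeout=10):
    """Download and decode the JSON for one data series."""
    with urllib.request.urlopen(stats_url(data_id), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(payload):
    """Summarise the top-level shape without assuming a schema."""
    if isinstance(payload, dict):
        return "object with keys: " + ", ".join(sorted(payload))
    if isinstance(payload, list):
        return f"array with {len(payload)} items"
    return type(payload).__name__


if __name__ == "__main__":
    for data_id in (3102, 13):
        try:
            print(data_id, describe(fetch_stats(data_id)))
        except (OSError, ValueError) as exc:
            # Network failures and undecodable responses both end up here
            print(data_id, "could not be loaded:", exc)
```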

Some (perhaps many) of these "new errors" are inevitable. Examples:

  • When I add a findagrave citation to a profile, together with a discussion of the fact that the death certificate and gravestone have different dates of death, I fully expect that a new db_error will be generated, but I can't mark it as a false error until the new error report has been generated.
  • Similarly, when I find a record that includes a strange new spelling of a person's first name and I add that spelling to the "Nicknames" field, I fully expect that a new db_error will be generated, but I can't mark it as a false error until the new error report has been generated.

>> but I can't mark it as a false error until the new error report has been generated.

That could be fixed if you used the FindAGrave template and we added a parameter to flag a mismatch of dates or names, or that it is not the same person:
 

{{FindAGrave|69423684|~~~~|Napoleon Bonaparte|false=date}}

{{FindAGrave|69423684|~~~~|Napoleon Bonaparte|false=name}}

{{FindAGrave|69423684|~~~~|Napoleon Bonaparte|false=notsameas}}

It's much better that we add this information about known errors to the WikiTree profile than have it stored outside WikiTree on Aleš's server....
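
To illustrate how an error checker could read such a flag from the profile text, here is a minimal Python sketch. Note that the `false=` parameter is only the proposal made in this comment, not an existing feature of the FindAGrave template, and the function names and the set of accepted values are assumptions based on the three examples above. The parser splits naively on `|`, so it does not handle nested templates or links containing pipes.

```python
# Values used in the three proposed examples (hypothetical, not an existing feature)
KNOWN_FLAGS = {"date", "name", "notsameas"}


def parse_template(text):
    """Split '{{Name|a|b|key=value}}' into (name, positional, named)."""
    inner = text.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    parts = inner[2:-2].split("|")
    name, positional, named = parts[0].strip(), [], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name, positional, named


def acknowledged_mismatch(text):
    """Return the proposed 'false' flag if the template carries a known one, else None."""
    name, _, named = parse_template(text)
    if name != "FindAGrave":
        return None
    flag = named.get("false")
    return flag if flag in KNOWN_FLAGS else None


if __name__ == "__main__":
    example = "{{FindAGrave|69423684|~~~~|Napoleon Bonaparte|false=date}}"
    print(acknowledged_mismatch(example))
```

A checker built this way could skip the matching error type for that profile while leaving the documented discrepancy visible to the next researcher.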

Magnus, I'm not concerned about the occasional need to mark a "false error" in the error report -- and I'd rather spend my time creating good content than on implementing an elaborate system to avoid having an item added to the report. I posted my comment because I was bothered by the implication that people who create new errors are bad people.

>> elaborate system to avoid having an item added to the report.

When doing genealogy you are always working with more or less certain things...

So it's the other way around: the suggested change is a way to document uncertainty in a structured way that both humans and machines can read....

WikiTree often lacks good tools for that, though we have some templates....

But marking everything that is unsure as a false error is what I call an anti-pattern, and counter-productive in my humble opinion.... We need to make it easier for the next researcher to understand that we have a problem or something uncertain...

As long as the total is going down new errors are not a problem.

Many will be fixed by the manager; even if the manager doesn't notice them, the best thing is that the manager is ACTIVE.

You can post a comment and more often than not they will then correct it. There is so much less work involved with new errors.

It’s the old errors, especially those locked behind public profiles that trouble me the most. I have to keep a spreadsheet log of my contacts and then go through the unresponsive manager process (although my experience of the UR process has been great so far).
+1 vote

Aleš, in one of your comments you asked if there was additional information from your servers that would be useful.

One thought is that if you (or Chris) could generate a list of random profiles from across Wikitree, then these could be analyzed to provide a statistical analysis of all the profiles in Wikitree.  Right now my analysis is just on the "error" profiles. 

For example, what percentage of profiles in Wikitree are unsourced; what percentage have clean bios, etc.  I would focus on big picture measures of quality. 

I'd need about 100 random profiles to comment on trends with about +/-10% accuracy.  400 profiles for +/- 5% accuracy [I would need help to evaluate this many profiles].  

Thoughts? 
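
The sample sizes quoted above match the standard margin-of-error formula for a proportion from a simple random sample, z * sqrt(p(1-p)/n). The Python sketch below assumes a 95% confidence level (z = 1.96) and the worst case p = 0.5; neither assumption is stated in the post, but together they reproduce the +/-10% and +/-5% figures.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion from n random profiles."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    for n in (100, 400):
        print(f"n={n}: +/-{margin_of_error(n):.1%}")
    for margin in (0.10, 0.05):
        print(f"+/-{margin:.0%} needs about {sample_size(margin)} profiles")
```

This holds only if the profiles really are drawn at random from all of WikiTree, which is why the list would need to come from the database rather than from the error report.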

by Paul Gierszewski G2G6 Mach 8 (88.8k points)


...