Wikitree statistics - June 2018

I have been tracking several statistics that together give an approximate picture of the quality of the WikiTree database. Here is a summary of the current figures:

Overall status:  17.5 M total profiles; 13.9 M or 80% are connected; 4.0 M or 23% have DNA links (from Wikitree info).

Profiles with known internal consistency issues:  154,000 or 0.9% of all profiles (based on Suggestions report data).

Sourcing:  about 11% with 3 or more original sources, 27% with 1-2 sources, 15% poorly sourced, 33% unsourced, and 14% Unavailable (Unlisted/Red/Orange privacy) (based on random sampling).
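As a quick cross-check, the percentages above follow from the published counts. The short Python sketch below recomputes them. The constant names are illustrative, and because the inputs are themselves rounded (to 0.1 M), the results are approximate. For example, 13.9 M of 17.5 M works out to 79.4%, which the summary rounds to about 80%.

```python
# Figures from the June 2018 summary (inputs are rounded as published).
TOTAL_PROFILES = 17.5e6
CONNECTED = 13.9e6
DNA_LINKED = 4.0e6
CONSISTENCY_ISSUES = 154_000

# Sourcing breakdown from random sampling, in percent of all profiles.
SOURCING = {
    "3+ original sources": 11,
    "1-2 sources": 27,
    "poorly sourced": 15,
    "unsourced": 33,
    "unavailable": 14,  # Unlisted/Red/Orange privacy
}


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"connected:          {share(CONNECTED, TOTAL_PROFILES):.1f}%")           # 79.4%
print(f"DNA links:          {share(DNA_LINKED, TOTAL_PROFILES):.1f}%")          # 22.9%
print(f"consistency issues: {share(CONSISTENCY_ISSUES, TOTAL_PROFILES):.2f}%")  # 0.88%
# The sampled categories should partition all profiles, i.e. sum to 100%.
print(f"sourcing total:     {sum(SOURCING.values())}%")                         # 100%
```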

Identified Duplicates:  about 11,700 (based on Suggestions report data).

In general, these statistics have either improved modestly since I last reported them in November, or are largely unchanged despite the roughly 1.5 M profiles added since then. Of particular note, the number of profiles with known consistency errors has dropped from 180,000 last October to 154,000 now.
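To put the drop in consistency errors in proportion, the sketch below computes the relative change. It also estimates the error share back then, assuming (as an approximation, not a reported figure) that the database held roughly 16.0 M profiles at the time, i.e. today's 17.5 M minus the ~1.5 M added since.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative values are decreases."""
    return 100.0 * (new - old) / old


ISSUES_BEFORE = 180_000   # October 2017
ISSUES_NOW = 154_000      # June 2018
PROFILES_BEFORE = 16.0e6  # assumption: 17.5 M minus ~1.5 M added since
PROFILES_NOW = 17.5e6

print(ISSUES_BEFORE - ISSUES_NOW)                                   # 26000
print(f"{percent_change(ISSUES_BEFORE, ISSUES_NOW):.1f}%")          # -14.4%
print(f"share then: {100 * ISSUES_BEFORE / PROFILES_BEFORE:.1f}%")  # 1.1%
print(f"share now:  {100 * ISSUES_NOW / PROFILES_NOW:.1f}%")        # 0.9%
```

So the error count fell by about 14% even while the database grew, which is why the share of affected profiles dropped from roughly 1.1% to 0.9%.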

A Free Space page with graphs, historical data, and the technical details is available at https://www.wikitree.com/wiki/Space:Wikitree_Statistics

asked in WikiTree Tech by Paul Gierszewski G2G6 Mach 2 (25.2k points)
edited by Paul Gierszewski
Very cool, Paul. One suggestion on the calculation of the sourcing stats: I think you should exclude the Unavailable profiles, because they say nothing about the level of sourcing and they make the other sourcing stats harder to interpret. For example, 48% of all profiles fall into the poorly sourced or unsourced categories, but the more meaningful stat is that 56% of the profiles for which sourcing info is available (48% out of the 86% that are not Unavailable) are poorly sourced or unsourced. It might also be interesting to have a separate stat for the % of profiles at each privacy level.
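The adjustment Chase describes is a renormalization: drop the Unavailable category and re-express each remaining category as a share of what is left (86%). A minimal Python sketch, using the sample percentages from the original post; the function name `exclude_category` is illustrative only.

```python
from __future__ import annotations

# Sourcing breakdown from the June 2018 sample, in percent of all profiles.
SOURCING = {
    "3+ original sources": 11,
    "1-2 sources": 27,
    "poorly sourced": 15,
    "unsourced": 33,
    "unavailable": 14,
}


def exclude_category(breakdown: dict[str, float], excluded: str) -> dict[str, float]:
    """Re-express the remaining categories as percentages of their own total."""
    remaining = {k: v for k, v in breakdown.items() if k != excluded}
    base = sum(remaining.values())  # 86 when "unavailable" (14%) is dropped
    return {k: 100.0 * v / base for k, v in remaining.items()}


available = exclude_category(SOURCING, "unavailable")
weak = available["poorly sourced"] + available["unsourced"]

for name, pct in available.items():
    print(f"{name:20s} {pct:5.1f}%")
print(f"poorly sourced or unsourced: {weak:.0f}%")  # 56% (versus 48% of all profiles)
```

The same function works for any category you want to set aside, so either presentation (all profiles, or only those whose sourcing can be seen) can be produced from one table.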
Chase, thanks for the suggestions. I included the "unavailable" profiles because they are counted in the 17.5 M profiles that WikiTree lists on its website. However, it is easy to correct for that if you prefer, as you have done.

I don't track the % of profiles at different privacy levels, as it is not directly a quality statistic, but I think it is available by other means if you are interested. From my sampling, 70-75% are open profiles.

2 Answers

Paul,

A somewhat related figure is the surprisingly low number of WikiTreers (365) who made a correction from the Suggestions report, according to the Status report (Statuses by User for week 20180527).

On the home page the number of participating genealogists is over 533,000. But how many are active?

There must be a way to remove all members who have not been active (at least one addition or edit) in the past year, or perhaps two, and to orphan and open all of their profiles. That would help reduce the number of edits and merges blocked by privacy settings and unresponsive managers.

Data Doctor Walt
answered by Walter Steesy G2G6 (8.4k points)
That could cause some problems. I recently proposed a couple of merges that I expected would go to the 30-day default because the other profile manager had not been active in well over a year. Imagine my surprise when she came on and approved the merges within 24 hours. Some of these "inactive members" are still around and might resent being removed and having their profiles orphaned.
I think the topic of what to do with inactive profile managers has a long history of discussion on the G2G forum, and probably is a separate topic.  However your point on the relatively small number of users actively looking at the Suggestions report may bear more thought.  The strength of any wiki should be the large number of active users, and the Suggestions report is a great tool that makes it easy to focus on what needs to be checked.  Are most users aware of the Suggestions report?  I don't know. Should we advertise it more?  If so, how?
I think one of the problems with WikiTree+ is not getting to the suggestions report itself, but customising it for your own needs. For example, I like to work on one or two particular errors, but I can't get the suggestions report into a form that lets me really work on them. So for now I use a workaround: I take the list from a past challenge and work my way through that. On the other hand, I'd like to filter out all the profiles that have been improved in the meantime. But how? Or am I the only one too stupid to figure out how the suggestions report works?
The weekly Data Doctor suggestions are organized by category and then by error number. I keep a list (in an MS Word document) of the ones I'm particularly interested in (gender in the 500s and format errors in the 800s), along with my "stock" comment to put on a profile when I make one of those corrections. That makes it easy to find them again each week.

For over a year I have also kept a weekly tally of those favorites (open profiles only), which lets me follow the progress that I, and others working on them, have made. More than 50,000 #603 errors have been corrected since October 1 of last year, thanks to the efforts of many Data Doctors.
Data Doctors and how they use the Suggestions report is one topic.  But I am curious whether the larger set of users are aware of the Suggestions report and how to use that information on their own watchlist.  Rather than relying on a small number of dedicated data doctors to correct 50,000 gender errors, how do we get 50,000 users to each correct the one on their watchlist?
The best place to start would be the weekly news feed that goes to everyone on the active list: it could include a short statement of what the Suggestions list is and where to find it under the "My WikiTree" tab.

If you go to the latest report, or to the suggestion report tab on the Data Doctor sheet, the suggestions are broken down into individual cells, with dates running across and error types running down. These are regenerated weekly to remove those that have been fixed.

There are also suggestion lists by state/province and for some countries. The lists marked "Challenge" and "WikiTree" are not updated. These can be reached from the upper left corner of the suggestions sheet or at THIS LINK.

There is also a tab called "Difference" that shows the change from the previous week's suggestion report, and an INFO tab with shortcuts to tasks, categories, stickers, and even tools to use. Feel free to add items that you find handy, because that may make them easier for other people to find too.
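A week-over-week comparison like the one the Difference tab shows, and the filtering asked about earlier in the thread (keep only your chosen error numbers and drop profiles that have since been fixed), comes down to a pair of set differences. Below is a minimal Python sketch under stated assumptions: it is not how the Suggestions sheet is actually built, the data layout and the function name `weekly_difference` are hypothetical, and a profile could presumably also drop out of a report for reasons other than being corrected.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical layout: one weekly report as a set of (profile ID, error number)
# pairs. The real Suggestions sheet may be organized differently.
Report = Set[Tuple[str, int]]


def weekly_difference(
    previous: Report,
    current: Report,
    errors: Optional[Iterable[int]] = None,
) -> Tuple[Report, Report]:
    """Compare two weekly reports.

    Returns (gone, new): pairs flagged last week but not this week (most
    likely corrected), and pairs flagged this week but not last week.
    If `errors` is given, only those error numbers are compared.
    """
    if errors is not None:
        wanted = set(errors)
        previous = {pair for pair in previous if pair[1] in wanted}
        current = {pair for pair in current if pair[1] in wanted}
    return previous - current, current - previous


# Hypothetical profile IDs; 603 is the error number mentioned in the thread,
# 101 is a made-up one.
last_week = {("Doe-1", 603), ("Doe-2", 603), ("Roe-7", 101)}
this_week = {("Doe-2", 603), ("Roe-7", 101), ("Poe-3", 603)}

gone, new = weekly_difference(last_week, this_week, errors=[603])
print(gone)  # {('Doe-1', 603)}
print(new)   # {('Poe-3', 603)}
```

Run each week against a personal list of favorite error numbers, this would show which profiles were cleared and which newly appeared, without having to work from a stale challenge list.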

Suggestion Sheet

answered by Steven Tibbetts G2G6 Pilot (155k points)
