WikiTree Statistics - Nov 2018

+54 votes
432 views

I have been tracking several statistics that approximately represent the quality of the WikiTree database.  The last update was posted in G2G in June 2018.  The following is a summary of the current information:

Overall status:  18.8 M total profiles; 15.1 M or 80% are connected; 4.6 M or 25% have DNA links (from Wikitree info).

Profiles with known internal consistency issues:  133,000 or 0.7% of all profiles (based on Suggestions report data).

Sourcing:  about 11% with 3 or more original sources, 32% with 1-2 sources, 13% poorly sourced, 29% unsourced, and 15% Unavailable (Unlisted/Red/Orange privacy) (based on random sampling).

Identified Duplicates:  about 8,805 or 0.05% (based on Suggestions report data).

Compared with June 2018 when I last reported on these statistics, there are 1.3 M more profiles.  Of particular note, the number of profiles with known consistency errors has dropped from 154,000 in June to 133,000 now.   Also, the fraction of profiles with 1 or more sources has increased from 38% to 43%, an increase which may be more than just sampling uncertainty (+/-5%).
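For readers wondering where a figure like +/-5% can come from, here is a minimal sketch of the usual normal-approximation margin of error for a sampled proportion. The sample size used below (400) is purely an assumption for illustration, since the post does not state how many profiles were sampled, and the function name is hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from a random sample of size n.
    z = 1.96 corresponds to roughly 95% confidence."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# ASSUMPTION: hypothetical sample size, not taken from the post.
ASSUMED_SAMPLE_SIZE = 400

# Share of profiles with 1 or more sources, as reported in the post.
sourced_now = 0.43

moe = margin_of_error(sourced_now, ASSUMED_SAMPLE_SIZE)
print(f"{sourced_now:.0%} +/- {moe * 100:.1f} points (n={ASSUMED_SAMPLE_SIZE})")
```

Under that assumed sample size the margin comes out a little under 5 percentage points. A larger sample would narrow it, and a smaller one would widen it.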

A Free Space page with graphs, historical data and technical details is available here: https://www.wikitree.com/wiki/Space:Wikitree_Statistics

in WikiTree Tech by Paul Gierszewski G2G6 Mach 8 (87.9k points)
retagged by Paul Gierszewski
Very cool to see these stats. Please keep us posted!
Thanks so much Paul.  I get such a different perspective working on suggestions that it's nice to see these statistics.
Thank you, Paul.  Very interesting statistics.  I have been "sourcing" my heart out - maybe others are too!   Progress is good.  Go WikiTreers.

-NGP
A side note.  If I just look at profiles added since Jan 2017, the sourced fraction increases from 43% to 61%.  So we seem to be doing better now during profile creation.
Now that is good news!
Great stuff, Paul! Thank you so much for doing all this work and sharing it.
How do you qualify poorly sourced?
Poorly sourced is a link to an Ancestry tree or another website, or a vague source description.  The authors have provided something, but it is not directly useful.
OK, thanks!
I would not be surprised if there were a correlation.  One could possibly test this by counting the Suggestions for a random sample of profiles and seeing whether the number of Suggestions correlates (inversely) with the degree of sourcing.
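A rough sketch of how such a check might look, assuming each sampled profile is given a sourcing level (0 = unsourced, 1 = poorly sourced, 2 = 1-2 sources, 3 = 3 or more sources) and a count of its Suggestions. The numeric coding, the function name, and the sample values below are all hypothetical placeholders, not real WikiTree data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL placeholder values for a handful of sampled profiles.
# Replace these with levels and counts recorded from a real random sample.
sourcing_level = [0, 0, 1, 1, 2, 2, 3, 3]
suggestion_count = [3, 2, 2, 1, 1, 1, 0, 0]

r = pearson(sourcing_level, suggestion_count)
print(f"correlation between sourcing level and Suggestions: {r:.2f}")
```

A clearly negative value on a real sample would support the idea that better-sourced profiles carry fewer Suggestions. With a small sample the result would only be suggestive, and because the sourcing levels are ordered categories, a rank-based measure may be a better fit than Pearson.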

6 Answers

+25 votes
 
Best answer
Nice job, Paul. Interesting statistics. A lot of duplicates. Sometimes the older profiles have PMs who are no longer active, so they never get merged. Good news about the 154,000 consistency errors dropping down to 133,000 now.
by Dorothy Barry G2G Astronaut (2.7m points)
selected by Susan Laursen
Actually, I was surprised at how low the duplicate numbers are in proportion to the total numbers. The consistency error and unsourced numbers are much higher. The arborists must be doing a terrific job at merging duplicates to keep those numbers so low.
I suspect that the duplicate profile count is an underestimate, but this measure drawn from the Suggestions report is presently the only reliable, repeatable basis for an estimate that I have on this topic.
+18 votes

This is so awesome, Paul. You've put together the sort of thing I was looking for with my WikiTree Dashboard suggestion. I'm only sorry I missed your earlier messages about it.

by Greg Slade G2G6 Pilot (664k points)
+17 votes
Thanks, Paul. I’ve been waiting on the update. I have had your stats page on my Nav homepage since the last update.
by Pip Sheppard G2G Astronaut (2.7m points)
+9 votes
Thank you, Paul.  It's especially nice to see errors as a percent of profiles decreasing, as well as possible duplicates and unsourced profiles.  It makes all the Data Doctoring and Source-a-thoning feel like we are making real progress.
by Cindy Cooper G2G6 Pilot (322k points)
+7 votes
Paul - this is great information.  I've always wondered how things were going.  Thank you for your work!!

Karen
by Karen Hoy G2G6 Mach 4 (41.4k points)
+7 votes
Thank you for sharing. It is always nice to have a meter to look at to have some idea of where we are.
by SJ Baty G2G Astronaut (1.2m points)


...