Wikitree statistics - Nov 2019

+81 votes
614 views

I have been tracking several statistics that approximately represent the quality of the Wikitree database.  The following is a summary of the current figures as of Nov 2019:

Overall status:  21.8 M total profiles; 17.9 M or 82% are connected; 6.0 M or 27% have DNA links (from Wikitree info).

Profiles with known internal consistency issues:  113,000 or 0.5% of all profiles (based on Suggestions report data).

Sourcing:  about 12% with 3 or more sources, 35% with 1-2 sources, 15% poorly sourced, 25% unsourced, and 13% Unavailable (Unlisted/Red/Orange privacy) (based on random sampling).

Duplicates:  about 1-9% (based on Wikitree Match suggestions and random sampling).
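The percentages in the summary are simple ratios of the rounded counts. Here is a minimal Python sketch that recomputes them from the figures above (the variable names are illustrative only). Because the inputs are themselves rounded, the last digit can differ slightly from the quoted value, e.g. 27.5% versus 27% for DNA links.

```python
# Counts quoted in the Nov 2019 summary (rounded figures).
total_profiles = 21.8e6
connected = 17.9e6
dna_linked = 6.0e6
consistency_issues = 113_000


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"connected:          {pct(connected, total_profiles):.1f}%")
print(f"DNA links:          {pct(dna_linked, total_profiles):.1f}%")
print(f"consistency issues: {pct(consistency_issues, total_profiles):.2f}%")
```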

Compared with June 2019, when I last reported on these statistics, there are 1.2 M more profiles.  The number of profiles with known consistency errors has dropped from 117,000 in June to 113,000 now.  The fraction of profiles with 1 or more sources is about 47%, about the same as in June within the accuracy of this estimate.  The estimate of duplicates has not been updated since Jan 2019.
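The 47% figure is the sum of the first two sourcing categories, and whether June and November count as "about the same" depends on the sampling error. Here is a minimal Python sketch of both calculations. The normal-approximation margin of error and the sample size of 400 are assumptions for illustration only, since the post does not say how many profiles were sampled.

```python
import math

# Sourcing breakdown from the Nov 2019 random sample, in percent.
sourcing = {
    "3+ sources": 12,
    "1-2 sources": 35,
    "poorly sourced": 15,
    "unsourced": 25,
    "unavailable (privacy)": 13,
}

# The categories partition the sample, so they should sum to 100.
assert sum(sourcing.values()) == 100

# "1 or more sources" combines the first two categories.
sourced_pct = sourcing["3+ sources"] + sourcing["1-2 sources"]
print(f"1 or more sources: {sourced_pct}%")  # 47%


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sampled proportion p (0-1) from n profiles.

    Uses the normal approximation to the binomial distribution.
    """
    return z * math.sqrt(p * (1 - p) / n)


# HYPOTHETICAL sample size: the post does not state how many profiles were sampled.
n_sampled = 400
moe = margin_of_error(sourced_pct / 100, n_sampled)
print(f"47% +/- {100 * moe:.1f} points (assuming n = {n_sampled})")
```

With this approximation the margin shrinks roughly as 1/√n, so quadrupling the sample size halves it.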

A Free Space page with graphs, historical data and technical details is available here:

https://www.wikitree.com/wiki/Space:Wikitree_Statistics

in WikiTree Tech by Paul Gierszewski G2G6 Mach 8 (89.3k points)

7 Answers

+24 votes
Thank YOU, Paul. This is the sort of information we can all appreciate.
by Susan Smith G2G6 Pilot (657k points)
+20 votes
Very interesting to see the overall view.  Thank you for doing this.
by Karen Hoy G2G6 Mach 4 (43.0k points)
+19 votes
Paul, thank you for doing this.  The work, especially the sampling, is time-consuming.

The percentage of unsourced profiles declined from 40% in April 2018 to 25% in 18 months.  Even allowing for statistical error, this is astounding!

I think you should post the chart of consistency errors, as that should give WikiTreers a good sense of the impact of their efforts to manage such suggestions.  Really positive over time.
by Cindy Cooper G2G6 Pilot (329k points)
Hmmm.  I tried posting a chart to this G2G thread using the add-picture process, but it did not seem to actually load.  Nothing shows in the preview box.
Paul, thank you for trying.  I know a picture can load because I saw one recently, but I have not done it myself.
+12 votes
Hi Paul,

First, I want to thank you for putting this information together. I know it can be time-consuming: even with report data available, you still have to do the comparisons, etc.

Thank you also for the FSP that gives the visual and numerical details. That takes time to put together too, and all of it is time you have volunteered freely. It is appreciated!

It's amazing how many more profiles have been added in such a short time! Wow :)

I was curious whether national reports could be pulled from this information, which we could take back to our respective projects so we can say "this is where our focus should predominantly be". On initial inspection, I would hazard that sourcing is still the primary issue. If profiles were properly sourced, this would have a much more positive impact on other issues, such as the database errors, as it would be easier to fix them.

Thanks again. You're awesome!

Raewyn.
by Raewyn Vincent G2G6 Mach 7 (77.8k points)
I can't pull out national reports from the data that I presently have; that would need to be specifically tracked during the sampling.

I can see an improvement in sourcing with time in the data.  For more recently added profiles (past few years), the fraction with one or more sources increases to over 60%.
+12 votes
So glad I saw this post on the G2G feed. When I work as a Data Doctor or source profiles, I sometimes feel like there are a lot of errors, duplicates, etc. Now I see that I had assumed there were many more problems than there actually are: I had estimated an error rate of at least 25 to 30% across the 21.8 million profiles.

Thank you for your report and time, Paul!
by Louann Halpin G2G6 Mach 7 (71.2k points)
+9 votes
Thank you Paul and well done to all wikitreers that have helped improve the quality of all the profiles!

In two years (since Nov 2017) the number of profiles has increased 39.9% (15.58 M to 21.8 M), while at the same time unsourced profiles have decreased from 48% to 25% AND sourced profiles have increased from 22% to 47%! (Poorly sourced remained the same at 15%.)

This is a tribute to all the hard work everyone has done. Keep up the great work!
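Because the unsourced and sourced figures are shares of a growing total, converting them back to approximate counts shows the absolute change as well. Here is a minimal Python sketch using only the numbers quoted in this answer. The resulting counts are rough, since the shares come from random sampling.

```python
# Profile counts quoted for Nov 2017 and Nov 2019, in millions.
profiles_2017 = 15.58
profiles_2019 = 21.8

growth_pct = 100 * (profiles_2019 - profiles_2017) / profiles_2017
print(f"growth: {growth_pct:.1f}%")

# Shares (percent) of unsourced and sourced profiles in each year.
unsourced = {"2017": 48, "2019": 25}
sourced = {"2017": 22, "2019": 47}

# Converting shares to approximate absolute counts (millions) shows whether
# the number of unsourced profiles fell even though the database grew.
for year, total in (("2017", profiles_2017), ("2019", profiles_2019)):
    print(
        year,
        f"unsourced ~{total * unsourced[year] / 100:.1f} M,",
        f"sourced ~{total * sourced[year] / 100:.1f} M",
    )
```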
by John Sigh G2G6 Mach 1 (19.1k points)
+6 votes
This is really great information.  One thing I would really like to see added, which might not be too much trouble, is the number of profiles without dates (I think that can be taken directly from the Suggestions report).

Profiles without dates really impact the search and matching capabilities, and I think they can really discourage new users.  Plus, since you can't create profiles like that anymore, we should be able to see that number go down. To me, it's important enough that it should be highlighted.
by M Cole G2G6 Mach 8 (89.6k points)
I've been thinking along these lines also.  These profiles certainly detract from Wikitree quality, and there are quite a number of them.  And yes, this number is available from the Suggestions report (#131-134).  However, it has only been available since mid-2018, so it does not yet have much of a record.  The number is holding at around 500,000 profiles, although as a percentage it is dropping slowly, from around 2.7% to the current 2.3% of all profiles.
Interesting.  It makes sense that the number hasn't changed much.  It seems like a lot of the cleanup work is driven by challenges and "thons."  If you're going for quantity, a profile without dates is likely the last one you'll choose to work on.

I should provide more details.  During 2018, the total increased.  But over 2019, the total has decreased about 600 profiles per week on average.  So clearly people are working hard on these profiles. It just takes time to make a big dent when it starts from a large number.  I'll probably add this to the Statistics page on its next major update.
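To see why it "just takes time", here is a back-of-the-envelope Python sketch using the figures in this reply and the one above. The constant-rate projection is an assumption for illustration only, not a forecast.

```python
# Figures quoted for profiles without dates (Suggestions report #131-134).
dateless = 500_000          # approximate current count
total_profiles = 21.8e6     # total profiles, Nov 2019
weekly_reduction = 600      # average net decrease per week during 2019

share_pct = 100 * dateless / total_profiles
print(f"share of all profiles: {share_pct:.1f}%")

# ASSUMPTION: the 2019 average net decrease stays constant.
# Real progress will differ.
weeks_to_clear = dateless / weekly_reduction
print(f"~{weeks_to_clear:.0f} weeks (~{weeks_to_clear / 52:.0f} years) at this rate")
```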

There are fill-rate statistics for the birth date field going back to 2016:

http://wikitree.sdms.si/default.htm?report=stat1&dataID=21&Year=2

It has been going down since GDPR.
