Wikitree Statistics - Nov 2020

+32 votes
411 views
Since 2017, I have been tracking several statistics that approximately represent the quality of the Wikitree database.  The following is a summary as of Nov 2020:

Overall status:  25 M total profiles; 20.8 M or 83% are connected; 7.4 M or 30% have DNA links (from Wikitree info).

Profiles with known consistency issues:  104,000 or 0.4% of all profiles (based on Suggestions report data).

Sourcing:  about 18% with 3 or more sources, 34% with 1-2 sources, 12% poorly sourced, 26% unsourced, and 10% unavailable (Unlisted/Red/Orange privacy) (based on random sampling).
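
For anyone wondering how much uncertainty random sampling carries, here is a minimal sketch of the usual normal-approximation margin of error for a sample proportion. The sample size of 400 is purely hypothetical, since the summary does not say how many profiles were sampled; the shares are the ones listed above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)

# Sourcing shares from the summary (fractions of sampled profiles).
shares = {
    "3 or more sources": 0.18,
    "1-2 sources": 0.34,
    "poorly sourced": 0.12,
    "unsourced": 0.26,
    "unavailable": 0.10,
}

# Assumption: the real sample size is not stated in the post.
HYPOTHETICAL_SAMPLE_SIZE = 400

for label, p in shares.items():
    moe = margin_of_error(p, HYPOTHETICAL_SAMPLE_SIZE)
    print(f"{label}: {p:.0%} +/- {moe:.1%}")
```

With a different sample size the intervals shrink or widen roughly with the square root of n, so quadrupling the sample would about halve each margin.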

Undated profiles:  506,000 (based on Suggestions report).  These are profiles with no date information, and often no other information.

Identified Duplicates:  about 1-8% (based on Wikitree Match suggestions and random sampling).

Compared with Jun 2020 when I last reported on these statistics, there are 1.4 M more profiles.  The fraction of profiles with 1 or more sources is about 52%, up from June.

The number of profiles with known consistency errors has dropped from 107,000 to 104,000.  The number of undated profiles has dropped from 517,000 to 506,000.  The estimate of % duplicates is unchanged.
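
The changes since June can be put in relative terms with a few lines of arithmetic. This is only a rough sketch: the 25 M total is a rounded figure, so the June total derived from it is approximate too.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

# Figures quoted in the post (the 25 M total is rounded).
total_now = 25_000_000
added_since_june = 1_400_000
total_june = total_now - added_since_june  # about 23.6 M, derived

growth = pct_change(total_june, total_now)
consistency = pct_change(107_000, 104_000)
undated = pct_change(517_000, 506_000)

print(f"Total profiles:       {growth:+.1f}%")
print(f"Consistency issues:   {consistency:+.1f}%")
print(f"Undated profiles:     {undated:+.1f}%")
```

So both problem counts fell by a few percent while the tree itself grew by several percent, which means their share of all profiles dropped slightly faster than the raw counts suggest.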

A Free Space page is available with graphs, historical data and technical details.
WikiTree profile: Space:Wikitree_Statistics
in The Tree House by Paul Gierszewski G2G6 Mach 8 (89.3k points)

Here is a graph showing Wikitree growth with time.

[Graph: Wikitree statistics]

Thanks for doing this, Paul.  This sort of tracking is important.

2 Answers

+13 votes

Yikes, the number of profiles with no sources is insane!

I knew it was bad, but not that bad.

Go sourcerers ;-)

by Anonymous Grand'Maison G2G6 Mach 2 (21.5k points)

Yes, but it was 40%, so overall it is really getting better :-)

Yes, optimistically, everything is going in the right direction. The tree is growing and getting higher quality at the same time. Totally unsourced profiles are in the minority.
I'd be willing to wager most of those unsourced profiles are post-1800, people entering trees that came from their grandma's notes. Those are probably pretty accurate, but of course not proven to be accurate.

Another group, and this would be a minority, are profiles that have sources in prose form, not as links or items in the references section. So the algorithm doesn't detect them as a source.

On the flip side, there are probably a lot of older ones that just have a vague reference to a GEDCOM import from Ancestry. I wouldn't count that as a source, but I'm not sure what the computer algorithm does.
Okay, reading the description, it looks like the sourcing is done manually on randomly selected profiles. So I take back most of what I said. In addition, Ancestry links fall into the "poorly sourced" category.

https://www.wikitree.com/wiki/Space:Wikitree_Statistics#Wikitree_Profile_Sourcing
As Rob notes, the analysis is manual so I do look at the nature of the sourcing.  I consider a profile where information is clearly stated as entered by a child or grandchild as having one source.  There were a few of those in this sample analysis, but it was not common.

I have not tracked the results by profile date, so have no comment on which century is the least sourced.

Paul,

If you are comfortable with doing queries via WikiTree+ I can help you with crafting some tests to do some random sampling using Bio Check, after the initial flurry dies down.

See the evolving help page below. It might be of interest.

https://www.wikitree.com/wiki/Space:BioCheckHelp

Kay, It would be great if we can provide a reasonable automated estimate of profile sourcing.  I had just noted the Sourcing feature in BioCheck, following up on its mention in today's weekly Wikitree Family News.  I am not sure the technology is quite up to the task yet, but would be happy to try it out.  I'll email you directly to discuss.
+10 votes
We just passed 25 million profiles and 750,000 members.

The 33 profiles per member rule is still holding. That's been true for several years.
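
A quick check of that ratio, using the two rounded milestone figures above (both are lower bounds, since each was "just passed"):

```python
profiles = 25_000_000  # "just passed 25 million"
members = 750_000      # "just passed 750,000"

ratio = profiles / members
print(f"{ratio:.1f} profiles per member")
```

That works out to a little over 33, consistent with the rule.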
by Rob Neff G2G6 Pilot (136k points)


...