Since 2017, I have been tracking several statistics that approximately represent the quality of the Wikitree database. The following is a summary as of Nov 2020:
Overall status: 25 M total profiles; 20.8 M or 83% are connected; 7.4 M or 30% have DNA links (from Wikitree info).
Profiles with known consistency issues: 104,000 or 0.4% of all profiles (based on Suggestions report data).
Sourcing: about 18% with 3 or more sources, 34% with 1-2 sources, 12% poorly sourced, 26% unsourced, and 10% unavailable (Unlisted/Red/Orange privacy) (based on random sampling).
Undated profiles: 506,000 (based on Suggestions report). These are profiles with no date information, and often no other information either.
Identified Duplicates: about 1-8% (based on Wikitree Match suggestions and random sampling).
Compared with Jun 2020, when I last reported on these statistics, there are 1.4 M more profiles. The fraction of profiles with 1 or more sources is about 52% (18% with 3 or more plus 34% with 1-2), up from June.
The number of profiles with known consistency errors has dropped from 107,000 to 104,000. The number of undated profiles has dropped from 517,000 to 506,000. The estimate of % duplicates is unchanged.
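As a rough cross-check, the percentages and changes quoted above can be recomputed from the rounded counts. The sketch below is only illustrative: the variable and function names are made up for this example, and it works from the rounded figures in this post, not from the underlying Wikitree data.

```python
# Rounded figures quoted in this post (Nov 2020). All names here are
# hypothetical and chosen only for this illustration.
TOTAL_PROFILES = 25_000_000

counts = {
    "connected": 20_800_000,
    "dna_links": 7_400_000,
    "consistency_issues": 104_000,
    "undated": 506_000,
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all profiles in each category, rounded to one decimal place.
shares = {name: round(percent(n, TOTAL_PROFILES), 1) for name, n in counts.items()}

# Sourcing categories from random sampling, in percent of all profiles.
sourcing = {
    "3 or more sources": 18,
    "1-2 sources": 34,
    "poorly sourced": 12,
    "unsourced": 26,
    "unavailable": 10,
}

# Profiles with 1 or more sources = the first two categories combined.
sourced_share = sourcing["3 or more sources"] + sourcing["1-2 sources"]

# Change from the Jun 2020 report to Nov 2020 (negative means a drop).
changes = {
    "consistency_issues": 104_000 - 107_000,
    "undated": 506_000 - 517_000,
}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share}% of all profiles")
    print(f"sourcing categories sum to {sum(sourcing.values())}%")
    print(f"1 or more sources: {sourced_share}%")
    for name, delta in changes.items():
        print(f"{name}: change of {delta:+,} since Jun 2020")
```

Because the inputs are rounded, the results only confirm that the quoted percentages are consistent with the quoted counts; they say nothing about the sampling error of the sourcing and duplicate estimates.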
A Free Space page is available with graphs, historical data and technical details.