Wikitree Statistics
Last updated 17 Nov 2024 - added Missing Location results
This page is for an ongoing summary of Wikitree quality indicators.
General statistics on profiles within Wikitree can be found at: https://plus.wikitree.com/default.htm?report=stat1. Under "Database dump", select the type of statistics of interest and click "Get table".
Overall Wikitree Status
The following table of data shows the growth of Wikitree over time in terms of total number of profiles, number of connected profiles, and number of profiles with DNA test connections. These numbers are as reported by Wikitree.
Technical notes:
- Prior to about Feb 2016 the total number was 8% overstated since it included merged profiles.
- The drop in the number of DNA-linked profiles in May 2018 was due to changes made to comply with EU privacy rules.
- Some early values are monthly averages.
Date | Total Profiles | Connected Profiles | Profiles with DNA | |
14 Nov 2024 | 39,973,814 | 34,689,761 | 14,041,855 | |
15 Nov 2023 | 36,210,475 | 31,230,244 | 12,415,008 | |
10 Jun 2023 | 34,651,302 | 29,747,035 | 11,700,445 | |
8 Nov 2022 | 32,353,319 | 27,594,516 | 10,664,900 | |
7 Jun 2022 | 30,774,171 | 26,127,434 | 9,956,897 | |
8 Nov 2021 | 28,620,028 | 24,127,642 | 8,958,243 | |
6 Jun 2021 | 27,138,283 | 23,744,052 | 8,274,116 | |
3 Nov 2020 | 24,996,828 | 20,803,351 | 7,397,210 | |
9 Jun 2020 | 23,596,972 | 19,522,130 | 6,742,871 | |
1 Jan 2020 | 22,210,087 | 18,268,598 | 6,173,368 | |
6 Nov 2019 | 21,798,127 | 17,868,106 | 5,982,205 | |
30 Jul 2019 | 21,011,879 | 17,155,386 | 5,678,443 | |
18 Jun 2019 | 20,645,245 | 16,800,932 | 5,524,106 | |
27 Mar 2019 | 20,005,060 | 16,224,171 | 5,220,156 | |
8 Jan 2019 | 19,308,041 | 15,575,517 | 4,866,658 | |
6 Nov 2018 | 18,779,693 | 15,076,525 | 4,629,120 | |
1 Oct 2018 | 18,496,225 | 14,804,060 | 4,499,992 | |
28 May 2018 | 17,482,356 | 13,852,666 | 4,004,584 | |
16 May 2018 | 17,437,255 | 13,802,722 | 4,270,056 | |
8 May 2018 | 17,372,228 | 13,738,055 | 4,235,537 | |
8 Apr 2018 | 17,105,544 | 13,497,437 | 4,088,486 | |
28 Jan 2018 | 16,213,128 | 12,860,615 | ||
29 Oct 2017 | 15,581,091 | 12,069,345 | 3,200,000 | |
9 Jul 2017 | 14,579,940 | 11,220,333 | ||
25 Apr 2017 | 13,902,843 | 10,642,591 | ||
16 Apr 2017 | 13,881,513 | 10,579,228 | ||
29 Jan 2017 | 13,157,338 | 10,023,513 | ||
24 Jul 2016 | 11,831,219 | 8,891,739 | ||
8 Feb 2016 | 10,629,448 | 7,869,136 | ||
1 Jan 2016 | 11,378,699 | 7,640,805 | ||
1 Jul 2015 | 10,259,275 | |||
1 Jan 2015 | 8,945,881 | |||
1 Sep 2014 | 8,094,866 | |||
1 Jan 2014 | 6,567,960 | |||
1 Jul 2013 | 5,489,983 | |||
1 Jan 2013 | 4,502,821 | |||
12 Jan 2012 | 3,000,000 | |||
23 Jul 2011 | 2,000,000 | |||
20 Dec 2010 | 700,000 | |||
31 Aug 2010 | 200,000 | |||
15 Nov 2009 | 50,000 | |||
18 Jun 2009 | 20,000 | |||
31 Jan 2009 | 50 | |||
1 Nov 2008 | 1 |
Wikitree Profile Accuracy
Ideally we would have a measure of correct profiles. Possibly in the future there will be some stamp of approval that indicates a profile has been reviewed and is considered accurate. In the meantime, we can monitor the number of profiles known to be incorrect.
In particular, the Database Suggestions report runs a number of checks on profiles. Many of these checks relate to missing information (e.g. gender), unusual information (e.g. spelling of names) or formatting issues. However, a number of these checks identify information in a profile summary that is not physically possible, or at least not internally consistent: for example, a parent being born after their child. These identified consistency errors provide a measure of the known inaccuracy of the listed profiles.
Technical notes:
- Not all identified consistency errors are real, but in my experience most are.
- A profile can have more than one consistency error. No correction is made for this.
- The following Suggestions Report items are considered consistency errors: 101, 102, 103, 104, 111, 112, 201, 202, 203, 205, 206, 207, 208, 209, 210, 301, 303, 305, 306, 307, 308, 309, 310, 401, 403, 404, 405, 406, 407, 408, 412, 413, 414, 415, 417, 418, 606, 636, 666, 911, 912. [slight changes over time]
- Total number of suggestions changes as new checks are added, and is not tracked here as many suggestions are not profile errors.
Date | Total Profiles | Consistency Errors | |
14 Nov 2024 | 39,973,814 | 90,507 | |
15 Nov 2023 | 36,210,475 | 101,158 | |
8 Nov 2022 | 32,353,319 | 96,533 | |
7 Jun 2022 | 30,774,171 | 96,719 | |
8 Nov 2021 | 28,620,028 | 99,261 | |
6 Jun 2021 | 27,138,283 | 99,927 | |
1 Nov 2020 | 24,986,750 | 103,686 | |
9 Jun 2020 | 23,596,972 | 106,938 | |
6 Nov 2019 | 21,798,127 | 112,818 | |
20 Jun 2019 | 20,645,245 | 117,033 | |
6 Nov 2018 | 18,779,693 | 132,604 | |
29 Jul 2018 | 17,994,278 | 147,578 | |
28 May 2018 | 17,482,356 | 154,059 | |
16 May 2018 | 17,437,255 | 154,007 | |
8 Apr 2018 | 17,105,544 | 161,762 | |
25 Feb 2018 | 16,755,359 | 164,800 | |
29 Oct 2017 | 15,581,091 | 180,122 | |
9 Jul 2017 | 14,632,639 | 198,757 | |
16 Apr 2017 | 13,881,513 | 204,923 | |
24 Jul 2016 | 11,765,939 | 258,754 | |
29 May 2016 | 11,442,341 | 276,549 |
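Because the total number of profiles grows over time, the error counts in the table above can also be read as a share of all profiles. The following is a minimal sketch of that arithmetic using three rows copied from the table; the function name `error_rate_percent` is only illustrative.

```python
# Rows copied from the table above: (date, total profiles, consistency errors).
rows = [
    ("29 May 2016", 11_442_341, 276_549),
    ("6 Nov 2019", 21_798_127, 112_818),
    ("14 Nov 2024", 39_973_814, 90_507),
]


def error_rate_percent(total_profiles: int, consistency_errors: int) -> float:
    """Consistency errors as a percentage of total profiles."""
    return 100.0 * consistency_errors / total_profiles


for date, total, errors in rows:
    print(f"{date}: {error_rate_percent(total, errors):.2f}% of profiles")
```

Since a profile can have more than one consistency error, this ratio is an upper bound on the share of affected profiles rather than an exact figure.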
The following chart shows how the number of known consistency errors has decreased over time.
Wikitree Profile Sourcing
A good Wikitree profile is well sourced. There are currently no counts available for the number of profiles with sources, so sourcing is estimated here by analyzing a random sample of profiles. Profiles can be randomly sampled based on their Wikitree number (not their Wikitree ID). Given the size of Wikitree, a sample of about 300 profiles is used to get useful accuracy (roughly +/-5%) without posing an excessive manual analysis burden. (Technical note: merged duplicate profiles will be oversampled.)
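As a rough check on the accuracy quoted above, the sampling uncertainty of an estimated percentage can be approximated with the usual normal-approximation margin of error for a proportion. This is a sketch under assumptions: a simple random sample, a 95% level (z = 1.96), and the worst case p = 0.5. The page does not state how its +/-5% figure was derived.

```python
from math import sqrt


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate half-width of a confidence interval for a sample proportion.

    n: number of profiles sampled
    p: assumed true proportion (0.5 is the worst case, giving the widest margin)
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    return z * sqrt(p * (1.0 - p) / n)


# Worst-case margin for a sample of about 300 profiles.
print(f"n=300: +/-{100 * margin_of_error(300):.1f} percentage points")
```

For a sample of 300 this comes out a little under 6 percentage points in the worst case, and smaller for categories well away from 50%, which is in line with the "roughly +/-5%" description.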
Profiles are randomly sampled and assigned to the following categories:
- 3 or more sources, where sources are likely original records or books.
- 1 or 2 sources
- Poorly sourced, such as a link to an Ancestry tree or another website, or vague source description
- Unsourced
- Unavailable for analysis (Unlisted, Red or Orange privacy)
The results of the analyses to date are listed in the table below.
Date | Nbr Sampled | 3+ Sources | 1-2 Sources | Poorly Sourced | Unsourced | Unavailable | |
8 Nov 2024 | 352 | 27% | 39% | 10% | 14% | 10% | |
15 Nov 2023 | 321 | 22% | 36% | 12% | 22% | 8% | |
5 Nov 2022 | 319 | 18% | 37% | 10% | 24% | 11% | |
5 Jun 2022 | 309 | 15% | 41% | 11% | 22% | 11% | |
8 Nov 2021 | 322 | 17% | 34% | 11% | 25% | 13% | |
6 Jun 2021 | 323 | 18% | 35% | 10% | 26% | 11% | |
3 Nov 2020 | 322 | 18% | 34% | 12% | 26% | 10% | |
9 Jun 2020 | 321 | 13% | 33% | 14% | 27% | 13% | |
6 Nov 2019 | 314 | 12% | 35% | 15% | 25% | 13% | |
20 Jun 2019 | 316 | 15% | 34% | 16% | 20% | 15% | |
6 Nov 2018 | 316 | 11% | 32% | 13% | 29% | 15% | |
28 May 2018 | 302 | 11% | 27% | 15% | 33% | 14% | |
8 Apr 2018 | 284 | 12% | 26% | 12% | 40% | 11% |
The following chart shows the percentage breakdown by sourcing of the profiles at various times. The small kinks in the results with time are likely due to sampling uncertainty.
Profile Sourcing Estimate using BioCheck
I also report sourcing results from Kay Knight's BioCheck app (now v1.7.14). This useful app can be run in random profile mode to estimate sourcing on up to 5000 profiles, or over 10x more than is practical using my manual process above. It credits some possible sources that I do not include in my manual source count. Effectively, BioCheck's reported "Sourced Profiles" count is roughly analogous to my "Sourced" plus "Poorly Sourced" count. BioCheck results are listed below.
Date | Total Profiles | Bio Not Open | Sourced | Uncertain | Marked Unsourced | |
10 Nov 2024 | 5000 | 13.4% | 72.0% | 12.0% | 2.6% | |
17 Nov 2023 | 5000 | 14.1% | 70.4% | 12.4% | 3.1% | |
8 Jun 2021 | 3580 | 16% | 64% | 16% | 4% |
Duplicate Profiles
For Wikitree to meet its goal of "One World Tree", there should be as few duplicate profiles as possible.
One measure is the number of pending merge requests. Not all of these are true duplicates, and the list is unlikely to include every duplicate. As of Nov 2024, the pending merge list has about 11,200 entries.
As a more complete measure, I have randomly sampled profiles from across the entire Wikitree. For each of these profiles I checked the possible matches provided through the Search for Matches tab on the pull-down menu. Some were also checked with the built-in search function. Most were not a match. In an analysis in Nov 2024, I found 3 probable matches in 100 open profiles. This suggests the estimated number of duplicate profiles in Wikitree overall is in the range of 1-9% (95% confidence interval).
Technical notes:
- This assumes Wikitree Match Search identifies most matches. There are a number of known cases where it does not. E.g. profiles in non-Roman characters; profiles with de/von in surname. This analysis assumes these are a small fraction of all profiles (or have similar duplicate rate).
- Unlisted/Red/Orange/Yellow profiles cannot be checked; they are assumed to have the same fraction of duplicates as the rest of the tree.
- Matches may not be identified if the profiles have little information to compare.
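To illustrate how a range such as 1-9% can follow from finding 3 probable matches in 100 sampled profiles, the sketch below computes an exact (Clopper-Pearson) binomial confidence interval using only the standard library. The page does not say which interval method was used, so the choice of Clopper-Pearson is an assumption, and its results may differ slightly from the ranges reported in the table.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Bisection for f(p) == target on [0, 1], where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0
    )
    return lower, upper


low, high = clopper_pearson(3, 100)
print(f"3 matches in 100 profiles: {100 * low:.1f}% to {100 * high:.1f}%")
```

With so few matches in the sample, the interval is wide and asymmetric around the observed 3%, which is why the estimate is reported as a range rather than a single figure.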
Date | Total Profiles | Estimated Duplicates | Sampling Basis | Pending Merges |
16 Nov 2024 | 39,931,000 | 1 - 9% | 3/100 | 11,180 |
25 Nov 2023 | 36,298,183 | 0.2 - 7% | 2/100 | 5,430 |
30 Dec 2021 | 29,059,846 | 3 - 13% | 7/110 | 22,100 |
20 Feb 2020 | 22,641,134 | 1 - 8% | 4/106 | 13,600 |
26 Jan 2019 | 19,481,987 | 1 - 9% | 5/105 | 15,200 |
Undated Profiles
There are a relatively large number of profiles that are simply linked names with no location, dates or other information. Suggestions 131, 132, 133 and 134 provide a good estimate of these profiles. Strictly, these suggestions identify only undated profiles, but such profiles often have no other information either.
Technical notes:
- Undated profiles can no longer be created in Wikitree.
- This count is limited to Open (white) profiles.
- As of Nov 2022, the Open undated profiles were about 29% of the total number of undated profiles (excluding unlisted which could not be analyzed).
Date | Undated Open Profiles | |
14 Nov 2024 | 385,294 | |
15 Nov 2023 | 417,680 | |
8 Nov 2022 | 443,687 | |
7 Jun 2022 | 455,829 | |
8 Nov 2021 | 468,252 | |
6 Jun 2021 | 484,384 | |
1 Nov 2020 | 506,144 | |
9 Jun 2020 | 517,222 | |
6 Nov 2019 | 519,047 | |
18 Jun 2019 | 529,130 | |
6 Nov 2018 | 550,241 | |
29 Jul 2018 | 528,512 | |
8 May 2018 | 480,950 |
Unlocated Profiles
Profiles should have a location to help identify the person and to avoid duplicates. Using Wikitree+, we can find profiles that are missing Birth, Marriage and/or Death Locations. The results indicate that a large number of profiles are missing one of the locations. The count for profiles with no locations is presently in development.
Technical notes:
- This count is limited to Open (white)/Public (green) profiles.
- It is plausible that a large number of the non-open profiles do not have a location, based on information for undated profiles.
Date | Open Profiles | No Death Location | No Birth Location | No Birth/Death Location | No BMD Location | |
10 Nov 2024 | 33,875,608 | 13,562,775 | 5,709,964 | 4,390,380 | 1,844,677 | |
25 Dec 2022 | 27,201,437 | 11,699,274 | 5,448,422 | |||
25 Dec 2016 | 9,860,617 | 5,414,916 | 3,081,742 |
- Wikitree Quality Statistics - Nov 2024 Nov 17, 2024.
- Wikitree sourcing progress Nov 29, 2023.
- Wikitree Statistics - November 2023 Nov 19, 2023.
- Wikitree Statistics - November 2022 Nov 11, 2022.
- Wikitree statistics - June 2022 Jun 10, 2022.
- WikiTree statistics - November 2021 Nov 12, 2021.
I have a suggestion and a question.
Consider calculating the consistency errors as a percentage of the total records. Then present the data year over year - it tells a stronger story. You really see how much work the team has done.
Date | Total Profiles | Consistency Errors | Errors (%)
29-May-16 | 11,442,341 | 276,549 | 2.42%
29-Oct-17 | 15,581,091 | 180,122 | 1.16%
6-Nov-18 | 18,779,693 | 132,604 | 0.71%
6-Nov-19 | 21,798,127 | 112,818 | 0.52%
1-Nov-20 | 24,986,750 | 103,686 | 0.41%
8-Nov-21 | 28,620,028 | 99,261 | 0.35%
8-Nov-22 | 32,353,319 | 96,533 | 0.30% <<-- This is a small error rate. Nice work.
Question - why not just hide/remove/ignore the Undated profiles - especially if they are x years old? Or have a process that allows them to go to a "Freeze" state after someone reviews them to confirm they are not worth any more time?
Again nice work.
Edits: Trying to address the formatting.
edited by Tricia Payne
Regarding the undated profiles, that's a question for Wikitree Admin. The guiding principle is that Wikitree rarely deletes profiles.
If it's easy to do, I'd also appreciate a histogram of profiles by year of birth.
Here's a screenshot of the chart I made: https://imgur.com/a/3dN7RBy
Thanks for putting in the time and effort it must take to do this.
Since I am a coordinator with the DNA Project, I would be interested in DNA related stats. I see that you have already included a count of those who have DNA tests listed on their profile, but there are a few more things that would be helpful from an accuracy standpoint.
1. The number of profiles having a parent marked with "confirmed with DNA" status.
2. The number of 213 and 313 errors (a parent with "confirmed with DNA" status but no corresponding DNA source citation).
3. The number of profiles having a parent with "confirmed with DNA" status but is genealogically unsourced. (DNA confirmation needs genealogy to confirm; it is not a valid substitute for primary sources.)
John Kingman
edited by John Kingman
No. 2 yields a count of profiles which generate 213 and 313 errors. You may have this already since you deal with other 2xx and 3xx errors.
No 3. is like #2, but not currently generated by the error checker.
So a DNA data quality proxy to start with could be the #2 count divided by the #1 count.
This could be improved over time if more DNA related error codes are added by Ales or you can detect some DNA errors via your processing.
John
edited by John Kingman
Are you moving on to a 25 hour day then? :-)
Total profiles: 20,973,311 - up 1.6% since 18 June
Connected: 17,123,470 - up 1.9% since 18 June
With DNA: 5,660,455 - up 2.5% since 18 June