I have added a new measure to the spreadsheets I use to track profiles from ThePeerage.com, Wikipedia, and the Slade Genealogy site as part of the One Name Studies I manage. Taken together, those give me a sample size of 828 profiles. It's not random, the results are directly affected by my own efforts, and the sample size could definitely be larger, but at least it gives me the beginnings of a measure of the quality of sourcing on WikiTree.
The measure I use is derived from Paul Gierszewski's WikiTree Statistics page. On his page, Paul's system is outlined thus:
Profiles are randomly sampled and assigned to the following categories:
- 3 or more sources, where sources are likely original records or books.
- 1 or 2 sources
- Poorly sourced, such as a link to an Ancestry tree or another website, or vague source description
- Unsourced
- Unavailable for analysis (Unlisted, Red or Orange privacy)
For the sake of deriving an average score, I have put numerical values on Paul's system, thus:
3 = 3 or more primary sources, possibly plus secondary sources
2 = 2 primary sources, possibly plus secondary sources
1 = 1 primary source, possibly plus secondary sources
0.5 = One or more secondary sources, no primary sources
0 = Unsourced or unavailable for analysis due to privacy settings
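For anyone who wants to apply the same scale outside a spreadsheet, here is a minimal sketch of it as a Python function. The function name and its parameters (`primary`, `secondary`, `available`) are my own illustrative choices, not part of Paul's page or of WikiTree, and it assumes you have already counted the primary and secondary sources on a profile by hand.

```python
def sourcing_score(primary: int, secondary: int, available: bool = True) -> float:
    """Return the sourcing level for one profile on the 0 to 3 scale above.

    primary   -- number of primary sources counted on the profile
    secondary -- number of secondary sources counted on the profile
    available -- False if privacy settings prevent analysis (hypothetical flag)
    """
    if not available:
        return 0.0  # unavailable profiles score the same as unsourced ones
    if primary >= 1:
        # 1, 2, or 3+ primary sources; secondary sources do not add to the score
        return float(min(primary, 3))
    if secondary >= 1:
        return 0.5  # "better than nothing": secondary sources only
    return 0.0  # unsourced


# A profile with one census record and a Wikipedia entry scores 1.0
example = sourcing_score(primary=1, secondary=1)
```

Capping at 3 with `min` keeps a heavily sourced profile from outweighing the rest of the data set, which matches the maximum of 3.00 on the scale.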
I have set up my spreadsheets to give me a "sourcing sum", which is the total of the source level values for each data set. Then I add up the sourcing sums from all the data sets I'm using, and divide that total by the number of WikiTree profiles in those same data sets. Currently, that gives me an average score across all the data sets of 0.41 out of a possible maximum of 3.00. (Yes, that looks really bad, but that's largely because I haven't gone through the 484 Slade profiles from the Slade Genealogy site and assigned sourcing levels to them. If I remove those profiles from the equation, the average score for the remaining 344 profiles rises to 0.99, which is better, but still not very good.) An average of less than one suggests that a large share of the profiles in those data sets don't even have one primary source.
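The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not my actual spreadsheet: the function name, the data set labels, and all the figures in the example are made up for the purpose.

```python
def average_sourcing_score(data_sets: dict[str, tuple[float, int]]) -> float:
    """Average sourcing score across several data sets.

    data_sets maps a label to (sourcing_sum, profile_count), where
    sourcing_sum is the total of the source level values in that data set.
    """
    total_sum = sum(s for s, _ in data_sets.values())
    total_profiles = sum(n for _, n in data_sets.values())
    if total_profiles == 0:
        raise ValueError("no profiles to average over")
    return total_sum / total_profiles


# Hypothetical figures, chosen only to show the effect of unscored profiles.
scored = {"Set A": (12.5, 10)}               # 10 profiles with levels assigned
with_unscored = {**scored, "Set B": (0.0, 15)}  # 15 profiles not yet reviewed

avg_scored = average_sourcing_score(scored)        # 12.5 / 10 = 1.25
avg_all = average_sourcing_score(with_unscored)    # 12.5 / 25 = 0.5
```

Because unreviewed profiles contribute 0 to the sum but still count in the divisor, they pull the overall average down, which is the same effect the unscored Slade profiles have on my own figure.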
I haven't added this new measure to the table, and I don't have a chart for it, since it's only a single data point so far. Going forward, as more data points accumulate, the measure should start to become more useful.
Notes:
- In my system, a "primary" source is a birth, baptism, census, marriage, military, death, or burial record. A "secondary" source is an entry in Wikipedia or some other encyclopedia, ThePeerage.com, an online family tree, or a book or article which covers (or at least mentions) that person.
- I don't use scores like 1.5, 2.5, etc. The 0.5 score is only there to show that a profile contains something which is "better than nothing", like an entry in ThePeerage, Wikipedia, or some family tree online somewhere.