WikiTree Dashboard

+84 votes
2.2k views

I had already mentioned this idea in another thread, but I decided to spin it out here and tag it so that the people who would need to work together to make it happen can see it and provide their input:

There's an old saying that "what gets measured gets managed." In the Connectors Project, we've seen a slow but steady decrease in the percentage of profiles which aren't connected to the main tree. (Granted, we're not going to see 0% unconnected profiles, because new people are signing up all the time, but it looks to me as if the network effect is already starting to kick in: as the number of profiles connected to the main tree increases, it gets easier for people to make connections, so the main tree gets bigger still, and we end up with a virtuous cycle.)

I suspect that it encourages people to keep at connecting (which can be frustratingly difficult when all the leads you follow keep running into brick walls) when they can see the progress that they're making, so I've been providing graphs to show how the number of connected profiles as a percentage of total profiles has been changing over time:


The thought occurred to me that it might be helpful to have a sort of "Dashboard" screen for WikiTree: one screen with a whole bunch of indicators for how things are going in terms of tree size, active WikiTreers, percentage unconnected, percentage unsourced, percentage uncategorised, etc.

So what I'm thinking of is a page with entries like:

Sample Dashboard Page
Measure | Today | Yesterday | Change | Last Week | Change | Last Month | Change | Last Year | Change
Total Profiles | 13,273,157 [1] | 13,264,828 | 0.063% | 13,214,854 | 0.441% | 13,014,958 | 1.984% | 10,233,072 | 29.708%
Connected Profiles | 10,142,352 | 10,135,364 | 0.069% | 10,093,435 | 0.485% | 9,925,718 | 2.183% | 7,591,659 | 33.599%
Sourced Profiles | etc.
Categorized Profiles |
Located Profiles [2] |
Dated Profiles [3] |

 Notes: 

  1. Except for today's total profiles, these numbers are all made up. I don't have access to most of the real numbers, which is why people like Chris and Aleš would need to work together to make something like this happen.
  2. In researching unconnected branches, I have noticed that a number of branches have no locations for births, deaths, marriages, or anything else -- on any of the profiles. I haven't heard of a project to find and document locations for people, but it might be a good idea.
  3. Similarly, I have noticed that a number of branches have no dates for births, deaths, marriages, or anything else -- on any of the profiles.
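For anyone who ends up scripting this, each "Change" column in the sample table is just the difference between today's value and the earlier value, as a percentage of the earlier value. Here is a minimal sketch in Python; the function names and the dictionary layout are my own assumptions, not anything WikiTree actually provides, and the figures are the made-up ones from the sample table.

```python
def percent_change(current: float, previous: float) -> float:
    """Return the change from previous to current as a percentage of previous."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def dashboard_row(name: str, today: float, history: dict) -> dict:
    """Build one dashboard row: the measure name, today's value, and an
    (earlier value, percent change) pair for each comparison period."""
    row = {"measure": name, "today": today}
    for period, earlier in history.items():
        row[period] = (earlier, round(percent_change(today, earlier), 3))
    return row


# Figures from the sample table (made up, apart from today's total).
row = dashboard_row(
    "Total Profiles",
    13_273_157,
    {
        "yesterday": 13_264_828,
        "last_week": 13_214_854,
        "last_month": 13_014_958,
        "last_year": 10_233_072,
    },
)
print(row)
```

With the sample figures, the rounded changes come out to the 0.063%, 0.441%, 1.984% and 29.708% shown in the table.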

I figure that there would be some advantages to such a page:

  1. I think that making it possible for people to see that their efforts are making a difference would encourage them to keep on plugging away, even when things get difficult. Granted, things like badges for contributions and the various challenges help with that, but I think it would be encouraging for people to be able to see, at a glance, "This is what we have accomplished together."
  2. And, along the lines of "what gets measured gets managed", people who have been on WikiTree long enough that they know how to handle multiple tasks would be able to log in and ask, "What part of WikiTree needs my help the most today?" (Granted, there are plenty of people who have a specific research plan for their WikiTreeing time, and that's fine. But for those who just log in to help wherever they feel like that day, or want a break from some research line that's proving tedious, this might give them some ideas of where their help is most needed.)
in WikiTree Tech by Greg Slade G2G6 Pilot (678k points)
edited by Greg Slade

Also add measurement for

  1. Number of profiles upvoted as good quality
  2. Number of profiles connected and also have a DNA match
    1. Confirm the DNA match by using GEDMATCH compare 
      1. If the cM (measurement of shared DNA) is lower than expected create a Project Database Error 
Magnus, is that data even available?
  1. Number of profiles upvoted as good quality
    Not available, but since we have upvotes in G2G, why not have them on well-written profiles?
  2. Number of profiles connected and also have a DNA match
    The data is there, but someone has to count them
    1. Confirm the DNA match by using GEDMATCH compare
      Not done today, but doable. It would be good if we spoke with GEDMATCH to make it easier
      1. If the cM (measurement of shared DNA) is lower than expected create a Project Database Error
         Not done today, but it is not rocket science
        1. Chris knows the relation between the profile and the profile that has been tested
        2. Gedmatch knows the cM shared

I feel all the errors we see with Project Database Error are addressing really bad genealogy. If we could start involving GEDMATCH and DNA with WikiTree, we would have an edge you can't find anywhere else....

I can also see that all profiles with gedmatch tests done ==>

  1. Create an error if 2 gedmatch profiles have more than 5cM in common and are not connected
    1. Add suggestions ==> if two gedmatch profiles match, then with the new, better location information we could suggest that the match is likely to be on profiles x xx xx and y yy yy, as they are in the same area
      1. Something like this also needs WikiTree to get a more mature location model so that we can do proximity searches on locations
    2. If we add templates for sources and get a standard for how sources are referenced, you could also check if two people with high cM in common on gedmatch have people in the family tree referencing the same sources.....
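The first check in that list could be sketched roughly as follows. This is only an illustration under stated assumptions: the `DnaMatch` record, the profile IDs, and the idea that we already hold a set of connected pairs are all hypothetical, and nothing here reflects a real gedmatch or WikiTree interface. Only the 5 cM threshold comes from the suggestion above.

```python
from typing import Iterable, List, NamedTuple, Set, FrozenSet


class DnaMatch(NamedTuple):
    """A hypothetical record of two tested profiles and their shared cM."""
    profile_a: str
    profile_b: str
    shared_cm: float


def unconnected_matches(
    matches: Iterable[DnaMatch],
    connected_pairs: Set[FrozenSet[str]],
    threshold_cm: float = 5.0,
) -> List[DnaMatch]:
    """Return matches sharing more than threshold_cm whose two profiles
    are not recorded as connected to each other."""
    flagged = []
    for match in matches:
        pair = frozenset((match.profile_a, match.profile_b))
        if match.shared_cm > threshold_cm and pair not in connected_pairs:
            flagged.append(match)
    return flagged


# Made-up example data.
sample = [
    DnaMatch("Example-1", "Example-2", 12.5),  # over threshold, not connected
    DnaMatch("Example-3", "Example-4", 3.0),   # under threshold
    DnaMatch("Example-5", "Example-6", 40.0),  # over threshold, but connected
]
known = {frozenset(("Example-5", "Example-6"))}
for match in unconnected_matches(sample, known):
    print("Possible missing connection:", match.profile_a, match.profile_b)
```

Using a frozenset for each pair means the order in which the two profiles are listed does not matter.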
I feel Aleš has opened up something that could be WikiTree 2.0, using computers to do the DNA genealogy more efficiently....
 
Step number 1 is that WikiTree gets a vision, and then we need to understand how we need to change WikiTree...
Personally, I like the idea.

Pat
There are other measures that I have thought of, but didn't include in my original post. If we can implement Debi's idea of letting people choose whether a given measure shows up in their dashboard or not, then we can basically throw in anything that we can measure. (Although I suspect that Chris, Aleš, and anybody else who would need to put in the work to create each measure might baulk at suggested measures that they don't see as worth the effort.)

One measure that I'd like to see is comparing the total number of profiles on WikiTree to the estimated number of people who have been born since 1 AD. I worked out the numbers once, in the context of another conversation here, and the percentage was depressingly small. (The source I used was the Population Reference Bureau site: http://www.prb.org/Publications/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx) But as long as the number of WikiTree profiles increases at a faster rate than the number of people being born, then that particular number should improve over time.

Which reminds me, I meant to say in my original post (and then forgot to include it) that what I had in mind was using dark WikiTree green for those measures which are growing faster than the rate of growth in total profiles, and dark WikiTree orange for those measures which are growing more slowly. (And whether the total profiles number is green or orange depends on whether it's growing faster or more slowly than the estimated growth of the total population.) Thus, any measure which shows in orange would be one where those capable of working on that particular issue might consider directing their attention.
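The colour rule above is simple enough to sketch in a few lines of Python. The function name and the treatment of a tie (shown as orange) are my own assumptions; the post only covers "faster" and "more slowly". Measures where a falling number is the good outcome, such as percent unconnected, would presumably need the comparison reversed, which this sketch does not attempt.

```python
GREEN = "green"    # stands in for dark WikiTree green
ORANGE = "orange"  # stands in for dark WikiTree orange


def measure_colour(measure_growth_pct: float, baseline_growth_pct: float) -> str:
    """Green if the measure grows faster than the baseline, otherwise orange.

    For most measures the baseline is the growth in total profiles; for
    total profiles itself, it is the estimated growth in total population.
    A tie is treated as orange (an assumption).
    """
    return GREEN if measure_growth_pct > baseline_growth_pct else ORANGE


# Weekly changes from the made-up sample table:
# connected profiles grew 0.485% while total profiles grew 0.441%.
print(measure_colour(0.485, 0.441))
```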

I finally got around to working out the formulas to get an estimate of the total number of people born since 1 AD for each month, and worked up graphs based on that. The bad news (which I already knew) is that we're starting from a really low base. I started out with a percentage, and the numbers were absurdly small. I had to express it per million to get the numbers up to integers. The good news (which pleasantly surprised me) is that the numbers are steadily improving, and at a pretty decent pace, too. Here's a graph:

Or, if it makes it easier to understand, here's a graph which shows the (estimated) total number of people born since 1 AD for each month per WikiTree Profile. As you can see, that number is decreasing steadily:

Now, this number is somewhat misleading, since we have much better coverage of the past two centuries or so than we do before that, so most people don't really need to create thousands of profiles before they find a connection to the main tree (see note 1). But even so, once we get it down to a thousand or so, I think we'll find that people have a much easier time getting connected.

Notes:

  1. Although we do have some unconnected branches (boughs) sized in the thousands, most of them have 500 profiles or fewer.
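For those who want to reproduce the two views of this measure, here is a rough Python sketch. The function names are mine, and the population figure in the example is only a placeholder: I have not reproduced the actual estimate formulas, so 62 billion is simply a round number that is roughly consistent with the 236 profiles per million reported elsewhere in this thread for 14,630,053 profiles. It is not a figure taken from the Population Reference Bureau.

```python
def profiles_per_million(total_profiles: float, estimated_people: float) -> float:
    """WikiTree profiles per million people estimated to have been born."""
    return total_profiles / estimated_people * 1_000_000


def people_per_profile(total_profiles: float, estimated_people: float) -> float:
    """Estimated people born per WikiTree profile (the reciprocal view)."""
    return estimated_people / total_profiles


# Placeholder population estimate (assumption, see above).
ESTIMATED_PEOPLE = 62_000_000_000
TOTAL_PROFILES = 14_630_053

per_million = profiles_per_million(TOTAL_PROFILES, ESTIMATED_PEOPLE)
per_profile = people_per_profile(TOTAL_PROFILES, ESTIMATED_PEOPLE)
print(round(per_million), round(per_profile))
```

The two numbers always multiply to one million, which is why one graph rises while the other falls: doubling the profiles per million halves the people per profile.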
I have edited the charts in this thread with this month's numbers for the measures I'm tracking.
The image in the question isn't showing for me.
Aaron, can your browser show .png files?
I have edited the charts again to show this month's numbers. The good news is that the numbers are continuing to improve every single month. The number of people in the target range per WikiTree profile has dropped from 5,729 in March 2016 to 3,866 this month. (Which doesn't really surprise me, since new profiles were added at an average rate of 8,875 per day in December.)
I can see the images now. Not sure why I couldn't see them before.
You don't know of any project that finds and documents???? The Sourcerers are not gonna be happy with you. LOL That is kinda their reason to exist.

I do like the dashboard idea however. If this is something you can show in say an excel spreadsheet, feel free to make a tab on our spreadsheet and show us what you got. ;)

Steven, what I said was "I haven't heard of a project to find and document locations for people", stress on locations. Yes, the Sourcerers (of which I am one) frequently end up adding locations to a profile in the course of their work, which is a good thing. Many other people add locations in the course of improving profiles, which is also a good thing. But the point I was trying to make is that, as far as I knew at the time, nobody was specifically seeking out and fixing profiles with no locations. (Since then, Laura has proven me wrong.)

And, no, I can't add a tab to your spreadsheet, because I don't have most of the data. The reason that I started this thread in the first place was to suggest that Chris, Aleš, and whoever else needs to work together to make it happen build a dashboard page to show us all those cool numbers and encourage us in our work. The numbers are out there. They just need to be crunched and displayed in a format that people can understand. (Preferably by scripts, so they're updated on the fly, rather than depending on somebody to remember to copy numbers into a spreadsheet, with all the possibility of transcription errors and other problems.)

I get what you mean about monitoring numbers being a driving force. I am reasonably new and have finally mastered the WT+ reports. I began on Unconnected Invercargill New Zealand Orphans, which had 20 and is now at 10. My only gripe is that I have to wait a week between updated reports. In reality the report is now under five (and I can't do any more to those other profiles). I will keep working on them, and the key to it is knowing it is having an impact. More power to you and your idea.

15 Answers

+43 votes
 
Best answer
Great post, Greg.

When Ales sees this I know he will have input. Just combining some of the stats that he's been reporting with those that you've been keeping would go a long way toward what you're proposing.
by Chris Whitten G2G Astronaut (1.5m points)
selected by Eowyn Walker
+18 votes
The 901 error looks for people who have what appears to be unconnected profiles and no data in birth, death, or locations.  I work on this error with Loretta Corbin.  There are over 32k of them as of last week.  Some are false errors because some profiles are connected to private profiles and some uploaded gedcoms put the details in notes or under source.  So we contact the profile managers in an effort to get these fixed or after a period of time I think they get deleted.  Not sure about that process.
by Laura Bozzay G2G6 Pilot (830k points)
Thank you for that, Laura. I missed that when I was looking through the errors list to see if Aleš already had errors for no dates and no locations. But while I do think we should have some kind of measure for profiles which are "good" or "reasonably complete" or whatever (it might be easiest just to count profiles which show no database errors), what I'm trying to do here is to break down the basic problem (a lot of profiles which have various problems with them) into smaller chunks, so that people can look at the dashboard and say, "Hey. I could help to improve that number today." rather than looking at the whole mass of profiles that aren't up to snuff, throw up their hands in despair, and say, "It's impossible. We'll never get all these profiles cleaned up!"
Greg, if you take a look at the current total error report you will see all kinds of errors people can work on. Some have only a few errors showing, so if someone is intimidated by large numbers they can pick one with a smaller, more manageable number. I would just love it if we could get everyone on WikiTree to run the error report and fix the errors for their watchlist once a month. Maybe we could have a spring cleaning day where everyone is asked to run their error report. (I would suggest putting up instructions on how to do that, because in working with errors I have found many do not know how to run it, so I include instructions for what works for me with my watchlist.) If everyone did this and we diminished errors by even 10%, that would be a huge reduction. If we could reduce it by 25%, stupendous, and higher... well, you get the idea! It is generally not as daunting for a single profile manager unless they have a gedcom issue that has made a mess (I see lots of those where the data is not in the fields but all sitting in the notes) or where the data was entered before birth data was required.
+21 votes

I'd like to add to the dashboard proposal that it be set up like the badges arrangement screen. Provide the ability to rearrange the metrics and to turn the display of each off or on. That would let each WikiTreer focus on the metrics that are most important to them at that time.

I love the visual aspect of the dashboard and can see the potential for it on an "inspiration" basis. Seeing consistent movement (in the right direction) on the current Improvement projects has the potential to motivate.

While I understand Magnus' desire to add additional improvements to WikiTree, I personally feel it dilutes the impact of both the db_errors project and the conceptual dashboard to continuously be adding more items. Allowing one to turn off the display of new metrics alleviates that issue.

by Debi Hoag G2G6 Pilot (395k points)
I like this idea a lot. It could save a lot of arguing over "the measure I want is more important than the measure you want!" Basically, if we can measure it, then make it an option, and let people select whether they want it on their own personal dashboard or not.

Come to think of it, we could give people the option to have measures that apply specifically to them on their dashboards, like progress in connecting or sourcing or whatever in their own watchlist, their contributions, etc.
+16 votes
The proposed metering of "unsourced" profiles would be seriously misleading, since the Unsourced category includes only a fraction of the unsourced profiles (only profiles that were created without text in the text section since the Unsourced template started being automatically added, plus profiles where someone manually added the category). It excludes the myriad unsourced profiles created before the template started to be used, as well as newer profiles that were created without sources but with some sort of text in the text section -- and I'm not sure that the Unsourced template gets added during Gedcom imports.

Ironically, even though I believe that the percentage of profiles with sources is growing, I expect that an Unsourced Meter would show negative progress, since I'm rather sure that on most days more profiles have the template added (either as newly created profiles or because a member  is busily adding it to old unsourced profiles) than have the template removed after someone adds a source.
by Ellen Smith G2G Astronaut (1.5m points)
Great answer, Ellen, but you missed all the profiles that have "sources", but the sources are unsourced Ancestry trees. And the truly unsourced profiles that were created before the template was introduced. I would say that about 95% of the profiles I've sourced don't have the template, and my number is north of 4000.
Ellen, I'm assuming that you were thinking that I intended to use the presence or absence of "Unsourced" categories to measure unsourced or not. Actually, I wasn't planning on that, because not only have I seen unsourced profiles that didn't have the template, I've also seen profiles where somebody did come along and add the source, but forgot to remove the template.

Assuming that Aleš thinks that this is a project worth pursuing, I was planning on discussing with him the sorts of things that his database searches could look for to identify sourced profiles. A few of the giveaways that I thought of were the presence of a closing </ref> tag, the presence of HTML (although that would give a false positive to profiles with non-sources, such as Ancestry or FamilySearch family trees -- but there may be a way to design the search so that those are detected and disregarded), and strings from book references (like "New York: " or "Press").

In fact, he may end up adding a couple of errors to his database reports, like "No sources present, but no Unsourced category applied" and "Sources present, but Unsourced category applied", so that the accuracy of the Unsourced categories will improve over time.
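To make those giveaways concrete, here is a rough Python sketch of the kind of check I have in mind. Everything in it is an assumption for illustration: I am reading "the presence of HTML" as "the presence of a web link", the link patterns used to spot family trees are guesses rather than a vetted list, and none of this reflects how Aleš's database searches actually work.

```python
import re
from typing import Optional

# Illustrative guesses at what a family-tree link looks like (assumptions).
TREE_LINK_HINTS = ("ancestry.com/family-tree", "trees.ancestry", "familysearch.org/tree")
# Strings that often show up in book references (the two examples above).
BOOK_HINTS = ("New York: ", "Press")
URL_PATTERN = re.compile(r"https?://\S+")


def is_tree_link(url: str) -> bool:
    """True if the link looks like an online family tree rather than a record."""
    lowered = url.lower()
    return any(hint in lowered for hint in TREE_LINK_HINTS)


def looks_sourced(biography: str) -> bool:
    """Heuristic: a closing ref tag, a non-tree link, or a book-like string."""
    if "</ref>" in biography.lower():
        return True
    links = URL_PATTERN.findall(biography)
    if any(not is_tree_link(link) for link in links):
        return True
    return any(hint in biography for hint in BOOK_HINTS)


def category_mismatch(biography: str, has_unsourced_category: bool) -> Optional[str]:
    """Return one of the two proposed error messages, or None if consistent."""
    sourced = looks_sourced(biography)
    if not sourced and not has_unsourced_category:
        return "No sources present, but no Unsourced category applied"
    if sourced and has_unsourced_category:
        return "Sources present, but Unsourced category applied"
    return None


print(category_mismatch("Born 1820.<ref>1851 census</ref>", True))
```

A check like this will produce false positives and false negatives (any word containing "Press" counts, for instance), so it is better suited to flagging profiles for a human to review than to producing an exact count.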
I'm helping to add to the Unsourced population, as I add people from the census and I check the matches, I look at them, tag them if needed, which includes no source or broken ancestry links or just giving a book title with no other information. I also try to add a location if one can be reasonably ascertained.
+15 votes
This is well thought out Greg, I like the concept.
by Cynthia Rushing G2G6 Mach 3 (36.4k points)
+15 votes

I crunched the numbers for real today, and here's a sample:

 
Measure | Today | Last Week | Change | Last Month | Change | Last Year | Change
Total Profiles | 14,630,053 | 14,563,362 | 0.46% | 14,352,891 | 1.93% | 11,770,510 | 24.29%
Profiles Per Million People | 236 | 235 | 0.43% | 229 | 3.06% | 189 | 24.87%
Connected Profiles | 11,278,853 | 11,223,215 | 0.50% | 11,039,107 | 2.17% | 8,845,949 | 27.50%
Percent Connected | 77.09% | 77.06% | 0.04% | 76.91% | 0.24% | 75.15% | 2.58%

 

by Greg Slade G2G6 Pilot (678k points)
Dear Greg,

  This is fascinating. I bet it would be one more way to capture the imaginations of those who wish to help improve WikiTree.  -NGP
Awesome, Greg!

There should be some quality measure also.... compare WP:MEASURE/A instead of importance we could have century?!?!?

 

Some further numbers on the topic of quality:

Measure | 9 Jul 2017 | 24 Jul 2016
% All Errors in Error Report | 13.2% | 10%
% Known Consistency Errors | 1.4% | 2.2%

where these are the ratio of (Total errors in Database Errors Report) over (Total number of Profiles), and (Total Consistency Errors) over (Total number of Profiles).  The latter are a subset of the Database Errors that identify a clear internal consistency error (like "Mother died before birth"), rather than other "errors" like "USA too early in birth location".

The first value has increased since a year ago because of new errors now checked, while the second is decreasing as the number of real errors is actually reducing over time as they get fixed.

I think we are missing measures of quality of profiles, and number of duplicate profiles.  Probably the only way at present to get a measure of the profile quality across the whole tree is to randomly sample and manually analyze a number (about 300).  We might be able to get an estimate of the number of duplicates through Matchbot.
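To give a feel for what a sample of about 300 buys, here is a small Python sketch. It uses the standard normal-approximation margin of error for a proportion at 95% confidence, which is my choice of method and not something anyone in this thread has specified; the function names and the profile ID list are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sampled proportion.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def draw_sample(profile_ids: Sequence, sample_size: int = 300,
                seed: Optional[int] = None) -> List:
    """Pick sample_size distinct profiles at random for manual review."""
    rng = random.Random(seed)
    return rng.sample(list(profile_ids), sample_size)


print(f"{margin_of_error(300):.1%}")
```

With 300 profiles the worst-case margin is a little under six percentage points either way, so a sample of that size can show whether quality is roughly 20% or roughly 40%, but not whether it moved by a point or two.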

+12 votes

After neglecting this for several months, I have updated the charts today (thanks to Carol's careful record keeping).

In related news, Paul has started a free space page for WikiTree Statistics, which I think is a good start towards building a dashboard. 

(Of course, what I really want to see is having these numbers updated by scripts, so they won't depend on people having the time to copy down numbers and update charts.)

by Greg Slade G2G6 Pilot (678k points)
The WikiTree Statistics page is very interesting.

I have updated the charts above with this month's numbers, and I thought of a couple of more measures to add:

Total Profiles on WikiTree

This chart shows the total number of profiles on WikiTree on a monthly basis since March 2016. The upper line in light green shows total profiles, and the lower line shows the net increase in the number of profiles each month.

That chart shows that there's some variation in the number of profiles added per month, and, more distressingly, that the number has been going down for the past few months, but having both numbers in the same chart makes those variations harder to read, so I tried tracking just the monthly increases and got this:

Profiles Added To WikiTree Per Month

That makes it much more clear what's been happening: there was a general trend upwards until February 2018, and then a sharp drop in June. Then another upward trend until this February, which was a real spike, and a decline since then. As it happens, there was some discussion of whether we would reach 20 Million Profiles in time for RootsTech this year, so that may have contributed to that spike, although I don't see similar spikes every year just before RootsTech, so there may have been some other factor going on.

Also, I've been putting up charts like this on the Connectors Chat page for some time, but should probably include it here:

Unconnected and Unlinked Profiles

+9 votes

I know this is an old thread but it showed up in my WikiTree Feed so it caught my eye again!

Some updates.

First of all Ales was able to pretty much clean up a lot of the 901s as many were false errors having private spouses and children.  

Second of all, there was a question last week in the Weekend Chat  asked by Beulah Cramer about the progress of the Sourcerers Challenge.  I am one of the 4 leaders for that Challenge.  I did some stats based on a spreadsheet we keep.  It ranges in dates from July 2015 through Sept 2018 (at the time I did this the Oct stats had not been finally calculated)  

My long answer can be seen at:

https://www.wikitree.com/g2g/711050/welcome-the-weekend-chat-all-members-invited-november-2018?show=711050#q711050   scroll down almost to the bottom of the page to see it.

I distilled this down to a chart which I am having trouble getting to appear in the thread...  

From Google sheet:
Period | Profiles
July - Dec 2015 | 8104
2016 | 43973
2017 | 77443
Jan-Sept 2018 | 86829
Total | 216349
 

Those totals do not include the Source A Thons.


This year's Source A Thon showed 72,713 improved profiles.

2017 Source A Thon showed 53,245 improved profiles.

So I am pretty confident those are not in the totals I had above!


 

by Laura Bozzay G2G6 Pilot (830k points)
+9 votes
This is a brilliant idea! I hope this gets off the ground and running soon.
by Alex Stronach G2G6 Pilot (364k points)
+8 votes
I like the idea of a dashboard. It makes understanding the data much easier and gives one a sense of achievement.
by Deborah Talbot G2G6 Mach 7 (70.4k points)
+13 votes

Just today, I saw a thread ("How many WikiTree contributors are there?") which posed a question I have asked myself many times, but for some reason, didn't think to add to the dashboard: "How many people have actually made contributions to WikiTree in the past [24 hours/week/month/quarter/year/pick one]?"

by Greg Slade G2G6 Pilot (678k points)
+10 votes

I crunched the numbers again today, adding in some more measures (the ones I have access to), and the results look like this:

Measure | This Month | Last Month | Change | Last Year | Change
Total Profiles | 21,027,338 | 20,729,118 | 1.44% | 18,004,939 | 16.79%
Profiles Per Million People | 337 | 332 | 1.51% | 289 | 16.61%
Profiles Added Per Month | 297,266 | 213,586 | 39.18% | 253,486 | 17.27%
Connected Profiles | 17,218,928 | 16,926,132 | 1.73% | 14,389,944 | 19.66%
Percent Connected | 81.89% | 81.65% | 0.29% | 79.92% | 2.46%
Unconnected Profiles | 3,808,410 | 3,802,986 | 0.14% | 3,614,995 | 5.35%
Percent Unconnected | 18.11% | 18.35% | -1.28% | 20.08% | -9.79%
Unlinked Profiles | 1,056,866 | 1,050,700 | 0.59% | 975,254 | 8.37%
Percent Unlinked | 5.02% | 5.06% | -0.79% | 5.41% | -7.21%

Except for the increases in the absolute numbers of unconnected and unlinked profiles, every single measure is moving in the direction we want, although the big jump in profiles added during the Connect-A-Thon will probably push that measure into the negative next month.

by Greg Slade G2G6 Pilot (678k points)

Unconnected and Unlinked Profiles - August 2019

Last month's Connect-A-Thon improved the percentage of connected profiles at about double the normal monthly rate, so it was clearly a successful effort.

Profiles Added to WikiTree Per Month - August 2019

The last time we had a big spike in new profiles being added was in February of this year, and while 7,885 more profiles were added in February, the percent connected increased by 0.04% more last month than it did in February, so specifically seeking to add profiles for the sake of connecting unconnected branches seems to have led to the improved numbers. 

Thanks so much for doing these numbers, Greg! I find them really encouraging.
Greg, if it makes you feel better, PEI had a 24.4% reduction in unlinked profiles during CAT. For Canada, as a total, unlinked profiles dropped 2.1% (276 profiles) during CAT.

Improvement in Rate of Connected Profiles - August 2019

This month, I have added a new measure: the rate at which the percent of profiles on WikiTree which are connected to the main tree increases each month. I figured that separating out the rate of improvement from the overall percent connected would make variations more visible, and that certainly turned out to be the case. The rate at which the percentage has increased each month has varied between 0.04% (in April 2018) and 0.36% (in January 2018). That's quite a difference, with the record high rate being 9 times higher than the record low rate.

Overall, the average rate of increase is 0.19% per month. The rate of increase last month was 0.23%, or 0.04% above average. I had been misled by the previous three months having increases of about 0.10% into thinking that last month's increase was about double the average rate, but taking a long-term view, while it was above average, it wasn't double the average.
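In case it helps anyone who wants to track the same thing, the measure is just the month-to-month difference in percent connected, measured in percentage points, plus a long-run average to compare against. A minimal Python sketch follows; the function names are mine, and the monthly series in the example is made up for illustration rather than taken from my spreadsheet.

```python
from typing import List, Sequence


def monthly_increases(percent_connected: Sequence[float]) -> List[float]:
    """Month-to-month increase in percent connected, in percentage points."""
    return [
        round(later - earlier, 2)
        for earlier, later in zip(percent_connected, percent_connected[1:])
    ]


def average(values: Sequence[float]) -> float:
    """Plain arithmetic mean of the monthly increases."""
    return sum(values) / len(values)


# Made-up monthly percent-connected figures, oldest first.
series = [80.00, 80.10, 80.33, 80.37]
increases = monthly_increases(series)
latest_vs_average = increases[-1] - average(increases)
print(increases, round(average(increases), 2), round(latest_vs_average, 2))
```

Comparing the latest month against the long-run average, rather than against the previous few months, is what guards against the sort of misreading described above.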

I have a question for Aleš,

Because I'm active in the Connectors Project, most of the stats I'm tracking are related to connected and unconnected profiles. But of course there are other issues on WikiTree, and other projects to deal with those issues. (For example, I have long puzzled over how to track the percentage of profiles which really are unsourced, as opposed to depending on people manually tagging profiles as unsourced.)

But today, one possible measure occurred to me which it might be possible to implement: would it be possible to add a  table to https://wikitree.sdms.si/default.htm?report=stat1 with a count of those profiles on WikiTree which don't have any suggestions linked to them? If so, we should be able to derive percentages of the total number of profiles on WikiTree which do or don't need attention from Data Doctors.
Those P.E.I. and Canada numbers are definitely good news, S!
+5 votes

I crunched the numbers again today, using numbers from the beginning of December, so these results don't include WikiTree going over 22 million profiles this month:

Six out of ten measures are moving in the direction we want. The percent of unconnected and unlinked profiles continue to decline, although the absolute numbers of unconnected and unlinked profiles continue to rise. The number of profiles added per month seems to be declining over time, although that measure is pretty volatile. The rate of increase of the percentage of connected profiles is also declining, possibly because we have already picked most of the low-hanging fruit when it comes to connecting. 

A big bright spot is that, between November 1 and December 1, the number of WikiTree profiles per million people estimated to have been born since 1 AD hit double the number it was (175) when I first calculated that measure back in March, 2016. So, assuming that the distribution of WikiTree profiles versus actual people was completely random (which of course it isn't), people should now be able to connect their branch to an existing profile after adding approximately half as many family members as it would have taken back in 2016. As the number of WikiTree profiles continues to increase over time, we should have fewer cases of people joining WikiTree, adding a bunch of profiles for family members, getting frustrated, and then quitting, leaving an unconnected branch behind. 

by Greg Slade G2G6 Pilot (678k points)
edited by Greg Slade
Measure | This Month | Last Month | Change | Last Year | Change
Total Profiles | 21,982,352 | 21,752,681 | 1.06% | 18,890,675 | 15.81%
Profiles Per Million People | 352 | 348 | 1.15% | 304 | 15.79%
Profiles Added Per Month | 229,671 | 242,324 | -5.22% | 247,311 | -7.13%

Measure | This Month | Last Month | Change | Last Year | Change
Connected Profiles | 18,097,628 | 17,882,226 | 1.20% | 14,312,453 | 26.45%
Percent Connected | 82.33% | 82.21% | 0.15% | 75.41% | 9.18%
Rate of Increase in Connected | 1.20% | 1.26% | -4.76% | 1.57% | -23.57%
Unconnected Profiles | 3,884,724 | 3,870,455 | 0.37% | 3,668,222 | 5.90%
Percent Unconnected | 17.67% | 17.79% | -0.68% | 19.33% | -8.56%
Unlinked Profiles | 1,086,809 | 1,079,784 | 0.65% | 999,393 | 8.75%
Percent Unlinked | 4.94% | 4.96% | -0.40% | 5.26% | -6.08%
 

As far as connecting goes, we are continuing our unbroken string of the percentage of unconnected and unlinked profiles going down every month. (Although, sadly, the absolute numbers continue to increase.)

The rate at which profiles get added per month goes up and down a lot. I haven't actually derived monthly averages yet, but it seems to me that it tends to drop when it's summer in the Northern Hemisphere, which is where most WikiTreers live. That would tend to suggest that most of us have real lives, and have more time for "genealogising" when it's too nasty to go outside. That's probably a healthy thing.

Still, I find it disturbing that after hitting an all-time high in February 2018, the number of profiles added per month seems to have been in general decline since.

The number of WikiTree profiles per million people estimated to have been born since 1 AD continues to increase, and has passed double the mark that it was at in March 2016, when I first calculated it.

WikiTree Profiles per Million People Born Since 1 AD

Or, if you prefer to look at those numbers another way, the estimated number of people born since 1 AD per WikiTree profile has dropped by half in the same time.

People per WikiTree Profile

I have added a new measure to the spreadsheets I use to track profiles from ThePeerage.com, Wikipedia, and the Slade Genealogy site as part of the One Name Studies I manage. Taken together, those give me a sample size of 828 profiles. It's not random, the results are directly affected by my own efforts, and the sample size could definitely be larger, but at least it gives me the beginnings of a measure of the quality of sourcing on WikiTree. 

The measure I use is derived from Paul Gierszewski's WikiTree Statistics page. On his page, Paul's system is outlined thus:

Profiles are randomly sampled and assigned to the following categories:

  • 3 or more sources, where sources are likely original records or books.
  • 1 or 2 sources
  • Poorly sourced, such as a link to an Ancestry tree or another website, or vague source description
  • Unsourced
  • Unavailable for analysis (Unlisted, Red or Orange privacy)

For the sake of deriving an average score, I have put numerical values on Paul's system, thus:

3 = 3 or more primary sources, possibly plus secondary sources

2 = 2 primary sources, possibly plus secondary sources

1 = 1 primary source, possibly plus secondary sources

0.5 = One or more secondary sources, no primary sources

0 = Unsourced or unavailable for analysis due to privacy settings

I have set up my spreadsheets to give me a "sourcing sum", which is the total of the source level values for each data set. Then I add up the sourcing sums from all the data sets I'm using, and divide the result by the number of WikiTree profiles in those same data sets. Currently, that gives me an average score across all the data sets of 0.41 out of a possible maximum of 3.00. (Yes, that looks really bad, but that's largely because I haven't gone through the 484 Slade profiles from the Slade Genealogy site and assigned sourcing levels to them. If I remove those profiles from the equation, the average score for the remaining 344 profiles rises to 0.99, which is better, but still not very good.) An average of less than one suggests that most profiles in those data sets don't even have one primary source.
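The averaging described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `average_sourcing_score`, the data-set labels, and the sample scores are all invented for the example and do not come from the actual spreadsheets.

```python
from typing import Dict, List

# Source-level values allowed by the scale described above.
VALID_SCORES = {0, 0.5, 1, 2, 3}


def average_sourcing_score(data_sets: Dict[str, List[float]]) -> float:
    """Add up the "sourcing sum" of every data set and divide by the profile count."""
    total_sum = 0.0
    total_profiles = 0
    for name, scores in data_sets.items():
        for score in scores:
            if score not in VALID_SCORES:
                raise ValueError(f"{name}: {score} is not a valid source level")
        total_sum += sum(scores)       # the "sourcing sum" for this data set
        total_profiles += len(scores)  # unassessed profiles are entered as 0
    if total_profiles == 0:
        raise ValueError("no profiles to average")
    return total_sum / total_profiles


# Hypothetical sample data, not real WikiTree figures.
sample = {
    "ThePeerage": [3, 1, 0.5, 0],
    "Wikipedia": [2, 0.5],
    "Slade Genealogy": [0, 0, 0, 0],  # not yet assessed, so counted as 0
}
print(round(average_sourcing_score(sample), 2))
```

As with the 0.41 versus 0.99 figures, leaving an unassessed data set in the input pulls the average down, because every unassessed profile counts as a zero.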

I haven't added this new measure to the table, and I don't have a chart for it, since it's only a single data point so far, but going forward, that data should start to become more useful.

Notes: 

  1. In my system, a "primary" source is a birth, baptism, census, marriage, military, death, or burial record. A "secondary" source is an entry in Wikipedia or some other encyclopedia, ThePeerage.com, an online family tree, or a book or article which covers (or at least mentions) that person.
  2. I don't use scores like 1.5, 2.5, etc. The 0.5 score is only to show that there is something in that profile which is "better than nothing", like an entry in ThePeerage, Wikipedia, or some family tree online somewhere.

I just found this post.  I have no specific comment related to it other than I'm impressed with the amount of data and the presentation. 

I will say that some of the unconnection will result from, as someone said in a previous thread, bad genealogy, but also from people like me who enter all they can based on what they find in their research. (In my case, I upload information based on stones that I photograph in cemeteries.) There are times when I do it just for the global cemetery project and go as far as I can, but it takes me off on tangents (like ALL the siblings and their descendants), and then I get bogged down and don't get to uploading the photos and the information related to them. There may be others who upload when they can and hope that someone else "fixes" it. In the end, I hope that the goal of WikiTree will be achieved rather than creating a mess.

I have enjoyed the people who contact me asking for more information on the people that I have uploaded and hope that I have helped in some small way towards the goal.

After a couple of years of talking about this idea, and about measuring other things than how many profiles are connected to the main tree, and specifically the state of sourcing, I finally decided to do something about trying to measure the state of sourcing on WikiTree. As of the past hour or so, the numbers look something like this:

  • The total number of profiles on WikiTree is 22,170,511.
  • If I'm reading Aleš's WikiTree+ reports correctly, there are currently 987,809 profiles with the {{Unsourced}} template applied. (I'm assuming, possibly incorrectly, that that number includes {{Unsourced|Placename}} variants.)
  • Again, if I'm reading Aleš's WikiTree+ reports correctly, there are currently 21,020 profiles with [[Category:Needs_More_Records]] (again possibly including [[Category:Needs_More_Records|Placename]] variants) and 334 profiles with [[Category:Profiles_With_Incomplete_Sourcing]] (and possibly regional variants) applied, for a total of 21,354 profiles between those two variants.
  • Subtracting 987,809+21,354 from 22,170,511 leaves 21,161,348 profiles that possibly have anywhere from one source up.
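For anyone who wants to re-run this arithmetic as the counts change, here is a minimal Python sketch. The figures are the ones quoted in the bullets above; the variable and function names are my own, and the result inherits all of the caveats about how reliable the tag counts are.

```python
# Counts quoted in the post above.
total_profiles = 22_170_511
unsourced_template = 987_809    # profiles carrying the {{Unsourced}} template
needs_more_records = 21_020     # [[Category:Needs_More_Records]]
incomplete_sourcing = 334       # [[Category:Profiles_With_Incomplete_Sourcing]]


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


partially_sourced = needs_more_records + incomplete_sourcing
untagged = total_profiles - (unsourced_template + partially_sourced)

print(f"Tagged unsourced:   {unsourced_template:>10,} ({share(unsourced_template, total_profiles):.2f}%)")
print(f"Tagged needs more:  {partially_sourced:>10,} ({share(partially_sourced, total_profiles):.2f}%)")
print(f"No sourcing tag:    {untagged:>10,} ({share(untagged, total_profiles):.2f}%)")
```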
Profiles with Unsourced Template, Needs More Records, or No Categories

Now, I know these numbers are wrong, for a number of reasons:
  • For the most part, people have to apply the {{Unsourced}} template manually, and there are probably thousands of profiles on WikiTree which haven't even been checked to see whether they're sourced or not, let alone had the template applied.
  • Similarly, [[Category:Needs_More_Records]] absolutely has to be applied manually, and while it definitely seems to be catching on, as near as I can tell it was only created in September 2017, so it hasn't been around anywhere near as long as the {{Unsourced}} template, and not very many people even seem to know that it exists.
  • There are also large numbers of profiles set to Public or higher, so, whether they're sourced or not, nobody can do anything (whether adding sources, the {{Unsourced}} template, or [[Category:Needs_More_Records]]) except for the Profile Manager for each profile.
Similarly, there are a number of things that could be done to improve the accuracy of these numbers:
  1. Somebody (or, to be more honest, a whooooole bunch of somebodies) would have to go through every Open profile on WikiTree, evaluate the sourcing situation, and apply the {{Unsourced}} template or [[Category:Needs_More_Records]], as appropriate. This would kind of go against the usual habits of Sourcerers, who normally go looking for sources rather than tagging a profile and moving on, but part of fixing the problem is identifying those profiles which need help. (And, yes, making that happen may well require a thon-scale effort, or at the very least hundreds of volunteers and a lot of coordination: "Okay, you check the profiles of people born in New South Wales between 1800 and 1899 with a Last Name at Birth beginning with 'G'...") One bright side would be that the number of potential profiles for people to work on during Saturday Sourcing Sprints and the next Source-A-Thon (or at least the ones that are already tagged and easy to find) would probably increase about ten-fold.
  2. I need to talk to Aleš and ask about what syntax I need to use to search for the number of profiles which are both set to Open and have either the {{Unsourced}} template or [[Category:Needs_More_Records]], or neither.
  3. A whole bunch more people need to learn about and start applying the [[Category:Needs_More_Records]] categories. I have a hard time believing that the vast majority of profiles are well and truly sourced, with 4% unsourced, and only a few thousand in the middle. Even without looking at the data, I'd expect something more like a bell curve, with a small(ish) number of profiles with no sources at all, a similarly small(ish) number of profiles that are well-sourced, and most profiles somewhere in the middle.
  4. The Categorization Project (when they have time, since they're dealing with a Lot Of Stuff) might want to consider whether it makes more sense to roll the [[Category:Profiles_With_Incomplete_Sourcing]] categories into the [[Category:Needs_More_Records]] categories, since they're pretty similar, and it's clearly the latter that have taken off.
+7 votes

Hi Greg,

As a response to your "How to increase a country's presence on WikiTree" I made this comparison for some European countries that have profiles on Wikitree:

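| Country | WikiTree profiles | Unconnected | Population (millions) | Profiles per million |
| --- | --- | --- | --- | --- |
| Denmark | 74865 | 9110 | 5.5 | 13611 |
| Norway | 98154 | 11613 | 4.7 | 20883 |
| Sweden | 160544 | 26374 | 9.2 | 17450 |
| Netherlands | 347760 | 9021 | 16.5 | 21067 |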

The numbers mentioned are taken from DBE_Unconnected_Europe (August 9) and the population in 2002 according to Wikipedia. The ppm figure (profiles per million current inhabitants) has no meaning in itself; it only makes some sort of comparison possible. This setup could be used to find countries for which to create challenges like the one you did last week in The Netherlands.

For my own motivation to keep connecting new and unconnected branches, I keep a spreadsheet with the numbers from DD_Unconnected_List_NLD. Since I started doing that in April 2018 I see this:

| Date | Unconnected/Total |
| --- | --- |
| April 1, 2018 | 11213/179379 |
| August 19, 2018 | 11019/198572 |
| April 7, 2019 | 8808/235945 |
| August 18, 2019 | 8327/257644 |
| April 12, 2020 | 7042/313150 |
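To turn those unconnected/total pairs into percentages, something like the following Python sketch would do. The figures are the five dated rows above; the names `snapshots` and `percent_unconnected` are only illustrative.

```python
# (date, unconnected, total) taken from the dated rows above.
snapshots = [
    ("April 1, 2018", 11213, 179379),
    ("August 19, 2018", 11019, 198572),
    ("April 7, 2019", 8808, 235945),
    ("August 18, 2019", 8327, 257644),
    ("April 12, 2020", 7042, 313150),
]


def percent_unconnected(unconnected: int, total: int) -> float:
    """Return unconnected profiles as a percentage of all profiles."""
    return 100 * unconnected / total


for date, unconnected, total in snapshots:
    print(f"{date}: {percent_unconnected(unconnected, total):.2f}%")
```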

And today the new numbers: 9065/349957

I have seen a huge increase in the number of profiles added with The Netherlands as country and, until April this year, a steady decrease in the number of unconnected profiles.

by B. W. J. Molier G2G6 Mach 9 (90.2k points)
edited by B. W. J. Molier

Those unconnected numbers for the Netherlands are excellent, B. W. J.! I calculate today's unconnected numbers as 2.89%, as compared to 13.94% for WikiTree as a whole. It's also very cool that you have consistently been able to drive down the unconnected numbers, not just as a percentage, but in absolute terms. That has only happened once for WikiTree as a whole.

+3 votes

B.W.J.'s answer prompted me to do some number crunching on a different measure that I've been thinking of for some time. Namely, I've been trying to figure out some way to represent how much a given country is under-represented on WikiTree. That is to say, the difference between the number of profiles from that country that are on WikiTree and the number of profiles we'd expect to see from that country if the profiles on WikiTree perfectly represented each country's share of world population.

Just to see a sample of the way things are going, I looked at Wikipedia's List of countries and dependencies by population, and picked the ten largest. Then I took another ten countries which have active projects, for which I have run challenges, or in which I have been working for some other reason. For each country, I captured the total population and the percentage of the world's population. Then, using Aleš's reports, I looked up how many profiles are from that country. From that, I calculated what percentage of all WikiTree profiles are from that country.

Let's start with the obvious: the United States of America represents roughly 4.23% of the population of the world. The total number of profiles when I checked earlier today was 24,813,446, so 4.23% of that would be about 1,049,609 profiles. However, the number of profiles from the USA actually comes to 11,668,636, or 47.03% of the total profiles. So the USA is over-represented on WikiTree by over 10 times. (It's no wonder that so many people from other countries visit WikiTree, poke around for a bit, conclude that it's a site for Americans, and then leave!)
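The USA calculation above can be written out as a short Python sketch, so that the same steps can be repeated for any other country. The figures are the ones quoted in the paragraph; the function names are my own and purely illustrative.

```python
def expected_profiles(total_profiles: int, population_share: float) -> float:
    """Profiles a country would have if WikiTree mirrored its share of world population."""
    return total_profiles * population_share


def over_representation(actual_profiles: int, expected: float) -> float:
    """How many times larger the actual profile count is than the expected one."""
    return actual_profiles / expected


# Figures quoted in the post for the USA.
total_profiles = 24_813_446
usa_population_share = 0.0423   # roughly 4.23% of world population
usa_profiles = 11_668_636

expected = expected_profiles(total_profiles, usa_population_share)
factor = over_representation(usa_profiles, expected)

print(f"Expected profiles: {expected:,.0f}")
print(f"Share of all profiles: {100 * usa_profiles / total_profiles:.2f}%")
print(f"Over-representation factor: {factor:.1f}x")
```

A factor above 1 means a country has more profiles than its population share would predict, and a factor below 1 means it has fewer.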

Please note that I am not saying that there are too many American profiles on WikiTree, or that we should stop adding profiles for Americans. All those profiles still only represent 3.53% of the current population of the USA, never mind all those Americans who are no longer alive. But what I am saying is that the ratio of profiles to people is far worse for other countries than it is for the USA.

by Greg Slade G2G6 Pilot (678k points)
edited by Greg Slade
About 865,000 orphaned profiles with neither birth nor death date, if I'm remembering the number correctly. Best guess, many of those are unsourced.

As for unsourced profiles, my rough estimate would be that 50% to 75% of unsourced profiles actually have the unsourced tag/category.

It's also cool that the Netherlands has profiles roughly equal to 2% of the current population. (Although Ireland has everybody else beat there, too.) Total WikiTree profiles are equal to 0.32% of the current estimated world population. I think it would be great if we could drive that number up a little bit each month, paying special attention to those countries where the percentage is below average.

When I was in Hong Kong, I should have spent more time and effort on learning Cantonese!

Kay, I agree. There are huge numbers of unsourced profiles, and many of them don't have the {{Unsourced}} template. I love it when I read somebody saying that they're working on applying it where it's needed. Yes, sourcing those profiles would be very good indeed. But even applying the template at least gives us a clearer picture of the scale of the problem.

There is also a category for Unlocated Profiles, but I'm pretty sure that it includes less than 1% of the profiles which have no location information in them.

Like somebody said once, "What gets measured gets managed."

Oh, yes. For those wanting to know which of Aleš's reports I got the numbers from, take a look at the DBE Unconnected page. You can drill down by general area, and then select the country you're interested in. For most countries, you should see an entry near the bottom of the section for that country that says something like:

Table prepared at 13.10.2020 08:10:51 (Slovenian time). Condition to prepare list (Country is Canada). Profiles: 1017567

If I understand correctly, that last number (1017567) is the total number of profiles located in that country. 
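If you wanted to collect that number from several country pages without copying it by hand, a regular expression along these lines might work. This is a sketch that assumes every footer follows exactly the same wording as the single example quoted above, which I have not verified; the function name `parse_report_footer` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern based only on the one footer line quoted above (an assumption).
FOOTER_RE = re.compile(
    r"Condition to prepare list \((?P<condition>.+?)\)\.\s*Profiles:\s*(?P<count>\d+)"
)


def parse_report_footer(line: str) -> Optional[Tuple[str, int]]:
    """Return (condition, profile count) from a report footer, or None if it does not match."""
    match = FOOTER_RE.search(line)
    if match is None:
        return None
    return match.group("condition"), int(match.group("count"))


example = (
    "Table prepared at 13.10.2020 08:10:51 (Slovenian time). "
    "Condition to prepare list (Country is Canada). Profiles: 1017567"
)
print(parse_report_footer(example))
```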

There are over 150K Dutch profiles that were never edited (after the day of creation). I expect large numbers of them are unsourced without being labeled as such...
Greg, I guess I do not understand the metric at all. In 2011 it was estimated that a total of over 107 billion people had inhabited the earth. Of those, about 50 billion were born in AD years, with about 21 billion since about 1650. The estimate of how many people have lived in the United States since 1600 is 650 million, or about 3% of the total world population since approximately 1600. Since WikiTree represents profiles starting in 0001 AD, how do the current population numbers relate?

If you are interested in the Netherlands, these two GEDCOMs might be a place to start: Bouwmeester Family Tree_2014-02-27_01.ged and Zwinderman.ged

Robin, I understand that the current population of a given country now isn't going to be the same as the total population  of that country (or at least, the land that it currently occupies) since 1 AD, but since I didn't have access to more precise numbers, using the current population should be close enough to act as a rough approximation. (Or, as a friend of mine used to put it, "Close enough for Rock and Roll.")

Only very rough, of course, since wars, natural disasters, migration, and so on keep changing the rate of growth in different countries, to the point where the population is actually declining in certain countries (Japan, Russia, and much of Eastern Europe, for example), even as it increases in others. Still, I think it's reasonable to assume that China and India have had a larger population than the USA (and France and the UK have had smaller populations) for a good long time.

Nevertheless, the numbers you give intrigue me. The 107 billion in 2011 sounds like the estimate by the Population Reference Bureau, whose numbers I used to calculate the "WikiTree Profiles per Million People Born Since 1 AD" and "People Per WikiTree Profile" charts earlier on in this thread. But I have never seen a breakdown by country. Where did you get the numbers for the United States, and does that source provide estimates for other countries?

The 15 Nations Project thread prompted me to take another look at the numbers, so I added more countries until I had the largest 31 countries in the world, plus the extra countries from the first round. Then, I looked up the current population numbers and number of WikiTree profiles. (For those countries not listed in the unconnected profile reports, I did a search on WikiTree+ like: https://wikitree.sdms.si/function/WTWebProfileSearch/Profiles.htm?Query=BirthCountry=Kenya .)
 

| Country | Population | Population Rank | Profiles | Representation Score |
| --- | --- | --- | --- | --- |
| Democratic Republic of the Congo | 99,010,000 | 15 | 25 | 0.00006 |
| Nigeria | 218,541,000 | 6 | 66 | 0.00007 |
| Ethiopia | 105,163,988 | 13 | 32 | 0.00007 |
| Bangladesh | 165,158,616 | 8 | 58 | 0.00009 |
| Tanzania | 61,741,120 | 23 | 28 | 0.00011 |
| China | 1,412,600,000 | 1 | 4,421 | 0.00076 |
| Thailand | 66,874,167 | 22 | 210 | 0.00077 |
| Kenya | 47,564,296 | 30 | 214 | 0.00110 |
| Pakistan | 235,825,000 | 5 | 1,337 | 0.00138 |
| Iran | 86,045,623 | 17 | 544 | 0.00154 |
| Myanmar | 55,294,979 | 26 | 403 | 0.00178 |

The last column, "Score", is what I call a "Representation Score". When I checked today, there were 32,770,043 profiles on WikiTree. That represents 0.41% of the current estimated world population (8,000,443,000). So if a country has profiles for people born in that country equal to 0.41% of the current population of that country, it would get a "Representation Score" of 1.0. So scores under 1.0 suggest that a country is under-represented on WikiTree, and scores over 1.0 suggest that a country is over-represented on WikiTree.
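As a rough Python sketch of how that score works: divide a country's profiles-per-person rate by the site-wide rate. The world population and total profile count are the ones quoted above, and the three sample rows are copied from the table; the function name and the choice of rows are just for illustration.

```python
WORLD_POPULATION = 8_000_443_000
TOTAL_PROFILES = 32_770_043


def representation_score(
    profiles: int,
    population: int,
    total_profiles: int = TOTAL_PROFILES,
    world_population: int = WORLD_POPULATION,
) -> float:
    """Country's profiles per person divided by WikiTree's profiles per person worldwide."""
    site_rate = total_profiles / world_population  # about 0.41%
    return (profiles / population) / site_rate


# (population, profiles) for a few rows of the table.
rows = {
    "Nigeria": (218_541_000, 66),
    "China": (1_412_600_000, 4_421),
    "Myanmar": (55_294_979, 403),
}

# Print lowest score first, the same ordering the table uses.
ranked = sorted(rows.items(), key=lambda item: representation_score(item[1][1], item[1][0]))
for country, (population, profiles) in ranked:
    print(f"{country}: {representation_score(profiles, population):.5f}")
```

The same function should work for regions within a country, as long as the population and profile counts cover the same area.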

For this round, while I crunched the numbers for 36 countries, I'm only tabulating those countries with the lowest Representation Scores out of that set. (And, as it happens, all of those are in the 30 largest countries in the world.) It is not my goal to try to discourage people from adding profiles for people from over-represented countries (or, worse yet, delete existing profiles), but rather to encourage people to pay a little more attention to under-represented countries.

The thought occurred to me that the same kind of calculations could be applied to regions within a country, so that even people who only know how to source profiles within their own country could at least try to pay attention to under-represented counties/departments/districts/landes/provinces/states/territories (or at least those which aren't quite as over-represented as others) within that country. So, I tried crunching the numbers for Canada:

| Province/Territory | Population | Profiles | Representation Score |
| --- | --- | --- | --- |
| Nunavut | 36,858 | 40 | 0.26 |
| Alberta | 4,601,314 | 39,738 | 2.11 |
| British Columbia | 5,368,266 | 67,968 | 3.09 |
| Yukon | 43,964 | 886 | 4.92 |
| Northwest Territories | 45,602 | 1,419 | 7.59 |
| Ontario | 15,262,660 | 500,986 | 8.01 |
| Saskatchewan | 1,205,119 | 41,342 | 8.37 |
| Manitoba | 1,420,228 | 50,555 | 8.68 |
| Québec | 8,751,352 | 355,052 | 9.90 |
| Newfoundland and Labrador | 528,818 | 71,806 | 33.12 |
| Prince Edward Island | 172,707 | 27,772 | 39.22 |
| Nova Scotia | 1,030,953 | 168,545 | 39.87 |
| New Brunswick | 820,786 | 181,460 | 53.92 |


...