
Connection Facts and Figures

Privacy Level: Public (Green)
Surname/tag: connectors

We are all connected.

But what does "connected" mean? And who are "We"? And how are we connected?

Vocabulary

  • Connection between profiles uses the family relationships: parent, child, sibling, spouse. Two profiles are said to be connected (to each other) if there is at least one path in WikiTree linking them through a sequence of these relationships. The Connection Finder application makes it easy to find out if, and how, two profiles are connected. Example: Mary Stuart and Barack Obama are connected.
  • The length of a shortest path, as indicated by the Connection Finder, is called the distance between the two profiles. This distance is measured in "degrees" or "steps". Mary Stuart and Barack Obama are at distance 20 from each other, in the current state of affairs (Jan 2022 - down to 19 in July 2023)
  • Two profiles being "connected" does not mean they are "relatives" or "related" in the common meaning of the term (ancestors, descendants, cousins, collaterals ...), unless under a very extensive definition of the notion of family. Which is, actually, the philosophy of WikiTree.
  • The "Big Tree" or "Main Tree" or "Single Tree" or even simply "The Tree" gathers about 85% of profiles, and is about to pass the 32 million milestone in January 2024 (30 million by July 2023). All profiles in the Big Tree are connected to each other. The usual WikiTree jargon uses "connected" and "unconnected" in an absolute way to indicate whether or not a profile belongs to the Big Tree. The number of connected profiles is given by the Connection Finder page, updated daily.
  • A profile connected to no other profile is said to be "unlinked". Otherwise it is sometimes called "linked" or "related", both terms being potentially confusing.
  • An "unconnected branch" is a set of profiles all connected to each other, but not to the Big Tree. Under this definition an unlinked profile is technically an unconnected branch of size 1, although calling this a branch might seem strange.
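
The definitions above can be illustrated with a small sketch. It is not how the Connection Finder is implemented (the source does not say); it only assumes that profiles and relationships form an undirected graph, and it uses a breadth-first search on made-up profile IDs to compute a distance in degrees and to split the profiles into branches. The names `RELATIONSHIPS`, `PROFILES`, `distance` and `branches` are hypothetical.

```python
from collections import deque

# Hypothetical toy data: each pair is one family relationship
# (parent, child, sibling or spouse) between two made-up profile IDs.
RELATIONSHIPS = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"),  # one branch of 5 profiles
    ("X", "Y"),                                       # a separate branch of 2
]
PROFILES = {"A", "B", "C", "D", "E", "X", "Y", "Z"}   # "Z" is unlinked


def build_graph(profiles, relationships):
    """Undirected adjacency sets: every relationship works in both directions."""
    graph = {p: set() for p in profiles}
    for a, b in relationships:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def distance(graph, start, goal):
    """Length of a shortest path in degrees, or None if not connected."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return seen[current]
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return None


def branches(graph):
    """Maximal sets of mutually connected profiles, largest first."""
    remaining = set(graph)
    result = []
    while remaining:
        start = remaining.pop()
        branch = {start}
        queue = deque([start])
        while queue:
            for neighbour in graph[queue.popleft()]:
                if neighbour not in branch:
                    branch.add(neighbour)
                    queue.append(neighbour)
        remaining -= branch
        result.append(branch)
    return sorted(result, key=len, reverse=True)


graph = build_graph(PROFILES, RELATIONSHIPS)
big_tree, *unconnected = branches(graph)
print("D to E:", distance(graph, "D", "E"), "degrees")
print("A to X:", distance(graph, "A", "X"))
print("branch sizes:", [len(b) for b in branches(graph)])
```

In this toy graph the largest branch plays the role of the Big Tree, the pair X and Y is an unconnected branch, and Z is an unlinked profile, that is, a branch of size 1.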

This vocabulary is not globally consistent, people don't always use it consistently, and it's easy to get confused when the context is unclear. For example, "an unconnected branch is composed of unconnected profiles, all connected to each other" sounds weird until you grasp that "unconnected" is taken in an absolute sense, whereas "connected" is relative (to each other).

Many small unconnected branches

How many distinct unconnected branches exist outside the Big Tree is not easy to assess. Browsing the Unconnected People pages suggests that most unconnected branches are very small, with ten members or fewer.

Thanks to data provided by Aleš Trtnik on the total number of branches and their distribution by size, we got a confirmation of this first impression. The following table is a summary of this distribution, data as of January 2022, updated in January 2024.

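| Size | 1 (unlinked) | 2 - 5 | 6 - 10 | 11 - 50 | 51 - 249 | 250+ | total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Jan 2022 | 1,093,595 | 270,986 | 83,758 | 68,608 | 5,765 | 338 | 1,522,712 |
| Jan 2024 | 1,366,074 | 335,993 | 98,335 | 70,737 | 4,604 | 256 | 1,875,743 |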

The evolution shows a decrease of 20% in the number of large unconnected branches (size over 50). Those are listed and on the radar of Connectors. On the other hand, the number of very small branches (size up to 5) has grown by about 25%.
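
The percentages can be rechecked from the table with a few lines of Python. This is only a sketch: the counts are copied from the table, while the dictionary layout and the helper name `percent_change` are choices made here for illustration.

```python
# Branch counts copied from the table above: (January 2022, January 2024).
COUNTS = {
    "1 (unlinked)": (1_093_595, 1_366_074),
    "2 - 5": (270_986, 335_993),
    "6 - 10": (83_758, 98_335),
    "11 - 50": (68_608, 70_737),
    "51 - 249": (5_765, 4_604),
    "250+": (338, 256),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


changes = {size: percent_change(old, new) for size, (old, new) in COUNTS.items()}

# "Very small" branches: sizes 1 to 5, i.e. the first two columns together.
small_2022 = COUNTS["1 (unlinked)"][0] + COUNTS["2 - 5"][0]
small_2024 = COUNTS["1 (unlinked)"][1] + COUNTS["2 - 5"][1]
small_change = percent_change(small_2022, small_2024)

for size, change in changes.items():
    print(f"{size:>12}: {change:+.1f}%")
print(f"{'up to 5':>12}: {small_change:+.1f}%")
```

With these figures the script reports a drop of about 20% for the 51 - 249 column and a growth of about 25% for branches of size up to 5.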

The following plot gives the distribution of the number of branches for sizes from 2 to 20 in January 2022. It clearly shows the exponential decay of the number of branches as size increases. The distribution as of January 2024 is similar.

Number of unconnected branches by size

Details of the "long tail" of largest unconnected branches are found in the Largest Unconnected Branches page. As of January 2024, the largest branch counts over 5,000 profiles, and there are 18 branches of size over 1,000.

No large branch outside the Big Tree

No unconnected branch has more than a few thousand members. Even the most "exotic" and largest ones are smaller than the Big Tree by at least 3 orders of magnitude.

Why is it so? Let's try two kinds of explanation.

The first, obvious explanation is Connectors! Large unconnected branches are on their radar of course, and they work hard, around the clock and all year round, to connect them to the Big Tree. They are of course all the more efficient when the countries of the unconnected profiles are already well represented in WikiTree, the sources easily available, etc. This explains why most of the largest branches belong to regions of space and/or time a bit distant from the (Western Anglo-Saxon) bulk of the Big Tree, making them more difficult to connect.

The other possible explanation relies on a probability computation and is more debatable, but let's try it nevertheless.

The total number of humans likely to ever have a profile in WikiTree - that is, whoever has been named and documented in at least one reliable source - is difficult to assess. Most people who lived before recent centuries have never been documented. The number of potential profiles could be anywhere in the 10 to 50 billion range, growing by tens of millions of births yearly. For the present computation, let's take a conservative upper value of 25 billion.

The current size of the Big Tree, 25 million, is roughly one thousandth of the above. In other words, if you pick a random documented human X, the probability for X to have a profile in WikiTree is about 0.001, assuming that the Big Tree is somehow a big random sample of all potential profiles. Conversely, the probability for X to fall outside the Big Tree is 0.999.

Now take a random branch of n profiles. Treating its profiles as independent draws, the probability that no profile in this branch falls in the Big Tree is about 0.999^n. Let's compute it for different values of n.

  • 0.999^100 = 0.90
  • 0.999^1000 = 0.37
  • 0.999^5000 = 0.01
  • 0.999^10000 = 0.000045

A random branch of 100 members has only a 10% chance of being connected; the probability is 63% for 1,000 members, 99% for 5,000 ... and practically 1 (certainty) for n over 10,000.
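
The figures above can be reproduced with a short script. It is a minimal sketch of the same simple model, assuming that each profile of a branch independently has a 0.001 chance of being in the Big Tree; the names `P_IN_TREE`, `prob_unconnected` and `prob_connected` are introduced here for illustration.

```python
# Assumed chance that a random documented human has a profile in the Big Tree
# (25 million profiles out of an assumed 25 billion potential profiles).
P_IN_TREE = 0.001


def prob_unconnected(n, p=P_IN_TREE):
    """Probability that none of the n profiles of a branch is in the Big Tree."""
    return (1.0 - p) ** n


def prob_connected(n, p=P_IN_TREE):
    """Probability that at least one of the n profiles is in the Big Tree."""
    return 1.0 - prob_unconnected(n, p)


for n in (100, 1_000, 5_000, 10_000):
    print(f"n = {n:>6}: unconnected {prob_unconnected(n):.6f}, "
          f"connected {prob_connected(n):.1%}")
```

Changing `P_IN_TREE` shows how sensitive the conclusion is to the assumed number of potential profiles: a smaller value pushes the size at which connection becomes near certain higher.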

Branches are not random samples of documented humans, but the more they grow, the more they expand in all directions and the closer to random they become. Likewise, the Big Tree is not a random sample of all potential profiles, but the more it grows, the more random it becomes. Despite all its biases, this simple model agrees in its main conclusion with the observed data: an unconnected branch is very unlikely to grow beyond a few thousand profiles.

Time distribution of connected and unconnected profiles

The following plot shows the breakdown of connected and unconnected profiles by ten-year period, as provided by the WikiTree+ application (data January 2024).

Breakdown of connected profiles by ten-year periods

Several remarks about this plot

  • Each profile is counted in every period during which the person was living, based on the recorded birth and death dates.
  • It is based on WikiTree+, so it excludes Living and Private profiles, which explains the low numbers of profiles post-1950, either connected or not.
  • Despite this bias, the best connected period is around 1800, where almost 95% of profiles are connected.
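
The counting rule in the first remark can be sketched as follows. How WikiTree+ actually computes the breakdown is not described here, so this is only one possible reading: periods are assumed to start on years divisible by 10, and the three profiles are made-up examples, not real data.

```python
from collections import Counter

# Hypothetical profiles: (birth year, death year, connected to the Big Tree?)
PROFILES = [
    (1795, 1861, True),
    (1802, 1803, True),
    (1848, 1912, False),
]


def decades_lived(birth, death):
    """Ten-year periods (labelled by their first year) overlapping [birth, death]."""
    first = birth - birth % 10
    last = death - death % 10
    return range(first, last + 10, 10)


def breakdown(profiles):
    """Count each profile once in every ten-year period of its lifetime."""
    connected, unconnected = Counter(), Counter()
    for birth, death, is_connected in profiles:
        target = connected if is_connected else unconnected
        for decade in decades_lived(birth, death):
            target[decade] += 1
    return connected, unconnected


connected, unconnected = breakdown(PROFILES)
for decade in sorted(set(connected) | set(unconnected)):
    total = connected[decade] + unconnected[decade]
    print(f"{decade}s: {connected[decade]}/{total} connected")
```

Under this rule a long-lived person contributes to many periods, so the totals per period add up to more than the number of profiles.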

Distances in the Big Tree: closer and closer

The 100 Circles and related pages discuss in detail the distribution of distances in the Big Tree, aka "population of circles", for various reference profiles. We list here the main points to take home.

  • Central profiles of the Big Tree have a typical mean distance of 20 degrees to other profiles, with 95% at a distance lower than 30. The long tail of far-flung branches, typically extending to distances from 40 to 80, represents less than 1% of profiles.
  • The average distance between two random connected profiles, based on samples constructed using Jamie Nelson's very cool application, is estimated, as of Aug 2022, to lie in the interval [24, 27] with a confidence of about 95%. See G2G discussion for more details.
  • The mean distance is slowly but steadily getting smaller for all studied reference profiles. The peak of the distance distribution is getting sharper and shifting to lower values. The following table shows the changes in mean distance for a few reference profiles we've been following in the framework of the 100 Circles project.

| Profile | Jan 21 | Jan 22 | Jan 23 | Jan 24 |
| --- | --- | --- | --- | --- |
| Samuel Lothrop | 17.5 | 17.3 | 17.1 | 16.9 |
| Queen Elizabeth II | 22.0 | 21.8 | 21.4 | 21.1 |
| Olof Andersson | 32.0 | 30.0 | 25.0 | 24.5 |
| Jean-Joseph Vatant | 33.1 | 32.4 | 30.9 | 30.6 |
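
To put a number on the trend, the table can be turned into an average change per year for each reference profile. This is a simple sketch, assuming the four columns are exactly one year apart; `MEAN_DISTANCE` and `average_yearly_change` are names introduced here, and the values are copied from the table.

```python
# Mean distances copied from the table above (Jan 21, Jan 22, Jan 23, Jan 24).
MEAN_DISTANCE = {
    "Samuel Lothrop": [17.5, 17.3, 17.1, 16.9],
    "Queen Elizabeth II": [22.0, 21.8, 21.4, 21.1],
    "Olof Andersson": [32.0, 30.0, 25.0, 24.5],
    "Jean-Joseph Vatant": [33.1, 32.4, 30.9, 30.6],
}


def average_yearly_change(values):
    """Average change per year between the first and the last yearly value."""
    return (values[-1] - values[0]) / (len(values) - 1)


for profile, values in MEAN_DISTANCE.items():
    print(f"{profile:<20} {average_yearly_change(values):+.2f} degrees per year")
```

The change is negative for all four profiles, and much larger for the two most distant ones than for the central ones.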

Could this trend keep on going? There are still a lot of missing relatives everywhere in the Big Tree, and more reconnections will happen, and distances are likely to keep getting smaller in the near future.

There are of course limits to the collapse of distances and completion of circles. In the page Ten circles meet each other, based on the case study of Samuel Lothrop, we conjecture that in the long run, most profiles in the Big Tree could be less than 20 steps from each other.

All streams flow to the sea

For all of us connectors, the main practical lesson to bring home from the above facts and figures could be the famous quote of a great martial artist. Be like water, my friend.

Don't waste your energy banging your head on a brick wall when so many side paths are open, waiting for you to pass by without effort. Don't fight an impossible way uphill when the flow is inviting you downhill or sideways. If you don't find the father, follow the sister and her in-laws to wherever they want to lead you.

Like water, follow the flow through all open serendipitous paths, let connection streams find their way through any obscure narrow ravines, let them meander in unknown valleys. As long as they flow, at some point they'll reach the ocean of the Big Tree.

Like water, move gently outwards and around, in all possible directions, leading you beyond your current projects, categories, countries, families, names, languages ... all things likely to close your eyes and mind to possible paths. Don't be afraid of following the outliers, the renegades, the migrants, the illegitimate, the runaways, all likely to be the best of connectors.

Forget where you're bound to go, the path will lead you there. And enjoy each step.





Comments: 13

This got surprisingly poetic there at the end. :-) Love this philosophy.
posted by Savanna King
I agree, because in a sense WikiTree is not an ideal tool for the masses of individuals that you pull out of the records with the same surname. However, logically they should represent a much smaller set of connected family lines, that may intersect at some time in the undocumented past. In WikiTree, an ONS depends upon all the differently surnamed relatives to connect them into the tree, rather than leaving short strands floating unconnected. At some point I am going to do an OPS and test my conjecture that distant cousin intermarriages make The Tree far more interconnected than we might think. I am not sure what sort of node marking techniques you use in your circle computations to ensure that you don’t do double counting?
posted by Stephen Adey
The circles computation is made by a query provided by Aleš. I trust him on the algorithm being right, and certainly won't be able to check it anyway. You can ask him directly.
posted by Bernard Vatant
No, I was just thinking out loud about the impact of cousin marriages upon the tree. Like you, I will trust Ales to get the computation right. The impact of those marriages on the circles doesn't interest me much, more the visualisation of what they are doing to the network connectivity.
posted by Stephen Adey
Bernard, I vaguely recall (or think I do) the Circles group discussing that question some time ago, maybe in connection with one of Eva's pages...

There is a brief comment on her "Families cluster" page that there is no double counting. https://www.wikitree.com/wiki/Space:Families_cluster

posted by [Living Kelts]
edited by [Living Kelts]
Yes we had several conversations along those lines, mostly with Shawn Ligocki. I have made an intriguing finding in my Jean Joseph circles exploration, which are very endogamic. Many profiles even at small distances are reached by several distinct paths. The finding is that when you have a profile at distance 5, say, and you find another path, it's also of about length 5. Redundancy of paths does not impact drastically the distance. I've observed that at small scale, but too many times for it to be just a random effect. There must be some underlying rule, but I don't know if it scales to greater distances. Networks still have a lot of mysteries ...
posted by Bernard Vatant
edited by Bernard Vatant
Visualizing endogamous clusters is not easy.

I think the SpiderWebs app works fairly well: https://apps.wikitree.com/apps/clarke11007/webs.php

Interesting observation about the same-length redundant paths in a strongly endogamic population. I ought to have a few spots I could look and see if I get the same phenomenon.

posted by Eva Ekeblad
Can we learn anything from one name studies about the connectivity of The Tree? Do we generally just end up with shortish lengths of surname continuity, through the well documented 19th and early 20th century, before privacy issues hit the genealogical researcher?
posted by Stephen Adey
I'm not a big fan of one name studies. Indeed, mostly they produce a lot of unconnected branches. Like one place study, actually.
posted by Bernard Vatant
Actually, one of the functions of One Name Studies is to link together profiles in the same lineage which may already be connected to the main tree, but aren't linked to one another.
posted by Greg Slade
Greg, two vocabulary comments

- Profiles "connected to the main tree" are necessarily connected to each other, so how could they be "not linked to each other"?

- What do you mean by "lineage"? Focusing on names, in patronymic (not to say patriarchal) cultures, is de facto considering only male lineage, yet again pushing women aside, and leads to an over-representation of male lines. Which, as everybody should know, are far more likely to be wrong. Motherhood is always more obvious to prove than fatherhood :-)

That said, yes, it can certainly be a useful tool. Just not my way, based on point #2 above. But the wealth of WikiTree is in the diversity of approaches.

posted by Bernard Vatant
Let me explain what I mean by "not linked to each other":

This month, one of the surnames I'm working on is Miller. In working through the Millers listed on ThePeerage.com, I came across Henry John Miller (Miller-30214) and Henry Holmes Miller (Miller-62691). Henry John was the father of Henry Holmes, and his profile was created in 2015. Henry Holmes' profile was created in 2019, but not linked to his father's profile. Because they were both connected to the main tree, you could have traced a connection between them (I didn't bother to check how many degrees it would have taken), but Henry John was not listed on Henry Holmes' profile as father, and Henry Holmes wasn't listed on Henry John's profile as son. So I linked them together, and now they are just one degree from one another, as they should be. But I wouldn't even have been working on their profiles if I wasn't working on the surname that they share.

And, yes, I have seen what you're talking about. Occasionally, I come across an unconnected branch which just lists son, father, grandfather. Not a single woman in the whole branch. While I recognise that there is systemic sexism in record keeping which makes it hard to find the Last Name At Birth for married women (at least in cultures where the woman takes a man's surname upon marriage), but some of these branches don't even have entries like "Sarah Unknown", so it's hard to escape the conclusion that the sexism wasn't limited to the record keepers in the case of those particular branches. (And I have to confess that I'm not terribly sympathetic to the people who created partial branches like that. Normally, I do my best to help connect an unconnected branch, because I remember how discouraged I was when I had built my branch up to several hundred profiles, and still not found a connection to the main tree. But if somebody ignores their own mother, they ignore half of their chances to connect. And if they ignore their grandmothers, they're ignoring 3/4 of the chances to connect, and then 7/8, and then 15/16, and so on. Whatever that kind of thinking is about, it's clearly not about genealogy. Or at least, not as I understand genealogy.)

What I mean by "lineage" is being able to trace *all* of my ancestors back as far as possible. That's why I'm working on 31 different surnames. (It should be 32, but we haven't been able to find either parent of our "brick wall" Miller, and specifically his mother, so I have no idea what the missing surname is.) The farthest back I've been able to trace is my paternal grandmother's line, so if I was ignoring the females in my family, my branch would be really small (and still unconnected). But I don't want just a generation or two for each surname. I want to be able to go back multiple generations for each surname (and for the surnames that being able to go back that far on the surnames I already know would reveal).

posted by Greg Slade
OK, I've no problem with that :-)
posted by Bernard Vatant