WikiTree Network Defined

+18 votes
758 views

It was sooo exciting to see so many of you respond to my last post "WikiTree and Network Theory". You all had such interesting questions and ideas! I've added a second blog post in my series exploring the WikiTree Network. In this post, I dig into the details of how to define the WikiTree Network. In fact, I introduce 3 different networks that all take slightly different approaches to representing the connectivity of the WikiTree dataset. I hope you enjoy! Please keep leaving me questions and comments here, I love to hear from you all!

https://www.sligocki.com/2021/06/24/wikitree-network-definition.html

I promise, the next post will have some exciting new discoveries :)

WikiTree profile: Space:100_Circles
in The Tree House by Shawn Ligocki G2G6 Mach 2 (29.1k points)

4 Answers

+16 votes
 
Best answer

I did something like your Person Network for my own closest circles, manually, some time ago. It's on a Space page here. This image includes two circles; I also managed to squeeze the third circle into 2D space. Continuing further made no sense to me.

Your Bipartite Network seems to be how the database of the desktop software I occasionally use is organized. It's a bit fond of creating duplicate family nodes that have to be merged :-)

I like your Family Network, with nodes only for each family. With the introduction going from the first, through the second to the third, I don't think it's hard to understand, basically.

It does need a few more complicated examples; I'm thinking of people with multiple marriages but there may be other complications.

by Eva Ekeblad G2G6 Pilot (570k points)
selected by Shawn Ligocki

Indeed, Eva, my first thought was how this bipartite graph model applies to multiple marriages and their children. What is the nuclear family in the case of my great-grand-uncle Pierre Marie Vatant, with his four wives, three of them being widows? Does it include only his own 6 children, or the 25 children of his wives all together? Or do we have a nuclear family by mother?

Not that I dislike the bipartite graph idea, which brings me back 20 years, when I was working with a couple of graph wizards on a formal model for Topic Maps. In a Topic Maps representation, persons would be "topics" and families would be "associations". In the book XML Topic Maps, published in 2002, one chapter was dedicated to such a genealogical Topic Map.

From MacFamilyTree

I didn't think of a multiple-families problem until in the next step, where the families have been "packed into" nodes.

I made a screen dump of a serial-monogamy family from MacFamilyTree, where I think the database is organized in bipartite fashion. It doesn't visualize all that well in a snapshot, but with the tree in dynamic mode and the other representations available it works well - apart from the creation of extra family nodes if you don't look out, which kind of pushes their existence in your face.

Bernard, the nuclear family concept looks like a genetic connection, so there would be one per mother, and Pierre would just happen to be common to all of them. Not sure how Shawn is treating adoption, but it could be assumed to be equivalent.

Eva, I have always enjoyed the visual aspects you've brought to the Circles project.  You really have an artistic touch.

(Readers who are interested in seeing more of Eva's work can find a list of her free-space pages in the "See also" section of the 100 Circles main page.)

I have indeed left out some of the complex corner cases, most notably multiple marriages. The way I handle multiple marriages is that each union (pair of parents/spouses) forms a distinct Family node. In the Bipartite Network, those Family nodes are all connected to the person who married multiple times. In the Family Network, they are all connected to the childhood family node of the person who was married multiple times.
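
As a rough sketch of that rule (this is not the code behind the blog post; the record layout, the names, and the helper functions are all made up for illustration, and childless marriages are left out for brevity):

```python
# Hypothetical records: child -> (parent, parent). Not real WikiTree data.
parents = {
    "Jane": ("John", "Mary"),
    "Paul": ("John", "Mary"),
    "Debbie": ("Fred", "Mary"),
    "Mary": ("Tom", "Ann"),
}

def family_id(a, b):
    """A Family node is identified by its unordered pair of parents."""
    return "+".join(sorted((a, b)))

def bipartite_edges(parents):
    """Person-Family edges: each child and each parent touches the Family node."""
    edges = set()
    for child, (p1, p2) in parents.items():
        fam = family_id(p1, p2)
        edges.update({(child, fam), (p1, fam), (p2, fam)})
    return edges

def family_edges(parents):
    """Family-Family edges: a person's childhood node to each of their union nodes."""
    edges = set()
    for p1, p2 in set(parents.values()):
        fam = family_id(p1, p2)
        for parent in (p1, p2):
            if parent in parents:  # childhood family is known
                edges.add((family_id(*parents[parent]), fam))
    return edges
```

Here Mary has two unions, so in the Bipartite Network she touches both "John+Mary" and "Fred+Mary", while in the Family Network both of those nodes hang off her childhood node "Ann+Tom" and are not linked to each other.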

As for adoption and more complex social family connections, I am so far keeping this simple and using whatever data comes out of WikiTree. My idea here is for this to be mostly a genetic network where the child is connected to the Family node for their genetic parents. I would love to see more diversity of social connections being included in WikiTree. Right now we record marriages (which are non-genetic) in a structured way; perhaps in the future we could extend this to other non-genetic personal connections (long-term partners, adoptive parents, god-parents, close friendships, etc.).

Thanks Eva, I have seen these images you produced for your circles and I think it is partially what motivated me to make these images of my own!

This image is a perfect encapsulation of why I was motivated to look at other Network representations :) It is strangely beautiful, but also difficult to interpret and foreboding to consider expanding further.

Shawn, this I'm not completely happy with :

Now, how do we draw edges on this network? For every person, we connect their childhood Family node to their adult Family node (if they ever marry/have children).

I agree each person appears as a child in at most one Family node (the node of her parents), but she can appear in several nodes as a parent if she has multiple unions. So, it should be "adult Family node(s)".

If people are edges, the simple connection rule should be : there is an edge linking two Family nodes if they have a common member. This would settle nicely the case of multiple unions and half-siblings.

Example :

Node 1 : parents John and Mary, 2 children Jane and Paul.

Node 2 : parents Fred and Mary, 1 child Debbie.

Node 3 : Fred and Jenny, 2 children Pat and Julie.

There is an edge between N1 and N2 (Mary), and an edge between N2 and N3 (Fred). N1 and N3 are at distance 2. That way half-siblings belong to nodes at distance 1, full siblings belong to the same node.
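
The example can be checked with a few lines of Python. This is only an illustrative sketch of the "common member" rule proposed here; the variable and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# The three example nodes: members are the parents plus their children.
nodes = {
    "N1": {"John", "Mary", "Jane", "Paul"},
    "N2": {"Fred", "Mary", "Debbie"},
    "N3": {"Fred", "Jenny", "Pat", "Julie"},
}

# Edge rule: two Family nodes are linked iff they share at least one member.
adjacency = {name: set() for name in nodes}
for a, b in combinations(nodes, 2):
    if nodes[a] & nodes[b]:
        adjacency[a].add(b)
        adjacency[b].add(a)

def distance(start, goal):
    """Shortest path length by breadth-first search, or None if unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None

print(distance("N1", "N2"))  # 1 (Mary is the shared member)
print(distance("N2", "N3"))  # 1 (Fred is the shared member)
print(distance("N1", "N3"))  # 2
```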

I am writing a post now to clarify this detail concretely. But as a spoiler, in my current model I am not connecting together directly two Family nodes from a person's remarriage. This is a subjective judgement I have made and I can understand why you would prefer the opposite choice. The reason I made this choice is that I want to avoid cliques as much as possible from popping up in the Family Network. For example, if you connect all unions a person has to each other, then Brigham Young causes a 56 node clique to appear (with 1540 edges) and this really distorts the network. Unfortunately the downside of this choice is that the Family Network is no longer a Bipartite Network Projection of the Bipartite Network and thus the distances are not precisely half of the Bipartite Network. Sigh, c'est la vie.
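
The clique arithmetic is easy to verify: a clique on n nodes has n(n-1)/2 edges. The sketch below is illustrative only; `childhood_node_edges` is a hypothetical helper that assumes every union node is instead attached once to the person's childhood Family node.

```python
from math import comb

def clique_edges(n):
    """Edges needed to connect n union nodes pairwise: n choose 2."""
    return comb(n, 2)

def childhood_node_edges(n):
    """Edges when each of the n union nodes links only to the childhood node."""
    return n

print(clique_edges(56))          # 1540
print(childhood_node_edges(56))  # 56
```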

I see the point. But simplifying the graph for sake of simplification should not create artificial distances between people who have been raised like siblings and consider themselves so, never using this ugly "half" qualifier. I've been a stepson myself, have children and stepchildren, and this distance 2 is really too far away.

You can say I'm very biased by my personal story. I am. But I'm not the only one. :-)

Bernard, yes I understand what you mean! I was raised by a step-mother for most of my life and she is also distance 1 for me :) But then, let me tell you another story: My grandfather had two children with his first wife, they divorced and he married my grandmother and they had one child, my dad. So my dad has two half-siblings, but these two families were raised 100% separately and my dad rarely met, let alone got to develop a sibling relationship to his half-siblings. In this case it feels like distance 2 (or more). And how about the case of adoption? Or those raised by their grandparents?

It seems impossible to me to deduce whom the person "felt" to be their "family unit" simply by looking at the WikiTree data dump. Perhaps at some time in the future we will have a richer API that allows specifying something like "intimate family connections" or "households", etc.

Also, I think there needs to be a separation made between different uses of a network model. One use is to specify the precise details of these intimate connections. And so it makes sense that WikiTree shows your half-siblings as distance 1 (although I think it does not show step-parents/children as distance 1 currently). Yet, my goal is to try and find some high-level topology of the Network as a whole which avoids the details of individuals. And for this purpose, I have found that avoiding cliques seems quite useful. And so this model is not quite as good at representing all the local details of the Person Network, but it does serve another purpose.

Bernard and Shawn, it doesn't seem to me that Shawn was simplifying for the sake of simplification, and even if he was, isn't it inevitable that all the little details will be lost in the big picture?

The database or the connection finder don't care about social or psychological distance. They just go by the simple rules set for them.

+11 votes
While I like seeing all the edges, and in general I dislike treating genealogy as a collection of "families" (mostly because records don't attach to families, they attach to individuals), in this particular case I like the simplification of the graph into familial nodes.

I do not agree with this last statement though, "family units as objects (nodes) and People as connections (edges)". The connection isn't actually people at all; it is a true representation of a "relationship" that is independent of the specific people, and yet expressed and defined by people if you look closely enough at the detail. I also don't think you are excluding childless or unmarried people at all; they still could be identified as part of their family with their parents and siblings, there just isn't an additional node for their own spouse and children.
by Jonathan Crawford G2G6 Pilot (278k points)

Jon, without presuming of Shawn's answers, my understanding of "Families as nodes and People as edges" is a purely mathematical one : there is an edge in the graph between two Family nodes iff they contain the same Person. Family relationships (parent, child, sibling, spouse) appear only when you "open the node", so to speak.

Saying "people are edges" is just a provocative shortcut, and I agree it can be confusing for people not familiar with graph theory, and the notion of dual graphs.

Something I'm not very happy with is the vocabulary, though. "Family" is too overloaded a word, and many people will argue that the definition of "Family node" conflicts with whatever, more or less fuzzy, meaning they give to this term. So, I would suggest that Shawn give the nodes in this model a more neutral name, to avoid such misunderstandings. Or be crystal clear that a "Family Node" in the model is maybe not what one usually calls a family. An occasion to (re)read 白馬非馬. :-)

"Gene puddles"?

Jonathan: Glad you like it. I agree that the edges in the Family Network can represent all the complexity of a relationship between two family units and not just a single person :) My statement was mostly a quick jot down of my ideas here and, as Bernard mentions, the fact that there is a unique Person to apply to each edge encourages me to think about that person as being the edge.

Bernard: I both like and dislike the name "Family node"; if you have any suggestions for an alternative name, I'm open. I like it because it's simple; I would rather not have to say "Nuclear Family Unit node" or some other large phrase every time I talk about it. Yet, I agree with you that Family often describes much larger groups than this simple parents+children unit. In my code I call them Union nodes because they are defined by a union of two people (either officially through marriage or informally through co-parentage). I have also called them Nuclear nodes for specificity (although I don't love this because of the linguistic connection to atoms and bombs).

Shawn, maybe I'm just too simple-minded, but I like the term "Family node" just fine. When I hear the word "family" I generally think of a nuclear family, and in my non-mathematician's mind, the word "node" reinforces that idea. I suppose it partly depends on who you see as your target audience.

I found your second blog post really, really interesting. I do agree with one point Jonathan made above--it seems confusing when you say in the Family Network section that people who never marry or have children are completely excluded. In your illustration, persons B and N are included with their parents (as it seems to me they should be).

Julie: Glad to hear you like "Family node". Perhaps this could be a cultural thing? Perhaps we Americans are more likely to interpret Family in our individualistic Nuclear Family way (a la Family Group Sheet) whereas perhaps French folks are more likely to have a communal mindset or have multi-generation households and thus interpret family more broadly? Just musings, perhaps Bernard will correct my stereotypes :)

I've decided to remove the comment about unmarried people being excluded from my post. You're right, they are included as part of the family they were born into. They are, perhaps, less emphasized than married folks in the Family Network ... but they are also less emphasized than married folks in the Person Network, etc. So perhaps this is not very notable.

Your stereotype of the French multi-generation family is not completely wrong. At least it used to be that way in many places, in particular when successive generations lived in the same place. To be "de" somewhere/something is not a mark of nobility, as is often believed, but the mark of some ancestral attachment to a place, be it a modest farmhouse or a large castle, and is still used a lot where I live in the mountains. You will hear people speak of les Fournier de Ceillac, les Bonnafoux de Risoul, who are just commoners, but locals know pretty well the limits of such familles: who's been in for generations, who are the friends and the hereditary foes, the succession issues, the fratricidal wars, terrible secrets etc.

Here is a story from twenty years ago, in the village where I was living. A young woman marries the son of the mayor. On the day of her marriage, her new mother-in-law warns her : "Now you belong to our family. From now on, those are the ones you won't greet anymore : ..."

And this harsh notion of famille is still in the background for many people, although it's been seriously challenged by social mobility, distances between siblings, and the growing frequency of single-parent families following divorces or preceding marriage etc. More American, in short :-)

+13 votes
Shawn, thanks for this, see also my comments to Eva's and Jon's answers, but I have a precise question I'm sure you have considered.

In your data analysis, you have looked at the distribution of distances between "Family nodes", defined as the length of the shortest path. How does this distance compare with the usual distance between people as per the Connection Finder? Is it correct to assume that the distance between Family nodes is always lower than or equal to the CF distance? In your toy example there are people at CF distance 3 while the distance between their Family nodes is only 2. Have you looked at how this is scaling?
by Bernard Vatant G2G6 Pilot (171k points)

I have wondered about this as well hoping that there would be some consistency between the two. Unfortunately there are many corner cases.

In the normal case, the distance in the Bipartite Network is 2x as long as in the Person Network (because you must step through a new Family node between each Person-Person connection) and I think the distance in the Family Network is one less than in the Person Network (assuming you pick the start and end Family nodes optimally ... since a person now belongs to multiple nodes, there is choice involved there).

The complications begin to arise when you have multiple-marriages and most notably with half-siblings. In the connection finder (and thus in my Person Network) half-siblings are connected directly. However in the Bipartite Network, you must take 4 steps to get from one half-sibling to another (through the shared parent node). And in the Family Network you will need to take 2 steps to get to a half-sibling node (up to the parent's childhood node and then back down to the other Family node).
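
These counts can be reproduced on a toy family. The sketch below is illustrative only and does not use WikiTree data: Dad has a childhood Family node GF and two unions, U1 (with Mum1, children A and C) and U2 (with Mum2, child B). All names and edge lists are hypothetical, and the Person Network is assumed to link parents, children, spouses, and (half-)siblings directly, as the Connection Finder does.

```python
from collections import deque

def distance(edges, start, goal):
    """Shortest path length in an undirected graph, or None if unreachable."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj.get(node, set()) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None

person = [("Dad", "Mum1"), ("Dad", "Mum2"),
          ("Dad", "A"), ("Mum1", "A"), ("Dad", "C"), ("Mum1", "C"),
          ("Dad", "B"), ("Mum2", "B"),
          ("A", "C"), ("A", "B"), ("C", "B")]   # sibling and half-sibling links
bipartite = [("Dad", "GF"),
             ("Dad", "U1"), ("Mum1", "U1"), ("A", "U1"), ("C", "U1"),
             ("Dad", "U2"), ("Mum2", "U2"), ("B", "U2")]
family = [("GF", "U1"), ("GF", "U2")]           # childhood node -> each union

# Full siblings A and C (the normal case): 1, 2, and the same Family node.
print(distance(person, "A", "C"), distance(bipartite, "A", "C"))  # 1 2
# Half-siblings A and B: 1 step, 4 steps, and 2 steps between U1 and U2.
print(distance(person, "A", "B"), distance(bipartite, "A", "B"),
      distance(family, "U1", "U2"))                               # 1 4 2
```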

See my above proposal to bring Family nodes of half-siblings to distance 1, the common parent being the edge.

+10 votes
All of the network definitions (and Connection Finder) treat a sibling relationship as a link or connection. That is inconsistent with standard genealogy practice, and I think misleading. The relationship between siblings is derived solely from their parents; it is not an independent connection. Connecting two siblings is no more valid than connecting first cousins (where the connection is derived from the grandparents) or second cousins (where the connection is derived from common great grandparents). Connecting siblings also loses information about half-siblings.
by Chase Ashley G2G6 Pilot (312k points)

Well, that's definitely a reasonable opinion. You could easily define a Person Network variant that did not include sibling connections and I think there could be a lot of value to that model. However, I disagree with you on two different counts here: (1) The Bipartite and Family Networks do not treat sibling relationships as connections and (2) I don't agree that treating siblings as connections is "inconsistent with standard genealogy practice" or "misleading". It is just a different choice that has pros and cons depending on the use.

I think this all comes down to what your goal is. If your goal is to figure out your interpersonal connection to someone (a la Connection Finder), then I think considering siblings to be one step away makes a lot of sense: my sister-in-law is my brother's wife, not my dad's son's wife. I grew up living with my brother; he is as close as my parents are to me. However, for the purpose of constructing a network, I do prefer the Bipartite or Family Networks because they do not add all of these redundant sibling connections. When you add a sibling to the Bipartite Network, you add one Person node and one edge connecting that person to the Family, which feels to me like what is actually being done conceptually: adding a person profile and connecting them to an existing family.
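
To put rough numbers on the redundancy, here is a small sketch counting edges inside a single family of two parents and some children. It assumes the Person Network links spouses, each parent-child pair, and every pair of siblings; the function names are hypothetical.

```python
def person_network_edges(children):
    """One spouse edge, two parent edges per child, one edge per sibling pair."""
    return 1 + 2 * children + children * (children - 1) // 2

def bipartite_network_edges(children):
    """Every member (two parents plus the children) links once to the Family node."""
    return 2 + children

# Adding a fourth child to a family that already has three:
print(person_network_edges(4) - person_network_edges(3))        # 5 new edges
print(bipartite_network_edges(4) - bipartite_network_edges(3))  # 1 new edge
```

In the Person Network the cost of each new sibling grows with the number of existing siblings, while in the Bipartite Network it is always one edge.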
