Mathematical graph structure of global tree

+17 votes
382 views
Is there anyone doing any sort of Mathematical graph theory analysis of the global tree? Is it possible to get access to the entire tree (or public subset) in order to analyze it's structure? I'm interested in knowing factoids like:
* What is the average distance between two random people in the tree?
* What is the average distance between me and another random person in the tree?
* Who is in the center of the tree (smallest average distance to other random people).
* For the average two people in the tree, how many people/relationships would need to be deleted to disconnect the two?
* How important is each individual connection in connecting me to the global tree.
* Etc
in The Tree House by Shawn Ligocki G2G6 Mach 1 (16.6k points)
retagged by Ellen Smith
Very interesting questions -- and related to the reasons why the Connection Finder tool exists.

Currently most of the people who might be interested in discussing this topic are preoccupied with preparations for the GDPR, but later on you might get some interesting engagement with folks who share this interest.

If  you do start digging in to this, Shawn, give me a shout. I'm very curious and I think that we share a similar perspective. Outside of WikiTree, I'm a scientist, though not a mathematician, but one with a little experience with Network analysis packages and a bunch with data analysis. 

Over the last 3 years, I've explored these questions in a number of ways and I've started writing up some of my discoveries in my blog: https://www.sligocki.com/2021/06/23/wikitree-and-network-theory.html

5 Answers

+10 votes
 
Best answer

There is information about accessing the Wikitree API here:  https://www.wikitree.com/wiki/Help:API_Documentation

by Kerry Larson G2G6 Pilot (159k points)
selected by Eric Weddington
Thanks! I just requested access to the data dump: https://www.wikitree.com/wiki/Help:Database_Dumps

And a full database dump here:  https://www.wikitree.com/wiki/Help:Database_Dumps

+5 votes
Seems like a lot of processing for little return. What do you see the benefit of this information would be to the average Wikitree member?
by Lynda Crackett G2G6 Pilot (630k points)
Using a randomized algorithm to estimate it should be reasonably cheep. Also compute per is very cheep these days. The benefit would be twofold:
1) Curiosity - In the same way that people want to know how closely they are related to the Queen of England, I think people would be curious how well they were connected to the average person.
2) To give some measure of progress a person has made in connecting themselves (or others) more fully into the global tree. I'm new here, but I feel the community would like the global tree as fully connected as possible. It would be nice to have some way to measure that.

Finally, I just personally love data and math and so would be interested in the mathematical structure here. I was curious if anyone else shared my interest.

Hi Shawn, 

I really like your #2 in the above comment. There are periods of time when I "work" and "work" on getting myself "more" connected to the Tree, but I cannot really tell if I've made tangible progress. Some kind of way of measuring this connectedness beyond a simple yes/no "we're connected" or a count of the "relationship steps" between two individuals might be instructive.

Have you continued with your quest expressed in your opening post, and if so, those of us interested may enjoy an update. smiley

Hey Susan, I have been working with data dumps to try and evaluate some measurement of "connectedness". I have two metrics so far:

  1. Average connection distance all other profiles. I iterate over all profiles on WikiTree and find the average distance from you. This gives a very general sense for how closely connected you are to the center of The Global Tree.
  2. Percent of "closest connection paths" which go through individual profiles. Here I evaluate how often your connection to random other profiles goes through each of your direct relatives (say 90% go through your Mother), etc. for each distance away from you. This was inspired by looking into Antoine de Saint-Exupéry's connection. He is vary narrowly connected. I have created graphs to represent these results.
Unfortunately, I have not been able to turn this into a web app that everyone can use. It takes a lot of computation and so I can only run it locally. Here are some data points:
Person Average distance Connection Graph
Queen Victoria 22.29 Graph
Alexander Pushkin 26.36 Graph
Antoine de Saint-Exupéry 43.21 Graph
Albert Einstein 41.39 Graph
Brigham Young 18.38 Graph
me (Shawn Ligocki) 26.90 Graph
Samuel Lothrop  17.69 Graph

Samuel Lothrop has the miniumum average distance of anyone I found.

Note: The Graphs show the starting person at the bottom (using their WikiTree ID), then it lists edges above that to all direct connections for whom at least 5% of all "shortest connections" go through that person. There is a number listed with each edge to show how many % go of "shortest connections" go through that person. Graphs with lots of branches are better connected. Telephone poles are not as well connected.

Sadly, I don't have anything yet you can use to measure your connectedness, but hopefully these techniques or something like them might be useful in the future.
Interesting! Thanks for investigating and sharing. It's interesting for Antoine that after a long spell of little connectedness, he's more connected further back in time ... if I'm looking at the graph correctly. It seems that we collectively have yet to pursue descendant lines for any of his siblings, etc.

You might want to do a shout out to [[JN Murphy | Murphy-4835]] if you haven't already as from the other comments here it sounds like he also was interested in these kinds of metrics.
FYI: Those graphs are a little confusing: Up is NOT necessarily back in time. Up is just more "connections away from the starting person. So for Antoine this is just saying that almost all of his connections to the main tree are through one narrow branch of his tree.

Ah, okay. I see. Thanks for clarifying. smiley

+9 votes

This might be slightly tangential, but I recalled that some researchers had analyzed WikiTree's network a few years ago to examine changes in lifespans. The researchers have a program focused on "computational genealogy" at the University of Washington

There are two preprints of their papers on ArXiv:

A similar project was undertaken with, data from (IIRC) Geni, providing a larger scale version of their data experiment. It was just published in Science:
 
It seems a reasonably hot topic for network research, since these are a form of social network, but which also give new insights into epidemiology, social structure, and genetics. 
 
I'd be interested to see if it were possible to look for data gaps that have been missed on WikiTree, but might be out there in census data. 
by anonymous G2G6 Pilot (128k points)
+3 votes
I didn't see this before.

I do connection validation on complete public tree. I start my tree calculation from Chris Whitten and do complete public tree in currently 123 cycles. Each cycle I add all parents, children and spouses. Tree size is 10526195 profiles. most profiles are added from cycle 45-55. It takes 01:45:47 to calculate it.

 

[00:01:13][01:13] Cycle 0 Rec:1 Tot: 1
[00:01:28][00:15] Cycle 1 Rec:3 Tot: 4
[00:01:29][00:00] Cycle 2 Rec:9 Tot: 13
[00:01:30][00:00] Cycle 3 Rec:19 Tot: 32
[00:01:31][00:01] Cycle 4 Rec:22 Tot: 54
[00:01:32][00:00] Cycle 5 Rec:19 Tot: 73
[00:01:32][00:00] Cycle 6 Rec:19 Tot: 92
[00:01:34][00:01] Cycle 7 Rec:28 Tot: 120
[00:01:34][00:00] Cycle 8 Rec:42 Tot: 162
[00:01:36][00:01] Cycle 9 Rec:42 Tot: 204
[00:01:37][00:01] Cycle 10 Rec:56 Tot: 260
[00:01:38][00:00] Cycle 11 Rec:62 Tot: 322
[00:01:41][00:02] Cycle 12 Rec:59 Tot: 381
[00:01:42][00:00] Cycle 13 Rec:61 Tot: 442
[00:01:43][00:01] Cycle 14 Rec:54 Tot: 496
[00:01:44][00:01] Cycle 15 Rec:62 Tot: 558
[00:01:46][00:01] Cycle 16 Rec:65 Tot: 623
[00:01:48][00:02] Cycle 17 Rec:72 Tot: 695
[00:01:51][00:02] Cycle 18 Rec:119 Tot: 814
[00:01:56][00:05] Cycle 19 Rec:158 Tot: 972
[00:02:02][00:05] Cycle 20 Rec:175 Tot: 1147
[00:02:08][00:06] Cycle 21 Rec:222 Tot: 1369
[00:02:17][00:08] Cycle 22 Rec:260 Tot: 1629
[00:02:41][00:24] Cycle 23 Rec:420 Tot: 2049
[00:03:01][00:20] Cycle 24 Rec:460 Tot: 2509
[00:03:25][00:23] Cycle 25 Rec:676 Tot: 3185
[00:04:05][00:40] Cycle 26 Rec:942 Tot: 4127
[00:04:43][00:37] Cycle 27 Rec:1392 Tot: 5519
[00:05:37][00:54] Cycle 28 Rec:2160 Tot: 7679
[00:06:45][01:07] Cycle 29 Rec:2926 Tot: 10605
[00:08:07][01:21] Cycle 30 Rec:3996 Tot: 14601
[00:09:24][01:17] Cycle 31 Rec:5253 Tot: 19854
[00:10:37][01:12] Cycle 32 Rec:6274 Tot: 26128
[00:12:09][01:32] Cycle 33 Rec:7510 Tot: 33638
[00:13:25][01:15] Cycle 34 Rec:9105 Tot: 42743
[00:14:54][01:29] Cycle 35 Rec:11467 Tot: 54210
[00:16:33][01:39] Cycle 36 Rec:14414 Tot: 68624
[00:18:06][01:32] Cycle 37 Rec:18199 Tot: 86823
[00:20:01][01:55] Cycle 38 Rec:23067 Tot: 109890
[00:22:06][02:05] Cycle 39 Rec:31836 Tot: 141726
[00:23:36][01:29] Cycle 40 Rec:52201 Tot: 193927
[00:25:15][01:39] Cycle 41 Rec:96875 Tot: 290802
[00:26:54][01:38] Cycle 42 Rec:181396 Tot: 472198
[00:28:42][01:47] Cycle 43 Rec:299476 Tot: 771674
[00:30:30][01:48] Cycle 44 Rec:433804 Tot: 1205478
[00:33:15][02:44] Cycle 45 Rec:572934 Tot: 1778412
[00:36:49][03:34] Cycle 46 Rec:700350 Tot: 2478762
[00:40:59][04:10] Cycle 47 Rec:791569 Tot: 3270331
[00:51:02][10:03] Cycle 48 Rec:833317 Tot: 4103648
[00:59:14][08:11] Cycle 49 Rec:821084 Tot: 4924732
[01:09:33][10:19] Cycle 50 Rec:756848 Tot: 5681580
[01:18:13][08:39] Cycle 51 Rec:677813 Tot: 6359393
[01:22:48][04:35] Cycle 52 Rec:587544 Tot: 6946937
[01:25:50][03:02] Cycle 53 Rec:499498 Tot: 7446435
[01:28:38][02:47] Cycle 54 Rec:423886 Tot: 7870321
[01:30:44][02:05] Cycle 55 Rec:362067 Tot: 8232388
[01:33:01][02:16] Cycle 56 Rec:316184 Tot: 8548572
[01:34:35][01:33] Cycle 57 Rec:276813 Tot: 8825385
[01:36:29][01:54] Cycle 58 Rec:238980 Tot: 9064365
[01:37:38][01:09] Cycle 59 Rec:207569 Tot: 9271934
[01:38:39][01:00] Cycle 60 Rec:180728 Tot: 9452662
[01:39:56][01:17] Cycle 61 Rec:154381 Tot: 9607043
[01:40:42][00:46] Cycle 62 Rec:130061 Tot: 9737104
[01:41:32][00:49] Cycle 63 Rec:112127 Tot: 9849231
[01:42:05][00:33] Cycle 64 Rec:96979 Tot: 9946210
[01:42:34][00:28] Cycle 65 Rec:82874 Tot: 10029084
[01:43:09][00:35] Cycle 66 Rec:70513 Tot: 10099597
[01:43:38][00:28] Cycle 67 Rec:61201 Tot: 10160798
[01:43:56][00:18] Cycle 68 Rec:52139 Tot: 10212937
[01:44:12][00:15] Cycle 69 Rec:44420 Tot: 10257357
[01:44:25][00:12] Cycle 70 Rec:37017 Tot: 10294374
[01:44:36][00:11] Cycle 71 Rec:31721 Tot: 10326095
[01:44:46][00:09] Cycle 72 Rec:27471 Tot: 10353566
[01:44:54][00:08] Cycle 73 Rec:24387 Tot: 10377953
[01:45:01][00:07] Cycle 74 Rec:21133 Tot: 10399086
[01:45:08][00:06] Cycle 75 Rec:18191 Tot: 10417277
[01:45:13][00:05] Cycle 76 Rec:15791 Tot: 10433068
[01:45:18][00:04] Cycle 77 Rec:13643 Tot: 10446711
[01:45:22][00:04] Cycle 78 Rec:11793 Tot: 10458504
[01:45:26][00:03] Cycle 79 Rec:10499 Tot: 10469003
[01:45:29][00:03] Cycle 80 Rec:8224 Tot: 10477227
[01:45:32][00:02] Cycle 81 Rec:6724 Tot: 10483951
[01:45:34][00:02] Cycle 82 Rec:5824 Tot: 10489775
[01:45:35][00:01] Cycle 83 Rec:5091 Tot: 10494866
[01:45:37][00:01] Cycle 84 Rec:4361 Tot: 10499227
[01:45:38][00:01] Cycle 85 Rec:3490 Tot: 10502717
[01:45:39][00:01] Cycle 86 Rec:2953 Tot: 10505670
[01:45:40][00:00] Cycle 87 Rec:2544 Tot: 10508214
[01:45:41][00:00] Cycle 88 Rec:2112 Tot: 10510326
[01:45:42][00:00] Cycle 89 Rec:1627 Tot: 10511953
[01:45:42][00:00] Cycle 90 Rec:1257 Tot: 10513210
[01:45:43][00:00] Cycle 91 Rec:1173 Tot: 10514383
[01:45:43][00:00] Cycle 92 Rec:960 Tot: 10515343
[01:45:43][00:00] Cycle 93 Rec:895 Tot: 10516238
[01:45:44][00:00] Cycle 94 Rec:945 Tot: 10517183
[01:45:44][00:00] Cycle 95 Rec:1143 Tot: 10518326
[01:45:44][00:00] Cycle 96 Rec:1144 Tot: 10519470
[01:45:45][00:00] Cycle 97 Rec:1099 Tot: 10520569
[01:45:45][00:00] Cycle 98 Rec:935 Tot: 10521504
[01:45:45][00:00] Cycle 99 Rec:695 Tot: 10522199
[01:45:45][00:00] Cycle 100 Rec:586 Tot: 10522785
[01:45:46][00:00] Cycle 101 Rec:438 Tot: 10523223
[01:45:46][00:00] Cycle 102 Rec:393 Tot: 10523616
[01:45:46][00:00] Cycle 103 Rec:346 Tot: 10523962
[01:45:46][00:00] Cycle 104 Rec:344 Tot: 10524306
[01:45:46][00:00] Cycle 105 Rec:363 Tot: 10524669
[01:45:46][00:00] Cycle 106 Rec:251 Tot: 10524920
[01:45:46][00:00] Cycle 107 Rec:232 Tot: 10525152
[01:45:46][00:00] Cycle 108 Rec:146 Tot: 10525298
[01:45:46][00:00] Cycle 109 Rec:134 Tot: 10525432
[01:45:46][00:00] Cycle 110 Rec:139 Tot: 10525571
[01:45:46][00:00] Cycle 111 Rec:106 Tot: 10525677
[01:45:46][00:00] Cycle 112 Rec:119 Tot: 10525796
[01:45:47][00:00] Cycle 113 Rec:64 Tot: 10525860
[01:45:47][00:00] Cycle 114 Rec:38 Tot: 10525898
[01:45:47][00:00] Cycle 115 Rec:29 Tot: 10525927
[01:45:47][00:00] Cycle 116 Rec:38 Tot: 10525965
[01:45:47][00:00] Cycle 117 Rec:51 Tot: 10526016
[01:45:47][00:00] Cycle 118 Rec:38 Tot: 10526054
[01:45:47][00:00] Cycle 119 Rec:52 Tot: 10526106
[01:45:47][00:00] Cycle 120 Rec:66 Tot: 10526172
[01:45:47][00:00] Cycle 121 Rec:15 Tot: 10526187
[01:45:47][00:00] Cycle 122 Rec:6 Tot: 10526193
[01:45:47][00:00] Cycle 123 Rec:2 Tot: 10526195
by Aleš Trtnik G2G6 Pilot (559k points)
If you need I have a table of all relations together in both directions, if you need it. But you can import that from the dump.

There is 9385364 Father relations, 8800907 Mother relations and 4160435 marriage relations.

You might also find WikiTree+ usefull for some analyses. Here are all Chris's ancestors. http://wikitree.sdms.si/default.htm?report=rep1&WikiTreeID=Whitten-1

You can also check Descendant numbers http://wikitree.sdms.si/default.htm?report=rep2&WikiTreeID=Dolgan-14

You have also some historical charts there http://wikitree.sdms.si/default.htm?report=stat1

If you need something more, Just ask.
+5 votes
Shawn, I'm stumbling upon this old conversation marked as related to our new "100 Circles" work. You might be interested and bring useful insights (and/or tools)

https://www.wikitree.com/wiki/Space:100_Circles
by Bernard Vatant G2G6 Mach 5 (55.6k points)

Oh, that's some neat work. And I'm not surprised to see a Puritan Great Migration profile, Samuel Lothrop, as the best connected profile found.

Related questions

+25 votes
8 answers
+12 votes
5 answers
+12 votes
3 answers
+13 votes
4 answers
+27 votes
2 answers
360 views asked Jun 2, 2018 in WikiTree Tech by Paul Gierszewski G2G6 Mach 5 (53.1k points)
+30 votes
8 answers

WikiTree  ~  About  ~  Help Help  ~  Search Person Search  ~  Surname:

disclaimer - terms - copyright

...