Surname/tag: CC7
Introduction
When it was decided, in June 2022, that CC7 would be available for every WikiTreer profile, the "magic" of the number 7 was among the arguments for choosing this value rather than CC6 or CC8. See the original discussion.
The magic has seemed to work. Only a few months were needed for some members' CC7 to skyrocket to unexpected heights. To date (Feb 9, 2023), 25 "Top Connected" have a CC7 over 10,000, and the champion Patty LaPlante is hitting the 30,000 wall. Amazing figures? Barely one thousandth of the current size of the Single Tree (28M+ connected profiles), but ...
Variation on the Birthday Paradox
Probability computing often leads to counter-intuitive results, like the so-called Birthday Paradox. The following computation is a variation on this problem, with a possible application to the seven circles magic.
Take two random samples S1 and S2 of the same size n from a population of size P. What is the probability that those two samples overlap, in other words, have at least one common element?
It is not too difficult to prove that this probability E is not smaller than the following expression.
E > 1 - (1 - n/P)^n
Now, given P, what is the minimal value of n such that the two samples "almost certainly" have a common element? Let's define "almost certainly" as a value of E greater than 99.99%.
Setting P to 30 million, the order of magnitude of the current Single Tree, let's compute this lower bound of E for different values of n.
n | 1 - (1 - n/P)^n |
---|---|
500 | 0.008 |
1,000 | 0.03 |
2,000 | 0.12 |
5,000 | 0.56 |
10,000 | 0.96 |
15,000 | 0.9994 |
20,000 | 0.999998 |
30,000 | 1 - 10^-13 |
The size of the two random samples having almost certainly a common element may seem surprisingly small compared to the total population: less than one thousandth of the total. This is the same kind of counter-intuitive result as the "Birthday Paradox", and actually the two problems are quite similar.
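For readers who want to reproduce the figures, here is a minimal Python sketch of the lower bound. The function names `overlap_lower_bound` and `minimal_sample_size` are ours, chosen for illustration, and the search simply assumes that the bound grows with n. With P set to 30 million, the minimal n for 99.99% should land between 15,000 and 20,000, consistent with the table.

```python
import math

def overlap_lower_bound(n, P):
    """Lower bound 1 - (1 - n/P)^n on the probability that two samples of size n overlap."""
    # log1p/expm1 keep precision when n/P is tiny or when the result is very close to 1.
    return -math.expm1(n * math.log1p(-n / P))

def minimal_sample_size(P, threshold=0.9999):
    """Smallest n whose lower bound exceeds `threshold` (binary search, bound grows with n)."""
    lo, hi = 1, P - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if overlap_lower_bound(mid, P) > threshold:
            hi = mid          # mid is large enough, try smaller values
        else:
            lo = mid + 1      # mid is too small
    return lo

P = 30_000_000  # order of magnitude of the Single Tree, as in the text
for n in (500, 1_000, 2_000, 5_000, 10_000, 15_000, 20_000, 30_000):
    print(f"{n:>6,}  {overlap_lower_bound(n, P):.6f}")
print("minimal n for 99.99%:", minimal_sample_size(P))
```

Changing `P` or `threshold` shows how the "almost certain" sample size moves as the tree grows.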
The last section "Mathematical Annex" provides further details for the curious reader.
Testing the seven circles randomness
1,000 to 10,000 is the typical range of many current values of CC7. As of Feb 8, 2023, about 16,600 members have a CC7 over 1,000, 4,700 over 2,000, 422 over 5,000, and 25 over 10,000.
The figures above show that over this interval [1,000 - 10,000], the probability of overlapping samples changes drastically from 3% to 96%. Consider our two top connected profiles Patty LaPlante and Jim Loden. Supposing their respective seven circles were random samples, they should "almost certainly" overlap, with a probability greater than 99.999%.
The above randomness hypothesis is of course wrong. Your seven circles are not randomly filled. However, if you have really expanded your circles in all directions and look at who is in the 7th circle, you might be wondering how you eventually got there in only 7 steps, and if it's not a really random sample, it somehow looks like it. But does it behave like it, regarding the above overlap problem? Do seven circles really overlap when they reach the five-digits range?
Overlapping seven circles for Patty and Jim would mean that there is at least a profile at distance 7 or less from both of them. Which means, in other words, that distance from Patty to Jim should be at most 14 (from the triangle inequality).
Guess what? To date, the distance from Patty to Jim is exactly 14! Their 7th circles do overlap, at least in the middle profile of the shortest path, namely Sarah (Ralls) Briscoe (1827-1888). Are there other profiles belonging to both 7th circles? Not sure. If you strike Sarah from the path, the Connection Finder proposes a new path of length 16 ..
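The link between overlapping circles and a distance of at most 14 can be checked on a toy example. The sketch below is only an illustration: the chain of numbered "profiles" is invented, and `circles` and `chain` are hypothetical helper names, not anything provided by WikiTree. Two profiles at distance 14 share exactly the middle profile of the path, and at distance 15 their seven circles no longer meet.

```python
from collections import deque

def circles(graph, start, radius):
    """Return {profile: distance} for every profile within `radius` steps of `start` (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the outermost circle
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def chain(length):
    """Toy connection graph: profiles 0..length linked in a single line (invented data)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j <= length] for i in range(length + 1)}

for distance in (14, 15):
    graph = chain(distance)
    a = circles(graph, 0, 7)          # seven circles of the first profile
    b = circles(graph, distance, 7)   # seven circles of the last profile
    shared = sorted(set(a) & set(b))
    print(f"distance {distance}: shared profiles {shared}")
```

On a real tree there are many paths, so the overlap can contain more than one profile, but the rule is the same: a shared profile at distance 7 or less from both ends forces a path of length 14 or less.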
Too good to be true? This is after all only one example, and we might have been lucky. Looking further on the distances between various Top Connected, we discovered a bit more magic:
- The distance from Patty to all US profiles with a CC7 over 10,000 is 14 or less.
- The US Top Connected themselves are typically at distance around 15 or less from each other.
- Distances from US Top Connected to other countries' Top Connected are found in the following ranges:
  - 17-22 for Canadians
  - 21-25 for South Africans
  - 19-23 for Australians
- Inside each national non-US cluster, distances are 13 or less between top South Africans, 13 between Canadians, 12 between Australians.
Those results are based on a very small set of profiles so far, but the number of profiles with a CC7 around or over 10,000 is growing, and we'll be able soon to consolidate or invalidate those results. We're still lacking European profiles, but they should come.
The provisional conclusion is that inside every geographical cluster, the seven circles seem to overlap for two profiles when their CC7 reaches the five-digit range, behaving like random samples.
Reaching the five-digit range for CC7 should be possible for many, maybe most profiles, with work and time. We have already conjectured that Ten circles meet each other. But it might be that most of the time, Seven circles meet each other.
Isn't that magic enough? There is more in the following section, for maths lovers.
Mathematical Annex
The probability of overlapping samples
S1 and S2 are two samples of same size n in the population of size P. Let's say S1 has been fixed.
The probability for the first element of S2 to be also in S1 is n/P. If it falls outside S1, the second element is taken among the remaining P-1, with a probability n/(P-1) to fall in S1 ... and so on, up to the last element of S2, taken among the remaining P-n+1, with a probability of n/(P-n+1) to fall in S1.
What we want to compute is the probability E of "at least one element of S2 falling in S1". It's easier to compute the probability Ê = 1 - E of the complementary event, "S1 and S2 are disjoint", meaning all elements of S2 have fallen outside S1. Ê is the following product of n factors, each being the probability that the next element of S2 does not fall in S1, given that the previous ones did not.
Ê = (1 - n/P)(1 - n/(P-1)) ... (1 - n/(P-n+1))
Every factor after the first is smaller than the first one, so for n ≥ 2 the whole product satisfies Ê < (1 - n/P)^n. Hence the complementary probability E > 1 - (1 - n/P)^n.
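To see how tight this bound is, one can compare it with the exact product. The following sketch is an illustration under the same assumptions as the derivation (samples drawn without repeats, and 2n not exceeding P so that every factor stays positive). The names `exact_overlap` and `lower_bound` are ours.

```python
import math

def exact_overlap(n, P):
    """Exact probability that two random samples of size n (no repeats) share an element."""
    # Work with logarithms: log of the disjoint probability is the sum of
    # log(1 - n/(P - k)) for k = 0 .. n-1, one term per element of S2.
    log_disjoint = sum(math.log1p(-n / (P - k)) for k in range(n))
    return -math.expm1(log_disjoint)

def lower_bound(n, P):
    """The bound from the text: 1 - (1 - n/P)^n."""
    return -math.expm1(n * math.log1p(-n / P))

P = 30_000_000
for n in (1_000, 5_000, 10_000):
    exact, bound = exact_overlap(n, P), lower_bound(n, P)
    print(f"n={n:>6,}  exact={exact:.6f}  bound={bound:.6f}  gap={exact - bound:.2e}")
```

For populations of this size the gap is tiny, which is why the simpler expression can stand in for the exact probability in the table.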
The critical sample size is SQRT(P)
In the numerical example we fixed P at 30 million, and saw that the probability of overlapping samples gets over 0.5 around n=5,000. This result can be generalized to any value of P, using a well-known limit.
Let's take n and P in the above calculation such that n^2 = P, or in other words n = SQRT(P). In that case, n/P = 1/n, and the expression simplifies into 1 - (1 - 1/n)^n.
(1-1/n)^n is a well-known example of a converging sequence, its limit being 1/e, the inverse of Euler's number. This is a variant of the better-known result: lim (1+1/n)^n = e. See e.g., this video for a quick and cool demo.
Given the large values of n considered here, we can take this limit as a good approximation of Ê (more precisely, of its upper bound), with the corresponding value of E being slightly over 1-1/e, about 0.632. Such a value can be considered as a "moderately fair" probability.
This gives a quick and easy-to-remember rule: the probability of overlapping becomes fairly good when the size of the two samples passes the square root of the total population.
For the time being and years to come, this critical value will pass 6,000 when the Single Tree gets to 36 million profiles, 7,000 when we get to 49 million connected, and so on. The number of connected profiles with a CC7 over the critical value SQRT(P) is likely to grow faster than the critical value itself.
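The square-root rule is easy to check numerically. In this sketch, `bound_at_critical_size` is a name of our own, and the critical size is taken as the integer part of SQRT(P), which is an assumption made for convenience. For each population size the bound should come out at about 0.632, that is, close to 1 - 1/e.

```python
import math

def bound_at_critical_size(P):
    """Return (n, bound) where n is the integer part of SQRT(P) and bound = 1 - (1 - n/P)^n."""
    n = math.isqrt(P)
    bound = -math.expm1(n * math.log1p(-n / P))
    return n, bound

target = 1 - 1 / math.e  # the limit value discussed in the text
for P in (30_000_000, 36_000_000, 49_000_000):
    n, bound = bound_at_critical_size(P)
    print(f"P={P:>11,}  n={n:>5,}  bound={bound:.5f}  1-1/e={target:.5f}")
```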
Number lovers will of course appreciate seeing the magic number 7 meet here the no less magic number e...
About the image
The image illustrating this page is a reproduction of the front page of the first volume of "Seven Circles", the official Periodical of the International Magic Circle, which was published from April 1931 until June 1934. More details can be found here at MagicPedia. This publication had several editors and contributors quoted in this page, with linked biographies, like Walter B. Gibson, John Northern Hilliard, and more. I could not find any of those people in WikiTree. It would be a good idea to create their profiles and try to connect them.
- CC7 over 30,000 aka Seven Circles Magic Feb 9, 2023.
Now striking Sarah from the path there is a new, closer connection of 15! Strike mid again and the 16 is still there. – Patty
edited by Patty (Luker) LaPlante
m=2 gives E > 0.98
m=3 gives E > 0.9998
Thus with n more than 3×SQRT(P) overlap becomes almost certain.
edited by Andrew Millard