# What do you recommend for the mathematics of genetic genealogy?


I have been reviewing what people are doing with their DNA results and their GEDCOMs. I have yet to try out the Genetic Genealogy Kit and affiliated tools, but I have tried out GRAMPS, RootMagic, Ancestral Quest, Genome Mate Pro, GEDMatch, and a few of Felix Immanuel's genetic genealogy tools.

I've been going through my library of mathematics and genetics textbooks, including Schaum's Outline of Genetics (4th ed.), Snustad and Simmons' Principles of Genetics (4th ed.), and my references on vector analysis, linear algebra, and the Python programming language. I haven't been able to find what I'm looking for.

The basic idea is that a person can be represented in a genetic genealogy by their DNA plus annotations like name, date, location, and events. In simplified formal notation, their DNA can be written as a physical measurement of an experimental subject. In physics, forces and force interactions or waveform interference can be written in terms of vectors. Physical measurements of physical systems can generally be written as vectors, so your DNA should be representable as a vector.

I want to do it this way because I want to be able to decompose the vectors into subvectors representing the contributions of genetics from other family members. This way my DNA can be effectively factored recursively into maternal vs paternal, maternal grandfather vs maternal grandmother, paternal grandfather vs paternal grandmother, and so on. You could then compare the factored or phased DNA to matches shared with other family members and determine immediately where in your family tree they must be. Likewise, you could use the vector representation of other people's DNA in order to automatically generate genealogies and check for intersections.
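One hedged way to make the maternal versus paternal split concrete is to phase a child's genotype against a single tested parent. The sketch below is an illustration only: `phase_with_mother` and the toy genotypes are hypothetical, not taken from any tool named in this thread, and real phasing software also has to cope with genotyping errors and missing calls.

```python
def phase_with_mother(child, mother):
    """Split each child genotype into (maternal, paternal) alleles.

    child, mother: lists of unordered allele pairs, one pair per SNP.
    Returns None at a site where the split cannot be decided
    (both heterozygous with the same alleles) or is inconsistent.
    """
    phased = []
    for (c1, c2), mom in zip(child, mother):
        if c1 == c2:
            # Homozygous child: both copies carry the same allele.
            phased.append((c1, c2) if c1 in mom else None)
            continue
        # Heterozygous child: find which allele the mother could have given.
        from_mom = [allele for allele in (c1, c2) if allele in mom]
        if len(from_mom) == 1:
            maternal = from_mom[0]
            paternal = c2 if maternal == c1 else c1
            phased.append((maternal, paternal))
        else:
            phased.append(None)  # ambiguous or Mendelian inconsistency
    return phased


# Toy data (assumed for illustration only).
child = [("A", "G"), ("C", "C"), ("T", "C"), ("A", "G")]
mother = [("A", "A"), ("C", "T"), ("T", "C"), ("G", "G")]
result = phase_with_mother(child, mother)
print(result)
```

Applying the same idea one generation up (phasing a parent against a tested grandparent) is what would give the recursive factoring described above, provided those relatives have actually been tested.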

So what references do you all recommend for doing mathematical or quantitative genetic genealogies?

Update: For common reference.

COOP Lab at UC Davis:

Let's suppose you have lots of cousins and get them all tested.  One in 8 will give you a Y match, your father's brothers' sons.

Half will give you an X match.  They're on your mother's side.

Those cousins will also give you loads of autosomal matches.  For simplicity we'll suppose that all the matches are through your mother.

So, looking at segment Blah1 to Blah2 on your Q chromosome-pair, if you have a match with maternal cousin Fred, you know one chromosome of the pair came from your mother.  But you knew that anyway.

You also now know that any other match on the same chromosome comes through your mother.

But you don't know which other matches are on the same chromosome.  The testing people say chromosome when they mean chromosome-pair because they can't separate the pair.  This is why they have to do fuzzy matching and triangulation.

Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins.  Which matches happen to exist in your sample isn't information, it's just a sampling artefact.  But those matches won't yield any new information about who is on which side.

RJ, This is a good question.

You begin with a false premise, based on what you have been told on this site.

"Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins"

The opposite is most likely true, especially for IBD, which can be traced to a unique common ancestor.

The reason for this is that you may have a segment which matches a maternal cousin Fred, but on the paternal strand, part of the strand could be via the paternal grandfather, and part of the strand the paternal grandmother. Testing siblings, parents, aunts, uncles, and cousins will identify those that have such a condition.

In these cases, the probability of the same segment matching more than one side is near 0%.

I would also like to correct you on your statement...

"This is why they have to do fuzzy matching and triangulation."

1st, DNA services, as far as I know do not include triangulation in their matching algorithm, they provide reports that allow you to create your own TG's.

2nd, I know that Wikitree likes to characterize the matching as "Fuzzy Matching", but that is not how the term has been used, at least in the past, outside of Wikitree. The logic which determines the endpoints has been described as using fuzzy logic, which is why different DNA services may report different end points.  The matching algorithms use what may be better characterized as "Educated guess" or "Prediction".

A 7 cM segment may actually be a 5 cM or 6 cM segment.  This is why AncestryDNA encourages parents and children to test. Phasing the DNA data works to eliminate the fuzziness, makes the predictions more accurate, and extends the distance of the predictions.

A) "1st, DNA services, as far as I know do not include triangulation in their matching algorithm, they provide reports that allow you to create your own TG's."

?!?!? its easier to tell what you refer to.... think this is a never ending discussion.....

1) FTDNA just do segment matching based on size and total in common and don't display results lower than a threshold

2) Ancestry DNA have DNA circles that are secret but we can guess they use the family tree available,.... ==> they have triangulation somehow...?!?!?

3) 23andMe ?!?!?

4) ?!?!?

B) 7cm segment may actually be a 5cM or 6cM segment ?!?!?

Do you mean that something that looks like IBD is actually IBS? That sounds unlikely, or do we have numbers on that?

Because everyone has two of each chromosome, the matching used is not precise.  See http://www.bishir.org/misc/alternatingdna.jpg

a1) FTDNA just do segment matching based on size and total in common. Yes, and this is not triangulation.

a2) Ancestry DNA has circles that are secret but we can guess they use the family tree available,.... ==> they have triangulation somehow

Here is the Help on AncestryDNA Circles

"DNA Circles show you which members share DNA with one another in the genome, but not where in the genome they share that DNA. This is because our studies of genetic inheritance and DNA Circles have shown us that individuals in DNA Circles very rarely share the same matching segments"

I have been told that on Wikitree, the term triangulation means they are part of a Triangulated Group. The Help text above clearly tells us that AncestryDNA does not use triangulation.

a3). 23andme does not use triangulation to determine what is a match, or prediction.  You can run reports that will provide you the data for you to determine what is and what is not triangulated, but they do not report on what matches are triangulated.

a4) ??

a5) "B) 7cm segment may actually be a 5cM or 6cM segment ?!?!?"

FTDNA and 23andme may report a 7 cM segment because the endpoints are "Fuzzy", but when AncestryDNA takes that same raw data and phases it, the fuzziness is nearly eliminated, and the more accurate result is 5 cM.

These are both IBD because they are the same segment, but because AncestryDNA will phase data when available, it produces more accurate results.  This is why the AncestryDNA minimum is 5 cM and the others' is 7 cM.

"The reason for this is that you may have a segment which matches a maternal cousin Fred, but on the paternal strand, part of the strand could be via the paternal grandfather, and part of the strand the paternal grandmother. Testing siblings, parents, aunts, uncles, and cousins will identify those that have such a condition."

Comes to the same thing.  If you have a match with an unknown person, you'll need them to match a known relative to be able to find out which side of the tree they're on.  But then the answer is immediate and doesn't need any further analysis.

RJ, "Comes to the same thing." - Not on Wikitree.  Wikitree only accepts triangulation when there is a triangulation group which shares the same segment of 7 cM or greater.

1. Wikitree does not accept less than 7 cM. We on Wikitree can't say a person is related via one parent based on evidence that an IBS segment of less than 7 cM absolutely did not come from the other parent. IMO, this is logic 101.

Outside of wikitree, I doubt many people will agree " If you have a match with an unknown person, you'll need them to match a known relative".  Simple logic tells us given only 2 choices, and we eliminate one, the other must be true. If I can prove the match is not my mother, then it must be via my father.
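The elimination argument above can be sketched in a few lines. This is only an illustration under stated assumptions: `Segment`, `overlaps`, and `probable_side` are hypothetical names, the 7 cM default simply echoes the threshold discussed in this thread, and a missing match in the mother can also be a false negative, so the result is a probability statement rather than a proof.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str   # chromosome label
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # reported length in centimorgans


def overlaps(a, b):
    """True if two segments share any stretch of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def probable_side(son_seg, mother_segs, min_cm=7.0):
    """Guess the side of a match from one son/cousin segment.

    mother_segs: segments the mother shares with the same cousin.
    min_cm: assumed cutoff below which the segment is not trusted.
    """
    if son_seg.cm < min_cm:
        return "undetermined"
    if any(overlaps(son_seg, m) for m in mother_segs):
        return "probably maternal"
    return "probably paternal"


son = Segment("5", 10_000_000, 25_000_000, 10.0)
print(probable_side(son, []))
```

No third cousin appears anywhere in this check, which is the point being argued: only the son, the mother, and the match are compared.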

I would like to correct you on the following.

"If you have a match with an unknown person, you'll need them to match a known relative to be able to find out which side of the tree they're on.  Then the answer is immediate and doesn't need any further analysis."

Although I agree with this statement, it is not within the Wikitree guidelines or the comments made.  You cannot just match. If this were the case, then you would not have to look at segments.  Even though you might have a completely documented connection to a cousin and you match that cousin, you have to find a third cousin who shares a triangulated segment. Why? It adds nothing when deciding which side of the family a cousin is related on.

Magnus,

If I match an unknown cousin on a 10 cM segment, and my mother does not match on any part of that segment, the probability is that this segment came via my father. There are no Triangulated Groups involved. Just to be clear, you disagree because it seems you are still supporting the claim

1. "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."

If we agreed that this was a smaller IBS 6cM segment, and the mother did not share this 6cM segment, are you still supporting the claim…

1. Do you still disagree that matches on IBS segments are probably related to the son via his father?
2. Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins. Even though segments on one strand came from the same paternal grandparent, but the other strand was split between the maternal grandfather and maternal grandmother.
Like Ancestry care if your tree is all wrong?

The Smiths, Browns and Joneses each have a son and a daughter.

John Smith marries Mary Brown, John Brown marries Mary Jones, John Jones marries Mary Smith.  Each couple has a child called Zebedee.

Each Zebedee shares a lot of DNA with the other 2.  But the other two are on opposite sides of his tree, even though they share a lot of DNA with each other.

And of course there is no common ancestor, and no pedigree collapse.
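The three-family example can be checked mechanically. The following is a small sketch, not anyone's actual tool: the grandparent labels ("Mr Smith", "Mrs Smith", and so on) and the surnames attached to each Zebedee are assumptions added so the pedigree can be written down.

```python
# Child -> (father, mother). Grandparent names are hypothetical labels.
PARENTS = {
    "John Smith": ("Mr Smith", "Mrs Smith"),
    "Mary Smith": ("Mr Smith", "Mrs Smith"),
    "John Brown": ("Mr Brown", "Mrs Brown"),
    "Mary Brown": ("Mr Brown", "Mrs Brown"),
    "John Jones": ("Mr Jones", "Mrs Jones"),
    "Mary Jones": ("Mr Jones", "Mrs Jones"),
    "Zebedee Smith": ("John Smith", "Mary Brown"),
    "Zebedee Brown": ("John Brown", "Mary Jones"),
    "Zebedee Jones": ("John Jones", "Mary Smith"),
}


def ancestors(person):
    """Return the set of all known ancestors of a person."""
    found = set()
    stack = list(PARENTS.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, ()))
    return found


zs = ancestors("Zebedee Smith")
zb = ancestors("Zebedee Brown")
zj = ancestors("Zebedee Jones")

print(sorted(zs & zb))   # shared by the first pair
print(sorted(zs & zj))   # shared by the second pair
print(sorted(zb & zj))   # shared by the third pair
print(zs & zb & zj)      # shared by all three
```

Each pair of Zebedees shares one grandparent couple, on opposite sides of their respective trees, while the three-way intersection is empty.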

23andMe has recently added a triangulation groups system for open profiles. The system shows both In Common With indirect matching (Ancestry.com style DNA circles) and lists what ICW matches also form triangulation groups.

--------------------------------------

I figure that we should explicitly state something that is basically important: If you share a segment of DNA with someone then they are probably related to you; segments over 7 cM are more likely to be positive matches than to be false matches, and segments over 10 cM are almost certainly not false positives (ISOGGWiki).

You don't need triangulation groups to establish that you are related to someone by DNA comparison in some way. Sharing more than 15 cM can be considered to be almost certainly a direct genetic relationship.
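As a rough sketch, the rules of thumb quoted above can be written as a lookup. The function name `match_confidence` and the wording of the labels are hypothetical; the 7, 10, and 15 cM cutoffs are the ones stated in this post and should be read as heuristics, not hard limits.

```python
def match_confidence(shared_cm):
    """Map a shared segment length (in cM) to the rule of thumb above."""
    if shared_cm > 15:
        return "almost certainly a direct genetic relationship"
    if shared_cm > 10:
        return "almost certainly not a false positive"
    if shared_cm > 7:
        return "more likely a true match than a false one"
    return "uncertain"


for cm in (5.0, 8.5, 12.0, 20.0):
    print(cm, "->", match_confidence(cm))
```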

Triangulation groups are necessary for establishing probable most recent common ancestors. I need my two siblings' DNA and my own to indirectly establish my mother as our common ancestor, or I need one sibling's DNA, my mother's, and my own to directly establish by a triangulation group that my mother is our common ancestor. I need DNA from one sibling or my mother, from either my maternal aunt or one of my first cousins by my maternal aunt, and from myself to indirectly establish either of my grandparents as a common ancestor. This logic extends up through all genetic ancestors but not necessarily through every genealogical ancestor (see the UC Davis genetic genealogy blog in the OP for details).
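A minimal sketch of the segment arithmetic behind a triangulation group follows. It assumes each pairwise comparison has already been reduced to a list of `(chromosome, start, end)` tuples; the function names and the toy coordinates are hypothetical. With unphased data an overlap like this is only a candidate, because the three people could be matching on different copies of the chromosome.

```python
def intersect(a, b):
    """Overlap of two (chrom, start, end) segments, or None if there is none."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulated(ab, ac, bc):
    """Regions where A-B, A-C and B-C all report a shared segment."""
    regions = []
    for seg_ab in ab:
        for seg_ac in ac:
            common = intersect(seg_ab, seg_ac)
            if common is None:
                continue
            for seg_bc in bc:
                region = intersect(common, seg_bc)
                if region is not None:
                    regions.append(region)
    return regions


# Toy pairwise results (assumed for illustration only).
ab = [("7", 10_000_000, 30_000_000)]
ac = [("7", 15_000_000, 40_000_000), ("9", 1_000_000, 5_000_000)]
bc = [("7", 12_000_000, 25_000_000)]
print(triangulated(ab, ac, bc))
```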

Ian,

I have been working on this - mentioned it to Andreas West a week or so ago.

It would be a matter of probabilities: I match these cousins, who also match my Dad's phased data on this chromosome at this location, and based on how much we all match, there is a probable amount of DNA that I may have inherited from our MRCA. The output would be in the form of a fan chart which WikiTree would auto-populate with the ancestor names. This would be a huge help in adoption work.

Mags
by Mags Gaulden G2G6 Pilot (663k points)
I have a couple of genetic matches to people who were adopted, and I would really like to figure out what branches of my family they relate to. Which is why I've been trying so hard to figure out the mathematical model for this search and sort method.

I had thought about a fan chart; I think that is the best way to represent whole genome sequences or to keep track of exome results like what has been derived by labs like 23andMe and FamilyTreeDNA. The structure I would like to find is the one which shows what fragments of DNA I got from who.

Think of it like this. At me, the structure would ideally have 100% of my DNA exactly as it is. At my parents, they'd each have roughly 50% of my DNA representing the portions they passed on to me; this would be basically my phased DNA showing exactly what I got from my father and exactly what I got from my mother.
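One possible shape for that structure, sketched with toy numbers, is a table keyed by ahnentafel number (1 = me, 2 = father, 3 = mother, 4 to 7 = grandparents) that lists which stretches of my paternal and maternal chromosome copies came from each ancestor. Everything here is an assumption for illustration: the names `inherited`, `length`, and `fraction`, the single 100-unit chromosome, and the crossover positions.

```python
COPY_LENGTH = 100  # toy length of one chromosome copy, arbitrary units

# ahnentafel number -> list of (copy, start, end) pieces found in my genome
inherited = {
    2: [("pat", 0, 100)],   # father: my whole paternal copy
    3: [("mat", 0, 100)],   # mother: my whole maternal copy
    4: [("pat", 0, 60)],    # paternal grandfather
    5: [("pat", 60, 100)],  # paternal grandmother
    6: [("mat", 0, 45)],    # maternal grandfather
    7: [("mat", 45, 100)],  # maternal grandmother
}


def length(ancestor):
    """Total length of my genome attributed to this ancestor."""
    return sum(end - start for _copy, start, end in inherited[ancestor])


def fraction(ancestor):
    """Share of my two chromosome copies attributed to this ancestor."""
    return length(ancestor) / (2 * COPY_LENGTH)


for number in sorted(inherited):
    print(number, f"{fraction(number):.1%}")
```

In this toy layout each parent accounts for exactly half, and the pieces credited to two grandparents always add up to what their child passed on, even though the individual grandparent shares differ from 25%.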

Normally in figuring out what my mother and father are going to pass on to a child it is a matter of some randomness and probability, but in the case where we're examining me, my DNA, and my parents and their DNA there isn't strictly a probabilistic relationship to be concerned about; we should be able to use strict differences to deduce what actually happened as compared to what could have happened from the actual measurements.

In practice for figuring out where distant cousins go in the family tree, I do think it would be a probability or at least a degrees of truth problem written in statistical or fuzzy logic.

A bonus to making the kind of map that I am thinking about is that we'd eventually see what DNA survived from my ancestors to me and see what is missing from the puzzle. With enough people represented in this same kind of structure, we could start to see where the pieces fit together, so we could reconstruct the whole genome sequences of common ancestors that we don't necessarily know. To me that would be useful for determining where I fit in the global family graph, and I imagine it would be similarly useful to other people looking for how they fit in.
Ian, I am presuming that the Adoptees are on gedmatch.  If this is so, then I would suggest using the Tier 1 Lazarus support. See if you can create kits using known relatives and see if any of those kits provide any insight.

One way to experiment a little is to include in group 2 those kits that match both your kit and the adoptee's.  Remember, you are only looking to establish which parent, not which common ancestor.  You are narrowing down the possibilities, one generation at a time.
Yes Ian - and using the known phased data and the known matches to that phased data and knowing the segment locations and %'s using probability you could forecast/determine a reasonable theory of where YOUR random inheritance falls. Mags
Ken, the first problem I have to solve is which side of my family they are related to me from. For my maternal grandmother's side of the family, I know with relative certainty that one of the adoptees is not related to me by my great grandparents or lower; I know all my great aunts and uncles and all their children and all the great grandchildren. There's a distinct possibility that they are related through my maternal grandfather's side of the family, but I have put figuring that out specifically on hold until I can rule out the more difficult case: they are related to me through my father's side of the family.

I know very little about my father's side of the family relatively speaking. I barely have my relatives documented out to my paternal grandparents. One of the adoptees shares X chromosome DNA with me, so I can generally assume she is a relative on my mother's side, but the main adoptee that I want to help shares only autosomal DNA with me, so it is ambiguous as to where they are in my family.

In order to figure out their parents, I need to figure out which side of the family I need to look on. From there I need to figure out our most recent common ancestor. From the most recent common ancestor, I can then trace down the line to the adoptee and at least one of their parents; the adoptee has already found their parent of record at least under a pseudonym, and from what is known of their father, I am related to them through their mother.

I'll keep the Lazarus kits in mind for this, but the Lazarus kits depend on having solved more basic problems.
I like your model Ian - and of course it should work. Magnus and Ken are smart cookies and they've pointed out some issues. Nevertheless the data should tell the story. And that for me is the rub - the data aren't necessarily there yet. Despite the seeming precision of these tests we don't know the error rate or variation in results due to testing procedures, lower level data sorts, different tolerances, thresholds, or magnitudes for categorizing the lower-level data, etc. None of these data from these tests are ready for the precision of the 'exacto knife' of a model you have in mind presently. Even the underlying proteins themselves appear to behave in unpredictable ways, so while I am hopeful better models will be developed for predictive as well as historical reasons, I'm not sure we aren't stuck in the Sherlock Holmes era for a bit longer. I would think Mathematica might do some interesting things with the data, but you are likely to have to rely on statistics and categorical analysis for the state of the art.

I get the issues with the available data. But the precision isn't so much the issue anymore, and as time goes by, it is going to become less the issue. Error rates are entering into the 1% range and rapidly diminishing for individual genetic tests. Comparison between old kits and new kits or between standard kits and custom kits are the major problem at the moment.

With the cost per genome rapidly approaching 0, the issue of precision or lack of data is going to effectively go away entirely. For my personal case, I have most of my immediate family members totally on board for genetic sequencing and analysis, so I am not concerned about not having access to the minimum data necessary to pull apart my genome and figure out deductively and experimentally where I got what from whom. To me, it is simply a matter of finding and learning to use the correct tools. Or inventing them where they don't yet exist.

For me the major issue isn't the reliability of the specific genetic testing kits though. What I want is the basic mathematical model for the simplest case: the generalized family tree without pedigree collapse.

That model isn't going to depend on any of those factors, and we can actually use deviations from the simplest model as a way to infer information that wouldn't otherwise be obvious.

The mathematical model can be constructed without actually depending directly on any given test or precision. The data may not be present or up to the required precision, but we have the basic theories for vector analysis, physical measurement, computer coding, and genetics. The mathematical theory of genetic genealogy can be written before we have the data to test the theory of genetic genealogy. Data developed later can then be used to test the theory and possibly refute it or some of its assumptions.

The basic structure is actually already relatively well known: "[Identical by state data] may be useless in identifying the common ancestor but it [is useful in] an iterative process of building a decision tree [...] based on probabilities." -Ken Sargent

Ian,  I recommend you follow some of postings of the Coop Lab at

https://gcbias.org/2013/11/04/how-much-of-your-genome-do-you-inherit-from-a-particular-ancestor/

Sincerely,
Can anybody produce an example of this kind of argument producing any non-obvious non-circular results?
Hello RJ,

Thanks.
Thanks, Peter.
The problem, as I see it, is that only a sampling of a given person's genome is tested, so that while getting a statistical measure of relatedness for a few generations is easy enough, it will only work definitively for a very few generations.  Further back patterns will exist for areas of the genome, but since they are characteristic of a large number of people in a given area, it isn't possible to do what you suggest if there is a very large tested group.  I've been looking a bit for a good source of technical explanations but so far I mostly see stuff for the non-technically oriented.  But Wikitree is a large group and I'm pretty sure actual articles will be cited here which don't take a bunch of money to read.  Meanwhile I have more to do here than I can get even started on. But I'll keep an eye on you to see if you get a handle on how to handle things.
by Living Dardinger G2G6 Pilot (452k points)

I believe what you are saying is true but some clarification is necessary.

Using your conclusion: "This way my DNA can be effectively factored recursively into maternal vs paternal, maternal grandfather vs maternal grandmother, paternal grandfather vs paternal grandmother, and so on. You could then compare the factored or phased DNA to matches shared with other family members and determine immediately where in your family tree they must be. Likewise, you could use the vector representation of other people's DNA in order to automatically generate genealogies and check for intersections."

My two brothers also have been DNA tested, and without a tree, we can make certain conclusions about the source even though we can't specifically identify which parent or another ancestor is the source.

For example, if the two oldest brothers in my family share a segment, but the 3rd brother does not, it means that the 3rd brother received that particular segment from a different paternal grandparent and a different maternal grandparent than the other two.

If a cousin matches the 3rd brother, then you can also make some assumptions about the grandparent of the 1st two brothers. If we presume that the well-documented tree is correct, and by using that tree it is determined that this match is via his paternal grandfather, you can presume that other matches on that same segment to only the two oldest brothers were inherited via your paternal grandmother.

I can make this presumption because I know my parents share no segments.

A tree and DNA are mutually dependent on each other for the answers to the questions we are asking.  A tree and DNA are either consistent with each other or they are not.  DNA alone cannot independently confirm or prove a particular relationship. It can only further support or refute an existing claim.

by Ken Sargent G2G6 Mach 6 (64.1k points)

"If a cousin matches the 3rd brother, then you can also make some assumptions that about the grandparent of the 1st two brothers. We presume that the well-documented tree is correct, and by using that tree, it is determined that this match is via his paternal grandfather, you can presume that other matches on that same segment to only the two oldest brothers was inherited via your paternal grandmother.

I can make this presumption because I know my parents share no segments."

This is exactly the kind of logic I am interested in. Even if your parents do share some segments in common, we can filter those up to a point by checking the near generational relevance of the shared segments; if your parents share 5cM for example then that probably isn't going to be an issue, but if they share >7cM then it will almost certainly be an issue for this kind of inference.

Thanks for the clarification.

IMO, you have to understand the synergistic relationship that exists between genetics and genealogy.  For example, if you test 2 people who are predicted to be in a child/parent relationship. Without any additional information, you don't know who is the parent and who is the child. You may have to introduce valid genealogical information which answers the question, which donor is older/younger. You are relying on nonDNA data to make this distinction.

But if you incorporate additional tests, you should be able to conclude which is the older/younger and then compare against the genealogical information to see if the results are consistent/inconsistent.

I haven't had the time to pursue this other than spot checks, but I measure success differently than others and treat unverified relationships differently as well.

I have also tested both my parents, 2 siblings and a few uncles. I obviously have virtually no problem distinguishing which parent my matches fall onto since they have been tested.

I do have a good idea of the matches which are on my maternal grandmother's side even though she has not been tested. http://www.wikitree.com/wiki/Gillis-418. Her Gedmatch id is LL747479.

I treat these as probably related to my mother via her mother.

Also, anyone in group 2 (cousins of my grandmother) can begin to create kits for ancestors up to but not including their common ancestors by reversing the groups.

There is a lot more that can be done, I hope you get the idea.

My paternal grandmother's side of the family came from a very specific area of Poland.  I realize that some people may object to this, but I have no problem creating a Lazarus kit for her which includes cousins born in Poland with family histories limited to just Poland. My paternal grandfather's side has been in New England for generations via the United Kingdom.

I don't care at this point the exact relationship between DNA Testers. I am only interested in narrowing down which parent on a particular profile that connects me to a match.

Here is my exchange with Ann Cousin aka DNACousins.

I asked for clarification on some things to make it clearer, but this morning I told her it was not needed. I understood why she answered as she did just by reading the response.

1. Conceptually, a match for a son not found in his mother can be attributed to his father. This includes IBD and IBS, but not IBD.  I have no problem limiting those matches (without triangulation), to only those that include an IBD segment, which is all I initially intended.

2. Given the answer to #1 only requires a match, it indicates that triangulation is not necessary. Since triangulation is only used to find common ancestors for those without a tree, she interpreted the question that way.

Do I really have to ask Ann to clarify her last statement by telling her that the Wikitree technical group believes that triangulation is used for something other than finding the common ancestor? Do I really have to say to her that the Wikitree Technical members are not convinced by her answer to #1 because "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."

I tried to ask the question so not to bias her answer but I should have noted that Wikitree places a triangulation requirement on more than finding common ancestors.

From: Ann Turner

Sent: Saturday, June 04, 2016 4:32 PM

Subject: Re: I am hoping you will clear up a disagreement.

I've been wishing I could spend more time on WikiTree, but it seems like there's always something else demanding my attention.

1) Conceptually, a match for a son not found in his mother can be attributed to his father. There are a couple of "gotchas", though. The segment must be long enough that you can rule out a coincidental match. There's no consensus on how long that should be. And there is also a possibility of a false negative in the mother, e.g. at FTDNA (which requires a total of 20 cM, including small 1-3 cM pseudo-segments), AncestryDNA (with its TIMBER algorithm discounting some segments) and 23andMe (with a cap on the number of DNA Relatives). GEDmatch lets you look at everyone through the same lens.

2) There's also no consensus on whether you "need" a triangulated group. AncestryDNA uses more of a network approach. I wrote up some material about how difficult it is to assemble TGs here:  http://tinyurl.com/TheTroubleWithTriangulation.  But if you have the good fortune to get a triangulated group with pretty robust segment sizes, I do think it's possible to attribute it to a specific ancestral couple if it's not too many generations back. When you go back many generations, that brings up the possibility of multiple lines of descent.

Hope that helps,

Ann

On Sat, Jun 4, 2016 at 9:48 AM, Kenneth Sargent <msnkjsargent@msn.com> wrote:

Hi Ann,

I’ve been spending too much time on Wikitree, devoted almost entirely to the discussions on DNA. I suspect that Wikitree is the best source for publicly available documented trees but the discussions are not at the level as 23andme used to be. I was hoping to ask you two basic questions and get your permission to post your response. We are discussing the “mathematics of genetic genealogy”.

Your responses to these questions could significantly affect how Wikitree users think about how to use DNA in their research.

Scenario: We have the raw data for a mother and a son available to us for customization.  There are matches to the son, that are not matches to his mother. More specifically for these matches, the segments are shared with the son, but none are shared with the mother. Since the data can be phased, I am presuming the process could phase the data first.

1. Is it possible, using the data available, in these cases, that a match to the son, and not the mother, is probably related to the son via the father?

2. Do you agree that "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin"? FYI: a "triangulated 3-way match" means part of a Triangulated Group. You don’t have to go further than yes or no, but feel free to comment.

Thank you

by Ken Sargent G2G6 Mach 6 (64.1k points)
Actually I wasn't talking about WikiTree when I said that.  But surely if you have a 3-way triangulated match with the son, WikiTree doesn't also demand one for the father.  Most ancestors are inaccessible.

RJ, the scenario we have been using only involves 3 people who are not all biologically related to each other. The son and the mother are related to each other, but the DNA Cousin is only related to the son.  There is no triangulation match with the son. It is a simple match which contains IBD AND possibly IBS Segments.

Given there is NO TRIANGULATION in this scenario, I maintain "you don't need a triangulated 3-way match with another cousin", which is directly contrary to your assertion. The same principle applies to the Wikitree requirement that the only approved method of confirming a father or mother requires triangulation.

I am not sure what you mean by "Most ancestors are inaccessible". We are not looking at any ancestors of the son other than the mother and father.

It seems that you believe (and wikitree) that you have to know the common ancestors in order to determine if a match is related to the father or to the mother in every case.

But how do you know which side of your tree the son is on?
The DNA Cousin does not know which side. Not with these three tests. Only the son knows that he is related via his father to the DNA Cousin.
But if you're coming at it from that direction, the sticking point is which side of the father's tree the match is on.  The son isn't the problem.  Going downwards is easy.
The result is not affected by "coming at it from that direction".

The stated problem is to determine the son's side of the family. Determining the DNA cousin's side of the family or the father's side of the family are completely different scenarios.

I still assert the son is probably related to the cousin via the son's father. You and others deny this is true.
Of course I'm not denying it.  It's obvious.  It's not the question being asked.

Based on your last post, I will assume some misunderstanding.

I think it important then to identify the problem with communication in this case.

1. The title implies a more technical discussion on the “mathematics of genetic genealogy,” in which a higher level of precision is assumed.

2. You stated, “If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.”

This has really been the focus of the exchange. To Ian and me, this was obviously false.

Ian contradicted this proposition with examples that show which side of your tree somebody is on without a triangulated 3-way match with another cousin.

3. I provided a simple scenario of the son, mother, and cousin where the son is related to the cousin via his father and repeatedly used it to show my point. I took your responses as denials. This is also without a triangulated 3-way match with another cousin.

I thought I was very specific about the scope of my statements. Please understand that I am unclear about what you believe is obvious.

Do you still believe…

• “If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.”

Because if you still support this, then you can’t believe

• “the son is probably related to the cousin via the son's father,” because there is no triangulated 3-way match with another cousin
The following is a pseudocode rendering of an algorithm for determining genetic relationships between anonymous or pseudonymous genetic samples.

First step: determine which side of the family a match is on for your genetic genealogy via comparison to yourself and at least one parent; mitochondrial matches are going to be strictly along your matrilineal relations; X chromosome matches can be effectively treated as being strictly maternal for XY karyotypes but may be paternal for non-XY karyotypes; Y chromosome matches can be effectively treated as strictly paternal.

Second step: repeat the above for n matches to create a pool of sorted matches which have been determined to be on your father's side, your mother's side, both, or neither (you might have matches due to mutation). The choice of the size of the pool, n, needs to be based on standards for statistical significance.
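As a rough illustration of the first two steps, here is a minimal Python sketch. It assumes each kit's match list is just a set of match IDs; the names `classify_side` and `pool_matches` are hypothetical and do not come from any existing tool. When only the mother is tested, anything she does not share is pooled as paternal, which is an inference and not a proof (it could also be a false match).

```python
from collections import defaultdict

def classify_side(match_id, mother_matches, father_matches=None):
    """Return 'maternal', 'paternal', 'both', or 'neither' for one match."""
    in_mother = match_id in mother_matches
    if father_matches is None:
        # Assumption: only the mother is tested, so a match she does not
        # share is inferred to be paternal (it may also be a false match).
        return "maternal" if in_mother else "paternal"
    in_father = match_id in father_matches
    if in_mother and in_father:
        return "both"
    if in_mother:
        return "maternal"
    if in_father:
        return "paternal"
    return "neither"

def pool_matches(my_matches, mother_matches, father_matches=None):
    """Sort every one of my matches into a pool keyed by side of the family."""
    pools = defaultdict(list)
    for match_id in sorted(my_matches):
        side = classify_side(match_id, mother_matches, father_matches)
        pools[side].append(match_id)
    return dict(pools)

# Made-up match IDs, both parents tested.
print(pool_matches({"A", "B", "C", "D"}, {"A", "C"}, {"B", "C"}))
```

The size of the pool still has to be chosen on statistical grounds; this sketch only does the bookkeeping.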

Third step: find all matches that share a sex-linked segment and an autosomal segment. These are weakly patrilineal (Y Chromosome), weakly matrilineal (mitochondrial), or weakly maternal (X Chromosome) autosomal matches; there is a probable relationship of inheritance between the autosomal match and the sex-linked match; this is a correlative relationship but not necessarily a causal relationship. This group of matches is useful for figuring out what autosomes to target first in the search and sort.
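The third step can be sketched as a simple filter. This assumes each match is summarized by the list of chromosome labels on which it shares something with you ("1" to "22", "X", "Y", and "MT" for mitochondrial); the label names and the function `weak_lineage_labels` are made up for illustration.

```python
# Labels follow the wording of the step above; they mark a correlation only.
SEX_LINKED_LABELS = {
    "Y": "weakly patrilineal",
    "MT": "weakly matrilineal",
    "X": "weakly maternal",
}

def weak_lineage_labels(chromosomes):
    """Return lineage hints for a match that has both an autosomal and a
    sex-linked (or mitochondrial) match; return [] otherwise."""
    chroms = {str(c).upper() for c in chromosomes}
    has_autosomal = any(c.isdigit() and 1 <= int(c) <= 22 for c in chroms)
    if not has_autosomal:
        return []
    return sorted(label for c, label in SEX_LINKED_LABELS.items() if c in chroms)

# A match sharing on chromosome 7 and on the X chromosome.
print(weak_lineage_labels(["7", "X"]))
```

A match with only a sex-linked hit and no autosomal segment gets no label here, since the step is about pairing the two kinds of evidence.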

Fourth step: analyze the matches and sort according to probable degree of relationship. Naively, order the sorts according to cM lengths, the number of shared segments, and total shared cM lengths; a more sophisticated algorithm for determining probable degree of relationship can and should be used.
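The naive ordering in the fourth step might look like the following in Python. The `Match` class and the tie-breaking order (total cM first, then longest segment, then segment count) are my own assumptions for the sketch; it does not estimate a degree of relationship, it only ranks.

```python
from dataclasses import dataclass, field

@dataclass
class Match:
    name: str
    segments_cm: list = field(default_factory=list)  # length of each shared segment in cM

    @property
    def total_cm(self):
        return sum(self.segments_cm)

    @property
    def longest_cm(self):
        return max(self.segments_cm, default=0.0)

def rank_matches(matches):
    """Closest-looking matches first: total cM, then longest segment, then count."""
    return sorted(
        matches,
        key=lambda m: (m.total_cm, m.longest_cm, len(m.segments_cm)),
        reverse=True,
    )

# Made-up segment lengths.
ranked = rank_matches([
    Match("x", [50.0, 30.0]),
    Match("y", [80.0]),
    Match("z", [120.0, 40.0]),
])
print([m.name for m in ranked])
```

Anything more sophisticated would replace the sort key with a probability model over relationships.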

Fifth step: diagram yourself at the center of a bifurcated polar coordinate system with the sorted matches plotted to intervals representing the range of probable degree of relationship over the rings radiating out from you on the appropriate side of the map. Mother's matches on one side and father's matches on the other side; I would probably exclude plotting the both or neither matches for now. The idea is to find clusters of matches that match each other and graph those clusters according to their probable degree of relationship; by graphing their probable degree of relationship to you and their probable degree of relationship to each other, you create a relative topological reference of distance and connection.
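To make the fifth step concrete, here is a small sketch that only computes the polar coordinates and leaves the drawing to whatever plotting tool you like. It assumes each match already carries a side ("maternal" or "paternal") and a range of probable degrees of relationship; putting the father's side on the right half and the mother's side on the left half is an arbitrary choice, and `polar_layout` is a hypothetical name.

```python
import math

def polar_layout(matches):
    """matches: list of (name, side, (nearest_degree, farthest_degree)).
    Returns {name: (r_near, r_far, theta)} with you at the origin."""
    layout = {}
    # Paternal matches are centred on angle 0, maternal matches on angle pi.
    for side, centre in (("paternal", 0.0), ("maternal", math.pi)):
        group = sorted(
            (m for m in matches if m[1] == side),
            key=lambda m: (m[2], m[0]),
        )
        for i, (name, _side, (near, far)) in enumerate(group):
            # Spread the group evenly over its half-plane, off the dividing axis.
            theta = centre - math.pi / 2 + math.pi * (i + 1) / (len(group) + 1)
            layout[name] = (near, far, theta)
    return layout

# Made-up matches with degree ranges.
for name, coords in polar_layout([
    ("c1", "paternal", (2, 3)),
    ("c2", "maternal", (1, 2)),
    ("c3", "paternal", (4, 6)),
]).items():
    print(name, coords)
```

Clustering matches that also match each other would then adjust the angles so that related matches sit together, which this sketch does not attempt.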

Sixth step: find all triangulations between you and your mother's matches; find all triangulations between you and your father's matches. Mark the abstract relationship of you, your mother, and your match's most recent common ancestor; at this point, the graph should begin to show a structure of relationships resembling a familiar genetic genealogy; it will likely be incomplete and will have islands of disconnected relations.
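For the sixth step, a triangulation between you and two matches X and Y can be sketched as a three-way interval overlap. This assumes segments are given as (chromosome, start, end) tuples in any consistent position unit, and that you have the you-to-X, you-to-Y, and X-to-Y segment lists; the function names and the `min_length` parameter are hypothetical.

```python
def overlap(a, b):
    """Overlapping part of two (chromosome, start, end) segments, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None

def triangulated_segments(me_x, me_y, x_y, min_length=0):
    """Regions where the me-X, me-Y, and X-Y matches all overlap."""
    found = []
    for s1 in me_x:
        for s2 in me_y:
            common = overlap(s1, s2)
            if common is None:
                continue
            for s3 in x_y:
                tri = overlap(common, s3)
                if tri is not None and tri[2] - tri[1] >= min_length:
                    found.append(tri)
    return found

# Made-up positions on chromosome 7.
print(triangulated_segments([(7, 10, 50)], [(7, 30, 70)], [(7, 20, 60)]))
```

An empty result does not rule out a relationship; it only means these three kits do not share one common region.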

Steps beyond this really depend on what you want to accomplish. The islands can be recursively connected by performing steps 1 through 6 for each child-parent pair you can find among your matches. There's a critical threshold of matches that would result in a chain reacting algorithm that would tend towards total connectivity.
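The "islands" idea can be checked with a plain connected-components pass over the graph of established relationships. The sketch below uses a small union-find; it assumes each established link is just a pair of kit names, and `islands` is a made-up helper name.

```python
def islands(edges):
    """Group kits into connected islands given (kit_a, kit_b) relationship links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())

# Made-up kits: two islands until a link between c1 and c2 is found.
links = [("me", "mum"), ("mum", "c1"), ("c2", "c3")]
print(islands(links))
print(islands(links + [("c1", "c2")]))
```

Each new child-parent pair run through steps 1 through 6 adds links, and the number of islands can only stay the same or shrink.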

A DNA cousin can know which side of the family I am on by the mirror image of the process by which I discovered what side of the family they are on.

To determine what side of my father's family tree a given match is on, more information is required. In my case, I basically do not have access to my father's DNA directly, so the best I can do is phase my DNA with my mother and my siblings to composite my father's DNA via Lazarus kits or similar.

However, I can also take all of the cousins that I am able to discern are not related to my mother and composite their DNA matches with me as well into the image of my father's DNA; I don't know what side of his tree they all are on, but I don't need to know either because I only need to know that they are not on my mother's side of the family tree. I can composite a functional image of my father between my siblings, my mother, my father's pseudonymous genetic relations, and me; the issue then is to determine his mother's or father's DNA. Obviously, his mother's DNA can't be fully reconstructed without a genetic kit from his daughter, maternal sisters, maternal aunts, or maternal uncles because I probably share no X Chromosome or mitochondrial DNA in common with my father. His father can be partially reconstructed sans X chromosome and mitochondrial DNA because I share upwards of 1/4th my autosomal DNA and almost my whole Y Chromosome in common with him, but again, we would need genetic kits from my father's paternal sisters, paternal aunts, paternal uncles, or daughter. Without going through all that, I would guess that we can use my DNA and my father's partially reconstructed DNA to sort paternal DNA cousins into probable pools of paternal grandfather and grandmother matches by separating out those who do not share DNA with me and my patrilineal uncles or paternal XY-karyotype first cousins.
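A minimal sketch of the compositing idea, under heavy assumptions: it does not reconstruct any genotypes the way a Lazarus kit does. It only merges the segments I share with matches who are not on my mother's side into a map of which regions of my genome are covered by paternal-side matches. Segments are (chromosome, start, end) tuples, and the function names are hypothetical.

```python
def merge_segments(segments):
    """Merge overlapping or touching (chromosome, start, end) segments."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def paternal_coverage(matches, maternal_ids):
    """matches: {match_id: [segments shared with me]}.
    Keeps only matches not known to be maternal, then merges their segments."""
    paternal = [
        seg
        for match_id, segs in matches.items()
        if match_id not in maternal_ids
        for seg in segs
    ]
    return merge_segments(paternal)

# Made-up kits: m1 is a known maternal match and is left out.
shared = {
    "c1": [(1, 10, 40)],
    "c2": [(1, 30, 60), (2, 5, 15)],
    "m1": [(1, 0, 100)],
}
print(paternal_coverage(shared, maternal_ids={"m1"}))
```

Splitting that coverage into paternal grandfather and paternal grandmother pools would need the extra kits described above.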

Though we are now getting into why it is important to derive the mathematical genetic decomposition or "factorization" of people for comparison. The determination of which side of my father's tree a DNA cousin lies on is answered by how the pieces fit together to form a completed puzzle and depends on the mathematical decomposition of a genome into a quantitative genetic genealogy; unlike a common puzzle where each piece has a unique fit, this puzzle can be assembled multiple ways from multiple other puzzles. Classifying and categorizing those possible assemblies is methodologically significant and mathematically possible.
