What do you recommend for the mathematics of genetic genealogy?

I have been reviewing what people are doing with their DNA results and their GEDCOMs. I have yet to try out the Genetic Genealogy Kit and affiliated tools, but I have tried out GRAMPS, RootsMagic, Ancestral Quest, Genome Mate Pro, GEDMatch, and a few of Felix Immanuel's genetic genealogy tools.

I've been going through my library of mathematics and genetics textbooks, including Schaum's Outline of Genetics 4th ed, Snustad and Simmons' Principles of Genetics 4th edition, and my references on vector analysis, linear algebra, and the Python programming language. I haven't been able to find what I'm looking for.

The basics of it are that a person can be represented in a genetic genealogy by their DNA plus annotations like name, date, location, and events. In simplified formal notation, their DNA can be written as a physical measurement of an experimental subject. In physics, forces and force interactions or waveform interference can be written in terms of vectors. Physical measurements of physical systems can generally be written as vectors, so your DNA should be able to be represented as a vector.

I want to do it this way because I want to be able to decompose the vectors into subvectors representing the contributions of genetics from other family members. This way my DNA can be effectively factored recursively into maternal vs paternal, maternal grandfather vs maternal grandmother, paternal grandfather vs paternal grandmother, and so on. You could then compare the factored or phased DNA to matches shared with other family members and determine immediately where in your family tree they must be. Likewise, you could use the vector representation of other people's DNA in order to automatically generate genealogies and check for intersections.

So what references do you all recommend for doing mathematical or quantitative genetic genealogies?

 

Update: For common reference.

COOP Lab at UC Davis:

 

in The Tree House by Ian Mclean G2G6 Mach 1 (13.6k points)
edited by Ian Mclean
I don't think the info is complete enough.  Your DNA isn't labelled with where it came from.
Isn't labelled with where it came from? You mean other than from me.

Do you mean what kind of lab or the state I was born in? The location data for vector decomposition of my DNA sequence is optional as far as I know. The decomposition relies only on logical relationships between me and my parents and our common relatives.

The decomposition of the vectors is possible by comparison to other known vectors like the DNA results of my chromosome matches. Not all the decompositions are going to be unique at first. That's why you triangulate your data with relatives. The more triangulations you can make in your genetic genealogy, the more unique decompositions you can make and the more definite your genetic genealogy becomes.

OH. "You could then compare the factored or phased DNA to matches shared with other family members and determine immediately where in your family tree they must be. Likewise, you could use the vector representation of other people's DNA in order to automatically generate genealogies and check for intersections."

You think I mean where people are in the world in my family tree. Whereas I meant their abstract relationship to me in terms of the family graph not their GPS location. That's something which sleuthing through government records and family albums will solve more readily.

I meant, you typically have two of each chromosome, and the testing process works with a mixture.  You can match somebody on the basis that you share enough peculiarities, but the test doesn't know if they came from the sperm or the egg.  And if you match two people on the same segment of the same chromosome-pair, you don't immediately know whether both matches are on the same chromosome, or on one each.

I figure you'll need a lot more data to map out the DNA than to draw the tree.
I really think you'd need a genetics lab with staff and a supercomputer to tackle this. Apart from the X and Y chromosomes I don't think it's possible to even determine whether a specific part of an individual's DNA comes from their mother or father. It's usually expressed as "random" which copy of a gene you get. Now, while there is clearly randomness, I am sure there must be "clustering" where certain regions are copied from one parent or the other just simply due to three-dimensional molecules splitting and recombining. However this seems way beyond the resources of a lone researcher.
It isn't as complicated as all that.

With my mother's DNA and my father's DNA results in hand, I can do what is called phasing: I compare my DNA to their DNA and see what DNA comes specifically from my mother and what DNA comes specifically from my father. This part of the vector decomposition is relatively easy as long as we neglect mutations and transcription errors. My mother gave me either one of her X chromosomes whole or a combination of her X chromosomes; Schaum's Outline of Genetics, 4th edition, page 150 has a handy diagram of what is shared along the X and Y chromosomes, and there are what are called sex-linked genes that are passed strictly from one parent to their child. X chromosomes have a large segment that is non-homologous or completely sex-linked; if I share segments from that part of my X chromosome with someone else then it is most probable that I share matrilineal ancestors with them. This goes both ways for male to male comparisons such that if we share X chromosome segments from the completely sex-linked region then we have to share a common matrilineal ancestor.

These facts can be used in vector analysis in order to differentiate at least some genetic matches into a pool of most probably matrilineal common ancestors and least probably matrilineal ancestors. If I have my DNA phased with both my parents then I can actually positively identify which portions of my DNA came from which parents, so if I share segments of DNA with someone else then I have to share them through that parent.
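As a rough illustration of the phasing step described above, here is a minimal sketch in Python. It assumes unphased genotypes written as two-letter strings at single SNPs, and it neglects mutations and genotyping errors as the post does. The function name and the sample genotypes are hypothetical and are not taken from any existing tool.

```python
def phase_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one SNP, or None if unresolved.

    Each genotype is an unordered pair of alleles such as "AG".
    Assumes no mutation and no genotyping error (a simplification).
    """
    a, b = child
    if a == b:
        # Homozygous child: both parents must have passed the same allele.
        return (a, a)
    # Heterozygous child: try both assignments and keep those the parents allow.
    options = [(m, f) for m, f in ((a, b), (b, a)) if m in mother and f in father]
    return options[0] if len(options) == 1 else None


# Hypothetical (child, mother, father) genotypes at four sites.
sites = [("AG", "AA", "AG"), ("CT", "CT", "CT"), ("GG", "AG", "GG"), ("CT", "TT", "CC")]
phased = [phase_site(*site) for site in sites]
print(phased)
```

The second site stays unresolved because all three people are heterozygous there, which is the usual limit of phasing from parents alone.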

There will be a lot of DNA that is in common between parents, and the problem is worse for those populations where the parents are close relatives, but for a lot of people, you can sort the DNA data into shared by both parents, shared by father, shared by mother, and shared by neither parent.
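The four-way sort described above can be sketched as follows. This assumes each shared segment is a (chromosome, start, end) tuple and that the same match has been compared against the child, the mother, and the father. The names and the coordinates are hypothetical.

```python
def overlaps(seg_a, seg_b):
    """True if two (chromosome, start, end) segments share any positions."""
    return seg_a[0] == seg_b[0] and seg_a[1] < seg_b[2] and seg_b[1] < seg_a[2]


def classify_segment(child_seg, mother_segs, father_segs):
    """Sort one child-vs-match segment by which parent also shares it with the match."""
    via_mother = any(overlaps(child_seg, s) for s in mother_segs)
    via_father = any(overlaps(child_seg, s) for s in father_segs)
    if via_mother and via_father:
        return "both parents"
    if via_mother:
        return "mother"
    if via_father:
        return "father"
    return "neither parent"


# Hypothetical segments that one match shares with the child and with each parent.
child_seg = (7, 10_000_000, 25_000_000)
mother_segs = [(7, 8_000_000, 30_000_000)]
father_segs = []
print(classify_segment(child_seg, mother_segs, father_segs))  # mother
```

A result of "neither parent" is worth a second look, since a segment that appears in the child but in neither parent may be a false match rather than real inheritance.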

This would be a mere novelty in genetic genealogy except that my cousin's cousins are my cousins and my mother's cousin's cousins are her cousins roughly speaking. Because you can sort shared genetic DNA in this way, you can triangulate cousins against each other using your completely sex-linked differentiated shares as a control group for sorting shared autosomal results onto one side or the other of your family tree. If you do vector decomposition and use phased genetic analysis then you can actually sort your autosomal DNA into different pools; you got autosomal DNA from each of your parents, but it is unlikely you got the same autosomal DNA from both, and it is unlikely they got the same autosomal DNA from their parents, and so on. Theoretically, you can search and sort your autosomal DNA via phasing, triangulation, and chromosome mapping according to which parent, grandparent, great grand parent, etc gave it to you.

This can all be done using common scientific programming libraries like Anaconda for python. In fact, this can all be done in such a way that much of this becomes a push-button-get-results kind of application.

Here's a rough pictorial representation of what I am talking about. In this diagram, I or any xy-karyotype am the root of the tree, yamx; I inherited chromosome y, chromosome x, mitochondrial DNA, and the autosomal chromosomes 1-22. My father contributed my chromosome y and a portion of his autosomal chromosomes, a. My mother contributed chromosome x, my mitochondrial DNA, and a portion of her autosomal chromosomes, A. My autosomal DNA is a combination of my mother and my father's Aa.

This is a simplification of the situation because there is some crossover from my father's x and y chromosomes. And the exact composition of my mitochondrial DNA and X chromosome from my mother is not captured in the diagram. But it serves as a rough outline of a basic model.

If my DNA is represented by a vector, I, then we might model it as [y a m x]=I. Mother * Father = [0 a_0 m x] * [y a_1 0 0] = I where * is a reproduction operator. Could probably do it as some form of bra-ket: <Mother|Father> = <I>.
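One way to make the [y a m x] notation concrete is a small Python sketch. It treats each component as a symbolic label rather than real sequence data and defines the reproduction operator only for an XY child. The class and function names are hypothetical, and the operator is just one possible reading of the notation in the post.

```python
from typing import NamedTuple, Optional


class Genome(NamedTuple):
    y: Optional[str]  # Y chromosome label, None for an XX individual
    a: str            # label for the autosomal contribution
    m: Optional[str]  # mitochondrial DNA label, None where the model zeroes it out
    x: Optional[str]  # X chromosome label, None where the model zeroes it out


def reproduce_xy(mother: Genome, father: Genome) -> Genome:
    """Combine the parental vectors into an XY child, as in Mother * Father = I."""
    return Genome(
        y=father.y,             # Y comes only from the father
        a=mother.a + father.a,  # autosomes combine both contributions ("Aa")
        m=mother.m,             # mitochondrial DNA comes from the mother
        x=mother.x,             # an XY child's X comes from the mother
    )


mother = Genome(y=None, a="A", m="m", x="x")
father = Genome(y="y", a="a", m=None, x=None)
child = reproduce_xy(mother, father)
print(child)
```

The operator is not commutative: swapping the arguments gives a different (and here meaningless) result, which fits the later remark in the thread about a non-associative algebra.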

"There will be a lot of DNA that is in common between parents" ?!?!? do you mean pedigree collapse is common?

I did a GEDmatch Segment Triangulation, and the lesson learned is that maybe I have a sticky segment that makes the equation more difficult, as you need to know those sticky segments because they mess things up. I had 94% (3540 out of 3749) of the segment matches on Chr 15 see link.... The implication of this is that you need a lot of data to find those sticky segments, and that's why e.g. Ancestry has an advantage

""There will be a lot of DNA that is in common between parents" ?!?!? do you mean pedigree collapse is common?"

Statistically speaking over the entire length of human history, yes, pedigree collapse is common. That's why human beings generally share the majority of their genome in common.

Though in this case, I wasn't referring as much to recent (within 10 generations) pedigree collapse. I am just referring to the fact that if we were working with whole genome sequences that comparing the sequences of anyone together base-by-base would produce more in common than different. Geneticists and genetic genealogists have to pick and choose what genetic information to examine and compare, so we choose hot spots of variation in the human genome. What amounts to about 2-3% of each of our genomes.

So for example, suppose you're comparing distant cousins with unknown relations to your parents. If your family graph was strictly a tree--no pedigree collapse anywhere in it--then the cousins would strictly be sortable as either paternal cousins XOR  maternal cousins. However, with any quantity of pedigree collapse in your family tree anywhere, some of your cousins are going to be both paternal and maternal cousins because you and them will share a distant common ancestor that is also shared by your mother and father.

Under conditions approaching the no-pedigree-collapse-model for XY-karyotypes, finding a match on your X chromosome in the nonhomologous portion conclusively means that person is related to you strictly through your mother and not your father. However, if your father and mother share a distant maternal ancestor then some matches in that region will be from that distant maternal ancestor, so you will find cousins on your father's side who share X chromosome matches with you despite the fact that your father didn't pass on his X chromosome to you. In practice for XY-karyotypes, you can generally assume X chromosome matches are probably maternal relatives because the probability of them being on your father's side is low especially for matches below 7 cM.
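The X-chromosome reasoning above rests on a simple rule: everyone receives an X from their mother, and only XX individuals also receive one from their father. A minimal sketch that enumerates which ancestral lines can contribute X DNA under that rule is below. It ignores the small regions where X and Y recombine, and the function name is hypothetical.

```python
def x_ancestor_lines(sex, generations):
    """List ancestral paths (such as "MFM") that can carry X-chromosome DNA.

    sex is "XY" or "XX"; each path letter is M (mother) or F (father).
    """
    paths = [("", sex)]
    for _ in range(generations):
        next_paths = []
        for path, s in paths:
            # Everyone receives an X from their mother.
            next_paths.append((path + "M", "XX"))
            # Only XX individuals also receive an X from their father.
            if s == "XX":
                next_paths.append((path + "F", "XY"))
        paths = next_paths
    return [p for p, _ in paths]


print(x_ancestor_lines("XY", 1))  # ['M']
print(x_ancestor_lines("XY", 3))  # ['MMM', 'MMF', 'MFM']
```

With pedigree collapse, the same person can sit on one of these lines and also on a paternal line, which is the case the paragraph above describes.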

Houston, Houston, we have a problem: Identical_by_state

Identical by state vs identical by descent is roughly analogous to correlation vs causation. We can represent correlations in vector analysis as single direction vectors; causations can be represented as bi-directional or bijective vectors. That's actually a major part of what we're supposed to be doing when we're looking at potential matches. Finding a correlation is a starting point; the process of triangulation creates a categorical system of correlates that should knock out mismatches especially for measurements larger than the margin of error in measurement and in comparison.

If A correlates with B and if B correlates with C and if C correlates with A then A = B = C; this condition is a kind of mathematical closure which is why creating polygons like triangles is important.
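The closure condition can be checked mechanically. The sketch below assumes the three pairwise comparisons (A with B, B with C, C with A) each produced a matching segment on the same chromosome, given as (start, end), and asks whether all three share a common region. The function names and the min_length parameter are hypothetical. Overlap in unphased data does not by itself show that the three matches sit on the same parental copy of the chromosome, so this is only a necessary condition.

```python
def common_overlap(*segments):
    """Return the (start, end) region shared by all segments, or None."""
    start = max(s[0] for s in segments)
    end = min(s[1] for s in segments)
    return (start, end) if start < end else None


def triangulates(ab, bc, ca, min_length=0):
    """True if the three pairwise segments share a region of at least min_length."""
    region = common_overlap(ab, bc, ca)
    return region is not None and region[1] - region[0] >= min_length


print(common_overlap((10, 50), (30, 70), (20, 60)))  # (30, 50)
print(triangulates((10, 20), (30, 40), (15, 35)))    # False
```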

The vector model that I am interested in should strictly differentiate IBD from IBS, and in fact, the vector decomposition process should explicitly return strictly the IBD data from comparisons.

Identical by state vs identical by descent is roughly analogous to correlation vs causation

Hm, IBD = 100% correlation. IBS = 0% correlation.

But how do you know if they correlate? There is the rub

IBS does not equal 0% correlation. IBS, IBD > 0% correlation. If there's 0% correlation then we can conclusively say that there is no match or that two or more segments are not equivalent or not equal.

The presence of a positive correlation suggests the possibility of a causative relationship but is not enough by itself to be conclusive. The absence of a positive correlation is sufficient to conclude no causative relationship has been found.

IBS ==> there is not 0 correlation but its random ==> useless... ==> back to square 1

Correlation, defined: the tendency for two values or variables to change together, in either the same or opposite way.

 

Magnus, I have to disagree that IBS is useless.

It may be useless in identifying the common ancestor, but it is an iterative process of building a decision tree (for lack of a better analogy) based on probabilities. These probabilities affect our confidence level.

If I know that neither of my parents shares any segment greater than 3 cM, then a 4 cM segment included in a match probably came from the same parent as the segment(s) which caused the match.

Because I tested both my brothers, there are segments that overlay, some 4cM, but when combined into a new Lazarus Kit, those segments are joined to form a new larger segment, providing further evidence of its usefulness.

Using just DNA, you can begin to create Parent A, Parent B, Grand Parent A, Grand Parent B, etc. without assigning a gender to these ancestors. Once you do assign a gender to one, there should be a domino effect in assigning a gender to others.
Re Ken: The future will tell if this bigger segment is true or not ;-)

The problem I think we will run into is identifying the "new" thing called sticky segments ==> you need to have all the data that gedmatch and Ancestry and FTDNA have to build statistics and say this segment should not be included when we are finding relations....

As DNA genealogy is a rather immature "science", I assume we will learn a lot more. Maybe not every segment is inherited with the same probability etc....etc....
Magnus, I believe the future is now. :)

The smaller segments are identified via matches. If there are small segments that overlap a match with 3 siblings, the new bigger segment location should be close to the earliest start and the latest end of the 3 siblings.
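The step of joining overlapping sibling segments into one larger segment can be sketched as a plain interval merge. This is only an illustration of the idea and is not a description of how GEDmatch's Lazarus tool works internally. The function name and the positions are hypothetical.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it to the later end.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical segments that three siblings share with the same match.
sibling_segments = [(30, 45), (10, 35), (40, 60), (80, 90)]
print(merge_segments(sibling_segments))  # [(10, 60), (80, 90)]
```

The merged segment runs from the earliest start to the latest end of the overlapping pieces, as described above.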

To verify, you only have to compare a match with the new Lazarus kit and to see if there is a match on the bigger segment. I believe this is the type of evidence that supports which parent the segment originated from.
Just catching up, 3 points

1 - X and Y may be partly homologous, but they don't cross over.  The homology is only relevant for tracing genetic traits and diseases.

2 - if you've got your parents' DNA, you don't need yours at all.  Nothing is gained by factoring your own.  And you can't factor theirs, so there's nothing recursive.

3 - If you get an X match with somebody on your father's side, the position is the same as if you get one with a total stranger, ie they're a cousin on your mother's side somewhere.  There's no "probably".
2 - My parents' DNA doesn't tell me what DNA I got from whom. It doesn't tell me or my siblings what my siblings got from whom, and how we differ. Factors can be discovered by shared segments between me and my siblings, between my siblings and my cousins, between my parents and my aunt, etc. Given the nature of the non-associative algebra, "Factor" might be somewhat misleading. But the recursive nature of DNA is not a point of controversy; we are able to do DNA transcription because of its similarities to computer code, so it is at least partially governed by principles of coding and generally recursive functions.

3 - The probably is in "Only a maternal relative" vs "Both paternal and maternal relative."
1 - X and Y are sex chromosomes

The existence of a Y tells us the sample is male, which is relevant to how the X is treated.  Also, the Y may be relevant when 2 relatives claim to share the same distant common ancestor along their patrilineal lines. It may further support the claim to some degree, or refute it indicating a Non-Paternal event.

2 - There is much to be gained by factoring in children's DNA. I gave an example earlier, but the clearest gains are accuracy and depth.  AncestryDNA factors in these child/parent relationships and accomplishes both.  The process is called phasing. In my case, their predictions have been consistent with the documentation, even out to 8th cousins. There is one prediction that is further than the documentation. Both I and my match believe that a Non-Paternal event occurred.

3 - Matching segments do not always follow the same path in an endogamous relationship.  You could match your non-sex segments to a 1st cousin (who is not a stranger), and also be related to that same person as a 6th cousin on your mother's side sharing no non-sex segments.

2 - My parents' DNA doesn't tell me what DNA I got from whom.

For genealogical purposes, it won't help to know.

Factors can be discovered by shared segments between me and my siblings, between my siblings and my cousins, between my parents and my aunt, etc.

Nothing you don't already know

But the recursive nature of DNA is not a point of controversy; we are able to do DNA transcription because of its similarities to computer code, so it is at least partially governed by principles of coding and generally recursive functions.

Now you're just playing games with different meanings of recursive

3 - The probably is in "Only a maternal relative" vs "Both paternal and maternal relative."

As to a possible link on the other side, you are none the wiser.  You can't introduce a no-pedigree-collapse assumption to infer that a link on one side makes a link on the other side less likely than it otherwise would be.

There's no free lunch here.  There's no substitute for lots of data from cousins and cousins of cousins, including people you didn't know were cousins.  But given the data, the conclusions aren't that hard to reach.

 

3 - "You can't introduce a no-pedigree-collapse assumption to infer that a link on one side makes a link on the other side less likely than it otherwise would be."

I don't see where Ian "makes a link on the other side less likely".  There is a presumption that all the results are based on some probability. His response, IMO, was directed to your statement which seems to refute this. You say "There's no "probably"." when it is clear, at least to me, that there exists a possible alternative.

RJ, you do not get it. I hear that you are frustrated, and you are formally advised to walk away. Further replies on your part will be received as aggression. If this all doesn't seem worth the time or to offer any value to you then leave it; you are not required or desired to participate and this thread is not about trying to convince people that what they are doing is not worth the time or effort. You don't get to police what we waste our time and effort doing.

2) It won't help in the conventional genealogical sense to know what DNA I got from whom. But I am not necessarily interested in knowing the name or any of the usual genealogical details of the people I get my DNA from; my genetic genealogy, in a form which only represents the structure of my DNA inheritance, can be completed irrespective of the status of my WikiTree genealogy. For me, the genetic genealogy is more important because I can directly, accurately, and precisely know what the genetic profiles of my ancestors look like; when I compare those profiles with the genetic profiles of others then I can know with little doubt whether I am genetically related to them or not.

The kind of genealogy that I am interested in and which I am discussing the mathematical model of here is not directly the kind of genealogy that WikiTree is constructing. The WikiTree genealogy puts the social relationships and records first and uses the genetic genealogy to support those social relationships and records.

To me this is backwards because the more reliable data for inheritance relationships is the data that is produced by genetics; much of conventional genealogies beyond immediate relationships is largely speculative in nature and subject to a plurality of errors that results in genealogies that are often inaccurate or simply causes genealogies to dead-end with no discernible trail.

The kind of genetic genealogy I am interested in producing by these mathematics would then have conventional genealogies mapped to it hypothetically rather than the other way around; I know I am related to the person with the genetic profile produced, but I do not necessarily know that I am related to the person who has the WikiTree profile attached to my family tree. I want to build my genealogy on what is known and knowable rather than mere speculation and family mythology.
None of this is personal and I've no intention of trying to police anything.  But this is a public forum and people need to comment on anything posted which they think might mislead other readers.

Genetics loses half the information at each generation.  The only way to get it back, short of digging up skeletons, is to test lots of relatives and relatives of relatives.

If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.  You can get there the hard way from first principles, or you can take the message that's already been potted and packaged.  It will come to the same thing.

To pin somebody down further, ask the same question from other points of view.  You'll need more 3-way matches.
And you are again showing confusion based on your own presumptions.

Yes, genetic information is lost at each generation. But not the same genetic information. I have about half of my mother's and father's DNA. My brother has about half of my mother's and father's DNA. One of my half-sisters has about half of my mother's DNA. My brother, my sister, and I do not share the exact same half of our mother's DNA, so we do not share the exact same quarter of our grandparents' DNA and so on. If my brother, sister, and I compare our genetic results then each of us will have parts of our parents' DNA that would be absent in examining only one of us. This basic principle extends to cousins, aunts, and uncles for grandparents on up, so if I compare all my first cousins, aunts, siblings, parents, and myself together then we can get some percentage of our grandparents' DNA reconstructed without ever digging anyone up or even involving my living grandparents in the process.

But that's only a part of what I am talking about. Without regard to any other DNA kits besides my own, my DNA has to split in certain ways. I didn't inherit a random assortment of DNA. I inherited roughly half of my father's autosomal DNA and roughly half of my mother's autosomal DNA; the half I inherited from my father in general isn't identical to the half I inherited from my mother, and which half goes where is roughly linked with which sex-determining chromosomes I got from whom. Which portions of what chromosomes won't be known without making matches to other people's DNA, but it doesn't matter in the mathematical model. My DNA is treated as a variable or what is called an UNKNOWN in the 1800s language of mathematics; what DNA my father contributed is another different variable or another UNKNOWN. Same for my mother. Same for my siblings. Same for my cousins, aunts, uncles, grandparents, and people totally unrelated to me.

Graphs and tables can be constructed which show what abstract portion of autosomal DNA I got from whom. I am interested at the moment ONLY in the abstract relations. Once I have the parameters of the problem to plug into a fully constructed model, I can actually start doing comparisons in order to analytically link certain portions of my autosomal DNA with certain sides of my genetic genealogy starting with the maternal or paternal difference. Like my maternal grandfather potentially gave me X-chromosome DNA but gave me no mitochondrial DNA and no Y-chromosome DNA; roughly a quarter of my autosomal DNA comes from my grandfather, and the quarter isn't continuously distributed across 1-22 of my chromosomes, so I might have my maternal grandfather's DNA on my 1, 3, 4, 6, 7, 9, and 10th chromosomes. If my maternal grandfather's DNA can be put into a set like (1, 3, 4, 6, 7, 9, 10), and I compare my DNA with a random stranger that happens to match in (1, 3, 4, 6, 7, 9, 10) then I know that random stranger is related to me through my maternal grandfather's side of the family. With successive comparisons and enough genetic samples from close family members, I can use chromosome maps of that kind to automatically sort future matches to their proper place in my family tree. I might not be able to immediately place them exactly where they are in relation to me, but I will quickly be able to place them on the maternal or paternal side then place them on the paternal or maternal's grandfather or grandmother, and so on.
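The chromosome-map lookup described above can be sketched like this. It assumes the map has already been built as painted segments per grandparent, given as (chromosome, start, end) in cM, and it sums how much of a new match falls inside each grandparent's painted regions. The map contents, the names, and the 7 cM cutoff are hypothetical. A real map would also need to record which parental copy of each chromosome a painted segment lies on.

```python
CHROMOSOME_MAP = {  # hypothetical painted segments: (chromosome, start_cM, end_cM)
    "maternal grandfather": [(1, 0, 40), (3, 55, 90)],
    "maternal grandmother": [(1, 40, 120)],
    "paternal grandfather": [(2, 10, 70)],
}


def overlap_length(seg_a, seg_b):
    """Length of the overlap between two segments, 0 if none or different chromosomes."""
    if seg_a[0] != seg_b[0]:
        return 0
    return max(0, min(seg_a[2], seg_b[2]) - max(seg_a[1], seg_b[1]))


def place_match(match_segments, chromosome_map, min_cm=7):
    """Return {ancestor: shared cM} for ancestors sharing at least min_cm with the match."""
    totals = {}
    for ancestor, painted in chromosome_map.items():
        shared = sum(overlap_length(m, p) for m in match_segments for p in painted)
        if shared >= min_cm:
            totals[ancestor] = shared
    return totals


new_match = [(1, 5, 30), (3, 60, 80)]
print(place_match(new_match, CHROMOSOME_MAP))  # {'maternal grandfather': 45}
```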

Not all unknowns are equally unknown though; I know I got a Y chromosome, and I know I got an X chromosome, and I know I got the Y chromosome with roughly half my autosomal DNA, and I know I got the X chromosome with roughly half my autosomal DNA, so I know that the relationship between roughly half my autosomal DNA is not entirely independent of which sex-determining chromosomes I inherited. Because the autosomal DNA is not entirely independent then I can write a functional notation representing that non-independent relationship where either my sex-determining chromosome is dependent on roughly half my autosomal DNA or roughly half my autosomal DNA is dependent on my sex-determining chromosome. With the difference between mitochondrial DNA and X chromosomes, we can actually establish more nuanced relationships between X-linked, Y-linked, and MT-linked inheritance as cross compared to each other.

So in the way that my directly measured DNA can be treated as a variable in a system, so can the unmeasured DNA of ancestors long since dead. You can think of my DNA as a solved system of equations which can be compared with other partially solved or unsolved but expressed systems of equations in order to examine the state of unmeasured ancestral DNA by indirect inference. Rather than thinking of the ancestor as strictly solved or unsolved, we can think of the ancestor in terms of fractions. If you only have my DNA to work with then you can only have about 1/(2^n) of a given ancestor at generation n solved. But if you compare me and my siblings then you can have more than 1/(2^n) of an ancestor solved, and if you keep adding descendants to the comparison then we can tell more about the common ancestor. There's a mathematical relationship telling us what the minimum or maximum number of such comparisons will be to get 100% of the ancestor's DNA reconstructed.
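The arithmetic behind this can be sketched under a deliberately crude assumption: each position in a parent's DNA is passed to each child independently with probability 1/2. Real inheritance happens in long linked blocks, so actual figures vary around these expectations. The function names are hypothetical.

```python
def expected_fraction_from_ancestor(generations):
    """Expected fraction of an ancestor's DNA carried by one descendant n generations down."""
    return 0.5 ** generations


def expected_parent_coverage(children):
    """Expected fraction of one parent's DNA carried by at least one of `children` children."""
    return 1 - 0.5 ** children


print(expected_fraction_from_ancestor(2))  # 0.25
for k in (1, 2, 3, 5):
    print(k, expected_parent_coverage(k))
```

Under this model the expected coverage climbs quickly with each added sibling but only approaches 100% rather than reaching it with a fixed number of descendants.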

"But this is a public forum and people need to comment on anything posted which they think might mislead other readers."

This is a blatant admission on your part that you think I am trying to mislead others. Your posts so far have been technically hostile to the process of free inquiry. It is great that you want to go ahead and keep doing things the way they have always been done. I am certainly not trying to stop you from doing exactly that. I don't care that there are labor intensive alternatives to solve these problems individually or by strict experimental methods.

I know there are mathematical methods which would be somewhat difficult to develop but which would be instrumental in the development of automated reasoners for genetic genealogy which makes the problem push-button for the average user who doesn't have interest in doing genetic genealogy the way it has always been done. In the meantime, the mathematics of genetic genealogy can be used individually to setup spreadsheet macros or simple programs that search and sort through data sets to make the process of identification of family members simpler and less manually intensive.

Add more known to the equation!!!

I agree with RJ Horace you need more things to make it easier. A lot of unknowns doesn't make the equation easier to solve ....

As I am a big fan of open linked data, I feel we have more knowns in the equation if we start adding more data as structured machine-readable data in WikiTree ==>

  1. If we add coordinate templates with date timestamps to WikiTree ==> we can create a timeline with locations as one known parameter in the equation
    ==>
    1. Narrow down that two matching segments are in a particular area
       
  2. If we add templates for sources ==> for some church books we can easily see that two people have sources in the same church book. Combining that with matching DNA segments starts getting interesting
    ==> 
    1. then we can narrow it down to a parish for a specific segment match
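A minimal sketch of the combination suggested in the list above: given two profiles that already share a DNA segment, look for parishes that appear in the sources of both. The record layout, the profile IDs, and the parish names are hypothetical and do not correspond to any existing WikiTree template or API.

```python
def shared_parishes(profile_a, profile_b):
    """Return the parishes that appear in the sources of both profiles."""
    return sorted(set(profile_a["parishes"]) & set(profile_b["parishes"]))


# Hypothetical structured data for two people who share a DNA segment.
person_1 = {"id": "Example-1", "parishes": ["Parish A", "Parish B"]}
person_2 = {"id": "Example-2", "parishes": ["Parish B", "Parish C"]}

print(shared_parishes(person_1, person_2))  # ['Parish B']
```

A shared parish does not prove where the common ancestor lived, but it gives a narrower place to start looking.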

 

Sälgö, the problem at this particular moment is that the number of unknowns and the expressions of the equations are themselves largely unknown. Until the expressions are derived adding known information doesn't actually help you solve anything except on an ad hoc or piecemeal basis. There is such a thing as an overdetermined system in which too much information is presumed known for a consistent solution to be possible to derive.

The mathematical rules of genetic genealogy need to be developed or if they are already developed then they need to be found and cited here.

I think the problem is that we have too little experience of DNA genealogy. What we see is that Ancestry changes its algorithm, and the reason is that they have learned something and are trying something. You haven't commented on sticky segments and how to approach them in your model. Are they friend or foe? I assume we need much more knowledge and more people doing tests to develop our understanding. Feels like gedmatch.com would be a good source....

The known

As I pointed out, we have some knowns like locations, times, sources, and FTDNA result lists. I feel combining them is a good step number 1

Ian - I share your frustration. To be blunt, the source of the problem is that Wikitree leadership and the DNA project have been reinforcing the proposition that "you need a triangulated 3-way match with an[other] cousin[S]." as a prerequisite to further confirm virtually any part of the tree.

The initial DNA Wikitree support also required all tests to be on Gedmatch and the results of the tests made public. It seems there was a belief that the auDNA and yDNA tests and community could be treated the same.

This is why RJ believes "Nothing is gained by factoring your own [DNA with your parents]." but this proposition is directly contrary to AncestryDNA telling us that accuracy and genetic distance are gained by factoring in your own DNA with a parent or child. There had also been a push at 23andme to incorporate the phasing used in Ancestry Composition into their DNA Relatives algorithm. There is no way that Wikitree and AncestryDNA can both be true.

This is also why RJ believes "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."

There is no DNA Service or Gedmatch that places this restriction. Triangulation has absolutely no effect on this outcome. It is a requirement unique to wikitree.

This restriction is almost always used when a 3rd match, who does not have documentation, wants to prove a relationship with 2 other DNA matches by identifying a common ancestor. 

In virtually every case, you only need (1) a tree and (2) a single match in order to confidently support which side of your tree somebody else is on.  

A perfect example implementing this logic is the Gedmatch Lazarus feature. You can generate a new kit for someone not tested based on knowing 2 groups: group 1, descendants, and group 2, cousins. The Lazarus process DOES NOT care about triangulation. I have created a kit for my maternal grandmother.  I plug in the information based on my tree, and the process uses the segments based on MATCHES. I can then use this result to identify PROBABLE matches via my mother, and PROBABLE matches of my mother via her mother. Triangulation plays NO role.

Wikitree leadership or the DNA project needs to step in and correct this misunderstanding of triangulation and how it is used.

Magnus, you wrote " IBS ==> there is not 0 correlation but its random ==> useless... ==> back to square 1"

This may be true if you are working on a formula that predicts a relationship between 2 DNA testers, but this is not Ian's objective. He is working to build up evidence one connection at a time between a child and a parent.

Here is an example where a child/parent pair is used instead of a Triangulation Group. Suppose a mother and son are DNA tested, and the son matches a cousin (meaning they meet the minimum requirements of a match), but the mother does not.  One of the segments is IBS, but the mother does not share that IBS segment with her son's match.

The only question that is being asked from the DNA is "Is this match related to the son via the mother or the father?"  Since we know that none of the segments are shared with the mother, we can infer from the evidence that the IBS segment came from the father, but according to your statement

"there is not 0 correlation but its random ==> useless..."

This seems to be true when addressing "sticky" segments, which seem similar to IBS. There is a correlation between this IBS segment and those matches to the son that share this IBS segment. These matches are probably related to the son via his father.

Do you agree with RJ and Wikitree that "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."?

Do you disagree that matches which include this IBS segment are probably related to the son via his father?

Magnus, I haven't commented on sticky segments because I am not familiar with them, and they would represent corrections on the basic model we haven't yet discovered or developed. I can see how they might be useful in the end game.

The "little experience" argument makes sense for empirical arguments, but mathematics doesn't rely strictly on experience. It relies on rules and assumptions. Rules like "The son inherits a Y chromosome only from his father." From that rule and some other assumptions we can infer the existence of patrilineal inheritance. This is the difference between empirical methods and deductive or constructive methods.

The basic expression that needs resolution before we can really progress on to a more complex model is a) can a genome be represented as a vector quantity b) what is the dimensionality of the vector quantity c) what operator represents reproduction and what operator represents the inverse operation on reproduction.

Towards that, I did find a highly technical mathematics paper on the mathematics of genetic inheritance that I've linked in the OP; the main thing I took away from the paper is that genetic algebras are non-associative and sex-linked inheritance is anti-symmetric both of which make the representation of the human genetic sample, the reproduction operation, and the decomposition operation less straightforward than I had hoped. I will note that some aspect of the genetic sample has a scalar representation possibly the whole genetic sample according to the paper if I interpreted it correctly.

The obvious representation of the genetic sample is as a column or row of chromosomes. I am not sure how to treat the mitochondrial DNA or the X and Y chromosomes in the formal picture though. Seems like those should be treated differently from the more symmetric autosomal chromosomes.

The UC Davis links include a visual representation of the structure I am describing, but they don't have the mathematics for a precise representation from real world genetic data.

I'll be posting a graph of the representations I've tried so far in a few days.

Hello Ken,  I don't understand how matches which include an "IBS segment are probably related to the son via his father."  IBS segments may be due to fuzzy matching (where e.g. AG = GG and AA and AG, AND CT = CC and TT and CT, etc.).  Thus they are a computer created segment and not a real segment match.  See https://segmentology.org/2015/10/02/anatomy-of-an-ibs-segment/

This is why unphased segments need to be more than about 7 cM and male to male X-DNA matches can use smaller segments because fuzzy matching is not needed (because males only have one X chromosome).
It isn't just matches in general. The specific context is important.

If you have a match with a cousin whether an IBD or IBS but your cousin doesn't share that match with your mother then that cousin probably isn't related to your mother. If your cousin isn't related to your mother but the cousin might be related to you then the cousin is probably related to you through your father or it is a false positive and likely a result of (mutation or machine error).
Peter, what you are describing is IBC, Identical by Coincidence.  This is different from a segment that has been identified as Identical By State (IBS).  In my example, a match to the son that includes a valid IBS segment, where that same IBS segment is not shared with the mother (who in this case could be phased with the son), is probably related to the son via his father.

If it makes a difference, since we have the son and mother, let's only use their phased data, which results in a match with one of the segments used to determine the relationship and range identified as IBS.

IBS causes problems in predicting a relationship and range between matches, but we don't care about this, we only care about the source of the segment in one specific case and question. Is the match related to the son via his father or not. If no part of a valid IBS segment is shared with the mother, then it must be shared with the father.
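The elimination argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` record, the `infer_side` function, and the 7 cM default cutoff are hypothetical names and values chosen for the example, not anything Gedmatch or the testing companies publish, and whether any cutoff is needed at all is exactly what is being argued in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One shared segment between a tester and a match (hypothetical record)."""
    chromosome: str
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # reported length in centimorgans


def overlaps(a: Segment, b: Segment) -> bool:
    """True when two segments cover some of the same region."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end


def infer_side(son_segments, mother_segments, min_cm=7.0):
    """Guess which parent connects the son to a match.

    son_segments:    segments the match shares with the son
    mother_segments: segments the same match shares with the mother
    min_cm:          assumed cutoff below which the son's segments are ignored
    """
    usable = [s for s in son_segments if s.cm >= min_cm]
    if not usable:
        return "undetermined"
    # Any overlap with the mother, whatever its length, counts as maternal evidence.
    if any(overlaps(s, m) for s in usable for m in mother_segments):
        return "maternal"
    # Nothing shared with the mother: paternal, or a false positive.
    return "probably paternal"


son = [Segment("7", 1_000_000, 9_000_000, 10.0)]
print(infer_side(son, []))
```

Passing `min_cm=0` reproduces the stricter form of the argument, where even a short IBS segment absent from the mother is attributed to the father.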

and the quarter isn't continuously distributed across 1-22 of my chromosomes, so I might have my maternal grandfather's DNA on my 1, 3, 4, 6, 7, 9, and 10th chromosomes.

Chromosomes come in pairs.  One of each pair was from the sperm and one from the egg, but they aren't labelled.

The 23 that came in the egg, one of each pair, all contain your maternal grandfather's DNA, alternating with his wife's in random stretches.

 

 

Yes. Chromosomes come in pairs. And yes, one of each pair was from the sperm and one from the egg, and no, they aren't labelled. Good for you, Horace.
Let's suppose you have lots of cousins and get them all tested.  One in 8 will give you a Y match, your father's brothers' sons.

Half will give you an X match.  They're on your mother's side.

Those cousins will also give you loads of autosomal matches.  For simplicity we'll suppose that all the matches are through your mother.

So, looking at segment Blah1 to Blah2 on your Q chromosome-pair, if you have a match with maternal cousin Fred, you know one chromosome of the pair came from your mother.  But you knew that anyway.

You also now know that any other match on the same chromosome comes through your mother.

But you don't know which other matches are on the same chromosome.  The testing people say chromosome when they mean chromosome-pair because they can't separate the pair.  This is why they have to do fuzzy matching and triangulation.

Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins.  Which matches happen to exist in your sample isn't information, it's just a sampling artefact.  But those matches won't yield any new information about who is on which side.

RJ, This is a good question.

You begin with a false premise, based on what you have been told on this site.

"Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins"

The opposite is most likely true, especially for IBD, which can be traced to a unique common ancestor.

The reason for this is that you may have a segment which matches a maternal cousin Fred, but on the paternal strand, part of the strand could be via the paternal grandfather, and part of the strand via the paternal grandmother. Testing siblings, parents, aunts, uncles, and cousins will identify those that have such a condition.

In these cases, the probability of the same segment matching more than one side is near 0%.

I would also like to correct you on your statement... 

"This is why they have to do fuzzy matching and triangulation."

1st, DNA services, as far as I know do not include triangulation in their matching algorithm, they provide reports that allow you to create your own TG's.

2nd, I know that Wikitree likes to characterize the matching as "Fuzzy Matching", but that is not how the term has been used, at least in the past, outside of wikitree. The logic which determines the endpoints has been described as using fuzzy logic, which is why different DNA services may report different end points.  The matching algorithms use what may be better characterized as "Educated guess" or "Prediction".

A 7cM segment may actually be a 5cM or 6cM segment.  This is why AncestryDNA encourages parents and children to test. Phasing the DNA data works to eliminate the fuzziness, makes the predictions more accurate, and extends the distance of the predictions.

A) "1st, DNA services, as far as I know do not include triangulation in their matching algorithm, they provide reports that allow you to create your own TG's."

?!?!? its easier to tell what you refer to.... think this is a never ending discussion.....

1) FTDNA just do segment matching based on size and total in common and don't display results lower than a threshold 

2) Ancestry DNA has DNA circles that are secret but we can guess they use the family tree available,.... ==> they have triangulation somehow...?!?!?

3) 23andMe ?!?!?

4) ?!?!? 

B) 7cm segment may actually be a 5cM or 6cM segment ?!?!?

Do you mean that something that looks like IBD is really IBS? That sounds unlikely, or do we have numbers on that?

 

Because everyone has two of each chromosome, the matching used is not precise.  See http://www.bishir.org/misc/alternatingdna.jpg

a1) FTDNA just do segment matching based on size and total in common. Yes, and this is not triangulation.

a2) Ancestry DNA has circles that are secret but we can guess they use the family tree available,.... ==> they have triangulation somehow

        Here is the Help on AncestryDNA Circles

"DNA Circles show you which members share DNA with one another in the genome, but not where in the genome they share that DNA. This is because our studies of genetic inheritance and DNA Circles have shown us that individuals in DNA Circles very rarely share the same matching segments"

I have been told that on Wikitree, the term triangulation means they are part of a triangulated Group. The Help text clearly tells us that AncestryDNA does not use triangulation.

a3). 23andme does not use triangulation to determine what is a match, or prediction.  You can run reports that will provide you the data for you to determine what is and what is not triangulated, but they do not report on what matches are triangulated.

 a4) ??

a5) "B) 7cm segment may actually be a 5cM or 6cM segment ?!?!?"

FTDNA and 23andme may report a 7cM segment because the endpoints are "Fuzzy", but when AncestryDNA takes that same raw data and phases it, the fuzziness is nearly eliminated, and the more accurate result is 5cM.

These are both IBD because they are the same segment, but because AncestryDNA will phase data when available, it produces more accurate results.  This is why the AncestryDNA minimum is 5cM and the others 7cM.

The reason for this is that that you may have a segment which matches a maternal cousin Fred, but on the paternal strand, part of the strand could be via the paternal grandfather, and part of the strand the paternal grandmother. Testing siblings, parents, aunts, uncles, and cousins, will identify those that have such a condition.

Comes to the same thing.  If you have a match with an unknown person, you'll need them to match a known relative to be able to find out which side of the tree they're on.  But then the answer is immediate and doesn't need any further analysis.

 

RJ, "Comes to the same thing." - Not on Wikitree.  Wikitree only accepts triangulation when there is a triangulation group which shares the same segment of 7cM or greater.

1. Wikitree does not accept less than 7cM. We on Wikitree can't say a person is related via one parent, based on evidence that an IBS segment of less than 7cM absolutely did not come from the other parent. IMO, this is logic 101.

Outside of wikitree, I doubt many people will agree " If you have a match with an unknown person, you'll need them to match a known relative".  Simple logic tells us given only 2 choices, and we eliminate one, the other must be true. If I can prove the match is not my mother, then it must be via my father.

I would like to correct you on the following.

"If you have a match with an unknown person, you'll need them to match a known relative to be able to find out which side of the tree they're on.  Then the answer is immediate and doesn't need any further analysis."

Although I agree with this statement, it is not within the Wikitree guidelines or the comments made.  You can not just match. If this were the case, then you would not have to look at segments.  Even though you might have a completely documented connection to a cousin and you match that cousin, you have to find a third cousin who shares a triangulated segment. Why? It adds nothing when deciding which side of the family a cousin is related on.

Magnus,

If I match an unknown cousin on a 10cM segment, and my mother does not match on any part of that segment, the probability is that this segment came via my father. There are no Triangulated Groups involved. Just to be clear, you disagree because it seems you are still supporting the claim

  1. "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."

If we agreed that this was a smaller IBS 6cM segment, and the mother did not share this 6cM segment, are you still supporting the claim…

  1. Do you still disagree that matches which include an IBS segment are probably related to the son via his father?
  2. Every segment of every chromosome-pair except XY will match cousins on both sides if you have enough cousins. Even though segments on one strand came from the same paternal grandparent, but the other strand was split between the maternal grandfather and maternal grandmother.  
Like Ancestry care if your tree is all wrong?

The Smiths, Browns and Joneses each have a son and a daughter.

John Smith marries Mary Brown, John Brown marries Mary Jones, John Jones marries Mary Smith.  Each couple has a child called Zebedee.

Each Zebedee shares a lot of DNA with the other 2.  But the other two are on opposite sides of his tree, even though they share a lot of DNA with each other.

And of course there is no common ancestor, and no pedigree collapse.
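The three-family example above is small enough to check mechanically. The following Python sketch builds that pedigree and reports which side of each Zebedee's tree the other two fall on. The labels `Zeb1`, `Zeb2`, `Zeb3` and the `Mr`/`Mrs` names for the grandparents are hypothetical, added only so the people can be told apart; the functions are an illustration, not part of any genealogy tool.

```python
# child -> (father, mother); names other than John/Mary are made up for the example
PARENTS = {
    "Zeb1": ("John Smith", "Mary Brown"),
    "Zeb2": ("John Brown", "Mary Jones"),
    "Zeb3": ("John Jones", "Mary Smith"),
}
for family in ("Smith", "Brown", "Jones"):
    for child in (f"John {family}", f"Mary {family}"):
        PARENTS[child] = (f"Mr {family}", f"Mrs {family}")


def ancestors(person):
    """All known ancestors of a person in the PARENTS table."""
    found = set()
    for parent in PARENTS.get(person, ()):
        found.add(parent)
        found |= ancestors(parent)
    return found


def sides(person, other):
    """Which side(s) of person's tree share an ancestor with other."""
    father, mother = PARENTS[person]
    shared = ancestors(other)
    result = set()
    if ({father} | ancestors(father)) & shared:
        result.add("paternal")
    if ({mother} | ancestors(mother)) & shared:
        result.add("maternal")
    return result


print(sides("Zeb1", "Zeb2"))  # related through the Browns
print(sides("Zeb1", "Zeb3"))  # related through the Smiths
print(ancestors("Zeb1") & ancestors("Zeb2") & ancestors("Zeb3"))
```

Each pair of Zebedees shares one set of grandparents, but no ancestor is common to all three, which is the point of the example: pairwise sharing among three people does not by itself put them on the same branch.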

23andMe has recently added a triangulation groups system for open profiles. The system shows both In Common With indirect matching (Ancestry.com style DNA circles) and lists what ICW matches also form triangulation groups.

--------------------------------------

I figure that we should explicitly state something that is basically important: If you share a segment of DNA with someone then they are probably related to you; segments over 7 cM are more likely to be positive matches than to be false matches, and segments over 10 cM are almost certainly not false positives (ISOGGWiki).

You don't need triangulation groups to establish that you are related to someone by DNA comparison in some way. Sharing more than 15 cM can be considered to be almost certainly a direct genetic relationship.
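As a rough illustration, the length thresholds quoted above (7, 10, and 15 cM) can be written as a lookup. The function name `describe_segment` and the wording of the labels are assumptions made for this sketch, and the label for segments of 7 cM or less is my own placeholder, since the post only describes what happens above each cutoff.

```python
def describe_segment(cm: float) -> str:
    """Map a shared segment length in cM to the rule of thumb quoted above."""
    if cm > 15:
        return "almost certainly a genetic relationship"
    if cm > 10:
        return "almost certainly not a false positive"
    if cm > 7:
        return "more likely a true match than a false one"
    # Assumed wording: the post does not characterize shorter segments.
    return "too short to call from length alone"


for length in (5.0, 8.5, 12.0, 20.0):
    print(length, "cM:", describe_segment(length))
```

These cutoffs are rules of thumb for unphased data, so they should be treated as adjustable parameters rather than fixed constants.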

Triangulation groups are necessary for establishing probable most recent common ancestors; I need my two siblings' DNA and my own to indirectly establish my mother as our common ancestor, or I need one sibling, my mother, and my own DNA to directly establish that my mother is our common ancestor by a triangulation group. I need one sibling or my mother, one of either my maternal aunt or my first cousins by my maternal aunt, and my DNA to indirectly establish either of my grandparents as a common ancestor. This logic extends up through all genetic ancestors but not necessarily to every genealogical ancestor (see the UC Davis genetic genealogy blog in the OP for details).

4 Answers

+3 votes
 
Best answer
Ian,

I have been working on this - mentioned it to Andreas West a week or so ago.

It would be a matter of probabilities: I match these cousins, who also match my Dad's phased data on this chromosome at this location, and based on how much we all match, there is a probable amount of DNA I may have inherited from our MRCA. The output would be in the form of a fan chart which WikiTree would auto-populate with the ancestor names. This would be a huge help in adoption work.

Mags
by Mags Gaulden G2G6 Pilot (640k points)
selected by Ian Mclean
I have a couple of genetic matches to people who were adopted, and I would really like to figure out what branches of my family they relate to. Which is why I've been trying so hard to figure out the mathematical model for this search and sort method.

I had thought about a fan chart; I think that is the best way to represent whole genome sequences or to keep track of SNP array results like those produced by labs like 23andMe and FamilyTreeDNA. The structure I would like to find is the one which shows what fragments of DNA I got from whom.

Think of it like this. At me, the structure would ideally have 100% of my DNA exactly as it is. At my parents, they'd each have roughly 50% of my DNA representing the portions they passed on to me; this would be basically my phased DNA showing exactly what I got from my father and exactly what I got from my mother.
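The percentages in this picture follow a simple halving rule, which a short Python sketch makes explicit. The function names are hypothetical. Note the assumption: the 50% from each parent is exact for the autosomes, but from the grandparents onward the halving is only the expected value, and the real share varies with recombination.

```python
from fractions import Fraction


def expected_share(generations_back: int) -> Fraction:
    """Expected autosomal share from one ancestor n generations back."""
    return Fraction(1, 2 ** generations_back)


def ancestor_slots(generations_back: int) -> int:
    """Number of positions in the fan chart at that generation."""
    return 2 ** generations_back


# generation 0 is the tested person, 1 is parents, 2 is grandparents, ...
for g in range(5):
    print(g, ancestor_slots(g), expected_share(g))
```

A fan chart built this way always sums to 100% at each ring; the interesting part is how far the measured shares deviate from these expected values.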

Normally in figuring out what my mother and father are going to pass on to a child it is a matter of some randomness and probability, but in the case where we're examining me, my DNA, and my parents and their DNA there isn't strictly a probabilistic relationship to be concerned about; we should be able to use strict differences to deduce what actually happened as compared to what could have happened from the actual measurements.
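Here is a minimal sketch of that deduction for a single tested position, assuming genotypes are given as two-letter strings such as "AG" with no phase information. The function `phase_site` is a hypothetical name, and this is the textbook trio rule rather than any company's phasing algorithm, which would also have to handle no-calls and genotyping errors.

```python
def phase_site(child: str, mother: str, father: str):
    """Return (maternal_allele, paternal_allele) for one site, or None.

    Each argument is an unphased genotype like "AG". None is returned
    when the site is ambiguous (for example all three are heterozygous)
    or inconsistent with the stated parents.
    """
    a, b = child
    options = set()
    # Try both ways of splitting the child's two alleles between the parents.
    for from_mother, from_father in ((a, b), (b, a)):
        if from_mother in mother and from_father in father:
            options.add((from_mother, from_father))
    return options.pop() if len(options) == 1 else None


print(phase_site("AG", "AA", "GG"))  # ('A', 'G')
print(phase_site("AG", "AG", "AG"))  # None, cannot be resolved at this site
```

Running this over every tested position gives the two parental halves directly wherever a site resolves, which is the non-probabilistic part of the model; only the unresolved sites need statistical treatment.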

In practice for figuring out where distant cousins go in the family tree, I do think it would be a probability or at least a degrees of truth problem written in statistical or fuzzy logic.

A bonus to making the kind of map that I am thinking about is that we'd eventually see what DNA survived from my ancestors to me and see what is missing from the puzzle. With enough people represented in this same kind of structure, we could start to see where the pieces fit together, so we could reconstruct the whole genome sequences of common ancestors that we don't necessarily know. To me that would be useful for determining where I fit in the global family graph, and I imagine it would be similarly useful to other people looking for how they fit in.
Ian, I am presuming that the Adoptees are on gedmatch.  If this is so, then I would suggest using the Tier 1 Lazarus support. See if you can create kits using known relatives and see if any of those kits provide any insight.

One way to experiment a little is to include in group 2 those kits that match both your kit and the adoptees'.  Remember, you are only looking to establish which parent, not which common ancestor.  You are narrowing down the possibilities, one generation at a time.
Yes Ian - and using the known phased data and the known matches to that phased data and knowing the segment locations and %'s using probability you could forecast/determine a reasonable theory of where YOUR random inheritance falls. Mags
Ken, the first problem I have to solve is which side of my family they are related to me from. For my maternal grandmother's side of the family, I know with relative certainty that one of the adoptees is not related to me by my great grandparents or lower; I know all my great aunts and uncles and all their children and all the great grandchildren. There's a distinct possibility that they are related through my maternal grandfather's side of the family, but I have put figuring that out specifically on hold until I can rule out the more difficult case: they are related to me through my father's side of the family.

I know very little about my father's side of the family relatively speaking. I barely have my relatives documented out to my paternal grandparents. One of the adoptees shares X chromosome DNA with me, so I can generally assume she is a relative on my mother's side, but the main adoptee that I want to help shares only autosomal DNA with me, so it is ambiguous as to where they are in my family.

In order to figure out their parents, I need to figure out which side of the family I need to look on. From there I need to figure out our most recent common ancestor. From the most recent common ancestor, I can then trace down the line to the adoptee and at least one of their parents; the adoptee has already found their parent of record at least under a pseudonym, and from what is known of their father, I am related to them through their mother.

I'll keep the Lazarus kits in mind for this, but the Lazarus kits depend on having solved more basic problems.
I like your model Ian - and of course it should work. Magnus and Ken are smart cookies and they've pointed out some issues. Nevertheless the data should tell the story. And that for me is the rub - the data aren't necessarily there yet. Despite the seeming precision of these tests we don't know the error rate or variation in results due to testing procedures, lower level data sorts, different tolerances, thresholds, or magnitudes for categorizing the lower-level data, etc. None of these data from these tests are ready for the precision of the 'exacto knife' of a model you have in mind presently. Even the underlying proteins themselves appear to behave in unpredictable ways, so while I am hopeful better models will be developed for predictive as well as historical reasons, I'm not sure we aren't stuck in the Sherlock Holmes era for a bit longer. I would think Mathematica might do some interesting things with the data, but you are likely to have to rely on statistics and categorical analysis for the state of the art.

I get the issues with the available data. But the precision isn't so much the issue anymore, and as time goes by, it is going to become less of an issue. Error rates are entering the 1% range and rapidly diminishing for individual genetic tests. Comparisons between old kits and new kits or between standard kits and custom kits are the major problem at the moment.

With the cost per genome rapidly approaching 0, the issue of precision or lack of data is going to effectively go away entirely. For my personal case, I have most of my immediate family members totally on board for genetic sequencing and analysis, so I am not concerned about not having access to the minimum data necessary to pull apart my genome and figure out deductively and experimentally where I got what from whom. To me, it is simply a matter of finding and learning to use the correct tools. Or inventing them where they don't yet exist.

For me the major issue isn't the reliability of the specific genetic testing kits though. What I want is the basic mathematical model for the simplest case: the generalized family tree without pedigree collapse.

That model isn't going to depend on any of those factors, and we can actually use deviations from the simplest model as a way to infer information that wouldn't otherwise be obvious.

The mathematical model can be constructed without actually depending directly on any given test or precision. The data may not be present or up to the required precision, but we have the basic theories for vector analysis, physical measurement, computer coding, and genetics. The mathematical theory of genetic genealogy can be written before we have the data to test the theory of genetic genealogy. Data developed later can then be used to test the theory and possibly refute it or some of its assumptions.

The basic structure is actually already relatively well known: "[Identical by state data] may be useless in identifying the common ancestor but it [is useful in] an iterative process of building a decision tree [...] based on probabilities." -Ken Sargent

Ian,  I recommend you follow some of the postings of the Coop Lab at

https://gcbias.org/2013/11/04/how-much-of-your-genome-do-you-inherit-from-a-particular-ancestor/

Sincerely,
Can anybody produce an example of this kind of argument producing any non-obvious non-circular results?
Hello RJ,  

Please ask your question on the comments section of the Coop Lab blog.  

Thanks.
Thanks, Peter.
+2 votes
The problem, as I see it, is that only a sampling of a given person's genome is tested, so that while getting a statistical measure of relatedness for a few generations is easy enough, it will only work definitively for a very few generations.  Further back patterns will exist for areas of the genome, but since they are characteristic of a large number of people in a given area, it isn't possible to do what you suggest if there is a very large tested group.  I've been looking a bit for a good source of technical explanations but so far I mostly see stuff for the non-technically oriented.  But Wikitree is a large group and I'm pretty sure actual articles will be cited here which don't take a bunch of money to read.  Meanwhile I have more to do here than I can get even started on. But I'll keep an eye on you to see if you get a handle on how to handle things.
by Dave Dardinger G2G6 Pilot (440k points)
+2 votes

I believe what you are saying is true but some clarification is necessary.

Using your conclusion: "This way my DNA can be effectively factored recursively into maternal vs paternal, maternal grandfather vs maternal grandmother, paternal grandfather vs paternal grandmother, and so on. You could then compare the factored or phased DNA to matches shared with other family members and determine immediately where in your family tree they must be. Likewise, you could use the vector representation of other people's DNA in order to automatically generate genealogies and check for intersections."

My two brothers also have been DNA tested, and without a tree, we can make certain conclusions about the source even though we can't specifically identify which parent or another ancestor is the source.

For example, if the two oldest brothers in my family share a segment, but the 3rd brother does not, it means that the 3rd brother received that particular segment from a different paternal grandparent and a different maternal grandparent than the other two.

If a cousin matches the 3rd brother, then you can also make some assumptions about the grandparent of the first two brothers. If we presume that the well-documented tree is correct, and by using that tree it is determined that this match is via his paternal grandfather, you can presume that other matches on that same segment to only the two oldest brothers were inherited via your paternal grandmother.

I can make this presumption because I know my parents share no segments.

A tree and DNA are mutually dependent on each other for the answers we are asking.  A tree and DNA are either consistent with each other or they are not.  DNA alone can neither confirm nor prove a particular relationship. It can only further support or refute an existing claim.

by Ken Sargent G2G6 Mach 6 (61.9k points)

"If a cousin matches the 3rd brother, then you can also make some assumptions that about the grandparent of the 1st two brothers. We presume that the well-documented tree is correct, and by using that tree, it is determined that this match is via his paternal grandfather, you can presume that other matches on that same segment to only the two oldest brothers was inherited via your paternal grandmother.

I can make this presumption because I know my parents share no segments."

This is exactly the kind of logic I am interested in. Even if your parents do share some segments in common, we can filter those up to a point by checking the near generational relevance of the shared segments; if your parents share 5cM for example then that probably isn't going to be an issue, but if they share >7cM then it will almost certainly be an issue for this kind of inference.

Thanks for the clarification.

IMO, you have to understand the synergistic relationship that exists between genetics and genealogy.  For example, suppose you test 2 people who are predicted to be in a child/parent relationship. Without any additional information, you don't know who is the parent and who is the child. You may have to introduce valid genealogical information which answers the question of which donor is older/younger. You are relying on non-DNA data to make this distinction.

But if you incorporate additional tests, you should be able to conclude which is the older/younger and then compare against the genealogical information to see if the results are consistent/inconsistent.

I haven't had the time to pursue this other than spot checks, but I measure success differently than others and treat unverified relationships differently as well.

I have also tested both my parents, 2 siblings and a few uncles. I obviously have virtually no problem distinguishing which parent my matches fall onto since they have been tested. 

I do have a good idea of the matches which are on my maternal grandmother's side even though she has not been tested. http://www.wikitree.com/wiki/Gillis-418. Her Gedmatch id is LL747479.

I treat these as probably related to my mother via her mother.

Also, anyone in group 2 (cousins of my grandmother) can begin to create kits for ancestors up to but not including their common ancestors by reversing the groups.

There is a lot more that can be done, I hope you get the idea.  

My paternal grandmother's side of the family came from a very specific area of Poland.  I realize that some people may object to this, but I have no problem creating a Lazarus kit for her which includes cousins born in Poland with family histories limited to just Poland. My paternal grandfather's side has been in New England for generations via the United Kingdom.

I don't care at this point the exact relationship between DNA Testers. I am only interested in narrowing down which parent on a particular profile that connects me to a match.

+1 vote

Here is my exchange with Ann Cousin aka DNACousins.

I asked for clarification on some things to make it clearer, but this morning I told her it was not needed. I understood why she answered as she did by just reading the response.

1. Conceptually, a match for a son not found in his mother can be attributed to his father. This includes IBD and IBS, but not IBC.  I have no problem limiting those matches (without triangulation) to only those that include an IBD segment, which is all I initially intended.

2. Given the answer to #1 only requires a match, it indicates that triangulation is not necessary. Since triangulation is only used to find common ancestors for those without a tree, she interpreted the question that way. 

Do I really have to ask Ann to clarify her last statement by telling her that the Wikitree technical group believes that triangulation is used for something other than finding the common ancestor? Do I really have to say to her that the Wikitree Technical members are not convinced by her answer to #1 because "If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin."

I tried to ask the question so as not to bias her answer, but I should have noted that Wikitree places a triangulation requirement on more than finding common ancestors.

From: Ann Turner

Sent: Saturday, June 04, 2016 4:32 PM

To: Kenneth Sargent

Subject: Re: I am hoping you will clear up a disagreement.

 

I've been wishing I could spend more time on WikiTree, but it seems like there's always something else demanding my attention.
 

1) Conceptually, a match for a son not found in his mother can be attributed to his father. There are a couple of "gotchas", though. The segment must be long enough that you can rule out a coincidental match. There's no consensus on how long that should be. And there is also a possibility of a false negative in the mother, e.g. at FTDNA (which requires a total of 20 cM, including small 1-3 cM pseudo-segments), AncestryDNA (with its TIMBER algorithm discounting some segments) and 23andMe (with a cap on the number of DNA Relatives). GEDmatch lets you look at everyone through the same lens.

2) There's also no consensus on whether you "need" a triangulated group. AncestryDNA uses more of a network approach. I wrote up some material about how difficult it is to assemble TGs here:  http://tinyurl.com/TheTroubleWithTriangulation.  But if you have the good fortune to get a triangulated group with pretty robust segment sizes, I do think it's possible to attribute it to a specific ancestral couple if it's not too many generations back. When you go back many generations, that brings up the possibility of multiple lines of descent.

Hope that helps,

Ann

 

On Sat, Jun 4, 2016 at 9:48 AM, Kenneth Sargent <msnkjsargent@msn.com> wrote:

Hi Ann,

 

I’ve been spending too much time on Wikitree, devoted almost entirely to the discussions on DNA. I suspect that Wikitree is the best source for publicly available documented trees but the discussions are not at the level as 23andme used to be. I was hoping to ask you two basic questions and get your permission to post your response. We are discussing the “mathematics of genetic genealogy”.

 

Your responses to these questions could significantly affect how Wikitree users think about how to use DNA in their research.

 

Scenario: We have the raw data for a mother and a son available to us for customization.  There are matches to the son, that are not matches to his mother. More specifically for these matches, the segments are shared with the son, but none are shared with the mother. Since the data can be phased, I am presuming the process could phase the data first.

 

1. Is it possible, using the data available, in these cases, that a match to the son, and not the mother, is probably related to the son via the father?

 

2. Do you agree “If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.” FYI – “a triangulated 3-way match” means part of a Triangulated Group. You don’t have to go further than yes or no, but feel free to comment.

 

 

Thank you

by Ken Sargent G2G6 Mach 6 (61.9k points)
Actually I wasn't talking about WikiTree when I said that.  But surely if you have a 3-way triangulated match with the son, WikiTree doesn't also demand one for the father.  Most ancestors are inaccessible.

RJ, the scenario we have been using only involves 3 people who are not all biologically related to each other. The son and the mother are related to each other, but the DNA Cousin is only related to the son.  There is no triangulation match with the son. It is a simple match which contains IBD AND possibly IBS Segments.

Given there is NO TRIANGULATION in this scenario, I maintain "you don't need a triangulated 3-way match with another cousin," which is directly contrary to your assertion. The same principle applies to the Wikitree requirement that the only approved method of confirming a father or mother requires triangulation.

I am not sure what you mean by "Most ancestors are inaccessible". We are not looking at any ancestors of the son other than the mother and father. 

It seems that you (and Wikitree) believe that you have to know the common ancestors in order to determine if a match is related to the father or to the mother in every case.

But how do you know which side of your tree the son is on?
The DNA Cousin does not know which side. Not with these three tests. Only the son knows that he is related via his father to the DNA Cousin.
But if you're coming at it from that direction, the sticking point is which side of the father's tree the match is on.  The son isn't the problem.  Going downwards is easy.
The result is not affected by "coming at it from that direction".

The stated problem is to determine the son's side of the family. Determining the DNA cousin's side of the family or the father's side of the family are completely different scenarios.

I still assert the son is probably related to the cousin via the son's father. You and others deny this is true.
Of course I'm not denying it.  It's obvious.  It's not the question being asked.

Based on your last post, I will assume some misunderstanding.

I think it important then to identify the problem with communication in this case.

1. The title implies a more technical discussion on the “mathematics of genetic genealogy”, in which a higher level of precision is assumed.

2. You stated, “If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.”

This has really been the focus of the exchange. To Ian and me, this was obviously false.

Ian contradicted this proposition by providing examples that show which side of your tree somebody is on without a triangulated 3-way match with another cousin.

3. I provided a simple scenario of the son, mother, and cousin where the son is related to the cousin via his father and repeatedly used it to show my point. I took your responses as denials. This is also without a triangulated 3-way match with another cousin. 

I thought I was very specific about the scope of my statements. Please understand that I am unclear about what you believe is obvious.

Do you still believe…

  • “If you want to know which side of your tree somebody is on, you need a triangulated 3-way match with another cousin.”

Because if you still support this, then you can’t believe

  • “the son is probably related to the cousin via the son's father” because there is no triangulated 3-way match with another cousin
The following is a pseudocode rendering of an algorithm for determining genetic relationships between anonymous or pseudonymous genetic samples.

First step: determine which side of the family a match is on for your genetic genealogy via comparison to yourself and at least one parent. Mitochondrial matches are going to be strictly along your matrilineal relations; X chromosome matches can be effectively treated as being strictly maternal for XY karyotypes but may be paternal for non-XY karyotypes; Y chromosome matches can be effectively treated as strictly paternal.

Second step: repeat the above for n matches to create a pool of sorted matches which have been determined to be on your father's side, your mother's side, both, or neither (you might have matches due to mutation). The choice of the size of the pool, n, needs to be based on standards for statistical significance.
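Steps one and two can be sketched in Python as plain set comparisons. This is only a minimal sketch under stated assumptions: each kit's match list has already been reduced to a set of match IDs, the function name `sort_matches_by_side` and the pool labels are made up for illustration, and a real comparison would work on segments above some chosen cM threshold rather than on bare IDs.

```python
def sort_matches_by_side(child, mother=None, father=None):
    """Sort a child's matches by which tested parent also has them.

    Each argument is a set of match IDs (hypothetical input format).
    A parent who has not been tested is passed as None.
    """
    pools = {"maternal": set(), "paternal": set(), "both": set(), "neither": set()}
    for match in child:
        in_mother = mother is not None and match in mother
        in_father = father is not None and match in father
        if in_mother and in_father:
            pools["both"].add(match)
        elif in_mother:
            pools["maternal"].add(match)
        elif in_father:
            pools["paternal"].add(match)
        elif mother is not None and father is None:
            # Only the mother is tested: a match she lacks is *probably* paternal.
            pools["paternal"].add(match)
        elif father is not None and mother is None:
            # Only the father is tested: a match he lacks is *probably* maternal.
            pools["maternal"].add(match)
        else:
            # Both parents tested (or neither) and no parent shares the match.
            pools["neither"].add(match)
    return pools
```

With only the mother tested, `sort_matches_by_side({"A", "B"}, mother={"A"})` puts A in the maternal pool and B in the paternal pool, which is the son, mother, and cousin scenario discussed earlier in this thread. The "probably" matters: a false negative in the parent's match list would misplace the match.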

Third step: find all matches that share a sex-linked segment and an autosomal segment. These are weakly patrilineal (Y Chromosome), weakly matrilineal (mitochondrial), or weakly maternal (X Chromosome) autosomal matches; there is a probable relationship of inheritance between the autosomal match and the sex-linked match; this is a correlative relationship but not necessarily a causal relationship. This group of matches is useful for figuring out what autosomes to target first in the search and sort.

Fourth step: analyze the matches and sort according to probable degree of relationship. Naively, order the sorts according to cM lengths, the number of shared segments, and total shared cM lengths; a more sophisticated algorithm for determining probable degree of relationship can and should be used.
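The naive ordering in step four could look something like the following. It is a sketch, not a relationship estimator: the `Match` record and `rank_matches` name are hypothetical, and the priority of the sort keys (total shared cM first, then longest segment, then segment count) is my own assumption, since the step does not fix an order.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    kit_id: str
    # Length in cM of each segment shared with the tester (hypothetical format).
    segments_cm: list = field(default_factory=list)


def rank_matches(matches):
    """Order matches from probably closest to probably most distant."""
    def key(m):
        total = sum(m.segments_cm)
        longest = max(m.segments_cm, default=0.0)
        return (total, longest, len(m.segments_cm))
    return sorted(matches, key=key, reverse=True)
```

Two matches with the same total are separated by their longest segment, so a single 22 cM segment ranks ahead of 10 cM plus 12 cM.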

Fifth step: diagram yourself at the center of a bifurcated polar coordinate system with the sorted matches plotted to intervals representing the range of probable degree of relationship over the rings radiating out from you on the appropriate side of the map. Mother's matches on one side and father's matches on the other side; I would probably exclude plotting the both or neither matches for now. The idea is to find clusters of matches that match each other and graph those clusters according to their probable degree of relationship; by graphing their probable degree of relationship to you and their probable degree of relationship to each other, you create a relative topological reference of distance and connection.
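As a rough illustration of the layout in step five, a match can be given chart coordinates from its side and its estimated degree of relationship. Everything here is an assumption for the sake of the sketch: the function name `polar_position`, the choice of the right half for paternal and the left half for maternal, and the use of a single degree value as the ring radius instead of an interval.

```python
import math


def polar_position(side, degree, index, count):
    """Return (x, y) for a match on a bifurcated polar chart centered on you.

    side   -- "maternal" or "paternal"
    degree -- estimated degree of relationship, used as the ring radius
    index  -- position of this match among `count` matches sharing a ring and side
    """
    # Assumed layout: paternal matches fill the right half-plane
    # (-90 to 90 degrees), maternal matches the left (90 to 270 degrees).
    start = -math.pi / 2 if side == "paternal" else math.pi / 2
    # Spread matches evenly, keeping them strictly inside their half.
    theta = start + math.pi * (index + 1) / (count + 1)
    return (degree * math.cos(theta), degree * math.sin(theta))
```

The resulting points can be handed to any plotting tool; clusters would then show up as groups of points on neighboring rings.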

Sixth step: find all triangulations between you and your mother's matches; find all triangulations between you and your father's matches. Mark the abstract relationship of you, your mother, and your match's most recent common ancestor; at this point, the graph should begin to show a structure of relationships resembling a familiar genetic genealogy; it will likely be incomplete and will have islands of disconnected relations.
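A simple way to look for the triangulations in step six is to check for a stretch of one chromosome that is shared by all three pairs. The sketch below assumes pairwise comparisons are already available as (chromosome, start, end) tuples in whatever position units the tool reports; the names `overlap` and `triangulations` and the input layout are hypothetical, and unphased data can still produce false positives.

```python
from itertools import combinations


def overlap(a, b):
    """Return the overlapping (chrom, start, end) of two segments, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if end > start else None


def triangulations(me, shared):
    """Find (match_a, match_b, segment) triples that triangulate with `me`.

    shared maps frozenset({kit_a, kit_b}) -> list of (chrom, start, end).
    """
    others = sorted({k for pair in shared if me in pair for k in pair if k != me})
    found = []
    for a, b in combinations(others, 2):
        for s1 in shared.get(frozenset({me, a}), []):
            for s2 in shared.get(frozenset({me, b}), []):
                common = overlap(s1, s2)
                if common is None:
                    continue
                # The two matches must also match each other on that stretch.
                for s3 in shared.get(frozenset({a, b}), []):
                    tri = overlap(common, s3)
                    if tri is not None:
                        found.append((a, b, tri))
    return found
```

Running this separately over the maternal pool and the paternal pool keeps the two sides of the chart apart. A minimum-length cutoff on the triangulated segment would be a sensible addition, but there is no consensus value to hard-code.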

Steps beyond this really depend on what you want to accomplish. The islands can be recursively connected by performing steps 1 through 6 for each child-parent pair you can find among your matches. There's a critical threshold of matches that would result in a chain-reacting algorithm that would tend towards total connectivity.
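The "islands" can be made concrete by treating each established relationship as an edge in a graph and grouping kits into connected components. This is a minimal union-find sketch with a hypothetical `islands` function and made-up kit names; it only measures connectivity and says nothing about where the critical threshold lies.

```python
def islands(edges):
    """Group kits into connected components given (kit_a, kit_b) edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for kit in list(parent):
        groups.setdefault(find(kit), set()).add(kit)
    # Sort only to make the output order deterministic.
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Each pass of steps 1 through 6 over a new child-parent pair adds edges; the graph tends towards total connectivity when the number of islands returned drops to one.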

A DNA cousin can know which side of the family I am on by the mirror image of the process by which I discovered what side of the family they are on.

To determine which side of my father's family tree a given match is on, more information is required. In my case, I basically do not have access to my father's DNA directly, so the best I can do is phase my DNA with my mother and my siblings to composite my father's DNA via Lazarus kits or similar.

However, I can also take all of the cousins that I am able to discern are not related to my mother and composite their DNA matches with me as well into the image of my father's DNA; I don't know what side of his tree they all are on, but I don't need to know either because I only need to know that they are not on my mother's side of the family tree. I can composite a functional image of my father between my siblings, my mother, my father's pseudonymous genetic relations, and me; the issue then is to determine his mother's or father's DNA. Obviously, his mother's DNA can't be fully reconstructed without a genetic kit from his daughter, maternal sisters, maternal aunts, or maternal uncles because I probably share no X Chromosome or mitochondrial DNA in common with my father. His father can be partially reconstructed sans X chromosome and mitochondrial DNA because I share roughly 1/4 of my autosomal DNA and almost my whole Y Chromosome in common with him, but again, we would need genetic kits from my father's paternal sisters, paternal aunts, paternal uncles, or daughter. Without going through all that, I would guess that we can use my DNA and my father's partially reconstructed DNA to sort paternal DNA cousins into probable pools of paternal grandfather and grandmother matches by those who do not share DNA with me and my patrilineal uncles or paternal XY-karyotype first cousins.

We are now getting into why it is important to derive the mathematical genetic decomposition or "factorization" of people for comparison. Which side of my father's tree a DNA cousin lies on is answered by how the pieces fit together to form a completed puzzle, and it depends on the mathematical decomposition of a genome into a quantitative genetic genealogy; unlike a common puzzle where each piece has a unique fit, this puzzle can be assembled multiple ways from multiple other puzzles. Classifying and categorizing those assemblies is methodologically significant and mathematically possible.


...