Are top genetic matches typically more distant than suggested?

+8 votes
451 views
I've noticed that most of my top matches are more distantly related than the amount of shared DNA would suggest.  It seems to me this would typically be the case and is simply the result of "regression to the mean".  That is, cousins can be "lucky" and share more DNA than expected from their relationship or "unlucky" and share less.  At the upper extreme with the few people who share quite a bit of DNA, the explanation is more commonly that this is because they got lucky than the reverse.  

But the strength of this regression effect depends on the strength of the correlation between shared DNA amount and the distance of the relationship.  It also typically relies on some assumptions about the distribution of the underlying data.  I don't really know if it should be in play here.
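
For reference, the textbook form of the regression effect can be written down under one big assumption that may not hold here: that true relationship distance X and shared DNA Y (on some suitable scale) are roughly jointly normal with correlation ρ. The symbols z_X, z_Y and ρ are introduced only for this illustration; z_X and z_Y are X and Y expressed in standard deviations from their means.

```latex
\mathbb{E}\left[\, z_X \mid z_Y \,\right] = \rho \, z_Y , \qquad |\rho| \le 1
```

Read this way, a match whose shared DNA is extreme (large |z_Y|) is predicted to have a less extreme relationship distance, and the pull toward the mean grows as |ρ| shrinks. With |ρ| close to 1 there is almost no pull at all.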

So I want to ask here, do most people have this same experience with top matches -- more distant than you'd think?  Or is my experience atypical?
in The Tree House by Barry Smith G2G6 Pilot (293k points)
Three men, relationships confirmed by personal knowledge

Two are second cousins, which is reflected in Family Finder and they share a SNP YP27595 which appeared in 1850

 

The third is a 4th cousin once removed to both of the 2nd cousins; however, he only shows in Family Finder for one of the two 2nd cousins.

Apparently the autosomal DNA associated with that surname either dropped out or is too small to measure for one of the second cousins.

7 Answers

+12 votes
People who have a DNA match but come from endogamous societies should expect to heavily discount their projected relationship distance.

On the other hand, the presence of endogamous genes more easily confirms belonging to a specific tribe/clan/nation.

DNA matches for my dad that estimate a 2nd cousin relationship are in reality no closer than 4th. He's an Ashkenazi Jew with 16,000 matches on FTDNA.
by Patrick Munits G2G6 Mach 1 (12.6k points)
Yes, endogamy can be an important factor. I once looked at the Gedmatch list of a fellow WikiTree member with Ashkenazi heritage -- there were a couple of thousand people who were supposedly about 4th cousin or closer. Clearly they weren't all those closely related. The Gedmatch calculation doesn't take Ashkenazi heritage into account. And I have concluded that my own somewhat endogamous New England ancestry gives me a number of DNA matches that appear to be closer (according to Gedmatch's standard calculations) than they really are.
I may be the person Ellen is referring to because (a) I remember that she and I were discussing this a while ago and (b) my Gedmatch match list fits the description - all 2,000 are 4th cousin or closer.

The only thing is, I have only been able to trace three relationships and all of them are EXACTLY as predicted, so ..... go figger!  They are 2nd cousin, 2nd cousin once removed, and half 2nd cousin once removed.

I've seen the results for a guy whose mother was from Puerto Rico. He has over 2400 matches (AncestryDNA) out to the 4th cousin level. Only a handful of his matches are on his dad's side. Many matches on his maternal grandfather's side have shared matches with the matches on his maternal grandmother's side.

So apparently PR is an endogamous place as well.

But I hadn't heard about "endogamous New England ancestry" before. Is that "a thing"? It might explain a thing or two in my own results.

The endogamy in New England is nowhere near as significant as the endogamy in the Ashkenazi population, but I think there's some reality to it. The "Great Migration" settler population of New England was only about 40,000 people (including family members of heads of households). There was some augmentation by subsequent arrivals, but the population of New England before the Industrial Revolution was derived in very large part from the progeny of those 40,000 people (an unusually large fraction of whom had large numbers of children who survived to have large families of their own). It seems likely that the explosion of New England's population from that relatively small founding group would lead to a high degree of relatedness.

I see it here -- if another WikiTree member with reasonably well-developed genealogy and substantial New England ancestry is identified in RelationshipFinder as a 6th through 9th cousin to me from a New England line, chances are good that we have several other cousin relationships at the 6th through 10th cousin level -- and that's without accounting for the situations where an ancestor appears multiple times in our trees because close cousins  married each other and increased our "DNA dose" from their ancestral lines.

In one case, Gedmatch showed me a DNA match to another member, with an estimated 4.8 generations to the MRCA. I compared us in Relationship Finder here, and found that we have two full 9th cousin relationships, plus a half 9th-cousin relationship, two 9th cousins twice removed relationships, two full 10th cousin relationships, a half 10th cousin relationship, and more -- 51 common ancestors within 15 generations (not counting ancestors who repeat in our trees). We appear to be closer cousins than we are because we descend from the same interconnected population!
+7 votes
I have gifted DNA kits to many relatives. You can see the list on my profile page. Anyway, FTDNA gets the DNA status spot on with every one of them!!
by Debbie Parsons G2G6 Pilot (151k points)
+3 votes

I would think there would be a tendency for that to happen when you don't have very many matches who are close relatives - and that will be typical unless DNA testing has become some sort of family event and/or one family member has gifted tests to other family members.

Close family members tend to be pretty spot-on - within the range they're supposed to be in - but for more distant relatives there would be more of them, and so there would be more outliers.

Since matches are ordered according to their centimorgans, the outliers with unusually high centimorgans will be listed first.
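
The ordering effect described above can be seen in a toy simulation. This is only a sketch under assumptions that are not from this thread: a 6800 cM total, shared cM scattered log-normally around the nominal value for each cousin level, and ten times as many cousins at each more distant level. The names `expected_cm`, `apparent_degree` and `simulate` are made up for the example.

```python
import math
import random

TOTAL_CM = 6800.0  # assumed genome total in cM (illustrative)

def expected_cm(degree):
    """Nominal shared cM for full nth cousins: halves twice per extra degree."""
    return TOTAL_CM * 0.5 ** (2 * degree + 1)

def apparent_degree(cm, degrees=(1, 2, 3, 4, 5)):
    """Cousin degree whose nominal cM is nearest to the observed cM (log scale)."""
    return min(degrees, key=lambda d: abs(math.log(cm) - math.log(expected_cm(d))))

def simulate(counts, sigma=0.6, top_n=40, seed=1):
    """Draw a match list, sort by cM, and classify the top_n matches."""
    rng = random.Random(seed)
    matches = []
    for degree, n in counts.items():
        for _ in range(n):
            # Toy noise model: log-normal scatter centred on the nominal value.
            cm = expected_cm(degree) * rng.lognormvariate(0.0, sigma)
            matches.append((cm, degree))
    matches.sort(reverse=True)  # highest cM first, like a match list
    top = matches[:top_n]
    more_distant = sum(1 for cm, d in top if d > apparent_degree(cm))
    closer = sum(1 for cm, d in top if d < apparent_degree(cm))
    return more_distant, closer, len(top) - more_distant - closer

# Few 2nd cousins, more 3rd cousins, many 4th cousins.
more_distant, closer, as_suggested = simulate({2: 10, 3: 100, 4: 1000})
print("more distant than the cM suggests:", more_distant)
print("closer than the cM suggests:", closer)
print("about as suggested:", as_suggested)
```

In this toy model the top of the list fills up with "lucky" distant cousins simply because there are so many more of them to draw from. How strong the effect is in a real match list depends on how many close relatives have tested.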

Patrick Munits offers another reason this would happen, too - endogamy.

That being said, I'm not really seeing it, myself. I have, starting with 2nd Cousin level matches:

("2nd Cousins")

* 430cM 1C1R (OK)

* 236cM 2C (OK)

* 222cM 2C (OK)

("3rd Cousins")

* 143cM 2C1R (OK)

* 140cM 2C (OK)

* 135cM 2C (OK)

* 123cM 2C1R (OK)

* 118cM 3C (OK)

* 104cM 3C (OK)

* 100cM 2C1R (OK)

("4th Cousins")

* 74cM 2C1R (OK)

* 69cM Half 2C1R (OK)

* 66cM 3C (OK)

* 64cM 3C1R (OK)

* 60cM 2C1R (OK)

etc. By "OK" I mean it falls between the 5th and 95th percentiles for the relationship (the middle 90%), according to Blaine Bettinger's data:

1C1R:  215cM-635cM

2C: 93cM-390cM

2C1R: 31cM-221cM

Half 2C1R: 15cM-193cM

3C: 14cM-146cM

3C1R: 9cM-100cM
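
The "OK" check can be written out as a short script. This is a minimal sketch that uses only the ranges and matches quoted in this answer; the helper names `in_range` and `consistent_relationships` are made up for the example.

```python
# 5th-95th percentile ranges in cM, copied from the list above.
RANGES = {
    "1C1R": (215, 635),
    "2C": (93, 390),
    "2C1R": (31, 221),
    "Half 2C1R": (15, 193),
    "3C": (14, 146),
    "3C1R": (9, 100),
}

# (shared cM, known relationship) for the matches listed above.
MATCHES = [
    (430, "1C1R"), (236, "2C"), (222, "2C"),
    (143, "2C1R"), (140, "2C"), (135, "2C"), (123, "2C1R"),
    (118, "3C"), (104, "3C"), (100, "2C1R"),
    (74, "2C1R"), (69, "Half 2C1R"), (66, "3C"), (64, "3C1R"), (60, "2C1R"),
]

def in_range(cm, relationship):
    """True if cm falls inside the quoted range for that relationship."""
    low, high = RANGES[relationship]
    return low <= cm <= high

def consistent_relationships(cm):
    """All relationships in the table whose quoted range contains cm."""
    return [rel for rel in RANGES if in_range(cm, rel)]

for cm, rel in MATCHES:
    status = "OK" if in_range(cm, rel) else "outside range"
    print(f"{cm}cM {rel}: {status}")

print(consistent_relationships(100))
```

The second helper shows how wide the ranges are: a 100cM match sits inside five of the six ranges in the table, so passing the "OK" check is a fairly weak test on its own.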

So maybe endogamy IS the best explanation.

by Living Stanley G2G6 Mach 9 (91.2k points)
edited by Living Stanley
I forgot - the last three I listed have SOME endogamy (my gt-gt grandparents on that side were 2nd cousins) - all the rest do not.
I had been coming to the same conclusion myself -- the regression effect would be strongest with weak correlation, and with close relatives, the correlation should be very strong.  But I don't have many close relatives, so I would expect to see it more than others.  I don't know if I have more pedigree collapse than most people, since I don't have a way to compare, but I suppose it's possible.  I've certainly observed it in my own lines on several occasions.
+5 votes
Amount of shared DNA is a function of

1) number of common ancestors

2) distance of common ancestors

3) a string of random multipliers

From which we get that

a) the random multipliers might or might not average out.  If they don't, the results will be skewed

b) multiple common ancestors will look like a single couple not so far back.

There are two ways to get multiple common ancestors

A) you and the other testee are both descended from two different couples, by coincidence

B) you and the other testee descend from one couple, but they were related to each other and shared some DNA.  In that case, their own common ancestor(s) enter the picture.
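
Points 1, 2 and b can be sketched with the usual expected-sharing arithmetic. The sketch assumes each common ancestor contributes one half per generation on the way up and on the way down, that the common ancestors are unrelated to each other (case A, not case B), and it ignores the random multipliers entirely, so it only gives averages. The function name and the example pedigrees are made up for illustration.

```python
def expected_fraction(ancestor_paths):
    """Average shared fraction, summed over common ancestors.

    Each entry is (generations from person A up to the ancestor,
    generations from person B up to the same ancestor).
    Assumes the listed ancestors are unrelated to one another.
    """
    return sum(0.5 ** (up_a + up_b) for up_a, up_b in ancestor_paths)

# A shared couple counts as two common ancestors.
full_third_cousins = [(4, 4), (4, 4)]          # one shared gt-gt-grandparent couple
double_third_cousins = full_third_cousins * 2  # two unrelated shared couples
half_second_cousins = [(3, 3)]                 # one shared gt-grandparent

print(expected_fraction(full_third_cousins))    # 0.0078125
print(expected_fraction(double_third_cousins))  # 0.015625
print(expected_fraction(half_second_cousins))   # 0.015625
```

On average, double third cousins share as much as half second cousins, which is one way multiple common ancestors can look like a single, closer connection.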
by Living Horace G2G6 Pilot (633k points)
+4 votes

You are correct that "cousins can be 'lucky' and share more DNA than expected from their relationship or 'unlucky' and share less.  At the upper extreme with the few people who share quite a bit of DNA, the explanation is more commonly that this is because they got lucky than the reverse."

I manage accounts that involve 2 and 3 siblings. The amount of shared DNA varies between these siblings and a DNA cousin. The greater the distance, the greater the variance. 

The statistics assume endogamy does not play a role.  I don't see in your tree any evidence that it plays a role in your tree or mine.

by Ken Sargent G2G6 Mach 6 (62.1k points)

It would also be helpful to understand how you determined what "the amount of shared DNA would suggest"

The amount of shared DNA is determined differently on each platform.

FTDNA - a shared segment is any segment >1cM and 500 SNPs.

Gedmatch - I believe a shared segment is >1cM and 700 SNPs.  I also believe this was the original formula used by 23andMe.

23andMe - I believe a shared segment is >5cM and 500 or 700 SNPs.

AncestryDNA - I believe a shared segment is >6cM and 700 SNPs.

I believe the chart at https://isogg.org/wiki/Autosomal_DNA_statistics uses the original 23andMe and Gedmatch rule of >1cM and 700 SNPs.

What chart are you using, and which formula are you using to determine the total shared segments?
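
To show why the threshold matters, here is a small sketch that totals the same segment list under two different rules. The segment list is invented, and the two rules are simply thresholds quoted above taken at face value (they are not verified against any company's documentation). Treating the cM limit as strict and the SNP limit as inclusive is also an assumption.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    cm: float   # segment length in centimorgans
    snps: int   # number of SNPs in the segment

def total_shared_cm(segments, min_cm, min_snps):
    """Sum the segments that are longer than min_cm and have at least min_snps."""
    return sum(s.cm for s in segments if s.cm > min_cm and s.snps >= min_snps)

# Hypothetical segments for one pair of testers (not real data).
segments = [
    Segment(32.0, 7000),
    Segment(11.5, 2400),
    Segment(5.2, 900),
    Segment(3.1, 650),
    Segment(1.8, 520),
]

loose = total_shared_cm(segments, min_cm=1.0, min_snps=500)
strict = total_shared_cm(segments, min_cm=6.0, min_snps=700)
print(f"rule >1cM, 500 SNPs: {loose:.1f} cM")
print(f"rule >6cM, 700 SNPs: {strict:.1f} cM")
```

The same pair comes out at about 53.6 cM under the looser rule and 43.5 cM under the stricter one, so a total from one site is not directly comparable to a chart built on another site's rule.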

Indeed - it seems like an example or two might do the trick.
+7 votes
My experience in working with the kits of my parents has been that the top matches tend to be in the relationship range predicted.  I tested them through both ancestryDNA and FTDNA and also have them on Gedmatch.  All three companies have been accurate in their prediction of the relationship (or # of generations to common ancestral couple) between the kits and their matches.
by Darlene Athey-Hill G2G6 Pilot (540k points)
+2 votes
With atDNA, another thing you want to look at is half relationships. At the third cousin level, you are going to match about 50% of the time.
by Barbara Shoff G2G6 Mach 2 (22.8k points)
