Have You Been Misled by AncestryDNA's "Cousin" Categories, or Blaine's Famous Chart?

+7 votes
2.5k views

This title is a somewhat rhetorical question - what I'm really asking is whether other WikiTree people have gotten into this enough that they can confirm or deny what I've been seeing in my own results.

AncestryDNA lists your DNA matches in decreasing order of centimorgans (cM), so right off I find the double categorization they do (Cousin levels & "Confidence Levels") to be a bit misleading. Every match in the "Extremely High", "Very High" and "High" confidence levels is at least 30cM, so as far as I know, they're all pretty certain to be real matches. The "Good" confidence levels go down to 16cM, so most - if not all - of THEM are real matches too.

AncestryDNA's various levels are basically fine, for the closer relatives. It can pick out a parent or full sibling "Immediate Family" with ease. I only have enough data from my own personal experience to discuss the "2nd Cousins" level and beyond, and that's where things get to be "off".

2ND COUSINS:

When I click on one of my matches in the "2nd Cousins" category, it tells me that the "Possible range: 2nd - 3rd cousins", but that's misleading. In my own case, I have two 1C1R and two 2C. When I look at my brother's results there is actually a 2C1R (who squeaks in at 213cM), but there's also a 1C (606cM). The rest of his are like mine. From what I'm seeing with my 3Cs, there's no way you're going to get a 3C in the 2C category.

So what "2nd Cousins" REALLY means is "2C or closer, with a small chance of a 2C1R sneaking in across the 200cM threshold". Almost all of the matches in this category will be 1C1R or 2C. But just as importantly - only about HALF of your 2C matches will actually be in this category - the others will be under "3rd Cousins".

Now, what does Blaine Bettinger say about 2nd cousins? He says you should see 46cM to 515cM, with an average of 233cM (or, if you look at his data, and use the center 90% of his results, it's 93cM to 390cM, with a median of 208cM). In my experience, they ran from 102cM to 326cM, with an average of 191cM. The thing is, he doesn't appear to throw away the endogamy-ridden data, that can be WAY off (so that's probably why he goes up to 515cM), and who knows if the fine folks reporting in really even know for sure what a 2C is.  Anyway, his average is about 20% higher than what I'm seeing, and his max value seems downright out-of-control.

3RD COUSINS:

Similar deal, but I have much more data of my own. It tells me that the "Possible range: 3rd - 4th cousins", but in actuality it's the bottom half of my 2C matches, the top half of my 2C1R matches, and a relative handful of actual 3Cs. It really means "3C or closer - as close as 2C." Out of the 70 3Cs I've identified, only 10 were in the "3rd Cousins" category (46 were under "4th Cousins", the rest were "Distant"). There could easily be more in the "4th Cousins" category, and especially in "Distant", since I haven't figured out who all of those are yet.

NOTE: For 2C1R and closer, the values do not go down to essentially zero, and a side-effect of that is that you will ALWAYS get a match. For 3C, I'm getting matches to about 85% of the 3Cs that I know are out there (so the real number has to be less).

My highest 3C value is 127cM, and the average is 47cM. Undoubtedly, that average will drop as I find more 3Cs buried with my "4th Cousins" and "Distant" categories. Blaine's chart says it can go up to 217cM, and that the average is 74cM. As the relation goes further out, it seems reasonable that his endogamy data is throwing things off more and more.

4TH COUSINS:

Mostly, these range from a few 2C1R to some 4C1R, but a few as distant as 6C1R. But I haven't figured out who they all are - not by a long shot - so there might be a lot more at the more distant relation level that I haven't found.

DISTANT COUSINS:

As low as 3C appear there. I have as remote as an 8C, too, but that hasn't really been verified rigorously.

SUMMARY:

It seems to me that one could get somewhat the wrong idea about how you might be related to someone, based on the cM of a match, using the small amount of info we have had at hand. I'd be interested to hear what others have seen for themselves!

in The Tree House by Living Stanley G2G6 Mach 9 (91.1k points)
I think it's not so much about being misled as it is about Ancestry being more "streamlined". It's like the Microsoft Windows of the DNA Genetics world--it's more intuitive, but less powerful ;)

For those that want to dig deeper (like WikiTree people!), "more intuitive" is less helpful, and yes, occasionally not very accurate.

I also suspect that Ancestry, like many other big corporations, is more interested in keeping its customers dependent on them for information. Empowering their customers could be a secondary concern. Some folks want to keep the market share in, others like to put the information out there. Note that GEDMatch is a volunteer effort, just like WikiTree hahaha.
I find Ancestry to be the most accurate. Of the 5 people I have listed as second cousins, two are 1st Cousins 1x removed, 1 second cousin and the other two I can't figure out. One because she has a cryptic name and hasn't responded to any of my messages. The other I just haven't been able to figure. Of the 53 third cousins, 2 are 3rd cousins and 1 is a 2nd cousin once removed. Being 100% European Jewish and understanding the endogamy that comes with it, I won't bother with the 12,132 Fourth Cousins and the 117,628 distant ones. By the way, my family tree now includes all 8 of my Great Grandparents and 8 of my 16 Great Great Grandparents.
Fully agree with your view and indeed one has to take the cM project with a bit more careful thinking as well.

8 Answers

+15 votes
 
Best answer
You always have to take these types of estimates with a grain of salt. I never felt deceived or misled, but I knew that I couldn't depend on a single number and had to consider the matches between several related people to form a hypothesis.

It is obvious that there is quite a bit of variability for more distant matches. The difficulty of guessing the relationship based on centimorgans increases as the relationship gets more distant. A parent-child match is unmistakeable. Almost the same for full sibling. From there things get murkier as you have to consider 1x removed and 1/2 cousin relationship possibilities and the general uncertainty of inheritance.

Right now, I only have one 2nd cousin match on Ancestry DNA. However my brother, (Full Sibling) has seven 2nd cousin matches. Those people all show up as 3rd cousin matches for me. It's just the luck of the draw. Whatever my true relationship is to these matches, my brother has exactly the same relationship. Whatever endogamy or lack thereof exists among my ancestors, it's exactly the same for him, since he has exactly the same ancestors I do.

If you are trying to come up with a single statistic for (say) 2nd cousin that works for everyone, you must include endogamous as well as non-endogamous families. The endogamous are people too. :)  So, for various reasons, you are going to have a fairly big standard-deviation around the mean.

There is a very interesting 2-D chart around that shows both cM and number-of-segments data for certain relationships. That inclusion of segments can help to distinguish certain close relationship, such as between grandparent and 1st cousin. But, it becomes less clear as relationships get more distant. Also, the chart doesn't include any half-relationships or 1x removed.
by Jamie Cox G2G6 Mach 1 (17.3k points)
selected by Living Stanley
I was first motivated to pursue this because I was messaging someone, trying to figure out how a third match might be related (it would help prove this guy's story), and he said that the "third match" guy must be about a 3rd cousin - apparently since he's in the 3rd cousin category. Well, clearly third match guy is NOT a 3rd cousin (for reasons we could get into) but it was obvious that he was falling victim to the "misleading" I'm referring to.

I'm not suggesting that anybody is INTENTIONALLY misleading us, BTW - there's no reason why they would do that. Except on the "confidence level" thing - I think that's just for marketing, and therefore a bit manipulative.

But your own example is an excellent case study in how a more precise understanding of the reality could be helpful! If there's a bunch of matches that are in "2nd Cousins" for your full brother, but "3rd Cousins" for you, my dataset tells me (since the relationship can just as easily go one way as the other) that those are 2C (or equivalent). If they were 2C1R, they would rarely show in a 2C category. If they were 1C1R, they would never show up in a 3C category. "Equivalent" relationships (in terms of shared DNA) to 2C include: H1C1R, 1C2R, half-gtgt aunt/uncle/niece/nephew. How'd I do? If you didn't know the relation for some before, you do NOW!

I took a crack at trying to see what number of segments does for you, but didn't get anywhere. All I learned is that a segment averages about 15cM.

Endogamy is a Wild Card that can always bite you, but such data only poisons the statistics. If you're related to a 4C in multiple ways, the number could be outsized many times over - totally blowing both the average and the max, vs a typical case. You need to go on the presumption of no significant endogamy (unless the situation suggests otherwise) but be wary of that stealthy beast.

So, here's the actuals for the situation I described above.

"I only have one 2nd cousin match on Ancestry DNA. However my brother, (Full Sibling) has seven 2nd cousin matches"

DNA Matching Data for two brothers (Full Siblings sharing 2649 cM, 82 segments)

Match    Me (cM)    My Brother (cM)    Actual Relationship

A        316        314                2C
B        166        284                2C
C        168        281                2C
D        178        233                2C
E        94         216                2C
F        153        225                2C ?
G        138        204                2C1R

So, All of these matches are actually 2C except match G, which is a 2C1R.  There is a small doubt about match F, but likely 2C.  Match A shows up as 2C for me while the others are all shown as 3C for me. All of these are shown as 2C for my brother.

One thing I have learned is that the average cM of two or more siblings to a match is a much better indicator of the true relationship than a single measurement. If the match also has a sibling tested, then you can average  the four (or more) measurements for an even better estimate.
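To make the averaging idea concrete, here is a small Python sketch that uses only the numbers from the table above. The function and variable names are made up for illustration, and match F is treated as a 2C even though it is listed as "2C ?". It is just one way to do the arithmetic, not a method anyone in this thread has endorsed.

```python
from statistics import mean

# cM values copied from the table above: (me, my brother, actual relationship).
# Match F is listed as "2C ?" in the post; it is treated as 2C here (assumption).
matches = {
    "A": (316, 314, "2C"),
    "B": (166, 284, "2C"),
    "C": (168, 281, "2C"),
    "D": (178, 233, "2C"),
    "E": (94, 216, "2C"),
    "F": (153, 225, "2C"),
    "G": (138, 204, "2C1R"),
}


def sibling_average(me_cm, brother_cm):
    """Average the cM that two full siblings share with the same match."""
    return mean([me_cm, brother_cm])


# Per-match average across the two siblings.
averages = {
    name: sibling_average(me, bro) for name, (me, bro, _rel) in matches.items()
}

# Pooled mean of every individual 2C measurement (matches A-F, both siblings).
two_c_values = [
    cm
    for me, bro, rel in matches.values()
    if rel == "2C"
    for cm in (me, bro)
]
pooled_2c_mean = mean(two_c_values)

for name, avg in averages.items():
    print(f"{name}: {avg} cM")
print(f"Pooled 2C mean: {pooled_2c_mean} cM")
```

With these inputs, match E averages 155cM across the two siblings, and the pooled mean of the twelve 2C measurements is 219cM.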

Thanks very much, Jamie, that's EXACTLY what I'm looking for - a real example with real data that really tests everything out! There are several KEY points to make:

* Consider how - if you look at what AncestryDNA tells us about Jamie's matches that fall in the 3C category, it tells us "Possible range: 3rd - 4th cousins". If you believed THAT, you would have RULED OUT the possibility that they were 2C (if you didn't already know they were). You would have been MISLED!

* Because I looked at MY OWN numbers and analysis, which I know to be REAL, I was able to surmise that "B" thru "G" were likely 2C, and was RIGHT 5 out of 6 times! Without even seeing the numbers.

* What would we have gotten from Blaine's chart? We'd have concluded that "A" thru "G" could be 1C1R, 2C, OR 2C1R, although E & G could ALSO be 3C, but NOT 1C1R.

* If I only had the numbers, I would have guessed that (1) "A" was either a really good 2C match, or a poor 1C1R one. (2) "B" & "C" had to be 2C. 168cM is too low for a 1C1R, and 281cM is too high to be 2C1R. (3) I probably would have said that "D" & "F" are probably 2C, since 213cM is the highest 2C1R I have (and they're higher), but maybe if I had a bigger data set I might get a somewhat higher one, so I wouldn't be 100% sure. (4) "E" would be a tough call. On the one hand, 94cM is lower than my lowest 2C (102cM), but on the other hand, 216cM is higher than my highest 2C1R (213cM). I'd probably have given it a 50% chance of each. (5) "G" could have been either a low 2C or a high 2C1R.

* So AncestryDNA would have told you wrong on B through G, and Blaine gave you a less precise answer than I would have (because his ranges are too broad - from endogamy, and maybe data errors). Really, this 2C case isn't too bad vs Blaine, aside from the imprecision - I think the more distant relations would be worse. You might have looked at "E", averaged the values to get 155cM, and decided it had to be a 2C1R (average=123cM), but your average of 219cM isn't all that much lower than his 233cM (6% different).

* To review my guesses: A: 1C1R/2C, B: 2C, C: 2C, D: 2C (probably), E: 2C/2C1R, F: 2C (probably), G: 2C/2C1R

vs Blaine's: A: 1C1R/2C/2C1R, B: 1C1R/2C/2C1R, C: 1C1R/2C/2C1R, D: 1C1R/2C/2C1R, E:  2C/2C1R/3C, F: 1C1R/2C/2C1R, G: 2C/2C1R/3C

* ANOTHER key point to revisit is that of using more than one sibling's results to get a better answer. I was really surprised at how my own results varied so much vs. my brother's, for the exact same people. I expected some sort of correlation, but if anything, when my value is high, his goes low. But while AVERAGING is the obvious thing to do, I would suggest that it isn't the average that's important, but the RANGE. I can tell - from MY data - that "B" and "C" are 2C because it's the SAME relation, AND yours is too low for 1C1R, AND his are too high for 2C1R.
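The "RANGE, not the average" idea in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2C range (102-326cM) and the 2C1R maximum (213cM) are the personal figures quoted in this thread, the 2C1R lower bound is not given anywhere so it is left open at 0, and 1C1R is omitted because no bounds for it are quoted. The names `OBSERVED_RANGES` and `consistent_relationships` are hypothetical.

```python
# Observed cM ranges quoted in this thread (one person's data, not a standard).
# The 2C1R lower bound is not stated, so 0 is used as an open bound (assumption).
OBSERVED_RANGES = {
    "2C": (102, 326),
    "2C1R": (0, 213),
}


def consistent_relationships(sibling_cms, ranges=OBSERVED_RANGES):
    """Return the relationships whose range contains EVERY sibling's cM value.

    Full siblings have the same relationship to a given match, so a candidate
    relationship has to explain the highest and the lowest value at once.
    """
    return [
        rel
        for rel, (low, high) in ranges.items()
        if all(low <= cm <= high for cm in sibling_cms)
    ]


print(consistent_relationships([166, 284]))  # match B
print(consistent_relationships([94, 216]))   # match E
print(consistent_relationships([138, 204]))  # match G
```

With these ranges, B comes out as 2C only (284cM is over the 2C1R maximum), G stays ambiguous between 2C and 2C1R, and E fits neither range, which is the "tough call" described above. An empty result is a hint that the ranges come from too small a dataset, not proof that the relationship is something else.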

So, in summary, we see what I'm complaining about. AncestryDNA can be just plain wrong, with its categories, and the ranges on Blaine's chart are too broad, probably due to endogamy (and maybe erroneous data). It's not "wrong", but it doesn't reflect reality as well as it should, and that might have led you astray.

+9 votes
While Ancestry circles are a guide I would not put them into the same league as Blaine's tool.  I work with DNA relationships every day and I can honestly say I am so glad we have Blaine's tool and shudder every time I have to start helping an adoptee with Ancestry circle data.

I am glad that Blaine's tool gives us the extremes.  Because working with the number of people we help and the kinds of communities some are coming from, it mirrors what we can see in small towns, rural communities, and other groups where endogamy can be present.  

Any tool is going to be only as good as the data feeding it.  So it is a good idea to question any extremes shown.  That doesn't make them inaccurate, just not as frequent.  

CMs are only part of the story.  You really need to get down to the chromosome level and SNP ranges to validate a match.  Neither Ancestry circles nor Blaine's tool does that.  I look at Ancestry Circles as hints but not proof.  I look at Blaine's tool as a double check guide when trying to determine where someone might fall on a tree based on the DNA data.
by Laura Bozzay G2G6 Pilot (830k points)
I'm a fan of Blaine's chart myself - I, too, use it to see what the absolute max you would ever see might be. If Blaine says you'll never see higher than "X", you almost certainly won't. But my point is that - WITHOUT ENDOGAMY - you'll never see anywhere near THAT number, EITHER, and that could turn out to be important.

My main "beef" was when I put together dozens of samples to submit (which I did) and I noticed that 90% of what I was submitting was below what he gives as the AVERAGE. One problem is that the distribution of values is skewed - so the MEDIAN, not the MEAN (average), is really the number that would tell you what was more typical.

If you look at the chart for 3C, you'd get the idea that 74cM is about what you should expect to see for a 3C match, with about half being above, and half being below. But in reality, most (something like 85%) non-endogamic 3C results are BELOW 75cM and you should really expect values that are around 50cM (give or take a lot).
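Here is a toy Python illustration of that skew effect. The ten cM values are made up purely to show how a couple of large values drag the average upward; they are NOT real matches and not Blaine's data.

```python
from statistics import mean, median

# Made-up cM values, chosen only to illustrate a right-skewed distribution.
sample_cm = [20, 30, 35, 40, 45, 50, 55, 60, 150, 255]

average_cm = mean(sample_cm)
median_cm = median(sample_cm)

# Fraction of the values that fall below the average.
share_below_average = sum(cm < average_cm for cm in sample_cm) / len(sample_cm)

print(f"average: {average_cm} cM")
print(f"median:  {median_cm} cM")
print(f"share below average: {share_below_average:.0%}")
```

In this toy list the average is 74cM but the median is 47.5cM, and 80% of the values sit below the average. Two outsized values are enough to do that, which is the kind of effect endogamy or bad data could have on a published average.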

If you're working in a small town where you know endogamy can easily occur, then really what you know is to pretty much ignore the chart altogether - the results you can see are essentially unlimited. It's a completely different case than non-endogamic, and mixing the two stats together just screws it up and weakens the utility of the tool.

I even see this chart on the ISOGG page, yet you have to admit - it's rather unscientific! You have no guarantee of the quality of the input data and you know darn well that the worse matches (3Cs buried in a sea of "Distant Cousin" matches, e.g.) are going to be underrepresented. STILL - without much in the way of OTHER options available, it represented (and still represents) a "MUCH better than nothing" option.

So what I'm doing here is bringing up the shortfalls, and soliciting other users who might have some real data handy to say if the numbers I'm quoting make more sense, and if they can confirm that my interpretations of the "Cousins" categories are what they've seen too.
Oh, and about "Ancestry circle data". You're not using the right term, are you?

I have had no problem, whatsoever with the DNA Circles (which an adoptee wouldn't even have access to).

In contrast, I was doing some analysis for an adoptee just last night, and used the fact that you will ALWAYS get a match to a 2C1R (or equivalent, or closer) to rule out one of the two likely branches of the family she is likely in. There are a couple of people I know of that she'd be matching, if she was in that branch - so she is likely in the other one.
Probably using the wrong terms since I only go into Ancestry under duress.

I prefer to work with the data uploaded to GedMatch or FTDNA where I can see the chromosome and SNP ranges to validate overlaps for triangulation.
I hear ya, Laura! It's almost cruel to bring that up - imagine how much more I could do with this stuff if AncestryDNA would "show us the chromosomes!" I read that even having access to the cM is a fairly recent feature (only about 2 years old), and even THAT is practically HIDDEN. All I've heard about chromosomes is "Don't hold your breath - it's not even a gleam in their software developer's eye." Well, I thought I'd never see the Berlin Wall fall either, so who knows...

I assume AncestryDNA is unavoidable for dealing with adoptees - it likely has the biggest dataset to match off of, doesn't it? cM can be a potent tool to use - especially if you get as much out of it as is possible.

You know, ANOTHER problem with Blaine's chart, now that you mention different testing platforms, is that the numbers come up somewhat different, for different companies. I've actually used Blaine's data (from his PDF file you can download) to make my own versions. I have one for each testing site, list the 5%->95% numbers (to try to throw away most outliers), give the median (not mean), and arrange the cells differently (it makes no sense to throw all the "halfs" off to one side). It seems to match my own data a lot better.

I recommend checking out the discussion after Jamie Cox's answer. Some real data (from Jamie) is presented, and it gives an excellent example of exactly what I'm talking about.
DNA has to be matched to a tree.  The possible CMs are guidelines not hard and fast rules.   I think most of the variance is coming from recombinant DNA not endogamy.  Endogamy happens but at a lower frequency level than recombinant DNA variance.  So, once again, I understand your match question and the math, I just don't think it is that big of an issue.  

I have in myself an extreme example of recombinant DNA variance not at all based on endogamy.

My paper trail genealogy is 25% Scottish to Canada in 1892 to the US in the 1920s, 12.5% French (Alsace Lorraine area since 1600s and prior to that what could be German, Austrian, Swiss or maybe even Italian), 12.5% Swiss to Hesse to the US in the 1800s, the rest is some mix of Prussian or Hannoverian Northern what today is Germany and Southern Black Forest BW what today is Germany.  All of those coming to the US in the 1800s.  

DNA comes back through a multitude of ethnic admixture tools:

in 2016 showed 45% British Isles, 26% Scandinavian, 21% Southern Europe, 6% Finland and Southern Siberia,  2% Asia Minor.  

by later 2017 these were reworked and now show 57% British Isles, 36% West and Central Europe, 6% Iberia and less than 1% trace Finland.  

So I have double the amount of DNA% showing for a specific branch than theoretically I should.  Yet Blaine's tool worked for this line just as it did for many other branches.  

Ok, so I have matches with people on both sides of my family.

On my Dad's side (this would be the British Isles) I match with my 4th cousin at 28.89cM on chromosome 2 with SNP range of 7579 shared overlapping CMs.    This falls well within Blaine's tool.   I tested at FTDNA.

On my Mom's side, she has 50% Southern German from BW area and 50% Hannoverian.  A cousin and I share 11 chromosome segments on 3 (2 different segments), 4, 5, 7, 10, 11, 14, 19, and X. 23 longest segment and 55cM total.  She is my 4th cousin on paper and once again the CMs fall within Blaine's tool.  This is repeated with another branch that has a total of 51cM and falls within Blaine's tool.  

So I do not see the issue in my own DNA and I have done similar studies for multiple adoptees and have not had issues using Blaine's tool.
Laura, I don't think you're understanding me. First of all, this has absolutely nothing to do with admixture (which is something of a fool's errand, as far as I can tell). A serious genealogical discussion that involves DNA should not include anything about admixture, so I have nothing to say about that part of your comment.

Second, Blaine's chart actually represents the "hard and fast rules" that you say "don't exist". He includes practically all possible variation, including errors in the data. You should virtually never see numbers outside of what he has. In that sense he's "not wrong", and I'm not saying he is. I'm saying his stated averages lead you to expect to see generally bigger values than you typically see, and that the range of values is overly broad for the typical non-endogamy case, which could lead you to consider relationships for the match that are very unlikely.

The example I go through in answering Jamie's question describes pretty well how you miss out on accuracy when using data polluted by endogamy.

Your isolated data point of 28.89cM for a 4C doesn't really tell us anything. In my own data set, the values go up to 67cM, so that's within MY range too. In fact, it's very close to my average, which is about 28cM. Blaine says the average is 35cM, and can go as high as 127cM. The endogamy apparently just about doubles the size of the window.

Your 55cM and 51cM cases for 4C also fall within the window I'm seeing, so they actually support my case, if anything. None of the 3 values you gave are in the extra 68cM-127cM range that Blaine says can also happen for 4C.

So if you had a relative with 100cM, you might think that 4C is a possibility, using Blaine's chart, when there's really little chance of that, outside of endogamy - THAT'S how it's misleading.

I only have a couple of dozen data points for 4C, so the right number might be a bit higher than the 67cM I've seen so far, but probably nowhere near Blaine's 127cM. That's part of what I'm asking here - what non-endogamic numbers are people seeing?

Again, it's not that you've done anything "wrong", but that you're tying one hand behind your back. You could be getting more out of your cM numbers. Again, my example in replying to Jamie's answer illustrates this well.
My understanding is that Blaine's chart is a direct result of his polls.  He did not pull the figures out of his hat.  Also, what he says in various posts and lectures is that that is all it is, and that is why he has such wide ranges.  I never have accepted that the figures are just that.  But it is a tool worth using, which I do.  One thing I will say though is that I wish you would not use the term "misled".  I believe there must be better terms to use.  To me, misled insinuates giving incorrect information on purpose.
+1 vote

I believe Blaine's chart is based on mathematical possibilities, not projects or case studies.  I went back and looked and I was wrong. It is based on self-reported data which he admits could have errors. He also talks about endogamy. Question: What are the mathematical possibilities, not probabilities?

by Edie Kohutek G2G6 Mach 9 (97.9k points)
edited by Edie Kohutek
All I've seen, myself, is some sort of theoretical numbers that aren't even used anymore because they don't match reality well enough. Even those were just mean values, without a prediction of the range of values. You'd think there would be some kind of model that would give some decent guidance.

As for endogamy, you almost wonder how that is even reported. If you're both a 2C and a 4C to somebody, would you report the cM for both 2C AND 4C? Neither one actually makes any sense - your relationship is actually 2-dimensional. Relationship vs cM is NOT a one-to-one correspondence! It's a special case that doesn't warrant all the trouble it would take to properly account for everything. It's best to just throw that data out, and proceed assuming non-endogamy unless and until it becomes unavoidable that it's not. Throwing special case data in with the rest - screwing up the stats, and helping neither case - is madness. The stats are too broad for non-endogamy cases, and not necessarily helpful in endogamy ones.

It speaks to the newness of the field that something as unscientific as it is is used with such unquestioned reverence and devotion, and hasn't been supplanted yet. It was certainly a leap forward, but it's got issues. It even lumps all testing companies into one table, when their results vary, due to the difference in algorithms they use!
Wouldn't you report endogamy if you or one of your ancestors has DNA matching both maternal and paternal side? And isn't that why people do DNA painting? You have 2 strands of DNA at the same location: one matches the paternal side and the other matches the maternal side. That is why one has to be careful when trying to triangulate.

BTW, Blaine does not state in his chart that these are hard and fast rules; he states that the numbers reflect the data he has gathered.
Edie, I'd have a DIFFERENT kind of report to make, if my DNA DIDN'T match BOTH my parents...  :)

But endogamy really just means the two being matched are related in two or more ways. It doesn't have to be one on the mom's side & one on the dad's side, and it really doesn't have much, if anything, to do with DNA painting.
+1 vote
When comparing siblings, you also have to remember that your full sibling may get the 50% of DNA from their parents that you didn't get - so that some of your DNA matches (and cM shared) naturally will be different.

Just thought I would throw that in and see if that skews the numbers. LOL
by Robynne Lozier G2G Astronaut (1.3m points)
Theoretically, that could happen, but only roughly 100 billion people have ever lived, so the odds that exactly that ever happened are undoubtedly something in excess of one in a quadrillion, to take a wild guess.

This does bring attention to some basic probability and statistics that are involved, however. Since most people have never had a single class in that, maybe it's worth a quick mention. When you say that the cM values for a cousin level vary between two values, there's ALWAYS going to be some small probability of a true result coming in outside those values. Ideally, you select that small probability (for example, 1%) and that gives you the numbers. When I use Blaine's data sheets and allow for 5% on top and bottom, I get a much more reasonable result than the values on his chart, which use 1%. Basically, that means he's including outliers in his published intervals that are probably either bad data or the result of endogamy.
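For anyone who wants to try the "trim 5% off each end" approach on their own numbers, here is a minimal Python sketch. It uses `statistics.quantiles` from the standard library; the function name `central_interval` and the placeholder input are my own illustration, not anything taken from Blaine's data sheets.

```python
from statistics import median, quantiles


def central_interval(values, tail_percent=5):
    """Return (low, median, high) after cutting tail_percent off each end.

    tail_percent=5 keeps the central 90% of the values;
    tail_percent=1 keeps the central 98%.
    """
    if tail_percent <= 0 or 100 % tail_percent != 0:
        raise ValueError("tail_percent must be a positive divisor of 100")
    n = 100 // tail_percent
    # quantiles() returns n - 1 cut points; the first and last are the
    # tail_percent-th and (100 - tail_percent)-th percentiles.
    cuts = quantiles(values, n=n)
    return cuts[0], median(values), cuts[-1]


# Placeholder numbers standing in for a list of reported cM values.
placeholder_cm = list(range(1, 101))

low5, mid, high5 = central_interval(placeholder_cm, tail_percent=5)
low1, _, high1 = central_interval(placeholder_cm, tail_percent=1)

print(f"central 90%: {low5:.2f} to {high5:.2f}, median {mid}")
print(f"central 98%: {low1:.2f} to {high1:.2f}")
```

The 1% cut always gives an interval at least as wide as the 5% cut, so it lets more of the extreme values through. Whether those extremes are bad data, endogamy, or just rare but legitimate results is something the percentile alone can't tell you.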
+2 votes
When I finally got some "Proven" cousins to do DNA, I realized all of the dead ends that I had been searching.   One third cousin and I have 380 cMs in common, (brothers married sisters)  and most of my first  and second cousins all have over 200 cMs in common.   Chasing anyone with less than 100 cMs in common is now off my radar.
by Robin Lee G2G6 Pilot (860k points)
When brothers marry sisters, their kids are called "double cousins", and you might expect double the cM vs normal (I'm not sure what the exact multiplier is). But you're quoting roughly triple the highest value for 3C that I'm seeing in my own data, so maybe there's even more endogamy going on there.

I'm not sure exactly what you're saying for some of this, but about half my 2C results are over 200cM, and a 1C is generally at least 600cM, so those aren't unusual at all. The endogamy you cite is only on 1/8 of your tree, so I'm not really seeing what the problem is with your matches less than 100cM. You can often tell if a match is on the affected side of your family by looking at the shared matches for them.
+1 vote
Misled? I don't think I have. One of my first cousins has her test up on Ancestry and we're clearly a match. We share the same grandfather. That connection was obvious. Going a little further back I have a paper trail that connects me to a third cousin who has an account here. I forget how many centimorgans are between me and her. We still connected before I had a test done since her ancestor, Rocco, was my grandmother's uncle.

Another way to check to see if connections are legit is to upload to all of the places. I took my DNA and put it up on Gedmatch, FTDNA, Myheritage and 23andme. From there I saw the same people popping up as they did on Ancestry. That was really cool and it confirmed a few things. So, if you ever have doubts just upload the DNA elsewhere.

I have a few cousin finding stories of my own and in one case I had doubts until I compared the match's DNA with that of me, my dad and my great-aunt. The numbers got bigger and bigger by the time I got to my great-aunt. It was in the 1000s when I got to her. I then asked professional genealogists what the story was and they helped me figure things out.

Another cousin who is a fourth cousin needed my help finding our connection. We found our paper trail and that was it.

As what was said here, the cMs are a guide. What you need to do to cement the guide is find a paper trail leading to the match be it censuses, births, marriages, deaths etc. That there is the smoking gun. It helps that the connection has data. It also helps if people's families remember things, too.

The point that I am trying to make is that the DNA and centimorgans don't lie. I think lower than 10 is a false match from what I hear. Just go along with the cMs and combine it with family history and of course the paper trail. In my case, people who have been 22 + cMs have been provable thanks to paper trails etc. It might be the same for you. I don't know.

Work with the match to see what the connection is. Keep in mind that it may involve talking to people in other countries.

I have no clue what "Blaine" is. But, whatever. Compare and contrast with other tests. I uploaded elsewhere and found the same people as matches again and again. So...if it looks like a duck, quacks like a duck....it's a duck.

I guess maybe I am fortunate in my case since many members of my family already had DNA tests up by the time I put mine on Ancestry. And that, I think, helped a lot because of the shared matches there. Also, the last name search is a godsend. It helps. It really does!

Good luck!
by Chris Ferraiolo G2G6 Pilot (764k points)
edited by Chris Ferraiolo
Chris, Blaine Bettinger is the author of the blog, The Genetic Genealogist, and his work is often cited in articles produced by the International Society of Genetic Genealogy Wiki. See https://isogg.org/wiki/Autosomal_DNA_statistics and https://thegeneticgenealogist.com for more info re his project.
Neat. I'll check that out. Thanks! =D
Chris, I'm afraid you have completely missed what is being talked about here, and written an emotional response to my legitimate but provocative question. There are two specific areas I'm talking about:

(1) AncestryDNA assigning your matches to various "Cousins" categories, giving you a false impression. We go through an excellent example that Jamie Cox brought up in a question that should make this very clear. In that example, there were a number of matches where Ancestry assured you that the match could only be between a 3C and 4C, when it was in fact a 2C. In the cases you cite, you don't say what categories AncestryDNA put your matches in vs. what they really are, and I specifically stated that I wasn't really talking about closer relatives, like 1C or grandfather.

(2) A more subtle point about Blaine's chart, which people (like Edie) who have been on here talking about DNA eventually run across. Since you never heard of it, you can't really weigh in on whether you've been misled by it.

Most of what you seem to be talking about is a "given" for the discussion, not something in dispute. If you cited some cM numbers, actual relationships, and the AncestryDNA categories they were assigned to, some of it might be relevant.
Okay. Sorry I missed the point. The question to be honest was kind of vague. No worries, though. Just thought I would share my own experience with it. That's all.
Don't worry, Chris, the question is very esoteric and not all that helpful. I find Blaine's charts helpful and instructive. You might take a look at them. When I'm looking for relationships, I tend to look at the mean numbers and don't worry about the extremes. This is definitely not a science set in stone, but is definitely evolving.
Definitely, Edie. It's changing every day as the science improves. The proof is in what happened a while back when the admixtures changed on Ancestry. Mine didn't budge much. If at all. Might be due to the sample size. Most likely is. I took the same DNA to MyHeritage and 23andMe and got slightly different results. Same matches by and large, though.

I was actually told that AncestryDNA was best for people like me who have Italian ancestry. Turned out it was. =)

As for the question, I'm not worried about it. I wanted to offer my take on the situation. Is my situation unique? Maybe. I just think it's wise to maybe combine the DNA results, paper trails and family knowledge to see what you get. That's how you can find out the truth. If they mesh, I mean.

Not sure what Blaine is all about, but, I may look into it. Thanks for filling me in, earlier. =D
Edie, it's only as "esoteric" as the use of centimorgans itself. If you find Blaine's charts "helpful and instructive", then you would find a better understanding of what cM you should really expect to see even more helpful.

I have literally demonstrated exactly that, in the discussion after Jamie Cox's answer here. I don't know how I can make it any plainer and clearer than a real-life, concrete example where a better understanding than Blaine's chart provides let me discern a 2C relationship level EXACTLY, while Blaine's chart left us with three possibilities to pick from.

As I explained, Blaine's averages are perhaps the worst part! The distributions are asymmetric, and just a few off-the-wall high values pull those averages up beyond reasonable values. The vast majority of cases fall below those average values - they're basically useless, if not counterproductive. In other words, they're "misleading".

 

Chris, admixtures are the junk science of genetic genealogy, and always will be. It's not relevant to the topic at hand, at all. As to cM of DNA, unfortunately you never really told us enough about your situation to even have a discussion relating to the topic.
Sorry I'm so late to this conversation!

I'll just note that if you aren't using the Shared cM Project histograms, you aren't taking full advantage of the data. The histograms are linked everywhere the project is provided (including in the main image), but unfortunately people don't use them.

Here's a direct link: https://thegeneticgenealogist.com/wp-content/uploads/2017/08/Shared_cM_Project_2017.pdf
Wow.

This post....So old now. XD In internet terms this is five years old.

Thanks for the link, Blaine. This was back before I knew about the cM project and way before I even joined your Facebook group. Nice to see ya here.

Thank you, Blaine. It's good to see so many people using data from the Shared cM Project...but usually it's from the chart presenting a visual overview, or via Jonny Perl's great adaptations on the DNA Painter site. The link to the PDF file is front-and-center on the Project update notification, but it seems few read it. And the histograms are vital to understanding how to interpret the numbers.

Just to note that Blaine is extremely clear about the data sampling and its use. He never pretends or implies that it is anything other than crowd-sourced information...the key, for me, is that like any other type of poll-gathered information, the larger and more diverse the dataset the better. The greater the volume of data, generally the more likely it is that the outliers will reveal themselves (so if you haven't contributed your own matches, ones about whom you're confident, please consider doing so). No one ever attempted to gather this type of information publicly before, so the more we help grow the volume of data, the better.

I also see frequent comments that the theoretical, mathematical average sharing amounts are meaningless and no longer relevant. IMHO, that is not true. Those theoretical averages are overly simplistic, but they do--and will continue to--provide an important baseline for autosomal inheritance. What you'll see in data from the Shared cM Project vs. theoretical averages is that the disparity is directly proportionate to the degrees of relationship: close relatives display little difference; distant relatives display differences that might be factors of magnitude...which only continues to illustrate that use of autosomal DNA as evidence for very distant cousinships is complex and difficult at best, at worst tenuous and likely to be false.
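As a rough illustration of the theoretical baseline mentioned above, here is a minimal sketch of the simple halving arithmetic for full cousins. The total of 6800 cM is an assumption (the total differs between testing companies), and the function name is made up for this example; the model ignores endogamy, half-relationships and the random variation of real inheritance.

```python
# Assumed total autosomal cM across both copies of the genome.
# This figure is an assumption; each testing company uses its own total.
ASSUMED_TOTAL_CM = 6800.0


def expected_cousin_fraction(degree: int) -> float:
    """Theoretical average fraction of autosomal DNA shared by full
    cousins of the given degree (1 = 1C, 2 = 2C, ...).

    Each cousin is (degree + 1) generations below the shared ancestral
    couple, so the path through one ancestor halves the DNA
    2 * (degree + 1) times. There are two shared ancestors, giving
    2 * (1/2) ** (2 * degree + 2) = 2 ** -(2 * degree + 1).
    """
    return 2.0 ** -(2 * degree + 1)


for degree in (1, 2, 3, 4):
    fraction = expected_cousin_fraction(degree)
    print(f"{degree}C: {fraction:.4%} of the total, "
          f"about {fraction * ASSUMED_TOTAL_CM:.1f} cM")
```

Under these assumptions the baseline for a 3C works out to roughly 53 cM, which can then be set beside the medians reported by the Shared cM Project.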

0 votes
I don’t think I even read the confidence level. I don’t pay too much attention to the Ancestry cousin level either, other than just thinking that what I am really looking for is probably closer than they suggest. I go straight to the number of cM and use the Shared cM tool to get an indication of probable/possible relationships.
by Lynda Crackett G2G6 Pilot (671k points)
0 votes

Sorry for the late response, but it has taken me a while to gather the data on what I have long suspected. And I found another 3rd cousin in the process!

I and a few cousins share our Ancestry DNA results and I have two groups descended from two couples, no known half-relationships nor endogamy.

All the cousins show DNA matches with 4th and 5th cousins through the male lines, so we are confident the attribution of paternity is correct at the g-g-grandparent level.

All the testers are at the same generation, so no once-removed comparisons. In each group there are 3 great-grandparents represented, and 6 testers, so a few are 2C and one pair is 1C. Two testers are in both groups.

In group A, the cM match values are: 

0, 14, 15, 15, 19, 22, 23, 30, 47, 108, >20, >20

In group B they are:

0, 0, 0, 11, 12, 16, 23, 43, 59, >20, >20

For the values shown as >20, I have not been able to get results from either tester, but they appear as "in-common-with" matches, which uses a 20cM cut-off.

Looking at the Shared cM Project tables (the Aug 2017 version), the 3C median is 64cM, or 63cM using non-endogamous data. The GEDmatch median is 69, FTDNA is 88 (because they throw in all the tiny fake matches), and Ancestry is 53.

So Ancestry is already lower than the others, but my median cannot be greater than 22 (depending on how much greater than 20 the unknowns are).
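The bound on the median can be checked with a short sketch. It treats each ">20" entry as a value known only to exceed the 20cM cut-off, and it assumes that pooling the two groups is what "my median" refers to; the function name is made up for this example.

```python
from statistics import median


def median_bounds(known, n_censored, cutoff):
    """Bounds on the median when n_censored values are only known to
    exceed cutoff.

    Low end: every censored value sits right at the cutoff (a limit,
    since the real values are strictly greater).
    High end: every censored value is arbitrarily large.
    """
    low = median(known + [cutoff] * n_censored)
    high = median(known + [float("inf")] * n_censored)
    return low, high


# Known 3C match values in cM, copied from the two lists above.
group_a = [0, 14, 15, 15, 19, 22, 23, 30, 47, 108]
group_b = [0, 0, 0, 11, 12, 16, 23, 43, 59]

bounds_a = median_bounds(group_a, 2, 20)
bounds_b = median_bounds(group_b, 2, 20)
bounds_all = median_bounds(group_a + group_b, 4, 20)

print("Group A:", bounds_a)     # (20.0, 22.5)
print("Group B:", bounds_b)     # (16, 16)
print("Pooled: ", bounds_all)   # (20, 22)
```

With the two groups pooled, the median lands between 20 and 22 no matter how large the unknown matches turn out to be, which is well below the 53 to 88 range quoted from the tables.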

My hypothesis is that Ancestry has thrown out too many valid segments due to its pseudo-phasing process. The only ones I have been able to check on GEDmatch are:

  • the 108cM match, which GEDmatch says is 106cM, or 104cM if I use phased data; and
  • a zero match where GEDmatch says they share 9cM/1 segment

Why might Ancestry be doing this? Well, it's all a secret, but my guess is that they used their own customers to provide the training samples, and since neither of my families went to North America, our DNA is almost certainly under-represented.

One other factor, which I cannot recall if it has been mentioned already, is self-reporting bias. People will be far less likely to discover and report a cousin with a zero-length match than one with a 100cM match. The way I have done the test should be somewhat less prone to this, but I have a suspicion there are still other low-value 3C matches lurking in there that none of us has identified.

by Cameron Davidson G2G6 (7.5k points)
