Part 2
Moving into the future, I believe we need to do two important things, the first one critical. That's expanding the number of markers tested. With some of our current and historic tests looking at as few as 577,000 SNPs, and with some pairs of tests sharing fewer than 20% of the same SNPs, our ability to compare one set of test results to another comes with a whole boatload of assumptions and guesstimates. Too, up to 18% of the SNPs in any given test are there because they provide some reference to our protein-coding genes. Expending almost one-fifth of an already low marker count to test items of almost solely medical/pharmacological interest doesn't help us much for genealogy.
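To make that overlap problem concrete, here's a minimal sketch, using made-up rsID sets rather than any company's real chip manifest, of how you'd measure how many markers two tests actually share:

```python
# Sketch: measure marker overlap between two hypothetical chip manifests.
# The rsIDs below are placeholders, not real product content.

chip_a = {"rs0000001", "rs0000002", "rs0000003", "rs0000004", "rs0000005"}
chip_b = {"rs0000003", "rs0000004", "rs0000006", "rs0000007", "rs0000008"}

shared = chip_a & chip_b
print(f"SNPs shared: {len(shared)}")
print(f"Overlap relative to chip A: {len(shared) / len(chip_a):.0%}")

# Real chips carry roughly 600,000-700,000 markers each; when the shared set
# is under 20% of that, everything else must be imputed or simply ignored.
```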
Illumina, the largest microarray manufacturer, has had higher-density chip options available for several years. For example, there's the Infinium Omni5-4, which targets over 4.28 million SNPs selected from the International HapMap and 1000 Genomes Projects. It's the same microarray technology, run on the same Illumina iScan systems that many, if not most, of the testing companies already use.
The deterrent? Cost. The chips are more expensive; each chip can handle fewer samples at one time, which decreases throughput; and in terms of genealogy matching against all the existing kits that have already been tested, it creates a conundrum, because it would be more like an apples to, well, pears comparison (at least those are both in the Order Rosales, so not in a different Order like oranges in Sapindales).
And the new tests with 4.28 million SNPs would display far, far fewer total matches than the low-density tests. That's for a simple reason: the greater the number of appropriate data points examined, the fewer the false positive matches. At GEDmatch--and the fact that they don't use imputation affects this--I've informally compared results derived from whole genome sequencing, using almost 2.1 million unique SNPs (about half of what the newer Illumina chips would provide), against 11 different sets of results from individual major-company microarray tests. At segments of greater than or equal to 20cM, around 14% of the reported matches are probable false positives. At greater than or equal to 10cM, that number jumps dramatically to around 50% probable false positives...meaning that about one in every two small segments shown at GEDmatch for a single microarray test's results is likely wrong.
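For what it's worth, here's a toy sketch of the kind of informal cross-check I described, not GEDmatch's actual algorithm: take the segments reported from a low-density microarray comparison and see which ones are corroborated by a comparison built from much denser, WGS-derived data. All segment values here are invented.

```python
# Toy sketch: flag microarray-reported segments that a denser comparison
# does not corroborate. Segment tuples are (chromosome, start_cM, end_cM).

def overlaps(seg, other, min_fraction=0.8):
    """True if `other` covers at least `min_fraction` of `seg` on the same chromosome."""
    if seg[0] != other[0]:
        return False
    lo = max(seg[1], other[1])
    hi = min(seg[2], other[2])
    return (hi - lo) >= min_fraction * (seg[2] - seg[1])

def probable_false_positive_rate(array_segments, dense_segments, min_cm):
    tested = [s for s in array_segments if (s[2] - s[1]) >= min_cm]
    unconfirmed = [s for s in tested
                   if not any(overlaps(s, d) for d in dense_segments)]
    return len(unconfirmed) / len(tested) if tested else 0.0

array_segments = [(1, 10.0, 32.0), (2, 50.0, 61.0), (3, 5.0, 16.0)]
dense_segments = [(1, 9.5, 31.0)]   # only the chromosome 1 segment is corroborated

print(probable_false_positive_rate(array_segments, dense_segments, min_cm=10))
```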
The combination of higher consumer pricing and a significant decrease in the number of reported matches would make for a very tough sell. To those who don't understand the details, it would look like they're paying significantly more to get significantly less. Not a great marketing strategy from a revenue perspective.
That decrease in the number of matches, though, is precisely what the casual genetic genealogist needs. For segments at or above approximately 30cM, everything would be almost identical with the results we see today. The low-density microarrays are pretty solid with individual segments of that size. But an explosion of false positives can happen--depending upon how the testing/reporting company handles the data--below 20cM.
Back in 2012, the International HapMap Project (now discontinued) had cataloged approximately 10 million unique human SNPs. SNP, of course, isn't synonymous with base pair, nucleotide, or allele. To be a SNP (or SNV, for that matter), the polymorphism, the mutation, needs to be found in the global population, not just in a couple of individuals.
Today, there are 957.2 million human polymorphic variants (SNPs and SNVs) classified in the NIH's dbSNP database, representing over 192,000 tested individuals. On top of those, another 107.2 million cataloged entries cover microsatellites and small-scale insertions and deletions. We've cataloged almost 100 times the number of SNPs we knew about just a decade ago. Our current tests look at only about 0.02% of our genomes, and only about 0.07% of the currently cataloged SNPs.
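The coverage arithmetic is easy to sanity-check, assuming a round 3.1-billion-base-pair genome, roughly 650,000 tested markers, and the dbSNP count cited above:

```python
# Back-of-the-envelope coverage math for a typical consumer microarray test.
tested_snps = 650_000            # approximate markers on a consumer chip
genome_size = 3_100_000_000      # approximate base pairs in the human genome
cataloged_snps = 957_200_000     # SNP/SNV count cited above

print(f"Fraction of genome examined: {tested_snps / genome_size:.2%}")      # ~0.02%
print(f"Fraction of cataloged SNPs:  {tested_snps / cataloged_snps:.2%}")   # ~0.07%
print(f"Average marker spacing: 1 in every {genome_size // tested_snps:,} bp")
```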
What we really should do is move toward whole genome sequencing so we can eliminate all the estimations, assumptions, imputation, and inference about whether a given segment is really one continuous segment or is broken up into smaller segments by mismatching base pairs. That wouldn't eliminate all possible interpretation errors, specifically a common one resulting from what's called "haplotype switching," but it would minimize them and also give us accurate segment start and end points; today, those values as reported are, perforce, inaccurate. They seem quite precise when we see reports that offer exact numbers, but since our tests look at an average of only one nucleotide base pair out of every 4,700, the possibility for precision simply isn't there.
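To see why those reported start and end points can't really be precise, here's a tiny illustration with invented positions: a match boundary can only be observed at a tested marker, so the reported value effectively snaps to the nearest tested SNP, and the true boundary may sit thousands of base pairs away.

```python
import bisect

# Sketch: a segment boundary can only be reported at a *tested* marker position.
# Positions below are invented; real spacing averages ~4,700 bp but varies widely.
tested_positions = [1_000_000, 1_004_700, 1_009_400, 1_014_100, 1_018_800]

def reported_boundary(true_boundary_bp):
    """Return the last tested position at or before the true boundary."""
    i = bisect.bisect_right(tested_positions, true_boundary_bp) - 1
    return tested_positions[max(i, 0)]

true_end = 1_012_345                  # where the shared segment actually ends
print(reported_boundary(true_end))    # 1009400 -- off by roughly 3,000 bp
```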
Moving to use of whole genome sequencing (WGS) would entail more expensive testing (though not nearly as expensive as it was just five years ago) plus an IT infrastructure that could handle such a massive amount of data. When a WGS test states that it offers 30X coverage, a typical depth, it means that each base pair is read an average of about 30 times in order to more accurately piece together the actual chromosomal sequences. Each of these scans is called a "read," and all of them are recorded. The data points alone, then, would number about 9.2x10^10, or 92 billion per genome sequenced.
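That 92-billion figure is just genome size multiplied by coverage depth; a quick check, again using a round 3.1 billion base pairs:

```python
# Rough data-point count for a 30X whole genome sequence.
genome_size = 3_100_000_000   # ~3.1 billion base pairs
coverage_depth = 30           # average number of reads covering each position

base_observations = genome_size * coverage_depth
print(f"{base_observations:.1e} base observations")   # roughly 9e+10, i.e. over 90 billion
```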
However, there are ways to manage that--perhaps like a Burrows-Wheeler transform--where the initial comparison could be done with a much smaller dataset, and then the nucleotide-by-nucleotide comparison performed only on demand and only for a single, defined segment. We aren't there yet...but I digress.
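For the curious, here's a minimal, purely illustrative Burrows-Wheeler transform, nothing like the optimized FM-index machinery real sequence aligners use, but it shows the core trick: the transform is reversible and tends to group identical characters into runs, which is what makes compressed, on-demand lookups possible in real tools.

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, terminator="$"):
    """Rebuild the original string by repeatedly prepending and sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith(terminator))
    return original.rstrip(terminator)

sequence = "GATTACAGATTACA"
encoded = bwt(sequence)
print(encoded)                 # identical bases tend to cluster into runs
print(inverse_bwt(encoded))    # GATTACAGATTACA
```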
The other major sea change I believe we need is an adoption of a pangenomic approach to our genomic references. Some of the same team members who accomplished the first full sequencing of a genome last year, the Telomere-to-Telomere Consortium, are also spearheading the call to move toward pangenomics. I won't bore your other sock off; it's easy to Google. But suffice it to say that the vast majority of the results from the Human Genome Project used just one person's DNA when the project was declared completed in 2003. Our current GRCh38 reference genome is derived from just 19 people. Kind of shocking, really.
As we saw with the rapid increase in the number of cataloged SNPs, humans have more variability than we thought...and that hasn't yet extended to the heritable chromatin and epigenetic material that surrounds the DNA strands and exerts significant control over what is activated and what is suppressed. We're coming to think of the DNA itself less as a blueprint, as did Watson and Crick and company, and more as an amalgam of structured raw materials. Some of it works just as it is, and some needs the detailed construction plans of epigenetics to shape and control how it works.
Many are beginning to feel that the concept of a single reference genome is approaching obsolescence. Genealogy stands to be affected. For example, we already know that certain pile-up regions--fairly short chunks of DNA where far more people than statistically expected display as identical--differ among continental-level populations. A pile-up region that shows up in Western Europeans may not be there in Southeast Asians.
Likewise, our rough calculation of genetic distance, the centiMorgan, is based solely on that genomic reference map. It tries to estimate the likelihood, given two locations on the same chromosome, that a crossover, or recombination, will happen between those locations the next time meiosis creates gametes. But it's doing it over a single map that's trying to account for the variances among a population of 8 billion individuals.
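For a feel of what that estimation involves, here's a small sketch using two classic textbook map functions (Haldane's and Kosambi's, my choice of illustration, not necessarily what any testing company uses) that convert an observed recombination fraction between two loci into a distance in centiMorgans; Kosambi's differs from Haldane's precisely because it tries to account for the crossover interference mentioned in the next paragraph.

```python
import math

# Convert a recombination fraction r (the observed probability of a crossover
# between two loci in a single meiosis) into a map distance in centiMorgans.
# Haldane assumes crossovers occur independently; Kosambi adjusts for interference.

def haldane_cm(r):
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

for r in (0.01, 0.10, 0.20):
    print(f"r={r:.2f}: Haldane {haldane_cm(r):5.2f} cM, Kosambi {kosambi_cm(r):5.2f} cM")
```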
We already know there are around 50,000 recombination "hotspots" that aren't taken into account in GRCh37, and that the cM calculation also doesn't do a stellar job of accounting for something during meiosis called "crossover interference." We also know that, in males, positions of likely crossover change as they age due to a process related to DNA methylation. Technically, centiMorgans should be calculated a bit differently for a male 22 years old than for the same male at 52. By the way, this phenomenon doesn't impact females because the oocytes, the forerunners of the egg cells, form and undergo most of the first of two stages of meiosis while the female herself is still a fetus. When she is born, all the egg cells she will ever produce have already undergone recombination, so the age at which she gives birth isn't affected the same way by DNA methylation.
I digressed way further.
Bottom line is that we genealogists often view and use the data and tools we have today as if we're working with solid, highly specific, long established and thoroughly tested science. We aren't. It's evolving rapidly. Very rapidly. The hybrid techniques that allowed for the first full genome sequencing in May 2021 were not available in 2020. Those specific techniques are not available for direct-to-consumer purchase yet, at any price.
To put it in terms of another technology, today we have home computers that would exceed the capabilities of what were mainframes not all that long ago. Comparatively, our use of genetics for genealogy is about at the point where home computers first came with a small, preinstalled hard drive and we no longer had to use a floppy disk to boot them. Our average microarray test totals around 650,000 points of data (complete coincidence: the IBM PC was introduced with 640K of RAM, random access memory). A 30X whole genome sequence provides over 90 billion. We're using a single reference map of the human genome that was derived from only 19 individuals out of 8 billion living today.
Much is left to be done, and in the meantime we genealogists need to be critical of results that go too deep, that have us working beyond the cutting edge of the actual science.