Part 2
Gotcha #3
Obviously, if we can have thousands of mitochondria inside a single human cell, they have to be quite literally minuscule. And they are.
The DNA of our friendly little organelle, the mitochondrion, contains only 16,569 (give or take a deletion or insertion) base pairs: the paired nucleotides, or pairs of "letters," of DNA. As DNA goes, the whole thing is about 400 times too small to even register as a single segment using our current direct-to-consumer autosomal DNA testing technology. Included in that tiny molecule are a regulatory region and 37 genes that code for 13 polypeptides, 22 tRNAs, and two rRNAs. These account for over 80% of all the DNA base pairs in mtDNA, and mutations there often mean trouble for the host human cells.
Most markers useful for genealogy occur in the other 20% of the mtDNA, where mutations don't risk the viability of the organism. By comparison, the Y chromosome is about 57.2 million base pairs long and contains around 107 protein coding genes. The end result is that there isn't much room available for mtDNA to mutate.
That's a factor that makes mtDNA haplogroups look much more confusing than yDNA haplogroups. With yDNA, the single nucleotide polymorphisms, or SNPs, are pretty straightforward and hierarchical. That means if you have, for example, the SNP BY3332 tested as positive, you'll also be positive for its parent, ZZ12_1, and its parent DF27...each is unique to its branch in the haplotree.
But it doesn't work that way for mtDNA. Our H4a1a1a haplogroup is defined by A73G! This means that we have guanine at position 73 instead of adenine. The exclamation point designates a marker that is considered non-phylogenetic for the haplogroup; specifically, it flags a "back mutation," where a certain value expected within a main haplogroup is not present. So we're already pretty confusing, aren't we?
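To make the notation a little less mysterious, here is a minimal Python sketch that pulls a label like A73G! apart into its pieces. The `Variant` type and `parse_variant` function are names I made up for illustration, and the pattern only handles simple one-letter substitutions; insertions, deletions, and other notations you'll see in real reports aren't covered.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    """One simple substitution, e.g. A73G! (hypothetical helper type)."""
    expected: str        # base expected at this position (A in A73G)
    position: int        # position in the mitogenome (73 in A73G)
    observed: str        # base actually present (G in A73G)
    back_mutation: bool  # True when the label carries a trailing "!"


# Assumption: a label is <base><position><base> plus optional "!" marks.
_LABEL = re.compile(r"^([ACGT])(\d+)([ACGT])(!*)$")


def parse_variant(label: str) -> Variant:
    """Split a substitution label into its parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    expected, position, observed, marks = match.groups()
    return Variant(expected, int(position), observed, bool(marks))


print(parse_variant("A73G!"))
print(parse_variant("A10044G"))
```

The only point here is that the letter before the number is what's expected, the letter after is what's there, and the exclamation point is part of the label rather than punctuation.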
Being H4a1a1a means we're also H4a1a1, and that branch is defined by A10044G. Likewise, we're also H4a1a, defined by G8269A; and we're H4a1, defined by C14365T. The H4 branch is defined by A4024G and A14582G.
But wait. A73G! currently appears 12 different times in the mtDNA haplotree! It's also a defining mutation for H13a2b3, H17c, H1a, H1e2c, H32, and others. How is that possible? Unlike yDNA, where the SNPs are (typically) unique to a given haplogroup and branch of the tree, with mtDNA--because there are far, far fewer possible mutations--the haplogroups are defined by the total aggregation of their mutations, not individual ones. For example, the A10044G that defines our H4a1a1 is also a defining variant of the haplogroup L3h1b...not even in the same basal "H" clade as you and me.
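A toy Python sketch may make that "aggregation" idea more concrete. It uses only the branches and defining variants named above; the `TREE` table, `signature`, `branches_with`, and `best_match` are hypothetical names of my own, not anything from a real haplotree tool. The table is deliberately incomplete: everything upstream of H4, the intermediate H4a branch, and the rest of L3h1b's ancestry and defining variants are left out because they aren't listed here.

```python
# Toy haplotree: name -> (parent, variants that define this branch).
# Assumption: a tiny, incomplete excerpt built only from the variants
# mentioned in the text; the real tree has thousands of branches.
TREE = {
    "H4": (None, {"A4024G", "A14582G"}),
    "H4a1": ("H4", {"C14365T"}),
    "H4a1a": ("H4a1", {"G8269A"}),
    "H4a1a1": ("H4a1a", {"A10044G"}),
    "H4a1a1a": ("H4a1a1", {"A73G!"}),
    "L3h1b": (None, {"A10044G"}),  # other defining variants omitted
}


def signature(haplogroup):
    """Collect every defining variant from this branch up to the top."""
    variants = set()
    node = haplogroup
    while node is not None:
        parent, defining = TREE[node]
        variants |= defining
        node = parent
    return variants


def branches_with(variant):
    """List every branch that uses this variant as a defining one."""
    return sorted(name for name, (_, defining) in TREE.items()
                  if variant in defining)


def best_match(observed):
    """Pick the branch with the largest signature fully present in observed."""
    candidates = [name for name in TREE if signature(name) <= observed]
    return max(candidates, key=lambda name: len(signature(name)), default=None)


print(branches_with("A10044G"))            # one variant, two unrelated branches
print(best_match(signature("H4a1a1a")))    # the whole set points to one branch
```

Looking up A10044G by itself lands on two unrelated branches, while the accumulated set of variants points to just one. That is the sense in which mtDNA haplogroups are defined by the whole collection rather than by any single marker.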
Gotcha #4
The biggest problem with mtDNA in general is that its germline mutation rate is so slow that its use as a positive form of evidence in genealogy is really, really tricky.
One quick (and admittedly fairly hyperbolic) example there is that the entire mtDNA haplotree contains (currently; this may change in a year or so) a total of 5,468 branches. So that's the total number of haplogroups that have been identified. We know that, for instance, our H4a1a1a haplogroup means that we would also show up as H4a1a1, H4a1a, H4a1, and so on. But let's ignore that for the moment and assume there are 5,468 distinct haplogroups in the world. The current global population is 8.03 billion people. If everything were neatly averaged out, that means the world has about 1.47 million people in every mtDNA haplogroup. That's roughly the population of San Antonio, Texas, or Ganzhou, China.
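For anyone who wants to check the back-of-the-envelope math, here it is as a few lines of Python. The two inputs are the figures quoted above; the variable names are just mine.

```python
# Figures quoted in the text above.
world_population = 8_030_000_000   # about 8.03 billion people
haplotree_branches = 5_468         # current count of mtDNA haplotree branches

# Deliberately naive: pretend every branch is a distinct haplogroup
# and that people are spread evenly across all of them.
people_per_haplogroup = world_population / haplotree_branches

print(f"{people_per_haplogroup / 1_000_000:.2f} million people per haplogroup")
```

The even-spread assumption is, of course, the hyperbolic part; real haplogroups vary enormously in size.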
A lot of presumptive positive matches might be made where the DNA evidence, on its own, can't really support them. In the Group Projects I admin at FTDNA, my all-time leader in reported mtDNA full-sequence DNA matches (again, a distance of 0 through 3) has over 1,200 matches and gets at least a few new ones every month.
But with mtDNA, the reverse problem can also happen: false negatives, meaning matches that really do exist but that you may never know about. This goes back to the matter of our not having a single mitogenome throughout our body cells, but multiple mtDNA "signatures."
Perhaps the best and most clearly explained example of this comes from noted genetic genealogist Dr. Blaine Bettinger, in an April 2018 article where he explains why he and his biological mother don't show as being a match at FTDNA. That means the test results came back and he and she were at a "genetic distance" of four or greater.
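As a rough illustration of how a distance cutoff can hide a real relationship, here is a simplified Python sketch. It assumes genetic distance is just the count of variants that two result lists don't share, and it uses the 0-through-3 reporting window mentioned earlier. How a testing company actually counts differences (heteroplasmies in particular) is more involved and isn't modeled here; the function names and the placeholder variant labels are all hypothetical.

```python
MATCH_THRESHOLD = 3  # matches are reported at a distance of 0 through 3


def genetic_distance(results_a, results_b):
    """Count variants present in one result set but not the other.

    Assumption: a simple set difference, not any company's real method.
    """
    return len(set(results_a) ^ set(results_b))


def is_reported_match(results_a, results_b, threshold=MATCH_THRESHOLD):
    """True when the two results fall inside the reporting window."""
    return genetic_distance(results_a, results_b) <= threshold


# Placeholder labels, not real test results.
shared = {"v1", "v2", "v3", "v4", "v5"}
person_a = shared | {"x1", "x2"}
person_b = shared | {"y1", "y2"}

print(genetic_distance(person_a, person_b))
print(is_reported_match(person_a, person_b))
```

In this made-up case the two people share almost everything, but four differing calls are enough to push them out of each other's match lists.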
The one inaccuracy I'll note in the article is that Blaine writes, "We have something like 3 trillion cells in our body, almost all with a copy of our DNA." The number of cells in the body--depending on the size of the person and not counting bacteria, our microbiome--will range from around 30 trillion to 40 trillion based on work published in Molecular Biology of the Cell (Roy and Conroy, 2018); the authors arrived at an average of 37 trillion. And over 80% of those are red blood cells; platelets have mitochondria, but mature red blood cells have neither mitochondria nor nuclei.
The good thing about that false negative was that, because they are mother and son, Blaine had all the raw data and could figure out what was going on (though he also recruited a cousin to help get to the bottom of it). It may be a bit in-depth, but it's a useful read for those diving into mtDNA, and it also shows the various notations reported when heteroplasmies are detected, plus the use of YSEQ in Germany as second-opinion testing in unusual cases.
The false-negative situation is very probably rare, but Blaine's example shows clearly that it exists. The difficulty this presents in genealogical research is that you may have a solid, paper-trail hypothesis about an in-common matrilineal ancestor but you and a cousin both get a full-sequence mtDNA test and...no match. Even as a form of negating evidence, if the documentary evidence is good, it may still require multiple test-takers in order to use mtDNA evidence effectively.