Haplotrees and STRs: how do they relate?

+2 votes
140 views

On my wall I have an electron microscope image of 23 fuzzy X-shaped things called chromosomes. Number 23 in a human is the male Y chromosome, which I read somewhere is about 1 m long (wrong, probably less than 50 mm) if you could both stretch it out and see it. Along this meter (50 mm) an expert picks specific points to test: how do they pick the points, and how many points are there? Somehow they come up with my Y-haplogroup tree: R_M269, also called R1b. In the appropriate Family Tree table, this breaks down into something in the order of 300 steps, a single example being R1b > U125 > L2 > Z376.

Another FT table shows my Y Short Tandem Repeat (STR) table. It features largely DNA_Y-chromosome_Segment (DYS) numbers, also a few CDY and Y-GATA numbers. Are these at the same locations on the chromosome as the R1b numbers and, if YES, which DYS numbers relate to (say) Z376? If they are not related, are haplogroups and STRs related at all? I have done Y-37, and Y-700 is due any day.

at-DNA is a bit easier to envision, as the FT table for that shows the other 22 chromosomes in dark blue with orange areas or stripes showing where I match with a selected person. Many of these chromosomes are much longer than Y and the matching strings can be very long, especially for close relatives. Bearing in mind there are millions of genes on a chromosome, how is it possible to make these matches so easily?

Can at-DNA be used to determine which parent the match comes from? Matches related to parent-1 would logically be quite different to matches for parent-2. In Family Finder I have about 2250 matches including '5th to remote cousins' and it would seem to be quite easy for FT to define them as related to parent-1 or parent-2. It would be easy for me to determine which parent by checking the list for known relatives who have tested. Could FT create two lists automatically rather than one as they do now?

Clarification on any of these questions would be greatly appreciated, particularly if expressed at "DNA for dummies" level.

asked in The Tree House by Alan Upritchard G2G1 (1.5k points)
edited ago by Alan Upritchard

4 Answers

+1 vote
Yes, please, a DNA explained for Dummies, a DNA 101 or even an Intro Course for DNA 101, 102. Something really, really simple, please ... those haploids sound Finnish and steps makes me think of steppes and Russia and there's cousins of R2D2 in there (shades of) ... just plain words, nothing technical, please.
answered by Susan Smith G2G6 Mach 2 (20.4k points)
+2 votes
Well they do say I think that all the Sykeses go back to a common ancestor, give or take.  Let's call him Adam Sykes, and let's suppose he had two sons, Bart and Colin.

Bart and Colin inherited Adam's yDNA, approximately, but never quite exactly.  If you could compare their full sequences, you'd find differences at a few random points.  The trick is to identify those points from the DNA of living testers by dividing them into 2 septs.  Doable but not trivial.

The two septs will then mutate independently, and you can repeat the trick to find the first division in the Bartines and the first division in the Colines.

But as you go down the tree, the amount of analysis multiplies, the number of testers needed multiplies, and the number of descendants interested in each finding divides.

Which is why you can't Google up the Sykes family tree as drawn by yDNA.  The technology exists, but the economics are impractical.

STRs are alternative DNA.  They're God's gift to genealogy - DNA as the testing companies would have designed it.  All families have their mutations at the same small set of locations, so you know where to look.

And the beauty of it is, although the SNPs and STRs are quite separate things and mutate independently, the SNP mutation tree and the STR mutation tree must both reflect the same underlying genealogy.

Which doesn't mean they'll be mirror images.  For instance, Bart and Colin might both have the same STRs, so the STR tree can't separate the Bartines and Colines.

But then Colin has two sons, Eddie and Freddie, and Freddie has an STR change.  So the STR tree can separate Bartines and Freddines.  At this point the careless analyst might think he's separating Bartines and Colines - but he isn't, because the Eddines are Colines, genealogically, but are still lumped with the Bartines, STR-wise.

So the STR tree will give a lumped view of the branching, but not lumped the way you'd like.  The step-changes in the numbers won't happen at quite the key genealogical points.  And the SNP tree, so long as it's incomplete, will also give a lumped view of the branching, but differently lumped.  Messy.

(There's massive scope for inferring wrong genealogy here.  That careless analyst might well decide that Eddie must have been the son of Bart.  It's very easy to turn those STR tables into little trees without realizing that the tree you've drawn isn't the only one you could draw.)
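
A toy sketch of the lumping described above may help. It assumes a single hypothetical STR marker with made-up repeat counts (14 and 15 are illustrative only), and it uses Bart, Eddie and Freddie as stand-ins for the Bartines, Eddines and Freddines. Grouping the same three lines by genealogy and by STR value gives two different pictures.

```python
from collections import defaultdict

# Hypothetical male-line tree from the example: son -> father.
FATHER = {"Bart": "Adam", "Colin": "Adam", "Eddie": "Colin", "Freddie": "Colin"}

# Made-up repeat counts at one STR marker; only Freddie carries a change.
STR_VALUE = {"Adam": 14, "Bart": 14, "Colin": 14, "Eddie": 14, "Freddie": 15}


def son_of_adam(person):
    """Walk up the tree to find which son of Adam a person descends from."""
    while FATHER[person] != "Adam":
        person = FATHER[person]
    return person


def group_by(key, people):
    """Group people by the value of an arbitrary key function."""
    groups = defaultdict(list)
    for p in people:
        groups[key(p)].append(p)
    return dict(groups)


lines = ["Bart", "Eddie", "Freddie"]

by_genealogy = group_by(son_of_adam, lines)
by_str = group_by(lambda p: STR_VALUE[p], lines)

print(by_genealogy)  # {'Bart': ['Bart'], 'Colin': ['Eddie', 'Freddie']}
print(by_str)        # {14: ['Bart', 'Eddie'], 15: ['Freddie']}
```

The STR grouping puts Eddie with Bart, which is exactly the trap for the careless analyst: both groupings are consistent with the data, but only the first reflects the real tree.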
answered by RJ Horace G2G6 Pilot (433k points)

Thanks RJH. At Y-37 I can see this, in that my first cousin (NZ) and an unknown similar surname person from the same N. Ireland geographic location and separated by at least 3 generations are an exact match. A third cousin already has two mutations (GD-2). A similar surname match from Canada is GD-1. Two similar surname matches from Australia and USA and 2 brothers with an unrelated surname are GD-2. Some have tested at 67 and 111 and the Genetic Distance increases accordingly. We are trying to get everyone to join the same Family Tree project and develop a picture of what the links are - a co-ordinator with a lot more knowledge than me!

+1 vote

Alan and Susan, in your plea for a simple explanation, I feel your pain. I started in genealogical DNA more than 10 years ago and I went through years of the same angst. The fact is, there are all sorts of attempted DNA for Dummies explanations and I have made presentations myself to try and help people. What I have found is that the learning curve of each person is a little different depending on what science one knows and what experience and goals one has. One important matter is understanding the differences between the types of tests. I made an attempt to explain one matter recently, the difference between Y-STR and Y-SNP. I'm afraid I haven't the time right now to offer specifics, but I'd be interested to know if my explanation HERE (third answer down) is of any help at all.

answered by Douglas Beezley G2G6 Mach 2 (21.4k points)
Sorry folks. Looks like I asked too many questions at the same time. Thanks to RJH and Douglas for their answers, but I think they are down the track a bit whilst Susan and I are still on the start line. Bear with us please.

Subsequent reading indicates the Y chromosome has about 100 genes averaging 590,000 bp (base pairs) and X is about 5 times the size with 1000 genes but averaging half the number of bp per gene (153,000 bp). There are 23 more autosomal chromosomes, some are much larger and will have many more genes than X. Is this adequate for high school genetics, or too far off the tracks? As for diploid, twice as big as haploid, let's not go there yet as the A T G C are just mirror imaged as T A C G.

I have Y-111 results and before long Y-700. They can't be genes, as there are only 100 genes on Y. So what are they? Something to do with base pairs? Looks like 'yes', as DYS393 for me has 13 STRs but DYS464 has 5-16-17-17 STRs. If each hyphen shows an area not repeating in tandem (?), why are the 4 repeating segments not given different codes? Family Tree have got to Y-700; how much further can they go: there are 59m bp on Y and since a typical STR seems to be 10 to 30 bp, there must be infinite room for more?

Now the HAPLOGROUP problem. Are haplogroups identified by genes or STRs? Presumably the former, so there is no relationship between GENES and STRs. Can an STR overlap 2 adjacent genes or are they within a single gene?

I did not introduce SNiPs (Single Nucleotide Polymorphism) above, but here they are. Can a SNP only occur within an STR? (As an irrelevant aside, why did SNP get called SNiP but STR not STiR?)
Genealogical genetics is more abstract - it doesn't really need to know the high school stuff (who does).  It only needs to know there are hereditary mutations.

A gene is a sequence of base-pairs that can be read off by the cell machinery as the code to make a protein.  If genes mutate, they don't work right.  That's how they make fruit flies with legs growing out of their heads.  So genes don't mutate much.  Established variants exist where the difference happens to be non-fatal, but some of those are linked with hereditary diseases.

But a chromosome also contains a lot of non-coding DNA - old junk.  This mutates more freely, as mutations are harmless.  This is what they test for genealogy.

An SNP is a mutation of a single base-pair.  For practical purposes this is unique.  It happens in one person.  If it's on the Y chromosome, all his male-line descendants will have it.  And they'll have it in combination with all the other mutations which that one person had inherited.  You don't expect it to turn up somewhere else in the clan in a different combination.

An STR is a short section of junk that is prone to to get repeated, like the word "to" just then.  So where there was once TGACG, there might now be TGACGTGACG... repeated 17 times over.  Particular known locations are vulnerable to this, so you aren't looking for needles in haystacks as you are with SNPs.  The repeated sequence is unimportant, but the number of repeats is a parameter that mutates in a hereditary pattern.
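
As a minimal sketch of what "number of repeats" means, the function below counts the longest run of back-to-back copies of a motif in a string of bases. The function name and the flanking bases are made up for illustration; only the TGACG motif and the 17 repeats come from the description above, and real test pipelines work quite differently.

```python
def longest_tandem_run(sequence, motif):
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Made-up flanking bases around 17 copies of the motif.
father = "CCAT" + "TGACG" * 17 + "GGTA"
# A hypothetical son in whom the marker has gained one repeat.
son = "CCAT" + "TGACG" * 18 + "GGTA"

print(longest_tandem_run(father, "TGACG"))  # 17
print(longest_tandem_run(son, "TGACG"))     # 18
```

The repeated letters themselves are the same in father and son; the only thing that differs, and the only thing reported, is the count.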

Y-111 looks at 111 locations where STRs are found and tells you how many repeats you have.

Y-700 looks at a sample of 100K base-pairs hoping to hit on some SNPs that might be significant in your clan.

Haplogroups are defined by SNPs.  In fact they're practically synonymous with SNPs.

But STRs can predict haplogroups.  Basically they just say, we determined the haplogroup of a few other people with STR patterns very similar to yours, and this is what they were.  You'll be the same, because you will have inherited both the STRs and the haplogroup from the same common ancestor, whoever he was.
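
The prediction idea can be sketched as a nearest-neighbour lookup. Everything here is an assumption for illustration: the repeat counts are made up (apart from DYS393 = 13, mentioned earlier in the thread), "placeholder-B" is not a real haplogroup, and the distance is a simple sum of step differences rather than whatever rules a testing company actually applies.

```python
def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over markers both profiles share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def predict_haplogroup(tester, reference):
    """Return (haplogroup, distance) of the closest reference STR profile."""
    best = min(reference, key=lambda r: genetic_distance(tester, r["strs"]))
    return best["haplogroup"], genetic_distance(tester, best["strs"])


# Hypothetical people whose haplogroup was confirmed by SNP testing.
reference = [
    {"haplogroup": "R-M269",
     "strs": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}},
    {"haplogroup": "placeholder-B",
     "strs": {"DYS393": 14, "DYS390": 22, "DYS19": 15, "DYS391": 10}},
]

# Hypothetical new tester with STR results only.
tester = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}

print(predict_haplogroup(tester, reference))  # ('R-M269', 1)
```

The result is a prediction, not a measurement: it is only as good as the reference set, which is why an SNP test can later refine or correct it.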
I did go back to high school (YouTube 'Crash Course in Biology', about #9) and found I'm talking a load of junk above (pun intended). The presenter is excellent if you can follow delivery at machine gun speed and an American accent.

A gene is a region of DNA that codes for a protein. Only 2% of a chromosome does this. A further 20% is described as 'regulatory genes' leaving 78% junk, an unfortunate term as current research seems to be gradually proving this incorrect. Y has about 100 genes, presumably the 2%. It is not clear if 'regulatory genes' are included in the 100 or if they are new terminology, yet to be included in the FT-DNA glossary.

Your clarification that we are looking only at 'junk' regions is very helpful, as is the haplogroup comment, though the 'regulatory gene' region still seems equivocal. Your STR description is as I understand it and now I can see SNPs fall at random across the entire 98% junk region (will regulatory genes work if mutated?). So the male descendants of Bart, Colin, Eddie and Freddie are appearing out of the mist. It is also clearer how Y-700 could improve the estimate of 'cousin distance' - a question I also asked in "The Tree House". I estimated it would need in the order of 50 like surname Y-37 'cousins' to form the base of my 'big triangle' with Adam at the summit, maybe 10 generations ago. With 8 matches so far it might be a challenge!
+1 vote

Hello Alan, 

You said “R_M269, also called R1b. In the appropriate Family Tree table, this breaks down into something in the order of 300 steps, a single example being R1b > U125 > L2 > Z376.”

R_M269 should be written as R-M269, and the breakdown should be

M269 > U125 > L2 > Z376.

Using autosomal DNA for genealogy is actually more complex.

answered ago by Peter Roberts G2G6 Pilot (447k points)
edited ago by Peter Roberts

Peter - good to see you commenting on this question. I still seem to have some way to go yet to get my ducks in a row. I thought R-M269 was a Y-DNA haplogroup? With my Y-700 results just posted, I have changed from R-M269 to R-CTS10029. With Y-700 SNP, the Y-37 STR result and 'Z376' seem to now be redundant? Or is Z376 still relevant to Y-700?

With regard to my mtDNA atDNA haplogroup I can "GET MINE" for another USD $200, but don't see any value in doing that as my real interest is the paternal line - Upritchard and related spelling. It seems atDNA runs out at 5 generations (recombinant - halving each generation) and would have no relevance in the long paternal ancestor time line ?
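
The halving can be put into numbers with a short sketch. It shows the average share of autosomal DNA expected from a single ancestor a given number of generations back (parents count as 1); the function name is made up, and the real share for any one ancestor varies around this average because recombination is random.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA inherited from one ancestor n generations back."""
    return 0.5 ** generations_back


for n in range(1, 8):
    print(f"{n} generations back: {expected_share(n):.2%} on average")
```

Y-DNA does not halve in this way, since it passes from father to son essentially intact.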

R-M269 is a Y haplogroup.  Your Y haplogroup has been refined to (or is more precisely) R-CTS10029.  Its ancestor is R-Z376 and an earlier ancestor is R-M269.
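
One way to picture this is as a chain of parent links. The sketch below uses only the branch names quoted in this thread and treats each as the direct parent of the next, which is a simplification: the real tree has many intermediate steps that are omitted here, and the names `PARENT`, `lineage` and `is_ancestor` are made up for illustration.

```python
# Branch -> parent branch, using only names quoted in this thread.
# Intermediate branches are omitted, so these are not true direct parents.
PARENT = {
    "R-CTS10029": "R-Z376",
    "R-Z376": "R-L2",
    "R-L2": "R-U125",
    "R-U125": "R-M269",
}


def lineage(haplogroup):
    """Return the path from the oldest known branch down to `haplogroup`."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def is_ancestor(older, younger):
    """True if `older` lies above `younger` on the same line."""
    return older in lineage(younger)[:-1]


print(" > ".join(lineage("R-CTS10029")))
# R-M269 > R-U125 > R-L2 > R-Z376 > R-CTS10029

print(is_ancestor("R-Z376", "R-CTS10029"))  # True
print(is_ancestor("R-CTS10029", "R-Z376"))  # False
```

So a newer, more specific label does not make the older ones wrong; it sits further down the same line.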

There are no atDNA haplogroups.


...