The Use of Gedmatch in Law Enforcement - Dealbreaker?

There has been a lot of news coverage on the use of Gedmatch in identifying suspects in the Golden State murders case.  While I am happy to see that this might have cracked this cold case, the fact that Gedmatch was used for this has raised a significant ethical concern for me.

Just some background: I am an avid Gedmatch user and have convinced 17 of my family members (including both of my parents) to allow me to upload our tests into the database for use in my family history research.  I am such a huge supporter of Gedmatch that I have been a regular monthly donor going on two years now.

I just came across this press release that really concerns me though:

Basically, Parabon is a private company that will now specialize in using genetic genealogy in law enforcement cases (for a $3,750 fee) across the United States.  In the press release, the company reports that it already has more than 100 investigations with agencies across the U.S.  With 100+ investigations, it sounds like this is becoming a fairly common practice.

With this revelation, I can't in good conscience keep my family's kits public on Gedmatch.  None of my 17 family members consented to having their tests used in regular law enforcement investigations - much less in law enforcement investigations that are conducted through a private company.  Also, with the thought of more than 100 active law enforcement cases, I can't envision a scenario in which I would be willing to try and persuade any of my family members that there are no risks involved.  I just switched each of their 17 kits to "Private" on Gedmatch so that they cannot be accessed in such investigations (specifically - the most basic Gedmatch one-to-one and one-to-many searches are the most common Gedmatch searches used in law enforcement investigations).

I am also feeling a great sense of dissonance over making my own test available on Gedmatch.  As Columbia University's Yaniv Erlich observed recently, each of our tests is a "...beacon who illuminates 300 people around you."  I am bothered by the thought that my Gedmatch test could easily be used in familial searches in legal investigations against any of my family members - from close relatives to second and even third cousins whom I have never met.

And, make no mistake, I completely understand the reality that anyone (myself included) who has U.S. ancestors from within the past 3-5 generations is already part of the Gedmatch database.  I just feel uncomfortable at the thought of my test being used against family members.

Bottom line, the unfettered use of Gedmatch in law enforcement investigations (with a private company making such investigations available for $3,750) has me concerned.  While I see the legitimacy of using this testing in serious "cold cases" that have gone unsolved for many years, the possibility of this practice having negative consequences (e.g., my test being used to help build a case against a family member who is being wrongfully-accused of something and the family member having to prove their innocence) has led me to make my Gedmatch test private as well.

I'm still taking a wait and see approach though, so I am only making the tests private to remove them from one-to-one and one-to-many searches.  I am not going to delete the tests, in hopes that standards will be created for the use of Gedmatch in law enforcement investigations.

How about you - is the revelation that familial searches on Gedmatch have become a fairly common law enforcement practice a "dealbreaker" for you having your test on Gedmatch?


I think it is great as well. The problem I see is this: what is to stop someone from developing a GEDMATCH profile to spit out all the matches that are Jewish? Or perhaps a report for everyone who might be a diabetic? You don't need an actual DNA sample to create these profiles, just some know-how and a text editor. Hopefully we can stop abuse with government oversight, but government regulation of businesses is on the downturn.

But it doesn't work quite that way. You would need to be able to search against those fake profiles for very specific allele combinations, and GEDmatch can't do that. I'm on less firm footing here about admixture/ethnicity than on protein-producing genes, so I'll stick to that; the principle, though, will be similar.

Your genome has a little over 3 billion base pairs, and the typical genetic genealogy test looks only at about 700,000 of those, or about 0.023%. There are a lot of areas along the chromosomes that have no known direct effect on us; those areas seem to have no active function and are often referred to as "junk DNA." A lot of those 700,000 base pairs the genealogy tests look at are in junk sectors for a very good reason: alleles there are more freely able to mutate without causing possible harm to the organism, and those are places we can look at to help differentiate the otherwise 99.9% DNA that all human beings share.
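For anyone who wants to check the percentage, here is a quick sketch of the arithmetic. It uses only the two round figures quoted above (3 billion base pairs, 700,000 tested positions), so the result is approximate.

```python
# Round figures quoted in the post; both are approximations.
GENOME_BASE_PAIRS = 3_000_000_000
TESTED_SNPS = 700_000

# Share of the genome that a typical genealogy test actually reads.
percent_tested = TESTED_SNPS / GENOME_BASE_PAIRS * 100

# Average spacing between tested positions, if they were spread evenly
# (they are not; this is only a rough sense of scale).
average_gap = GENOME_BASE_PAIRS // TESTED_SNPS

print(f"{percent_tested:.3f}% of the genome is tested")
print(f"roughly one tested position per {average_gap:,} base pairs")
```

In other words, on average there are thousands of untested base pairs between one tested position and the next.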

The protein-producing genes that affect bodily structure and function typically vary in length from just a few thousand base pairs to, rarely, about 2 million. A very few are larger than that. GEDmatch can't report on information down to the gene level. And within that protein-producing gene, a mutation that changes its coding is the result of only a few alleles or even a single one, a single base pair.

There's a massive amount of assumptive math that goes on in genetic genealogy that's unique to the field. Medical and forensic genealogy isn't concerned with trying to estimate crossover frequencies and guess at possible relationships in generations. Heck, even the start and stop points for a physical segment you see reported by GEDmatch (and others) do not reflect actual base pairs and alleles. They're estimates only, because the data are working with defined SNPs that represent, again, only about 0.023% of the entire genome. The start and end points of a segment are estimated by the closest matching SNP...and that could be thousands of base pairs from the actual loci.

We search and evaluate based on centiMorgans. This isn't a physical measurement at all. It uses linear extrapolation to estimate the recombination frequencies based on the location on a particular version of a human genome map, male or female...and the two differ significantly. The centiMorgan is a way to estimate and present relative genetic distance in terms of relatedness, and the computed values differ greatly depending upon which chromosome is considered, and the (again, estimated) start and stop points on that chromosome. One cM may be equivalent to tens of thousands of physical base pairs in one place (rare), or a few million base pairs in other places. An unusably tiny, to genealogists, 3cM segment might well contain 10 million base pairs.
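To make the cM-versus-base-pair point concrete, here is a minimal sketch of how a physical position can be converted to centiMorgans by interpolating between the rows of a genetic map. The map below is invented for illustration (real maps have many thousands of rows and differ by sex and by map version), and the names TOY_MAP, cm_at and segment_cm are hypothetical, not taken from any tool.

```python
from bisect import bisect_right

# Hypothetical toy map for one chromosome:
# (physical position in base pairs, cumulative genetic position in cM).
TOY_MAP = [
    (0, 0.0),
    (5_000_000, 1.0),     # slow region: 5 million bp per cM
    (5_200_000, 3.0),     # hotspot: 100,000 bp per cM
    (15_200_000, 6.0),    # slow again: about 3.3 million bp per cM
]

def cm_at(position_bp, genetic_map=TOY_MAP):
    """Linearly interpolate the cM value of a physical position."""
    positions = [bp for bp, _ in genetic_map]
    if not positions[0] <= position_bp <= positions[-1]:
        raise ValueError("position outside the map")
    i = bisect_right(positions, position_bp) - 1
    if i == len(genetic_map) - 1:          # exactly on the last map row
        return genetic_map[-1][1]
    (bp0, cm0), (bp1, cm1) = genetic_map[i], genetic_map[i + 1]
    return cm0 + (cm1 - cm0) * (position_bp - bp0) / (bp1 - bp0)

def segment_cm(start_bp, end_bp, genetic_map=TOY_MAP):
    """Genetic length of a segment given its physical end points."""
    return cm_at(end_bp, genetic_map) - cm_at(start_bp, genetic_map)

slow = segment_cm(0, 5_000_000)               # 5 million bp long
hot = segment_cm(5_000_000, 5_200_000)        # only 200,000 bp long
wide = segment_cm(5_200_000, 15_200_000)      # 10 million bp long
print(slow, hot, wide)
```

On this made-up map the 200,000 bp stretch is genetically longer than the 5 million bp stretch, and the 10 million bp stretch comes out at only 3 cM, which is the kind of mismatch between physical and genetic length described above.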

And the "integrity" of the reported segment is also an estimate. You know when you go to look at one-to-many or one-to-one matches in GEDmatch, it has one field to set the minimum number of SNPs and another to set the maximum "mismatch bunching" limit? The integrity of a physical segment is assumed if a minimum number of SNPs match sequentially (and no-calls, base pair values that came out of the test with a null value--there typically are around 0.5% to 1.5% of these--are ignored in the evaluation) with some wiggle-room from that mismatch bunching limit. A segment computed to be 7cM may have, say, only 1,000 SNPs tested along a stretch of chromosome 10 million base pairs long. So we're guessing that all those millions of intervening base pairs are identical because the relatively few that we sample, the SNPs, are, mostly at least, identical in a contiguous string.
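Here is a rough sketch of that idea in code. To be clear, this is not GEDmatch's actual algorithm, which I haven't seen; it is a simplified illustration that assumes two SNPs "match" when the genotypes share at least one allele, that no-calls are skipped, and that a lone mismatch is forgiven only if enough matching SNPs came before it. The function and parameter names are made up.

```python
NO_CALL = "--"

def half_identical(g1, g2):
    """True when two genotypes (e.g. 'AG' and 'AA') share an allele."""
    return bool(set(g1) & set(g2))

def find_segments(kit_a, kit_b, min_snps, mismatch_bunching):
    """Return (first_index, last_index, matching_snps) per candidate segment.

    min_snps          -- minimum matching SNPs for a run to be reported
    mismatch_bunching -- matching SNPs required since the previous
                         mismatch before another mismatch is tolerated
    """
    segments = []
    start = last_match = None
    matched = since_mismatch = 0
    for i, (g1, g2) in enumerate(zip(kit_a, kit_b)):
        if NO_CALL in (g1, g2):
            continue                          # no-calls are ignored
        if half_identical(g1, g2):
            if start is None:
                start = i
            last_match = i
            matched += 1
            since_mismatch += 1
        elif start is not None and since_mismatch >= mismatch_bunching:
            since_mismatch = 0                # isolated mismatch: tolerate
        else:
            if matched >= min_snps:           # run is broken: report it?
                segments.append((start, last_match, matched))
            start = last_match = None
            matched = since_mismatch = 0
    if matched >= min_snps:                   # run still open at the end
        segments.append((start, last_match, matched))
    return segments

# Tiny invented example: 12 SNPs, two opposite-homozygote mismatches
# (index 4 and 11) and one no-call (index 8).
kit_a = ["AA"] * 12
kit_b = ["AA"] * 4 + ["GG"] + ["AG"] * 3 + ["--"] + ["AA"] * 2 + ["GG"]

loose = find_segments(kit_a, kit_b, min_snps=6, mismatch_bunching=3)
strict = find_segments(kit_a, kit_b, min_snps=6, mismatch_bunching=5)
print(loose, strict)
```

With the looser setting the whole stretch is reported as one segment; with the stricter setting the same data yields no segment at all. Nothing about the DNA changed, only the thresholds, which is why the reported segment is an estimate.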

That's the granularity and, frankly, "iffiness" with which genetic genealogy works. GEDmatch can't find individual genes, much less the individual allele mutations that might affect a medical condition. If a few alleles impact a gene and those base pairs are not a tested SNP, they could change all day long and estimates used to arrive at our genealogically-relevant segments would never know it.

Specific example. Diabetes is not thought to be a genetic disease. However, studies have shown that the risk of developing Type 1 may be increased by particular variants of the HLA-DQA1, HLA-DQB1, and HLA-DRB1 genes that live in the HLA (human leukocyte antigen) region mostly in chromosome 6. In genetic genealogy, this is a well-known "pile-up" region, meaning that most humans "match" there in the current technology of our testing. The HLA genes provide instructions for making proteins that play a critical role in the immune system. Kind of a big deal, and wholesale mutations in that area mean not-so-good things for the survival of the organism. Ergo, most of us match along that area of chromosome 6. You could construct a fake genome that included appropriate values for HLA-DQA1, HLA-DQB1, and HLA-DRB1, and upload it to GEDmatch...with zero information resulting. 

I had to go look this up, without much to show for it. I found that the Illumina Human1M-Duo BeadChip--now discontinued and not a population (genealogy) purpose chip, but a medical one--looked at a grand total of four base pairs (technically, reference clusters) in the HLA-DQB gene; none in DQA or DRB. I can't find any indication the Illumina OmniExpress or GSA chips--the ones used in autosomal DNA testing today--test for any base pairs in the HLA-DQ or DR genes at all.

I know that diabetes was only a random, top-of-the head example. I'm not singling it out. I just wanted to go from the macro to the micro to illustrate that the reporting we can get from GEDmatch has pretty much zero possibility of linking results to disease, and that those reported results--unless using the for-purpose admixture tools--are also probably of no value for identifying ethnicity. Mind you, there are companies like Promethease who can take your uploaded raw data and provide some information about non-genealogical stuff, but that's what they do; they don't do genealogy. And you can't extract the medical stuff from GEDmatch.

Awesome answer Edison. I was wondering the same thing about mtdna tests.  Would any health information be released if I shared my mtdna haplogroup down to the finest group identified, or does disease happen at a much smaller level there as well?
Thanks, Lance. Much smaller level there, as well. If you had a full mtDNA sequence, there will be information in that complete, raw set of information that can be used for medical, evaluative purposes. But any haplogroup designation--even a rare one--is at a much coarser scale than could be applied medically. Even rare mtDNA haplogroups include many thousands if not millions of people.

If you took HVR1 and HVR2 panels only, the detailed results reported look only at places of variance from baseline "standards," the Cambridge Reference Sequence (rCRS) or the Reconstructed Sapiens Reference Sequence (RSRS). Those specific variances, shown at the allele level, are also not medically informative.

But the mitochondria aren't even part of the human genome. The little organelles are in every animal, and without them we wouldn't be able to produce energy in the form of ATP. They're so small that we have hundreds of them living inside each of our own cells (well, except red blood cells...different story). The mtDNA HVR1 testing region consists of base pairs numbered 16,001 through 16,569 (569 base pairs); HVR2 from 1 through 574 (574 base pairs). The third area now tested in full sequencing, the "coding region," looks at base pairs 575 through 16,000, or 15,426 base pairs. That's it. That's how tiny these lil' guys are compared to even our smallest chromosome.
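A quick way to check the region sizes is to count positions inclusively at both ends. The ranges below are the ones quoted above; counted that way they come to 569, 574 and 15,426 base pairs, which add up to the 16,569 positions of the reference sequence.

```python
# Position ranges as quoted in the post, inclusive at both ends.
REGIONS = {
    "HVR1": (16_001, 16_569),
    "HVR2": (1, 574),
    "coding region": (575, 16_000),
}

def inclusive_length(first, last):
    """Number of positions from first through last, counting both."""
    return last - first + 1

lengths = {name: inclusive_length(*span) for name, span in REGIONS.items()}
total = sum(lengths.values())

for name, n in lengths.items():
    print(f"{name}: {n:,} base pairs")
print(f"total: {total:,} base pairs")
```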

And most people have different...versions of mtDNA in their bodies. Even within a single human cell there might be a difference in the mtDNA genomes found there. This is called heteroplasmy. In fact, only about 15% of the time are mtDNA associated conditions the result of mitochondrial mutation. The relationship in the cell is so symbiotic that the other 85% of the time the causal factor is actually a human mutation and the nucleus of our cell generates proteins that are imported into the mitochondria. When our mtDNA is tested, multiple passes are performed to try to make certain the results are for the "average," baseline mtDNA that's in most every cell of our bodies.

I know it's an old thread but thought the following article about police use of DNA to be of particular interest. People wrongly convicted because of DNA evidence. How easily DNA is transferred. DNA transference implicating innocent people. It should be a real concern when considering law enforcement and others utilizing sites like Gedmatch.

Framed for Murder By His Own DNA

Ray:  In that same Science paper (Erlich et al., DOI: 10.1126/science.aau4832) they propose a useful solution.  If the legitimate genealogy companies were to digitally sign their raw DNA files with a public-private key pair, GEDmatch and others could tell if the sample had been done for genealogy purposes, and prove that it had not been altered since testing.  This would complicate the ability to exploit forensic or research samples.
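For the curious, here is a toy sketch of what signing and checking a raw file could look like. The scheme is my own illustration, not something specified above: it is unpadded "textbook" RSA built from two well-known Mersenne primes, which is fine for showing the idea and not secure for real use (a real deployment would use a standard signature scheme from a vetted library). The sample file contents and the rsID in it are invented.

```python
import hashlib

# Toy key pair. P and Q are known Mersenne primes; real keys are random.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                                   # public modulus
E = 65537                                   # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # private exponent (Python 3.8+)

def digest(raw_file: bytes) -> int:
    """SHA-256 hash of the file, as an integer smaller than N."""
    return int.from_bytes(hashlib.sha256(raw_file).digest(), "big")

def sign(raw_file: bytes) -> int:
    """Testing company: sign the file's hash with the private key D."""
    return pow(digest(raw_file), D, N)

def verify(raw_file: bytes, signature: int) -> bool:
    """Matching site: check the signature using only the public (N, E)."""
    return pow(signature, E, N) == digest(raw_file)

# Invented two-line raw data file.
raw = b"rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t12345\tAG\n"
sig = sign(raw)

print(verify(raw, sig))                          # untouched file
print(verify(raw.replace(b"AG", b"GG"), sig))    # one genotype edited
```

The point is that a file typed up in a text editor, or a lab file edited after the fact, would not carry a valid signature from a recognized testing company, so the matching site could refuse it.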

Too, GEDmatch's TOS now requires users to acknowledge that unforeseen genealogic and non-genealogic uses for the database may arise, including "Familial searching by third parties such as law enforcement agencies to identify the perpetrator of a crime, or to identify remains." The only remedy they offer is to remove your raw DNA file from the site.

I appreciate your sharing that article.  One comfort is that the justice system and forensic science do try to keep up with advances in science.  Fortunately, there has to be some nexus between trace DNA and a crime before it is useful to implicate someone. There are so many other ways to be wrongfully accused and go to jail...or prison.  For example, eyewitness testimony is notoriously faulty, and many, many, many innocent people have gone to prison based on eyewitness testimony. I am thankful for DNA to help get the right person for the crime, but the criminal justice system will need to get up to speed on touch DNA and its fallibility.

Hi Ray,  Yeah, kind of annoying that someone is making money on my autosomal DNA test. I hate it when that happens.  However, I am not too concerned about innocent family members being accused of a crime based on my autosomal test on Gedmatch.

It is a different topic altogether, but I wonder how GedMatch is going to manage the GDPR changes.  Anyone know?
I'm not too concerned either - do the crime, do the time. If a family member of mine turned out to be a deviant and my dna helped get them off the streets I'd just be glad to have helped :)

That said, I am vexed about people making money off anything I have paid for, or done on line, without my permission and without full disclosure.

Keeping our communities safe though is a whole different thing - law enforcement are free to use my dna to help catch criminals anytime if need be!
When you say that GDPR is a different topic altogether, I think this actually goes right to the heart of GDPR. My very limited understanding of GDPR is that it is intended to stop commercial organisations using the data of EU citizens without first telling them how it will be used and getting their permission.

Since Parabon is apparently charging a fee it would seem to be a commercial organisation. GedMatch contains the data of a lot of EU citizens, myself included, and I am not aware that it has tools that only search based on country of residence. Since Parabon has not asked permission to use my data in this way it seems to me that they are in breach of GDPR. If they asked me then I would probably give permission to use my DNA results for catching criminals as this seems like a good thing, but they are supposed to ask first and give me the right to remove my permission if they start using my data for other purposes.

I expect that law enforcement agencies have a bit more leeway, but are not completely above the law, so as a commercial organisation working for law enforcement that is probably one for the lawyers to sort out.

My main concern would be if GedMatch got caught in the crossfire and had to make big changes to stop other organisations from "misusing" the data of EU citizens in this way. Whether they have the resources to make these sort of changes and support the administrative demands going forward I really don't know.
Great question about Gedmatch and GDPR.  I haven't seen anything on if/how Gedmatch plans on making any changes due to GDPR.

I agree in principle with the use of genetic genealogy in law enforcement.  The identification of this suspect for the Golden State killer is an amazing example of how the basics of autosomal DNA matching in a large database can be applied to do things that were not possible just a few years ago.

In my opinion though, the cause for concern comes in the expansion in the use of genetic genealogy in law enforcement investigations - and the negative impact of false matches.  For example, in the Golden State case, a man in Oregon had to clear his name after a Y-DNA match through the public database, Y-Search, identified him as a possible suspect:

"The use of genetic websites in the hunt for the Golden State Killer also led investigators to misidentify a potential suspect last year, according to court records.... The daughter of a 73-year-old Oregon City man said authorities swabbed her father for DNA in a nursing home without her knowledge."

The possibility of false matches/false accusations makes it alarming to see the announcement that the company has screened more than 100 samples from agencies across the country.  And it sounds like some of these investigations are well underway:  "In the coming months, we anticipate a large number of arrests in which Snapshot Genetic Genealogy analysis was helpful, even critical to the investigation," - Ellen McRae Greytak, head of Parabon's Snapshot division.

No, it isn't a "dealbreaker" for me.  I think it's great that they solved a cold case by using intensive detective work.  I too, engage in extensive detective work in trying to find my great-grandmother.  It isn't easy to do, and it's fairly amazing that it worked.  I rely on other people sharing their DNA and trees for me to find her, and even with all those clues, she's still lost to me.  

Because it's so difficult to do and because the work required is intensive and not fun, if law enforcement needs help with the process and can afford that fee, then what's wrong with paying the fee?  They're not sharing my DNA.  They're doing genealogy.  They're looking at matching segments and shared trees and building out mirror trees.  I do that every day.

Ancestry uses our DNA in combination with our trees to make genealogical connections (although they do a fairly horrible job of it).  They're making money off of it.  We live in a capitalist society.  So be it.
I agree that it is great that the Golden State killer might finally have to account for his crimes.  

I'm not opposed to the money-making part of this.  It is the fact that this technique is suddenly being applied in more than 100 law enforcement investigations.  There is a great deal of debate over the legality/ethics of familial searching. DNA tests in law enforcement:
The article refers to "public databases such as GEDmatch."

If companies like GEDmatch are public databases, then I would NEVER use their services.

Furthermore, if their terms of service do not state that DNA results can/will be public and useable by law enforcement and commercial entities, then they have opened themselves to lawsuits from every customer (and possibly from non-customers) - in my opinion.

Use of DNA testing to fight crime is fine as long as that evidence was obtained properly by law enforcement. To me, properly would mean by following the (USA) 4th Amendment as it was written (before the endless erosion of its original intent).
Ray, if you elect to make your Gedmatch ID public, yes, someone can log in and view your matches.  But to view any matches you have to have a GedMatch ID, so it is not public in the sense that you can go in and see everyone who has a gedmatch number... only the ones matching the ID you put in.   I know that here on WikiTree we chose to list the IDs and that makes them public.  Which can be abused.  I would have preferred they were behind a trusted list but I don't run the site.  I think that layer of security would have been protection against the public issue.  The way gedmatch was deployed here does create a more open environment but once again, people make the decision to make that public or not.

I did not mean to say upload a gedcom I meant to say upload raw DNA file.  I mistyped that...  

HIPAA (those are the medical ones) and the specific DNA privacy law called GINA are laws in the USA.  However, like most laws, when it comes to Law Enforcement, keeping the public safe will supersede other laws generally speaking.  I think Gedmatch should require that anyone registering or uploading a file has to verify that they either are the owner or have the owner's written permission.  That way if someone misuses it there is legal recourse for fraud or misrepresentation.
Not precisely germane, but if you haven't already you'll soon hear about another cold case solved by use of GEDmatch. This one a double murder from 1987 in Washington State; the county sheriff's press conference is only about seven hours old. This is the first case to utilize the company Parabon for the research.

I support some changes in the way GEDmatch operates, but I sincerely hope this doesn't become a piling-on that causes John Olson and Curtis Rogers to minimize GEDmatch, or even to close it entirely. It doesn't make a lot of money, and now they're being squeezed by GDPR from one side, and the GSK/Parabon issue from the other. GEDmatch is a critically important genealogy resource that needs to survive.
I think the main takeaway from this discussion is that if we are afraid of law enforcement using GEDmatch against us or our families, then we should remove our DNA from the site. But for the rest of us who are more interested in finding the way through our brick wall, then we need to continue being supportive of GEDmatch.  They have revolutionized our ability to make connections and solve mysteries.
When you upload a DNA test to GEDmatch, you know whose DNA it is.

When Parabon upload a test done on a crime scene sample, they don't know whose DNA it is.

It would be very easy to ask simple questions like "do you know whose DNA this is?" and "do you want your test matched against tests of unknown origin? (which obviously can't help your genealogy)"

But I don't suppose they will.
Presumably the 4th cousins of the crime scene sample get a mysterious match on their lists.  What happens then?  Do they just email the owner and get no reply?  Or do they get contacted by Parabon posing as an adoptee and asking them to share their tree?
I think GedMatch can do a couple of really simple things.

1.  Require as part of the upload that you electronically sign that it is your DNA being uploaded OR that you are uploading it for someone else and you have their written permission to do that.  This would effectively eliminate the legal use by some company like Parabon.  This is essentially adding a signature check box to the upload screen.

2.  On WikiTree, put all gedmatch numbers and FTDNA numbers behind a trusted list firewall so they are just not visible to the general public.  This may take a bit more work but given the privacy stuff going on, it might be necessary because of how we interface to GedMatch.   Instead just put a note for public consumption that says:  This member has done DNA testing.  Contact them via this private email link (the link on the profile) to request being added to their trusted list to see the DNA information.   I don't think that would stop a lot of people who are serious about finding DNA relatives from contacting us.

Those 2 things, one at Gedmatch and one here at WikiTree, should alleviate the public issue.

Will they change?  Maybe because of the press on this and because of the EU privacy rules.

In addition to Ray Jones rebuttal vis a vis GEDMATCH


Laws protecting DNA can be, and I almost guarantee will be, reversed either by SCOTUS or Congress, both of which are, shall we say, corporate friendly.
I think SCOTUS would uphold a law allowing people to be held accountable civilly or even criminally for falsely signing an acknowledgement that it is their own DNA or they have express permission to upload it, but I don't think law enforcement would be subject to any such law.  They already use the internet by creating false personas to catch perverts trying to solicit minors for sex. They are lying in creating those. It is basically the same thing here. But I don't think corporations should be allowed to do it for law enforcement and I think SCOTUS would draw the line there.
Detectives have always lied.  It's called undercover work.

Making them tell the lie would at least establish what business they're in.  Parabon are private investigators.

I don't know whether it makes any difference whether the clients are public authorities.

They won't always be public authorities.  It's possible to imagine many commercial or even domestic scenarios where it's worth $4000 to somebody to discover the identity of somebody (who might be doing nothing illegal).

The thin end of a wedge is being driven in.

And the people supplying the data are only pursuing a hobby.
Now that is a good post. I totally agree. I like the statement "the thin edge of a wedge is being driven in". I totally agree and that has been my thought all along.

Tis a sad state of affairs but the money spent by genealogy enthusiasts has the potential to be (and certainly will be) used against us, not just for criminal investigations but for employment, health and life insurance. And not just by personal access or use of our DNA, but by using metadata, as collected by Universities, non profits and for profits like 23andme.

I see a day coming when one who applies for an insurance policy will have to submit to a swab.

I recently bought a life insurance policy, and they took blood, had me fill out a lengthy questionnaire, subjected me to verbal, mental and physical tests and finally reviewed my medical records. The only thing they didn't do is swab my cheeks.

I passed and received the policy.
Sharing DNA connections has long revealed extra-marital affairs, biological parentage, and relationships to unpleasant people and I've warned any test-takers about the dangers. Further, I've always warned test takers about possible use by law enforcement. When you put your DNA into a database and reveal your identity, you are sharing your relationships, known and unknown with the world.

I also feel that my DNA relationships belong to me and are mine to share or not share. I don't judge anyone on their decisions to share or not share, it's a personal choice, but I refuse any argument that I don't have the right to share my DNA relationships any more than I don't have the right to share what I did last Tuesday because it involved another person.

So pretty much nothing has changed for me, no.
Point is, if you tell the internet you had pancakes for breakfast with your brother last Tuesday, it wouldn't be that easy to search the internet for people who had pancakes for breakfast last Tuesday and discover that your brother was one of them.

Data is private not when it's undiscoverable but when it costs more than it's worth to discover it.

It's always been the case that a P.I. could discover a lot about us, but our secrets were safe as long as it wasn't worth anybody's while to hire a P.I.

So it was never a good argument for relaxing privacy to say that anybody could discover the information anyway if they really wanted to.
Living people are anonymous in pretty much every database and the standards here are that they remain anonymous unless they volunteer otherwise. That is a reasonable standard.

If my DNA matches <blank> then that tells no one anything. These people get uncovered by other public records. In the U.S. many records of living people are public in many states, such as marriage records. In Canada, it's very hard to track down the identity of any living people - the best source of information being obituaries or Facebook, back to people sharing who their brother or cousin is.

I don't tag people on Facebook unless I know they are ok with it. I don't share my friend's list. I don't put people's pictures on Facebook without consent. That is their opportunity to remain anonymous. And that's the best they should expect.

You raise some interesting points Davis.  One thing that you said that particularly resonates is "I don't tag people on Facebook unless I know they are ok with it. I don't share my friend's list. I don't put people's pictures on Facebook without consent. That is their opportunity to remain anonymous. And that's the best they should expect."

I feel the same way about this issue, which is why Professor Yaniv Erlich's observation that each of our DNA tests is a "beacon who illuminates 300 people around you" has me so troubled.  Like you, I don't post information about people on Facebook (or Wikitree) without their permission.  As such, the thought of my test on Gedmatch being used as part of a legal case against a close relative gives me pause.  While I agree with what several people have said about relatives who "did the crime" deserving to be identified, my concern is that the unfettered use of this technique in legal investigations could lead to a near or distant relative being falsely-accused and having to clear their name.  It would be very unfortunate that this would happen (in part, anyway) because of my use of genetic testing in my family history research.

I just would like to see some standards for the use of familial testing in DNA databases in law enforcement investigations.   

I would agree with you except that DNA is more often used to find the true culprit and free innocent people, not cause innocent people to come under scrutiny. But I am also respectful of other people's privacy and I try not to put them in a position of having it violated.
I'm always for safeguards to our rights. We can certainly pass a law requiring warrants for police to upload a DNA file to a database. Police find DNA at crime scene, judge OKs upload to DNA matching service... profit!

I don't believe that would prevent cases like this, though. As long as the evidence is from a crime scene, and what other DNA would be in police labs, what judge would say no?

As far as being falsely accused, the police did do a direct test on the suspect here. Using genetic genealogy as evidence in court would be wrong; we all know how hard it is to really prove a relationship.
Only a matter of time before the smart criminals pick up random DNA samples and leave them at crime scenes.
