View Full Version : 20 years of DNA profiling


Paul Nutteing
09-08-2004, 10:51 AM
The inventor has been doing sums, but has he seen
any full 2.5 million profile databases? Even he would not
be allowed to interrogate the NDNAD.
http://news.scotsman.com/latest.cfm?id=3469847
Quote

Wed 8 Sep 2004

3:08pm (UK)
DNA Pioneer Celebrates Triumph

By Jacqui Walls, PA News

Genetic fingerprinting could allow police to determine the eye colour, hair
colour and even facial features of criminal suspects just by their DNA, the
scientist who discovered the technique said today.

Professor Sir Alec Jeffreys hailed his discovery, which he made by accident
in a laboratory at the University of Leicester 20 years ago, as one of the
greatest triumphs of British science.

But he said changes made to genetic fingerprinting to create much broader
DNA profiles led to a risk of mistakes being made in the tracing of suspects
using the national DNA database.

The highly-acclaimed scientist made his comments as he celebrated the 20th
anniversary of the technique which has had implications for criminal cases,
paternity, identical twins, cloning and conservation as well as led to
life-saving developments in medical research.

He said the application of the technique which made him most proud was its
use in immigration disputes.

"These are families who have fallen foul of immigration disputes," he said.
"These families have done nothing wrong whatsoever.

"The fact that we have managed to bring thousands of them back together
makes me very proud."

The scientific discovery has also led to the creation of a national DNA
database in the UK containing the profiles of more than 2.5 million
convicted criminals.

However, he said only 10 different DNA markers were used on the database to
distinguish between individuals, which could lead to mistakes.

"If you have a database of 2.5 million people you will start having matches.

"The current DNA database uses 10 distinct markers and I think there is
still a residual risk of a false match.

"I think they should use about 15 markers because otherwise it leaves open
the possibility that the match from the crime scene sample is genuine but is
a fluke."

He welcomed a global database containing DNA information on every individual
but said police should not have access to the profiles unless the people had
been convicted of a criminal offence.

He warned that if officers kept profiles of suspects who were later cleared
that could lead to a disproportionate representation of ethnic groups,
especially in the major cities.

Sir Alec said one of the implications for DNA profiling in the future could
be to help determine the physical characteristics of people by their genetic
make-up.

He said: "The most robust test is for gender. The research is at the science
fiction stage but our physical appearance is largely determined by our
genetic make-up.

"The problem is that facial features depend on age and if you have not got
the age it will be very complicated.

"Also some of the variations behind these genes will lie behind severe
clinical problems.

"The police would then be accessing information of profound medical
significance and I would argue that the police have no right to access such
information."

A more immediate development in DNA technology, however, was likely to be
quicker and easier ways of recovering a genetic profile.

"I think the basic science of DNA typing has been resolved but the
technology is now important," he said.

"If you could miniaturise the technology and speed it up, it would allow
police to take it out to the scene of a crime.

"We need a system that will recover a DNA profile within seconds or minutes
rather than hours."

Sir Alec was in his laboratory on September 10, 1984, when he stumbled
across the key to the future of genetic research and development.

He said: "I regard this as a wonderful triumph of British science. We have
consistently led the world in the application of DNA."

A year later, he developed a variation on genetic fingerprinting to enable
profiles of suspected criminals to be created, which led to the conviction
of a double rapist and murderer.

Two young girls were raped and murdered in the Enderby area of
Leicestershire and a man who had been arrested had confessed to one murder
but not the other.

Police decided to use genetic profiling, believing they would prove him
guilty of both cases. Against all expectation, he was found to be innocent
of both.

A hunt was launched to find a genetic profile among the entire male
population of the area that matched samples taken from the two victims.

No match was found until Colin Pitchfork was overheard boasting of how he
had persuaded a friend to give a sample on his behalf. The case was solved.

Sir Alec said after the case: "I felt relief because he was a serial
murderer and would kill again and because if the operation had failed then
the public's perception of forensic DNA would have been shattered.

"Also, here was a serial killer in the region who knew what I was doing and
where I worked and where my family lived. That feels very uncomfortable so
on a personal level it was a great relief when he was trapped."

End Quote

What they aren't telling you about DNA profiles
and what Special Branch don't want you to know.
http://www.nutteing2.freeservers.com/dnapr.htm
or nutteingd in a search engine

Valid email nutteing@fastmail.....fm (remove 4 of the 5 dots)
Ignore any other apparent em address used to post this message -
it is defunct due to spam.

Paul Nutteing
09-08-2004, 02:42 PM
Apologies for misleading everyone - the inventor has not
done his sums. He is slavishly following the lies of the
forensic statisticians.
Has anyone other than myself done proper simulations of
multi-million DNA profile databases reflecting the
biological constraints of real allele frequencies?
A 1 in 400 chance of just one match in 2.5 million is
complete and utter nonsense - I think this
character will be getting another email from me.

http://www.globetechnology.com/servlet/story/RTGAM.20040908.gtdnaa0908/BNStory/Technology/
Part Quote

DNA testing is not an infallible proof of identity. While Sir Alec's
original technique compared scores of markers to create an individual
"fingerprint," modern commercial DNA profiling compares a small number of
genetic markers - often 5 or 10 - to calculate a likelihood that the sample
belongs to a given individual.

He estimates the probability of two DNA profiles' matching in the most
commonly used tests at between one in a billion or one in a trillion, "which
sounds very good indeed until you start thinking about large DNA databases."

In a database of 2.5 million people, he said, a one-in-a-billion probability
becomes a one-in-400 chance of at least one match.

End Quote


Richard White
09-09-2004, 01:45 AM
"Paul Nutteing" <nutteing@quickfindit.com> wrote in message news:2q9caqFt9vtaU1@uni-berlin.de...
> Apologies for misleading everyone - the inventor has not done his sums.
> He is slavishly following the lies of the forensic statisticians.
> Has anyone other than myself done proper simulations of multi-million
> DNA profile databases reflecting the biological constraints of real
> allele frequencies ? 1 in 400 chance of just one match in 2.5 million -
> complete and utter nonsense - I think this character will be getting
> another email from me.

So what is your estimate of the inadvertent match probability?

Gordon Burditt
09-09-2004, 07:01 AM
> Apologies for misleading everyone - the inventor has not
> done his sums. He is slavishly following the lies of the
> forensic statisticians.
> Has anyone other than myself done proper simulations of
> multi-million DNA profile databases reflecting the
> biological constraints of real allele frequencies ?
> 1 in 400 chance of just one match in 2.5 million -
> complete and utter nonsense - I think this
> character will be getting another email from me.
>
> http://www.globetechnology.com/servlet/story/RTGAM.20040908.gtdnaa0908/BNStory/Technology/
> Part Quote
>
> DNA testing is not an infallible proof of identity. While Sir Alec's
> original technique compared scores of markers to create an individual
> "fingerprint," modern commercial DNA profiling compares a small number of
> genetic markers - often 5 or 10 - to calculate a likelihood that the sample
> belongs to a given individual
>
> He estimates the probability of two DNA profiles' matching in the most
> commonly used tests at between one in a billion or one in a trillion, "which
> sounds very good indeed until you start thinking about large DNA databases."
>
> In a database of 2.5 million people, he said, a one-in-a-billion probability
> becomes a one-in-400 chance of at least one match.

The "birthday problem" is relevant here. Taking N people at random,
how many do you need to take before you've got more than a 50%
chance of having at least two with the same birthday? The answer
is NOT 365.25 divided by two. It's only 23.

What's the probability of no duplicates with 1 person? 365/365
(certainty). What's the probability of no duplicates with 2 people?
365/365 * 364/365. With 3 people? 365/365 * 364/365 * 363/365.
With each additional person, the number of available birthdays goes
down by one.
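
A minimal Python sketch of that running product, assuming 365 equally
likely birthdays and ignoring leap years. The function names are
illustrative only.

```python
def prob_no_shared_birthday(people, days=365):
    """Probability that `people` random birthdays are all different."""
    p = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p *= (days - k) / days
    return p


def smallest_group_for_even_odds(days=365):
    """Smallest group with a better-than-even chance of a shared birthday."""
    n = 1
    while prob_no_shared_birthday(n, days) > 0.5:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group_for_even_odds())
    print(prob_no_shared_birthday(23))
```

With these assumptions the exact threshold comes out at 23 people.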

Now, let's consider the DNA problem. Assume a one in a TRILLION
probability of an individual match. I don't have info on allele
frequencies so I'll just assume it's random (which is probably the
most optimistic assumption for the claim that there will be no false
matches, and very unrealistic for certain populations.) There's a
trillion possible DNA profiles. How many people can you put in the
database before there is at least a 50% probability of a false
match? About 1,178,000. What is the probability of no false matches
with 2.5 million people? 0.0439, or 1 in 22.77 . Put another way,
there's a greater than 95% chance of at least one false match.

What if it's a one in a BILLION probability of an individual match?
You've got a 50% probability of at least one false match at 38,000
people. With 2.5 million people, the probability of no false matches
at all is virtually zero (less than 1e-300, at which point the
precision of my calculations has run out). With one million people,
the probability of no false matches at all is 6.0e-218, likely
somewhat less than the probability of the sun exploding tomorrow.
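
A rough Python sketch of the sums above, under the same simplifying
assumption that every profile is equally likely (no real allele
frequencies, no relatives). Function names are illustrative. Logs are
summed so that very small probabilities do not underflow. The last
function shows one possible reading of the quoted one-in-400 figure: the
chance that a single crime-scene profile coincidentally matches someone
in the database, which is a different question from whether any two
people in the database match each other.

```python
import math


def log_prob_all_unique(n, profiles):
    """Natural log of P(no two of n people share a profile),
    assuming `profiles` equally likely profiles."""
    return math.fsum(math.log1p(-k / profiles) for k in range(n))


def people_for_even_odds(profiles):
    """Approximate database size giving a 50% chance of at least one
    matching pair, from the usual sqrt(2 * N * ln 2) approximation."""
    return math.sqrt(2 * profiles * math.log(2))


def single_search_false_hit(n, p):
    """Chance that one unrelated profile matches at least one of n
    database entries, each with independent match probability p."""
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    n = 2_500_000
    print(math.exp(log_prob_all_unique(n, 10**12)))                # trillion
    print(log_prob_all_unique(1_000_000, 10**9) / math.log(10))    # log10, billion
    print(people_for_even_odds(10**12), people_for_even_odds(10**9))
    print(1 / single_search_false_hit(n, 1e-9))                    # "one in ..."
```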

Gordon L. Burditt

Paul Nutteing
09-09-2004, 10:29 AM
"Richard White" <whiter@gmx.co.uk> wrote in message
news:2qal2mFsq8dbU1@uni-berlin.de...
> So what is your estimate of the inadvertent match probability?

For 2.5 million profiles, between 2 and 140 matching pairs, scaled from
the 4 million simulation results of between 5 and 360 detailed below.
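
The scaling can be checked with a few lines of Python, assuming the
number of matching pairs grows with the square of the database size.
The function name is illustrative, and the square law is itself only an
approximation for small numbers of matches.

```python
def scale_pairs(pairs, from_size, to_size):
    """Rescale a count of matching pairs between database sizes,
    assuming pairs grow with the square of the number of profiles."""
    return pairs * (to_size / from_size) ** 2


if __name__ == "__main__":
    for pairs in (5, 360):
        print(pairs, "->", scale_pairs(pairs, 4_000_000, 2_500_000))
```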

I would hazard a guess that no more than 100 million DNA
profiles have been determined in the whole world, with incompatible
systems. It is Alice in Wonderland territory referring to
billions or trillions of profiles. I know this from doing just low
million simulations. Because of a square law there are extremely
serious problems with multiple matches for populations in
the hundreds of millions to billions.

Throughout the 1990s they were spouting, even in court,
a false match probability of 1 in 37 million for 6 loci profiles.
That was until the admission of a crime-scene profile
matching 2 separate unrelated arrestee profiles in the (UK) NDNAD
(R v. Watters appeal) and the case of Raymond Easton
forced a halt to that dangerous nonsense.
The real 6 loci false match figure for a population of
4 million is between 94,000 and 380,000 matches.
Remember the courts were quite happy to convict in those
circumstances because the forensic statisticians had lied to them.

So they upped it to 10 loci, which certainly improves things
but does not justify the multi-billion figures now spouted.
Again for a 4 million population the false match figure
is between 5 and 360 pairs. 5 is unrealistically low
(parthenogenic - no coancestry) and 360 too high, as it assumes all
having similar ancestry to myself (one parent and his
parents from one English county and mother and her parents
from another English county).

The custodians of the NDNAD will not release the actual number
of false matches in the NDNAD - for obvious ( to me ) reasons.
They plead the ridiculous situation of people using aliases and
multiple criminal records. Easily excluded in the first analysis
because you can safely exclude anyone with rare alleles from
the false match situation, and cross check with other biometric data.

Because the technology is so flaky it is necessary to
treat a 19 out of 20 match as a match, not a mismatch.
This is argued to be permissible because a mutation on any locus/
allele is possible (0.01 to 0.3 percent chance on any of the 20
loci/alleles).
That is fine as long as there is further exploration to confirm the
conjectured mutation - but do they do it ?
19 out of 20 is defensible when identifying a decomposed body with the
profile already known, or someone already named as a suspect in an
enquiry, but not where the suspect has been implicated from a DNA
database trawl.

By 19 out of 20 I mean any 9 pairs matching and one allele match between
the remaining pairs.

For the completely artificial situation of totally random
profiles (no co-ancestry at all), any 19 in 20 gives about
120 matches in 4 million, or 1 in 33,000.

For the opposite situation and excessive co-ancestry then
28,000 free 19 from 20 matches or 1 in 140
of a 4 million population.

Until real-life data becomes available for co-ancestry
( parents & grandparents from same or different country,
from same or different county etc )
tabulated against allele frequency minima then these are the
outer bounds as it stands.

They only arrest
the first person who happens to match in a DNA database,
they don't wait for ten years, say, for a third match
to come along.

Also there is blind faith in a very
flaky technology: 8 per cent of 10 loci DNA profiles
are incorrect. Source: International Journal of Legal
Medicine (2004) 118: p83-89. That applies to crime-scene
or arrestee profiles.

Relevant simulations and results are on dnas5.htm
and dnas6.htm on the URL below. I would like to be
able to point to academics or forensic scientists
who have done such simulations but I seem to be the
only one to determine false-match likelihoods from
real-world allele frequencies. After all, it does
not require any fancy high-power computer.
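
For anyone wanting to try this at home, here is a toy Python sketch of
the general idea, not the code behind dnas5.htm or dnas6.htm. The allele
frequencies are made-up placeholders, alleles are drawn independently,
and no coancestry is modelled, so it corresponds to the "totally random"
lower bound only. Real runs would substitute published frequency tables
for each locus.

```python
import random
from collections import Counter

# Hypothetical allele frequencies for three invented loci (NOT real data).
EXAMPLE_LOCI = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.1, 0.1],
    [0.6, 0.25, 0.15],
]


def random_profile(loci_freqs, rng):
    """Draw two alleles per locus, independently, from the frequencies."""
    profile = []
    for freqs in loci_freqs:
        pair = rng.choices(range(len(freqs)), weights=freqs, k=2)
        profile.append(tuple(sorted(pair)))  # allele order does not matter
    return tuple(profile)


def count_matching_pairs(n_profiles, loci_freqs, seed=0):
    """Simulate n_profiles people and count pairs with identical profiles."""
    rng = random.Random(seed)
    counts = Counter(random_profile(loci_freqs, rng) for _ in range(n_profiles))
    # A group of c identical profiles contains c*(c-1)/2 matching pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())


if __name__ == "__main__":
    print(count_matching_pairs(1000, EXAMPLE_LOCI))
```

Counting with a dictionary keeps the work proportional to the number of
profiles, so a few million profiles should not need all-against-all
comparisons.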

The USA with 13 loci CODIS profiles cannot be smug, as one
of their loci/alleles is TPOX (8), which 79 percent of people
have one or more of. Also, increase the number of loci
and you increase the chance of an error in any one
locus/allele, so throwing out the whole profile,
falsely implicating or falsely eliminating.


Paul Nutteing
09-09-2004, 10:33 AM
"Gordon Burditt" <gordonb.e2ev5@burditt.org> wrote in message
news:chpnn7$t34@library1.airnews.net...
<snip>

I only work from data derived from fully compliant
simulations using real allele frequency data, not the
rubbish referring to billions, trillions...... heptillions coming
from deluded forensic statisticians.
For a more detailed reply see my other reply in this thread.


JPB
09-10-2004, 02:57 PM
Gordon Burditt wrote:
<snip>
> What if it's a one in a BILLION probability of an individual match?
> You've got a 50% probability of at least one false match at 38,000
> people. With 2.5 million people, the probability of no false matches
> at all is virtually zero (less than 1e-300, at which point the
> precision of my calculations has run out). With one million people,
> the probability of no false matches at all is 6.0e-218, likely
> somewhat less than the probability of the sun exploding tomorrow.

We did this here on uk.legal a while back, and pretty much nailed the basic
math down for the simplified case where everything is assumed to be wholly
random.

As you say, the likelihood of having at least one false match rapidly
approaches certainty with a large database, and that means a better
question to ask is what the *expected* number of duplicates in a large
database is, as it's clear there will be some.

For your assumed 1 in a billion chance of an individual false match, I get
an expected number of individuals whose profile is not unique in a database
of 2.5 million people at a little over 6 thousand. For one in a trillion
(1e12?), it works out to about 6. So that range (anywhere from a handful to
a few thousand) is broadly in the same ballpark as Paul's simulations, and
as he says, when the database size increases the number of duplicates
initially doesn't increase linearly but in proportion to the square of the
database size.
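
A short Python sketch of the expected-value sums, for the wholly random
case only. The function names are illustrative, and p is the assumed
chance that any two unrelated profiles match.

```python
import math


def expected_matching_pairs(n, p):
    """Expected number of pairs of people with matching profiles."""
    return n * (n - 1) / 2 * p


def expected_non_unique_people(n, p):
    """Expected number of people who match at least one other person."""
    # P(a given person matches none of the other n-1) = (1 - p)**(n - 1)
    return n * -math.expm1((n - 1) * math.log1p(-p))


if __name__ == "__main__":
    n = 2_500_000
    for p in (1e-9, 1e-12):
        print(p, expected_matching_pairs(n, p), expected_non_unique_people(n, p))
```

While matches are rare, the number of non-unique people is roughly
twice the number of matching pairs, since each pair involves two people.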

In a real database, where it's not all truly random and you have
interbreeding, it wouldn't surprise me to find that there were more false
matches than that - if anything, the random assumption strikes me as taking
an optimistic view.

--
JPB

Paul Nutteing
09-11-2004, 01:06 AM
"JPB" <news{@}europa{.}demon{.}co{.}uk> wrote in message
news:cht7q0$cd5$1$830fa79f@news.demon.co.uk...
<snip>

Hello again.
Every now and then I check the tables of contents of the
leading forensic 'science' journals and general search engines
to find who else has done proper multi-million simulations.
Still no one else - thousands of scientists must hear reference
to billions, trillions, quadrillions.... even heptillions and take it
in without question. That is as disturbing for the whole of science
as for the forensic lot.

Later simulations of mine would suggest something lower than a square
law in operation - from a 10 million 13 loci CODIS simulation.
But there is a phenomenal increase in the numbers of triples and n-tuples
in matches on a lesser number of loci - cube law or more.
I would suggest that in a billion 10 loci profile database, and considering
all but the rarest (sub 1 per cent) alleles, the majority of
all such profiles would have at least one match.

I started on analysing published DNA profiles (circa 300 in a batch) for
cross-loci cross-linking. It was promising but I ran out of time.
Perhaps I will return to it over winter.
The method: for each locus and its common alleles, do counts on the remaining
loci/alleles and log excesses beyond expected allele frequencies.
Then cross-compare with other profile collections to
see if the same excesses emerge.

Tomorrow I will compose an email for Mr Alec Jeffreys.
Main points:
His unsubstantiated reference to a lack of matches in 2.5 million.
Increasing to 15 loci does not just increase the resolution.
At present, counting one error in the 20 alleles of a 10 loci profile,
8 percent are wrong - source: a recent IJLM article
on a GEDNAP trial. Increasing to 15 loci just means 12
percent become erroneous, or individual profiles have more
erroneous (unknown to anyone) individual alleles.
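
The step from 8 to 12 percent can be reproduced with a small Python
sketch, on the assumption (not stated in the IJLM article as far as this
post goes) that each allele is mistyped independently with the same
probability. The function names are illustrative.

```python
def per_allele_error(profile_error_rate, alleles):
    """Per-allele error rate implied by a whole-profile error rate."""
    return 1 - (1 - profile_error_rate) ** (1 / alleles)


def profile_error(per_allele_rate, alleles):
    """Chance that a profile of `alleles` alleles has at least one error."""
    return 1 - (1 - per_allele_rate) ** alleles


if __name__ == "__main__":
    e = per_allele_error(0.08, 20)   # 10 loci = 20 alleles, 8% of profiles wrong
    print(e)
    print(profile_error(e, 30))      # 15 loci = 30 alleles
```

Under that assumption the per-allele rate works out at roughly 0.4
percent, and a 30-allele profile has just under a 12 percent chance of
containing at least one error.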

With too low a number of loci, like the original 6, there are far
too many false matches.
Increase to the CODIS 13, or even 15 or more, and
a high proportion of arrestee and even more so crime-scene
profiles (due to contamination, mixtures etc) become unknowingly
invalid, so falsely implicating or falsely eliminating.


JPB
09-11-2004, 01:44 AM
Paul Nutteing wrote:

<snip>
> Later simulations of mine would suggest something lower than a square
> law in operation - from a 10 million 13 loci CODIS simulation.
> But a phenomenal increase in the numbers of triples and n-tuples
> in lesser number of loci matches - cube lawing or more.
> I would suggest that in a billion 10 loci profile database and considering
> all but the rarest sub 1 per cent rare alleles then majority of
> all such profiles would have at least one match.

Yep. Square law only applies initially - once the duplicates become a larger
proportion of the total then it starts to drop off from that (as triples
etc start to appear). Too late by that stage for the purpose of uniqueness,
of course. And you are correct, once you start getting triples the initial
rate of increase will be cube law, and higher n-tuples at ever-increasing
power laws. Again, only initially when there aren't many of them, but by
the time the rate starts to slow you've already got far too many duplicates
to deal with.
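
A quick Python sketch of the initial power laws, for the simplified case
of equally likely profiles. The function name is illustrative. It counts
every group of k people who all share one profile, so a triple also
contributes three pairs.

```python
import math


def expected_k_tuples(n, k, profiles):
    """Expected number of k-person groups sharing one profile, among n
    people with `profiles` equally likely profiles."""
    return math.comb(n, k) / profiles ** (k - 1)


if __name__ == "__main__":
    profiles = 10**9
    for n in (1_000_000, 2_000_000):
        print(n, expected_k_tuples(n, 2, profiles), expected_k_tuples(n, 3, profiles))
```

Doubling the database multiplies the expected pairs by about 4 and the
expected triples by about 8, which is the square and cube behaviour
described above.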

<snip> Too low a number of loci like the original 6 then far too many false matches. Increase to CODIS 13 or even 15 then or more then a high proportion of arrestee and even more so crime-scene profiles (due to contamination, mixtures etc) becoming unknowingly invalid so falsely implicating or falsely eliminating.

Concur - simply raising the number of locations ever higher doesn't in and
of itself resolve matters. As well as the possibilities of contaminated
mixtures, even chimeras, etc, it becomes easier to find a subset of the
elements that do match, and as you were describing before, isn't a match on
all but one element being deemed a positive match?

--
JPB

Paul Nutteing
09-11-2004, 09:22 AM
"JPB" <news{@}europa{.}demon{.}co{.}uk> wrote in message
news:chudmk$eck$1$8300dec7@news.demon.co.uk...
> Yep. Square law only applies initially - once the duplicates become a
> larger proportion of the total then it starts to drop off from that (as
> triples etc start to appear). Too late by that stage for the purpose of
> uniqueness, of course. And you are correct, once you start getting triples
> the initial rate of increase will be cube law, and higher n-tuples at
> ever-increasing power laws. Again, only initially when there aren't many
> of them, but by the time the rate starts to slow you've already got far
> too many duplicates to deal with.

Are you aware of this effect coming into play in
another discipline not associated with DNA profiles,
or of the generic term for the effect in statistical theory,
game theory, business theory or wherever ?

