Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Thursday 5 October 2017

Statistics is hard

I know nothing of forensics, though I will add a few related points.

The first is that in my own field I see people making extremely strong statistical claims based on somewhat dodgy data. For example there's the claim that satellite galaxies orbit their hosts in thin planes, which is not predicted by the standard model. Based on analysis of conventional simulations, claims have been made that the odds of the standard model reproducing the observed configurations (especially around the Milky Way and our neighbour Andromeda) are something like 1 in 10,000. And I'm sure this is correct if you take it at face value and don't dig any deeper.

The problem is that there are huge uncertainties here at every level: the simulations don't account for the gas physics, the observational uncertainties in distances (which strongly affect how narrow the planes appear), or the likelihood of galaxy interactions, which have been shown to cause planes of satellites to narrow significantly. Not to mention that 1 in 10,000 isn't very impressive when you've got billions of galaxies. Or that galaxies in close proximity to each other are likely to be born in similar environments, and if those environments are (for whatever reason) more susceptible to forming planes, then any probability calculation rests on the faulty assumption that the galaxies are drawn from a random population. And rather than saying, "let's see if there's some physical mechanism within the existing model that can explain the data", of which many have already been published, authors sometimes say, "the whole model must be fundamentally flawed".
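
To put a rough number on the "billions of galaxies" point, here's a quick back-of-the-envelope sketch. The 1-in-10,000 figure comes from the claim above, but the sample sizes are purely illustrative, not taken from any particular survey:

```python
# How "impressive" is a 1-in-10,000 configuration once you can look at
# very many host galaxies? Sample sizes below are illustrative only.

p_single = 1.0 / 10_000          # claimed chance of one host showing such a plane
for n_hosts in (100, 10_000, 1_000_000):
    expected = n_hosts * p_single                        # expected number of "anomalous" hosts
    p_at_least_one = 1.0 - (1.0 - p_single) ** n_hosts   # chance of seeing at least one
    print(f"{n_hosts:>9} hosts: expect {expected:7.2f} such systems, "
          f"P(at least one) = {p_at_least_one:.3f}")
```

With a million hosts you'd expect a hundred such systems, so finding some isn't automatically a crisis for the model.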

This is fine in extragalactic astronomy where the worst that can happen is that one author will argue with another. Obviously if this happens in forensics then the consequences are very much worse. But I can easily believe that it could happen and I would understand why.

The second point is that I've come to the opinion that the formal error equation is bollocks. As an observer, I simply don't believe error bars that are smaller than the plotted data points - or worse, error bars smaller than the observed scatter when there's a clear underlying trend. In my experience, systematic effects always dominate. This doesn't mean the results or the claims are wrong, just that I'd be automatically wary of claims of extremely high confidence (with many caveats, depending on the particular data and claims being made). Not only do observers make mistakes, but sometimes the intrinsic scatter is high anyway.
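
As a toy illustration of why I distrust tiny formal error bars (every number below is invented), here's what happens when you compute the textbook error on the mean for data carrying an uncorrected systematic offset: the formal error shrinks towards zero while the answer stays wrong.

```python
# The formal error on a mean shrinks as 1/sqrt(N); a systematic offset does not.
# All values are made up for illustration.
import numpy as np

rng = np.random.default_rng(42)

true_value = 10.0
systematic_offset = 0.5     # e.g. an uncorrected calibration error
random_sigma = 1.0          # per-measurement random noise

for n in (10, 1_000, 100_000):
    data = true_value + systematic_offset + rng.normal(0.0, random_sigma, size=n)
    formal_error = data.std(ddof=1) / np.sqrt(n)   # textbook error on the mean
    print(f"N = {n:>6}: mean = {data.mean():6.3f} +/- {formal_error:.4f} "
          f"(true value = {true_value})")
```

The quoted uncertainty becomes tiny, but the measurement never gets any closer to the true value.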

The third point, which Oliver understandably doesn't mention, is that probability is hard. Intuitively, if a test with a 99% success rate produces a match, it seems there should be only a 1% chance that the match is false (because of the test's limitations). But in reality this simply isn't the case: the test's limitations are not the only factor! How trustworthy a match is also depends on how rare genuine matches are in the first place. The test isn't meaningless, it's just nowhere near as good as you might think (honestly, I still have a hard time getting my head around this one):
http://www.patheos.com/blogs/daylightatheism/2008/02/how-to-think-critically-vi/
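
Just to have the arithmetic in front of me, here's the standard Bayes' theorem calculation behind that. The 99% figures are from the example above; the prior (genuine matches being 1 in 1,000 of cases tested) is an arbitrary number I've picked purely for illustration:

```python
# Base-rate effect: a "99% accurate" test applied where true matches are rare.
# The 1-in-1,000 prior is an arbitrary illustrative number.

sensitivity = 0.99          # P(test says match | samples really match)
false_positive_rate = 0.01  # P(test says match | samples don't match)
prior_match = 1.0 / 1000    # P(samples really match) before testing

p_positive = (sensitivity * prior_match
              + false_positive_rate * (1.0 - prior_match))
posterior = sensitivity * prior_match / p_positive   # Bayes' theorem

print(f"P(real match | test says match) = {posterior:.3f}")  # ~0.09, not 0.99
```

So even a near-perfect test can hand you a match that's far more likely to be false than true, if genuine matches are rare enough.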

So for all these reasons, I'm inclined to find this report credible. The chance that a test result is correct depends on much more than whether the test itself works. The test could function just fine and still get the wrong result, both because of the nature of statistical probability and because alternatives weren't properly considered. You don't have to throw out the forensics, but you probably want to bring in a statistician too. And perhaps it would be worth considering (if this isn't done already; perhaps it is, but the end section of the report suggests otherwise) giving forensic data to independent labs without telling them the context of what they're dealing with. Don't tell them to identify the killer, just tell them to establish whether two samples match, with no other information at all. And also give them "decoy" samples so they have absolutely no preconceptions about whether a match is expected at all.
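
To sketch what I mean by decoys (the error rates and the lab_verdict function in this little simulation are completely made up; it's only meant to show the idea), known non-matching pairs slipped into the queue give you a direct, context-free estimate of how often a lab declares a false match:

```python
# Blind-decoy idea: mix known non-matching pairs into real casework and use
# them to measure the lab's false-match rate. All rates here are invented.
import random

random.seed(1)

def lab_verdict(really_matches: bool) -> bool:
    """Simulate a lab's match/no-match call with made-up error rates."""
    if really_matches:
        return random.random() < 0.95   # hit rate
    return random.random() < 0.03       # false-positive rate we want to measure

n_decoys = 500                          # known non-matching pairs in the queue
false_calls = sum(lab_verdict(really_matches=False) for _ in range(n_decoys))
print(f"Decoy false-match rate: {false_calls / n_decoys:.3f}")
```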

That's my two cents. But in keeping with the "I'm not a scientist, but..." fallacy, since I know nothing of forensics I may be talking out of my backside here. Perhaps this report is overly selective and I'm misapplying my own experiences. The only correct way to finish the sentence, "I'm not an expert in...", is, "so you shouldn't weigh my opinion as heavily as an expert's." Which in this case might not be a forensic scientist, but someone who's conducted large, long-term studies of the judicial system.
https://www.youtube.com/watch?v=ScmJvmzDcG0

2 comments:

  1. TL;DR: Well said. All hail the mighty error bar!

    I can't remember how many times I've seen some amateurish graph displayed in a presentation (frequently by computer scientists, and that tic in my eye is completely coincidental) without either error bars or an awareness that there should be some. Or data (say, "availability metrics") with ten decimal places of precision despite being based on input with two or three significant digits at best.

    I could go on, but you know all this already, and Oliver's rants are more amusing.

