Controlled experiments are the "official" way to determine causality, but many interesting questions in this world cannot be studied in a controlled environment. Statistical hypothesis testing is essentially an attempt to use observational data to create a "quasi-experiment", with cases and treatments and controls and, perhaps, causality. But a common claim is that, no matter what your computer says after it inverts those really big matrices, you need some underlying theory (a reason, an explanation) for what is going on before you can talk about it causally.
Unfortunately, in practice this leads to people fitting data to models rather than models to data: they develop their "theoretically informed" viewpoint, then go around looking for data to validate it. This is even worse outside of academia, where the viewpoint may not be theoretically informed but rather "business" informed.
But I agree that numbers without theory mean very little. Even if you believe a causal relationship solely on the basis of the numerical results, you still need some sort of viewpoint to interpret it meaningfully (and then, presumably, to suggest actions based on it, depending on your situation). So I see the problem not just as teasing out causality but also as avoiding the bias introduced by our theoretical musings while still adding something to the study beyond number crunching.
There are many statistical techniques that can be employed, but the key to causality is in the "higher level" design and data used. As Gelman puts it:
The most compelling causal studies have (i) a simple structure that you can see through to the data and the phenomenon under study, (ii) no obvious plausible source of major bias, (iii) serious efforts to detect plausible biases, efforts that have come to naught, and (iv) insensitivity to small and moderate biases (see, for example, Greenland, 2005).

In other words, your analysis (data and theory) should be no more complicated than absolutely necessary, and you should take pains to be open-minded and genuinely consider alternatives. Though not statistical, this relates to the argument in my previous entry about education: words like "obviously" and "clearly" have no place in serious writing. Obviously.
Both simplicity and open-mindedness are rather difficult goals for analysis. Appearing complex can be key to "selling" an argument, be it a paper you are trying to publish or a product you want your company to make. People are naturally intimidated by excessive precision and other telltale signs of statistical amateurism: 45% doesn't sound anywhere near as accurate as 44.72%, even if your confidence interval dwarfs the rounding. Overcoming this takes a bit of actually learning statistics and a lot of willingness to call people on nonsense, and to refuse to pander when selling things yourself.
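To make the precision point concrete, here's a minimal sketch with a made-up survey (447 "yes" answers out of 1,000 respondents; the numbers and sample size are purely illustrative), comparing the rounding error in the reported percentage to the width of a standard 95% confidence interval:

```python
import math

# Hypothetical survey: 447 of 1,000 respondents said yes.
successes, n = 447, 1000
p_hat = successes / n  # 0.447

# Normal-approximation 95% confidence interval for a proportion.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"point estimate: {p_hat:.2%}")   # 44.70%
print(f"margin of error: +/- {margin:.2%}")  # about +/- 3.08%
print(f"95% CI: [{p_hat - margin:.2%}, {p_hat + margin:.2%}]")
```

Here the margin of error is about three percentage points, roughly ten times larger than the 0.3-point shift you'd get from rounding 44.7% to 45%. The extra decimals look rigorous but convey no real information.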
And open-mindedness is difficult in any realm; after all, we spend all our time inside our own heads. I believe that taking philosophy/logic classes (or at least having real philosophical/logical discussions) and actively playing "devil's advocate" is a fantastic way to recondition our natural urge to delegitimize and dismiss opposing views. Even a perspective that you feel is horribly flawed is likely at least internally cohesive, if you're willing to tweak a few axioms.
So there you have it: two simple but difficult guidelines for the quest for causality. Of course, there are many more specific bugaboos to be concerned with. In my view, causal direction is the next largest issue (after those addressed in the two causality entries I've written already). I may elaborate on it at some point, but in a nutshell: even the most airtight statistical study doesn't really tell you which way your arrow is going. In reality, it's almost certainly pointing both ways, and even beyond to further factors. The best approach is to make an argument for the "strongest", but not the only, causal relationship.
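The directionality point can be seen in a quick simulation. Even when we generate y directly from x (the y = 2x + noise model below is made up for the example), the correlation coefficient, like most of the usual statistical machinery, is perfectly symmetric in the two variables:

```python
import random

random.seed(0)

# Simulate a world where x causes y: y = 2x + noise.
x = [random.gauss(0, 1) for _ in range(1000)]
y = [2 * xi + random.gauss(0, 1) for xi in x]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# The correlation is identical in both directions, even though
# causation here runs only from x to y by construction.
print(pearson(x, y) == pearson(y, x))  # True
```

The numbers alone cannot tell you which variable is the cause; that argument has to come from somewhere outside the statistics.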
For now, whether you produce or consume statistics (and I assure you that at least the latter is true), keep a simple and open-minded approach. Thanks for reading!