Sunday, March 20, 2011

Think Stats - a free statistics and probability book for programmers

Check out Think Stats - it's free as in speech (Creative Commons license), which alone is reason enough to point it out. The code examples are in Python, which is a plus in my book, and the table of contents covers just about every basic case I can imagine most programmers having to deal with. Highlights:
  • Probability distributions
  • Monty Hall (a.k.a. getting started with Bayes)
  • Central Limit Theorem, and "Why normal?"
  • Hypothesis testing
  • Estimation
  • Correlation (e.g. fits and regression)
The whole thing is sprinkled with light math (nothing intimidating), quick runnable code snippets, and nice pictures. Plus, as I said above, it's truly free and open, as this sort of material really ought to be.

Why does this matter? If you're a computer programmer in this day and age, you deal with data. But a lot of programmers just deal with data without understanding anything about analyzing it - that's somebody else's job. Or even worse, the programmer just assumes they can intuit what they need about analysis, even though more than a few statistical results are notoriously counterintuitive (see the Monty Hall example).
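If you don't trust the Monty Hall result, a quick simulation is a good way to check your intuition. What follows is my own minimal sketch, not code from the book: the function names (play, win_rate), the trial count, and the seed are all made up for illustration, and it assumes the standard rules where the host always opens a goat door you didn't pick.

```python
import random


def play(switch, rng):
    """Play one round of Monty Hall; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # where the car is hidden
    pick = rng.choice(doors)   # the player's first guess
    # Assumed rule: the host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the fraction of rounds won with a fixed stay/switch strategy."""
    rng = random.Random(seed)  # seeded so reruns give the same estimate
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Run it and staying should come out near 1/3 while switching comes out near 2/3, which is exactly the answer most people's gut refuses to accept.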

So boning up a bit on your statistics is important, even if you're not an "analyst." In fact, many people with that title are more about fitting data to models than models to data - that is, they know the theory they want to support, so they "massage" the data until it looks like that. This is entirely unscientific, and a better understanding of stats will help you spot it when folks try to do this.

Anyway, I hope this resource is helpful to folks, and thanks for reading!
