The Statistics Police

I came across an interesting news bit in this month's issue of Wired: "Minister of Truth - The UK's data cop protects the public from lies, damn lies, and statistics". The piece, which appears on page 10, was written by Mathew Honan:

"Did you know that 62 percent of all cited statistics are bogus? OK, we made that up. But after a 2007 poll found that barely a third of all British citizens trust published stats, Parliament formed a math-police squad to investigate. The top cop in the UK Statistics Authority is Richard Alldritt, an expert in how governments fudge numbers."

In his current position, Mr. Alldritt monitors statistics from approximately 200 public agencies. Mr. Alldritt has been quoted as saying that "no set of statistics has a completely clean bill of health."

It's unlikely that this problem is limited to the 200 UK public agencies being monitored. If the same level of scrutiny were applied in the US, I would bet that we would find a similar situation.

This raises an interesting issue for employment litigation. Assume that a firm is being sued for failing to hire female applicants for a given position. Further assume that an expert conducts a statistical analysis in which the gender characteristics of hired individuals are compared to the gender characteristics of individuals "available" for the position, and that Census data is used to construct the "available" population. Finally, assume that this availability analysis indicates a statistically significant shortfall of female hires for the position(s) in question.
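To make the availability comparison concrete, here is a minimal sketch of one common way such a shortfall could be measured: the number of standard deviations between actual and expected female hires under a binomial model. The function name and all of the figures (100 hires, 30 women, 45% availability) are hypothetical and chosen only for illustration; an actual expert analysis would likely involve more refinement than this.

```python
import math

def shortfall_in_sd(total_hires, female_hires, availability):
    """Standard deviations between observed and expected female hires.

    Assumes each hire is an independent draw from the "available"
    population, so the count of female hires is binomial.
    """
    expected = total_hires * availability
    sd = math.sqrt(total_hires * availability * (1 - availability))
    return (female_hires - expected) / sd

# Hypothetical figures: 100 hires, 30 of them women, 45% female availability.
z = shortfall_in_sd(total_hires=100, female_hires=30, availability=0.45)
print(f"Shortfall: {z:.2f} standard deviations")
```

A value beyond roughly two standard deviations below zero is the kind of result typically described as a statistically significant shortfall at the conventional 5% level.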

This analysis, and the finding of a statistically significant disparity, may contribute to a finding of gender discrimination. But what if the Census data used to construct the "available" population contains errors?
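One way to probe that question is a simple sensitivity check: recompute the disparity while shifting the availability figure up or down by a plausible margin of error, and see whether the conclusion changes. The sketch below is illustrative only. The function names, the hiring counts, the 45% availability figure, and the size of the error margins are all assumptions, and the 1.96 cutoff is simply the conventional two-sided 5% threshold.

```python
import math

def z_score(total_hires, female_hires, availability):
    """Standard deviations between observed and expected female hires."""
    expected = total_hires * availability
    sd = math.sqrt(total_hires * availability * (1 - availability))
    return (female_hires - expected) / sd

def sensitivity(total_hires, female_hires, base_availability, errors):
    """Recompute the disparity for each assumed error in availability."""
    rows = []
    for err in errors:
        p = base_availability + err
        z = z_score(total_hires, female_hires, p)
        # Flag results at or beyond the conventional 5% threshold.
        rows.append((round(p, 3), round(z, 2), z <= -1.96))
    return rows

# Hypothetical case: 100 hires, 30 women, Census-based availability of 45%.
rows = sensitivity(100, 30, 0.45, errors=(-0.10, -0.05, 0.0, 0.05))
for availability, z, significant in rows:
    print(f"availability={availability:.2f}  z={z:6.2f}  significant={significant}")
```

In this made-up example the shortfall remains significant if true availability is 40%, but not if it is 35%. The point is not the specific numbers, but that the conclusion can hinge on how accurate the benchmark data is.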

Let me be clear - I am not advocating the abandonment of Census data (or other government data) for any analytical purpose based on the assumption that it may be incorrect. In many cases, Census data is the best available data for the analytical question at hand. I do think, however, that this is an issue that deserves some consideration. I don't know whether the formation of a US Statistics Police is the answer, or whether such a body could eliminate data errors entirely. In the meantime, we, as consumers of data, should educate ourselves about the data we're using in our analyses and question anything that doesn't pass the smell test.