Weird Statistics: Simpson's Paradox and Disparate Impact Analyses

Statistics is a peculiar science, full of apparent contradictions and strange absurdities. One such oddity that presents itself frequently in disparate impact analyses is Simpson's Paradox. Simply stated:
Simpson's paradox is the case where it appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.
Assume that we are examining promotion data for WidgetCorp to assess whether there is a disparate impact with respect to gender. Assume that WidgetCorp has a total of 1,000 employees, half of whom are female. Further assume that 200 employees were promoted. Among the promoted employees, 80 were female.


M employees   F employees   M proms   F proms
500           500           120       80


In order to assess whether there is disparate impact with respect to gender, we compare the actual number of female promotions with the number of female promotions we would expect under a gender-neutral process. 


Based on the gender composition of the workforce, females are 50% of the population (500 females out of 1,000 employees). We therefore would expect 50% of the promotions to go to females: 100 promotions (50% multiplied by 200 promotions). We then compare the 80 actual female promotions with the 100 expected female promotions, and arrive at a female "promotion shortfall" of 20.





M employees   F employees   M proms   F proms   Expected F proms   F prom shortfall
500           500           120       80        100                20

Based on the above, it would appear that females did not receive their "fair share" of promotions. If the female promotion shortfall is "statistically significant", one may infer disparate impact with respect to gender.
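
The comparison of actual to expected promotions can be sketched in a few lines of Python. The function names are illustrative only. The standard-deviation calculation assumes a hypergeometric model (promotions drawn at random, without replacement, from the workforce); the discussion above does not name a specific significance test, so treat that part as one possible choice rather than the required method.

```python
import math

def female_shortfall(m_emp, f_emp, m_proms, f_proms):
    """Return (expected female promotions, shortfall) under a gender-neutral process."""
    total_proms = m_proms + f_proms
    # Multiply before dividing so whole-number cases stay exact.
    expected = total_proms * f_emp / (m_emp + f_emp)
    return expected, expected - f_proms

def shortfall_in_sds(m_emp, f_emp, m_proms, f_proms):
    """Shortfall in standard deviations, ASSUMING a hypergeometric model."""
    n_emp = m_emp + f_emp
    n_proms = m_proms + f_proms
    p_female = f_emp / n_emp
    _, shortfall = female_shortfall(m_emp, f_emp, m_proms, f_proms)
    # Hypergeometric variance, including the finite population correction.
    variance = n_proms * p_female * (1 - p_female) * (n_emp - n_proms) / (n_emp - 1)
    return shortfall / math.sqrt(variance)

# WidgetCorp as a whole: 500 M, 500 F, 120 M promotions, 80 F promotions.
expected, shortfall = female_shortfall(500, 500, 120, 80)
print(expected, shortfall)                             # 100.0 20.0
print(round(shortfall_in_sds(500, 500, 120, 80), 2))   # 3.16
```

Under this assumed model the shortfall of 20 is a little more than three standard deviations, which would exceed the two-standard-deviation rule of thumb often used as a threshold for statistical significance.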

What if you then learned that within WidgetCorp, there were two different job groups, and the gender composition of those two job groups was different?


Job Group   M employees   F employees   M proms   F proms
A           350           150           105       45
B           150           350           15        35


We repeat the same comparison of actual promotions to expected promotions, but this time we look at the two job groups separately. In Job Group A, females are 30% of the population (150 of 500 employees), and they received 45 of the 150 promotions, exactly the 45 (30% of 150) we would expect. In Job Group B, females are 70% of the population (350 of 500 employees), and they received 35 of the 50 promotions, again matching the 35 (70% of 50) we would expect. Comparing actual promotions to expected promotions for each of the two job groups, we see the following:


Job Group   M employees   F employees   M proms   F proms   Expected F proms   F prom shortfall
A           350           150           105       45        45                 0
B           150           350           15        35        35                 0


Based on this result, we would conclude that females received the exact number of promotions we expected; we cannot infer disparate impact based on this result.
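
The stratified version of the calculation applies the same arithmetic once per job group. The sketch below uses the WidgetCorp counts given above; the dictionary layout and the function name are assumptions made for illustration.

```python
# Per-group counts for WidgetCorp:
# (male employees, female employees, male promotions, female promotions)
widgetcorp = {
    "A": (350, 150, 105, 45),
    "B": (150, 350, 15, 35),
}

def shortfall_by_group(groups):
    """Map each job group to (expected female promotions, shortfall)."""
    results = {}
    for name, (m_emp, f_emp, m_proms, f_proms) in groups.items():
        # Expected share of the group's promotions, based on the group's own workforce.
        expected = (m_proms + f_proms) * f_emp / (m_emp + f_emp)
        results[name] = (expected, expected - f_proms)
    return results

for name, (expected, shortfall) in shortfall_by_group(widgetcorp).items():
    print(f"Job Group {name}: expected {expected:.1f}, shortfall {shortfall:.1f}")
```

The key design point is that the expected count in each group is based on that group's own gender composition and promotion total, not on the company-wide 50% figure.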


Now consider the case of WidgetCorp's main competitor, SprocketInc. Assume that SprocketInc also has a total of 1,000 employees, half of whom are female. Further assume that SprocketInc promoted 200 employees, 100 of whom were female.


M employees   F employees   M proms   F proms
500           500           100       100


Because females are 50% of the workforce, we would expect 50% of the 200 promotions, or 100 promotions, to go to females. In this case, the actual number of female promotions exactly equals the expected number of female promotions:


M employees   F employees   M proms   F proms   Expected F proms   F prom shortfall
500           500           100       100       100                0


Since there is a zero female promotion shortfall, there can't possibly be disparate impact, right? Wrong.


Assume that SprocketInc has the same two job groups as WidgetCorp, and that SprocketInc's workforce is divided amongst the two job groups as follows:






Job Group   M employees   F employees   M proms   F proms
A           40            200           40        90
B           460           300           60        10



We calculate our expected number of female promotions for the two job groups as above, and compare the actual female promotions to expected female promotions:


Job Group   M employees   F employees   M proms   F proms   Expected F proms   F prom shortfall
A           40            200           40        90        108.3              18.3
B           460           300           60        10        27.6               17.6


When the job groups are examined individually, we see that there is a female promotion shortfall for both Job Group A and Job Group B. If the female promotion shortfalls are statistically significant, one may infer disparate impact with respect to gender.
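
To see the aggregated and stratified views of SprocketInc side by side, the following sketch computes the shortfall once per job group and once for the pooled counts. The data structure and names are illustrative assumptions; the counts are the SprocketInc figures given above.

```python
# SprocketInc counts:
# (male employees, female employees, male promotions, female promotions)
sprocketinc = {
    "A": (40, 200, 40, 90),
    "B": (460, 300, 60, 10),
}

def shortfall(m_emp, f_emp, m_proms, f_proms):
    """Expected female promotions minus actual female promotions."""
    expected = (m_proms + f_proms) * f_emp / (m_emp + f_emp)
    return expected - f_proms

# Stratified view: one shortfall per job group.
by_group = {name: shortfall(*counts) for name, counts in sprocketinc.items()}

# Aggregated view: add the counts across job groups first.
totals = [sum(column) for column in zip(*sprocketinc.values())]
overall = shortfall(*totals)

for name, value in by_group.items():
    print(f"Job Group {name}: shortfall {value:.1f}")
print(f"Whole workforce: shortfall {overall:.1f}")
```

Both per-group shortfalls come out positive, while the pooled shortfall is zero, even though all three figures are computed from the same employees and the same promotions.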


Our conclusions for WidgetCorp and SprocketInc are as follows:
  • When WidgetCorp's workforce is examined as a whole, there is a female promotion shortfall;
  • When WidgetCorp's workforce is examined by job group, there is no female promotion shortfall for either job group;
  • When SprocketInc's workforce is examined as a whole, there is no female promotion shortfall;
  • When SprocketInc's workforce is examined by job group, there is a female promotion shortfall for both job groups.
How can this be? The answer is Simpson's Paradox. Studying the job groups individually leads to a different conclusion than studying the job groups together.
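
One way to see why the two views disagree is to look at promotion rates rather than shortfalls. The sketch below is an illustration added to the example, not part of the original analysis; the names are hypothetical, and the counts are the WidgetCorp and SprocketInc figures used throughout.

```python
# (male employees, female employees, male promotions, female promotions)
companies = {
    "WidgetCorp": {"A": (350, 150, 105, 45), "B": (150, 350, 15, 35)},
    "SprocketInc": {"A": (40, 200, 40, 90), "B": (460, 300, 60, 10)},
}

def promotion_rates(m_emp, f_emp, m_proms, f_proms):
    """Return (male promotion rate, female promotion rate)."""
    return m_proms / m_emp, f_proms / f_emp

for company, groups in companies.items():
    print(company)
    for name, counts in groups.items():
        m_rate, f_rate = promotion_rates(*counts)
        print(f"  Job Group {name}: M {m_rate:.1%}, F {f_rate:.1%}")
    # Pool the counts across job groups to get the company-wide rates.
    totals = [sum(column) for column in zip(*groups.values())]
    m_rate, f_rate = promotion_rates(*totals)
    print(f"  Overall:     M {m_rate:.1%}, F {f_rate:.1%}")
```

At WidgetCorp, males and females are promoted at the same rate within each job group (30% in A, 10% in B), but females are concentrated in Job Group B, where promotions are rarer, so the overall female rate (16%) falls below the overall male rate (24%). At SprocketInc, females are promoted at a lower rate than males in both job groups, but they are far more likely than males to be in Job Group A, where promotions are more common, which pulls the overall female rate up to match the male rate (20% each).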


How often does Simpson's Paradox occur? If a scenario like the one laid out above is selected at random, the probability is approximately 1/60 that Simpson's Paradox will occur purely by chance (Marios Pavlides and Michael Perlman, "How Likely is Simpson's Paradox?", The American Statistician, August 2009, 63(3): 226-233).


Aside from being a statistical "curiosity", this example has a direct implication for disparate impact analysis. What is true for one segment of the population may not be true for other segments, or for the population as a whole. Examining the workforce at the highest level of aggregation (i.e., the workforce as a whole) may create the appearance of a disparity that does not exist when the workforce is stratified. On the other hand, examining the workforce at the highest level of aggregation may mask a disparity that exists within individual strata of the workforce.


The level of aggregation selected can have a major effect on the results generated and inferences drawn from the analysis. There is no "right" answer. The level of aggregation at which the analysis is performed should be driven by the organization itself, attempting to capture how decisions were made, and reflecting reality as closely as possible.
