Statistics is a peculiar science, full of apparent contradictions and strange absurdities. One such oddity that presents itself frequently in disparate impact analyses is Simpson's Paradox. Simply stated:

Simpson's paradox is the case where it appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.

Assume that we are examining promotion data for WidgetCorp to assess whether there is a disparate impact with respect to gender. Assume that WidgetCorp has a total of 1,000 employees, half of whom are female. Further assume that 200 employees were promoted. Among the promoted employees, 80 were female.

| Male employees | Female employees | Male promotions | Female promotions |
| --- | --- | --- | --- |
| 500 | 500 | 120 | 80 |

In order to assess whether there is disparate impact with respect to gender, we compare the actual number of female promotions with the number of female promotions we would expect under a gender-neutral process.

Based on the gender composition of the workforce, females are 50% of the population (500 females out of 1,000 employees). We would therefore expect 50% of the promotions to go to females: 100 promotions (50% multiplied by 200 promotions). We then compare the 80 actual female promotions with the 100 expected female promotions, and arrive at a female "promotion shortfall" of 20.

| Male employees | Female employees | Male promotions | Female promotions | Expected female promotions | Female promotion shortfall |
| --- | --- | --- | --- | --- | --- |
| 500 | 500 | 120 | 80 | 100 | 20 |
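The arithmetic behind the table can be sketched in a few lines of Python. The function name `expected_and_shortfall` is purely illustrative and not part of any existing tool; it simply applies the female share of the workforce to the total number of promotions.

```python
def expected_and_shortfall(males, females, male_proms, female_proms):
    """Return (expected female promotions, female promotion shortfall)."""
    total_employees = males + females
    total_proms = male_proms + female_proms
    # Under a gender-neutral process, females receive promotions in
    # proportion to their share of the workforce.
    expected = females * total_proms / total_employees
    return expected, expected - female_proms


# WidgetCorp, workforce examined as a whole.
expected, shortfall = expected_and_shortfall(500, 500, 120, 80)
print(f"expected={expected:.1f}, shortfall={shortfall:.1f}")
# prints: expected=100.0, shortfall=20.0
```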

Based on the above, it would appear that females did not receive their "fair share" of promotions. If the female promotion shortfall is "statistically significant", one may infer disparate impact with respect to gender.
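The text does not say which test of statistical significance to use. As one possible sketch, assume promotions are drawn at random and without replacement from the workforce, so that the number of female promotions follows a hypergeometric distribution. Under that assumption, the one-sided probability of seeing 80 or fewer female promotions can be computed with the standard library alone. The function name `hypergeom_lower_tail` is hypothetical, and the choice of model is an assumption made only for illustration.

```python
from math import comb


def hypergeom_lower_tail(population, females, promotions, observed):
    """P(X <= observed), where X is the number of females among
    `promotions` people drawn without replacement from `population`
    people, `females` of whom are female (assumed model)."""
    males = population - females
    total_ways = comb(population, promotions)
    # The smallest feasible number of female promotions.
    lowest = max(0, promotions - males)
    favourable = sum(
        comb(females, k) * comb(males, promotions - k)
        for k in range(lowest, observed + 1)
    )
    return favourable / total_ways


# WidgetCorp as a whole: 1,000 employees, 500 female, 200 promotions, 80 to females.
p_value = hypergeom_lower_tail(1000, 500, 200, 80)
print(f"one-sided p-value: {p_value:.4f}")
```

A small probability here would suggest that a shortfall of this size is unlikely under the assumed gender-neutral model. Which threshold counts as "significant" is a separate choice that this sketch does not make.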

What if you then learned that within WidgetCorp, there were two different job groups, and the gender composition of those two job groups was different?

| Job Group | Male employees | Female employees | Male promotions | Female promotions |
| --- | --- | --- | --- | --- |
| A | 350 | 150 | 105 | 45 |
| B | 150 | 350 | 15 | 35 |

We repeat the same comparison of actual promotions to expected promotions, but this time we look at the two different job groups separately. In Job Group A, females are 30% of the population, and they received 45 of the 150 promotions. In Job Group B, females are 70% of the population, and they received 35 of the 50 promotions. Comparing actual promotions to expected promotions for each of the two job groups, we see the following:

| Job Group | Male employees | Female employees | Male promotions | Female promotions | Expected female promotions | Female promotion shortfall |
| --- | --- | --- | --- | --- | --- | --- |
| A | 350 | 150 | 105 | 45 | 45 | 0 |
| B | 150 | 350 | 15 | 35 | 35 | 0 |

Based on this result, we would conclude that females received the exact number of promotions we expected; we cannot infer disparate impact based on this result.
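The stratified comparison is the same calculation repeated once per job group. A minimal sketch, with the dictionary layout and the name `shortfall_by_group` chosen only for illustration:

```python
# Job group -> (male employees, female employees, male promotions, female promotions)
widgetcorp = {
    "A": (350, 150, 105, 45),
    "B": (150, 350, 15, 35),
}


def shortfall_by_group(groups):
    """Map each job group to (expected female promotions, shortfall)."""
    results = {}
    for name, (males, females, male_proms, female_proms) in groups.items():
        total_proms = male_proms + female_proms
        # Female share of this job group applied to this group's promotions.
        expected = females * total_proms / (males + females)
        results[name] = (expected, expected - female_proms)
    return results


for name, (expected, shortfall) in shortfall_by_group(widgetcorp).items():
    print(f"Group {name}: expected={expected:.1f}, shortfall={shortfall:.1f}")
# prints: Group A: expected=45.0, shortfall=0.0
# prints: Group B: expected=35.0, shortfall=0.0
```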

Now consider the case of WidgetCorp's main competitor, SprocketInc. Assume that SprocketInc also has a total of 1,000 employees, half of whom are female. Further assume that SprocketInc promoted 200 employees, 100 of whom were female.

| Male employees | Female employees | Male promotions | Female promotions |
| --- | --- | --- | --- |
| 500 | 500 | 100 | 100 |

Because females are 50% of the workforce, we would expect that 50% of the 200 promotions, or 100 promotion events, would be female. In this case, the actual number of female promotions exactly equals the expected number of female promotions:

| Male employees | Female employees | Male promotions | Female promotions | Expected female promotions | Female promotion shortfall |
| --- | --- | --- | --- | --- | --- |
| 500 | 500 | 100 | 100 | 100 | 0 |

Since there is a zero female promotion shortfall, there can't *possibly* be disparate impact, right? Wrong.

Assume that SprocketInc has the same two job groups as WidgetCorp, and that SprocketInc's workforce is divided amongst the two job groups as follows:

| Job Group | Male employees | Female employees | Male promotions | Female promotions |
| --- | --- | --- | --- | --- |
| A | 40 | 200 | 40 | 90 |
| B | 460 | 300 | 60 | 10 |

We calculate our expected number of female promotions for the two job groups as above, and compare the actual female promotions to expected female promotions:

| Job Group | Male employees | Female employees | Male promotions | Female promotions | Expected female promotions | Female promotion shortfall |
| --- | --- | --- | --- | --- | --- | --- |
| A | 40 | 200 | 40 | 90 | 108.3 | 18.3 |
| B | 460 | 300 | 60 | 10 | 27.6 | 17.6 |

When the job groups are examined individually, we see that there is a female promotion shortfall for *both* Job Group A and Job Group B. If the female promotion shortfalls are statistically significant, one may infer disparate impact with respect to gender.
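Another way to see the same reversal is to compare promotion rates by gender instead of shortfalls. This framing is not used in the text above, but it relies on the same counts. The sketch below, with illustrative names only, computes male and female promotion rates within each SprocketInc job group and then for the combined workforce.

```python
# Job group -> (male employees, female employees, male promotions, female promotions)
sprocketinc = {
    "A": (40, 200, 40, 90),
    "B": (460, 300, 60, 10),
}


def promotion_rates(males, females, male_proms, female_proms):
    """Return (male promotion rate, female promotion rate)."""
    return male_proms / males, female_proms / females


# Within each job group, the female rate is below the male rate.
for name, counts in sprocketinc.items():
    male_rate, female_rate = promotion_rates(*counts)
    print(f"Group {name}: male {male_rate:.1%}, female {female_rate:.1%}")

# Summing the columns gives the workforce as a whole: (500, 500, 100, 100).
totals = tuple(sum(column) for column in zip(*sprocketinc.values()))
male_rate, female_rate = promotion_rates(*totals)
print(f"Overall: male {male_rate:.1%}, female {female_rate:.1%}")
```

In both job groups the female rate is lower, yet the overall rates are identical, because most females sit in the group with the higher promotion rate.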

Our conclusions for WidgetCorp and SprocketInc are as follows:

- When WidgetCorp's workforce is examined as a whole, there is a female promotion shortfall;
- When WidgetCorp's workforce is examined by job group, there is no female promotion shortfall for either job group;
- When SprocketInc's workforce is examined as a whole, there is no female promotion shortfall;
- When SprocketInc's workforce is examined by job group, there is a female promotion shortfall for both job groups.

How often does Simpson's Paradox occur? If a scenario like the one laid out above is selected at random, the probability is approximately 1/60 that Simpson's Paradox will occur purely by chance (Marios Pavlides and Michael Perlman, "How Likely is Simpson's Paradox?", *The American Statistician*, August 2009, 63(3): 226-233).

Aside from being a statistical "curiosity", this example has a direct implication for disparate impact analysis. What is true for one segment of the population may not be true for other segments, or for the population as a whole. Examining the workforce at the highest level of aggregation (i.e., the workforce as a whole) may *create the appearance* of a disparity that does not exist when the workforce is stratified. On the other hand, examining the workforce at the highest level of aggregation may *mask* a disparity existing in strata of the workforce.

The level of aggregation selected can have a major effect on the results generated and inferences drawn from the analysis. There is no "right" answer. The level of aggregation at which the analysis is performed should be driven by the organization itself, attempting to capture how decisions were made, and reflecting reality as closely as possible.
