Equality of Outcome

How would you set the thresholds?

[Interactive model: adjustable risk-score thresholds for black and white defendants]

Records show that on average 64% of defendants reoffend. You can use the same threshold for both groups, or a different threshold for each group.


In 2012, the Wisconsin Department of Corrections began using COMPAS - software that predicts a defendant's likelihood of reoffending - for making sentencing decisions [1].

Each defendant was assigned a score from 1 to 10, with 1 being the least likely to reoffend and 10 being the most likely.

Here's the overall score distribution. The x-axis is the score and the y-axis is the percentage of defendants who were assigned that score.

Most of the scores are in the low end of the spectrum.

However, there is a significant discrepancy between racial groups. The most noticeable is between black and white defendants.

The average scores are 4.3 for black defendants and 3.1 for white defendants. The score distribution for white defendants is concentrated more heavily at the low end than that for black defendants.

Now, to make a simple classification of someone's risk of reoffending, we use a threshold. If their score is above that threshold, then we label them as "high-risk".
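As a minimal sketch of this thresholding rule, the Python snippet below labels a handful of scores. The scores, the threshold of 4, and the function names are made up for illustration and are not taken from the COMPAS data.

```python
# Hypothetical scores on the 1-10 scale (1 = least likely to reoffend).
scores = [1, 2, 2, 3, 4, 5, 7, 8, 9, 10]


def classify(scores, threshold):
    """Label a score "high-risk" if it is strictly above the threshold."""
    return ["high-risk" if s > threshold else "not high-risk" for s in scores]


def high_risk_rate(scores, threshold):
    """Fraction of defendants whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


labels = classify(scores, threshold=4)
rate = high_risk_rate(scores, threshold=4)
```

With a threshold of 4, five of the ten hypothetical defendants are labeled "high-risk". Raising the threshold shrinks that share, and lowering it grows the share.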

1. Equalized Rates

The second conception of fairness that we are considering - Equality of Outcome - would require that we equalize the rate at which we classify defendants as "high-risk" (the high-risk classification rate, or HCR) among all demographic groups [2]. For example, if we want to have an HCR of 64% over the entire population, which is the true population average, then we must ensure that all demographic groups have an HCR of around 64%. This means we need to set the threshold at 3.2 for black defendants and 2.0 for white defendants, given their different score distributions.
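One way to find such group-specific thresholds, sketched here under assumptions, is to scan a grid of candidate thresholds and keep the one whose HCR is closest to the target. The two score lists, the 60% target, and the helper names are hypothetical. They are not the COMPAS data and do not reproduce the 3.2 and 2.0 thresholds mentioned above.

```python
def high_risk_rate(scores, threshold):
    """HCR: fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate, candidates):
    """Return the candidate threshold whose HCR is closest to target_rate.

    If several candidates tie, the first (lowest) one is returned.
    """
    return min(
        candidates,
        key=lambda t: abs(high_risk_rate(scores, t) - target_rate),
    )


# Hypothetical score samples for two groups with different distributions.
group_a = [2, 3, 4, 4, 5, 6, 7, 8, 9, 10]
group_b = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8]

# Candidate thresholds 0.0, 0.5, ..., 10.0 (an assumed grid).
candidates = [x / 2 for x in range(21)]
target = 0.6  # assumed target HCR for this toy example

t_a = threshold_for_rate(group_a, target, candidates)
t_b = threshold_for_rate(group_b, target, candidates)
```

In this toy data, the group whose scores sit higher needs the higher threshold to reach the same HCR, which mirrors the pattern described above.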

Under Equality of Outcome, we will most likely need different thresholds for different demographic groups. Try the fixed model below. It has been modified to always equalize the HCRs (the numbers on the right-hand side). Move one of the thresholds up or down and see how the model automatically updates the other threshold to keep the HCRs equal between the two groups.

[Interactive model: linked thresholds for black and white defendants, showing the percentage of each group classified as high-risk]

2. Outcome Fairness

Equality of Outcome measures fairness in terms of the outcomes. It considers an algorithmic model to be fair if the model makes predictions at similar rates across all demographic groups.

"Rate" here refers to the prediction rate. For binary models, where the algorithm chooses between two options, the prediction rate is the rate at which the algorithm chooses one option instead of the other. In the defendant classification model above, the prediction rate is the proportion of defendants that the model classifies as "high-risk", as opposed to "not high-risk". We can generalize this to multi-class, non-binary models, where the algorithm chooses between more than two options. Here, for the sake of simplicity, we will stick to the binary model.

When we think of fairness in terms of outcomes, one of the first criteria we think of is equality. Most people think of equality as the duty to treat people the same regardless of their age, sex, race, or other arbitrary attributes that are beyond their control. Given this basic understanding, we must ask further: equality in terms of what [3][4]? What exactly is the quantity that we must equalize among demographic groups?

Equality of Outcome offers a rather straightforward answer to this question: equality in terms of the prediction rates. It does not require that everyone get the same outcomes. Each person must still put in the effort to reap the rewards. Rather, the equality requirement is aggregative: on average, no demographic group should be more likely to get positive predictions than others.
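Written out for the defendant model, this aggregative requirement might be expressed as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: G_g is the set of defendants in demographic group g, s_i is the score of defendant i, and t_g is the threshold applied to group g.

```latex
% Assumed notation (not from the original text):
%   G_g = set of defendants in demographic group g
%   s_i = risk score of defendant i
%   t_g = threshold applied to group g
% High-risk classification rate (the prediction rate) for group g:
\[
  \mathrm{HCR}_g \;=\; \frac{1}{|G_g|} \sum_{i \in G_g} \mathbf{1}\,[\, s_i > t_g \,]
\]
% Equality of Outcome asks that, for any two groups a and b,
% the rates be approximately equal:
\[
  \mathrm{HCR}_a \;\approx\; \mathrm{HCR}_b
\]
```

Nothing in this condition constrains any individual's label. It only constrains the group-level averages.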

Such an aggregative criterion of outcome fairness is based on luck-egalitarianism [4]. This view is concerned with equality given the arbitrariness of certain attributes, like race and sex. No one has control over these attributes: they are just born with them. As such, no one should be more or less likely to succeed based on any of these attributes. In an ideal world, when considering predictive algorithms, we should see no significant difference in prediction rates among demographic groups. This drives the equalization of prediction rates as proposed by Equality of Outcome.


3. Potential Objections

Some stakeholders may object to the use of Equality of Outcome in decision-making models. There are two main objections that warrant further discussion.

Model accuracy

Algorithm designers are concerned with the effect of rate equalization on model accuracy. Changing the acceptance thresholds on any given model will also change its accuracy. In essence, the thresholds determine who gets classified as what (higher thresholds mean fewer people will get classified as "high-risk"). Changing the thresholds will most likely change the number of misclassifications, resulting in different accuracy scores [5].

This is a valid concern, but not a fatal flaw that precludes the use of outcome-equal models. Since the thresholds are often continuous and movable, there may be an acceptable level that strikes a balance between performance and equity/equality among demographic groups. This is a decision that algorithm designers must make on a case-by-case basis, with careful consideration of the model's architecture, the data it uses, and the history that is reflected in that data.
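To make the accuracy concern concrete, the sketch below measures how often the "high-risk" label agrees with the recorded outcome at a few thresholds. The scores, the outcomes, and the thresholds tried are all hypothetical, and accuracy is assumed here to mean the plain share of correct labels.

```python
def accuracy(scores, reoffended, threshold):
    """Share of defendants whose label matches the recorded outcome.

    A defendant is labeled "high-risk" when the score is strictly above
    the threshold; the label is correct when it equals the outcome.
    """
    correct = sum((s > threshold) == r for s, r in zip(scores, reoffended))
    return correct / len(scores)


# Hypothetical scores and recorded outcomes (True = reoffended).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reoffended = [False, False, True, False, False,
              True, True, False, True, True]

# Accuracy at three assumed thresholds.
results = {t: accuracy(scores, reoffended, t) for t in (2, 5, 8)}
```

On this toy data, accuracy is 0.7 at a threshold of 2, 0.8 at 5, and 0.7 at 8, so moving a group's threshold away from its accuracy-maximizing value in order to equalize rates costs some accuracy. How large that cost is in practice depends on the model and the data.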

Disparate treatment

Members of demographic groups that have historically been favored by decision-making algorithms, such as white defendants in the risk model above, may reject Equality of Outcome because they think that it discriminates against them. From their perspective, Equality of Outcome makes it more likely that they will be classified as "high-risk".

In many ways, this is similar to the case of affirmative action in university admissions. Certain racial groups, like white and Asian students, may object to the placement of race-based quotas during the admissions process. These quotas, similar to the "high-risk" thresholds in our case with the defendants, restrict the number of people being admitted based on their racial identity.

Both of these cases touch on Equality of Outcome's potential weakness: its procedurally unfair requirements. Outcome-equal models must necessarily be aware of each individual's demographic identity and make different sets of predictions based on such information. This disparate treatment makes these models procedurally unfair. While the intent is not to discriminate, but rather to promote some form of outcome fairness, the procedure itself is inherently discriminatory.

We may respond to this type of fairness complaint in two ways. First, we can accept Equality of Outcome's inherently discriminatory nature but still reason that its potential benefits outweigh the harms. This approach emphasizes outcomes over procedure. For example, affirmative action creates diverse campuses where students with various backgrounds, beliefs and values can meet and exchange ideas. This leads to a rich and challenging learning environment, which is immensely beneficial for students.

Second, we can argue that the disparate treatment is justified because it serves to correct past injustices, under the framework of corrective justice. Most of us generally accept that disciplinary measures and imprisonment are justified against criminals as long as the measures are proportional to the crime. There are many ways to justify such measures, one of which is to argue that criminals forgo certain rights and privileges (e.g. the freedom of movement) the moment they commit a crime.

We can extend corrective justice to the case of defendant risk assessment. We may reason that black defendants have been more likely to reoffend due to an unjust legal and social system. Black ex-prisoners face prejudice from parole officers and overwhelming stigma within society, to a worse degree than white ex-prisoners. As such, they are more likely to break parole or reoffend. From this, we may reason that outcome-equal decision-making models are justified in extending disparate treatment to different racial groups, because such treatment helps correct the mentioned injustices.

It is important to note that corrective justice is not an "eye for an eye" approach. Rather, the aim is to rebalance the scale, to help amend the harm and inequality that resulted from past injustices. It emphasizes healing and reconciliation, not retribution.


4. In a Nutshell

Equality of Outcome is an equality-based conception of fairness, but there are concerns about its rather stringent requirements.

Outcome-equal models prioritize outcome fairness over procedural fairness. The quantity of equalization - the prediction rate - is a simple yet effective one. However, there are concerns about its effects on the models' accuracy, and whether it is the right quantity to equalize in the first place.

In the next chapter, we will explore Equality of Opportunity, another conception of fairness that works with a different, potentially more appealing quantity of equalization.

