Lecture 9

Multivariate Analysis and Adjustment

Multivariate Analysis

Today, we’re going to talk about how to analyze data when we have multiple covariates.
An important application is analyzing disparities to think about discrimination.
Above, we see the kind of data we’ll be working with.
Each dot represents a California resident, age 25-35, who responded to the CPS.
- $Y_i$, their income, is on the $y$-axis.
- $X_i$, their education level, is on the $x$-axis.
- $W_i$, their sex, is indicated by color and by a slight shift along the $x$-axis.
  - $\textcolor[RGB]{248,118,109}{W_i=0}$ indicates male, $\textcolor[RGB]{0,191,196}{W_i=1}$ indicates female.
As usual, we’ve drawn in each group’s income mean and standard deviation too.
We’ll write $\textcolor[RGB]{248,118,109}{\hat\mu(0,x)}$ and $\textcolor[RGB]{0,191,196}{\hat\mu(1,x)}$ for the mean income of male and female respondents with $x$ years of education.

Plan for Today

Imaginary Population of California Residents

Real Sample of California Residents from CPS

From a calculational and inferential perspective, this is nothing new.
- We’re still talking about ways of summarizing a sample that’s broken down into columns.
- That our columns are determined by two variables rather than one makes no difference.
The part that is new will be interpretation and communication.
- We’ll see that there are a variety of ways to look at differences in a multivariate setting.
- If we’re not careful, we can find ourselves talking about different summaries as if they were the same.
That’ll be our focus today.
- Thinking and communicating precisely about what we’re talking about.
- Understanding how we get different answers to ‘the same question’ if we’re imprecise.
- Thinking about the implications of what we choose to report.

Our Data and Approach

Our main example will be income disparities that we can see in the CPS.
But we’ll start by looking at something else: disparities in grad-school admissions.
We’re not going to talk about statistical inference today.
- But we’re not going to be working with a made-up population either.
- Instead, we’ll just work with our sample a little recklessly.
We’ll talk as if our summaries of the sample are reasonable descriptions of the population.
- Even though we know that they’re just point estimates.
- And that point estimates can be pretty far off-target.

Berkeley Admissions and Simpson’s Paradox

A Famously Paradoxical Example

In the 1970s, Berkeley graduate programs were admitting a much higher proportion of men than women.
- $\textcolor[RGB]{248,118,109}{3738/8442\approx 44}$% of men who applied were admitted.
- $\textcolor[RGB]{0,191,196}{1494/4321\approx 35}$% of women who applied were admitted.
- People were concerned that this difference was evidence of discrimination against women.¹ \[ \color{gray} \text{admission rate gap} = \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1} Y_i} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i: W_i=0} Y_i} \approx -0.1 \]
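As a quick check of the arithmetic, here is a minimal Python sketch that recomputes the two rates and the gap from the counts quoted above. The variable names are ours, chosen for illustration.

```python
# Counts quoted above: admitted / applied, by group.
admitted_men, applied_men = 3738, 8442
admitted_women, applied_women = 1494, 4321

rate_men = admitted_men / applied_men        # mean of Y_i over applicants with W_i = 0
rate_women = admitted_women / applied_women  # mean of Y_i over applicants with W_i = 1
gap = rate_women - rate_men                  # women's rate minus men's rate

print(f"men: {rate_men:.3f}  women: {rate_women:.3f}  gap: {gap:.3f}")
```

The gap is women's rate minus men's, so a negative number means women were admitted at the lower rate.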
But, on closer examination, most departments were admitting women at a higher rate than men.
What’s going on? Let’s see if we can figure it out by working with some stylized data.

No Solution
Warm-Up Solution
Paradox Solution

What We’re Doing
Warm-up Exercise
Paradox Exercise

Above is a stylized version of the Berkeley data in which we have two departments.
- At the department on the left ($x=0$), getting in is easy.
- At the department on the right ($x=1$), getting in is hard.
We’ll draw dots representing applicants.
- The $y$-coordinate indicates whether they were admitted ($y=1$) or not ($y=0$).
- The color of the dot indicates their gender. Men are red. Women are green.

Color three dots red and three green so that …
1. Within each column, the mean of the green dots is equal to the mean of the red dots.
2. The mean of all the green dots is lower than the mean of the red ones.
Annotate your plot by …
1. Drawing in the means of the red and green dots within each column.
  - Use dots with arms ⍿ ⍿ because that’s what we usually do in R.
  - Don’t worry about the length of the arms. We’re not thinking about spread here.
2. Drawing in the means of all the red dots and all the green dots.
  - Use horizontal lines.

Add three more dots (two red and one green) so that …
1. Within each column, the mean of the green dots is now higher than the mean of the red ones.
2. The mean of all the green dots is still lower than the mean of the red ones.
Fix your annotations to account for these new dots.

The Story our Exercises Tell

Warm-up Data

Paradox Data

Warm-up Story
Paradox Story

The story, in this simple example, is that …
- more men apply to the department everybody gets in to.
- more women apply to the department nobody gets in to.
So men are admitted at a higher rate to the university as a whole.
Even though men and women are admitted at the same rate within each department.
Lesson. Even when there is no disparity within groups, there can be a disparity in aggregate.

The story in this follow-up is that …
- some woman miraculously gets admitted to the department nobody gets in to.
- some man, equally miraculously, is rejected by the department everybody gets in to.
- and one more man, less miraculously, is admitted to that department too.
The university as a whole still admits men at a higher rate than women.
But on a departmental level, that’s reversed. Every department admits women at a higher rate than men.
Lesson. Within-group and aggregate disparities can be in opposite directions.
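Both stories can be checked with a few lines of Python. This is a sketch, not the lecture's own data: the placement of the nine dots is our reconstruction from the stories and the rates quoted in these notes, and the names `warmup`, `paradox`, and `rate` are made up for illustration.

```python
from fractions import Fraction

# Each applicant is (w, x, y): w = 1 for women, 0 for men; x = department;
# y = 1 if admitted. Dot placement is an assumption reconstructed from the stories.
warmup = [(0, 0, 1), (0, 0, 1), (0, 1, 0),   # men: two in the easy dept, one in the hard one
          (1, 0, 1), (1, 1, 0), (1, 1, 0)]   # women: one in the easy dept, two in the hard one
paradox = warmup + [(1, 1, 1),   # a woman admitted to the hard department
                    (0, 0, 0),   # a man rejected by the easy department
                    (0, 0, 1)]   # one more man admitted to the easy department

def rate(data, w, x=None):
    """Admission rate for group w, within department x, or overall if x is None."""
    ys = [y for (wi, xi, y) in data if wi == w and (x is None or xi == x)]
    return Fraction(sum(ys), len(ys))

for name, data in [("warm-up", warmup), ("paradox", paradox)]:
    print(name,
          "| overall gap:", rate(data, 1) - rate(data, 0),
          "| dept 0 gap:", rate(data, 1, 0) - rate(data, 0, 0),
          "| dept 1 gap:", rate(data, 1, 1) - rate(data, 0, 1))
```

Each gap is women's rate minus men's. In the paradox data, the overall gap is negative while both within-department gaps are positive.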

The Real Berkeley Data

Our Paradox Data

The real data is a bit more complicated than our simple caricature.
- Berkeley has more than two departments.
- And they tend to have more than 4 or 5 applicants.
To relate our caricature to the real Berkeley data, let’s do a quick drawing exercise.
Exercise. Draw our two little departments into the plot of the real data.
- How do you calculate the x-coordinates?
- What about the y-coordinates?
Discuss. Is the observed difference evidence of discrimination?

Breaking Down Disparities in Admission Rates

Analyzing our caricature of the Berkeley data

Summaries of Admission Rates

An aggregated-over-departments version of the ‘paradox data’.

The disaggregated ‘paradox data’.

At the Graduate School

The rate for women is 10 percentage points lower than for men. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1)} \ - \ \textcolor[RGB]{248,118,109}{\hat\mu(0)} &\approx \textcolor[RGB]{0,191,196}{0.5} \ - \ \textcolor[RGB]{248,118,109}{0.6} \\ &\approx -0.1 \end{aligned} \]

In Department 0

The rate for women is 25 percentage points higher than for men. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1,0)} \ - \ \textcolor[RGB]{248,118,109}{\hat\mu(0, 0)} &\approx \textcolor[RGB]{0,191,196}{1} \ - \ \textcolor[RGB]{248,118,109}{0.75} \\ &\approx 0.25 \end{aligned} \]

This is just a review of the paradox we talked about before.
- When we compare within departments, it looks like women are better-off than men.
- But when we look at the university as a whole, this discrepancy is reversed.
These comparisons are relatively simple in that we’re just reading two numbers off a plot.
On the left:
- We compared means in our plot of the aggregated data.
- This is nice because the result is simple. It’s one number.
On the right:
- We compared columns in the disaggregated data.
- This is nice because it’s a ‘like-with-like’ comparison.

What If We Want Simplicity and Comparability?

The warm-up version

The paradox version

Exercise. Using only one number, summarize the typical difference in acceptance rates …
… for women and men applying to the same department.
Step 1. Calculate a number for both samples above: the ‘warm up version’ and the ‘paradox version’.
Step 2. Write out a formula in abstract terms using mathematical notation.
- Write ${\color[RGB]{64,64,64}W_i,X_i,Y_i}$ for the values of gender, department, and admission status in our sample.
- Write $\textcolor[RGB]{248,118,109}{\hat\mu(0, x)}$ and $\textcolor[RGB]{0,191,196}{\hat\mu(1, x)}$ for within-group means.
- Check that when you plug in the numbers for your two samples, you get your answers from Step 1.
Step 3. Generalize your formula so it works when you have more than two departments (i.e. non-binary $X$).
- If your answer to Step 2 already worked for non-binary $X$, you have nothing to do here. You’re done.

\[ \color{gray} \begin{aligned} \text{summary} &=\frac{1}{\# \text{departments}} \sum_{x \in \text{departments}}\qty{ \textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},x)} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, x)} } \\ &=\frac{1}{2}\qty[ {\{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})}\} + \{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})}\}} ] \\ &= \frac{1}{2}\qty[ \{ \textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1} \} + \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} ] \\ &= \frac{1}{2}\qty[ \{ 0 \} + \{ 0 \} ] \\ &= 0 \end{aligned} \]

\[ \color{gray} \begin{aligned} \text{summary} &=\frac{1}{\# \text{departments}} \sum_{x \in \text{departments}}\qty{ \textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},x)} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, x)} } \\ &=\frac{1}{2}\qty[ {\{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})}\} + \{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})}\}} ] \\ &= \frac{1}{2}\qty[ \qty{\textcolor[RGB]{0,191,196}{\frac{1}{1}} - \textcolor[RGB]{248,118,109}{\frac{3}{4}}} \ + \ \qty{\textcolor[RGB]{0,191,196}{\frac{1}{3}}-\textcolor[RGB]{248,118,109}{\frac{0}{1}} } ] \\ &\approx \frac{1}{2}\qty[ \qty{0.25} + \qty{0.33} ] \\ &\approx 0.29 \end{aligned} \]

This is telling us the direction of the disparity within departments.
- In the warm-up version, it’s zero because there is no disparity in acceptance rates within departments.
- In the paradox version, it’s positive because both departments accept women at a higher rate than men.
But is it a good summary of the experience of the women applying to them? Probably not.
- Only 1 applies to Department 0. Several times more (3 in the paradox version) apply to Department 1.
- And we’re giving the same weight to the rate difference in each department.
In the paradox version, we’ve counted the one woman who applied to Department 0 three times as much as each of the ones who applied to Dept. 1.
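Here is a minimal Python sketch of this equal-weight average over departments, written so it also handles more than two departments (Step 3). The data are our reconstruction of the two stylized samples from the numbers in the formulas above, and `mu_hat` and `average_over_departments` are illustrative names.

```python
from fractions import Fraction

# (w, x, y) = (gender, department, admitted). Reconstructed stylized samples,
# an assumption consistent with the rates used in the formulas above.
warmup = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0)]
paradox = warmup + [(1, 1, 1), (0, 0, 0), (0, 0, 1)]

def mu_hat(data, w, x):
    """Mean admission status among applicants of gender w in department x."""
    ys = [y for (wi, xi, y) in data if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

def average_over_departments(data):
    """Equal-weight average of the within-department rate differences."""
    departments = sorted({x for (_, x, _) in data})
    diffs = [mu_hat(data, 1, x) - mu_hat(data, 0, x) for x in departments]
    return sum(diffs) / len(diffs)

print(average_over_departments(warmup))   # 0
print(average_over_departments(paradox))  # 7/24, about 0.29
```

The function assumes every department has at least one applicant of each gender. Otherwise a within-department mean is undefined and `mu_hat` divides by zero.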

Solution 2. An Average over Applicants

Warm-up Version
Paradox Version

\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1, X_i) - \textcolor[RGB]{248,118,109}{\hat \mu(0,X_i)} } } \qfor N_w = \sum_{i: W_i=w} 1 \\ &= \frac{1}{3} \qty[ \underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},\overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} \} } ] \\ &= \frac{1}{3} \qty[\underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1} \} } ] \\ &= \frac{2}{3}\qty{\textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0}} + \frac{1}{3}\qty{\textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1}} \\ &= 0 \end{aligned} \]

\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1, X_i) - \textcolor[RGB]{248,118,109}{\hat \mu(0,X_i)} } } \qfor N_w = \sum_{i: W_i=w} 1 \\ &= \frac{1}{4} \qty[ \underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},\overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=7}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } ] \\ &= \frac{3}{4}\qty{ \textcolor[RGB]{0,191,196}{\frac{1}{3}} - \textcolor[RGB]{248,118,109}{\frac{0}{1}}} + \frac{1}{4}\qty{ \textcolor[RGB]{0,191,196}{\frac{1}{1}} - \textcolor[RGB]{248,118,109}{\frac{3}{4}}} \\ & \approx \frac{3}{4}\qty{0.33} + \frac{1}{4}\qty{0.25} \approx 0.31 \end{aligned} \]

This is a comparison that focuses on what’s typical for women who apply to the Graduate School.
- Each female applicant gets equal weight in ‘deciding’ what is typical.
- That’s why ${\color[RGB]{64,64,64}X_i=1}$ occurs in 3 of 4 terms.
The number we get is not radically different from the Solution 1 version: it’s 0.31 vs. 0.29.
But it can be. We’ll see a case in which that happens soon.
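Here is a minimal Python sketch of the average over female applicants, run on both stylized samples. The data are our reconstruction from the numbers in the formulas above, and the function names are illustrative.

```python
from fractions import Fraction

# (w, x, y) = (gender, department, admitted). Reconstructed stylized samples,
# an assumption consistent with the rates used in the formulas above.
warmup = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0)]
paradox = warmup + [(1, 1, 1), (0, 0, 0), (0, 0, 1)]

def mu_hat(data, w, x):
    """Mean admission status among applicants of gender w in department x."""
    ys = [y for (wi, xi, y) in data if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

def average_over_women(data):
    """Average of mu_hat(1, X_i) - mu_hat(0, X_i) over female applicants i."""
    xs = [x for (w, x, _) in data if w == 1]   # one department choice per woman
    diffs = [mu_hat(data, 1, x) - mu_hat(data, 0, x) for x in xs]
    return sum(diffs) / len(diffs)

print(average_over_women(warmup))   # 0
print(average_over_women(paradox))  # 5/16, about 0.31
```

Because the list `xs` has one entry per woman, a department's difference appears once for each woman who applied there. That is how each female applicant gets equal weight.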

Reinterpreting Solution 2

We can think of this average in even more woman-centric terms. It’s the average difference between …
- each woman applicant’s acceptance status, $\textcolor[RGB]{0,191,196}{Y_i}$
- the mean acceptance status (i.e. rate) among men who applied to the same department, $\textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)}$.

\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } = \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \end{aligned} \]

These two expressions will always be equal, no matter what the actual values of $\textcolor[RGB]{128,128,128}{W_i,X_i,Y_i}$ are.
We can think of the first one as a version of the second where we sum in a certain order.

\[ \color{gray} \begin{aligned} \sum_{i: W_i=w} Y_i &\overset{\texttip{\small{\unicode{x2753}}}{We sum first over dots in a column, then over columns.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} Y_i} \\ &\overset{\texttip{\small{\unicode{x2753}}}{The column sum is the column's mean times its number of dots.}}{=} \class{fragment}{\sum_{x} \hat \mu(w, x) \times N_{w,x} \text{ for } N_{w,x} = \sum_{\substack{i: W_i=w, \ X_i=x}} 1} \\ &\overset{\texttip{\small{\unicode{x2753}}}{Or equivalently, its mean summed over the dots.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} \hat \mu(w,x)} \\ &\overset{\mathtip{\small{\unicode{x2753}}}{\hat\mu(w,x)=\hat\mu(w,X_i) \text{ within a column}.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} \hat \mu(w, X_i)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{We 'unorder' that sum to get back to the form we started with.}}{=} \class{fragment}{\sum_{i: W_i=w} \hat \mu(w, X_i)} \end{aligned} \]
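If you'd like to see the identity hold on data that isn't hand-picked, here is a small Python sketch. It builds a hypothetical random sample (three departments, seeded so that every gender-by-department cell is non-empty) and checks that the two expressions agree exactly. Everything about the sample is made up for illustration.

```python
import random
from fractions import Fraction

random.seed(0)

# Hypothetical sample of (w, x, y) triples. One applicant is placed in every
# (w, x) cell first so that every within-group mean is defined.
data = [(w, x, random.randint(0, 1)) for w in (0, 1) for x in (0, 1, 2)]
data += [(random.randint(0, 1), random.randint(0, 2), random.randint(0, 1))
         for _ in range(50)]

def mu_hat(w, x):
    """Mean of y among sample members with gender w and department x."""
    ys = [y for (wi, xi, y) in data if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

women = [(x, y) for (w, x, y) in data if w == 1]

# Left side: average of within-department mean differences over women.
lhs = sum(mu_hat(1, x) - mu_hat(0, x) for (x, _) in women) / len(women)
# Right side: average of each woman's own outcome minus the matching men's mean.
rhs = sum(y - mu_hat(0, x) for (x, y) in women) / len(women)

assert lhs == rhs  # exact equality, thanks to rational arithmetic
print(lhs, rhs)
```

Using `Fraction` avoids floating-point rounding, so the check is an exact equality rather than an approximate one.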

Visualizing Averages using Histograms

To make sense of our summary, it’s helpful to revisit a counting trick we use now and then.
- When we want to average over the people in a sample, we can use a histogram to help us count.
- This is a small sample, so it’s not necessary here. Think of it as a warm-up for the next example.
The average of a function of $x$ over the department-choices of women who applied to the graduate school …
- is a weighted average of the function’s values at each department
- where a department’s weight is its frequency among these applicants. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} f(\textcolor[RGB]{0,191,196}{X_i}) = \sum_{x} f(x) \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \qfor \underset{\color[RGB]{64,64,64}\text{height of the green bar at $x$}}{\textcolor[RGB]{0,191,196}{P_{x\mid 1} = \frac{N_{1,x}}{N_1}}} \end{aligned} \]
Here the function of $x$ we’re looking at is $\textcolor[RGB]{128,128,128}{f(x) = \textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat \mu(0,x)}}$.
- That’s the difference in acceptance rate for women and men in department $x$.
- In visual terms, it’s the difference in the height of the green and red lines.
And we’re averaging that difference with the weights shown by the green bars.

Proof

To prove it, we can trace the argument from our last slide backward, stopping in the middle.

\[ \color[RGB]{64,64,64} \begin{aligned} \sum_{i: W_i=w} f(X_i) &\overset{\texttip{\small{\unicode{x2753}}}{We sum first over dots in a column, then over columns.}}{=} \sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} f(X_i) \\ &\overset{\mathtip{\small{\unicode{x2753}}}{f(x)=f(X_i) \text{ within a column}.}}{=} \sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} f(x) \\ &\overset{\texttip{\small{\unicode{x2753}}}{The column sum is the column's mean times its number of dots.}}{=} \sum_{x}f(x) \times N_{w,x} \qfor N_{w,x} = \sum_{\substack{i: W_i=w, \ X_i=x}} 1 \end{aligned} \] Looking at this sum for $w=1$ and then dividing by $N_1$, we get the weighted-average identity above.
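The same identity can be checked numerically. This Python sketch computes the average of $f$ over women two ways: directly over applicants, and in 'histogram form' as a weighted average over departments. It uses our reconstruction of the paradox data, and the names `f` and `p_x_given_1` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# (w, x, y) = (gender, department, admitted). Reconstructed paradox sample,
# an assumption consistent with the rates quoted in these notes.
paradox = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0),
           (1, 1, 1), (0, 0, 0), (0, 0, 1)]

def mu_hat(w, x):
    """Mean admission status among applicants of gender w in department x."""
    ys = [y for (wi, xi, y) in paradox if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

def f(x):
    """Within-department difference in acceptance rates, women minus men."""
    return mu_hat(1, x) - mu_hat(0, x)

women_x = [x for (w, x, _) in paradox if w == 1]

# Direct form: one term per female applicant.
direct = sum(f(x) for x in women_x) / len(women_x)

# Histogram form: one term per department, weighted by bar height N_{1,x} / N_1.
p_x_given_1 = {x: Fraction(n, len(women_x)) for x, n in Counter(women_x).items()}
histogram = sum(f(x) * p for x, p in p_x_given_1.items())

print(p_x_given_1, direct, histogram)
```

Both forms give 5/16 here. The histogram form makes the weights explicit: 1/4 on Department 0 and 3/4 on Department 1.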

Variations

If we prefer, we could summarize the within-department disparities $\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}$ in other ways.
We could focus on the disparities within departments men apply to.
- To do this, we could weight our average using the red bars.
We could focus on the disparities within departments people apply to.
- To do this, we could weight using the height of the purple bars.
- This would emphasize disparities in popular departments over those in unpopular ones.

\[ \color{gray} \begin{aligned} \hat\Delta_a &\overset{\texttip{\small{\unicode{x2753}}}{All men who applied to the Graduate School. All red dots. Red bars.}}{=} \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i: W_i=0}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \\ \hat\Delta_b &\overset{\texttip{\small{\unicode{x2753}}}{All women who applied to the Graduate School. All green dots. Green bars.}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \\ \hat\Delta_c &\overset{\texttip{\small{\unicode{x2753}}}{All applicants to the Graduate School outright. All dots. Purple bars.}}{=} \frac{1}{n} \sum_{i=1}^n \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \end{aligned} \]

Exercise
Terminology

Above are a few summaries of the difference in within-department acceptance rates between men and women.
- They are all averages of the same within-column differences.
- But they’re averages over different groups of dots (i.e. applicants).
Part 1. For each of these summaries, describe the group we’re averaging over. Do it three ways.
1. As you’d describe the people in them, e.g. ‘all women who applied to Department 1’.
2. As you’d describe the dots in the plot, e.g. ‘all green dots in the x=1 column’.
3. In ‘histogram form’ mathematical notation, i.e. as a weighted average over departments.
- Write $\textcolor[RGB]{0,191,196}{P_{x\mid 1}}$, $\textcolor[RGB]{248,118,109}{P_{x\mid 0}}$, and $\textcolor[RGB]{160,32,240}{P_x}$ for the heights of the green, red, and purple bars.
Part 2. Using the information in the plot, calculate each of them.
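To check your Part 2 answers, here is a hedged Python sketch that computes all three summaries by averaging the same within-department differences over different groups of applicants. It assumes the plot shows the stylized paradox data, which we've reconstructed from the rates quoted in these notes. The function and variable names are ours.

```python
from fractions import Fraction

# (w, x, y) = (gender, department, admitted). Reconstructed paradox sample;
# that the exercise's plot shows this sample is an assumption.
paradox = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0),
           (1, 1, 1), (0, 0, 0), (0, 0, 1)]

def mu_hat(w, x):
    """Mean admission status among applicants of gender w in department x."""
    ys = [y for (wi, xi, y) in paradox if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

def f(x):
    """Within-department difference in acceptance rates, women minus men."""
    return mu_hat(1, x) - mu_hat(0, x)

def average_over(in_group):
    """Average f(X_i) over the applicants i for which in_group(W_i) is true."""
    xs = [x for (w, x, _) in paradox if in_group(w)]
    return sum(f(x) for x in xs) / len(xs)

delta_a = average_over(lambda w: w == 0)  # men: red dots, red bars
delta_b = average_over(lambda w: w == 1)  # women: green dots, green bars
delta_c = average_over(lambda w: True)    # everyone: all dots, purple bars

print(float(delta_a), float(delta_b), float(delta_c))
```

The three numbers differ only because the groups put different weights on the two departments.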

It’s common to use the same words to describe these three different summaries.
- ‘The average difference in acceptance rate for men and women, adjusted for department’.
- Or sometimes ‘…, controlling for department’.²
This results in a lot of confusion. There is more precise terminology out there and you should use it.
- We’ll discuss the precise terminology, and why we’re not using it today, when we talk about causality.
- Today, when we talk about the adjusted difference, that’ll mean the average over green dots.

Acceptance Rates at the Graduate School Level

When we look at the acceptance rates at the graduate school as a whole in ‘histogram form’,
we can see how it differs from our summary of within-department rates very clearly.

\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &=\text{difference in women's and men's acceptance rates at the Graduate School} \\ &\overset{\texttip{\small{\unicode{x2753}}}{summing over people}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} \hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} \hat\mu(0,X_i)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{rewriting in histogram form}}{=} \sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1} \ \hat\mu(1,x)} - \sum_{x} \textcolor[RGB]{248,118,109}{P_{x\mid 0} \ \hat\mu(0,x)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{adding a fancy version of zero}}{=} \sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1} \ \hat\mu(1,x)} + \sum_{x} (\textcolor[RGB]{0,191,196}{P_{x\mid 1}} - \textcolor[RGB]{0,191,196}{P_{x\mid 1}} - \textcolor[RGB]{248,118,109}{P_{x\mid 0}}) \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{moving terms around}}{=} \underset{\text{within-department summary}}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term}}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]

It differs from our average of within-department differences by a ‘covariate shift term’.
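To see the decomposition with numbers, here is a Python sketch that computes the raw difference, the within-department summary, and the covariate shift term separately, then checks that they add up. It uses our reconstruction of the paradox data. The helper names `mu_hat` and `p_given` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# (w, x, y) = (gender, department, admitted). Reconstructed paradox sample,
# an assumption consistent with the rates quoted in these notes.
paradox = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 0),
           (1, 1, 1), (0, 0, 0), (0, 0, 1)]

def mu_hat(w, x=None):
    """Mean admission status for gender w, within department x or overall."""
    ys = [y for (wi, xi, y) in paradox if wi == w and (x is None or xi == x)]
    return Fraction(sum(ys), len(ys))

def p_given(w):
    """Histogram heights P_{x|w}: share of group w applying to each department."""
    xs = [x for (wi, x, _) in paradox if wi == w]
    return {x: Fraction(n, len(xs)) for x, n in Counter(xs).items()}

p1, p0 = p_given(1), p_given(0)
departments = sorted(set(p1) | set(p0))

raw = mu_hat(1) - mu_hat(0)
within = sum(p1.get(x, 0) * (mu_hat(1, x) - mu_hat(0, x)) for x in departments)
shift = sum((p1.get(x, 0) - p0.get(x, 0)) * mu_hat(0, x) for x in departments)

assert raw == within + shift
print(float(raw), float(within), float(shift))
```

On this sample, the raw difference is -0.1, the within-department summary is about 0.31, and the covariate shift term is about -0.41.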

Covariate Shift

\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \underset{\text{within-department summary}}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term}}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]

What is Covariate Shift?
Visualizing Covariate Shift
The Multiplication Heuristic
Terminology

This covariate shift term has nothing to do with the disparity within departments.
- It’s about the shift in the distribution of our covariate $x$, the applicant’s department choice …
- … from the group of men who applied to the Graduate School to the group of women who did.
In our paradox example, this term is negative.
- Our summary of within-department differences is positive.
- The graduate school-level difference is negative.³
- So the covariate shift term, which is what we add to the first to get the second, must be negative.
In our warm-up example, the covariate shift term was the whole difference we saw on the university level.
- There were no within-department disparities, so the within-department term was zero.

The covariate shift term is the difference of two averages of the same function of $\textcolor[RGB]{128,128,128}{x}$.
- The function—the red line—is the acceptance rate for men within departments.
- The term is the difference of two averages.
  - The average over the distribution of departments men applied to. Red dots/bars.
  - The average over the distribution of departments women applied to. Green dots/bars.
Why is it negative?
- The function we’re averaging decreases from left to right.
- The distribution we’re averaging over shifts from left to right.
- Consequence. The average over men’s departments is bigger than the average over women’s, so the term (women’s average minus men’s) is negative.

There’s a simple heuristic that’ll tell you if this term is positive or negative.
- It’s like multiplying two numbers: $\textcolor[RGB]{128,128,128}{+ \times + = +, + \times - = -, - \times + = -, - \times - = +}$.
- The two factors here are the function and the distribution shift.
To use this heuristic, we need to give these two factors signs.
- A function is ‘positive’ if it increases from left to right and ‘negative’ if it decreases from left to right.
- A distribution shift is ‘positive’ if probability mass shifts left to right and ‘negative’ if it shifts right to left. \[ \textcolor[RGB]{128,128,128}{ \overset{-}{\text{decreasing function}} \times \overset{+}{\text{rightward shift}} = \overset{-}{\text{negative difference}} } \]
This’ll be more useful but subtler when we have more than two levels of $\textcolor[RGB]{128,128,128}{X}$.
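With only two levels of $\textcolor[RGB]{128,128,128}{X}$, the heuristic is literally a multiplication. Here is a short derivation, using only the definitions above and the fact that the two bar heights in each histogram sum to one, so $P_{0\mid w} = 1 - P_{1\mid w}$. The numbers in the last line are the ones we'd read off our stylized paradox data, assuming 1 of 5 men and 3 of 4 women applied to Department 1.

\[ \color{gray} \begin{aligned}
\sum_x P_{x\mid 1} \ \hat\mu(0,x) - \sum_x P_{x\mid 0} \ \hat\mu(0,x)
&= (P_{0\mid 1} - P_{0\mid 0}) \ \hat\mu(0,0) + (P_{1\mid 1} - P_{1\mid 0}) \ \hat\mu(0,1) \\
&= \underset{\text{distribution shift}}{(P_{1\mid 1} - P_{1\mid 0})} \times \underset{\text{change in the function}}{\qty{\hat\mu(0,1) - \hat\mu(0,0)}} \\
&= \qty(\frac{3}{4} - \frac{1}{5}) \times \qty(0 - \frac{3}{4}) = -\frac{33}{80} \approx -0.41
\end{aligned} \]

The first factor is positive (a rightward shift) and the second is negative (a decreasing function), so the product is negative.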

The word adjustment refers to the process of getting rid of the covariate shift term.
- Or trying to. Often people do it in a way that only half-works.
We call the difference without adjustment the raw difference.
- Adjustment is, after all, a bit like cooking. There are recipes.

Income Disparities

We’ll continue from here.

A Few Comparisons You Might Hear

income vs. sex (aggregated)

income vs. education and sex (disaggregated)

Differences
Ratios

The Raw Difference. Female respondents earn 12k less, on average, than male respondents. \[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} \ - \ \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \]

The Adjusted Difference. And they earn 15k less than similarly-educated ones. \[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} \ - \ \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \]

The Raw Ratio. Female respondents earn 78 cents for every dollar a male one does. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1)} \ / \ \textcolor[RGB]{248,118,109}{\hat\mu(0)} &\approx \textcolor[RGB]{0,191,196}{43k} \ / \ \textcolor[RGB]{248,118,109}{55k} \\ &\approx {0.78} \\ \end{aligned} \]

The Adjusted Ratio. And 74 cents for every dollar a similarly-educated male one does. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}Y_i } \ / \ \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}}\textcolor[RGB]{248,118,109}{\hat\mu(0, X_i)} &\approx \textcolor[RGB]{0,191,196}{43k} \ / \ \textcolor[RGB]{248,118,109}{58k} \\ &\approx {0.74} \\ \end{aligned} \]
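All four comparisons come from the same three averages. This Python sketch redoes the arithmetic with the rounded figures quoted above (43k, 55k, 58k), so it reproduces the rounded results rather than the exact sample values. The variable names are ours.

```python
# Rounded averages quoted above, in dollars.
mean_female = 43_000        # average income of female respondents
mean_male = 55_000          # average income of male respondents
mean_male_matched = 58_000  # average of mu_hat(0, X_i) over female respondents

raw_difference = mean_female - mean_male               # about -12k
adjusted_difference = mean_female - mean_male_matched  # about -15k
raw_ratio = mean_female / mean_male                    # about 0.78
adjusted_ratio = mean_female / mean_male_matched       # about 0.74

print(raw_difference, adjusted_difference)
print(round(raw_ratio, 2), round(adjusted_ratio, 2))
```

The only thing that changes between the raw and adjusted versions is the comparison group: all male respondents versus male respondents with the same education as each female respondent.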

Review: What’s the Difference?

The Raw Difference

\[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} \ - \ \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \]

is what we get when we …

Look at the height of each green dot.
Subtract the average height of all of the red dots.
Average these differences.

The Adjusted Difference

\[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} \ - \ \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \]

is what we get when we …

Look at the height of each green dot.
Subtract the average height of the red dots in the same column.
Average these differences.

Income Disparities and Covariate Shift

\[ \color{gray} \begin{aligned} \underset{\text{raw difference} \approx -11.8K}{\hat\Delta_{\text{raw}}} &= \underset{\text{adjusted difference} \approx -15.1K}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term} \approx 3.3K}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]

The Covariate Shift Term
Breaking Down the Covariate Shift Term
Other Visualizations of Covariate Shift

Here the covariate shift term is positive.
- Adding it to the adjusted difference gives you something bigger.
- The sum we get is still negative, but it’s closer to zero.
You can think of covariate shift as ‘hiding’ part of the disparity that’s there within education levels.

We do the same thing we did in our last example, but now we’ve got many more levels of the covariate.
The covariate shift term is the difference of two averages of the same function of x.
- This function $\textcolor[RGB]{248,118,109}{f(x)=\hat\mu(0,x)}$ increases from left to right. Income increases with education.
- The distribution shift is from left to right. Women tend to have more years of education.
The multiplication heuristic tells us the shift term is positive: $\textcolor[RGB]{128,128,128}{+1 \times +1 = +1}$.

These are exactly the same histograms we had overlaid on the previous plot. They’re exactly the same bars.
What’s changed is layout.
- Our groups are shown one above the other or one on top of the other rather than interleaved as before.
- This can make it easier to see the direction of the shift—that it’s from left to right.
The red histogram has more mass on the left side than the green one does.

An Animated Visualization of Covariate Shift

Animated visualizations are great at showing shifts. Flip back and forward a slide here.
- You’ll see the shifts in the column means on the left.
- And the shift in the covariate distribution on the right.
The downside is that you can’t use them everywhere, e.g. on a board or in a book.
But you can try to see the animation in your head when you look at static ones.
- The one-above-the-other histogram plot on the last slide is pretty good for this.
- Scan up and down from one histogram to the other. You can sort of see the animation.

Covariate Shift attenuates the Disparity⁴

Reviewing What We Just Talked About
A Confusing Quirk of Language

Our plot shows that, at virtually all levels of education, male respondents out-earn female ones.
And the typical magnitude of this disparity is something like 15k.
- Sometimes it’s about 20k (e.g. at 11, 14, and 18 years of education)
- Sometimes it’s about 10k (e.g. at 16 years)
That’s summarized reasonably by the adjusted difference, -15k.
The raw difference, -12k, is bigger—it’s closer to zero.
The increase is due to two factors working together.
1. An Upward Shift. Female respondents tend to have more education than male ones.
2. An Upward Trend. Income tends to be higher at higher levels of education.
The relationship between the raw and adjusted difference depends on both.
- Heuristically, the dependence is multiplicative.

People tend to report the magnitude and sign of a disparity separately.
- e.g., they’ll say female Californians earn 12k less than male ones.
- Or, e.g., that they earn 15k less than similarly-educated male ones.
This means that a positive covariate shift term can make the disparity sound larger or smaller.
- if the adjusted difference is positive, a positive shift will make the magnitude bigger.
- if the adjusted difference is negative, a positive shift will make the magnitude smaller.
If you like, you can use a triple product heuristic to keep track of it all.

\[ \color{gray} \begin{aligned} &\text{impact of covariate shift on the magnitude of the disparity} \\ &= \qqtext{sign of the adjusted difference } \\ &\times \qqtext{direction of the trend $\textcolor[RGB]{248,118,109}{\hat\mu(0,x)}$} \\ &\times \qqtext{direction of the shift} \end{aligned} \]
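
If it helps, the triple product can be written as a tiny Python function. This is only a sign-level sketch: the function names are hypothetical, and it tells you the direction of the effect on the magnitude, not its size.

```python
def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def magnitude_impact(adjusted_difference, trend, shift):
    """Triple product heuristic (a sketch; only the signs of the inputs are used).

    +1 suggests the raw difference has a larger magnitude than the adjusted one,
    -1 suggests a smaller magnitude.
    """
    return sign(adjusted_difference) * sign(trend) * sign(shift)

# California example from these slides: the adjusted difference is negative,
# the trend is upward (+), and the shift is from left to right (+).
print(magnitude_impact(-15, +1, +1))  # -1: covariate shift attenuates the disparity
```

Plugging in a negative adjusted difference, an upward trend, and a right-to-left shift gives +1 instead, i.e., a disparity that sounds larger in the raw comparison.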

Another Income Example

Covariate Shift increases this other Disparity

Here we’re making the same comparison, but for Black and white respondents in Georgia.
The within-group means, drawn using the same dot colors, are qualitatively similar to the ones we’ve just looked at.
The trend is still increasing $\color[RGB]{64,64,64}{(+)}$. More education means more income.
- And Black respondents make less at most levels of education, esp. the common ones.
But the covariate shift is in the opposite direction. It’s now right to left $\color[RGB]{64,64,64}{(-)}$.
- Black respondents tend to have fewer years of education than white ones.
As a result, the raw difference is smaller than the adjusted one. $\textcolor[RGB]{128,128,128}{\text{+ trend} \times \text{- shift} \implies \text{- shift term}}$
- It decreases -10k adjusted → -15k raw. Signed disparity ↓, disparity magnitude ↑.
This is a bigger shift — and in the opposite direction — compared to what we saw in the previous example.
- That was -15k adjusted → -12k raw.
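
Since the raw difference is the adjusted difference plus the covariate shift term, we can back out the shift term in each example by subtraction. This uses only the rounded numbers on these slides, so treat the results as approximate.

\[ \color{gray} \begin{aligned} \text{California: shift term} &= \Delta_{\text{raw}} - \Delta_{\text{adjusted}} \approx -12\text{k} - (-15\text{k}) = +3\text{k} \\ \text{Georgia: shift term} &= \Delta_{\text{raw}} - \Delta_{\text{adjusted}} \approx -15\text{k} - (-10\text{k}) = -5\text{k} \end{aligned} \]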

Be Careful: How You Average Matters⁵

\[ \color{gray} \begin{aligned} \hat\Delta_{\text{years}} &\overset{\texttip{\small{\unicode{x2753}}}{The average over the 9 levels of education observed in both groups. No dots/bars needed.}}{=} \frac{1}{9}\sum_{x \in \text{years}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, x)} } \approx -100.0 \quad \text{for } \text{years} =\{ 8, 9, 10, 11, 12, 13, 14, 16, 18 \} \\ \\ \hat\Delta_0 &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among male respondents. Red dots and red bars.}}{=} \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i: W_i=0}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -9.8K \\ \hat\Delta_1 &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among female respondents. Green dots and green bars.}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -10.1K \\ \hat\Delta_{\text{all}} &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among all respondents. All dots and purple bars.}}{=} \frac{1}{n} \sum_{i=1}^n \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -9.9K \end{aligned} \]
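
To see why the choice of average matters, here is a minimal Python sketch with made-up numbers. The column means `mu0` and `mu1` and the education lists `x_group0` and `x_group1` are hypothetical, chosen so that the within-level difference changes with education. It averages the same within-level differences four ways: equally over levels, over group 0’s covariate distribution, over group 1’s, and over everyone’s.

```python
# Hypothetical column means (income in $K) by years of education; made-up numbers.
mu0 = {12: 40.0, 14: 50.0, 16: 70.0}   # plays the role of mu_hat(0, x)
mu1 = {12: 35.0, 14: 40.0, 16: 50.0}   # plays the role of mu_hat(1, x)

# Hypothetical education levels X_i for the respondents in each group.
x_group0 = [12, 12, 12, 14, 16]
x_group1 = [12, 14, 16, 16, 16]

def diff(x):
    """Within-level difference mu_hat(1, x) - mu_hat(0, x)."""
    return mu1[x] - mu0[x]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

levels = sorted(set(mu0) & set(mu1))             # levels observed in both groups
delta_levels = mean(diff(x) for x in levels)     # equal weight on each level
delta_0 = mean(diff(x) for x in x_group0)        # weights from group 0's distribution
delta_1 = mean(diff(x) for x in x_group1)        # weights from group 1's distribution
delta_all = mean(diff(x) for x in x_group0 + x_group1)  # weights from everyone

print(delta_levels, delta_0, delta_1, delta_all)
```

In this toy example the four averages are about -11.7, -9, -15, and -12. The within-level differences never change; only the weights do.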

Reading Comparisons

Here are a few things you might want to think about when you hear about a comparison between two groups.
1. What’s being reported.
- A raw difference or an adjusted one?
- What variable(s) are being adjusted for?
- What group is being averaged over?
2. How the things they could report might differ.
3. Why they’re reporting what they’re reporting.

What Would You Report?

California
Georgia

\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \\ \hat\Delta_{\text{adjusted}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \end{aligned} \]

\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -15k \\ \hat\Delta_{\text{adjusted}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -10k \end{aligned} \]
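
The two summaries above are written as averages over individuals rather than over columns. Here is a minimal Python sketch of those sample-level formulas, again on a made-up toy sample (the data and function names are assumptions for illustration, not the CPS data).

```python
# Hypothetical toy sample of (w, x, y) = (group, years of education, income in $K).
# These numbers are made up for illustration; they are not CPS data.
sample = [
    (0, 12, 40), (0, 12, 50), (0, 12, 45), (0, 16, 70), (0, 16, 80),
    (1, 12, 30), (1, 12, 34), (1, 16, 60), (1, 16, 64), (1, 16, 62),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def mu_hat(sample, w, x):
    """Mean income among respondents in group w with x years of education."""
    return mean(y for g, xi, y in sample if g == w and xi == x)

def raw_difference(sample):
    """Mean income in group 1 minus mean income in group 0."""
    return (mean(y for w, _, y in sample if w == 1)
            - mean(y for w, _, y in sample if w == 0))

def adjusted_difference(sample):
    """Average, over group 1's respondents, of Y_i - mu_hat(0, X_i)."""
    return mean(y - mu_hat(sample, 0, x) for w, x, y in sample if w == 1)

print(raw_difference(sample), adjusted_difference(sample))  # -7.0 -13.0
```

If some education level shows up in group 1 but not in group 0, `mu_hat(sample, 0, x)` has nothing to average and this sketch fails with a `ZeroDivisionError`. That’s a reminder that such levels need to be handled, and reported, explicitly.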

Exercise

Context. You’re writing an article about income disparities in the US.

Flip a coin.

Tails. It’s on the disparity between female and male residents of California.
Heads. It’s on the disparity between Black and white residents of Georgia.

You’re going to include one summary in your headline—either the raw or the adjusted difference. Decide which.
Write down, in a sentence or two, why you chose what you did.

Footnotes

1. We’ve shifted our focus from discrimination on the basis of sex to discrimination on the basis of gender. That would’ve been my preference for the discussion of income disparities too, but the CPS doesn’t currently ask about gender identity. You analyze the data you have, not the data you wish you had.
2. Don’t do this. Saying ‘controlling for …’ can be confusing because it has a different, but related, meaning when we’re talking about designing an experiment.
3. Remember we’re looking at rates for women - rates for men.
4. Attenuates is a word that means reduces but tends to refer to magnitude rather than a signed number.
5. The notation here is a bit of a lie. Our sample includes no Black respondents with 20 years of education, so we’re actually excluding everyone with 20 years of education from our calculations. This happens. I’m burying it in a footnote to keep the exposition manageable, but it’s important to be clear when you report your estimates. Note that the average over Black respondents is correct as written. Why?
