Multivariate Analysis and Adjustment
Analyzing our caricature of the Berkeley data
The rate for women is 10 percentage points lower than for men. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1)} \ - \ \textcolor[RGB]{248,118,109}{\hat\mu(0)} &\approx \textcolor[RGB]{0,191,196}{0.5} \ - \ \textcolor[RGB]{248,118,109}{0.6} \\ &\approx -0.1 \end{aligned} \]
Within department \(x=0\), the rate for women is 25 percentage points higher than for men. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1,0)} \ - \ \textcolor[RGB]{248,118,109}{\hat\mu(0, 0)} &\approx \textcolor[RGB]{0,191,196}{1} \ - \ \textcolor[RGB]{248,118,109}{0.75} \\ &\approx 0.25 \end{aligned} \]
\[ \color{gray} \begin{aligned} \text{summary} &=\frac{1}{\# \text{departments}} \sum_{x \in \text{departments}}\qty{ \textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},x)} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, x)} } \\ &=\frac{1}{2}\qty[ {\{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})}\} + \{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})}\}} ] \\ &= \frac{1}{2}\qty[ \{ \textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1} \} + \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} ] \\ &= \frac{1}{2}\qty[ \{ 0 \} + \{ 0 \} ] \\ &= 0 \end{aligned} \]
\[ \color{gray} \begin{aligned} \text{summary} &=\frac{1}{\# \text{departments}} \sum_{x \in \text{departments}}\qty{ \textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},x)} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, x)} } \\ &=\frac{1}{2}\qty[ {\{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})}\} + \{\textcolor[RGB]{0,191,196}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat\mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})}\}} ] \\ &= \frac{1}{2}\qty[ \qty{\textcolor[RGB]{0,191,196}{\frac{1}{1}} - \textcolor[RGB]{248,118,109}{\frac{3}{4}}} \ + \ \qty{\textcolor[RGB]{0,191,196}{\frac{1}{3}}-\textcolor[RGB]{248,118,109}{\frac{0}{1}} } ] \\ &\approx \frac{1}{2}\qty[ \qty{0.25} + \qty{0.33} ] \\ &\approx 0.29 \end{aligned} \]
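The rates on the last few slides can be reproduced from a small reconstruction of the caricature data. This Python sketch takes only the per-cell counts from the fractions on the slides (women: 1/1 accepted in department 0 and 1/3 in department 1; men: 3/4 and 0/1); the row order and the helper names `mean` and `mu_hat` are our own assumptions. It computes the raw difference, the difference within department 0, and the unweighted average of the two within-department differences.

```python
from fractions import Fraction

# Toy reconstruction of the caricature: one (w, x, y) triple per applicant.
# w = 1 for women, 0 for men; x = department; y = 1 if accepted.
# Counts match the fractions on the slides; the row order is hypothetical.
data = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),             # women
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # men
]

def mean(values):
    """Exact mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def mu_hat(w, x=None):
    """Acceptance rate in group w, optionally restricted to department x."""
    return mean(y for wi, xi, y in data if wi == w and (x is None or xi == x))

raw = mu_hat(1) - mu_hat(0)                  # 1/2 - 3/5
within_dept0 = mu_hat(1, 0) - mu_hat(0, 0)   # 1 - 3/4
departments = sorted({x for _, x, _ in data})
unweighted = mean(mu_hat(1, x) - mu_hat(0, x) for x in departments)

print(float(raw), float(within_dept0), round(float(unweighted), 2))
```

Exact fractions make the slide arithmetic easy to check: the unweighted average is 7/24, which rounds to 0.29.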
\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1, X_i) - \textcolor[RGB]{248,118,109}{\hat \mu(0,X_i)} } } \qfor N_w = \sum_{i: W_i=w} 1 \\ &= \frac{1}{3} \qty[ \underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},\overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} \} } ] \\ &= \frac{1}{3} \qty[\underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1} \} } ] \\ &= \frac{2}{3}\qty{\textcolor[RGB]{0,191,196}{0} - \textcolor[RGB]{248,118,109}{0}} + \frac{1}{3}\qty{\textcolor[RGB]{0,191,196}{1} - \textcolor[RGB]{248,118,109}{1}} \\ &= 0 \end{aligned} \]
\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1, X_i) - \textcolor[RGB]{248,118,109}{\hat \mu(0,X_i)} } } \qfor N_w = \sum_{i: W_i=w} 1 \\ &= \frac{1}{4} \qty[ \underset{\textcolor[RGB]{192,192,192}{i=1}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1},\overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=2}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=4}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{0})} \} } + \underset{\textcolor[RGB]{192,192,192}{i=7}}{ \{ \textcolor[RGB]{0,191,196}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{1}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} - \textcolor[RGB]{248,118,109}{\hat \mu(\overset{\textcolor[RGB]{192,192,192}{w}}{0}, \overset{\textcolor[RGB]{192,192,192}{x}}{1})} \} } ] \\ &= \frac{3}{4}\qty{ \textcolor[RGB]{0,191,196}{\frac{1}{3}} - \textcolor[RGB]{248,118,109}{\frac{0}{1}}} + \frac{1}{4}\qty{ \textcolor[RGB]{0,191,196}{\frac{1}{1}} - \textcolor[RGB]{248,118,109}{\frac{3}{4}}} \\ &\approx \frac{3}{4}\qty{0.33} + \frac{1}{4}\qty{0.25} \\ &\approx 0.31 \end{aligned} \]
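The weighted version replaces the average over departments with an average over the women, one term per woman. Here is a Python sketch on the same hypothetical reconstruction of the caricature data. The counts come from the slides, but the row order and the helper names `mean` and `mu_hat` are assumptions.

```python
from fractions import Fraction

# Hypothetical reconstruction of the caricature: (w, x, y) per applicant.
data = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),             # women (w = 1)
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # men (w = 0)
]

def mean(values):
    """Exact mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def mu_hat(w, x):
    """Acceptance rate among applicants with W = w in department x."""
    return mean(y for wi, xi, y in data if wi == w and xi == x)

# Average the within-department difference over the women, one term per woman.
women_x = [x for w, x, _ in data if w == 1]
summary = mean(mu_hat(1, x) - mu_hat(0, x) for x in women_x)
print(summary, float(summary))
```

The result is 5/16 = 0.3125, which rounds to the slide's 0.31. Department 1 gets weight 3/4 because three of the four women applied there.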
\[ \color{gray} \begin{aligned} \text{summary} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat \mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } = \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \end{aligned} \]
\[ \color{gray} \begin{aligned} \sum_{i: W_i=w} Y_i &\overset{\texttip{\small{\unicode{x2753}}}{We sum first over dots in a column, then over columns.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} Y_i} \\ &\overset{\texttip{\small{\unicode{x2753}}}{The column sum is the column's mean times its number of dots.}}{=} \class{fragment}{\sum_{x} \hat \mu(w, x) \times N_{w,x} \text{ for } N_{w,x} = \sum_{\substack{i: W_i=w, \ X_i=x}} 1} \\ &\overset{\texttip{\small{\unicode{x2753}}}{Or equivalently, its mean summed over the dots.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} \hat \mu(w,x)} \\ &\overset{\mathtip{\small{\unicode{x2753}}}{\hat\mu(w,x)=\hat\mu(w,X_i) \text{ within a column}.}}{=} \class{fragment}{\sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} \hat \mu(w, X_i)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{We 'unorder' that sum to get back to the form we started with.}}{=} \class{fragment}{\sum_{i: W_i=w} \hat \mu(w, X_i)} \end{aligned} \]
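The snippet below checks this identity numerically, along with the two forms of the summary it connects. It uses the same hypothetical reconstruction of the caricature data, with counts from the slides and an assumed row order.

```python
from fractions import Fraction

# Hypothetical reconstruction of the caricature: (w, x, y) per applicant.
data = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),             # women (w = 1)
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # men (w = 0)
]

def mu_hat(w, x):
    """Column mean: acceptance rate among applicants with W = w and X = x."""
    ys = [y for wi, xi, y in data if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

# Within each group, summing Y_i gives the same total as summing mu_hat(w, X_i).
for w in (0, 1):
    total_y = sum(y for wi, _, y in data if wi == w)
    total_fitted = sum(mu_hat(w, x) for wi, x, _ in data if wi == w)
    assert total_y == total_fitted

# So the two forms of the summary agree: mu_hat(1, X_i) can be swapped for Y_i.
n1 = sum(1 for w, _, _ in data if w == 1)
form_fitted = sum(mu_hat(1, x) - mu_hat(0, x) for w, x, _ in data if w == 1) / n1
form_observed = sum(y - mu_hat(0, x) for w, x, y in data if w == 1) / n1
print(form_fitted, form_observed)
```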
To prove the weighted-average identity, we can trace the argument from our last slide backward, stopping in the middle.
\[ \color[RGB]{64,64,64} \begin{aligned} \sum_{i: W_i=w} f(X_i) &\overset{\texttip{\small{\unicode{x2753}}}{We sum first over dots in a column, then over columns.}}{=} \sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} f(X_i) \\ &\overset{\mathtip{\small{\unicode{x2753}}}{f(x)=f(X_i) \text{ within a column}.}}{=} \sum_{x} \sum_{\substack{i: W_i=w, \ X_i=x}} f(x) \\ &\overset{\texttip{\small{\unicode{x2753}}}{The column sum is the column's mean times its number of dots.}}{=} \sum_{x}f(x) \times N_{w,x} \qfor N_{w,x} = \sum_{\substack{i: W_i=w, \ X_i=x}} 1 \end{aligned} \] Looking at this sum for \(w=1\) and dividing by \(N_1\), we get the weighted-average identity \(\frac{1}{N_1}\sum_{i: W_i=1} f(X_i) = \sum_x P_{x\mid 1} f(x)\) for \(P_{x\mid 1} = N_{1,x}/N_1\).
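Here is a numerical check of the weighted-average identity in Python. The women's departments match the counts in the caricature (one in department 0, three in department 1). The function `f` is an arbitrary choice of ours, because the identity holds for any function of the covariate.

```python
from collections import Counter
from fractions import Fraction

# Departments of the women in the caricature (order hypothetical; counts from the slides).
women_x = [0, 1, 1, 1]

def f(x):
    """Any function of the covariate works; this one is arbitrary."""
    return Fraction(10 * x + 3)

# Left side: average f(X_i) over the women, one term per woman.
lhs = sum(f(x) for x in women_x) / len(women_x)

# Right side, 'histogram form': weight each column value f(x) by P_{x|1} = N_{1,x} / N_1.
counts = Counter(women_x)   # N_{1,x}
n1 = len(women_x)           # N_1
p = {x: Fraction(c, n1) for x, c in counts.items()}
rhs = sum(p[x] * f(x) for x in p)

print(lhs, rhs, p)
```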
\[ \color{gray} \begin{aligned} \hat\Delta_a &\overset{\texttip{\small{\unicode{x2753}}}{All men who applied to the Graduate School. All red dots. Red bars.}}{=} \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i: W_i=0}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \\ \hat\Delta_b &\overset{\texttip{\small{\unicode{x2753}}}{All women who applied to the Graduate School. All green dots. Green bars.}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \\ \hat\Delta_c &\overset{\texttip{\small{\unicode{x2753}}}{All applicants to the Graduate School outright. All dots. Purple bars.}}{=} \frac{1}{n} \sum_{i=1}^n \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \end{aligned} \]
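The three summaries differ only in whose covariate distribution they average over. This Python sketch computes all three on the hypothetical reconstruction of the caricature data (counts from the slides, row order assumed). The helper name `delta` is our own.

```python
from fractions import Fraction

# Hypothetical reconstruction of the caricature: (w, x, y) per applicant.
data = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),             # women (w = 1)
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # men (w = 0)
]

def mean(values):
    """Exact mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def mu_hat(w, x):
    """Acceptance rate among applicants with W = w in department x."""
    return mean(y for wi, xi, y in data if wi == w and xi == x)

def delta(groups):
    """Average of mu_hat(1, X_i) - mu_hat(0, X_i) over applicants with W_i in `groups`."""
    return mean(mu_hat(1, x) - mu_hat(0, x) for w, x, _ in data if w in groups)

delta_a = delta({0})     # over the men
delta_b = delta({1})     # over the women
delta_c = delta({0, 1})  # over everyone
print(float(delta_a), float(delta_b), float(delta_c))
```

All three average the same two departmental differences, 1/4 and 1/3. The men mostly applied to department 0, so \(\hat\Delta_a\) leans toward 1/4. The women mostly applied to department 1, so \(\hat\Delta_b\) leans toward 1/3.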
When we write the Graduate School's overall difference in acceptance rates in ‘histogram form’, we can see clearly how it differs from our summary of within-department differences.
\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &=\text{difference in women's and men's acceptance rates at the Graduate School} \\ &\overset{\texttip{\small{\unicode{x2753}}}{summing over people}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} \hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} \hat\mu(0,X_i)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{rewriting in histogram form}}{=} \sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1} \ \hat\mu(1,x)} - \sum_{x} \textcolor[RGB]{248,118,109}{P_{x\mid 0} \ \hat\mu(0,x)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{adding a fancy version of zero}}{=} \sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1} \ \hat\mu(1,x)} + \sum_{x} (\textcolor[RGB]{0,191,196}{P_{x\mid 1}} - \textcolor[RGB]{0,191,196}{P_{x\mid 1}} - \textcolor[RGB]{248,118,109}{P_{x\mid 0}}) \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} \\ &\overset{\texttip{\small{\unicode{x2753}}}{moving terms around}}{=} \underset{\text{within-department summary}}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term}}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]
It differs from our average of within-department differences by a ‘covariate shift term’.
\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \underset{\text{within-department summary}}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term}}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]
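This Python sketch runs the decomposition on the hypothetical reconstruction of the caricature data (counts from the slides, row order assumed). It writes each group's rate in histogram form with \(P_{x\mid w} = N_{w,x}/N_w\). It then checks that the raw difference equals the within-department summary plus the covariate shift term. The helper names are our own.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical reconstruction of the caricature: (w, x, y) per applicant.
data = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),             # women (w = 1)
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # men (w = 0)
]

def mu_hat(w, x):
    """Acceptance rate among applicants with W = w in department x."""
    ys = [y for wi, xi, y in data if wi == w and xi == x]
    return Fraction(sum(ys), len(ys))

def p_given(w):
    """Histogram of departments within group w: P_{x|w} = N_{w,x} / N_w."""
    xs = [x for wi, x, _ in data if wi == w]
    return {x: Fraction(c, len(xs)) for x, c in Counter(xs).items()}

p1, p0 = p_given(1), p_given(0)
raw = (sum(p1[x] * mu_hat(1, x) for x in p1)
       - sum(p0[x] * mu_hat(0, x) for x in p0))
within = sum(p1[x] * (mu_hat(1, x) - mu_hat(0, x)) for x in p1)
shift = (sum(p1[x] * mu_hat(0, x) for x in p1)
         - sum(p0[x] * mu_hat(0, x) for x in p0))
assert raw == within + shift
print(raw, within, shift)
```

The within-department summary, 5/16 ≈ 0.31, is positive. The covariate shift term, -33/80 ≈ -0.41, is larger in magnitude. Their sum is the raw difference, -1/10.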
We’ll continue from here.
The Raw Difference. Female respondents earn $12k less, on average, than male respondents. \[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} \ - \ \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \]
The Adjusted Difference. And they earn 15k less, on average, than similarly educated male respondents. \[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} \ - \ \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \]
The Raw Ratio. Female respondents earn 78 cents for every dollar a male respondent does. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\hat\mu(1)} \ / \ \textcolor[RGB]{248,118,109}{\hat\mu(0)} &\approx \textcolor[RGB]{0,191,196}{43k} \ / \ \textcolor[RGB]{248,118,109}{55k} \\ &\approx {0.78} \end{aligned} \]
The Adjusted Ratio. And 74 cents for every dollar a similarly educated male respondent does. \[ \color{gray} \begin{aligned} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}Y_i } \ / \ \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}}\textcolor[RGB]{248,118,109}{\hat\mu(0, X_i)} &\approx \textcolor[RGB]{0,191,196}{43k} \ / \ \textcolor[RGB]{248,118,109}{58k} \\ &\approx {0.74} \end{aligned} \]
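Both adjusted summaries are built from the same two averages over female respondents: the observed mean of \(Y_i\) and the mean of \(\hat\mu(0, X_i)\). The adjusted difference subtracts them and the adjusted ratio divides them. Here is that arithmetic in Python, using the rounded values on the slides (in thousands of dollars), so the results are only as precise as those roundings.

```python
# Rounded averages from the slides, in thousands of dollars.
mean_y_women = 43           # (1/N_1) * sum of Y_i over female respondents
mean_y_men = 55             # (1/N_0) * sum of Y_i over male respondents
mean_fit_men_at_women = 58  # (1/N_1) * sum of mu_hat(0, X_i) over female respondents

raw_difference = mean_y_women - mean_y_men                  # about -12k
adjusted_difference = mean_y_women - mean_fit_men_at_women  # about -15k
raw_ratio = mean_y_women / mean_y_men                       # about 0.78
adjusted_ratio = mean_y_women / mean_fit_men_at_women       # about 0.74

print(raw_difference, adjusted_difference, round(raw_ratio, 2), round(adjusted_ratio, 2))
```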
\[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} \ - \ \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \]
is what we get when we …
\[ \color{gray} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} \ - \ \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \]
is what we get when we …
\[ \color{gray} \begin{aligned} \underset{\text{raw difference} \approx -11.8K}{\hat\Delta_{\text{raw}}} &= \underset{\text{adjusted difference} \approx -15.1K}{\sum_{x} \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \qty{\textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} + \underset{\text{covariate shift term} \approx 3.3K}{\qty{\sum_x \textcolor[RGB]{0,191,196}{P_{x\mid 1}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)} - \sum_x \textcolor[RGB]{248,118,109}{P_{x\mid 0}} \ \textcolor[RGB]{248,118,109}{\hat\mu(0,x)}}} \end{aligned} \]
\[ \color{gray} \begin{aligned} &\text{impact of covariate shift on the magnitude of the disparity} \\ &= \qqtext{sign of the adjusted difference } \\ &\times \qqtext{direction of the trend $\textcolor[RGB]{248,118,109}{\hat\mu(0,x)}$} \\ &\times \qqtext{direction of the shift} \end{aligned} \]
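With a binary covariate, this rule is exact. Because \(P_{0\mid w} = 1 - P_{1\mid w}\), the covariate shift term factors as (shift toward department 1) × (change in the men's rate from department 0 to department 1). The Python sketch below checks this on the caricature's numbers, with group sizes read off the fractions on the slides. It relies on an assumed reading of the slide's terms: 'direction of the shift' as \(P_{1\mid 1} - P_{1\mid 0}\) and 'direction of the trend' as \(\hat\mu(0,1) - \hat\mu(0,0)\). With more covariate levels, reading off a single 'direction' for either takes more care.

```python
from fractions import Fraction

# Caricature quantities; the covariate is binary (departments 0 and 1).
mu0 = {0: Fraction(3, 4), 1: Fraction(0)}    # mu_hat(0, x): men's acceptance rates
p1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}  # P_{x|1}: women's departments
p0 = {0: Fraction(4, 5), 1: Fraction(1, 5)}  # P_{x|0}: men's departments
adjusted = Fraction(5, 16)                   # within-department summary

def sign(v):
    """Return +1, 0, or -1."""
    return (v > 0) - (v < 0)

shift_term = sum((p1[x] - p0[x]) * mu0[x] for x in (0, 1))
trend = mu0[1] - mu0[0]            # change in men's rate from department 0 to 1
shift = p1[1] - p0[1]              # extra share of group 1 in department 1
assert shift_term == shift * trend  # exact when the covariate takes two values

# -1 means the shift term opposes the adjusted difference, moving the raw
# difference toward zero (or past it); +1 means it pushes it further from zero.
impact = sign(adjusted) * sign(trend) * sign(shift)
print(shift_term, impact)
```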
\[ \color{gray} \begin{aligned} \hat\Delta_{\text{years}} &\overset{\texttip{\small{\unicode{x2753}}}{The average over the 9 levels of education observed in both groups. No dots/bars needed.}}{=} \frac{1}{9}\sum_{x \in \text{years}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,x)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, x)} } \approx -100.0 \quad \text{for } \text{years} =\{ 8, 9, 10, 11, 12, 13, 14, 16, 18 \} \\ \\ \hat\Delta_0 &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among male respondents. Red dots and red bars.}}{=} \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i: W_i=0}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -9.8K \\ \hat\Delta_1 &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among female respondents. Green dots and green bars.}}{=} \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i: W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -10.1K \\ \hat\Delta_{\text{all}} &\overset{\texttip{\small{\unicode{x2753}}}{The average over the distribution of education among all respondents. All dots and purple bars.}}{=} \frac{1}{n} \sum_{i=1}^n \qty{ \textcolor[RGB]{0,191,196}{\hat\mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -9.9K \end{aligned} \]
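This Python sketch computes all four summaries for generic \((w, x, y)\) data. Following the footnote on these summaries, it restricts everything to covariate levels observed in both groups, because \(\hat\mu(1,x) - \hat\mu(0,x)\) is undefined elsewhere. The function name `summaries` and the example rows are hypothetical. The rows reuse the caricature's counts and add one made-up group-0 applicant at a level (department 2) that nobody in group 1 has, so we can see who gets dropped.

```python
from fractions import Fraction

def summaries(rows):
    """Averages of mu_hat(1, x) - mu_hat(0, x) for (w, x, y) rows, restricted to
    covariate levels observed in both groups; people at other levels are dropped."""
    cells = {}
    for w, x, y in rows:
        cells.setdefault((w, x), []).append(y)
    mu = {key: Fraction(sum(ys), len(ys)) for key, ys in cells.items()}
    common = {x for w, x in mu if w == 0} & {x for w, x in mu if w == 1}

    def diff(x):
        return mu[(1, x)] - mu[(0, x)]

    def average_over(groups):
        xs = [x for w, x, _ in rows if w in groups and x in common]
        return sum(diff(x) for x in xs) / len(xs)

    return {
        "levels": sum(diff(x) for x in common) / len(common),  # unweighted over levels
        "group0": average_over({0}),
        "group1": average_over({1}),
        "all": average_over({0, 1}),
    }

# Caricature counts plus one hypothetical group-0 applicant at level 2,
# a level nobody in group 1 has.
rows = [
    (1, 0, 1), (1, 1, 1), (1, 1, 0), (1, 1, 0),
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),
    (0, 2, 1),
]
result = summaries(rows)
print({name: round(float(value), 3) for name, value in result.items()})
```

The made-up applicant at level 2 drops out of the group-0 and all-applicant averages. In this example, the group-1 average is unaffected.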
\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -12k \\ \hat\Delta_{\text{adjusted}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -15k \end{aligned} \]
\[ \color{gray} \begin{aligned} \hat\Delta_{\text{raw}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1} Y_i} - \textcolor[RGB]{248,118,109}{\frac{1}{N_0}\sum_{i:W_i=0} Y_i} \approx -15k \\ \hat\Delta_{\text{adjusted}} &= \textcolor[RGB]{0,191,196}{\frac{1}{N_1}\sum_{i:W_i=1}} \qty{ \textcolor[RGB]{0,191,196}{Y_i} - \textcolor[RGB]{248,118,109}{\hat \mu(0, X_i)} } \approx -10k \end{aligned} \]
Context. You’re writing an article about income disparities in the US.
We’ve shifted our focus from discrimination on the basis of sex to discrimination on the basis of gender. That would’ve been my preference for the discussion of income disparities too, but the CPS doesn’t currently ask about gender identity. You analyze the data you have, not the data you wish you had.
Don’t do this. Saying ‘controlling for …’ can be confusing because it has a different, but related, meaning when we’re talking about designing an experiment.
Remember that we’re looking at the rate for women minus the rate for men.
‘Attenuates’ means ‘reduces’, but it tends to refer to a magnitude rather than a signed number.
The notation here is a bit of a lie. Our sample includes no Black respondents with 20 years of education, so we’re actually excluding everyone with 20 years of education from our calculations. This happens. I’m burying it in a footnote to keep the exposition manageable, but it’s important to be clear when you report your estimates. Note that the average over Black respondents is correct as written. Why?