Application: Profit vs. Outcomes in Heart Attack Patients
If you want to see the data in a table, run this.
\[ \begin{aligned} W_i &= \begin{cases} 1 & \text{ with probability } \pi(x) \\ 0 & \text{ otherwise } \end{cases} \\ \qfor & \pi(x) = \Phi(x_1^2 + 2x_2^2 - 2x_3^2 - (x_4+1)^3 - .05\log(x_5+10) + x_6 - 1.5) \end{aligned} \]
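To make this concrete, here's a minimal Python sketch of the assignment mechanism above (the slides' own code is in R; this is just an illustration, with \(\Phi\) computed from `math.erf`):

```python
import math
import random

def Phi(v):
    # standard normal CDF, written in terms of the error function
    return 0.5 * (1 + math.erf(v / math.sqrt(2)))

def pi(x):
    # treatment probability pi(x) from the formula above; x is a list of 6 covariates
    return Phi(x[0]**2 + 2 * x[1]**2 - 2 * x[2]**2 - (x[3] + 1)**3
               - 0.05 * math.log(x[4] + 10) + x[5] - 1.5)

def draw_W(x, rng=random):
    # W_i = 1 with probability pi(x), 0 otherwise
    return 1 if rng.random() < pi(x) else 0
```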
The Mean Function.
\[ \begin{aligned} \mathop{\mathrm{E}}[Y_i \mid W_i,X_i] &= \mu(W_i,X_i) \qfor && \mu(w,x) = \mathop{\mathrm{logit^{-1}}}((x_1 + x_2 + x_5)^2) \\ \end{aligned} \tag{1}\]
Binary.
\[ \begin{aligned} Y_i &= \begin{cases} 1 & \text{ with probability } \mu(W_i,X_i) \\ 0 & \text{ otherwise } \end{cases} \end{aligned} \]
Continuous.
\[ \begin{aligned} Y_i &= \mu(W_i,X_i) + \varepsilon_i \qfor \varepsilon_i = U_i \times \begin{cases} 1-\mu(W_i,X_i) & \text{ with probability } \mu(W_i,X_i) \\ -\mu(W_i,X_i) & \text{ otherwise } \end{cases} \\ & \qfor U_i \sim \text{Uniform}(0,1) \end{aligned} \]
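Note that this noise is mean-zero and, by construction, keeps \(Y_i\) inside \([0,1]\): with probability \(\mu\) we add at most \(1-\mu\), and otherwise we subtract at most \(\mu\). A Python sketch of this outcome model (again just an illustration; the slides use R):

```python
import math
import random

def inv_logit(v):
    return 1 / (1 + math.exp(-v))

def mu(w, x):
    # mean function from Equation 1; note it doesn't actually involve w
    return inv_logit((x[0] + x[1] + x[4])**2)

def draw_Y_continuous(w, x, rng=random):
    m = mu(w, x)
    u = rng.random()                                  # U_i ~ Uniform(0,1)
    eps = u * (1 - m) if rng.random() < m else -u * m  # mean-zero by construction
    return m + eps                                     # always lands in [0, 1]
```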
viewof xx = Inputs.radio(
["x.1", "x.2", "x.3", "x.4", "x.5", "x.6"], {value:"x.1", label:"x"})
viewof yy = Inputs.radio(
new Map([["binary", "y"],
["continuous", "y.continuous"]]), {value:"y", label:"y"})
Learn to predict outcomes for all combinations of treatment and covariates. \[ \text{ Find } \quad \hat \mu(w,x) \qqtext{ with } \hat\mu(w,x) \approx Y_i \qqtext{ when } X_i=x, W_i=w \]
Compare predictions for different treatments to estimate the effect of treatment.
\[ \hat\tau(X_i) = \textcolor[RGB]{0,191,196}{\hat \mu(1,X_i)} - \textcolor[RGB]{248,118,109}{\hat \mu(0,X_i)} \]
\[ \begin{aligned} \overset{\textcolor[RGB]{128,128,128}{\hat\Delta_{\text{all}}}}{\text{ATE}} &\approx \textcolor[RGB]{160,32,240}{\frac{1}{n} \sum_{i=1}^n} \hat\tau(X_i) && \text{ the average over all people in the sample} \\ \underset{\textcolor[RGB]{128,128,128}{\hat\Delta_1}}{\text{ATT}} &\approx \textcolor[RGB]{0,191,196}{\frac{1}{N_1} \sum_{i:W_i=1}} \hat\tau(X_i) && \text{ the average over the treated people in the sample} \end{aligned} \]
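Once we have the predictions \(\hat\tau(X_i)\), both averages are one-liners. A small Python sketch (the function name `att_ate` is ours, not from the slides):

```python
def att_ate(tau_hat, W):
    # tau_hat[i] is the predicted effect for person i; W[i] in {0, 1}
    n = len(tau_hat)
    ate = sum(tau_hat) / n                             # average over everyone
    treated = [t for t, w in zip(tau_hat, W) if w == 1]
    att = sum(treated) / len(treated)                  # average over the treated
    return ate, att

att_ate([1, 2, 3, 4], [1, 0, 1, 0])  # → (2.5, 2.0)
```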
formula_options = new Map(
[["lines","y ~ w*(x.1 + x.2 + x.3 + x.4 + x.5 + x.6)"],
["parallel lines","y ~ w + x.1 + x.2 + x.3 + x.4 + x.5 + x.6"]])
viewof example_formula = Inputs.radio(formula_options,
{value: formula_options.get("lines"), label: "model"})
\[ \hat\mu(1,x) - \hat\mu(0,x) = a(1) - a(0) \qqtext{ doesn't depend on } x \]
And if we parameterize it the right way, \(a(1)-a(0)\) is just a coefficient. \[ \begin{aligned} \mathcal{M}&= \{ m(w,x) = a_0 + a_w w + bx \} && \text{ is the parallel lines model again } \\ \end{aligned} \]
And our prediction for the treatment effect \(\tau(X_i)\) is just a coefficient. \[ \begin{aligned} m(1,x) - m(0,x) = \{a_0 + a_w \times 1 + bx \} - \{a_0 + a_w \times 0 + bx \}= a_w \end{aligned} \]
This means that if we get R to tell us the coefficient \(a_w\), we’re done.
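The slides do this in R with `lm`. For illustration only, here's the same idea in Python via least squares on simulated data: we fit \(m(w,x) = a_0 + a_w w + b x\) and read \(\hat a_w\) off as the coefficient on the \(w\) column (the simulated coefficients here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
w = rng.integers(0, 2, size=n)

# simulate from the parallel lines model with a_w = 1.0
a0, a_w, b = 0.5, 1.0, 2.0
y = a0 + a_w * w + b * x + rng.normal(scale=0.1, size=n)

# design matrix [1, w, x]; the coefficient on the w column estimates a_w
X = np.column_stack([np.ones(n), w, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_w_hat = coef[1]
```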
The little squares you see are the predictions \(\hat\mu(W_i,X_i)\).
Some of them are outside the interval \([0,1]\). In particular, a couple are bigger than \(1\).
This makes people uncomfortable. To fix this, they often use what are called generalized linear models. \[ \begin{aligned} \mathcal{M}&= \{ m(w,x) = \mathop{\mathrm{logit^{-1}}}(a(w) + bx) \} && \text{ logistic regression version of the lines model} \\ \mathcal{M}&=\{ m(w,x) = \mathop{\mathrm{logit^{-1}}}(a(w) + b(x)) \} && \text{ logistic regression version of the additive model} \end{aligned} \]
Here \(\textcolor[RGB]{0,0,255}{\mathop{\mathrm{logit^{-1}}}}\) is a function that maps the real line into the interval \([0,1]\): \(\mathop{\mathrm{logit^{-1}}}(x) = 1/(1+\exp(-x))\).
\[ \mathop{\mathrm{logit^{-1}}}( v ) = \frac{1}{1+e^{-v}} \in [0,1] \qqtext{ for any } v \]
\[ \hat\mu(w,x) = \mathop{\mathrm{logit^{-1}}}(\hat a_0 + \hat a_w w + \hat b x) \qfor \mathop{\mathrm{logit^{-1}}}(v) = \frac{1}{1+\exp(-v)} \]
\[ e^{\hat a_{w}} = \frac{\frac{\hat\mu(1,x)}{1-\hat\mu(1,x)}}{ \frac{\hat\mu(0,x)}{1-\hat\mu(0,x)}} \qfor \hat\mu(w,x) = \mathop{\mathrm{logit^{-1}}}(\hat a_0 + \hat a_w w + \hat b x) \]
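You can check this identity numerically: since \(\mathop{\mathrm{logit^{-1}}}(v)/(1-\mathop{\mathrm{logit^{-1}}}(v)) = e^v\), the odds ratio is \(e^{a_w}\) no matter what \(x\) is. A quick Python sketch (coefficient values are arbitrary):

```python
import math

def inv_logit(v):
    return 1 / (1 + math.exp(-v))

def odds_ratio(a0, a_w, b, x):
    # odds of Y=1 under treatment divided by odds under control, at covariate x
    mu1 = inv_logit(a0 + a_w + b * x)
    mu0 = inv_logit(a0 + b * x)
    return (mu1 / (1 - mu1)) / (mu0 / (1 - mu0))

# the same value, exp(a_w), for every x
odds_ratio(0.2, 0.7, 1.5, 3.0)
odds_ratio(0.2, 0.7, 1.5, -2.0)
```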
viewof x = Inputs.radio(
["x.1", "x.2", "x.3", "x.4", "x.5", "x.6"], {value:"x.1", label:"x"})
viewof trim_level = Inputs.radio(
[1, 2, 5, 10, 20, 100, Infinity], {value: 100, label:"trim γ at"})
viewof y = Inputs.radio(
new Map([["binary", "y"],
["continuous", "y.continuous"]]), {value:"y", label:"y"})
Scatter Plot
These are for our ATT estimators with and without inverse probability weighting.
We could do without the sampling assumption if we were willing to settle for internal validity: getting the effect right for the patients in our sample. We’d do our statistics thinking about the sampling distribution of our estimator when all that’s random is treatment assignment. Without something like the randomization assumption, estimating this effect would require a very different analysis than what we see in these papers, if it were possible at all.
Almost. You’ll see a difference or two on the next slide.
To generate these, we find a sort of ‘square root’ of \(\Sigma\): a matrix \(L\) such that \(\Sigma = L L^T\). Then we take a vector \(Z\) of independent normal random variables with mean 0 and variance 1 and compute \(X = L Z\). This is normal—linear combinations of normals are normal—and you can check that \(\mathop{\mathrm{E}}[X_{i}X_{j}] = \Sigma_{ij}\).
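In Python this is a few lines with NumPy's Cholesky factorization (a sketch with an arbitrary 2×2 covariance matrix; the empirical check uses a large sample so the averages are close to \(\Sigma\)):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
L = np.linalg.cholesky(Sigma)       # the 'square root': Sigma = L @ L.T

Z = rng.normal(size=(2, 100_000))   # independent N(0, 1) entries
X = L @ Z                           # each column is a draw from N(0, Sigma)

emp = X @ X.T / Z.shape[1]          # empirical E[X X^T], close to Sigma
```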
That’s if you haven’t clicked anything. To change the model, click the button.
If you want to change that, go back and change it there. This’ll update automatically.
See the column ‘Adjusted OR’ in the ‘PCI’ row in Table 5.