Inference for Complicated Summaries
We observe a sample drawn from a population: the population is a list of pairs \((x_1,y_1) \ldots (x_m,y_m)\) and our sample is \((X_1,Y_1) \ldots (X_n,Y_n)\).
\[ \small{ \begin{array}{cccc} x_1 & x_2 & \ldots & x_m \\ y_1 & y_2 & \ldots & y_m \end{array} } \]
\[ \small{ \begin{array}{cccc} X_1 & X_2 & \ldots & X_n \\ Y_1 & Y_2 & \ldots & Y_n \end{array} } \]
We've been estimating summaries built from the column means \(\hat\mu(x)\). For example, a difference of two columns, \[ \hat\theta = \hat\mu(16) - \hat\mu(12), \]
a comparison of one column with an average over several, \[ \begin{aligned} \hat\theta &= \hat\mu(16) - \hat\mu(\le 12) && \qfor \hat\mu(\le 12) = \frac{\sum_{i:X_i \le 12} \hat\mu(X_i)}{\sum_{i:X_i \le 12} 1}, \end{aligned} \]
or an average of year-over-year differences, which we can compute two ways. \[ \begin{aligned} \hat\theta_A &= \frac{\sum\limits_{x=9}^{12} \qty{ \hat\mu(x) - \hat\mu(x-1) }}{4} && \qqtext{ the average over years} \\ \hat\theta_B &= \frac{\sum\limits_{i:X_i \in 9 \ldots 12} \qty{ \hat\mu(X_i) - \hat\mu(X_i-1) }}{\sum\limits_{i:X_i \in 9 \ldots 12} 1} && \qqtext{ the average over people} \end{aligned} \]
One of these two estimators can be ‘unrolled’ into a simple comparison of two column means. \[ \textcolor[RGB]{17,138,178}{\hat \theta = \frac{\hat \mu(12) - \hat \mu(8)}{4}} \] Which is it?
What is the variance of this estimator \(\textcolor[RGB]{17,138,178}{\hat\theta}\)?
It’s the average over years that unrolls like this. \[ \begin{aligned} \frac{\sum_{x=9}^{12} \qty{ \hat\mu(x) - \hat\mu(x-1) }}{4} &= \frac{ \qty{\hat\mu(12) - \hat\mu(11)} + \qty{ \hat\mu(11) - \hat\mu(10) } + \qty{ \hat\mu(10) - \hat\mu(9) } + \qty{ \hat\mu(9) - \hat\mu(8) } }{4} \\ &= \frac{\hat\mu(12) - \hat\mu(8)}{4} \end{aligned} \]
It’s \(1/4^2 = 1/16\) times the variance of the difference of these two column means.
For any random variable \(X\) and constant \(a\) … \[ \mathop{\mathrm{V}}[aX] \overset{\text{definition}}{=} \mathop{\mathrm{E}}[(aX-\mathop{\mathrm{E}}[aX])^2] \overset{\text{linearity of expectation + arithmetic}}{=} a^2\mathop{\mathrm{E}}[(X-\mathop{\mathrm{E}}[X])^2] \overset{\text{definition}}{=} a^2\mathop{\mathrm{V}}[X]. \]
In the special case \(a=1/4\) and \(X=\hat\mu(12) - \hat\mu(8)\) … \[ \begin{aligned} \mathop{\mathrm{V}}\qty[\frac{\hat\mu(12) - \hat\mu(8)}{4}] &= \frac{1}{4^2}\mathop{\mathrm{V}}[\hat\mu(12) - \hat\mu(8)] && \text{ is the answer I was looking for } \\ &= \frac{1}{16} \times \mathop{\mathrm{E}}\qty[ \frac{\sigma^2(12)}{N_{12}} + \frac{\sigma^2(8)}{N_8} ] && \text{ using a formula from our lecture on comparing two groups.} \end{aligned} \]
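As a quick numeric check of the scaling rule, here's a minimal sketch with simulated normal draws (the distribution is arbitrary):

```python
import numpy as np

# Numeric check of V[aX] = a^2 V[X] with a = 1/4: simulate X with V[X] = 9
# and compare the sample variance of X/4 to 9/16.
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=3.0, size=200_000)
print(np.var(X / 4))  # close to 9 / 16 = 0.5625
```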
We know this estimator is unbiased, too; we can see that by thinking about the difference in means. \[ \begin{aligned} \mathop{\mathrm{E}}\qty[\frac{\hat\mu(12) - \hat\mu(8)}{4}] &\overset{\text{via linearity}}{=} \frac{\mathop{\mathrm{E}}[\hat\mu(12)] - \mathop{\mathrm{E}}[\hat\mu(8)]}{4} \\ &\overset{\text{via unbiasedness of subsample means}}{=} \frac{\mu(12) - \mu(8)}{4}\\ &\overset{\text{'rolling it back up'}}{=} \frac{1}{4}\sum_{x=9}^{12}\qty{\mu(x) - \mu(x-1)}. \end{aligned} \]
To turn a point estimator into a confidence interval, we need to know two things about its sampling distribution: where it's centered and how spread out it is.
\[ \begin{aligned} \mathop{\mathrm{\mathop{\mathrm{V}}}}[\hat\theta] &= \frac{1}{16} \times \mathop{\mathrm{E}}\qty[ \frac{\sigma^2(12)}{N_{12}} + \frac{\sigma^2(8)} {N_8} ] \\ &\approx \frac{1}{16} \times \qty[\frac{\hat\sigma^2(12)}{N_{12}} + \frac{\hat\sigma^2(8)}{N_8} ] \end{aligned} \]
| \(x\) | \(N_x\) | \(\hat \mu(x)\) | \(\hat \sigma(x)\) |
|---|---|---|---|
| 8 | 14 | 19.3K | 22.2K |
| 12 | 605 | 28.8K | 28.3K |
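With the table's numbers, the plug-in calculation is quick to check; a minimal sketch in Python, with values rounded as in the table:

```python
import numpy as np

# Plug-in variance for thetahat = (muhat(12) - muhat(8)) / 4, using the table above.
N8, N12 = 14, 605
sig8, sig12 = 22.2e3, 28.3e3  # subsample standard deviations
var_hat = (sig12**2 / N12 + sig8**2 / N8) / 16
print(var_hat)           # roughly 2.28e6
print(np.sqrt(var_hat))  # a standard error near 1.5K, dominated by the small x=8 column
```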
\[ \frac{1}{4}\sum_{x\in 9 \ldots 12} \qty{ \hat \mu(x) - \hat \mu(x-1)} = \frac{\hat\mu(12) -\hat\mu(8)}{4} = \textcolor[RGB]{17,138,178}{\sum_{x}\hat\alpha(x)\hat\mu(x)} \qfor \hat\alpha(x) = \begin{cases} \hphantom{-}\frac14 & \text{ if } x = 12 \\ -\frac14 & \text{ if } x = 8 \\ \hphantom{-}0 & \text{ otherwise} \end{cases} \]
What about the average over people? \[ \frac{\sum_{i:X_i \in 9 \ldots 12} \qty{ \hat \mu(X_i) - \hat \mu(X_i-1)}}{\sum_{i:X_i \in 9 \ldots 12} 1} = \sum_{x}\hat\alpha(x)\hat\mu(x) \qfor \hat\alpha(x) = \ ? \]
\[ \begin{aligned} \frac{\sum_{i:X_i \in 9 \ldots 12} \qty{ \hat\mu(X_i) - \hat\mu(X_i-1) }}{\sum_{i:X_i \in 9 \ldots 12} 1} &= \sum_{x \in 9 \ldots 12} P_x \ \qty{ \hat\mu(x) - \hat\mu(x-1) } \quad \text{ for } \quad P_x = \frac{N_x}{\sum_{x \in 9 \ldots 12}N_x} \\ &= \sum_x \hat \alpha(x) \hat \mu(x) \qfor \hat\alpha(x) = \begin{cases} P_{12} & \text{ if } x = 12 \\ P_{x} - P_{x+1} & \text{ if } x \in \{9 \ldots 11\} \\ -P_9 & \text{ if } x = 8 \end{cases} \end{aligned} \]
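The rearrangement is easy to verify numerically; a minimal sketch with made-up column means and counts:

```python
import numpy as np

# Check the rearrangement numerically with made-up column means and counts.
muhat = np.array([19.3, 21.0, 23.5, 26.0, 28.8])  # muhat(8) ... muhat(12)
N = np.array([26, 16, 75, 605])                   # N_x for x = 9 ... 12
P = N / N.sum()                                   # P_x

lhs = np.sum(P * (muhat[1:] - muhat[:-1]))
alpha = np.array([-P[0], P[0] - P[1], P[1] - P[2], P[2] - P[3], P[3]])
rhs = np.sum(alpha * muhat)
print(np.isclose(lhs, rhs))  # True: the two forms agree
```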
\[ \begin{aligned} \mathop{\mathrm{E}}\qty[\sum_x \hat\alpha(x) \hat \mu(x)] &\overset{\text{linearity of expectations}}{=} \sum_x \mathop{\mathrm{E}}\qty[\hat\alpha(x) \hat\mu(x)] \\ &\overset{\text{law of iterated expectations}}{=} \sum_x \mathop{\mathrm{E}}\qty{ \mathop{\mathrm{E}}\qty[\hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n] } \\ &\overset{\text{linearity of conditional expectations}}{=} \sum_x \mathop{\mathrm{E}}\qty{ \hat \alpha(x) \mathop{\mathrm{E}}[\hat \mu(x) \mid X_1 \ldots X_n] } \\ &\overset{\text{conditional unbiasedness of subsample means}}{=} \sum_x \mathop{\mathrm{E}}\qty{ \hat \alpha(x) \mu(x) } \\ &\overset{\text{linearity of expectations}}{=} \sum_x \mathop{\mathrm{E}}\qty{ \hat \alpha(x) } \mu(x) \end{aligned} \]
\[ \begin{aligned} &\text{Does} \quad \mathop{\mathrm{E}}\qty{ \hat \alpha(x) } = \alpha(x) \qfor &&\hat\alpha(x) = \begin{cases} P_{12} & \text{ if } x = 12 \\ P_{x} - P_{x+1} & \text{ if } x \in \{9 \ldots 11\} \\ -P_9 & \text{ if } x = 8 \end{cases} \\ \qqtext{and} &&&\alpha(x) = \begin{cases} p_{12} & \text{ if } x = 12 \\ p_{x} - p_{x+1} & \text{ if } x \in \{9 \ldots 11\} \\ -p_9 & \text{ if } x = 8 \end{cases} \end{aligned} \] Here \(p_x\) is the population analog of the sample proportion \(P_x\).
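One way to get a feel for the answer is a Monte Carlo check; a minimal sketch with made-up population proportions:

```python
import numpy as np

# Monte Carlo check that the subsample proportions P_x average out to their
# population analogs p_x. The proportions q are made up for illustration.
rng = np.random.default_rng(0)
xs = np.arange(8, 13)
q = np.array([0.02, 0.04, 0.02, 0.10, 0.82])  # P(X = x)
p = q[1:] / q[1:].sum()                       # population analog of P_x, x in 9 ... 12

P_draws = []
for _ in range(5_000):
    X = rng.choice(xs, size=1_000, p=q)
    N = np.array([(X == x).sum() for x in xs[1:]])
    P_draws.append(N / N.sum())

print(np.mean(P_draws, axis=0))  # close to p
print(p)
```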
\[ \begin{aligned} \mathop{\mathrm{E}}\qty[\sum_x \hat\alpha(x) \hat \mu(x) ] &= \sum_x \mathop{\mathrm{E}}[\hat\alpha(x)] \mu(x) \\ \end{aligned} \]
We’ll continue from here, using the law of total variance.
\[ \mathop{\mathrm{\mathop{\mathrm{V}}}}[Y] = \mathop{\mathrm{E}}\qty{\mathop{\mathrm{\mathop{\mathrm{V}}}}( Y \mid X ) } + \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty{\mathop{\mathrm{E}}( Y \mid X ) } \]
\[ \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty[\sum_x \hat\alpha(x) \hat\mu(x)] = \textcolor{blue}{\sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[ \frac{\hat\alpha(x)^2}{N_x} ]} + \textcolor{red}{\mathop{\mathrm{\mathop{\mathrm{V}}}}\qty[ \sum_x \mu(x) \hat \alpha(x) ]} \qfor \sigma^2(x) = \mathop{\mathrm{\mathop{\mathrm{V}}}}[Y_i \mid X_i=x] \]
We won’t get deep into the derivation now. I’ll guide you through it in the homework.
\[ \small{ \begin{aligned} \mathop{\mathrm{V}}\qty[ \sum_x \hat\alpha(x) \hat \mu(x) ] &= \mathop{\mathrm{E}}\qty[ \mathop{\mathrm{V}}\qty{ \sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n }] \quad &&+ \quad \mathop{\mathrm{V}}\qty[ \mathop{\mathrm{E}}\qty{\sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n} ] \\ &= \mathop{\mathrm{E}}\qty[ \mathop{\mathrm{V}}\qty{ \sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n }] \quad &&+ \quad \mathop{\mathrm{V}}\qty[ \sum_x \hat\alpha(x) \mathop{\mathrm{E}}\qty{ \hat \mu(x) \mid X_1 \ldots X_n} ] \\ &= \mathop{\mathrm{E}}\qty[ \mathop{\mathrm{V}}\qty{ \sum_x \hat \alpha(x) \hat \mu(x) \mid X_1 \ldots X_n }] \quad &&+ \quad \mathop{\mathrm{V}}\qty[ \sum_x \hat\alpha(x) \mu(x) ] \\ &= \textcolor{blue}{\sum_x \mathop{\mathrm{E}}\qty[ \hat\alpha(x)^2 \mathop{\mathrm{V}}\qty{\hat\mu(x) \mid X_1 \ldots X_n}]} \quad &&+ \quad \textcolor{red}{\sum_x \sum_{x'} \mu(x) \mu(x') \mathop{\mathrm{Cov}}\qty[\hat\alpha(x),\ \hat\alpha(x')]} \\ &=\textcolor{blue}{ \sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[ \frac{\hat\alpha(x)^2}{N_x} ]} \quad &&+ \quad \textcolor{red}{\sum_x \sum_{x'} \mu(x) \mu(x') \mathop{\mathrm{Cov}}\qty[\hat\alpha(x),\ \hat\alpha(x')]} \end{aligned} } \]
\[ \mathop{\mathrm{V}}\qty[\sum_x \hat\alpha(x) \hat \mu(x)] \ge \textcolor{blue}{\sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[ \frac{\hat\alpha(x)^2}{N_x} ] \quad \text{ where } \quad \sigma^2(x) = \mathop{\mathrm{V}}[Y_i \mid X_i=x]} \]
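Here's what the bound looks like in simulation: a minimal sketch with a made-up population, comparing the variance of \(\hat\theta_{\text{people}}\) over repeated samples to the average of the blue term.

```python
import numpy as np

# Simulate thetahat_people over repeated samples from a made-up population
# and compare its variance to the expectation of the blue term.
rng = np.random.default_rng(0)
xs = np.arange(8, 13)
q = np.array([0.02, 0.04, 0.02, 0.10, 0.82])          # P(X = x), made up
mu = np.array([19.3, 21.0, 23.5, 26.0, 28.8]) * 1e3   # mu(x), made up
sig = np.array([22.2, 21.1, 10.6, 24.6, 28.3]) * 1e3  # sigma(x), made up

def one_draw(n=1_000):
    X = rng.choice(xs, size=n, p=q)
    Y = mu[X - 8] + sig[X - 8] * rng.standard_normal(n)
    N = np.array([(X == x).sum() for x in xs])         # every N_x > 0 at this n, essentially always
    muhat = np.array([Y[X == x].mean() for x in xs])
    P = N[1:] / N[1:].sum()
    alpha = np.array([-P[0], P[0] - P[1], P[1] - P[2], P[2] - P[3], P[3]])
    return np.sum(alpha * muhat), np.sum(sig**2 * alpha**2 / N)

thetas, blues = map(np.array, zip(*(one_draw() for _ in range(2_000))))
print(thetas.var())  # the actual variance ...
print(blues.mean())  # ... is at least this, the expected blue term
```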
The lower bound involves two quantities we don't know, but both are easy to estimate.
The subsample variances are good estimates of the subpopulation variances. \[ \sigma^2(x) \approx \hat\sigma^2(x) := \frac{1}{N_x} \sum_{i:X_i=x} \qty{Y_i - \hat\mu(x)}^2 \]
The ratio of the squared coefficient to the subsample size is usually a good estimate of its expectation.
\[ \frac{\hat\alpha(x)^2}{N_x} \approx \mathop{\mathrm{E}}\qty[\frac{\hat\alpha(x)^2}{N_x}] \]
Plugging these approximations into the lower bound gives our variance estimate.
\[ \mathop{\mathrm{V}}[\hat\theta] \ge \sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[\frac{\hat\alpha(x)^2}{N_x}] \approx \sum_x \frac{\hat\alpha^2(x) \ \hat\sigma^2(x)}{N_x} =: \hat\sigma_{\hat\theta}^2 \]
where
\[ \begin{array}{l|cccccccc} x & 8 & 9 & 10 & 11 & 12 & 13 & \ldots & 20 \\ \hline \hat\alpha^2(x) & 0.06 & 0.00 & 0.00 & 0.00 & 0.06 & 0.00 & \ldots & 0.00 \\ \hat\sigma^2(x) & 492M & 444M & 113M & 603M & 803M & 1B & \ldots & 14B \\ N_x & 14 & 26 & 16 & 75 & 605 & 357 & \ldots & 71 \\ \frac{\hat\alpha^2(x) \ \hat\sigma^2(x)}{N_x} & 2.20M & 0.00 & 0.00 & 0.00 & 83.01K & 0.00 & \ldots & 0.00 \\ \end{array} \]
Exercise. Use a caricature of this table to approximate our estimator’s variance.
\[ \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty[\sum_x \hat\alpha(x) \hat\mu(x)] \ge \mathop{\mathrm{E}}\qty[ \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty{ \sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n }] = \sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[ \frac{\hat\alpha(x)^2}{N_x} ] \]
Let’s think about what this means for our two examples. \[ \small{ \begin{aligned} \hat\theta_{\text{years}} &= \frac14\sum_{x=9}^{12} \qty{ \hat\mu(x) - \hat\mu(x-1)} = \sum_x \alpha(x) \hat \mu(x) \qfor \alpha(x) = \begin{cases} \hphantom{-}\frac14 & \text{ if } x = 12 \\ -\frac14 & \text{ if } x= 8 \\ 0 & \text{ otherwise} \end{cases} \\ \hat\theta_{\text{people}} &= \sum_{x=9}^{12} P_x \qty{ \hat\mu(x) - \hat\mu(x-1)} = \sum_x \hat \alpha(x) \hat \mu(x) \qfor \hat\alpha(x) = \begin{cases} \hphantom{-}P_{12} & \text{ if } x = 12 \\ P_{x} - P_{x+1} & \text{ if } x \in \{9 \ldots 11\} \\ -P_9 & \text{ if } x = 8 \end{cases} \end{aligned} } \]
We’ve seen that our estimate of the average-over-years summary has large variance.
The problem was that it paired a relatively large weight, \(\alpha(x)=-1/4\), with a very small subsample size, \(N_8=14\).
And the estimate of the average-over-people summary has an even larger variance. Let’s compare.
\[ \begin{array}{ll|ccccc|c} & x & 8 & 9 & 10 & 11 & 12 & \sum \\ & N_x & 14 & 26 & 16 & 75 & 605 & 736 \\ & \hat\sigma^2(x) & 492M & 444M & 113M & 603M & 803M & \\ \hline \text{for}\ \ \hat\theta_{\text{people}} & \hat\alpha^2(x) & 0.00 & 0.00 & 0.01 & 0.52 & 0.68 & \\ & \frac{\hat\sigma^2(x)\hat\alpha^2(x)}{N_x} & 44K & 3K & 45K & \textcolor{blue}{4.2M} & 897K & 5.2M \\ \hline \text{for}\ \ \hat\theta_{\text{years}} & \hat\alpha^2(x) & 0.06 & 0.00 & 0.00 & 0.00 & 0.06 & \\ & \frac{\hat\sigma^2(x)\hat\alpha^2(x)}{N_x} & \textcolor{blue}{2M} & 0 & 0 & 0 & 83K & 2M \\ \end{array} \]
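Reading the per-column contributions off the table, the comparison comes down to two sums; a minimal sketch turning them into standard errors:

```python
import numpy as np

# Per-column terms sigma2(x) * alpha2(x) / N_x, rounded as in the table above.
people = np.array([44e3, 3e3, 45e3, 4.2e6, 897e3])
years = np.array([2e6, 0, 0, 0, 83e3])

for name, terms in [("people", people), ("years", years)]:
    v = terms.sum()
    print(name, v, np.sqrt(v))  # variance bound and the implied standard error
# people: variance ~ 5.2M, standard error ~ 2.3K
# years:  variance ~ 2.1M, standard error ~ 1.4K
```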
\[ \mathop{\mathrm{V}}\qty[ \sum_x \hat\alpha(x) \hat \mu(x) ] = \mathop{\mathrm{E}}\qty[ \mathop{\mathrm{V}}\qty{ \sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n }] \quad + \quad \mathop{\mathrm{V}}\qty[ \mathop{\mathrm{E}}\qty{\sum_x \hat\alpha(x) \hat \mu(x) \mid X_1 \ldots X_n} ] \]
When the coefficients aren't random, an estimator like this is unbiased. \[ \small{ \hat \theta = \sum_x \alpha(x) \hat\mu(x) \quad \text{ satisfies } \quad \mathop{\mathrm{E}}[\hat\theta] = \theta. } \] When the coefficients are random, as in our average over people, that's something you have to think about case-by-case.
But proportions—even proportions of subsamples—are, in fact, unbiased.
Its variance is determined by the variance in each subpopulation, the coefficients, and the subsample sizes.
And we can estimate it. Or, at least, a lower bound. \[ \small{ \mathop{\mathrm{\mathop{\mathrm{V}}}}[ \hat\theta ] \ge \sum_x \sigma^2(x) \alpha^2(x) \times \mathop{\mathrm{E}}\qty[ \frac{1}{N_x} ] \approx \sum_x \hat\sigma^2(x)\alpha^2(x) \times \frac{1}{N_x} =: \hat\sigma_{\hat\theta}^2 } \]
Fundamentally, we’re going to get an imprecise estimate if a small subsample has a large weight.
By imprecise, we mean a large standard error and a wide confidence interval.
\[ \small{ \mathop{\mathrm{V}}[ \hat\theta ] \ge \sum_x \sigma^2(x) \times \mathop{\mathrm{E}}\qty[\frac{\hat\alpha^2(x)}{N_x} ] \approx \sum_x \hat\sigma^2(x) \times \frac{\hat\alpha^2(x)}{N_x} =: \hat\sigma_{\hat\theta}^2 } \]
\[ \small{ \theta \in \qty[ \hat \theta - 1.96\hat\sigma_{\hat\theta}, \ \hat \theta + 1.96\hat\sigma_{\hat\theta}] \quad \text{ with probability } \approx .95 \qfor \hat\sigma_{\hat\theta}^2 = \sum_x \hat\sigma^2(x)\times \frac{\hat\alpha^2(x)}{N_x} } \]
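To make the recipe concrete, here's the interval for \(\hat\theta_{\text{years}}\) assembled from the numbers in our tables; a minimal sketch:

```python
import numpy as np

# The interval for thetahat_years, assembled from the tables above.
theta_hat = (28.8e3 - 19.3e3) / 4  # (muhat(12) - muhat(8)) / 4
se = np.sqrt(2.20e6 + 83.01e3)     # sqrt of the summed alpha2 * sigma2 / N terms
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)
# roughly [-590, 5340]: wide, because the x=8 column is so small
```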