Week 5 Homework

Summary

This week’s homework addresses two issues we’ve left hanging.

[Figure 1 shows two panels: three sampling distributions and the interval spanning their middle 95%, and the same distributions shifted so that they’re centered at zero.]

Figure 1: Our calibration plot, which shows a ‘confidence interval’ for our estimate of the sampling distribution. When we’re estimating the population proportion \(\theta=0.7\) using a sample of size \(n=625\), drawn with replacement, 95% of the time you’ll get an estimate somewhere between the red and blue ones. And, as a result, the width of your interval estimate will be somewhere between the red and blue widths. When we center everything, we see that these interval widths are almost identical. In particular, the widths of our intervals are much closer than their centers. What’s that about?

I know what you’re thinking: we had a whole homework assignment on this. But that’s only half-right. In the Week 2 Homework, we focused on showing that we get near-perfect calibration, not on why we get it. Using the normal approximation, we can do the ‘why’ part pretty easily. This’ll be quick, and it’ll involve a bit of calculus, which makes it a good warm-up for what we’ll do next. It’ll also give us an opportunity to revisit some of our Week 2 material from a more formula-driven perspective.

[Figure 2 shows two panels: the difference’s bootstrap sampling distribution and the ratio’s bootstrap sampling distribution.]

Figure 2: The bootstrap sampling distributions of the difference and ratio of mean post-program incomes between treated and non-treated participants in the National Supported Work Demonstration.

I said, in our Lecture on Comparing Two Groups, that you’d be calculating the variance of a difference in subsample means in this one. You’ll be doing that and a little more: you’ll also be calculating, approximately, the variance of a ratio of subsample means. Because sometimes that’s closer to what you want to know. People often say, for example, that women in this country earn 78 cents on the dollar for doing the same work as men. That’s a ratio. Tackling this in addition to the difference won’t be too much additional work. After a little bit of calculus, we basically wind up in the same place as we do for the difference.

Calculus Review: Linear Approximation

We’re going to be using linear approximation to simplify some of our calculations. Given a function \(f(x)\), we can approximate it near any point \(x_0\) like this. \[ f(x) \approx f(x_0) + f'(x_0)(x-x_0) \]
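If linear approximation feels rusty, a quick numerical check can help. Here’s a minimal Python sketch (not part of the assignment) using \(f(x)=\sqrt{x}\) near \(x_0=1\) as a stand-in; notice how quickly the error shrinks as \(x\) approaches \(x_0\).

```python
import numpy as np

# Compare f(x) to its linear approximation f(x0) + f'(x0) * (x - x0)
# for f(x) = sqrt(x), whose derivative is f'(x) = 1 / (2 sqrt(x)).
def f(x):
    return np.sqrt(x)

def fprime(x):
    return 0.5 / np.sqrt(x)

x0 = 1.0
for x in [1.5, 1.1, 1.01]:
    exact = f(x)
    approx = f(x0) + fprime(x0) * (x - x0)
    print(f"x = {x:5.2f}   f(x) = {exact:.6f}   approx = {approx:.6f}   error = {exact - approx:.2e}")
```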

Hopefully you remember that from calculus. If you like, you can call it a first-order Taylor approximation. You’ll find a few formulas for the error of this approximation, called the remainder in Taylor’s Theorem, in most calculus textbooks.
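One such formula, the Lagrange form of the remainder, says that for a twice-differentiable \(f\), \[ f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{1}{2}f''(\xi)(x-x_0)^2 \text{ for some } \xi \text{ between } x_0 \text{ and } x. \] That means the error of the linear approximation is no larger than \(\frac{1}{2}\max_\xi |f''(\xi)| (x-x_0)^2\), i.e., it’s second-order in the distance \(x-x_0\).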

When we’re thinking about functions of multiple variables, we use the multivariate version, which involves partial derivatives. \[ \begin{aligned} f(x,y) \approx f(x_0,y_0) &+ \qty[\frac{\partial f}{\partial x}(x_0,y_0)] (x-x_0) \\ &+ \qty[\frac{\partial f}{\partial y}(x_0,y_0)] (y-y_0). \end{aligned} \]
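For a concrete instance, take \(f(x,y)=xy\). Its partial derivatives are \(\partial f/\partial x = y\) and \(\partial f/\partial y = x\), so near a point \((x_0,y_0)\) we get \[ xy \approx x_0 y_0 + y_0(x-x_0) + x_0(y-y_0). \] The ratio you’ll work with later follows the same recipe, just with different partial derivatives.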

Why We Usually Get Near-Perfect Calibration

Suppose we’ve sampled with replacement from a binary population in which \(\theta\) is the proportion of ones. If we use the sample mean \(\hat\theta\) as our point estimate and calibrate a 95% confidence interval around it using normal approximation, this is the interval we get.

\[ \hat\theta \pm 1.96 \hat\sigma / \sqrt{n} \qfor \hat\sigma^2 = \hat\theta(1-\hat\theta) \]

On the other hand, the interval we’d want—assuming we’re still happy to use normal approximation—uses the actual variance of our sample proportion, \(\sigma^2=\theta(1-\theta)\), instead of the estimate \(\hat\sigma^2\). Figure 1 is an attempt to convince ourselves that it doesn’t make much of a difference at all. I think I called the difference ‘a fingernail thick’ in lecture. Now you’re going to quantify this difference. We’ll assume that \(\hat\theta\) is one of the ‘good draws’ from its sampling distribution, which for our purposes will mean that it’s in the interval \(\theta \pm 1.96\sigma / \sqrt{n}\), the middle 95% of the sampling distribution’s normal approximation.
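If you’d like a numerical feel for how small this difference tends to be before deriving a bound, here’s a rough simulation sketch in Python. It uses the \(\theta=0.7\), \(n=625\) setting from Figure 1; the exact numbers it prints will vary a bit with the random seed.

```python
import numpy as np

# Draw many samples, compute the estimated interval width 2 * 1.96 * sigma_hat / sqrt(n)
# for each, and compare it to the ideal width 2 * 1.96 * sigma / sqrt(n) we'd use if we knew theta.
rng = np.random.default_rng(0)
theta, n, reps = 0.7, 625, 10_000

sigma = np.sqrt(theta * (1 - theta))
ideal_width = 2 * 1.96 * sigma / np.sqrt(n)

theta_hat = rng.binomial(n, theta, size=reps) / n      # sample proportions
sigma_hat = np.sqrt(theta_hat * (1 - theta_hat))       # plug-in standard deviations
est_width = 2 * 1.96 * sigma_hat / np.sqrt(n)

print("ideal width:", round(float(ideal_width), 4))
print("typical |estimated - ideal| width:", round(float(np.abs(est_width - ideal_width).mean()), 6))
print("as a fraction of the ideal width:", round(float((np.abs(est_width - ideal_width) / ideal_width).mean()), 4))
```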

Exercise 1  

Suppose \(\hat\theta \in \theta \pm 1.96\sigma / \sqrt{n}\) for \(\sigma^2=\theta(1-\theta)\). Find an approximate upper bound on the difference \(|\hat w - w|\) between the estimated interval width \(\hat w = 2 \times 1.96 \hat\sigma / \sqrt{n}\) and the ideal-but-unusable interval width \(w= 2 \times 1.96 \sigma / \sqrt{n}\). What fraction of the ideal width \(w\) is your bound?

Your bound, both in absolute terms and as a fraction of \(w\), should be a function of \(\theta\) and \(n\).

You’ll probably want to use linear approximation to do this. \[ \hat\sigma - \sigma = f(\hat\theta) - f(\theta) \approx f'(\theta)(\hat\theta - \theta) \qfor f(x) = \sqrt{x(1-x)}. \]

Exercise 2  

Are there values of \(\theta\) where this difference \(\hat w - w\) is a large fraction of the ideal width \(w\)? If so, how large? Use Figure 3 to explain what’s going on in intuitive terms.

[Figure 3 shows three panels, one for each sample size: n=100, n=400, and n=1600.]

Figure 3: The width of the middle 95% of \(\hat\theta\)’s sampling distribution as a function of \(\theta\) at three sample sizes \(n\).
This should be familiar from the Week 2 Homework.

Variance Calculation for Comparisons

Differences in Means

Figure 4: 1978 income for participants in the National Supported Work Demonstration.

In our Lecture on Comparing Two Groups, we talked about how to use subsample means to compare two groups. In particular, we talked about the case that we’ve drawn a sample \((X_1,Y_1) \ldots (X_n,Y_n)\) with replacement from a population \((x_1,y_1) \ldots (x_m,y_m)\) in which \(x_j \in \{0,1\}\) indicates membership in one of two groups, e.g. treated and control groups in Figure 4. And we talked about using the difference \(\textcolor[RGB]{0,191,196}{\hat\mu(1)}-\textcolor[RGB]{248,118,109}{\hat\mu(0)}\) in the mean of \(Y_i\) for the subsamples in which \(\textcolor[RGB]{0,191,196}{X_i=1}\) and \(\textcolor[RGB]{248,118,109}{X_i=0}\) to estimate the corresponding difference \(\textcolor[RGB]{0,191,196}{\mu(1)}-\textcolor[RGB]{248,118,109}{\mu(0)}\) in the population.
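As a concrete illustration, here’s a minimal Python sketch of the difference-in-subsample-means estimator. The numbers below are made up; in the homework, \(X_i\) is the treatment indicator and \(Y_i\) is post-program income.

```python
import numpy as np

# Made-up sample: X indicates group membership, Y is the outcome.
X = np.array([1, 0, 1, 1, 0, 0, 1, 0])
Y = np.array([5.3, 2.1, 6.0, 4.4, 3.2, 1.8, 7.1, 2.9])

mu_hat_1 = Y[X == 1].mean()   # subsample mean for the X = 1 group
mu_hat_0 = Y[X == 0].mean()   # subsample mean for the X = 0 group
print("difference in subsample means:", mu_hat_1 - mu_hat_0)
```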

Exercise 3  

We showed that an individual subsample mean \(\hat\mu(x)\) is an unbiased estimator of the corresponding population mean \(\mu(x)\). That implies that the difference of two such estimates, \(\hat\mu(1)-\hat\mu(0)\), is an unbiased estimator of the difference of the corresponding population means, \(\mu(1)-\mu(0)\).

Explain why the first implies the second. A sentence or even a couple words should do.

We also calculated a formula for the variance of a subsample mean \(\hat\mu(x)\). \[ \mathop{\mathrm{\mathop{\mathrm{V}}}}[\hat\mu(x)] = \frac{\sigma^2(x)}{N_x} \text{ for } N_x = \sum_{i}1_{=x}(X_i) \qand \sigma^2(x) = \mathop{\mathrm{\mathop{\mathrm{V}}}}[Y_i \mid X_i=x] \]

And I stated without proof a formula for the variance of the difference of two subsample means. \[ \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty[\hat{\mu}(1)-\hat{\mu}(0)] = \mathop{\mathrm{E}}\qty[\frac{1}{N_1}\sigma^2(1)+\frac{1}{N_0}\sigma^2(0)] \text{ for } N_x = \sum_{i}1_{=x}(X_i) \]
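If you’d like to convince yourself this formula is in the right ballpark before proving anything, here’s a rough simulation sketch in Python. The population below is made up, and the formula’s right-hand side plugs in the population conditional variances \(\sigma^2(x)\), as above.

```python
import numpy as np

# Resample with replacement from a made-up population many times and compare the
# empirical variance of mu_hat(1) - mu_hat(0) with E[sigma^2(1)/N_1 + sigma^2(0)/N_0].
rng = np.random.default_rng(0)
x_pop = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])                      # made-up group labels
y_pop = np.array([1.0, 2.0, 1.5, 4.0, 3.5, 5.0, 4.5, 2.5, 3.0, 1.0])  # made-up outcomes

s2_1 = y_pop[x_pop == 1].var()   # population conditional variance sigma^2(1)
s2_0 = y_pop[x_pop == 0].var()   # population conditional variance sigma^2(0)

n, reps = 50, 20_000
diffs, var_terms = [], []
for _ in range(reps):
    idx = rng.integers(0, len(x_pop), size=n)   # sample with replacement
    X, Y = x_pop[idx], y_pop[idx]
    n1, n0 = (X == 1).sum(), (X == 0).sum()
    if n1 == 0 or n0 == 0:
        continue                                # skip the (rare) draws with an empty group
    diffs.append(Y[X == 1].mean() - Y[X == 0].mean())
    var_terms.append(s2_1 / n1 + s2_0 / n0)

print("empirical variance of the difference:", round(float(np.var(diffs)), 4))
print("formula, E[s2(1)/N1 + s2(0)/N0]:     ", round(float(np.mean(var_terms)), 4))
```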

It’s a simple formula. The variance of the difference in means is the sum of the variances of the two means. Why is that the case? To see why, we can start from the definitions and do a bit of arithmetic.

\[ \begin{aligned} \mathop{\mathrm{\mathop{\mathrm{V}}}}\qty[\hat{\mu}(1)-\hat{\mu}(0)] &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \hat{\mu}(0)\} - \{\mu(1)-\mu(0)\})^2 ] \\ &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \mu(1)\} - \{\hat{\mu}(0) -\mu(0)\})^2 ] \\ &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \mu(1)\})^2 ] + \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(0) -\mu(0)\})^2 ] \\ &- 2\mathop{\mathrm{E}}\qty[ \{\hat{\mu}(1) - \mu(1)\}\{\hat{\mu}(0) -\mu(0)\}] \end{aligned} \]

The first two terms here are the ones that appear in our formula above: the variances of the two means. For that formula to be correct, the last term has to be zero. It’s up to you to prove that.

Exercise 4  

Complete the argument by proving that the ‘cross term’ is zero, i.e., that \[ \mathop{\mathrm{E}}\qty[\{\hat{\mu}(1) - \mu(1)\}\{\hat{\mu}(0) -\mu(0)\}] = 0. \]

Look over the calculation we used for one subsample mean \(\hat{\mu}(x)\) here. You’ll want to use a lot of the same ideas: writing sums over our subsamples as sums over \(1 \ldots n\) by putting in group indicators \(1_{=x}(X_i)\), conditioning on \(X_1 \ldots X_n\), the indicator trick, thinking about what happens when \(j=i\) and \(j\neq i\) in the double sum we get when we expand the product, etc. In the subsample mean calculation, the \(j=i\) terms gave us some stuff that was nonzero. Why isn’t that happening here?

Ratios of Means

If \(\hat\mu(1)-\hat\mu(0)\) is a good estimator of \(\mu(1)-\mu(0)\), then shouldn’t \(\hat\mu(1)/\hat\mu(0)\) be a good estimator of \(\mu(1)/\mu(0)\)? Let’s look into it. To do this, we’ll think of the ratio as a function of the two means. \[ \frac{\hat\mu(1)}{\hat\mu(0)} - \frac{\mu(1)}{\mu(0)} = f(\hat\mu(1), \hat\mu(0)) - f(\mu(1), \mu(0)) \qfor f(x,y) = \frac{x}{y}. \]

And we’ll use a linear approximation to this function to think about this difference.

\[ \begin{aligned} f(\hat\mu(1), \hat\mu(0)) \approx f(\mu(1), \mu(0)) &+ \qty[\frac{\partial f}{\partial x}(\mu(1), \mu(0))](\hat\mu(1) - \mu(1)) \\ &+ \qty[\frac{\partial f}{\partial y}(\mu(1), \mu(0))](\hat\mu(0) - \mu(0)) \end{aligned} \]

This approximation should be good if \(\hat\mu(1)\) and \(\hat\mu(0)\) are close to \(\mu(1)\) and \(\mu(0)\).
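If you want a quick numerical sense of how good that is, here’s a small Python check with made-up numbers. It uses finite differences for the partial derivatives so it doesn’t spoil the calculus you’ll do in Exercise 6.

```python
# Linear approximation to f(x, y) = x / y near (x0, y0), with the partial
# derivatives estimated numerically by finite differences.
def f(x, y):
    return x / y

x0, y0, h = 6.0, 4.0, 1e-6
df_dx = (f(x0 + h, y0) - f(x0, y0)) / h
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h

for dx, dy in [(0.5, -0.3), (0.1, 0.05), (0.01, -0.02)]:
    exact = f(x0 + dx, y0 + dy)
    approx = f(x0, y0) + df_dx * dx + df_dy * dy
    print(f"shift ({dx:+.2f}, {dy:+.2f}):  exact = {exact:.5f}   linear approx = {approx:.5f}")
```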

Exercise 5  

When our sample includes a reasonably large number of observations in both groups, we should expect them to be close. With a sentence or two, or a rough sketch if you prefer, explain why.

Now that we’ve justified the approximation, let’s use it to analyze our ratio estimator.

Exercise 6  

Write out a formula for the linear approximation to the ratio estimator \(\hat\mu(1)/\hat\mu(0)\) in terms of \(\mu(0)\), \(\mu(1)\), \(\hat\mu(1)-\mu(1)\), and \(\hat\mu(0)-\mu(0)\). Then approximate its bias by comparing the expected value of this linear approximation to the estimation target \(\mu(1)/\mu(0)\).

Exercise 7  

By calculating the variance of the linear approximation to \(\hat\mu(1)/\hat\mu(0)\) you used in Exercise 6, find a formula approximating the variance of this ratio estimator. Use it to calculate a 95% confidence interval based on normal approximation for the ratio \(\mu(1)/\mu(0)\) in the National Supported Work Demonstration. Draw it on top of the plot of that estimator’s bootstrap sampling distribution in Figure 2.¹ With all these approximations, are you still getting an interval similar to the one I already drew there, which is calibrated using the bootstrap?

You’ll need some information included in a table in our discussion of this study in lecture.

All of that ignores the error of our linear approximation as a potential problem. We should, if we like, be able to reason about this error using tools from calculus.

Exercise 8  

Extra Credit. Using some version of Taylor’s Theorem to characterize the error of the linear approximation you’ve been working with, refine your answer to Exercise 6: find an upper bound on the absolute value of the estimator’s bias. This should be a formula involving \(\mu(1)\), \(\mu(0)\), and the subsample sizes \(N_1\) and \(N_0\). Compare it to your approximation of the estimator’s standard deviation from Exercise 7. Are you worried that confidence intervals like the one you calculated in Exercise 7 might have coverage well below the nominal level of 95%?

Footnotes

  1. You can find the figure in the ‘Variance Calculations for Comparisons’ tab at the top of this page. Draw in your interval however you like. You can print this, draw it on paper, and photograph it for submission. You can right-click on the plot, save it as an image, and draw on that using your favorite image editor. You can sketch what you see in Figure 2 on paper, add your interval, and photograph that. Maybe the easiest thing to do is use the Tldraw Chrome Extension to draw right on top of this webpage and take a screenshot. Don’t work too hard. A rough sketch is fine.↩︎