input: a data.frame with columns ‘estimator’ and ‘value’
output: a plot of the sampling distributions of the estimators
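As a sketch of what a solution might look like (assuming ggplot2 is available and the data.frame is called sims, a name I've chosen here; overlaid density curves are one reasonable way to draw the sampling distributions):

library(ggplot2)

# sims: one row per simulated estimate, with columns
#   estimator -- which estimator produced the estimate
#   value     -- the estimate itself
plot_sampling_distributions <- function(sims) {
  ggplot(sims, aes(x = value, color = estimator)) +
    geom_density() +
    labs(x = "estimate", y = "density")
}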
Premise
Whether you’re in academia or industry, you’ll run into people who are analyzing data badly. Often, what they’re doing is both the established common practice and obviously wrong. In this practice exam, I’m going to run you through some examples in which this is happening. I’m going to ask you how to do better; explain what’s going wrong; and, in some cases, deal with what happens when you’ve convinced someone to do better but they haven’t entirely understood you. Obviously, in real life not everything that happens can be explained using the concepts we’ve learned in this class, and often the problems are a bit better-hidden than they are here. But it’s surprising—at least to me—how often these things come up in a relatively obvious way.
Aside from giving you some practice recognizing and dealing with this stuff, what I hope to do here is give you some examples to help you explain what's going on when these things happen. Often, if someone's making a mistake in a subtler form or in a more complex problem, you can point out that the situation is analogous to one where the problem is much more obvious, as it is in many of these examples.
About the Exam
This practice exam was long and, in places, hard. Think of it as a set of exercises I wrote to make sure y'all had some practice with the ideas and skills that'll come up on the exam.
For the actual exam, expect questions that look like these, but simpler, with less repetition and fewer parts.
A Theme
Several of these exercises, Problems 2, 4, and parts of 5, look at how evidence from simulations can be misleading. I wanted to bring this up, show you a few flavors of it, and pick apart what's going on in a simple case, for two reasons. To explain them, I do have to give away a few of the exercise solutions.
The Reasons
A lot of the evidence people use to choose methods is based on simulation. When you write a paper proposing a new method, you're expected to show that it works on simulated data, and often that it works better than the alternatives. That evidence is often pretty questionable. Even if you put aside the incentive to make your method look good, if you spend months or years working on a method to solve one piece of a problem, you get tunnel vision. That's the piece of the problem that shows up in your simulated data, because that's the piece of the problem you've been thinking about. Knowing that this happens, and a little about what it can look like, can make it less disorienting the first time you download someone's code, try it out, and find that it doesn't work as well as you've been led to believe.
Arguably, many methods work pretty often for the same reason simulations don't show that they don't work. A stopped clock is right twice a day, and in Problem 2 we see an estimator that's right ‘once a day’. But when we get to more complex estimators, there start to be a lot of ways for things to go right. In Problem 5, we see two that come up when estimating treatment effects:
We could have misspecification but not covariate shift, and therefore no bias.
We could have covariate shift but no misspecification, and therefore no bias.
We can even have misspecification and covariate shift that somehow fit together perfectly, so there's still no bias. We saw one example of this a few weeks ago. It's not likely that this is happening exactly. Unless you use models that can't be misspecified (all functions) or techniques like inverse probability weighting, you'll get bias, but maybe not enough that it really matters. I don't mean to suggest that you should be relying on this kind of luck instead of just using better models or inverse probability weighting if you have a choice. But it might be a reason not to discount conclusions based on problematic methods too much. Pretty often, when you reanalyze the data using a more trustworthy approach, what you get is roughly the same point estimate and a slightly wider confidence interval.
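To make some of this concrete, here's a minimal simulation sketch: it checks that misspecification without covariate shift leaves the mean unbiased, shows the bias that appears once you have both, and finishes with the inverse probability weighting fix mentioned above. Problem 5 frames all this in terms of treatment effects; the sketch uses the simpler problem of estimating a mean E[Y] by averaging a fitted model's predictions, and the data-generating process and distributions here are my own choices, not taken from the exam.

set.seed(1)

# True regression function is quadratic, so a linear model is misspecified.
mu <- function(x) x^2

n <- 100000
x <- runif(n)                # training covariates: Uniform(0, 1)
y <- mu(x) + rnorm(n)
fit <- lm(y ~ x)

# Plug-in estimate of E[Y] over a target covariate distribution:
# average the fitted model's predictions over draws from that distribution.
plug_in <- function(x_target) {
  mean(predict(fit, newdata = data.frame(x = x_target)))
}

# Misspecification but no covariate shift: no bias. Least squares still
# matches the mean of Y over the distribution it was fit on.
x_same <- runif(n)
plug_in(x_same)              # both ~ 1/3
mean(mu(x_same))

# Misspecification and covariate shift: bias. The target distribution
# Beta(8, 1) concentrates near 1, where the linear fit is furthest off.
x_shift <- rbeta(n, 8, 1)
plug_in(x_shift)             # ~ 0.72
mean(mu(x_shift))            # ~ 0.80

# Inverse probability weighting sidesteps the outcome model: reweight the
# training sample by the density ratio target / training, which this
# sketch assumes is known.
w <- dbeta(x, 8, 1) / dunif(x)
sum(w * y) / sum(w)          # ~ 0.80 again

The contrast in the last few lines is the point: the plug-in estimate inherits the model's bias once the covariates shift, while the weighted average never consults the outcome model, at the cost of needing the density ratio.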