Introduction
Prediction, Inference, and Causality. And how they fit together.
Starting with the complete list of registered voters in Michigan …
Each person is a dot.
Name | Age | Voted 2002 | Voted 2006 | Voted 2006 if sent letter |
---|---|---|---|---|
Rush Hoogendyk | 27 | ▴ Yes | No | ? |
Mitt DeVos | 31 | ● No | No | ? |
Name | Age | Voted 2002 | Voted 2006 | Voted 2006 if sent letter |
---|---|---|---|---|
Rush | 27 | ▴ Yes | No | No (33% of Rush-types voted) |
Mitt | 31 | ● No | No | No (31% of Mitt-types voted) |
Al | 34 | ● No | No | No (47% of Al-types voted) |
… | … | … | … | ? |
\[ \color{gray} \begin{aligned} \text{predicted turnout among 25-35s} =& \ \textcolor[RGB]{0,191,196}{30.1\%} && \text{ with the \textcolor[RGB]{0,191,196}{neighbors letter}} \\ \text{vs.} =& \ \textcolor[RGB]{248,118,109}{24.3\%} && \text{ with \textcolor[RGB]{248,118,109}{no letter}} \end{aligned} \]
Even just counting the 25-35s in the experiment, that’s \(29008 \times 0.057 \approx 1667\) more votes.
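If you want to check that arithmetic yourself, here is a minimal Python sketch. The function name `extra_votes` is made up for illustration, and the inputs are the rounded percentages shown above, so the results land near the quoted figure rather than exactly on it. Presumably the quoted figure comes from unrounded estimates, but that is an assumption.

```python
def extra_votes(n_voters: int, rate_with: float, rate_without: float) -> float:
    """Expected additional votes if turnout moves from rate_without to rate_with."""
    return n_voters * (rate_with - rate_without)


n_25_35 = 29008      # 25-35 year-olds in the experiment (from the text)
with_letter = 0.301  # rounded predicted turnout with the neighbors letter
no_letter = 0.243    # rounded predicted turnout with no letter

# Using the two rounded turnout rates (a difference of about 0.058).
print(round(extra_votes(n_25_35, with_letter, no_letter)))

# Using the rounded difference of 0.057 directly.
print(round(extra_votes(n_25_35, 0.057, 0.0)))
```

The two print statements bracket the number in the text, which is what you'd expect if the difference in turnout is somewhere between 5.7 and 5.8 percentage points before rounding.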
What do you make of this?
How many of the ●s are estimates of at least 30%? What about 29%?
Now let’s come back to the issue we put aside.
We’ve been talking as if we’d mailed the letter to every voter in Michigan.
And we were using a random sample of them to make our predictions.
That’s not what happened. GGL (Gerber, Green, and Larimer) didn’t even mail the letter to every voter selected for their experiment.
What they did was …
Does the sample of ●s we got look like one we’d get if we’d mailed the letter to everyone?
Is it similar enough that we can get away with pretending that it is? Let’s talk about it.
On Friday, we’re going to start using some math to think about sampling. We’ll get some answers then.