Point and Interval Estimates
\[ \color{gray} \begin{array}{r|rrrrrr|r} j & 1 & 2 & 3 & 4 & 5 & 6 & \bar{y} \\ y_{j} & 0 & 1 & \textcolor[RGB]{7,59,76}{1} & 1 & \textcolor[RGB]{7,59,76}{0} & \textcolor[RGB]{7,59,76}{1} & \textcolor{black}{4/6} \\ \end{array} \]
\[ \color{gray} \begin{array}{r|rrrrrr|r} j & 1 & 2 & 3 & 4 & 5 & 6 & \bar{Y} \\ y_{j} & 0 & 1 & \textcolor[RGB]{7,59,76}{1} & 1 & \textcolor[RGB]{7,59,76}{0} & \textcolor[RGB]{7,59,76}{1} & \textcolor[RGB]{7,59,76}{2/3} \\ y_{j} & 0 & 1 & 1 & \textcolor[RGB]{239,71,111}{1} & 0 & \textcolor[RGB]{239,71,111}{1} & \textcolor[RGB]{239,71,111}{2/2} \\ y_{j} & \textcolor[RGB]{17,138,178}{0} & 1 & \textcolor[RGB]{17,138,178}{1} & \textcolor[RGB]{17,138,178}{1} & 0 & \textcolor[RGB]{17,138,178}{1} & \textcolor[RGB]{17,138,178}{3/4} \\ \end{array} \]
It’s easy to think about sampling distributions when we’re running a simulation of our study.
In a real study, we don’t have any of this. All we get is one estimate.
But that doesn’t stop us from filling in the rest of the picture by …
Today, we’ll focus on sampling with replacement.
We’ll really be talking about the relationship between three things.
As the semester continues, we’ll look into this more generally. We’ll talk about …
\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625} \\ Y_i & 1 & 1 & \dots & 1 & 0.68 \\ \end{array} \]
 
Outcomes for our sample of polled voters.
To enumerate our sample, we …
\[
\begin{array}{r|rrrrrr|r}
j & 1 & 2 & 3 & 4 & \dots &   7.23M & \bar{y}_{7.23M}  \\
y_{j} & 1 & 1 & 1 & 0 & \dots & 1   & 0.70  \\
\end{array}
\]
Outcomes for the population of registered voters.
To enumerate our population, we …
Our sample proportion \(425 / 625 = 0.68\) is close to the population proportion \(5.05M / 7.23M \approx 0.70\).
\[ \begin{array}{r|rr|rr|r|rr|r} \text{call} & 1 & & 2 & & \dots & 625 & & \\ \text{variable} & J_1 & Y_1 & J_2 & Y_2 & \dots & J_{625} & Y_{625} & \overline{Y}_{625} \\ \text{value} & 869369 & \underset{\textcolor{gray}{y_{869369}}}{1} & 4428455 & \underset{\textcolor{gray}{y_{4428455}}}{1} & \dots & 1268868 & \underset{\textcolor{gray}{y_{1268868}}}{1} & 0.68 \\ \end{array} \]
\[ \begin{array}{r|rr|rr|r|rr|r} \text{call} & 1 & & 2 & & \dots & 625 & & \\ \text{variable} & J_1 & Y_1 & J_2 & Y_2 & \dots & J_{625} & Y_{625} & \overline{Y}_{625} \\ \text{value} & 600481 & \underset{\textcolor{gray}{y_{600481}}}{0} & 6793745 & \underset{\textcolor{gray}{y_{6793745}}}{1} & \dots & 1377933 & \underset{\textcolor{gray}{y_{1377933}}}{1} & 0.71 \\ \end{array} \]
\[ \begin{array}{r|rr|rr|r|rr|r} \text{call} & 1 & & 2 & & \dots & 625 & & \\ \text{variable} & J_1 & Y_1 & J_2 & Y_2 & \dots & J_{625} & Y_{625} & \overline{Y}_{625} \\ \text{value} & 3830847 & \underset{\textcolor{gray}{y_{3830847}}}{1} & 5887416 & \underset{\textcolor{gray}{y_{5887416}}}{1} & \dots & 4706637 & \underset{\textcolor{gray}{y_{4706637}}}{1} & 0.70 \\ \end{array} \]
\[ \begin{array}{r|rr|rr|r|rr|r} \text{call} & 1 & & 2 & & \dots & 625 & & \\ \text{pollster} & J_1 & Y_1 & J_2 & Y_2 & \dots & J_{625} & Y_{625} & \overline{Y}_{625} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}869369 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}4428455 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}1268868 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.68 \\ \color[RGB]{239,71,111}2 & \color[RGB]{239,71,111}600481 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}6793745 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}1377933 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.71 \\ \color[RGB]{17,138,178}3 & \color[RGB]{17,138,178}3830847 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}5887416 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}4706637 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.70 \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \color[RGB]{6,214,160}1M & \color[RGB]{6,214,160}2533350 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}5539770 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}7068692 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0.71 \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \end{array} \]
We run 10,000 simulated polls and store who we call (J) and what they say (Y).
We calculate the sample proportion for each poll.
And we histogram the result.
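Here is a minimal sketch of those three steps in Python with NumPy. It is an illustration under assumptions, not the code behind the slides: the population is a stand-in built from the rounded counts quoted earlier (7.23M voters, 5.05M saying yes), and names like `y`, `J`, `Y`, and `estimates` are ours. Indices run from 0 here rather than from 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Stand-in population (an assumption): 7.23M voters, 5.05M of whom say yes.
m = 7_230_000
y = np.zeros(m, dtype=np.int8)
y[:5_050_000] = 1
target = y.mean()  # the population proportion, roughly 0.70

n = 625           # calls per poll
n_polls = 10_000  # number of simulated polls

# Who we call: J[k, i] is the person poll k reaches on call i.
# Sampling with replacement means each index is drawn uniformly and
# independently, so the same person can be called more than once.
J = rng.integers(0, m, size=(n_polls, n))

# What they say: Y[k, i] = y[J[k, i]].
Y = y[J]

# One sample proportion per poll.
estimates = Y.mean(axis=1)

# The histogram, as bin counts (pass `estimates` to a plotting library to draw it).
counts, edges = np.histogram(estimates, bins=30)
print(target, estimates[:3])
```

Each row of `J` and `Y` plays the role of one pollster's row in the table above, and `estimates` holds the 10,000 draws from the sampling distribution of \(\bar Y_{625}\).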
Before we look at the real thing, let’s sketch the histogram we expect to see.
Checklist
Observation. Our estimate (the black dot) is close to our estimation target. It’s within 2 percentage points.
Q. Did we get lucky?
Not really.
Q. Could we have predicted how close we’d get before the election happened?
Yes, in a sense.
Activity. Explain how to calculate this coverage probability using the sampling distribution of your point estimate \(\bar Y_{625}\).
 
It’s the probability that a random draw from the sampling distribution lies in the green shaded area.
\[ \text{coverage probability} = P\qty(\overline{Y}_{625} \in \overline{y}_{7.23M} \pm 0.01) \]
And it’s about 43%. We can get that by counting dots.
Or by finding the area of the histogram that’s shaded green.
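Both routes can be sketched in a few lines of Python. This is a rough illustration under assumptions, not the calculation behind the figure. It uses the rounded population proportion \(5.05 / 7.23\) as a stand-in target, so the number it produces need not match the one quoted above exactly. It also uses the fact that, when we sample with replacement from a 0/1 population, the count of yeses in a poll is binomial. All variable names are ours.

```python
import math
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

target = 5.05 / 7.23  # stand-in population proportion (rounded counts, an assumption)
n = 625               # calls per poll
n_polls = 10_000      # number of simulated polls
width = 0.01          # half-width of the interval around the target

# Sampling with replacement from a 0/1 population: the number of yeses in
# each poll is Binomial(n, target), so we can draw the counts directly.
estimates = rng.binomial(n, target, size=n_polls) / n

# Counting dots: the fraction of simulated estimates within +/- width of the target.
coverage_sim = np.mean(np.abs(estimates - target) <= width)

# Area of the shaded region: add up the exact binomial probabilities of every
# count k whose sample proportion k / n lands inside the interval.
coverage_exact = sum(
    math.comb(n, k) * target**k * (1 - target) ** (n - k)
    for k in range(n + 1)
    if abs(k / n - target) <= width
)

print(coverage_sim, coverage_exact)
```

The two numbers should agree up to simulation noise. Changing `width` and rerunning shows how coverage grows as the interval widens.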
What we just did was choose a width and calculate a coverage probability.
Question.
Answer.
Talking about calibrated interval estimates, a.k.a. confidence intervals, has some advantages.
But there’s something about them that is a bit infuriating when you’re not used to them.
This isn’t a problem with intervals. This is something that’s fundamentally uncomfortable about sampling.
This is one of those instances where language is complicated. We say the ‘frequency a person said yes’ or the ‘proportion of people who said yes’. If we’re leaving the people out of the sentence, we say either: ‘the sample frequency’ or ‘the sample proportion’. Using only one or the other really limits how you phrase things.
That’s why it looks like the solid blue line is ‘highlighted’ in green.