Lecture 4

Calibrating Interval Estimates using the Bootstrap

Review

Point and Interval Estimation

#| label: ggplot-theme
#| include: false

library(ggplot2)  # theme(), ggplot(), geom_point(), geom_col()
library(purrr)    # map_vec()
library(dplyr)    # rename()

lightgray = "#aaaaaa"
body.bg = rgb(231, 240, 236, maxColorValue=255)
gridcolor = rgb(0,0,0,.1, maxColorValue=1)

# Plot theme matching the slide background, with no axis ticks or titles.
lab.theme = theme(
        plot.background = element_rect(fill = body.bg, colour = NA),
        panel.background = element_rect(fill = body.bg, colour = NA),
        legend.background = element_rect(fill = body.bg, colour = NA),
        legend.box.background = element_rect(fill = body.bg, colour = NA),
        legend.key = element_rect(fill = body.bg, colour = NA),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.x  = element_text(colour = lightgray),
        axis.text.y  = element_text(colour = lightgray),
        axis.title.x  = element_blank(),
        axis.title.y  = element_blank())

theme_set(lab.theme)
set.seed(1)

  • Last week, we talked about estimating population frequencies.
  • Our point estimates were frequencies in a random sample drawn from the population.
  • To characterize our uncertainty, we used interval estimates.
    • In particular, calibrated interval estimates: confidence intervals.
    • To get these intervals, we add ‘arms’ of equal length to our point estimate.
    • We choose the arm length so that, in 95% of surveys done like ours, the estimation target is within them.

Calibration of Interval Estimates

  • We can do that using the sampling distribution of our point estimate.
  • We’d draw arms out from the sampling distribution’s mean until they span 95% of point estimates.
  • We can see that this is the right length by a shift of perspective.
    • A point estimate can touch the mean with its arms if and only if the mean can touch it with equally long arms.
    • So 95% of intervals cover if and only if 95% of point estimates are within the mean’s arms.
  • In practice, we can’t do exactly that. We don’t know the sampling distribution.
  • But we can use an estimate of the sampling distribution in its place.
  • If it’s a good one, we’ll get approximately 95% coverage.
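The arm-length calculation described above can be sketched in a few lines. This is only an illustration under assumptions: it uses the binomial estimate of the sampling distribution with \(n=625\) and the rounded value \(\hat\theta \approx 0.68\) from the turnout example, and the names `n.poll`, `coverage`, and `arm.length` are made up for this sketch.

```r
# Assumed values from the turnout example: 625 calls, sample frequency about 0.68.
n.poll = 625
theta.hat = 0.68

# Possible values of the sample frequency and their estimated probabilities.
t.values = (0:n.poll) / n.poll
probs = dbinom(0:n.poll, n.poll, theta.hat)

# Estimated probability that a point estimate lands within `arm` of the mean.
# The small tolerance guards against floating-point error at the boundary.
coverage = function(arm) sum(probs[abs(t.values - theta.hat) <= arm + 1e-12])

# Smallest arm length on the grid k/n that spans at least 95% of point estimates.
arms = (0:n.poll) / n.poll
arm.length = arms[which(sapply(arms, coverage) >= 0.95)[1]]

# The resulting interval estimate: the point estimate plus and minus the arm length.
c(lower = theta.hat - arm.length, upper = theta.hat + arm.length)
```

Because the sample frequency only takes values \(k/n\), the coverage of the estimated distribution jumps in steps, so the arms chosen this way span slightly more than 95% rather than exactly 95%.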

A Parametric Estimate of the Sampling Distribution

[Figure: Our point estimate and its sampling distribution]
[Figure: Our estimate of its sampling distribution]
  • Our estimate was based on knowledge of the parametric form of the sampling distribution.
  • When we sample with replacement, it’s binomial.
    • It’s the distribution of the frequency of heads in 625 flips of a coin that lands heads with probability \(\theta\),
    • where \(\theta\) is the ‘frequency of heads’ in the population.
    • e.g. \(\theta\approx 0.70\) is the proportion of registered voters who will vote in our turnout example.

\[ P_\theta(\hat\theta = t) = \binom{n}{nt} \theta^{nt} (1-\theta)^{n(1-t)} \qqtext{ is the probability of the sample frequency being $t$} \]

  • By parametric form, I mean a formula in terms of some parameters.
    • That tells us what parameters we need to estimate to estimate the sampling distribution.
    • In this case, the only (unknown) parameter is \(\theta\), the frequency of heads in the population.
    • We plugged in the sample frequency, \(\hat\theta\), to get our estimate of the sampling distribution.

\[ \hat P_\theta(\hat\theta = t) = \binom{n}{nt} \hat\theta^{nt} (1-\hat\theta)^{n(1-t)} \qqtext{ is our estimate of the same thing} \]
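To see what plugging in does, here is a rough comparison of the two formulas above. It assumes the rounded values \(\theta \approx 0.70\) and \(\hat\theta \approx 0.68\) from the turnout example; the helper names `dist.mean` and `dist.sd` are hypothetical and exist only for this sketch.

```r
n.poll = 625
theta = 0.70       # population frequency (rounded, from the turnout example)
theta.hat = 0.68   # sample frequency (rounded, from the turnout example)

counts = 0:n.poll                              # possible values of n*t
t.values = counts / n.poll                     # possible values of the sample frequency
actual   = dbinom(counts, n.poll, theta)       # the binomial formula at theta
estimate = dbinom(counts, n.poll, theta.hat)   # the same formula with theta.hat plugged in

# Center and spread of each distribution.
dist.mean = function(p) sum(t.values * p)
dist.sd   = function(p) sqrt(sum((t.values - dist.mean(p))^2 * p))
summary.table = data.frame(
  distribution = c("actual", "estimate"),
  mean = c(dist.mean(actual), dist.mean(estimate)),
  sd   = c(dist.sd(actual), dist.sd(estimate))
)
summary.table
```

The estimate is centered in the wrong place, at \(\hat\theta\) instead of \(\theta\), but its spread is nearly the same. The spread is what determines the arm length.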

Sampling without Replacement

[Figure: Our point estimate and its sampling distribution]
[Figure: Our estimate of its sampling distribution]
  • We can do the same thing when we sample without replacement.
  • We just have to use a different parametric form: the hypergeometric distribution.

\[ P_\theta(\hat\theta = t) = \binom{n}{nt} \frac{\{m(1-\theta)\}!}{\{m(1-\theta)-n(1-t)\}!} \times \frac{(m\theta)!}{(m\theta-nt)!} \times \frac{(m-n)!}{m!} \]

  • Again, the only unknown parameter is \(\theta\). And we can plug in our point estimate to estimate this distribution.

\[ \hat P_\theta(\hat\theta = t) = \binom{n}{nt} \frac{\{m(1-\hat\theta)\}!}{\{m(1-\hat\theta)-n(1-t)\}!} \times \frac{(m\hat\theta)!}{(m\hat\theta-nt)!} \times \frac{(m-n)!}{m!} \]
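As a sanity check on the factorial formula, the sketch below compares it with R's built-in `dhyper`. The population here is a hypothetical small one (50 people, 35 of them 1s, a sample of 10), chosen only so the numbers are easy to follow; `m.pop`, `n.s`, and `formula.prob` are names made up for the sketch.

```r
# Hypothetical small population: m.pop people, a fraction theta of whom are 1s.
m.pop = 50
theta = 0.7
n.s = 10                          # sample size, drawn without replacement
ones = round(m.pop * theta)       # number of 1s in the population
zeros = m.pop - ones              # number of 0s in the population

# The formula from the text, evaluated on the log scale to avoid overflow.
# k = n*t is the number of 1s in the sample.
formula.prob = function(k) {
  exp(lchoose(n.s, k) +
      lfactorial(zeros) - lfactorial(zeros - (n.s - k)) +
      lfactorial(ones)  - lfactorial(ones - k) +
      lfactorial(m.pop - n.s) - lfactorial(m.pop))
}

# dhyper(x, m, n, k): x successes in k draws from an urn with m successes and n failures.
k = 0:n.s
max.gap = max(abs(formula.prob(k) - dhyper(k, ones, zeros, n.s)))
max.gap
```

The two agree up to floating-point error, so in practice `dhyper` with \(\hat\theta\) plugged in for \(\theta\) can stand in for the second formula.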

An Exercise

A Survey

n=6
Y = c(0,0,0,1,1,1)

ggplot() + geom_point(aes(x=1:6, y=Y))
  • We’re going to run a survey on Candy Preferences.
    • I’ll draw a sample of size \(n=6\), with replacement, from the people in the room.
    • They’ll pick candy and write their selection into a sample table on the board.
    • In particular, we’ll write out whether they chose Chocolate (\(Y=1\)) or sour candy (\(Y=0\)).
  • Then we’re going to estimate the sampling distribution in a new way.
    • It’ll be a lot like calculating the actual sampling distribution.
    • We’ll draw a sample of size \(n\) with replacement, use it to calculate our estimator, and repeat.
  • What’s different is the population we’re drawing our sample from.
  • We don’t have responses from the whole population, so we’ll draw it from the closest thing we’ve got: the sample.
    • We call this bootstrapping.
    • The estimate of the sampling distribution we get is called the bootstrap sampling distribution.
  • I’ve drawn the sample we got from our survey on the board.
  • Here’s how you draw a sample from the bootstrap sampling distribution of our point estimator.
    1. Roll your die 6 times to draw a sample of size \(n=6\) from the sample.
    2. Calculate the sample frequency. That’s one draw from the bootstrap sampling distribution. Write it down.
  • If we all do this 5 times, we’ll have a bunch of draws.
  • We’ll tally them up on the board and visualize the result as a histogram.
  • A histogram of draws from the bootstrap sampling distribution.

Simulating a Few More Draws

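# Simulate 10,000 draws from the bootstrap sampling distribution.
draws = 1:10000 |> map_vec(function(.) { 
  Jstar = sample(1:n, n, replace=TRUE)  # n die rolls: indices into the sample
  Ystar = Y[Jstar]                      # the bootstrap sample
  mean(Ystar)                           # its sample frequency
})
# Tally how often each value of the sample frequency came up.
computer.tally = as.data.frame(table(draws)) |>
  rename(theta.hat=draws, count=Freq)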

Discussion

hand.tally = data.frame(theta.hat = c(0,1,2,3,4,5,6), 
                        count     = c(0,0,0,1,0,0,0))

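tally=computer.tally
# Blue: the binomial estimate of the sampling distribution.
# Red: the tallied bootstrap draws, as relative frequencies.
ggplot() + geom_col(aes(x=(0:n)/n, y=dbinom(0:n, n, mean(Y))), color='blue', fill='blue', linewidth=.1, alpha=.2) + 
           geom_col(aes(x=(0:n)/n, y=tally$count/sum(tally$count)), fill='red', color='red', linewidth=.1, alpha=.2)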
  • If all has gone to plan, our histogram looks a lot like our binomial estimate of the sampling distribution.
    • If we substitute in a computer tally of 10,000 draws from the bootstrap sampling distribution, we nail it.
    • It looks like the bootstrap sampling distribution is the binomial estimate.
  • Q. It is. How do you know?
  • We know the distribution of the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from a population \(y_1\ldots y_m\) of binary responses with frequency \(\theta\).
  • It’s \(\text{Binomial}(n, \theta)\). To estimate it, we plug in \(\hat\theta\), the frequency of 1s in our sample.
  • So our Binomial estimate is the distribution of the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from ‘a population’ \(Y_1 \ldots Y_n\) with frequency \(\hat\theta\).
  • What’s a draw from the bootstrap sampling distribution?
  • Each draw is the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from ‘a population’ \(Y_1 \ldots Y_n\) in which the frequency of 1s is \(\hat\theta\), the frequency of 1s in our sample.
    • Because that ‘population’ is our sample.
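For a sample this small, the claim can also be checked by brute force. The sketch below lists every possible sequence of 6 die rolls, computes the bootstrap sample frequency for each, and compares the resulting exact distribution to the binomial estimate. It uses the example sample `Y = c(0,0,0,1,1,1)` from the code above; `rolls` and `exact.bootstrap` are names made up for the sketch.

```r
Y = c(0, 0, 0, 1, 1, 1)
n = length(Y)

# Every possible sequence of n die rolls: 6^6 = 46656 rows, all equally likely.
rolls = as.matrix(expand.grid(rep(list(1:n), n)))

# The number of 1s in the bootstrap sample corresponding to each sequence of rolls.
ones.count = rowSums(matrix(Y[as.vector(rolls)], nrow = nrow(rolls)))

# Exact bootstrap sampling distribution of the sample frequency (values 0/6, ..., 6/6).
exact.bootstrap = as.numeric(table(factor(ones.count, levels = 0:n))) / nrow(rolls)

# The binomial estimate with the sample frequency plugged in.
binomial.estimate = dbinom(0:n, n, mean(Y))

max.gap = max(abs(exact.bootstrap - binomial.estimate))
max.gap
```

The gap is zero up to floating-point error: the two are the same distribution, not just similar ones.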

The Bootstrap

The Bootstrap Interpretation in Our Turnout Poll

  • The sampling distribution estimate we’ve used was \(\text{Binomial}(n,\hat\theta)\) for \(n=625\) and \(\hat\theta \approx 0.68\).
    • It’s the distribution of the proportion of heads in 625 flips of a coin with probability \(\hat\theta \approx 0.68\) of heads,
    • where \(\hat\theta \approx 0.68\) is the proportion of voters we’ve polled who will vote.
  • That is, it’s the sampling distribution of a ‘poll’ of the people in our sample, i.e.
    • roll a 625-sided die 625 times,
    • call up the corresponding person in our sample each time,
    • and count up the yeses we hear.
  • Note that this is random because we’re drawing with replacement.
    • Each time we run this poll, we call each person in our sample 0,1,2,… times
    • And the number of times we call them is random.
  • If we plot our voters on a map, you can see the idea in visual terms.
    • On the left, we have the population.
    • In the middle, we have our sample. It’s drawn, with replacement, from the population.
    • On the right, we have something else. A new sample drawn, with replacement, from the sample.
  • Each ‘call’ that a person receives increases the size of their dot: \(\text{circle area} \propto \text{number of calls}\).
  • In the sample, even though it’s drawn with replacement, all dots are the same size.
    • Because we draw from such a large population, it’s very unlikely that anybody gets called twice.
  • In the bootstrap sample, dots vary in size.
    • Because we draw \(n\) people from a sample of size \(n\), it’s almost impossible not to call somebody twice.
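A quick calculation shows how different the two cases are. This sketch assumes a population of about 7.23 million registered voters, as in the turnout example, and 625 calls; `log.p.distinct` is a helper name made up here.

```r
m.pop = 7.23e6   # assumed population size from the turnout example
n.poll = 625     # number of calls

# Log-probability that n draws with replacement from m people are all distinct:
# the product over i = 0, ..., n-1 of (1 - i/m), computed on the log scale.
log.p.distinct = function(n, m) sum(log1p(-(0:(n - 1)) / m))

# Drawing the sample from the population: repeats are rare.
p.distinct.sample = exp(log.p.distinct(n.poll, m.pop))

# Drawing a bootstrap sample from the sample: repeats are all but certain.
p.distinct.bootstrap = exp(log.p.distinct(n.poll, n.poll))

# Chance that a given person in the sample is never called in a bootstrap sample.
share.never.called = (1 - 1 / n.poll)^n.poll

c(p.distinct.sample, p.distinct.bootstrap, share.never.called)
```

Under these assumptions, the sample has no repeats about 97% of the time, a bootstrap sample essentially never does, and each bootstrap sample leaves out roughly 37% of the people in the sample.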

Bootstrapping

  • Before the election, we don’t observe the population. But we do observe the sample.
    • And we can sample from the sample, acting as if it were the population.
    • We’ll take repeated random samples of size 625, with replacement, from our sample of size 625.
  • We call these bootstrap samples and estimates based on them bootstrap estimates.
    • The distribution of these estimates is called the bootstrap sampling distribution.
    • If the sample is like the population, this should be like our estimator’s actual sampling distribution.

The Sample

\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625} \\ Y_i & 1 & 1 & \dots & 1 & 0.68 \\ \end{array} \]

The Bootstrap Sample

\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625}^* \\ Y_i^* & 1 & 1 & \dots & 1 & 0.63 \\ \end{array} \]

The Population

\[ \begin{array}{r|rrrr|r} j & 1 & 2 & \dots & 7.23M & \bar{y}_{7.23M} \\ y_{j} & 1 & 1 & \dots & 1 & 0.70 \\ \end{array} \]

The ‘Bootstrap Population’: The Sample

\[ \begin{array}{r|rrrr|r} j & 1 & 2 & \dots & 625 & \bar{y}^*_{625} \\ y_j^* & 1 & 1 & \dots & 1 & 0.68 \\ \end{array} \]

The Bootstrap is Nonparametric

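# Draw 10,000 bootstrap estimates: resample the sample with replacement
# and recompute the sample frequency each time.
bootstrap.samples = array(dim=10000)
for(rr in 1:10000) {
    Y.star = Y[sample(1:n, n, replace=TRUE)]  # one bootstrap sample
    bootstrap.samples[rr] = mean(Y.star)      # its sample frequency
}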

  • We do not need to know the parametric form of our estimator’s sampling distribution to use the bootstrap.
  • All we do is re-run our poll acting as if our sample were the population.
    • We can do this no matter what we’re estimating.
    • So let’s try it out on some stuff where we don’t know the parametric form.
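Since the recipe is the same whatever we estimate, it can be wrapped in one function. The helper below is hypothetical, not part of the lecture code: `bootstrap.draws` takes a data frame with one row per person in the sample and any function that turns such a data frame into a number. The toy data and the choice of the median as an estimator are also made up, purely to show an estimator without a convenient parametric form.

```r
# Hypothetical helper: bootstrap any estimator.
#   data:      a data frame with one row per person in the sample
#   estimator: a function mapping such a data frame to a single number
#   draws:     how many bootstrap estimates to compute
bootstrap.draws = function(data, estimator, draws = 1000) {
  n = nrow(data)
  sapply(1:draws, function(rr) {
    rows = sample(1:n, n, replace = TRUE)    # re-run the poll on the sample
    estimator(data[rows, , drop = FALSE])    # recompute the estimate
  })
}

# Made-up example: the sample median of 100 simulated observations.
set.seed(1)
toy = data.frame(Y = rexp(100))
median.draws = bootstrap.draws(toy, function(d) median(d$Y))
summary(median.draws)
```

Passing the estimator as a function keeps the resampling step separate from the thing being estimated, which is the sense in which the bootstrap is nonparametric.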

Comparing Black and Non-Black Turnout

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{question} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X}_{625} & \overline{Y}_{625} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \text{outcome} & \underset{\textcolor{gray}{x_{869369}}}{0} & \underset{\textcolor{gray}{y_{869369}}}{1} & \underset{\textcolor{gray}{x_{4428455}}}{1} & \underset{\textcolor{gray}{y_{4428455}}}{1} & \dots & \underset{\textcolor{gray}{x_{1268868}}}{0} & \underset{\textcolor{gray}{y_{1268868}}}{1} & 0.28 & 0.68 & 0.68 & 0.69 & 0.01 \\ \end{array} } \]

  • It’s useful to predict turnout among Black voters because they tend to vote differently than non-Black voters.
  • Let’s suppose that we’re interested in the difference in turnout between Black and non-Black voters.
  • This is a bit reductive, but it’s the beginning of the semester, so we’re keeping things simple.
  • We know from the voter file that 30% of registered voters are Black.
  • Let’s suppose we asked the people we polled if they were Black, recording their answer in a covariate \(X_i\).
  • And we found that…
    • 172 of the people we called (28%) were Black with a turnout rate of 0.69
    • 453 of the people we called (72%) were non-Black with a turnout rate of 0.68
  • So we’d estimate the difference to be \(0.69-0.68 \approx 0.01\).
  • After the election, we found that …
    • Turnout among Black registered voters was 0.74
    • Turnout among non-Black registered voters was 0.68.
  • The actual difference was \(0.74-0.68 \approx 0.06\).
  • Depending on what you’re doing, that point estimate of 0.01 may or may not have been accurate enough.
  • It would’ve been nice to have a confidence interval to tell us what kind of precision we could expect.
  • For that, we’ll need to estimate the sampling distribution of this difference.

A Table of Imaginary Polls

\[ \begin{array}{r|rr|rr|r|rr|rrrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.69 & \color[RGB]{7,59,76}0.01 \\ \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.29 & \color[RGB]{239,71,111}0.71 & \color[RGB]{239,71,111}0.72 & \color[RGB]{239,71,111}0.70 & \color[RGB]{239,71,111}-0.01 \\ \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.26 & \color[RGB]{17,138,178}0.70 & \color[RGB]{17,138,178}0.68 & \color[RGB]{17,138,178}0.76 & \color[RGB]{17,138,178}0.08 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0.30 & \color[RGB]{6,214,160}0.71 & \color[RGB]{6,214,160}0.68 & \color[RGB]{6,214,160}0.79 & \color[RGB]{6,214,160}0.12 \\ \end{array} \]

The Sampling Distribution

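# Simulate 10,000 polls of the population to approximate the sampling distribution.
# Here x and y hold the population's covariates and outcomes, and m is its size.
difference.samples = array(dim=10000)
for(rr in 1:10000) {
    I = sample(1:m, n, replace=TRUE)   # who we call in this poll
    X = x[I]
    Y = y[I]
    difference.samples[rr] = mean(Y[X==1]) - mean(Y[X==0])  # Black minus non-Black turnout
}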

  • Here’s what the sampling distribution of this difference in turnout looks like.
  • As before, if we can estimate it we can use that estimate to get a 95% confidence interval.
  • But, unlike before, we don’t really know the parametric form of its sampling distribution.
  • Or—at least—it’d be a pain to work it out. So we’ll use the bootstrap to estimate it.
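Here is a rough sketch of what that looks like in code. The data are a stand-in, not the real poll: they are built only to match the rounded summaries reported above (172 Black respondents with turnout about 0.69, 453 non-Black respondents with turnout about 0.68). The function name `difference` and the choice of 2,000 bootstrap draws are assumptions made for the sketch.

```r
set.seed(1)

# Stand-in data, NOT the real poll: constructed to match the rounded summaries
# in the text. 119/172 is about 0.69 and 308/453 is about 0.68.
X = c(rep(1, 172), rep(0, 453))
Y = c(rep(1, 119), rep(0, 53), rep(1, 308), rep(0, 145))
n = length(Y)

# The estimator: turnout among Black respondents minus turnout among non-Black respondents.
difference = function(X, Y) mean(Y[X == 1]) - mean(Y[X == 0])

# Bootstrap: resample (X, Y) pairs with replacement from the sample and re-estimate.
bootstrap.differences = sapply(1:2000, function(rr) {
  I = sample(1:n, n, replace = TRUE)
  difference(X[I], Y[I])
})

# Arm length: the distance from the bootstrap mean that spans 95% of the draws.
centered = abs(bootstrap.differences - mean(bootstrap.differences))
arm.length = unname(quantile(centered, 0.95))

# Interval estimate: the point estimate plus and minus the arm length.
interval = difference(X, Y) + c(-1, 1) * arm.length
interval
```

Note that whole people are resampled, so each \(X_i\) stays paired with its \(Y_i\) and the number of Black respondents varies from one bootstrap sample to the next, just as it would from poll to poll.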

Making a Table of Bootstrap Polls

[Figure: Our Sample, with people numbered 1 to 625]
[Figure: Three Bootstrap Samples, the first ‘call’: 398, 293, 281]
[Figure: Our Sample, with people numbered 1 to 625]
[Figure: Three Bootstrap Samples, the first and second ‘call’: (398, 129), (293, 526), (281, 520)]
[Figure: Our Sample, with people numbered 1 to 625]
[Figure: Three Bootstrap Samples, the first, second, and last ‘call’: (398, 129, …, 232), (293, 526, …, 578), (281, 520, …, 363)]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.69 & \color[RGB]{7,59,76}0.01 \end{array} } \]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr} \text{`call'} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{`poll'} & X_1^* & Y_1^* & X_2^* & Y_2^* & \dots & X^*_{625} & Y^*_{625} & \overline{X}^* & \overline{Y}^* &\frac{\sum_{i:X_i^*=0} Y_i^*}{\sum_{i:X_i^*=0} 1} & \frac{\sum_{i:X_i^*=1} Y_i^*}{\sum_{i:X_i^*=1} 1} & \text{difference} \\ \hline \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}X_{398} & \color[RGB]{239,71,111}Y_{398} & & & \color[RGB]{239,71,111}\dots & & & & & & & \\ & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & & & \color[RGB]{239,71,111}\dots & & & & & & & \\ \color[RGB]{17,138,178}2 & \color[RGB]{17,138,178}X_{293} & \color[RGB]{17,138,178}Y_{293} & & & \color[RGB]{17,138,178}\dots & & & & & & & \\ & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & & & \color[RGB]{17,138,178}\dots & & & & & & & \\ \color[RGB]{6,214,160}3 & \color[RGB]{6,214,160}X_{281} & \color[RGB]{6,214,160}Y_{281} & & & \color[RGB]{6,214,160}\dots & & & & & & & \\ & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & & & \color[RGB]{6,214,160}\dots & & & & & & & \\ \end{array} } \]

A Completed Table of Bootstrap Polls

[Figure: Our Sample, with people numbered 1 to 625]
[Figure: A Few Bootstrap Samples, each a list of 625 indices drawn with replacement from 1 to 625: (398, 129, …, 232), (293, 526, …, 578), (281, 520, …, 363)]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr}
\text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\
\text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} & \frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\
\hline
\color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.69 & \color[RGB]{7,59,76}0.01
\end{array} } \]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr}
\text{`call'} & 1 & & 2 & & \dots & 625 & & & & & & \\
\text{`poll'} & X_1^* & Y_1^* & X_2^* & Y_2^* & \dots & X^*_{625} & Y^*_{625} & \overline{X}^* & \overline{Y}^* & \frac{\sum_{i:X_i^*=0} Y_i^*}{\sum_{i:X_i^*=0} 1} & \frac{\sum_{i:X_i^*=1} Y_i^*}{\sum_{i:X_i^*=1} 1} & \text{difference} \\
\hline
\color[RGB]{239,71,111}2 & \color[RGB]{239,71,111}X_{398} & \color[RGB]{239,71,111}Y_{398} & \color[RGB]{239,71,111}X_{129} & \color[RGB]{239,71,111}Y_{129} & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}X_{232} & \color[RGB]{239,71,111}Y_{232} & & & & & \\
 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.29 & \color[RGB]{239,71,111}0.68 & \color[RGB]{239,71,111}0.69 & \color[RGB]{239,71,111}0.68 & \color[RGB]{239,71,111}-0.01 \\
\color[RGB]{17,138,178}2 & \color[RGB]{17,138,178}X_{293} & \color[RGB]{17,138,178}Y_{293} & \color[RGB]{17,138,178}X_{526} & \color[RGB]{17,138,178}Y_{526} & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}X_{578} & \color[RGB]{17,138,178}Y_{578} & & & & & \\
 & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.28 & \color[RGB]{17,138,178}0.65 & \color[RGB]{17,138,178}0.64 & \color[RGB]{17,138,178}0.67 & \color[RGB]{17,138,178}0.03 \\
\color[RGB]{6,214,160}1M & \color[RGB]{6,214,160}X_{281} & \color[RGB]{6,214,160}Y_{281} & \color[RGB]{6,214,160}X_{520} & \color[RGB]{6,214,160}Y_{520} & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}X_{363} & \color[RGB]{6,214,160}Y_{363} & & & & & \\
 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0.28 & \color[RGB]{6,214,160}0.68 & \color[RGB]{6,214,160}0.66 & \color[RGB]{6,214,160}0.71 & \color[RGB]{6,214,160}0.05 \\
\end{array} } \]
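
A minimal sketch of how one colored row of the table above could be produced. The poll responses are not reproduced here, so `X` and `Y` are simulated stand-ins, and the helper `summarize.sample` is a hypothetical name introduced only for this illustration.

```r
# Simulated stand-in for the poll (an assumption, not the lecture's data):
# X is a binary group indicator and Y a binary response for each of n calls.
set.seed(1)
n = 625
X = rbinom(n, 1, 0.28)
Y = rbinom(n, 1, 0.68)

# Hypothetical helper: the summary columns on the right-hand side of the table.
summarize.sample = function(X, Y) {
    c(Xbar       = mean(X),
      Ybar       = mean(Y),
      mean.Y.X0  = mean(Y[X == 0]),
      mean.Y.X1  = mean(Y[X == 1]),
      difference = mean(Y[X == 1]) - mean(Y[X == 0]))
}

# One bootstrap 'poll': n 'calls' drawn with replacement from the sample itself.
# I[j] says which original call fills slot j, e.g. the pair (X[I[1]], Y[I[1]]).
I = sample(1:n, n, replace = TRUE)

original.row  = summarize.sample(X, Y)
bootstrap.row = summarize.sample(X[I], Y[I])
print(round(rbind(original = original.row, bootstrap = bootstrap.row), 2))
```

Because the indices in `I` are drawn with replacement, some calls appear several times in the bootstrap 'poll' and others not at all, which is why its summaries differ a little from the original row.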

The Difference’s Bootstrap Sampling Distribution

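# Bootstrap sampling distribution of the difference in subsample means.
# Assumes the sample X, Y and its size n are already defined.
difference.bootstrap.samples = numeric(10000)
for(rr in 1:10000) {
    # Draw n 'calls' with replacement from the sample itself.
    I = sample(1:n, n, replace=TRUE)
    Xstar = X[I]
    Ystar = Y[I]
    # Difference between the mean of Y where X is 1 and where X is 0.
    difference.bootstrap.samples[rr] = mean(Ystar[Xstar==1]) - mean(Ystar[Xstar==0])
}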
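
One way the bootstrap draws might be turned into a calibrated interval, following the review's recipe of drawing arms out from the distribution's mean until they span 95% of the estimates. This is a sketch under stated assumptions: the data are simulated stand-ins, only 2000 bootstrap draws are used to keep it quick, and `arm.length` and `interval` are names introduced here.

```r
# Simulated stand-in for the poll (an assumption, not the lecture's data).
set.seed(1)
n = 625
X = rbinom(n, 1, 0.28)
Y = rbinom(n, 1, 0.68)

# Point estimate: difference in subsample means.
difference = mean(Y[X == 1]) - mean(Y[X == 0])

# Bootstrap sampling distribution of that difference.
B = 2000
difference.bootstrap.samples = numeric(B)
for (rr in 1:B) {
    I = sample(1:n, n, replace = TRUE)
    Xstar = X[I]
    Ystar = Y[I]
    difference.bootstrap.samples[rr] = mean(Ystar[Xstar == 1]) - mean(Ystar[Xstar == 0])
}

# Arm length: how far we have to reach from the bootstrap distribution's mean
# to touch 95% of the bootstrap estimates.
center = mean(difference.bootstrap.samples)
arm.length = as.numeric(quantile(abs(difference.bootstrap.samples - center), 0.95))

# Interval estimate: the same arms attached to our point estimate.
interval = c(lower = difference - arm.length, upper = difference + arm.length)
print(interval)
```

The arms are measured around the bootstrap distribution's mean but attached to the point estimate, mirroring the shift of perspective from the review.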

[Figure: bootstrap sampling distribution of the difference (horizontal axis from -0.1 to 0.2, vertical axis from 0 to 9).]

  • In this case, the bootstrap looks like it works.
  • But we can no longer argue that it should work the way we did before.
    • For that, we took advantage of knowing the parametric form of our estimator’s sampling distribution.
    • And we don’t have that now.
  • We’ll get there, but we’ll need a few new tools that we’ll develop in the coming weeks.
    • Normal approximation: a parametric form for an approximation to our estimator’s sampling distribution.
    • Techniques for variance calculation. These will help us understand the parameters that go into that approximation.
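
As a rough preview of the first tool, the sketch below compares the bootstrap-calibrated arm length with the one a normal approximation would suggest, namely about 1.96 standard deviations of the bootstrap distribution. It assumes simulated stand-in data and 2000 bootstrap draws, and the names `arm.bootstrap` and `arm.normal` are introduced here for illustration. It is not the argument the coming weeks will develop.

```r
# Simulated stand-in for the poll (an assumption, not the lecture's data).
set.seed(1)
n = 625
X = rbinom(n, 1, 0.28)
Y = rbinom(n, 1, 0.68)

# Bootstrap sampling distribution of the difference in subsample means.
B = 2000
boot = replicate(B, {
    I = sample(1:n, n, replace = TRUE)
    mean(Y[I][X[I] == 1]) - mean(Y[I][X[I] == 0])
})

# Arm length read directly off the bootstrap distribution.
arm.bootstrap = as.numeric(quantile(abs(boot - mean(boot)), 0.95))

# Arm length if that distribution were exactly normal with the same
# standard deviation: the 97.5% normal quantile times the standard deviation.
arm.normal = qnorm(0.975) * sd(boot)

print(c(bootstrap = arm.bootstrap, normal = arm.normal))
```

If the two arm lengths come out close, that suggests a normal curve with the right variance describes this distribution well, which is where the variance calculations come in.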