Probability Background: Random Variables
1. We input our population as a data frame pop with two columns, j and y, corresponding to the rows in the table below.
2. We record our population size as m for later use.
\(j\) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
\(y\) | \(\underset{y_1}{0}\) | \(\underset{y_2}{1}\) | \(\underset{y_3}{1}\) | \(\underset{y_4}{1}\) | \(\underset{y_5}{0}\) | \(\underset{y_6}{1}\) |
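The code for these two steps is not reproduced here. A minimal sketch of what it might look like, assuming the values are typed in directly from the table above (the names pop, j, y, and m come from the text; the construction itself is an assumption):

```r
# Assumed construction: one row per population member, values copied from the table.
pop = data.frame(j = 1:6,
                 y = c(0, 1, 1, 1, 0, 1))

# Population size, recorded for later use.
m = nrow(pop)
```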
library(ggplot2)

scales = list(scale_x_continuous(breaks = 1:6),
              scale_y_continuous(breaks = (0:3)/3, labels = sprintf("%d/3", 0:3)))
pop.plot = ggplot(pop, aes(x = j, y = y)) +
  geom_point(size = 5, shape = 'circle', alpha = .1) +
  scales
pop.plot
1. We describe the tick marks (breaks) we want in our plot and how we want them labeled.
2. We create a plot pop.plot that uses pop as its data source and interprets its columns j and y as x and y-coordinates respectively.
3. We add a layer of points to the plot using +. The arguments we pass to geom_point tell ggplot how to style the points. We ask that they be 5mm circles (size=5, shape='circle') that are fairly transparent (alpha=.1 for roughly 10% opacity). Why those choices? It looked right to me. Styling is a bit of a trial and error process.
4. We add our scales to the plot using + again.
c(1,2,1) gives us a vector of three numbers: the vector \([1,2,1]\).
[1] 1 2 1
pop[J,], if J is a vector of numbers, stacks the rows pop[J[1],], pop[J[2],], … into a data frame. It handles repetition in J by repeating rows. This one is pop[1,], pop[2,], and pop[1,] stacked.
    j y
1   1 0
2   2 1
1.1 1 0
1. We sample n numbers from 1 to m with replacement, as if we were rolling an m-sided die n times. This gives us a vector of length n that we will call J. J[1] (in code) or \(J_1\) (in math) is the first number in this vector, and so on for 2, 3, …
2. We build a data frame sam with \(n\) rows: its \(i\)th row is the row of the population specified by our \(i\)th dice roll: sam[i,]=pop[J[i],] (code) and \(Y_i = y_{J_i}\) (math).
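The two steps above can be sketched as follows. This is a guess at the missing code, not a copy of it: it assumes base R's sample() does the dice rolling, rebuilds pop from the population table so that it runs on its own, and fixes a seed purely for reproducibility.

```r
# Assumed setup: the population from the table, rebuilt so this runs on its own.
pop = data.frame(j = 1:6, y = c(0, 1, 1, 1, 0, 1))
m = nrow(pop)
n = 3

set.seed(1)  # assumption: a fixed seed, only so reruns give the same rolls

# Roll an m-sided die n times: n draws from 1..m with replacement.
J = sample(1:m, n, replace = TRUE)

# Row i of the sample is row J[i] of the population.
sam = pop[J, ]
sam
```

Because we sample with replacement, the same population member can show up more than once in sam, as member 1 does in the table below.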
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(J_i\) | 1 | 4 | 1 |
\(Y_i\) | \(\underset{y_{1}}{0}\) | \(\underset{y_{4}}{1}\) | \(\underset{y_{1}}{0}\) |
pop.plot + geom_point(aes(x = j, y = y), data = sam,
                      color = 'blue', size = 4,
                      position = position_dodge2(width = .12))
1. We have to specify data=sam for this layer because ggplot would otherwise think we were using the population plot’s data source pop. We also say that we want to use the j column as the x-coordinate, so we plot our sample points on top of the corresponding population points. We usually can’t do this because we don’t know the population, but it can be useful when we can, e.g. in simulated studies.
2. position=position_dodge2(...) lets us see when we have multiple copies of the same point in our sample. It plots the copies side-by-side instead of on top of each other. The width argument tells ggplot how much space to put between the points.
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(J_i\) | 1 | 4 | 1 |
\(Y_i\) | \(\underset{y_{1}}{0}\) | \(\underset{y_{4}}{1}\) | \(\underset{y_{1}}{0}\) |
becomes
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(Y_i\) | \(\underset{\color{lightgray}y_{1}}{0}\) | \(\underset{\color{lightgray}y_{4}}{1}\) | \(\underset{\color{lightgray}y_{1}}{0}\) |
\(j\) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
\(y\) | \(\underset{y_1}{0}\) | \(\underset{y_2}{1}\) | \(\underset{y_3}{1}\) | \(\underset{y_4}{1}\) | \(\underset{y_5}{0}\) | \(\underset{y_6}{1}\) |
Indexing pop with a vector of logicals W, as in pop[W,], gives us the rows of pop where W is TRUE. Looking at the population member id \(j\) in the output, we see that it’s the 3rd, 4th, and 6th rows. As it should be.
  j y
3 3 1
4 4 1
6 6 1
We can think of the number 0 as false and the number 1 as true. This code converts our vector of logicals W to a corresponding vector of 0s and 1s.
But this kind of indexing doesn’t work with 0s and 1s. It only works with logicals. So this code doesn’t give us the same result as the previous one. In fact, it ignores the zeros in not.quite.W and gives us 3 copies of pop[1,], one for each copy of 1 in not.quite.W.
    j y
1   1 0
1.1 1 0
1.2 1 0
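To see the two kinds of indexing side by side, here is a small sketch. It assumes W is typed in by hand to match the rows selected above, and that as.numeric() is what converts logicals to 0s and 1s; the names by.logical and by.number are made up for this illustration.

```r
# Assumed setup: the population from the table and a hand-typed logical vector.
pop = data.frame(j = 1:6, y = c(0, 1, 1, 1, 0, 1))
W = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

# Convert the logicals to 0s and 1s.
not.quite.W = as.numeric(W)

# Logical indexing: keeps the rows where W is TRUE.
by.logical = pop[W, ]

# Numeric indexing: 0s are dropped and each 1 means "row 1".
by.number = pop[not.quite.W, ]

by.logical
by.number
```

Converting back with as.logical(not.quite.W) recovers W, so pop[as.logical(not.quite.W), ] gives the same rows as pop[W, ].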
n = 3
sampling.rate = n/m
not.quite.W = rbinom(m, 1, sampling.rate)
W = as.logical(not.quite.W)
sam = pop[W, ]
1. We choose a sampling rate that gives us, on average, n heads when we flip our coin m times.
2. We flip a coin with that probability of heads m times. This gives us a vector of 0s and 1s that we’ll call not.quite.W.
3. We convert our 0s and 1s to logicals. This gives us a vector of TRUEs and FALSEs that we’ll call W.
4. We use W to index our population. This gives us the rows of pop where W is TRUE: the rows where our coin came up heads.
\(j\) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
\(W_j\) | 0 | 0 | 1 | 1 | 0 | 1 |
\(y_j\) | \(\underset{y_{1}}{0}\) | \(\underset{y_{2}}{1}\) | \(\underset{y_{3}}{1}\) | \(\underset{y_{4}}{1}\) | \(\underset{y_{5}}{0}\) | \(\underset{y_{6}}{1}\) |
becomes, dropping the rows where we flip tails (\(W_j=0\)) and counting off the remaining rows \(i=1,2,\ldots\),
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(J_i\) | 3 | 4 | 6 |
\(Y_i\) | \(\underset{y_{3}}{1}\) | \(\underset{y_{4}}{1}\) | \(\underset{y_{6}}{1}\) |
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(J_i\) | 3 | 4 | 6 |
\(Y_i\) | \(\underset{y_{3}}{1}\) | \(\underset{y_{4}}{1}\) | \(\underset{y_{6}}{1}\) |
becomes
\(i\) | 1 | 2 | 3 |
---|---|---|---|
\(Y_i\) | \(\underset{\color{lightgray}y_{3}}{1}\) | \(\underset{\color{lightgray}y_{4}}{1}\) | \(\underset{\color{lightgray}y_{6}}{1}\) |
It’s a function that assigns a numerical value to each outcome in a sample space of a random process.
It’s a variable whose possible values are numerical outcomes of a random phenomenon.
It’s a measurable function from a set of possible outcomes to a set of real numbers, often representing quantities of interest in probabilistic models.
Q. Which do you like best? Why?
p | X |
---|---|
1/2 | 0 |
1/2 | 1 |
p | Y |
---|---|
1/6 | 1 |
1/6 | 2 |
1/6 | 3 |
1/6 | 4 |
1/6 | 5 |
1/6 | 6 |
p | Z |
---|---|
The point of all this is that we can say things like …
Thinking this way gets very useful when we can start answering questions like this …
We can do this using ‘universality results’ like the Law of Large Numbers and the Central Limit Theorem.
p | X |
---|---|
1/2 | 0 |
1/2 | 1 |
p | Y |
---|---|
1/6 | 1 |
1/6 | 2 |
1/6 | 3 |
1/6 | 4 |
1/6 | 5 |
1/6 | 6 |
p | Z |
---|---|
Which of these are binary? Why?
p | \(X_1\) | \(X_2\) |
---|---|---|
1/4 | 0 | 0 |
1/4 | 0 | 1 |
1/4 | 1 | 0 |
1/4 | 1 | 1 |
p | \(X_1\) | \(X_2\) |
---|---|---|
1/3 | 0 | 0 |
1/3 | 1 | 0 |
1/3 | 1 | 1 |
p | \(X_1\) | \(X_2\) |
---|---|---|
p | \(X_1\) | \(X_2\) |
---|---|---|
In each of our four tables …
Tip
If you want the probability that \(X_1\) is \(x\), sum the probabilities of all rows where \(X_1\) is \(x\).
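As a worked instance of this tip, here is a sketch using the second joint table above (three rows with probability 1/3 each). The data frame joint and its column names are made up for this illustration.

```r
# The second joint table: each row has probability 1/3.
joint = data.frame(p  = c(1/3, 1/3, 1/3),
                   X1 = c(0, 1, 1),
                   X2 = c(0, 0, 1))

# P(X1 = 1): sum the probabilities of the rows where X1 is 1.
p.X1.is.1 = sum(joint$p[joint$X1 == 1])

# The whole marginal distribution of X1: one sum per value of X1.
marginal.X1 = tapply(joint$p, joint$X1, sum)

p.X1.is.1
marginal.X1
```

Two of the three rows have \(X_1 = 1\), so \(P(X_1 = 1) = 1/3 + 1/3 = 2/3\).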
In each of our four tables …
Tip
If you want the probability that some function of \(X_1\) and \(X_2\) (e.g. \(X_1 + X_2\) ) is \(x\), add it as a column to the table. This doesn’t change probabilities because its value is determined by the values of the other columns. Then sum the probabilities of all rows where that function is \(x\).
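Here is a sketch of this tip for the sum \(X_1 + X_2\), using the first joint table above (four rows with probability 1/4 each). The data frame joint and the column name S are made up for this illustration.

```r
# The first joint table: each row has probability 1/4.
joint = data.frame(p  = rep(1/4, 4),
                   X1 = c(0, 0, 1, 1),
                   X2 = c(0, 1, 0, 1))

# Add the function of interest as a new column. The probabilities don't change.
joint$S = joint$X1 + joint$X2

# P(X1 + X2 = 1): sum the probabilities of the rows where the new column is 1.
p.S.is.1 = sum(joint$p[joint$S == 1])

joint
p.S.is.1
```

The rows \((0,1)\) and \((1,0)\) both have sum 1, so \(P(X_1 + X_2 = 1) = 1/4 + 1/4 = 1/2\).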
In each of our four tables …
- Are \(X_1\) and \(X_2\) independent?
Tip
A pair of random variables \(X_1\) and \(X_2\) are independent if their joint distribution is the product of their marginal distributions. That is, if \(P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1)P(X_2 = x_2)\) for all \(x_1\) and \(x_2\).
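One way to check this condition mechanically is sketched below. The helper is.independent is hypothetical, written for this illustration. It assumes a joint table stored as a data frame with columns p, X1, and X2, and it treats any combination of values that has no row as having probability zero.

```r
# Hypothetical helper: does the joint probability equal the product of the
# marginals for every pair of values (x1, x2)?
is.independent = function(joint) {
  for (x1 in unique(joint$X1)) {
    for (x2 in unique(joint$X2)) {
      # Joint probability: rows matching both values (sums to 0 if there are none).
      p.joint = sum(joint$p[joint$X1 == x1 & joint$X2 == x2])
      # Product of the two marginal probabilities.
      p.product = sum(joint$p[joint$X1 == x1]) * sum(joint$p[joint$X2 == x2])
      if (!isTRUE(all.equal(p.joint, p.product))) return(FALSE)
    }
  }
  TRUE
}

# The first two joint tables above.
table.1 = data.frame(p = rep(1/4, 4), X1 = c(0, 0, 1, 1), X2 = c(0, 1, 0, 1))
table.2 = data.frame(p = rep(1/3, 3), X1 = c(0, 1, 1),    X2 = c(0, 0, 1))

is.independent(table.1)
is.independent(table.2)
```

The check uses all.equal rather than == because sums of fractions like 1/3 are only approximately equal in floating point.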
p | Y |
---|---|
1/6 | 1 |
1/6 | 2 |
1/6 | 3 |
1/6 | 4 |
1/6 | 5 |
1/6 | 6 |