Lab 2

Probability Background: Random Variables

Plan for Today

Review sampling notation and implementation
Introduce Random Variables
Focus on Binary Random Variables
- Where they come from
- Things to know about them
Talk about Multiple Random Variables
- Joint distributions
- Marginal distributions
- Independence

Review

Sampling with Replacement


pop = data.frame(j=1:6,
                 y=c(0,1,1,1,0,1))
m = nrow(pop)

1: We input our population as a data frame pop with two columns, j and y, corresponding to the rows in the table below.
2: We record our population size as m for later use.

$j$	1	2	3	4	5	6
$y$	$\underset{y_1}{0}$	$\underset{y_2}{1}$	$\underset{y_3}{1}$	$\underset{y_4}{1}$	$\underset{y_5}{0}$	$\underset{y_6}{1}$

Plotting
Accessing Rows and Columns
Sampling
… and Forgetting


scales = list(scale_x_continuous(breaks = 1:6),
              scale_y_continuous(breaks = (0:3/3), labels=sprintf("%d/3", 0:3)))

pop.plot = ggplot(pop, aes(x = j, y = y)) +
           geom_point(size=5, shape='circle', alpha=.1) +
           scales
pop.plot

1: Before we plot, we’ll define the ‘scales’ to communicate to ggplot the grid lines (breaks) we want in our plot and how we want them labeled.
2: We create a plot pop.plot that uses pop as its data source and interprets its columns j and y as x and y-coordinates respectively.
3: We add a visualization that plots these points $(j, y_j)$. Adding plot elements is done with +. The arguments we pass to geom_point tell ggplot how to style the points. We ask that they be 5mm circles (size=5, shape='circle') that are fairly transparent (alpha=.1 for roughly 10% opacity). Why those choices? It looked right to me. Styling is a bit of a trial and error process.
4: We add the scales we defined before. This is done with + again.
5: So far, we’ve defined but not displayed the plot. Here we ‘return’ the plot to the R terminal so it gets displayed.


pop$y

1: Saying pop$y gets the y column of pop. That’s a vector.

[1] 0 1 1 1 0 1


pop[1,]

2: Saying pop[1,] gets the first row of pop. That’s still a data frame.

  j y
1 1 0


J = c(1,2,1)
J

3: Saying c(1,2,1) gives us a vector of three numbers: the vector $[1,2,1]$.

[1] 1 2 1


pop[J,]

4: Saying pop[J,], if J is a vector of numbers, stacks the rows of pop[J[1],], pop[J[2],]… into a data frame. It handles repetition in J by repeating rows. This one is pop[1,], pop[2,], and pop[1,] stacked.


pop$y[J]

5: You can do the same thing with a vector, e.g. pop$y. This one is pop$y[1], pop$y[2], and pop$y[1].

[1] 0 1 0


n = 3  
J = sample(1:m, n, replace=TRUE)
sam = pop[J, ]

1: We ask R to sample n numbers from 1 to m with replacement, as if we were rolling an m-sided die n times. It gives us a vector of length n that we will call J. J[1] (in code) or $J_1$ (in math) is the first number in this vector, and so on for 2, 3, …
2: We ask R to give us a data frame sam with $n$ rows: its $i$th row is the row of the population specified by our $i$th die roll: sam[i,]=pop[J[i],] (code) and $Y_i = y_{J_i}$ (math).

$i$	1	2	3
$J_i$	1	4	1
$Y_i$	$\underset{y_{1}}{0}$	$\underset{y_{4}}{1}$	$\underset{y_{1}}{0}$
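To make the relationship $Y_i = y_{J_i}$ concrete, here is a small sketch that draws a sample and checks it in code. The seed value and the names identical.values and repeated are our own choices for illustration, not part of the lab code.

```r
# Population from the lab: six units with binary outcomes.
pop = data.frame(j = 1:6, y = c(0, 1, 1, 1, 0, 1))
m = nrow(pop)
n = 3

set.seed(1)                         # arbitrary seed, only so reruns give the same draw
J = sample(1:m, n, replace = TRUE)  # n rolls of an m-sided die
sam = pop[J, ]

# The i-th sampled outcome is the population outcome at index J[i].
identical.values = all(sam$y == pop$y[J])

# TRUE at each position where a roll repeats an earlier one.
repeated = duplicated(J)
```

If any entry of repeated is TRUE, the same population member shows up more than once in sam. That is exactly what sampling with replacement allows.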


pop.plot + geom_point(aes(x=j, y=y), data=sam,
                      color='blue', size=4,
                      position=position_dodge2(width=.12))

1: Here we add a visualization of our sample on top of our population plot. We have to specify a new data source, data=sam, for this visualization because it would otherwise use the population plot’s data source, pop. We also tell it to use the j column as the x-coordinate so our sample points land on top of the corresponding population points. We usually can’t do this because we don’t know the population, but it can be useful when we can, e.g. in simulated studies.
2: Use 4mm blue dots.
3: Using position=position_dodge2(...) lets us see when we have multiple copies of the same point in our sample. It plots the copies side-by-side instead of on top of each other. The width argument tells ggplot how much space to put between the points.

$i$	1	2	3
$J_i$	1	4	1
$Y_i$	$\underset{y_{1}}{0}$	$\underset{y_{4}}{1}$	$\underset{y_{1}}{0}$

becomes

$i$	1	2	3
$Y_i$	$\underset{\color{lightgray}y_{1}}{0}$	$\underset{\color{lightgray}y_{4}}{1}$	$\underset{\color{lightgray}y_{1}}{0}$


sam$i = 1:n
ggplot(sam) + geom_point(aes(x=i, y=y),
                         color='blue', size=4)  + scales

1: Usually, instead of plotting on top of the population, we just plot the sample as its own thing. We use $i$ as the x-coordinate instead of $J_i$ like we did before. No need to dodge because $i$ isn’t duplicated even if $J_i$ is.

Sampling by Randomization

pop = data.frame(j=1:6,                      
                 y=c(0,1,1,1,0,1))           
m = nrow(pop)

$j$	1	2	3	4	5	6
$y$	$\underset{y_1}{0}$	$\underset{y_2}{1}$	$\underset{y_3}{1}$	$\underset{y_4}{1}$	$\underset{y_5}{0}$	$\underset{y_6}{1}$

Plotting
Logical Indexing
Randomizing
… and Forgetting

scales = list(scale_x_continuous(breaks = 1:6),                                       
              scale_y_continuous(breaks = (0:3/3), labels=sprintf("%d/3", 0:3)))      

pop.plot = ggplot(pop, aes(x = j, y = y)) +                                           
           geom_point(size=5, shape='circle', alpha=.1) +                             
           scales                                                                     
pop.plot


W = c(FALSE,FALSE,TRUE,TRUE,FALSE,TRUE)
pop[W,]

1: We create a vector of ‘logicals’ (TRUE or FALSE) of the same size as our population.
2: We use this vector to index our population. It gives us the 3 rows of pop where W is TRUE. Looking at the population member id $j$ in the output, we see that it’s the 3rd, 4th, and 6th rows. As it should be.


not.quite.W = ifelse(W, 1, 0)
pop[not.quite.W, ]

3: Often, people treat the number 0 as false and the number 1 as true. This code converts our vector of logicals W to a corresponding vector of 0s and 1s.
4: Logical indexing doesn’t work with vectors of 0s and 1s. It only works with logicals. So this code doesn’t give us the same result as the previous one. In fact, it ignores the zeros in not.quite.W and gives us 3 copies of pop[1,] — one for each copy of 1 in not.quite.W.
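Here is a side-by-side sketch of the two behaviors, plus two ways we might get back to logical-style indexing from a vector of 0s and 1s. The variable names by.logical, by.position, by.compare, and by.which are ours, chosen for illustration.

```r
pop = data.frame(j = 1:6, y = c(0, 1, 1, 1, 0, 1))
W = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
not.quite.W = ifelse(W, 1, 0)

# Logical indexing: keeps the rows where W is TRUE (rows 3, 4, 6).
by.logical = pop[W, ]

# Numeric indexing: 0s are dropped and each 1 means "row 1", so we get row 1 three times.
by.position = pop[not.quite.W, ]

# Comparing to 1 turns the numbers back into logicals.
by.compare = pop[not.quite.W == 1, ]

# which() turns them into the positions of the 1s instead.
by.which = pop[which(not.quite.W == 1), ]
```

The last two give the same rows as pop[W, ]. as.logical(not.quite.W) would work as well.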


n = 3
sampling.rate = n/m
not.quite.W = rbinom(m, 1, sampling.rate)
W = as.logical(not.quite.W)
sam = pop[W, ]

1: We calculate our sampling rate, the probability that our coin comes up heads, as $n/m$. This gives us n heads on average when we flip our coin m times, though the count in any one sample varies.
2: We flip our coin m times. This gives us a vector of 0s and 1s that we’ll call not.quite.W.
3: We convert our 0s and 1s to logicals. This gives us a vector of TRUEs and FALSEs that we’ll call W.
4: We use the vector of logicals W to index our population. This gives us the rows of pop where W is TRUE—the rows where our coin came up heads.

$j$	1	2	3	4	5	6
$W_j$	0	0	1	1	0	1
$y_j$	$\underset{y_{1}}{0}$	$\underset{y_{2}}{1}$	$\underset{y_{3}}{1}$	$\underset{y_{4}}{1}$	$\underset{y_{5}}{0}$	$\underset{y_{6}}{1}$

becomes, dropping the rows where we flip tails ($W_j=0$), and counting off the remaining rows $i=1,2,\ldots$,

$i$	1	2	3
$J_i$	3	4	6
$Y_i$	$\underset{y_{3}}{1}$	$\underset{y_{4}}{1}$	$\underset{y_{6}}{1}$
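The table above happens to have exactly 3 rows, but with coin flips the sample size is itself random. The sketch below repeats the m flips many times to show this. The seed, the 1000 repetitions, and the name sample.sizes are our own choices for illustration.

```r
pop = data.frame(j = 1:6, y = c(0, 1, 1, 1, 0, 1))
m = nrow(pop)
n = 3
sampling.rate = n / m

set.seed(2)  # arbitrary seed, only for reproducibility

# Repeat the m coin flips 1000 times and record the number of heads each time.
sample.sizes = replicate(1000, sum(rbinom(m, 1, sampling.rate)))

mean(sample.sizes)   # close to n on average
table(sample.sizes)  # but a single sample can have anywhere from 0 to m rows

# One sample, with rows counted by its actual size rather than by n.
W = as.logical(rbinom(m, 1, sampling.rate))
sam = pop[W, ]
sam$i = seq_len(nrow(sam))
```

Counting rows with seq_len(nrow(sam)) works whatever size the sample turns out to be, including zero.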


pop.plot + geom_point(aes(x=j, y=y), data=sam,
                      color='blue',  size=4)

1: Here we’re visualizing our sample on top of the population again. Because we’re not sampling any population members twice, we don’t need to use position=position_dodge2(...) like we did before.

$i$	1	2	3
$J_i$	3	4	6
$Y_i$	$\underset{y_{3}}{1}$	$\underset{y_{4}}{1}$	$\underset{y_{6}}{1}$

becomes

$i$	1	2	3
$Y_i$	$\underset{\color{lightgray}y_{3}}{1}$	$\underset{\color{lightgray}y_{4}}{1}$	$\underset{\color{lightgray}y_{6}}{1}$


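# Coin-flip sampling gives a random number of rows, so count the rows we actually got.
sam$i = seq_len(nrow(sam))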
ggplot(sam) + geom_point(aes(x=i, y=y),
                         color='blue', size=4)  + scales

Random Variables

It’s a function that assigns a numerical value to each outcome in a sample space of a random process.
It’s a variable whose possible values are numerical outcomes of a random phenomenon.
It’s a measurable function from a set of possible outcomes to a set of real numbers, often representing quantities of interest in probabilistic models.

Q. Which do you like best? Why?

It’s a way of referring to a probability distribution that can be written, in calculations, like a number.

p	X
1/2	0
1/2	1

p	Y
1/6	1
1/6	2
1/6	3
1/6	4
1/6	5
1/6	6

p	Z

The point of all this is that we can say things like …
1. What is the probability that $X^2$ is 1?
2. What is the probability that $X+Y$ is 4, 5, or 6?
3. What is the probability that $X/Y$ is greater than 1/2?
Thinking this way gets very useful when we can start answering questions like this …
- … without actually working out the probability distributions of $X^2$, $X+Y$, or $X/Y$.
- … at least approximately.
We can do this using ‘universality results’ like the Law of Large Numbers and the Central Limit Theorem.
- Coming soon.
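For a baseline, here is the brute-force way to answer the three questions above: list every outcome and add up probabilities. This sketch assumes $X$ and $Y$ are independent, which the tables alone don't tell us. Under that assumption, each $(X, Y)$ pair has probability $1/2 \times 1/6 = 1/12$. The names joint, p1, p2, and p3 are ours.

```r
# Every (X, Y) pair, ASSUMING the coin X and the die Y are independent,
# so each of the 12 pairs has probability (1/2) * (1/6) = 1/12.
joint = expand.grid(X = 0:1, Y = 1:6)
joint$p = (1/2) * (1/6)

# Sum the probabilities of the rows where each event happens.
p1 = sum(joint$p[joint$X^2 == 1])                # P(X^2 = 1)
p2 = sum(joint$p[(joint$X + joint$Y) %in% 4:6])  # P(X + Y is 4, 5, or 6)
p3 = sum(joint$p[joint$X / joint$Y > 1/2])       # P(X / Y > 1/2)
```

Under the independence assumption these come out to 1/2, 1/2, and 1/12. Enumeration like this stops being practical once there are many variables, which is where approximate answers become attractive.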

Binary Random Variables

p	X
1/2	0
1/2	1

p	Y
1/6	1
1/6	2
1/6	3
1/6	4
1/6	5
1/6	6

p	Z

Which of these are binary? Why?

Where Do They Come From?

Real Life Objects
A Small Population
A Large Population

Things to Know About Them

Mean a.k.a. Expected Value

Mean Absolute Deviation

Standard Deviation

Use in ‘If/Else’ Statements

Multiple Random Variables

Joint Distributions

p	$X_1$	$X_2$
1/4	0	0
1/4	0	1
1/4	1	0
1/4	1	1

p	$X_1$	$X_2$
1/3	0	0
1/3	1	0
1/3	1	1

p	$X_1$	$X_2$

p	$X_1$	$X_2$

Marginal Distributions
‘Off-Axis’ Marginals
Independence
Want R?

In each of our four tables …

what is the marginal distribution of $X_1$?
what about $X_2$?

Tip

If you want the probability that $X_1$ is $x$, sum the probabilities of all rows where $X_1$ is $x$.
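If you'd like to see the tip in R, here is a sketch using the second table (the one with three rows of probability 1/3). The data frame layout and the column names p, X1, and X2 are our own encoding of that table. aggregate is base R's group-and-summarize function.

```r
# The second joint distribution table, one row per outcome.
joint = data.frame(p  = c(1/3, 1/3, 1/3),
                   X1 = c(0, 1, 1),
                   X2 = c(0, 0, 1))

# For each value of X1, sum the probabilities of the rows with that value.
marginal.X1 = aggregate(p ~ X1, data = joint, FUN = sum)

# Same thing for X2.
marginal.X2 = aggregate(p ~ X2, data = joint, FUN = sum)
```

Each result is a small table with one row per value and its probability, in the same format as the single-variable tables earlier.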

In each of our four tables …

what is the marginal distribution of $X_1 + X_2$?
what about $X_1 \times X_2$?

Tip

If you want the probability that some function of $X_1$ and $X_2$ (e.g. $X_1 + X_2$ ) is $x$, add it as a column to the table. This doesn’t change probabilities because its value is determined by the values of the other columns. Then sum the probabilities of all rows where that function is $x$.
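The same recipe in R, sketched on the first table (four rows of probability 1/4). The column names total and product are ours, and the data frame is our own encoding of the table.

```r
# The first joint distribution table, one row per outcome.
joint = data.frame(p  = rep(1/4, 4),
                   X1 = c(0, 0, 1, 1),
                   X2 = c(0, 1, 0, 1))

# Add the functions of (X1, X2) as new columns. The probabilities don't change.
joint$total   = joint$X1 + joint$X2
joint$product = joint$X1 * joint$X2

# Sum the probabilities of the rows sharing each value of the new column.
marginal.total   = aggregate(p ~ total,   data = joint, FUN = sum)
marginal.product = aggregate(p ~ product, data = joint, FUN = sum)
```

For this table the sum takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4, and the product takes the values 0, 1 with probabilities 3/4, 1/4.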

In each of our four tables …

are $X_1$ and $X_2$ independent?

Tip

A pair of random variables $X_1$ and $X_2$ are independent if their joint distribution is the product of their marginal distributions. That is, if $P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1)P(X_2 = x_2)$ for all $x_1$ and $x_2$.
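One way we might check this definition in R is sketched below. The function is.independent is a hypothetical helper we wrote for illustration, not something built into R, and it assumes the table is a data frame with columns p, X1, and X2. It has to look at every combination of values, including ones that have probability zero and so don't appear as rows.

```r
is.independent = function(joint) {
  # All combinations of the values X1 and X2 take, even if missing from the table.
  grid = expand.grid(X1 = unique(joint$X1), X2 = unique(joint$X2))
  for (k in seq_len(nrow(grid))) {
    x1 = grid$X1[k]
    x2 = grid$X2[k]
    # Joint probability: rows matching both values (0 if there are none).
    p.joint = sum(joint$p[joint$X1 == x1 & joint$X2 == x2])
    # Product of the two marginal probabilities.
    p.product = sum(joint$p[joint$X1 == x1]) * sum(joint$p[joint$X2 == x2])
    if (!isTRUE(all.equal(p.joint, p.product))) return(FALSE)
  }
  TRUE
}

table.1 = data.frame(p = rep(1/4, 4), X1 = c(0, 0, 1, 1), X2 = c(0, 1, 0, 1))
table.2 = data.frame(p = rep(1/3, 3), X1 = c(0, 1, 1),    X2 = c(0, 0, 1))

independent.1 = is.independent(table.1)
independent.2 = is.independent(table.2)
```

In the second table, the pair $(0, 1)$ has joint probability 0 but the product of its marginals is $1/3 \times 1/3 = 1/9$, so the check fails there.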

Discrete Random Variables

p	Y
1/6	1
1/6	2
1/6	3
1/6	4
1/6	5
1/6	6

How would you calculate the joint distribution of two dice rolls?
- How would you do it by hand?
- Could you implement it in R?
What about the marginal distribution of their sum?
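If you get stuck on the R part, here is one possible sketch. It assumes the two rolls are independent, so each of the 36 pairs has probability $1/6 \times 1/6$. The names dice, Y1, Y2, S, and marginal.S are ours.

```r
# Joint distribution: one row per pair of rolls, ASSUMING the rolls are independent.
dice = expand.grid(Y1 = 1:6, Y2 = 1:6)
dice$p = (1/6) * (1/6)

# Add the sum as a column, then sum probabilities within each value of it.
dice$S = dice$Y1 + dice$Y2
marginal.S = aggregate(p ~ S, data = dice, FUN = sum)
```

The marginal for the sum has 11 rows, one for each value from 2 to 12, with probabilities rising from 1/36 up to 6/36 at 7 and back down to 1/36.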

Continuous-Valued Random Variables (Optional)
