Lab 6

Least Squares in R

Setup
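
A minimal sketch of a setup cell, assuming the lab's univariate sample lives in a data frame unisam with columns x and y. The simulated values are placeholders; only these names are used below.

    set.seed(1)
    # Simulated univariate sample. x takes whole-number values so that
    # 'columns' (groups of observations sharing an x value) make sense.
    unisam = data.frame(x = sample(1:20, 100, replace = TRUE))
    unisam$y = 2 + 0.5 * unisam$x + rnorm(100)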

Least Squares in Linear Models

  • The estimation targets we’ve talked about are all summaries of column means in the population.

\[ \begin{aligned} \mu(w,x) &= \frac{1}{m_{w,x}}\sum_{j:w_j=w,x_j=x} y_j \qfor m_{w,x} = \sum_{j: w_j=w, x_j=x} 1 \\ &= \mathop{\mathrm{E}}[Y_i \mid W_i=w, X_i=x] \qfor (W_i,X_i,Y_i) \qqtext{sampled uniformly-at-random} \end{aligned} \]

  • Least Squares Regression is a method for choosing a function to estimate these column means.
    • When we choose from the set of all functions of \((w,x)\)
    • … we get the mean in the corresponding column of the sample.

\[ \begin{aligned} \hat\mu(w,x) &= \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(W_i,X_i)}^2 \\ &= \frac{1}{N_{w,x}}\sum_{i:W_i=w,X_i=x} Y_i \qqtext{ when } \mathcal{M}= \{ \text{all functions of } (w,x) \} \end{aligned} \]
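
To make this concrete, here is a small sketch that computes these column means directly. The data frame sam and its simulated values are placeholders for illustration.

    set.seed(1)
    sam = data.frame(w = rbinom(500, 1, 0.5), x = sample(1:3, 500, replace = TRUE))
    sam$y = sam$w + sam$x + rnorm(500)

    # The least squares fit over all functions of (w, x) is the sample mean
    # of y within each (w, x) column.
    aggregate(y ~ w + x, data = sam, FUN = mean)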

  • The set of functions we choose from is our model.
    • Or regression model if we want to be verbose.
    • And there are, of course, many other models.
  • When we don’t have a lot of data in some columns of the sample …
    • Column means are really sensitive to who we happened to sample.
    • And we’re probably better off using a smaller model, i.e., one that includes fewer functions.
  • The models we’ll focus on in this class are linear models.
    • They’re closed under addition, subtraction, and scalar multiplication.
    • That means when you add, subtract, or scale any functions in the model …
    • … you get another function in the model.
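
For example, the sum of any two lines is another line, and similarly for differences and scalar multiples.

\[ m_1(x) = a_1 + b_1 x \qqtext{and} m_2(x) = a_2 + b_2 x \implies m_1(x) + m_2(x) = (a_1 + a_2) + (b_1 + b_2) x \]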

Least Squares with Univariate Data

Models

\[ \begin{aligned} \mathcal{M}&= \{ \text{all functions } \ m(x) \} && \text{all functions} \\ \mathcal{M}&= \{ \text{all functions } \ m(c(x)) \} && \text{all functions of $c(x)$, a coarsened version of $x$} \\ \mathcal{M}&= \{ m(x) = a + bx \ : \ a,b \in \mathbb{R} \} && \text{all lines} \end{aligned} \]
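
Here is the fit the rest of this section refers to: a minimal version, assuming the unisam data frame from Setup.

    # Least squares over the model of all lines a + b x.
    fitted.model = lm(y ~ 1 + x, data = unisam)
    fitted.model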

  • We use R’s built-in function lm¹ to do least squares with linear models.
  • To tell it which model to use, we use what R calls a ‘formula’.
    • What a formula does is describe a basis for the model.
    • i.e. a set of functions that we can take linear combinations of to get any function in the model. \[ \begin{aligned} \mathcal{M}&= \{ m(x) = a_0 b_0(x) + a_1 b_1(x) + \ldots + a_k b_k(x) \ \ : \ a_0 \ldots a_k \in \mathbb{R} \} \\ & \qif b_0(x),b_1(x),\ldots,b_k(x) \ \text{ is a basis for } \ \mathcal{M}. \end{aligned} \]
  • Where it says lm(y~1+x, data=unisam) above, y~1+x is the formula and unisam is the data.
    • It’s saying that we want to choose a function \(m\) from our model for which \(m(X_i) \approx Y_i\)
    • … where \((X_i,Y_i)\) are unisam$x[i] and unisam$y[i] for \(i=1 \ldots n\).
    • And our model is the set of all linear combinations of the functions \(b_0(x) = 1\) and \(b_1(x) = x\). \[ \hat \mu = \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(X_i)}^2 \qfor \mathcal{M}= \{m(x) = a_0 \ 1 + a_1 \ x \ \ : \ \ a_0,a_1 \in \mathbb{R} \} \]
  • What lm gives you is not really the function \(\hat\mu\).
    • It’s something else. But whatever it is, you can calculate \(\hat\mu(x)\) by calling predict.
    • The sketch just after this list uses predict to calculate \(\hat\mu(8)\), \(\hat\mu(10)\), and \(\hat\mu(12)\).
  • I like to write a function muhat that calls predict on its argument.
    • This makes the code shorter and more like the mathematical notation.
  • If you’re familiar with object oriented programming …
    • lm returns an object we call fitted.model.
    • predict is one of its methods.
  • Method calls in R look a little different than they do in Python, C++, or Java.
    • But they work pretty much the same way. Almost.
    • R, like Common Lisp and Julia, has an object system based on multimethods.
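
A sketch of the predict and muhat code described above, using fitted.model from the fit in the previous section.

    # Evaluate the fit at x = 8, 10, and 12. predict wants the new observations
    # in a data frame whose columns match the variables named in the formula.
    predict(fitted.model, newdata = data.frame(x = c(8, 10, 12)))

    # A wrapper so the code reads like the notation muhat(x).
    muhat = function(x) {
      predict(fitted.model, newdata = data.frame(x = x))
    }
    muhat(c(8, 10, 12))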

Formulas in Abstract
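
A minimal version of the cell this section discusses: hand-rolled basis functions and a formula built from them, fit to the unisam data.

    # Hand-rolled basis functions. b.0 returns a vector of ones the same
    # length as x so it behaves well inside a formula.
    b.0 = function(x) { 0 * x + 1 }
    b.1 = function(x) { x }
    b.2 = function(x) { x^2 }

    fitted.model = lm(y ~ b.0(x) + b.1(x), data = unisam)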

  • You can use any R functions you like in a formula.
    • Above, I’ve implemented my own functions b.0 and b.1 to use as my basis.
    • You can also use R’s built-in functions. Try replacing b.1 in the formula with sin.
  • You can use as many basis functions as you like.
    • Try adding b.2(x) to the formula y ~ b.0(x) + b.1(x) above.
  • You don’t really have to define all these functions.
    • \(1\) and \(x\) will do in place of b.0(x) and b.1(x).
  • But x^2 won’t do in place of b.2(x).
    • That’s because ^ means something special in formulas: it crosses terms rather than squaring numbers, so y ~ x^2 fits the same model as y ~ x.
    • But you can work around this by writing I(x^2) instead.
    • In short, wrapping an expression in I says ‘interpret this the normal way.’
  • Try writing out a formula that does the same thing as y ~ b.0(x) + b.1(x) + b.2(x)
    • … but without using b.0, b.1, or b.2.
  • R implicitly adds the constant function \(b_0(x)=1\) to every basis you give it.
    • Go ahead and get rid of b.0(x) in the formula y~b.0(x)+b.1(x) above.
    • Does anything change in the plot?
  • If you want to stop it from doing this, you can write y ~ 0 + b.1(x) instead.
    • What changes in the plot? Why?

The All-Functions Model
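
A sketch of this model in R, assuming x is discrete as in the unisam sample: factor(x) gives one indicator basis function per distinct value of x, so the fit’s coefficients are the column means. The name all.functions.fit is just for this sketch.

    # Writing 0 + drops R's implicit intercept, so each coefficient is the
    # sample mean of y in the corresponding column of x.
    all.functions.fit = lm(y ~ 0 + factor(x), data = unisam)
    all.functions.fit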

Piecewise-Constant Models
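
A sketch using a coarsened version of x, as in the models list earlier. Here the coarsening rounds x down to the nearest multiple of 5; it’s named coarsen rather than c because c is R’s concatenation function.

    # All functions of the coarsened x form a piecewise-constant model.
    coarsen = function(x) { floor(x / 5) * 5 }
    piecewise.fit = lm(y ~ 0 + factor(coarsen(x)), data = unisam)
    piecewise.fit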

Least Squares with Multivariate Data
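
A sketch with a bivariate sample; the data frame multisam and its simulated values are placeholders.

    set.seed(1)
    multisam = data.frame(w = rbinom(100, 1, 0.5),
                          x = sample(1:5, 100, replace = TRUE))
    multisam$y = multisam$w + multisam$x + rnorm(100)

    # The all-functions model of (w, x): one coefficient per column,
    # equal to that column's sample mean of y.
    lm(y ~ 0 + factor(w):factor(x), data = multisam)

    # A smaller model: linear and additive in w and x.
    lm(y ~ w + x, data = multisam)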

Footnotes

  1. ‘lm’ stands for ‘linear model’.