Lab 6

Least Squares in R

Setup
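
A minimal sketch of a setup cell, assuming the lab's univariate sample lives in a data frame unisam with columns x and y. The simulated values are placeholders; only these names are used below.

    set.seed(1)
    # Simulated univariate sample. x takes whole-number values so that
    # 'columns' (groups of observations sharing an x value) make sense.
    unisam = data.frame(x = sample(1:20, 100, replace = TRUE))
    unisam$y = 2 + 0.5 * unisam$x + rnorm(100)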

Least Squares in Linear Models

  • The estimation targets we’ve talked about are all summaries of column means in the population.

\[ \begin{aligned} \mu(w,x) &= \frac{1}{m_{w,x}}\sum_{j:w_j=w,x_j=x} y_j \qfor m_{w,x} = \sum_{j: w_j=w, x_j=x} 1 \\ &= \mathop{\mathrm{E}}[Y_i \mid W_i=w, X_i=x] \qfor (W_i,X_i,Y_i) \qqtext{sampled uniformly-at-random} \end{aligned} \]

  • Least Squares Regression is a method for choosing a function to estimate these column means.
    • When we choose from the set of all functions of \((w,x)\)
    • … we get the mean in the corresponding column of the sample.

\[ \begin{aligned} \hat\mu(w,x) &= \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(W_i,X_i)}^2 \\ &= \frac{1}{N_{w,x}}\sum_{i:W_i=w,X_i=x} Y_i \qqtext{ when } \mathcal{M}= \{ \text{all functions of } (w,x) \} \end{aligned} \]
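
To make this concrete, here is a small sketch that computes these column means directly. The data frame sam and its simulated values are placeholders for illustration.

    set.seed(1)
    sam = data.frame(w = rbinom(500, 1, 0.5), x = sample(1:3, 500, replace = TRUE))
    sam$y = sam$w + sam$x + rnorm(500)

    # The least squares fit over all functions of (w, x) is the sample mean
    # of y within each (w, x) column.
    aggregate(y ~ w + x, data = sam, FUN = mean)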

  • The set of functions we choose from is our model.
    • Or regression model if we want to be verbose.
    • And there are, of course, many other models.
  • When we don’t have a lot of data in some columns of the sample …
    • Column means are really sensitive to who we happened to sample.
    • And we’re probably better off using a smaller model, i.e., one that includes fewer functions.
  • The models we’ll focus on in this class are linear models.
    • They’re closed under addition, subtraction, and scalar multiplication.
    • That means when you add, subtract, or scale any functions in the model …
    • … you get another function in the model.
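
For example, the sum of any two lines is another line, and similarly for differences and scalar multiples.

\[ m_1(x) = a_1 + b_1 x \qqtext{and} m_2(x) = a_2 + b_2 x \implies m_1(x) + m_2(x) = (a_1 + a_2) + (b_1 + b_2) x \]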

Least Squares with Univariate Data

Models

\[ \begin{aligned} \mathcal{M}&= \{ \text{all functions } \ m(x) \} && \text{all functions} \\ \mathcal{M}&= \{ \text{all functions } \ m(c(x)) \} && \text{all functions of $c(x)$, a coarsened version of $x$} \\ \mathcal{M}&= \{ m(x) = a + bx \ : \ a,b \in \mathbb{R} \} && \text{all lines} \end{aligned} \]
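
Here is the fit the rest of this section refers to: a minimal version, assuming the unisam data frame from Setup.

    # Least squares over the model of all lines a + b x.
    fitted.model = lm(y ~ 1 + x, data = unisam)
    fitted.model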

  • We use R’s built-in function lm¹ to do least squares with linear models.
  • To tell it which model to use, we use what R calls a ‘formula’.
    • What a formula does is describe a basis for the model.
    • i.e. a set of functions that we can take linear combinations of to get any function in the model. \[ \begin{aligned} \mathcal{M}&= \{ m(x) = a_0 b_0(x) + a_1 b_1(x) + \ldots + a_k b_k(x) \ \ : \ a_0 \ldots a_k \in \mathbb{R} \} \\ & \qif b_0(x),b_1(x),\ldots,b_k(x) \ \text{ is a basis for } \ \mathcal{M}. \end{aligned} \]
  • Where it says lm(y~1+x, data=unisam) above, y~1+x is the formula and unisam is the data.
    • It’s saying that we want to choose a function \(m\) from our model for which \(m(X_i) \approx Y_i\)
    • … where \((X_i,Y_i)\) are unisam$x[i] and unisam$y[i] for \(i=1 \ldots n\).
    • And our model is the set of all linear combinations of the functions \(b_0(x) = 1\) and \(b_1(x) = x\). \[ \hat \mu = \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(X_i)}^2 \qfor \mathcal{M}= \{m(x) = a_0 \ 1 + a_1 \ x \ \ : \ \ a_0,a_1 \in \mathbb{R} \} \]
  • What lm gives you is not really the function \(\hat\mu\).
    • It’s something else. But whatever it is, you can calculate \(\hat\mu(x)\) by calling predict.
    • The sketch just after this list uses predict to calculate \(\hat\mu(8)\), \(\hat\mu(10)\), and \(\hat\mu(12)\).
  • I like to write a function muhat that calls predict on its argument.
    • This makes the code shorter and more like the mathematical notation.
  • If you’re familiar with object oriented programming …
    • lm returns an object we call fitted.model.
    • predict is one of its methods.
  • Method calls in R look a little different than they do in Python, C++, or Java.
    • But they work pretty much the same way. Almost.
    • R, like Common Lisp and Julia, has an object system based on multimethods.
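
A sketch of the predict and muhat code described above, using fitted.model from the fit in the previous section.

    # Evaluate the fit at x = 8, 10, and 12. predict wants the new observations
    # in a data frame whose columns match the variables named in the formula.
    predict(fitted.model, newdata = data.frame(x = c(8, 10, 12)))

    # A wrapper so the code reads like the notation muhat(x).
    muhat = function(x) {
      predict(fitted.model, newdata = data.frame(x = x))
    }
    muhat(c(8, 10, 12))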

Formulas in Abstract
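
A minimal version of the cell this section discusses: hand-rolled basis functions and a formula built from them, fit to the unisam data.

    # Hand-rolled basis functions. b.0 returns a vector of ones the same
    # length as x so it behaves well inside a formula.
    b.0 = function(x) { 0 * x + 1 }
    b.1 = function(x) { x }
    b.2 = function(x) { x^2 }

    fitted.model = lm(y ~ b.0(x) + b.1(x), data = unisam)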

  • You can use any R functions you like in a formula.
    • Above, I’ve implemented my own functions b.0 and b.1 to use as my basis.
    • You can also use R’s built-in functions. Try replacing b.1 in the formula with sin.
  • You can use as many basis functions as you like.
    • Try adding b.2(x) to the formula y ~ b.0(x) + b.1(x) above.
  • You don’t really have to define all these functions.
    • \(1\) and \(x\) will do in place of b.0(x) and b.1(x).
  • But x^2 won’t do in place of b.2(x).
    • That’s because ^ means something special in formulas: it crosses terms rather than squaring numbers, so y ~ x^2 fits the same model as y ~ x.
    • But you can work around this by writing I(x^2) instead.
    • In short, wrapping an expression in I says ‘interpret this the normal way.’
  • Try writing out a formula that does the same thing as y ~ b.0(x) + b.1(x) + b.2(x)
    • … but without using b.0, b.1, or b.2.
  • R implicitly adds the constant function \(b_0(x)=1\) to every basis you give it.
    • Go ahead and get rid of b.0(x) in the formula y~b.0(x)+b.1(x) above.
    • Does anything change in the plot?
  • If you want to stop it from doing this, you can write y ~ 0 + b.1(x) instead.
    • What changes in the plot? Why?

The All-Functions Model
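
A sketch of this model in R, assuming x is discrete as in the unisam sample: factor(x) gives one indicator basis function per distinct value of x, so the fit’s coefficients are the column means. The name all.functions.fit is just for this sketch.

    # Writing 0 + drops R's implicit intercept, so each coefficient is the
    # sample mean of y in the corresponding column of x.
    all.functions.fit = lm(y ~ 0 + factor(x), data = unisam)
    all.functions.fit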

Piecewise-Constant Models
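
A sketch using a coarsened version of x, as in the models list earlier. Here the coarsening rounds x down to the nearest multiple of 5; it’s named coarsen rather than c because c is R’s concatenation function.

    # All functions of the coarsened x form a piecewise-constant model.
    coarsen = function(x) { floor(x / 5) * 5 }
    piecewise.fit = lm(y ~ 0 + factor(coarsen(x)), data = unisam)
    piecewise.fit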

Least Squares with Multivariate Data
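
A sketch with a bivariate sample; the data frame multisam and its simulated values are placeholders.

    set.seed(1)
    multisam = data.frame(w = rbinom(100, 1, 0.5),
                          x = sample(1:5, 100, replace = TRUE))
    multisam$y = multisam$w + multisam$x + rnorm(100)

    # The all-functions model of (w, x): one coefficient per column,
    # equal to that column's sample mean of y.
    lm(y ~ 0 + factor(w):factor(x), data = multisam)

    # A smaller model: linear and additive in w and x.
    lm(y ~ w + x, data = multisam)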

Footnotes

  1. ‘lm’ stands for ‘linear model’.