Module 1

STAT 6021

Taylor R. Brown, PhD

A brief tour

Communicating

My email: trb5me@virginia.edu

slack: https://stat6021.slack.com/

Other Information: Collab (e.g. syllabus)

Getting started

Download Instructions for R:

Download R from here: http://archive.linux.duke.edu/cran/ See the first box at the top, and choose among the Windows, Mac, and Linux options.

Download RStudio Desktop from here: https://www.rstudio.com/products/rstudio/download/

Getting started

Assignments are graded with “gradeR”.

Questions?

Questions before we start the material?

Regression!

We want to develop a framework for modelling the dependence of a response \(Y\) on a predictor \(X\). The general form of such a model is


\[ Y = f(X) + \epsilon \]


Regression!


How do we get an idea for what \(f(\cdot)\) should be?


Typically we plot \(y_1, \ldots, y_n\) against \(x_1, \ldots, x_n\). This is called a scatterplot.

A scatterplot
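As a quick illustration (a sketch only, using R's built-in cars data set, which is not part of the course materials), a scatterplot takes one line:

plot(cars$speed, cars$dist,
     xlab = "speed", ylab = "stopping distance",
     main = "A scatterplot")   # cars ships with base R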


Theoretical/scientific justification for the choice of \(f(\cdot)\) is often useful as well

A Workhorse: Simple Linear Regression

When the relationship looks linear, we can use the simple linear regression (SLR) model.


\[ Y = \beta_0 +\beta_1 X + \epsilon \]
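As a sketch of what this model says about data, we can simulate observations from it in R. The parameter values below are arbitrary choices made for illustration, not values from the course:

set.seed(1)                              # for reproducibility
n <- 50
beta0 <- 2; beta1 <- 0.5; sigma <- 1     # arbitrary "true" values, illustration only
x <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = sigma)
y <- beta0 + beta1 * x + eps             # Y = beta_0 + beta_1 X + epsilon
plot(x, y)                               # the simulated data look roughly linear
abline(a = beta0, b = beta1, col = "red")   # the true line (not an estimate)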

The Conditional Distribution of \(Y\) given \(X\)


\[\begin{align*} E[Y \mid X] &= E[\beta_0 +\beta_1 X + \epsilon \mid X ] \\ &= \beta_0 +\beta_1 X + E[\epsilon \mid X] \\ &= \beta_0 + \beta_1 X \end{align*}\]


\[\begin{align*} V[Y \mid X ] &= V[\beta_0 +\beta_1 X + \epsilon \mid X ] \\ &= V[\epsilon \mid X ] \\ &= \sigma^2 \end{align*}\]

The Conditional Distribution of \(Y\) given \(X\)

Estimation


Consider the SLR model \(Y = \beta_0 + \beta_1 X + \epsilon\).


We don’t know \(\beta_0, \beta_1, \sigma^2\), so we have to use data to estimate these.

Estimation


To estimate \(\beta_0\) and \(\beta_1\) we minimize the loss: \[ S(\beta_0, \beta_1) = \sum_{i=1}^n \left[ y_i - (\beta_0 + \beta_1 x_i)\right]^2 \]
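A minimal sketch of this loss in R, evaluated on simulated data (the candidate coefficients below are made up just to show that a better-fitting line gives a smaller loss):

S <- function(b0, b1, x, y) sum((y - (b0 + b1 * x))^2)   # the least-squares loss

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data with beta_0 = 2, beta_1 = 0.5

S(2, 0.5, x, y)   # loss near the true coefficients
S(0, 1, x, y)     # a worse candidate line gives a larger loss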

Estimation

This is the loss-minimizing line:

Estimation

Estimation


The \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize this are called the least squares estimates.


The estimated regression line is then \[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

Chain rule to the rescue

\[\begin{align*} \frac{\partial}{\partial \beta_0} S(\beta_0, \beta_1) &= \frac{\partial}{\partial \beta_0}\sum_{i=1}^n ( y_i - \beta_0 - \beta_1 x_i)^2 \\ &= \sum_{i=1}^n \frac{\partial}{\partial \beta_0}( y_i - \beta_0 - \beta_1 x_i)^2 \\ &= \sum_{i=1}^n 2 ( y_i - \beta_0 - \beta_1 x_i) (-1) \\ &\overset{\text{set}}{=}0 \end{align*}\]

\[\begin{align*} \frac{\partial}{\partial \beta_1} S(\beta_0, \beta_1) &= \frac{\partial}{\partial \beta_1}\sum_{i=1}^n ( y_i - \beta_0 - \beta_1 x_i)^2 \\ &= \sum_{i=1}^n \frac{\partial}{\partial \beta_1}( y_i - \beta_0 - \beta_1 x_i)^2 \\ &= \sum_{i=1}^n 2 ( y_i - \beta_0 - \beta_1 x_i) (-x_i) \\ &\overset{\text{set}}{=}0 \end{align*}\]

The normal equations


The normal equations are what you get when you divide out the \(-2\)'s and rearrange:

\[ n\hat{\beta}_0 + \left(\sum_i x_i \right)\hat{\beta}_1 = \sum_i y_i \]


\[ \left(\sum_i x_i \right)\hat{\beta}_0 + \left(\sum_i x_i^2 \right) \hat{\beta}_1 = \sum_i x_i y_i \]


Sometimes people divide through by \(n\):

\[ \hat{\beta}_0 + \bar{x}\hat{\beta}_1 = \bar{y} \]


\[ \bar{x}\hat{\beta}_0 + \frac{\sum_i x_i^2 }{n} \hat{\beta}_1 = \frac{\sum_i x_i y_i}{n} \]
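Viewed as a \(2 \times 2\) linear system in \(\hat{\beta}_0\) and \(\hat{\beta}_1\), the normal equations can be solved numerically. A sketch with simulated data (values chosen arbitrarily for illustration):

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data
n <- length(y)

A <- matrix(c(n,      sum(x),
              sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
b <- c(sum(y), sum(x * y))
solve(A, b)          # (beta0_hat, beta1_hat)
coef(lm(y ~ x))      # lm() gives the same answer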

The final solution


Finishing off the work, we solve for \(\hat{\beta}_0\) and \(\hat{\beta}_1\): \[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]


\[ \hat{\beta}_1 = \frac{\sum_i x_i y_i - n \bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2} = \frac{\sum_i(x_i - \bar{x})(y_i - \bar{y})}{\sum_j (x_j - \bar{x})^2} \]
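These closed-form expressions are easy to check numerically; a sketch, again with simulated data:

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data

Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
beta1_hat <- Sxy / Sxx
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat, beta1_hat)
coef(lm(y ~ x))                # matches the closed-form answer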

The fitted values


The fitted (predicted) values, denoted \(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\), are what you get when you plug \(x_1, \ldots, x_n\) into the estimated regression line \(\hat{\beta}_0 + \hat{\beta}_1 x\):


\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \] for \(i = 1, \ldots, n\).
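In R the fitted values can be computed directly from this formula or extracted from a fitted lm object; a short sketch with simulated data:

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data
fit <- lm(y ~ x)

y_hat <- coef(fit)[1] + coef(fit)[2] * x        # beta0_hat + beta1_hat * x_i
all.equal(unname(y_hat), unname(fitted(fit)))   # TRUE: same as lm's fitted values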

A Trick


\[ \hat{\beta}_1 = \sum_i c_i y_i \]

where \[ c_i = \frac{(x_i - \bar{x}) }{\sum_{j=1}^n (x_j - \bar{x})^2 } = \frac{(x_i - \bar{x}) }{S_{xx}} . \]

The following equation is true by definition: \(\sum(x_i - \bar{x})^2 = S_{xx}\).

A Trick


Why:


\[\begin{align*} \hat{\beta}_1 &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}} \\ &= \frac{\sum_i (x_i - \bar{x})y_i - \sum (x_i - \bar{x})\bar{y}}{S_{xx}}\\ &= \frac{\sum_i (x_i - \bar{x})y_i }{S_{xx}} \\ &= \sum_i c_i y_i \end{align*}\]

because \(\sum(x_i - \bar{x}) = 0\).
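A quick numerical check of this identity (simulated data again, chosen arbitrarily):

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data

Sxx <- sum((x - mean(x))^2)
ci  <- (x - mean(x)) / Sxx     # the weights c_i
sum(ci * y)                    # beta1_hat written as sum_i c_i y_i
coef(lm(y ~ x))[2]             # same value from lm()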

showing unbiasedness with our new trick

We drop the conditioning on \(x_1, \ldots, x_n\) from the notation; all expectations and variances below are conditional on the observed \(x\)'s.

\[\begin{align*} E[\hat{\beta}_1 ] &= E[ \sum_i c_i y_i ] \\ &= \sum_i c_i E[y_i] \\ &= \sum_i c_i E[\beta_0 + \beta_1 x_i + \epsilon_i ] \\ &= \sum_i c_i [\beta_0 + \beta_1 x_i ] \\ &= \beta_0 \sum_i c_i + \beta_1 \sum_i c_i x_i \\ &= \beta_0 \sum_i \frac{(x_i - \bar{x}) }{S_{xx}} + \beta_1 \sum_i \frac{(x_i - \bar{x}) }{S_{xx}} x_i \\ &= \beta_1 \sum_i x_i \frac{(x_i - \bar{x}) }{S_{xx}} \\ &= \beta_1 \sum_i \frac{(x_i - \bar{x})^2 }{S_{xx}} \\ &= \beta_1 \end{align*}\]

calculating the variance with our new trick

\[\begin{align*} V[\hat{\beta}_1 ] &= V[ \sum_i c_i y_i ] \\ &= \sum_i c_i^2 V[y_i] \\ &= \sum_i c_i^2 V[\beta_0 + \beta_1 x_i + \epsilon_i ] \\ &= \sum_i c_i^2 \sigma^2 \\ &= \sigma^2 \sum_i \frac{(x_i - \bar{x})^2 }{S_{xx}^2} \\ &= \sigma^2 \frac{S_{xx} }{S_{xx}^2} \\ &= \frac{\sigma^2 }{S_{xx}} \end{align*}\]

The second equality uses the assumption that the \(y_i\) (equivalently, the errors \(\epsilon_i\)) are uncorrelated.
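Both facts can be checked by simulation: hold the \(x\)'s fixed, repeatedly generate new errors, and look at the mean and variance of the resulting \(\hat{\beta}_1\)'s. A sketch (the parameter values are arbitrary choices for illustration):

set.seed(1)
n <- 50; beta0 <- 2; beta1 <- 0.5; sigma <- 1   # arbitrary "true" values
x   <- runif(n, 0, 10)                          # held fixed across replications
Sxx <- sum((x - mean(x))^2)

beta1_hats <- replicate(10000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)   # new errors each time
  sum((x - mean(x)) * y) / Sxx                    # beta1_hat via the c_i trick
})

mean(beta1_hats)      # close to beta1 = 0.5
var(beta1_hats)       # close to sigma^2 / Sxx
sigma^2 / Sxx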

mean of intercept

Similarly \[ E[\hat{\beta}_0]= E[\bar{y} - \hat{\beta}_1 \bar{x}] = E[\bar{y}] - \beta_1\bar{x} = \beta_0 \]

\[ V[\hat{\beta}_0]= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \]

Estimating the third parameter


Estimating \(\sigma^2\) can be done by (nearly) averaging the squared residuals

Estimating the third parameter

\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^n (y_i - \hat{y}_i )^2 }{n-2} = \frac{SS_{\text{res}} }{n-2}. \] Note we use \(n-2\) instead of \(n\) because two parameters, \(\hat{\beta}_0\) and \(\hat{\beta}_1\), were estimated to obtain the residuals.

In the next chapter, we call \(\hat{\sigma}^2 = \text{MS}_{Res}\).
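A sketch in R comparing this estimate to what lm() reports (summary(fit)$sigma is the residual standard error, so its square is \(\hat{\sigma}^2\); the data are simulated for illustration):

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data
fit <- lm(y ~ x)
n <- length(y)

SS_res <- sum(resid(fit)^2)    # sum of squared residuals
SS_res / (n - 2)               # sigma^2 hat
summary(fit)$sigma^2           # lm's residual standard error, squared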

Sums of Squares

\[ SS_T = SS_R + SS_{Res} \] or \[ \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \]

Breaking up the total sum of squares into a "good" (explained) piece and a "bad" (residual) piece has a few other uses

Sums of Squares

\[\begin{align*} \sum_{i=1}^n (y_i - \bar{y})^2 &= \sum_{i=1}^n (y_i -\hat{y}_i + \hat{y}_i - \bar{y})^2 \\ &=\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + 2 \sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) \end{align*}\]

use the normal equations on that last piece:

\[\begin{align*} \sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) &= \sum_{i=1}^n (y_i - \hat{y}_i)\hat{y}_i - \bar{y} \sum_{i=1}^n (y_i - \hat{y}_i) \end{align*}\]

it’s clearer after you write

\[\begin{align*} \sum_{i=1}^n (y_i - \hat{y}_i)\hat{y}_i &= \sum_{i=1}^n (y_i - \hat{y}_i)(\hat{\beta}_0 + \hat{\beta}_1 x_i ) \\ &= \hat{\beta}_0 \sum_{i=1}^n (y_i - \hat{y}_i) + \hat{\beta}_1\sum_{i=1}^n (y_i - \hat{y}_i) x_i \end{align*}\]

By the normal equations, \(\sum_{i=1}^n (y_i - \hat{y}_i) = 0\) and \(\sum_{i=1}^n (y_i - \hat{y}_i) x_i = 0\), so the entire cross term vanishes and \(SS_T = SS_{Res} + SS_R\).

Coefficient of Determination

\[ R^2 = \frac{SS_R}{SS_T} \] “the proportion of variation explained by the regressor x”
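Both the decomposition and the definition of \(R^2\) are easy to verify numerically; a sketch with simulated data (arbitrary parameter values, for illustration only):

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data
fit <- lm(y ~ x)

SS_T   <- sum((y - mean(y))^2)
SS_res <- sum((y - fitted(fit))^2)
SS_R   <- sum((fitted(fit) - mean(y))^2)

all.equal(SS_T, SS_R + SS_res)   # the decomposition holds numerically
SS_R / SS_T                      # R^2
summary(fit)$r.squared           # same number from lm()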


R examples!

setwd("/home/t/UVa/all_teaching/summer19_6021/data")
my_data <- read.csv("data-table-B3.csv")
head(my_data) # always check that it worked
##       y  x1  x2  x3   x4   x5 x6 x7    x8   x9  x10 x11
## 1 18.90 350 165 260 8.00 2.56  4  3 200.3 69.9 3910   1
## 2 17.00 350 170 275 8.50 2.56  4  3 199.6 72.9 3860   1
## 3 20.00 250 105 185 8.25 2.73  1  3 196.7 72.2 3510   1
## 4 18.25 351 143 255 8.00 3.00  2  3 199.9 74.0 3890   1
## 5 20.07 225  95 170 8.40 2.76  1  3 194.1 71.8 3365   0
## 6 11.20 440 215 330 8.20 2.88  4  3 184.5 69.0 4215   1

R examples!

reg_mod_1 <- lm(y ~ x1, data = my_data)
summary(reg_mod_1)  # print a lot
## 
## Call:
## lm(formula = y ~ x1, data = my_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7923 -1.9752  0.0044  1.7677  6.8171 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.722677   1.443903   23.36  < 2e-16 ***
## x1          -0.047360   0.004695  -10.09 3.74e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.065 on 30 degrees of freedom
## Multiple R-squared:  0.7723, Adjusted R-squared:  0.7647 
## F-statistic: 101.7 on 1 and 30 DF,  p-value: 3.743e-11

R examples!

plot(my_data$x1, my_data$y)
abline(reg_mod_1)  # overlay the best fit line