Module 8

STAT 6021

Taylor R. Brown, PhD

Motivation

Recall from module 5:

\[ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} \]

and if you look at one of the rows you have

\[ \hat{y}_j = \sum_{i=1}^n y_i \mathbf{H}_{j,i}. \]

The number \(\mathbf{H}_{j,i}\) is the leverage of obs. \(i\) on fitted value \(j\).
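A quick numerical check of the identity above, as a sketch only. It assumes the built-in `mtcars` data and a small two-predictor model chosen purely for illustration; neither comes from these slides.

```r
# Illustrative model (assumption: any full-rank linear model would do)
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)                      # n x p design matrix
y <- mtcars$mpg
H <- X %*% solve(crossprod(X)) %*% t(X)     # hat matrix H = X (X'X)^{-1} X'

# A fitted value is a weighted sum of all responses, with weights from row j of H
j <- 1
yhat_j <- sum(H[j, ] * y)
all.equal(yhat_j, unname(fitted(fit)[j]))

# The diagonal of H agrees with R's built-in leverages
all.equal(unname(diag(H)), unname(hatvalues(fit)))
```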

Motivation

In module 5 we discussed how “leverage” is related to distance from the center in \(\mathbf{X}\)-space:

\[ \mathbf{H}_{i,j} = \frac{1}{n} + \frac{(\mathbf{x}_{i,-1} - \bar{\mathbf{x}}_{-1})^{\intercal} \mathbf{S}^{-1}(\mathbf{x}_{j,-1} - \bar{\mathbf{x}}_{-1})}{n-1} \] where \(\mathbf{S}\) is the matrix of sample variances/covariances (with \(n-1\) in the denominator).
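The distance formula can be compared numerically with the usual definition of the hat matrix. This is a minimal sketch, again assuming the built-in `mtcars` data and an illustrative model that is not part of the slides.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)     # illustrative model (assumption)
X <- model.matrix(fit)
n <- nrow(X)
H <- X %*% solve(crossprod(X)) %*% t(X)     # usual definition of the hat matrix

Z  <- X[, -1, drop = FALSE]                 # predictors without the intercept column
Zc <- sweep(Z, 2, colMeans(Z))              # subtract each predictor's mean
S  <- cov(Z)                                # sample covariance matrix, n - 1 denominator

# 1/n plus the scaled inner product of centered predictor rows
H_alt <- 1 / n + Zc %*% solve(S) %*% t(Zc) / (n - 1)

all.equal(unname(H), unname(H_alt))
```

The diagonal entries of `H_alt` grow as a row moves away from the predictor means, which is the sense in which leverage measures distance from the center.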

Definitions

In addition to leverage, we have the concept of an observation’s influence. An observation has high influence if removing it noticeably changes the regression coefficient estimates.

Anscombe’s quartet

A good time to mention Anscombe’s quartet

4 data sets \(\{x_i, y_i\}\)

but…

Anscombe’s quartet

Source: Wikipedia
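The quartet ships with R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`), so the point can be reproduced directly. The sketch below fits a simple regression to each of the four sets; the helper names are illustrative.

```r
# Fit y_k ~ x_k for each of the four Anscombe data sets
coefs <- sapply(1:4, function(k) {
  d <- data.frame(x = anscombe[[paste0("x", k)]],
                  y = anscombe[[paste0("y", k)]])
  coef(lm(y ~ x, data = d))
})
colnames(coefs) <- paste0("set", 1:4)

# Intercepts and slopes are nearly identical across the four sets
round(coefs, 2)
```

The fitted lines agree closely even though the scatterplots look very different, which is why summaries alone cannot reveal influential points.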

Cook’s D

Cook’s D can be thought of as a measure for how a particular observation affects the overall fit/coefficient estimates/predictions.

\[ D_i := \frac{\left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right) }{p MS_{Res} } \] Since \(\hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}} = \mathbf{X}\left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right)\), this is the same as \[ D_i = \frac{\left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right)^\intercal \left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right) }{p MS_{Res} } \]

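Notice the last numerator is closely related to PRESS: it equals \(\mathbf{H}_{ii}\) times the squared PRESS residual \(e_i/(1-\mathbf{H}_{ii})\) of observation \(i\). (The PRESS statistic itself is the sum of the squared PRESS residuals over all observations.)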
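Both forms of \(D_i\) can be computed by brute force, refitting the model without one observation. This is a sketch under assumptions: the built-in `mtcars` data, an illustrative model, and an arbitrarily chosen row index.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)            # illustrative model (assumption)
X <- model.matrix(fit)
p <- ncol(X)
ms_res <- sum(resid(fit)^2) / df.residual(fit)     # MS_Res from the full fit

i <- 17                                            # arbitrary observation to leave out
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])    # refit without observation i

# Coefficient form of Cook's D
d_beta <- coef(fit_i) - coef(fit)
D_coef <- drop(t(d_beta) %*% crossprod(X) %*% d_beta) / (p * ms_res)

# Fitted-value form: predict all n rows from both fits and compare
d_fit <- predict(fit_i, newdata = mtcars) - fitted(fit)
D_fit <- sum(d_fit^2) / (p * ms_res)

# All three numbers should agree
c(coef_form = D_coef, fit_form = D_fit, builtin = unname(cooks.distance(fit)[i]))
```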

Cook’s D

Finally, Cook’s D can be written as

\[ D_i = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \] where \[ r_i = \frac{e_i}{\sqrt{ MS_{Res} (1 - \mathbf{H}_{ii}) }} \] is the studentized residual (from module 4).

Key takeaway: influence is a combination of x-outliers (leverage) and y-outliers (residuals)!
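The closed form needs only quantities from the full fit, so no refitting is required. A minimal sketch, assuming the built-in `mtcars` data and an illustrative model; `rstandard()` returns the internally studentized residuals \(r_i\) and `hatvalues()` the leverages \(\mathbf{H}_{ii}\).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
p <- length(coef(fit))
h <- hatvalues(fit)                       # leverages H_ii (x-outlyingness)
r <- rstandard(fit)                       # studentized residuals r_i (y-outlyingness)

# Cook's D as a product of a residual part and a leverage part
D_manual <- r^2 / p * h / (1 - h)

all.equal(D_manual, cooks.distance(fit))
```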

Cook’s D

Proof: first, recall from module 7

\[ \hat{\boldsymbol{\beta}}_{(i)} = \hat{\boldsymbol{\beta}} - \frac{e_i (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{x}_i }{1-\mathbf{H}_{ii}} \]

so \[ \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) = \frac{e_i^2 \mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i }{(1-\mathbf{H}_{ii})^2} \]

and, because \(\mathbf{H}_{ii} = \mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i\), \[ D_i := \frac{\left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) }{p MS_{Res} } = \frac{e_i^2 \mathbf{H}_{ii}}{p MS_{Res} (1-\mathbf{H}_{ii})^2} = \frac{e_i^2}{MS_{Res }(1-\mathbf{H}_{ii}) } \frac{ \mathbf{H}_{ii} }{p (1-\mathbf{H}_{ii})} = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \]
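The leave-one-out update for the coefficients, which the proof starts from, can be checked against an actual refit. This is a sketch assuming the built-in `mtcars` data, an illustrative model, and an arbitrary row index.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
X <- model.matrix(fit)
e <- resid(fit)
h <- hatvalues(fit)

i <- 17                                   # arbitrary observation to leave out
x_i <- X[i, ]                             # i-th row of the design matrix

# Update formula: beta_(i) = beta - e_i (X'X)^{-1} x_i / (1 - H_ii)
beta_i_formula <- coef(fit) - drop(solve(crossprod(X), x_i)) * e[i] / (1 - h[i])

# Direct refit without observation i
beta_i_refit <- coef(lm(mpg ~ wt + hp, data = mtcars[-i, ]))

all.equal(unname(beta_i_formula), unname(beta_i_refit))
```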

Plotting Cook’s D

Cook’s distance doesn’t have a nice sampling distribution (it is only roughly \(F\)-distributed). As a rule of thumb, pay attention to any observation whose \(D_i\) is above \(1\).

fullMod <- lm(mpg ~ ., data = carsdf)
# plot(fullMod) # will allow you to cycle through a few plots interactively!
plot(fullMod, which=5)
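The rule of thumb can also be applied numerically with `cooks.distance()`. The sketch below makes one loud assumption: that `carsdf` is the first seven columns of the built-in `mtcars` data, which is consistent with the column names printed for `carsdf` elsewhere in these slides but is not stated by them.

```r
# Assumption: carsdf is mtcars restricted to mpg, cyl, disp, hp, drat, wt, qsec
carsdf <- mtcars[, 1:7]
fullMod <- lm(mpg ~ ., data = carsdf)

cd <- cooks.distance(fullMod)            # one distance per observation

head(sort(cd, decreasing = TRUE), 3)     # the three largest distances
names(cd)[cd > 1]                        # observations above the cutoff, if any
```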

DFFITS and DFBETAS

Cook’s D measures overall influence of each observation.

DFBETAS and DFFITS are more targeted: DFBETAS measures the influence of each observation on a particular coefficient, and DFFITS measures its influence on its own fitted value.

DFBETAS

The formula for the DFBETAS of coefficient \(j\) when leaving out observation \(i\) is

\[ \frac{ \hat{\beta}_j - \hat{\beta}_{j(i)} }{ \sqrt{ S^2_{(i)} (\mathbf{X}^\intercal \mathbf{X})^{-1}_{jj} } } \]

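where \(\hat{\beta}_{j(i)}\) is the estimate of coefficient \(j\) with observation \(i\) left out, \(S^2_{(i)}\) is the estimate of \(\sigma^2\) with observation \(i\) left out, and \((\mathbf{X}^\intercal \mathbf{X})^{-1}_{jj}\) is the \(j\)th diagonal element of \((\mathbf{X}^\intercal \mathbf{X})^{-1}\).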

DFBETAS

fullMod <- lm(mpg ~ ., data = carsdf)
dfbetas(fullMod)
##                      (Intercept)          cyl         disp           hp
## Mazda RX4           -0.080145397 -0.060063473  0.066556822  0.154236152
## Mazda RX4 Wag       -0.014365817 -0.028017808  0.036782390  0.048882262
## Datsun 710          -0.237591910  0.275316955 -0.021293309 -0.058924761
## Hornet 4 Drive       0.003888909 -0.004692680  0.013768918 -0.001824431
## Hornet Sportabout   -0.016722046  0.037915130  0.092220543 -0.032074894
## Valiant             -0.070325566  0.025208079  0.041072249 -0.037753540
## Duster 360          -0.023698386  0.045388553 -0.095113553 -0.128029939
## Merc 240D            0.084169970 -0.111360536 -0.004184591 -0.057627583
## Merc 230             0.572020536 -0.090086545 -0.016374969 -0.435260324
## Merc 280             0.016236857 -0.027140988  0.041780617  0.014397721
## Merc 280C            0.201094314 -0.226504181  0.258557187  0.035424756
## Merc 450SE          -0.019866435  0.229769798 -0.342608760 -0.036581217
## Merc 450SL          -0.070755615  0.193302097 -0.163118663  0.021470228
## Merc 450SLC          0.030095383 -0.055781651  0.043064385 -0.012096243
## Cadillac Fleetwood  -0.035203256  0.119205191 -0.128010626  0.051002875
## Lincoln Continental  0.001375773 -0.004487812  0.002570815 -0.001434367
## Chrysler Imperial    0.011417805 -0.436545398  0.129632462 -0.044372950
## Fiat 128            -0.082881539  0.027833978 -0.079840769 -0.035444176
## Honda Civic         -0.020727435  0.016076765  0.013145317 -0.014495495
## Toyota Corolla      -0.562973896  0.349792760  0.153104547  0.169230214
## Toyota Corona       -0.045483363  0.316572363 -0.083954019 -0.382670762
## Dodge Challenger    -0.132158390 -0.056729384 -0.023554469  0.142734227
## AMC Javelin          0.075736797 -0.311400920 -0.032116761  0.210708079
## Camaro Z28           0.060478758 -0.022588006 -0.034494104 -0.058162894
## Pontiac Firebird     0.037921693 -0.010925279  0.400490789 -0.195511700
## Fiat X1-9           -0.006585931  0.004380165 -0.007105379  0.007546236
## Porsche 914-2       -0.074857404  0.072012345 -0.039019128  0.057937369
## Lotus Europa         0.617280896 -0.538063652  0.214302312  0.176919394
## Ford Pantera L       0.301094369 -0.132593408 -0.301203189 -0.172178453
## Ferrari Dino         0.023417404 -0.008281578 -0.022625443  0.003978400
## Maserati Bora       -0.057859405 -0.122723311 -0.316121190  0.907895477
## Volvo 142E          -0.129648389  0.257072350  0.059203915 -0.061514898
##                              drat            wt         qsec
## Mazda RX4           -0.0263009880 -0.0726411063  0.143629326
## Mazda RX4 Wag       -0.0177319072 -0.0394187820  0.040616356
## Datsun 710           0.2006476278 -0.0114974476  0.126343349
## Hornet 4 Drive      -0.0135071864 -0.0146597218  0.007586104
## Hornet Sportabout    0.0016247502 -0.1053022683  0.034364720
## Valiant              0.2611771880  0.0489771110 -0.084504132
## Duster 360           0.0662155813  0.1431640966 -0.033722973
## Merc 240D           -0.0472232897  0.1077833402 -0.073363000
## Merc 230            -0.2240051807  0.1598316785 -0.697507641
## Merc 280            -0.0259842010 -0.0416234357  0.004888845
## Merc 280C           -0.2034487328 -0.2078525898 -0.081790651
## Merc 450SE          -0.0626369523  0.2081297252 -0.027320838
## Merc 450SL          -0.0378657208  0.0042570455  0.077286267
## Merc 450SLC          0.0045944820  0.0023857326 -0.032714203
## Cadillac Fleetwood  -0.0004444128 -0.0692343205  0.042976070
## Lincoln Continental  0.0003575035  0.0050587185 -0.002412446
## Chrysler Imperial    0.2487837368  0.7346108213 -0.252718809
## Fiat 128             0.0937555451 -0.0401919705  0.135225053
## Honda Civic          0.0349191402 -0.0116744317  0.009367064
## Toyota Corolla       0.3660681826 -0.5811003009  0.682021952
## Toyota Corona        0.2873405406  0.2127309957 -0.256298702
## Dodge Challenger     0.1607378670  0.0665667532  0.084434072
## AMC Javelin         -0.0178758559  0.1666694979 -0.066229872
## Camaro Z28          -0.1323782424 -0.0008649131  0.011197149
## Pontiac Firebird    -0.0071393951 -0.2602497058  0.002302491
## Fiat X1-9            0.0009069651  0.0166213833 -0.001157335
## Porsche 914-2       -0.0171690761 -0.0306744750  0.097841565
## Lotus Europa        -0.5668470278 -0.3228596647 -0.311876674
## Ford Pantera L      -0.4608426196  0.3168667520 -0.156261726
## Ferrari Dino        -0.0174036294  0.0165153810 -0.023404829
## Maserati Bora       -0.1505241939 -0.0756927681  0.162608842
## Volvo 142E           0.0118117119 -0.2077380227  0.137445976
library(olsrr)
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
## 
##     rivers
ols_plot_dfbetas(fullMod) # nice plots
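The DFBETAS formula can be reproduced by hand for a single observation and compared with `dfbetas()`. This is a sketch assuming the built-in `mtcars` data, a small illustrative model, and an arbitrary row index; `influence(fit)$sigma` supplies the leave-one-out residual standard deviations \(S_{(i)}\).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
X <- model.matrix(fit)
C <- solve(crossprod(X))                  # (X'X)^{-1}
s_del <- influence(fit)$sigma             # S_(i) for every observation

i <- 17                                   # arbitrary observation to leave out
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])

# Change in each coefficient, scaled by its leave-one-out standard error
dfbetas_i <- (coef(fit) - coef(fit_i)) / (s_del[i] * sqrt(diag(C)))

all.equal(unname(dfbetas_i), unname(dfbetas(fit)[i, ]))
```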

DFFITS

The formula for the DFFITS when leaving out observation \(i\) is

\[ \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S^2_{(i)}\mathbf{H}_{ii} }} \]

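where \(\hat{y}_{(i)}\) is the prediction for observation \(i\) from the model fit without observation \(i\), and \(S^2_{(i)}\) is the estimate of \(\sigma^2\) with observation \(i\) left out.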

DFFITS

dffits(fullMod)
##           Mazda RX4       Mazda RX4 Wag          Datsun 710 
##         -0.23948813         -0.08145436         -0.42718305 
##      Hornet 4 Drive   Hornet Sportabout             Valiant 
##          0.02959920          0.15246583         -0.38313450 
##          Duster 360           Merc 240D            Merc 230 
##         -0.26503897          0.20401424         -0.84038195 
##            Merc 280           Merc 280C          Merc 450SE 
##         -0.06189518         -0.40532960          0.45630539 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
##          0.29056157         -0.07952772         -0.27509573 
## Lincoln Continental   Chrysler Imperial            Fiat 128 
##          0.01059545          1.29744350          0.76866049 
##         Honda Civic      Toyota Corolla       Toyota Corona 
##          0.04510554          1.10286961         -0.74897695 
##    Dodge Challenger         AMC Javelin          Camaro Z28 
##         -0.31948602         -0.48511723         -0.28666117 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
##          0.53969610         -0.05941908         -0.14668858 
##        Lotus Europa      Ford Pantera L        Ferrari Dino 
##          0.91620251         -0.76995989          0.03738570 
##       Maserati Bora          Volvo 142E 
##          1.07385631         -0.43147677
ols_plot_dffits(fullMod)
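The DFFITS formula can likewise be checked by refitting without one observation. This is a sketch assuming the built-in `mtcars` data, a small illustrative model, and an arbitrary row index.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
h <- hatvalues(fit)                       # leverages H_ii
s_del <- influence(fit)$sigma             # leave-one-out residual SDs S_(i)

i <- 17                                   # arbitrary observation to leave out
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])

yhat_i     <- fitted(fit)[i]                          # fitted value using all data
yhat_del_i <- predict(fit_i, newdata = mtcars[i, ])   # prediction without observation i

dffits_i <- (yhat_i - yhat_del_i) / (s_del[i] * sqrt(h[i]))

all.equal(unname(dffits_i), unname(dffits(fit)[i]))
```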

carsdf[c(17,20,31),]
##                    mpg cyl  disp  hp drat    wt  qsec
## Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42
## Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90
## Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60
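The slides do not say how the row indices above were chosen. One way such rows might be located programmatically is to rank observations by \(|\text{DFFITS}|\). The sketch assumes `carsdf` is the first seven columns of the built-in `mtcars` data, which matches the printed rows but is not stated in the slides.

```r
# Assumption: carsdf is mtcars restricted to its first seven columns
carsdf <- mtcars[, 1:7]
fullMod <- lm(mpg ~ ., data = carsdf)

# Indices of the three observations with the largest absolute DFFITS
top3 <- order(abs(dffits(fullMod)), decreasing = TRUE)[1:3]

carsdf[sort(top3), ]                      # inspect those rows
```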

Takeaways

Recap: