Taylor R. Brown, PhD
Recall from module 5:
\[ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} \]
and looking at row \(j\) gives
\[ \hat{y}_j = \sum_{i=1}^n y_i \mathbf{H}_{j,i}. \]
The number \(\mathbf{H}_{j,i}\) is the leverage of obs. \(i\) on fitted value \(j\).
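Here is a minimal R sketch of \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\). It builds \(\mathbf{H} = \mathbf{X}(\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\) explicitly and checks that each fitted value is the weighted sum of the responses in one row. As an assumption, mtcars[, 1:7] (mpg, cyl, disp, hp, drat, wt, qsec) stands in for the carsdf data frame used later in these slides.

```r
# Sketch: the hat matrix H = X (X'X)^{-1} X' and yhat = H y.
# Assumption: mtcars[, 1:7] stands in for the carsdf data used in these slides.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)

X <- model.matrix(fit)                    # n x p design matrix, intercept included
y <- carsdf$mpg
H <- X %*% solve(crossprod(X)) %*% t(X)   # n x n hat matrix

# Row j of H times y reproduces the j-th fitted value
yhat_manual <- drop(H %*% y)
all.equal(unname(yhat_manual), unname(fitted(fit)), tolerance = 1e-6)

# The diagonal entries H[i, i] are what hatvalues() reports
all.equal(unname(diag(H)), unname(hatvalues(fit)), tolerance = 1e-6)

# Leverage of observation 1 (Mazda RX4) on fitted value 2 (Mazda RX4 Wag)
H[2, 1]
```

Forming \((\mathbf{X}^\intercal\mathbf{X})^{-1}\) explicitly is fine for a demo this small. lm() itself works from a QR decomposition instead.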
In module 5 we discussed how “leverage” is related to distance from the center in \(\mathbf{X}\)-space:
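\[ \mathbf{H}_{i,j} = \frac{1}{n} + \frac{(\mathbf{x}_{i,-1} - \bar{\mathbf{x}}_{-1})^{\intercal} \mathbf{S}^{-1}(\mathbf{x}_{j,-1} - \bar{\mathbf{x}}_{-1})}{n-1} \] where \(\mathbf{x}_{i,-1}\) is row \(i\) of \(\mathbf{X}\) without the intercept column, \(\bar{\mathbf{x}}_{-1}\) is the average of these rows, and \(\mathbf{S}\) is the matrix of sample variances/covariances of the predictors (with \(n-1\) in the denominator).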
\(\mathbf{H}_{i,j}\) acts as a similarity between observations \(i\) and \(j\) in \(\mathbf{X}\)-space.
\(\mathbf{H}_{i,i}\) increases with the (Mahalanobis) distance of observation \(i\) from the center.
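The centered form of \(\mathbf{H}\) can be checked numerically. This R sketch builds \(\mathbf{H}\) from the centered predictors and \(\mathbf{S}\), compares it with a QR-based hat matrix, and confirms that the diagonal equals \(1/n\) plus the squared Mahalanobis distance divided by \(n-1\). Using mtcars[, 1:7] in place of carsdf is an assumption.

```r
# Sketch: H[i, j] = 1/n + (x_i - xbar)' S^{-1} (x_j - xbar) / (n - 1)
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
n <- nrow(carsdf)
Z <- as.matrix(carsdf[, -1])          # predictors only, no intercept column

S  <- cov(Z)                          # sample covariance, n - 1 denominator
Zc <- sweep(Z, 2, colMeans(Z))        # rows are x_{i,-1} - xbar_{-1}
H_centered <- 1 / n + Zc %*% solve(S) %*% t(Zc) / (n - 1)

# Reference hat matrix from a QR decomposition of the full design matrix
Q <- qr.Q(qr(model.matrix(fit)))
H_qr <- tcrossprod(Q)                 # Q Q' = X (X'X)^{-1} X'
all.equal(unname(H_centered), unname(H_qr), tolerance = 1e-6)

# Diagonal: leverage = 1/n + (squared Mahalanobis distance) / (n - 1)
d2  <- mahalanobis(Z, center = colMeans(Z), cov = S)
lev <- 1 / n + d2 / (n - 1)
all.equal(unname(lev), unname(hatvalues(fit)), tolerance = 1e-6)

# The car farthest from the center in X-space has the largest leverage
names(which.max(hatvalues(fit)))
```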
In addition to leverage, we have the concept of an observation’s influence. An observation has high influence if removing it noticeably changes the regression coefficient estimates.
A good time to mention Anscombe’s quartet
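4 data sets \(\{x_i, y_i\}\) with nearly identical means, variances, correlations, and least-squares lines,
but… they look completely different when plotted.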
[Figure: scatterplots of Anscombe’s four data sets. Source: Wikipedia]
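Anscombe’s data ship with base R as the anscombe data frame (columns x1 to x4 and y1 to y4). This sketch fits the four simple regressions, compares coefficients and correlations, and draws the four scatterplots. It only illustrates the point and is not how the original figure was produced.

```r
# Sketch: Anscombe's quartet via the built-in `anscombe` data frame.
xs <- paste0("x", 1:4)
ys <- paste0("y", 1:4)

fits <- lapply(1:4, function(k) lm(anscombe[[ys[k]]] ~ anscombe[[xs[k]]]))

coefs <- t(sapply(fits, coef))                 # one row per data set
colnames(coefs) <- c("intercept", "slope")
round(coefs, 2)                                # each row is about (3, 0.5)

cors <- sapply(1:4, function(k) cor(anscombe[[xs[k]]], anscombe[[ys[k]]]))
round(cors, 2)                                 # each about 0.82

# Same summaries, very different pictures
op <- par(mfrow = c(2, 2))
for (k in 1:4) {
  plot(anscombe[[xs[k]]], anscombe[[ys[k]]], xlab = xs[k], ylab = ys[k])
  abline(fits[[k]])
}
par(op)
```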
Cook’s D can be thought of as a measure of how much a particular observation affects the overall fit (coefficient estimates and predictions).
\[ D_i := \frac{\left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right) }{p MS_{Res} } \] Because \(\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}}\) and \(\hat{\mathbf{y}}_{(i)} = \mathbf{X}\boldsymbol{\hat{\beta}}_{(i)}\) (all \(n\) fitted values from the fit without observation \(i\)), this is the same as \[ D_i = \frac{\left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right)^\intercal \left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right) }{p MS_{Res} } \]
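Notice the connection to PRESS: the last numerator is not the PRESS statistic itself, but it equals \(\mathbf{H}_{ii}\, e_{(i)}^2\), where \(e_{(i)} = e_i/(1-\mathbf{H}_{ii})\) is the \(i\)th PRESS residual (see the proof below).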
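To see that the two forms of the definition agree, here is a brute-force R sketch. It refits the model \(n\) times, leaving one row out each time, and compares both forms with stats::cooks.distance(). \(MS_{Res}\) is taken from the full fit, and mtcars[, 1:7] stands in for carsdf (an assumption).

```r
# Sketch: Cook's D by brute force, both forms of the definition.
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
X <- model.matrix(fit)
p <- ncol(X)
ms_res <- sum(resid(fit)^2) / df.residual(fit)   # MS_Res = SS_Res / (n - p)
beta_hat <- coef(fit)

D_beta <- D_yhat <- numeric(nrow(carsdf))
for (i in seq_len(nrow(carsdf))) {
  beta_i <- coef(lm(mpg ~ ., data = carsdf[-i, ]))   # beta_hat_(i)
  delta  <- beta_i - beta_hat
  # Form 1: quadratic form in the coefficient change
  D_beta[i] <- drop(t(delta) %*% crossprod(X) %*% delta) / (p * ms_res)
  # Form 2: squared change in all n fitted values
  yhat_diff <- X %*% beta_i - fitted(fit)
  D_yhat[i] <- sum(yhat_diff^2) / (p * ms_res)
}

all.equal(D_beta, D_yhat, tolerance = 1e-6)
all.equal(D_beta, unname(cooks.distance(fit)), tolerance = 1e-6)
```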
Finally, Cook’s D can be written as
\[ D_i = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \] where \[ r_i = \frac{e_i}{\sqrt{ MS_{Res} (1 - \mathbf{H}_{ii}) }} \] is the studentized residual from module 4.
Key takeaway: influence is a combination of x-outliers (leverage) and y-outliers (residuals)!
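A short R check of the closed form follows. rstandard() returns the studentized residual \(r_i\) and hatvalues() returns \(\mathbf{H}_{ii}\), so no refitting is needed. As before, mtcars[, 1:7] is assumed as a stand-in for carsdf.

```r
# Sketch: D_i = (r_i^2 / p) * H_ii / (1 - H_ii) from a single fit.
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
p <- length(coef(fit))
h <- hatvalues(fit)                                  # H_ii
ms_res <- sum(resid(fit)^2) / df.residual(fit)       # MS_Res

r      <- rstandard(fit)                             # studentized residual r_i
r_hand <- resid(fit) / sqrt(ms_res * (1 - h))        # same thing, from the definition
all.equal(r, r_hand)

D_closed <- (r^2 / p) * h / (1 - h)
all.equal(D_closed, cooks.distance(fit))

# A large D_i comes from a large |r_i|, a large H_ii, or both
head(sort(D_closed, decreasing = TRUE), 3)
```

One fit is enough to get every \(D_i\), which is why Cook’s D is cheap to compute.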
Proof: first, recall from module 7
\[ \hat{\boldsymbol{\beta}}_{(i)} = \hat{\boldsymbol{\beta}} - \frac{e_i (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{x}_i }{1-\mathbf{H}_{ii}} \]
so \[ \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) = \frac{e_i^2 \mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i }{(1-\mathbf{H}_{ii})^2} = \frac{e_i^2 \mathbf{H}_{ii}}{(1-\mathbf{H}_{ii})^2} \]
so \[ D_i := \frac{\left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) }{p MS_{Res} } = \frac{e_i^2}{MS_{Res}(1-\mathbf{H}_{ii}) } \frac{ \mathbf{H}_{ii} }{p (1-\mathbf{H}_{ii})} = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \]
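Here is a numerical sanity check of the proof, as an R sketch. The deletion identity for \(\hat{\boldsymbol{\beta}}_{(i)}\) should match an explicit refit, and the quadratic form should collapse to \(e_i^2\mathbf{H}_{ii}/(1-\mathbf{H}_{ii})^2\). The data (mtcars[, 1:7] in place of carsdf) and the choice of row 17 are arbitrary assumptions.

```r
# Sketch: check beta_hat_(i) = beta_hat - e_i (X'X)^{-1} x_i / (1 - H_ii)
# Assumption: mtcars[, 1:7] stands in for carsdf; row i = 17 is arbitrary.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
X <- model.matrix(fit)
XtX_inv <- solve(crossprod(X))
e <- resid(fit)
h <- hatvalues(fit)

i <- 17                                   # Chrysler Imperial
beta_formula <- coef(fit) - e[[i]] * drop(XtX_inv %*% X[i, ]) / (1 - h[[i]])
beta_refit   <- coef(lm(mpg ~ ., data = carsdf[-i, ]))
all.equal(beta_formula, beta_refit, tolerance = 1e-6)

# The numerator of D_i collapses to e_i^2 H_ii / (1 - H_ii)^2
delta <- beta_refit - coef(fit)
quad_form   <- as.numeric(t(delta) %*% crossprod(X) %*% delta)
closed_form <- e[[i]]^2 * h[[i]] / (1 - h[[i]])^2
all.equal(quad_form, closed_form, tolerance = 1e-6)
```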
Cook’s distance does not have an exact sampling distribution, although it is often compared to an \(F_{p,\,n-p}\) distribution. A common rule of thumb: take a closer look at any observation with \(D_i > 1\).
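The rule of thumb can be applied in R as shown in this sketch, which assumes mtcars[, 1:7] for carsdf. The median of \(F_{p,\,n-p}\) is printed alongside it. That median is close to 1 for moderate \(n\), which is one way to motivate the cutoff.

```r
# Sketch: flag observations whose Cook's D exceeds 1.
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
D <- cooks.distance(fit)
p <- length(coef(fit))
n <- nrow(carsdf)

which(D > 1)                          # rule of thumb from the slide
head(sort(D, decreasing = TRUE), 3)   # the largest values, for context
qf(0.5, df1 = p, df2 = n - p)         # median of F(p, n - p), close to 1 here
```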
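carsdf <- mtcars[, 1:7]  # assumed: mpg, cyl, disp, hp, drat, wt, qsec (the columns printed below)
fullMod <- lm(mpg ~ ., data = carsdf)
# plot(fullMod) # lets you cycle through a few diagnostic plots interactively
plot(fullMod, which = 5)  # residuals vs. leverage, with Cook's distance contours
Cook’s D measures overall influence of each observation.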
DFBETAS measure the influence of each observation on a particular coefficient, and DFFITS measure its influence on its own predicted value.
The formula for the DFBETAS of coefficient \(j\) when leaving out observation \(i\) is
\[ \frac{ \hat{\beta}_j - \hat{\beta}_{j(i)} }{ \sqrt{ S^2_{(i)} (\mathbf{X}^\intercal \mathbf{X})^{-1}_{jj} } } \]
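where \(S^2_{(i)}\) is the residual mean square from the fit with observation \(i\) deleted, and \((\mathbf{X}^\intercal \mathbf{X})^{-1}_{jj}\) is the \(j\)th diagonal element of \((\mathbf{X}^\intercal \mathbf{X})^{-1}\). DFBETAS for fullMod (rows are cars, columns are coefficients):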
##                      (Intercept)          cyl         disp           hp
## Mazda RX4           -0.080145397 -0.060063473  0.066556822  0.154236152
## Mazda RX4 Wag       -0.014365817 -0.028017808  0.036782390  0.048882262
## Datsun 710          -0.237591910  0.275316955 -0.021293309 -0.058924761
## Hornet 4 Drive       0.003888909 -0.004692680  0.013768918 -0.001824431
## Hornet Sportabout   -0.016722046  0.037915130  0.092220543 -0.032074894
## Valiant             -0.070325566  0.025208079  0.041072249 -0.037753540
## Duster 360          -0.023698386  0.045388553 -0.095113553 -0.128029939
## Merc 240D            0.084169970 -0.111360536 -0.004184591 -0.057627583
## Merc 230             0.572020536 -0.090086545 -0.016374969 -0.435260324
## Merc 280             0.016236857 -0.027140988  0.041780617  0.014397721
## Merc 280C            0.201094314 -0.226504181  0.258557187  0.035424756
## Merc 450SE          -0.019866435  0.229769798 -0.342608760 -0.036581217
## Merc 450SL          -0.070755615  0.193302097 -0.163118663  0.021470228
## Merc 450SLC          0.030095383 -0.055781651  0.043064385 -0.012096243
## Cadillac Fleetwood  -0.035203256  0.119205191 -0.128010626  0.051002875
## Lincoln Continental  0.001375773 -0.004487812  0.002570815 -0.001434367
## Chrysler Imperial    0.011417805 -0.436545398  0.129632462 -0.044372950
## Fiat 128            -0.082881539  0.027833978 -0.079840769 -0.035444176
## Honda Civic         -0.020727435  0.016076765  0.013145317 -0.014495495
## Toyota Corolla      -0.562973896  0.349792760  0.153104547  0.169230214
## Toyota Corona       -0.045483363  0.316572363 -0.083954019 -0.382670762
## Dodge Challenger    -0.132158390 -0.056729384 -0.023554469  0.142734227
## AMC Javelin          0.075736797 -0.311400920 -0.032116761  0.210708079
## Camaro Z28           0.060478758 -0.022588006 -0.034494104 -0.058162894
## Pontiac Firebird     0.037921693 -0.010925279  0.400490789 -0.195511700
## Fiat X1-9           -0.006585931  0.004380165 -0.007105379  0.007546236
## Porsche 914-2       -0.074857404  0.072012345 -0.039019128  0.057937369
## Lotus Europa         0.617280896 -0.538063652  0.214302312  0.176919394
## Ford Pantera L       0.301094369 -0.132593408 -0.301203189 -0.172178453
## Ferrari Dino         0.023417404 -0.008281578 -0.022625443  0.003978400
## Maserati Bora       -0.057859405 -0.122723311 -0.316121190  0.907895477
## Volvo 142E          -0.129648389  0.257072350  0.059203915 -0.061514898
##                              drat            wt         qsec
## Mazda RX4           -0.0263009880 -0.0726411063  0.143629326
## Mazda RX4 Wag       -0.0177319072 -0.0394187820  0.040616356
## Datsun 710           0.2006476278 -0.0114974476  0.126343349
## Hornet 4 Drive      -0.0135071864 -0.0146597218  0.007586104
## Hornet Sportabout    0.0016247502 -0.1053022683  0.034364720
## Valiant              0.2611771880  0.0489771110 -0.084504132
## Duster 360           0.0662155813  0.1431640966 -0.033722973
## Merc 240D           -0.0472232897  0.1077833402 -0.073363000
## Merc 230            -0.2240051807  0.1598316785 -0.697507641
## Merc 280            -0.0259842010 -0.0416234357  0.004888845
## Merc 280C           -0.2034487328 -0.2078525898 -0.081790651
## Merc 450SE          -0.0626369523  0.2081297252 -0.027320838
## Merc 450SL          -0.0378657208  0.0042570455  0.077286267
## Merc 450SLC          0.0045944820  0.0023857326 -0.032714203
## Cadillac Fleetwood  -0.0004444128 -0.0692343205  0.042976070
## Lincoln Continental  0.0003575035  0.0050587185 -0.002412446
## Chrysler Imperial    0.2487837368  0.7346108213 -0.252718809
## Fiat 128             0.0937555451 -0.0401919705  0.135225053
## Honda Civic          0.0349191402 -0.0116744317  0.009367064
## Toyota Corolla       0.3660681826 -0.5811003009  0.682021952
## Toyota Corona        0.2873405406  0.2127309957 -0.256298702
## Dodge Challenger     0.1607378670  0.0665667532  0.084434072
## AMC Javelin         -0.0178758559  0.1666694979 -0.066229872
## Camaro Z28          -0.1323782424 -0.0008649131  0.011197149
## Pontiac Firebird    -0.0071393951 -0.2602497058  0.002302491
## Fiat X1-9            0.0009069651  0.0166213833 -0.001157335
## Porsche 914-2       -0.0171690761 -0.0306744750  0.097841565
## Lotus Europa        -0.5668470278 -0.3228596647 -0.311876674
## Ford Pantera L      -0.4608426196  0.3168667520 -0.156261726
## Ferrari Dino        -0.0174036294  0.0165153810 -0.023404829
## Maserati Bora       -0.1505241939 -0.0756927681  0.162608842
## Volvo 142E           0.0118117119 -0.2077380227  0.137445976
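This R sketch computes DFBETAS directly from the formula. It refits without each observation to get \(\hat{\beta}_{j(i)}\) and \(S^2_{(i)}\), then compares the result with stats::dfbetas(). mtcars[, 1:7] is assumed to be the same data as carsdf. The comparison ignores row and column names.

```r
# Sketch: DFBETAS_{j,i} = (beta_j - beta_j(i)) / sqrt(S^2_(i) * (X'X)^{-1}_jj)
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
c_jj <- diag(solve(crossprod(model.matrix(fit))))   # diagonal of (X'X)^{-1}, full data

dfb <- t(sapply(seq_len(nrow(carsdf)), function(i) {
  fit_i <- lm(mpg ~ ., data = carsdf[-i, ])           # fit without observation i
  s2_i  <- sum(resid(fit_i)^2) / df.residual(fit_i)   # S^2_(i)
  (coef(fit) - coef(fit_i)) / sqrt(s2_i * c_jj)
}))
rownames(dfb) <- rownames(carsdf)

# Same numbers as stats::dfbetas(), ignoring dimnames
all.equal(dfb, dfbetas(fit), tolerance = 1e-6, check.attributes = FALSE)
round(dfb["Maserati Bora", ], 3)
```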
The formula for the DFFITS when leaving out observation \(i\) is
\[ \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S^2_{(i)}\mathbf{H}_{ii} }} \]
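where \(\hat{y}_{(i)}\) is the prediction of \(y_i\) from the fit with observation \(i\) deleted, and \(S^2_{(i)}\) is the residual mean square from that same fit. DFFITS for fullMod: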
##           Mazda RX4       Mazda RX4 Wag          Datsun 710 
##         -0.23948813         -0.08145436         -0.42718305 
##      Hornet 4 Drive   Hornet Sportabout             Valiant 
##          0.02959920          0.15246583         -0.38313450 
##          Duster 360           Merc 240D            Merc 230 
##         -0.26503897          0.20401424         -0.84038195 
##            Merc 280           Merc 280C          Merc 450SE 
##         -0.06189518         -0.40532960          0.45630539 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
##          0.29056157         -0.07952772         -0.27509573 
## Lincoln Continental   Chrysler Imperial            Fiat 128 
##          0.01059545          1.29744350          0.76866049 
##         Honda Civic      Toyota Corolla       Toyota Corona 
##          0.04510554          1.10286961         -0.74897695 
##    Dodge Challenger         AMC Javelin          Camaro Z28 
##         -0.31948602         -0.48511723         -0.28666117 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
##          0.53969610         -0.05941908         -0.14668858 
##        Lotus Europa      Ford Pantera L        Ferrari Dino 
##          0.91620251         -0.76995989          0.03738570 
##       Maserati Bora          Volvo 142E 
##          1.07385631         -0.43147677
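The same approach works for DFFITS, as in this R sketch. It refits without observation \(i\), predicts \(y_i\) from that fit, and scales the change by \(\sqrt{S^2_{(i)}\mathbf{H}_{ii}}\). The result is compared with stats::dffits(). mtcars[, 1:7] stands in for carsdf (assumption).

```r
# Sketch: DFFITS_i = (yhat_i - yhat_(i)) / sqrt(S^2_(i) * H_ii)
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
h <- hatvalues(fit)

dff <- sapply(seq_len(nrow(carsdf)), function(i) {
  fit_i  <- lm(mpg ~ ., data = carsdf[-i, ])                      # fit without obs. i
  s2_i   <- sum(resid(fit_i)^2) / df.residual(fit_i)              # S^2_(i)
  yhat_i <- predict(fit_i, newdata = carsdf[i, , drop = FALSE])   # yhat_(i)
  (fitted(fit)[i] - yhat_i) / sqrt(s2_i * h[i])
})

all.equal(unname(dff), unname(dffits(fit)), tolerance = 1e-6)
```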
Cars with \(|\text{DFFITS}| > 1\):
##                    mpg cyl  disp  hp drat    wt  qsec
## Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42
## Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90
## Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60
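This R sketch shows one way to produce a list like the one above, assuming mtcars[, 1:7] for carsdf. Regression texts also often quote a size-adjusted cutoff of \(2\sqrt{p/n}\) for DFFITS. For this model, that cutoff flags the same three cars.

```r
# Sketch: flag influential cars by |DFFITS|.
# Assumption: mtcars[, 1:7] stands in for carsdf.
carsdf <- mtcars[, 1:7]
fit <- lm(mpg ~ ., data = carsdf)
p <- length(coef(fit))
n <- nrow(carsdf)
dff <- dffits(fit)

carsdf[abs(dff) > 1, ]                   # rule of thumb: |DFFITS| > 1
carsdf[abs(dff) > 2 * sqrt(p / n), ]     # size-adjusted cutoff 2 * sqrt(p / n)
```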
Recap: