Taylor R. Brown, PhD
Recall from module 5:
\[ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} \]
Looking at row \(j\), we have
\[ \hat{y}_j = \sum_{i=1}^n y_i \mathbf{H}_{j,i}. \]
The number \(\mathbf{H}_{j,i}\) is the leverage of observation \(i\) on fitted value \(j\).
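To make the weighted-sum view concrete, here is a minimal pure-Python sketch (no third-party libraries) for the simplest case of an intercept plus one predictor, so that \(\mathbf{X}^\intercal\mathbf{X}\) is 2 × 2 and can be inverted by hand. Anscombe's data set I serves only as convenient example data, and the helper name `h` is hypothetical. The sketch builds \(\mathbf{H}\) and checks that \(\sum_i y_i \mathbf{H}_{j,i}\) reproduces the usual fitted values.

```python
# Pure-Python sketch: hat matrix for an intercept plus one predictor.
# Example data: Anscombe's data set I (used only as convenient numbers).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

# X^T X = [[n, sum x], [sum x, sum x^2]] for X = [1, x]; invert the 2 x 2 by hand.
sx = sum(x)
sxx = sum(xi * xi for xi in x)
det = n * sxx - sx * sx
xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]


def h(j, i):
    """Entry H[j][i] = (1, x_j) (X^T X)^{-1} (1, x_i)^T (hypothetical helper)."""
    a = xtx_inv[0][0] + xtx_inv[0][1] * x[i]
    b = xtx_inv[1][0] + xtx_inv[1][1] * x[i]
    return a + x[j] * b


H = [[h(j, i) for i in range(n)] for j in range(n)]

# Row j of H: the weight each y_i receives in fitted value j.
yhat = [sum(y[i] * H[j][i] for i in range(n)) for j in range(n)]

# The same fitted values from the usual intercept/slope formulas.
xbar, ybar = sx / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat_direct = [b0 + b1 * xi for xi in x]

for j in range(n):
    print(f"yhat_{j + 1:<2d} via H: {yhat[j]:.4f}   via b0 + b1*x: {yhat_direct[j]:.4f}")
```

Because the model contains an intercept, every row of \(\mathbf{H}\) sums to 1, so each fitted value is a weighted sum of the responses whose weights add to 1 (some weights are negative, e.g. for points on opposite sides of \(\bar{x}\)).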
In module 5 we discussed how “leverage” is related to distance from the center in \(\mathbf{X}\)-space:
\[ \mathbf{H}_{i,j} = \frac{1}{n} + \frac{(\mathbf{x}_{i,-1} - \bar{\mathbf{x}}_{-1})^{\intercal} \mathbf{S}^{-1}(\mathbf{x}_{j,-1} - \bar{\mathbf{x}}_{-1})}{n-1} \] where \(\mathbf{S}\) is the matrix of sample variances/covariances (with \(n-1\) in the denominator).
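\(\mathbf{H}_{i,j}\) acts as a similarity between observations \(i\) and \(j\): it is large when both lie far from the center in the same direction.
\(\mathbf{H}_{i,i}\) grows with the (Mahalanobis) distance of observation \(i\) from the center, and is never smaller than \(1/n\).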
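A quick numerical check of the centered formula, again as a pure-Python sketch with one predictor, so \(\mathbf{S}\) reduces to the scalar sample variance \(s_x^2\). The x values are those shared by Anscombe's data sets I to III, used only as example data; `h_centered` and `h_direct` are illustrative names.

```python
# Pure-Python sketch: leverage via distance from the center (one predictor).
# Example x values: those shared by Anscombe's data sets I-III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
n = len(x)
xbar = sum(x) / n
# With one predictor, S is the scalar sample variance (n - 1 denominator).
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)


def h_centered(i, j):
    """1/n + (x_i - xbar) S^{-1} (x_j - xbar) / (n - 1)."""
    return 1 / n + (x[i] - xbar) * (1 / s2) * (x[j] - xbar) / (n - 1)


# Direct route for comparison: H = X (X^T X)^{-1} X^T with X = [1, x].
sx, sxx = sum(x), sum(xi * xi for xi in x)
det = n * sxx - sx * sx


def h_direct(i, j):
    """(1, x_i) (X^T X)^{-1} (1, x_j)^T, written out for the 2 x 2 case."""
    return (sxx - sx * (x[i] + x[j]) + n * x[i] * x[j]) / det


leverages = [h_centered(i, i) for i in range(n)]
for xi, hi in sorted(zip(x, leverages)):
    print(f"x = {xi:2d}   H_ii = {hi:.3f}")
```

The printout shows \(\mathbf{H}_{i,i}\) bottoming out at \(1/n\) for the observation at \(\bar{x} = 9\) and peaking at \(x = 4\) and \(x = 14\), the two points farthest from the center.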
In addition to leverage, we have the concept of an observation’s influence. An observation has high influence if removing it substantially changes the regression coefficient estimates.
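This is a good time to mention Anscombe’s quartet:
4 data sets \(\{x_i, y_i\}\) with nearly identical means, variances, correlations, and fitted regression lines,
but… very different scatterplots.
(Figure: scatterplots of Anscombe’s quartet; source: Wikipedia)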
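The quartet's published values (also available in R as the `anscombe` data frame) can be checked with a short pure-Python sketch that fits each data set by least squares and reports the largest leverage \(\mathbf{H}_{i,i}\), using the one-predictor form of the leverage formula, \(1/n + (x_i - \bar{x})^2 / \sum_k (x_k - \bar{x})^2\). The function name `summarize` is illustrative.

```python
# Anscombe's quartet: data sets I-III share x values; IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Least-squares intercept, slope, and largest leverage H_ii (illustrative)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # One-predictor leverage: H_ii = 1/n + (x_i - xbar)^2 / sum_k (x_k - xbar)^2
    max_h = max(1 / n + (xi - xbar) ** 2 / sxx for xi in x)
    return b0, b1, max_h


results = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, (b0, b1, max_h) in results.items():
    print(f"{name:>3}: b0 = {b0:.2f}, b1 = {b1:.3f}, max H_ii = {max_h:.3f}")
```

All four fits give \(\hat{\beta}_0 \approx 3.00\) and \(\hat{\beta}_1 \approx 0.500\). Data set IV stands out on leverage: its single point at \(x = 19\) has \(\mathbf{H}_{i,i} = 1\), so the line is forced through it, and with that point removed every remaining \(x\) equals 8 and the slope is not even estimable.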
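Cook’s D can be thought of as a measure of how much a particular observation affects the overall fit (equivalently, the coefficient estimates or the predictions).
\[ D_i := \frac{\left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\boldsymbol{\hat{\beta}}_{(i)} - \boldsymbol{\hat{\beta}}\right) }{p MS_{Res} } \] where \(\boldsymbol{\hat{\beta}}_{(i)}\) is the coefficient estimate computed without observation \(i\) and \(p\) is the number of coefficients. Because \(\hat{\mathbf{y}}_{(i)} = \mathbf{X}\boldsymbol{\hat{\beta}}_{(i)}\) and \(\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}}\), this is the same as \[ D_i = \frac{\left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right)^\intercal \left( \hat{\mathbf{y}}_{(i)} - \hat{\mathbf{y}}\right) }{p MS_{Res} } \] where \(\hat{\mathbf{y}}_{(i)}\) contains the fitted values at all \(n\) observations from the fit without observation \(i\).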
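Notice the last numerator is the squared distance the whole vector of fitted values moves when observation \(i\) is deleted. It is related to, but not the same as, the PRESS statistic \(\sum_{i=1}^n \left(y_i - \hat{y}_{i(i)}\right)^2\) (with \(\hat{y}_{i(i)}\) the \(i\)th entry of \(\hat{\mathbf{y}}_{(i)}\)), which sums squared leave-one-out prediction errors.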
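A brute-force sketch of the definition: refit without each observation in turn and evaluate both forms of \(D_i\). It assumes an intercept plus one predictor (\(p = 2\)), uses Anscombe's data set III (which has one clear y-outlier at \(x = 13\)) purely as example data, and is written in plain Python; `fit` is an illustrative helper.

```python
# Example data: Anscombe's data set III (one clear y-outlier at x = 13).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
n, p = len(x), 2  # p counts the intercept and the slope


def fit(xs, ys):
    """Least-squares (intercept, slope) for ys = b0 + b1 * xs."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
          / sum((a - xbar) ** 2 for a in xs))
    return ybar - b1 * xbar, b1


b0, b1 = fit(x, y)
ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)
sx, sxx = sum(x), sum(xi * xi for xi in x)

cook_coef, cook_fitted = [], []
for i in range(n):
    # Refit without observation i.
    c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    d0, d1 = c0 - b0, c1 - b1
    # Coefficient form: d^T (X^T X) d, with X^T X = [[n, sx], [sx, sxx]].
    cook_coef.append((n * d0 * d0 + 2 * sx * d0 * d1 + sxx * d1 * d1) / (p * ms_res))
    # Fitted-value form: squared distance between the two fitted-value vectors.
    cook_fitted.append(sum((d0 + d1 * xj) ** 2 for xj in x) / (p * ms_res))

for i in range(n):
    print(f"obs {i + 1:2d}: D = {cook_coef[i]:.3f}")
```

The two lists agree to rounding error, and the third observation (\(x = 13\)) has the largest \(D_i\), which exceeds 1.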
Finally, Cook’s D can be written as
\[ D_i = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \] where \[ r_i = \frac{e_i}{\sqrt{ MS_{Res} (1 - \mathbf{H}_{ii}) }} \] is the studentized residual from module 4.
Key takeaway: influence is a combination of x-outliers (leverage) and y-outliers (residuals)!
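The closed form can be checked against brute-force refitting. This pure-Python sketch assumes one predictor, so \(p = 2\) and \(\mathbf{H}_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_k (x_k - \bar{x})^2\), and computes \(D_i\) both ways on Anscombe's data set III, used only as example data. The helper `fit` is illustrative.

```python
import math

# Example data: Anscombe's data set III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
n, p = len(x), 2


def fit(xs, ys):
    """Least-squares (intercept, slope) for ys = b0 + b1 * xs."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
          / sum((a - xbar) ** 2 for a in xs))
    return ybar - b1 * xbar, b1


b0, b1 = fit(x, y)
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
ms_res = sum(ei * ei for ei in e) / (n - p)

# Leverage and studentized residual for each observation.
xbar = sum(x) / n
sxx_c = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx_c for xi in x]
r = [ei / math.sqrt(ms_res * (1 - hi)) for ei, hi in zip(e, h)]

# Closed form: D_i = r_i^2 / p * H_ii / (1 - H_ii).
cook_closed = [ri ** 2 / p * hi / (1 - hi) for ri, hi in zip(r, h)]

# Brute force: squared change in all n fitted values after dropping obs i.
cook_refit = []
for i in range(n):
    c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    change = sum((c0 + c1 * xj - b0 - b1 * xj) ** 2 for xj in x)
    cook_refit.append(change / (p * ms_res))

for i in range(n):
    print(f"obs {i + 1:2d}: H_ii = {h[i]:.3f}  r_i = {r[i]:+.2f}  D_i = {cook_closed[i]:.3f}")
```

Printing \(\mathbf{H}_{ii}\), \(r_i\), and \(D_i\) side by side illustrates the takeaway: the observation at \(x = 13\) has only moderate leverage but a very large studentized residual, and the combination makes it the most influential point.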
Proof: first, recall from module 7
\[ \hat{\boldsymbol{\beta}}_{(i)} = \hat{\boldsymbol{\beta}} - \frac{e_i (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{x}_i }{1-\mathbf{H}_{ii}} \]
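so \[ \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) = \frac{e_i^2 \mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}(\mathbf{X}^\intercal \mathbf{X})(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i }{(1-\mathbf{H}_{ii})^2} = \frac{e_i^2 \mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i }{(1-\mathbf{H}_{ii})^2} = \frac{e_i^2 \mathbf{H}_{ii}}{(1-\mathbf{H}_{ii})^2} \] because \(\mathbf{x}_i^\intercal(\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{x}_i = \mathbf{H}_{ii}\).
Therefore \[ D_i := \frac{\left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right)^\intercal (\mathbf{X}^\intercal \mathbf{X}) \left(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}}\right) }{p MS_{Res} } = \frac{e_i^2}{MS_{Res}(1-\mathbf{H}_{ii}) } \frac{ \mathbf{H}_{ii} }{p (1-\mathbf{H}_{ii})} = \frac{r_i^2}{p} \frac{\mathbf{H}_{ii}}{1 - \mathbf{H}_{ii}} \]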
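The update formula recalled at the start of the proof gives every \(\hat{\boldsymbol{\beta}}_{(i)}\) without refitting. A minimal pure-Python sketch, assuming an intercept plus one predictor so that \((\mathbf{X}^\intercal\mathbf{X})^{-1}\) is an explicit 2 × 2 inverse, compares the update with an actual refit on Anscombe's data set III (example data only). It also computes \(\mathbf{H}_{ii}\) as \(\mathbf{x}_i^\intercal(\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{x}_i\), the identity used in the proof.

```python
# Example data: Anscombe's data set III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
n = len(x)


def fit(xs, ys):
    """Least-squares (intercept, slope) for ys = b0 + b1 * xs."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
          / sum((a - xbar) ** 2 for a in xs))
    return ybar - b1 * xbar, b1


b0, b1 = fit(x, y)
sx, sxx = sum(x), sum(xi * xi for xi in x)
det = n * sxx - sx * sx
xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

max_err = 0.0
for i in range(n):
    xi_vec = (1.0, x[i])  # row i of X, including the intercept column
    # v = (X^T X)^{-1} x_i
    v = [xtx_inv[0][0] * xi_vec[0] + xtx_inv[0][1] * xi_vec[1],
         xtx_inv[1][0] * xi_vec[0] + xtx_inv[1][1] * xi_vec[1]]
    h_ii = xi_vec[0] * v[0] + xi_vec[1] * v[1]  # x_i^T (X^T X)^{-1} x_i
    e_i = y[i] - (b0 + b1 * x[i])
    # Leave-one-out coefficients from the update formula (no refit).
    upd = (b0 - e_i * v[0] / (1 - h_ii), b1 - e_i * v[1] / (1 - h_ii))
    # Leave-one-out coefficients from an actual refit, for comparison.
    ref = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    max_err = max(max_err, abs(upd[0] - ref[0]), abs(upd[1] - ref[1]))
    print(f"drop obs {i + 1:2d}: update ({upd[0]:.4f}, {upd[1]:.4f})  "
          f"refit ({ref[0]:.4f}, {ref[1]:.4f})")

print(f"largest discrepancy: {max_err:.1e}")
```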
Cook’s distance does not have an exact sampling distribution, although it is often compared with percentiles of an \(F_{p,\,n-p}\) distribution. A common rule of thumb is to look closely at any observation with \(D_i > 1\).
Cook’s D measures the overall influence of each observation.
DFBETAS measure the influence of each observation on a particular coefficient, and DFFITS measure its influence on its own predicted value.
The formula for the DFBETAS of coefficient \(j\) when leaving out observation \(i\) is
\[ \frac{ \hat{\beta}_j - \hat{\beta}_{j(i)} }{ \sqrt{ S^2_{(i)} (\mathbf{X}^\intercal \mathbf{X})^{-1}_{jj} } } \]
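where \(S^2_{(i)}\) is the residual mean square from the fit with observation \(i\) deleted (on \(n - p - 1\) degrees of freedom). Below are the DFBETAS for a regression on R’s mtcars data with predictors cyl, disp, hp, drat, wt, and qsec: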
## (Intercept) cyl disp hp
## Mazda RX4 -0.080145397 -0.060063473 0.066556822 0.154236152
## Mazda RX4 Wag -0.014365817 -0.028017808 0.036782390 0.048882262
## Datsun 710 -0.237591910 0.275316955 -0.021293309 -0.058924761
## Hornet 4 Drive 0.003888909 -0.004692680 0.013768918 -0.001824431
## Hornet Sportabout -0.016722046 0.037915130 0.092220543 -0.032074894
## Valiant -0.070325566 0.025208079 0.041072249 -0.037753540
## Duster 360 -0.023698386 0.045388553 -0.095113553 -0.128029939
## Merc 240D 0.084169970 -0.111360536 -0.004184591 -0.057627583
## Merc 230 0.572020536 -0.090086545 -0.016374969 -0.435260324
## Merc 280 0.016236857 -0.027140988 0.041780617 0.014397721
## Merc 280C 0.201094314 -0.226504181 0.258557187 0.035424756
## Merc 450SE -0.019866435 0.229769798 -0.342608760 -0.036581217
## Merc 450SL -0.070755615 0.193302097 -0.163118663 0.021470228
## Merc 450SLC 0.030095383 -0.055781651 0.043064385 -0.012096243
## Cadillac Fleetwood -0.035203256 0.119205191 -0.128010626 0.051002875
## Lincoln Continental 0.001375773 -0.004487812 0.002570815 -0.001434367
## Chrysler Imperial 0.011417805 -0.436545398 0.129632462 -0.044372950
## Fiat 128 -0.082881539 0.027833978 -0.079840769 -0.035444176
## Honda Civic -0.020727435 0.016076765 0.013145317 -0.014495495
## Toyota Corolla -0.562973896 0.349792760 0.153104547 0.169230214
## Toyota Corona -0.045483363 0.316572363 -0.083954019 -0.382670762
## Dodge Challenger -0.132158390 -0.056729384 -0.023554469 0.142734227
## AMC Javelin 0.075736797 -0.311400920 -0.032116761 0.210708079
## Camaro Z28 0.060478758 -0.022588006 -0.034494104 -0.058162894
## Pontiac Firebird 0.037921693 -0.010925279 0.400490789 -0.195511700
## Fiat X1-9 -0.006585931 0.004380165 -0.007105379 0.007546236
## Porsche 914-2 -0.074857404 0.072012345 -0.039019128 0.057937369
## Lotus Europa 0.617280896 -0.538063652 0.214302312 0.176919394
## Ford Pantera L 0.301094369 -0.132593408 -0.301203189 -0.172178453
## Ferrari Dino 0.023417404 -0.008281578 -0.022625443 0.003978400
## Maserati Bora -0.057859405 -0.122723311 -0.316121190 0.907895477
## Volvo 142E -0.129648389 0.257072350 0.059203915 -0.061514898
## drat wt qsec
## Mazda RX4 -0.0263009880 -0.0726411063 0.143629326
## Mazda RX4 Wag -0.0177319072 -0.0394187820 0.040616356
## Datsun 710 0.2006476278 -0.0114974476 0.126343349
## Hornet 4 Drive -0.0135071864 -0.0146597218 0.007586104
## Hornet Sportabout 0.0016247502 -0.1053022683 0.034364720
## Valiant 0.2611771880 0.0489771110 -0.084504132
## Duster 360 0.0662155813 0.1431640966 -0.033722973
## Merc 240D -0.0472232897 0.1077833402 -0.073363000
## Merc 230 -0.2240051807 0.1598316785 -0.697507641
## Merc 280 -0.0259842010 -0.0416234357 0.004888845
## Merc 280C -0.2034487328 -0.2078525898 -0.081790651
## Merc 450SE -0.0626369523 0.2081297252 -0.027320838
## Merc 450SL -0.0378657208 0.0042570455 0.077286267
## Merc 450SLC 0.0045944820 0.0023857326 -0.032714203
## Cadillac Fleetwood -0.0004444128 -0.0692343205 0.042976070
## Lincoln Continental 0.0003575035 0.0050587185 -0.002412446
## Chrysler Imperial 0.2487837368 0.7346108213 -0.252718809
## Fiat 128 0.0937555451 -0.0401919705 0.135225053
## Honda Civic 0.0349191402 -0.0116744317 0.009367064
## Toyota Corolla 0.3660681826 -0.5811003009 0.682021952
## Toyota Corona 0.2873405406 0.2127309957 -0.256298702
## Dodge Challenger 0.1607378670 0.0665667532 0.084434072
## AMC Javelin -0.0178758559 0.1666694979 -0.066229872
## Camaro Z28 -0.1323782424 -0.0008649131 0.011197149
## Pontiac Firebird -0.0071393951 -0.2602497058 0.002302491
## Fiat X1-9 0.0009069651 0.0166213833 -0.001157335
## Porsche 914-2 -0.0171690761 -0.0306744750 0.097841565
## Lotus Europa -0.5668470278 -0.3228596647 -0.311876674
## Ford Pantera L -0.4608426196 0.3168667520 -0.156261726
## Ferrari Dino -0.0174036294 0.0165153810 -0.023404829
## Maserati Bora -0.1505241939 -0.0756927681 0.162608842
## Volvo 142E 0.0118117119 -0.2077380227 0.137445976
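To see the DFBETAS formula at work on something small, here is a pure-Python sketch for an intercept plus one predictor (\(p = 2\)) on Anscombe's data set III, used only as example data. It refits without each observation, takes \(S^2_{(i)}\) as that fit's residual mean square on \(n - p - 1\) degrees of freedom, and divides each coefficient change by \(\sqrt{S^2_{(i)} (\mathbf{X}^\intercal\mathbf{X})^{-1}_{jj}}\). The helper `fit` and the shortcut variable names are illustrative.

```python
import math

# Example data: Anscombe's data set III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
n, p = len(x), 2


def fit(xs, ys):
    """Least-squares (intercept, slope) for ys = b0 + b1 * xs."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
          / sum((a - xbar) ** 2 for a in xs))
    return ybar - b1 * xbar, b1


b = fit(x, y)  # (intercept, slope) from all n observations
sx, sxx = sum(x), sum(xi * xi for xi in x)
det = n * sxx - sx * sx
xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

dfbetas, s2_loo = [], []
for i in range(n):
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    c = fit(xs, ys)
    # S^2_(i): residual mean square of the fit without obs i (n - p - 1 df).
    s2_i = sum((yk - c[0] - c[1] * xk) ** 2 for xk, yk in zip(xs, ys)) / (n - p - 1)
    s2_loo.append(s2_i)
    dfbetas.append([(b[j] - c[j]) / math.sqrt(s2_i * xtx_inv[j][j]) for j in range(p)])

# No-refit shortcut: (n - p - 1) S^2_(i) = (n - p) MS_Res - e_i^2 / (1 - H_ii).
e = [yi - b[0] - b[1] * xi for xi, yi in zip(x, y)]
ms_res = sum(ei * ei for ei in e) / (n - p)
h = [(sxx - 2 * sx * xi + n * xi * xi) / det for xi in x]
s2_shortcut = [((n - p) * ms_res - ei ** 2 / (1 - hi)) / (n - p - 1) for ei, hi in zip(e, h)]

for i, (d0, d1) in enumerate(dfbetas):
    print(f"obs {i + 1:2d}: DFBETAS intercept = {d0:+9.3f}   slope = {d1:+9.3f}")
```

The observation at \(x = 13\) produces enormous DFBETAS here: once it is removed, the remaining ten points lie almost exactly on a line, so \(S^2_{(i)}\) is tiny. The shortcut list shows that \(S^2_{(i)}\) can also be obtained from the full fit alone.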
The formula for the DFFITS when leaving out observation \(i\) is
\[ \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S^2_{(i)}\mathbf{H}_{ii} }} \]
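where \(S^2_{(i)}\) is the residual mean square from the fit with observation \(i\) deleted and \(\hat{y}_{(i)}\) is that fit’s prediction of \(y_i\). The DFFITS for the mtcars regression: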
## Mazda RX4 Mazda RX4 Wag Datsun 710
## -0.23948813 -0.08145436 -0.42718305
## Hornet 4 Drive Hornet Sportabout Valiant
## 0.02959920 0.15246583 -0.38313450
## Duster 360 Merc 240D Merc 230
## -0.26503897 0.20401424 -0.84038195
## Merc 280 Merc 280C Merc 450SE
## -0.06189518 -0.40532960 0.45630539
## Merc 450SL Merc 450SLC Cadillac Fleetwood
## 0.29056157 -0.07952772 -0.27509573
## Lincoln Continental Chrysler Imperial Fiat 128
## 0.01059545 1.29744350 0.76866049
## Honda Civic Toyota Corolla Toyota Corona
## 0.04510554 1.10286961 -0.74897695
## Dodge Challenger AMC Javelin Camaro Z28
## -0.31948602 -0.48511723 -0.28666117
## Pontiac Firebird Fiat X1-9 Porsche 914-2
## 0.53969610 -0.05941908 -0.14668858
## Lotus Europa Ford Pantera L Ferrari Dino
## 0.91620251 -0.76995989 0.03738570
## Maserati Bora Volvo 142E
## 1.07385631 -0.43147677
The data for the three cars with \(|\text{DFFITS}| > 1\):
## mpg cyl disp hp drat wt qsec
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60
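Similarly for DFFITS, a pure-Python sketch on a small example (intercept plus one predictor, Anscombe's data set III as example data) computes the definition by refitting and compares it with the equivalent form \(t_i\sqrt{\mathbf{H}_{ii}/(1-\mathbf{H}_{ii})}\), where \(t_i = e_i / \sqrt{S^2_{(i)}(1-\mathbf{H}_{ii})}\) is the externally studentized residual. The equivalence follows from the update formula for \(\hat{\boldsymbol{\beta}}_{(i)}\) in the Cook’s D proof, which gives \(\hat{y}_i - \hat{y}_{(i)} = \mathbf{H}_{ii} e_i / (1-\mathbf{H}_{ii})\). The helper `fit` is illustrative.

```python
import math

# Example data: Anscombe's data set III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
n, p = len(x), 2


def fit(xs, ys):
    """Least-squares (intercept, slope) for ys = b0 + b1 * xs."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
          / sum((a - xbar) ** 2 for a in xs))
    return ybar - b1 * xbar, b1


b0, b1 = fit(x, y)
xbar = sum(x) / n
sxx_c = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx_c for xi in x]

dffits_refit, dffits_t = [], []
for i in range(n):
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    c0, c1 = fit(xs, ys)
    s2_i = sum((yk - c0 - c1 * xk) ** 2 for xk, yk in zip(xs, ys)) / (n - p - 1)
    yhat_i = b0 + b1 * x[i]    # full-data fitted value
    yhat_loo = c0 + c1 * x[i]  # prediction of y_i from the fit without obs i
    dffits_refit.append((yhat_i - yhat_loo) / math.sqrt(s2_i * h[i]))
    # Equivalent form: externally studentized residual times sqrt(H_ii / (1 - H_ii)).
    t_i = (y[i] - yhat_i) / math.sqrt(s2_i * (1 - h[i]))
    dffits_t.append(t_i * math.sqrt(h[i] / (1 - h[i])))

for i in range(n):
    print(f"obs {i + 1:2d}: DFFITS = {dffits_refit[i]:+9.3f}")
```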
Recap: