Correlated regressors in multiple regression

Posted on August 25, 2012 in stats

It is often asserted that when two (or more) independent variables are correlated, this creates a problem in multiple regression. What problem? And when is it serious?

In multiple regression, the coefficient estimated for each regressor represents the influence of the associated variable when the others are kept constant. It is the “unique” contribution of this variable. When the variable is correlated with (a combination of) the others, its coefficient can be difficult to interpret, and it may even be meaningless to speak about a “unique” contribution.
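To make the idea of a “unique” contribution concrete, here is a minimal sketch in base R. The seed, the correlation of 0.6, and the name `x1_unique` are my own choices for illustration and are not part of the simulation below. It relies on a standard property of least squares: the coefficient of `x1` in the full model is the slope you get when regressing `y` on the part of `x1` that `x2` cannot predict.

```r
# Illustrative sketch only: seed, rho and variable names are assumptions.
set.seed(1)                                   # arbitrary seed, for reproducibility
n   <- 100
rho <- 0.6                                    # assumed population correlation
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # base-R way to correlate x2 with x1
y   <- 0.2 * x1 + 0.3 * x2 + rnorm(n)

# Coefficient of x1 when x2 is also in the model
b_full <- unname(coef(lm(y ~ x1 + x2))["x1"])

# Part of x1 that is left once x2 has been regressed out
x1_unique <- resid(lm(x1 ~ x2))

# Regressing y on that leftover part alone gives the same slope
b_unique <- unname(coef(lm(y ~ x1_unique))["x1_unique"])

print(c(full = b_full, unique = b_unique))

# Share of the variance of x1 that is "unique", i.e. 1 - cor(x1, x2)^2
print(var(x1_unique) / var(x1))
```

The second number suggests why correlation hurts: the more of `x1` that `x2` can predict, the less unique variation is left to estimate the slope from.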

require(mvtnorm)
## Loading required package: mvtnorm
require(car)
## Loading required package: car
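n <- 100     # observations per dataset
a1 <- 0.2    # true coefficient of x1
a2 <- 0.3    # true coefficient of x2
nsim <- 100  # simulated responses per correlation level

# 'rho' rather than 'cor' avoids reusing the name of the base function cor()
for (rho in c(0, .2, .4, .6, .8))
  {
  # Draw the two regressors once, with unit variances and population correlation rho
  d <- rmvnorm(n, sigma=matrix(c(1, rho, rho, 1), nrow=2))
  x1 <- d[,1]
  x2 <- d[,2]
  print(cor.test(x1, x2))
  print("VIF:")
  # The VIF depends only on the regressors, so a pure-noise response is enough here
  print(vif(lm(rnorm(n)~x1 + x2)))

  # Columns: estimate for x1, estimate for x2, std. error for x1, std. error for x2
  stats <- matrix(NA, nrow=nsim, ncol=4)
  for (i in 1:nsim)
    {
    y <- a1 * x1 + a2 * x2 + rnorm(n)
    lmmod <- lm(y ~ x1 + x2)
    slm <- summary(lmmod)

    # Rows 2:3 drop the intercept; columns 1:2 are "Estimate" and "Std. Error"
    stats[i,] <- as.numeric(slm$coefficients[2:3, 1:2])
    }
  boxplot(stats, main=rho, ylim=c(-0.2,0.6))
  print(apply(stats, 2, summary))
}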
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 0.40372, df = 98, p-value = 0.6873
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1569257  0.2352834
## sample estimates:
##        cor 
## 0.04074844 
## 
## [1] "VIF:"
##       x1       x2 
## 1.001663 1.001663

##             [,1]   [,2]    [,3]    [,4]
## Min.    -0.04512 0.1036 0.09138 0.07261
## 1st Qu.  0.17010 0.2527 0.10590 0.08415
## Median   0.22950 0.3115 0.11360 0.09025
## Mean     0.22900 0.3152 0.11270 0.08956
## 3rd Qu.  0.29240 0.3488 0.11950 0.09498
## Max.     0.54710 0.5579 0.13490 0.10720
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 4.3448, df = 98, p-value = 3.405e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2230824 0.5545346
## sample estimates:
##       cor 
## 0.4018907 
## 
## [1] "VIF:"
##       x1       x2 
## 1.192629 1.192629

##              [,1]    [,2]    [,3]   [,4]
## Min.    -0.002223 0.06638 0.08325 0.0934
## 1st Qu.  0.151300 0.22860 0.09952 0.1117
## Median   0.188900 0.27260 0.10510 0.1179
## Mean     0.199500 0.28760 0.10490 0.1177
## 3rd Qu.  0.258600 0.34380 0.10870 0.1219
## Max.     0.416600 0.60090 0.12550 0.1408
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 4.8344, df = 98, p-value = 4.951e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2652609 0.5848269
## sample estimates:
##       cor 
## 0.4388158 
## 
## [1] "VIF:"
##       x1       x2 
## 1.238481 1.238481

##            [,1]     [,2]   [,3]    [,4]
## Min.    -0.1360 -0.06371 0.1076 0.09756
## 1st Qu.  0.1041  0.18850 0.1213 0.10990
## Median   0.2272  0.27530 0.1258 0.11400
## Mean     0.2193  0.27940 0.1267 0.11490
## 3rd Qu.  0.3440  0.36090 0.1315 0.11920
## Max.     0.6023  0.60150 0.1582 0.14340
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 7.3803, df = 98, p-value = 5.194e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4546582 0.7106843
## sample estimates:
##       cor 
## 0.5976999 
## 
## [1] "VIF:"
##       x1       x2 
## 1.555803 1.555803

##             [,1]     [,2]   [,3]   [,4]
## Min.    -0.08326 -0.01441 0.1161 0.1214
## 1st Qu.  0.11810  0.19350 0.1266 0.1323
## Median   0.21660  0.28790 0.1345 0.1405
## Mean     0.20970  0.28950 0.1341 0.1402
## 3rd Qu.  0.29070  0.37700 0.1408 0.1472
## Max.     0.55470  0.70070 0.1582 0.1654
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 12.533, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6955040 0.8501094
## sample estimates:
##       cor 
## 0.7847217 
## 
## [1] "VIF:"
##      x1      x2 
## 2.60273 2.60273

##             [,1]     [,2]   [,3]   [,4]
## Min.    -0.27240 -0.04132 0.1427 0.1378
## 1st Qu.  0.07114  0.17990 0.1632 0.1576
## Median   0.21950  0.30450 0.1717 0.1658
## Mean     0.19040  0.32100 0.1714 0.1655
## 3rd Qu.  0.30620  0.45400 0.1781 0.1719
## Max.     0.55520  0.80450 0.1955 0.1887
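In the summaries above, columns 1 and 2 are the estimated coefficients of x1 and x2, and columns 3 and 4 are their standard errors. The mean estimates stay close to the true values 0.2 and 0.3 at every correlation level, while the standard errors tend to grow as the correlation increases. With only two regressors, the variance inflation factor is 1 / (1 - r^2), where r is the sample correlation between x1 and x2, and its square root is the factor by which the standard error is inflated compared with uncorrelated regressors of the same variance. The sketch below checks that formula against the numbers printed above; the vector names are my own, and the shortcut only holds in the two-regressor case.

```r
# Sample correlations and VIFs copied from the output above
r           <- c(0.04074844, 0.4018907, 0.4388158, 0.5976999, 0.7847217)
vif_printed <- c(1.001663, 1.192629, 1.238481, 1.555803, 2.60273)

# Two-regressor shortcut: VIF = 1 / (1 - r^2)
vif_formula <- 1 / (1 - r^2)

# Multiplicative inflation of the standard error relative to r = 0
se_inflation <- sqrt(vif_formula)

print(round(cbind(r, vif_printed, vif_formula, se_inflation), 4))
```

The formula values agree with the output of `vif()` to the printed precision. Even at a sample correlation near 0.8 the standard errors are inflated by a factor of only about 1.6, which gives a rough sense of when correlated regressors become a serious problem.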