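I am trying to compute the inverse of a covariance matrix in R. I used the solve and ginv functions, but when I multiply the result by the original matrix I do not get the identity matrix. Why does this happen? Has anyone run into this problem before, and how can it be fixed?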
d <- data.frame(r1,r2,r3,r4)
ad <- cov(d) # get covariance
You can reproduce ad with:
ad <- structure(c(0.000564103047135273, 0.000209426917735389,
-2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389,
8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977,
-2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10,
-1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977,
-1.43381875666865e-07, 0.000206733441799332),
.Dim = c(4L, 4L),
.Dimnames = list(c("r1", "r2", "r3", "r4"),
c("r1", "r2", "r3", "r4")))
# r1 r2 r3 r4
#r1 5.641030e-04 2.094269e-04 -2.506018e-07 3.186922e-04
#r2 2.094269e-04 8.927564e-05 -9.422266e-08 1.358538e-04
#r3 -2.506018e-07 -9.422266e-08 1.121900e-10 -1.433819e-07
#r4 3.186922e-04 1.358538e-04 -1.433819e-07 2.067334e-04
library(MASS) # ginv comes from the MASS package
a <- ginv(ad, tol = 1e-18) # calculate the inverse of the matrix by ginv
# [,1] [,2] [,3] [,4]
#[1,] 2.369507e+05 7332.961 5.497029e+08 11158.82
#[2,] 7.332964e+03 9194.958 4.198473e+07 13992.28
#[3,] 5.497029e+08 41984725.523 1.353713e+12 63889594.47
#[4,] 1.115881e+04 13992.283 6.388959e+07 21292.54
b <- solve(ad,tol = 1e-18) # calculate the inverse of the matrix by solve
# r1 r2 r3 r4
#r1 236950.7 1.701817e+06 549702919 -1.099090e+06
#r2 1184944.4 -2.245684e+20 2905309940 1.475740e+20
#r3 549702919.1 4.213828e+09 1353713017167 -2.677616e+09
#r4 -762702.5 1.475740e+20 -1817729881 -9.697747e+19
ad%*%a
# r1 r2 r3 r4
#r1 1.000000e+00 3.552714e-15 4.729372e-11 -8.881784e-16
#r2 9.547918e-15 3.015976e-01 5.820766e-11 4.589515e-01
#r3 -2.385245e-18 3.234236e-12 1.000000e+00 -2.125362e-12
#r4 -8.881784e-15 4.589515e-01 6.002665e-11 6.984024e-01
ad%*%b
# r1 r2 r3 r4
#r1 1.000000e+00 0 0.000000e+00 0.000000000
#r2 1.421085e-14 0 2.910383e-11 0.000000000
#r3 0.000000e+00 0 1.000000e+00 -0.001953125
#r4 2.842171e-14 -4 0.000000e+00 4.000000000
Answer 0 (score: 5)
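Highly collinear variables threaten the numerical stability of the matrix computations at the core of regression methods. One way to assess the degree of multicollinearity is to compute variance inflation factors (VIFs). This code is taken from the vif function in Harrell's rms package: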
library(rms)
dput(ad)
structure(c(0.0046716674, 0.0017256716, -0.0001918083, 0.0027385111,
0.001725672, 0.00073037, -8.1816e-05, 0.001159042, -0.0001918083,
-8.1816e-05, 9.169343e-06, -0.0001298359, 0.0027385111, 0.0011590423,
-0.0001298359, 0.0018393131), .Dim = c(4L, 4L), .Dimnames = list(
c("r1", "r2", "r3", "r4"), c("r1", "r2", "r3", "r4")))
nam <- dimnames(ad)[[1]]
d <- diag(ad)^0.5
vif.vals <- diag(solve(ad/(d %o% d)))
names(vif.vals) <- nam
vif.vals
# r1 r2 r3 r4
# 6.535378e+01 3.338701e+06 1.768640e+04 3.389362e+06
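A common rule of thumb is to be concerned when a VIF exceeds 10. By that standard, your VIFs are astronomical. (Using your original values, as posted after the comment, the largest VIFs are even higher: two of them are 1.185158e+14, and all of them are well above 10.)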
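As a complementary check, here is a minimal sketch of two more diagnostics that use only base R: rcond for the reciprocal condition number and cov2cor for the pairwise correlations. It assumes ad is the covariance matrix exactly as posted in the question. The names rc and cors are just illustrative.

```r
# Covariance matrix copied from the question.
ad <- structure(c(0.000564103047135273, 0.000209426917735389,
  -2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389,
  8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977,
  -2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10,
  -1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977,
  -1.43381875666865e-07, 0.000206733441799332),
  .Dim = c(4L, 4L),
  .Dimnames = list(c("r1", "r2", "r3", "r4"),
                   c("r1", "r2", "r3", "r4")))

# Reciprocal condition number. solve() refuses to invert by default when
# this falls below its tolerance (.Machine$double.eps), which is why the
# question had to pass tol = 1e-18.
rc <- rcond(ad)
print(rc)
print(.Machine$double.eps)

# Rescale the covariance matrix to a correlation matrix so that the very
# different variances (r3 is tiny compared with the others) do not hide
# which pairs of variables are nearly linearly dependent.
cors <- cov2cor(ad)
print(round(cors, 6))
```

If an off-diagonal correlation is essentially 1 in absolute value (the r2/r4 entry is the one to look at here), the two variables carry almost the same information. In that case no inversion routine can recover a meaningful inverse, and the products ad %*% a and ad %*% b will not reproduce the identity matrix.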