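I have a dataset (data frame) with 5 columns of numeric values.
I want to run a simple linear regression for every pair of columns in the dataset.
For example, if the columns are named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then plot the data for each pair along with its regression line.
I am new to R, so I am struggling to get this working. Should I use ddply? Or lapply? I am really not sure how to approach this.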
Answer 0 (score: 7)
Here is one approach using combn:
combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)
Example:
set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3))
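The combn call works without an explicit formula because lm() coerces a data frame to a formula, treating the first column as the response and the remaining columns as predictors. A minimal sketch to check this on one pair, assuming the DF defined above (it is rebuilt here so the snippet runs on its own; `pair`, `fit_df` and `fit_formula` are illustrative names, not part of the original answer):

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

pair <- c("A", "B")

# A two-column data frame is read as: first column ~ second column
formula(DF[, pair])

fit_df      <- lm(DF[, pair])        # data frame in place of a formula
fit_formula <- lm(A ~ B, data = DF)  # the explicit equivalent

# Both fits should give the same coefficients
all.equal(coef(fit_df), coef(fit_formula))
```

So the order of the names returned by combn decides which column of each pair is the response.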
Update: adding @Henrik's suggestion (see comments)
# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify variables in the regression
> results
$A
(Intercept) B
103.66739418 -0.03354243
$A
(Intercept) C
97.88341555 0.02429041
$A
(Intercept) D
122.7606103 -0.2240759
$A
(Intercept) E
99.26387487 0.01038445
$B
(Intercept) C
99.971253525 0.003824755
$B
(Intercept) D
102.65399702 -0.02296721
$B
(Intercept) E
96.83042199 0.03524868
$C
(Intercept) D
80.1872211 0.1931079
$C
(Intercept) E
89.0503893 0.1050202
$D
(Intercept) E
107.84384655 -0.07620397
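The question also asks for a plot of each pair with its regression line, which the code above does not cover. A rough sketch using base graphics, assuming the DF from the example (rebuilt here so the snippet is self-contained). The names `pairs_list`, `fits` and `out`, the "A~B" style labels, and the choice to write to a temporary PDF are illustrative assumptions, not part of the original answer:

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Every pair of column names as a list: c("A", "B"), c("A", "C"), ...
pairs_list <- combn(names(DF), 2, simplify = FALSE)

# Fit first ~ second for each pair, e.g. A ~ B
fits <- lapply(pairs_list, function(p) {
  lm(reformulate(p[2], response = p[1]), data = DF)
})

# Label each fit by its formula so every name is unique
names(fits) <- vapply(pairs_list, paste, character(1), collapse = "~")

# Draw the 10 scatter plots, each with its fitted line, on one page
out <- tempfile(fileext = ".pdf")
pdf(out, width = 15, height = 6)
op <- par(mfrow = c(2, 5))
for (i in seq_along(fits)) {
  p <- pairs_list[[i]]
  plot(DF[[p[2]]], DF[[p[1]]], xlab = p[2], ylab = p[1], main = names(fits)[i])
  abline(reg = fits[[i]])  # regression line taken from the lm fit
}
par(op)
dev.off()
```

Dropping the pdf() and dev.off() lines draws to the current graphics device instead. Labels such as "A~B" also avoid the repeated $A entries seen in the printed results.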
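Answer 1 (score: 2)
I suggest also looking at the correlation matrix (cor(DF)), which is often the best way to spot linear relationships between variables. Correlation is closely tied to the covariance and to the slope of a simple linear regression. The calculations below illustrate this link.
Example data: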
set.seed(1)
DF <- data.frame(
A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3)
)
The regression slope is cov(x, y) / var(x):
beta = cov(DF) * (1/diag(var(DF)))
A B C D E
A 1.00000000 -0.045548503 0.028448192 -0.32982367 0.01800795
B -0.03354243 1.000000000 0.003298708 -0.02489518 0.04501362
C 0.02429041 0.003824755 1.000000000 0.24269838 0.15550116
D -0.22407592 -0.022967212 0.193107904 1.00000000 -0.08977834
E 0.01038445 0.035248685 0.105020194 -0.07620397 1.00000000
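To read this matrix: the row is the predictor x and the column is the response y, because the division by diag(var(DF)) is applied row by row. A small check of one entry against lm() and against the correlation form of the slope, cor(x, y) * sd(y) / sd(x), assuming the DF above (rebuilt so the snippet runs alone; `slope_lm` and `slope_cor` are illustrative names):

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Row i is divided by var(i), so beta[x, y] = cov(x, y) / var(x)
beta <- cov(DF) * (1 / diag(var(DF)))

# Slope of A ~ B from lm(), and from the correlation
slope_lm  <- coef(lm(A ~ B, data = DF))[["B"]]
slope_cor <- cor(DF$B, DF$A) * sd(DF$A) / sd(DF$B)

# Predictor B is the row, response A is the column
all.equal(beta["B", "A"], slope_lm)
all.equal(beta["B", "A"], slope_cor)
```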
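The intercept is mean(y) - beta * mean(x). Note that in the expression below colMeans(DF) is recycled down the rows in both terms, so each entry is computed as mean(x) - beta * mean(x); because every column here has a mean close to 100, the off-diagonal values only approximate the lm() intercepts: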
colMeans(DF) - beta * colMeans(DF)
A B C D E
A 1.421085e-14 104.86992 97.44795 133.38310 98.49512
B 1.037180e+02 0.00000 100.02095 102.85026 95.83477
C 9.712461e+01 99.16182 0.00000 75.38373 84.06356
D 1.226899e+02 102.53263 80.87529 0.00000 109.22915
E 9.886859e+01 96.38451 89.41391 107.51930 0.00000
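A sketch of the intercept calculation with mean(y) laid out by column, so that entry [x, y] is mean(y) - beta[x, y] * mean(x), checked against lm() for one pair. It assumes the DF above (rebuilt so the snippet runs alone); `means`, `mean_y` and `intercept` are illustrative names:

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

means <- colMeans(DF)
beta  <- cov(DF) * (1 / diag(var(DF)))  # beta[x, y] = cov(x, y) / var(x)

# mean(y) must vary by column, so fill the matrix row by row
mean_y <- matrix(means, nrow = ncol(DF), ncol = ncol(DF),
                 byrow = TRUE, dimnames = dimnames(beta))

# beta * means scales row x by mean(x) through column-wise recycling
intercept <- mean_y - beta * means

# Compare one entry with the intercept from lm(A ~ B)
all.equal(intercept["B", "A"], coef(lm(A ~ B, data = DF))[["(Intercept)"]])
```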
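Answer 2 (score: 1)
Use combn to get all combinations of the column names (in the example below I assume you only want combinations of two columns) and Map to run the loop.
Example using the mtcars data that ships with R: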
colc<-names(mtcars)
colcc<-combn(colc,2)
colcc<-data.frame(colcc)
# each column of colcc holds one pair, so iterate over columns, not rows
kk<-Map(function(x)lm(as.formula(paste(colcc[1,x],"~",paste(colcc[2,x],collapse="+"))),data=mtcars), as.list(1:ncol(colcc)))
head(kk)
[[1]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
[[2]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) disp
29.59985 -0.04122
[[3]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) hp
30.09886 -0.06823
[[4]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) drat
-7.525 7.678
[[5]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) wt
37.285 -5.344
[[6]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) qsec
-5.114 1.412
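A possible variation on the same idea, sketched under the assumption that only two-column pairs are wanted: build each formula with reformulate() and splice it into the call with bquote(), so that the printed Call shows the actual variables rather than the paste() expression. The names `pairs_list` and `fits` and the "mpg~cyl" style labels are illustrative, not part of the original answer:

```r
# Every pair of mtcars column names as a list of length-2 character vectors
pairs_list <- combn(names(mtcars), 2, simplify = FALSE)

fits <- lapply(pairs_list, function(p) {
  f <- reformulate(p[2], response = p[1])  # e.g. mpg ~ cyl
  # Splice the formula into the call so it is stored in the fitted object
  eval(bquote(lm(.(f), data = mtcars)))
})

# Label each model by its formula
names(fits) <- vapply(pairs_list, paste, character(1), collapse = "~")

length(fits)       # one model per pair of columns
fits[["mpg~cyl"]]  # the Call line now reads lm(formula = mpg ~ cyl, data = mtcars)
```

Named list entries make it easy to pull out a single model, for example fits[["mpg~wt"]], without tracking its position.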