我有一个包含3个相关(LHS)变量和4个独立(RHS)变量的数据框。我想在每个RHS变量上运行每个LHS变量的线性回归,并将每个回归的结果作为一行存储在数据框中,列为:lhs,rhs,Estimate,Std。错误,t值,Pr(> | t |)。
例如,使用mtcars,我考虑了一个嵌套循环:
lhs <- c('mpg', 'cyl', 'disp')
rhs <- c('hp', 'drat', 'wt', 'qsec')
reg_count <- 1
for (i in lhs){
for (j in rhs){
model <- lm(i ~ j, data = mtcars)
results[reg_count] <- coef(summary(model))
reg_count <- reg_count + 1
}
}
然而,由于多种原因,这失败了。有一种简单的方法可以做到这一点吗?理想情况下使用apply()函数而不是循环?
答案 0 :(得分:2)
这是我将如何做到的。我稍微缩短了你的例子,但这没关系:
lhs <- c('mpg', 'cyl', 'disp')
rhs <- c('hp', 'drat')
models = list()
for (i in lhs){
for (j in rhs){
models[[paste(i, "vs", j)]] <- lm(as.formula(paste(i, "~", j)), data = mtcars)
}
}
如果您想使用apply
,则需要以matrix
开头。运行时间的差异可以忽略不计。
# with apply:
coefs_mat = expand.grid(lhs, rhs)
mods = apply(coefs_mat, 1, function(row) {
lm(as.formula(paste(row[1], "~", row[2])), data = mtcars)
})
names(mods) = with(coefs_mat, paste(Var1, "vs", Var2))
两种方法都给出了相同的结果。现在我们可以使用broom::tidy
# get coefs
library(broom)
coefs = lapply(mods, tidy, simplify = F)
# combine
dplyr::bind_rows(coefs, .id = "mod")
# mod term estimate std.error statistic p.value
# 1 mpg vs hp (Intercept) 30.09886054 1.633921e+00 18.4212465 6.642736e-18
# 2 mpg vs hp hp -0.06822828 1.011930e-02 -6.7423885 1.787835e-07
# 3 cyl vs hp (Intercept) 3.00679525 4.254852e-01 7.0667442 7.405351e-08
# 4 cyl vs hp hp 0.02168354 2.635142e-03 8.2286042 3.477861e-09
# 5 disp vs hp (Intercept) 20.99248341 3.260662e+01 0.6438104 5.245902e-01
# 6 disp vs hp hp 1.42977003 2.019414e-01 7.0801224 7.142679e-08
# 7 mpg vs drat (Intercept) -7.52461844 5.476663e+00 -1.3739423 1.796391e-01
# 8 mpg vs drat drat 7.67823260 1.506705e+00 5.0960421 1.776240e-05
我们还可以提取模型摘要统计数据:
# get summary stats
summ = lapply(mods, glance, simplify = F)
dplyr::bind_rows(summ, .id = "mod")
# mod r.squared adj.r.squared sigma statistic p.value df logLik
# 1 mpg vs hp 0.6024373 0.5891853 3.862962 45.45980 1.787835e-07 2 -87.61931
# 2 cyl vs hp 0.6929688 0.6827344 1.005944 67.70993 3.477861e-09 2 -44.56307
# 3 disp vs hp 0.6255997 0.6131197 77.089503 50.12813 7.142679e-08 2 -183.41236
# 4 mpg vs drat 0.4639952 0.4461283 4.485409 25.96964 1.776240e-05 2 -92.39996
# 5 cyl vs drat 0.4899134 0.4729105 1.296596 28.81354 8.244636e-06 2 -52.68517
# 6 disp vs drat 0.5044038 0.4878839 88.693360 30.53315 5.282022e-06 2 -187.89934
# AIC BIC deviance df.residual
# 1 181.23863 185.63584 447.67431 30
# 2 95.12614 99.52335 30.35771 30
# 3 372.82473 377.22194 178283.74604 30
# 4 190.79993 195.19714 603.56673 30
# 5 111.37033 115.76754 50.43482 30
# 6 381.79868 386.19588 235995.36410 30
答案 1 :(得分:2)
您可以从expand.grid
开始,为相关/自变量对提供一个很好的数据帧。然后将公式和模型添加到数据中。
pairings <- expand.grid(
lhs = c('mpg', 'cyl', 'disp'),
rhs = c('hp', 'drat', 'wt', 'qsec')
)
pairings[["formula"]] <- lapply(
X = paste(pairings[["lhs"]], "~", pairings[["rhs"]]),
FUN = as.formula
)
pairings[["model"]] <- lapply(
X = pairings[["formula"]],
FUN = lm,
data = mtcars
)
结果:
str(pairings, max.level = 1)
# 'data.frame': 12 obs. of 4 variables:
# $ lhs : Factor w/ 3 levels "mpg","cyl","disp": 1 2 3 1 2 3 1 2 3 1 ...
# $ rhs : Factor w/ 4 levels "hp","drat","wt",..: 1 1 1 2 2 2 3 3 3 4 ...
# $ formula:List of 12
# $ model :List of 12
# - attr(*, "out.attrs")=List of 2
pairings[["model"]][[1]]
# Call:
# FUN(formula = X[[i]], data = ..1)
#
# Coefficients:
# (Intercept) hp
# 30.09886 -0.06823