我正在尝试获取mtcars
数据集中的所有变量对,并使用lm
函数建立线性模型。但是,当我去总结或绘制模型时,我的方法使我失去了公式。这是我正在使用的代码。
library(tidyverse)
my_vars <- names(mtcars))
pairs <- t(combn(my_vars, 2)) # Get all possible pairs of variables
# Create formulas for the lm model
fmls <-
as.tibble(pairs) %>%
mutate(fml = paste(V1, V2, sep = "~")) %>%
select(fml) %>%
.[[1]] %>%
sapply(as.formula)
# Create a linear model for ear pair of variables
mods <- lapply(fmls, function(v) lm(data = mtcars, formula = v))
# print the summary of all variables
for (i in 1:length(mods)) {
print(summary(mods[[i]]))
}
(我在这里agged住了使用字符串制作公式的想法
[1]:Pass a vector of variables into lm() formula。)以下是第一个模型(summary(mods[[1]])
)的摘要输出:
Call:
lm(formula = v, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
我正在寻找一种(也许是元编程)技术,以便呼叫行看起来像lm(formula = var1 ~ var2, data = mtcars)
而不是formula = v
。
答案 0 :(得分:1)
为了使生活更轻松,我将对成一个数据框:
library(tidyverse)
my_vars <- names(mtcars)
pairs <- t(combn(my_vars, 2)) %>%
as.data.frame# Get all possible pairs of variables
您可以使用eval()来计算表达式。
listOfRegs <- apply(pairs, 1, function(pair) {
V1 <- pair[[1]] %>% as.character
V2 <- pair[[2]] %>% as.character
fit <- eval(parse(text = paste0("lm(", pair[[1]] %>% as.character,
"~", pair[[2]] %>% as.character,
", data = mtcars)")))
return(fit)
})
lapply(listOfRegs, summary)
然后:
> lapply(listOfRegs, summary)
[[1]]
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
... etc