Question

I am trying to run ANOVA on several different sets of data and am not sure how to do it. I searched around and found this useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/

hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
    lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})

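My understanding of the code above is that it defines a function that calls lm(), applies it to each variable in varlist, and so fits a linear regression of read on each of them.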

So I thought that swapping lm for aov would work for me, like this:

aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)

However, I got this error:

Error in terms.default(formula, "Error", data = data) : 
no terms component nor attribute

I have no idea where the error comes from. Please help!

Answer 1

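This should do it. The varlist vector is passed to the function item by item, and the function selects the matching columns. lm then sees only a two-column data frame, and the "read" column is the dependent variable each time. No need for fancy substitution: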

models <- sapply(varlist, function(x) {
    lm(read ~ ., data = hsb2[, c("read", x)])
}, simplify=FALSE)

> summary(models[[1]])  # The first model. Note the use of "[["

Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])

Residuals:
     Min       1Q   Median       3Q      Max 
-19.8565  -5.8976  -0.8565   5.5801  24.2703 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.16215    3.30716   5.492 1.21e-07 ***
write        0.64553    0.06168  10.465  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529 
F-statistic: 109.5 on 1 and 198 DF,  p-value: < 2.2e-16

For all models:

lapply(models, summary)
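
Since the question asks about aov rather than lm, the same two-column approach can be sketched with aov. This is only an illustration under stated assumptions: the built-in mtcars data set stands in for hsb2 (so the block runs without a download), mpg stands in for read, and the three predictor names are arbitrary picks from mtcars.

```r
# Assumption: mtcars replaces hsb2 and mpg replaces read, for illustration only.
varlist <- c("wt", "hp", "disp")

# Each call to aov() sees a two-column data frame, so "mpg ~ ." expands to
# "mpg ~ <the single other column>". simplify = FALSE keeps a named list.
models <- sapply(varlist, function(x) {
    aov(mpg ~ ., data = mtcars[, c("mpg", x)])
}, simplify = FALSE)

# ANOVA table for one model, then for all of them
summary(models[["wt"]])
lapply(models, summary)
```

Because the formula passed to aov is a real formula here, the terms() lookup that failed in the question is not hit.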

Answer 2

The problem is that substitute() returns an unevaluated expression (a call object), not a formula. I think @thelatemail's suggestion of

lm(as.formula(paste("read ~",x)), data = hsb2)

is a good way around it. Alternatively, you could evaluate the expression to get the formula, with

models <- lapply(varlist, function(x) {
    aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})

I guess it depends on what you want to do with the list of models afterwards. Doing

models <- lapply(varlist, function(x) {
    eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})

gives a "cleaner" call property for each result.
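
The difference between the three constructions can be checked directly by looking at the class of each object. The following is a small sketch, assuming the built-in mtcars data set in place of hsb2 (mpg as the response and wt as the predictor are stand-ins chosen for illustration):

```r
# Assumption: mtcars/mpg/wt stand in for hsb2/read/write.
x <- "wt"

# substitute() builds an unevaluated call, not a formula
expr <- substitute(mpg ~ i, list(i = as.name(x)))
cls_expr <- class(expr)            # "call"

# Evaluating the call runs `~`, which produces a formula object
cls_eval <- class(eval(expr))      # "formula"

# Pasting a string and converting it also produces a formula
cls_paste <- class(as.formula(paste("mpg ~", x)))   # "formula"

# Both of these fit the same model; only the stored call differs
fit_eval   <- aov(eval(expr), data = mtcars)
fit_bquote <- eval(bquote(aov(mpg ~ .(as.name(x)), data = mtcars)))

# The bquote() version records the expanded formula in its call
fit_bquote$call
```

With bquote the variable name is spliced in before aov is called, which is why the stored call shows the actual predictor rather than the eval(substitute(...)) expression.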

Answer 3

The other night akrun borrowed my answer, and now I am (partially) borrowing his.

do.call places the variables into the call output so that it reads properly. Here is a general function for simple regression.

doModel <- function(col1, col2, data = hsb2, FUNC = "lm") 
{
    form <- as.formula(paste(col1, "~", col2))
    do.call(FUNC, list(form, substitute(data)))
}     

lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept)        write  
#     18.1622       0.6455  
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept)         math  
#     14.0725       0.7248  
#
# ...
# ...
# ...

Note: as thelatemail mentions in the comments,

sapply(varlist, doModel, col1 = "read", simplify = FALSE)

will keep the names in the list and also allow list$name subsetting.
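
To tie this back to the original aov question, here is a minimal sketch of the same do.call pattern with FUNC = "aov" and a named result list. The helper fit_one is a hypothetical name, and the built-in mtcars data set (with mpg as the response) is assumed in place of hsb2 so the block runs on its own.

```r
# Hypothetical helper modelled on the do.call pattern above.
# Assumption: mtcars replaces hsb2, and aov is the default fitting function.
fit_one <- function(predictor, response, data = mtcars, FUNC = "aov") {
    form <- as.formula(paste(response, "~", predictor))
    # substitute(data) passes the data set by name, so the stored call
    # reads "data = mtcars" rather than containing the whole data frame
    do.call(FUNC, list(form, substitute(data)))
}

predictors <- c("wt", "hp")

# simplify = FALSE returns a list named after the predictors
fits <- sapply(predictors, fit_one, response = "mpg", simplify = FALSE)

fits$wt            # subsetting by name
summary(fits$hp)   # ANOVA table for one predictor
```

Passing FUNC = "lm" to the same helper would give linear model fits instead, which is the point of keeping the fitting function as an argument.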
