通过循环迭代单向方差分析会在R中引发错误

时间:2019-06-12 11:53:55

标签: r anova

我试图遍历大型数据框[5413列],并在每列上运行ANOVA,但是尝试这样做时出现错误。

我想将ANOVA中的P值写入包含列标题的数据框中的新行。但是,由于我目前的知识有限,我正在将P值输出写入可以在bash中解析的文件中。

这是数据的示例布局:

data()
Name, Group, aaaA, aaaE, bbbR, cccD
Apple, Fruit, 1.23, 0.45, 0.3, 1.1
Banana, Fruit, 0.54, 0.12, 2.0, 1.32
Carrot, Vegetable, 0.01, 0.05, 0.45, 0.9
Pear, Fruit, 0.1, 0.2, 0.1, 0.3
Fox, Animal, 1.0, 0.9, 1.2, 0.8
Dog, Animal, 1.2, 1.1, 0.8, 0.7

这是dput的输出:

structure(list(Name = structure(c(1L, 2L, 3L, 6L, 5L, 4L), .Label = c("Apple", 
"Banana", "Carrot", "Dog", "Fox", "Pear"), class = "factor"), 
    Group = structure(c(2L, 2L, 3L, 2L, 1L, 1L), .Label = c(" Animal", 
    " Fruit", " Vegetable"), class = "factor"), aaaA = c(1.23, 
    0.54, 0.01, 0.1, 1, 1.2), aaaE = c(0.45, 0.12, 0.05, 0.2, 
    0.9, 1.1), bbbR = c(0.3, 2, 0.45, 0.1, 1.2, 0.8), cccD = c(1.1, 
    1.32, 0.9, 0.3, 0.8, 0.7)), class = "data.frame", row.names = c(NA, 
-6L))

要获得成功的输出,我要做:

summary(aov(aaaA ~ Group, data=data))[[1]][["Pr(>F)"]]

然后我尝试在一个循环中实现它:

for(i in names(data[3:6])){
out <- summary(aov(i ~ Group, data=data))[[1]][["Pr(>F)"]]
write.csv(out, i)}

哪个返回错误:

Error in model.frame.default(formula = i ~ Group, data = test, drop.unused.levels = TRUE) : 
variable lengths differ (found for 'Group')

有人可以帮助解决该错误或实施每列方差分析吗?

1 个答案:

答案 0 :(得分:0)

我们可以执行以下操作,然后获取p值:

to_use<-setdiff(names(df),"aaaA")
lapply(to_use,function(x) summary(do.call(aov,list(as.formula(paste("aaaA","~",x)),
                                           data=df))))

这给您:

[[1]]
            Df Sum Sq Mean Sq
Name         5   1.48   0.296

[[2]]
            Df Sum Sq Mean Sq F value Pr(>F)
Group        2 0.8113  0.4057   1.819  0.304
Residuals    3 0.6689  0.2230               

[[3]]
            Df Sum Sq Mean Sq F value Pr(>F)  
aaaE         1 0.9286  0.9286   6.733 0.0604 .
Residuals    4 0.5516  0.1379                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[[4]]
            Df Sum Sq Mean Sq F value Pr(>F)
bbbR         1  0.043  0.0430    0.12  0.747
Residuals    4  1.437  0.3593               

[[5]]
            Df Sum Sq Mean Sq F value Pr(>F)
cccD         1 0.1129  0.1129    0.33  0.596
Residuals    4 1.3673  0.3418