R: How to loop a multiple linear regression and drop factors with < 2 levels

Time: 2019-02-04 05:52:07

Tags: r for-loop linear-regression tidy

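I am trying to run a multiple linear regression in a loop and automatically drop any factor that does not have at least two levels, in order to avoid the following error message: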

  

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

My code right now is:

library(dplyr)
library(broom)

df %>%
  group_by(crop_name) %>%
  do(tidy(lm(value ~ intercrop + erosion_c + purchased_seed + inorg_pest +
               org_pest + landscape + fert + inorgfert,
             data = .)))

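The problem is that some crops have large sample sizes, with plenty of observations for every variable I want to regress on, while other crops have small samples in which no observation received a given treatment (e.g., none were intercropped, etc.).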

Is there a way, in a for loop, to tell R to regress on everything it can, drop whatever it cannot use, and avoid this error message?

1 answer:

Answer 0: (score: 0)

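I'm pretty new to this, so this may not be the best approach. You will probably need to set up a for loop over crop_name, because in my example df is the subset for a single crop group.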

# Example predictors for a single crop group.
# stringsAsFactors = TRUE is needed in R >= 4.0 so the columns are factors.
df <- data.frame(intercrop      = c("A","B","C","A","B","C"),
                 erosion_c      = c("A","D","C","A","B","C"),
                 purchased_seed = c("A","B","D","F","E","C"),
                 inorg_pest     = c("A","B","C","A","B","C"),
                 org_pest       = c("A","B","A","A","B","B"),
                 landscape      = c("A","A","A","A","A","A"),
                 fert           = c("A","B","C","A","B","C"),
                 inorgfert      = c("A","B","C","A","B","C"),
                 stringsAsFactors = TRUE)


# Count the levels actually present in each column
n_levels <- sapply(df, function(x) nlevels(factor(x)))

# Keep only the predictors with at least two levels (drops landscape here)
keep <- names(n_levels)[n_levels >= 2]

# Build the model formula from the remaining predictors
formu <- reformulate(keep, response = "value")
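To extend this to every crop, one option is to wrap the level check in a small function and call it inside a for loop over crop_name. The following is only a rough sketch in base R: fit_usable() is a hypothetical helper I made up, and toy is invented data with just two predictors, so substitute your own data frame and predictor names. It also assumes there are no missing values; if lm() drops rows because of NAs, a factor could still end up with a single level.

```r
# Hypothetical helper: fit response ~ predictors on one data frame,
# skipping predictors with fewer than two distinct observed values.
fit_usable <- function(d, response, predictors) {
  n_distinct <- sapply(d[predictors], function(x) length(unique(x[!is.na(x)])))
  keep <- predictors[n_distinct >= 2]
  if (length(keep) == 0) return(NULL)  # nothing left to regress on
  lm(reformulate(keep, response = response), data = d)
}

# Invented toy data: no "beans" observation was intercropped,
# so intercrop has a single level within that group.
set.seed(1)
toy <- data.frame(
  crop_name = rep(c("maize", "beans"), each = 6),
  value     = rnorm(12),
  intercrop = c("yes", "no", "yes", "no", "yes", "no",
                "no", "no", "no", "no", "no", "no"),
  fert      = rep(c("yes", "no", "no"), times = 4)
)

predictors <- c("intercrop", "fert")
fits <- list()
for (crop in unique(toy$crop_name)) {
  # Assigning NULL leaves the list unchanged, so crops with no usable
  # predictors simply do not appear in fits.
  fits[[crop]] <- fit_usable(toy[toy$crop_name == crop, ], "value", predictors)
}

# Inspect which terms were actually fitted for each crop
lapply(fits, function(m) names(coef(m)))
```

Each element of fits is an ordinary lm object, so you should be able to pass it to tidy() from broom afterwards if you want the same tidy output as in the question.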