R: Using a pasted formula in sapply

Date: 2015-10-14 20:29:07

Tags: r regression sapply

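I am trying to identify correlated explanatory variables and eliminate them. I use sapply to apply a regression to each variable I am interested in, and I manually remove the variables with a VIF > 10. However, when I try to reproduce this so that I can run many VIFs quickly, I cannot get my regression script to run with a pasted formula object containing the names I want to keep. See below: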

    regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100)))
    colnames(regressiondata) <- c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")
    vifs1_model <- sapply(regressiondata[,indep_variables],function(x) vif(lm(x~var1+var2+var3+var4+var5,
                                                                          data = regressiondata,
                                                                          na.action=na.exclude)))
    vifs1 <- rowMeans(vifs1_model)
    formula_variables <- paste(names(vifs1),collapse="+")
    final_model <- t(round(sapply(regressiondata[,indep_variables],
               function(x) lm(x ~ formula_variables,data=regressiondata,na.action=na.exclude)$coef),2))

When I run "final_model" I get this error:

Error in t(round(sapply(regressiondata[, indep_variables], function(x) lm(x ~ : error in evaluating the argument 'x' in selecting a method for function 't': Error in model.frame.default(formula = x ~ formula_variables, data = regressiondata, : variable lengths differ (found for 'formula_variables')

2 answers:

Answer 0 (score: 1)

I think you have a couple of problems:

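  1. You are using sapply over the data frame when it looks like you only want to iterate over the vector of independent variable names.
  2. Your last nested call to lm seems to mix expressions and strings.

Here is my walkthrough. Your code references some missing objects, so I added some lines that I think you left out: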

    library(car) # for vif()
    regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100)))
    colnames(regressiondata) <- c("indep1",
                                  "indep2",
                                  "indep3",
                                  "indep4",
                                  "var1",
                                  "var2",
                                  "var3",
                                  "var4",
                                  "var5")
    
    indep_variables <- names(regressiondata)[1:4] # object did not exist
    
For clarity, I broke out the anonymous function:

    f1 <- function(x) {
        vif(lm(x~var1+var2+var3+var4+var5,
            data = regressiondata, 
            na.action=na.exclude))
    }
    

Now your regression:

    vifs1_model <- sapply(regressiondata[,indep_variables], f1)
    vifs1 <- rowMeans(vifs1_model)
    formula_variables <- paste(names(vifs1),collapse="+")
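
As a side note on what the VIF step computes, here is a minimal base R sketch. It assumes the textbook definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors; the names `predictors` and `vif_by_hand` are made up for this illustration. For a plain `lm` with single-column terms, `car::vif()` should give the same numbers, but that is worth checking on your own data.

```r
set.seed(1)
regressiondata <- data.frame(matrix(runif(900, 1, 100), nrow = 100, ncol = 9))
colnames(regressiondata) <- c("indep1", "indep2", "indep3", "indep4",
                              "var1", "var2", "var3", "var4", "var5")

predictors <- paste0("var", 1:5)  # hypothetical helper name

# VIF_j = 1 / (1 - R^2_j): regress each predictor on the remaining predictors.
vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = regressiondata)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_by_hand, 3))
```

Note that the response (`indep1` to `indep4`) never appears in this calculation. If that definition applies here, the four columns of `vifs1_model` would be identical, and `rowMeans` would simply return that common column.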
    

I wrote this named function to pull the coefficients; it passes the whole formula to lm as a character vector (string):

    getCoefs <- function(x) {
        lm(paste(x, "~", formula_variables), data=regressiondata,
        na.action=na.exclude)$coef
    }
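
An alternative to pasting the formula string by hand is base R's `reformulate()`, which builds a formula object from a character vector of term labels and a response name. The sketch below is only one possible variant, not the approach used in this answer; `predictors` and `get_coefs` are hypothetical names, and the data are regenerated so the block runs on its own.

```r
set.seed(1)
regressiondata <- data.frame(matrix(runif(900, 1, 100), nrow = 100, ncol = 9))
colnames(regressiondata) <- c("indep1", "indep2", "indep3", "indep4",
                              "var1", "var2", "var3", "var4", "var5")

indep_variables <- names(regressiondata)[1:4]
predictors <- names(regressiondata)[5:9]  # hypothetical: the terms to keep

get_coefs <- function(response) {
  # reformulate() returns a formula object such as indep1 ~ var1 + ... + var5
  f <- reformulate(predictors, response = response)
  coef(lm(f, data = regressiondata, na.action = na.exclude))
}

# One row per response, one column per coefficient
final_model <- t(round(sapply(indep_variables, get_coefs), 2))
print(final_model)
```

Because the formula is assembled from names rather than from a single string, there is no separator to get wrong, and the same function works for any vector of retained predictors.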
    

Now just sapply over the vector of names, then transpose and round:

    final_model <- sapply(indep_variables, getCoefs)
    final_model <- t(round(final_model ,2)) 
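
The second problem listed above can be reproduced in isolation. In the small sketch below (toy data and the name `d` are assumptions for illustration), `lm` looks up `formula_variables` as a variable rather than expanding the string into terms, so it finds a length-1 character vector next to a length-100 response. Converting the full string with `as.formula()` avoids that.

```r
set.seed(1)
d <- data.frame(y = runif(100), var1 = runif(100), var2 = runif(100))
formula_variables <- paste(c("var1", "var2"), collapse = "+")

# Fails: the string is treated as a single variable, not as "var1+var2".
msg <- tryCatch({
  lm(y ~ formula_variables, data = d)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Works: build the whole formula as text, then convert it.
fit <- lm(as.formula(paste("y ~", formula_variables)), data = d)
print(names(coef(fit)))
```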
    

Answer 1 (score: 0)

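Here is a simple way to do it. Most of the work is done by the sub_regression function, which runs the regression, filters the explanatory variables by VIF, and then reruns the regression: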

    library(dplyr)
    library(tidyr)
    library(magrittr)
    library(car)

    # Fit the full model, keep terms with VIF <= 10, refit, and return the coefficients as one row
    sub_regression = function(sub_data_frame)
      lm(independent_value ~ var1+var2+var3+var4+var5,
         data = sub_data_frame ,
         na.action="na.exclude") %>%
      vif %>%
      Filter(function(x) x <= 10, .) %>%
      names %>%
      paste(collapse = " + ") %>%
      paste("independent_value ~ ", .) %>%
      as.formula %>%
      lm(. , sub_data_frame, na.action="na.exclude") %>%
      coefficients %>%
      round(3) %>%
      as.list %>%
      data.frame(check.names = FALSE)

    # Reshape to long form so each response (indep1..indep4) becomes one group
    matrix(ncol=9,nrow=100,runif(900,1,100)) %>%
      data.frame %>%
      setNames(c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")) %>%
      gather(independent_variable, independent_value,
             indep1, indep2, indep3, indep4) %>%
      group_by(independent_variable) %>%
      do(sub_regression(.))
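
For readers without the tidyverse, the filter-then-refit idea can be sketched in base R. This is not the code from this answer: it assumes the VIFs can be read off the diagonal of the inverse correlation matrix of the predictors, and it uses toy data in which `var4` is deliberately built as a near copy of `var1` so that the filter has something to remove. The names `d`, `vifs`, `keep`, and `refit` are hypothetical.

```r
set.seed(1)
n <- 100
d <- data.frame(var1 = runif(n, 1, 100),
                var2 = runif(n, 1, 100),
                var3 = runif(n, 1, 100))
d$var4 <- d$var1 + rnorm(n, sd = 1)  # assumption: induced collinearity with var1
d$y <- runif(n, 1, 100)

predictors <- c("var1", "var2", "var3", "var4")

# VIFs as the diagonal of the inverse correlation matrix of the predictors
vifs <- setNames(diag(solve(cor(d[predictors]))), predictors)

# Keep the names of predictors whose VIF is at most 10, then refit
keep <- names(Filter(function(v) v <= 10, vifs))
refit <- lm(reformulate(keep, response = "y"), data = d)

print(round(vifs, 1))
print(keep)
print(round(coef(refit), 3))
```

One design point this makes visible: a single-pass threshold removes both members of a collinear pair (`var1` and `var4` here), which may be stricter than intended. Dropping one variable at a time and recomputing the VIFs is a common alternative.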