Automating regressions with specific dependent and independent variables

Time: 2017-04-04 14:15:30

Tags: r automation regression plm

MVE: suppose this is the dataset:

data <- data.frame(year = rep(seq(1966,2015,1), 8), 
               county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
                          rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
               crime1 = runif(400), crime2 = runif(400), crime3 = runif(400), 
               uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
               var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))

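Let's say crime1, crime2 and crime3 are the specific dependent variables, uvar1, uvar2 and uvar3 are the specific independent variables, and var1, var2, etc. are the other covariates. What I am trying to do is automate the regressions.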

That is, I want to get the results of this code:

plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

and so on, but without writing 20 lines of code, one for each estimated model.

From looking at similar questions, this is as far as I got:

crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4, 
                                                     model = 'within', effect ='twoways', data = data))

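This certainly helps with my dependent variables, but I cannot figure out how to include the specific independent variable in each estimation. To clarify once more: I want uvar1 to enter the first regression only, uvar2 the second only, and so on.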

1 answer:

Answer 0 (score: 0)

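When creating multiple sets of models, the formula function is very useful. You can build the variants with a combination of paste0, formula and lapply, iterating over the indices 1 to 3.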

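library(plm)

#remember to call set.seed before sampling from distributions,
#i.e. before the runif() calls that build the data frame
set.seed(123)

#regDF is assumed to be the data frame from the question (called `data` there)
regDF = data

#a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
  paste0("log(", x, ")")
}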



modelList = lapply(1:3, function(x) {

  #shared covariates: every column whose name starts with "v", wrapped in log()
  indepVars2 = Reduce(function(x, y) paste(x, y, sep = "+"),
                      lapply(colnames(regDF)[grepl("^v", colnames(regDF))], fn_appendLog))

  #> indepVars2
  #[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"

  #model-specific regressor and response for index x
  indepVars1 = fn_appendLog(paste0("uvar", x))
  depVar = fn_appendLog(paste0("crime", x))

  formulaVar = formula(paste0(depVar, " ~ ", indepVars1, "+", indepVars2))

  #> formulaVar
  #log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) +  log(var4) + log(var5)

  modelObj = plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)

})
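Note that the loop above picks up every column starting with "v", so var5 is included, whereas the target models in the question stop at var4. The following is a minimal sketch, not part of the original answer, of how the formulas could be built up front with an explicit covariate list and inspected before any model is fitted. The names build_formula, wrap_log, covariates and formulaList are hypothetical and introduced only for illustration; the sketch uses base R only.

```r
# Hypothetical helper: build the formula for model i from its name parts.
# The crime<i> / uvar<i> naming follows the data frame in the question.
build_formula <- function(i, covariates) {
  wrap_log <- function(v) paste0("log(", v, ")")
  # right-hand side: the model-specific regressor first, then the shared covariates
  rhs <- paste(wrap_log(c(paste0("uvar", i), covariates)), collapse = " + ")
  as.formula(paste(wrap_log(paste0("crime", i)), "~", rhs))
}

# Assumption: restrict the shared covariates to var1..var4, as in the question
covariates <- paste0("var", 1:4)

formulaList <- lapply(1:3, build_formula, covariates = covariates)
names(formulaList) <- paste0("crime", 1:3)

# Inspect a formula before fitting anything
print(formulaList$crime2)
```

Each element could then be fitted in the same way as formulaVar, for example with lapply(formulaList, function(f) plm(f, model = 'within', effect = 'twoways', data = regDF)). Naming the list means the fitted models can later be looked up by response variable.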

Summary:

summary(modelList[[1]])

#> summary(modelList[[1]])
#Twoways effects Within Model
#
#Call:
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within")
#
#Balanced Panel: n=50, T=8, N=400
#
#Residuals :
#   Min. 1st Qu.  Median 3rd Qu.    Max. 
# -5.730  -0.396   0.116   0.599   1.520 
#
#Coefficients :
#             Estimate Std. Error t-value Pr(>|t|)
#log(uvar1)  0.0393871  0.0490891  0.8024   0.4229
#log(var1)  -0.0369356  0.0541029 -0.6827   0.4953
#log(var2)  -0.0455269  0.0543664 -0.8374   0.4030
#log(var3)   0.0150516  0.0520347  0.2893   0.7726
#log(var4)  -0.0034534  0.0441506 -0.0782   0.9377
#log(var5)  -0.0109038  0.0527446 -0.2067   0.8363
#
#Total Sum of Squares:    302.23
#Residual Sum of Squares: 300.6
#R-Squared:      0.0053896
#Adj. R-Squared: 0.0045407
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448

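Explanation:

There are two types of independent variables: the first is the model-specific one (uvar1, uvar2 or uvar3), the other is the shared set var1...varN.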

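1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us all variables in regDF whose names match the pattern "starts with the letter 'v'". The caret marks the start of the string (a $ would mark the end). Since uvar1 to uvar3 start with 'u', they are not matched. The output of this stage is c("var1","var2",...,"var5").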

2) We need the log variant of each name in this vector, so we pass the vector through lapply to the function fn_appendLog, which returns the list list("log(var1)","log(var2)",...,"log(var5)").

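3) Next, we need to turn these elements into the single string log(var1)+log(var2)...+log(var5).

4) For this we use the function Reduce together with the function paste(x,y,sep="+"), which takes the result so far and the next element of the list above and joins them with "+" as the separator: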

   step1 = (log(var1)+log(var2))
   step2 = (log(var1)+log(var2)) + log(var3)
   step3 = (log(var1)+log(var2)+log(var3))+ log(var4) and so on

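5) The function Reduce applies this function successively across the list and aggregates the output into a single value, giving the final output log(var1)+log(var2)+log(var3)+log(var4)+log(var5).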
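Steps 1 to 5 can be traced on the column names alone, without any data or model fitting. The sketch below is illustrative only: cols, covars, logged, steps, final and same are hypothetical names, and cols simply repeats the column names of the question's data frame. It uses the accumulate argument of Reduce to keep every intermediate result, and compares the final string with a one-call alternative, paste with collapse.

```r
# Column names as in the question's data frame (no data needed here)
cols <- c("year", "county", paste0("crime", 1:3),
          paste0("uvar", 1:3), paste0("var", 1:5))

# Step 1: keep names that start with "v"; "uvar1" etc. start with "u" and drop out
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log(); lapply returns a list
logged <- lapply(covars, function(v) paste0("log(", v, ")"))

# Steps 3 to 5: fold the list with "+"; accumulate = TRUE keeps each intermediate join
steps <- Reduce(function(x, y) paste(x, y, sep = "+"), logged, accumulate = TRUE)
print(steps[[2]])  # the string after the first join

# The last element is the fully joined right-hand side
final <- steps[[length(steps)]]

# paste() with collapse performs the same join in a single call
same <- paste(unlist(logged), collapse = "+")
stopifnot(identical(final, same))
```

Reduce makes the pairwise joining explicit, which matches the step-by-step description; paste with collapse is the shorter choice when only the final string is needed.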

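This may look intimidating, but once you use these functions often and work through their examples, they soon become second nature. The best way to understand a function such as lapply is to read its documentation (?lapply) from end to end, run the listed examples, and modify the arguments until you are familiar with it. I hope this sheds some light on your query.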