MVE: suppose this is the dataset:
data <- data.frame(year = rep(seq(1966,2015,1), 8),
county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
crime1 = runif(400), crime2 = runif(400), crime3 = runif(400),
uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))
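Say crime1, crime2 and crime3 are the dependent variables, uvar1, uvar2 and uvar3 are independent variables specific to each of them, and var1, var2, etc. are other covariates. What I am trying to do is automate the regressions.
That is, I want to get the results of this code: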
plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
and so on, but without writing 20 lines of code for each estimated model.
Looking at similar questions, this is as far as I have got:
crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4,
model = 'within', effect ='twoways', data = data))
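This certainly helps with my dependent variables, but I cannot figure out how to include the specific independent variable in each estimation. To clarify again, I want uvar1 to enter the first regression only, uvar2 the second only, and so on.
Answer 0 (score: 0)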
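The formula function is very useful here. You can build the formula text with paste0, convert it with formula, and combine both with lapply to iterate over the indices 1 to 3.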
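library(plm)
#remember to set.seed before sampling from distributions (i.e. before building the data)
set.seed(123)
regDF = data #regDF is not defined in the original answer; assumed to be the question's data frame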
#a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
paste0("log(",x,")")
}
modelList = lapply(1:3,function(x) {
indepVars2 = Reduce(function(x,y) paste(x,y,sep="+"),lapply(colnames(regDF)[grepl("^v",colnames(regDF))],fn_appendLog))
#> indepVars2
#[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"
indepVars1 = fn_appendLog(paste0("uvar",x))
depVar = fn_appendLog(paste0("crime",x))
formulaVar = formula(paste0(depVar, " ~ ",indepVars1,"+", indepVars2))
#> formulaVar
#log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)
modelObj = plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})
Summary:
summary(modelList[[1]])
#> summary(modelList[[1]])
#Twoways effects Within Model
#
#Call:
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within")
#
#Balanced Panel: n=50, T=8, N=400
#
#Residuals :
# Min. 1st Qu. Median 3rd Qu. Max.
# -5.730 -0.396 0.116 0.599 1.520
#
#Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
#log(uvar1) 0.0393871 0.0490891 0.8024 0.4229
#log(var1) -0.0369356 0.0541029 -0.6827 0.4953
#log(var2) -0.0455269 0.0543664 -0.8374 0.4030
#log(var3) 0.0150516 0.0520347 0.2893 0.7726
#log(var4) -0.0034534 0.0441506 -0.0782 0.9377
#log(var5) -0.0109038 0.0527446 -0.2067 0.8363
#
#Total Sum of Squares: 302.23
#Residual Sum of Squares: 300.6
#R-Squared: 0.0053896
#Adj. R-Squared: 0.0045407
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448
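summary() inspects one model at a time. To compare the model-specific coefficient across all fits, the list returned by lapply can be queried directly. The following is only a minimal sketch under stated assumptions: it uses lm as a stand-in for plm so that it runs with base R alone, and toy, fits and uvar_coefs are hypothetical names that do not appear in the answer.

```r
set.seed(1)
# Toy data (hypothetical): runif values lie strictly between 0 and 1, so every log() is finite
toy <- data.frame(crime1 = runif(30), crime2 = runif(30),
                  uvar1 = runif(30), uvar2 = runif(30),
                  var1 = runif(30), var2 = runif(30))

# One fit per index i; only uvar<i> enters model i
fits <- lapply(1:2, function(i) {
  f <- as.formula(paste0("log(crime", i, ") ~ log(uvar", i, ") + log(var1) + log(var2)"))
  lm(f, data = toy)  # assumption: lm stands in for plm here
})
names(fits) <- paste0("crime", 1:2)

# Pull the model-specific coefficient out of each fit by name
uvar_coefs <- sapply(1:2, function(i) coef(fits[[i]])[[paste0("log(uvar", i, ")")]])
names(uvar_coefs) <- names(fits)
print(uvar_coefs)
```

With the plm models from the answer, the same coef() lookup should apply to modelList, since coefficients there are also named after the formula terms (for example log(uvar1)).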
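Explanation:
There are two types of independent variables: the first is the model-specific one (uvar1 for the first model), the other is the shared set var1...varN.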
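1) colnames(regDF)[grepl("^v",colnames(regDF))]
This gives us all variables in regDF whose names start with the letter 'v'. The caret ^ anchors the match to the start of the string ($ would anchor it to the end, but is not needed here), so uvar1, uvar2 and uvar3 are not selected. The output at this stage is c("var1","var2"...,"var5")
2) We need the log variants of this vector of variables, so we pass them through lapply to the function fn_appendLog, which results in list("log(var1)","log(var2)",...,"log(var5)")
3) Next, we need to convert these variables into log(var1)+log(var2)...+log(var5)
4) For this we use the function Reduce with the function paste(x,y,sep="+"), which takes the result accumulated so far together with the next element of the list above and joins the two with "+" as the separator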
step1 = (log(var1)+log(var2))
step2 = (log(var1)+log(var2)) + log(var3)
step3 = (log(var1)+log(var2)+log(var3))+ log(var4) and so on
5) Reduce applies this function cumulatively across the list and collapses it into a single string, giving log(var1)+log(var2)+log(var3)+log(var4)+log(var5)
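Steps 1 to 5 can be traced on a small example without fitting any model. This is a sketch under assumptions: cols is a hypothetical stand-in for colnames(regDF) with fewer columns, and the paste(..., collapse = "+") line is an alternative to Reduce that is not part of the original answer.

```r
# Hypothetical column names mirroring the question's data frame
cols <- c("year", "county", "crime1", "uvar1", "var1", "var2", "var3")

# Step 1: keep names that start with "v" ("uvar1" starts with "u", so it is excluded)
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log(...); paste0 is vectorised, so lapply is optional
logged <- paste0("log(", covars, ")")

# Steps 3-5: join the pieces with "+" using Reduce, as in the answer
rhs_reduce <- Reduce(function(x, y) paste(x, y, sep = "+"), logged)

# Alternative (not in the answer): paste with collapse gives the same string
rhs_collapse <- paste(logged, collapse = "+")

# Build the full formula for model 1
f1 <- as.formula(paste0("log(crime1) ~ log(uvar1)+", rhs_collapse))
print(f1)
```

Both routes produce the same right-hand side, so the choice between Reduce and collapse is a matter of taste.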
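This might look intimidating, but once you use these functions regularly and explore their examples, it soon becomes second nature. The best way to understand a function such as lapply is to read its documentation (?lapply) from end to end, run the listed examples, and modify the arguments until you are familiar with it. Hope this sheds some light on your query.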