我目前正在学习R.我以前没有STATA的知识。
我想重新分析一项在Stata中完成的研究(具有面板校正标准误差的xtpcse线性回归)。我无法在Stata中找到模型或更详细的代码或任何其他提示如何在R中重写它。我已经为R安装了计量经济学的plm包。这就是我得到的。
下面复制了来自STATA的.do文件的第一行(我刚看到它非常难以理解。这是我复制.do内容的{txt文件的链接:http://dl.dropbox.com/u/4004629/This%20was%20in%20the%20.do%20file.txt)。 我不知道如何以更好的方式解决这个问题。我尝试了google-ing STATA和R比较等,但它没有用。
我要复制的研究的所有数据都在这里:
https://umdrive.memphis.edu/rblanton/public/ISQ_data
---STATA---
Group variable: c_code Number of obs = 265
Time variable: year Number of groups = 27
Panels: correlated (unbalanced) Obs per group: min = 3
Autocorrelation: common AR(1) avg = 9.814815
Sigma computed by pairwise selection max = 14
Estimated covariances = 378 R-squared = 0.8604
Estimated autocorrelations = 1 Wald chi2(11) = 8321.15
Estimated coefficients = 15 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Panel-corrected
food | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lag_food | .8449038 .062589 13.50 0.000 .7222316 .967576
ciri | -.010843 .0222419 -0.49 0.626 -.0544364 .0327504
human_cap | .0398406 .0142954 2.79 0.005 .0118222 .0678591
worker_rts | -.1132705 .0917999 -1.23 0.217 -.2931951 .066654
polity_4 | .0113995 .014002 0.81 0.416 -.0160439 .0388429
market_size | .0322474 .0696538 0.46 0.643 -.1042716 .1687665
income | .0382918 .0979499 0.39 0.696 -.1536865 .2302701
econ_growth | .0145589 .0105009 1.39 0.166 -.0060224 .0351402
log_trade | -.3062828 .1039597 -2.95 0.003 -.5100401 -.1025256
fix_dollar | -.0351874 .1129316 -0.31 0.755 -.2565293 .1861545
fixed_xr | -.4941214 .2059608 -2.40 0.016 -.897797 -.0904457
xr_fluct | .0019044 .0106668 0.18 0.858 -.0190021 .0228109
lab_growth | .0396278 .0277936 1.43 0.154 -.0148466 .0941022
english | -.1594438 .1963916 -0.81 0.417 -.5443641 .2254766
_cons | .4179213 1.656229 0.25 0.801 -2.828227 3.66407
-------------+----------------------------------------------------------------
rho | .0819359
------------------------------------------------------------------------------
. xtpcse fab_metal lag_fab_metal ciri human_cap worker_rts polity_4 market
> income econ_growth log_trade fix_dollar fixed_xr xr_fluct lab_growth
> english, pairwise corr(ar1)
我刚试过Vincent的代码。我尝试了pcse2和vcovBK代码,它们都有效(尽管我不知道如何处理vcocBK产生的相关矩阵)。
然而,我仍然有麻烦再现我正在重新分析的论文中回归系数的估计。我尽可能地遵循他们的食谱,我认为,唯一的缺失就是Stata“自相关:共同AR(1)”的部分。我正在分析的论文说:“OLS回归使用面板校正标准误差(Beck / Katz '95),控制每个面板内的一阶相关性(Stata中的corr AR1选项)。”
如何控制R中每个面板中的一阶相关性?
以下是我迄今为止对我的数据所做的事情:
## run lm
res.lm <- lm(total_FDI ~ ciri + human_cap + worker_rts + polity_4 + lag_total + market_size + income + econ_growth + log_trade + fixed_xr + fix_dollar + xr_fluct + english + lab_growth, data=D)
## run pcse
res.pcse <- pcse2(res.lm,groupN="c_code",groupT="year",pairwise=TRUE)
答案 0 :(得分:3)
正如Ramnath所提到的,pcse package将完成Stata的xtpcse
所做的事。或者,您可以使用plm package.中的vcovBK()
功能。如果您选择后一种选项,请确保使用cluster='time'
选项,这就是Beck&amp; amp; Katz(1995)的文章提出了Stata命令的实现。
pcse
包效果很好,但有一些问题会导致许多直观的用户输入无法接受,尤其是在数据集不平衡的情况下。您可能想尝试重新编写我之前编写的函数。只需加载pcse
包,加载pcse2
函数,然后按照pcse文档中的说明使用它。恕我直言,下面粘贴的功能比pcse
人提供的功能更清晰,更灵活,更强大。简单的基准测试也表明我的版本可能比他们的版本快5到10倍,这对于大数据集来说可能很重要。
library(Matrix)
pcse2 <- function(object, groupN, groupT, pairwise=TRUE){
## Extract basic model info
groupT <- tail(as.character((match.call()$groupT)), 1)
groupN <- tail(as.character((match.call()$groupN)), 1)
dat <- eval(parse(text=object$call$data))
## Sanity checks
if(!"lm" %in% class(object)){stop("Formula object must be of class 'lm'.")}
if(!groupT %in% colnames(dat)){stop(paste(groupT, 'was not found in data', object$call$data))}
if(!groupN %in% colnames(dat)){stop(paste(groupN, 'was not found in data', object$call$data))}
if(anyDuplicated(paste(dat[,groupN], dat[,groupT]))>0){stop(paste('There are duplicate groupN-groupT observations in', object$call$data))}
if(length(dat[is.na(dat[,groupT]),groupT])>0){stop('There are missing unit indices in the data.')}
if(length(dat[is.na(dat[,groupN]),groupN])>0){stop('There are missing time indices in the data.')}
## Expand model frame to include groupT, groupN, resid columns.
f <- as.formula(object$call$formula)
f.expanded <- update.formula(f, paste(". ~ .", groupN, groupT, sep=" + "))
dat.pcse <- model.frame(f.expanded, dat)
dat.pcse$e <- resid(object)
## Extract basic model info (part II)
N <- length(unique(dat.pcse[,groupN]))
T <- length(unique(dat.pcse[,groupT]))
nobs <- nrow(dat.pcse)
is.balanced <- length(resid(object)) == N * T
## If balanced dataset, calculate as in Beck & Katz (1995)
if(is.balanced){
dat.pcse <- dat.pcse[order(dat.pcse[,groupN], dat.pcse[,groupT]),]
X <- model.matrix(f, dat.pcse)
E <- t(matrix(dat.pcse$e, N, T, byrow=TRUE))
Omega <- kronecker((crossprod(E) / T), Matrix(diag(1, T)) )
## If unbalanced and pairwise, calculate as in Franzese (1996)
}else if(pairwise==TRUE){
## Rectangularize
rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
names(rectangle) <- c(groupN, groupT)
rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
valid <- ifelse(is.na(rectangle$e),0,1)
rectangle[is.na(rectangle)] <- 0
X <- model.matrix(f, rectangle)
X[valid==0,1] <- 0
## Calculate pcse
E <- crossprod(t(matrix(rectangle$e, N, T, byrow=TRUE)))
V <- crossprod(t(matrix(valid, N, T, byrow=TRUE)))
if (length(V[V==0]) > 0){stop("Error! A CS-unit exists without any obs or without any obs in a common period with another CS-unit. You must remove that unit from the data passed to pcse().")}
Omega <- kronecker(E/V, Matrix(diag(1, T)))
## If unbalanced and casewise, caluate based on largest rectangular subset of data
}else{
## Rectangularize
rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
names(rectangle) <- c(groupN, groupT)
rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
valid <- ifelse(is.na(rectangle$e),0,1)
rectangle[is.na(rectangle)] <- 0
X <- model.matrix(f, rectangle)
X[valid==0,1] <- 0
## Keep only years for which we have the max number of observations
large.panels <- by(dat.pcse, dat.pcse[,groupT], nrow) # How many valid observations per year?
if(max(large.panels) < N){warning('There is no time period during which all units are observed. Consider using pairwise estimation.')}
T.balanced <- names(large.panels[large.panels==max(large.panels)]) # Which years have max(valid observations)?
T.casewise <- length(T.balanced)
dat.balanced <- dat.pcse[dat.pcse[,groupT] %in% T.balanced,] # Extract biggest rectangular subset
dat.balanced <- dat.balanced[order(dat.balanced[,groupN], dat.balanced[,groupT]),]
e <- dat.balanced$e
## Calculate pcse as in Beck & Katz (1995)
E <- t(matrix(dat.balanced$e, N, T.casewise, byrow=TRUE))
Omega <- kronecker((crossprod(E) / T.casewise), Matrix(diag(1, T)))
}
## Finish evaluation, clean and output
salami <- t(X) %*% Omega %*% X
bread <- solve(crossprod(X))
sandwich <- bread %*% salami %*% bread
colnames(sandwich) <- names(coef(object))
row.names(sandwich) <- names(coef(object))
pcse <- sqrt(diag(sandwich))
b <- coef(object)
tstats <- b/pcse
df <- nobs - ncol(X)
pval <- 2*pt(abs(tstats), df, lower.tail=FALSE)
res <- list(vcov=sandwich, pcse=pcse, b=b, tstats=tstats, df=df, pval=pval, pairwise=pairwise,
nobs=nobs, nmiss=(N*T)-nobs, call=match.call())
class(res) <- "pcse"
return(res)
}
答案 1 :(得分:2)
查看pcse package,它会考虑面板更正的标准错误。您当然必须查看STATA中的文档以确定所做的假设并与pcse进行交叉检查。