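I am trying to get clustered standard errors for a Heckman selection model estimated with the selection() function from the "sampleSelection" package.
For replication, I am using the examples from the Stata documentation (see Examples 1 and 2 on pages 7 and 9 of http://www.stata.com/manuals13/rheckman.pdf).
In R, I get the following results for Example 1: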
install.packages("readstata13")
library(readstata13)
install.packages(c("sampleSelection", "data.table"))
library(sampleSelection)
library(data.table)  # provides data.table(), .() and := used below
## Read STATA data
dat <- data.table(read.dta13("http://www.stata-press.com/data/r13/womenwk.dta"))
## Summary statistics
summary(dat[,.(age, education, married, children, wage, county)])
## Define indicator whether wage variable is defined
dat[, lfp := !is.na(wage)]
## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age, outcome = wage ~ education + age, data = dat)
## Results Example 1
summary(heckmanML)
## STATA command Example 2: heckman wage educ age, select(married children educ age) vce(cluster county)
## <<stuck here>>
How can I replicate the last command, with the vce(cluster) option? I tried cluster.vcov from the multiwayvcov package, but got the following error:
cluster.vcov(heckmanML, eval(heckmanML$call$data)[,county])
Error in `[<-.data.frame`(`*tmp*`, i, "K", value = numeric(0)) : replacement has length zero
Answer
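I adapted code from Mahmoud Arai. I fiddled with the degrees-of-freedom correction to match the output in the Stata manual, and replaced the variance-covariance matrix stored in the fitted model so that the summary() output matches Stata's (see the second set of output below).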
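Written out, the clustered estimator that the vcovCL() function in the code computes is roughly the following. This is a sketch, assuming that for this fitted object bread() returns N times the model-based covariance $\hat V$ (the inverse negative Hessian) and estfun() returns the N × K matrix of per-observation scores $s_i$, as the maxLik methods appear to do; $M$ is the number of clusters (counties).

$$
\hat V_{\mathrm{cl}} = \frac{M}{M-1}\,\hat V\left(\sum_{g=1}^{M} u_g u_g^{\top}\right)\hat V,
\qquad u_g = \sum_{i \in g} s_i(\hat\theta)
$$

With bread $B = N\hat V$ and meat $A = N^{-1}\sum_g u_g u_g^{\top}$, sandwich() returns $N^{-1} B A B = \hat V\left(\sum_g u_g u_g^{\top}\right)\hat V$, and the dfc term supplies the $M/(M-1)$ factor.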
library(haven)
library(dplyr, warn.conflicts = FALSE)
library(sampleSelection)
#> Loading required package: maxLik
#> Loading required package: miscTools
#>
#> Please cite the 'maxLik' package as:
#> Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
#>
#> If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
#> https://r-forge.r-project.org/projects/maxlik/
library(sandwich)
## Read STATA data
dat <-
read_stata("http://www.stata-press.com/data/r13/womenwk.dta") %>%
## Define indicator whether wage variable is defined
mutate(lfp = !is.na(wage))
## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age,
outcome = wage ~ education + age, data = dat)
## Results Example 1
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.491015 0.189340 -13.156 < 2e-16 ***
#> married 0.445171 0.067395 6.605 5.07e-11 ***
#> children 0.438707 0.027783 15.791 < 2e-16 ***
#> education 0.055732 0.010735 5.192 2.30e-07 ***
#> age 0.036510 0.004153 8.790 < 2e-16 ***
#> Outcome equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.48579 1.07704 0.451 0.652
#> education 0.98995 0.05326 18.588 <2e-16 ***
#> age 0.21313 0.02060 10.345 <2e-16 ***
#> Error terms:
#> Estimate Std. Error t value Pr(>|t|)
#> sigma 6.00479 0.16572 36.23 <2e-16 ***
#> rho 0.70350 0.05123 13.73 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
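## Cluster-robust vcov; note that defining this masks sandwich::vcovCL()
vcovCL <- function(fm, cluster) {
  M <- length(unique(cluster))   # number of clusters
  N <- length(cluster)           # number of observations
  dfc <- M/(M-1)                 # small-sample factor that reproduces the Stata output here
  u <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))  # scores summed within cluster
  dfc * sandwich(fm, meat. = crossprod(u)/N)
}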
heckmanML[["vcovAll"]] <- vcovCL(heckmanML, dat$county)
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.491015 0.115330 -21.599 < 2e-16 ***
#> married 0.445171 0.073147 6.086 1.39e-09 ***
#> children 0.438707 0.031239 14.044 < 2e-16 ***
#> education 0.055732 0.011004 5.065 4.47e-07 ***
#> age 0.036510 0.004038 9.042 < 2e-16 ***
#> Outcome equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.48579 1.30210 0.373 0.709
#> education 0.98995 0.06001 16.498 <2e-16 ***
#> age 0.21313 0.02099 10.151 <2e-16 ***
#> Error terms:
#> Estimate Std. Error t value Pr(>|t|)
#> sigma 6.00479 0.15520 38.691 <2e-16 ***
#> rho 0.70350 0.07088 9.925 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
Created on 2021-05-08 by the reprex package (v2.0.0)
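To see the same recipe on a model where the bread and the scores are explicit, here is a minimal sketch for OLS on simulated data. The data, the cluster_vcov() helper and all variable names are hypothetical and not part of the answer. It applies only the M/(M-1) factor from the answer's code (not the extra (N-1)/(N-K) factor Stata uses after regress), checks that the answer's apply()/tapply() step gives the same per-cluster sums as base R's rowsum(), and checks that with one observation per cluster the estimator reduces to HC0 scaled by N/(N-1).

```r
## Toy data: 10 clusters of 5 observations each (simulated, for illustration only)
set.seed(1)
n  <- 50
cl <- rep(1:10, each = 5)
x  <- rnorm(n)
y  <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

## Hypothetical helper: cluster-robust sandwich for OLS with the M/(M-1) factor
cluster_vcov <- function(X, e, cluster) {
  M <- length(unique(cluster))   # number of clusters
  scores <- X * e                # row i holds x_i * e_i (OLS score up to a constant)
  u <- rowsum(scores, cluster)   # scores summed within each cluster
  bread <- solve(crossprod(X))   # (X'X)^{-1}
  M / (M - 1) * bread %*% crossprod(u) %*% bread
}

X <- model.matrix(fit)
e <- residuals(fit)
V_cl <- cluster_vcov(X, e, cl)
sqrt(diag(V_cl))                 # clustered standard errors

## The answer's apply()/tapply() step gives the same per-cluster sums as rowsum()
u_tapply <- apply(X * e, 2, function(s) tapply(s, cl, sum))
u_rowsum <- rowsum(X * e, cl)

## With one observation per cluster the estimator collapses to HC0 times N/(N-1)
V_single <- cluster_vcov(X, e, seq_len(n))
bread <- solve(crossprod(X))
V_hc0 <- bread %*% crossprod(X * e) %*% bread
```

rowsum() is used in the helper because it sums all score columns in one call; the tapply() line is there only to show that it matches the answer's construction.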