Question

我正在尝试根据“sampleSelection”包（选择命令）的输出获取Heckman选择模型的聚类标准错误。

对于复制，我使用的是STATA文档中给出的示例（请参阅第7页和第9页的示例1和2 - http://www.stata.com/manuals13/rheckman.pdf）。

在R中，我从实施例1得到如下结果：

install.packages("readstata13")
library(readstata13)

install.packages("sampleSelection")
library(sampleSelection)

## Read STATA data
dat <- data.table(read.dta13("http://www.stata-press.com/data/r13/womenwk.dta"))

## Summary statistics
summary(dat[,.(age, education, married, children, wage, county)])

## Define indicator whether wage variable is defined
dat[, lfp := !is.na(wage)]

## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age, outcome = wage ~ education + age, data = dat)

## Results Example 1
summary(heckmanML)

## STATA command Example 2: heckman wage educ age, select(married children educ age) vce(cluster county)
## <<stuck here>>

我是如何使用 vce（cluster）选项复制最后一个命令的？我尝试使用multiwayvcov包中的cluster.vcov，但遇到了以下错误：

cluster.vcov(heckmanML, eval(heckmanML$call$data)[,county])
Error in `[<-.data.frame`(`*tmp*`, i, "K", value = numeric(0)) : replacement has length zero

Answer 1

我改编了 Mahmoud Arai 中的代码。我摆弄自由度以匹配 Stata 手册的输出，并替换存储在拟合模型中的方差-协方差矩阵，以便 summary 输出与 Stata 的输出匹配（请参阅下面的第二组输出） .

library(haven)
library(dplyr, warn.conflicts = FALSE)
library(sampleSelection)
#> Loading required package: maxLik
#> Loading required package: miscTools
#> 
#> Please cite the 'maxLik' package as:
#> Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
#> 
#> If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
#> https://r-forge.r-project.org/projects/maxlik/
library(sandwich)

## Read STATA data
dat <- 
  read_stata("http://www.stata-press.com/data/r13/womenwk.dta") %>%
  ## Define indicator whether wage variable is defined
  mutate(lfp = !is.na(wage))

## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age, 
                       outcome = wage ~ education + age, data = dat)

## Results Example 1
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304 
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -2.491015   0.189340 -13.156  < 2e-16 ***
#> married      0.445171   0.067395   6.605 5.07e-11 ***
#> children     0.438707   0.027783  15.791  < 2e-16 ***
#> education    0.055732   0.010735   5.192 2.30e-07 ***
#> age          0.036510   0.004153   8.790  < 2e-16 ***
#> Outcome equation:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.48579    1.07704   0.451    0.652    
#> education    0.98995    0.05326  18.588   <2e-16 ***
#> age          0.21313    0.02060  10.345   <2e-16 ***
#>    Error terms:
#>       Estimate Std. Error t value Pr(>|t|)    
#> sigma  6.00479    0.16572   36.23   <2e-16 ***
#> rho    0.70350    0.05123   13.73   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------

vcovCL <- function(fm, cluster) {
  
  M <- length(unique(cluster))
  N <- length(cluster)
  dfc <- M/(M-1)
  
  u  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum))
  dfc * sandwich(fm, meat=crossprod(u)/N)
}

heckmanML[["vcovAll"]] <- vcovCL(heckmanML, dat$county)
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304 
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -2.491015   0.115330 -21.599  < 2e-16 ***
#> married      0.445171   0.073147   6.086 1.39e-09 ***
#> children     0.438707   0.031239  14.044  < 2e-16 ***
#> education    0.055732   0.011004   5.065 4.47e-07 ***
#> age          0.036510   0.004038   9.042  < 2e-16 ***
#> Outcome equation:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.48579    1.30210   0.373    0.709    
#> education    0.98995    0.06001  16.498   <2e-16 ***
#> age          0.21313    0.02099  10.151   <2e-16 ***
#>    Error terms:
#>       Estimate Std. Error t value Pr(>|t|)    
#> sigma  6.00479    0.15520  38.691   <2e-16 ***
#> rho    0.70350    0.07088   9.925   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------

^{由 reprex package (v2.0.0) 于 2021 年 5 月 8 日创建}

R中Heckman选择模型的聚类标准误差

1 个答案: