Question

I'm unable to cluster standard errors in R following the guidance in this post. The `cl` function returns an error:

Error in tapply(x, cluster1, sum) : arguments must have same length

After reading up on `tapply`, I'm still not sure why my cluster argument has the wrong length, or what is causing this error.

Here is a link to the data set I'm using.

https://www.dropbox.com/s/y2od7um9pp4vn0s/Ec%201820%20-%20DD%20Data%20with%20Controls.csv

Here is the R code:

```
# read in data
charter<-read.csv(file.choose())
View(charter)
colnames(charter)

# standardize NAEP scores
charter$naep.standardized <- (charter$naep - mean(charter$naep, na.rm=T))/sd(charter$naep, na.rm=T)

# change NAs in year.passed column to 2014
charter$year.passed[is.na(charter$year.passed)]<-2014

# Add column with indicator for in treatment (passed legislation)
charter$treatment<-ifelse(charter$year.passed<=charter$year,1,0)

# fit model
charter.model<-lm(naep ~ factor(year) + factor(state) + treatment, data = charter)
summary(charter.model)
# account for clustered standard errors by state
cl(dat=charter, fm=charter.model, cluster=charter$state)

# accounting for controls
# charter.model.controls<-lm(naep~factor)   # incomplete: controls not yet added

# clustered standard errors
# ---------

# function that calculates clustered standard errors
# source: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/
cl   <- function(dat, fm, cluster){
  require(sandwich, quietly = TRUE)
  require(lmtest, quietly = TRUE)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  print(K)
  uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
  vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
  coeftest(fm, vcovCL) 
}

# calculate clustered standard errors 
cl(charter, charter.model, charter$state)
```

The inner workings of this function are a bit over my head.

Answer 1

When you run your code, note that observations are missing from the linear model:

```
> summary(charter.model)

Call:
lm(formula = naep ~ factor(year) + factor(state) + treatment, 
    data = charter)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.2420  -1.6740  -0.2024   1.8345  12.3580 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 250.4983     1.2115 206.767  < 2e-16 ***
factor(year)1992              3.7970     0.7198   5.275 2.17e-07 ***
factor(year)1996              7.0436     0.8607   8.183 3.64e-15 ***

[..]

Residual standard error: 3.128 on 404 degrees of freedom
  (759 observations deleted due to missingness)
Multiple R-squared:  0.9337,    Adjusted R-squared:  0.9239 
F-statistic: 94.85 on 60 and 404 DF,  p-value: < 2.2e-16
```

This is what causes the `Error in tapply(x, cluster1, sum) : arguments must have same length` message you are seeing.
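The mismatch is easy to reproduce on a small made-up data set. In the sketch below (the data frame `d` and its columns are hypothetical), `lm()` silently drops the row with a missing outcome, so the matrix of score contributions has one row fewer than the data frame. The scores are computed as residual times regressor, which mirrors what `sandwich::estfun()` returns for an unweighted `lm` fit. Passing the full-length cluster column to `tapply()`, as `cl()` does, then fails with the same message:

```r
# Toy data: 6 rows, 3 clusters, one missing outcome (hypothetical example)
d <- data.frame(
  y = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  x = c(0.5, 1.1, 1.9, 2.4, 3.0, 3.8),
  g = c("a", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

fit <- lm(y ~ x, data = d)        # default na.action = na.omit drops row 3

# Score contributions (residual * regressor), one row per observation used
scores <- residuals(fit) * model.matrix(fit)

n_used    <- nrow(scores)         # observations used in the fit
n_cluster <- length(d$g)          # rows in the original data frame

# Same call pattern as inside cl(): the lengths differ, so tapply() stops
err <- tryCatch(tapply(scores[, 1], d$g, sum),
                error = function(e) conditionMessage(e))
print(c(n_used = n_used, n_cluster = n_cluster))
print(err)
```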

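In `cl(dat=charter, fm=charter.model, cluster=charter$state)`, the cluster variable `charter$state` must have exactly the same length as the number of observations actually used to estimate the regression. Because of the NAs, that number differs from the number of rows in the original data frame: here 404 residual degrees of freedom plus 61 coefficients means 465 observations were used, while 759 rows were dropped.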
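One way to build a cluster vector that lines up with the fitted rows is to drop the same rows that `lm()` dropped. A sketch on toy data (the data frame `d`, the column `g`, and the helper name `cluster_for_fit()` are hypothetical): `na.action(fit)` returns the positions of the omitted rows, or `NULL` when nothing was dropped, so the helper has to handle both cases (indexing with `-NULL` would return an empty vector).

```r
# Toy data with one missing outcome (hypothetical)
d <- data.frame(
  y = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  x = c(0.5, 1.1, 1.9, 2.4, 3.0, 3.8),
  g = c("a", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)
fit <- lm(y ~ x, data = d)            # row 3 is dropped (na.action = na.omit)

# Hypothetical helper: keep only the cluster entries for rows used in the fit.
# Assumes `cluster` is in the same row order as the data passed to lm()
# and that no subset= argument was used.
cluster_for_fit <- function(fit, cluster) {
  omitted <- na.action(fit)           # positions of dropped rows, or NULL
  if (is.null(omitted)) cluster else cluster[-omitted]
}

g_used <- cluster_for_fit(fit, d$g)
print(g_used)                         # "a" "a" "b" "c" "c"
print(length(g_used) == nobs(fit))    # TRUE
```

Applied to the question, this would be `cl(charter, charter.model, cluster_for_fit(charter.model, charter$state))`, which should select the same rows as running `na.omit()` on the model variables.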

To fix this, you can do the following.

First, you are using an old version of Arai's function (`cl`). See Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R for references to both the old and the new version; the latter is called `clx`.

Second, I think Arai's original approach to this function is a bit convoluted and doesn't really follow the standard interface of the `vcov*` functions in `sandwich`. That's why I came up with a slightly modified version of `clx`. I made the code more readable and the interface closer to what you would expect from a `sandwich` `vcov*` function:

```
vcovCL <- function(x, cluster.by, type="sss", dfcw=1){
    # R-codes (www.r-project.org) for computing
    # clustered-standard errors. Mahmood Arai, Jan 26, 2008.

    # Arguments:
    #   x           fitted model (e.g. from lm())
    #   cluster.by  cluster variable, one entry per observation used in the fit
    #   type        small-sample correction; only "sss" (Stata) is supported
    #   dfcw        extra weight for reweighting the var-cov matrix
    #               (e.g. for the within model); 1 means no reweighting
    # You need to install the libraries `sandwich' (and `lmtest' for coeftest)

    require(sandwich)
    cluster <- cluster.by
    M <- length(unique(cluster))          # number of clusters
    N <- length(cluster)                  # number of observations
    stopifnot(N == length(x$residuals))   # catch a misaligned cluster vector
    K <- x$rank
    ## only Stata small-sample correction supported right now 
    ## see plm >= 1.5-4
    stopifnot(type=="sss")  
    if(type=="sss"){
        dfc <- (M/(M-1))*((N-1)/(N-K))
    }
    # sum the score contributions within each cluster
    uj  <- apply(estfun(x), 2, function(y) tapply(y, cluster, sum))
    mycov <- dfc * sandwich(x, meat=crossprod(uj)/N) * dfcw
    return(mycov)
}
```

If you try this function on your data, you will see that it catches this particular problem:

```
> coeftest(charter.model, vcov=function(x) vcovCL(x, charter$state))
 Error: N == length(x$residuals) is not TRUE
```

To avoid this, you can proceed as follows:

```
> charter.x <- na.omit(charter[ , c("state", all.vars(formula(charter.model)))])
> coeftest(charter.model, vcov=function(x) vcovCL(x, charter.x$state))

t test of coefficients:

                                Estimate  Std. Error     t value  Pr(>|t|)    
(Intercept)                   2.5050e+02  9.3781e-01  2.6711e+02 < 2.2e-16 ***
factor(year)1992              3.7970e+00  5.6019e-01  6.7780e+00 4.330e-11 ***
factor(year)1996              7.0436e+00  8.8574e-01  7.9522e+00 1.856e-14 ***
factor(year)2000              8.4313e+00  1.0906e+00  7.7311e+00 8.560e-14 ***
factor(year)2003              1.2392e+01  1.1670e+00  1.0619e+01 < 2.2e-16 ***
factor(year)2005              1.3490e+01  1.1747e+00  1.1484e+01 < 2.2e-16 ***
factor(year)2007              1.6334e+01  1.2469e+00  1.3100e+01 < 2.2e-16 ***
factor(year)2009              1.8118e+01  1.2556e+00  1.4430e+01 < 2.2e-16 ***
factor(year)2011              1.9110e+01  1.3459e+00  1.4199e+01 < 2.2e-16 ***
factor(year)2013              1.9301e+01  1.4896e+00  1.2957e+01 < 2.2e-16 ***
factor(state)Alaska           1.4178e+01  8.7686e-01  1.6169e+01 < 2.2e-16 ***
factor(state)Arizona          8.6313e+00  8.1439e-01  1.0598e+01 < 2.2e-16 ***
factor(state)Arkansas         4.3313e+00  8.1439e-01  5.3185e+00 1.736e-07 ***
factor(state)California       3.1103e+00  9.1619e-01  3.3948e+00 0.0007549 ***
factor(state)Colorado         1.7939e+01  7.9736e-01  2.2498e+01 < 2.2e-16 ***
factor(state)Connecticut      1.8031e+01  8.1439e-01  2.2141e+01 < 2.2e-16 ***
factor(state)D.C.            -1.8369e+01  8.1439e-01 -2.2555e+01 < 2.2e-16 ***
factor(state)Delaware         1.2050e+01  7.9736e-01  1.5113e+01 < 2.2e-16 ***
factor(state)Florida          7.3838e+00  7.9736e-01  9.2602e+00 < 2.2e-16 ***
factor(state)Georgia          6.4313e+00  8.1439e-01  7.8971e+00 2.724e-14 ***
factor(state)Hawaii           3.3313e+00  8.1439e-01  4.0906e+00 5.196e-05 ***
factor(state)Idaho            1.7118e+01  7.8321e-01  2.1857e+01 < 2.2e-16 ***
factor(state)Illinois         1.2670e+01  8.2224e-01  1.5409e+01 < 2.2e-16 ***
factor(state)Indianna         1.7174e+01  6.1079e-01  2.8117e+01 < 2.2e-16 ***
factor(state)Iowa             2.0074e+01  6.8460e-01  2.9322e+01 < 2.2e-16 ***
factor(state)Kansas           2.0123e+01  8.6796e-01  2.3184e+01 < 2.2e-16 ***
factor(state)Kentucky         1.0200e+01  4.1999e-14  2.4287e+14 < 2.2e-16 ***
factor(state)Louisiana       -1.6866e-01  8.1439e-01 -2.0710e-01 0.8360322    
factor(state)Maine            2.0231e+01  1.7564e-01  1.1518e+02 < 2.2e-16 ***
factor(state)Maryland         1.4274e+01  6.1079e-01  2.3369e+01 < 2.2e-16 ***
factor(state)Massachusetts    2.4868e+01  8.3960e-01  2.9619e+01 < 2.2e-16 ***
factor(state)Michigan         1.2031e+01  8.1439e-01  1.4773e+01 < 2.2e-16 ***
factor(state)Minnesota        2.5110e+01  9.1619e-01  2.7407e+01 < 2.2e-16 ***
factor(state)Mississippi     -3.5470e+00  1.7564e-01 -2.0195e+01 < 2.2e-16 ***
factor(state)Missouri         1.3447e+01  7.2706e-01  1.8495e+01 < 2.2e-16 ***
factor(state)Montana          2.2512e+01  8.4814e-01  2.6543e+01 < 2.2e-16 ***
factor(state)Nebraska         1.9600e+01  4.3105e-14  4.5471e+14 < 2.2e-16 ***
factor(state)Nevada           4.9800e+00  8.6796e-01  5.7375e+00 1.887e-08 ***
factor(state)New Hampshire    2.2026e+01  7.6338e-01  2.8853e+01 < 2.2e-16 ***
factor(state)New Jersey       2.0651e+01  7.6338e-01  2.7052e+01 < 2.2e-16 ***
factor(state)New Mexico       1.5313e+00  8.1439e-01  1.8803e+00 0.0607809 .  
factor(state)New York         1.2152e+01  7.1259e-01  1.7054e+01 < 2.2e-16 ***
factor(state)North Carolina   1.2231e+01  8.1439e-01  1.5019e+01 < 2.2e-16 ***
factor(state)North Dakota     2.4278e+01  1.0420e-01  2.3299e+02 < 2.2e-16 ***
factor(state)Ohio             1.7118e+01  7.8321e-01  2.1857e+01 < 2.2e-16 ***
factor(state)Oklahoma         8.4518e+00  7.8321e-01  1.0791e+01 < 2.2e-16 ***
factor(state)Oregon           1.6535e+01  7.3538e-01  2.2486e+01 < 2.2e-16 ***
factor(state)Pennsylvania     1.6651e+01  7.6338e-01  2.1812e+01 < 2.2e-16 ***
factor(state)Rhode Island     9.5313e+00  8.1439e-01  1.1704e+01 < 2.2e-16 ***
factor(state)South Carolina   9.5346e+00  8.3960e-01  1.1356e+01 < 2.2e-16 ***
factor(state)South Dakota     2.1211e+01  3.5103e-01  6.0425e+01 < 2.2e-16 ***
factor(state)Tennessee        4.9148e+00  6.1473e-01  7.9951e+00 1.375e-14 ***
factor(state)Texas            1.4231e+01  8.1439e-01  1.7475e+01 < 2.2e-16 ***
factor(state)Utah             1.5114e+01  7.2706e-01  2.0787e+01 < 2.2e-16 ***
factor(state)Vermont          2.3474e+01  2.0299e-01  1.1564e+02 < 2.2e-16 ***
factor(state)Virginia         1.6252e+01  7.1259e-01  2.2807e+01 < 2.2e-16 ***
factor(state)Washington       1.9073e+01  1.8183e-01  1.0489e+02 < 2.2e-16 ***
factor(state)West Virginia    5.0000e+00  4.2022e-14  1.1899e+14 < 2.2e-16 ***
factor(state)Wisconsin        1.9994e+01  8.2447e-01  2.4251e+01 < 2.2e-16 ***
factor(state)Wyoming          1.8231e+01  8.1439e-01  2.2386e+01 < 2.2e-16 ***
treatment                     1.2108e+00  1.0180e+00  1.1894e+00 0.2349682    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

This isn't pretty, but it does the job. Now `cl` also works and produces the same results as above:

```
cl(dat=charter, fm=charter.model, cluster=charter.x$state)
```

A better approach is to use the `multiwayvcov` package. According to the package's website, it is an improvement on Arai's code:


  Transparent handling of observations dropped due to missingness
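As a rough base-R illustration of that idea (not how `multiwayvcov` is actually implemented), fitting with `na.action = na.exclude` makes `residuals()` and `naresid()` pad their results with `NA` back to the original number of rows, so a full-length cluster vector such as `charter$state` lines up with the score contributions without any manual subsetting. The toy data below is hypothetical.

```r
# Toy data with one missing outcome (hypothetical)
d <- data.frame(
  y = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  x = c(0.5, 1.1, 1.9, 2.4, 3.0, 3.8),
  g = c("a", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# na.exclude drops row 3 for estimation but remembers where it was
fit <- lm(y ~ x, data = d, na.action = na.exclude)

# residuals() and naresid() pad results back to nrow(d) with NA
e <- residuals(fit)
X <- naresid(fit$na.action, model.matrix(fit))
scores <- e * X                       # the padded row is NA

# The full-length cluster vector now lines up; na.rm skips the padded row
U <- rowsum(scores, group = d$g, na.rm = TRUE)
print(length(e) == nrow(d))           # TRUE
print(U)                              # one row of summed scores per cluster
```

Because OLS residuals are orthogonal to the regressors, the columns of `U` sum to (numerically) zero, which is a quick sanity check that no rows were lost or double counted.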


Using the Petersen data with simulated NAs and `cluster.vcov()`:

```
library("lmtest")
library("multiwayvcov")

data(petersen)
set.seed(123)
petersen[ sample(1:5000, 15), 3] <- NA

m1 <- lm(y ~ x, data = petersen)
summary(m1)
## 
## Call:
## lm(formula = y ~ x, data = petersen)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.759 -1.371 -0.018  1.340  8.680 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02793    0.02842   0.983    0.326    
## x            1.03635    0.02865  36.175   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 2.007 on 4983 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared:  0.208,  Adjusted R-squared:  0.2078 
## F-statistic:  1309 on 1 and 4983 DF,  p-value: < 2.2e-16

coeftest(m1, vcov=function(x) cluster.vcov(x, petersen$firmid))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.027932   0.067198  0.4157   0.6777    
## x           1.036354   0.050700 20.4407   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

For other approaches using the `multiwayvcov` package, see:


 Double clustered standard errors for panel data
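For anyone who, like the asker, finds the internals of `cl()`/`vcovCL()` hard to follow: both compute V = dfc * (X'X)^-1 (sum over clusters g of u_g u_g') (X'X)^-1, where u_g is the sum of the score contributions e_i x_i within cluster g and dfc = M/(M-1) * (N-1)/(N-K) is the Stata small-sample correction. The sketch below writes that formula out in base R, without `sandwich`, and accepts the full-length cluster vector by dropping the rows `lm()` dropped. It is an illustration under stated assumptions (unweighted `lm`, default `na.action = na.omit`, no `subset=`, no aliased coefficients, no NAs in the cluster variable); the function name `vcov_cluster()` and the toy data are hypothetical.

```r
# Hypothetical helper: one-way clustered covariance for an unweighted lm fit,
# using the Stata-style small-sample correction dfc = M/(M-1) * (N-1)/(N-K).
vcov_cluster <- function(fit, cluster) {
  omitted <- na.action(fit)                   # rows lm() dropped, or NULL
  if (!is.null(omitted)) cluster <- cluster[-omitted]
  X <- model.matrix(fit)
  e <- residuals(fit)
  stopifnot(length(cluster) == nrow(X))       # same guard as vcovCL()
  N <- nrow(X)
  K <- fit$rank
  M <- length(unique(cluster))
  U <- rowsum(e * X, group = cluster)         # M x K: scores summed per cluster
  bread <- solve(crossprod(X))                # (X'X)^-1
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
  dfc * bread %*% crossprod(U) %*% bread
}

# Toy panel: 8 clusters of 5 with a shared shock per cluster (hypothetical)
set.seed(1)
d <- data.frame(g = rep(letters[1:8], each = 5), x = rnorm(40),
                stringsAsFactors = FALSE)
d$y <- 1 + 2 * d$x + rnorm(8)[match(d$g, letters)] + rnorm(40)
d$y[c(3, 17)] <- NA                           # two missing outcomes

fit <- lm(y ~ x, data = d)
V <- vcov_cluster(fit, d$g)                   # full-length cluster vector
print(cbind(estimate = coef(fit), cluster_se = sqrt(diag(V))))
```

With the question's objects this would be called as `coeftest(charter.model, vcov = vcov_cluster(charter.model, charter$state))`. Since it implements the same formula, it should agree with `cl()` and `vcovCL()` once those are given a correctly aligned cluster vector. When every observation is its own cluster, the correction reduces to N/(N-K) and the estimator to the familiar HC1 covariance.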

Answer 2

For one-way clustering, the `robcov` function from the `{rms}` package works very well. See http://www.inside-r.org/packages/cran/rms/docs/robcov for more information.

Clustered standard errors for data containing NAs
