Question

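Originally, I mainly wanted to run a probit/logit model with clustered standard errors in R, which is quite intuitive in Stata. I came across the answer here: Logistic regression with robust clustered standard errors in R. I therefore compared the results from Stata and R, for both the robust and the clustered standard errors, and noticed that neither kind of standard error is exactly the same across the two programs. However, if I use the method suggested here https://diffuseprior.wordpress.com/2012/06/15/standard-robust-and-clustered-standard-errors-computed-in-r/, I get exactly the same output from R and Stata for linear regression. So I am worried that the R code I wrote is not correct, and I would also like to know which command to use if I want to run a probit model instead of a logit model. Or is there an elegant alternative that solves this? Thanks.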

R code

## 1. linear regression
library(rms) 
model<-lm(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width,iris)
summary(model)
fit=ols(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width, x=T, y=T, data=iris)
fit
robcov(fit) #robust standard error
robcov(fit, cluster=iris$Species) #clustered standard error


## 2. logistic regression
##demo data generation   
set.seed(1234)
subj<-rep(1:20,each=4)
con1<-rep(c(1,0),40)
con2<-rep(c(1,1,0,0),20) 
effect<-rbinom(80,1,0.34)
data<-data.frame(subj,con1,con2,effect)
library(foreign);write.dta(data,'demo_data.dta')

library(rms)
fit=lrm(effect ~ con1 + con2, x=T, y=T, data=data)
fit
robcov(fit)  ##robust standard error
robcov(fit, cluster=data$subj) ## clustered standard error

Stata code

* 1. linear regression
webuse iris
reg seplen sepwid petlen petwid
reg seplen sepwid petlen petwid,r
reg seplen sepwid petlen petwid,cluster(iris)


* 2. logistic regression

use demo_data,clear
logit effect con1 con2
logit effect con1 con2,r
logit effect con1 con2,cluster(subj)

Answer 1

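I prefer the sandwich package for computing robust standard errors. One reason is its excellent documentation. See vignette("sandwich"), which clearly shows all available defaults and options, and the corresponding article, which explains how to use ?sandwich with custom bread and meat for special cases.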

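We can use sandwich to pin down the difference between the options you posted. The difference is most likely the degrees-of-freedom correction. Here is a comparison for the simple linear regression: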

library(rms)
library(sandwich)

fitlm <-lm(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width,iris)

#Your Blog Post:
X <- model.matrix(fitlm)
n <- dim(X)[1]; k <- dim(X)[2]; dfc <- n/(n-k)    
u <- matrix(resid(fitlm))
meat1 <- t(X) %*% diag(diag(crossprod(t(u)))) %*% X
Blog <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat1 %*% solve(crossprod(X))))

# rms fits:
fitols <- ols(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width, x=T, y=T, data=iris)
Harrell <- sqrt(diag(robcov(fitols, method="huber")$var))
Harrell_2 <- sqrt(diag(robcov(fitols, method="efron")$var))

# Variations available in sandwich:    
variations <- c("const", "HC0", "HC1", "HC2","HC3", "HC4", "HC4m", "HC5")
Zeileis <- t(sapply(variations, function(x) sqrt(diag(vcovHC(fitlm, type = x)))))
rbind(Zeileis, Harrell, Harrell_2, Blog)

          (Intercept) Sepal.Width Petal.Length Petal.Width
const       0.2507771  0.06664739   0.05671929   0.1275479
HC0         0.2228915  0.05965267   0.06134461   0.1421440
HC1         0.2259241  0.06046431   0.06217926   0.1440781
HC2         0.2275785  0.06087143   0.06277905   0.1454783
HC3         0.2324199  0.06212735   0.06426019   0.1489170
HC4         0.2323253  0.06196108   0.06430852   0.1488708
HC4m        0.2339698  0.06253635   0.06482791   0.1502751
HC5         0.2274557  0.06077326   0.06279005   0.1454329
Harrell     0.2228915  0.05965267   0.06134461   0.1421440
Harrell_2   0.2324199  0.06212735   0.06426019   0.1489170
Blog        0.2259241  0.06046431   0.06217926   0.1440781

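The results from the blog entry are equivalent to HC1. If the blog entry matches your Stata output, then Stata uses HC1.
Frank Harrell's function yields the same results as HC0. As far as I understand, this was the first proposed solution, and when you look at vignette("sandwich") or the articles mentioned in ?sandwich::vcovHC, the other methods have slightly better properties. They differ in their degrees-of-freedom adjustment. Also note that the call robcov(., method = "efron") matches HC3.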

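In any case, if you want identical output, use HC1 or simply adjust the variance-covariance matrix appropriately. After all, if you look at the differences between the versions in vignette("sandwich"), you will see that you only need to rescale by a constant to go from HC1 to HC0, which should not be too hard. By the way, note that HC3 or HC4 are typically preferred because of their better small-sample properties and their behaviour in the presence of influential observations. So you may want to change the defaults in Stata.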
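To make the rescaling concrete, here is a minimal sketch on the same iris regression. It assumes the usual definition in which HC1 is HC0 multiplied by n / (n - k); the names hc0, hc1 and hc1_from_hc0 are only for illustration.

```r
library(sandwich)

fitlm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

n <- nobs(fitlm)          # number of observations
k <- length(coef(fitlm))  # number of estimated coefficients

hc0 <- vcovHC(fitlm, type = "HC0")
hc1 <- vcovHC(fitlm, type = "HC1")

# HC1 is HC0 multiplied by the degrees-of-freedom factor n / (n - k)
hc1_from_hc0 <- hc0 * n / (n - k)
all.equal(hc1, hc1_from_hc0)

# Going the other way recovers HC0 standard errors from the HC1 matrix
se_hc0_from_hc1 <- sqrt(diag(hc1 * (n - k) / n))
se_hc0_from_hc1
```

The same one-line rescaling can be applied to the matrix returned by robcov() if you would rather stay with rms.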

You can pass these variance-covariance matrices to the appropriate functions, such as lmtest::coeftest or car::linearHypothesis. For example:

library(lmtest)
coeftest(fitlm, vcov=vcovHC(fitlm, "HC1"))

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept)   1.855997   0.225924  8.2151 1.038e-13 ***
Sepal.Width   0.650837   0.060464 10.7640 < 2.2e-16 ***
Petal.Length  0.709132   0.062179 11.4046 < 2.2e-16 ***
Petal.Width  -0.556483   0.144078 -3.8624 0.0001683 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
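For the car::linearHypothesis route, a sketch along the same lines could look as follows. The joint hypothesis tested here (both petal coefficients equal to zero) is only an illustrative choice of mine, not something taken from the question.

```r
library(sandwich)
library(car)

fitlm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# Joint Wald-type test of two coefficients, using the HC1 covariance matrix
# instead of the default OLS covariance matrix
res <- linearHypothesis(
  fitlm,
  c("Petal.Length = 0", "Petal.Width = 0"),
  vcov. = vcovHC(fitlm, type = "HC1")
)
res
```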

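For cluster-robust standard errors, you will have to adjust the meat of the sandwich (see ?sandwich) or look for a function that does it. There are already several sources explaining in excruciating detail how to do it, with appropriate code or functions. I see no reason to reinvent the wheel here, so I skip this.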
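For readers who want to see what "adjusting the meat" amounts to, here is a rough sketch on the demo data from the question, fitted with glm() instead of lrm(). The idea is to sum the score contributions within each cluster before forming the cross-product. The comparison against vcovCL() assumes a version of sandwich recent enough to include that function, and the G / (G - 1) factor at the end is one common small-sample adjustment, not necessarily the one your Stata command applies.

```r
library(sandwich)

# Demo data generated as in the question
set.seed(1234)
subj   <- rep(1:20, each = 4)
con1   <- rep(c(1, 0), 40)
con2   <- rep(c(1, 1, 0, 0), 20)
effect <- rbinom(80, 1, 0.34)
dat    <- data.frame(subj, con1, con2, effect)

fit <- glm(effect ~ con1 + con2, family = binomial(link = "logit"), data = dat)

n <- nobs(fit)
B <- bread(fit)                 # "bread" of the sandwich
scores <- estfun(fit)           # n x k matrix of score contributions

# Clustered "meat": sum the scores within each cluster, then cross-multiply
cluster_scores <- rowsum(scores, group = dat$subj)
meat_cl <- crossprod(cluster_scores) / n

# Sandwich estimator without any small-sample adjustment
vc_manual <- B %*% meat_cl %*% B / n

# Should agree with vcovCL() when its adjustments are switched off
vc_pkg <- vcovCL(fit, cluster = dat$subj, type = "HC0", cadjust = FALSE)
all.equal(vc_manual, vc_pkg, check.attributes = FALSE)

# Optional: rescale by G / (G - 1), with G the number of clusters
G <- length(unique(dat$subj))
se_cluster <- sqrt(diag(vc_manual * G / (G - 1)))
se_cluster
```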

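There is also a relatively new and convenient package that computes cluster-robust standard errors for linear models and generalized linear models. See here.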
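The linked package is not named in this text, so as one option I know exists: newer versions of sandwich provide vcovCL(), which covers the probit part of the question as well. Below is a minimal sketch on the demo data from the question, assuming a glm() fit with a probit link. I have not checked it against Stata, so the degrees-of-freedom factors may still need the kind of rescaling discussed above before the numbers agree exactly.

```r
library(sandwich)
library(lmtest)

# Demo data generated as in the question
set.seed(1234)
subj   <- rep(1:20, each = 4)
con1   <- rep(c(1, 0), 40)
con2   <- rep(c(1, 1, 0, 0), 20)
effect <- rbinom(80, 1, 0.34)
dat    <- data.frame(subj, con1, con2, effect)

# Probit instead of logit: only the link function changes
fit_probit <- glm(effect ~ con1 + con2,
                  family = binomial(link = "probit"), data = dat)

# Heteroskedasticity-robust standard errors
ct_robust <- coeftest(fit_probit, vcov. = vcovHC(fit_probit, type = "HC0"))

# Standard errors clustered on subj (vcovCL defaults for type and cadjust)
ct_cluster <- coeftest(fit_probit, vcov. = vcovCL(fit_probit, cluster = ~ subj))

ct_robust
ct_cluster
```

Replacing link = "probit" with link = "logit" gives the logit counterpart of the same tables.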
