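I am trying to replicate Stata output in R. I am using the affairs dataset. I cannot replicate the probit results with robust standard errors.
The Stata code is as follows: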
probit affair male age yrsmarr kids relig educ ratemarr, r
I started with:
probit1 <- glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr,
family = binomial (link = "probit"), data = mydata)
Then I tried various adjustments with the sandwich package, for example:
myProbit <- function(probit1, vcov = sandwich(..., adjust = TRUE)) {
print(coeftest(probit1, vcov = sandwich(probit1, adjust = TRUE)))
}
or (for all types HC0 to HC5):
myProbit <- function(probit1, vcov = sandwich) {
print(coeftest(probit1, vcovHC(probit1, type = "HC0")))
}
or this, as suggested here (do I have to supply something different for object?):
sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(probit1, vcov = sandwich1)
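None of these attempts produced the same standard errors or z values as the Stata output.
Any constructive ideas would be welcome!
Thanks in advance!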
Answer 0 (score: 3)
For anyone thinking about jumping on this bandwagon, here is some code that demonstrates the problem (data here):
clear
set more off
capture ssc install bcuse
capture ssc install rsource
bcuse affairs
saveold affairs, version(12) replace
rsource, terminator(XXX)
library("foreign")
library("lmtest")
library("sandwich")
mydata<-read.dta("affairs.dta")
probit1<-glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr, family = binomial (link = "probit"), data = mydata)
sandwich1 <- function(object,...) sandwich(object) * nobs(object)/(nobs(object) - 1)
coeftest(probit1,vcov = sandwich1)
XXX
probit affair male age yrsmarr kids relig educ ratemarr, robust cformat(%9.6f) nolog
R gives:
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.764157 0.546692 1.3978 0.1621780
male 0.188816 0.133260 1.4169 0.1565119
age -0.024400 0.011423 -2.1361 0.0326725 *
yrsmarr 0.054608 0.019025 2.8703 0.0041014 **
kids 0.208072 0.168222 1.2369 0.2161261
relig -0.186085 0.053968 -3.4480 0.0005647 ***
educ 0.015506 0.026389 0.5876 0.5568012
ratemarr -0.272711 0.053668 -5.0814 3.746e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Stata yields:
Probit regression Number of obs = 601
Wald chi2(7) = 54.93
Prob > chi2 = 0.0000
Log pseudolikelihood = -305.2525 Pseudo R2 = 0.0961
------------------------------------------------------------------------------
| Robust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.131927 1.43 0.152 -0.069755 0.447390
age | -0.024400 0.011124 -2.19 0.028 -0.046202 -0.002597
yrsmarr | 0.054608 0.018963 2.88 0.004 0.017441 0.091775
kids | 0.208075 0.166243 1.25 0.211 -0.117754 0.533905
relig | -0.186085 0.053240 -3.50 0.000 -0.290435 -0.081736
educ | 0.015505 0.026355 0.59 0.556 -0.036150 0.067161
ratemarr | -0.272710 0.053392 -5.11 0.000 -0.377356 -0.168064
_cons | 0.764160 0.534335 1.43 0.153 -0.283117 1.811437
------------------------------------------------------------------------------
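A quick arithmetic check on the two tables above may help rule out one suspect. Both sets of standard errors already include the n/(n-1) factor, and any alternative finite-sample factor would rescale every standard error by the same constant. The base-R sketch below simply copies the reported numbers (the vector names are illustrative) and compares the per-coefficient ratios with sqrt(n/(n-1)):

```r
# Standard errors copied from the R and Stata tables above
se_r <- c(male = 0.133260, age = 0.011423, yrsmarr = 0.019025, kids = 0.168222,
          relig = 0.053968, educ = 0.026389, ratemarr = 0.053668, cons = 0.546692)
se_stata <- c(male = 0.131927, age = 0.011124, yrsmarr = 0.018963, kids = 0.166243,
              relig = 0.053240, educ = 0.026355, ratemarr = 0.053392, cons = 0.534335)

n     <- 601
ratio <- se_r / se_stata     # per-coefficient gap between R and Stata
adj   <- sqrt(n / (n - 1))   # effect of the n/(n-1) factor on a standard error

print(round(ratio, 4))
print(round(adj, 6))
```

The ratios differ from coefficient to coefficient (roughly 0.1% for educ up to about 2.7% for age), while the n/(n-1) factor moves a standard error by less than 0.1%. A single scalar adjustment therefore cannot reconcile the two outputs.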
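Addendum:
The difference in the estimated covariance of the coefficients is due to the different fitting algorithms. In R, the glm command uses iteratively reweighted least squares (IRLS), while Stata's probit uses an ML method based on the Newton-Raphson algorithm. You can match the R results in Stata by using its glm command with the irls option: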
glm affair male age yrsmarr kids relig educ ratemarr, irls family(binomial) link(probit) robust
This produces:
Generalized linear models No. of obs = 601
Optimization : MQL Fisher scoring Residual df = 593
(IRLS EIM) Scale parameter = 1
Deviance = 610.5049916 (1/df) Deviance = 1.029519
Pearson = 619.0405832 (1/df) Pearson = 1.043913
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = invnorm(u) [Probit]
BIC = -3183.862
------------------------------------------------------------------------------
| Semirobust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.133260 1.42 0.157 -0.072367 0.450002
age | -0.024400 0.011422 -2.14 0.033 -0.046787 -0.002012
yrsmarr | 0.054608 0.019025 2.87 0.004 0.017319 0.091897
kids | 0.208075 0.168222 1.24 0.216 -0.121634 0.537785
relig | -0.186085 0.053968 -3.45 0.001 -0.291862 -0.080309
educ | 0.015505 0.026389 0.59 0.557 -0.036216 0.067226
ratemarr | -0.272710 0.053668 -5.08 0.000 -0.377898 -0.167522
_cons | 0.764160 0.546693 1.40 0.162 -0.307338 1.835657
------------------------------------------------------------------------------
These will be close, but not identical. I don't know how to get R to use something like NR without a lot of work.
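One way to see the IRLS-versus-Newton-Raphson point without swapping optimizers: for the sandwich estimator, what changes is the "bread". The Stata glm header above reads "IRLS EIM" (expected information matrix), whereas a Newton-Raphson fit works with the observed Hessian, and for the probit link (which is not canonical) the two differ. The following is a minimal base-R sketch under stated assumptions: it uses simulated data rather than the affairs data, the textbook analytic probit Hessian, and illustrative variable names. I have not verified that it reproduces the Stata numbers on the affairs data.

```r
# Simulated probit data (assumption: NOT the affairs data)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
y  <- as.integer(0.3 + 0.8 * x1 - 0.5 * x2 + rnorm(n) > 0)

fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
X   <- model.matrix(fit)
xb  <- drop(X %*% coef(fit))

# Per-observation probit score factor: lambda_i = q_i * phi(q_i * xb_i) / Phi(q_i * xb_i)
q      <- 2 * y - 1
lambda <- q * dnorm(q * xb) / pnorm(q * xb)

# Meat: sum over observations of the outer product of the scores lambda_i * x_i
meat <- crossprod(X * lambda)

# Bread, version 1: inverse expected information (what vcov() returns for a glm fit)
bread_eim <- vcov(fit)

# Bread, version 2: inverse observed information,
# i.e. the inverse of sum_i lambda_i * (lambda_i + xb_i) * x_i x_i'
bread_oim <- solve(crossprod(X * sqrt(lambda * (lambda + xb))))

adj    <- n / (n - 1)  # same finite-sample factor as in sandwich1 above
se_eim <- sqrt(diag(bread_eim %*% meat %*% bread_eim) * adj)
se_oim <- sqrt(diag(bread_oim %*% meat %*% bread_oim) * adj)

print(round(cbind(se_eim, se_oim), 6))
```

Here se_eim corresponds to what the sandwich1 wrapper computes for a glm fit, and se_oim is the observed-information variant. The two columns are close but not identical, which is consistent with the pattern seen between the R and Stata outputs.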
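Answer 1 (score: 2)
I am using the matrix approach described in detail here (p. 57) to match the R results to Stata. However, I have not yet been able to match the results exactly. I think the difference may come from the scores: the scores in R match those in Stata only to 4 decimal places.
Stata: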
clear all
bcuse affairs
probit affair male age yrsmarr kids relig educ ratemarr
mat var_nr=e(V)
predict double u, score
matrix accum s = male age yrsmarr kids relig educ ratemarr [iweight=u^2*601/600] //n=601,n-1=600
matrix rv = var_nr*s*var_nr
mat diagrv=vecdiag(rv)
matmap diagrv rse,m(sqrt(@)) //install matmap
mat list rse //standard errors
This gives you the same standard errors as:
qui probit affair male age yrsmarr kids relig educ ratemarr,r
rse[1,8]
affair: affair: affair: affair: affair: affair: affair: affair:
male age yrsmarr kids relig educ ratemarr _cons
r1 .13192707 .01112372 .01896336 .16624258 .05324046 .02635524 .05339163 .53433495
R:
library(AER) # Affairs data
data(Affairs)
mydata<-Affairs
mydata$affairs<-with(mydata,ifelse(affairs>0,1,affairs)) # convert to 1 and 0
probit1<-glm(affairs ~ gender+ age + yearsmarried + children + religiousness+education + rating,family = binomial(link = "probit"),data = mydata)
u<-subset(estfun(probit1),select="(Intercept)") #scores: perfectly matches to 4 decimals with Stata: difference may be due to this step
w0<-u%*%t(u)*(601/600) # n/(n-1)
iweight<-matrix(0,nrow=601,ncol=601) #perfectly matches to 4 decimals with Stata
diag(iweight)<-diag(w0)
x<-model.matrix(probit1)
s<-t(x)%*%iweight%*%x #doesn't match with Stata :
rv<-vcov(probit1)%*%s%*%vcov(probit1)
rse<-sqrt(diag(rv)) # standard errors
rse
(Intercept) gendermale age yearsmarried childrenyes religiousness education rating
0.54669177 0.13325951 0.01142258 0.01902537 0.16822161 0.05396841 0.02638902 0.05366828
This matches:
sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(probit1, vcov = sandwich1)
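Conclusion: the difference between the R and Stata results is due to differences in the scores (which match only to 4 decimal places).
Answer 2 (score: 2)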
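As a side note on the R matrix code above: the 601 x 601 outer product and diagonal weight matrix are not needed, because only the squared scores enter the "meat". A minimal base-R sketch, assuming simulated data and illustrative names; the score with respect to the linear predictor is taken as working residual times working weight, which is my understanding of what estfun returns in its intercept column for a binomial glm:

```r
# Simulated probit data (assumption: NOT the affairs data)
set.seed(2)
n  <- 200
x1 <- rnorm(n)
y  <- as.integer(0.2 + 0.7 * x1 + rnorm(n) > 0)

fit <- glm(y ~ x1, family = binomial(link = "probit"))
X   <- model.matrix(fit)

# Per-observation score with respect to the linear predictor
u <- residuals(fit, "working") * weights(fit, "working")

# Route used in the answer: explicit n x n diagonal weight matrix
W      <- diag(u^2 * n / (n - 1))
s_diag <- t(X) %*% W %*% X

# Equivalent route: scale the rows of X by the scores and take the cross-product
s_cross <- crossprod(X * u) * n / (n - 1)

# Robust standard errors from the cross-product version
rv  <- vcov(fit) %*% s_cross %*% vcov(fit)
rse <- sqrt(diag(rv))
print(rse)
```

Both routes produce the same matrix s; the cross-product version just avoids allocating an n x n matrix, which matters once n gets large.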
Adding to this discussion: you can match the original Stata output in R by using sampleSelection::probit for estimation and the sandwich package (I used version 2.5) to compute the robust standard errors. The probit function and its Stata counterpart both use maximum likelihood.
As in the original post, the Stata code is
probit affair male age yrsmarr kids relig educ ratemarr, robust
which gives
------------------------------------------------------------------------------
| Robust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1888175 .1319271 1.43 0.152 -.0697548 .4473898
age | -.0243996 .0111237 -2.19 0.028 -.0462017 -.0025975
yrsmarr | .054608 .0189634 2.88 0.004 .0174405 .0917755
kids | .2080754 .1662426 1.25 0.211 -.117754 .5339049
relig | -.1860854 .0532405 -3.50 0.000 -.2904348 -.081736
educ | .0155052 .0263552 0.59 0.556 -.0361501 .0671605
ratemarr | -.2727101 .0533916 -5.11 0.000 -.3773558 -.1680644
_cons | .76416 .534335 1.43 0.153 -.2831173 1.811437
------------------------------------------------------------------------------
R code that gives the same results is
library(AER) # also attaches lmtest (coeftest) and sandwich (vcovCL)
library(sampleSelection)
data(Affairs)
Affairs$affair = Affairs$affairs > 0
Affairs$male = Affairs$gender == 'male'
reg = probit(affair ~ male + age + yearsmarried + children + religiousness +
education + rating, data=Affairs)
print(coeftest(reg, vcovCL), digits=6)
which gives
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7641600 0.5343350 1.43011 0.1532109
maleTRUE 0.1888175 0.1319271 1.43123 0.1528921
age -0.0243996 0.0111237 -2.19347 0.0286608 *
yearsmarried 0.0546080 0.0189634 2.87966 0.0041248 **
childrenyes 0.2080755 0.1662426 1.25164 0.2111955
religiousness -0.1860854 0.0532405 -3.49519 0.0005091 ***
education 0.0155052 0.0263552 0.58832 0.5565446
rating -0.2727101 0.0533916 -5.10773 4.4012e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
With these functions, both programs compute maximum likelihood probit estimates and both compute robust standard errors. By the way: hats off to the authors of the sandwich package, which (IMO) really cleans up standard error computation in R.