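I'm trying to replicate some Stata results in R, and I'm having trouble explaining part of the output. As a reproducible example, I use the API data from the survey package.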
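data(api, package = 'survey')
# keep the outcome, the predictors, the cluster variable and the sampling weights
df <- apistrat[, c('api00', 'ell', 'meals', 'mobility', 'cname', 'pw')]
# collapse the counties into two groups, so there are only 2 clusters
df$cname <- ifelse(df$cname %in% c('Fresno', 'Santa Clara', 'San Bernadino'),
                   'Group1', 'Group2')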
In R:
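library(lfe)  # provides felm()
# formula parts: covariates | fixed effects | IV (none) | cluster variable
y <- felm(api00 ~ ell + meals + mobility|cname|0|cname, data=df, weights=df$pw)
summary(y)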
Call:
felm(formula = api00 ~ ell + meals + mobility|cname|0|cname, data=df, weights=df$pw)
Weighted Residuals:
Min 1Q Median 3Q Max
-1076.96 -317.47 -87.58 217.20 1164.29
Coefficients:
Estimate Cluster s.e. t value Pr(>|t|)
ell -0.5139 0.6136 -0.837 0.4033
meals -3.1483 0.3341 -9.424 <2e-16 ***
mobility 0.2347 0.1071 2.192 0.0296 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 404 on 195 degrees of freedom
Multiple R-squared(full model): 0.6601 Adjusted R-squared: 0.6531
Multiple R-squared(proj model): 0.6496 Adjusted R-squared: 0.6425
F-statistic(full model, *iid*):94.68 on 4 and 195 DF, p-value: < 2.2e-16
F-statistic(proj model): 0.2338 on 3 and 1 DF, p-value: 0.8695
In Stata:
areg api00 ell meals mobility [pw=pw], absorb(cname) vce(cl cname)
Linear regression, absorbing indicators Number of obs = 200
Absorbed variable: cname No. of categories = 2
F( 1, 1) = .
Prob > F = .
R-squared = 0.6601
Adj R-squared = 0.6531
Root MSE = 72.2192
(Std. Err. adjusted for 2 clusters in cname)
------------------------------------------------------------------------------
| Robust
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ell | -.513901 .6136284 -0.84 0.556 -8.310789 7.282987
meals | -3.148314 .3340804 -9.42 0.067 -7.393208 1.09658
mobility | .2346743 .1070769 2.19 0.273 -1.125866 1.595215
_cons | 821.8216 .237367 3462.24 0.000 818.8056 824.8376
------------------------------------------------------------------------------
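R and Stata return the same estimates, clustered SEs and t values, but different p-values. Specifically, according to R, meals and mobility are significant predictors, but according to Stata they are not.
Can anyone explain what causes the difference?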
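One quick check in R, assuming the gap comes only from the degrees of freedom of the t reference distribution: recompute two-sided p-values from the shared t values, once with the residual df that felm prints (195) and once with G - 1 df, where G = 2 is the number of clusters in cname. The G - 1 choice is a hypothesis being tested here, not something either output above states.

```r
# t statistics shared by felm and areg: coefficient / clustered SE (Stata's digits)
t_vals <- c(ell      = -0.513901  / 0.6136284,
            meals    = -3.148314  / 0.3340804,
            mobility =  0.2346743 / 0.1070769)

n_clusters <- 2  # cname has only 2 groups after the recoding above

# Two-sided p-values with the residual df that felm reports (195)
p_resid <- 2 * pt(abs(t_vals), df = 195, lower.tail = FALSE)

# Two-sided p-values with G - 1 df (the assumed alternative reference distribution)
p_clust <- 2 * pt(abs(t_vals), df = n_clusters - 1, lower.tail = FALSE)

print(round(cbind(t = t_vals, p_df195 = p_resid, p_dfG1 = p_clust), 4))
```

If the G - 1 column lines up with Stata's P>|t| column and the df = 195 column with felm's Pr(>|t|), then the estimates and SEs agree and only the reference distribution used for the p-values differs.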