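I am fitting a logistic regression that uses SAT scores to predict a binary outcome; the bivariate correlation is 0.17. Stata and R (aod package) both give a logit coefficient of 0.004, but statsmodels (Python) gives -0.0013 (I have tried both MLE and IRLS). There is no missing data, and the number of observations is exactly the same on all three platforms; the same .csv file is used in each case.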
R:
Call:
glm(formula = df$outcome ~ df$sat, family = "binomial", data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7527  -0.5911  -0.4778  -0.3406   3.0509
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.758e+00  6.274e-02  -123.7   <2e-16 ***
df$sat       4.151e-03  4.351e-05    95.4   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 257024 on 334357 degrees of freedom
Residual deviance: 245878 on 334356 degrees of freedom
AIC: 245882
Number of Fisher Scoring iterations: 5
Stata:
. logit outcome sat
Iteration 0: log likelihood = -128512.03
Iteration 1: log likelihood = -123233.13
Iteration 2: log likelihood = -122939.88
Iteration 3: log likelihood = -122939.1
Iteration 4: log likelihood = -122939.1
Logistic regression                             Number of obs     =    334,358
                                                LR chi2(1)        =   11145.86
                                                Prob > chi2       =     0.0000
Log likelihood =  -122939.1                     Pseudo R2         =     0.0434
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0041509   .0000435    95.40   0.000     .0040656    .0042362
       _cons |   -7.75775   .0627402  -123.65   0.000    -7.880719   -7.634782
statsmodels:
Optimization terminated successfully.
Current function value: 0.399258
Iterations 5
                           Logit Regression Results
==============================================================================
Dep. Variable:                outcome   No. Observations:               334358
Model:                          Logit   Df Residuals:                   334357
Method:                           MLE   Df Model:                            0
Date:                Wed, 15 Jul 2015   Pseudo R-squ.:                -0.03878
Time:                        13:09:47   Log-Likelihood:            -1.3350e+05
converged:                       True   LL-Null:                   -1.2851e+05
                                        LLR p-value:                     1.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
sat           -0.0013   3.69e-06   -363.460      0.000        -0.001    -0.001
==============================================================================
==============================================================================
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                outcome   No. Observations:               334358
Model:                            GLM   Df Residuals:                   334357
Model Family:                Binomial   Df Model:                            0
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -1.3350e+05
Date:                Wed, 15 Jul 2015   Deviance:                   2.6699e+05
Time:                        13:09:48   Pearson chi2:                 3.50e+05
No. Iterations:                     7
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
sat           -0.0013   3.69e-06   -363.460      0.000        -0.001    -0.001
==============================================================================
==============================================================================