I am trying to run these factorial logistic regressions in SAS and R, but I get different results for dry = rt * chi_ur. Why?
My data:
id dry rt chi_ur
1 1 0 1
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 1
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 1 0 0
14 0 0 0
15 0 0 1
16 0 0 1
17 0 0 0
18 1 0 0
19 0 0 0
20 0 0 0
21 0 0 1
22 1 1 0
23 0 1 1
24 0 0 1
25 0 0 1
26 1 0 0
27 1 0 0
28 0 0 0
29 1 0 0
30 1 0 0
31 1 0 1
32 1 0 0
33 0 0 0
34 1 0 0
35 0 0 0
36 0 0 1
37 1 0 0
38 1 0 0
39 0 0 1
40 0 1 0
41 0 1 0
42 1 1 0
43 0 1 0
44 0 0 0
45 0 0 0
46 0 0 1
47 0 0 0
48 0 0 1
49 1 0 0
50 0 0 1
51 0 0 0
52 1 0 0
53 1 0 0
54 1 0 0
55 1 0 0
56 0 0 0
57 1 0 0
58 0 0 0
59 1 0 0
60 1 0 0
61 0 0 0
62 0 1 0
63 0 0 0
64 0 0 0
65 1 1 0
66 0 0 0
67 1 0 0
68 1 0 0
69 1 0 0
70 1 0 0
71 1 0 0
72 1 0 0
73 1 0 0
74 1 0 0
75 1 0 0
76 1 0 0
77 0 1 0
78 1 0 0
79 0 1 0
80 0 1 0
81 1 0 0
82 1 0 0
83 1 0 0
84 1 0 0
85 1 0 0
86 0 0 1
87 1 0 0
88 1 0 0
89 1 0 0
90 1 0 1
91 1 0
92 1 0
93 0 0
94 0 1
95 0 1
96 0 1
97 1 0
98 1 0
R code:
summary(glm(dry ~ chi_ur, data = en, family = binomial))
summary(glm(dry ~ rt, data = en, family = binomial))
summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))
SAS code:
proc logistic data = en.en1 desc;
class chi_ur ;
model dry = chi_ur / expb;
run;
proc logistic data = en.en1 desc;
class rt ;
model dry = rt / expb;
run;
proc logistic data = en.en1 desc;
class rt chi_ur ;
model dry = rt chi_ur rt*chi_ur/ expb;
run;
My R results:
> summary(glm(dry ~ chi_ur, data = en, family = binomial))
Call:
glm(formula = dry ~ chi_ur, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2601 -1.2601 -0.6231 1.0969 1.8626
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1924 0.2352 0.818 0.4133
chi_ur -1.7328 0.6782 -2.555 0.0106 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.59 on 89 degrees of freedom
Residual deviance: 116.37 on 88 degrees of freedom
(8 observations deleted due to missingness)
AIC: 120.37
Number of Fisher Scoring iterations: 3
> summary(glm(dry ~ rt, data = en, family = binomial))
Call:
glm(formula = dry ~ rt, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2181 -1.2181 -0.6945 1.1372 1.7552
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.09531 0.21847 0.436 0.6626
rt -1.39459 0.68700 -2.030 0.0424 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 135.69 on 97 degrees of freedom
Residual deviance: 130.81 on 96 degrees of freedom
AIC: 134.81
Number of Fisher Scoring iterations: 4
> summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))
Call:
glm(formula = dry ~ rt * chi_ur, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3304 -1.3304 -0.6444 1.0317 1.8297
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3528 0.2559 1.379 0.16798
rt -1.2001 0.7360 -1.631 0.10297
chi_ur -1.8192 0.6897 -2.637 0.00835 **
rt:chi_ur -12.8996 1455.3979 -0.009 0.99293
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.59 on 89 degrees of freedom
Residual deviance: 113.07 on 86 degrees of freedom
(8 observations deleted due to missingness)
AIC: 121.07
Number of Fisher Scoring iterations: 14
My SAS results:
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 90
Response Profile
Ordered Value  dry  Total Frequency
            1    1               43
            2    0               47
Probability modeled is dry='1'.
Note: 8 observations were deleted due to missing values for the response or explanatory variables.
Class Level Information
Class   Value  Design Variables
chi_ur  0       1
        1      -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC               126.589                   120.371
SC                129.088                   125.371
-2 Log L          124.589                   116.371
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 8.2175 1 0.0041
Score 7.6262 1 0.0058
Wald 6.5262 1 0.0106
Type 3 Analysis of Effects
Effect  DF  Wald Chi-Square  Pr > ChiSq
chi_ur   1           6.5262      0.0106
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept   1   -0.6740          0.3391           3.9498      0.0469     0.510
chi_ur 0    1    0.8664          0.3391           6.5262      0.0106     2.378
Odds Ratio Estimates
Effect         Point Estimate  95% Wald Confidence Limits
chi_ur 0 vs 1           5.656  1.497  21.372
Association of Predicted Probabilities and
Observed Responses
Percent Concordant 27.7 Somers' D 0.228
Percent Discordant 4.9 Gamma 0.700
Percent Tied 67.4 Tau-a 0.115
Pairs 2021 c 0.614
--------------------------------------------------------------------------------
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 98
Response Profile
Ordered Value  dry  Total Frequency
            1    1               47
            2    0               51
Probability modeled is dry='1'.
Class Level Information
Class  Value  Design Variables
rt     0       1
       1      -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC               137.694                   134.806
SC                140.279                   139.976
-2 Log L          135.694                   130.806
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 4.8871 1 0.0271
Score 4.6063 1 0.0319
Wald 4.1208 1 0.0424
Type 3 Analysis of Effects
Effect  DF  Wald Chi-Square  Pr > ChiSq
rt       1           4.1208      0.0424
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept   1   -0.6020          0.3435           3.0712      0.0797     0.548
rt 0        1    0.6973          0.3435           4.1208      0.0424     2.008
Odds Ratio Estimates
Effect     Point Estimate  95% Wald Confidence Limits
rt 0 vs 1           4.033  1.049  15.504
Association of Predicted Probabilities and
Observed Responses
Percent Concordant 20.2 Somers' D 0.152
Percent Discordant 5.0 Gamma 0.603
Percent Tied 74.8 Tau-a 0.077
Pairs 2397 c 0.576
--------------------------------------------------------------------------------
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 90
Response Profile
Ordered Value  dry  Total Frequency
            1    1               43
            2    0               47
Probability modeled is dry='1'.
Note: 8 observations were deleted due to missing values for the response or explanatory variables.
Class Level Information
Class   Value  Design Variables
rt      0       1
        1      -1
chi_ur  0       1
        1      -1
Model Convergence Status
Quasi-complete separation of data points detected.
Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC               126.589                   121.066
SC                129.088                   131.065
-2 Log L          124.589                   113.066
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 11.5228 3 0.0092
Score 10.6138 3 0.0140
Wald 8.6501 3 0.0343
Joint Tests
Effect     DF  Wald Chi-Square  Pr > ChiSq
rt          1           0.0007      0.9787
chi_ur      1           0.0009      0.9765
rt*chi_ur   1           0.0005      0.9830
Note: Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.
Analysis of Maximum Likelihood Estimates
Parameter      DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept       1   -3.5417           111.8           0.0010      0.9747     0.029
rt 0            1    2.9849           111.8           0.0007      0.9787    19.785
chi_ur 0        1    3.2945           111.8           0.0009      0.9765    26.963
rt*chi_ur 0 0   1   -2.3849           111.8           0.0005      0.9830     0.092
Association of Predicted Probabilities and
Observed Responses
Percent Concordant 40.7 Somers' D 0.319
Percent Discordant 8.8 Gamma 0.646
Percent Tied 50.6 Tau-a 0.161
Pairs 2021 c 0.660
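I find it suspicious that the standard errors in the SAS Analysis of Maximum Likelihood Estimates table are all identical...
Any ideas? How can I fix this? Thanks!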
Answer 0 (score: 0)
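I suspect this is because you did not specify the parameterization (PARAM=) and REF= options on the CLASS statement in PROC LOGISTIC, so the two programs code the predictors differently. By default PROC LOGISTIC uses effect coding (the 1 / -1 design variables in your Class Level Information tables), whereas R takes your numeric 0/1 predictors as they are, which amounts to reference coding with 0 as the baseline. For a 0/1 response, R's glm models the probability of dry = 1, which matches the DESC option in your SAS code, so once the coding agrees the results should be similar.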
class rt (param=ref ref='0');  /* reference coding with 0 as the baseline, as in R */
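To make the link between the two sets of estimates concrete, here is a minimal R sketch. It is an illustration, not part of your original code: the cell counts are tallied by hand from the 90 complete rows you posted, and the names `cells`, `fit_ref`, `fit_eff` and `chi_f` are made up for the example. It fits the chi_ur model once with R's default 0/1 coding and once with sum-to-zero (effect) contrasts via `contr.sum`, which should mirror the 1 / -1 design variables SAS reports, and then prints the empirical logit of each rt by chi_ur cell.

```r
# Counts of dry == 1 / dry == 0 in each rt x chi_ur cell, tallied by hand
# from the 90 complete rows in the question (assumption: the tallies are right).
cells <- data.frame(
  rt     = c(0, 0, 1, 1),
  chi_ur = c(0, 1, 0, 1),
  events = c(37, 3, 3, 0),   # rows with dry == 1
  nonev  = c(26, 13, 7, 1)   # rows with dry == 0
)

# R default: numeric 0/1 predictor, i.e. reference coding with 0 as baseline.
fit_ref <- glm(cbind(events, nonev) ~ chi_ur, family = binomial, data = cells)

# Effect (sum-to-zero) coding: level 0 -> 1, level 1 -> -1.
cells$chi_f <- factor(cells$chi_ur, levels = c(0, 1))
contrasts(cells$chi_f) <- contr.sum(2)
fit_eff <- glm(cbind(events, nonev) ~ chi_f, family = binomial, data = cells)

b_ref <- unname(coef(fit_ref))  # intercept, slope under 0/1 coding
b_eff <- unname(coef(fit_eff))  # intercept, effect under 1/-1 coding
print(round(b_ref, 4))
print(round(b_eff, 4))

# Same model, different parameterization:
#   intercept_ref = intercept_eff + effect,  slope_ref = -2 * effect
print(all.equal(b_ref[1], b_eff[1] + b_eff[2], tolerance = 1e-6))
print(all.equal(b_ref[2], -2 * b_eff[2], tolerance = 1e-6))

# Empirical logit of each cell; a cell with no events gives -Inf.
cell_logit <- qlogis(cells$events / (cells$events + cells$nonev))
print(cbind(cells[, c("rt", "chi_ur")], logit = round(cell_logit, 4)))
```

If the tallies are right, the effect-coded fit lands on the SAS estimates (-0.6740 and 0.8664) and the 0/1 fit on the R ones (0.1924 and -1.7328), so both programs fit the same model: R's slope is -2 times the SAS effect, and R's intercept is the SAS intercept plus the SAS effect. The last cell (rt = 1, chi_ur = 1) holds a single observation with dry = 0, so its empirical logit is -Inf. That is the quasi-complete separation SAS warns about, and it is why neither program can give a finite interaction estimate. The huge standard errors (1455 in R, 111.8 in SAS) are a symptom of it, while the -2 Log L of about 113.07 is the same in both outputs.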