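Different output in SAS and R for factorial logistic regression

Time: 2018-10-22 14:55:25

Tags: r sas regression

I am trying to run these factorial logistic regressions in both SAS and R, but I get different results for the model dry = rt * chi_ur. Why?

My data: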

id  dry rt  chi_ur
1   1   0   1
2   0   0   0
3   0   0   0
4   0   0   0
5   0   0   1
6   0   0   0
7   0   0   0
8   0   0   1
9   0   0   0
10  0   0   0
11  0   0   0
12  0   0   0
13  1   0   0
14  0   0   0
15  0   0   1
16  0   0   1
17  0   0   0
18  1   0   0
19  0   0   0
20  0   0   0
21  0   0   1
22  1   1   0
23  0   1   1
24  0   0   1
25  0   0   1
26  1   0   0
27  1   0   0
28  0   0   0
29  1   0   0
30  1   0   0
31  1   0   1
32  1   0   0
33  0   0   0
34  1   0   0
35  0   0   0
36  0   0   1
37  1   0   0
38  1   0   0
39  0   0   1
40  0   1   0
41  0   1   0
42  1   1   0
43  0   1   0
44  0   0   0
45  0   0   0
46  0   0   1
47  0   0   0
48  0   0   1
49  1   0   0
50  0   0   1
51  0   0   0
52  1   0   0
53  1   0   0
54  1   0   0
55  1   0   0
56  0   0   0
57  1   0   0
58  0   0   0
59  1   0   0
60  1   0   0
61  0   0   0
62  0   1   0
63  0   0   0
64  0   0   0
65  1   1   0
66  0   0   0
67  1   0   0
68  1   0   0
69  1   0   0
70  1   0   0
71  1   0   0
72  1   0   0
73  1   0   0
74  1   0   0
75  1   0   0
76  1   0   0
77  0   1   0
78  1   0   0
79  0   1   0
80  0   1   0
81  1   0   0
82  1   0   0
83  1   0   0
84  1   0   0
85  1   0   0
86  0   0   1
87  1   0   0
88  1   0   0
89  1   0   0
90  1   0   1
91  1   0   
92  1   0   
93  0   0   
94  0   1   
95  0   1   
96  0   1   
97  1   0   
98  1   0   

R code:

summary(glm(dry ~ chi_ur, data = en, family = binomial))
summary(glm(dry ~ rt, data = en, family = binomial))
summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))

SAS code:

proc logistic data = en.en1 desc;
class chi_ur ;
model dry = chi_ur / expb;
run;

proc logistic data = en.en1 desc;
class rt ;
model dry = rt / expb;
run;

proc logistic data = en.en1 desc;
class rt chi_ur ;
model dry = rt chi_ur rt*chi_ur/ expb;
run;

My R results:

> summary(glm(dry ~ chi_ur, data = en, family = binomial))

Call:
glm(formula = dry ~ chi_ur, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2601  -1.2601  -0.6231   1.0969   1.8626  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   0.1924     0.2352   0.818   0.4133  
chi_ur       -1.7328     0.6782  -2.555   0.0106 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.59  on 89  degrees of freedom
Residual deviance: 116.37  on 88  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 120.37

Number of Fisher Scoring iterations: 3

> summary(glm(dry ~ rt, data = en, family = binomial))

Call:
glm(formula = dry ~ rt, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2181  -1.2181  -0.6945   1.1372   1.7552  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.09531    0.21847   0.436   0.6626  
rt          -1.39459    0.68700  -2.030   0.0424 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.69  on 97  degrees of freedom
Residual deviance: 130.81  on 96  degrees of freedom
AIC: 134.81

Number of Fisher Scoring iterations: 4

> summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))

Call:
glm(formula = dry ~ rt * chi_ur, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3304  -1.3304  -0.6444   1.0317   1.8297  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)    0.3528     0.2559   1.379  0.16798   
rt            -1.2001     0.7360  -1.631  0.10297   
chi_ur        -1.8192     0.6897  -2.637  0.00835 **
rt:chi_ur    -12.8996  1455.3979  -0.009  0.99293   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.59  on 89  degrees of freedom
Residual deviance: 113.07  on 86  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 121.07

Number of Fisher Scoring iterations: 14

My SAS results:

The SAS System     

The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring     

Number of Observations Read 98 
Number of Observations Used 90    

Response Profile
Ordered Value   dry   Total Frequency
1               1     43
2               0     47

Probability modeled is dry='1'.   

Note: 8 observations were deleted due to missing values for the response or explanatory variables. 

Class Level Information
Class    Value   Design Variables
chi_ur   0        1
         1       -1


Model Convergence Status 
Convergence criterion (GCONV=1E-8) satisfied.        

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         126.589          120.371
SC          129.088          125.371
-2 Log L    124.589          116.371

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 8.2175 1 0.0041 
Score 7.6262 1 0.0058 
Wald 6.5262 1 0.0106    

Type 3 Analysis of Effects
Effect   DF   Wald Chi-Square   Pr > ChiSq
chi_ur   1    6.5262            0.0106

Analysis of Maximum Likelihood Estimates
Parameter    DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
Intercept    1    -0.6740    0.3391           3.9498            0.0469       0.510
chi_ur 0     1     0.8664    0.3391           6.5262            0.0106       2.378

Odds Ratio Estimates
Effect          Point Estimate   95% Wald Confidence Limits
chi_ur 0 vs 1   5.656            1.497    21.372

Association of Predicted Probabilities and Observed Responses
Percent Concordant   27.7   Somers' D   0.228
Percent Discordant   4.9    Gamma       0.700
Percent Tied         67.4   Tau-a       0.115
Pairs                2021   c           0.614
  --------------------------------------------------------------------------------

The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring 

Number of Observations Read 98 
Number of Observations Used 98      

Response Profile
Ordered Value   dry   Total Frequency
1               1     47
2               0     51


Probability modeled is dry='1'.    

Class Level Information
Class   Value   Design Variables
rt      0        1
        1       -1

Model Convergence Status 
Convergence criterion (GCONV=1E-8) satisfied. 

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         137.694          134.806
SC          140.279          139.976
-2 Log L    135.694          130.806

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 4.8871 1 0.0271 
Score 4.6063 1 0.0319 
Wald 4.1208 1 0.0424 

Type 3 Analysis of Effects
Effect   DF   Wald Chi-Square   Pr > ChiSq
rt       1    4.1208            0.0424

Analysis of Maximum Likelihood Estimates
Parameter    DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
Intercept    1    -0.6020    0.3435           3.0712            0.0797       0.548
rt 0         1     0.6973    0.3435           4.1208            0.0424       2.008

Odds Ratio Estimates
Effect      Point Estimate   95% Wald Confidence Limits
rt 0 vs 1   4.033            1.049    15.504


Association of Predicted Probabilities and Observed Responses
Percent Concordant   20.2   Somers' D   0.152
Percent Discordant   5.0    Gamma       0.603
Percent Tied         74.8   Tau-a       0.077
Pairs                2397   c           0.576

--------------------------------------------------------------------------------

The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring 

Number of Observations Read 98 
Number of Observations Used 90 

Response Profile
Ordered Value   dry   Total Frequency
1               1     43
2               0     47

Probability modeled is dry='1'. 

Note: 8 observations were deleted due to missing values for the response or explanatory variables. 

Class Level Information
Class    Value   Design Variables
rt       0        1
         1       -1
chi_ur   0        1
         1       -1

Model Convergence Status 
Quasi-complete separation of data points detected. 

Warning: The maximum likelihood estimate may not exist. 


Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable. 


Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         126.589          121.066
SC          129.088          131.065
-2 Log L    124.589          113.066

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 11.5228 3 0.0092 
Score 10.6138 3 0.0140 
Wald 8.6501 3 0.0343       

Joint Tests
Effect      DF   Wald Chi-Square   Pr > ChiSq
rt          1    0.0007            0.9787
chi_ur      1    0.0009            0.9765
rt*chi_ur   1    0.0005            0.9830

Note: Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization. 

Analysis of Maximum Likelihood Estimates
Parameter       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
Intercept       1    -3.5417    111.8            0.0010            0.9747       0.029
rt 0            1     2.9849    111.8            0.0007            0.9787       19.785
chi_ur 0        1     3.2945    111.8            0.0009            0.9765       26.963
rt*chi_ur 0 0   1    -2.3849    111.8            0.0005            0.9830       0.092

Association of Predicted Probabilities and Observed Responses
Percent Concordant   40.7   Somers' D   0.319
Percent Discordant   8.8    Gamma       0.646
Percent Tied         50.6   Tau-a       0.161
Pairs                2021   c           0.660

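I find it suspicious that the standard errors in the SAS "Analysis of Maximum Likelihood Estimates" table are all identical (111.8)...

Any ideas? How can I fix this? Thanks!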
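
A note on the interaction model: both programs report the same overall fit (-2 Log L of 113.066 in SAS, residual deviance of 113.07 in R), and both show one runaway estimate with a very large standard error (1455 in R, 111.8 in SAS). That pattern is consistent with the quasi-complete separation that SAS warns about. A minimal sketch in R that checks this, assuming the four cell counts below were tallied correctly from the 90 complete rows of the posted data (the `cells` data frame and its column names are illustrative, not part of the original code):

```r
# Cell counts tallied by hand from the 90 rows with non-missing chi_ur.
# Assumption: the tallies are correct; they sum to 90 rows and 43 events,
# which matches the SAS Response Profile for this model.
cells <- data.frame(
  rt     = c(0, 1, 0, 1),
  chi_ur = c(0, 0, 1, 1),
  events = c(37, 3, 3, 0),    # rows with dry == 1
  n      = c(63, 10, 16, 1)   # rows in the cell
)

# Empirical log-odds of dry == 1 in each cell
cells$logit <- with(cells, log(events / (n - events)))
print(cells)

# The rt = 1, chi_ur = 1 cell holds a single row with dry == 0, so its
# log-odds is -Inf and the interaction coefficient has no finite
# maximum likelihood estimate.
```

The three finite log-odds agree with the R coefficients (0.3528, 0.3528 - 1.2001 and 0.3528 - 1.8192), so the two programs appear to differ only in how far the non-estimable parameter drifted before the iterations stopped.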

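1 Answer:

Answer 0: (score: 0)

I suspect this is because you did not specify the PARAM= and REF= options on the CLASS statement in PROC LOGISTIC, so the two programs parameterize the model differently. By default PROC LOGISTIC uses effect coding (the 1 / -1 design variables in your output), whereas R treats the 0/1 predictors as numeric, which is equivalent to reference coding with 0 as the baseline. You also did not tell R which level is the "event"; assuming it models 1, the results should then be similar.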

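/* reference coding with 0 as the baseline level, as in the R fit */
class rt (param=ref ref='0') chi_ur (param=ref ref='0');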
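
To see that the single-predictor fits already agree once the coding is taken into account, here is a minimal sketch in R. It assumes SAS used effect coding, as the Design Variables column in the posted output suggests (0 maps to 1, 1 maps to -1), and converts the R coefficients by hand; the variable names are illustrative.

```r
# R coefficients for dry ~ chi_ur, copied from the posted summary
# (reference coding: chi_ur = 0 is the baseline)
b0_ref <- 0.1924    # (Intercept)
b1_ref <- -1.7328   # chi_ur

# Effect coding uses x = +1 for chi_ur = 0 and x = -1 for chi_ur = 1.
# Matching the log-odds of both groups gives:
#   a + b = b0_ref            (chi_ur = 0)
#   a - b = b0_ref + b1_ref   (chi_ur = 1)
a <- b0_ref + b1_ref / 2   # effect-coded intercept
b <- -b1_ref / 2           # effect-coded coefficient for level 0

effect_coded <- c(Intercept = a, chi_ur_0 = b)
print(round(effect_coded, 4))

# The odds ratio for chi_ur 0 vs 1 is the same under either coding
odds_ratio <- exp(2 * b)
print(odds_ratio)
```

Up to rounding, these values reproduce the SAS estimates (-0.6740 and 0.8664) and the reported odds ratio of 5.656, which suggests that for the main-effect models the two programs fit the same model and only label the parameters differently.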