无法在R汇总函数中获得正确的系数

时间:2012-02-15 23:36:05

标签: r

我在为R中的数据获取正确的摘要时遇到了困难 这是我到目前为止所做的,但最后一部分是不对的。摘要不是应该的。

目标是使用四个Myers-Briggs量表拟合模型作为pi =经常饮酒的概率的预测因子。有人能指出我正确的方向吗?

> data(MBdrink)
> MBdrink
EI SN TF JP  Drink Count
1   E  S  T  J  Often    10
2   E  S  T  P  Often     8
3   E  S  F  J  Often     5
4   E  S  F  P  Often     7
5   E  S  T  J Rarely    67
6   E  S  T  P Rarely    34
7   E  S  F  J Rarely   101
8   E  S  F  P Rarely    72
9   E  N  T  J  Often     3
10  E  N  T  P  Often     2
11  E  N  F  J  Often     4
12  E  N  F  P  Often    15
13  E  N  T  J Rarely    20
14  E  N  T  P Rarely    16
15  E  N  F  J Rarely    27
16  E  N  F  P Rarely    65
17  I  S  T  J  Often    17
18  I  S  T  P  Often     3
19  I  S  F  J  Often     6
20  I  S  F  P  Often     4
21  I  S  T  J Rarely   123
22  I  S  T  P Rarely    49
23  I  S  F  J Rarely   132
24  I  S  F  P Rarely   102
25  I  N  T  J  Often     1
26  I  N  T  P  Often     5
27  I  N  F  J  Often     1
28  I  N  F  P  Often     6
29  I  N  T  J Rarely    12
30  I  N  T  P Rarely    30
31  I  N  F  J Rarely    30
32  I  N  F  P Rarely    73

> summary(MBdrink)
EI     SN     TF     JP        Drink        Count       
E:16   S:16   T:16   J:16   Rarely:16   Min.   :  1.00  
I:16   N:16   F:16   P:16   Often :16   1st Qu.:  5.00  
                                     Median : 15.50  
                                     Mean   : 32.81  
                                     3rd Qu.: 53.00  
                                     Max.   :132.00





> MBdrink<-transform(MBdrink, EI=as.factor(EI))
> MBdrink<-transform(MBdrink, SN=as.factor(SN))
> MBdrink<-transform(MBdrink, TF=as.factor(TF))
> MBdrink<-transform(MBdrink, JP=as.factor(JP))

> levels(MBdrink$EI)
[1] "E" "I"
> levels(MBdrink$SN)
[1] "S" "N"
> levels(MBdrink$TF)
[1] "T" "F"
> levels(MBdrink$JP)
[1] "J" "P"

> MBdrink.fit<-
+ glm((Count>0)~EI+SN+TF+JP+Drink,family=binomial,data=MBdrink)
> summary(MBdrink.fit)

Call:
glm(formula = (Count > 0) ~ EI + SN + TF + JP + Drink, family = binomial, 
data = MBdrink)

Deviance Residuals: 
  Min         1Q     Median         3Q        Max  
3.971e-06  3.971e-06  3.971e-06  3.971e-06  3.971e-06  

Coefficients:
          Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.557e+01  9.353e+04       0        1
EII         -4.602e-10  7.637e+04       0        1
SNN         -4.602e-10  7.637e+04       0        1
TFF         -4.602e-10  7.637e+04       0        1
JPP         -4.602e-10  7.637e+04       0        1
DrinkOften   4.602e-10  7.637e+04       0        1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 0.0000e+00  on 31  degrees of freedom
Residual deviance: 5.0463e-10  on 26  degrees of freedom
AIC: 12

Number of Fisher Scoring iterations: 24

谢谢!

1 个答案:

答案 0 :(得分:3)

Count>0始终为TRUE:您正在尝试预测常量变量,因此会产生奇怪的结果。

对于逻辑回归,您需要原始数据,而不是聚合数据。如果您想预测Drink列, 它不应该是预测者之一。

# Sample data
n <- 100
MBdrink <- data.frame(
  EI=sample(c("E","I"), n, replace=TRUE),
  SN=sample(c("S","N"), n, replace=TRUE),
  TF=sample(c("T","F"), n, replace=TRUE),
  JP=sample(c("J","P"), n, replace=TRUE),
  Drink=factor( sample(c("Rarely","Often"), n, p=c(.2,.8), replace=TRUE), levels=c("Rarely", "Often")),
  Count=rpois(n,5)
)
library(plyr)
MBdrink <- ddply(MBdrink, c("EI","SN","TF","JP","Drink"), summarize, Count=sum(Count))
# dis-aggregate the data
d <- ddply(MBdrink, "Count", function (u) 
  do.call( rbind, replicate(unique(u$Count), u, simplify=FALSE)))
# Run the regression you want
r <- glm( 
  Drink ~ EI + SN + TF + JP, 
  data=d, 
  family=binomial(link="logit") # Logistic regression
)
result <- cbind(d, Probability=predict(r, type="response"))
result <- unique(result)
result <- result[order(result$Probability),]
result