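I'm having trouble getting the right summary for my data in R. This is what I have done so far, but the last part is wrong: the summary is not what it should be.
The goal is to fit a model using the four Myers-Briggs scales as predictors of pi = the probability of drinking often. Can someone point me in the right direction?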
> data(MBdrink)
> MBdrink
EI SN TF JP Drink Count
1 E S T J Often 10
2 E S T P Often 8
3 E S F J Often 5
4 E S F P Often 7
5 E S T J Rarely 67
6 E S T P Rarely 34
7 E S F J Rarely 101
8 E S F P Rarely 72
9 E N T J Often 3
10 E N T P Often 2
11 E N F J Often 4
12 E N F P Often 15
13 E N T J Rarely 20
14 E N T P Rarely 16
15 E N F J Rarely 27
16 E N F P Rarely 65
17 I S T J Often 17
18 I S T P Often 3
19 I S F J Often 6
20 I S F P Often 4
21 I S T J Rarely 123
22 I S T P Rarely 49
23 I S F J Rarely 132
24 I S F P Rarely 102
25 I N T J Often 1
26 I N T P Often 5
27 I N F J Often 1
28 I N F P Often 6
29 I N T J Rarely 12
30 I N T P Rarely 30
31 I N F J Rarely 30
32 I N F P Rarely 73
> summary(MBdrink)
EI SN TF JP Drink Count
E:16 S:16 T:16 J:16 Rarely:16 Min. : 1.00
I:16 N:16 F:16 P:16 Often :16 1st Qu.: 5.00
Median : 15.50
Mean : 32.81
3rd Qu.: 53.00
Max. :132.00
> MBdrink<-transform(MBdrink, EI=as.factor(EI))
> MBdrink<-transform(MBdrink, SN=as.factor(SN))
> MBdrink<-transform(MBdrink, TF=as.factor(TF))
> MBdrink<-transform(MBdrink, JP=as.factor(JP))
> levels(MBdrink$EI)
[1] "E" "I"
> levels(MBdrink$SN)
[1] "S" "N"
> levels(MBdrink$TF)
[1] "T" "F"
> levels(MBdrink$JP)
[1] "J" "P"
> MBdrink.fit<-
+ glm((Count>0)~EI+SN+TF+JP+Drink,family=binomial,data=MBdrink)
> summary(MBdrink.fit)
Call:
glm(formula = (Count > 0) ~ EI + SN + TF + JP + Drink, family = binomial,
data = MBdrink)
Deviance Residuals:
Min 1Q Median 3Q Max
3.971e-06 3.971e-06 3.971e-06 3.971e-06 3.971e-06
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.557e+01 9.353e+04 0 1
EII -4.602e-10 7.637e+04 0 1
SNN -4.602e-10 7.637e+04 0 1
TFF -4.602e-10 7.637e+04 0 1
JPP -4.602e-10 7.637e+04 0 1
DrinkOften 4.602e-10 7.637e+04 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000e+00 on 31 degrees of freedom
Residual deviance: 5.0463e-10 on 26 degrees of freedom
AIC: 12
Number of Fisher Scoring iterations: 24
Thanks!
Answer 0 (score: 3)
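Count>0 is always TRUE: you are trying to predict a constant variable, hence the strange results.
For a logistic regression specified like this, you need the raw data (one row per person), not the aggregated counts, unless the counts are supplied as weights. If you want to predict the Drink column, it should not be one of the predictors.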
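As a rough sketch of both points, the table from the question can be rebuilt by hand and fitted with Drink as the response. The names `mb`, `response_is_constant` and `fit_w` are made up for this illustration, and passing the counts through `weights` is one possible route offered here, not something the question itself tried.

```r
# The 32 rows of MBdrink from the question, rebuilt by hand.
# expand.grid() varies its first argument fastest, which matches the
# row order printed in the question (JP, then TF, Drink, SN, EI).
mb <- expand.grid(
  JP = c("J", "P"), TF = c("T", "F"), Drink = c("Often", "Rarely"),
  SN = c("S", "N"), EI = c("E", "I")
)
mb$Count <- c(10, 8, 5, 7, 67, 34, 101, 72,
              3, 2, 4, 15, 20, 16, 27, 65,
              17, 3, 6, 4, 123, 49, 132, 102,
              1, 5, 1, 6, 12, 30, 30, 73)

# Put "Rarely" first so the model describes P(Drink == "Often")
mb$Drink <- factor(mb$Drink, levels = c("Rarely", "Often"))

# Every cell count is positive, so (Count > 0) is TRUE on all 32 rows:
# the original response had no variation to explain.
response_is_constant <- all(mb$Count > 0)

# Drink is the response; each table row counts Count times via `weights`
fit_w <- glm(Drink ~ EI + SN + TF + JP, family = binomial,
             data = mb, weights = Count)
summary(fit_w)
```

With integer weights like these, the coefficients should agree with a fit on the fully expanded data. The residual degrees of freedom in this summary count table rows rather than people, so read them with that in mind.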
# Sample data (simulated; set.seed only makes the example reproducible)
set.seed(1)
n <- 100
MBdrink <- data.frame(
  EI=sample(c("E","I"), n, replace=TRUE),
  SN=sample(c("S","N"), n, replace=TRUE),
  TF=sample(c("T","F"), n, replace=TRUE),
  JP=sample(c("J","P"), n, replace=TRUE),
  Drink=factor( sample(c("Rarely","Often"), n, prob=c(.2,.8), replace=TRUE), levels=c("Rarely", "Often")),
  Count=rpois(n,5)
)
library(plyr)
# Aggregate to one row per combination, like the table in the question
MBdrink <- ddply(MBdrink, c("EI","SN","TF","JP","Drink"), summarize, Count=sum(Count))
# dis-aggregate the data: repeat each row Count times
d <- ddply(MBdrink, "Count", function (u)
  do.call( rbind, replicate(unique(u$Count), u, simplify=FALSE)))
# Run the regression you want: Drink is the response, not a predictor
r <- glm(
  Drink ~ EI + SN + TF + JP,
  data=d,
  family=binomial(link="logit") # Logistic regression; models P(Drink == "Often")
)
# Fitted probability of "Often" for each row, de-duplicated and sorted
result <- cbind(d, Probability=predict(r, type="response"))
result <- unique(result)
result <- result[order(result$Probability),]
result
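The dis-aggregation step can also be written in base R by indexing rows with `rep()`. The sketch below uses a small hypothetical table (the counts and the predictor `A` are made up for illustration) to show the idea without plyr.

```r
# Hypothetical aggregated table: made-up counts, for illustration only
agg <- data.frame(
  A     = c("a1", "a1", "a2", "a2"),
  Drink = factor(c("Rarely", "Often", "Rarely", "Often"),
                 levels = c("Rarely", "Often")),
  Count = c(30, 10, 20, 25)
)

# Repeat row i exactly Count[i] times, giving one row per person
long <- agg[rep(seq_len(nrow(agg)), times = agg$Count), c("A", "Drink")]
rownames(long) <- NULL

# Logistic regression on the expanded data; models P(Drink == "Often")
fit <- glm(Drink ~ A, family = binomial, data = long)

# Fitted probability of "Often" for each level of A
p_hat <- predict(fit, newdata = data.frame(A = c("a1", "a2")),
                 type = "response")
p_hat
```

Because this toy model has one parameter per level of `A`, the fitted probabilities should reproduce the observed proportions of "Often" in each group, which makes it a convenient sanity check that the expansion did what was intended.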