I am trying to make this logistic regression plot in ggplot2.
df <- structure(list(y = c(2L, 7L, 776L, 19L, 12L, 26L, 7L, 12L, 8L,
24L, 20L, 16L, 12L, 10L, 23L, 20L, 16L, 12L, 18L, 22L, 23L, 22L,
13L, 7L, 20L, 12L, 13L, 11L, 11L, 14L, 10L, 8L, 10L, 11L, 5L,
5L, 1L, 2L, 1L, 1L, 0L, 0L, 0L), n = c(3L, 7L, 789L, 20L, 14L,
27L, 7L, 13L, 9L, 29L, 22L, 17L, 14L, 11L, 30L, 21L, 19L, 14L,
22L, 29L, 28L, 28L, 19L, 10L, 27L, 22L, 18L, 18L, 14L, 23L, 18L,
12L, 19L, 15L, 13L, 9L, 7L, 3L, 1L, 1L, 1L, 1L, 1L), x = c(18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L,
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 59L,
62L, 63L, 66L)), .Names = c("y", "n", "x"), class = "data.frame", row.names = c(NA,
-43L))
mod.fit <- glm(formula = y/n ~ x, data = df, weights = n, family = binomial(link = logit),
na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = TRUE))
summary(mod.fit)
Pi <- c(0.25, 0.5, 0.75)
LD <- (log(Pi /(1-Pi))-mod.fit$coefficients[1])/mod.fit$coefficients[2]
LD.summary <- data.frame(Pi , LD)
LD.summary
plot(df$x, df$y/df$n, xlab = "x", ylab = "Estimated probability")
lin.pred <- predict(mod.fit)
pi.hat <- exp(lin.pred)/(1 + exp(lin.pred))
lines(df$x, pi.hat, lty = 1, col = "red")
segments(x0 = LD.summary$LD, y0 = -0.1, x1 = LD.summary$LD, y1 = LD.summary$Pi,
lty=2, col=c("darkblue","darkred","darkgreen"))
segments(x0 = 15, y0 = LD.summary$Pi, x1 = LD.summary$LD, y1 = LD.summary$Pi,
lty=2, col=c("darkblue","darkred","darkgreen"))
legend("bottomleft", legend=c("LD25", "LD50", "LD75"), lty=2, col=c("darkblue","darkred","darkgreen"), bty="n", cex=0.75)
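The LD line in the code above inverts the logit link: since logit(p) = b0 + b1*x, the x value that gives probability p is x_p = (logit(p) - b0) / b1. The following is a minimal sketch, assuming the question's data (re-entered as a plain data frame so the block runs on its own) and glm()'s default control settings instead of the question's epsilon/maxit values. It uses base R's qlogis() for the logit and plogis() for its inverse to confirm that the fitted probability at each LD equals its target probability.

```r
# Data from the question, re-entered as a plain data frame
df <- data.frame(
  y = c(2, 7, 776, 19, 12, 26, 7, 12, 8, 24, 20, 16, 12, 10, 23, 20, 16,
        12, 18, 22, 23, 22, 13, 7, 20, 12, 13, 11, 11, 14, 10, 8, 10, 11,
        5, 5, 1, 2, 1, 1, 0, 0, 0),
  n = c(3, 7, 789, 20, 14, 27, 7, 13, 9, 29, 22, 17, 14, 11, 30, 21, 19,
        14, 22, 29, 28, 28, 19, 10, 27, 22, 18, 18, 14, 23, 18, 12, 19, 15,
        13, 9, 7, 3, 1, 1, 1, 1, 1),
  x = c(18:56, 59, 62, 63, 66)
)

# Weighted binomial GLM on the observed proportions (default glm control)
mod.fit <- glm(y / n ~ x, data = df, weights = n,
               family = binomial(link = "logit"))
b <- unname(coef(mod.fit))   # b[1] = intercept, b[2] = slope

Pi <- c(0.25, 0.5, 0.75)
# Invert logit(p) = b0 + b1 * x; qlogis() is the logit function
LD <- (qlogis(Pi) - b[1]) / b[2]

# Check: the fitted probability at each LD equals the target Pi
stopifnot(isTRUE(all.equal(plogis(b[1] + b[2] * LD), Pi)))
print(data.frame(Pi, LD))
```

qlogis(Pi) computes the same quantity as log(Pi/(1-Pi)) in the question; it simply names the logit explicitly.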
Here is my ggplot2 version:
library(ggplot2)
p <- ggplot(data = df, aes(x = x, y = y/n)) +
geom_point() +
stat_smooth(method = "glm", method.args = list(family = binomial))
p <- p + geom_segment(aes(
x = LD.summary$LD
, y = 0
, xend = LD.summary$LD
, yend = LD.summary$Pi
)
, colour="red"
)
p <- p + geom_segment(aes(
x = 0
, y = LD.summary$Pi
, xend = LD.summary$LD
, yend = LD.summary$Pi
)
, colour="red"
)
print(p)
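The predicted values from glm and stat_smooth look different. Do these two approaches produce different results, or am I missing something here? Thanks in advance for your help and time.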
Answer 0 (score: 16)
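Just a few small additions to @mathematical.coffee's answer. Typically, geom_smooth isn't meant to replace actual modeling, which is why it can seem inconvenient when you want to use specific output obtained from glm. But really, all we need to do is add the fitted values to our data frame: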
df$pred <- pi.hat
LD.summary$group <- c('LD25','LD50','LD75')
ggplot(df, aes(x = x, y = y/n)) +
  geom_point() +
  geom_line(aes(y = pred), colour = "black") +
  geom_segment(data = LD.summary, aes(y = Pi,
                                      xend = LD,
                                      yend = Pi,
                                      col = group), x = -Inf, linetype = "dashed") +
  geom_segment(data = LD.summary, aes(x = LD,
                                      xend = LD,
                                      yend = Pi,
                                      col = group), y = -Inf, linetype = "dashed")
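A possible refinement, sketched under the assumption that the same data and weighted model are used: pi.hat is evaluated only at the 43 observed x values, so the black line just joins those points. Predicting on a finer grid with predict(..., type = "response"), which returns probabilities rather than log-odds, gives a smoother curve. The grid data frame and its 200-point resolution are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Data from the question, re-entered as a plain data frame
df <- data.frame(
  y = c(2, 7, 776, 19, 12, 26, 7, 12, 8, 24, 20, 16, 12, 10, 23, 20, 16,
        12, 18, 22, 23, 22, 13, 7, 20, 12, 13, 11, 11, 14, 10, 8, 10, 11,
        5, 5, 1, 2, 1, 1, 0, 0, 0),
  n = c(3, 7, 789, 20, 14, 27, 7, 13, 9, 29, 22, 17, 14, 11, 30, 21, 19,
        14, 22, 29, 28, 28, 19, 10, 27, 22, 18, 18, 14, 23, 18, 12, 19, 15,
        13, 9, 7, 3, 1, 1, 1, 1, 1),
  x = c(18:56, 59, 62, 63, 66)
)

mod.fit <- glm(y / n ~ x, data = df, weights = n, family = binomial)

# Hypothetical fine grid over the observed x range (200 points is arbitrary)
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 200))
# type = "response" returns probabilities (inverse logit of the linear
# predictor), so no manual exp(lin) / (1 + exp(lin)) step is needed
grid$pred <- predict(mod.fit, newdata = grid, type = "response")

p <- ggplot(df, aes(x = x, y = y / n)) +
  geom_point() +
  geom_line(data = grid, aes(x = x, y = pred), colour = "black") +
  labs(y = "Estimated probability")
print(p)
```

The dashed LD segments from the answer can be added to p unchanged, since they take their data from LD.summary.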
One last small trick is using Inf and -Inf to make the dashed lines extend all the way to the edges of the plot.
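The lesson here is that if all you want to do is add a smoother to a plot, and nothing else in the plot depends on it, use geom_smooth. If you want to refer to output from the fitted model, it's usually easier to fit the model outside of ggplot and then plot.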
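As for why the two curves differ at all, a likely cause (offered as an assumption based on how stat_smooth() handles weights, not something the answers state) is weighting. The glm() call uses weights = n, so the x = 20 row with n = 789 dominates the fit, whereas stat_smooth() gives every row weight 1 unless a weight aesthetic is mapped. In recent ggplot2 versions the family also has to be passed through method.args rather than as a bare family argument. The sketch below maps weight = n and then uses layer_data() to compare the smoothed curve with a weighted glm(); the object names ref, smooth, expected and unweighted are hypothetical.

```r
library(ggplot2)

# Data from the question, re-entered as a plain data frame
df <- data.frame(
  y = c(2, 7, 776, 19, 12, 26, 7, 12, 8, 24, 20, 16, 12, 10, 23, 20, 16,
        12, 18, 22, 23, 22, 13, 7, 20, 12, 13, 11, 11, 14, 10, 8, 10, 11,
        5, 5, 1, 2, 1, 1, 0, 0, 0),
  n = c(3, 7, 789, 20, 14, 27, 7, 13, 9, 29, 22, 17, 14, 11, 30, 21, 19,
        14, 22, 29, 28, 28, 19, 10, 27, 22, 18, 18, 14, 23, 18, 12, 19, 15,
        13, 9, 7, 3, 1, 1, 1, 1, 1),
  x = c(18:56, 59, 62, 63, 66)
)

# Reference fit (hypothetical name): weighted binomial GLM, default control
ref <- glm(y / n ~ x, data = df, weights = n, family = binomial)

p <- ggplot(df, aes(x = x, y = y / n)) +
  geom_point() +
  # Mapping weight = n makes stat_smooth pass the weights on to glm()
  geom_smooth(aes(weight = n), method = "glm", formula = y ~ x,
              method.args = list(family = binomial))

# Layer 2 is the smooth; its y column holds the fitted probabilities
smooth <- layer_data(p, 2)
expected <- predict(ref, newdata = data.frame(x = smooth$x), type = "response")
print(max(abs(smooth$y - expected)))

# For contrast: the same model with every row weighted equally
# (binomial on non-integer "successes" warns, hence suppressWarnings)
unweighted <- suppressWarnings(glm(y / n ~ x, data = df, family = binomial))
print(rbind(weighted = coef(ref), unweighted = coef(unweighted)))
```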
Answer 1 (score: 6)
Modify your LD.summary to include a new column containing group (or appropriate labels):
LD.summary$group <- c('LD25','LD50','LD75')
Then modify your geom_segment commands to include col=LD.summary$group (and remove colour="red"), which draws each segment in a different colour and adds a legend:
geom_segment( aes(...,col=LD.summary$group) )
Also, to avoid having to write LD.summary$xxx all the time, pass data=LD.summary to geom_segment:
geom_segment(data=LD.summary, aes(x=0, y=Pi,xend=LD, yend=Pi, colour=group) )
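As for why the graphs aren't exactly the same: in the base R plot the x-axis starts at about 20, whereas in ggplot it starts at zero. This is because your second geom_segment starts at x=0.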
To fix this, you can change x=0 to x=min(df$x).
To get the y-axis label, add + scale_y_continuous('Estimated probability').
To summarize:
LD.summary$group <- c('LD25','LD50','LD75')
p <- ggplot(data = df, aes(x = x, y = y/n)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = binomial)) +  # <-- family goes in method.args
  scale_y_continuous('Estimated probability')   # <-- add y label
p <- p + geom_segment(data = LD.summary, aes(   # <-- data = LD.summary
    x = LD
  , y = 0
  , xend = LD
  , yend = Pi
  , col = group   # <-- colours
  )
)
p <- p + geom_segment(data = LD.summary, aes(   # <-- data = LD.summary
    x = min(df$x)   # <-- don't plot all the way to x = 0
  , y = Pi
  , xend = LD
  , yend = Pi
  , col = group   # <-- colours
  )
)
print(p)