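I have some observations that I'm using to determine death rates at various concentrations of a chemical. I have weighted these rates by the number of underlying observations and then fitted them with a glm(binomial(link = logit)) model. I've been trying to plot this model in ggplot, including the raw observations (size = weight), the model fit line, and the confidence interval, but with no luck. I can get it to work with a simple plot(), but then I can't show the other elements I need. Any ideas? Thanks in advance!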
library(ggplot2)
library(boot)  # provides inv.logit()
#data:
C <- data.frame("region" = c("r29","r31","r2325","r25","r2526","r26"),
                "conc" = c(755.3189,1689.6680,1781.8450,1902.8830,2052.1133,4248.7832),
                "nr_dead" = c(1,1,18,44,170,27),
                "nr_survived" = c(2,3,29,1370,1910,107),
                "death_rate" = c(0.33333333,0.25000000,0.38297872,0.03111740,0.08173077,0.20149254))
C$tot_obsv <- (C$nr_survived+C$nr_dead)
#glm model:
C_glm <- glm(cbind(nr_dead, nr_survived) ~ conc, data = C, family = "binomial")
#ggplot line is incorrect:
ggplot(C_glm, aes(C$conc,C$death_rate, size = C$tot_obsv)) + coord_cartesian(ylim = c(0, 0.5)) + theme_bw() + geom_point() + geom_smooth(method = "glm", mapping = aes(weight = C$tot_obsv))
#correct plot of inv.logit = logistic function (1/(1+exp(-x)))
plot(inv.logit(-3.797+0.0005751*(0:6700)))
#using predict function works, but doesn't display confidence interval or nice point sizes:
x_conc <- seq(750, 6700, 1)
y_death_rate <- predict.glm(C_glm, list(conc=x_conc), type="response")
plot(C$conc, C$death_rate, pch = 10, lwd = 3, cex = C$tot_obsv/300, ylim = c(0, 0.5), xlim = c(0,7000), xlab = "conc", ylab = "death rate")
lines(x_conc, y_death_rate, col = "red", lwd = 2)
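Basically, I'm trying to use ggplot to draw the logistic curve predicted by the glm, together with the observation weights and a confidence interval, but I can only get the curve to display correctly with plot().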
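The `cbind(nr_dead, nr_survived)` model above and a model with the proportion as the response plus `weights = tot_obsv` are the same fit, which is why a `geom_smooth()` layer given a `weight` aesthetic can reproduce `C_glm`. A minimal base-R check under that assumption, with `death_rate` recomputed exactly from the counts instead of the rounded values (the names `fit_counts` and `fit_props` are illustrative):

```r
# Data from the question; death_rate recomputed exactly from the counts
C <- data.frame(
  region      = c("r29", "r31", "r2325", "r25", "r2526", "r26"),
  conc        = c(755.3189, 1689.6680, 1781.8450, 1902.8830, 2052.1133, 4248.7832),
  nr_dead     = c(1, 1, 18, 44, 170, 27),
  nr_survived = c(2, 3, 29, 1370, 1910, 107)
)
C$tot_obsv   <- C$nr_dead + C$nr_survived
C$death_rate <- C$nr_dead / C$tot_obsv

# Form 1: two-column response (deaths, survivors), as in the question
fit_counts <- glm(cbind(nr_dead, nr_survived) ~ conc, data = C, family = binomial)

# Form 2: observed proportion as the response, prior weights = number of trials
fit_props <- glm(death_rate ~ conc, data = C, family = binomial, weights = tot_obsv)

# Same coefficients and same covariance matrix
all.equal(coef(fit_counts), coef(fit_props))  # expected: TRUE
all.equal(vcov(fit_counts), vcov(fit_props))  # expected: TRUE
```

R's binomial family converts a two-column response into exactly this proportion-plus-weights form internally, so the two calls run the same iterations.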
Answer 0 (score: 1)
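library(dplyr)
library(ggplot2)
tibble(
  x_conc = c(seq(750, 6700, 1), C$conc),
  y_death_rate = predict.glm(C_glm, list(conc = x_conc), type = "response")
) %>%
  left_join(C, by = c('x_conc' = 'conc')) %>%
  ggplot(aes(x = x_conc, y = y_death_rate)) +
  # geom_line(linewidth = 0.8) +  # commented out, as binomial_smooth draws the curve
  geom_point(aes(y = death_rate, size = tot_obsv)) +
  # binomial_smooth() inherits y = y_death_rate, so it refits the predicted curve
  # (glm warns about non-integer #successes), not the weighted observations
  binomial_smooth()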
Of course, we also need to define the function binomial_smooth, taken from https://ggplot2.tidyverse.org/reference/geom_smooth.html:
binomial_smooth <- function(...) {
geom_smooth(method = "glm", method.args = list(family = "binomial"), ...)
}
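For comparison, a sketch that applies the same `binomial_smooth()` helper to the observations themselves rather than to the predicted curve. It assumes ggplot2 is installed and passes `aes(weight = tot_obsv)` so that `geom_smooth()` fits the same weighted logistic model as `C_glm`; the data frame is rebuilt from the question with `death_rate` recomputed from the counts, and the name `p` is illustrative.

```r
library(ggplot2)

# Data from the question; death_rate recomputed from the counts
C <- data.frame(
  conc        = c(755.3189, 1689.6680, 1781.8450, 1902.8830, 2052.1133, 4248.7832),
  nr_dead     = c(1, 1, 18, 44, 170, 27),
  nr_survived = c(2, 3, 29, 1370, 1910, 107)
)
C$tot_obsv   <- C$nr_dead + C$nr_survived
C$death_rate <- C$nr_dead / C$tot_obsv

# Helper from the ggplot2 geom_smooth() reference page
binomial_smooth <- function(...) {
  geom_smooth(method = "glm", method.args = list(family = "binomial"), ...)
}

p <- ggplot(C, aes(x = conc, y = death_rate)) +
  # point size only on the points, so it does not leak into the smooth layer
  geom_point(aes(size = tot_obsv)) +
  # weight = number of trials makes the proportion fit match the cbind() model;
  # the shaded band is geom_smooth()'s standard error band (level = 0.95 by default)
  binomial_smooth(aes(weight = tot_obsv), formula = y ~ x) +
  coord_cartesian(ylim = c(0, 0.5)) +
  labs(x = "conc", y = "death rate") +
  theme_bw()
print(p)
```

By default the curve only spans the observed concentrations; extending it toward 6700 as in the question needs `fullrange = TRUE` together with wider x limits.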
Answer 1 (score: 0)
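You can include the observed values of the regressor in the predict call, and then add a geom_point that uses only the rows where the regressor takes an observed value. After the left_join, the grid rows have NA in death_rate and tot_obsv, so with size = tot_obsv geom_point drops them (with a "Removed rows" warning) and only the values from C are drawn.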
library(dplyr)
library(ggplot2)
tibble(
  x_conc = c(seq(750, 6700, 1), C$conc),
  y_death_rate = predict.glm(C_glm, list(conc = x_conc), type = "response")
) %>%
  left_join(C, by = c('x_conc' = 'conc')) %>%
  ggplot(aes(x = x_conc, y = y_death_rate)) +
  # fixed line width set outside aes(); use size = 0.8 on ggplot2 < 3.4.0
  geom_line(linewidth = 0.8) +
  geom_point(aes(y = death_rate, size = tot_obsv))
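Answer 1 draws only the fitted line, without the confidence interval the question asks for. One common way to add it, sketched here as an assumption rather than either answerer's method: predict on the link (logit) scale with `se.fit = TRUE`, form a Wald interval there, and back-transform the fit and both bounds with `plogis()`, base R's logistic function (equivalent to `inv.logit`). The grid step of 10 and the names `grid`, `pr`, `z` and `p` are illustrative choices.

```r
library(ggplot2)

# Data and model from the question (death_rate recomputed from the counts)
C <- data.frame(
  conc        = c(755.3189, 1689.6680, 1781.8450, 1902.8830, 2052.1133, 4248.7832),
  nr_dead     = c(1, 1, 18, 44, 170, 27),
  nr_survived = c(2, 3, 29, 1370, 1910, 107)
)
C$tot_obsv   <- C$nr_dead + C$nr_survived
C$death_rate <- C$nr_dead / C$tot_obsv
C_glm <- glm(cbind(nr_dead, nr_survived) ~ conc, data = C, family = binomial)

# Prediction grid, extended past the data as in the question
grid <- data.frame(conc = seq(750, 6700, by = 10))

# Predict on the logit scale, where the normal approximation is more reasonable
pr <- predict(C_glm, newdata = grid, type = "link", se.fit = TRUE)
z  <- qnorm(0.975)  # about 1.96 for a 95% interval

# Back-transform the fit and the bounds, which keeps them inside (0, 1)
grid$fit <- plogis(pr$fit)
grid$lwr <- plogis(pr$fit - z * pr$se.fit)
grid$upr <- plogis(pr$fit + z * pr$se.fit)

p <- ggplot() +
  geom_ribbon(data = grid, aes(x = conc, ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(data = grid, aes(x = conc, y = fit), colour = "red") +
  geom_point(data = C, aes(x = conc, y = death_rate, size = tot_obsv)) +
  coord_cartesian(ylim = c(0, 0.5)) +
  labs(x = "conc", y = "death rate") +
  theme_bw()
print(p)
```

Building the interval on the link scale rather than adding ±1.96 standard errors to the response-scale prediction avoids bounds that fall below 0 or above 1.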