Standard error of a ratio of predicted probabilities

Time: 2017-07-20 09:17:52

Tags: r statistics prediction glm

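I have a dataset with a continuous independent variable X and a categorical dependent variable Y with three categories (a, b, and c). I have fit a multinomial logit model to the data. In addition to the predicted probability of each outcome as a function of X, I am interested in the probability of c conditional on b or c (i.e., not a).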

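I can easily extract the predicted probabilities and their standard errors from the model. Dividing the predicted probability of c by the sum of the predicted probabilities of b and c is also straightforward. But I can't figure out how to compute the standard error of that last quantity.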
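To pin down the target quantity, here is a sketch of it in symbols, together with the generic first-order (delta-method) variance of a ratio of this form. The symbols $p_b$, $p_c$, and $r$ are notation introduced here for illustration. The point of writing it out is that the variance involves the covariance between the two predicted probabilities, which the per-category standard errors alone do not provide.

```latex
r(x) = \Pr(Y = c \mid Y \in \{b, c\},\, x) = \frac{p_c(x)}{p_b(x) + p_c(x)}

\frac{\partial r}{\partial p_c} = \frac{p_b}{(p_b + p_c)^2},
\qquad
\frac{\partial r}{\partial p_b} = -\frac{p_c}{(p_b + p_c)^2}

\operatorname{Var}(\hat r) \approx
\frac{\hat p_b^{\,2}\,\operatorname{Var}(\hat p_c)
      + \hat p_c^{\,2}\,\operatorname{Var}(\hat p_b)
      - 2\,\hat p_b\,\hat p_c\,\operatorname{Cov}(\hat p_b, \hat p_c)}
     {(\hat p_b + \hat p_c)^4}
```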

Here is a MWE:

# load packages (library() stops with an error if a package is missing)
library(ggplot2)
library(nnet)
library(effects)
library(dplyr)
library(tidyr)

# simulate data with categorical dependent variable
simdat<-data.frame(x=1:10,
                   y=c("a","a","a","b","a","b","c","b","a","c"))

# fit multinomial logit model
mm1<-multinom(y~x,data=simdat)

# get predicted probabilities of each outcome across the range of x
preds<-effect("x",mm1,xlevels=list(x=1:10),se=T) 

# collect predicted probs and prediction se's in data frame
predsdf<-gather(as_tibble(preds$prob),"probcat","prob") %>% 
  bind_cols(gather(as_tibble(preds$se.prob),"secat","se")) %>% 
  mutate(x=rep(1:10,3))

# calculate probability of c conditional on b or c (i.e., not a)
predsdf<-predsdf %>% 
  bind_rows(.,tibble(probcat="prob.c / (prob.b + prob.c)",
                         prob=.$prob[21:30]/(.$prob[11:20]+.$prob[21:30]),
                         secat="prob.c / (prob.b + prob.c)",
                         se=NA, # how to calculate this standard error?
                         x=1:10))

# plot predicted probs and se's across range of x
ggplot(predsdf,aes(x=x,y=prob,ymin=prob-1*se,ymax=prob+1*se,group=probcat,color=probcat)) +
  geom_line() +
  geom_ribbon(alpha=.2) +
  theme_bw()

The last line produces this plot:

[Figure: predicted probabilities of each category]

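What I am looking for is the standard error of the purple line in the plot. Intuitively, it should be larger than the standard errors of the predictions for b and c, but I can't figure out how to compute it.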
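One route that may work, offered as a sketch under assumptions rather than a verified answer: in a multinomial logit with baseline a, prob.c / (prob.b + prob.c) reduces to the inverse logit of the difference between the c and b linear predictors, so the delta method can be applied to that difference using the coefficient covariance from vcov(). The names B, V, Vd, se_d, and se_p are introduced here for illustration. The model is refit so the block runs on its own, and the code assumes vcov() labels the coefficients "b:(Intercept)", "b:x", "c:(Intercept)", "c:x".

```r
library(nnet)

# same simulated data as in the MWE (y made an explicit factor here)
simdat <- data.frame(x = 1:10,
                     y = factor(c("a","a","a","b","a","b","c","b","a","c")))
mm1 <- multinom(y ~ x, data = simdat, Hess = TRUE, trace = FALSE)

B <- coef(mm1)   # 2 x 2 matrix: rows "b" and "c", columns "(Intercept)" and "x"
V <- vcov(mm1)   # covariance matrix of all four coefficients

# assumed vcov() labels for the b and c coefficient blocks
nb <- paste0("b:", colnames(B))
nc <- paste0("c:", colnames(B))

xs <- 1:10
X  <- cbind(1, xs)   # design matrix: intercept and x

# difference of linear predictors, eta_c - eta_b, at each x
d <- as.vector(X %*% (B["c", ] - B["b", ]))

# covariance matrix of (beta_c - beta_b)
Vd <- V[nc, nc] + V[nb, nb] - V[nc, nb] - V[nb, nc]

se_d <- sqrt(rowSums((X %*% Vd) * X))   # SE of the difference at each x
p    <- plogis(d)                       # P(c | b or c)
se_p <- p * (1 - p) * se_d              # delta-method SE on the probability scale
```

If this holds up, se_p could replace the NA in the se column for the "prob.c / (prob.b + prob.c)" rows. With only ten observations the delta-method approximation is likely to be rough, so building an interval on the logit scale as plogis(d ± z * se_d) may behave better near 0 and 1.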

0 answers:

No answers