Plotting posterior parameter estimates from multiple models with bayesplot

Time: 2018-10-18 13:54:35

Tags: r ggplot2 rstan rstanarm

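I am using the great plotting library bayesplot to visualize the posterior intervals of models I estimated with rstanarm. I want to compare different models graphically by putting the posterior intervals of their coefficients on the same plot.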

For example, imagine that for two different models I have 1000 draws from the posterior of three parameters beta1, beta2, beta3:

# load the plotting library
library(bayesplot)
#> This is bayesplot version 1.6.0
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
library(ggplot2)

# generate fake posterior draws from model 1
fdata <- matrix(rnorm(1000 * 3), ncol = 3)
colnames(fdata) <- c('beta1', 'beta2', 'beta3')

# fake posterior draws from model 2
fdata2 <- matrix(rnorm(1000 * 3, 1, 2), ncol = 3)
colnames(fdata2) <- c('beta1', 'beta2', 'beta3')

bayesplot produces excellent visualizations of the draws from a single model, and since it is ggplot2 under the hood, I can customize the plots as I like:

# a nice plot of model 1
color_scheme_set("orange")
mcmc_intervals(fdata) + theme_minimal() + ggtitle("Model 1")

# a nice plot of model 2
color_scheme_set("blue")
mcmc_intervals(fdata2) + ggtitle("Model 2")

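What I want, though, is to plot both models on the same figure, so that each coefficient gets two intervals and I can tell which interval belongs to which model by mapping color to model. I can't figure out how to do this. Some things that don't work: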

# doesn't work
mcmc_intervals(fdata) + mcmc_intervals(fdata2)
#> Error: Don't know how to add mcmc_intervals(fdata2) to a plot

# appears to pool
mcmc_intervals(list(fdata, fdata2))

Any ideas on how to do this? Or how to do it manually, given the matrices of posterior draws?

Created on 2018-10-18 by the reprex package (v0.2.1)
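For the manual route the question asks about, here is a minimal base-R sketch. It assumes equal-tailed quantile intervals with the median as the point estimate, which I believe matches bayesplot's defaults but have not verified against its source. `interval_summary()` is a hypothetical helper, not a bayesplot function; it returns one row per parameter, with columns named like the `mcmc_intervals_data()` output used in the answers below (`ll`, `l`, `m`, `h`, `hh`).

```r
# Hypothetical helper (not part of bayesplot): summarise a draws matrix
# (iterations x parameters) into one row per parameter.
interval_summary <- function(draws, prob = 0.5, prob_outer = 0.9) {
  # equal-tailed quantile levels: outer low, inner low, median, inner high, outer high
  probs <- c(0.5 - prob_outer / 2, 0.5 - prob / 2, 0.5,
             0.5 + prob / 2, 0.5 + prob_outer / 2)
  # 5 x n_params matrix of quantiles; unname() keeps row names out of the result
  q <- unname(apply(draws, 2, quantile, probs = probs, names = FALSE))
  data.frame(parameter = colnames(draws),
             ll = q[1, ], l = q[2, ], m = q[3, ], h = q[4, ], hh = q[5, ],
             stringsAsFactors = FALSE)
}

# fake posterior draws for two models, as in the question
set.seed(1)
fdata <- matrix(rnorm(1000 * 3), ncol = 3)
fdata2 <- matrix(rnorm(1000 * 3, 1, 2), ncol = 3)
colnames(fdata) <- colnames(fdata2) <- c("beta1", "beta2", "beta3")

# label each model's summary before stacking them into one long table
combined <- rbind(
  data.frame(interval_summary(fdata), model = "Model 1"),
  data.frame(interval_summary(fdata2), model = "Model 2")
)
```

Because `combined` has the `parameter`, `m`, `l`, `h`, `ll`, `hh` and `model` columns, it should drop into the same `ggplot()` call the answers use; pass `prob = 0.8` to `interval_summary()` for 80% inner intervals.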

2 answers:

Answer 0 (score: 1)

I asked this question on the bayesplot GitHub page and got a response there.

Answer 1 (score: 1)

So that the answer is also posted here, I have expanded on the code from the link @Manny T posted (https://github.com/stan-dev/bayesplot/issues/232):

# simulate having posteriors for two different models each with parameters beta[1],..., beta[4]
posterior_1 <- matrix(rnorm(4000), 1000, 4)
posterior_2 <- matrix(rnorm(4000), 1000, 4)
colnames(posterior_1) <- colnames(posterior_2) <- paste0("beta[", 1:4, "]")

# use bayesplot::mcmc_intervals_data() function to get intervals data in format easy to pass to ggplot
library(bayesplot)
combined <- rbind(mcmc_intervals_data(posterior_1), mcmc_intervals_data(posterior_2))
combined$model <- rep(c("Model 1", "Model 2"), each = ncol(posterior_1))

# make the plot using ggplot 
library(ggplot2)
theme_set(bayesplot::theme_default())
pos <- position_nudge(y = ifelse(combined$model == "Model 2", 0, 0.1))
ggplot(combined, aes(x = m, y = parameter, color = model)) + 
  geom_linerange(aes(xmin = l, xmax = h), position = pos, size=2)+
  geom_linerange(aes(xmin = ll, xmax = hh), position = pos)+
  geom_point(position = pos, color="black")
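The labelling above, `rep(..., each = ncol(posterior_1))`, assumes every model has the same number of parameters, and the `ifelse()` nudge only handles two models. A minimal sketch that avoids both assumptions: label each interval table before binding, then give each model a symmetric vertical offset. `stack_models()` and its `spacing` argument are hypothetical, not part of bayesplot or ggplot2.

```r
# Hypothetical helper (not part of bayesplot): stack a *named* list of
# per-model interval tables, e.g. outputs of mcmc_intervals_data(),
# adding a model label and a vertical nudge offset to every row.
stack_models <- function(interval_list, spacing = 0.1) {
  model_names <- names(interval_list)
  stopifnot(!is.null(model_names), all(nzchar(model_names)))
  # attach the label to each table first, so row labels never depend on
  # the models having the same number of parameters
  labelled <- Map(function(df, name) {
    df$model <- name
    df
  }, interval_list, model_names)
  combined <- do.call(rbind, unname(labelled))
  # offsets centred on 0, one per model (e.g. -0.05 and 0.05 for two models)
  k <- length(model_names)
  offsets <- (seq_len(k) - (k + 1) / 2) * spacing
  combined$nudge <- offsets[match(combined$model, model_names)]
  # keep the legend in the order the models were supplied
  combined$model <- factor(combined$model, levels = model_names)
  combined
}
```

With bayesplot loaded, something like `combined <- stack_models(list("Model 1" = mcmc_intervals_data(posterior_1), "Model 2" = mcmc_intervals_data(posterior_2)))` followed by `pos <- position_nudge(y = combined$nudge)` should work with the same `ggplot()` call; this relies on `position_nudge()` accepting a per-row vector, exactly as the answer's `ifelse()` version already does.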


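If, like me, you want 80% and 90% credible intervals (instead of the default 50% inner interval), and perhaps want the coordinates flipped with a dashed line at 0 (no effect in the model estimate), you can do this: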

# use bayesplot::mcmc_intervals_data() function to get intervals data in format easy to pass to ggplot
library(bayesplot)
combined <- rbind(
  mcmc_intervals_data(posterior_1, prob = 0.8, prob_outer = 0.9),
  mcmc_intervals_data(posterior_2, prob = 0.8, prob_outer = 0.9)
)
combined$model <- rep(c("Model 1", "Model 2"), each = ncol(posterior_1))

# make the plot using ggplot 
library(ggplot2)
theme_set(bayesplot::theme_default())
pos <- position_nudge(y = ifelse(combined$model == "Model 2", 0, 0.1))
ggplot(combined, aes(x = m, y = parameter, color = model)) + 
  geom_linerange(aes(xmin = l, xmax = h), position = pos, size=2)+
  geom_linerange(aes(xmin = ll, xmax = hh), position = pos)+
  geom_point(position = pos, color="black")+
  coord_flip()+
  geom_vline(xintercept=0,linetype="dashed")


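A few things to note about this last bit. I added prob_outer = 0.9 even though that is the default, just to show how to change the outer credible interval. The dashed line is drawn with geom_vline(xintercept = 0) rather than geom_hline(yintercept = 0) even though it appears horizontal: coord_flip() swaps the axes only when drawing, so xintercept still refers to the x aesthetic, m. For the same reason, geom_vline(xintercept = 0) also marks zero if you don't flip the axes; there it is simply drawn as a vertical line.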
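For reference, here is how the `prob` and `prob_outer` coverages translate into interval endpoints, assuming the intervals are equal-tailed and built from posterior quantiles (my understanding of how `mcmc_intervals_data()` works). Here q_a denotes the a-quantile of a parameter's posterior draws:

```latex
\text{interval with coverage } p:\quad \bigl[\, q_{(1-p)/2},\; q_{(1+p)/2} \,\bigr]

p = 0.5 \Rightarrow [q_{0.25},\, q_{0.75}], \qquad
p = 0.8 \Rightarrow [q_{0.10},\, q_{0.90}], \qquad
p = 0.9 \Rightarrow [q_{0.05},\, q_{0.95}]
```

So with prob = 0.8 and prob_outer = 0.9, the thick segment (l to h) spans the 10th to 90th percentiles and the thin segment (ll to hh) the 5th to 95th.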