Question

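Background: McElreath (2016), on pages 158-159 of his Statistical Rethinking book, uses an index variable instead of dummy coding for a 4-category variable called "clade" to predict "kcal.per.g" (linear regression).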
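In ordinary least squares, dummy coding and index coding are two parameterizations of the same model. The base-R sketch below uses made-up data (the toy data frame, its clade labels and its values are hypothetical, not the milk data). It shows that dropping the global intercept with `0 + clade` gives one coefficient per category. This is the non-Bayesian analogue of a[clade_id].

```r
# Hypothetical toy data standing in for the milk data (values are made up)
set.seed(1)
toy <- data.frame(
  clade = factor(rep(c("A", "B", "C", "D"), each = 5)),
  kcal.per.g = c(rnorm(5, 0.55, 0.05), rnorm(5, 0.70, 0.05),
                 rnorm(5, 0.80, 0.05), rnorm(5, 0.50, 0.05))
)

# Index variable: an integer 1..K per factor level, similar in spirit to coerce_index()
toy$clade_id <- as.integer(toy$clade)

# Dummy coding: intercept = first level, other coefficients = differences from it
fit_dummy <- lm(kcal.per.g ~ clade, data = toy)

# Index-style coding: no global intercept, one coefficient per clade
fit_index <- lm(kcal.per.g ~ 0 + clade, data = toy)

# With index-style coding each coefficient is simply that clade's sample mean
group_means <- tapply(toy$kcal.per.g, toy$clade, mean)
print(coef(fit_index))
print(group_means)
```

Both fits describe the same four group means. Only the meaning of the coefficients changes, which is why the index form makes it easy to put the same prior on every category.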

Question: Can the same approach be applied in "rstanarm"? I have provided the data and R code below for a possible demonstration.

library("rethinking") # A GitHub package, not on CRAN
data(milk)
d <- milk
d$clade_id <- coerce_index(d$clade) # Convert the clade factor to an integer index
d$clade_id
#[1] 4 4 4 4 4 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 # index variable

# Model specification (map() fits a quadratic approximation; newer rethinking releases call it quap()):
fit1 <- map(
  alist(
    kcal.per.g ~ dnorm( mu , sigma ) ,
    mu <- a[clade_id] ,
    a[clade_id] ~ dnorm( 0.6 , 10 ) ,
    sigma ~ dunif( 0 , 10 )
  ) ,
  data = d )

Answer 1

The closest way to do this with the rstanarm package is:

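library(rstanarm)
# d (with its clade_id column) is the data frame created in the question's code;
# the original milk data frame has no clade_id column
fit1 <- stan_glmer(kcal.per.g ~ 1 + (1 | clade_id), data = d,
                   prior_intercept = normal(0.6, 1, autoscale = FALSE), 
                   prior_aux = exponential(rate = 1/5, autoscale = FALSE),
                   prior_covariance = decov(shape = 10, scale = 1))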
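The prior expectations behind the call above can be checked without fitting anything. The base-R sketch below takes only the rate, shape and scale values from that call. The variable names are illustrative. It computes the analytic means and a quick Monte Carlo check. It also shows that the original dunif(0, 10) prior on sigma has the same mean (5) as the exponential prior used as its replacement.

```r
# Prior on sigma: exponential(rate = 1/5) in the stan_glmer() call
exp_rate <- 1 / 5
analytic_sigma_mean <- 1 / exp_rate                 # exponential mean = 1 / rate

# The original map() model used sigma ~ dunif(0, 10)
analytic_unif_mean <- (0 + 10) / 2                  # uniform mean = (lower + upper) / 2

# Gamma prior from decov(shape = 10, scale = 1)
gamma_shape <- 10
gamma_scale <- 1
analytic_gamma_mean <- gamma_shape * gamma_scale    # gamma mean = shape * scale

# Monte Carlo check with base R random number generators
set.seed(2016)
mc_sigma_mean <- mean(rexp(1e5, rate = exp_rate))
mc_gamma_mean <- mean(rgamma(1e5, shape = gamma_shape, scale = gamma_scale))

cat("exponential prior mean:", analytic_sigma_mean, "(MC:", round(mc_sigma_mean, 2), ")\n")
cat("uniform(0, 10) mean:   ", analytic_unif_mean, "\n")
cat("gamma prior mean:      ", analytic_gamma_mean, "(MC:", round(mc_gamma_mean, 2), ")\n")
```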

However, this is not exactly the same, for the following reasons:

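Bounded uniform priors on sigma are not implemented because they are not a good idea, so I used an exponential distribution with an expectation of 5 instead.
Fixing the standard deviation on a is also not implemented, so I used a gamma distribution with an expectation of 10 (decov(shape = 10, scale = 1)).
Hierarchical models in rstanarm (and lme4) are parameterized as deviations from a common parameter. So instead of giving a an expectation of 0.6, I put a prior expectation of 0.6 on the global intercept, and the prior on a is normal with expectation zero. This means you need to call coef(fit1) rather than ranef(fit1) to see the "intercepts" as they are parameterized in the original model.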
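The two parameterizations can be written out side by side. The notation below is chosen here for illustration. alpha is the global intercept, b_j are the clade deviations, and sigma_clade is their standard deviation, whose prior comes from decov() rather than being fixed. The second argument of N(., .) is a standard deviation, as in R's dnorm().

```latex
\begin{aligned}
&\text{map() model (index variable):} \\
&\quad \text{kcal}_i \sim \mathcal{N}(\mu_i, \sigma), \quad \mu_i = a_{\text{clade}[i]}, \\
&\quad a_j \sim \mathcal{N}(0.6, 10), \quad \sigma \sim \mathcal{U}(0, 10) \\[4pt]
&\text{stan\_glmer() model (deviations from a common intercept):} \\
&\quad \text{kcal}_i \sim \mathcal{N}(\mu_i, \sigma), \quad \mu_i = \alpha + b_{\text{clade}[i]}, \\
&\quad \alpha \sim \mathcal{N}(0.6, 1), \quad b_j \sim \mathcal{N}(0, \sigma_{\text{clade}}), \quad \sigma \sim \text{Exponential}(1/5) \\[4pt]
&\text{Quantity comparable to } a_j\text{:} \quad \alpha + b_j \quad (\text{coef(fit1)}), \qquad b_j \text{ alone} \quad (\text{ranef(fit1)})
\end{aligned}
```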

Unique-intercept approach for categorical variables in the R "rstanarm" package
