I am trying to fit a binomial GLM using the rethinking package (which draws on rstan for the MCMC). The model fits, but sampling is very inefficient and Rhat indicates that something went wrong. I don't understand the reason for this problem.
Here is the data:
d <- read_csv("https://gist.githubusercontent.com/sebastiansauer/a2519de39da49d70c4a9e7a10191cb97/raw/election.csv")
d <- as.data.frame(d)
Here is the model:
m1_stan <- map2stan(
  alist(
    afd_votes ~ dbinom(votes_total, p),
    logit(p) <- beta0 + beta1*foreigner_n,
    beta0 ~ dnorm(0, 10),
    beta1 ~ dnorm(0, 10)
  ),
  data = d,
  WAIC = FALSE,
  iter = 1000)
The diagnostics of the fit (effective number of samples, Rhat) indicate that something went wrong:
Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
beta0 -3.75 0 -3.75 -3.75 3 2.21
beta1 0.00 0 0.00 0.00 10 1.25
The trace plot does not show the "fat hairy caterpillars" one would hope for:
The Stan output suggested increasing two parameters, adapt_delta and max_treedepth, which I did. This improved the sampling process somewhat:
Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
beta0 18.1 0.09 18.11 18.16 28 1.06
beta1 0.0 0.00 0.00 0.00 28 1.06
The pairs plot also looks strange:
What I have also tried:
But so far, nothing has helped. What could be the reason for the inefficiency? What could I try?
Could the large counts in each observation be a problem?
mean(d_short$afd_votes)
[1] 19655.83
An excerpt of the data:
head(d)
afd_votes votes_total foreigner_n
1 11647 170396 16100
2 9023 138075 12600
3 11176 130875 11000
4 11578 156268 9299
5 10390 150173 25099
6 11161 130514 13000
Session info:
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] viridis_0.5.1 viridisLite_0.3.0 sjmisc_2.7.3 pradadata_0.1.3 rethinking_1.59 rstan_2.17.3 StanHeaders_2.17.2 forcats_0.3.0 stringr_1.3.1
[10] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] httr_1.3.1 jsonlite_1.5 modelr_0.1.2 assertthat_0.2.0 stats4_3.5.0 cellranger_1.1.0 yaml_2.1.19 pillar_1.3.0 backports_1.1.2
[10] lattice_0.20-35 glue_1.3.0 digest_0.6.15 rvest_0.3.2 snakecase_0.9.1 colorspace_1.3-2 htmltools_0.3.6 plyr_1.8.4 pkgconfig_2.0.1
[19] broom_0.5.0 haven_1.1.2 bookdown_0.7 mvtnorm_1.0-8 scales_0.5.0 stringdist_0.9.5.1 sjlabelled_1.0.12 withr_2.1.2 RcppTOML_0.1.3
[28] lazyeval_0.2.1 cli_1.0.0 magrittr_1.5 crayon_1.3.4 readxl_1.1.0 evaluate_0.11 nlme_3.1-137 MASS_7.3-50 xml2_1.2.0
[37] blogdown_0.8 tools_3.5.0 loo_2.0.0 data.table_1.11.4 hms_0.4.2 matrixStats_0.54.0 munsell_0.5.0 prediction_0.3.6 bindrcpp_0.2.2
[46] compiler_3.5.0 rlang_0.2.1 grid_3.5.0 rstudioapi_0.7 labeling_0.3 rmarkdown_1.10 gtable_0.2.0 codetools_0.2-15 inline_0.3.15
[55] curl_3.2 R6_2.2.2 gridExtra_2.3 lubridate_1.7.4 knitr_1.20 bindr_0.1.1 rprojroot_1.3-2 KernSmooth_2.23-15 stringi_1.2.4
[64] Rcpp_0.12.18 tidyselect_0.2.4 xfun_0.3 coda_0.19-1
Answer 0 (score: 2)
Stan works better with unit-scale and uncorrelated parameters. From the Stan manual, §28.4 "Model Conditioning and Curvature":

Ideally, all parameters should be programmed so that they have unit scale and so that posterior correlation is reduced; together, these properties mean that there is no rotation or scaling required for optimal performance of Stan's algorithms. For Hamiltonian Monte Carlo, this means a unit mass matrix, which requires no adaptation as it is where the algorithm is initialized. Riemannian Hamiltonian Monte Carlo performs this conditioning on the fly at every step, but such conditioning is very expensive computationally.
In your case, beta1 is tied to foreigner_n, which is not unit scale and is therefore unbalanced relative to beta0. Moreover, since foreigner_n is not centered, both betas change the location of p during sampling, so the posterior is correlated.
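To see the imbalance concretely, here is a rough sketch using only the six rows excerpted in the question (not the full data set, so the numbers are only illustrative). A quick maximum-likelihood fit with base-R glm, no MCMC needed, shows that on the raw scale the slope must be several orders of magnitude smaller than the intercept:

```r
# The six rows shown in the question's head(d) excerpt.
d6 <- data.frame(
  afd_votes   = c(11647, 9023, 11176, 11578, 10390, 11161),
  votes_total = c(170396, 138075, 130875, 156268, 150173, 130514),
  foreigner_n = c(16100, 12600, 11000, 9299, 25099, 13000)
)
# Binomial GLM via maximum likelihood: successes vs. failures on foreigner_n.
fit <- glm(cbind(afd_votes, votes_total - afd_votes) ~ foreigner_n,
           family = binomial, data = d6)
coef(fit)
# The foreigner_n coefficient is on the order of 1e-5, while the intercept
# is around -2.4 -- exactly the scale imbalance described above.
```

This is the geometry the sampler has to adapt to: a posterior that is extremely narrow in one direction and wide in the other.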
Standardizing yields a far more tractable model. Transforming foreigner_n to be centered and unit scale lets the model converge quickly and gives healthy effective sample sizes. I would also argue that the betas are more interpretable in this form: beta0 concerns only the location of p, while beta1 concerns only how variation in foreigner_n explains variation in afd_votes/votes_total.
library(readr)
library(rethinking)
d <- read_csv("https://gist.githubusercontent.com/sebastiansauer/a2519de39da49d70c4a9e7a10191cb97/raw/election.csv")
d <- as.data.frame(d)
d$foreigner_z <- scale(d$foreigner_n)
m1 <- alist(
  afd_votes ~ dbinom(votes_total, p),
  logit(p) <- beta0 + beta1*foreigner_z,
  c(beta0, beta1) ~ dnorm(0, 1)
)
m1_stan <- map2stan(m1, data = d, WAIC = FALSE,
                    iter = 10000, chains = 4, cores = 4)
Checking the samples, we have
> summary(m1_stan)
Inference for Stan model: afd_votes ~ dbinom(votes_total, p).
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 -1.95 0.00 0.00 -1.95 -1.95 -1.95 -1.95 -1.95 16352 1
beta1 -0.24 0.00 0.00 -0.24 -0.24 -0.24 -0.24 -0.24 13456 1
dev 861952.93 0.02 1.97 861950.98 861951.50 861952.32 861953.73 861958.26 9348 1
lp__ -17523871.11 0.01 0.99 -17523873.77 -17523871.51 -17523870.80 -17523870.39 -17523870.13 9348 1
Samples were drawn using NUTS(diag_e) at Sat Sep 1 11:48:55 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
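As a quick interpretation check on the summary above: because foreigner_z is centered, beta0 is the log-odds of an AfD vote in a district with an average foreigner count, and the inverse logit (base-R plogis) maps it back to the probability scale:

```r
# Inverse-logit of the posterior mean of beta0 reported above.
# At foreigner_z = 0 (foreigner_n at its mean), the implied AfD share is:
p_mean <- plogis(-1.95)
round(p_mean, 3)
# [1] 0.125
```

That matches the overall vote shares visible in the data excerpt reasonably well, which is reassuring.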
Looking at the pairs plot again, the correlation between the betas has dropped to 0.15:

I had originally assumed that centering foreigner_n would be the main issue. At the same time I was a bit puzzled, since Stan uses HMC/NUTS, which I had thought to be fairly robust against correlated latent variables. However, I noticed the comments in the Stan manual about practical problems with scale invariance due to numerical instability, which were also commented on by Michael Betancourt in a CrossValidated answer (though it is a rather old post). So I wanted to test whether centering or scaling does more to improve sampling.

Centering alone still results in very poor performance. Note that the effective sample size is literally one effective sample per chain.
library(readr)
library(rethinking)
d <- read_csv("https://gist.githubusercontent.com/sebastiansauer/a2519de39da49d70c4a9e7a10191cb97/raw/election.csv")
d <- as.data.frame(d)
d$foreigner_c <- d$foreigner_n - mean(d$foreigner_n)
m2 <- alist(
  afd_votes ~ dbinom(votes_total, p),
  logit(p) <- beta0 + beta1*foreigner_c,
  c(beta0, beta1) ~ dnorm(0, 1)
)
m2_stan <- map2stan(m2, data = d, WAIC = FALSE,
                    iter = 10000, chains = 4, cores = 4)
which yields
Inference for Stan model: afd_votes ~ dbinom(votes_total, p).
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 -0.64 0.4 0.75 -1.95 -1.29 -0.54 0.2 0.42 4 2.34
beta1 0.00 0.0 0.00 0.00 0.00 0.00 0.0 0.00 4 2.35
dev 18311608.99 8859262.1 17270228.21 861951.86 3379501.84 14661443.24 37563992.4 46468786.08 4 1.75
lp__ -26248697.70 4429630.9 8635113.76 -40327285.85 -35874888.93 -24423614.49 -18782644.5 -17523870.54 4 1.75
Samples were drawn using NUTS(diag_e) at Sun Sep 2 18:59:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
And a pairs plot that still looks problematic:

Scaling alone, on the other hand, improves sampling considerably! Even though the resulting betas are still highly correlated, the effective sample sizes are in an acceptable range, although well below those of the fully standardized model.
library(readr)
library(rethinking)
d <- read_csv("https://gist.githubusercontent.com/sebastiansauer/a2519de39da49d70c4a9e7a10191cb97/raw/election.csv")
d <- as.data.frame(d)
d$foreigner_s <- d$foreigner_n / sd(d$foreigner_n)
m3 <- alist(
  afd_votes ~ dbinom(votes_total, p),
  logit(p) <- beta0 + beta1*foreigner_s,
  c(beta0, beta1) ~ dnorm(0, 1)
)
m3_stan <- map2stan(m3, data = d, WAIC = FALSE,
                    iter = 10000, chains = 4, cores = 4)
which yields
Inference for Stan model: afd_votes ~ dbinom(votes_total, p).
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 -1.58 0.00 0.00 -1.58 -1.58 -1.58 -1.58 -1.57 5147 1
beta1 -0.24 0.00 0.00 -0.24 -0.24 -0.24 -0.24 -0.24 5615 1
dev 861952.93 0.03 2.01 861950.98 861951.50 861952.31 861953.69 861958.31 5593 1
lp__ -17523870.45 0.01 1.00 -17523873.15 -17523870.83 -17523870.14 -17523869.74 -17523869.48 5591 1
Samples were drawn using NUTS(diag_e) at Sun Sep 2 19:02:00 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The pairs plot shows that significant correlation remains:

So while decorrelating the variables does improve sampling, removing the scale imbalance matters most in this model.
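One detail worth noting: the scaled-only model (m3) and the fully standardized model (m1) are reparameterizations of each other, which is why both report beta1 = -0.24 and only the intercepts differ. A small algebra check, using a synthetic stand-in for foreigner_n (the real data is not needed for the identity):

```r
# With x_s = x/sd(x) and x_z = (x - mean(x))/sd(x), the linear predictors
#   b0_s + b1 * x_s   and   (b0_s + b1 * mean(x)/sd(x)) + b1 * x_z
# are identical, so centering only shifts the intercept, not the slope.
set.seed(1)
x   <- rnorm(10, mean = 15000, sd = 4000)  # synthetic stand-in for foreigner_n
b1  <- -0.24                               # slope shared by m1 and m3 above
b0s <- -1.58                               # m3 intercept
lin_scaled       <- b0s + b1 * (x / sd(x))
lin_standardized <- (b0s + b1 * mean(x) / sd(x)) + b1 * (x - mean(x)) / sd(x)
stopifnot(isTRUE(all.equal(lin_scaled, lin_standardized)))
```

So the two fits describe the same posterior over the linear predictor; the scaling is what changes the sampler's geometry.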