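I am using the mgcv package in R to fit a spatial binomial model, and I want to simulate the posterior distribution at prediction points (code below). I have been using simulated data to test the coverage properties of the posterior. When the overall prevalence is around 0.5 (50%), coverage is very poor: only ~35% of the true values lie within the 95% posterior interval. It improves as the prevalence moves away from 0.5. For example, when the mean prevalence is 1%, ~97% of the true values lie within the 95% posterior interval. I suppose my question is: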
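Using the spaMM package to fit a model with spatially correlated random effects (via a Laplace approximation) does a little better. No doubt MCMC methods would be better still. However, geostatistical approaches are limited when scaling up the number of model/prediction points, so I'm keen to use mgcv. Any ideas or comments are welcome!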
Cheers, Hugh
library(mgcv)
library(RandomFields)  # note: archived on CRAN; may need installing from the CRAN archive
library(raster)
# Simulate some data
set.seed(1981)
mean <- 0  # mean of the latent field on the log-odds scale; 0 = logit(0.5)
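model <- RMexp(var = 0.5, scale = 50)  # exponential covariance for the latent field
RFoptions(spConform = FALSE)           # return a plain matrix rather than an sp object
simu <- RandomFields::RFsimulate(model, x = 1:256, y = 1:256)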
# Convert to raster
simu_raster <- raster(nrows = 256, ncols = 256, xmn = 0, xmx = 1, ymn = 0, ymx = 1)
simu_raster[] <- as.vector(simu)
# Add the mean and convert from log-odds to probability (inverse logit)
log_odds_raster <- mean + simu_raster
prev_raster <- exp(log_odds_raster) / (1 + exp(log_odds_raster))
# Draw 1000 candidate sampling points at random from the raster cell centres
candidate_points <- coordinates(prev_raster)[sample(1:nrow(coordinates(prev_raster)), 1000),]
# Sample 100 of those and take binomial sample of 100 individuals per location
sampled_points_idx <- sample(1:nrow(candidate_points), 100)
sampled_points <- as.data.frame(candidate_points[sampled_points_idx,])
sampled_points$n_pos <- rbinom(100, 100, extract(prev_raster, sampled_points))
sampled_points$n_neg <- 100 - sampled_points$n_pos
# Fit spatial GAM
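# REML smoothing-parameter selection: the covariance matrix corrected for
# smoothing-parameter uncertainty, which vcov(..., unconditional = TRUE) uses
# below, is only available under REML or ML
spatial_mod <- gam(cbind(n_pos, n_neg) ~ s(x, y),
                   data = sampled_points,
                   family = binomial,
                   method = "REML")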
# check k and plot observed v predicted
gam.check(spatial_mod)
# Simulate 1000 draws from the posterior at every non-sampled location
prediction_data <- as.data.frame(candidate_points[-sampled_points_idx,])
prediction_data$prev <- extract(prev_raster, prediction_data)
Cg <- predict(spatial_mod, prediction_data, type = "lpmatrix")
sims <- rmvn(1000, mu = coef(spatial_mod), V = vcov(spatial_mod, unconditional = TRUE))
fits <- Cg %*% t(sims)
fits_prev <- exp(fits) / (1 + exp(fits))
# For every prediction point, check whether the true (simulated) prevalence
# lies within its 95% Bayesian credible interval (BCI). If the intervals are
# well calibrated, this should happen about 95% of the time.
BCI_95 <- apply(fits_prev, 1, quantile, probs = c(0.025, 0.975))
# BCI_95 is a 2 x n matrix: row 1 = lower bound, row 2 = upper bound
within_BCI <- prediction_data$prev >= BCI_95[1, ] &
  prediction_data$prev <= BCI_95[2, ]
mean(within_BCI)  # empirical coverage; nominally 0.95
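The interval check at the end of the code can be wrapped in a small reusable function. This is a sketch that is not part of the original post, and the name interval_coverage and its arguments are hypothetical. For m prediction points with true values p_j and equal-tailed posterior quantiles q, it computes the empirical coverage

$$\hat{c} = \frac{1}{m}\sum_{j=1}^{m}\mathbf{1}\left\{ q_{\alpha/2,\,j} \le p_j \le q_{1-\alpha/2,\,j} \right\}, \qquad \alpha = 1 - \text{level}.$$

The block also includes a calibration check. When the truth and the draws come from the same distribution, the coverage should be close to the nominal level. This gives a baseline against which the ~35% figure can be compared.

```r
# Hypothetical helper: empirical coverage of equal-tailed posterior intervals.
#   draws: matrix, one row per prediction point, one column per posterior draw
#   truth: vector of true values, one per prediction point
#   level: nominal coverage of the interval
interval_coverage <- function(draws, truth, level = 0.95) {
  stopifnot(is.matrix(draws), nrow(draws) == length(truth))
  alpha <- (1 - level) / 2
  # 2 x n matrix: row 1 = lower bound, row 2 = upper bound
  bounds <- apply(draws, 1, quantile, probs = c(alpha, 1 - alpha), names = FALSE)
  mean(truth >= bounds[1, ] & truth <= bounds[2, ])
}

# Calibration check: truth and draws share the same N(0, 1) distribution,
# so coverage should be near 0.95 (up to Monte Carlo noise)
set.seed(42)
truth_cal <- rnorm(2000)
draws_cal <- matrix(rnorm(2000 * 1000), nrow = 2000)
cov_calibrated <- interval_coverage(draws_cal, truth_cal)

# With the objects from the question, this should match mean(within_BCI):
# interval_coverage(fits_prev, prediction_data$prev)
```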
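Update:
I have run this experiment several more times. The relationship between mean prevalence and coverage is less extreme than in the example above.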
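Also, above I used the default k (i.e. s(x,y)), and gam.check indicated that k was high enough. However, if you use a higher k (e.g. s(x, y, k=100)), the spline becomes more wiggly. The prediction intervals then naturally become wider (i.e. more uncertain), and coverage improves. Coverage is still quite variable, but much better.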
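To separate run-to-run noise from the effect of k, the experiment can be replicated with both basis sizes. The following is a minimal sketch under several assumptions. coverage_once is a hypothetical helper. The true log-odds surface is a fixed sin/cos function rather than a RandomFields draw, so the sketch only needs mgcv. The model is fitted with method = "REML" so that vcov(..., unconditional = TRUE) returns the smoothing-parameter-corrected covariance. The "large" basis uses k = 80, an arbitrary choice kept below the 100 sampling sites; the post itself used k = 100.

```r
library(mgcv)

# Hypothetical helper: one replicate of the coverage experiment.
# Assumption: the true log-odds surface is a fixed smooth function shifted by
# mean_logit, standing in for the RandomFields simulation in the question.
coverage_once <- function(k = -1, mean_logit = 0, n_sites = 100,
                          n_per_site = 100, n_pred = 500, n_draws = 1000) {
  surface <- function(x, y) sin(2 * pi * x) * cos(2 * pi * y)  # assumed truth
  pts <- data.frame(x = runif(n_sites + n_pred), y = runif(n_sites + n_pred))
  pts$prev <- plogis(mean_logit + surface(pts$x, pts$y))
  fit_idx <- seq_len(n_sites)
  dat <- pts[fit_idx, ]
  dat$n_pos <- rbinom(n_sites, n_per_site, dat$prev)
  dat$n_neg <- n_per_site - dat$n_pos
  # k = -1 means mgcv's default basis dimension
  mod <- gam(cbind(n_pos, n_neg) ~ s(x, y, k = k), data = dat,
             family = binomial, method = "REML")
  new <- pts[-fit_idx, ]
  Xp <- predict(mod, new, type = "lpmatrix")
  beta <- rmvn(n_draws, coef(mod), vcov(mod, unconditional = TRUE))
  prev_draws <- plogis(Xp %*% t(beta))  # n_pred x n_draws matrix
  ci <- apply(prev_draws, 1, quantile, probs = c(0.025, 0.975))
  mean(new$prev >= ci[1, ] & new$prev <= ci[2, ])
}

set.seed(1981)
n_rep <- 5
coverage <- cbind(
  default_k = replicate(n_rep, coverage_once(k = -1)),
  large_k   = replicate(n_rep, coverage_once(k = 80))
)
colMeans(coverage)  # average coverage per basis size; nominal is 0.95
```

The same helper can be rerun at a low prevalence, for example coverage_once(mean_logit = qlogis(0.01)), to compare with the ~1% scenario in the question. More replicates give a steadier picture of how variable the coverage is.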
I would love to hear other people's thoughts.