Jags中的贝叶斯滚动泊松回归(通过R2jags)

时间:2018-02-10 17:13:00

标签: r bayesian r2jags

问题

我有一个小数据集(N = 100)。我需要运行泊松回归,但一次排除一个观察值(因此, Rolling Poisson回归)。

在等式中有几个预测因子,但我关心一个(称之为 b.x )。我的想法是看看在100个型号中b.x有多大差异。然后,我想用Y轴上的效果大小和X轴上的模型编号绘制这些100点估计值。

总之,我需要以下内容:

  1. 在JAGS中运行滚动泊松回归(通过R2jags)。

  2. 获得估算后,绘制它们。

  3. 请注意我在JAGS中的Poisson模型运行良好(下面是我的模型/数据的样本玩具)。但是,我无法实现“滚动”版本

    自包含示例

    # clear R
    rm(list=ls())
    cat("\014")
    
    # load libraries
    if (!require("pacman")) install.packages("pacman"); library(pacman) 
    p_load(R2jags)
    
    
    # Toy Data
    N <- 100
    x <- rnorm(n=N)  # standard Normal predictor
    y <- rpois(n=N, lambda = 1)  # Poisson DV
    
    
    # model
    model <- function() {
        ## Likelihood
        for(i in 1:N){
                y[i] ~ dpois(lambda[i])
                log(lambda[i]) <- 
                        mu + # intercept
                        b.x*x[i]
                }
        ## Priors 
        mu ~  dnorm(0, 0.01) ## intercept
        b.x ~ dnorm(0, 0.01)
        }
    
    # list elements
    data.list <- list(N = N, y = y, x = x)
    
    # run model
    model.fit <- jags(
        data=data.list,
        inits=NULL,
        parameters.to.save = c("b.x"),
        n.chains = 1,
        n.iter = 20,
        n.burnin = 2, 
        model.file=model,
        progress.bar = "none")
    

    确定。这就是模型。在model.fit中有 b.x ,系数我必须得到100次。使用我当前的代码,我只能使用完整的数据集获取它一次。但是,我需要第二次获取它,排除df的第一行,然后第三次,但排除df的第二行,依此类推。然后,绘制所有这些 b.x

    现在,仅仅为了示例,我将创建一个简单的表,只是为了表明我需要第一个元素( b.x 的系数)。

    ## I sourced this function below from    https://raw.githubusercontent.com/jkarreth/JKmisc/master/mcmctab.R
    
    # Function to Create Table
    mcmctab <- function(sims, ci = .8, digits = 2){
    
        require(coda) 
    
        if(class(sims) == "jags" | class(sims) == "rjags"){
                sims <- as.matrix(as.mcmc(sims))
        }
        if(class(sims) == "bugs"){
                sims <- sims$sims.matrix
        }  
        if(class(sims) == "mcmc"){
                sims <- as.matrix(sims)
        }    
        if(class(sims) == "mcmc.list"){
                sims <- as.matrix(sims)
        }      
        if(class(sims) == "stanfit"){
                stan_sims <- rstan::As.mcmc.list(sims)
                sims <- as.matrix(stan_sims)
        }      
    
    
        dat <- t(sims)
    
        mcmctab <- apply(dat, 1,
                         function(x) c(Mean = round(mean(x), digits = digits), # Posterior mean
                                       SD = round(sd(x), digits = 3), # Posterior SD
                                       Lower = as.numeric(
                                               round(quantile(x, probs = c((1 - ci) / 2)), 
                                                     digits = digits)), # Lower CI of posterior
                                       Upper = as.numeric(
                                               round(quantile(x, probs = c((1 + ci) / 2)), 
                                                     digits = digits)), # Upper CI of posterior
                                       Pr. = round(
                                               ifelse(mean(x) > 0, length(x[x > 0]) / length(x),
                                                      length(x[x < 0]) / length(x)), 
                                               digits = digits) # Probability of posterior >/< 0
                         ))
        return(t(mcmctab))
    }
    
    
    # this is the coefficient I need, but with different data frames.
    mcmctab(model.fit)[1,1]
    

    抱歉,我甚至无法在此提供尝试的解决方案。非常感谢提前。

2 个答案:

答案 0 :(得分:1)

使用for循环或其中一个apply家庭成员一次排除一个观察结果:

sims <- lapply(1:100, function(i) {
  data.list <- list(N = N - 1, y = y[-i], x = x[-i])
  # run model
  model.fit <- jags(
      data=data.list,
      inits=NULL,
      parameters.to.save = c("b.x"),
      n.chains = 1,
      n.iter = 20,
      n.burnin = 2, 
      model.file=model,
      progress.bar = "none")
  return(model.fit)
})

然后,您可以通过循环输出来提取您感兴趣的数量:

sapply(sims, function(x) x$BUGSoutput$mean$b.x)
#  [1] -0.018966261 -0.053383364 -0.030193649 -0.097046841 -0.026258934
#  [6] -0.005486296  0.084811315 -0.047736880  0.142379194 -0.026583145
# <snip>

答案 1 :(得分:0)

# clear R
rm(list=ls())

# load libraries
library(R2jags)

# Toy Data
set.seed(123) # set RNG seed for reproducibility
N <- 100
x <- rnorm(n=N)  # standard Normal predictor
y <- rpois(n=N, lambda = 1)  # Poisson DV

# model
model <- function() {
  ## Likelihood
  for(i in 1:N){
    y[i] ~ dpois(lambda[i])
    log(lambda[i]) <- 
      mu + # intercept
      b.x*x[i]
  }
  ## Priors 
  mu ~  dnorm(0, 0.01) ## intercept
  b.x ~ dnorm(0, 0.01)
}

# list elements
data.list <- list() # create empty list to fill in next line

# fill list with one data set for each step, with one row excluded per step
for(i in 1:100){
  data.list[[i]] <- list(N = 99, y = y[-i], x = x[-i])
}

# Starting value for reproducibility
model.inits <- function(){
  list("b.x" = 0)
}

# run model
model.fit <- list() # again, create empty list first

for(i in 1:100){  # use loop here to fit one model per data set
model.fit[[i]] <- jags(
  data=data.list[[i]],
  inits=NULL,
  parameters.to.save = c("b.x"),
  n.chains = 1,
  n.iter = 20,
  n.burnin = 2, 
  model.file=model,
  progress.bar = "none")
}


# helper function for output
devtools::source_url("https://raw.githubusercontent.com/jkarreth/JKmisc/master/mcmctab.R")

# create empty data frame to be filled with estimation results per data set
tab <- data.frame(index = c(1:100), b = rep(NA, 100), lower = rep(NA, 100), upper = rep(NA, 100))

# fill with estimates, using mcmctab to extract mean & lower & upper CIs
for(i in 1:100){
  tab[i, 2] <- mcmctab(model.fit[[i]])[1, 1]
  tab[i, 3] <- mcmctab(model.fit[[i]])[1, 3]
  tab[i, 4] <- mcmctab(model.fit[[i]])[1, 4]
}

# plot results
library(ggplot2)

p <- ggplot(data = tab, aes(x = b, y = index)) + geom_point() + geom_segment(aes(x = lower, xend = upper, yend = index))
p

output

我感谢Johannes Karreth提供这个好的答案。