Rstan中三级嵌套线性模型出错

时间:2015-09-03 15:16:18

标签: r hierarchical-clustering rstan

我试图在Rstan中运行三级嵌套线性模型但仍然收到错误消息。

我从这个三级嵌套模型中汲取灵感:

http://rstudio-pubs-static.s3.amazonaws.com/64315_bc3a395edd104095a8384db8d9952f43.html

本质上,我重复测量了四个不同年份的平均宽带速度数据(总计696个观测值),这些数据嵌套在174个地方当局中,最终嵌套在11个地区内。

这是数据:

yrid laid rnid speed
1     1    1     2.3
2     1    1     8.4
3     1    1     9.2
4     1    1     5.8
5     2    1     1.5
6     2    1     9.8
7     2    1     4.7

代码:

#load and prepare data
total<-read.csv("total.csv")

names(total)[names(total)=="X"] <- "yrid"
names(total)[names(total)=="lad.num"] <- "laid"
names(total)[names(total)=="region"] <- "rnid"

## Sort by laid for easier handling in Stan
total <- arrange(total, laid, rnid)

## Create a vector of school IDs where j-th element gives school ID 
for class ID j
regionLookupVec <- unique(total[c("laid","rnid")])[,"rnid"]

## Combine as a stan dataset
Ni <- length(unique(total$yrid))
Nj <- length(unique(total$laid))
Nk <- length(unique(total$rnid))              
laid <- (total$laid)
rnid <- (total$rnid)
regionLookup <- (regionLookupVec)
speed <- (total$sp)

## Combine as a stan dataset
dat <- (list(Ni           = length(unique(Ni)),
             Nj           = length(unique(Nj)),
             Nk           = length(unique(Nk)),
             laid         = laid,
             rnid         = rnid,
             regionLookup = regionLookupVec,
             speed        = speed))

#load model
stanmodelcode <- "data {
   ##Define variables in data
   ##Number of level-1 observations (an integer)
   int<lower=0> Ni;
   ## Number of level-2 clusters
   int<lower=0> Nj;
   ## Number of level-3 clusters
   int<lower=0> Nk;
   ## Cluster IDs
   int<lower=1> laid[Ni];
   int<lower=1> rnid[Ni];
   ## Level 3 look up vector for level 2
   int<lower=1> regionLookup[Nj];
   ## Continuous outcome
   real speed[Ni];
   ## Continuous predictor
   ## real X_1ijk[Ni];
   }

 parameters {
   ## Define parameters to estimate
   ## Population intercept (a real number)
   real beta_0;
   ## Population slope
   ## real beta_1;
   ## Level-1 errors
   real<lower=0> sigma_e0;
   ## Level-2 random effect
   real u_0jk[Nj];
   real<lower=0> sigma_u0jk;
   ## Level-3 random effect
   real u_0k[Nk];
   real<lower=0> sigma_u0k;
   }
transformed parameters  {
   ## Varying intercepts
   real beta_0jk[Nj];
   real beta_0k[Nk];
   ## Individual mean
   real mu[Ni];
   ## Varying intercepts definition
   ## Level-3 (level-3 random intercepts)
   for (k in 1:Nk) {
   beta_0k[k] <- beta_0 + u_0k[k];
   }
   ## Level-2 (level-2 random intercepts)
   for (j in 1:Nj) {
   beta_0jk[j] <- beta_0k[regionLookup[j]] + u_0jk[j];
   }
   ## Individual mean
   for (i in 1:Ni) {
   mu[i] <- beta_0jk[laid[i]];
   }
   }
model {
   ## Prior part of Bayesian inference
   ## Flat prior for mu (no need to specify if non-informative)
   ## Random effects distribution
   u_0k  ~ normal(0, sigma_u0k);
   u_0jk ~ normal(0, sigma_u0jk);
   ## Likelihood part of Bayesian inference
   ## Outcome model N(mu, sigma^2) (use SD rather than Var)
   for (i in 1:Ni) {
   speed[i] ~ normal(mu[i], sigma_e0);
   }
   }"
#run model
   resStan <-stan(model_code = stanmodelcode, data=dat, 
   iter=3000,    chains=3,   warmup=500, thin=10)}  

错误:

COMPILING THE C++ CODE FOR MODEL 'stanmodelcode' NOW.
Error : mismatch in dimension declared and found in context; 
processing stage=data initialization; variable name=laid; position=0; 
dims declared=(1); dims found=(696) failed to create the sampler; 
sampling not done

1 个答案:

答案 0 :(得分:1)

错误消息表明它期望laid变量的维度为1,但实际传递给stan的数据的维度为696.

在您的代码中

dat <- (list(Ni           = length(unique(Ni)),

表示:为Ni赋予Ni中唯一值长度的值。什么是倪?

Ni <- length(unique(total$yrid))

我认为是696.因此Ni = length(unique(696))给出Ni = 1,这是错误的来源。

尝试将dat变量更改为:

dat <- (list(Ni           = Ni,
             Nj           = Nj,
             Nk           = Nk,
             laid         = laid,
             rnid         = rnid,
             regionLookup = regionLookupVec,
             speed        = speed))