数据

Question

我有一组不同长度的时间序列变量我正在尝试使用Stan语言（使用rstan）估计分层自回归模型。我已经学习了Stan的基础知识，但我不确定如何表达一组长度不同的向量。

我最后通过填充任意数字来增加较短的向量，以便所有向量具有相同的长度，并忽略模型部分中的这些填充值。

虽然这似乎有效，但我想知道是否有更好的方法来定义这种类型的数据。任何建议表示赞赏！

我在Ubuntu 14.04 R version 3.2.3和rstan_2.9.0。

数据

我的数据看起来像这样。请注意，y中有三个向量，每个向量的长度为3,5和10.我使用100填充较短的向量，以便所有向量的大小为10.

$N
[1]  3  5 10

$K
[1] 3

$maxN
[1] 10

$y
$y[[1]]
 [1]  -0.626453811  -0.069571811  -0.004434404 100.000000000 100.000000000 100.000000000 100.000000000
 [8] 100.000000000 100.000000000 100.000000000

$y[[2]]
 [1]   1.5952808   0.9305912   0.4832488   0.3903673   0.3690161 100.0000000 100.0000000 100.0000000 100.0000000
[10] 100.0000000

$y[[3]]
 [1] 0.5757814 0.5876644 0.7800761 0.8410528 0.7948234 0.5938711 0.7469771 0.7677860 0.7893884 0.9048332

Stan代码

我的Stan代码如下：

data {
  int<lower=0> K;     // number of series
  int<lower=0> N[K];  // length of each series

  int<lower=0> maxN;  // maximum length of series
  vector[maxN] y[K];  // time series (define as the same length)
}

parameters {
  // For simplicity, assume only beta varies across series
  real alpha;
  real beta[K];
  real<lower=0> sigma;

  real beta0;
  real<lower=0> beta_sig; 
}

model {
  for (k in 1:K) {
    beta[k] ~ normal(beta0, beta_sig);
    y[k][2:N[k]] ~ normal(alpha + beta[k]*y[k][1:(N[k]-1)], sigma);
  }
}

R脚本

下面的代码重现我的数据，Stan代码和R中的拟合。

library(rstan)
dat <- structure(list(N = c(3, 5, 10), K = 3, maxN = 10, y = list(c(-0.626453810742332,
                                                                    -0.0695718108004915, -0.00443440448115216, 100, 100, 100, 100,
                                                                    100, 100, 100), c(1.59528080213779, 0.930591178250432, 0.483248750713414,
                                                                                      0.390367280599556, 0.3690161108127, 100, 100, 100, 100, 100),
                                                                  c(0.575781351653492, 0.587664377772507, 0.780076056840342,
                                                                    0.841052774797451, 0.794823439263525, 0.593871106619423,
                                                                    0.746977087771791, 0.767786018093089, 0.789388389973885,
                                                                    0.904833172045027))), .Names = c("N", "K", "maxN", "y"))
stancode <- structure("data {\n  int<lower=0> K;     // number of series\n  int<lower=0> N[K];  // length of each series\n\n  int<lower=0> maxN;  // maximum length of series\n  vector[maxN] y[K];  // time series\n}\n\nparameters {\n  // For simplicity, assume only beta varies across series\n  real alpha;\n  real beta[K];\n  real<lower=0> sigma;\n\n  real beta0;\n  real<lower=0> beta_sig;\n}\n\nmodel {\n  for (k in 1:K) {\n    beta[k] ~ normal(beta0, beta_sig);\n    y[k][2:N[k]] ~ normal(alpha + beta[k]*y[k][1:(N[k]-1)], sigma);\n  }\n}", model_name2 = "HierarchicalAR")
fit <- stan(model_code = stancode, data = dat,
            pars = c("alpha", "beta", "sigma", "beta0", "beta_sig"))

Answer 1

Currently, there are no ragged array data structures in Stan, so there is no elegant way to accomplish your task. In this situation, I think padding the shorter vectors so that they are all the same length is the easiest one, although I would use more extreme values to make indexing mistakes easier to detect.

如何在Stan模型中定义一组具有不同长度的向量

数据

Stan代码

R脚本

1 个答案: