In R

Time: 2017-12-04 05:11:19

Tags: r

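I am trying to make predictions on a test set using the DirichReg function from the DirichletReg package. The model works fine when I fit it with only a few predictors, but with more than 5 predictors I get an error I can't figure out. The code below creates a MWE that reproduces the error.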

library(DirichletReg)
set.seed(1)
# create dataset
predictor1 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor2 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor3 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor4 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor5 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor6 <- rnorm(n = 1000, mean = 5, sd = 1)
prob_A <- runif(n = 1000, min = 0, max = 0.5)
prob_B <- runif(n = 1000, min = 0, max = 0.5)
prob_C <- 1 - prob_A - prob_B
dat <- data.frame(predictor1, predictor2, predictor3, predictor4, predictor5,
                  predictor6, prob_A, prob_B, prob_C)

# split data into training and test sets
train_vec <- sample(c(0, 1), size = nrow(dat), replace = TRUE, prob = c(0.2, 0.8))
train_dat <- dat[train_vec == 1, ]
test_dat <- dat[train_vec == 0, ]

# run model
train_dat$prob <- DR_data(train_dat[, c('prob_A', 'prob_B', 'prob_C')])
mod <- DirichReg(prob ~ predictor1 + predictor2 + predictor3 + predictor4 +
                        predictor5 + predictor6,
                 data = train_dat, model = 'common')

# run predictions
test_dat$prob <- DR_data(test_dat[, c('prob_A', 'prob_B', 'prob_C')])
preds <- predict(object = mod, newdata = test_dat)

This is the error I get:

Error in parse(text = x, keep.source = FALSE) : 
  <text>:1:74: unexpected '|'
1: prob ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor5 +  |
                                                                             ^

I would appreciate any help. I haven't been able to spot the mistake myself or find anything about it in the package documentation.

1 answer:

Answer 0 (score: 1)

This appears to be a bug in the package. I suggest contacting the package maintainer to report it.
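
One plausible explanation, offered as an assumption and not confirmed against the package source: the position in the error message (1:74) is consistent with the formula being converted to text with deparse(), which splits long expressions into several strings, and an incomplete piece then being pasted in front of a "|". The base R sketch below reproduces that failure pattern without DirichletReg; the names naive and safe are illustrative only.

```r
# Base R only; illustrates the suspected mechanism, not the package's actual code.
f <- prob ~ predictor1 + predictor2 + predictor3 + predictor4 +
  predictor5 + predictor6

# deparse() breaks long expressions into several strings
# (the default width.cutoff is 60).
parts <- deparse(f)
length(parts)

# Pasting "|" onto the first piece alone leaves a dangling "+" before "|",
# which cannot be parsed.
naive <- paste(parts[1], "|")
naive_result <- tryCatch(parse(text = naive, keep.source = FALSE),
                         error = function(e) conditionMessage(e))
print(naive_result)

# Collapsing all pieces into one string first parses cleanly.
safe <- paste(deparse(f, width.cutoff = 500L), collapse = " ")
safe_expr <- parse(text = safe, keep.source = FALSE)
```

If this is indeed the cause, it would also explain why models with fewer predictors work: their formulas are short enough for deparse() to return a single string.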

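A possible workaround is to spell out every part of the regression specification explicitly, rather than relying on the package to replicate the regressors across all parts internally.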

mod2 <- DirichReg(prob ~
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6 |
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6 |
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6,
  data = train_dat, model = "common")
all.equal(coef(mod), coef(mod2))
## [1] TRUE
predict(mod2, newdata = test_dat)
##             [,1]      [,2]      [,3]
##   [1,] 0.2436493 0.2715895 0.4847612
##   [2,] 0.2541715 0.2252292 0.5205993
##   [3,] 0.2618741 0.2345063 0.5036196
##   ...
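
With many predictors, typing the same right-hand side three times is error-prone. As a minimal sketch, the explicit formula from the workaround can be assembled as text in base R and converted with as.formula(). The names predictors, rhs, n_components and f_explicit are illustrative, and whether DirichReg and predict behave identically with a formula passed through a variable is an assumption that has not been verified here.

```r
# Base R only; the predictor names match the example data above.
predictors <- paste0("predictor", 1:6)
rhs <- paste(predictors, collapse = " + ")

# Repeat the right-hand side once per response component (3 here),
# separated by "|", then convert the text to a formula.
n_components <- 3
f_explicit <- as.formula(
  paste("prob ~", paste(rep(rhs, n_components), collapse = " | "))
)

# Assumption (untested): the result can be passed to the package, e.g.
# mod3 <- DirichReg(f_explicit, data = train_dat, model = "common")
```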