Question

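I am working on several analyses in which I want to forecast some value for each level of a factor, or even for combinations of several factors, e.g. conditioned on gender and age. So far my process has been fairly manual, as shown below, which is fine for a single variable/factor with, say, 2-5 levels. But it does not scale to factors with many levels or to multiple factors.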

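Is there any kind of "group by" or "subset" functionality in the forecast package that would help? I started writing a program to carry out the process below in the most general case (i.e. for any number of factors and levels), but have not succeeded so far.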

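By the way, unfortunately my data is private and I can't share it here. That shouldn't matter, though, since the code below works and I am looking for a better, i.e. scalable, solution.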

# Example code
library(data.table)
library(forecast)
library(ggplot2)

# category is a factor with levels A and B; amount is the variable to model/forecast
# using data.table syntax to create a vector for each category
vec1 <- dt[category == 'A']$amount
vec2 <- dt[category == 'B']$amount

# Create ts objects from above vectors
ts1 <- ts(vec1, start=c(start_year, start_month), end=c(end_year, end_month), frequency=12)
ts2 <- ts(vec2, start=c(start_year, start_month), end=c(end_year, end_month), frequency=12)

# Fit model 
fit1 <- auto.arima(ts1, trace = TRUE, stepwise = FALSE)
fit2 <- auto.arima(ts2, trace = TRUE, stepwise = FALSE)


# Forecast out using selected models
h <- 12
fcast1 <- forecast(fit1, h)
fcast2 <- forecast(fit2, h)

# funggcast pulls out data from the forecast object into a df (needed for ggplot2)
# output columns are date, observed, fitted, forecast, lo80, hi80, lo95, hi95
fcastdf1 <- funggcast(ts1, fcast1)
fcastdf2 <- funggcast(ts2, fcast2)

# Add in category
fcastdf1$category <- 'A'
fcastdf2$category <- 'B'


# Merge into one df
df <- merge(fcastdf1, fcastdf2, all = TRUE)

# Basic qplot from ggplot2 package, I am actually incorporating quite a bit more formatting but this is just to give an idea
qplot(x=date, 
      y=observed, 
      data=df, 
      color=category, 
      group=category, geom="line") +
geom_line(aes(y=forecast), col='blue')

Answer 1

You can use tapply to do this:

  res <- tapply(dt$amount, dt$category, function(x) {
    # start is c(year, month) of the first observation; h is the forecast horizon
    series <- ts(x, start = start, frequency = 12)
    fit <- auto.arima(series, trace = TRUE, stepwise = FALSE)
    forecast(fit, h = h)
  })

This returns a named list of forecasts, one per category.
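
To plot the result with ggplot2 you will probably want a single data frame rather than a list. Here is a minimal sketch of one way to flatten it, assuming the default 80% and 95% prediction intervals. The simulated data, the 2010 start date, and the output column names (chosen to resemble the funggcast output in the question) are all made up for illustration.

```r
library(forecast)

# Simulated stand-in for the private data: 48 monthly values per category
set.seed(1)
dt <- data.frame(
  category = rep(c("A", "B"), each = 48),
  amount   = c(100 + cumsum(rnorm(48)), 50 + cumsum(rnorm(48)))
)
h <- 12

# One forecast per category (default auto.arima search to keep the sketch fast)
res <- tapply(dt$amount, dt$category, function(x) {
  series <- ts(x, start = c(2010, 1), frequency = 12)  # assumed start date
  forecast(auto.arima(series), h = h)
})

# Flatten each forecast object into rows and tag them with the category name
fcast_df <- do.call(rbind, lapply(names(res), function(nm) {
  fc <- res[[nm]]
  data.frame(
    category = nm,
    date     = as.numeric(time(fc$mean)),   # decimal year, e.g. 2014.083
    forecast = as.numeric(fc$mean),
    lo80     = as.numeric(fc$lower[, 1]),
    hi80     = as.numeric(fc$upper[, 1]),
    lo95     = as.numeric(fc$lower[, 2]),
    hi95     = as.numeric(fc$upper[, 2])
  )
}))
```

The resulting fcast_df has h rows per category, so it can be passed to ggplot with color = category without any per-level code.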

You have to set start to the earliest date in your dataset.
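
For several factors at once (e.g. gender and age), the same idea can be sketched with split(), which accepts a list of factors. This is only an illustration: the column names gender, age_group and date are hypothetical, the data is simulated, and start is derived per group from the earliest date, which assumes every group is a complete monthly series with no gaps.

```r
library(forecast)

# Simulated data: 36 monthly observations for each gender x age_group combination
set.seed(42)
n <- 36
dates <- seq(as.Date("2015-03-01"), by = "month", length.out = n)
dt <- data.frame(
  date      = rep(dates, times = 4),
  gender    = rep(c("F", "M"), each = n, times = 2),
  age_group = rep(c("18-34", "35+"), each = 2 * n),
  amount    = 100 + cumsum(rnorm(4 * n))
)

# One data frame per combination of factor levels; drop = TRUE skips empty combinations
groups <- split(dt, list(dt$gender, dt$age_group), drop = TRUE)

fcasts <- lapply(groups, function(g) {
  g <- g[order(g$date), ]
  first <- min(g$date)
  # c(year, month) of the earliest observation in this group
  start <- c(as.integer(format(first, "%Y")), as.integer(format(first, "%m")))
  series <- ts(g$amount, start = start, frequency = 12)
  forecast(auto.arima(series), h = 12)
})
```

fcasts is a list named after the level combinations, so adding another factor only means adding it to the list passed to split().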
