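I am combining a data set from the fpp2 package with the ets function from the forecast package. Because I am forecasting several time series, I use my own function to produce multiple projections at once.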
# CODE
library(fpp2) # required for the data
library(dplyr)
library(forecast)
MY_DATA<-uschange[,1:4]
head(MY_DATA)
tail(MY_DATA)
#1. Own forecasting function
FORECASTING_FUNCTION_ETS <- function(Z, hrz = 16) {
  timeseries <- msts(Z, start = 1970, seasonal.periods = 4)
  forecast <- ets(timeseries)
}
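To get more accurate projections, I want to use partitioning. Partitioning is done by splitting the series into two periods: the earlier period is the training set and the later period is the test set.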
#2.Partitioning (training and test set)
for (i in 1:20)
{ nTest <- 16*i
nTrain <- length(MY_DATA[,2:2])- nTest
train <- window(MY_DATA[,2:2],start=1970, end=c(2015,3),nTrain)
test <- window(MY_DATA[,2:2], start=1970, end=c(2016,3),nTrain+16)
s <- FORECASTING_FUNCTION_ETS(train)
sp<- predict(s,h=16)
cat("----------------------------------
Data Partition",i,"
Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"
")
print(accuracy(sp,test))
cat("
")
print(sp$model)
}
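So far so good :) This code works well for one series ("Consumption"), and I can see all the results for the training and test sets.
But my intention here is to use the code above to partition not just one series but all four series (Consumption, Income, Production, and Savings) at once. For that reason, I tried the code below, using "[,i]" to get results for all four series: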
#3.Trying to upgrade code above
for (i in 1:20)
{ nTest[,i] <- 16*i
nTrain[,i] <- length(MY_DATA[,i])- nTest
train[,i] <- window(MY_DATA[,i],start=1970, end=c(2015,3),nTrain)
test[,i] <- window(MY_DATA[,i], start=1970, end=c(2016,3),nTrain+16)
s <- FORECASTING_FUNCTION_ETS(train[,i])
sp<- predict(s[,i],h=16)
cat("----------------------------------
Data Partition",i,"
Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"
")
print(accuracy(sp,test))
cat("
")
print(sp$model)
}
But there are errors and this code does not work. Can someone help me solve this problem and fix the code above?
Answer 0 (score: 2)
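This isn't exactly what you asked for, so I don't expect you to accept this answer, but it was an interesting problem to me, so I thought I'd offer an approach anyway.
I'll start by assuming that your main goal is to figure out how to iterate a process for evaluating the accuracy of a forecasting method across multiple time series. You want to do that with an expanding window, in which you gradually increase the share of the data included in the training set while repeatedly trying to forecast a fixed number of steps ahead, a process that mimics how this task would be performed in real life.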
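To make the expanding window concrete, here is a small sketch of the train/test index ranges it produces. The numbers are assumptions for illustration: n = 187 is the number of quarters in uschange, and the 70 percent starting width and 4-step horizon mirror the values used in this answer's code.

```r
# Assumed sizes, for illustration only.
n <- 187  # number of quarterly observations in the series
h <- 4    # forecast horizon (test-set length) in quarters

# Last observation of each training window: starts at 70% of the
# series and grows by one quarter until h quarters remain for testing.
train_ends <- ceiling(n * 0.7):(n - h)

# One row per iteration of the expanding window.
splits <- data.frame(
  train_start = 1,
  train_end   = train_ends,
  test_start  = train_ends + 1,
  test_end    = train_ends + h
)

head(splits, 3)
nrow(splits)  # 53 train/test splits
```

Each row keeps the training start fixed at observation 1 and pushes the training end forward by one quarter, so the test set always covers the next 4 quarters.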
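For simplicity, I'm also going to assume that you don't really need all of the output printed to the console and would rather see the distribution and summary statistics of the accuracy measures associated with those iterations (like the table at the end of the example you're trying to follow).
Starting from those assumptions, here's one approach that works.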
# Split your data frame into a list of one-column data frames (here, time series) using as.list,
# then use lapply to iterate your validation process over those series.
Y <- lapply(as.list(MY_DATA), function(x) {
  # Instead of a for loop, let's use sapply to iterate over a vector of integers
  # representing the width of the training set in our expanding window, starting at
  # 70 percent of the full series and running to the series' end. Let's assume that,
  # in each iteration, we're going to forecast the following four quarters.
  sapply(ceiling(length(x) * 0.7):(length(x) - 4), function(i) {
    # Because we're using indices instead of dates, we need to partition the
    # series with subset instead of window. The training set runs from the start
    # of the series to our integer, and the test set grabs the next 4 quarters.
    train <- subset(x, end = i)
    test <- subset(x, start = i + 1, end = i + 4)
    # Now we fit an ETS model to that training set and use it to generate
    # forecasts for the following 4 quarters.
    mod <- ets(train)
    preds <- predict(mod, h = 4)
    # Finally, we check the accuracy of those forecasts against the test set...
    check <- accuracy(preds, test)
    # ...and return the accuracy metric of our choice (I've picked MAPE because
    # that's the one used in the example you're trying to follow, but that's easy
    # to change, or you could just return the accuracy object if you want options).
    return(check["Test set", "MAPE"])
  })
})
In this case, the process returns a list of four vectors, each of length 53. Because those vectors are in a list, you can easily summarize them to get a rough sense of the overall accuracy for each series. I like to look at the distribution of the accuracy measure, which you could easily do here with density plots. The simplest thing to do, of course, is to look at the means:
> sapply(Y, mean)
Consumption      Income  Production     Savings
   131.4818    172.7535    138.3171    106.9114
If you want to compare the results from ETS with those from other forecasting processes, you can just swap out the bit that fits the model, then rerun and compare the summaries. Or you could use another apply-family function instead of lapply to add the comparison to the process and return a matrix or data frame showing the results of the two processes side by side.
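As one way to make that side-by-side comparison concrete, here is a minimal sketch. The helper window_mape, the methods list, and the choice of naive() as the second method are my own assumptions for illustration; they are not from the question or the blog post, and any function that returns a forecast object could be swapped in.

```r
library(fpp2)      # provides the uschange data
library(forecast)  # ets(), naive(), forecast(), accuracy(), subset()

MY_DATA <- uschange[, 1:4]

# Hypothetical helper: expanding-window MAPE for one series and one
# forecasting function. `fc_fun` takes (train, h) and returns a forecast object.
window_mape <- function(x, fc_fun, h = 4, start_frac = 0.7) {
  widths <- ceiling(length(x) * start_frac):(length(x) - h)
  sapply(widths, function(i) {
    train <- subset(x, end = i)
    test  <- subset(x, start = i + 1, end = i + h)
    accuracy(fc_fun(train, h), test)["Test set", "MAPE"]
  })
}

# Two candidate methods (the second one is an arbitrary benchmark).
methods <- list(
  ETS   = function(train, h) forecast(ets(train), h = h),
  Naive = function(train, h) naive(train, h = h)
)

# Rows = series, columns = methods, cells = mean MAPE over all windows.
comparison <- sapply(methods, function(f) {
  sapply(colnames(MY_DATA), function(nm) mean(window_mape(MY_DATA[, nm], f)))
})
comparison
```

Iterating over column names rather than a list keeps the series labels as row names, so the resulting matrix can be read directly or passed to as.data.frame().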
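Like I said, I know this isn't a direct implementation of the approach in that blog post you're trying to follow, but I think it's in keeping with the spirit of your effort, and it was fun for me to work on.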
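If you would rather stay closer to the loop in the question, here is a rough sketch that wraps it in an outer loop over the four series. It rests on assumptions of mine: the partitions are indexed by position with subset() rather than window(), the test set is always 16 quarters, and only 5 partitions are used so that every training set keeps more than 100 observations. The names results and accuracy_table are hypothetical.

```r
library(fpp2)      # provides the uschange data
library(forecast)  # ets(), forecast(), accuracy(), subset()

MY_DATA <- uschange[, 1:4]
h <- 16            # test-set length, as in the question
n_partitions <- 5  # assumption: keeps every training set above 100 quarters

results <- list()
for (series in colnames(MY_DATA)) {
  x <- MY_DATA[, series]
  for (i in seq_len(n_partitions)) {
    # The training set shrinks by h observations with each partition.
    n_train <- length(x) - h * i
    train <- subset(x, end = n_train)
    test  <- subset(x, start = n_train + 1, end = n_train + h)
    fc <- forecast(ets(train), h = h)
    # Keep the test-set accuracy row, labelled by series and partition.
    results[[paste(series, i, sep = "_")]] <- accuracy(fc, test)["Test set", ]
  }
}

# One row per series/partition, one column per accuracy measure.
accuracy_table <- do.call(rbind, results)
round(accuracy_table[, c("RMSE", "MAE", "MAPE")], 2)
```

Collecting the results in a list and binding them at the end avoids assigning into objects such as nTest[,i] that do not exist yet, which is one reason the loop in the question fails.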