
How can I turn time series code for a single series into automated code that works for multiple time series?

Time: 2019-07-10 08:43:23

Tags: r time-series

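I want to convert time series code written for a single series into automated code that works for multiple time series (my data consists of monthly series). For a single series, my general approach is to remove the seasonal component and take the first difference to achieve stationarity. I then use auto.arima to obtain the ARIMA parameters, and use those parameters to build an ARIMA model on the original time series data. Finally, I forecast, compare the forecast against 4 months of actual data (held out beforehand), and compute the RMSE. Since I cannot share the real data, I generate a random time series and test set as an example; the results are of course not very meaningful.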

library('forecast')
set.seed(123)

# Create a random monthly time series and 4 months of test data
ts <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
test.data <- runif(4, min = 50, max = 3000)

# Decompose
comp.ts = decompose(ts)

# Subtract the seasonal component, then take the first difference
ts2 <- ts - comp.ts$seasonal
ts2 <- diff(ts2, differences=1)

auto.arima(ts2, trace = TRUE, seasonal = TRUE,ic = 'aicc', max.p = 10,max.q = 10,max.P = 10,max.Q = 10,max.d = 10, stepwise = FALSE)

# Use auto.arima outcome as input
my.arima <- Arima(ts2, order=c(0,0,0),seasonal = list(order = c(0,1,0), period = 12),method="ML", include.drift = FALSE)
# Forecast and calculate RMSE
data.forecast <- forecast(my.arima, h=4, level=c(99.5))
my.difference <- test.data - data.forecast$mean
# Note: as written this equals mean(abs(my.difference)), i.e. the mean absolute error;
# the RMSE would be sqrt(mean(my.difference^2))
my.rmse <- (sum(sqrt(my.difference^2)))/length(my.difference)

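Since my real data set contains more than 500 time series, I need to automate the whole process. Unfortunately, I have not worked with time series in R before, so I am running into problems with the automation.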

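Let's assume 4 random time series with 4 random test sets. How can I build an automated process for these series (one I can also apply to the real 500+ series) that does exactly the same as the procedure above?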

ts1 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts2 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts3 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts4 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
test.data1 <- runif(4, min = 50, max = 3000)
test.data2 <- runif(4, min = 50, max = 3000)
test.data3 <- runif(4, min = 50, max = 3000)
test.data4 <- runif(4, min = 50, max = 3000)

Thanks for your help!

1 Answer:

Answer 0: (score: 1)

Just put your workflow into a function.

serialArima <- function(ts, test.data) {
  library(forecast)
  # Decompose
  comp.ts <- decompose(ts)

  # Subtract the seasonal component, then take the first difference
  ts2 <- ts - comp.ts$seasonal
  ts2 <- diff(ts2, differences=1)

  # Prints the model search trace; the returned model is not stored or reused below
  auto.arima(ts2, trace=TRUE, seasonal=TRUE, ic='aicc', max.p=0, max.q=0, max.P=0,
             max.Q=0, max.d=0, stepwise=FALSE)

  # Orders are hard-coded (note period=2 and max.*=0 here, versus 12 and 10 in the question)
  my.arima <- Arima(ts2, order=c(0, 0, 0),
                    seasonal=list(order=c(0, 1, 0), period=2),
                    method="ML", include.drift=FALSE)
  # Forecast and calculate the error metric
  data.forecast <- forecast(my.arima, h=4, level=c(99.5))
  my.difference <- test.data - data.forecast$mean
  # As written this equals mean(abs(my.difference)), i.e. the MAE;
  # the RMSE would be sqrt(mean(my.difference^2))
  my.rmse <- (sum(sqrt(my.difference^2)))/length(my.difference)
  return(list(data.forecast=data.forecast, my.difference=my.difference, my.rmse=my.rmse))
}
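The function above fixes the model orders by hand, so the model that auto.arima selects never reaches the forecast. It also forecasts the deseasonalized, differenced series and then compares that forecast with test data on the original scale. A minimal sketch of one possible variant follows, assuming it is acceptable to let auto.arima choose the differencing itself on the original series. The name `autoArimaRmse` and its return fields are hypothetical, not part of the original answer.

```r
library(forecast)

# Hypothetical variant: fit on the original series and forecast from the
# model that auto.arima selects, so forecasts stay on the original scale.
autoArimaRmse <- function(series, test.data) {
  fit <- auto.arima(series, ic = "aicc")        # orders chosen per series
  fc  <- forecast(fit, h = length(test.data))   # forecast the held-out horizon
  err <- test.data - as.numeric(fc$mean)
  list(order    = arimaorder(fit),              # selected (p, d, q[, P, D, Q, m])
       forecast = fc,
       rmse     = sqrt(mean(err^2)))            # root mean squared error
}

set.seed(123)
series    <- ts(runif(26, min = 50, max = 3000), start = c(2017, 1), frequency = 12)
test.data <- runif(4, min = 50, max = 3000)

res <- autoArimaRmse(series, test.data)
res$order
res$rmse
```

Because the orders are read from the fitted object, nothing has to be copied by hand between series, which is the part that matters when running over 500+ series.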

Single call

serialArima(ts, test.data)
# ARIMA(0,0,0)           with zero mean     : 82.45803
# ARIMA(0,0,0)           with non-zero mean : 88.13593
# 
# 
# 
# Best model: ARIMA(0,0,0)           with zero mean     
# 
# $data.forecast
# Point Forecast   Lo 99.5  Hi 99.5
# 2020.00      -349.1424 -2595.762 1897.477
# 2020.50       772.6014 -1474.018 3019.221
# 2021.00      -349.1424 -3526.342 2828.057
# 2021.50       772.6014 -2404.598 3949.801
# 
# $my.difference
# Time Series:
#   Start = c(2020, 1) 
# End = c(2021, 2) 
# Frequency = 2 
# [1] 1497.2446  840.4139 2979.4553  993.5614
# 
# $my.rmse
# [1] 1577.669
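The `$my.rmse` value printed above can be reproduced from the four values in `$my.difference` as a plain mean of absolute differences, which suggests the formula computes the mean absolute error rather than the RMSE. A small check using the printed numbers (the variable names here are only illustrative):

```r
# Differences copied from the $my.difference output above
d <- c(1497.2446, 840.4139, 2979.4553, 993.5614)

mae.value  <- mean(abs(d))         # same as sum(sqrt(d^2)) / length(d)
rmse.value <- sqrt(mean(d^2))      # root mean squared error

round(mae.value, 3)                # matches the printed my.rmse of 1577.669
rmse.value > mae.value             # RMSE weights large errors more heavily
```

If the RMSE is what is wanted, `sqrt(mean(my.difference^2))` would be the expression to use inside the function.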

Multiple series

Map(serialArima, list(ts1, ts2, ts3, ts4), 
    list(test.data1, test.data2, test.data3, test.data4))
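With 500+ series it is usually more convenient to keep the series and test sets in named lists than in numbered variables, and to collect one error value per series into a table. The sketch below shows that pattern under stated assumptions: `scoreSeries` is a hypothetical stand-in (a naive mean forecast) used only so the example runs with base R; in practice `serialArima` would take its place. The `tryCatch` wrapper is also an addition, meant to keep one failing series from aborting the whole run.

```r
# Hypothetical stand-in for serialArima: forecasts every test point with the
# series mean and returns the error in the same $my.rmse field.
scoreSeries <- function(series, test.data) {
  err <- test.data - mean(series)
  list(my.rmse = sqrt(mean(err^2)))
}

set.seed(123)
n.series <- 4  # would be 500+ for the real data

# Named lists instead of ts1, ts2, ... and test.data1, test.data2, ...
series.list <- replicate(
  n.series,
  ts(runif(26, min = 50, max = 3000), start = c(2017, 1), frequency = 12),
  simplify = FALSE
)
test.list <- replicate(n.series, runif(4, min = 50, max = 3000), simplify = FALSE)
names(series.list) <- names(test.list) <- paste0("series", seq_len(n.series))

# Apply the function pairwise; a failing series yields NULL instead of an error
results <- Map(
  function(s, te) tryCatch(scoreSeries(s, te), error = function(e) NULL),
  series.list, test.list
)

# One row per series; NA marks series whose fit failed
rmse.table <- data.frame(
  series = names(results),
  rmse   = vapply(results,
                  function(r) if (is.null(r)) NA_real_ else r$my.rmse,
                  numeric(1))
)
rmse.table
```

`Map` carries the names of the first list through to the result, so each row of `rmse.table` can be traced back to its series.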