Question

我每小时都有一个时间序列数据。我正在尝试为该数据构建预测。以下是数据样本：

sample <-
structure(list(group_type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Group 1",
"Group 2", "Group 5"), class = "factor"), sub_group_type = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Sub Group 1", "Sub Group 2", "Sub Group 3"),
class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/17",
"1/2/17", "1/3/17"), class = "factor"), hour = c(6L, 7L, 8L, 9L, 10L, 11L, 12L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), weekday = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("Monday", "Sunday", "Tuesday"), class = "factor"), total = c(9L, 9L,
10L, 6L, 2L, 14L, 3L, 11L, 12L, 12L, 0L, 10L, 8L, 13L, 14L, 17L, 12L, 5L, 9L, 7L,
10L, 13L, 23L, 11L, 3L, 6L, 10L, 11L, 14L, 16L, 13L, 2L, 3L, 4L, 14L, 11L, 16L,
8L, 12L, 7L, 6L, 13L, 13L, 22L, 12L, 7L, 9L, 8L, 14L, 9L, 16L, 15L, 6L, 7L, 6L,
12L, 13L, 14L, 7L, 3L, 13L, 11L, 6L, 8L, 15L, 11L, 3L, 10L, 9L, 7L, 12L, 10L, 10L,
3L, 14L, 8L, 12L, 10L, 20L, 5L, 4L, 8L, 12L, 3L, 0L, 4L, 5L, 1L, 6L, 7L, 0L, 3L,
1L, 1L, 0L, 2L, 2L, 0L, 2L, 0L, 3L, 7L, 6L, 2L, 1L)), .Names = c("group_type",
"sub_group_type", "date", "hour", "weekday", "total"), class = "data.frame",
row.names = c(NA, -105L))

我将以下功能应用于上述数据：

models <- function(x){
  x <- msts(x, seasonal.periods=c(24,168))
  mod_exp <- ets(x, ic='aicc', restrict=T)
  mod_hwa <- HoltWinters(x,seasonal = "additive")
  mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
  mod_neural <- nnetar(x, p=7, size=25)
  mod_tbats <- tbats(x, ic='aicc', seasonal.periods=7)
  mod_bats <- bats(x, ic='aicc', seasonal.periods=7)
  mod_stl <- stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets')
  mod_sts <- StructTS(x)
}

test <- by(sample,list(sample$group_type,sample$sub_group_type,sample$date, sample$hour
),models)

但是，我收到以下错误：

 Error in ets(x, ic = "aicc", restrict = T) : y should be a univariate time series

如果我将数据拆分为如下应用ets（）函数，我可以毫无问题地运行它。但是，这种分割数据对我来说不是一个非常可行的选择，因为组和子组的数量太多而且每个都有不同的时间序列模式：

sub_sample_1 <- sample[sample$group_type == "Group 1" &    sample$sub_group_type == "Sub Group 1",6]
x <- msts(sub_sample_1, seasonal.periods=24)
mod_arima <- auto.arima(x, ic='aicc', stepwise=F)
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=24, size=10)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=24)
mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
mod_stl <- stlm(x, s.window=24, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)

是否有任何解决方法，以便我可以按列组应用模型而不会遇到任何错误？

此外，并非所有模型都适用于所有组。对于sub_sample_1数据，HoltWinters，neuralnet，bats和stl给我错误，其他人正在工作

> mod_hwa <- HoltWinters(x,seasonal = "additive")
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : 
 time series has no or less than 2 periods

> mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
Error in HoltWinters(x, seasonal = "multiplicative") : 
 data must be non-zero for multiplicative Holt-Winters

> mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
Error in optim(par = param.vector$vect, fn = calcLikelihoodNOTransformed,  : 
function cannot be evaluated at initial parameters

我可以理解为什么这些模型不能用于我的数据。当我应用函数时，如何在出错时将它们排除？

提前感谢您的帮助！

这个问题与我的另一个问题here

类似（可能是扩展名）

Answer 1

您当前的设置会出现几个问题：

如果未指定return()，函数将返回最后一行。因此，您的第一次尝试将丢失除mod_sts之外的所有行，这些行将为test的每个子集分配值by。
在您的子集代码中，您实际上是在传递第6列（原子向量），而在第一次代码尝试中传递了所有dataframe列。这可能是您输入错误的原因，其输入应符合msts文档：

数字向量，ts对象，矩阵或数据框。它的目的是时间序列数据是单变量的，否则处理相同 TS（）。
您的by正在接收四个分组， group_type ， sub_group_type ，日期和小时< / em>与你的第二个子集代码不同。除非您的数据非常大，否则这些分组可能会产生很少的行或没有行，因此模型过程的数据点不足，因为您的上一个代码块似乎表明了这一点。

说到这里，考虑以下调整，通过前两个分组返回一个命名元素列表，指定第6列。由于by采用了一些因素组合，这些因素在子集化数据框架中可能不会产生任何行，因此下面使用tryCatch来捕获任何错误并返回要在最后一行过滤掉的空列表。

models <- function(x){ x <- msts(x, seasonal.periods=c(24,168)) list( mod_exp = ets(x, ic='aicc', restrict=T), mod_hwa = HoltWinters(x,seasonal = "additive"), mod_hwm = HoltWinters(x,seasonal = "multiplicative"), mod_neural = nnetar(x, p=7, size=25), mod_tbats = tbats(x, ic='aicc', seasonal.periods=7), mod_bats = bats(x, ic='aicc', seasonal.periods=7), mod_stl = stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets'), mod_sts = StructTS(x) ) } # TRY/CATCH TO CAPTURE ERRORS AND RETURN EMPTY LIST test <- by(sample[,6], list(sample$group_type, sample$sub_group_type), function(x) tryCatch({ models(x) }, error=function(e) return(list(NA)))) # TO REMOVE NULLs AND NAs (EMPTY ITEMS) test <- Filter(function(i) length(i) > 0, test)

R：在时间序列数据

1 个答案: