The R package caret provides a convenient function, createFolds, which returns a list of training-set indices for cross-validation:
set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)
$Fold1
[1] 1 2 5 6 7 8 9 10
$Fold2
[1] 1 3 4 5 6 8 9 10
$Fold3
[1] 1 2 3 4 5 7 8 10
$Fold4
[1] 1 2 3 4 6 7 8 9
$Fold5
[1] 2 3 4 5 6 7 9 10
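With returnTrain=TRUE each element holds the rows to fit on, so the held-out rows for a fold are whatever is left over. A minimal base-R sketch of that relationship, using a hand-written list that mirrors the first two folds printed above (the names train_folds and test_folds are illustrative only, not part of caret):

```r
# Training indices copied from the first two folds of the output above.
train_folds <- list(
  Fold1 = c(1, 2, 5, 6, 7, 8, 9, 10),
  Fold2 = c(1, 3, 4, 5, 6, 8, 9, 10)
)
n <- 10  # number of observations in x

# The held-out indices are the complement of each training set.
test_folds <- lapply(train_folds, function(train) setdiff(seq_len(n), train))
test_folds$Fold1   # 3 4
test_folds$Fold2   # 2 7
```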
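I would like to create a similar function, except that it should return a list of indices for use in time-series cross validation. I found some example code in R, but I want to generalize it and wrap it in a function. Here is what I came up with initially: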
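createTSfolds <- function(y, Min=max(frequency(y),3)) {
  # Training sets run from observation 1 up to each stop point
  i <- seq(along=y)
  stops <- i[Min:(length(i)-1)]
  starts <- rep(1,length(stops))
  # SIMPLIFY=FALSE keeps the result a list even when there is only one fold
  out <- mapply(seq,starts,stops,SIMPLIFY=FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}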
createTSfolds(x)
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4
$Fold3
[1] 1 2 3 4 5
$Fold4
[1] 1 2 3 4 5 6
$Fold5
[1] 1 2 3 4 5 6 7
$Fold6
[1] 1 2 3 4 5 6 7 8
$Fold7
[1] 1 2 3 4 5 6 7 8 9
(Min is the minimum number of observations required to fit a model.)
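To show how a list of folds like this would typically be consumed, here is a rough base-R sketch of one-step-ahead evaluation: fit on each training fold and score the observation immediately after it. The naive last-value forecast and the helper name evalFolds are placeholders chosen for illustration; neither comes from caret or from the example code mentioned above.

```r
# Hypothetical helper: one-step-ahead errors for a list of training-index folds.
# The "model" is a naive forecast (repeat the last training value); swap in a real fit.
evalFolds <- function(y, folds) {
  vapply(folds, function(train) {
    nextObs <- max(train) + 1     # index of the first unseen observation
    forecast <- y[max(train)]     # naive forecast from the training window
    y[nextObs] - forecast         # forecast error for this fold
  }, numeric(1))
}

set.seed(1)
x <- rnorm(10)
# Expanding windows 1:3, 1:4, ..., 1:9, the same shape createTSfolds(x) returns
folds <- lapply(3:(length(x) - 1), seq_len)
names(folds) <- paste0("Fold", seq_along(folds))

errors <- evalFolds(x, folds)
mean(abs(errors))                 # mean absolute one-step-ahead error
```

Because every fold stops at least one observation before the end of the series, there is always a next value to score against.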
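This function works well as it stands, but I would like to add two features that Rob Hyndman discusses: a fixed-length training window (lookback) and a variable forecast horizon (k).
Here is how I implemented the window: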
createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
  i <- seq(along=y)
  stops <- i[Min:(length(i)-1)]
  if (is.na(lookback)) {
    # Expanding window: every fold starts at the first observation
    starts <- rep(1,length(stops))
  } else {
    # Rolling window: each fold keeps only its last `lookback` observations
    starts <- pmax(1, stops-lookback+1)
  }
  # SIMPLIFY=FALSE always returns a list, so no matrix has to be split afterwards
  out <- mapply(seq,starts,stops,SIMPLIFY=FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}
createTSfolds(x,Min=4,lookback=4)
I can't figure out how to implement the variable forecast horizon, which should look like the following. For example, with k = 3:
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4 5 6
$Fold3
[1] 1 2 3 4 5 6 7 8 9
I am looking for ways to improve my existing code, as well as a way to add a variable increment to the training set at each fold.
Thanks.
Answer 0 (score: 3)
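Here is one approach. It is not completely robust, because I am not sure what output you are after when both lookback and k are present. Let me know if this is what you are looking for.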
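createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
  # Expanding windows 1:Min, 1:(Min + 1), ..., 1:(length(y) - 1)
  out = plyr::llply(Min:(length(y) - 1), seq)
  # Keep every k-th fold so the training set grows by k observations at a time
  if (!is.na(k)) {out = out[seq(1, length(out), k)]}
  if (!is.na(lookback)) {
    # Trim each fold to its last `lookback` observations
    out = plyr::llply(out, function(z) z[(length(z) - lookback + 1):length(z)])
  }
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  return(out)
}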
createTSfolds2(x, Min = 3, lookback = NA, k = 3)
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4 5 6
$Fold3
[1] 1 2 3 4 5 6 7 8 9
createTSfolds2(x, Min = 3, lookback = 3, k = 3)
$Fold1
[1] 1 2 3
$Fold2
[1] 4 5 6
$Fold3
[1] 7 8 9
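One possible extension of this idea is to return the matching test indices as well, so that each fold carries the next k observations after its training window. The sketch below uses only base R; the name createTSsplits and the train/test list layout are assumptions made for illustration, not an existing caret or plyr interface.

```r
# Hypothetical helper: training windows that grow by k observations,
# each paired with the next k observations as its test set.
createTSsplits <- function(n, Min = 3, lookback = NA, k = 1) {
  stops <- seq(Min, n - 1, by = k)              # last training index per fold
  out <- lapply(stops, function(s) {
    # Expanding window unless a lookback width is given
    start <- if (is.na(lookback)) 1 else max(1, s - lookback + 1)
    list(train = seq(start, s),
         test  = seq(s + 1, min(s + k, n)))     # truncate at the end of the series
  })
  names(out) <- paste0("Fold", seq_along(out))
  out
}

s <- createTSsplits(n = 10, Min = 3, k = 3)
s$Fold2$train   # 1 2 3 4 5 6
s$Fold2$test    # 7 8 9
s$Fold3$test    # 10
```

The last test set is shorter than k here because only one observation remains after the ninth; whether to keep or drop such a partial fold is a design choice this sketch leaves open.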