Is there a simpler way to use TimeSeriesSplit with a fixed validation set size?

Asked: 2018-08-31 16:01:22

Tags: python machine-learning time-series cross-validation

The train/validation sets I need for the cv parameter of GridSearchCV should look like this:

[1,2,3][4]
[1,2,3,4][5]
[1,2,3,4,5][6]

To get this, I overwrite the indices that TimeSeriesSplit produces:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=8)
# Validation starts at the 98% mark of the data
cv_start = round(len(dataframe) * 0.98)
# The indices yielded by split() are discarded; only the split counter is used
for count, _ in enumerate(tscv.split(trainY)):
    train_index = list(range(cv_start + count))
    test_index = [cv_start + count]
    print(train_index, test_index)

Is there a simpler or cleaner way?

1 answer:

Answer 0: (score: -1)

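You can choose n_splits so that each test set contains the number of samples you want.

I used a similar idea in one of my other answers.

Suppose your data has 7 samples: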

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.array([1, 2, 3, 4, 5, 6, 7])

# Number of samples you want in each test set.
# I used 1 because your example has only 1 test sample in each split.
num_in_test = 1

# By default TimeSeriesSplit puts len(X) // (n_splits + 1) samples in each
# test set, so pick n_splits to match (integer math avoids float rounding).
n_splits = len(X) // num_in_test - 1

tscv = TimeSeriesSplit(n_splits=n_splits)

for train_index, test_index in tscv.split(X):
    print(X[train_index], X[test_index])

# Output
[1] [2]
[1 2] [3]
[1 2 3] [4]
[1 2 3 4] [5]
[1 2 3 4 5] [6]
[1 2 3 4 5 6] [7]
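
The splits above start from a single training sample, whereas the question wants the training window to start large. For comparison, scikit-learn 0.24 and later add a `test_size` parameter to `TimeSeriesSplit` that may cover this case directly. A minimal sketch, assuming such a version is installed and using a made-up six-sample array:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Illustrative data only: six samples, as in the question's example.
X = np.array([1, 2, 3, 4, 5, 6])

# test_size requires scikit-learn >= 0.24 (an assumption about your install).
# Three splits, each validated on one sample taken from the end of the series.
tscv = TimeSeriesSplit(n_splits=3, test_size=1)

splits = [(X[train].tolist(), X[test].tolist()) for train, test in tscv.split(X)]
for train_values, test_values in splits:
    print(train_values, test_values)
```

With these settings the training sets are [1, 2, 3], [1, 2, 3, 4] and [1, 2, 3, 4, 5], each followed by the next single sample as the validation set, which is the layout asked for in the question.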

Once n_splits is fixed, you can easily pass the TimeSeriesSplit object to GridSearchCV or any other utility.
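
As a rough illustration of that last step, the sketch below passes a TimeSeriesSplit object to GridSearchCV through its `cv` argument. The synthetic data, the `Ridge` estimator, the `alpha` grid and the scoring choice are all assumptions made for the example, not part of the original question:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic, time-ordered data for illustration only: 20 samples, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=20)

# One validation sample per split, following the n_splits idea above.
num_in_test = 1
n_splits = len(X) // num_in_test - 1  # 19 splits for 20 samples

tscv = TimeSeriesSplit(n_splits=n_splits)

search = GridSearchCV(
    estimator=Ridge(),                      # hypothetical model choice
    param_grid={"alpha": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=tscv,
    # The default regressor score (R^2) is not well defined on a single
    # validation sample, so an error-based metric is used instead.
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

The same `tscv` object can be reused with other utilities that accept a `cv` argument, such as `cross_val_score`.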