Question

I want to use TimeSeriesSplit with RandomizedSearchCV.

Consider the following example.

import numpy as np
import pandas as pd
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
df = pd.DataFrame(X, columns=['one', 'two'])
df.index = [0, 0, 0, 1, 1, 2]

df
    one two
0   1   2
0   3   4
0   1   2
1   3   4
1   1   2
2   3   4

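Say I want to split X as follows:

In the first split, the training set is the rows with index 0,0,0 and the validation set is the rows with index 1,1.
In the second split, the training set is the rows with index 0,0,0,1,1 and the validation set is the row with index 2.

I tried using TimeSeriesSplit with n_splits = 2, but I could not get the result I want.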

from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=2)
for train_index, test_index in tscv.split(df.index):
    print(df.index[train_index], df.index[test_index])

Int64Index([0, 0], dtype='int64') Int64Index([0, 1], dtype='int64')
Int64Index([0, 0, 0, 1], dtype='int64') Int64Index([1, 2], dtype='int64')

P.S.: If TimeSeriesSplit cannot do this, can I use PredefinedSplit?

Answer 1

If you want to filter rows by index label, you can use the DataFrame's loc indexer.

For example, for your first split you would have:

>>> df.loc[[0]] # train set
   one  two
0    1    2
0    3    4
0    1    2
>>> df.loc[[1]] # validation set
   one  two
1    3    4
1    1    2

For the second split, you would have:

>>> df.loc[[0,1]] # train set
   one  two
0    1    2
0    3    4
0    1    2
1    3    4
1    1    2
>>> df.loc[[2]] # validation set
   one  two
2    3    4
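
The loc selections above show which rows belong to each split, but a cross-validation search needs positional (train, test) index arrays. A minimal sketch of one way to build them follows, assuming the index labels sort in time order. The helper name expanding_label_splits is hypothetical and not part of scikit-learn or pandas.

```python
import numpy as np

def expanding_label_splits(labels):
    """Yield (train_positions, test_positions) pairs with an expanding
    training window over the sorted unique labels.

    Hypothetical helper: assumes the labels sort in chronological order.
    """
    labels = np.asarray(labels)
    unique = np.unique(labels)  # sorted unique labels
    for i in range(1, len(unique)):
        # Train on every row whose label is earlier than the current one.
        train = np.flatnonzero(np.isin(labels, unique[:i]))
        # Validate on the rows carrying the current label.
        test = np.flatnonzero(labels == unique[i])
        yield train, test

# Same index labels as df.index in the question.
index_labels = np.array([0, 0, 0, 1, 1, 2])
splits = list(expanding_label_splits(index_labels))

for train, test in splits:
    print(index_labels[train].tolist(), index_labels[test].tolist())
# [0, 0, 0] [1, 1]
# [0, 0, 0, 1, 1] [2]
```

The cv parameter of RandomizedSearchCV accepts an iterable of (train, test) index arrays, so a list like splits can be passed as cv=splits. TimeSeriesSplit cuts by row position rather than by index label, which is why it separates rows that share a label. PredefinedSplit is probably not a good fit either, since for each fold it trains on all rows outside that test fold, so the row with index 2 would end up in the first training set.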
