Question

I am trying to use sklearn's cross_val_score with splits that I provide myself. The sklearn documentation gives the following example here:

>>> import numpy as np
>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)       
PredefinedSplit(test_fold=array([ 0,  1, -1,  1]))
>>> for train_index, test_index in ps.split():
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] TEST: [0]
TRAIN: [0 2] TEST: [1 3]

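I am having trouble understanding this example. Specifically:

why does ps.get_n_splits() return 2 in this example; and
why does the test_fold array produce the splits shown at the bottom of the snippet?

Also, if I pass the ps object to sklearn's cross_val_score function in this case, will it perform cross-validation using these two splits?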

Answer 1

The number of splits is the number of unique values in test_fold, excluding -1.
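The counting rule can be checked directly. This is a minimal sketch; the variable names `fold_labels` and `n_splits` are only illustrative and are not part of the sklearn API.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

test_fold = np.array([0, 1, -1, 1])

# Distinct fold labels, ignoring -1 (which marks "never in a test set").
fold_labels = np.unique(test_fold[test_fold != -1])
n_splits = len(fold_labels)

ps = PredefinedSplit(test_fold)
print(fold_labels, n_splits, ps.get_n_splits())  # [0 1] 2 2
```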

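In this example, with test_fold = [0, 1, -1, 1]:

- The sample at index 0 has the value 0, so in fold 0 the test set is [0] and the remaining samples 1, 2, 3 form the training set.

  ---> TRAIN: [1 2 3] TEST: [0]

- The samples at indices 1 and 3 have the value 1, so in fold 1 the test set is [1, 3] and the remaining samples 0, 2 form the training set.

  ---> TRAIN: [0 2] TEST: [1 3]

- The sample at index 2 has the value -1, so it is never placed in a test set; it stays in the training set of every split.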
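Note that the integer values themselves are only labels: what matters is which samples share a value. With test_fold = [5, 0, -1, 0] you get the same two splits, only iterated in the order of the sorted labels (0 first, then 5):

  ---> TRAIN: [0 2] TEST: [1 3]

  ---> TRAIN: [1 2 3] TEST: [0]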
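To see how the splits follow from the test_fold values, here is a rough re-implementation of the logic described above. `manual_splits` is a hypothetical helper written for illustration, not an sklearn function; it assumes the folds are visited in sorted label order.

```python
import numpy as np

def manual_splits(test_fold):
    """Yield (train_idx, test_idx) pairs, one per distinct label other than -1."""
    test_fold = np.asarray(test_fold)
    indices = np.arange(len(test_fold))
    for label in np.unique(test_fold[test_fold != -1]):
        test_mask = test_fold == label
        # Everything not in this fold's test set, including -1 samples, is training data.
        yield indices[~test_mask], indices[test_mask]

for test_fold in ([0, 1, -1, 1], [5, 0, -1, 0]):
    print("test_fold =", test_fold)
    for train_idx, test_idx in manual_splits(test_fold):
        print("  TRAIN:", train_idx, "TEST:", test_idx)
```

Both label arrays yield the same pair of train/test partitions; only the iteration order differs.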

Finally, for a typical k-fold split in which each sample is its own test fold, you can use test_fold = [0, 1, 2, 3].
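On the cross_val_score part of the question: the `cv` argument accepts a splitter object, so a PredefinedSplit can be passed directly and one score is computed per predefined split. The sketch below uses LogisticRegression purely as a placeholder estimator; that choice is an assumption and is not taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

# The two splits from the question.
ps = PredefinedSplit(test_fold=[0, 1, -1, 1])
scores = cross_val_score(LogisticRegression(), X, y, cv=ps)
print(len(scores))  # one score per split

# One fold per sample, as in test_fold = [0, 1, 2, 3].
ps_each = PredefinedSplit(test_fold=[0, 1, 2, 3])
scores_each = cross_val_score(LogisticRegression(), X, y, cv=ps_each)
print(len(scores_each))
```

With such a tiny dataset the score values themselves are not meaningful; the point is only that the number of scores matches ps.get_n_splits().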
