How do I use K-fold cross-validation to split data into train and test sets?

Posted: 2016-02-03 10:48:09

Tags: python scikit-learn

I have a set of documents and a set of labels. Right now I am splitting my dataset 90:10 with train_test_split, but I would like to use K-fold cross-validation instead.

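from sklearn.model_selection import train_test_split

# Each line of Documents.txt is one document, stored as a list of tokens
train = []
with open("/Users/rte/Documents/Documents.txt") as f:
    for line in f:
        train.append(line.strip().split())

# Each line of Labels.txt holds the label(s) for the matching document
labels = []
with open("/Users/rte/Documents/Labels.txt") as t:
    for line in t:
        labels.append(line.strip().split())

# Current approach: a single 90:10 train/test split
X_train, X_test, Y_train, Y_test = train_test_split(train, labels, test_size=0.1, random_state=42)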

When I try the approach given in the scikit-learn documentation, I get an error:

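# Old sklearn.cross_validation API (removed in scikit-learn 0.20):
# KFold(n, n_folds=...) is iterated directly to get index arrays
kf = KFold(len(train), n_folds=3)

for train_index, test_index in kf:
    X_train, X_test = train[train_index], train[test_index]
    y_train, y_test = labels[train_index], labels[test_index]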

Error:

   X_train, X_test = train[train_index],train[test_index]
TypeError: only integer arrays with one element can be converted to an index

How can I perform 10-fold cross-validation on the documents and labels?

1 Answer

Answer 0 (score: 2)

There are two ways to fix this error:

First way:

Convert the data to NumPy arrays, which (unlike Python lists) can be indexed with an array of positions:

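import numpy as np
# [...]
# dtype=object keeps each tokenized document as one element even when the
# documents have different lengths (recent NumPy rejects ragged input otherwise)
train = np.array(train, dtype=object)
labels = np.array(labels)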

After that, your current code should work unchanged.
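To see why the conversion helps, here is a small sketch with made-up toy data (`docs` and `fold_index` are hypothetical): a plain Python list rejects an array of indices, while a NumPy array accepts it through fancy indexing. Passing `dtype=object` keeps token lists of different lengths as one element each instead of failing on a ragged array.

```python
import numpy as np

# Hypothetical toy data: tokenized documents of different lengths
docs = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
fold_index = np.array([0, 2])  # an index array like the ones KFold yields

# A plain Python list rejects an array of positions
try:
    docs[fold_index]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# dtype=object stores one token list per element, so ragged documents are
# accepted, and NumPy fancy indexing then selects the requested rows
docs_arr = np.array(docs, dtype=object)
selected = docs_arr[fold_index]

print(list_indexing_failed)  # True
print(selected.tolist())     # [['a', 'b'], ['d', 'e', 'f']]
```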

Second way:

Keep train and labels as lists and use list comprehensions to index them with the train_index and test_index arrays:

for train_index, test_index in kf:
    X_train, X_test = [train[i] for i in train_index],[train[j] for j in test_index]
    y_train, y_test = [labels[i] for i in train_index],[labels[j] for j in test_index]

(For this solution, see the related question "index list with another list".)
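The `KFold(len(train), n_folds=3)` form used in the question comes from the old `sklearn.cross_validation` module, which was removed in scikit-learn 0.20. In current releases `KFold` lives in `sklearn.model_selection`, takes `n_splits`, and yields the index arrays from its `split()` method. Below is a minimal sketch of the 10-fold loop the question asks for, combined with the list-comprehension approach; the toy `train` and `labels` lists are hypothetical stand-ins for the data read from the two files.

```python
from sklearn.model_selection import KFold

# Hypothetical stand-ins for the documents and labels loaded from disk
train = [["doc", str(i)] for i in range(20)]
labels = [str(i % 2) for i in range(20)]

# 10 folds; shuffle/random_state mirror the shuffled train_test_split call
kf = KFold(n_splits=10, shuffle=True, random_state=42)

folds = []
for train_index, test_index in kf.split(train):
    # Plain lists need list comprehensions; NumPy arrays could be indexed directly
    X_train = [train[i] for i in train_index]
    X_test = [train[i] for i in test_index]
    y_train = [labels[i] for i in train_index]
    y_test = [labels[i] for i in test_index]
    folds.append((X_train, X_test, y_train, y_test))
    # fit and evaluate a model on this fold here

print(len(folds))  # 10
```

`shuffle=True` is optional; without it each test fold is a contiguous block of the input order, which can bias the folds if the files are sorted by label.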