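As part of cross-validation, I need to split a training array into N folds and then run one experiment per fold. That means, for each fold, merging the other N-1 folds into a single training array and using the remaining fold for validation.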
Suppose binary_train_X is the initial array and I want to split it into 5 folds. I have some code that works:
import numpy as np

num_folds = 5
train_folds_X = []
# Split the training data into folds
# Note: step is truncated, so if shape[0] is not a multiple of num_folds,
# the last (shape[0] % num_folds) rows end up in no fold
step = int(binary_train_X.shape[0] / num_folds)
for i in range(num_folds):
    train_folds_X.append(binary_train_X[i*step:(i+1)*step])
# Prepare train and test arrays
for i in range(num_folds):
    if i == 0:
        train_temp_X = np.concatenate((train_folds_X[1:]))
    elif i == num_folds - 1:
        train_temp_X = np.concatenate((train_folds_X[0:(num_folds - 1)]))
    else:
        train_temp_X1 = np.concatenate((train_folds_X[0:i]))
        train_temp_X2 = np.concatenate((train_folds_X[(i+1):(num_folds)]))
        train_temp_X = np.concatenate((train_temp_X1, train_temp_X2))
    test_temp_X = train_folds_X[i]
    # Run classifier based on train_temp_X and test_temp_X
    ...
    pass
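One caveat worth checking in the question's code: step is computed by truncating shape[0] / num_folds, so when the number of rows is not a multiple of num_folds, the trailing rows are never assigned to any fold. A minimal sketch, assuming a hypothetical 12-row toy array (not from the question), shows the effect:

```python
import numpy as np

# Hypothetical toy data: 12 samples, 3 features (not from the original question)
binary_train_X = np.arange(36).reshape(12, 3)
num_folds = 5

# Same slicing scheme as the question: fixed-size folds of `step` rows
step = binary_train_X.shape[0] // num_folds  # 12 // 5 == 2
folds = [binary_train_X[i * step:(i + 1) * step] for i in range(num_folds)]

# Count how many rows actually landed in some fold
covered = sum(len(f) for f in folds)
print(step, covered)  # 2 10 -> rows 10 and 11 are never used
```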
Question: how can this be done more elegantly?
Answer (score: 2)
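Why not just do this:
import numpy as np

# array_split also handles row counts that num_folds does not divide evenly
splits = np.array_split(binary_train_X, num_folds)
for i in range(num_folds):
    fold_train_X = np.concatenate([*splits[:i], *splits[i + 1:]])
    fold_test_X = splits[i]
    # use your folds here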
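A self-contained run of the array_split approach, assuming a hypothetical 12-row toy array, illustrates why no special-casing of the first and last folds is needed: slicing a Python list with splits[:0] or splits[num_folds:] just yields an empty list, so the concatenation always receives the other N-1 folds.

```python
import numpy as np

# Hypothetical toy data: 12 samples, 3 features
binary_train_X = np.arange(36).reshape(12, 3)
num_folds = 5

# array_split returns a list of arrays; the first (12 % 5) == 2 folds
# get one extra row, so no sample is dropped.
splits = np.array_split(binary_train_X, num_folds)
print([len(s) for s in splits])  # [3, 3, 2, 2, 2]

for i in range(num_folds):
    # List concatenation works for i == 0 and i == num_folds - 1 too
    fold_train_X = np.concatenate(splits[:i] + splits[i + 1:])
    fold_test_X = splits[i]
    # Every row is used exactly once: either for training or for testing
    assert len(fold_train_X) + len(fold_test_X) == len(binary_train_X)
```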
If you want a prebuilt solution, you can use sklearn.model_selection.KFold:
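from sklearn.model_selection import KFold

kf = KFold(num_folds)
for train_index, test_index in kf.split(binary_train_X):
    fold_train_X = binary_train_X[train_index]
    # The test fold comes from the same array, indexed by test_index
    fold_test_X = binary_train_X[test_index]
    # use your folds here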
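A quick check of the KFold route, again assuming a hypothetical 12-row toy array: kf.split yields index arrays, and both the training and the test fold are obtained by indexing binary_train_X. Without shuffling, KFold produces contiguous blocks sized the same way as np.array_split.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 12 samples, 3 features
binary_train_X = np.arange(36).reshape(12, 3)
num_folds = 5

kf = KFold(n_splits=num_folds)  # shuffle=False by default: contiguous folds
test_sizes = []
for train_index, test_index in kf.split(binary_train_X):
    fold_train_X = binary_train_X[train_index]
    fold_test_X = binary_train_X[test_index]
    test_sizes.append(len(fold_test_X))
    # Train and test indices are disjoint and together cover every row
    assert set(train_index).isdisjoint(test_index)
    assert len(train_index) + len(test_index) == len(binary_train_X)

print(test_sizes)  # [3, 3, 2, 2, 2]
```

If the rows are ordered in some meaningful way (for example sorted by label), passing shuffle=True together with a fixed random_state to KFold randomizes fold membership while keeping the split reproducible.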