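I have a (123, 3072) array that I need to split into 5 roughly equal folds (only roughly, because 123 is not divisible by 5) for 5-fold cross-validation. I'm not allowed to use scikit-learn. I managed to get two ndarrays of shapes (3, 25, 3072) and (2, 24, 3072). Now I need to combine them, but every function I try raises this error: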
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Is it possible to concatenate them?
Here is my code:
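import numpy as np

num_folds = 5
mod = binary_train_X.shape[0] % num_folds                         # 123 % 5 = 3 folds get one extra row
first_records = (binary_train_X.shape[0] - mod) // num_folds + 1  # 25 rows in each larger fold
last_records = first_records - 1                                  # 24 rows in each smaller fold
first_part = binary_train_X[:mod * first_records].reshape([mod, first_records, -1])                # (3, 25, 3072)
second_part = binary_train_X[mod * first_records:].reshape([num_folds - mod, last_records, -1])    # (2, 24, 3072)
folds_X = np.concatenate((first_part, second_part))  # raises the ValueError: axis 1 is 25 vs 24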
Or is there another way to split it into 5 parts (folds)?
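A minimal reproduction of the error, assuming a hypothetical 123 x 4 stand-in array instead of the real 123 x 3072 data. np.concatenate needs every axis except the concatenation axis to match, and axis 1 is 25 in one part and 24 in the other, so the folds cannot live in one rectangular ndarray. The sketch shows that a plain Python list of 2-D folds works, and that NumPy's np.array_split produces the same 25/25/25/24/24 split in one call.

```python
import numpy as np

# Hypothetical stand-in for binary_train_X: 123 rows, 4 columns instead of 3072.
X = np.arange(123 * 4).reshape(123, 4)

first_part = X[:75].reshape(3, 25, -1)   # 3 folds of 25 rows
second_part = X[75:].reshape(2, 24, -1)  # 2 folds of 24 rows

concat_error = None
try:
    np.concatenate((first_part, second_part))
except ValueError as err:
    # Axis 1 differs (25 vs 24), so the result would not be rectangular.
    concat_error = err
print("concatenate failed:", concat_error)

# A Python list can hold folds of different lengths.
folds = list(first_part) + list(second_part)
print([f.shape for f in folds])  # five (rows, 4) arrays with 25, 25, 25, 24, 24 rows

# np.array_split gives the same contiguous folds directly.
split_folds = np.array_split(X, 5)
print([f.shape for f in split_folds])
```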
Answer 0 (score: 0)
Something along these lines:
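import numpy as np

def k_fold(array, num_folds):
    # Splits along axis 0 into num_folds contiguous folds whose sizes differ by at most 1.
    # Returns a list of (validation_fold, rest_of_array) pairs.
    n = array.shape[0]
    folds = []
    for i in range(num_folds):
        # Integer boundaries avoid the drift of truncating a float step (123 / 5 = 24.6)
        # at every iteration, which would leave the last fold ending at row 120 and drop 3 rows.
        start = i * n // num_folds
        end = (i + 1) * n // num_folds
        fold = array[start:end]
        rest_of_array = np.concatenate((array[:start], array[end:]), axis=0)
        folds.append((fold, rest_of_array))
    return folds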
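The function above copies the data into a new training array for every fold. A variant that works on row indices instead, so each training/validation pair is just a fancy-indexing step, might look like the following sketch. The helper name k_fold_indices is hypothetical, and the folds are contiguous, as in the answer.

```python
import numpy as np

def k_fold_indices(n_samples, num_folds):
    """Yield (train_idx, val_idx) index arrays for contiguous k-fold splits.

    Hypothetical helper: fold sizes differ by at most one, as with np.array_split.
    """
    indices = np.arange(n_samples)
    for val_idx in np.array_split(indices, num_folds):
        # Mark everything outside the validation fold as training data.
        mask = np.ones(n_samples, dtype=bool)
        mask[val_idx] = False
        yield indices[mask], val_idx

# Usage sketch with a stand-in array (123 rows, 4 columns instead of 3072).
X = np.arange(123 * 4).reshape(123, 4)
for train_idx, val_idx in k_fold_indices(X.shape[0], 5):
    X_train, X_val = X[train_idx], X[val_idx]
    print(X_train.shape, X_val.shape)
```

If the rows are ordered in some meaningful way, shuffling the indices first (for example with np.random.default_rng(seed).permutation(n_samples)) is a common refinement.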
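Answer 1 (score: 0)
Since 377856 (123*3072) is not divisible by 15360 (5*3072), because 123 is not divisible by 5, you can only create 5 equal slices of 3072-wide rows by truncating or padding the data to a multiple of 15360 (5*3072).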
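The arithmetic behind this can be checked directly. This sketch just recomputes the numbers quoted in the answer: the total value count, the block size of one row across all 5 folds, the remainder, and the floor and ceiling of rows per fold.

```python
rows, cols, num_folds = 123, 3072, 5

total = rows * cols        # values in the whole array
chunk = num_folds * cols   # values in one row across all 5 folds

print(total, chunk)                           # 377856 15360
print(total % chunk)                          # 9216 values left over (3 rows of 3072)
print(total // chunk, -(-total // chunk))     # 24 25: rows per fold after truncating / padding
print(-(-total // chunk) * chunk - total)     # 6144 zeros needed to pad (2 rows of 3072)
```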
Truncating creates shape (5, 24, 3072) by discarding values from the end until the length is a multiple of 15360:
folds = binary_train_X.flatten()[:np.prod(binary_train_X.shape)//(5*3072)*(5*3072)].reshape(5, -1, 3072)
# this discards 9216 (3072*3) values
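Because 3072 divides the total evenly, the flatten-based truncation always drops whole rows, so an equivalent sketch can slice rows directly and skip the flatten. The random 0/1 data below is a hypothetical stand-in for binary_train_X with the question's shape; the check confirms both versions give the same array.

```python
import numpy as np

# Hypothetical stand-in for binary_train_X with the question's shape.
rng = np.random.default_rng(0)
binary_train_X = rng.integers(0, 2, size=(123, 3072))
num_folds = 5

# Flatten-based truncation from the answer.
folds_flat = binary_train_X.flatten()[:np.prod(binary_train_X.shape) // (5 * 3072) * (5 * 3072)].reshape(5, -1, 3072)

# Row-based equivalent: keep the largest multiple of num_folds rows (here 120), then reshape.
keep = binary_train_X.shape[0] // num_folds * num_folds
folds_rows = binary_train_X[:keep].reshape(num_folds, -1, binary_train_X.shape[1])

print(folds_rows.shape)                        # (5, 24, 3072)
print(np.array_equal(folds_flat, folds_rows))  # True
```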
Padding creates shape (5, 25, 3072) by appending zeros to the end until the length is a multiple of 15360:
folds = np.pad(binary_train_X.flatten(), (0, -(-np.prod(binary_train_X.shape)//(5*3072))*(5*3072)-np.prod(binary_train_X.shape)), 'constant').reshape(5, -1, 3072)
# this appends 6144 (3072*2) zeros
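The same padding can be expressed per row by padding only axis 0 with np.pad, which avoids the flatten and makes the number of added rows explicit. As before, the random data is a hypothetical stand-in for binary_train_X, and the sketch compares the result against the answer's flatten-based version.

```python
import numpy as np

# Hypothetical stand-in for binary_train_X with the question's shape.
rng = np.random.default_rng(0)
binary_train_X = rng.integers(0, 2, size=(123, 3072))
num_folds = 5
n_rows, n_cols = binary_train_X.shape

# All-zero rows needed to reach the next multiple of num_folds (here 2).
pad_rows = -n_rows % num_folds
# Pad only axis 0 (rows); axis 1 (features) is left untouched.
padded = np.pad(binary_train_X, ((0, pad_rows), (0, 0)), mode="constant")
folds_rows = padded.reshape(num_folds, -1, n_cols)

# Flatten-based padding from the answer, for comparison.
N = np.prod(binary_train_X.shape)
folds_flat = np.pad(binary_train_X.flatten(), (0, -(-N // (5 * 3072)) * (5 * 3072) - N), 'constant').reshape(5, -1, 3072)

print(pad_rows, folds_rows.shape)              # 2 (5, 25, 3072)
print(np.array_equal(folds_flat, folds_rows))  # True
```

Either way, the two padded rows are not real samples, so they would need to be excluded or masked when a fold containing them is used for training or scoring.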