Question

x_train:(153347,53)
x_test:(29039,52)
y:(153347,)

I am using sklearn. To cross-validate and reshape my dataset, I did the following:

x_train, x_test, y_train, y_test = cross_validation.train_test_split(
x, y, test_size=0.3)

x_train = np.pad(x, [(0,0)], mode='constant')
x_test = np.pad(x, [(0,0)], mode='constant')
y = np.pad(y, [(0,0)], mode='constant')
x_train = np.arange(8127391).reshape((-1,1))
c = x.T
np.all(x_train == c)
x_test = np.arange(1510028).reshape((-1,1))
c2 = x.T
np.all(x_test == c2)
y = np.arange(153347).reshape((-1,1))
c3 = x.T
np.all(y == c3)

My error message is: ValueError: Found arrays with inconsistent numbers of samples: [2 153347]

I am not sure whether I need to pad the dataset in this case, and the reshaping does not work. Any ideas on how to fix this?

Answer 1

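There is very little to go on here, but I believe the call to cross_validation.train_test_split fails because the two arrays have inconsistent lengths. For every X (a tuple of observed data) you need exactly one Y (the data point observed as the outcome).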

At the very least, a length mismatch would produce the error shown above.
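To illustrate the length requirement, here is a minimal sketch that uses NumPy only. The helper split_rows is hypothetical and written just for this example; it is not sklearn's implementation, only a stand-in that performs the same kind of check on the first axis before splitting. The small arrays are made-up placeholder data, not the dataset from the question. Note also that in recent sklearn versions train_test_split is imported from sklearn.model_selection rather than cross_validation.

```python
import numpy as np


def split_rows(x, y, test_size=0.3, seed=0):
    """Hypothetical helper: shuffle the rows and split x and y together."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Every row of x needs exactly one label in y.
    if x.shape[0] != y.shape[0]:
        raise ValueError(
            "inconsistent numbers of samples: %d vs %d" % (x.shape[0], y.shape[0])
        )
    rng = np.random.RandomState(seed)
    order = rng.permutation(x.shape[0])
    n_test = int(round(test_size * x.shape[0]))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Placeholder data: 10 samples with 3 features each, and 10 labels.
x = np.arange(30).reshape(10, 3)
y = np.arange(10)

x_train, x_test, y_train, y_test = split_rows(x, y, test_size=0.3)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Transposing x turns the feature axis into the first axis,
# so the number of "samples" no longer matches len(y).
try:
    split_rows(x.T, y)
except ValueError as err:
    print("ValueError:", err)
```

If the shapes printed for your own x and y do not agree on the first axis, it is probably worth fixing that before calling train_test_split, rather than padding or reshaping afterwards.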

You should definitely improve how the question is worded. Very much so.

Regards, fricke

How to fix reshaping a dataset for cross-validation?
