I am creating several numpy arrays from a list of numpy arrays, like so:
import numpy as np

seq_length = 1500
seq_diff = 200  # offset between the starts of two consecutive sequences
# x and y are 2D numpy arrays
x_seqs = [x[i:i+seq_length,:] for i in range(0, seq_diff*(len(x) // seq_diff), seq_diff)]
y_seqs = [y[i:i+seq_length,:] for i in range(0, seq_diff*(len(y) // seq_diff), seq_diff)]
boundary1 = int(0.7 * len(x_seqs)) # 70% is training set
boundary2 = int(0.85 * len(x_seqs)) # 15% validation, 15% test
x_train = np.array(x_seqs[:boundary1])
y_train = np.array(y_seqs[:boundary1])
x_valid = np.array(x_seqs[boundary1:boundary2])
y_valid = np.array(y_seqs[boundary1:boundary2])
x_test = np.array(x_seqs[boundary2:])
y_test = np.array(y_seqs[boundary2:])
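I want to end up with 6 arrays of shape (n, 1500, 300), where n is 70%, 15%, or 15% of my data for the training, validation, and test arrays respectively.
Here is where it goes wrong: the _train and _valid arrays turn out fine, but the _test arrays are one-dimensional arrays of arrays. That is: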
x_train.shape is (459, 1500, 300)
x_valid.shape is (99, 1500, 300)
x_test.shape is (99,)
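But printing x_test verifies that it contains the right elements, that is, it is a 99-element array of (1500, 300) arrays.
Why do the _test matrices get the wrong shape when the _train and _valid matrices don't?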
Answer 0 (score: 2)
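The items in x_seqs vary in length. When they all have the same length, np.array can make a 3D array from them; when they differ, it makes a 1D object array that holds the separate arrays. Look at the dtype of x_test, and look at [len(i) for i in x_test].
I took your code, added the following, and looked at what it printed: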
import numpy as np

x = np.zeros((2000, 10))  # small stand-in for the real data
y = x.copy()
...
print([len(i) for i in x_seqs])
print(x_train.shape)
print(x_valid.shape)
print(x_test.shape)
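To make the point about varying lengths concrete, here is a minimal sketch of the cause and one possible fix. It assumes the same small stand-in input as above (2000 rows, 10 columns instead of 300); the names starts_all, starts_full, lengths_all and x_all are made up for illustration and are not part of the original code. The idea is to stop generating start positions once a full window of seq_length rows no longer fits.

```python
import numpy as np

seq_length = 1500
seq_diff = 200

# Stand-in data (assumption: 10 columns instead of 300 to keep the example light).
x = np.zeros((2000, 10))

# Start positions as in the question: the later windows run past the end of x,
# and NumPy slicing silently truncates them instead of raising an error.
starts_all = range(0, seq_diff * (len(x) // seq_diff), seq_diff)
lengths_all = [len(x[i:i + seq_length]) for i in starts_all]

# Possible fix: only keep start positions that leave room for a full window.
starts_full = range(0, len(x) - seq_length + 1, seq_diff)
x_seqs = [x[i:i + seq_length, :] for i in starts_full]

# Every item now has the same shape, so np.array builds a regular 3D array.
x_all = np.array(x_seqs)

print(lengths_all)  # window lengths with the original start positions
print(x_all.shape)  # shape after keeping only full windows
```

With full windows only, the 70/15/15 split can be applied to x_seqs as before and all three parts come out three-dimensional. Note that newer NumPy releases, as far as I know, reject ragged input like the original x_seqs[boundary2:] with a ValueError unless dtype=object is passed, so the symptom there is an exception rather than a (99,) object array.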