Question

Suppose I have an ndarray:

all_data.shape
(220000, 28, 28)

type(all_data)
numpy.ndarray

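I want to go over each member of this array and filter out the ones I don't want. As a result, I want to get a new ndarray with the same per-image shape (only the length of the first axis should shrink).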

Something like this:

import numpy as np
from hashlib import md5

# save the first image and its label in separate arrays
# (these will hold the unique values)
sanitized_data = all_data[0]
sanitized_labels = all_labels[0]
# let's eliminate duplicates
# set of hashes seen so far
hashes = set()
# go over each image
for i in range(0, len(all_labels)):
    # check whether its hash is already in the set `hashes`
    if not md5(all_data[i]).hexdigest() in hashes:
        # record its hash and copy the image to the new dataset
        sanitized_data = np.stack((sanitized_data, all_data[i]))
        sanitized_labels = np.stack((sanitized_labels, all_labels[i]))
        hashes.add(md5(all_data[i]).hexdigest())

But I get:

ValueError: all input arrays must have the same shape

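I'm not sure how to do this correctly. Whenever I find an array I want to keep, I'd like to append it to the result along the first axis, but I don't know the proper way to do that with numpy. I googled dstack, but it seems to stack things along the wrong axis.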
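A small sketch of why the loop above fails, using tiny made-up 2x2 "images" in place of the real 28x28 data (the array `imgs` is hypothetical). `np.stack` always creates a new leading axis and requires every input to have the same shape. After the first append the accumulator is already 3-D, so the second call mixes a 3-D array with a 2-D image. `np.concatenate` joins along an existing axis instead, so giving the new image a leading axis of length 1 makes the shapes compatible.

```python
import numpy as np

# Hypothetical stand-in for all_data: three 2x2 "images", shape (3, 2, 2).
imgs = np.arange(12).reshape(3, 2, 2)

# np.stack adds a NEW leading axis, so both inputs must have the same shape.
first = np.stack((imgs[0], imgs[1]))
print(first.shape)  # (2, 2, 2)

# The next call mixes a (2, 2, 2) array with a (2, 2) image -> ValueError.
raised = False
try:
    np.stack((first, imgs[2]))
except ValueError as err:
    raised = True
    print("ValueError:", err)

# np.concatenate joins along an EXISTING axis; imgs[2][None] has shape
# (1, 2, 2), which matches the trailing dimensions of `first`.
grown = np.concatenate((first, imgs[2][None]), axis=0)
print(grown.shape)  # (3, 2, 2)
```

Growing an array this way still copies the whole accumulator on every call, so for 220000 images it is better to collect the pieces first and join them once.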

Answer 1

Copied from the comments:

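It's better to accumulate the component arrays in a list and apply concatenate once to the whole list. Also get in the habit of checking the shapes as you go.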
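A minimal sketch of the accumulate-then-join approach from the comment, applied to the duplicate filtering in the question. The function name `dedupe_images` and the toy data are hypothetical. Each kept image is appended to a plain Python list, and one `np.stack` call at the end adds the leading axis. Calling `np.concatenate([img[None] for img in kept_images])` would give the same result. Hashing `image.tobytes()` rather than the array itself is a choice made here so that non-contiguous slices also hash correctly.

```python
from hashlib import md5

import numpy as np


def dedupe_images(all_data, all_labels):
    """Keep the first occurrence of each distinct image (hypothetical helper)."""
    seen = set()        # md5 digests of images already kept
    kept_images = []    # per-image arrays, e.g. shape (28, 28)
    kept_labels = []
    for image, label in zip(all_data, all_labels):
        # tobytes() returns a contiguous byte copy of the image data
        digest = md5(image.tobytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept_images.append(image)
            kept_labels.append(label)
    # Join once at the end: (n_kept, 28, 28) images and (n_kept,) labels.
    # Note: np.stack raises ValueError if kept_images is empty.
    return np.stack(kept_images), np.array(kept_labels)


# Toy example: images 0 and 1 are identical, image 2 is different.
data = np.array([[[0, 1], [2, 3]], [[0, 1], [2, 3]], [[4, 5], [6, 7]]])
labels = np.array([0, 0, 1])
imgs, labs = dedupe_images(data, labels)
print(imgs.shape)      # (2, 2, 2)
print(labs.tolist())   # [0, 1]
```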

@hpaulj Your last suggestion worked, thanks!
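For exact duplicates specifically, a loop may not be needed at all. This is a different technique from the one suggested in the comments: a vectorized sketch using `np.unique` with `axis=0` (available in NumPy 1.13 and later), which treats each image as a single element. `return_index=True` returns the position of the first occurrence of each distinct image. Sorting those positions restores the original order, because `np.unique` returns its results in sorted order. The helper name `dedupe_with_unique` and the toy data are hypothetical.

```python
import numpy as np


def dedupe_with_unique(all_data, all_labels):
    """Drop exact duplicate images, keeping first occurrences in original order."""
    # axis=0: each all_data[i] (e.g. a 28x28 image) is compared as a whole.
    _, first_idx = np.unique(all_data, axis=0, return_index=True)
    # np.unique orders results by value; sort the indices to keep input order.
    first_idx = np.sort(first_idx)
    return all_data[first_idx], all_labels[first_idx]


# Toy example: the distinct image comes first, followed by two identical ones.
data3 = np.array([[[4, 5], [6, 7]], [[0, 1], [2, 3]], [[0, 1], [2, 3]]])
labels3 = np.array([1, 0, 0])
out, lab = dedupe_with_unique(data3, labels3)
print(out.shape)      # (2, 2, 2)
print(lab.tolist())   # [1, 0]
```

This only catches byte-for-byte identical images (like the md5 approach) and sorts the full array internally, so memory use scales with the size of `all_data`.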
