Question

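My goal is to take a list of np.arrays and create an associated list or array that classifies each one as having a duplicate or not. Here is what I thought would work:

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))
[count_dict[i] for i in www]

The desired output in this case would be:

[1, 1, 0]

because the first and second elements have another copy within the original list. It seems the problem is that I cannot use an np.array as a dictionary key.

Suggestions?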

Answer 1

First convert www to a 2D NumPy array, then do the following:

In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])

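Here we use broadcasting to check every row of www for equality against the uniques array, then use all() on the last axis to find which rows of www are completely equal to rows of uniques.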

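In more detail: for the example above, www[:, None] has shape (3, 1, 3), so comparing it with uniques (shape (2, 3)) broadcasts to a boolean array of shape (3, 2, 3). Calling all(2) reduces this to shape (3, 2), where entry [i, j] is True when row i of www equals row j of uniques. np.where(...)[1] then gives, for each row of www, the index of its matching row in uniques, and indexing counts with those indices yields the number of occurrences of each row.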
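
The one-liner above can be unpacked into separate steps. The following is only a sketch on the example data from the question; the intermediate names (elementwise, row_matches, unique_index, flags) are introduced here for illustration and are not part of the original answer.

```python
import numpy as np

# The question's data, converted to a 2D array of shape (3, 3).
www = np.array([[1, 1, 1], [1, 1, 1], [2, 1, 1]])
uniques, counts = np.unique(www, axis=0, return_counts=True)

# www[:, None] has shape (3, 1, 3); broadcasting against uniques (2, 3)
# gives an element-wise comparison of shape (3, 2, 3).
elementwise = www[:, None] == uniques

# Reduce over the last axis: row_matches[i, j] is True when
# row i of www equals row j of uniques. Shape (3, 2).
row_matches = elementwise.all(axis=2)

# For each row of www, the column index of its matching unique row.
unique_index = np.where(row_matches)[1]

# Look up each row's count and flag rows that occur more than once.
flags = (counts[unique_index] > 1).astype(int)
print(flags)  # [1 1 0]
```

Note that the intermediate comparison holds one boolean per (row, unique row, column) triple, so its size grows with the product of the number of rows and the number of unique rows.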

Answer 2

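In Python, lists (and NumPy arrays) are not hashable, so they cannot be used as dictionary keys. Tuples can, though! So one option is to convert your original list to tuples and convert uniques to tuples as well. The following works for me: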

import numpy as np
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www]  # list of tuples
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]
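
To see why the tuple conversion matters, here is a small check of which objects can be hashed. It is a sketch only; is_hashable is a helper made up for this illustration, not a NumPy or standard-library function.

```python
import numpy as np

def is_hashable(obj):
    """Return True if obj can be hashed (and so used as a dict key)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

row = np.array([1, 1, 1])

array_ok = is_hashable(row)         # NumPy arrays are mutable, not hashable
list_ok = is_hashable([1, 1, 1])    # same for lists
tuple_ok = is_hashable(tuple(row))  # a tuple of NumPy integers is hashable

# A tuple built from the array can be used as a dictionary key and is
# found again with a plain tuple of Python ints.
lookup = {tuple(row): 1}
found = lookup[(1, 1, 1)]
```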

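Please note: this keeps a second copy of the data as tuples, which roughly doubles your memory consumption, so it may not be the best solution if www is large. You can mitigate the extra memory use by ingesting your data as tuples in the first place, rather than as NumPy arrays, if that is possible.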
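
If the data can be ingested as tuples from the start, the classification needs no NumPy at all. The sketch below assumes the rows are already plain tuples and swaps np.unique for collections.Counter from the standard library; it is one possible variant, not the code from the answer above.

```python
from collections import Counter

# Assumed input: the rows are already stored as plain tuples.
www_tuples = [(1, 1, 1), (1, 1, 1), (2, 1, 1)]

# Count how many times each distinct row occurs.
occurrences = Counter(www_tuples)

# Flag each row with 1 if it occurs more than once, else 0.
flags = [1 if occurrences[row] > 1 else 0 for row in www_tuples]
print(flags)  # [1, 1, 0]
```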

Classifying np.arrays as duplicates
