Question

How can I remove rows from an ndarray that have the same value in the n-th column?

For example,

import numpy as np
a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])

I want rows that are unique in the third column, so only the [1, 3, 5] row should remain.

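numpy.unique doesn't do this. It checks uniqueness across all the columns, and I can't specify which column should be used for the uniqueness check.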

How can I do this efficiently for a thousand or more rows? Thanks.
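Note that np.unique does operate on a single column if you slice that column out first. With return_counts=True it also reports how many times each distinct value occurs. Here is a minimal sketch using the example array above:

```python
import numpy as np

a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])

# Slice out the third column (index 2) and count each distinct value in it
values, counts = np.unique(a[:, 2], return_counts=True)
print(values)  # [4 5]
print(counts)  # [2 1]
```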

Answer 1

You can try a combination of bincount, nonzero and in1d:

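import numpy as np

a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])

# Values that occur exactly once in column 3 (index 2).
# Note: np.bincount only accepts non-negative integers.
unique_in_column = (np.bincount(a[:, 2]) == 1).nonzero()[0]

# Boolean mask marking rows whose column-3 value is one of those values
# (np.isin is the newer equivalent of np.in1d)
unique_index = np.isin(a[:, 2], unique_in_column)

unique_a = a[unique_index]  # [[1, 3, 5]]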

This should do the trick. However, I'm not sure how well this approach scales to 1000+ rows.
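The sketch below is a more general variant. It assumes the goal is the one stated in the question: keep only the rows whose column-n value occurs exactly once. In place of np.bincount it uses np.unique(..., return_inverse=True, return_counts=True), so it also handles negative, very large or non-integer values. The helper name rows_with_unique_column is hypothetical. The approach is fully vectorized and sort-based, roughly O(n log n) in the number of rows.

```python
import numpy as np

def rows_with_unique_column(arr, n):
    """Return the rows of arr whose value in column n occurs exactly once.

    Hypothetical helper for illustration; works for any sortable dtype.
    """
    col = arr[:, n]
    # inverse maps each row to the position of its value among the distinct values;
    # counts holds how many times each distinct value occurs
    _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
    # Look up each row's count and keep the rows whose value appears only once
    return arr[counts[inverse] == 1]

a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])
print(rows_with_unique_column(a, 2))  # [[1 3 5]]
```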

Answer 2

I ended up doing this:

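import numpy as np

kplist = np.array([[1, 3, 4],
                   [1, 3, 4],
                   [1, 3, 5]])

repeatdict = {}  # third-column values seen so far
todel = []       # indices of rows to delete
for i, row in enumerate(kplist):
    if repeatdict.get(row[2], 0):
        todel.append(i)          # value already seen: mark this row as a duplicate
    else:
        repeatdict[row[2]] = 1   # first occurrence: remember the value
kplist = np.delete(kplist, todel, axis=0)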

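Basically, I iterate over the rows and record each third-column value in the repeatdict dict. If a later row's value is already in repeatdict, I mark that row for deletion by storing its index in the todel list.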

Then the unwanted rows can be removed by calling np.delete with the list of all the row indices to delete.

Also, I'm not marking my own answer as accepted, because there may be a better way to do this with pure NumPy magic. I'll wait a while.
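The loop above keeps the first row for each third-column value and deletes later repeats. If that is the intended behavior, here is a loop-free sketch. np.unique with return_index=True returns the index of the first occurrence of each distinct value. Sorting those indices restores the original row order. The helper name keep_first_by_column is hypothetical.

```python
import numpy as np

def keep_first_by_column(arr, n):
    """Keep the first row for each distinct value in column n, preserving row order.

    Hypothetical helper; gives the same result as the dict-and-np.delete loop.
    """
    # first_idx[k] is the row where the k-th distinct (sorted) value first appears
    _, first_idx = np.unique(arr[:, n], return_index=True)
    # np.unique orders by value, so sort the indices to keep the original row order
    return arr[np.sort(first_idx)]

kplist = np.array([[1, 3, 4],
                   [1, 3, 4],
                   [1, 3, 5]])
print(keep_first_by_column(kplist, 2))
# [[1 3 4]
#  [1 3 5]]
```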

Numpy: remove rows with the same column value
