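Finding the indices of non-unique elements in a Numpy array

Date: 2015-12-06 21:39:59

Tags: python arrays python-2.7 numpy

I have found other approaches, such as this one, for removing duplicate elements from an array. My requirement is slightly different. If I start with: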

array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 2, 1],
       [3, 4, 5]])

I want to end up with:

array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

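That is the result I ultimately want, but there is one additional requirement: I would also like to store an array of the indices to discard or to keep (à la numpy.take).

I am using Numpy 1.8.1.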

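4 answers:

Answer 0: (score: 0)

We want to find the rows of the array that have no duplicates, while preserving their order.

I used this solution to combine each row of a into a single element, so that the unique rows can be found with np.unique(, return_index=True, return_inverse=True). I then modified this function to output the count of each unique row from the index and the inverse. From there, I can select all unique rows that have counts == 1.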

import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# use a flexible data type, np.void, to combine the columns of `a`
# size of np.void is the number of bytes for an element in `a` multiplied by number of columns
# (the view requires a C-contiguous array)
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, inv = np.unique(b, return_index=True, return_inverse=True)

def return_counts(index, inv):
    # count how many rows of `a` map to each unique row
    count = np.zeros(len(index), int)
    np.add.at(count, inv, 1)
    return count

counts = return_counts(index, inv)

# first-occurrence indices of the rows that appear exactly once, sorted to preserve the order in `a`
# the indices to discard are the complement: np.setdiff1d(np.arange(len(a)), index_keep)
index_keep = np.sort(index[counts == 1])

>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

# if you don't need the indices and just want the array returned while preserving the order
a_unique = np.vstack([a[idx] for idx in sorted(index[counts == 1])])
>>> a_unique
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

For NumPy version >= 1.9, np.unique can return the counts directly:

b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, counts = np.unique(b, return_index=True, return_counts=True)

# sort the first-occurrence indices so the rows keep their order in `a`
index_keep = np.sort(index[counts == 1])
>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])
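
On NumPy 1.13 or later (newer than the 1.8.1 mentioned in the question, so this is an assumption about the environment), np.unique accepts an axis argument and the np.void view is not needed. The following is a minimal sketch, not the answer's own method: it indexes the counts with the inverse mapping to build a mask over the original rows, so both the keep and the discard indices come out in their original order. The names keep_mask, keep and discard are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# Unique rows, the unique-row number of every original row, and per-unique-row counts.
_, inv, counts = np.unique(a, axis=0, return_inverse=True, return_counts=True)
inv = np.ravel(inv)  # guard against NumPy versions that return a non-1-D inverse

# counts[inv] is the number of occurrences of each original row.
keep_mask = counts[inv] == 1

keep = np.flatnonzero(keep_mask)      # indices of rows that occur exactly once
discard = np.flatnonzero(~keep_mask)  # indices of every row that has a duplicate

result = a.take(keep, axis=0)
print(keep, discard)
print(result)
```

Because the mask is aligned with the rows of a, no extra sorting step is needed, and discard lists every copy of a duplicated row rather than only its first occurrence.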

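Answer 1: (score: 0)

If you want to remove every instance of an element that occurs more than once, you can loop over the array, collect the indices of the elements that occur more than once, and finally delete those elements: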

import numpy

# The array to check:
array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])

# List that contains the indices of duplicates (which should be deleted)
deleteIndices = []

for i in range(0, len(array)): # Loop through entire array
    indices = list(range(0, len(array))) # All indices in array
    del indices[i] # All indices in array, except the i'th element currently being checked

    for j in indices: # Loop through every other element in array, except the i'th element, currently being checked
        # Check if element being checked is equal to the j'th element (and j is not recorded yet)
        if (array[i] == array[j]).all() and j not in deleteIndices:
            deleteIndices.append(j) # If i'th and j'th element are equal, j is appended to deleteIndices[]

# Sort deleteIndices in ascending order:
deleteIndices.sort()

# Delete duplicates
array = numpy.delete(array, deleteIndices, axis=0)

Output:

>>> array
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

>>> deleteIndices
[0, 2]

This way you have both removed the duplicates and obtained the list of indices to discard.
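
The same pairwise comparison can be sketched without Python loops by using broadcasting. This is only an illustration of the idea under the assumption that the array is small: like the loop, it compares every pair of rows, so the intermediate array grows with the square of the number of rows. The names same, occurrences and delete_indices are illustrative.

```python
import numpy as np

array = np.array([[1, 2, 3],
                  [2, 3, 4],
                  [1, 2, 3],
                  [3, 2, 1],
                  [3, 4, 5]])

# same[i, j] is True when row i equals row j (shape: n_rows x n_rows).
same = (array[:, None, :] == array[None, :, :]).all(axis=2)

# Every row matches itself, so a count above 1 means the row has a duplicate.
occurrences = same.sum(axis=1)

delete_indices = np.flatnonzero(occurrences > 1)
result = np.delete(array, delete_indices, axis=0)

print(delete_indices)
print(result)
```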

Answer 2: (score: 0)

The numpy_indexed package (disclaimer: I am its author) can be used to solve this kind of problem in a vectorized manner:

import numpy as np
import numpy_indexed as npi

# arr is the 2-D input array from the question
index = npi.as_index(arr)
keep = index.count == 1
discard = np.invert(keep)
print(index.unique[keep])

Answer 3: (score: 0)

You can follow these steps:

Property Get

You get:

(Len([SIM / ENGRV]) = 20) or (isnull([SIM / ENGRV])) or ([SIM / ENGRV]="")