Question

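I am using 2D arrays to store longitude+latitude pairs. At one point I have to merge two of these 2D arrays and then remove any duplicate entries. I have been searching for a function similar to numpy.unique, but I have had no luck. Every implementation I have come up with looks very "unoptimized". For example, I am trying to convert the array to a list of tuples, remove the duplicates with set, and then convert back to an array: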

coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))

Is there an existing solution, so that I do not reinvent the wheel?

To make it clear, I am looking for:

>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3],[5, 4]])

By the way, I wanted to just use a list of tuples, but the lists were so big that they consumed my 4 GB of RAM + 4 GB of swap (numpy arrays are more memory efficient).

Answer 1

This should do the trick:

import numpy as np

def unique_rows(a):
    a = np.ascontiguousarray(a)
    # View each row as one structured element so np.unique compares whole rows.
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))

Example:

>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1],
       [2, 3],
       [5, 4]])
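
For comparison, newer NumPy releases (1.13 and later, as far as I know) accept an axis argument in np.unique, which drops duplicate rows directly. A minimal sketch, assuming such a version is installed; the name unique_rows_axis is made up for this illustration and is not part of NumPy:

```python
import numpy as np

def unique_rows_axis(a):
    # Hypothetical helper name. With axis=0, np.unique treats each row as one
    # item and returns the distinct rows in sorted (lexicographic) order.
    return np.unique(np.asarray(a), axis=0)

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
result = unique_rows_axis(a)
print(result)
```

As with the view-based function, the rows come back sorted rather than in their original order.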

Answer 2

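Here is an idea. It will take a little bit of work, but it could be quite fast. I will give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of a 1d array: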

import numpy as np
def unique(a):
    a = np.sort(a)
    b = np.diff(a)
    b = np.r_[1, b]
    return a[b != 0]

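Now, to extend it to 2d you need to change two things. First, you will need to figure out how to do the sort yourself; the important thing about the sort is that two identical entries end up next to each other. Second, you will need to do something like (b != 0).any(axis), because you want to compare whole rows/columns: a row is new as soon as any of its entries differs from the row before it. Let me know if that is enough to get you started.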

Update: with some help from doug, I think this should work for the 2d case.

import numpy as np
def unique(a):
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.diff(a, axis=0)
    ui = np.ones(len(a), 'bool')
    ui[1:] = (diff != 0).any(axis=1) 
    return a[ui]
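
A small usage sketch of the same lexsort-and-diff steps may help show what order the rows come back in. The function is repeated here under the hypothetical name unique_rows_lexsort so the block runs on its own, and the input array is made up for the example:

```python
import numpy as np

def unique_rows_lexsort(a):
    # Same steps as the function above, under a hypothetical name.
    order = np.lexsort(a.T)             # the LAST column is the primary sort key
    a = a[order]
    keep = np.ones(len(a), dtype=bool)  # the first sorted row is always kept
    keep[1:] = (np.diff(a, axis=0) != 0).any(axis=1)
    return a[keep]

a = np.array([[2, 1], [1, 3], [2, 1], [1, 3]])
result = unique_rows_lexsort(a)
print(result)
```

Because np.lexsort uses its last key as the primary one, [2, 1] comes before [1, 3] here. If sorting by the first column is preferred, passing a.T[::-1] to np.lexsort should do it.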

Answer 3

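My approach is to turn the 2d array into a 1d complex array, where the real part is the 1st column and the imaginary part is the 2nd column, and then use np.unique. This only works with 2 columns, though.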

import numpy as np 
def unique2d(a):
    x, y = a.T
    b = x + y*1.0j 
    idx = np.unique(b,return_index=True)[1]
    return a[idx]

Example:

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
unique2d(a)
array([[1, 1],
       [2, 3],
       [5, 4]])
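
One caveat worth checking before relying on this trick: the columns are converted to float64 inside the complex number, so integers too large to be represented exactly as a float64 can collapse into the same value. A minimal sketch with made-up input values (the function is repeated so the block is self-contained):

```python
import numpy as np

def unique2d(a):
    # Same complex-number trick as above:
    # column 0 becomes the real part, column 1 the imaginary part.
    x, y = a.T
    b = x + y * 1.0j
    idx = np.unique(b, return_index=True)[1]
    return a[idx]

# Two distinct rows whose first entries differ only beyond float64 precision.
big = np.array([[2**53, 0], [2**53 + 1, 0]], dtype=np.int64)
collapsed = unique2d(big)
print(len(collapsed))  # fewer rows than the two distinct inputs
```

For longitude/latitude pairs stored as floats this should not matter, but for large integer keys one of the other approaches is probably safer.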

Answer 4

>>> import numpy as NP
>>> # create a 2D NumPy array with some duplicate rows
>>> A
    array([[1, 1, 1, 5, 7],
           [5, 4, 5, 4, 7],
           [7, 9, 4, 7, 8],
           [5, 4, 5, 4, 7],
           [1, 1, 1, 5, 7],
           [5, 4, 5, 4, 7],
           [7, 9, 4, 7, 8],
           [5, 4, 5, 4, 7],
           [7, 9, 4, 7, 8]])

>>> # first, sort the 2D NumPy array row-wise so dups will be contiguous
>>> # and rows are preserved
>>> a, b, c, d, e = A.T    # create the keys to pass to lexsort
>>> ndx = NP.lexsort((a, b, c, d, e))
>>> ndx
    array([1, 3, 5, 7, 0, 4, 2, 6, 8])
>>> A = A[ndx,]

>>> # now diff by row
>>> A1 = NP.diff(A, axis=0)
>>> A1
    array([[ 0,  0,  0,  0,  0],
           [ 0,  0,  0,  0,  0],
           [ 0,  0,  0,  0,  0],
           [-4, -3, -4,  1,  0],
           [ 0,  0,  0,  0,  0],
           [ 6,  8,  3,  2,  1],
           [ 0,  0,  0,  0,  0],
           [ 0,  0,  0,  0,  0]])

>>> # boolean mask: True where a row differs from the row above it
>>> ndx = NP.any(A1, axis=1)
>>> ndx
    array([False, False, False,  True, False,  True, False, False])

>>> # retrieve the unique rows: the first sorted row,
>>> # plus every row that differs from the row above it
>>> NP.vstack((A[:1], A[1:][ndx]))
    array([[5, 4, 5, 4, 7],
           [1, 1, 1, 5, 7],
           [7, 9, 4, 7, 8]])

Answer 5

The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by user545424 in a nice and tested interface, plus many related features:

import numpy_indexed as npi
npi.unique(coordskeys)

Answer 6

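Since you refer to numpy.unique, you do not care about maintaining the original order, correct? Converting to a set, which removes the duplicates, and then back to a list is the usual idiom: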

>>> x = [(1, 1), (2, 3), (1, 1), (5, 4), (2, 3)]
>>> y = list(set(x))
>>> y
[(5, 4), (2, 3), (1, 1)]
>>>
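
If the original order does matter, one possible approach (a sketch, not something proposed in the thread) is to ask np.unique for the index of each row's first occurrence and sort those indices. This assumes a NumPy version whose np.unique supports axis; the helper name unique_rows_keep_order is hypothetical:

```python
import numpy as np

def unique_rows_keep_order(a):
    # Hypothetical helper: keep the first occurrence of each row,
    # in the order in which the rows appear in the input.
    a = np.asarray(a)
    _, first_idx = np.unique(a, axis=0, return_index=True)
    return a[np.sort(first_idx)]

a = np.array([[5, 4], [2, 3], [5, 4], [1, 1], [2, 3]])
ordered = unique_rows_keep_order(a)
print(ordered)
```

This stays inside NumPy, so it avoids building the large list of tuples that the question wanted to get away from.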

Removing duplicate columns and rows from a NumPy 2D array
