Question

I have 2 arrays, for example:

A: [[1 2 3][2 2 2][1 2 3][2 3 3][2 2 2][2 3 3][2 3 3]]  
B: [[1 2 3][2 2 2][2 3 3]]

B contains the sorted unique rows of A.
I need:

C: [0 1 0 2 1 2 2]

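C is the list of indices into B, in the order of the rows of A. I would like to avoid loops, because this needs to be fast even for very large arrays.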

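The only solutions I found work only for 1D arrays (e.g. Getting the indices of several elements in a NumPy array at once). I think this could be solved in a similar way using np.void, as in Find unique rows in numpy.array, but I can't get my head around it :/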

I need to use NumPy 1.10, with no other libraries available.

Answer 1

Given A and B, you can generate C using

In [25]: (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
Out[25]: array([0, 1, 0, 2, 1, 2, 2])

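Note that this assumes every row of A appears in B, which holds when B contains the unique rows of A. (Otherwise, argmax can return a bogus index of 0 for a row with no match, because every comparison in that column is False.)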
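To make the broadcasting step easier to follow, here is a minimal sketch that splits the one-liner into its intermediate arrays, using the sample data from the question. The names `eq`, `match` and `found` are illustrative only and are not part of the answer; `found` is one possible way to detect rows of A that have no match in B.

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 2, 2], [1, 2, 3], [2, 3, 3],
              [2, 2, 2], [2, 3, 3], [2, 3, 3]])
B = np.array([[1, 2, 3], [2, 2, 2], [2, 3, 3]])

# B[:, None, :] has shape (3, 1, 3); broadcasting it against A (7, 3)
# gives an elementwise comparison of shape (3, 7, 3).
eq = B[:, None, :] == A
match = eq.all(axis=-1)      # (3, 7): match[i, j] is True when B[i] equals A[j]
C = match.argmax(axis=0)     # first matching row of B for each row of A
found = match.any(axis=0)    # False wherever a row of A has no match in B

print(eq.shape, match.shape)  # (3, 7, 3) (3, 7)
print(C)                      # [0 1 0 2 1 2 2]
print(found.all())            # True
```

The intermediate array holds len(B) * len(A) * columns booleans, so memory use grows with the product of the two row counts.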

Note that if you have NumPy 1.13 or newer, you can use np.unique to generate both B and C at once:

In [33]: np.unique(A, axis=0, return_inverse=True)
Out[33]: 
(array([[1, 2, 3],
        [2, 2, 2],
        [2, 3, 3]]), array([0, 1, 0, 2, 1, 2, 2]))
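As a quick sanity check on what return_inverse means, the sketch below confirms that indexing B with C rebuilds A row by row. The `ravel()` call is a defensive assumption on my part, so that C is 1-D regardless of how a particular NumPy release shapes the inverse when `axis` is given.

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 2, 2], [1, 2, 3], [2, 3, 3],
              [2, 2, 2], [2, 3, 3], [2, 3, 3]])

# Requires NumPy >= 1.13 for the axis argument.
B, C = np.unique(A, axis=0, return_inverse=True)
C = C.ravel()  # keep the inverse 1-D (defensive, see note above)

# The inverse indices reconstruct the original rows from the unique ones.
reconstructed = B[C]
print(np.array_equal(reconstructed, A))  # True
```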

Note that Divakar's solution (using np.void) is much faster, especially when A has many rows:

A = np.random.randint(10, size=(1000, 3))
B, C = np.unique(A, axis=0, return_inverse=True)

In [44]: %%timeit
   ....: A1D, B1D = view1D(A, B)
   ....: sidx = B1D.argsort()
   ....: out = sidx[np.searchsorted(B1D, A1D, sorter=sidx)]
   ....: 
1000 loops, best of 3: 271 µs per loop

In [45]: %timeit (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
100 loops, best of 3: 15.5 ms per loop

Answer 2

Using void dtypes -

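import numpy as np

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    # Views need contiguous memory so that each row is one block of bytes
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # One void element spans a whole row (itemsize * number of columns bytes)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()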

# https://stackoverflow.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

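A1D, B1D = view1D(A, B)
sidx = B1D.argsort()
# sidx[k] is the position in B of the k-th smallest row, so the sorted
# position returned by searchsorted maps back to B through sidx itself
out = sidx[np.searchsorted(B1D, A1D, sorter=sidx)]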
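To illustrate what the void view does, here is a small sketch on three int64 rows. It only inspects the view; the variable names are illustrative, and the explicit `dtype=np.int64` is an assumption made so the byte counts are the same on every platform.

```python
import numpy as np

A = np.ascontiguousarray(
    np.array([[1, 2, 3], [2, 2, 2], [1, 2, 3]], dtype=np.int64))

# Reinterpret each row (3 * 8 bytes) as a single opaque element.
void_dt = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
A1D = A.view(void_dt).ravel()

print(A1D.shape)           # (3,)
print(A1D.dtype.itemsize)  # 24

# Rows with equal values have identical bytes, so 1D tools can match them.
same = A1D[0].tobytes() == A1D[2].tobytes()
print(same)                # True

# The view shares memory with A and can be turned back into rows.
roundtrip = A1D.view(A.dtype).reshape(-1, A.shape[1])
print(np.array_equal(roundtrip, A))  # True
```

Because the match is on raw bytes, this fits integer data best; floating-point values that compare equal but have different bit patterns (such as 0.0 and -0.0) would be treated as different rows.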

Sample run -

In [36]: # Let's take OP sample and shuffle them
    ...: # to make for a more generic sample case
    ...: A = np.array([[1, 2, 3], [2, 2, 2], [1, 2, 3], [2, 3, 3], [2, 2, 2], [2, 3, 3], [2, 3, 3]])
    ...: B = np.array([[1, 2, 3], [2, 2, 2], [2, 3, 3]])
    ...: 
    ...: np.random.seed(0)
    ...: np.random.shuffle(B)
    ...: indx = np.array([0, 1, 0, 2, 1, 2, 2])  # we need to retrieve these as the desired o/p
    ...: A = B[indx]

In [37]: A
Out[37]: 
array([[2, 3, 3],
       [2, 2, 2],
       [2, 3, 3],
       [1, 2, 3],
       [2, 2, 2],
       [1, 2, 3],
       [1, 2, 3]])

In [38]: B
Out[38]: 
array([[2, 3, 3],
       [2, 2, 2],
       [1, 2, 3]])

In [39]: A1D, B1D = view1D(A, B)
    ...: sidx = B1D.argsort()
    ...: out = sidx[np.searchsorted(B1D, A1D, sorter=sidx)]

In [40]: out
Out[40]: array([0, 1, 0, 2, 1, 2, 2])
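For data where B may be in any order, or where some rows of A might be missing from B, a wrapper along the following lines could be used. This is a sketch under stated assumptions, not part of the answer: `row_indices` is a hypothetical name, A and B are assumed to be 2D, non-empty, and of the same dtype, and the sorted position from searchsorted is mapped back to B by indexing the argsort permutation directly.

```python
import numpy as np

def row_indices(A, B):
    """Hypothetical helper: index in B of each row of A, else ValueError."""
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B)
    if A.dtype != B.dtype or A.shape[1] != B.shape[1]:
        raise ValueError("A and B must share dtype and column count")
    # View every row as one void element so 1D tools can be applied.
    void_dt = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    A1D = A.view(void_dt).ravel()
    B1D = B.view(void_dt).ravel()
    sidx = B1D.argsort()
    pos = np.searchsorted(B1D, A1D, sorter=sidx)
    # searchsorted returns len(B) for rows past the end; clip before indexing.
    pos = np.minimum(pos, len(B1D) - 1)
    idx = sidx[pos]
    # A row that is absent from B lands on some other row; detect that here.
    if not (B[idx] == A).all():
        raise ValueError("some rows of A do not occur in B")
    return idx

# Usage: 20 distinct rows in shuffled order, then 1000 rows drawn from them.
rng = np.random.RandomState(0)
B = np.arange(60).reshape(20, 3)
rng.shuffle(B)                      # shuffles the rows in place
expected = rng.randint(len(B), size=1000)
A = B[expected]
out = row_indices(A, B)
print(np.array_equal(out, expected))  # True
```

The final equality check costs one extra pass over A; it can be dropped when every row of A is known to be in B.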
