Question

I want to merge pairs of equal adjacent elements in an array. Suppose I have an array like this:

np.array([[0,1,1,2,2],
          [0,1,1,2,2],
          [0,2,2,2,2]])

If I am moving right, I want to produce something like

np.array([[0,0,2,0,4],
          [0,0,2,0,4],
          [0,0,4,0,4]])

If I am moving up:

np.array([[0,2,2,4,4],
          [0,0,0,0,0],
          [0,2,2,2,2]])

My current code just loops over a normal list:

    for i in range(4):
        for j in range(3):
            if matrix[i][j] == matrix[i][j+1] and matrix[i][j] != 0:
                matrix[i][j] *= 2
                matrix[i][j+1] = 0

If possible, I would prefer numpy and no loops.

Answer 1

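This task is pretty hard to do without loops! You need a bunch of advanced numpy tricks to make it work. I'll fly through them here, but I'll try to link to other resources as I go.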

Taken from here, the best way to do a row-wise comparison is:

a = np.array([[0,1,1,2,2],
              [0,1,1,2,2],
              [0,2,2,2,2]])

b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

b

array([[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
       [[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
       [[0 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0]]], 
       dtype='|V20')

b.shape
(3, 1)

Note that the innermost brackets are not an extra dimension. Each one is an np.void object that can be compared by things like np.unique.
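As a side note, np.unique also accepts an axis argument in recent NumPy versions, which compares whole rows without the np.void view. A minimal sketch, assuming that argument is available in your NumPy (the name labels is only for illustration):

```python
import numpy as np

a = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])

# axis=0 treats every row as one item; return_inverse gives one label per row.
# ravel() keeps the labels 1-D regardless of the NumPy version in use.
labels = np.unique(a, axis=0, return_inverse=True)[1].ravel()
print(labels)  # equal rows share a label
```

Rows 0 and 1 get the same label here, and row 2 gets a different one.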

Still, getting the indices you want to keep isn't very easy, but here it is as a one-liner:

c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])

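Eech. It's a bit messy. Basically you are looking for the indices where the row changes, plus the first row. Normally you wouldn't need the np.unique call and could just do np.diff(b), but you can't subtract np.void. np.r_ is a shortcut for np.concatenate that is more readable. And np.flatnonzero gives you the indices where the new array is nonzero (i.e. the indices you want to keep).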

c
array([0, 2], dtype=int32)
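To see the np.diff, np.r_ and np.flatnonzero steps on their own, here is a small sketch on a plain integer array. The labels array is an assumption standing in for what np.unique(..., return_inverse=...) would return for the example (rows 0 and 1 equal, row 2 different):

```python
import numpy as np

# Hypothetical labels: rows 0 and 1 are equal, row 2 differs.
labels = np.array([0, 0, 1])

changes = np.diff(labels)      # non-zero where a row differs from the one before
marked = np.r_[1, changes]     # prepend 1 so the first row always counts
keep = np.flatnonzero(marked)  # indices of the rows that start a new run
print(keep)                    # [0 2]
```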

There, now you can use some fancy ufunc.reduceat math to do your addition:

d = np.add.reduceat(a, c, axis = 0)

d

array([[0, 2, 2, 4, 4],
       [0, 2, 2, 2, 2]], dtype=int32)
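If reduceat is new to you, a 1-D sketch may help: each index starts a segment that runs up to the next index (or to the end), and the ufunc is reduced over each segment. The values array below is made up for illustration:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
# Segments start at indices 0 and 2: values[0:2] and values[2:]
sums = np.add.reduceat(values, [0, 2])
print(sums)  # [ 3 12]
```

With axis = 0, as above, the same thing happens to whole rows instead of single numbers.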

OK, now to add the zeros back, we just use advanced indexing to insert the result into an np.zeros array:

e = np.zeros_like(a)
e[c] = d
e

array([[0, 2, 2, 4, 4],
       [0, 0, 0, 0, 0],
       [0, 2, 2, 2, 2]])

And there we go! You can go in the other directions by transposing or flipping the matrix at the beginning and end.

def reduce_duplicates(a):
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
    d = np.add.reduceat(a, c, axis = 0)
    e = np.zeros_like(a)
    e[c] = d
    return e

reduce_duplicates(a.T[::-1,:])[::-1,:].T  #reducing right

array([[0, 0, 2, 0, 4],
       [0, 0, 2, 0, 4],
       [0, 0, 4, 0, 4]])
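For comparison, here is a sketch of the same idea without the np.void view: adjacent rows are compared directly with != and np.any. This is a swapped-in way of finding the run starts, not the method used above, and the name reduce_duplicates_plain is only illustrative:

```python
import numpy as np

def reduce_duplicates_plain(a):
    """Sum runs of identical consecutive rows; leave zeros in the other rows."""
    a = np.asarray(a)
    # True where a row differs from the row above it; the first row always starts a run.
    starts = np.r_[True, np.any(a[1:] != a[:-1], axis=1)]
    c = np.flatnonzero(starts)
    d = np.add.reduceat(a, c, axis=0)   # one summed row per run
    e = np.zeros_like(a)
    e[c] = d                            # put each sum at the start of its run
    return e

a = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])

up = reduce_duplicates_plain(a)
right = reduce_duplicates_plain(a.T[::-1, :])[::-1, :].T
```

On the example array, up and right match the two results shown above. Like reduce_duplicates, this only merges rows that are equal in every position, and a run of three equal rows is summed into one.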

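I don't have numba, so I can't test the speed against the other suggestion (knowing numba, this is probably slower), but it is loop-free and pure numpy.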

Answer 2

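A "vectorized" version of your function would be very messy, because merges can happen at even or odd indices in each row/column, depending on the previous values in that row/column.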

To illustrate, see how this vectorized version works on your (horizontal) example, which happens to have every merging pair start at an odd index:

>>> x
array([[0, 1, 1, 2, 2],
       [0, 1, 1, 2, 2],
       [0, 2, 2, 2, 2]])
>>> y=x==np.roll(x, 1, axis=1); y[:,1::2]=False; x*y*2
array([[0, 0, 2, 0, 4],
       [0, 0, 2, 0, 4],
       [0, 0, 4, 0, 4]])

But if I shift one of the rows by 1, it no longer works:

>>> x2
array([[0, 1, 1, 2, 2],
       [0, 0, 1, 1, 2],
       [0, 2, 2, 2, 2]])
>>> y=x2==np.roll(x2, 1, axis=1); y[:,1::2]=False; x2*y*2
array([[0, 0, 2, 0, 4],
       [0, 0, 0, 0, 0],
       [0, 0, 4, 0, 4]])

I'm not sure what strategy I would try next, or whether this can be done in a vectorized way at all, but it won't be clean.

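I would suggest using numba for something like this. It will keep your code readable and make it faster. Just add the @jit decorator to your function and measure how much it improves performance.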

Edit: I did some timings for you. There is also a small fix to the function to make it consistent with your example.

>>> def foo(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> from numba import jit
>>> @jit
... def foo2(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> import time
>>> z=np.random.random((1000,1000)); start=time.time(); foo(z); print(time.time()-start)
1.0277159214
>>> z=np.random.random((1000,1000)); start=time.time(); foo2(z); print(time.time()-start)
0.00354909896851
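One thing to watch with the left-to-right scan in foo: a freshly doubled cell can merge again with its right neighbour, so the row [0, 1, 1, 2, 2] ends up as [0, 0, 0, 4, 2] rather than [0, 0, 2, 0, 4]. A sketch that scans each row from the right edge instead, assuming that is the behaviour you want (merge_right is a hypothetical name, plain Python without numba):

```python
import numpy as np

def merge_right(matrix):
    """Merge equal non-zero neighbours toward the right, in place."""
    rows, cols = matrix.shape
    for i in range(rows):
        # Walk from the right edge so a freshly doubled cell is never merged again.
        for j in range(cols - 2, -1, -1):
            if matrix[i, j] == matrix[i, j + 1] and matrix[i, j] != 0:
                matrix[i, j + 1] *= 2
                matrix[i, j] = 0
    return matrix

x = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])
x2 = np.array([[0, 1, 1, 2, 2],
               [0, 0, 1, 1, 2],
               [0, 2, 2, 2, 2]])

merged_x = merge_right(x.copy())
merged_x2 = merge_right(x2.copy())
```

Here merged_x equals the "moving right" array from the question, and the shifted row of x2 becomes [0, 0, 0, 2, 2]. The same function body should work under @jit as well, though I have not timed it.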

Python: how to merge equal elements in a numpy array

2 answers: