Question

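I have a large 2d numpy array and two 1d arrays that hold x/y indices into the 2d array. I want to use these 1d arrays to perform an operation on the 2d array. I can do this with a for loop, but it is very slow when working with a large array. Is there a faster way? I tried simply using the 1d arrays as indices, but that did not work. See this example: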

import numpy as np

# Two example 2d arrays
cnt_a   =   np.zeros((4,4))
cnt_b   =   np.zeros((4,4))

# 1d arrays holding x and y indices
xpos    =   [0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3]
ypos    =   [3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0]

# This method works, but is very slow for a large array
for i in range(0,len(xpos)):
    cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1

# This method is fast, but gives incorrect answer
cnt_b[xpos,ypos] = cnt_b[xpos,ypos]+1


# Print the results
print('Good:')
print(cnt_a)
print('')
print('Bad:')
print(cnt_b)

The output is:

Good:
[[ 2.  1.  2.  1.]
 [ 0.  3.  1.  2.]
 [ 1.  1.  1.  1.]
 [ 1.  0.  0.  0.]]

Bad:
[[ 1.  1.  1.  1.]
 [ 0.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  0.  0.  0.]]

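For the cnt_b array, numpy is obviously not summing correctly, but I am unsure how to fix this without resorting to the (very inefficient) for loop used to calculate cnt_a.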

Answer 1

Another approach using 1D indexing (as suggested by @Shai), extended to answer the actual question:

>>> out = np.zeros((4, 4))
>>> idx = np.ravel_multi_index((xpos, ypos), out.shape) # extract 1D indexes
>>> x = np.bincount(idx, minlength=out.size)
>>> out.flat += x

np.bincount counts the number of occurrences of each index in xpos, ypos and stores the counts in x.

Or, as suggested by @Divakar:

>>> out.flat += np.bincount(idx, minlength=out.size)
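
For reference, the bincount approach can be put together as a self-contained script. This is only a minimal sketch that assumes the xpos/ypos data from the question; the names shape and counts are introduced here for illustration, and the counts are reshaped directly instead of being added through out.flat.

```python
import numpy as np

# Index data from the question
xpos = np.array([0, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
ypos = np.array([3, 2, 1, 1, 3, 0, 1, 0, 0, 1, 2, 1, 2, 3, 3, 2, 0])

shape = (4, 4)

# Map each (x, y) pair to its position in the flattened array
idx = np.ravel_multi_index((xpos, ypos), shape)

# Count how often each flat position occurs; minlength pads the result
# so that it covers every cell, including cells that are never hit
counts = np.bincount(idx, minlength=shape[0] * shape[1]).reshape(shape)

print(counts)
```

Without minlength, bincount would stop at the largest index it sees, so the reshape could fail whenever the last cells of the array are never hit.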

Answer 2

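We can compute the linear indices and then accumulate into a zeros-initialized output array with np.add.at. Thus, with xpos and ypos as arrays, here is one implementation -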

m,n = xpos.max()+1, ypos.max()+1
out = np.zeros((m,n),dtype=int)
np.add.at(out.ravel(), xpos*n+ypos, 1)

Sample run -

In [95]: # 1d arrays holding x and y indices
    ...: xpos    =   np.array([0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3])
    ...: ypos    =   np.array([3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0])
    ...: 

In [96]: cnt_a   =   np.zeros((4,4))

In [97]: # This method works, but is very slow for a large array
    ...: for i in range(0,len(xpos)):
    ...:     cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
    ...:     

In [98]: m,n = xpos.max()+1, ypos.max()+1
    ...: out = np.zeros((m,n),dtype=int)
    ...: np.add.at(out.ravel(), xpos*n+ypos, 1)
    ...: 

In [99]: cnt_a
Out[99]: 
array([[ 2.,  1.,  2.,  1.],
       [ 0.,  3.,  1.,  2.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  0.,  0.,  0.]])

In [100]: out
Out[100]: 
array([[2, 1, 2, 1],
       [0, 3, 1, 2],
       [1, 1, 1, 1],
       [1, 0, 0, 0]])
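
As a side note, np.add.at also accepts a tuple of index arrays, so the linear index does not have to be computed by hand. The sketch below assumes the same xpos/ypos as in the sample run and contrasts it with the buffered in-place addition from the question; the names buffered and unbuffered are made up for this example.

```python
import numpy as np

# Index data from the question
xpos = np.array([0, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
ypos = np.array([3, 2, 1, 1, 3, 0, 1, 0, 0, 1, 2, 1, 2, 3, 3, 2, 0])

# Fancy-indexed += is buffered: a repeated (x, y) pair is only written once
buffered = np.zeros((4, 4), dtype=int)
buffered[xpos, ypos] += 1

# np.add.at is unbuffered: every occurrence of a pair is accumulated
unbuffered = np.zeros((4, 4), dtype=int)
np.add.at(unbuffered, (xpos, ypos), 1)

print(buffered)
print(unbuffered)
```

The first array reproduces the "Bad" result from the question, the second one the "Good" result.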

Answer 3

You can iterate over both lists at once and increment for each pair (in case you are not familiar with it, zip can combine lists):

for x, y in zip(xpos, ypos):
    cnt_b[x][y] += 1

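But this will be about the same speed as your solution A. If your lists xpos/ypos have length n, I do not see how you could update the matrix in less than O(n), since you have to check each pair one way or another.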

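Another solution: you could count the identical index pairs (e.g. (0,3), etc.), perhaps with collections.Counter, and update the matrix with the count values. But I doubt it would be faster, since the time gained when updating the matrix would be lost counting the repeated occurrences.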
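
A rough sketch of that Counter idea, assuming the xpos/ypos lists from the question (pair_counts and cnt are names chosen here for illustration, and no timing claim is made):

```python
from collections import Counter

import numpy as np

# Index data from the question
xpos = [0, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
ypos = [3, 2, 1, 1, 3, 0, 1, 0, 0, 1, 2, 1, 2, 3, 3, 2, 0]

# Count how many times each (x, y) pair occurs
pair_counts = Counter(zip(xpos, ypos))

# Write the count of each distinct pair into the matrix once
cnt = np.zeros((4, 4))
for (x, y), c in pair_counts.items():
    cnt[x, y] = c

print(cnt)
```

The counting pass still visits every pair, so this stays O(n) overall; it only reduces the number of writes into the matrix.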

Maybe I am completely wrong, in which case I would also be curious to see an answer that is not O(n).

Answer 4

I think you are looking for the ravel_multi_index function:

lidx = np.ravel_multi_index((xpos, ypos), cnt_a.shape)

This converts the x/y indices into "flattened" 1D indices into cnt_a and cnt_b:

# cnt_b is 2D, so apply the flat indices to a flattened view of it
np.add.at(cnt_b.ravel(), lidx, 1)
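
To make the flattening concrete, here is a minimal sketch that assumes the question's data and a fresh zero array named cnt (a name used only in this example). It checks that the flat index of (x, y) in a C-ordered array is x * number_of_columns + y, then accumulates through a 1D view.

```python
import numpy as np

# Index data from the question
xpos = np.array([0, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
ypos = np.array([3, 2, 1, 1, 3, 0, 1, 0, 0, 1, 2, 1, 2, 3, 3, 2, 0])

cnt = np.zeros((4, 4))

# Flat (row-major) position of every (x, y) pair
lidx = np.ravel_multi_index((xpos, ypos), cnt.shape)

# Same result as computing x * n_columns + y by hand
assert np.array_equal(lidx, xpos * cnt.shape[1] + ypos)

# reshape(-1) on a contiguous array is a view, so the unbuffered
# accumulation lands in cnt itself
np.add.at(cnt.reshape(-1), lidx, 1)

print(cnt)
```

Passing the 1D indices to the 2D array directly would index its first axis only, which is why the flat view is needed here.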

Efficiently index a 2d numpy array using two 1d arrays
