Vectorized sum operation on a numpy array with masked indices

Posted: 2018-03-09 19:20:00

Tags: python numpy vectorization

I'm trying to do a vectorized sum operation using a numpy array of masked indices.

So for example, without a mask:

import numpy as np

# data to be used in a vectorized sum operation
data = np.array([[1,0,0,0,0,0],
                 [0,1,0,0,0,0],
                 [0,0,1,0,0,0]])

# data indices i wish to sum together
idx = np.array([[0,1,2],   # sum data rows 0,1 and 2
                [2,1,1]])  # sum data rows 2,1 and 1

# without a mask this is straightforward
print(np.sum(data[idx], axis=1))
#[[1 1 1 0 0 0]
# [0 2 1 0 0 0]]
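The unmasked version works because of the shape of the fancy-indexed intermediate. data[idx] gathers one row of data for each entry of idx, so it has shape (idx.shape[0], idx.shape[1], data.shape[1]). Summing over axis=1 then collapses the gathered rows. Here is a minimal Python 3 sketch reusing the same arrays:

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])

# Fancy indexing gathers rows of data: v[i, j] is data[idx[i, j]]
v = data[idx]
print(v.shape)  # (2, 3, 6)

# Reducing over axis=1 sums the gathered rows for each row of idx
unmasked = v.sum(axis=1)
print(unmasked)
# [[1 1 1 0 0 0]
#  [0 2 1 0 0 0]]
```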

Now with a mask, I can't figure out how to do it without looping over the masked index array:

# introduce a mask
mask = np.array([[True,  True, True],  # sum data rows 0,1 and 2
                 [False, True, True]]) # sum data rows 1 and 1 (masking out idx[1,0])

summed = np.zeros((idx.shape[0],data.shape[1]),dtype='int')
for i in range(idx.shape[0]):
    summed[i] = np.sum(data[idx[i][mask[i]]], axis=0)
print(summed)
#[[1 1 1 0 0 0]
# [0 2 0 0 0 0]]
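One loop-free option uses plain broadcasting. This technique does not come from the answer below. It gathers all rows exactly as in the unmasked case and zeroes out the masked ones before summing. mask[..., None] has shape (2, 3, 1), so it broadcasts across the last axis of data[idx]. Like the unmasked version, this still builds the full (2, 3, 6) intermediate.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

# True -> 1, False -> 0: masked-out gathered rows become all zeros,
# then the idx axis (axis=1) is summed away
summed = (data[idx] * mask[..., None]).sum(axis=1)
print(summed)
# [[1 1 1 0 0 0]
#  [0 2 0 0 0 0]]
```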

QUESTION

Is there a proper way to do this type of operation without a loop?

1 answer:

Answer 0 (score: 3)

You can use np.einsum to solve this -
v = data[idx]
summed = np.einsum('ijk,ij->ik', v, mask)
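The subscript string 'ijk,ij->ik' can be written out element by element. It multiplies v and mask wherever the shared indices i and j line up. It then sums over j, the index that does not appear in the output. In the formula below, m is the boolean mask read as 0/1:

$$
\text{summed}_{ik} = \sum_{j} v_{ijk}\, m_{ij} = \sum_{j} \text{data}_{\text{idx}_{ij},\,k}\, m_{ij}
$$

Entries with m_{ij} = 0 contribute nothing to the sum. That is exactly what the masked loop in the question computes.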

Sample run on the given data -

In [43]: v = data[idx]

In [44]: np.einsum('ijk,ij->ik', v, mask)
Out[44]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])

Alternatively, with np.matmul -

In [67]: np.matmul(v.swapaxes(1,2), mask[...,None])[...,0]
Out[67]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])

# Put another way
In [80]: np.matmul(mask[:,None,:], v)[:,0]
Out[80]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])
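In Python 3.5+ the @ operator calls np.matmul, so the mask[:, None, :] form can also be written with @. matmul treats the leading axis as a batch dimension. mask[:, None, :] is a stack of (1, 3) row vectors and v is a stack of (3, 6) matrices. Their product has shape (2, 1, 6), and [:, 0] drops the singleton axis.

The sketch below checks this against the einsum version. Casting the mask to v's dtype is a choice made here for explicitness and is not part of the answer.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

v = data[idx]                              # shape (2, 3, 6)

# One (1, 3) @ (3, 6) product per row of idx
rows = mask[:, None, :].astype(v.dtype)    # shape (2, 1, 3)
via_matmul = (rows @ v)[:, 0]              # shape (2, 6)

via_einsum = np.einsum('ijk,ij->ik', v, mask)
print(via_matmul)
# [[1 1 1 0 0 0]
#  [0 2 0 0 0 0]]
```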

Keeping the loop, but making it faster

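If the number of loop iterations is small and each iteration performs a large enough sum-reduction, the per-iteration sum can be replaced with a matrix multiplication. Hence -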

for i in range(idx.shape[0]):
    # The boolean mask row acts as 0/1 weights on the gathered data rows
    summed[i] = mask[i].dot(data[idx[i]])
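Below is a self-contained Python 3 version of that loop for reference. mask[i] behaves as a 0/1 row vector, so mask[i].dot(data[idx[i]]) is a weighted sum of the gathered rows in which masked-out rows get weight zero. Whether it beats the fully vectorized forms depends on the array shapes. No timings are claimed here. Allocating summed with data.dtype is an assumption made to avoid a dtype mismatch.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

summed = np.zeros((idx.shape[0], data.shape[1]), dtype=data.dtype)
for i in range(idx.shape[0]):
    # (3,) mask row dotted with (3, 6) gathered rows -> (6,) masked sum
    summed[i] = mask[i].dot(data[idx[i]])

print(summed)
# [[1 1 1 0 0 0]
#  [0 2 0 0 0 0]]
```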