I'm trying to do a vectorized sum operation using a numpy array of masked indices.
So for example, without a mask:
import numpy as np
# data to be used in a vectorized sum operation
data = np.array([[1,0,0,0,0,0],
                 [0,1,0,0,0,0],
                 [0,0,1,0,0,0]])
# data indices I wish to sum together
idx = np.array([[0,1,2],  # sum data rows 0,1 and 2
                [2,1,1]]) # sum data rows 2,1 and 1
# without a mask this is straightforward
print(np.sum(data[idx],axis=1))
#[[1 1 1 0 0 0]
# [0 2 1 0 0 0]]
Now with a mask, I can't figure out how to do it without looping over the masked index array:
# introduce a mask
mask = np.array([[True, True, True],    # sum data rows 0,1 and 2
                 [False, True, True]])  # sum data rows 1 and 1 (masking out idx[1,0])
summed = np.zeros((idx.shape[0],data.shape[1]),dtype='int')
for i in range(idx.shape[0]):
    summed[i] = np.sum(data[idx[i][mask[i]]],axis=0)
print(summed)
#[[1 1 1 0 0 0]
# [0 2 0 0 0 0]]
Is there a proper way to do this type of operation without a loop?
Answer 0 (score: 3)
You can use np.einsum -
v = data[idx]
summed = np.einsum('ijk,ij->ik', v, mask)
Sample run on the given data -
In [43]: v = data[idx]
In [44]: np.einsum('ijk,ij->ik', v, mask)
Out[44]:
array([[1, 1, 1, 0, 0, 0],
[0, 2, 0, 0, 0, 0]])
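To make the subscripts concrete, here is a minimal self-contained sketch using the arrays from the question. It compares the einsum call with a plain broadcast-multiply-then-sum, which is what 'ijk,ij->ik' amounts to. The names summed_einsum and summed_broadcast are only for illustration, and the astype cast on the mask is a precaution I added, not something the call above needs.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

# v has shape (2, 3, 6): i = output row, j = slot within the row, k = data column
v = data[idx]

# 'ijk,ij->ik': multiply v[i, j, k] by mask[i, j], then sum over j.
# Casting the boolean mask to v's dtype is an illustrative precaution.
summed_einsum = np.einsum('ijk,ij->ik', v, mask.astype(v.dtype))

# The same reduction written with explicit broadcasting
summed_broadcast = (v * mask[:, :, None]).sum(axis=1)

print(summed_einsum)
```

The broadcast version builds a full (2, 3, 6) temporary before summing, so for large inputs it will likely use more memory than einsum.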
Alternatively, use np.matmul -
In [67]: np.matmul(v.swapaxes(1,2), mask[...,None])[...,0]
Out[67]:
array([[1, 1, 1, 0, 0, 0],
[0, 2, 0, 0, 0, 0]])
# Put another way
In [80]: np.matmul(mask[:,None,:], v)[:,0]
Out[80]:
array([[1, 1, 1, 0, 0, 0],
[0, 2, 0, 0, 0, 0]])
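If the axis juggling in the matmul versions is hard to follow, this sketch walks through the shapes of the second form step by step on the question's arrays. The intermediate names (row_vec, out, summed_matmul) are made up for illustration, and the explicit cast of the mask is my own assumption for clarity.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

v = data[idx]                                # (2, 3, 6)

# Turn each mask row into a 1 x 3 row vector (cast added for clarity)
row_vec = mask[:, None, :].astype(v.dtype)   # (2, 1, 3)

# matmul treats the leading axis as a batch: (1, 3) @ (3, 6) -> (1, 6) per batch item
out = np.matmul(row_vec, v)                  # (2, 1, 6)

# Drop the singleton axis to get one summed row per index row
summed_matmul = out[:, 0]                    # (2, 6)

print(summed_matmul)
```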
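Keeping the loop and improving performance
If the loop has only a few iterations and each iteration performs a large enough sum-reduction, the per-iteration operation can be replaced by a matrix multiplication. Hence -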
for i in range(idx.shape[0]):
    summed[i] = mask[i].dot(data[idx[i]])
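As a rough, self-contained version of the loop above, the sketch below wraps it in a function and checks it against the einsum result on random data with few rows and many indices per row. The function name masked_sum_loop, the array sizes, and the random inputs are all assumptions chosen for illustration. It makes no timing claims.

```python
import numpy as np

def masked_sum_loop(data, idx, mask):
    """Loop over index rows; each iteration is a (n,) . (n, cols) product."""
    summed = np.zeros((idx.shape[0], data.shape[1]), dtype=data.dtype)
    for i in range(idx.shape[0]):
        # Cast the boolean mask row so the dot product stays in data's dtype
        summed[i] = mask[i].astype(data.dtype).dot(data[idx[i]])
    return summed

# Hypothetical test data: 4 index rows, 1000 indices each
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(50, 8))
idx = rng.integers(0, 50, size=(4, 1000))
mask = rng.random((4, 1000)) < 0.5

result = masked_sum_loop(data, idx, mask)
expected = np.einsum('ijk,ij->ik', data[idx], mask.astype(data.dtype))

print(np.array_equal(result, expected))
```

The loop only ever materializes one (1000, 8) slice of data[idx] at a time, which may matter when the full indexed array would be large.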