In my current Theano script, the bottleneck is the following code:
import time
import numpy as np
axis = 0
prob = np.random.random((1, 1000, 50))
cases = np.random.random((1000, 1000, 50))
start = time.time()
for i in range(1000):
    # Weight every case by prob, then sum over the axis that is not `axis`.
    result = (cases * prob).sum(axis=1 - axis, keepdims=True)
print('3D naive method took {} seconds'.format(time.time() - start))
print(result.shape)
print()
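In the 2D case, I saw that replacing the elementwise multiply + sum with a dot product gave me a 5x speedup. Is there a matrix operation that could help me in this 3D case as well?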
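For reference, the 2D replacement described above could look like the following minimal sketch. The shapes and the names `prob2d` and `cases2d` are assumptions made for illustration; they are not taken from the original script, and no timing is claimed here.

```python
import numpy as np

# Hypothetical 2D shapes, kept small for illustration only.
rng = np.random.RandomState(0)
prob2d = rng.random_sample((1, 200))      # one row of weights
cases2d = rng.random_sample((300, 200))   # one row per case

# Elementwise multiply followed by a sum over the shared axis.
naive_2d = (cases2d * prob2d).sum(axis=1, keepdims=True)   # shape (300, 1)

# The same contraction written as a single matrix product.
dot_2d = cases2d.dot(prob2d.T)                             # shape (300, 1)

assert np.allclose(naive_2d, dot_2d)
```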
Edit:
Divakar gave me an einsum-based version. However, my goal is to port this to Theano, and Theano does not support einsum. So alternatives that work in Theano are welcome.
Answer 0 (score: 1)
We can use np.einsum:
result = np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
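To make the subscripts concrete, here is a small sketch that compares the contraction with explicit loops. The small shapes and the names `prob_s` and `cases_s` are assumptions for illustration. It also uses a slight variant of the call, `'jk,ijk->ik'` with `prob_s[0]`, so the size-1 leading axis of `prob` is dropped explicitly instead of being broadcast.

```python
import numpy as np

rng = np.random.RandomState(0)
prob_s = rng.random_sample((1, 4, 3))    # (1, J, K)
cases_s = rng.random_sample((5, 4, 3))   # (I, J, K)

# Explicit loops: out[i, k] = sum over j of prob[0, j, k] * cases[i, j, k]
loop_out = np.zeros((5, 3))
for i in range(5):
    for k in range(3):
        for j in range(4):
            loop_out[i, k] += prob_s[0, j, k] * cases_s[i, j, k]

# einsum variant with the size-1 axis of prob removed up front.
einsum_out = np.einsum('jk,ijk->ik', prob_s[0], cases_s)

# The naive expression from the question, for comparison.
naive_out = (cases_s * prob_s).sum(axis=1, keepdims=True)

assert np.allclose(loop_out, einsum_out)
assert np.allclose(naive_out, einsum_out[:, None, :])
```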
Another option is np.matmul:
result = np.matmul(prob.transpose(2,0,1), cases.T).T
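The transposes in the matmul version are easier to follow with the shapes written out. The sketch below walks through them on small arrays; the shapes and the intermediate names (`left`, `right`, `stacked`) are assumptions introduced only for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
prob_s = rng.random_sample((1, 4, 3))    # (1, J, K)
cases_s = rng.random_sample((5, 4, 3))   # (I, J, K)

left = prob_s.transpose(2, 0, 1)   # (K, 1, J) = (3, 1, 4)
right = cases_s.T                  # (K, J, I) = (3, 4, 5)

# matmul treats the leading axis K as a batch of (1, J) x (J, I) products.
stacked = np.matmul(left, right)   # (K, 1, I) = (3, 1, 5)

# Reversing the axes gives the layout of the naive result.
matmul_out = stacked.T             # (I, 1, K) = (5, 1, 3)

naive_out = (cases_s * prob_s).sum(axis=1, keepdims=True)
assert np.allclose(naive_out, matmul_out)
```

Moving `K` to the front is what lets a single call cover all slices, since matmul only contracts over the last two axes.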
Runtime test:
In [70]: axis = 0
    ...: prob = np.random.random( ( 1, 1000, 50 ) )
    ...: cases = np.random.random( ( 1000, 1000, 50 ) )
    ...:
In [71]: out1 = ( cases * prob ).sum( axis=1-axis, keepdims=True )
In [72]: out2 = np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
In [73]: out3 = np.matmul(prob.transpose(2,0,1), cases.T).T
In [74]: np.allclose(out1, out2)
Out[74]: True
In [75]: np.allclose(out1, out3)
Out[75]: True
In [76]: %timeit ( cases * prob ).sum( axis=1-axis, keepdims=True )
10 loops, best of 3: 101 ms per loop
In [77]: %timeit np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
10 loops, best of 3: 44.1 ms per loop
In [78]: %timeit np.matmul(prob.transpose(2,0,1), cases.T).T
10 loops, best of 3: 44 ms per loop