Question

我需要一些大的复数数组来执行计算。

import numpy as np

# Reduced sizes -- real ones are orders of magnitude larger
n, d, l = 50000, 3, 1000

# Two complex matrices and a column vector
x = np.random.rand(n, d) + 1j*np.random.rand(n, d)
s = np.random.rand(n, d) + 1j*np.random.rand(n, d)
v = np.random.rand(l)[:, np.newaxis]

对于x*v*s（和x）的每一行，该函数基本上是s，然后对该行求和。由于数组的大小不同，我无法找到一种向量化计算的方法，并且使用for循环的速度太慢了。

我当前的实现是这样（〜3.5秒）：

h = []
for i in range(len(x)):
    h.append(np.sum(x[i,:]*v*s[i,:], axis=1))

h = np.asarray(h)

我还尝试将np.apply_along_axis()与增强矩阵一起使用，但这只是稍快一些（〜2.6s）并且不那么可读。

def func(m, v):
    return np.sum(m[:d]*v*m[d:], axis=1)

h = np.apply_along_axis(func, 1, np.hstack([x, s]), v)

什么是计算此结果更快的方法？如果有帮助，我可以利用dask等其他软件包。

Answer 1

通过广播，这应该起作用：

np.sum(((x*s)[...,None]*v[:,0], axis=1)

但是在您的样本尺寸下，我遇到了内存错误。 “外部”广播数组(n,d,l)的形状对我的记忆来说太大。

我可以通过迭代较小的d维度来减少内存使用量：

res = np.zeros((n,l), dtype=x.dtype) 
for i in range(d): 
    res += (x[:,i]*s[:,i])[:,None]*v[:,0]

此测试与您的h相同，但是我无法完成时间测试。通常，在较小的维度上迭代会更快。

我可能会重复小尺寸的事情。

这可能也可以表示为einsum问题，尽管它可能无法解决这些问题。

In [1]: n, d, l = 5000, 3, 1000 
   ...:  
   ...: # Two complex matrices and a column vector 
   ...: x = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: s = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: v = np.random.rand(l)[:, np.newaxis]                                                           
In [2]:                                                                                                
In [2]: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...:  
   ...: h = np.asarray(h)                                                                              
In [3]: h.shape                                                                                        
Out[3]: (5000, 1000)
In [4]: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                
In [5]: res.shape                                                                                      
Out[5]: (5000, 1000)
In [6]: np.allclose(res,h)                                                                             
Out[6]: True
In [7]: %%timeit  
   ...: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...: h = np.asarray(h) 
   ...:  
   ...:                                                                                                
490 ms ± 3.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit 
   ...: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                

354 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]:                                                                                                
In [9]: np.sum((x*s)[...,None]*v[:,0], axis=1).shape                                                   
Out[9]: (5000, 1000)
In [10]: out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                                  
In [11]: np.allclose(h,out)                                                                            
Out[11]: True
In [12]: timeit out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                           
310 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

可以节省一些时间，但是不多。

还有einsum版本：

In [13]: np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                                     
Out[13]: (5000, 1000)
In [14]: np.allclose(np.einsum('ij,ij,k->ik',x,s,v[:,0]),h)                                            
Out[14]: True
In [15]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                              
167 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

节省大量时间。但是我不知道它将如何扩展。

但是einsum让我意识到，我们可以在乘以d之前在v维度上求和-并获得大量的时间和内存使用量：

In [16]: np.allclose(np.sum(x*s, axis=1)[:,None]*v[:,0],h)                                             
Out[16]: True
In [17]: timeit np.sum(x*s, axis=1)[:,None]*v[:,0]                                                     
68.4 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@ cs95首先到达那里！

根据@PaulPanzer的评论，optimize标志很有帮助。可能做出了相同的推论-我们可以在j上进行总结：

In [18]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0],optimize=True).shape                                
91.6 ms ± 991 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

如何对不同维数的数组进行矢量化计算？

1 个答案: