Optimizing an averaged outer product

Time: 2016-03-26 21:10:02

Tags: performance numpy optimization matrix eigenvalue

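I am currently writing a short program to analyze the eigenvalue distribution of random matrices, but the parameter choices my analysis requires make the whole thing extremely slow. Basically, I need to loop over the code below, ideally about 5000 times, and collect the complete list of eigenvalues at the end.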

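import numpy as np

# N, B, M, mu and sigma are set elsewhere
C = np.zeros((N,N))
time_series = np.random.normal(mu, sigma, N + B*(M-1))

# Average the outer products of M overlapping windows of length N, each shifted by B
for k in range(int(M)):
    C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
C = C/M

eg_v = np.linalg.eigvalsh(C)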

I need N = 1000, B around 10, and M = 100. With these parameters, however, the program takes 4-5 hours to run on my reasonably powerful laptop.

Hardware limitations aside, is there anything I can change in the code to speed up the whole process?

Thanks in advance!

1 answer:

Answer 0: (score: 1)

You can replace the loop with a vectorized solution that uses np.tensordot.

So, the following -

C = np.zeros((N,N))
for k in range(int(M)):
    C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])

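can be replaced with the code below (the division by M and the eigenvalue computation that follow the loop stay as they were) -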

# Build a 2D index array of shape (M, N): row k holds the indices
# k*B, k*B + 1, ..., k*B + N - 1
idx = (np.arange(M)*B)[:,None] + np.arange(N)

# Index time_series with it, so that row k of time_idx is the
# equivalent of "time_series[k*B : (N) + k*B]"
time_idx = time_series[idx]

# Sum-reduce over the first axis (the loop index k) to accumulate
# all the outer products at once
C = np.tensordot(time_idx,time_idx,axes=([0],[0]))

The tensordot call can also be replaced by a plain matrix product:

C = time_idx.T.dot(time_idx)
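
The reason this works: if the M windows are stacked as the rows of an (M, N) array X, the sum of their outer products is exactly the matrix product of X transposed with X. The following is a minimal sketch that checks this on small inputs. The sizes and the seed are arbitrary choices for illustration, and the sliding_window_view variant is an alternative that is not part of the answer above; it assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Small, arbitrary sizes chosen only for illustration
N, M, B = 6, 4, 2
rng = np.random.default_rng(0)
time_series = rng.normal(1.2, 0.5, N + B*(M - 1))

# Reference: explicit loop over the M windows
C_loop = np.zeros((N, N))
for k in range(M):
    w = time_series[k*B : N + k*B]
    C_loop += np.outer(w, w)

# Fancy-indexing version from the answer: row k of X is window k
idx = (np.arange(M)*B)[:, None] + np.arange(N)
X = time_series[idx]
C_tensordot = np.tensordot(X, X, axes=([0], [0]))
C_dot = X.T.dot(X)

# Alternative (assumption: NumPy >= 1.20): every B-th sliding window of length N
X_view = sliding_window_view(time_series, N)[::B]
C_view = X_view.T @ X_view

print(np.allclose(C_loop, C_tensordot),
      np.allclose(C_loop, C_dot),
      np.allclose(C_loop, C_view))
```

The sliding-window variant builds the same (M, N) array as a strided view instead of materializing an index array first, which may save some memory for large M and N.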

Runtime test

Functions:

def original_app(time_series,B,N,M):
    C = np.zeros((N,N))
    for k in range(int(M)):
        C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
    return C

def vectorized_app(time_series,B,N,M):
    idx = (np.arange(M)*B)[:,None] + np.arange(N)
    time_idx = time_series[idx]
    return np.tensordot(time_idx,time_idx,axes=([0],[0]))

Inputs:

In [115]: # Inputs
     ...: mu = 1.2
     ...: sigma = 0.5
     ...: N = 1000
     ...: M = 100
     ...: B = 10
     ...: time_series = np.random.normal(mu,sigma,  (N + B*(M-1))  )
     ...: 

Timings:

In [116]: out1 = original_app(time_series,B,N,M)

In [117]: out2 = vectorized_app(time_series,B,N,M)

In [118]: np.allclose(out1,out2)
Out[118]: True

In [119]: %timeit original_app(time_series,B,N,M)
1 loops, best of 3: 1.56 s per loop

In [120]: %timeit vectorized_app(time_series,B,N,M)
10 loops, best of 3: 26.2 ms per loop

So we see a speedup of roughly 60x (1.56 s down to 26.2 ms) for the inputs listed in the question!
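
For completeness, here is a sketch of how the vectorized step might slot into the full procedure from the question: generate a series, build the averaged matrix, take its eigenvalues, and repeat. The function names eigenvalues_one_run and collect_eigenvalues, the seed, and the small demo sizes are hypothetical choices made for this example, not something from the original post.

```python
import numpy as np

def eigenvalues_one_run(rng, mu, sigma, N, M, B):
    """Eigenvalues of the averaged outer-product matrix for one random series."""
    time_series = rng.normal(mu, sigma, N + B*(M - 1))
    idx = (np.arange(M)*B)[:, None] + np.arange(N)
    X = time_series[idx]            # shape (M, N), row k is window k
    C = X.T.dot(X) / M              # averaged sum of outer products
    return np.linalg.eigvalsh(C)    # N eigenvalues in ascending order

def collect_eigenvalues(n_runs, mu, sigma, N, M, B, seed=0):
    """Concatenate the eigenvalues from n_runs independent series."""
    rng = np.random.default_rng(seed)
    runs = [eigenvalues_one_run(rng, mu, sigma, N, M, B) for _ in range(n_runs)]
    return np.concatenate(runs)

# Deliberately small demo sizes; the question uses N=1000, M=100, B=10
eigs = collect_eigenvalues(n_runs=3, mu=1.2, sigma=0.5, N=50, M=10, B=5)
print(eigs.shape)
```

With the loop vectorized, the call to np.linalg.eigvalsh may become the dominant cost of each run, so it is worth timing the two steps separately before optimizing further.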