How do I vectorize multiple levels of recursion?

Time: 2013-11-26 15:05:03

Tags: python recursion numpy vectorization

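I'm a noobie to python and numpy (and to programming in general). I'm trying to speed up my code as much as possible. The math involves several summations over multiple axes of a few arrays. I've managed one level of vectorization, but I can't seem to get any deeper than that and have to resort to for loops (I believe there are three levels of recursion, M, N, and I, one of which, I, I've already eliminated). Here is the relevant section of my code (it works, but I'd like to speed it up):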

def B1(n, i):
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)

for n in N:
    B[n, :] = B1(n, I)

for m in M:
    for n in N:
        C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])

    Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])

A = np.linalg.solve(C[1:, 1:], (0.25)*Y[1:])

dmaxi is just a float, and m, n and i are integers. The arrays have the following shapes:

>>> qi.shape
(551,)
>>> N.shape
(18,)
>>> M.shape
(18,)
>>> I.shape
(551,)
>>> Iq.shape
(1, 551, 3)
>>> U.shape
(1, 551, 3)

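As you can see, I've vectorized the computation along the 2nd axis of B, but I can't seem to do the same for the 1st axis of B, or for C and Y, which still require for loops. When I try the same form of vectorization for the 1st axis of B (define a function, then pass arrays in as the arguments), I get a broadcasting error, because it appears to try to compute both axes simultaneously rather than the 1st and then the 2nd. That is why I had to force it into a for loop. The same problem occurs for both C and Y, which is why they are in for loops as well. In case that's confusing, essentially what I tried was: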

>>> B[:, :] = B1(N, I)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sasrec_v6.py", line 155, in B1
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)
ValueError: operands could not be broadcast together with shapes (18) (551) 

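Vectorizing the 2nd axis of B made my code substantially faster, so I assume the same would apply to further vectorization (I hope I'm using that term correctly, by the way).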

1 answer:

Answer 0 (score: 2)

You can use broadcasting to build 2d arrays from your 1d index vectors. I hadn't tested these when I first wrote them, but they should work:

If you reshape N into a column vector, B1 will return a 2d array:

B[N] = B1(N[:, None], I)
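
To see why the reshape helps, here is a minimal sketch with stand-in data. The values of `qi` and `dmaxi` are made up, and `N` is assumed to be `np.arange(1, 19)`; none of these come from the question. A column of shape (18, 1) broadcasts against a row of shape (551,) to give an (18, 551) result, whereas (18,) against (551,) raises the error shown in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
qi = rng.random(551)          # stand-in data, not the asker's values
dmaxi = 1.0                   # assumed value, for illustration only
N = np.arange(1, 19)          # assumed: n runs over 1..18
I = np.arange(551)

def B1(n, i):
    # Same formula as in the question, written with a division.
    return (np.pi * n * dmaxi * (-1.0) ** (n + 1) * np.sin(qi[i] * dmaxi)
            / ((np.pi * n) ** 2 - (qi[i] * dmaxi) ** 2))

# Loop version: one row of B per value of n.
B_loop = np.array([B1(n, I) for n in N])

# Broadcast version: N[:, None] has shape (18, 1), qi[I] has shape (551,),
# so every arithmetic step produces an (18, 551) array.
B_vec = B1(N[:, None], I)

print(B_vec.shape)
print(np.allclose(B_loop, B_vec))
```
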

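For Y and C, I would use np.einsum to have better control over which axes get multiplied (this can probably be done with np.dot as well, but I'm not sure how).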

C[M[:, None], N] = np.einsum('ij,kj->ik',
        B[M]/np.square(qi*Iq[0, :, 2]),
        B[N])

Y[M] = np.einsum('i, ki->k',
        U[0, :, 1]/np.square(qi*Iq[0, :, 2]),
        B[M])
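
For what it's worth, the two einsum calls above look like ordinary matrix products, so a np.dot version seems possible. The sketch below assumes M and N both cover every row of B exactly once (for example `np.arange(18)`), so that `B[M]` and `B[N]` are just `B`. All the data is random stand-in data, and `w` is a name introduced here for the shared weight vector.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.random((18, 551))             # stand-in for the B matrix
qi = rng.random(551) + 0.1            # stand-in data, kept away from zero
Iq = rng.random((1, 551, 3)) + 0.1
U = rng.random((1, 551, 3))

# Per-column weights shared by C and Y, shape (551,).
w = 1.0 / np.square(qi * Iq[0, :, 2])

# einsum forms, as in the answer (with M and N covering every row).
C_einsum = np.einsum('ij,kj->ik', B * w, B)
Y_einsum = np.einsum('i,ki->k', U[0, :, 1] * w, B)

# Equivalent matrix products: C = (B * w) B^T and Y = B (w * u).
C_dot = np.dot(B * w, B.T)
Y_dot = np.dot(B, U[0, :, 1] * w)

print(np.allclose(C_einsum, C_dot), np.allclose(Y_einsum, Y_dot))
```

If M or N pick out only some rows, the same products should work on `B[M]` and `B[N]` before assigning the result back through the index arrays.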

To see what the indexing trick does:

In [1]: a = np.arange(3)

In [2]: a
Out[2]: array([0, 1, 2])

In [3]: a[:, None]
Out[3]: 
array([[0],
       [1],
       [2]])

In [4]: b = np.arange(4,1,-1)

In [5]: b
Out[5]: array([4, 3, 2])

In [6]: a[:, None] * b
Out[6]: 
array([[0, 0, 0],
       [4, 3, 2],
       [8, 6, 4]])
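
The same trick is what makes an assignment like `C[M[:, None], N] = ...` work: the two index arrays broadcast to a grid of (row, column) pairs. Here is a small sketch with hypothetical index arrays and a hypothetical `block` of values, none of which come from the question. `np.ix_` is shown as an alternative spelling of the same indexing.

```python
import numpy as np

M = np.array([0, 2])                    # hypothetical row indices
N = np.array([1, 3, 4])                 # hypothetical column indices
block = np.arange(6).reshape(2, 3) + 1  # values 1..6 to scatter into C

C = np.zeros((3, 5), dtype=int)
# M[:, None] has shape (2, 1) and N has shape (3,); together they broadcast
# to a (2, 3) grid, so block[a, b] is written to C[M[a], N[b]].
C[M[:, None], N] = block

C2 = np.zeros((3, 5), dtype=int)
C2[np.ix_(M, N)] = block                # np.ix_ builds the same open mesh

print(C)
print(np.array_equal(C, C2))
```
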

It saves about an order of magnitude in time (roughly 12x in the timings below):

In [92]: %%timeit
   ....: B = np.zeros((18, 551))
   ....: C = np.zeros((18, 18))
   ....: Y = np.zeros((18))
   ....: for n in N:
   ....:     B[n, :] = B1(n, I)
   ....: for m in M:
   ....:     for n in N:
   ....:         C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])
   ....:     Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])
   ....: 
100 loops, best of 3: 15.8 ms per loop

In [93]: %%timeit
   ....: Bv = np.zeros((18, 551))
   ....: Cv = np.zeros((18, 18))
   ....: Yv = np.zeros((18))
   ....: Bv[N] = B1(N[:, None], I)
   ....: Cv[M[:, None], N] = np.einsum('ij,kj->ik', B[M]/np.square(qi*Iq[0, :, 2]), B[N])
   ....: Yv[M] = np.einsum('i, ki->k', U[0, :, 1]/np.square(qi*Iq[0, :, 2]), B[M])
   ....: 
1000 loops, best of 3: 1.34 ms per loop

Here is my test:

import numpy as np

# make fake data:
np.random.seed(5)

qi = np.random.rand(551)
N = np.random.randint(0,18,18)#np.arange(18)
M = np.random.randint(0,18,18)#np.arange(18)
I = np.arange(551)
Iq = np.random.rand(1, 551, 3)
U = np.random.rand(1, 551, 3)

B = np.zeros((18, 551))
C = np.zeros((18, 18))
Y = np.zeros((18))
Bv = np.zeros((18, 551))
Cv = np.zeros((18, 18))
Yv = np.zeros((18))

dmaxi = 1.

def B1(n, i):
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)

for n in N:
    B[n, :] = B1(n, I)

for m in M:
    for n in N:
        C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])
    Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])

Bv[N] = B1(N[:, None], I)
print("B correct?", np.allclose(Bv, B))

# np.einsum test case:
n, m = 2, 3
a = np.arange(n*m).reshape(n,m)*8 + 2
b = np.arange(n*m)[::-1].reshape(n,m)
c = np.empty((n,n))
for i in range(n):
    for j in range(n):
        c[i,j] = np.dot(a[i],b[j])
cv = np.einsum('ij,kj->ik', a, b)
print("einsum test successful?", np.allclose(c,cv))

Cv[M[:, None], N] = np.einsum('ij,kj->ik',
        B[M]/np.square(qi*Iq[0, :, 2]),
        B[N])
print("C correct?", np.allclose(Cv, C))

Yv[M] = np.einsum('i, ki->k',
        U[0, :, 1]/np.square(qi*Iq[0, :, 2]),
        B[M])
print("Y correct?", np.allclose(Yv, Y))

Output:

B correct? True
einsum test successful? True
C correct? True
Y correct? True