I have two pieces of code that compute the same function. One is based on Python for loops, as shown below:
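```python
import time

import numba as nb
import numpy as np


@nb.autojit()  # autojit is from older Numba releases; newer releases provide nb.jit instead
def ODEfunction(r):
    tic = time.process_time()
    NP = 10   # number of particles
    s = 1
    l = 100   # boundaries sit at -l/2 and +l/2
    f = np.zeros(len(r))
    # boundary forces acting on the first and the last particle
    lbound = -4 * (12 * s**12 / (-0.5 * l - r[0])**13 - 6 * s**6 / (-0.5 * l - r[0])**7)
    rbound = -4 * (12 * s**12 / (0.5 * l - r[NP - 1])**13 - 6 * s**6 / (0.5 * l - r[NP - 1])**7)
    # first half of f is copied from the second half of r
    f[0:NP] = r[NP:2 * NP]
    # second half of f: sum of the pairwise forces on each particle i
    for i in range(NP):
        fi = 0.0
        for j in range(NP):
            if j != i:
                fij = -4 * (12 * s**12 / (r[j] - r[i])**13 - 6 * s**6 / (r[j] - r[i])**7)
                fi = fi + fij
        f[i + NP] = fi
    f[NP] = f[NP] + lbound
    f[2 * NP - 1] = f[2 * NP - 1] + rbound
    toc = time.process_time()
    print(toc - tic)
    return f
```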
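Written out as an equation, the loop fills `f` as follows. The helper $F$ is notation introduced here, not something in the original code; it is the derivative of a Lennard-Jones-type potential $V(d) = 4\left(s^{12}/d^{12} - s^{6}/d^{6}\right)$, i.e. $F(d) = V'(d)$. For $NP = 10$ each call evaluates only $NP(NP-1) = 90$ pair terms.

```latex
f_i = r_{NP+i}, \qquad 0 \le i < NP

f_{NP+i} = \sum_{j \ne i} F(r_j - r_i)
         + \delta_{i,0}\, F\!\left(-\tfrac{l}{2} - r_0\right)
         + \delta_{i,NP-1}\, F\!\left(\tfrac{l}{2} - r_{NP-1}\right)

F(d) = -4\left(\frac{12\, s^{12}}{d^{13}} - \frac{6\, s^{6}}{d^{7}}\right)
```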
The other is an equivalent vectorized version of the same code:
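```python
import time

import numba as nb
import numpy as np


@nb.autojit()  # same (older Numba) decorator as the loop version
def ODEfunction(r):
    tic = time.process_time()
    NP = 10
    s = 1
    l = 100
    f = np.zeros(len(r))
    lbound = -4 * (12 * s**12 / (-0.5 * l - r[0])**13 - 6 * s**6 / (-0.5 * l - r[0])**7)
    rbound = -4 * (12 * s**12 / (0.5 * l - r[NP - 1])**13 - 6 * s**6 / (0.5 * l - r[NP - 1])**7)
    f[0:NP] = r[NP:2 * NP]
    ri = r[0:NP]
    rj = r[0:NP]
    # rij[j, i] = r[j] - r[i] for every pair of particles
    rij = np.subtract.outer(rj, ri)
    fij = -4 * (12 * s**12 / rij**13 - 6 * s**6 / rij**7)
    # diagonal (i == j) entries divide by zero; overwrite those self-interaction terms with 0
    fij[np.diag_indices(NP)] = 0
    # summing over j (axis 0) gives the total pairwise force on each particle i
    f[NP:2 * NP] = fij.sum(axis=0)
    f[NP] = f[NP] + lbound
    f[2 * NP - 1] = f[2 * NP - 1] + rbound
    toc = time.process_time()
    print(toc - tic)
    return f
```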
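To check that the two versions really compute the same thing, here is a minimal pure-NumPy sketch (no Numba, no timing inside the functions) that rewrites both bodies as plain functions and compares their outputs. The names `pair_force`, `ode_loop` and `ode_vec`, the use of broadcasting (`x[:, None] - x[None, :]`) in place of `np.subtract.outer`, and the test positions are all assumptions made for illustration.

```python
import numpy as np

NP, S, L = 10, 1.0, 100.0  # same constants as in the question (NP, s, l)


def pair_force(d):
    # force term shared by both versions, for a separation d = r[j] - r[i]
    return -4 * (12 * S**12 / d**13 - 6 * S**6 / d**7)


def ode_loop(r):
    f = np.zeros(len(r))
    f[0:NP] = r[NP:2 * NP]
    for i in range(NP):
        fi = 0.0
        for j in range(NP):
            if j != i:
                fi += pair_force(r[j] - r[i])
        f[i + NP] = fi
    f[NP] += pair_force(-0.5 * L - r[0])
    f[2 * NP - 1] += pair_force(0.5 * L - r[NP - 1])
    return f


def ode_vec(r):
    f = np.zeros(len(r))
    f[0:NP] = r[NP:2 * NP]
    x = r[0:NP]
    rij = x[:, None] - x[None, :]  # rij[j, i] = x[j] - x[i], same as np.subtract.outer(x, x)
    np.fill_diagonal(rij, np.inf)  # self-pairs now give pair_force(inf) == 0 instead of nan
    f[NP:2 * NP] = pair_force(rij).sum(axis=0)
    f[NP] += pair_force(-0.5 * L - r[0])
    f[2 * NP - 1] += pair_force(0.5 * L - r[NP - 1])
    return f


# 20-element state: NP positions spread across the box, then NP zeros
r = np.concatenate([np.linspace(-40.0, 40.0, NP), np.zeros(NP)])
print(np.allclose(ode_loop(r), ode_vec(r)))  # prints True
```

Setting the diagonal of `rij` to infinity before dividing is just one way to avoid the `inf - inf` on the diagonal; zeroing the result afterwards, as in the question, gives the same sums.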
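In both cases the input r is a 1 x 20 NumPy array. As you can see, I'm using Numba to speed up the code. Surprisingly, in this case the vectorized code is 5 times slower than the for-loop version. I have seen similar issues in some earlier posts: Numpy: Single loop vectorized code slow compared to two loop iteration
However, the problem there was the size of the arrays. As you can see in my example, no large arrays are involved. Does anyone know the reason, and how to fix it?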
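One thing worth ruling out before comparing the two versions is the measurement itself: `time.process_time()` inside the function times a single call that only touches 20 numbers. The sketch below times many calls from outside the function with `timeit` and keeps the best run. The helper names (`pair_force`, `forces_loop`, `forces_vec`, `best_time_per_call`) and the test positions are hypothetical, and the functions are plain NumPy, so the numbers it prints are not the question's Numba timings. To time the Numba versions the same way, call each jitted function once before timing, since Numba compiles on the first call.

```python
import timeit

import numpy as np

NP = 10


def pair_force(d, s=1.0):
    return -4 * (12 * s**12 / d**13 - 6 * s**6 / d**7)


def forces_loop(x):
    # double loop over particle pairs, as in the first version
    out = np.zeros(NP)
    for i in range(NP):
        for j in range(NP):
            if j != i:
                out[i] += pair_force(x[j] - x[i])
    return out


def forces_vec(x):
    # all pairs at once, as in the second version
    rij = np.subtract.outer(x, x)
    np.fill_diagonal(rij, np.inf)  # self-pairs contribute 0
    return pair_force(rij).sum(axis=0)


def best_time_per_call(func, arg, number=200, repeat=5):
    # run func(arg) `number` times per run, keep the fastest of `repeat` runs
    times = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(times) / number


x = np.linspace(-40.0, 40.0, NP)
for name, func in [("loop", forces_loop), ("vectorized", forces_vec)]:
    print(f"{name}: {best_time_per_call(func, x) * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces the influence of other processes; dividing by `number` turns the run time into a per-call figure that can be compared directly between the two versions.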