Question

I have two pieces of code that compute the same function. The first is based on Python for loops, as shown below:

import time
import numpy as np
import numba as nb

@nb.autojit()
def ODEfunction(r):
    tic = time.process_time()
    NP=10
    s=1
    l=100
    f = np.zeros(len(r))
    lbound=-4* (12*s**12/(-0.5*l-r[0])**13-6*s**6/(-0.5*l-r[0])**7)
    rbound=-4* (12*s**12/(0.5*l-r[NP-1])**13-6*s**6/(0.5*l-r[NP-1])**7)

    f[0:NP]=r[NP:2*NP]

    for i in range(NP):
        fi = 0.0
        for j in range(NP):
            if (j!=i):
                fij = -4*(12*s**12/(r[j]-r[i])**13-6*s**6/(r[j]-r[i]) ** 7)
                fi = fi + fij
        f[i+NP]=fi

    f[NP]=f[NP]+lbound
    f[2*NP-1]=f[2*NP-1]+rbound

    toc = time.process_time()
    print(toc-tic)
    return f

The second is the equivalent vectorized version of this code:

@nb.autojit()
def ODEfunction(r):
    tic=time.process_time()
    NP=10
    s=1
    l=100
    f = np.zeros(len(r))
    lbound=-4* (12*s**12/(-0.5*l-r[0])**13-6*s**6/(-0.5*l-r[0])**7)
    rbound=-4* (12*s**12/(0.5*l-r[NP-1])**13-6*s**6/(0.5*l-r[NP-1])**7)

    f[0:NP]=r[NP:2*NP]

    ri=r[0:NP]
    rj = r[0:NP]
    rij=np.subtract.outer(rj,ri)
    fij = -4 * (12 * s ** 12 / (rij) ** 13 - 6 * s ** 6 / (rij) ** 7)
    fij[np.diag_indices(NP)]=0
    f[NP:2*NP] = fij.sum(axis=0)

    f[NP]=f[NP]+lbound
    f[2*NP-1]=f[2*NP-1]+rbound

    toc=time.process_time()
    print(toc-tic)
    return f
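
For reference, here is a minimal sketch of what the vectorized part computes, checked against the double loop. The sample positions in `r` are hypothetical and not from the post. Unlike the code above, the sketch fills the diagonal of `rij` with a placeholder before dividing, so it avoids the division by zero on the diagonal; that is a choice made for this illustration only.

```python
import numpy as np

# Hypothetical sample positions (not from the original post): 4 distinct points.
r = np.array([-1.5, -0.5, 0.6, 1.7])
n = len(r)
s = 1.0

# rij[j, i] == r[j] - r[i], the same difference the inner loop forms.
rij = np.subtract.outer(r, r)

# Assumption for this sketch: use a nonzero placeholder on the diagonal
# so the i == j terms never divide by zero; they are zeroed afterwards.
np.fill_diagonal(rij, 1.0)
fij = -4 * (12 * s**12 / rij**13 - 6 * s**6 / rij**7)
np.fill_diagonal(fij, 0.0)

# Summing over axis 0 adds up all j for each fixed i.
f_vec = fij.sum(axis=0)

# Reference double loop, mirroring the first version of the function.
f_loop = np.zeros(n)
for i in range(n):
    for j in range(n):
        if j != i:
            d = r[j] - r[i]
            f_loop[i] += -4 * (12 * s**12 / d**13 - 6 * s**6 / d**7)

print(np.allclose(f_vec, f_loop))
```

Note that the vectorized form allocates several temporary `n x n` arrays on every call, whereas the loop only works on scalars.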

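In both cases the input r is a 1 x 20 NumPy array and, as you can see, I am using numba to speed up the code. Surprisingly, the vectorized code is about 5 times slower than the for-loop version in this case. I have seen a similar problem in an earlier post: Numpy: Single loop vectorized code slow compared to two loop iteration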

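However, the issue in that post was the size of the arrays. As you can see in my example, no large arrays are involved. Does anyone know the reason and how to fix it?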
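
To make the comparison easier to reproduce, the timing could be moved outside the function and averaged over many calls. The sketch below is one possible way to do that, not the setup used for the 5x figure above. It uses plain NumPy without numba so that it runs on its own; the names `forces_loop` and `forces_vec` and the evenly spaced positions in `x` are hypothetical, and only the pairwise part of the function is included.

```python
import timeit
import numpy as np

NP = 10  # same particle count as in the question

def forces_loop(x, s=1.0):
    """Pairwise force on each particle, double-loop form."""
    f = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(len(x)):
            if j != i:
                d = x[j] - x[i]
                f[i] += -4 * (12 * s**12 / d**13 - 6 * s**6 / d**7)
    return f

def forces_vec(x, s=1.0):
    """Same quantity using an outer difference."""
    d = np.subtract.outer(x, x)
    np.fill_diagonal(d, 1.0)      # placeholder, avoids dividing by zero
    fij = -4 * (12 * s**12 / d**13 - 6 * s**6 / d**7)
    np.fill_diagonal(fij, 0.0)    # drop the i == j terms
    return fij.sum(axis=0)

# Hypothetical input: evenly spaced positions (not from the original post).
x = np.linspace(-20.0, 20.0, NP)

# Warm-up calls; with a JIT-compiled function this would also keep the
# one-time compilation out of the measurement.
forces_loop(x)
forces_vec(x)

# Best of 5 runs, 200 calls each, reported per call.
t_loop = min(timeit.repeat(lambda: forces_loop(x), number=200, repeat=5)) / 200
t_vec = min(timeit.repeat(lambda: forces_vec(x), number=200, repeat=5)) / 200
print(f"loop: {t_loop:.2e} s per call, vectorized: {t_vec:.2e} s per call")
```

The numbers depend on the machine and on whether the functions are compiled. If the functions are decorated with numba, the same pattern applies, and the warm-up call matters more because the first call includes compilation.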

Why is a vectorized computation slower than the equivalent for loop?

0 answers: