Question

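I'm trying to optimize a snippet that gets called a lot (millions of times), so any kind of speed improvement (hopefully removing the for loop) would be great.

I'm computing the correlation function of some j'th particle with all the other particles:

C_j(|r - r'|) = sqrt(E((s_j(r') - s_k(r))^2)), averaged over k.

My idea is to have a variable corrfun that sorts the data into bins (r, defined elsewhere). I find which bin each s_k belongs to, and this is stored in ind. So ind[0] is the index of r (and thus of corrfun) that the j = 0 point corresponds to. Multiple points can fall into the same bin (in fact I want the bins to be big enough to contain several points), so I add together all of the (s_j(r') - s_k(r))^2 terms and then divide by the number of points in that bin (stored in the variable rw). The code I ended up with is the following (np stands for numpy):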

for k, v in enumerate(ind):
    if j==k:
        continue
    corrfun[v] += (s[k]-s[j])**2
    rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))

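Note that the rw2 business is there because I want to avoid divide-by-zero problems, but I do return the rw array and I want to be able to tell rw = 0 elements apart from rw = 1 elements. Perhaps there is a more elegant solution for this as well.

Is there a way to make the for loop faster? I would prefer not to include the self-interaction (j == k), but I'm even OK with including it if that means a significantly faster calculation (the length of ind is ~1E6, so the self-interaction probably doesn't matter anyway).

Thank you!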
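One possibly tidier option, sketched here with made-up bin values rather than real data: `np.maximum(rw, 1)` builds a separate divisor, so `rw` itself keeps its zeros. Note that a plain assignment such as `rw2 = rw` only creates a second name for the same array, so writing through `rw2` would also overwrite the zeros in `rw`.

```python
import numpy as np

# Illustrative values only (not from the real data set).
rw = np.array([0.0, 1.0, 3.0])          # number of pairs that fell into each bin
corrfun = np.array([0.0, 0.25, 1.5])    # summed squared differences per bin

# np.maximum allocates a new array, so rw still records which bins are empty.
safe_counts = np.maximum(rw, 1)
result = np.sqrt(corrfun / safe_counts)

# By contrast, plain assignment does not copy: both names share one buffer.
alias = rw
shares_memory = np.shares_memory(alias, rw)
```

With this approach the returned `rw` still distinguishes empty bins (0) from bins holding a single pair (1).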

伊利亚安德

EDIT:

Here is the full code. Note that in the full code I am also averaging over j.

import numpy as np

def twopointcorr(x,y,s,dr):

    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)

    n = len(x)

    maxR = np.sqrt((width/2)**2 + (height/2)**2)

    r = np.arange(0, maxR, dr)
    print(r)
    corrfun = r*0
    rw = r*0
    print(maxR)
    ''' go through all points'''
    for j in range(0, n-1):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]

        for k, v in enumerate(ind):
            if j==k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1

    rw2 = rw
    rw2[rw < 1] = 1
    corrfun = np.sqrt(np.divide(corrfun, rw2))
    return r, corrfun, rw

I test it in the following way:

from twopointcorr import twopointcorr
import numpy as np
import matplotlib.pyplot as plt
import time

n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)

print('running two point corr function')

start_time = time.time()
r,corrfun,rw = twopointcorr(x,y,s,0.1)
print("--- Execution time is %s seconds ---" % (time.time() - start_time))

fig1=plt.figure()
plt.plot(r, corrfun,'-x')

fig2=plt.figure()
plt.plot(r, rw,'-x')
plt.show()

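Again, the main issue is that in the real data set n ~ 1E6. I can resample to make it smaller, of course, but I would love to actually run through the whole data set.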

Answer 1

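Your original code took about 5.7 seconds on my system. I fully vectorized the inner loop and got it to run in 0.39 seconds. Simply replace the "go through all points" loop with this: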

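    # needs `import scipy.spatial.distance` at module level
    points = np.column_stack((x,y))
    hypots = scipy.spatial.distance.cdist(points, points)
    # nearest-bin index; clip to the last bin so indices stay within r
    inds = np.rint(hypots / dr).clip(max=len(r) - 1).astype(int)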

    # go through all points            
    for j in range(n): # n.b. previously n-1, not sure why
        ind = inds[j]

        np.add.at(corrfun, ind, (s - s[j])**2)

        np.add.at(rw, ind, 1)
        rw[ind[j]] -= 1 # subtract self

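The first observation was that your hypot code was computing 2D distances, so I replaced it with cdist from SciPy to do it all in a single call. The second was that the inner for loop was slow, and thanks to an insightful comment from @hpaulj, I vectorized that as well using np.add.at().

Since you also asked how to vectorize the outer loop, I did that afterward. It now takes 0.25 seconds to run, for a total speedup of over 20x. Here's the final code: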
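To illustrate why np.add.at() is needed here rather than a plain fancy-indexed `+=`, here is a small sketch with made-up indices and values. When a bin index is repeated, the buffered `+=` does not accumulate every contribution, while np.add.at() does.

```python
import numpy as np

# Hypothetical toy data: bin 1 appears three times.
ind = np.array([0, 1, 1, 2, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

buffered = np.zeros(3)
buffered[ind] += vals              # buffered: repeated indices are not summed

unbuffered = np.zeros(3)
np.add.at(unbuffered, ind, vals)   # unbuffered: every contribution is added
```

Here `unbuffered` holds the true per-bin sums (1, 10, 4), whereas `buffered` loses part of what belongs in bin 1.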

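    # needs `import scipy.spatial.distance` at module level
    points = np.column_stack((x,y))
    hypots = scipy.spatial.distance.cdist(points, points)
    # nearest-bin index; clip to the last bin so indices stay within r
    inds = np.rint(hypots / dr).clip(max=len(r) - 1).astype(int)

    sn = np.tile(s, (n,1)) # n copies of s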
    diffs = (sn - sn.T)**2 # squares of pairwise differences
    np.add.at(corrfun, inds, diffs)

    rw = np.bincount(inds.flatten(), minlength=len(r))
    np.subtract.at(rw, inds.diagonal(), 1) # subtract self

This uses more memory, but does produce a substantial speedup over the single-loop version above.
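As a rough back-of-the-envelope check on that memory cost, the sketch below estimates the size of a single dense n-by-n array of 8-byte values. The helper name `pairwise_bytes` is made up for illustration, and the fully vectorized version holds several such arrays at once (hypots, inds, sn, diffs).

```python
def pairwise_bytes(n, itemsize=8):
    """Bytes needed for one dense n-by-n array of itemsize-byte elements."""
    return n * n * itemsize

test_case = pairwise_bytes(1000)      # 8_000_000 bytes, about 8 MB per array
real_data = pairwise_bytes(10**6)     # 8 * 10**12 bytes, about 8 TB per array
```

So the approach is comfortable for the n = 1000 test case but would not fit in memory for n ~ 1E6 without processing the points in smaller pieces.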

Answer 2

Here is code that uses broadcasting, hypot, round, and bincount to remove all the loops:

def twopointcorr2(x, y, s, dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)    
    osub = lambda x:np.subtract.outer(x, x)

    ind = np.clip(np.round(np.hypot(osub(x), osub(y)) / dr), 0, len(r)-1).astype(int)
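    rw = np.bincount(ind.ravel(), minlength=len(r))  # pair count per bin, padded to len(r)
    rw[0] -= len(x)  # remove the n self-pairs, which all land in bin 0
    corrfun = np.bincount(ind.ravel(), (osub(s)**2).ravel(), minlength=len(r))  # summed squared differences per bin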
    return r, corrfun, rw
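As a small illustration of the bincount step, the sketch below uses made-up indices and values to show that np.bincount with weights gives the same per-bin sums as accumulating with np.add.at(), and that minlength pads trailing empty bins.

```python
import numpy as np

# Hypothetical toy data: five values assigned to bins 0..2, bin 3 stays empty.
ind = np.array([0, 1, 1, 2, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
nbins = 4

sums = np.bincount(ind, weights=vals, minlength=nbins)   # per-bin sums
counts = np.bincount(ind, minlength=nbins)               # per-bin counts

# Reference result computed with unbuffered accumulation.
check = np.zeros(nbins)
np.add.at(check, ind, vals)
```

Without minlength, the result would only be as long as the largest index seen plus one, which can be shorter than the number of bins.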

For comparison, I modified your code as follows:

def twopointcorr(x,y,s,dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)

    n = len(x)

    maxR = np.sqrt((width/2)**2 + (height/2)**2)

    r = np.arange(0, maxR, dr)
    corrfun = r*0
    rw = r*0
    for j in range(0, n):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]
        for k, v in enumerate(ind):
            if j==k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1

    return r, corrfun, rw

Here is the code to check the results:

import numpy as np

n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)

r1, corrfun1, rw1 = twopointcorr(x,y,s,0.1)
r2, corrfun2, rw2 = twopointcorr2(x,y,s,0.1)

assert np.allclose(r1, r2)
assert np.allclose(corrfun1, corrfun2)
assert np.allclose(rw1, rw2)

And the %timeit comparison:

%timeit twopointcorr(x,y,s,0.1)
%timeit twopointcorr2(x,y,s,0.1)

Answer 3

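OK, so it turns out the outer products are very memory-expensive. However, using the answers from @HYRY and @JohnZwinck, I was able to write code that is still roughly linear in n in memory and computes quickly (0.5 seconds for the test case):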

import numpy as np

def twopointcorr(x,y,s,dr,maxR=-1):

    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)

    n = len(x)

    if maxR < dr:
        maxR = np.sqrt((width/2)**2 + (height/2)**2)

    r = np.arange(0, maxR+dr, dr)

    corrfun = r*0
    rw = r*0


    for j in range(0, n):

        ind = np.clip(np.round(np.hypot(x[j]-x,y[j]-y) / dr), 0, len(r)-1).astype(int)
        np.add.at(corrfun, ind, (s - s[j])**2)
        np.add.at(rw, ind, 1)

    rw[0] -= n

    corrfun = np.sqrt(np.divide(corrfun, np.maximum(rw,1)))
    r=np.delete(r,-1)
    rw=np.delete(rw,-1)
    corrfun=np.delete(corrfun,-1)
    return r, corrfun, rw

Vectorizing a loop with repeated indices in Python
