Avoiding NumPy loops when computing intersection points

Asked: 2015-12-14 16:16:49

Tags: python arrays performance numpy vectorization

I want to speed up the following computation, which handles r rays and n spheres. This is what I have so far:

# shape of mu1 and mu2 is (r, n)
# shape of rays is (r, 3)
# note that intersections has 2n columns because for every sphere one can
# get up to two intersections (secant, tangent, no intersection)
intersections = np.empty((r, 2*n, 3))
for col in range(n):
    intersections[:, col, :] = rays * mu1[:, col][:, np.newaxis]
    intersections[:, col + n, :] = rays * mu2[:, col][:, np.newaxis]

# [...]

# calculate euclidean distance from the center of gravity (0,0,0)
distances = np.empty((r, 2 * n))
for col in range(n):
    distances[:, col] = np.linalg.norm(intersections[:, col], axis=1)
    distances[:, col + n] = np.linalg.norm(intersections[:, col + n], axis=1)

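I tried to speed this up by avoiding the for loops, but I could not figure out how to broadcast the arrays correctly so that only a single function call is needed. Any help is much appreciated.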

1 answer:

Answer 0 (score: 2)

Here is a vectorized approach using broadcasting:

intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
distances = np.sqrt((intersections**2).sum(2))
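
To see why the broadcast works, it can help to trace the shapes on a small example. The sketch below uses made-up sizes (r = 4, n = 3) and random data purely for illustration; the names `scaled`, `dirs`, `loop_result`, and `same` are not from the original post.

```python
import numpy as np

r, n = 4, 3                      # illustrative sizes, not from the question
rng = np.random.default_rng(0)
mu1 = rng.random((r, n))
mu2 = rng.random((r, n))
rays = rng.random((r, 3))

mu = np.hstack((mu1, mu2))       # (r, 2n): mu1 columns first, then mu2 columns
scaled = mu[..., None]           # (r, 2n, 1)
dirs = rays[:, None, :]          # (r, 1, 3)
intersections = scaled * dirs    # broadcasts to (r, 2n, 3)

# Loop version from the question, for comparison:
# column j holds mu1[:, j] and column j + n holds mu2[:, j].
loop_result = np.empty((r, 2 * n, 3))
for col in range(n):
    loop_result[:, col, :] = rays * mu1[:, col][:, np.newaxis]
    loop_result[:, col + n, :] = rays * mu2[:, col][:, np.newaxis]

same = np.allclose(intersections, loop_result)
```

The length-1 axes are stretched against each other, so every ray is multiplied by each of its 2n scale factors without an explicit loop.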

The last step can be replaced with np.einsum, like so:

distances = np.sqrt(np.einsum('ijk,ijk->ij',intersections,intersections))
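
As a quick sanity check, the 'ijk,ijk->ij' subscripts multiply the array with itself elementwise and sum over the last axis k, which is the sum of squares per intersection. The sketch below, on an illustrative random array, compares it with the explicit sum and with np.linalg.norm along axis=2 (an alternative that is not part of the original answer).

```python
import numpy as np

rng = np.random.default_rng(0)
intersections = rng.random((5, 4, 3))   # illustrative (r, 2n, 3) array

# Explicit sum of squares over the coordinate axis
d_sum = np.sqrt((intersections**2).sum(2))

# einsum: elementwise product, summed over k
d_einsum = np.sqrt(np.einsum('ijk,ijk->ij', intersections, intersections))

# Vector 2-norm along the last axis
d_norm = np.linalg.norm(intersections, axis=2)
```

All three give an (r, 2n) array of distances; which one is fastest may depend on the array sizes and the NumPy version.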

Another vectorized option replaces almost the entire computation with np.einsum, like so:

mu = np.hstack((mu1,mu2))
distances = np.sqrt(np.einsum('ij,ij,ik,ik->ij',mu,mu,rays,rays))
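
This version never builds the (r, 2n, 3) intersections array. It relies on the fact that scaling a vector by a scalar scales its length by the scalar's absolute value, so each distance equals |mu[i, j]| times the length of ray i. The sketch below checks that identity on illustrative random data; `d_factored` and `ray_len` are names introduced here, and this factored form is a variation rather than something from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 5, 3                              # illustrative sizes
mu = rng.standard_normal((r, 2 * n))     # may contain negative values
rays = rng.standard_normal((r, 3))

# einsum form: sum over k of mu[i, j]**2 * rays[i, k]**2
d_einsum = np.sqrt(np.einsum('ij,ij,ik,ik->ij', mu, mu, rays, rays))

# Factored form: |mu[i, j]| * ||rays[i]||
ray_len = np.linalg.norm(rays, axis=1)   # shape (r,)
d_factored = np.abs(mu) * ray_len[:, None]
```

If only the distances are needed and the intersection points themselves are not, either form avoids the large intermediate array.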

Runtime tests and output verification:

def original_app(mu1,mu2,rays):
    r, n = mu1.shape  # take the sizes from the input instead of relying on globals
    intersections = np.empty((r, 2*n, 3))
    for col in range(n):
        intersections[:, col, :] = rays * mu1[:, col][:, np.newaxis]
        intersections[:, col + n, :] = rays * mu2[:, col][:, np.newaxis]

    distances = np.empty((r, 2 * n))
    for col in range(n):
        distances[:, col] = np.linalg.norm(intersections[:, col], axis=1)
        distances[:, col + n] = np.linalg.norm(intersections[:, col + n], axis=1)
    return distances

def vectorized_app1(mu1,mu2,rays):
    intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
    return np.sqrt((intersections**2).sum(2))

def vectorized_app2(mu1,mu2,rays):
    intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
    return np.sqrt(np.einsum('ijk,ijk->ij',intersections,intersections))

def vectorized_app3(mu1,mu2,rays):
    mu = np.hstack((mu1,mu2))
    return np.sqrt(np.einsum('ij,ij,ik,ik->ij',mu,mu,rays,rays))

Timings:

In [101]: # Inputs
     ...: r = 1000
     ...: n = 1000
     ...: mu1 = np.random.rand(r, n)
     ...: mu2 = np.random.rand(r, n)
     ...: rays = np.random.rand(r, 3)


In [102]: np.allclose(original_app(mu1,mu2,rays),vectorized_app1(mu1,mu2,rays))
Out[102]: True

In [103]: np.allclose(original_app(mu1,mu2,rays),vectorized_app2(mu1,mu2,rays))
Out[103]: True

In [104]: np.allclose(original_app(mu1,mu2,rays),vectorized_app3(mu1,mu2,rays))
Out[104]: True

In [105]: %timeit original_app(mu1,mu2,rays)
     ...: %timeit vectorized_app1(mu1,mu2,rays)
     ...: %timeit vectorized_app2(mu1,mu2,rays)
     ...: %timeit vectorized_app3(mu1,mu2,rays)
     ...: 
1 loops, best of 3: 306 ms per loop
1 loops, best of 3: 215 ms per loop
10 loops, best of 3: 140 ms per loop
10 loops, best of 3: 136 ms per loop