Question

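I need help vectorizing this code. Right now, with N = 100, it takes a minute or so to run, and I would like to speed it up. I have vectorized double loops like this before, but never a triple (3D) loop, and I am stuck.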

import numpy as np
N = 100
n = 12
r = np.sqrt(2)

x = np.arange(-N,N+1)
y = np.arange(-N,N+1)
z = np.arange(-N,N+1)

C = 0

for i in x:
    for j in y:
        for k in z:
            if (i+j+k)%2==0 and (i*i+j*j+k*k!=0):
                p = np.sqrt(i*i+j*j+k*k)
                p = p/r
                q = (1/p)**n
                C += q

print('\n')
print(C)

Answer 1

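Thanks to @Bill, I was able to get this working. It is very fast now. It could probably be done better, especially the two masks I use to get rid of the two conditions from my original for loops.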

    from __future__ import division
    import numpy as np

    N = 100
    n = 12
    r = np.sqrt(2)

    x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)

    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z)!=0)
    x = x[ind]
    y = y[ind]
    z = z[ind]

    p=np.sqrt(x*x+y*y+z*z)/r

    ans = (1/p)**n
    ans = np.sum(ans)
    print('ans')
    print(ans)
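The remark about the two masks can be sketched as follows: both conditions are combined into one boolean mask, so the coordinate arrays are filtered only once. This is a minimal sketch, not a benchmarked improvement; the function name `lattice_sum_single_mask` is hypothetical and not part of the original answer.

```python
import numpy as np

def lattice_sum_single_mask(N, n=12):
    """Sum (r/|v|)**n over grid points with even coordinate sum, origin excluded."""
    r = np.sqrt(2)
    a = np.arange(-N, N + 1)
    x, y, z = np.meshgrid(a, a, a)
    sq = x * x + y * y + z * z                 # squared lengths (integers)
    # One combined mask replaces the two successive np.where passes.
    mask = ((x + y + z) % 2 == 0) & (sq != 0)
    return np.sum((r / np.sqrt(sq[mask])) ** n)

print(lattice_sum_single_mask(3))
```

Because the origin is masked out before the division, no divide-by-zero handling is needed in this variant.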

Answer 2

The meshgrid / where / indexing solution is already very fast. I improved on it by about 65%. That is not much, but I will explain it anyway, step by step:

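For me, it was easiest to approach this problem with all 3D vectors of the grid stored as columns in one large 2D array of shape 3 x M. meshgrid is the right tool for creating all combinations (note that a 3D meshgrid requires numpy version >= 1.7), and vstack + reshape bring the data into the desired form. For example: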

>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1],
       [0, 1, 0, 1, 0, 1, 0, 1]])

Each column is one 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and edge length 1).
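To make the layout concrete for the symmetric grid used in the question, here is a small sketch that builds the same kind of 3 x M array for coordinates from -N to N. The helper name `build_grid_vectors` is hypothetical and only used for illustration.

```python
import numpy as np

def build_grid_vectors(N):
    """Return a 3 x M array whose columns are all grid points in [-N, N]^3."""
    a = np.arange(-N, N + 1)
    # meshgrid returns three (2N+1)^3 arrays; vstack + reshape flattens
    # each of them into one row of the result.
    return np.vstack(np.meshgrid(a, a, a)).reshape(3, -1)

vectors = build_grid_vectors(1)
print(vectors.shape)  # (3, 27)
```

For N = 100 the same call gives M = 201**3 columns, so the array is large but still a single contiguous block that numpy can process without Python-level loops.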

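Let us call this array vectors (it contains all 3D vectors representing the points of the grid). Then prepare a bool mask that selects the vectors fulfilling your mod 2 criterion: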

    mod2bool = np.sum(vectors, axis=0) % 2 == 0

np.sum(vectors, axis=0) creates a 1 x M array containing the element sum of each column vector. Hence, mod2bool is a 1 x M array with one bool value per column vector. Now use this bool mask:

    vectorsubset = vectors[:,mod2bool]

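This selects all rows (:) and uses boolean indexing to filter the columns; both are fast operations in numpy. Calculate the lengths of the remaining vectors using the native numpy approach: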

    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))

This is quite fast. However, scipy.stats.ss and bottleneck.ss can perform the sum-of-squares calculation even faster than this.
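If neither `ss` function is available in your installed versions (an assumption about your environment), `np.einsum` is one numpy-only way to compute the per-column sum of squares without creating the intermediate `vectorsubset**2` array. This is a sketch with a hypothetical helper name, `column_lengths`; whether it is faster on your system would need to be measured.

```python
import numpy as np

def column_lengths(vectors):
    """Euclidean length of each column of a 3 x M array."""
    # 'ij,ij->j' multiplies the array with itself elementwise and
    # sums over the row index i, i.e. a sum of squares per column.
    sq = np.einsum('ij,ij->j', vectors, vectors)
    return np.sqrt(sq)

demo = np.array([[3, 0],
                 [4, 0],
                 [0, 2]])
print(column_lengths(demo))  # lengths 5 and 2
```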

Transform the lengths according to your formula:

    with np.errstate(divide='ignore'):
        p = (r/lengths)**n

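This involves dividing a finite number by zero (for the origin), which results in Inf values in the output array. This is entirely fine. We use numpy's errstate context manager to make sure that these divisions by zero do not raise an exception or a runtime warning.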

Now sum up the finite elements (ignoring the Infs) and return the sum:

    return  np.sum(p[np.isfinite(p)])
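The divide-then-filter pattern from the last two steps can be tried in isolation on a tiny array. This is a minimal sketch with made-up input lengths; the first entry stands in for the origin.

```python
import numpy as np

r = np.sqrt(2)
n = 12
# Hypothetical lengths: the origin (0), a nearest neighbour, and a point at distance 2.
lengths = np.array([0.0, np.sqrt(2), 2.0])

with np.errstate(divide='ignore'):
    p = (r / lengths) ** n        # r/0 gives inf, and inf**n stays inf

# Boolean indexing with isfinite drops the inf before summing.
total = np.sum(p[np.isfinite(p)])
print(p)
print(total)
```

The design choice here is to let the division produce Inf and filter afterwards, instead of removing the zero-length vector up front with a second indexing pass.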

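I have implemented this method twice below: once exactly as explained, and once using bottleneck's ss and nansum functions. For comparison, I have also added your method, as well as a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing and instead creates Infs and finally sums up the isfinite way.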

import sys
import numpy as np
import bottleneck as bn

N = 100
n = 12
r = np.sqrt(2)


x,y,z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x,y,z)).reshape(3, -1)


def measure_time(func):
    import time
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func


@measure_time
def method_columnvecs(vectors):
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:,mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  np.sum(p[np.isfinite(p)])


@measure_time
def method_columnvecs_opt(vectors):
    # On my system, bn.nansum is even slightly faster than np.sum.
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
    # Use ss from bottleneck or scipy.stats (axis=0 is default).
    lengths = np.sqrt(bn.ss(vectors[:,mod2bool]))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  bn.nansum(p[np.isfinite(p)])


@measure_time
def method_original(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z)!=0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    p=np.sqrt(x*x+y*y+z*z)/r
    return np.sum((1/p)**n)


@measure_time
def method_original_finitesum(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    lengths = np.sqrt(x*x+y*y+z*z)
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  np.sum(p[np.isfinite(p)])


print(method_columnvecs(gridvectors))
print(method_columnvecs_opt(gridvectors))
print(method_original(x,y,z))
print(method_original_finitesum(x,y,z))

This is the output:

$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965

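All methods yield the same result. Your method becomes a bit faster when doing the isfinite-style sum. My method is faster still, but I would say this is more of an academic exercise than an important improvement :-)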

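I have one question left: you said that for N = 3 the calculation should produce 12. Even your own method does not do that. For N = 3, all of the methods above produce 12.1317530867. Is this expected?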
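One way to look into the N = 3 question is to group the selected points by squared distance and print each group's contribution. The sketch below uses a hypothetical helper, `shell_contributions`, and assumes n is even so that (r/|v|)**n can be written as (2/|v|**2)**(n/2). For N = 1 the only selected points are the 12 permutations of (±1, ±1, 0), each at distance sqrt(2) and each contributing exactly 1, so the sum is 12; a larger cube only adds further positive terms.

```python
import numpy as np

def shell_contributions(N, n=12):
    """Return (squared distances, point counts, summed contribution per distance)."""
    a = np.arange(-N, N + 1)
    x, y, z = np.meshgrid(a, a, a)
    sq = x * x + y * y + z * z
    mask = ((x + y + z) % 2 == 0) & (sq != 0)
    # Group the selected points by their squared distance from the origin.
    sq_vals, counts = np.unique(sq[mask], return_counts=True)
    # With r = sqrt(2): (r/|v|)**n == (2/|v|**2)**(n/2), assuming even n.
    terms = counts * (2.0 / sq_vals) ** (n // 2)
    return sq_vals, counts, terms

for N in (1, 3):
    sq_vals, counts, terms = shell_contributions(N)
    print(N, terms.sum())
    for s, c, t in zip(sq_vals, counts, terms):
        print("  |v|^2 = %d: %d points, contribution %.10f" % (s, c, t))
```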

Vectorizing a 3D distance calculation
