Vectorizing a for loop with NumPy

Time: 2013-07-22 15:56:20

Tags: python numpy scipy vectorization cython

I'm fairly new to Python and I have a nested for loop. Since the loops take a while to run, I'm trying to find a way to vectorize this code so that it runs faster.

Here, coord is a 3-dimensional array where coord[x, 0, 0] and coord[x, 0, 1] are integers and coord[x, 0, 2] is 0 or 1. H is a SciPy sparse matrix, and x_dist, y_dist, z_dist, and a are all floats.

# x_dist, y_dist, and z_dist are floats
# coord is a num x 1 x 3 numpy array where num can go into the hundreds of thousands
num = coord.shape[0]    
H = sparse.lil_matrix((num, num))
for i in xrange(num):
    for j in xrange(num):
        if (np.absolute(coord[i, 0, 0] - coord[j, 0, 0]) <= 2 and
                (np.absolute(coord[i, 0, 1] - coord[j, 0, 1]) <= 1)):

            x = ((coord[i, 0, 0] * x_dist + coord[i, 0, 2] * z_dist) -
                 (coord[j, 0, 0] * x_dist + coord[j, 0, 2] * z_dist))

            y = (coord[i, 0, 1] * y_dist) - (coord[j, 0, 1] * y_dist)

            if a - 0.5 <= np.sqrt(x ** 2 + y ** 2) <= a + 0.5:
                H[i, j] = -2.7

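I've also read that broadcasting with NumPy, while much faster, can use a lot of memory for temporary arrays. Would it be better to go the vectorization route, or to try something like Cython?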

2 answers:

Answer 0 (score: 7)

This is how I would vectorize your code; some caveats are discussed afterwards:

import numpy as np
import scipy.sparse as sps

num = coord.shape[0]

# Coarse filter on all pairs at once; idx has shape (num, num)
idx = ((np.abs(coord[:, 0, 0] - coord[:, 0, 0, None]) <= 2) &
       (np.abs(coord[:, 0, 1] - coord[:, 0, 1, None]) <= 1))

rows, cols = np.nonzero(idx)
x = ((coord[rows, 0, 0]-coord[cols, 0, 0]) * x_dist +
     (coord[rows, 0, 2]-coord[cols, 0, 2]) * z_dist)
y = (coord[rows, 0, 1]-coord[cols, 0, 1]) * y_dist
r2 = x*x + y*y

# Compare squared distances to avoid the sqrt (assumes a >= 0.5)
idx = ((a - 0.5)**2 <= r2) & (r2 <= (a + 0.5)**2)

rows, cols = rows[idx], cols[idx]
data = np.repeat(-2.7, len(rows))  # same value the original loop stores

H = sps.coo_matrix((data, (rows, cols)), shape=(num, num)).tolil()

As you noted, the problem will be the first idx array: it has shape (num, num), so it will probably blow your memory to pieces if num is "in the hundreds of thousands."
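To put a rough number on that, here is a back-of-the-envelope estimate. It assumes the mask is a NumPy boolean array stored at one byte per element; the helper name mask_bytes is made up for illustration.

```python
import numpy as np

def mask_bytes(num):
    """Bytes needed for one (num, num) boolean mask."""
    return num * num * np.dtype(bool).itemsize

# Print the mask size for a few values of num.
for num in (1000, 10000, 100000):
    print(num, mask_bytes(num) / 1e9, "GB")
```

For num = 100,000 the mask alone comes to about 10 GB. The difference arrays that feed the comparisons are temporaries of the same shape at coord's item size per element, so peak usage is higher still.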

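A possible solution is to break your problem into manageable chunks. If you have a 100,000-element array, you could split it into 100 chunks of 1,000 elements and run a modified version of the code above for each of the 10,000 combinations of chunks. You would only need a 1,000,000-element idx array (which you could preallocate and reuse for better performance), and you would have a loop of only 10,000 iterations instead of the 10,000,000,000 of your current implementation. It is a kind of poor man's parallelization scheme, which you could actually improve on, if you have a multi-core machine, by processing several of those chunks in parallel.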
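A minimal sketch of that chunked scheme follows, under a few assumptions: the function name build_H_chunked and its chunk and value parameters are made up for illustration, the block mask is not preallocated, and a + 0.5 is assumed to be non-negative. Treat it as one way to arrange the blocks, not a tuned implementation.

```python
import numpy as np
import scipy.sparse as sps

def build_H_chunked(coord, x_dist, y_dist, z_dist, a, chunk=1000, value=-2.7):
    """Build H block by block so no temporary exceeds (chunk, chunk)."""
    num = coord.shape[0]
    cx, cy, cz = coord[:, 0, 0], coord[:, 0, 1], coord[:, 0, 2]
    # Squared bounds; the lower one is clamped so a < 0.5 behaves like the loop.
    lo2 = max(a - 0.5, 0.0) ** 2
    hi2 = (a + 0.5) ** 2  # assumes a + 0.5 >= 0
    row_parts, col_parts = [], []
    for i0 in range(0, num, chunk):
        i1 = min(i0 + chunk, num)
        for j0 in range(0, num, chunk):
            j1 = min(j0 + chunk, num)
            # Coarse filter on this block only: shape (i1 - i0, j1 - j0).
            near = ((np.abs(cx[i0:i1, None] - cx[None, j0:j1]) <= 2) &
                    (np.abs(cy[i0:i1, None] - cy[None, j0:j1]) <= 1))
            r, c = np.nonzero(near)
            r += i0  # shift block-local indices to global ones
            c += j0
            x = (cx[r] - cx[c]) * x_dist + (cz[r] - cz[c]) * z_dist
            y = (cy[r] - cy[c]) * y_dist
            r2 = x * x + y * y
            keep = (lo2 <= r2) & (r2 <= hi2)
            row_parts.append(r[keep])
            col_parts.append(c[keep])
    rows = np.concatenate(row_parts)
    cols = np.concatenate(col_parts)
    data = np.full(len(rows), value)
    return sps.coo_matrix((data, (rows, cols)), shape=(num, num))
```

Calling build_H_chunked(coord, x_dist, y_dist, z_dist, a).tolil() would give the same matrix type as the code above. Because the blocks are independent, the two outer loops are also the natural place to hand work to several processes.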

Answer 1 (score: 2)

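The nature of the calculation makes it hard for me to vectorize with the numpy methods I'm familiar with. I think the best solution in terms of speed and memory usage would be cython. However, you can get some speed-up using numba. Here is an example (note that normally you would use autojit as a decorator):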

import numpy as np
from scipy import sparse
import time
from numba.decorators import autojit
x_dist=.5
y_dist = .5
z_dist = .4
a = .6
# Only coord[:, 0, 0:3] is read, so use the num x 1 x 3 shape from the question
coord = np.random.normal(size=(1000, 1, 3))

def run(coord, x_dist,y_dist, z_dist, a):
    num = coord.shape[0]    
    H = sparse.lil_matrix((num, num))
    for i in xrange(num):
        for j in xrange(num):
            if (np.absolute(coord[i, 0, 0] - coord[j, 0, 0]) <= 2 and
                    (np.absolute(coord[i, 0, 1] - coord[j, 0, 1]) <= 1)):

                x = ((coord[i, 0, 0] * x_dist + coord[i, 0, 2] * z_dist) -
                     (coord[j, 0, 0] * x_dist + coord[j, 0, 2] * z_dist))

                y = (coord[i, 0, 1] * y_dist) - (coord[j, 0, 1] * y_dist)

                if a - 0.5 <= np.sqrt(x ** 2 + y ** 2) <= a + 0.5:
                    H[i, j] = -2.7
    return H

runaj = autojit(run)

t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'First Original Runtime:', t1 - t0

t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Second Original Runtime:', t1 - t0

t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Third Original Runtime:', t1 - t0

t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'First Numba Runtime:', t1 - t0

t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Second Numba Runtime:', t1 - t0

t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Third Numba Runtime:', t1 - t0

I got this output:

First Original Runtime: 21.3574919701
Second Original Runtime: 15.7615520954
Third Original Runtime: 15.3634860516
First Numba Runtime: 9.87108802795
Second Numba Runtime: 9.32944011688
Third Numba Runtime: 9.32300305367
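One plausible reason the gain is modest is that H[i, j] = -2.7 on a lil_matrix is a Python-object operation, which a JIT compiler cannot turn into native code. The sketch below, which I have not benchmarked, keeps only array arithmetic inside the compiled function and builds the sparse matrix afterwards. It uses numba.njit from current Numba releases rather than autojit, and falls back to plain Python if Numba is not installed. The names pair_matches, find_pairs and build_H are made up for illustration, and the coordinates are assumed to be signed integers or floats.

```python
import numpy as np
import scipy.sparse as sps

try:
    from numba import njit      # compile the kernels when Numba is available
except ImportError:             # otherwise run them as ordinary Python
    def njit(func):
        return func

@njit
def pair_matches(i, j, cx, cy, cz, x_dist, y_dist, z_dist, lo, hi):
    """True if points i and j pass both tests from the original loop."""
    if abs(cx[i] - cx[j]) > 2 or abs(cy[i] - cy[j]) > 1:
        return False
    x = (cx[i] - cx[j]) * x_dist + (cz[i] - cz[j]) * z_dist
    y = (cy[i] - cy[j]) * y_dist
    r = np.sqrt(x * x + y * y)
    return lo <= r and r <= hi

@njit
def find_pairs(cx, cy, cz, x_dist, y_dist, z_dist, a):
    """Return index arrays (rows, cols) of every matching pair."""
    num = cx.shape[0]
    lo = a - 0.5
    hi = a + 0.5
    # First pass: count matches so the outputs can be allocated exactly.
    count = 0
    for i in range(num):
        for j in range(num):
            if pair_matches(i, j, cx, cy, cz, x_dist, y_dist, z_dist, lo, hi):
                count += 1
    rows = np.empty(count, dtype=np.int64)
    cols = np.empty(count, dtype=np.int64)
    # Second pass: record the indices.
    k = 0
    for i in range(num):
        for j in range(num):
            if pair_matches(i, j, cx, cy, cz, x_dist, y_dist, z_dist, lo, hi):
                rows[k] = i
                cols[k] = j
                k += 1
    return rows, cols

def build_H(coord, x_dist, y_dist, z_dist, a, value=-2.7):
    """Assemble the sparse matrix outside the compiled code."""
    rows, cols = find_pairs(coord[:, 0, 0], coord[:, 0, 1], coord[:, 0, 2],
                            x_dist, y_dist, z_dist, a)
    num = coord.shape[0]
    data = np.full(len(rows), value)
    return sps.coo_matrix((data, (rows, cols)), shape=(num, num))
```

The two passes trade some repeated arithmetic for memory that stays proportional to the number of matches, with no (num, num) temporary at any point.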