How can I speed up a "for" loop when indexing with a list? (Python)

Asked: 2020-10-24 04:50:14

标签: python performance numpy loops vectorization

I am trying to speed up this code by using NumPy functions or vectorization instead of a for loop:

import numpy as np

sommes = []
for j in range(vertices.shape[0]):
    terme = new_vertices[j] - new_vertices[vertex_neighbors[j]]
    somme_j = np.sum(terme)
    sommes.append(somme_j)
E_int = np.sum(sommes)

(It is part of an iterative algorithm and there are a lot of vertices, so I think the for loop takes too long.)

For example, to compute `terme` for j = 0, I have:

In: new_vertices[0]
Out: array([ 10.2533888 , -42.32279717,  68.27230793])

In: vertex_neighbors[0]
Out: [1280, 2, 1511, 511, 1727, 1887, 759, 509, 1023]

In: new_vertices[vertex_neighbors[0]]
Out: array([[ 10.47121043, -42.00123956,  68.218715  ],
            [ 10.2533888 , -43.26905874,  62.59473849],
            [ 10.69773735, -41.26464083,  68.09594854],
            [ 10.37030712, -42.16729601,  68.24639107],
            [ 10.12158146, -42.46624547,  68.29621598],
            [  9.81850836, -42.71158695,  68.33710623],
            [  9.97615447, -42.59625943,  68.31788497],
            [ 10.37030712, -43.11676015,  62.54960623],
            [ 10.55512696, -41.82622703,  68.18954624]])

In: new_vertices[0] - new_vertices[vertex_neighbors[0]]
Out: array([[-0.21782162, -0.32155761,  0.05359293],
             [ 0.        ,  0.94626157,  5.67756944],
             [-0.44434855, -1.05815634,  0.17635939],
             [-0.11691832, -0.15550116,  0.02591686],
             [ 0.13180734,  0.1434483 , -0.02390805],
             [ 0.43488044,  0.38878979, -0.0647983 ],
             [ 0.27723434,  0.27346227, -0.04557704],
             [-0.11691832,  0.79396298,  5.7227017 ],
             [-0.30173816, -0.49657014,  0.08276169]])

The problem is that new_vertices[vertex_neighbors[j]] does not always have the same size. For example, when j = 7:

In: new_vertices[7]
Out: array([ 10.74106112, -63.88592276, -70.15593947])

In: vertex_neighbors[7]
Out: [1546, 655, 306, 1879, 920, 925]

In: new_vertices[vertex_neighbors[7]]
Out: array([[  9.71830698, -69.07323638, -83.10229623],
           [ 10.71123017, -64.06983438, -70.09345104],
           [  9.74836003, -68.88820555, -83.16187474],
           [ 10.78982867, -63.70552665, -70.2169896 ],
           [  9.74627177, -60.87823935, -60.13032811],
           [  9.79419242, -60.69528267, -60.182843  ]])

In: new_vertices[7] - new_vertices[vertex_neighbors[7]]
Out: array([[  1.02275414,   5.18731363,  12.94635676],
             [  0.02983095,   0.18391163,  -0.06248843],
             [  0.99270108,   5.0022828 ,  13.00593527],
             [ -0.04876756,  -0.18039611,   0.06105013],
             [  0.99478934,  -3.00768341, -10.02561137],
             [  0.94686869,  -3.19064009,  -9.97309648]])

Is this possible without a for loop? I am running out of ideas, so any help would be appreciated!

Thanks.

1 answer:

Answer 0 (score: 1)

Yes, it is possible. The idea is to use np.repeat to build an array in which each vertex is repeated a variable number of times (once per neighbor), so the subtraction can be done in a single vectorized operation. Here is the code:

# The two following lines need to be done only once if the indices are constant between iterations (precomputation)
counts = np.array([len(e) for e in vertex_neighbors])
flatten_indices = np.concatenate(vertex_neighbors)

E_int = np.sum(np.repeat(new_vertices, counts, axis=0) - new_vertices[flatten_indices])
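To make the row alignment concrete, here is a tiny self-contained example (with made-up vertices and neighbor lists, not the data from the question) showing how np.repeat pairs each vertex with its flattened neighbor list:

```python
import numpy as np

new_vertices = np.array([[0., 0., 0.],
                         [1., 1., 1.],
                         [2., 2., 2.]])
vertex_neighbors = [[1, 2], [0], [0, 1]]

counts = np.array([len(e) for e in vertex_neighbors])   # [2, 1, 2]
flatten_indices = np.concatenate(vertex_neighbors)      # [1, 2, 0, 0, 1]

# Each vertex is duplicated once per neighbor, so the subtraction
# lines up row by row with the flattened neighbor list.
repeated = np.repeat(new_vertices, counts, axis=0)
diffs = repeated - new_vertices[flatten_indices]
E_int = np.sum(diffs)   # E_int == 3.0, same as the original loop
```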

Here is a benchmark:

import numpy as np
from time import time


n = 32768
vertices = np.random.rand(n, 3)
indices = []

count = np.random.randint(1, 10, size=n)

for i in range(n):
    indices.append(np.random.randint(0, n, size=count[i]))

def initial_version(vertices, vertex_neighbors):
    sommes = []
    for j in range(vertices.shape[0]):
        terme = vertices[j] - vertices[vertex_neighbors[j]]
        somme_j = np.sum(terme)
        sommes.append(somme_j)
    return np.sum(sommes)

def optimized_version(vertices, vertex_neighbors):
    # The two following lines can be precomputed
    counts = np.array([len(e) for e in vertex_neighbors])
    flatten_indices = np.concatenate(vertex_neighbors)

    return np.sum(np.repeat(vertices, counts, axis=0) - vertices[flatten_indices])

def more_optimized_version(vertices, vertex_neighbors, counts, flatten_indices):
    return np.sum(np.repeat(vertices, counts, axis=0) - vertices[flatten_indices])

timesteps = 20

a = time()
for t in range(timesteps):
    res = initial_version(vertices, indices)
b = time()
print("V1: time:", b - a)
print("V1: result", res)

a = time()
for t in range(timesteps):
    res = optimized_version(vertices, indices)
b = time()
print("V2: time:", b - a)
print("V2: result", res)

a = time()
counts = np.array([len(e) for e in indices])
flatten_indices = np.concatenate(indices)
for t in range(timesteps):
    res = more_optimized_version(vertices, indices, counts, flatten_indices)
b = time()
print("V3: time:", b - a)
print("V3: result", res)

Here are the benchmark results on my machine:

V1: time: 3.656714916229248
V1: result -395.8416223057596
V2: time: 0.19800186157226562
V2: result -395.8416223057595
V3: time: 0.07983255386352539
V3: result -395.8416223057595

As you can see, the optimized version is 18 times faster than the reference implementation, and the version with precomputed indices is 46 times faster.

Note that the optimized version needs more RAM (especially when the number of neighbors per vertex is large), because the repeated vertex array is materialized in memory.
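If memory becomes an issue, the repeat can be avoided entirely: since the final result is a single scalar sum, each vertex contributes counts[j] times with a plus sign and every flattened neighbor contributes once with a minus sign. A minimal sketch of that identity (the helper name is mine, not part of the original answer):

```python
import numpy as np

def e_int_low_memory(vertices, counts, flatten_indices):
    """Equivalent to np.sum(np.repeat(vertices, counts, axis=0)
    - vertices[flatten_indices]) without materializing the repeated array."""
    # Each vertex is repeated counts[j] times, so its component sum
    # is simply weighted by its neighbor count.
    positive = np.dot(counts, vertices.sum(axis=1))
    # Each flattened neighbor appears exactly once on the negative side.
    negative = vertices[flatten_indices].sum()
    return positive - negative
```

This uses only O(1) extra memory beyond the inputs, at the cost of losing the intermediate per-edge differences (which the question only sums anyway).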