I am trying to speed up this code by using NumPy functions or vectorized operations instead of a for loop:
```python
sommes = []
for j in range(vertices.shape[0]):
    terme = new_vertices[j] - new_vertices[vertex_neighbors[j]]
    somme_j = np.sum(terme)
    sommes.append(somme_j)
E_int = np.sum(sommes)
```
(It is part of an iterative algorithm, and there are many vertices, so I think the for loop takes too long.)
For example, to compute `terme` for j = 0, I have:
```python
In: new_vertices[0]
Out: array([ 10.2533888 , -42.32279717,  68.27230793])

In: vertex_neighbors[0]
Out: [1280, 2, 1511, 511, 1727, 1887, 759, 509, 1023]

In: new_vertices[vertex_neighbors[0]]
Out: array([[ 10.47121043, -42.00123956,  68.218715  ],
            [ 10.2533888 , -43.26905874,  62.59473849],
            [ 10.69773735, -41.26464083,  68.09594854],
            [ 10.37030712, -42.16729601,  68.24639107],
            [ 10.12158146, -42.46624547,  68.29621598],
            [  9.81850836, -42.71158695,  68.33710623],
            [  9.97615447, -42.59625943,  68.31788497],
            [ 10.37030712, -43.11676015,  62.54960623],
            [ 10.55512696, -41.82622703,  68.18954624]])

In: new_vertices[0] - new_vertices[vertex_neighbors[0]]
Out: array([[-0.21782162, -0.32155761,  0.05359293],
            [ 0.        ,  0.94626157,  5.67756944],
            [-0.44434855, -1.05815634,  0.17635939],
            [-0.11691832, -0.15550116,  0.02591686],
            [ 0.13180734,  0.1434483 , -0.02390805],
            [ 0.43488044,  0.38878979, -0.0647983 ],
            [ 0.27723434,  0.27346227, -0.04557704],
            [-0.11691832,  0.79396298,  5.7227017 ],
            [-0.30173816, -0.49657014,  0.08276169]])
```
The problem is that `new_vertices[vertex_neighbors[j]]` does not always have the same size. For example, for j = 7:
```python
In: new_vertices[7]
Out: array([ 10.74106112, -63.88592276, -70.15593947])

In: vertex_neighbors[7]
Out: [1546, 655, 306, 1879, 920, 925]

In: new_vertices[vertex_neighbors[7]]
Out: array([[  9.71830698, -69.07323638, -83.10229623],
            [ 10.71123017, -64.06983438, -70.09345104],
            [  9.74836003, -68.88820555, -83.16187474],
            [ 10.78982867, -63.70552665, -70.2169896 ],
            [  9.74627177, -60.87823935, -60.13032811],
            [  9.79419242, -60.69528267, -60.182843  ]])

In: new_vertices[7] - new_vertices[vertex_neighbors[7]]
Out: array([[  1.02275414,   5.18731363,  12.94635676],
            [  0.02983095,   0.18391163,  -0.06248843],
            [  0.99270108,   5.0022828 ,  13.00593527],
            [ -0.04876756,  -0.18039611,   0.06105013],
            [  0.99478934,  -3.00768341, -10.02561137],
            [  0.94686869,  -3.19064009,  -9.97309648]])
```
Is this possible without a for loop? I am running out of ideas, so any help would be appreciated!

Thanks.
Answer 0 (score: 1)
Yes, it is possible. The idea is to use `np.repeat` to build an array in which each vertex is repeated a variable number of times (once per neighbor). Here is the code:
```python
# The two following lines need to be done only once if the indices are
# constant between iterations (precomputation)
counts = np.array([len(e) for e in vertex_neighbors])
flatten_indices = np.concatenate(vertex_neighbors)

E_int = np.sum(np.repeat(new_vertices, counts, axis=0) - new_vertices[flatten_indices])
```
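To see how `np.repeat` lines up against the ragged neighbor lists, here is a minimal sketch with made-up 2D data (the names `vertices` and `neighbors` are illustrative, not from the question):

```python
import numpy as np

# Tiny example: 3 vertices, ragged neighbor lists of lengths 2, 1, 3.
vertices = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
neighbors = [[1, 2], [0], [0, 1, 1]]

counts = np.array([len(e) for e in neighbors])  # [2, 1, 3]
flat = np.concatenate(neighbors)                # [1, 2, 0, 0, 1, 1]

# Repeat each vertex once per neighbor, then subtract in one vectorized step.
diff = np.repeat(vertices, counts, axis=0) - vertices[flat]
E = np.sum(diff)

# Same result as the original loop, for comparison.
E_loop = sum(np.sum(vertices[j] - vertices[neighbors[j]]) for j in range(3))
assert np.isclose(E, E_loop)
```

Both rows of `diff` and the rows produced by the loop pair each vertex with each of its neighbors, so the totals match exactly.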
Here is a benchmark:
```python
import numpy as np
from time import time

n = 32768
vertices = np.random.rand(n, 3)
indices = []
count = np.random.randint(1, 10, size=n)
for i in range(n):
    indices.append(np.random.randint(0, n, size=count[i]))

def initial_version(vertices, vertex_neighbors):
    sommes = []
    for j in range(vertices.shape[0]):
        terme = vertices[j] - vertices[vertex_neighbors[j]]
        somme_j = np.sum(terme)
        sommes.append(somme_j)
    return np.sum(sommes)

def optimized_version(vertices, vertex_neighbors):
    # The two following lines can be precomputed
    counts = np.array([len(e) for e in vertex_neighbors])
    flatten_indices = np.concatenate(vertex_neighbors)
    return np.sum(np.repeat(vertices, counts, axis=0) - vertices[flatten_indices])

def more_optimized_version(vertices, vertex_neighbors, counts, flatten_indices):
    return np.sum(np.repeat(vertices, counts, axis=0) - vertices[flatten_indices])

timesteps = 20

a = time()
for t in range(timesteps):
    res = initial_version(vertices, indices)
b = time()
print("V1: time:", b - a)
print("V1: result", res)

a = time()
for t in range(timesteps):
    res = optimized_version(vertices, indices)
b = time()
print("V2: time:", b - a)
print("V2: result", res)

a = time()
counts = np.array([len(e) for e in indices])
flatten_indices = np.concatenate(indices)
for t in range(timesteps):
    res = more_optimized_version(vertices, indices, counts, flatten_indices)
b = time()
print("V3: time:", b - a)
print("V3: result", res)
```
Here are the benchmark results on my machine:
```
V1: time: 3.656714916229248
V1: result -395.8416223057596
V2: time: 0.19800186157226562
V2: result -395.8416223057595
V3: time: 0.07983255386352539
V3: result -395.8416223057595
```
As you can see, the optimized version is 18 times faster than the reference implementation, and the version with precomputed indices is 46 times faster.

Note that the optimized versions require more RAM (especially when the number of neighbors per vertex is large).
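If that extra memory becomes a problem, the repeated array can be avoided entirely by rearranging the sum algebraically: since each vertex `v_j` appears once per neighbor, the total equals `sum_j(count_j * v_j) - sum(v[flat])`. This is a sketch of that alternative (variable names are illustrative), which allocates nothing larger than the flattened index array:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
vertices = rng.random((n, 3))
neighbors = [rng.integers(0, n, size=rng.integers(1, 10)) for _ in range(n)]

counts = np.array([len(e) for e in neighbors])
flat = np.concatenate(neighbors)

# sum_j sum_k (v_j - v_{n_jk}) = sum_j count_j * v_j - sum_i v_flat[i]
E = np.sum(counts[:, None] * vertices) - np.sum(vertices[flat])

# Reference loop for verification.
E_ref = sum(np.sum(vertices[j] - vertices[neighbors[j]]) for j in range(n))
assert np.isclose(E, E_ref)
```

The identity only works because the final result is a single scalar sum; if intermediate per-vertex sums are needed elsewhere in the algorithm, the `np.repeat` version above is the more general tool.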