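I mainly develop in Python. There, I noticed that rewriting operations so they run inside numpy can lead to massive speed improvements, sometimes 1000x.
I just heard James Reinders (a former Intel director) claim in Performance: SIMD, Vectorization and Performance Tuning that vectorization gives at most a 16x speed-up (minute 03:00-03:09), but parallelization gives up to a 256x speed-up.
Where do those numbers come from? I thought the speed-up from parallelization was bounded by the number of threads, so 8x on an Intel i7-6700HQ?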
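To make the arithmetic I have in mind concrete, here is a minimal sketch. It assumes the vectorization bound is simply the number of elements that fit in one SIMD register and that the parallelization bound is the number of logical CPUs; both are my assumptions, not something I can confirm from the talk. The helper name `simd_lanes` is made up for illustration.

```python
import os


def simd_lanes(register_bits, element_bits):
    """Return how many elements of a given width fit in one SIMD register."""
    return register_bits // element_bits


# Assumption: 512-bit registers (AVX-512) with 32-bit floats.
print('512-bit register, float32:', simd_lanes(512, 32), 'lanes')

# The i7-6700HQ supports AVX2, which has 256-bit registers.
print('256-bit register, float32:', simd_lanes(256, 32), 'lanes')
print('256-bit register, float64:', simd_lanes(256, 64), 'lanes')

# Logical CPUs visible to Python; os.cpu_count() may return None.
logical_cpus = os.cpu_count()
print('logical CPUs:', logical_cpus)
```

Under these assumptions, 16x would match 512-bit registers holding 32-bit floats, which my CPU does not have, and neither figure explains the 1000x I see with numpy.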
Here is an example where I see a massive difference:
import timeit
import numpy as np


def print_durations(durations):
    # timeit reports seconds; convert to milliseconds for display.
    print('min: {min:5.1f}ms, mean: {mean:5.1f}ms, max: {max:6.1f}ms (total: {len})'
          .format(min=min(durations) * 10**3,
                  mean=np.mean(durations) * 10**3,
                  max=max(durations) * 10**3,
                  len=len(durations)
                  ))


def test_speed(nb_items=1000):
    print('## nb_items={}'.format(nb_items))

    # Vectorized: a single call on the whole matrix.
    durations = timeit.repeat('cosine_similarity(mat)',
                              setup='from sklearn.metrics.pairwise import cosine_similarity;import numpy as np;mat = np.random.random(({}, 50))'.format(nb_items),
                              repeat=10, number=1)
    print_durations(durations)

    # Looped: one call per pair of rows.
    durations = timeit.repeat('for i, j in combinations(range({}), 2): cosine_similarity([mat[i], mat[j]])'.format(nb_items),
                              setup='from itertools import combinations;from sklearn.metrics.pairwise import cosine_similarity;import numpy as np;mat = np.random.random(({}, 50))'.format(nb_items),
                              repeat=10, number=1)
    print_durations(durations)


print('First vectorized, second with loops')
test_speed(nb_items=100)
test_speed(nb_items=200)
test_speed(nb_items=300)
test_speed(nb_items=400)
test_speed(nb_items=500)
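
For reference, this is roughly what the two variants compute, written with plain numpy so it runs without scikit-learn. It is only a sketch: the function names are made up for illustration, and it is not how `sklearn.metrics.pairwise.cosine_similarity` is implemented internally. It checks that an explicit Python loop and a single matrix expression give the same similarities.

```python
import numpy as np


def cosine_similarity_loop(mat):
    """Pairwise cosine similarity using explicit Python loops (hypothetical helper)."""
    n_rows, n_cols = mat.shape
    sim = np.empty((n_rows, n_rows))
    for i in range(n_rows):
        for j in range(n_rows):
            dot = 0.0
            norm_i = 0.0
            norm_j = 0.0
            for k in range(n_cols):
                dot += mat[i, k] * mat[j, k]
                norm_i += mat[i, k] ** 2
                norm_j += mat[j, k] ** 2
            sim[i, j] = dot / (norm_i ** 0.5 * norm_j ** 0.5)
    return sim


def cosine_similarity_vectorized(mat):
    """Same result using whole-array numpy operations (hypothetical helper)."""
    # Scale every row to unit length, then take all dot products at once.
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    normalized = mat / norms
    return normalized @ normalized.T


rng = np.random.default_rng(0)
mat = rng.random((20, 50))
loop_result = cosine_similarity_loop(mat)
vec_result = cosine_similarity_vectorized(mat)
print('results match:', np.allclose(loop_result, vec_result))
```

The loop version pays Python interpreter overhead on every multiply and add, while the vectorized version does the same arithmetic in compiled code. In my benchmark the looped variant also pays the cost of one library call per pair, so part of the gap is probably call overhead and not SIMD alone.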