What is the maximum speedup achievable with vectorization?

Asked: 2019-05-02 05:15:25

Tags: parallel-processing vectorization

I mostly develop in Python. There I have noticed that moving work into specialized numpy operations can give enormous speedups, sometimes around 1000x.
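To show concretely what I mean, here is a minimal sketch (not a real benchmark; the array size and the operation are arbitrary choices just for illustration) comparing a plain Python loop with the equivalent single numpy call:

import timeit
import numpy as np

a = np.random.random(10**6)

def python_sum(values):
    # Plain Python loop: one element at a time, with interpreter overhead per iteration.
    total = 0.0
    for v in values:
        total += v
    return total

loop_time = timeit.timeit(lambda: python_sum(a), number=10)
numpy_time = timeit.timeit(lambda: a.sum(), number=10)  # one vectorized call into compiled code
print('loop: {:.3f}s, numpy: {:.3f}s, ratio: {:.0f}x'.format(
    loop_time, numpy_time, loop_time / numpy_time))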

I just listened to James Reinders (formerly a director at Intel) in Performance: SIMD, Vectorization and Performance Tuning, where he says that vectorization can speed code up by at most 16x (minutes 03:00-03:09), while parallelization can give up to a 256x speedup.

Where do these numbers come from? I thought the speedup from parallelization was bounded by the number of threads, so 8x on an Intel i7-6700HQ (4 cores, 8 hardware threads)?
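My own back-of-envelope guess is that the 16x refers to the number of SIMD lanes, e.g. a 512-bit register holding sixteen 32-bit floats, but that assumption is mine and is not stated in the talk, and it still does not get me to the quoted 256x:

# Back-of-envelope lane counts; the choice of 512-bit registers over
# 32-bit floats is my assumption, not confirmed by the talk.
register_bits = 512                 # AVX-512 register width
element_bits = 32                   # single-precision float
simd_lanes = register_bits // element_bits
print(simd_lanes)                   # 16 elements per instruction

threads = 8                         # i7-6700HQ: 4 cores, 8 hardware threads
print(simd_lanes * threads)         # 128, still short of the quoted 256x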

Example of vectorization in Python

Here is an example where I see a huge difference:

import timeit
import numpy as np

def print_durations(durations):
    print('min: {min:5.1f}ms, mean: {mean:5.1f}ms, max: {max:6.1f}ms (total: {len})'
          .format(min=min(durations) * 10**3,
                  mean=np.mean(durations) * 10**3,
                  max=max(durations) * 10**3,
                  len=len(durations)
                  ))

def test_speed(nb_items=1000):
    print('## nb_items={}'.format(nb_items))
    # Vectorized: a single sklearn call computes all pairwise similarities at once.
    durations = timeit.repeat('cosine_similarity(mat)',
                  setup='from sklearn.metrics.pairwise import cosine_similarity;import numpy as np;mat = np.random.random(({}, 50))'.format(nb_items),
                  repeat=10, number=1)
    print_durations(durations)

    # Looped: one cosine_similarity call per pair of rows.
    durations = timeit.repeat('for i, j in combinations(range({}), 2): cosine_similarity([mat[i], mat[j]])'.format(nb_items),
                  setup='from itertools import combinations;from sklearn.metrics.pairwise import cosine_similarity;import numpy as np;mat = np.random.random(({}, 50))'.format(nb_items),
                  repeat=10, number=1)
    print_durations(durations)

print('First vectorized, second with loops')
test_speed(nb_items=100)
test_speed(nb_items=200)
test_speed(nb_items=300)
test_speed(nb_items=400)
test_speed(nb_items=500)

0 Answers:

There are no answers yet.