Why is broadcasting slower than a loop in this case?

Asked: 2018-04-30 04:52:21

Tags: performance for-loop euclidean-distance numpy-broadcasting

I wrote a short script to compare the performance of broadcasting against an explicit loop. The code is as follows:

import numpy as np

D = 3072
num_train = 5000
test = np.random.rand(D)
X_train = np.random.rand(num_train, D)


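def time_function(f, *args):
    """
    Call a function f with args and return the time (in seconds) that it took to execute.
    """
    import time
    # perf_counter is a monotonic, high-resolution clock intended for measuring intervals
    tic = time.perf_counter()
    f(*args)
    toc = time.perf_counter()
    return toc - tic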

def one_test_one_loop():
    dists = np.zeros(num_train)
    for i in range(num_train):
        square_sum = np.sum((test - X_train[i]) ** 2)
        dists[i] = square_sum ** (1 / 2)

def one_test_no_loop():
    dists = np.zeros(num_train)  # redundant: dists is rebound below
    # Broadcasting builds full (num_train, D) temporaries here
    square_diffs = (test - X_train) ** 2
    square_sums = np.sum(square_diffs, 1)
    dists = square_sums ** (1 / 2)

one_loop_time, no_loop_time = 0, 0
for i in range(10):
    one_loop_time += time_function(one_test_one_loop)
    no_loop_time += time_function(one_test_no_loop)

print("X_train's shape: (%d, %d)" % X_train.shape)
print("test's shape: (%d, )" % test.shape)
print('One loop version took %f seconds' % one_loop_time)
print('No loop version took %f seconds' % no_loop_time)

The output is:

X_train's shape: (5000, 3072)
test's shape: (3072, )
One loop version took 0.484136 seconds
No loop version took 0.934610 seconds

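Basically, I compute the L2 distance between one test sample and all 5000 training samples. time_function simply returns the running time of the function it is given. I expected the broadcasting version to be faster than the loop version, but it took roughly twice as long (0.93 s versus 0.48 s over 10 runs). Why is that?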
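One possible explanation, offered as an assumption rather than a measured result: with float64 data, each full-size temporary in the broadcast version, such as `test - X_train`, holds 5000 * 3072 * 8 = 122,880,000 bytes (about 117 MiB), so the computation may be limited by memory allocation and bandwidth, whereas each loop iteration only touches a 3072-element row (24 KiB) that likely stays in cache. The sketch below compares the broadcast form with an expanded form that avoids the large temporaries. The function names `dists_broadcast` and `dists_expanded` and the reduced array sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

# Smaller sizes than the question's (5000, 3072) so the sketch runs quickly;
# these values are assumptions for illustration only.
D = 256
num_train = 500
rng = np.random.default_rng(0)
test = rng.random(D)
X_train = rng.random((num_train, D))


def dists_broadcast(X, x):
    # Materializes a full (num_train, D) temporary for the differences,
    # and another one for their squares.
    return np.sqrt(np.sum((x - X) ** 2, axis=1))


def dists_expanded(X, x):
    # Uses ||x - X_i||^2 = ||X_i||^2 - 2 * (X_i . x) + ||x||^2, so only
    # length-num_train temporaries are created.
    sq = np.einsum("ij,ij->i", X, X) - 2.0 * (X @ x) + x @ x
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))


# Size in bytes of one full (num_train, D) float64 temporary.
temp_bytes = X_train.size * X_train.itemsize

d_broadcast = dists_broadcast(X_train, test)
d_expanded = dists_expanded(X_train, test)
max_abs_diff = np.max(np.abs(d_broadcast - d_expanded))
```

Whether the expanded form is actually faster depends on the machine and the NumPy build, so it would need to be timed at the original sizes, for example with `timeit`, before drawing any conclusion.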

0 answers:

No answers yet.