Question

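I want to optimize the critical part of my program, the part that does most of the work. That work consists of computing many dot products.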

Currently I am doing something like this:

W = [np.random.rand(n_output, n_input) for _ in range(n_child)]
x = np.random.rand(n_input, batch_size)
result = [None] * n_child  # one (n_output, batch_size) matrix per child

for _ in range(n_times):
    for i in range(n_child):
        result[i] = np.dot(W[i],x)
    # work with result

I would like to optimize this part by getting rid of the for i in range(n_child) loop. So I did the following:

W = np.random.rand(n_child, n_output, n_input)
x = np.random.rand(n_input, batch_size)

for _ in range(n_times):
    result = np.dot(W, x)
    # work with result

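However, this turns out to be much slower. What am I missing? In the example below, my second approach is about 30 times slower. How is that possible?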
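Before comparing speed, it may help to confirm that the two approaches really compute the same thing. The following is a minimal sketch with small, arbitrarily chosen sizes; the names W_list, W_stacked, looped and vectorized are introduced here for illustration and are not part of the original code.

```python
import numpy as np

# Small illustrative sizes (assumed, not the sizes from the question)
n_child, n_input, n_output, batch_size = 4, 5, 3, 2
rng = np.random.default_rng(0)

# Same weights in both layouts: a list of 2-D matrices and one 3-D array
W_list = [rng.random((n_output, n_input)) for _ in range(n_child)]
W_stacked = np.stack(W_list)  # shape (n_child, n_output, n_input)
x = rng.random((n_input, batch_size))

# First approach: one 2-D dot product per child, stacked for comparison
looped = np.stack([np.dot(W_i, x) for W_i in W_list])

# Second approach: a single np.dot call with a 3-D first operand
vectorized = np.dot(W_stacked, x)

# Both produce one (n_output, batch_size) block per child
assert vectorized.shape == (n_child, n_output, batch_size)
assert np.allclose(looped, vectorized)
```

So the difference between the two versions is in how the work is carried out, not in the result.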

Here is a working example:

import numpy as np
import time

n_child = 32
n_input = 1000
n_output = 20
batch_size = 64

W = np.random.rand(n_child, n_output, n_input)
x = np.random.rand(n_input, batch_size)

n_times = 1000

t0 = time.time()
for _ in range(n_times):
    np.dot(W,x)
t_a = time.time() - t0
print(t_a) # takes about 60 seconds on my machine

W = [np.random.rand(n_output, n_input) for _ in range(n_child)]

t0 = time.time()
for _ in range(n_times):
    for i in range(n_child):
        np.dot(W[i],x)
t_b = time.time() - t0
print(t_b) # takes about 3 seconds on my machine
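For comparison, here is a sketch of two other ways to write the batched product that could be timed alongside the versions above. It rests on an assumption that the post itself does not confirm: that np.dot with a 3-D first operand may not take the same optimized matrix-multiply path as a plain 2-D product, depending on the NumPy version and build. The helper names dot_loop, dot_reshaped, dot_matmul and time_it are hypothetical, and n_times is kept small so the sketch runs quickly.

```python
import time
import numpy as np

n_child, n_input, n_output, batch_size = 32, 1000, 20, 64
rng = np.random.default_rng(0)
W = rng.random((n_child, n_output, n_input))
x = rng.random((n_input, batch_size))

def dot_loop(W, x):
    # Baseline from the question: one 2-D dot product per child
    return np.stack([np.dot(W[i], x) for i in range(W.shape[0])])

def dot_reshaped(W, x):
    # Fold the child axis into the rows, do one 2-D product, unfold again
    n_child, n_output, n_input = W.shape
    flat = np.dot(W.reshape(n_child * n_output, n_input), x)
    return flat.reshape(n_child, n_output, -1)

def dot_matmul(W, x):
    # np.matmul treats W as a stack of matrices and broadcasts x over it
    return np.matmul(W, x)

# All three should agree numerically
reference = dot_loop(W, x)
assert np.allclose(reference, dot_reshaped(W, x))
assert np.allclose(reference, dot_matmul(W, x))

def time_it(fn, n_times=10):
    # Wall-clock time for n_times calls of fn on the arrays defined above
    t0 = time.perf_counter()
    for _ in range(n_times):
        fn(W, x)
    return time.perf_counter() - t0

for fn in (dot_loop, dot_reshaped, dot_matmul):
    print(fn.__name__, time_it(fn))
```

Which variant is fastest will depend on the machine and the NumPy build, so the printed timings should be read as a local measurement only.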

Vectorized dot product is much slower than the for-loop version

0 Answers: