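I want to optimize the critical part of my program, the part that does most of the work. That work consists of computing many dot products.
At the moment I am doing something like this: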
W = [np.random.rand(n_output, n_input) for _ in range(n_child)]
x = np.random.rand(n_input, batch_size)
result = [None] * n_child  # one slot per child
for _ in range(n_times):
    for i in range(n_child):
        result[i] = np.dot(W[i], x)
    # work with result
I wanted to optimize this part by getting rid of the for i in range(n_child) loop, so I did the following:
W = np.random.rand(n_child, n_output, n_input)
x = np.random.rand(n_input, batch_size)
for _ in range(n_times):
    results = np.dot(W, x)
    # work with results
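However, this turned out to be much slower. What am I missing? In the example below, my second approach is about 30 times slower. How is that possible?
Here is a working example: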
import numpy as np
import time
n_child = 32
n_input = 1000
n_output = 20
batch_size = 64
W = np.random.rand(n_child, n_output, n_input)
x = np.random.rand(n_input, batch_size)
n_times = 1000
t0 = time.time()
for _ in range(n_times):
    np.dot(W, x)
t_a = time.time() - t0
print(t_a) # takes about 60 seconds on my machine
W = [np.random.rand(n_output, n_input) for _ in range(n_child)]
t0 = time.time()
for _ in range(n_times):
    for i in range(n_child):
        np.dot(W[i], x)
t_b = time.time() - t0
print(t_b) # takes about 3 seconds on my machine
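For comparison, here is a minimal sketch of two other loop-free variants that might be worth timing against the code above. It rests on an assumption rather than a confirmed diagnosis: that np.dot with a 3-D first argument may not go through the same optimized 2-D matrix-multiply path as the per-child calls. The helper name batched_dot_reshape and the small array sizes are hypothetical, chosen only for illustration. The sketch checks that all variants produce the same values; it does not claim any particular speedup.

```python
import numpy as np


def batched_dot_reshape(W, x):
    """Compute W[i] @ x for every i using one 2-D matrix product.

    Hypothetical helper: W has shape (n_child, n_output, n_input),
    x has shape (n_input, batch_size).
    """
    n_child, n_output, n_input = W.shape
    # Stack all children into one tall 2-D matrix, multiply once,
    # then split the rows back into per-child blocks.
    flat = W.reshape(n_child * n_output, n_input) @ x
    return flat.reshape(n_child, n_output, x.shape[1])


# Small sizes so the check runs instantly (illustrative values only).
rng = np.random.default_rng(0)
W = rng.random((4, 3, 5))
x = rng.random((5, 2))

# Reference result: the original per-child loop.
looped = np.stack([np.dot(W[i], x) for i in range(W.shape[0])])

same_as_reshape = np.allclose(batched_dot_reshape(W, x), looped)
same_as_dot = np.allclose(np.dot(W, x), looped)
same_as_matmul = np.allclose(np.matmul(W, x), looped)
```

Swapping batched_dot_reshape(W, x) or np.matmul(W, x) into the timing loop above, with the original sizes, would show whether either behaves differently from np.dot(W, x) on a given machine.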