The code below measures the gradient of the linear model y = X0*X1*X2, where X0, ... are rank-5000 matrices. The gradient is computed in two ways, selected by combined:
Method 1: tf.gradients(y, [x0, x1, ...])
Method 2: tf.gradients(y, x0), ...
My expectation was that method 1 would be significantly faster than method 2, because it can use backpropagation and share intermediate results between the different "layers" (here, the successive matmuls). Method 2, I assumed, cannot do this and has to compute the gradient of each layer from scratch.
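To make the sharing I have in mind concrete, here is a hand-rolled NumPy sketch of reverse-mode differentiation for depth 3. This is my own illustration of the idea, not necessarily the graph TensorFlow builds; tf.gradients seeds the backward pass with a matrix of ones, which I mimic here:

    import numpy as np

    n = 4  # tiny size, just for illustration
    x0, x1, x2 = (np.random.rand(n, n).astype(np.float32) for _ in range(3))
    gy = np.ones((n, n), dtype=np.float32)  # seed gradient, as tf.gradients uses

    h = x0 @ x1      # forward intermediate of mul1
    g2 = h.T @ gy    # gradient w.r.t. x2
    gh = gy @ x2.T   # gradient w.r.t. h -- shared by x0 and x1
    g1 = x0.T @ gh   # gradient w.r.t. x1 reuses gh
    g0 = gh @ x1.T   # gradient w.r.t. x0 reuses gh

If method 2 cannot reuse gh, it would have to recompute it once per gradients call, which is the extra cost I expected to see.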
In practice, the run times are almost equal (on both CPU and GPU), and they scale the same way when the rank is changed. Why is that?
from time import time
import numpy as np
import tensorflow as tf

depth = 3
rank = 5000
combined = False

def get_matrix(rank):
    return np.random.rand(rank, rank).astype(np.float32)

xs = [tf.placeholder(tf.float32, shape=(rank, rank)) for i in range(depth)]
feed = {x: get_matrix(rank) for x in xs}

# Build y = x0 @ x1 @ ... @ x_{depth-1}
y = xs[0]
for i in range(1, depth):
    y = tf.matmul(y, xs[i], name='mul{}'.format(i))

if combined:
    # Method 1: one gradients call for all matrices
    g = tf.gradients(y, xs)
else:
    # Method 2: one gradients call per matrix
    g = [tf.gradients(y, x) for x in xs]

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Warmup
for i in range(3):
    sess.run(g, feed_dict=feed)

start = time()
result = sess.run(g, feed_dict=feed)
print('Combined:' if combined else 'Separate:', time() - start, 'secs')
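For what it's worth, one way to probe whether the two modes build structurally different graphs would be to count the MatMul ops each one creates. This is a diagnostic sketch to append after g is built; inspecting ops via get_operations() is standard TF1, but the interpretation of the counts is my assumption:

    # Appended after g is built: how many MatMul nodes did this mode create?
    matmuls = [op for op in tf.get_default_graph().get_operations()
               if op.type == 'MatMul']
    print(len(matmuls), 'MatMul ops in the graph')

If method 2 really recomputed everything from scratch, the separate graph should contain noticeably more MatMul ops than the combined one.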