I am doing batched MLP evaluation. Because GPU memory is limited, I have to process a very large matrix (e.g. 1000000x43) in small batches (e.g. 1000x43). I have found that the results are computed inconsistently across batch sizes. Here is a minimal example:
import numpy as np
import tensorflow as tf

with tf.Session():
    x = tf.Variable(np.random.rand(43, 60000).astype('f2'))  # float16 input
    w = tf.Variable(np.ones((58624, 43), 'f2'))
    b = tf.Variable(np.ones((58624, 1), 'f2'))
    v1 = w @ x + b
    w = tf.Variable(np.ones((5376, 43), 'f2'))
    b = tf.Variable(np.ones((5376, 1), 'f2'))
    v2 = w @ x + b
    tf.global_variables_initializer().run()
    print(v1.eval(), v2.eval())
Output:
[[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]
[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]
[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]
...,
[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]
[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]
[ 21.375 18.578125 22.765625 ..., 23.828125 24.203125 22.0625 ]] [[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]
[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]
[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]
...,
[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]
[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]
[ 22.375 19.578125 23.765625 ..., 24.828125 25.203125 23.0625 ]]
Clearly, the results for the two batch sizes (58624 and 5376) differ by 1, which should not happen, since the first axis of w and b should simply be broadcast.
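For reference, this is what every row should equal, checked with NumPy in float32 (a minimal sketch of mine; since all rows of w are identical, a single row stands in for the full 58624 or 5376):

import numpy as np

x = np.random.rand(43, 60000).astype('f4')  # float32 reference input
row = np.ones(43, 'f4') @ x + 1.0           # any one row of w @ x + b
print(row[:6])                              # independent of how many rows w has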
I am using tensorflow-gpu 1.11.0 from conda, on Ubuntu Server 18.04.01, amd64 architecture, with an Nvidia Titan Xp graphics card. Is this actually a bug, or is my logic wrong somewhere?
By the way: in this example the result seems independent of the batch-size value, but in my actual (larger) example, which uses regular and batched matrix multiplication (both @), broadcasted tanh, tensor addition (+), and tf.linalg.norm over the last two axes to compute matrix norms batch-wise, the result appears proportional to the batch size. If I remove the + b part from v1 and v2, the results are equivalent (see the sketch below). However, if I replace the matrix multiplication with random tensors, I cannot reproduce the problem.
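Concretely, here is a minimal sketch of the "+ b removed" variant I mean (reusing the imports and shapes from the first snippet):

with tf.Session():
    x = tf.Variable(np.random.rand(43, 60000).astype('f2'))
    w1 = tf.Variable(np.ones((58624, 43), 'f2'))
    w2 = tf.Variable(np.ones((5376, 43), 'f2'))
    v1 = w1 @ x  # no "+ b" term
    v2 = w2 @ x  # no "+ b" term
    tf.global_variables_initializer().run()
    print(v1.eval()[0, :3], v2.eval()[0, :3])  # these come out equal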
The larger example is here:
with tf.Session():
    x = tf.Variable(np.ones((43, 60000), 'f2'))
    y = tf.Variable(np.ones((8, 60000), 'f2'))
    w = tf.Variable(np.ones((916 * 64, 43), 'f2'))
    b = tf.Variable(np.ones((916 * 64, 1), 'f2'))
    v = tf.Variable(np.ones((916, 8, 64), 'f2'))
    c = tf.Variable(np.ones((916, 8, 1), 'f2'))
    # cast to float32 before the norm to prevent overflow
    r = tf.linalg.norm(tf.cast(v @ tf.reshape(tf.tanh(w @ x + b), (916, 64, 60000)) + c, 'float32'), axis=(1, 2))
    w = tf.Variable(np.ones((84 * 64, 43), 'f2'))
    b = tf.Variable(np.ones((84 * 64, 1), 'f2'))
    v = tf.Variable(np.ones((84, 8, 64), 'f2'))
    c = tf.Variable(np.ones((84, 8, 1), 'f2'))
    R = tf.linalg.norm(tf.cast(v @ tf.reshape(tf.tanh(w @ x + b), (84, 64, 60000)) + c, 'float32'), axis=(1, 2))
    tf.global_variables_initializer().run()
    print(r.eval()[::100], R.eval()[::10])  # print a summary of each
This produces:
[ 1906641.625 1906641.625 1906641.625 1906641.625 1906641.625
1906641.625 1906641.625 1906641.625 1906641.625 1906641.625]
[ 45033.32421875 45033.32421875 45033.32421875 45033.32421875
45033.32421875 45033.32421875 45033.32421875 45033.32421875
45033.32421875]
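For comparison, one could compute an exact baseline of this larger example with NumPy in float64 (my own sketch; the 60000 columns are shrunk to a hypothetical 600 so the float64 intermediates fit in host memory):

import numpy as np

cols = 600                        # shrunk from 60000 (assumption)
x = np.ones((43, cols))
for n in (916, 84):               # the two batch sizes above
    w = np.ones((n * 64, 43))
    b = np.ones((n * 64, 1))
    v = np.ones((n, 8, 64))
    c = np.ones((n, 8, 1))
    r = np.linalg.norm(v @ np.tanh(w @ x + b).reshape(n, 64, cols) + c,
                       axis=(1, 2))
    print(n, r[:3])               # per-row norm is the same for both n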
Thanks!