Here is the execution-time log. As you can see, it keeps getting faster until one iteration takes about 1.5 s, and then it gets slower and slower:
iter: 0/700000
loss:8.13768323263
speed: 4.878s / iter
iter: 1/700000
loss:4.69941059748
speed: 3.162s / iter
...
...
...
iter: 1560/700000
loss:2.16679636637
speed: 1.496s / iter
iter: 1561/700000
loss:2.9271744887
speed: 1.496s / iter
...
...
...
iter: 3698/700000
loss:1.47574504217
speed: 1.701s / iter
iter: 3699/700000
loss:1.75555475553
speed: 1.701s / iter
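For reference, the "speed: Xs / iter" numbers above come from timing each training step; a minimal sketch of such a timer (the `step_fn` here is a placeholder for the real `sess.run(train_op)` call, not the actual training code):

```python
import time

def time_iterations(step_fn, num_iters):
    """Run step_fn(i) num_iters times and record wall-clock seconds per iteration."""
    durations = []
    for i in range(num_iters):
        start = time.time()
        step_fn(i)                      # in the real loop: sess.run(train_op)
        durations.append(time.time() - start)
    return durations

# Dummy step used only to demonstrate the timer.
durations = time_iterations(lambda i: sum(range(1000)), 5)
print("speed: %.3fs / iter" % (sum(durations) / len(durations)))
```

A steadily growing per-iteration time usually points at work accumulating across steps (e.g. ops being added to the graph), which is what `graph.finalize()` is meant to rule out.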
I use graph.finalize() to freeze the graph, and I built TensorFlow 1.0 from source with jemalloc, XLA, SSE, etc.:
threads = tf.train.start_queue_runners(coord=coord, sess=sess)
sess.graph.finalize() # Graph is read-only after this statement.
I also followed this GitHub implementation for the image_reader and accumulate gradients (like iter_size in Caffe); all ops are created outside the training loop.
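For clarity, iter_size-style accumulation just sums gradients over several mini-batches and applies one averaged update. A hypothetical pure-Python sketch of that logic (illustrating the arithmetic only, not the actual TF graph ops; `grad_fn` and the names are made up):

```python
def accumulated_sgd(param, grad_fn, batches, iter_size, lr=0.1):
    """Apply one SGD update per iter_size mini-batches, using the averaged gradient."""
    accum = 0.0
    for step, batch in enumerate(batches, 1):
        accum += grad_fn(param, batch)       # sum gradients instead of updating now
        if step % iter_size == 0:
            param -= lr * accum / iter_size  # one update with the averaged gradient
            accum = 0.0                      # reset the accumulator, as iter_size does
    return param

# Minimal check: the gradient of (param - target)^2 / 2 is (param - target).
grad = lambda p, target: p - target
print(accumulated_sgd(4.0, grad, [2.0, 2.0], iter_size=2))
```

In a TF 1.x graph the same idea is typically done with accumulator variables and `assign_add`, with the apply/zero ops built once outside the loop.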
Not sure whether this is related, but GPU memory grows slightly, from 5707 MiB to 5717 MiB.
GPU utilization is low and erratic:
1% -> 59% -> 1% -> 99% -> 0% -> 54% -> 1% -> 48%
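To correlate utilization spikes like these with iterations, one could poll `nvidia-smi` alongside the training loop. A small helper that parses the output of `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`; the sample string below stands in for a real capture:

```python
import subprocess

def gpu_utilization(sample_output=None):
    """Return GPU utilization percentages, one int per GPU, parsed from nvidia-smi."""
    if sample_output is None:
        sample_output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"]).decode()
    return [int(line.strip()) for line in sample_output.splitlines() if line.strip()]

# Parsing an illustrative capture instead of calling nvidia-smi:
print(gpu_utilization("59\n1\n"))  # -> [59, 1]
```

Utilization swinging between ~0% and ~100% usually means the GPU is starved between steps, e.g. by the input pipeline, which would fit the queue-runner setup above.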