When I profile the training of a deep model architecture in TensorFlow, I see a long period at the end of the trace (more than half of the total time) during which no ops appear in the trace. Is this expected?
As suggested, I added tf.Print statements to several tensors to get timestamps, and they match the timeline above. I am not including all of the gradient print statements, only the first/last 3. It looks like computing the gradients takes 360 ms out of the 490 ms total per batch. Why doesn't this show up in the profile?
2017-09-11 14:53:59.908936: I tensorflow/core/kernels/logging_ops.cc:79] dummy_input[0]
2017-09-11 14:54:00.021368: I tensorflow/core/kernels/logging_ops.cc:79] predictions[0]
2017-09-11 14:54:00.022132: I tensorflow/core/kernels/logging_ops.cc:79] total_loss[0]
2017-09-11 14:54:00.022495: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.022673: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.023230: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
...
2017-09-11 14:54:00.383078: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.383580: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.389668: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.394650: I tensorflow/core/kernels/logging_ops.cc:79] train_op[0]
i = 3 : Duration = 490 ms
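(For reference, the timestamping pattern is just an identity-op wrap; the gradient prints were added the same way, though they are not shown in the dummy example below. A minimal sketch, where some_tensor is an illustrative name:)
# tf.Print is an identity op: the message (plus the log timestamp) is
# emitted at the moment the wrapped tensor is actually computed during
# sess.run, which is what makes it usable for coarse timing.
some_tensor = tf.Print(some_tensor, [0], message='some_tensor')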
The wall time per iteration that I see in Python without profiling is about 500 ms, which matches the trace duration. What is happening during the ~300 ms in which no ops are shown? Is this how long it takes to assign the updates to the model variables? Is there a way I can reduce this 300 ms of "dead" time to speed up training?
I am running TensorFlow r1.3 (installed from source) on Ubuntu 16.04 with a GTX 1080 Ti GPU.
Here is a dummy example that trains the tf.contrib.keras Xception model and produces the profiling trace included above. I see similar behavior with real data.
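(To make the question concrete: one way to separate the gradient computation from the variable update would be to split the train op instead of using create_train_op. This is only a sketch, using the names from the full example below; note that apply_gradients re-runs the gradient subgraph, so the update cost is roughly the difference between the two timings:)
# Sketch: time the gradient computation and the full update separately.
# apply_op recomputes the gradients, so the variable-update cost is
# approximately the difference between the two measured durations.
grads_and_vars = optimizer.compute_gradients(total_loss)
apply_op = optimizer.apply_gradients(grads_and_vars)

t0 = time.time()
sess.run([g for g, v in grads_and_vars])  # gradients only
t1 = time.time()
sess.run(apply_op)                        # gradients + variable update
t2 = time.time()
print('grads: %0.0f ms, grads+update: %0.0f ms'
      % ((t1 - t0) * 1000, (t2 - t1) * 1000))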
import tensorflow as tf
import tensorflow.contrib.keras as keras
import time
import os
nclasses = 128
batch_size = 20
def main(unused_argv):
    # Specify GPU index
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    # Configure session
    gpu_options = tf.GPUOptions(allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options,
                            log_device_placement=False)
    with tf.Session(config=config) as sess:
        # Dummy random inputs
        dummy_input = tf.random_uniform((batch_size, 299, 299, 3),
                                        minval=-1,
                                        maxval=1,
                                        dtype=tf.float32)
        dummy_input = tf.Print(dummy_input, [0], message='dummy_input')
        dummy_reference = tf.random_uniform((batch_size, 1),
                                            minval=0,
                                            maxval=nclasses,
                                            dtype=tf.int32)
        dummy_reference = tf.to_float(dummy_reference)
        # Xception network
        keras.backend.set_learning_phase(True)
        xmod = keras.applications.xception.Xception(weights=None,
                                                    input_tensor=dummy_input,
                                                    classes=nclasses)
        predictions = xmod.output
        predictions = tf.Print(predictions, [0], message='predictions')
        # Cross-entropy loss
        loss = keras.losses.categorical_crossentropy(dummy_reference,
                                                     predictions)
        total_loss = tf.reduce_mean(loss)
        total_loss = tf.Print(total_loss, [0], message='total_loss')
        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.01)
        # Training op wrapper
        train_op = tf.contrib.training.create_train_op(total_loss, optimizer)
        train_op = tf.Print(train_op, [0], message='train_op')
        # Initialize
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        # Set up the profiler
        option_builder = tf.profiler.ProfileOptionBuilder
        profiler = tf.profiler.Profiler(sess.graph)
        run_meta = tf.RunMetadata()
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        profile_op = train_op.op
        # Profile on the 5th repetition
        nrep = 5
        for i in range(nrep):
            tstart = time.time()
            if i < nrep - 1:
                sess.run(profile_op)
                duration = time.time() - tstart
                print('i = %d : Duration = %0.0f ms' % (i, duration * 1000))
            else:
                sess.run(profile_op,
                         options=run_options,
                         run_metadata=run_meta)
                duration = time.time() - tstart
                print('i = %d : Duration = %0.0f ms' % (i, duration * 1000))
                profiler.add_step(i, run_meta)
                # Profile the parameters of the model.
                profiler.profile_name_scope(
                    options=option_builder.trainable_variables_parameter())
                # Or profile the timing of the model operations.
                opts = option_builder.time_and_memory()
                profiler.profile_operations(options=opts)
                # Or generate a timeline.
                filename = 'tfprof.json'
                opts = (option_builder(option_builder.time_and_memory())
                        .with_step(i)
                        .with_timeline_output(filename)
                        .build())
                profiler.profile_graph(options=opts)
                # Profiler advice.
                ALL_ADVICE = {'ExpensiveOperationChecker': {},
                              'AcceleratorUtilizationChecker': {},
                              'JobChecker': {},  # Only available internally.
                              'OperationChecker': {}}
                profiler.advise(ALL_ADVICE)

if __name__ == '__main__':
    tf.app.run()
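(In case it helps anyone reproduce this: the same RunMetadata can also be dumped straight to a Chrome trace with TensorFlow's timeline module. This is an internal module, so treat it as a sketch that may change across versions; the resulting JSON, like tfprof.json above, loads in chrome://tracing:)
# Sketch: write the raw step stats from the profiled run to a Chrome
# trace. run_meta is the RunMetadata filled in by the profiled sess.run.
from tensorflow.python.client import timeline

tl = timeline.Timeline(run_meta.step_stats)
with open('raw_timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format(show_dataflow=True))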