When I profile the training of a deep model architecture in TensorFlow, I see a long period at the end of the trace (more than half of the total time) during which no ops appear in the trace. Is this expected?
As suggested, I added tf.Print statements to several tensors to get timestamps, and they match the timeline above. I am not including all of the gradient print statements, only the first/last 3. It looks like computing the gradients takes 360 ms out of the 490 ms total per batch. Why doesn't this show up in the profile?
2017-09-11 14:53:59.908936: I tensorflow/core/kernels/logging_ops.cc:79] dummy_input[0]
2017-09-11 14:54:00.021368: I tensorflow/core/kernels/logging_ops.cc:79] predictions[0]
2017-09-11 14:54:00.022132: I tensorflow/core/kernels/logging_ops.cc:79] total_loss[0]
2017-09-11 14:54:00.022495: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.022673: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.023230: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
...
2017-09-11 14:54:00.383078: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.383580: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.389668: I tensorflow/core/kernels/logging_ops.cc:79] grad[0]
2017-09-11 14:54:00.394650: I tensorflow/core/kernels/logging_ops.cc:79] train_op[0]
i = 3 : Duration = 490 ms
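(For reference, the timestamping pattern is just an identity-op wrap; the gradient prints were added the same way, though they are not shown in the dummy example below. A minimal sketch, where some_tensor is an illustrative name:)
# tf.Print is an identity op: the message (plus the log timestamp) is
# emitted at the moment the wrapped tensor is actually computed during
# sess.run, which is what makes it usable for coarse timing.
some_tensor = tf.Print(some_tensor, [0], message='some_tensor')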
The wall time per iteration that I see in Python without profiling is about 500 ms, which matches the trace duration. What is happening during the ~300 ms in which no ops are shown? Is this how long it takes to assign the updates to the model variables? Is there a way I can reduce this 300 ms of "dead" time to speed up training?
I am running TensorFlow r1.3 (installed from source) on Ubuntu 16.04 with a GTX 1080 Ti GPU.
Here is a dummy example that trains the tf.contrib.keras Xception model and produces the profiling trace included above. I see similar behavior with real data.
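(To make the question concrete: one way to separate the gradient computation from the variable update would be to split the train op instead of using create_train_op. This is only a sketch, using the names from the full example below; note that apply_gradients re-runs the gradient subgraph, so the update cost is roughly the difference between the two timings:)
# Sketch: time the gradient computation and the full update separately.
# apply_op recomputes the gradients, so the variable-update cost is
# approximately the difference between the two measured durations.
grads_and_vars = optimizer.compute_gradients(total_loss)
apply_op = optimizer.apply_gradients(grads_and_vars)

t0 = time.time()
sess.run([g for g, v in grads_and_vars])  # gradients only
t1 = time.time()
sess.run(apply_op)                        # gradients + variable update
t2 = time.time()
print('grads: %0.0f ms, grads+update: %0.0f ms'
      % ((t1 - t0) * 1000, (t2 - t1) * 1000))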
import tensorflow as tf
import tensorflow.contrib.keras as keras
import time
import os
nclasses = 128
batch_size = 20
def main(unused_argv):
    # Specify GPU index
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    # Configure session
    gpu_options = tf.GPUOptions(allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options,
                            log_device_placement=False)
    with tf.Session(config=config) as sess:
        # Dummy random inputs
        dummy_input = tf.random_uniform((batch_size, 299, 299, 3),
                                        minval=-1,
                                        maxval=1,
                                        dtype=tf.float32)
        dummy_input = tf.Print(dummy_input, [0], message='dummy_input')
        dummy_reference = tf.random_uniform((batch_size, 1),
                                            minval=0,
                                            maxval=nclasses,
                                            dtype=tf.int32)
        dummy_reference = tf.to_float(dummy_reference)
        # Xception network
        keras.backend.set_learning_phase(True)
        xmod = keras.applications.xception.Xception(weights=None,
                                                    input_tensor=dummy_input,
                                                    classes=nclasses)
        predictions = xmod.output
        predictions = tf.Print(predictions, [0], message='predictions')
        # Cross-entropy loss
        loss = keras.losses.categorical_crossentropy(dummy_reference,
                                                     predictions)
        total_loss = tf.reduce_mean(loss)
        total_loss = tf.Print(total_loss, [0], message='total_loss')
        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.01)
        # Training op wrapper
        train_op = tf.contrib.training.create_train_op(total_loss, optimizer)
        train_op = tf.Print(train_op, [0], message='train_op')
        # Initialize
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        # Set up the profiler
        option_builder = tf.profiler.ProfileOptionBuilder
        profiler = tf.profiler.Profiler(sess.graph)
        run_meta = tf.RunMetadata()
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        profile_op = train_op.op
        # Profile on the 5th repetition
        nrep = 5
        for i in range(nrep):
            tstart = time.time()
            if i < nrep - 1:
                sess.run(profile_op)
                duration = time.time() - tstart
                print('i = %d : Duration = %0.0f ms' % (i, duration * 1000))
            else:
                sess.run(profile_op,
                         options=run_options,
                         run_metadata=run_meta)
                duration = time.time() - tstart
                print('i = %d : Duration = %0.0f ms' % (i, duration * 1000))
                profiler.add_step(i, run_meta)
                # Profile the parameters of the model.
                profiler.profile_name_scope(
                    options=option_builder.trainable_variables_parameter())
                # Or profile the timing of the model operations.
                opts = option_builder.time_and_memory()
                profiler.profile_operations(options=opts)
                # Or generate a timeline.
                filename = 'tfprof.json'
                opts = (option_builder(option_builder.time_and_memory())
                        .with_step(i)
                        .with_timeline_output(filename)
                        .build())
                profiler.profile_graph(options=opts)
                # Profiler advice.
                ALL_ADVICE = {'ExpensiveOperationChecker': {},
                              'AcceleratorUtilizationChecker': {},
                              'JobChecker': {},  # Only available internally.
                              'OperationChecker': {}}
                profiler.advise(ALL_ADVICE)

if __name__ == '__main__':
    tf.app.run()
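(In case it helps anyone reproduce this: the same RunMetadata can also be dumped straight to a Chrome trace with TensorFlow's timeline module. This is an internal module, so treat it as a sketch that may change across versions; the resulting JSON, like tfprof.json above, loads in chrome://tracing:)
# Sketch: write the raw step stats from the profiled run to a Chrome
# trace. run_meta is the RunMetadata filled in by the profiled sess.run.
from tensorflow.python.client import timeline

tl = timeline.Timeline(run_meta.step_stats)
with open('raw_timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format(show_dataflow=True))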