TensorFlow Seq2Seq training time per minibatch increases monotonically

Date: 2017-06-22 17:43:25

Tags: tensorflow

I am training a tensorflow.contrib.seq2seq encoder-decoder model, and the training time for each minibatch increases monotonically:

    Step Number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
    Step Number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
    Step Number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
    Step Number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
    Step Number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
    Step Number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
    Step Number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
    Step Number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
    Step Number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}

All of my data is artificially generated and randomly sampled, which means that (in general) there should be no difference between minibatches early in training and minibatches later in training. Furthermore, all of my data has the same input sequence length and the same output sequence length. Why does my model take longer to train on later minibatches?

I found this related post, but I am not changing my computation graph during the training loop.

To show some code, let's start with main:

def main(_):
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline()

    model = import_model()

    train(model=model, x_minibatch=x_minibatch, y_minibatch=y_minibatch, y_lengths_minibatch=y_lengths_minibatch)


My data is stored as SequenceExamples, one per TFRecord file. My construct_data_pipeline() function is defined as follows:

def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(x=x, y=y,
                                                                      y_len=y_len,
                                                                      x_len=x_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch

Moving on to construct_examples_queue():

def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(tfrecords_filenames,
                                                                            tf.TFRecordReader,
                                                                            num_readers=number_of_readers)

        x, y, x_len, y_len = parse_example(example_serialized)

        return x, y, x_len, y_len

I don't think I can show parse_example, since the data isn't my own. The main part is that I specify what the SequenceExample contains and then call:

    context_parsed, sequence_parsed = tf.parse_single_sequence_example(example_serialized,
                                                                   context_features=context_features,
                                                                   sequence_features=sequence_features)

As for how I construct minibatches, I use:

def construct_minibatches(x, y, y_len, x_len,
                      bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):

    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        _, outputs = tf.contrib.training.bucket_by_sequence_length(input_length=x_len,
                                                               tensors=[x, y, y_len],
                                                               batch_size=batch_size,
                                                               bucket_boundaries=bucket_boundaries,
                                                               dynamic_pad=True,
                                                               capacity=2 * batch_size,
                                                               allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]
        return x_minibatch, y_minibatch, y_lengths_minibatch
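With these boundaries, bucket_by_sequence_length groups examples into length buckets [0, 400), [400, 500), …, [last_boundary, ∞) so that sequences of similar length get padded together. Here is a small plain-Python sketch of that bucket assignment (an approximation of the grouping logic, not the library code itself, using a hypothetical max_x_len of 1000):

```python
import bisect

def bucket_id(seq_len, boundaries):
    """Index of the bucket a sequence of this length falls into.

    Bucket i covers [boundaries[i-1], boundaries[i]); lengths at or
    beyond the last boundary fall into the final catch-all bucket.
    """
    return bisect.bisect_right(boundaries, seq_len)

boundaries = list(range(400, 1000, 100))   # [400, 500, 600, 700, 800, 900]

print(bucket_id(250, boundaries))   # 0 -> bucket [0, 400)
print(bucket_id(400, boundaries))   # 1 -> bucket [400, 500)
print(bucket_id(999, boundaries))   # 6 -> catch-all bucket [900, inf)
```

Since all of my sequences have the same length, every example lands in the same bucket, so bucketing itself shouldn't cause step-to-step variation.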

Note: I had to change some variable names for privacy reasons. Hopefully I didn't make any mistakes.

1 Answer:

Answer 0 (score: 1)

Credit goes to faddy-w for solving both of my problems at once!

It turns out I was changing my computation graph without knowing it.

I was calling

    sess.run([model.optimizer.minimize(model.loss), model.y_predicted_logits],
             feed_dict={model.x: x_values, model.y_actual: y_values,
                        model.y_actual_lengths: y_lengths_values})

from within my training loop, not knowing that model.optimizer.minimize(model.loss) adds new operations to the computation graph every time it is called. The graph therefore grew with each step, making every sess.run progressively slower. Building the minimize op once, before the loop, and running that same op inside the loop fixed the slowdown.
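To see why this causes a monotonic slowdown, here is a minimal, framework-free sketch (plain-Python stand-ins, not real TensorFlow) contrasting the anti-pattern of building the minimize op inside the loop with building it once:

```python
class Graph:
    """Toy stand-in for a TensorFlow computation graph (not real TF)."""

    def __init__(self):
        self.ops = []

    def minimize(self, loss_name):
        # Each call appends fresh gradient ops to the graph, just as
        # tf.train.Optimizer.minimize() does on every invocation.
        op = f"grad({loss_name})#{len(self.ops)}"
        self.ops.append(op)
        return op

# Anti-pattern: build the minimize op inside the loop -- the graph
# grows on every step, so each step has more to process than the last.
g1 = Graph()
for step in range(100):
    g1.minimize("loss")
assert len(g1.ops) == 100

# Fix: build the op once, then reuse the same op every step.
g2 = Graph()
train_op = g2.minimize("loss")
for step in range(100):
    pass  # sess.run(train_op) would go here; the graph stays fixed
assert len(g2.ops) == 1
```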