Initializing an RNN cell's initial state with a placeholder in a TensorFlow Estimator model_fn

Date: 2018-08-12 15:33:37

Tags: python tensorflow recurrent-neural-network tensorflow-estimator

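I'm having trouble configuring the cell's initial_state so that I can use different batch sizes for training and prediction. Essentially, during training I feed fixed-size mini-batches, whereas at prediction time I predict one input at a time and then feed it back into the model to get the next output.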

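However, I can't build a graph in which the first dimension of the cell's initial_state is configurable. Here is a simple model_fn that models character input: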

def model_fn(features, labels, mode, params):
    inputs = tf.one_hot(features, params["VOCAB_SIZE"], 1.0, 0.0)

    cell = tf.nn.rnn_cell.MultiRNNCell([
        tf.nn.rnn_cell.GRUCell(params["INTERNAL_SIZE"]) for _ in range(params["NUM_LAYERS"])
    ], state_is_tuple=False)

    pkeep = params["DROPOUT_PKEEP"] if mode == tf.estimator.ModeKeys.TRAIN else 1.0
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=pkeep)

    initial_state = tf.get_variable(
        "initial_state", 
        dtype=tf.float32, 
        initializer=cell.zero_state(params["BATCH_SIZE"], dtype=tf.float32),
    )

    if mode == tf.estimator.ModeKeys.EVAL:
        initial_state = cell.zero_state(1, dtype=tf.float32)

    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

    if mode != tf.estimator.ModeKeys.EVAL:
        tf.assign(initial_state, final_state)

    logits = ...

    if mode == tf.estimator.ModeKeys.PREDICT:
        logits = tf.reshape(logits, [-1, 1, 98])
    else:
        logits = tf.reshape(logits, [-1, features.shape[1], 98])

    probabilities = tf.nn.softmax(logits)
    predictions = tf.argmax(probabilities, 2)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = { "predictions": predictions, "probabilities": probabilities }
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = ...

    if mode == tf.estimator.ModeKeys.EVAL:
        accuracy = tf.metrics.accuracy(labels=labels, predictions=predictions)
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops={
            "accuracy": accuracy,
        })

    optimizer = tf.train.AdamOptimizer(learning_rate=params["LEARNING_RATE"])
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

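The problem is that I have to pass BATCH_SIZE in the params used to define initial_state, and during training that value is something like 200. At test time, with a batch of one, I get the error message "[1, 384] dimensional tensor cannot be assigned to a [200, 384] dimensional variable". How can I make the dimensions of initial_state configurable depending on the mode?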
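The shape mismatch itself can be reproduced without TensorFlow. The sketch below is a framework-free illustration in plain Python, not the model_fn above: the helpers zero_state, rnn_step and run_rnn are hypothetical names, and the recurrence is a toy. It only shows that a state whose first dimension is fixed at 200 cannot be combined with a batch of one, while a zero state sized from the incoming batch works for both cases. In TensorFlow 1.x the analogous idea would presumably be to pass a batch size derived from the inputs (for example tf.shape(inputs)[0]) to cell.zero_state, which accepts a scalar tensor. That is an assumption about how this model_fn might be adapted, not a tested fix.

```python
# Framework-free illustration of the shape problem; not TensorFlow code.
# All helper names here are hypothetical.

def zero_state(batch_size, state_size):
    """Return a batch_size x state_size matrix of zeros."""
    return [[0.0] * state_size for _ in range(batch_size)]


def rnn_step(state, inputs):
    """Toy recurrence: add each example's input sum to its state row.

    Raises ValueError when the batch dimensions disagree, mirroring the
    shape check that fails in the question.
    """
    if len(state) != len(inputs):
        raise ValueError(
            "state has batch %d but inputs have batch %d"
            % (len(state), len(inputs)))
    return [[s + sum(x) for s in row] for row, x in zip(state, inputs)]


def run_rnn(inputs, state_size, initial_state=None):
    """Run one step; size the zero state from the inputs if none is given."""
    if initial_state is None:
        initial_state = zero_state(len(inputs), state_size)
    return rnn_step(initial_state, inputs)


STATE_SIZE = 384  # matches the 384 in the error message

train_batch = [[1.0, 0.0]] * 200  # 200 examples, as in training
predict_batch = [[0.0, 1.0]]      # a single example, as in prediction

# Sizing the state from the inputs works for both batch sizes.
train_state = run_rnn(train_batch, STATE_SIZE)
predict_state = run_rnn(predict_batch, STATE_SIZE)
print(len(train_state), len(train_state[0]))      # 200 384
print(len(predict_state), len(predict_state[0]))  # 1 384

# Reusing a state fixed at batch size 200 for a single input fails.
try:
    run_rnn(predict_batch, STATE_SIZE, initial_state=train_state)
except ValueError as err:
    print("mismatch:", err)
```

Note that a zero state rebuilt on every call does not carry the final state over between calls, which the tf.assign in the model_fn above tries to do. Carrying state across prediction steps would need a separate mechanism.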

0 answers:

No answers