TensorFlow LSTM for imdb task is strongly overfitting

Time: 2016-08-31 18:32:17

Tags: tensorflow recurrent-neural-network lstm imdb

I'm currently trying to get into TensorFlow and recurrent neural networks. I modified the "ptb_word_lm" example to build a model for the imdb task. I pushed the modified source code to

https://github.com/MathiasKraus/LSTM_imdb

The parameters that I'm using are:

num_layers = 2
num_steps = 50
hidden_size = 128
keep_prob = 0.5
batch_size = 32
vocab_size = 10000

The network definition is:

    self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
    self._target = tf.placeholder(tf.float32, [batch_size, n_classes])

    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers, state_is_tuple=True)

    self._initial_state = cell.zero_state(batch_size, tf.float32)

    with tf.device("/cpu:0"):
        embedding = tf.get_variable("embedding", [vocab_size, size], dtype=tf.float32)
        inputs = tf.nn.embedding_lookup(embedding, self._input_data)

    output, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=self._initial_state)

    output = tf.transpose(output, [1, 0, 2])
    last = tf.gather(output, int(output.get_shape()[0]) - 1)
    softmax_w = tf.get_variable("softmax_w", [size, n_classes], dtype=tf.float32)
    softmax_b = tf.get_variable("softmax_b", [n_classes], dtype=tf.float32)
    logits = tf.matmul(last, softmax_w) + softmax_b

    self._cost = cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, self._target))
    self._train_op = tf.train.AdamOptimizer().minimize(cost)

The training part looks good; however, the model starts overfitting after a few epochs. After filtering out very short and very long samples (I don't use padding yet), I have ~11k samples for training.
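(As a side note, the keep_prob listed above is not applied anywhere in the snippet yet. A rough sketch of how it could be wired into the cells with DropoutWrapper — illustrative only, with is_training as an assumed flag — would be:)

    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
    # Apply dropout on the cell outputs only while training; keep_prob is the
    # value from the parameter list above, is_training is an assumed flag.
    if is_training and keep_prob < 1.0:
        lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers, state_is_tuple=True)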

Do you have an idea why this model is overfitting so quickly? Or do you have any suggestions for parameter settings?

I would appreciate any help.

Edit: To load the sequences, I use http://deeplearning.net/tutorial/code/imdb.py. After loading the data, I filter out sequences that are shorter than the number of steps I want to process in the LSTM. That way, I don't need to pad with zeros. I also cut longer sequences down to the specified size. A rough sketch of this step is shown below.
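(Illustrative sketch of the loading and filtering step; it assumes imdb.load_data returns (train, valid, test) splits of (sequences, labels) as in the linked tutorial file, which may differ from the exact code in the repository:)

    import imdb  # the imdb.py linked above

    num_steps = 50
    vocab_size = 10000

    # Assumption: load_data returns (train, valid, test), each a pair of
    # (list of word-index sequences, list of labels), as in the tutorial code.
    train, valid, test = imdb.load_data(n_words=vocab_size)
    train_x, train_y = train

    # Drop sequences shorter than num_steps (so no zero padding is needed)
    # and truncate longer sequences to exactly num_steps.
    filtered = [(seq[:num_steps], label)
                for seq, label in zip(train_x, train_y)
                if len(seq) >= num_steps]
    train_x, train_y = zip(*filtered)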

1 Answer:

Answer 0 (score: 1):

In case anyone wants to try the imdb task with an LSTM and TensorFlow:

I changed the parameters based on https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py. With these parameters, the model reaches a test accuracy of about 82%. I also pushed the code to GitHub.
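(For orientation, the kind of parameter setup this points at looks roughly like the following. The values are illustrative, written in the style of the question's config; the exact numbers in the repository and in the Keras script may differ:)

    class IMDBConfig(object):
        # Illustrative values only -- not necessarily the exact settings used.
        num_layers = 1      # a single LSTM layer instead of two
        num_steps = 80      # shorter truncation length per review
        hidden_size = 128
        keep_prob = 0.5     # dropout actually applied during training
        batch_size = 32
        vocab_size = 20000  # larger vocabulary, more training data retained
        max_epochs = 4      # stop early, before overfitting sets in
        n_classes = 2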

Thanks for your help!