I'm currently trying to get into TensorFlow and recurrent neural networks. I modified the "ptb_word_lm" example to build a model for the IMDB task. I pushed the modified source code to
https://github.com/MathiasKraus/LSTM_imdb
The parameters that I'm using are:
num_layers = 2
num_steps = 50
hidden_size = 128
keep_prob = 0.5
batch_size = 32
vocab_size = 10000
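For reference, a minimal sketch of how these values could be grouped into a config object in the style of the ptb_word_lm example (the class name ImdbConfig and n_classes = 2 are my own assumptions, not taken from the repository):

class ImdbConfig(object):
    """Hyperparameters for the IMDB LSTM model (values as listed above)."""
    num_layers = 2       # number of stacked LSTM layers
    num_steps = 50       # sequence length fed to the LSTM
    hidden_size = 128    # LSTM cell size (called `size` in the code below)
    keep_prob = 0.5      # dropout keep probability
    batch_size = 32
    vocab_size = 10000
    n_classes = 2        # assumed: positive / negative review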
The network definition is:

# Placeholders for one batch of word-id sequences and their one-hot targets
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
self._target = tf.placeholder(tf.float32, [batch_size, n_classes])

# Stack of LSTM cells (note: keep_prob is not applied anywhere in this graph)
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers, state_is_tuple=True)

self._initial_state = cell.zero_state(batch_size, tf.float32)

# Word embeddings are kept on the CPU
with tf.device("/cpu:0"):
    embedding = tf.get_variable("embedding", [vocab_size, size], dtype=tf.float32)
    inputs = tf.nn.embedding_lookup(embedding, self._input_data)

# Unroll the stacked LSTM over the whole sequence
output, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=self._initial_state)

# Take the output of the last time step: [batch, time, size] -> [time, batch, size]
output = tf.transpose(output, [1, 0, 2])
last = tf.gather(output, int(output.get_shape()[0]) - 1)

# Softmax classification layer on top of the last LSTM output
softmax_w = tf.get_variable("softmax_w", [size, n_classes], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [n_classes], dtype=tf.float32)
logits = tf.matmul(last, softmax_w) + softmax_b

self._cost = cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=self._target))
self._train_op = tf.train.AdamOptimizer().minimize(cost)
The training part looks fine; however, the model starts overfitting after a few epochs. After filtering out very short and very long samples (I don't use padding yet), I have ~11k samples for training.
Do you have an idea why this model overfits so quickly, or any suggestions for parameter settings?
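One thing I notice is that keep_prob = 0.5 is listed above but never actually used in the network definition. A rough sketch of how dropout could be wrapped around the LSTM cells (untested, and the is_training flag is an assumption):

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
if is_training and config.keep_prob < 1:
    # Apply dropout to the outputs of each LSTM layer during training only
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers, state_is_tuple=True)

The ptb_word_lm example additionally applies tf.nn.dropout to the embedded inputs during training.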
I would appreciate any help.
Edit: To load the sequences, I use http://deeplearning.net/tutorial/code/imdb.py. After loading the data, I filter out sequences that are shorter than the number of steps I want to process in the LSTM, so I don't need zero-padding. I also truncate longer sequences to the specified length.
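A minimal sketch of that filtering and truncation step (the function and variable names are illustrative, not taken from imdb.py):

def filter_and_truncate(sequences, labels, num_steps):
    """Keep only sequences with at least num_steps tokens and cut them to that length."""
    xs, ys = [], []
    for seq, label in zip(sequences, labels):
        if len(seq) >= num_steps:
            xs.append(seq[:num_steps])  # truncate longer reviews
            ys.append(label)
    return xs, ys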
Answer 0 (score: 1):
In case anyone wants to try the IMDB task with an LSTM and TensorFlow:
I changed the parameters based on https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py. With these parameters, the model reaches a test accuracy of about 82%. I also pushed the code to GitHub.
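For reference, my reading of that Keras script suggests roughly the following settings, translated to the names used above (approximate, and the exact values in the repository may differ):

# Settings in the spirit of the Keras imdb_lstm example (approximate; check the
# linked script and the GitHub repository for the values actually used)
num_steps = 80        # maxlen in the Keras script
vocab_size = 20000    # max_features in the Keras script
hidden_size = 128     # LSTM units and embedding size
batch_size = 32
keep_prob = 0.8       # the Keras script uses dropout of 0.2, i.e. a keep probability of 0.8
num_layers = 1        # the Keras example uses a single LSTM layer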
Thanks for your help!