我在为TensorFlow中的文本分类构建堆叠LSTM模型时迷失了方法。
我的输入数据类似于:
x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
...... #I trained the network in batch with batch size set to 32.
]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification
我的代码的骨架如下:
self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')
lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)
inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
tf.split(1, self.max_seq_length, inputs)]
outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)
# We only care about the output of the last RNN cell...
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2]))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)
init = tf.initialize_all_variables()
with tf.Session() as sess:
initializer = tf.random_uniform_initializer(-0.04, 0.04)
with tf.variable_scope("model", reuse=True, initializer=initializer):
sess.run(init)
# generate batches here (omitted for clarity)
print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})
问题在于,无论数据集有多大,损失和准确性都没有改善的迹象(看起来完全随机)。我做错了吗?
更新
# First, load Word2Vec model in Gensim.
model = Doc2Vec.load(word2vec_path)
# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}
# Third, read data from a text file.
for fname in fnames:
i = 0
with codecs.open(fname, 'r', encoding='utf8') as fr:
for line in fr:
tmp = []
for t in line.split():
tmp.append(t)
X_train.append(tmp)
i += 1
if i is samples_count:
break
# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
rowLen = len(data[rowNo])
for colNo in xrange(rowLen):
word = data[rowNo][colNo]
if word in w2vec:
result[rowNo][colNo] = w2vec[word]
else:
result[rowNo][colNo] = [0] * self.vocab_dim
for colPadding in xrange(rowLen, self.max_seq_length):
result[rowNo][colPadding] = [0] * self.vocab_dim
return result
# Fifth, generate batches and feed them to the model.
... Trivias ...
答案 0 :(得分:1)
以下是可能没有培训的几个原因和尝试的建议:
您不允许更新单词向量,预先学习的向量空间可能无法正常工作。
RNN在训练时确实需要渐变剪裁。您可以尝试添加this。
单位比例初始化似乎效果更好,因为它考虑了图层的大小,并允许渐变在更深的时候正确缩放。
你应该尝试删除dropout和第二层 - 只是为了检查你的数据传递是否正确以及你的损失是否正在下降。
我还建议您在数据中使用此示例:https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py
它从头开始训练单词向量,已经有渐变剪辑并使用通常更容易训练的GRUCell。您还可以通过运行self.navigationItem.setHidesBackButton(true, animated:false);
来查看损失和其他内容的良好可视化。