What is the difference between the final cell state and the RNN output of an LSTM in TensorFlow?

Asked: 2018-04-04 10:56:25

Tags: python tensorflow deep-learning lstm rnn

I am trying to understand LSTMs in TensorFlow. I am using tf.nn.bidirectional_dynamic_rnn for a simple classification task. It returns two things: the output of every cell, and the hidden state of the last cell. My confusion is this: if I feed the final output (the output at the last timestep) into the next fully connected layer, training takes far too much time and too many iterations (even 10,000 iterations are not enough to reduce the loss), whereas if I feed the final state output into the next layer, it gives good results in only 500 iterations:

My classification data is:

vocab_ = {'\xa0': 60, 'S': 26, 'W': 30, 'É': 62, 'Á': 61, 'ò': 75, 'ê': 71, 'õ': 77, 'ñ': 74, 'J': 17, 'o': 48, ',': 3, "'": 2, 'g': 40, 'Q': 24, 'ż': 87, 'B': 9, 'ç': 68, 'O': 22, 'N': 21, 'D': 11, 'd': 37, 'x': 57, 'q': 50, 'L': 19, 'z': 59, 'U': 28, 'F': 13, 'w': 56, 't': 53, 'h': 41, 'j': 43, '1': 6, 'r': 51, 'e': 38, 'K': 18, 'k': 44, 'ú': 80, 'a': 34, 'ü': 81, 'é': 70, 'I': 16, 'Y': 32, 'ì': 72, 'ó': 76, 'A': 8, 'c': 36, 'E': 12, 'i': 42, 'G': 14, 'à': 64, 'y': 58, 'V': 29, 'C': 10, 'X': 31, 'ä': 67, '0': 0, 'b': 35, 's': 52, '/': 5, 'n': 47, 'p': 49, 'ö': 78, 'ą': 82, ' ': 1, 'Ż': 86, 'l': 45, 'á': 65, 'ù': 79, ':': 7, 'u': 54, 'Z': 33, 'è': 69, 'Ś': 85, 'm': 46, '-': 4, 'ł': 83, 'T': 27, 'P': 23, 'ń': 84, 'R': 25, 'í': 73, 'ã': 66, 'ß': 63, 'v': 55, 'M': 20, 'H': 15, 'f': 39}


sequences=[[18, 41, 48, 54, 51, 58, 0, 0],[18, 41, 48, 54, 51, 58, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [11, 34, 41, 38, 51, 0, 0, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 59, 34, 51, 42, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 42, 39, 38, 41, 0, 0], [14, 54, 42, 51, 40, 54, 42, 52], [9, 34, 35, 34, 0, 0, 0, 0], [26, 34, 35, 35, 34, 40, 41, 0], [8, 53, 53, 42, 34, 0, 0, 0], [27, 34, 41, 34, 47, 0, 0, 0], [15, 34, 37, 37, 34, 37, 0, 0], [8, 52, 56, 34, 37, 0, 0, 0], [21, 34, 43, 43, 34, 51, 0, 0], [11, 34, 40, 41, 38, 51, 0, 0], [20, 34, 45, 48, 48, 39, 0, 0], [16, 52, 34, 0, 0, 0, 0, 0], [8, 52, 40, 41, 34, 51, 0, 0], [21, 34, 37, 38, 51, 0, 0, 0], [14, 34, 35, 38, 51, 0, 0, 0], [8, 35, 35, 48, 54, 37, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [33, 48, 40, 35, 58, 0, 0, 0], [26, 51, 48, 54, 51, 0, 0, 0], [9, 34, 41, 34, 51, 0, 0, 0], [20, 54, 52, 53, 34, 39, 34, 0], [15, 34, 47, 34, 47, 42, 34, 0], [11, 34, 41, 38, 51, 0, 0, 0], [27, 54, 46, 34, 0, 0, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [26, 34, 45, 42, 35, 34, 0, 0], [26, 41, 34, 46, 48, 48, 47, 0]]


labels_x = [9, 0, 12, 4, 8, 12, 6, 1, 6, 7, 11, 14, 8, 4, 0, 5, 7, 12, 2, 5, 3, 9, 14, 1, 10, 12, 12, 14, 2, 2, 12, 13, 0, 2, 11]

First, if I take the final output instead of the state output, it needs more iterations and the result is poor. Here is the code:

import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

epoch=2

tf.reset_default_graph()

input_x = tf.placeholder(tf.int32,shape=[None,None])

output_y = tf.placeholder(tf.int32,shape=[None,])

word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

sequence_len= tf.count_nonzero(input_x,axis=-1)

with tf.variable_scope('encoder') as scope:

    output,state_output=tf.nn.bidirectional_dynamic_rnn(tf.nn.rnn_cell.LSTMCell(150),tf.nn.rnn_cell.LSTMCell(150),inputs=tf.nn.embedding_lookup(word_embedding,input_x),sequence_length=sequence_len,dtype=tf.float32)


transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])

final_output= tf.concat([transpose_r[-1],transpose_w[-1]],axis=-1)

weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

final_result = tf.matmul(final_output,weights) + bias

#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)

#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)

#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))

#train
train=tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(epoch):
        for j in range(200):
            first,second,third,forth,fifth,_=sess.run([loss,prob,pred,final_result,acc,train],feed_dict={input_x:sequences,output_y:labels_x})



            print("Iteration {}th epoch  {}th loss {}  accuracy {} ".format(j,i,first,fifth))

Output:

Iteration 0th epoch  0th loss 3.558173179626465  accuracy 0.02857142873108387 
Iteration 1th epoch  0th loss 3.556957960128784  accuracy 0.02857142873108387 
Iteration 2th epoch  0th loss 3.5557243824005127  accuracy 0.05714285746216774 

.
.
.
Iteration 197th epoch  1th loss 3.102834939956665  accuracy 0.20000000298023224 
Iteration 198th epoch  1th loss 3.1021459102630615  accuracy 0.20000000298023224 
Iteration 199th epoch  1th loss 3.101456880569458  accuracy 0.20000000298023224 

Process finished with exit code 0

As you can see, after 400 iterations the accuracy is still only about 0.20. Now, if I take the hidden state output instead of the final output,

then the code is:

import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

epoch=2

tf.reset_default_graph()

input_x = tf.placeholder(tf.int32,shape=[None,None])
output_y = tf.placeholder(tf.int32,shape=[None,])

word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

sequence_len= tf.count_nonzero(input_x,axis=-1)

with tf.variable_scope('encoder') as scope:
    output,state_output=tf.nn.bidirectional_dynamic_rnn(tf.nn.rnn_cell.LSTMCell(150),tf.nn.rnn_cell.LSTMCell(150),inputs=tf.nn.embedding_lookup(word_embedding,input_x),sequence_length=sequence_len,dtype=tf.float32)


transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])

state_out = tf.concat([state_output[0].c,state_output[1].c],axis=-1)
weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))

final_result = tf.matmul(state_out,weights) + bias

#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)

#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)


#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))

#train
train=tf.train.AdamOptimizer().minimize(loss)


with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(epoch):
        for j in range(200):
            first,second,third,forth,fifth,_=sess.run([loss,prob,pred,final_result,acc,train],feed_dict={input_x:sequences,output_y:labels_x})
            print("Iteration {}th epoch  {}th loss {}  accuracy {} ".format(j,i,first,fifth))

and the output is:

Iteration 0th epoch  0th loss 3.557037830352783  accuracy 0.0 
Iteration 1th epoch  0th loss 3.553581476211548  accuracy 0.11428571492433548 
Iteration 2th epoch  0th loss 3.549212694168091  accuracy 0.17142857611179352 
Iteration 3th epoch  0th loss 3.5429491996765137  accuracy 0.2857142984867096 
.
.
.
.
.
Iteration 197th epoch  1th loss 0.19866235554218292  accuracy 0.8571428656578064 
Iteration 198th epoch  1th loss 0.19868074357509613  accuracy 0.8571428656578064 
Iteration 199th epoch  1th loss 0.19868910312652588  accuracy 0.8571428656578064 

Process finished with exit code 0

As you can see, it gives good accuracy within the same number of iterations. But if you look at various LSTM classification examples on GitHub, or at any tutorial, you will find that everyone uses the final output rather than the last state output. Am I doing something wrong when taking the final output, and is that why I am not getting good results? Please guide me.

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

This is not a complete answer, but here are a few points that may be useful to you.

You wrote: "I am using tf.nn.bidirectional_dynamic_rnn for simple classification, which returns two things, one is the final result of each cell and the second is the hidden state of the last cell."

That is correct. But when you use an LSTM, according to the documentation, tf.nn.bidirectional_dynamic_rnn returns a pair (outputs, state), where state holds one LSTMStateTuple per direction, each containing the hidden state and cell state of the last valid cell (as determined by the sequence_length parameter) for every example in the batch.
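For concreteness, here is a minimal sketch of what those two return values look like, assuming the encoder graph built in your code above (150-unit cells); the shapes in the comments are what TensorFlow would report for this setup:

output_fw, output_bw = output        # per-timestep outputs of the forward and backward cells
state_fw, state_bw = state_output    # final LSTMStateTuple of each direction

print(output_fw.get_shape())         # (?, ?, 150)  -> [batch, time, units]
print(state_fw.c.get_shape())        # (?, 150)     -> cell state at the last valid timestep
print(state_fw.h.get_shape())        # (?, 150)     -> hidden state at the last valid timestep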

Given that you want to classify the whole sequence (rather than each word), the last state of the LSTM contains a summary of all the previous states, and it is the state after the last valid cell (according to sequence_length). So using only that state is fine, because it already carries the information from all previous timesteps of the sequence. That is why it works well.
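As a side note (not part of the original answer): your second snippet concatenates the cell states (state_output[0].c and state_output[1].c). The same LSTMStateTuple also exposes the hidden state, so a closely related, untested variation would be:

# hypothetical variant: use the hidden state (.h) of the last valid cell instead of the cell state (.c)
state_out = tf.concat([state_output[0].h, state_output[1].h], axis=-1)   # shape [batch, 2*150]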

此处,pair (outputs, state)output包含单元格的所有输出。请记住,您使用0填充每个序列,以使序列的大小相同。如果t^th大于特定示例的t,则sequence length单元格的输出为空,但如果t,则将单元格状态复制到前一单元格中的下一个单元格}大于sequence length

Now, if you use the outputs of the LSTM, you are taking the outputs of all cells, including the padded zero cells that should be discarded. In particular, transpose_w[-1] and transpose_r[-1] pick the output at the last padded timestep, which is all zeros for every example shorter than the maximum length. This is likely what causes the problem.
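If you still want to build the classifier from outputs, a common fix (just a sketch, assuming the output, sequence_len and input_x tensors from the question's code) is to gather, for each example, the output at its own last valid timestep instead of at the padded final position:

# Sketch only: pick each example's last non-padded timestep from the forward outputs.
batch_size = tf.shape(input_x)[0]
last_step = tf.cast(sequence_len, tf.int32) - 1                    # index of the last real timestep per example
gather_idx = tf.stack([tf.range(batch_size), last_step], axis=1)   # rows of [example_index, timestep_index]

last_fw = tf.gather_nd(output[0], gather_idx)   # forward output at each example's last real timestep
last_bw = output[1][:, 0, :]                    # backward output at t=0 has already read the whole sequence

final_output = tf.concat([last_fw, last_bw], axis=-1)              # shape [batch, 2*150]

With this, the padded zero outputs are never used, and training from final_output should behave much more like training from the final state.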