I am trying to understand LSTMs in TensorFlow. I am using tf.nn.bidirectional_dynamic_rnn
for a simple classification task. It returns two things: one is the output of every cell, and the second is the hidden state of the last cell. My confusion is this: if I feed the final output into the next fully connected layer, it takes far too many iterations (even 10,000 iterations are not enough to bring the loss down), whereas if I feed the final state into the next layer, I get good results in only about 500 iterations.
My classification data:
vocab_ = {'\xa0': 60, 'S': 26, 'W': 30, 'É': 62, 'Á': 61, 'ò': 75, 'ê': 71, 'õ': 77, 'ñ': 74, 'J': 17, 'o': 48, ',': 3, "'": 2, 'g': 40, 'Q': 24, 'ż': 87, 'B': 9, 'ç': 68, 'O': 22, 'N': 21, 'D': 11, 'd': 37, 'x': 57, 'q': 50, 'L': 19, 'z': 59, 'U': 28, 'F': 13, 'w': 56, 't': 53, 'h': 41, 'j': 43, '1': 6, 'r': 51, 'e': 38, 'K': 18, 'k': 44, 'ú': 80, 'a': 34, 'ü': 81, 'é': 70, 'I': 16, 'Y': 32, 'ì': 72, 'ó': 76, 'A': 8, 'c': 36, 'E': 12, 'i': 42, 'G': 14, 'à': 64, 'y': 58, 'V': 29, 'C': 10, 'X': 31, 'ä': 67, '0': 0, 'b': 35, 's': 52, '/': 5, 'n': 47, 'p': 49, 'ö': 78, 'ą': 82, ' ': 1, 'Ż': 86, 'l': 45, 'á': 65, 'ù': 79, ':': 7, 'u': 54, 'Z': 33, 'è': 69, 'Ś': 85, 'm': 46, '-': 4, 'ł': 83, 'T': 27, 'P': 23, 'ń': 84, 'R': 25, 'í': 73, 'ã': 66, 'ß': 63, 'v': 55, 'M': 20, 'H': 15, 'f': 39}
sequences=[[18, 41, 48, 54, 51, 58, 0, 0],[18, 41, 48, 54, 51, 58, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [11, 34, 41, 38, 51, 0, 0, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 59, 34, 51, 42, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 42, 39, 38, 41, 0, 0], [14, 54, 42, 51, 40, 54, 42, 52], [9, 34, 35, 34, 0, 0, 0, 0], [26, 34, 35, 35, 34, 40, 41, 0], [8, 53, 53, 42, 34, 0, 0, 0], [27, 34, 41, 34, 47, 0, 0, 0], [15, 34, 37, 37, 34, 37, 0, 0], [8, 52, 56, 34, 37, 0, 0, 0], [21, 34, 43, 43, 34, 51, 0, 0], [11, 34, 40, 41, 38, 51, 0, 0], [20, 34, 45, 48, 48, 39, 0, 0], [16, 52, 34, 0, 0, 0, 0, 0], [8, 52, 40, 41, 34, 51, 0, 0], [21, 34, 37, 38, 51, 0, 0, 0], [14, 34, 35, 38, 51, 0, 0, 0], [8, 35, 35, 48, 54, 37, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [33, 48, 40, 35, 58, 0, 0, 0], [26, 51, 48, 54, 51, 0, 0, 0], [9, 34, 41, 34, 51, 0, 0, 0], [20, 54, 52, 53, 34, 39, 34, 0], [15, 34, 47, 34, 47, 42, 34, 0], [11, 34, 41, 38, 51, 0, 0, 0], [27, 54, 46, 34, 0, 0, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [26, 34, 45, 42, 35, 34, 0, 0], [26, 41, 34, 46, 48, 48, 47, 0]]
labels_x = [9, 0, 12, 4, 8, 12, 6, 1, 6, 7, 11, 14, 8, 4, 0, 5, 7, 12, 2, 5, 3, 9, 14, 1, 10, 12, 12, 14, 2, 2, 12, 13, 0, 2, 11]
First, if I take the final output instead of the state output, it takes more iterations and the result is poor. Here is the code:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
epoch=2
tf.reset_default_graph()
input_x = tf.placeholder(tf.int32,shape=[None,None])
output_y = tf.placeholder(tf.int32,shape=[None,])
word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
sequence_len= tf.count_nonzero(input_x,axis=-1)
with tf.variable_scope('encoder') as scope:
    output, state_output = tf.nn.bidirectional_dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(150), tf.nn.rnn_cell.LSTMCell(150),
        inputs=tf.nn.embedding_lookup(word_embedding, input_x),
        sequence_length=sequence_len, dtype=tf.float32)
transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])
final_output= tf.concat([transpose_r[-1],transpose_w[-1]],axis=-1)
weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
final_result = tf.matmul(final_output,weights) + bias
#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)
#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)
#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))
#train
train=tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epoch):
        for j in range(200):
            first, second, third, forth, fifth, _ = sess.run(
                [loss, prob, pred, final_result, acc, train],
                feed_dict={input_x: sequences, output_y: labels_x})
            print("Iteration {}th epoch {}th loss {} accuracy {} ".format(j, i, first, fifth))
Output:
Iteration 0th epoch 0th loss 3.558173179626465 accuracy 0.02857142873108387
Iteration 1th epoch 0th loss 3.556957960128784 accuracy 0.02857142873108387
Iteration 2th epoch 0th loss 3.5557243824005127 accuracy 0.05714285746216774
.
.
.
Iteration 197th epoch 1th loss 3.102834939956665 accuracy 0.20000000298023224
Iteration 198th epoch 1th loss 3.1021459102630615 accuracy 0.20000000298023224
Iteration 199th epoch 1th loss 3.101456880569458 accuracy 0.20000000298023224
Process finished with exit code 0
As you can see, even after 400 iterations the accuracy is no better than 0.20. Now, if I take the hidden state output instead of the final output,
then the code is:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
epoch=2
tf.reset_default_graph()
input_x = tf.placeholder(tf.int32,shape=[None,None])
output_y = tf.placeholder(tf.int32,shape=[None,])
word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
sequence_len= tf.count_nonzero(input_x,axis=-1)
with tf.variable_scope('encoder') as scope:
    output, state_output = tf.nn.bidirectional_dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(150), tf.nn.rnn_cell.LSTMCell(150),
        inputs=tf.nn.embedding_lookup(word_embedding, input_x),
        sequence_length=sequence_len, dtype=tf.float32)
transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])
state_out = tf.concat([state_output[0].c,state_output[1].c],axis=-1)
weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
final_result = tf.matmul(state_out,weights) + bias
#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)
#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)
#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))
#train
train=tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epoch):
        for j in range(200):
            first, second, third, forth, fifth, _ = sess.run(
                [loss, prob, pred, final_result, acc, train],
                feed_dict={input_x: sequences, output_y: labels_x})
            print("Iteration {}th epoch {}th loss {} accuracy {} ".format(j, i, first, fifth))
and the output:
Iteration 0th epoch 0th loss 3.557037830352783 accuracy 0.0
Iteration 1th epoch 0th loss 3.553581476211548 accuracy 0.11428571492433548
Iteration 2th epoch 0th loss 3.549212694168091 accuracy 0.17142857611179352
Iteration 3th epoch 0th loss 3.5429491996765137 accuracy 0.2857142984867096
.
.
.
.
.
Iteration 197th epoch 1th loss 0.19866235554218292 accuracy 0.8571428656578064
Iteration 198th epoch 1th loss 0.19868074357509613 accuracy 0.8571428656578064
Iteration 199th epoch 1th loss 0.19868910312652588 accuracy 0.8571428656578064
Process finished with exit code 0
As you can see, with the same number of iterations it reaches good accuracy. But if you look at LSTM classification code on GitHub or in tutorials, everyone takes the final output rather than the last state output. Am I making a mistake by taking the final output, and is that why I am not getting good results? Please guide me.
Thanks in advance.
Answer 0 (score: 0)
This is not a complete answer, but here are a few points that may be useful to you.
"I am using tf.nn.bidirectional_dynamic_rnn for simple classification; it returns two things, one is the final result of every cell and the second is the hidden state of the last cell."
That is correct. But when you use an LSTM, according to the documentation, the output of tf.nn.bidirectional_dynamic_rnn is a pair (outputs, state), where state is an LSTMStateTuple whose hidden state and cell state belong to the last valid cell (the position given by the sequence_length parameter) of each example in the batch.
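As a concrete illustration, here is a minimal self-contained sketch (TF 1.x, toy shapes and values, not part of the question's code) of what that returned state looks like:
import tensorflow as tf

# Minimal sketch: a batch of 2 sequences padded to length 4;
# the second example has only 2 valid steps. Sizes are illustrative only.
tf.reset_default_graph()
inputs = tf.random_uniform([2, 4, 8])          # [batch, time, features]
seq_len = tf.constant([4, 2], dtype=tf.int32)  # true lengths before padding

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(5), tf.nn.rnn_cell.LSTMCell(5),
    inputs=inputs, sequence_length=seq_len, dtype=tf.float32)

fw_state, bw_state = states   # each is an LSTMStateTuple(c=..., h=...)
# .h / .c hold the hidden / cell state of the last *valid* cell
# (index seq_len - 1 per example), not of the padded last time step.
last_hidden = tf.concat([fw_state.h, bw_state.h], axis=-1)  # [batch, 2*5]
last_cell   = tf.concat([fw_state.c, bw_state.c], axis=-1)  # [batch, 2*5]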
Since you want to classify the whole sequence (rather than each word), the last state of the LSTM carries the gist of all the previous steps together with the last valid output (according to sequence_length). So using only the cell state is fine, because from it you already get a summary of everything the sequence has seen. That is why it works well.
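(As a side note, and as an assumption rather than something stated in the question: many implementations use the hidden state .h rather than the cell state .c here; both summarize the whole valid sequence, so a drop-in variation of the second snippet would be:)
# Sketch (assumption): feed the hidden state (.h) instead of the cell state (.c)
# to the fully connected layer; state_output comes from the question's code.
state_out = tf.concat([state_output[0].h, state_output[1].h], axis=-1)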
Here, in the pair (outputs, state), outputs contains the outputs of all the cells. Remember that you pad every sequence with 0 so that all sequences have the same length. The output of the t-th cell is zero if t is greater than the sequence_length of that particular example, but the cell state is still copied forward from the previous cell to the next cell when t exceeds sequence_length.
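A small sketch makes this concrete (same toy setup as above, purely illustrative): past sequence_length the forward outputs are zeros, while the returned state matches the output at the last valid step.
import tensorflow as tf
import numpy as np

# Illustrative check: 2 sequences padded to length 4; the second has 2 valid steps.
tf.reset_default_graph()
inputs = tf.random_uniform([2, 4, 8])
seq_len = tf.constant([4, 2], dtype=tf.int32)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(5), tf.nn.rnn_cell.LSTMCell(5),
    inputs=inputs, sequence_length=seq_len, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    fw_out, fw_state = sess.run([outputs[0], states[0]])
    print(np.allclose(fw_out[1, 2:, :], 0.0))           # True: padded steps emit zeros
    print(np.allclose(fw_out[1, 1, :], fw_state.h[1]))  # True: state.h equals the last valid output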
Now, if you use the outputs of the LSTM, you are taking the outputs of all the cells, including the zero outputs of the padded positions, which should be discarded. That can cause problems.
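If you still want to work from outputs (as most tutorials do), the usual fix is to pick the output at position sequence_length - 1 for each example instead of the last, possibly padded, time step. The sketch below is written against the output and sequence_len tensors from the question's code, so treat it as an assumption about that code rather than a tested drop-in:
# Sketch: take the forward output at the last *valid* step of each example
# (index sequence_len - 1) instead of the final, possibly zero-padded step.
batch_size = tf.shape(output[0])[0]
last_index = tf.cast(sequence_len, tf.int32) - 1
gather_idx = tf.stack([tf.range(batch_size), last_index], axis=1)

fw_last = tf.gather_nd(output[0], gather_idx)  # forward output at its last valid step
bw_last = output[1][:, 0, :]                   # backward direction ends at time step 0
final_output = tf.concat([fw_last, bw_last], axis=-1)  # replacement for the question's concat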