I have written my own code with reference to this wonderful tutorial, and I am not able to get results when using beam search with attention. As per my understanding of the AttentionModel class, the _build_decoder_cell function creates a separate decoder cell and attention wrapper for inference mode, so I have assumed the same here (which I think is incorrect, and I cannot find a way around it):
with tf.name_scope("Decoder"):
mem_units = 2*dim
dec_cell = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_cel = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_width = 3
out_layer = Dense( output_vocab_size )
with tf.name_scope("Training"):
attn_mech = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units, memory = enc_rnn_out, normalize=True)
attn_cell = tf.contrib.seq2seq.AttentionWrapper( cell = dec_cell,attention_mechanism = attn_mech )
batch_size = tf.shape(enc_rnn_out)[0]
initial_state = attn_cell.zero_state( batch_size = batch_size , dtype=tf.float32 )
initial_state = initial_state.clone(cell_state = enc_rnn_state)
helper = tf.contrib.seq2seq.TrainingHelper( inputs = emb_x_y , sequence_length = seq_len )
decoder = tf.contrib.seq2seq.BasicDecoder( cell = attn_cell, helper = helper, initial_state = initial_state ,output_layer=out_layer )
outputs, final_state, final_sequence_lengths= tf.contrib.seq2seq.dynamic_decode(decoder=decoder,impute_finished=True)
training_logits = tf.identity(outputs.rnn_output )
training_pred = tf.identity(outputs.sample_id )
with tf.name_scope("Inference"):
enc_rnn_out_beam = tf.contrib.seq2seq.tile_batch( enc_rnn_out , beam_width )
seq_len_beam = tf.contrib.seq2seq.tile_batch( seq_len , beam_width )
enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch( enc_rnn_state , beam_width )
batch_size_beam = tf.shape(enc_rnn_out_beam)[0] # now batch size is beam_width times
# start tokens mean be the original batch size so divide
start_tokens = tf.tile(tf.constant([27], dtype=tf.int32), [ batch_size_beam//beam_width ] )
end_token = 0
attn_mech_beam = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units, memory = enc_rnn_out_beam, normalize=True)
cell_beam = tf.contrib.seq2seq.AttentionWrapper(cell=beam_cel,attention_mechanism=attn_mech_beam,attention_layer_size=mem_units)
initial_state_beam = cell_beam.zero_state(batch_size=batch_size_beam,dtype=tf.float32).clone(cell_state=enc_rnn_state_beam)
my_decoder = tf.contrib.seq2seq.BeamSearchDecoder( cell = cell_beam,
embedding = emb_out,
start_tokens = start_tokens,
end_token = end_token,
initial_state = initial_state_beam,
beam_width = beam_width
,output_layer=out_layer)
beam_output, t1 , t2 = tf.contrib.seq2seq.dynamic_decode( my_decoder,
maximum_iterations=maxlen )
beam_logits = tf.no_op()
beam_sample_id = beam_output.predicted_ids
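(For reference, and this is my understanding of the API rather than something from the tutorial: with the default output_time_major=False, beam_output.predicted_ids comes back with shape [batch_size, max_time, beam_width], beams sorted best-first, so the top hypothesis per example would be read out as:)

best_beam = beam_output.predicted_ids[:, :, 0]  # highest-scoring beam for each example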
When I call beam_sample_id after training, I am not getting the correct results.

My guess is that we should be using the same attention wrapper for both, but that is not possible, since we have to use tile_sequence for beam search.

Any insights / suggestions would be much appreciated.
I have also created an issue for this in their main repository, Issue-93.

Answer 0 (score: 1)
I am not sure what you mean by "I am not able to get results", but I am assuming that your model is not making use of what was learned during training.

If that is the case, then the first thing you need to know about is variable sharing: you need to get rid of the separate variable scopes between training and inference. To do that, remove
with tf.name_scope("Training"):
and use:
with tf.variable_scope("myScope"):
Then remove
with tf.name_scope("Inference"):
and instead use
with tf.variable_scope("myScope" , reuse=True):
Also, at the start of your with tf.variable_scope("myScope"), add:
enc_rnn_out = tf.contrib.seq2seq.tile_batch( enc_rnn_out , 1 )
seq_len = tf.contrib.seq2seq.tile_batch( seq_len , 1 )
enc_rnn_state = tf.contrib.seq2seq.tile_batch( enc_rnn_state , 1 )
This will ensure that your inference variables and training variables have the same signatures and are shared.
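Put together, a minimal sketch of the intended layout (reusing the tensor names from your question; an outline rather than a drop-in replacement) would be:

# Training graph: all variables are created inside this scope.
with tf.variable_scope("myScope"):
    enc_rnn_out = tf.contrib.seq2seq.tile_batch( enc_rnn_out, 1 )
    seq_len = tf.contrib.seq2seq.tile_batch( seq_len, 1 )
    enc_rnn_state = tf.contrib.seq2seq.tile_batch( enc_rnn_state, 1 )
    ...  # attention mechanism, AttentionWrapper, TrainingHelper, BasicDecoder as above

# Inference graph: reuse=True shares the variables created above
# instead of creating a second, untrained set.
with tf.variable_scope("myScope", reuse=True):
    ...  # tile_batch by beam_width, then AttentionWrapper and BeamSearchDecoder as above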
I have tested this while following the same tutorial that you mentioned. My model is still training as I write this post, but I can already see the accuracy increasing, which indicates that the solution should work for you as well.
Thanks.
Answer 1 (score: 0)
You can use tf.cond() to create different paths for the training and inference stages:
def get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch):
    enc_output = tf.contrib.seq2seq.tile_batch(enc_output, multiplier=useBeamSearch)
    source_sequence_length = tf.contrib.seq2seq.tile_batch(source_sequence_length, multiplier=useBeamSearch)
    enc_state = tf.contrib.seq2seq.tile_batch(enc_state, multiplier=useBeamSearch)
    return enc_output, source_sequence_length, enc_state

## for beam search: at the training stage, use tile_batch with multiplier = 1;
## at the inference stage, use tile_batch with multiplier = useBeamSearch.
## tile_batch simply duplicates every sample in a batch, so it changes batch_size
## to batch_size * useBeamSearch at runtime once batch_size is determined.
enc_output, source_sequence_length, enc_state = tf.cond(
    self.on_infer,  # is this the inference stage?
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=useBeamSearch),
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=1)
)

# attention mechanism
attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=rnn_size, memory=enc_output, memory_sequence_length=source_sequence_length)
dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism)

## for beam search: change batch_size to batch_size * useBeamSearch at the inference stage
decoder_initial_state = tf.cond(
    self.on_infer,  # is this the inference stage?
    lambda: dec_cell.zero_state(batch_size=batch_size * useBeamSearch, dtype=tf.float32),
    lambda: dec_cell.zero_state(batch_size=batch_size * 1, dtype=tf.float32)
)
enc_state = decoder_initial_state.clone(cell_state=enc_state)
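For this to work, self.on_infer has to be a scalar boolean tensor. The snippet above does not show where it comes from; one hypothetical way to define it is a placeholder with a default value, so that training runs need no extra feed:

# Hypothetical definition of the on_infer flag used by the snippet above.
self.on_infer = tf.placeholder_with_default(False, shape=[], name="on_infer")

# Training: on_infer defaults to False, so the multiplier-1 branch is taken.
#   sess.run(train_op, feed_dict={...})
# Inference: feed True to switch to the beam-search branch.
#   sess.run(predictions, feed_dict={self.on_infer: True, ...})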