How to stack two LSTM layers in an encoder/decoder configuration with TensorFlow 2.0

Date: 2019-10-24 16:09:10

Tags: keras lstm tensorflow2.0 encoder-decoder

I have a question that is probably very simple for anyone experienced with TensorFlow 2.0.

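I'm following this guide -> https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism and everything works fine. I don't really understand why it uses this structure (custom tf.keras.Model subclasses) instead of the plain functional API, but since it deals with text, I assume there is a reason.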
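For comparison, the encoder half on its own can be written with the functional API. The sketch below is an illustration under assumptions, not code from the guide: the sizes and the name `build_functional_encoder` are hypothetical. One plausible reason the guide subclasses `tf.keras.Model` instead is visible in `train_step` further down: the decoder is called once per target time step inside a Python loop, with its states threaded through by hand and the encoder outputs passed in for attention, which is easier to express imperatively than as one fixed functional graph.

```python
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 1000
EMBEDDING_SIZE = 32
LSTM_SIZE = 64


def build_functional_encoder(vocab_size, embedding_size, lstm_size):
    """Two stacked LSTMs built with the functional API (a sketch, not the guide's code)."""
    tokens = tf.keras.Input(shape=(None,), dtype="int32")  # (batch, time)
    x = tf.keras.layers.Embedding(vocab_size, embedding_size)(tokens)
    # The first layer only has to hand its full output sequence to the second layer.
    x = tf.keras.layers.LSTM(lstm_size, return_sequences=True)(x)
    # The second layer also returns its final states so a decoder can start from them.
    output, state_h, state_c = tf.keras.layers.LSTM(
        lstm_size, return_sequences=True, return_state=True)(x)
    return tf.keras.Model(tokens, [output, state_h, state_c])


encoder = build_functional_encoder(VOCAB_SIZE, EMBEDDING_SIZE, LSTM_SIZE)
batch = tf.zeros([4, 7], dtype=tf.int32)  # 4 dummy sentences of 7 token ids each
output, state_h, state_c = encoder(batch)
print(output.shape, state_h.shape, state_c.shape)  # (4, 7, 64) (4, 64) (4, 64)
```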

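Next, I tried stacking another LSTM layer on top of the existing one, starting on the encoder side, to see what would happen. I don't really understand how LSTM "states" are used, so I just guessed how they work and wrote a new version of the encoder. The problem is that I get a strange error and I don't know how to fix my code: every guide explains how to write things, but none of them explains why they are written that way.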
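As background on the "states": an LSTM layer carries two tensors from one time step to the next, the hidden state `h` and the cell state `c`. `return_state=True` makes the layer also return their final values, and `initial_state=[h0, c0]` sets their starting values; in the guide's encoder-decoder setup, the encoder's final `(h, c)` become the decoder's initial state. A minimal sketch with arbitrary toy sizes (assumed, not from the guide) shows that with `return_state=True` the layer returns three tensors rather than a single one:

```python
import tensorflow as tf

# Toy input: a batch of 2 sequences, 5 time steps, 3 features per step.
x = tf.random.normal([2, 5, 3])

lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
results = lstm(x)
# With return_state=True the layer returns THREE tensors, not one.
seq_out, state_h, state_c = results
print(len(results))   # 3
print(seq_out.shape)  # (2, 5, 4): hidden state h at every time step
print(state_h.shape)  # (2, 4): hidden state h after the last step
print(state_c.shape)  # (2, 4): cell state c after the last step

# With return_sequences=True, the last time step of seq_out is the final h.
print(bool(tf.reduce_max(tf.abs(seq_out[:, -1, :] - state_h)) < 1e-5))  # True
```

That multi-tensor return value matters as soon as one LSTM layer feeds another.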

So I wrote:

class Encoder2(tf.keras.Model):
    def __init__(self, vocab_size, embedding_size, lstm_size):
        super(Encoder2, self).__init__()
        self.lstm_size = lstm_size
        self.lstm_size2 = int(lstm_size)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.lstm = tf.keras.layers.LSTM(lstm_size, return_sequences=True, return_state=True)
        self.lstm2 = tf.keras.layers.LSTM(self.lstm_size2, return_sequences=True, return_state=True)

    def call(self, sequence, states1, states2):
        embed = self.embedding(sequence)
        output = self.lstm(embed, initial_state=states1)
        output, state_h, state_c = self.lstm2(output, initial_state=states2)

        return output, state_h, state_c

    def init_states(self, batch_size):
        return ([tf.zeros([batch_size, self.lstm_size]),
                tf.zeros([batch_size, self.lstm_size])],
                [tf.zeros([batch_size, self.lstm_size2]),
                tf.zeros([batch_size, self.lstm_size2])]) 
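Each layer in a stack needs its own `[h, c]` pair, each of shape `(batch_size, units)` for that layer, which is what `init_states` above builds. Keras starts an LSTM from all-zero states when no `initial_state` is given, so passing zeros explicitly should give the same result. A small check, with toy sizes chosen only for illustration:

```python
import numpy as np
import tensorflow as tf

batch_size, lstm_size = 2, 4
x = tf.random.normal([batch_size, 6, 3])  # toy input: 6 time steps, 3 features
lstm = tf.keras.layers.LSTM(lstm_size, return_sequences=True, return_state=True)

# Same shape as one entry of init_states(): [h0, c0], each (batch_size, units).
zeros = [tf.zeros([batch_size, lstm_size]), tf.zeros([batch_size, lstm_size])]

out_default, _, _ = lstm(x)                     # no initial_state given
out_zeros, _, _ = lstm(x, initial_state=zeros)  # explicit all-zero state

print(np.allclose(out_default.numpy(), out_zeros.numpy(), atol=1e-6))  # True
```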

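How "states" are used in LSTM layers is not very clear to me, so maybe I made a mistake when building the two layers. Anyway, to see what happens, I ran the code on a single batch of the dataset, like this: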


@tf.function
def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):
    loss = 0
    with tf.GradientTape() as tape:
        en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
        #en_outputs = encoder(source_seq)
        en_states = en_outputs[1:]
        de_state_h, de_state_c = en_states

        # We need to create a loop to iterate through the target sequences
        for i in range(target_seq_out.shape[1]):
            # Input to the decoder must have shape of (batch_size, length)
            # so we need to expand one dimension
            decoder_in = tf.expand_dims(target_seq_in[:, i], 1)
            logit, de_state_h, de_state_c, _ = decoder(
                decoder_in, (de_state_h, de_state_c), en_outputs[0])

            # The loss is now accumulated through the whole batch
            loss += loss_func(target_seq_out[:, i], logit)

    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return loss / target_seq_out.shape[1]

encoder = Encoder2(in_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)
decoder = Decoder(out_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)

en_initial_states = encoder.init_states(BATCH_SIZE)
for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(training_set.take(-1)):
    if(len(source_seq)== BATCH_SIZE):
      train_step(source_seq, target_seq_in,
                          target_seq_out, en_initial_states)
      break
encoder.summary()

I get this error:

AssertionError                            Traceback (most recent call last)
<ipython-input-84-8e6b52715d0e> in <module>()
     13     if(len(source_seq)== BATCH_SIZE):
     14       train_step(source_seq, target_seq_in,
---> 15                           target_seq_out, en_initial_states)
     16       break
     17 encoder.summary()

8 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
    903           except Exception as e:  # pylint:disable=broad-except
    904             if hasattr(e, "ag_error_metadata"):
--> 905               raise e.ag_error_metadata.to_exception(e)
    906             else:
    907               raise

AssertionError: in converted code:

    <ipython-input-67-5f4e7c851f23>:5 train_step  *
        en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:847 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
    <ipython-input-66-0a370f7cebcd>:13 call  *
        output, state_h, state_c = self.lstm2(output, initial_state=states2)
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:620 __call__
        self._num_constants)
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:2716 _standardize_args
        assert initial_state is None and constants is None

    AssertionError: 

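My questions are:

1) Why does the guide use this structure? Wouldn't a single model containing the code for both the encoder and the decoder be simpler?

2) What is the purpose of the states in an LSTM layer?

3) What is wrong with my code?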
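A likely explanation for question 3, based on the traceback: `self.lstm` is built with `return_state=True`, so `self.lstm(embed, ...)` returns `[output, state_h, state_c]`, and that whole list is then passed to `self.lstm2` together with `initial_state=states2`. In TF 2.0's `recurrent.py`, when an RNN layer receives a list as input, `_standardize_args` treats the extra list elements as initial states and asserts that `initial_state` was not also passed separately; that is the `assert initial_state is None and constants is None` line in the error. Below is a minimal sketch of the encoder with the first layer's outputs unpacked; the sizes in the usage lines are hypothetical.

```python
import tensorflow as tf


class Encoder2(tf.keras.Model):
    """Two stacked LSTMs; a corrected sketch of the encoder in the question."""

    def __init__(self, vocab_size, embedding_size, lstm_size):
        super().__init__()
        self.lstm_size = lstm_size
        self.lstm_size2 = int(lstm_size)  # same size as the first layer, as in the question
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.lstm = tf.keras.layers.LSTM(
            self.lstm_size, return_sequences=True, return_state=True)
        self.lstm2 = tf.keras.layers.LSTM(
            self.lstm_size2, return_sequences=True, return_state=True)

    def call(self, sequence, states1, states2):
        embed = self.embedding(sequence)
        # self.lstm returns [output, state_h, state_c] because return_state=True.
        # Unpack it and feed only the sequence output to the second layer;
        # passing the whole list is what triggered the AssertionError.
        output, _, _ = self.lstm(embed, initial_state=states1)
        output, state_h, state_c = self.lstm2(output, initial_state=states2)
        return output, state_h, state_c

    def init_states(self, batch_size):
        # One [h, c] pair per layer, each shaped (batch_size, units).
        return ([tf.zeros([batch_size, self.lstm_size]),
                 tf.zeros([batch_size, self.lstm_size])],
                [tf.zeros([batch_size, self.lstm_size2]),
                 tf.zeros([batch_size, self.lstm_size2])])


# Usage with hypothetical sizes and a dummy batch of token ids.
encoder = Encoder2(vocab_size=1000, embedding_size=32, lstm_size=64)
states1, states2 = encoder.init_states(batch_size=4)
source_seq = tf.zeros([4, 7], dtype=tf.int32)
en_output, en_h, en_c = encoder(source_seq, states1, states2)
print(en_output.shape, en_h.shape, en_c.shape)  # (4, 7, 64) (4, 64) (4, 64)
```

Because `lstm2` has `LSTM_SIZE` units here, the returned `state_h` and `state_c` still match a single-layer decoder built with `LSTM_SIZE`, as in the question's `train_step`.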

Thank you very much for your time.
