Question

I have a problem that is probably very simple for anyone experienced with TensorFlow 2.0.

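I am following this guide -> https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism, and everything works fine. I don't really understand why it uses this structure rather than the simple functional API, but since it is working with text, I assume there is a reason.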

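So I tried stacking another LSTM layer on top of the existing one, starting with the encoder side, to see what would happen. I don't understand the use of the LSTM "states" very well, so I simply assumed how they work and wrote my new version of the encoder. The problem is that I get a strange error and I don't know how to fix my code, because every guide explains how to write things but forgets to explain why.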

So I wrote:

class Encoder2(tf.keras.Model):
    def __init__(self, vocab_size, embedding_size, lstm_size):
        super(Encoder2, self).__init__()
        self.lstm_size = lstm_size
        self.lstm_size2 = (int)(lstm_size)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.lstm = tf.keras.layers.LSTM(lstm_size, return_sequences=True, return_state=True)
        self.lstm2 = tf.keras.layers.LSTM(self.lstm_size2, return_sequences = True, return_state = True)

    def call(self, sequence, states1, states2):
        embed = self.embedding(sequence)
        output = self.lstm(embed, initial_state=states1)
        output, state_h, state_c = self.lstm2(output, initial_state=states2)

        return output, state_h, state_c

    def init_states(self, batch_size):
        return ([tf.zeros([batch_size, self.lstm_size]),
                tf.zeros([batch_size, self.lstm_size])],
                [tf.zeros([batch_size, self.lstm_size2]),
                tf.zeros([batch_size, self.lstm_size2])])

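The use of the "states" on an LSTM layer is not very clear to me, so I may have made a mistake when building the two layers. In any case, I tried the code on a single iteration over the dataset to see what would happen, like this: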


encoder = Encoder2(in_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)
decoder = Decoder(out_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)  

en_initial_states = encoder.init_states(BATCH_SIZE)
for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(training_set.take(-1)):
    if(len(source_seq)== BATCH_SIZE):
      train_step(source_seq, target_seq_in,
                          target_seq_out, en_initial_states)
      break
encoder.summary()

@tf.function
def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):
    loss = 0
    with tf.GradientTape() as tape:
        en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
        #en_outputs = encoder(source_seq)
        en_states = en_outputs[1:]
        de_state_h, de_state_c = en_states

        # We need to create a loop to iterate through the target sequences
        for i in range(target_seq_out.shape[1]):
            # Input to the decoder must have shape of (batch_size, length)
            # so we need to expand one dimension
            decoder_in = tf.expand_dims(target_seq_in[:, i], 1)
            logit, de_state_h, de_state_c, _ = decoder(
                decoder_in, (de_state_h, de_state_c), en_outputs[0])

            # The loss is now accumulated through the whole batch
            loss += loss_func(target_seq_out[:, i], logit)

    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return loss / target_seq_out.shape[1]

I get this error:

AssertionError                            Traceback (most recent call last)
<ipython-input-84-8e6b52715d0e> in <module>()
     13     if(len(source_seq)== BATCH_SIZE):
     14       train_step(source_seq, target_seq_in,
---> 15                           target_seq_out, en_initial_states)
     16       break
     17 encoder.summary()

8 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
    903           except Exception as e:  # pylint:disable=broad-except
    904             if hasattr(e, "ag_error_metadata"):
--> 905               raise e.ag_error_metadata.to_exception(e)
    906             else:
    907               raise

AssertionError: in converted code:

    <ipython-input-67-5f4e7c851f23>:5 train_step  *
        en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:847 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
    <ipython-input-66-0a370f7cebcd>:13 call  *
        output, state_h, state_c = self.lstm2(output, initial_state=states2)
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:620 __call__
        self._num_constants)
    /tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:2716 _standardize_args
        assert initial_state is None and constants is None

    AssertionError:
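
The assertion in the last frame (`assert initial_state is None and constants is None`) fires when an RNN layer receives a list as its input together with an explicit `initial_state`. Because `self.lstm` is built with `return_state=True`, calling it returns three tensors (the sequence output, `state_h` and `state_c`), and `call` forwards all three to `self.lstm2`. Below is a minimal sketch of one possible fix, assuming it is enough to unpack the first layer's result and pass only the sequence output on. The class name `StackedEncoder` and the small sizes in the usage lines are made up for illustration and do not come from the guide.

```python
import tensorflow as tf


class StackedEncoder(tf.keras.Model):
    """Hypothetical two-layer variant of the Encoder2 class from the question."""

    def __init__(self, vocab_size, embedding_size, lstm_size):
        super().__init__()
        self.lstm_size = lstm_size
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.lstm1 = tf.keras.layers.LSTM(
            lstm_size, return_sequences=True, return_state=True)
        self.lstm2 = tf.keras.layers.LSTM(
            lstm_size, return_sequences=True, return_state=True)

    def call(self, sequence, states1, states2):
        embed = self.embedding(sequence)
        # return_state=True yields (sequence_output, state_h, state_c);
        # only the sequence output is fed to the next layer.
        output, _, _ = self.lstm1(embed, initial_state=states1)
        output, state_h, state_c = self.lstm2(output, initial_state=states2)
        return output, state_h, state_c

    def init_states(self, batch_size):
        # One [h, c] pair of zeros per LSTM layer.
        return tuple(
            [tf.zeros([batch_size, self.lstm_size]),
             tf.zeros([batch_size, self.lstm_size])]
            for _ in range(2))


# Illustrative sizes only: vocabulary 50, embedding 8, LSTM 16, batch 4, length 7.
encoder = StackedEncoder(50, 8, 16)
states1, states2 = encoder.init_states(4)
tokens = tf.zeros([4, 7], dtype=tf.int32)
output, state_h, state_c = encoder(tokens, states1, states2)
print(output.shape, state_h.shape, state_c.shape)
```

In this sketch the returned states are those of the second layer only, so `en_outputs[1:]` in `train_step` still unpacks into one `(state_h, state_c)` pair. The decoder should be usable unchanged as long as the second layer keeps the same size as the decoder's LSTM.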

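My questions are:

1) Why does the guide use this structure? Wouldn't a single model containing the code for both the encoder and the decoder be simpler?

2) What is the purpose of the states in an LSTM layer?

3) What is wrong with my code?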
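
On question 2: an LSTM layer carries two tensors from one time step to the next, the hidden state `h` and the cell state `c`. `return_state=True` exposes their values after the last step, and `initial_state` sets their values before the first step, which is what `train_step` relies on when it passes `en_states` to the decoder. A minimal sketch of that behaviour, with arbitrary shapes chosen for illustration: processing a sequence in two halves and handing the states across should give the same final states as processing it in one go.

```python
import numpy as np
import tensorflow as tf

lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
x = tf.random.normal([2, 6, 3])  # (batch, time steps, features), arbitrary

# Whole sequence at once; the initial state defaults to zeros.
out_full, h_full, c_full = lstm(x)

# First half, then second half started from the first half's final states.
_, h_mid, c_mid = lstm(x[:, :3])
_, h_end, c_end = lstm(x[:, 3:], initial_state=[h_mid, c_mid])

# Compare the final states of both runs.
same_h = np.allclose(h_full.numpy(), h_end.numpy(), atol=1e-5)
same_c = np.allclose(c_full.numpy(), c_end.numpy(), atol=1e-5)
# The hidden state is also the layer's output at the last time step.
last_is_h = np.allclose(out_full[:, -1].numpy(), h_full.numpy(), atol=1e-5)
print(same_h, same_c, last_is_h)
```

The states are therefore the layer's memory of everything it has read so far. Feeding an encoder's final states to a decoder as `initial_state` lets the decoder start from that memory instead of from zeros.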

Thank you very much for your time.

How to stack 2 LSTM layers in an encoder/decoder configuration with TensorFlow 2.0

0 answers: