I have a question that is probably very simple for anyone working with TensorFlow 2.0.
I am following this guide -> https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism and everything works fine. I don't really understand the reason for using this structure instead of a simpler functional API, but since it is dealing with text I assume there is a reason for it.
So I tried to stack another LSTM layer on top of the existing one, starting on the encoder side, to see what would happen. The way the LSTM "states" are used is not something I really understand, so I just assumed it would work and wrote my new version of the encoder. The problem is that I get a strange error, and since every guide explains how to write things but forgets to explain why, I don't know how to fix my code.
So I wrote:
class Encoder2(tf.keras.Model):
    def __init__(self, vocab_size, embedding_size, lstm_size):
        super(Encoder2, self).__init__()
        self.lstm_size = lstm_size
        self.lstm_size2 = int(lstm_size)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.lstm = tf.keras.layers.LSTM(lstm_size, return_sequences=True, return_state=True)
        self.lstm2 = tf.keras.layers.LSTM(self.lstm_size2, return_sequences=True, return_state=True)

    def call(self, sequence, states1, states2):
        embed = self.embedding(sequence)
        output = self.lstm(embed, initial_state=states1)
        output, state_h, state_c = self.lstm2(output, initial_state=states2)
        return output, state_h, state_c

    def init_states(self, batch_size):
        return ([tf.zeros([batch_size, self.lstm_size]),
                 tf.zeros([batch_size, self.lstm_size])],
                [tf.zeros([batch_size, self.lstm_size2]),
                 tf.zeros([batch_size, self.lstm_size2])])
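Since the two stacked LSTM layers are the part I am least sure about, here is a tiny standalone snippet showing how I currently picture return_state and initial_state working (the layer size and the dummy shapes are arbitrary, just for illustration):

import tensorflow as tf

# With return_state=True the layer returns (sequence_output, state_h, state_c);
# initial_state expects an [h, c] pair whose shapes match the batch size and units.
demo_lstm = tf.keras.layers.LSTM(8, return_sequences=True, return_state=True)
dummy_input = tf.random.normal([2, 5, 4])           # (batch, timesteps, features)
zero_state = [tf.zeros([2, 8]), tf.zeros([2, 8])]   # [h, c]
seq_out, h, c = demo_lstm(dummy_input, initial_state=zero_state)
print(seq_out.shape, h.shape, c.shape)              # (2, 5, 8) (2, 8) (2, 8)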
Even so, the use of the "states" in the LSTM layers is not really clear to me, and maybe I made some mistake when building these two layers. Anyway, I tried the code on a single iteration of the dataset, just to see what happens, like this:
encoder = Encoder2(in_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)
decoder = Decoder(out_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)

en_initial_states = encoder.init_states(BATCH_SIZE)

for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(training_set.take(-1)):
    if(len(source_seq)== BATCH_SIZE):
        train_step(source_seq, target_seq_in,
                   target_seq_out, en_initial_states)
        break

encoder.summary()
@tf.function
def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):
    loss = 0
    with tf.GradientTape() as tape:
        en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
        #en_outputs = encoder(source_seq)
        en_states = en_outputs[1:]
        de_state_h, de_state_c = en_states

        # We need to create a loop to iterate through the target sequences
        for i in range(target_seq_out.shape[1]):
            # Input to the decoder must have shape of (batch_size, length)
            # so we need to expand one dimension
            decoder_in = tf.expand_dims(target_seq_in[:, i], 1)
            logit, de_state_h, de_state_c, _ = decoder(
                decoder_in, (de_state_h, de_state_c), en_outputs[0])

            # The loss is now accumulated through the whole batch
            loss += loss_func(target_seq_out[:, i], logit)

    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return loss / target_seq_out.shape[1]
And this is the error I get:
AssertionError Traceback (most recent call last)
<ipython-input-84-8e6b52715d0e> in <module>()
13 if(len(source_seq)== BATCH_SIZE):
14 train_step(source_seq, target_seq_in,
---> 15 target_seq_out, en_initial_states)
16 break
17 encoder.summary()
8 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
903 except Exception as e: # pylint:disable=broad-except
904 if hasattr(e, "ag_error_metadata"):
--> 905 raise e.ag_error_metadata.to_exception(e)
906 else:
907 raise
AssertionError: in converted code:
<ipython-input-67-5f4e7c851f23>:5 train_step *
en_outputs = encoder(source_seq, en_initial_states[0], en_initial_states[1])
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:847 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
<ipython-input-66-0a370f7cebcd>:13 call *
output, state_h, state_c = self.lstm2(output, initial_state=states2)
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:620 __call__
self._num_constants)
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py:2716 _standardize_args
assert initial_state is None and constants is None
AssertionError:
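To narrow it down a bit, I put together the tiny standalone snippet below (again with arbitrary sizes), and as far as I can tell it hits the same assertion in _standardize_args. That makes me suspect the problem is in how I pass the result of the first LSTM into the second one, since with return_state=True the first call already returns a list of three tensors, but I may be reading it wrong:

import tensorflow as tf

lstm_a = tf.keras.layers.LSTM(8, return_sequences=True, return_state=True)
lstm_b = tf.keras.layers.LSTM(8, return_sequences=True, return_state=True)

x = tf.random.normal([2, 5, 4])                    # (batch, timesteps, features)
zero_state = [tf.zeros([2, 8]), tf.zeros([2, 8])]  # [h, c]

out_a = lstm_a(x, initial_state=zero_state)        # list: [sequence_output, state_h, state_c]
# Feeding that whole list as the input while also passing initial_state
# seems to trigger the same AssertionError from _standardize_args:
out_b = lstm_b(out_a, initial_state=zero_state)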
My questions are:
1) Why does the guide use this structure? Wouldn't a single model containing both the encoder and the decoder be much simpler?
2) What are the states in an LSTM layer used for?
3) What is wrong with my code?
Thank you very much for your time.