This is the error I ran into in model.fit while trying to build a sequence-to-sequence (encoder-decoder) model with Bahdanau attention. I am facing an argument error; can someone tell me what exactly the problem is?
#Bahdanau attention
#parameters to pass to this attention layer:
'''
1. Encoder states, i.e. state_h and state_c
2. encoder_outputs
3. decoder_embedding, which is in the decoder part
4. you will get a context vector named "input_to_decoder"; pass this as input to the decoder LSTM layer
'''
def Attention_layer(state_h, state_c, encoder_outputs, decoder_embedding):
    d0 = tf.keras.layers.Dense(1024, name='dense_layer_1')
    d1 = tf.keras.layers.Dense(1024, name='dense_layer_2')
    d2 = tf.keras.layers.Dense(1024, name='dense_layer_3')
    #hidden_with_time_axis_1 = tf.keras.backend.expand_dims(state_h, 1)
    #hidden_with_time_axis_1 = state_h
    #hidden_with_time_axis_2 = tf.keras.backend.expand_dims(state_c, 1)
    #hidden_with_time_axis_2 = state_c
    #hidden_states = tf.keras.layers.concatenate([state_h, state_c], axis=-1)
    #all_states = tf.keras.layers.concatenate()
    score = d0(tf.keras.activations.tanh(encoder_outputs) + d1(state_h) + d2(state_c))
    attention_weights = tf.keras.activations.softmax(score, axis=1)
    context_vector = attention_weights * encoder_outputs
    context_vector = tf.keras.backend.sum(context_vector, axis=1)
    context_vector = tf.keras.backend.expand_dims(context_vector, 1)
    context_vector = tf.keras.backend.reshape(context_vector, [-1, -1, 1024])
    input_to_decoder = tf.keras.layers.concatenate([context_vector, decoder_embedding], axis=-1)
    return input_to_decoder
The above is my attention layer.
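As a side note on the error message that appears in the traceback further down: the failing node is a `Reshape`, and both TensorFlow and NumPy enforce the same rule there, namely that at most one dimension of a reshape target may be -1 (the inferred dimension). A minimal NumPy sketch of that rule (illustration only, not part of the original code):

```python
import numpy as np

x = np.arange(12)

# One -1 is allowed: the missing dimension is inferred.
print(x.reshape(-1, 4).shape)  # (3, 4)

# Two -1 entries are ambiguous and raise an error, analogous to
# TensorFlow's "Only one input size may be -1".
try:
    x.reshape(-1, -1, 4)
except ValueError as e:
    print(e)
```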
#Encoder inputs
encoder_inputs = tf.keras.layers.Input(shape=(None,),name='encoder_input_layer')
encoder_embedding = tf.keras.layers.Embedding(vocab_size, 1024, mask_zero=True,name='encoder_embedding_layer')(encoder_inputs)
encoder_outputs , state_h , state_c = tf.keras.layers.LSTM(1024, return_state=True)(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = tf.keras.layers.Input(shape=(None,),name='decoder_input_layer')
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_embedding = tf.keras.layers.Embedding(vocab_size, 1024, mask_zero=True,name='decoder_embedding_layer')(decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM(1024, return_state=True, return_sequences=True)
#Attention layer which is defined in the function above
attention_layer = Attention_layer(state_h, state_c, encoder_outputs, decoder_embedding)
decoder_outputs, _, _ = decoder_lstm(attention_layer, initial_state=encoder_states)
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(vocab_size, activation='softmax'))
output = decoder_dense(decoder_outputs)
# Define the model that will turn `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = tf.keras.models.Model([encoder_inputs, decoder_inputs], output)
#compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy')
#model summary
model.summary()
When I try to fit the model, I get stuck on the following error, and I don't understand what it means:
%%time
model.fit([encoder_input_data, decoder_input_data], decoder_output_data, batch_size=86, epochs=10, validation_split=0.2)
------------------------------------------------------------------------------------------------------------------------------
#Output :
Train on 4644 samples, validate on 1162 samples
Epoch 1/10
86/4644 [..............................] - ETA: 8:18
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-34-781d7ca43c98> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'model.fit([encoder_input_data, decoder_input_data], decoder_output_data, batch_size=86, epochs=10, validation_split=0.2) ')
14 frames
</usr/local/lib/python3.6/dist-packages/decorator.py:decorator-gen-60> in time(self, line, cell, local_ns)
<timed eval> in <module>()
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: Only one input size may be -1, not both 0 and 1
[[node model/tf_op_layer_Reshape/Reshape (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_16029]
Function call stack:
distributed_function
Can anyone help me find where I went wrong?
Answer 0 (score: 0)
These lines are probably your problem:
encoder_inputs = tf.keras.layers.Input(shape=(None,),name='encoder_input_layer')
decoder_inputs = tf.keras.layers.Input(shape=(None,),name='decoder_input_layer')
You can't use shape=(None,); you must at least specify the number of features in the input.
To elaborate on the error you got: the batch dimension is handled automatically and is expected to be -1 (or equivalently None) for dimension 0, since you can always choose to change the batch size. But dimension 1 cannot also be None (which is what you are currently setting), because that is the only non-batch feature dimension: your model does not know the size of the feature input.
This answer provides more information on the valid shapes for different kinds of model inputs.
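Following the answer's suggestion, here is a minimal sketch of what specifying the sequence length on the encoder input looks like (the vocabulary size, sequence length, and layer widths below are hypothetical placeholders, not values from the question):

```python
import tensorflow as tf

vocab_size = 100  # hypothetical vocabulary size
max_len = 20      # hypothetical fixed sequence length

# Specify the sequence length explicitly instead of shape=(None,),
# so only the batch dimension remains unknown.
encoder_inputs = tf.keras.layers.Input(shape=(max_len,), name='encoder_input_layer')
emb = tf.keras.layers.Embedding(vocab_size, 8)(encoder_inputs)
out, state_h, state_c = tf.keras.layers.LSTM(8, return_state=True)(emb)
model = tf.keras.models.Model(encoder_inputs, out)

# model.input_shape is (None, 20): batch dimension free, sequence length fixed.
print(model.input_shape)
```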