I'm trying to build an image captioning model. Can you help with this error? input1 is the image vector and input2 is the caption sequence; the caption length is 32. I want to concatenate the image vector with the embedding of the sequence and then feed it to the decoder model.
def define_model(vocab_size, max_length):
    input1 = Input(shape=(512,))
    input1 = tf.keras.layers.RepeatVector(32)(input1)
    print(input1.shape)
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 512, mask_zero=True)(input2)
    print(e1.shape)
    dec1 = tf.concat([input1, e1], axis=2)
    print(dec1.shape)
    dec2 = LSTM(512)(dec1)
    dec3 = LSTM(256)(dec2)
    dec4 = Dropout(0.2)(dec3)
    dec5 = Dense(256, activation="relu")(dec4)
    output = Dense(vocab_size, activation="softmax")(dec5)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    print(model.summary())
    return model
ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 512]
Answer (score: 1)
This error occurs when an LSTM layer receives a 2D input rather than a 3D one. For example:

(64, 100)

The correct format is (n_samples, time_steps, features):

(64, 5, 100)
In this case, your mistake is that the input to dec3 (an LSTM layer) is the output of dec2 (also an LSTM layer). By default, the return_sequences parameter of an LSTM layer is False, which means the first LSTM returns a 2D tensor that is incompatible with the next LSTM layer. I fixed this by setting return_sequences=True in your first LSTM layer.
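To see the difference directly, here is a minimal sketch (the batch and layer sizes are arbitrary, not from the question):

```python
import tensorflow as tf

# Dummy 3D batch: (n_samples, time_steps, features)
x = tf.random.normal((4, 10, 8))

# Default: only the last hidden state is returned -> 2D output
last_only = tf.keras.layers.LSTM(16)(x)
print(last_only.shape)   # (4, 16)

# return_sequences=True: the hidden state at every step -> 3D output,
# which a stacked LSTM layer can consume
all_steps = tf.keras.layers.LSTM(16, return_sequences=True)(x)
print(all_steps.shape)   # (4, 10, 16)
```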
Also, this line has an error:

model = tf.keras.Model(inputs=[input1, input2], outputs=output)

input1 is not an input layer, because you reassigned it. See:

input1 = Input(shape=(512,))
input1 = tf.keras.layers.RepeatVector(32)(input1)
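A minimal sketch of that fix, keeping the Input tensor in its own variable (the variable names here are illustrative):

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(512,))               # keep the Input handle untouched
rep = tf.keras.layers.RepeatVector(32)(inp)      # give the repeated tensor a new name
model = tf.keras.Model(inputs=inp, outputs=rep)  # inp is still a valid model input
print(model.output_shape)  # (None, 32, 512)
```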
I renamed the second assignment to e0, consistent with how you named your other variables.
Now everything works:
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras import Input
vocab_size, max_length = 1000, 32
input1 = Input(shape=(128,))
e0 = tf.keras.layers.RepeatVector(32)(input1)
print(input1.shape)
input2 = Input(shape=(max_length,))
e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
print(e1.shape)
dec1 = Concatenate()([e0, e1])
print(dec1.shape)
dec2 = LSTM(16, return_sequences=True)(dec1)
dec3 = LSTM(16)(dec2)
dec4 = Dropout(0.2)(dec3)
dec5 = Dense(32, activation="relu")(dec4)
output = Dense(vocab_size, activation="softmax")(dec5)
model = tf.keras.Model(inputs=[input1, input2], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
print(model.summary())
Model: "model_2"
_________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=================================================================================
input_24 (InputLayer) [(None, 128)] 0
_________________________________________________________________________________
input_25 (InputLayer) [(None, 32)] 0
_________________________________________________________________________________
repeat_vector_12 (RepeatVector) (None, 32, 128) 0 input_24[0][0]
_________________________________________________________________________________
embedding_11 (Embedding) (None, 32, 128) 128000 input_25[0][0]
_________________________________________________________________________________
concatenate_7 (Concatenate) (None, 32, 256) 0 repeat_vector_12[0][0]
embedding_11[0][0]
_________________________________________________________________________________
lstm_12 (LSTM) (None, 32, 16) 17472 concatenate_7[0][0]
_________________________________________________________________________________
lstm_13 (LSTM) (None, 16) 2112 lstm_12[0][0]
_________________________________________________________________________________
dropout_2 (Dropout) (None, 16) 0 lstm_13[0][0]
_________________________________________________________________________________
dense_4 (Dense) (None, 32) 544 dropout_2[0][0]
_________________________________________________________________________________
dense_5 (Dense) (None, 1000) 33000 dense_4[0][0]
=================================================================================
Total params: 181,128
Trainable params: 181,128
Non-trainable params: 0
_________________________________________________________________________________
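As a quick sanity check (not part of the original answer), you can rebuild the model above and run it on random dummy data; the output should be one softmax distribution over the vocabulary per sample:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (Concatenate, Dense, Dropout, Embedding,
                                     LSTM, RepeatVector)
from tensorflow.keras import Input

vocab_size, max_length = 1000, 32

# Same architecture as in the answer
input1 = Input(shape=(128,))
e0 = RepeatVector(32)(input1)
input2 = Input(shape=(max_length,))
e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
dec1 = Concatenate()([e0, e1])
dec2 = LSTM(16, return_sequences=True)(dec1)
dec3 = LSTM(16)(dec2)
dec4 = Dropout(0.2)(dec3)
dec5 = Dense(32, activation="relu")(dec4)
output = Dense(vocab_size, activation="softmax")(dec5)
model = tf.keras.Model(inputs=[input1, input2], outputs=output)

# Dummy batch: 8 image vectors and 8 integer caption sequences
imgs = np.random.rand(8, 128).astype("float32")
caps = np.random.randint(1, vocab_size, size=(8, max_length))
preds = model.predict([imgs, caps])
print(preds.shape)  # (8, 1000) -- one distribution over the vocabulary per sample
```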