Error when checking target: expected activation_29 to have shape (1,) but got array with shape (3,)

Date: 2018-05-31 17:54:27

Tags: python tensorflow keras lstm

I am trying to modify Keras's memory neural net using the bAbI dataset so that it outputs multiple words (three, in this case) instead of a single word. For context, this is an NLP question-answering model that uses an LSTM.

Here is a snippet of the model structure:

from keras.layers import Input, Activation, Dense, Dropout, Embedding, LSTM, Permute
from keras.layers import add, concatenate, dot
from keras.models import Model, Sequential

# placeholders
input_sequence = Input((story_maxlen,))
question = Input((query_maxlen,))

# encoders
# embed the input sequence into a sequence of vectors
input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
                              output_dim=64))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)

# embed the input into a sequence of vectors of size query_maxlen
input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
                              output_dim=query_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)

# embed the question into a sequence of vectors
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
                               output_dim=64,
                               input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)

# encode input sequence and questions (which are indices)
# to sequences of dense vectors
input_encoded_m = input_encoder_m(input_sequence)
input_encoded_c = input_encoder_c(input_sequence)
question_encoded = question_encoder(question)

# compute a 'match' between the first input vector sequence
# and the question vector sequence
# shape: `(samples, story_maxlen, query_maxlen)`
match = dot([input_encoded_m, question_encoded], axes=(2, 2))
match = Activation('softmax')(match)

# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c])  # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response)  # (samples, query_maxlen, story_maxlen)

# concatenate the match matrix with the question vector sequence
answer = concatenate([response, question_encoded])

# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer = LSTM(32)(answer)  # (samples, 32)

# one regularization layer -- more would probably be needed.
answer = Dropout(0.3)(answer)
answer = Dense(vocab_size)(answer)  # (samples, vocab_size)
# we output a probability distribution over the vocabulary
answer = Activation('softmax')(answer)

Here is how the model is compiled and trained:

model = Model([input_sequence, question], answer)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit([inputs_train, queries_train], answers_train,
          batch_size=32,
          epochs=num_epochs,
          validation_data=([inputs_test, queries_test], answers_test))

In the example above, the answers_train variable is a 1xn matrix where each item is the answer value for a question. So, for example, the first three answers:

print(answers_train[:3])

outputs:

[16 16 19]

My problem

Here is the change I made to the answers_train variable, where now:

print(answers_train[:3])

outputs:

[[ 0  0 16]
 [ 0  0 27]
 [ 0  0 16]]

Basically, I am trying to get the model to predict three words instead of just one.

When I do this and try to train the model, I get this error:

ValueError: Error when checking target: expected activation_29 to have shape (1,) but got array with shape (3,)

Here is the output of model.summary():

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 552)          0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 5)            0                                            
__________________________________________________________________________________________________
sequential_1 (Sequential)       multiple             2304        input_1[0][0]                    
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 5, 64)        2304        input_2[0][0]                    
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 552, 5)       0           sequential_1[1][0]               
                                                                 sequential_3[1][0]               
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 552, 5)       0           dot_1[0][0]                      
__________________________________________________________________________________________________
sequential_2 (Sequential)       multiple             180         input_1[0][0]                    
__________________________________________________________________________________________________
add_1 (Add)                     (None, 552, 5)       0           activation_1[0][0]               
                                                                 sequential_2[1][0]               
__________________________________________________________________________________________________
permute_1 (Permute)             (None, 5, 552)       0           add_1[0][0]                      
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 5, 616)       0           permute_1[0][0]                  
                                                                 sequential_3[1][0]               
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 32)           83072       concatenate_1[0][0]              
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 32)           0           lstm_1[0][0]                     
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 36)           1188        dropout_4[0][0]                  
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 36)           0           dense_1[0][0]                    
==================================================================================================
Total params: 89,048
Trainable params: 89,048
Non-trainable params: 0
__________________________________________________________________________________________________

What I understand is that the model was built to determine a single-word answer (i.e. shape (1,)), and that I need to modify the model now that I want it to determine a multi-word answer (in this case, shape (3,)). What I don't understand is how to change the model structure to accomplish that.

I don't see anything in the model summary that indicates where the shape (1,) is defined. All I see are the definitions of the maximum story size in words (552), the maximum query/question size in words (5), and the vocabulary size in words (36).

Can anyone help me figure out what I'm doing wrong?

Update #1

As I've continued to research this problem I've learned more. I may be wrong about all of this, since I'm unfamiliar with the finer details of ML and NNs, so feel free to call me out on anything that's off.

  • The shape of the last Dense layer, (None, 36), is sized to the vocabulary, and the purpose of the subsequent softmax Activation layer is to produce a probability vector indicating which word is correct. If that is the case, then by shrinking the last Dense layer to (None, 3), am I losing information? Am I just getting a vector of three probabilities with no indication of which words they apply to? Unless the last Dense layer is indexing into the vectorized vocabulary? In that case I would know which words are being predicted, but then what is the purpose of the subsequent Activation layer?
  • The sparse_categorical_crossentropy loss function squeezes the shape of the final output down to (1,) in ~/keras/engine/training.py on line 770. Does that mean I'm using the wrong loss function? I can't use categorical_crossentropy because I don't want one-hot vector output (the sketch after this list contrasts the two target encodings). Does this mean I need to change the whole model, or would a different loss function give me the output I need?
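
For reference, here is a minimal sketch contrasting the two target encodings, using the first three example answers from above. The index values come from the example data, vocab_size = 36 is taken from the model summary, and everything else is illustrative only.

import numpy as np

vocab_size = 36  # from the model summary above

# sparse_categorical_crossentropy: integer class indices, shape (samples, 1)
sparse_targets = np.array([[16], [16], [19]])

# categorical_crossentropy: the same answers one-hot encoded,
# shape (samples, vocab_size)
onehot_targets = np.zeros((3, vocab_size))
onehot_targets[np.arange(3), sparse_targets[:, 0]] = 1.0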

I suppose what I'm asking is: can this model be adjusted, or do I need a completely different model? I would also greatly appreciate it if you could clear up my confusion on the two points above.

2 answers:

Answer 0 (score: 0)

A simple solution (no promises about how well it will work) would be to just add two more "answer" layers, each with its own weights, and compile the model to output all three:

answer = Dropout(0.3)(answer)

answer_1 = Dense(vocab_size, activation='softmax')(answer)
answer_2 = Dense(vocab_size, activation='softmax')(answer)
answer_3 = Dense(vocab_size, activation='softmax')(answer)

model = Model([input_sequence, question], [answer_1, answer_2, answer_3])

Then pass the labels as a list of three (samples, 1)-shaped arrays; that is, simply pass

first, second, third = answers_train.T

as your labels (a sketch of the corresponding compile/fit calls follows below). This may not be appropriate for your application, though, and you may want to look at other sequence to sequence models.
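
To make that concrete, here is a minimal sketch of what the compile/fit calls might look like for the three-output model, continuing from the snippet above. Reshaping each label array to (samples, 1) matches what sparse_categorical_crossentropy expects per output; this is illustrative, not a guaranteed recipe.

model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

first, second, third = answers_train.T

# one (samples, 1) integer array per output layer
model.fit([inputs_train, queries_train],
          [first[:, None], second[:, None], third[:, None]],
          batch_size=32,
          epochs=num_epochs)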

Answer 1 (score: 0)

You will want a variable number of outputs, which requires a recurrent network on the prediction side as well. Let's try building one on top of your existing network:

from keras.layers import RepeatVector, TimeDistributed

# RepeatVector needs a static integer, so instead of a dynamic "how many
# outputs" input we fix a maximum answer length and pad shorter answers
max_answer_len = 3
# ... continuing from
answer = LSTM(32)(answer)  # (samples, 32)
# answer is your encoded context-query; we decode it into a sequence
answers = RepeatVector(max_answer_len)(answer)  # (samples, max_answer_len, 32)
# another RNN to give us the decoding (optional)
answers = LSTM(32, return_sequences=True)(answers)  # note return_sequences
answers = TimeDistributed(Dense(vocab_size, activation='softmax'))(answers)
# we now have (samples, max_answer_len, vocab_size), i.e. max_answer_len words
# ...

Now your targets have to be 3D as well. Important: you must append an end-of-answer token to each of your answers so that you know when to stop at prediction time. Likewise, you can pad out the number of answers within a batch after the end-of-answer token, so that you get a single tensor.
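
As a minimal sketch of what that target construction might look like: END_TOKEN, pad_token, and build_targets are hypothetical names, reserving the last vocabulary index for end-of-answer is an assumption, and the trailing singleton dimension follows Keras's convention for sparse targets on 3D outputs.

import numpy as np

END_TOKEN = vocab_size - 1  # assumption: last vocab index reserved as end-of-answer
max_answer_len = 3          # longest answer, leaving room for the end token

def build_targets(raw_answers, max_answer_len, end_token, pad_token=0):
    # raw_answers: list of variable-length lists of word indices
    targets = np.full((len(raw_answers), max_answer_len, 1), pad_token, dtype='int32')
    for i, ans in enumerate(raw_answers):
        seq = (list(ans) + [end_token])[:max_answer_len]
        targets[i, :len(seq), 0] = seq
    return targets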

Now at prediction time you can ask for, say, 10 words and cut off everything after the end-of-answer token, similar to how machine translation is done with seq2seq models. For reference, have a look at Dynamic Memory Networks.
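
As an illustration of that cut-off step, a hypothetical decoding loop (reusing the assumed END_TOKEN from the sketch above) might look like:

# predict one sample; pred has shape (max_answer_len, vocab_size)
pred = model.predict([inputs_test[:1], queries_test[:1]])[0]
word_ids = pred.argmax(axis=-1)

# keep words up to (but not including) the end-of-answer token
decoded = []
for idx in word_ids:
    if idx == END_TOKEN:
        break
    decoded.append(int(idx))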