EDIT2: my code is at https://github.com/hcl14/my_simple_LSTM
I have a model with the following structure: two LSTMs (question and answer) and an additional attention layer applied on top of the answer. Here is the version that uses sum and softmax to compare the two outputs:
#question
qenc = Sequential()
qenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen,
weights=[embedding_weights]))
qenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
qenc.add(Dropout(0.3))
qenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, border_mode="valid"))
qenc.add(MaxPooling1D(pool_length=2, border_mode="valid"))
qenc.add(Dropout(0.3))
# answer
aenc = Sequential()
aenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen,
weights=[embedding_weights]))
aenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
aenc.add(Dropout(0.3))
aenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, border_mode="valid"))
aenc.add(MaxPooling1D(pool_length=2, border_mode="valid"))
aenc.add(Dropout(0.3))
# attention model
attn = Sequential()
attn.add(Merge([qenc, aenc], mode="dot", dot_axes=[1, 1]))
attn.add(Flatten())
#attn.add(Dense((seq_maxlen * QA_EMBED_SIZE)))
#attn.add(Reshape((seq_maxlen, QA_EMBED_SIZE)))
attn.add(Dense((qenc.output_shape[1]*(QA_EMBED_SIZE // 2))))
attn.add(Reshape((qenc.output_shape[1], QA_EMBED_SIZE // 2)))
# Plain sum - not working properly!
model = Sequential()
model.add(Merge([qenc, attn], mode="sum"))
model.add(Flatten())
model.add(Dense(1, activation="softmax"))
Here the network runs, but the plain sum + softmax is a poor choice and does not give the desired results. What I want is to use the cosine similarity between qenc and attn, but they have shape (None, 48, 32) (the numbers vary with the data used). What I am thinking of is flattening both and using the cosine similarity, comparing it against the 0-1 labels.
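To make precise what I mean by "cosine similarity against 0-1 labels", here is a minimal backend-level sketch of the comparison I am after (it is not wired into the model yet and just assumes two batch tensors that can be flattened to (batch, features)):
import keras.backend as K

def cosine_similarity(x, y):
    # flatten each sample to a vector, L2-normalize, then the dot product equals cos(angle)
    x = K.l2_normalize(K.batch_flatten(x), axis=-1)
    y = K.l2_normalize(K.batch_flatten(y), axis=-1)
    return K.sum(x * y, axis=-1, keepdims=True)  # shape (batch, 1), values in [-1, 1]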
The question is how to use the cosine there? I cannot flatten qenc, because it is used in the merge when computing attn, and the shapes matter. Here is what I tried:
A Lambda - does not work. It does not accept Sequential models, only layer outputs, and those are not layers but tensors, so they cannot be added:
def cosine_distance(vests):
    x, y = vests
    x = K.batch_flatten(x)            # flatten each sample to a vector
    y = K.batch_flatten(y)
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1)
model = Sequential()
model.add(Lambda(cosine_distance)([qenc.layers[-1].output, attn.layers[-1].output]))
An intermediate flattened model - leads to errors like "Merge object has no batch_size attribute" or similar:
flattened_attn = Sequential()
flattened_attn.add(attn)
flattened_attn.add(Flatten())
flattened_qenc = ...
model = Sequential()
model.add(Merge([flattened_attn, flattened_qenc], mode="cos", dot_axes=1))
Finally, I managed to pass flattened data of shape (None, 1536):
qenc = Sequential()
qenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen,
weights=[embedding_weights]))
qenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
qenc.add(Dropout(0.3))
qenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, border_mode="valid"))
qenc.add(MaxPooling1D(pool_length=2, border_mode="valid"))
qenc.add(Dropout(0.3))
qenc.add(Flatten())
aenc = Sequential()
aenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen,
weights=[embedding_weights]))
aenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
aenc.add(Dropout(0.3))
aenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, border_mode="valid"))
aenc.add(MaxPooling1D(pool_length=2, border_mode="valid"))
aenc.add(Dropout(0.3))
unflattened_qenc = Sequential()
unflattened_qenc.add(qenc)
unflattened_qenc.add(Reshape((aenc.output_shape[1],aenc.output_shape[2])))
# attention model
attn = Sequential()
attn.add(Merge([unflattened_qenc, aenc], mode="dot", dot_axes=[1, 1]))
attn.add(Flatten())
#attn.add(Dense((seq_maxlen * QA_EMBED_SIZE)))
#attn.add(Reshape((seq_maxlen, QA_EMBED_SIZE)))
attn.add(Dense((aenc.output_shape[1]*(QA_EMBED_SIZE // 2))))
attn.add(Reshape((aenc.output_shape[1], QA_EMBED_SIZE // 2)))
attn.add(Flatten())
model = Sequential()
attn.add(Merge([qenc, attn], mode="cos", dot_axes=1))
and got the error:
attn.add(Merge([qenc, attn], mode="cos", dot_axes=1))
Traceback (most recent call last):
File "qa-lstm-attn.py", line 175, in <module>
attn.add(Merge([qenc, attn], mode="cos", dot_axes=1))
File "/home/hcl/.local/lib/python3.5/site-packages/keras/models.py", line 492, in add
output_tensor = layer(self.outputs[0])
File "/home/hcl/.local/lib/python3.5/site-packages/keras/engine/topology.py", line 617, in __call__
output = self.call(inputs, **kwargs)
File "/home/hcl/.local/lib/python3.5/site-packages/keras/legacy/layers.py", line 202, in call
'(at least 2). Got: ' + str(inputs))
TypeError: Merge must be called on a list of tensors (at least 2). Got: Tensor("flatten_3/Reshape:0", shape=(?, ?), dtype=float32)
>>> qenc.output_shape
(None, 1536)
>>> aenc.output_shape
(None, 48, 32)
>>> attn.output_shape
(None, 1536)
How can I do the cosine, then?
Keras v.2.1.4
UPD: after fixing the model.add() copypaste error, so that it reads
model = Sequential()
model.add(Merge([qenc, attn], mode="cos", dot_axes=1))
I now get this error message:
File "qa-lstm-attn.py", line 195, in <module>
callbacks=[checkpoint])
File "/home/hcl/.local/lib/python3.5/site-packages/keras/models.py", line 963, in fit
validation_steps=validation_steps)
File "/home/hcl/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1637, in fit
batch_size=batch_size)
File "/home/hcl/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1483, in _standardize_user_data
exception_prefix='input')
File "/home/hcl/.local/lib/python3.5/site-packages/keras/engine/training.py", line 86, in _standardize_input_data
str(len(data)) + ' arrays: ' + str(data)[:200] + '...')
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 3 array(s), but instead got the following list of 2 arrays: [array([[ 1676, 19, 328, ..., 1612, 29, 4220],
[ 0, 0, 0, ..., 4, 27, 4807],
[ 2928, 9, 1652, ..., 125, 9, 181],
...,
[ 5970, 14...
This is how the training is called, with the callbacks:
model.compile(optimizer="adam", loss="mean_squared_error",
              metrics=["accuracy"])
print("Training...")
checkpoint = ModelCheckpoint(
    filepath=os.path.join(MODEL_DIR, "qa-lstm-attn-best.hdf5"),
    verbose=1, save_best_only=True)
model.fit([Xqtrain, Xatrain], Ytrain, batch_size=BATCH_SIZE,
          nb_epoch=NBR_EPOCHS, validation_split=0.1,
          callbacks=[checkpoint])
I think Keras does not understand that one of the models is reused, and it therefore expects an extra input.
My model is actually a modified version of this code, which does not work properly because the model simply learns to always answer False (the author warns about this):
https://github.com/sujitpal/dl-models-for-qa
https://github.com/sujitpal/dl-models-for-qa/blob/master/src/qa-blstm-attn.py
Explanation for @daniel-möller: I want to implement the model from the paper https://arxiv.org/abs/1511.04108. My labels are 0 and 1 (the answer matches the question or it does not), since the model is supposed to compute the cosine between question and answer. The dataset consists of a question and 4 answer variants, one of which is correct. Here is how I prepare it, creating 4 data pairs per question (kaggle.py), one of which is True:
def get_question_answer_pairs(question_file, is_test=False):
    qapairs = []
    fqa = open(question_file, "r")
    data = json.load(fqa)
    for l, line in enumerate(data):
        if l%100==0:
            print(l)
        question = line["question"]+" "+line["support"]
        qwords = tokenizer(question)
        #qwords = nltk.word_tokenize(question)
        if len(qwords)>100:
            qwords=qwords[:100]
        if not is_test:
            correct_ans = line["correct_answer"],
            answers = [line["distractor1"],line["distractor2"],line["distractor3"],correct_ans[0]]
            new_order = [0,1,2,3]
            random.shuffle(new_order)
            answers = [ answers[i] for i in new_order]
            correct_ans_idx = new_order[-1]
            # training file parsing
            #correct_ans_idx = ord(correct_ans) - ord('A')
            for idx, answer in enumerate(answers):
                #awords = nltk.word_tokenize(answer)
                #print(answer)
                awords = tokenizer(answer)
                qapairs.append((qwords, awords, idx == correct_ans_idx))
        else:
            # test file parsing (no correct answer)
            answers = cols[2:]
            for answer in answers:
                awords = nltk.word_tokenize(answer)
                qapairs.append((qwords, awords, None))
    fqa.close()
    return qapairs
You do not need to recompute the qapairs; they are already saved and are loaded by this line in the main program:
with open("processed_input.pickle", 'rb') as f:
    qapairs = pickle.load(f)
Here are some examples (please scroll right to see the answers and the True/False labels):
>>> qapairs[0]
(['what', 'type', 'of', 'organism', 'is', 'commonly', 'used', 'in', 'preparation', 'of', 'foods', 'such', 'as', 'cheese', 'and', 'yogurt', '', 'mesophiles', 'grow', 'best', 'in', 'moderate', 'temperature', 'typically', 'between', '25°c', 'and', '40°c', '(77°f', 'and', '104°f)', 'mesophiles', 'are', 'often', 'found', 'living', 'in', 'or', 'on', 'the', 'bodies', 'of', 'humans', 'or', 'other', 'animals', 'the', 'optimal', 'growth', 'temperature', 'of', 'many', 'pathogenic', 'mesophiles', 'is', '37°c', '(98°f)', 'the', 'normal_human', 'body', 'temperature', 'mesophilic', 'organisms', 'have', 'important', 'uses', 'in', 'food', 'preparation', 'including', 'cheese', 'yogurt', 'beer', 'and', 'wine'], ['viruses'], False)
>>> qapairs[1]
(['what', 'type', 'of', 'organism', 'is', 'commonly', 'used', 'in', 'preparation', 'of', 'foods', 'such', 'as', 'cheese', 'and', 'yogurt', '', 'mesophiles', 'grow', 'best', 'in', 'moderate', 'temperature', 'typically', 'between', '25°c', 'and', '40°c', '(77°f', 'and', '104°f)', 'mesophiles', 'are', 'often', 'found', 'living', 'in', 'or', 'on', 'the', 'bodies', 'of', 'humans', 'or', 'other', 'animals', 'the', 'optimal', 'growth', 'temperature', 'of', 'many', 'pathogenic', 'mesophiles', 'is', '37°c', '(98°f)', 'the', 'normal_human', 'body', 'temperature', 'mesophilic', 'organisms', 'have', 'important', 'uses', 'in', 'food', 'preparation', 'including', 'cheese', 'yogurt', 'beer', 'and', 'wine'], ['mesophilic', 'organisms'], True)
>>> qapairs[2]
(['what', 'type', 'of', 'organism', 'is', 'commonly', 'used', 'in', 'preparation', 'of', 'foods', 'such', 'as', 'cheese', 'and', 'yogurt', '', 'mesophiles', 'grow', 'best', 'in', 'moderate', 'temperature', 'typically', 'between', '25°c', 'and', '40°c', '(77°f', 'and', '104°f)', 'mesophiles', 'are', 'often', 'found', 'living', 'in', 'or', 'on', 'the', 'bodies', 'of', 'humans', 'or', 'other', 'animals', 'the', 'optimal', 'growth', 'temperature', 'of', 'many', 'pathogenic', 'mesophiles', 'is', '37°c', '(98°f)', 'the', 'normal_human', 'body', 'temperature', 'mesophilic', 'organisms', 'have', 'important', 'uses', 'in', 'food', 'preparation', 'including', 'cheese', 'yogurt', 'beer', 'and', 'wine'], ['protozoa'], False)
>>> qapairs[3]
(['what', 'type', 'of', 'organism', 'is', 'commonly', 'used', 'in', 'preparation', 'of', 'foods', 'such', 'as', 'cheese', 'and', 'yogurt', '', 'mesophiles', 'grow', 'best', 'in', 'moderate', 'temperature', 'typically', 'between', '25°c', 'and', '40°c', '(77°f', 'and', '104°f)', 'mesophiles', 'are', 'often', 'found', 'living', 'in', 'or', 'on', 'the', 'bodies', 'of', 'humans', 'or', 'other', 'animals', 'the', 'optimal', 'growth', 'temperature', 'of', 'many', 'pathogenic', 'mesophiles', 'is', '37°c', '(98°f)', 'the', 'normal_human', 'body', 'temperature', 'mesophilic', 'organisms', 'have', 'important', 'uses', 'in', 'food', 'preparation', 'including', 'cheese', 'yogurt', 'beer', 'and', 'wine'], ['gymnosperms'], False)
The next step is done by the function vectorize_qapairs() in kaggle.py. On github it uses the cosine distance; I have changed it to cosine similarity (1 means most similar, a zero angle; 0 means dissimilar, orthogonal vectors), following your comment:
def vectorize_qapairs(qapairs, word2idx, seq_maxlen):
    Xq, Xa, Y = [], [], []
    for qapair in qapairs:
        Xq.append([word2idx[qword] for qword in qapair[0]])
        Xa.append([word2idx[aword] for aword in qapair[1]])
        #Y.append(np.array([1, 0]) if qapair[2] else np.array([0, 1]))
        # cosine similarity: 1 for 0 degree angle
        Y.append(np.array([1]) if qapair[2] else np.array([0]))
    return (pad_sequences(Xq, maxlen=seq_maxlen),
            pad_sequences(Xa, maxlen=seq_maxlen),
            np.array(Y))
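For completeness, a sketch of how it is called from the main program (ignoring the train/test split for brevity; the array names are assumed to match the fit() call further below):
Xqtrain, Xatrain, Ytrain = vectorize_qapairs(qapairs, word2idx, seq_maxlen)
# Xqtrain, Xatrain: (num_pairs, seq_maxlen) integer word indices
# Ytrain: (num_pairs, 1) with 1 for the correct answer and 0 for a distractor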
As you can see, it puts a 1 when the "True" label is present and a 0 otherwise.
Now I want the model to compute the cosine, as in the picture, and then compare it against the 0-1 label. I believe what you did is correct, and the model runs now, but I want it to start learning instead of sitting at an accuracy of about 0.75, which corresponds to always answering False. For debugging purposes I have even simplified the code, throwing away the convolutions:
#question
qenc = Sequential()
qenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen))
qenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
aenc = Sequential()
aenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen))
aenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
# attention model
#notice that I'm taking "tensors" qenc.output and aenc.output
#I'm not passing "models" to a layer, I'm passing tensors
#that was the problem with your lambda
attOut = Dot(axes=1)([qenc.output, aenc.output])
#shape = (samples,QA_EMBED_SIZE//2, QA_EMBED_SIZE//2)
#I really don't understand this output shape....
#I'd swear it should be (samples, 1, QA_EMBED_SIZE//2)
attOut = Flatten()(attOut) #shape is now only (samples,)
#attOut = Dense((qenc.output_shape[1]*(QA_EMBED_SIZE // 2)))(attOut)
#attOut = Reshape((qenc.output_shape[1], QA_EMBED_SIZE // 2))(attOut)
attOut = Dense((qenc.output_shape[1]*(QA_EMBED_SIZE)))(attOut)
attOut = Reshape((qenc.output_shape[1], QA_EMBED_SIZE))(attOut)
flatAttOut = Flatten()(attOut)
flatQencOut = Flatten()(qenc.output)
similarity = Dot(axes=1,normalize=True)([flatQencOut,flatAttOut])
model = Model([qenc.input,aenc.input],similarity)
# I tried MSE and binary crossentropy
model.compile(optimizer="adam", loss="binary_crossentropy",
metrics=["accuracy"])
print("Training...")
checkpoint = ModelCheckpoint(
filepath=os.path.join(MODEL_DIR, "qa-lstm-attn-best.hdf5"),
verbose=1, save_best_only=True)
model.fit([Xqtrain, Xatrain], Ytrain, batch_size=BATCH_SIZE,
nb_epoch=NBR_EPOCHS, validation_split=0.1,
callbacks=[checkpoint])
The code is of course not entirely mine: I took the implementation from https://github.com/sujitpal/dl-models-for-qa (with its Dense(2) output layer) to score the wp_4_options data, and I ran into the same learning problem there, with the model only ever answering False.
I wonder whether I am making some mistake that I simply cannot see. Thank you!
Answer 0 (score: 1)
I think the problem is that you are using a Sequential model, and the following code block causes the trouble (notice that you wrote attn.add() instead of model.add()):
model = Sequential()
attn.add(Merge([qenc, attn], mode="cos", dot_axes=1))
I think using the Graph model makes more sense in your case.
Also, you made a mistake here:
# Plain sum - not working properly!
model = Sequential()
model.add(Merge([qenc, attn], mode="sum"))
model.add(Flatten())
model.add(Dense(1, activation="softmax")) # <--- ERROR
Softmax on a single neuron makes no sense! You should use Dense(1, activation='sigmoid') instead. Alternatively, you can use Dense(2, activation='softmax').
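A minimal sketch of the corrected head under the first option, keeping the rest of the Merge pipeline from the question unchanged (binary_crossentropy is the usual pairing for a single sigmoid unit):
model = Sequential()
model.add(Merge([qenc, attn], mode="sum"))
model.add(Flatten())
model.add(Dense(1, activation="sigmoid"))  # sigmoid, not softmax, on a single unit
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
With Dense(2, activation='softmax') you would instead need two-column one-hot labels and categorical_crossentropy.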
Answer 1 (score: 1)
You are working with branches. Do not use Sequential models with branches.
You can keep qenc and aenc as Sequential models without a problem, since each of them is a single path and this has no bad consequences. I am giving the example using the first part of your code.
Updating the calls that use keras 1:
#question
qenc = Sequential()
qenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen))
qenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
qenc.add(Dropout(0.3))
qenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, padding="valid"))
qenc.add(MaxPooling1D(pool_size=2, padding="valid"))
qenc.add(Dropout(0.3))
# answer
aenc = Sequential()
aenc.add(Embedding(output_dim=WORD2VEC_EMBED_SIZE, input_dim=vocab_size,
input_length=seq_maxlen))
aenc.add(Bidirectional(LSTM(QA_EMBED_SIZE, return_sequences=True),
merge_mode="sum"))
aenc.add(Dropout(0.3))
aenc.add(Convolution1D(QA_EMBED_SIZE // 2, 5, padding="valid"))
aenc.add(MaxPooling1D(pool_size=2, padding="valid"))
aenc.add(Dropout(0.3))
Pay attention to the input and output shapes of each model; both encoders end with the output shape:
qenc: (samples, (seq_maxlen-4)/2, QA_EMBED_SIZE//2)
aenc: (samples, (seq_maxlen-4)/2, QA_EMBED_SIZE//2)
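A quick sketch of where that time dimension comes from, assuming seq_maxlen = 100 (which would match the (None, 48, 32) shape reported in the question):
seq_maxlen = 100
after_conv = seq_maxlen - 5 + 1  # Conv1D, kernel size 5, padding="valid" -> 96
after_pool = after_conv // 2     # MaxPooling1D, pool_size=2              -> 48
print(after_pool)                # 48 steps, each with QA_EMBED_SIZE // 2 channels (32 in the question)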
But attn merges the two branches, so let it be a functional API Model:
# attention model
#notice that I'm taking "tensors" qenc.output and aenc.output
#I'm not passing "models" to a layer, I'm passing tensors
#that was the problem with your lambda
attOut = Dot(axes=1)([qenc.output, aenc.output])
#shape = (samples,QA_EMBED_SIZE//2, QA_EMBED_SIZE//2)
#I really don't understand this output shape....
#I'd swear it should be (samples, 1, QA_EMBED_SIZE//2)
attOut = Flatten()(attOut) #shape is now only (samples,)
attOut = Dense((qenc.output_shape[1]*(QA_EMBED_SIZE // 2)))(attOut)
attOut = Reshape((qenc.output_shape[1], QA_EMBED_SIZE // 2))(attOut)
Its output shape is (samples, (seq_maxlen-4)/2, QA_EMBED_SIZE // 2). If for some reason you need to keep the attn model separate from the others, tell me, because the code above would then need small changes.
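For reference, a minimal sketch of what that separation could look like (attnModel is a hypothetical name; it simply wraps the tensors already built above):
from keras.models import Model
attnModel = Model([qenc.input, aenc.input], attOut)
You could then inspect it with attnModel.summary() or reuse its output elsewhere.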
Now you can flatten the outputs of qenc and attn without any problem; you just cannot do it "inside" the qenc model:
flatAttOut = Flatten()(attOut)
flatQencOut = Flatten()(qenc.output)
similarity = Dot(axes=1,normalize=True)([flatQencOut,flatAttOut])
Finally, create the full model:
model = Model([qenc.input,aenc.input],similarity)
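A tiny usage sketch (a hypothetical slice of the training arrays, just to show the expected output shape):
pred = model.predict([Xqtrain[:4], Xatrain[:4]])
print(pred.shape)  # (4, 1): one cosine similarity in [-1, 1] per question/answer pair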
Warning: this model outputs a similarity. Are you sure y_train is a similarity, with shape (samples, 1)? If so, fine. If not, please detail your question better and explain what your model should output, what your training data looks like, and where/when you want this similarity.
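Before reweighting anything, it may be worth confirming the imbalance on your side; a one-line sketch, assuming Ytrain comes from vectorize_qapairs as described above:
print(Ytrain.mean())  # roughly 0.25 if exactly one of the four answers is True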
You can try a custom loss function to balance the classes, since the ratio of False to True outputs is 75% to 25%:
import keras.backend as K

def balanceLoss(yTrue, yPred):
    loss = K.binary_crossentropy(yTrue, yPred)
    scaledTrue = (2*yTrue) + 1
    #true values are 3 times worth the false values
    #contains 3 for true and 1 for false
    return scaledTrue * loss

model.compile(optimizer='adam', loss=balanceLoss)
I am not sure binary_crossentropy is the best fit for this kind of balancing, but you can also try mean squared error.
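As an alternative sketch to the custom loss (a standard Keras option, with hypothetical weights mirroring the 3:1 idea above), class weights can be passed directly to fit:
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit([Xqtrain, Xatrain], Ytrain,
          batch_size=BATCH_SIZE, epochs=NBR_EPOCHS,
          validation_split=0.1,
          class_weight={0: 1.0, 1: 3.0})  # weight the rarer True class 3x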