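I have a word-embedding neural network that works fine on a small subset of my corpus, but when I increase the training set to thousands of records (without changing anything else), I get the following error: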
ValueError: Error when checking input: expected documents to have shape (46,) but got array with shape (1,)
The model summary is as follows:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
ingredients (InputLayer)        (None, 46)           0
__________________________________________________________________________________________________
documents (InputLayer)          (None, 46)           0
__________________________________________________________________________________________________
ingredients_embedding (Embeddin (None, 46, 50)       8709200     ingredients[0][0]
__________________________________________________________________________________________________
documents_embedding (Embedding) (None, 46, 50)       8709200     documents[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 50)           0           ingredients_embedding[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 50)           0           documents_embedding[0][0]
__________________________________________________________________________________________________
dot_product (Dot)               (None, 1)            0           lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 1)            0           dot_product[0][0]
==================================================================================================
Total params: 17,418,400
Trainable params: 17,418,400
Non-trainable params: 0
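As a quick sanity check on the summary, the parameter counts can be worked backwards. This is only arithmetic on the numbers printed above; it assumes each Embedding layer has rows × 50 weights and no other parameters, and the variable names are illustrative.

```python
# Numbers copied from the model summary above.
embedding_size = 50
params_per_embedding = 8_709_200
total_params = 17_418_400

# An embedding table has (number of rows) * (embedding size) weights.
rows = params_per_embedding // embedding_size
assert rows * embedding_size == params_per_embedding

# The two embedding layers account for every trainable parameter.
assert 2 * params_per_embedding == total_params

print(rows)  # 174184
```

So each embedding table has 174,184 rows, and the Lambda, Dot and Reshape layers contribute no weights.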
This is the entry point for the model; a generator produces the data for training:
negative_ratio, n_positive = 2, 100
t = Trainer()  # user-defined class, not shown here
training_data_pairs = t.index_and_encode()
training_size, embedding_size, input_size = len(training_data_pairs), 50, 46
# Generator that feeds batches to fit_generator
batch = t.generate_batch(n_positive, negative_ratio=negative_ratio)
# Note: this rebinds the name `model` from the builder function to the built model
model = model(training_size, embedding_size, input_size)
h = model.fit_generator(
    batch,
    epochs=10,
    steps_per_epoch=training_size // n_positive,
    verbose=2
)
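Since `Trainer.generate_batch` is not shown, here is a minimal pure-Python sketch of what a generator of this kind might look like, together with a check that every row has exactly `input_size` entries. Everything in it is an assumption made for illustration: the helper names `pad` and `check_batch`, the label convention (1 for positive pairs, 0 for negatives), and the way negatives are sampled. It is not the code from the question.

```python
import random


def pad(seq, length, pad_value=0):
    """Truncate or right-pad seq so it has exactly `length` items."""
    seq = list(seq)[:length]
    return seq + [pad_value] * (length - len(seq))


def generate_batch(pairs, n_positive, negative_ratio, input_size, seed=0):
    """Yield ({'ingredients': rows, 'documents': rows}, labels) forever.

    `pairs` is a list of (ingredient_ids, document_ids) tuples (hypothetical format).
    """
    rng = random.Random(seed)
    while True:
        ingredients, documents, labels = [], [], []
        for ing, doc in rng.sample(pairs, n_positive):
            # Positive example: a pair taken from the data.
            ingredients.append(pad(ing, input_size))
            documents.append(pad(doc, input_size))
            labels.append(1)
            for _ in range(negative_ratio):
                # Negative example: same ingredients, randomly chosen document.
                # It may occasionally coincide with a true pair; ignored in this sketch.
                _, other_doc = rng.choice(pairs)
                ingredients.append(pad(ing, input_size))
                documents.append(pad(other_doc, input_size))
                labels.append(0)
        yield {"ingredients": ingredients, "documents": documents}, labels


def check_batch(inputs, input_size):
    """Raise ValueError if any row does not have exactly input_size entries."""
    for name, rows in inputs.items():
        for i, row in enumerate(rows):
            if len(row) != input_size:
                raise ValueError(
                    f"{name}[{i}] has length {len(row)}, expected {input_size}"
                )


# Usage with toy data of varying lengths
toy_pairs = [([1, 2, 3], [4, 5]), ([6], [7, 8, 9, 10]), ([11, 12], [13])]
gen = generate_batch(toy_pairs, n_positive=2, negative_ratio=2, input_size=46)
inputs, labels = next(gen)
check_batch(inputs, 46)
```

Before passing such batches to Keras, the lists would still need converting to NumPy arrays. Running a check like `check_batch` on a few batches from the real generator would show whether any `documents` row has a length other than 46.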