Question

I am trying to build an NER model that will help me classify words. Since I am new to this field, I keep getting stuck on mismatched layer dimensions.

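My input is sentences - there are 43163 of them, each of length 70. So each sentence has 70 words (after padding/truncating), and there are 43163 sentences in total. My X_train therefore has shape (43163, 70).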

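My y_train has shape (43163, 70, 17). Each of the 70 words in a sentence has been converted to a one-hot encoding of length 17 (corresponding to the NER tag it belongs to).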
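To make the label layout concrete, here is a minimal sketch of how integer tag ids become a one-hot array with one length-17 vector per word. The batch size of 4 and the random tag ids are hypothetical stand-ins, not my real data.

```python
import numpy as np

n_sentences, max_len, n_tags = 4, 70, 17  # small hypothetical batch

# Hypothetical integer tag ids, one per word position: shape (4, 70)
rng = np.random.default_rng(0)
tag_ids = rng.integers(0, n_tags, size=(n_sentences, max_len))

# One-hot encode by indexing rows of the identity matrix: shape (4, 70, 17)
y = np.eye(n_tags, dtype="float32")[tag_ids]

print(tag_ids.shape)
print(y.shape)
```

With the real data the first dimension would be 43163 instead of 4.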

The words are tokenized, with 35173 unique tokens in total.

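I want to convert these into embeddings that will be learned (I don't want to use the standard word2vec or GloVe embeddings). These embeddings then need to be fed into an LSTM, and after that into a Dense layer that finally gives me the classification.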

I tried going through these blogs (you will see my code looks similar), to no avail -

https://machinelearningmastery.com/predict-sentiment-movie-reviews-using-deep-learning/

https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/

#X_tr.shape = (43163, 70)

#y_train.shape = (43163, 70, 17)

#X_te.shape = (4796, 70) 

#y_te.shape = (4796, 70, 17)

The model is - 
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

total_words = 35173  # number of unique tokens
embedding_vector_length = 32
model = Sequential()

model.add(Embedding(total_words, embedding_vector_length, input_length=70))
model.add(LSTM(100))
model.add(Dense(17, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

This is the error I get -

ValueError: Error when checking target: expected dense_10 to have 2 dimensions, but got array with shape (43163, 70, 17)

Please help me understand where the problem is and how to fix it.
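One possible direction, offered as a sketch and not a verified fix: an LSTM layer without return_sequences=True outputs only its last time step, so the Dense layer produces one 17-way prediction per sentence, shape (batch, 17), while the target has one per word, shape (batch, 70, 17). Separately, sparse_categorical_crossentropy expects integer class ids, whereas one-hot targets pair with categorical_crossentropy. The version below assumes token ids fall in the range [0, total_words); if the tokenizer reserves 0 for padding and numbers words from 1, the Embedding input size would need to be total_words + 1. The names max_len and n_tags are introduced here for illustration.

```python
from keras.models import Sequential
from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed

total_words = 35173          # unique tokens (assumes ids in [0, total_words))
embedding_vector_length = 32
max_len = 70                 # words per sentence after padding/truncating
n_tags = 17                  # number of NER tags

model = Sequential()
model.add(Input(shape=(max_len,)))
model.add(Embedding(total_words, embedding_vector_length))
# return_sequences=True keeps one output per time step: (batch, 70, 100)
model.add(LSTM(100, return_sequences=True))
# Apply the same softmax classifier to every time step: (batch, 70, 17)
model.add(TimeDistributed(Dense(n_tags, activation='softmax')))
# One-hot targets go with categorical_crossentropy, not the sparse variant
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```

With this layout the model output has the same rank as y_train, so a call such as model.fit(X_tr, y_train, ...) should no longer fail the target shape check. Whether the model then learns useful tags is a separate question.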

Named entity recognition with learned word embeddings, LSTM, Keras

0 answers: