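I am using pretrained word vectors in a Keras model and ran into a problem when converting words to ids. I use the Tokenizer for this, but I get the following error: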
index 117004 is out of bounds for axis 0 with size 116997
I suspect the problem is that the ids are assigned from the training set but are not extended to cover dev and test.
t = Tokenizer()
t.fit_on_texts(X_train_words)
vocab_size = len(t.word_index) + 1
X_train = np.array(t.texts_to_sequences(X_train_words))
t.fit_on_texts(X_dev_words)
X_dev = np.array(t.texts_to_sequences(X_dev_words))
t.fit_on_texts(X_test_words)
X_test = np.array(t.texts_to_sequences(X_test_words))
The error is raised here, at the line embedding_matrix[i] = embedding_vector:
for word, i in t.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
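
One likely cause, sketched below under stated assumptions: vocab_size is computed after the first fit_on_texts call, but the later calls on dev and test add new words to t.word_index, so some ids end up larger than the number of rows in embedding_matrix. To keep the example runnable without Keras, TinyTokenizer is a hypothetical stand-in that mimics how the Keras Tokenizer appears to behave (it rebuilds word_index by descending frequency on every fit_on_texts call). The toy sentences, the 2-dimensional vectors, and the name stale_vocab_size are made up for illustration.

```python
import numpy as np


class TinyTokenizer:
    """Hypothetical stand-in for the Keras Tokenizer (not the real class)."""

    def __init__(self):
        self.word_counts = {}
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Accumulate counts across calls, then rebuild the whole index.
        for text in texts:
            for word in text.lower().split():
                self.word_counts[word] = self.word_counts.get(word, 0) + 1
        ordered = sorted(self.word_counts, key=self.word_counts.get, reverse=True)
        self.word_index = {w: i + 1 for i, w in enumerate(ordered)}  # ids start at 1


# Toy data (assumed, for illustration only).
X_train_words = ["the cat sat", "the dog sat"]
X_dev_words = ["a bird flew"]
X_test_words = ["the fish swam"]
embedding_dim = 2
embeddings_index = {
    "the": np.array([0.1, 0.2]),
    "cat": np.array([0.3, 0.4]),
    "fish": np.array([0.5, 0.6]),
}

t = TinyTokenizer()
t.fit_on_texts(X_train_words)
stale_vocab_size = len(t.word_index) + 1  # computed too early, as in the question

t.fit_on_texts(X_dev_words)   # adds new words, so word_index grows
t.fit_on_texts(X_test_words)
max_id = max(t.word_index.values())
# max_id is now >= stale_vocab_size, so a matrix with stale_vocab_size rows
# would raise "index ... is out of bounds for axis 0".

# Sketch of a fix: size the matrix only after the last fit_on_texts call.
vocab_size = len(t.word_index) + 1
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in t.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
```

Because the index is rebuilt on each fit, sequences produced before the last fit_on_texts call may also no longer match the final ids, so it is probably safer to call texts_to_sequences only after all fitting is done. An alternative is to fit on the training texts only and convert dev and test with texts_to_sequences; in that case unseen words are dropped unless the tokenizer was created with an oov_token.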