Question

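I have used Word2Vec to convert the IMDB reviews into 300-dimensional vectors.

I kept embedding_vecor_length = 32 and input_length = 300 (25,000 reviews in total).

My accuracy is very poor and my loss is high.

At the end of 10 epochs, the accuracy is 0.4977 and the loss is 0.6932.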

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    embedding_vecor_length = 32
    model = Sequential()
    model.add(Embedding(25000, embedding_vecor_length, input_length=300))
    model.add(LSTM(100))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

What should I add or remove to improve the accuracy and reduce the loss?

Answer 1

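25000 appears to be the number of samples you have, not the input dimension of the embedding layer. I think you should check which dimensions that layer actually expects. Without seeing your data, my guess is that what you really want is: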

    model.add(Embedding(300, embedding_vecor_length))
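
To see why the first argument matters, it can help to picture an embedding layer as a lookup table with one row per distinct integer token ID. The NumPy sketch below only imitates that lookup; it is not Keras code, and the sizes and IDs are made up for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
vocab_size = 10      # number of distinct token IDs (the layer's first argument)
embedding_dim = 4    # length of each vector (the layer's second argument)

rng = np.random.default_rng(0)
# An embedding layer behaves like a trainable lookup table of this shape.
table = rng.normal(size=(vocab_size, embedding_dim))

# One "review" encoded as integer token IDs, each in [0, vocab_size).
token_ids = np.array([3, 1, 7, 7, 0])

# The lookup replaces every ID with the matching row of the table.
embedded = table[token_ids]
print(embedded.shape)  # (5, 4)

# An ID outside the table cannot be looked up.
try:
    table[np.array([12])]
    out_of_range_ok = True
except IndexError:
    out_of_range_ok = False
print(out_of_range_ok)  # False
```

The first argument therefore has to cover every integer ID that can appear in the input, which is a property of the vocabulary rather than of the number of reviews.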

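However, since you have already used word2vec, your data is already embedded! You do not need an embedding layer. I think you should remove it first and then check the accuracy again.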
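
As a rough sketch of what "already embedded" input could look like, the snippet below stacks per-word vectors into one array per review. It assumes each review is kept as a sequence of word vectors; `word_vectors`, `review_to_matrix`, and `max_len` are hypothetical names, and the random vectors merely stand in for a trained word2vec model.

```python
import numpy as np

vector_size = 300   # dimensionality of the word2vec vectors, as in the question
max_len = 6         # hypothetical number of time steps kept per review

# Hypothetical stand-in for a trained word2vec lookup: word -> 300-d vector.
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=vector_size)
                for w in ["great", "movie", "boring", "plot"]}

def review_to_matrix(tokens, word_vectors, max_len, vector_size):
    """Stack one review's word vectors into a (max_len, vector_size) array.

    Unknown words are skipped, short reviews are zero-padded at the end,
    and long reviews are truncated.
    """
    matrix = np.zeros((max_len, vector_size), dtype=np.float32)
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    for row, vec in enumerate(known[:max_len]):
        matrix[row] = vec
    return matrix

reviews = [["great", "movie"], ["boring", "plot", "and", "boring", "movie"]]
X = np.stack([review_to_matrix(r, word_vectors, max_len, vector_size)
              for r in reviews])
print(X.shape)  # (2, 6, 300)
```

Keras recurrent layers take 3-D input of shape (batch, timesteps, features), so an array like `X` can be fed to the LSTM directly, with no Embedding layer in front of it.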

Answer 2

You can use pre-trained GloVe word embeddings instead, for example glove.6B.50d.txt (the 50-dimensional vectors), which is included in the archive at http://nlp.stanford.edu/data/glove.6B.zip.

    import numpy as np

    def read_glove_vecs(glove_file):
        """Parse a GloVe text file into word/index/vector lookup tables."""
        with open(glove_file, 'r', encoding='UTF-8') as f:
            words = set()
            word_to_vec_map = {}
            for line in f:
                # Each line holds a word followed by its vector components
                line = line.strip().split()
                curr_word = line[0]
                words.add(curr_word)
                word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

            # Indices start at 1, so index 0 is never assigned to a word
            i = 1
            words_to_index = {}
            index_to_words = {}
            for w in sorted(words):
                words_to_index[w] = i
                index_to_words[i] = w
                i = i + 1
        return words_to_index, index_to_words, word_to_vec_map

Now call the function above. It returns the word-to-index map, the index-to-word map, and the word-to-vector map:

    word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

Next, build the embedding matrix from these pre-trained word vectors:

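    import numpy as np
    from keras.layers import Embedding

    # Indices start at 1, so the matrix needs one extra row; row 0 stays all zeros
    vocab_len = len(word_to_index) + 1
    emb_dim = 50  # the vectors in glove.6B.50d.txt have 50 dimensions
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Copy each word's pre-trained vector into the row given by its index
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Frozen embedding layer initialised with the GloVe vectors
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    embedding_layer.build((None,))
    embedding_layer.set_weights([emb_matrix])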

Now use this embedding layer in your model; this should improve the accuracy.
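
The embedding layer consumes integer word indices, not raw text, so the reviews have to be converted first. Here is a minimal sketch of that step, assuming whitespace tokenisation; `reviews_to_indices` is a hypothetical helper, and the tiny `word_to_index` dictionary stands in for the one returned by read_glove_vecs.

```python
import numpy as np

# Hypothetical miniature vocabulary; in practice this would be the
# word_to_index mapping returned by read_glove_vecs (indices start at 1).
word_to_index = {"a": 1, "great": 2, "movie": 3, "boring": 4}

def reviews_to_indices(reviews, word_to_index, max_len):
    """Convert raw review strings to a (num_reviews, max_len) integer array.

    Index 0 is used both for padding and for words missing from the
    vocabulary; that choice is an assumption of this sketch.
    """
    indices = np.zeros((len(reviews), max_len), dtype=np.int64)
    for i, review in enumerate(reviews):
        words = review.lower().split()[:max_len]
        for j, word in enumerate(words):
            indices[i, j] = word_to_index.get(word, 0)
    return indices

X_indices = reviews_to_indices(["A great movie", "boring unknownword"],
                               word_to_index, max_len=5)
print(X_indices)
# [[1 2 3 0 0]
#  [4 0 0 0 0]]
```

An array like `X_indices` is what the model would be trained on once `embedding_layer` is its first layer, followed by the LSTM and the sigmoid output from the question.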

Getting very low accuracy with an LSTM on IMDB reviews
