Keras IMDB sentiment model - how do I predict the sentiment of a new sentence?

Time: 2018-05-09 11:52:49

Tags: keras word-embedding

I'm working through the book Deep Learning with Python, which has an example of learning word embeddings for sentiment classification:

from keras.datasets import imdb
from keras import preprocessing

max_features = 10000
maxlen = 20

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model = Sequential()
model.add(Embedding(10000, 8, input_length=maxlen))
model.add(Flatten())

model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

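I'd like to predict the sentiment of a single sentence. My first idea was to pass in a sequence of indices (since, if I understand correctly, that is how words are represented in the model), for example: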

import numpy as np

# does this reflect a really bad review?
model.predict(np.array([[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,]]))
[out] array([[ 0.0066505]], dtype=float32)

# does this reflect a really good review?
model.predict(np.array([[9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999,9999 ]]))
[out] array([[ 0.64767915]], dtype=float32)
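
For context on what those index values stand for, here is a minimal sketch of decoding a sequence back into tokens. It assumes the defaults of imdb.load_data (0 left for padding, 1 as the start-of-sequence marker, 2 for out-of-vocabulary words, and every real word rank shifted up by 3). The decode helper and the toy_word_index dictionary, including its ranks, are hypothetical stand-ins; in practice the mapping would come from imdb.get_word_index().

```python
# Sketch: decode IMDB index sequences back into readable tokens.
# Assumes the imdb.load_data defaults: 0 = padding, 1 = start marker,
# 2 = out-of-vocabulary, real words shifted up by index_from = 3.

INDEX_FROM = 3
RESERVED = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode(indices, word_index):
    """Map a sequence of model-input indices back to tokens."""
    # word_index maps word -> frequency rank (1 = most frequent).
    reverse = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    reverse.update(RESERVED)
    return [reverse.get(i, "<oov>") for i in indices]


# Toy stand-in for imdb.get_word_index(); the ranks here are made up.
toy_word_index = {"the": 1, "and": 2, "a": 3, "awful": 370}

print(decode([1, 1, 1], toy_word_index))
print(decode([1, 4, 373, 2], toy_word_index))
```

Under these assumptions, a sequence of twenty 1s is twenty start markers rather than twenty occurrences of a negative word, so its score says little about how the model rates a bad review.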

How can I pass in a list of words instead of indices? In other words, how do I get the list of word indices for a new sentence?

Update - I tried tokenizing some words as follows:

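from keras.preprocessing.text import text_to_word_sequence

# word -> frequency rank mapping for the IMDB dataset
word_index = imdb.get_word_index()

def index(word):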
    if word in word_index:
        return word_index[word]
    else:
        return "0"

def sequences(words):
    words = text_to_word_sequence(words)
    seqs = [[index(word) for word in words if word != "0"]]
    return preprocessing.sequence.pad_sequences(seqs, maxlen=maxlen)

bad_seq = sequences("Rubbish terrible awful dreadful hate stinks")
good_seq = sequences("Awesome recommended brilliant best")

print("bad movie: " + str(model.predict(bad_seq)))   # 0.00759153
print("good movie: " + str(model.predict(good_seq))) # 0.00423771

The two scores are very similar, which suggests that my tokenization approach isn't working.
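
Two details in the snippet above may be skewing the result: the ranks from word_index are used without the offset of 3 that imdb.load_data applies by default, and the `word != "0"` filter compares the word itself rather than the value returned by index(). A minimal sketch of an encoder that follows the load_data conventions is below. The tokenize and encode helpers are hypothetical names, tokenize is only a rough stand-in for text_to_word_sequence, the padding mimics the default left-padding of pad_sequences in plain Python, and toy_word_index uses made-up ranks in place of imdb.get_word_index().

```python
import re

INDEX_FROM = 3   # offset applied by imdb.load_data by default
OOV_INDEX = 2    # default oov_char in imdb.load_data


def tokenize(text):
    """Lowercase and split on punctuation and whitespace.

    A rough stand-in for keras' text_to_word_sequence.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def encode(text, word_index, max_features=10000, maxlen=20):
    """Turn raw text into one padded index sequence of length maxlen."""
    ids = []
    for word in tokenize(text):
        rank = word_index.get(word)
        idx = rank + INDEX_FROM if rank is not None else OOV_INDEX
        # Words outside the num_words vocabulary were replaced by the
        # out-of-vocabulary index at training time, so do the same here.
        ids.append(idx if idx < max_features else OOV_INDEX)
    ids = ids[-maxlen:]                      # keep the last maxlen tokens
    return [0] * (maxlen - len(ids)) + ids   # left-pad with zeros


# Toy stand-in for imdb.get_word_index(); the ranks here are made up.
toy_word_index = {"awful": 370, "brilliant": 530, "zyzzyva": 88000}

seq = encode("Awful, brilliant zyzzyva film!", toy_word_index, maxlen=6)
print(seq)
```

With the real mapping, the result could be passed to the model as `model.predict(np.array([encode(sentence, imdb.get_word_index())]))`. Correct indices alone may not produce confident scores: the model was trained on 20-token windows of full reviews, so short keyword lists made mostly of padding are unlike anything it saw during training.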

0 answers:

No answers yet.