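I am trying to do sentiment analysis with a deep learning algorithm. I have built my own model and evaluated it on test data. I now need to make predictions on new input, but I don't know how to do that.
Here is my code: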
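# Imports (assuming tensorflow.keras, scikit-learn, and termcolor)
import numpy as np
import pandas as pd
from termcolor import colored
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Dense, LSTM

# Tokenization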
print(colored("Tokenizing and padding data", "yellow"))
tokenizer = Tokenizer(num_words=2000, split=' ')
tokenizer.fit_on_texts(train_data['Clean_tweet'].astype(str).values)
train_tweets = tokenizer.texts_to_sequences(train_data['Clean_tweet'].astype(str).values)
max_len = max([len(i) for i in train_tweets])
vocab_size = len(tokenizer.word_index) + 1
train_tweets = pad_sequences(train_tweets, maxlen=max_len)
le = LabelEncoder()
y = le.fit_transform(train_data['Sentiment'].values)
test_tweets = tokenizer.texts_to_sequences(test_data['Clean_tweet'].astype(str).values)
test_tweets = pad_sequences(test_tweets, maxlen=max_len)
print(colored("Tokenizing and padding complete", "yellow"))
embeddings_dictionary = dict()
embedding_dim = 100
# Load the pre-trained GloVe vectors into a word -> vector dictionary
with open('glove.6B.100d.txt', encoding='utf-8') as glove_file:
    for line in glove_file:
        records = line.split()
        word = records[0]
        vector_dimensions = np.asarray(records[1:], dtype='float32')
        embeddings_dictionary[word] = vector_dimensions
embeddings_matrix = np.zeros((vocab_size, embedding_dim))
for word, index in tokenizer.word_index.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embeddings_matrix[index] = embedding_vector
embedding_layer = Embedding(vocab_size, embedding_dim, input_length=max_len, weights=[embeddings_matrix],
                            trainable=False)
# Building the model
print(colored("Creating the LSTM model", "yellow"))
model = Sequential()
model.add(embedding_layer)
model.add(Dropout(0.4))
model.add(Dense(64, activation='relu'))
model.add(LSTM(256, dropout=0.2))
model.add(Dropout(0.3))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# Training the model
print(colored("Training the LSTM model", "green"))
history = model.fit(train_tweets, pd.get_dummies(train_data['Sentiment']).values, epochs=15, batch_size=256,
                    validation_split=0.2)
print(colored(history, "green"))
model.save('model.h5')
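The training code pads every tweet to `max_len`, so the saved model expects inputs of exactly that length. As a rough illustration of what that padding does, here is a minimal pure-Python sketch. It assumes the default behavior of `pad_sequences` (zeros added at the front, overlong sequences cut from the front); `pad_pre` is a hypothetical helper written for this example, not a Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Hypothetical stand-in for pre-padding / pre-truncating integer sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep at most the last `maxlen` tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # fill the front with `value`
    return padded


example = pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(example)  # [[0, 0, 5, 8], [3, 4, 5, 6]]
```

Any new input would presumably need to be padded to the same `max_len` that was used during training, not to a length derived from the new text.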
I tried something like this to predict on new input:
text = "I hate you"
seq = tokenizer.texts_to_sequences(text)
padded = pad_sequences(seq, maxlen=len(text) + 1)
pred = model.predict_classes(padded)
print(pred)
The output is:
[1 1 1 1 1 1 1 1 1 1]
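A likely explanation for getting ten predictions, offered as an assumption rather than a confirmed diagnosis: `texts_to_sequences` expects a list of texts, and a bare Python string is iterated character by character, so "I hate you" (10 characters) is treated as 10 separate texts. The sketch below uses a hypothetical stand-in function, `to_sequences_sketch`, and a made-up `word_index` to mimic that iteration without Keras.

```python
word_index = {"i": 1, "hate": 2, "you": 3}  # made-up vocabulary for illustration


def to_sequences_sketch(texts):
    """Hypothetical stand-in: builds one integer sequence per item in `texts`."""
    sequences = []
    for text in texts:  # a bare string yields single characters here
        words = text.lower().split()
        sequences.append([word_index[w] for w in words if w in word_index])
    return sequences


text = "I hate you"
as_string = to_sequences_sketch(text)    # iterates over 10 characters
as_list = to_sequences_sketch([text])    # iterates over 1 sentence

print(len(as_string))  # 10
print(as_list)         # [[1, 2, 3]]
```

Under this assumption, wrapping the text in a list (`[text]`) would yield a single sequence and therefore a single prediction.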
How can I clearly tell what the sentiment is (0 for negative, 1 for positive)?
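One way to read a prediction, sketched under assumptions: if the model outputs one row of two scores per input, and the one-hot columns produced by `pd.get_dummies` follow the sorted order of the labels, then `np.argmax` over each row gives the class index, which can be mapped back to a label. The probabilities and label names below are invented for illustration.

```python
import numpy as np

# Invented model output: one row per input text, one column per class.
probabilities = np.array([[0.91, 0.09],
                          [0.20, 0.80]])

# Assumed column order: sorted labels, so index 0 = negative, 1 = positive.
labels = np.array(["negative", "positive"])

class_ids = np.argmax(probabilities, axis=1)  # index of the highest score per row
sentiments = labels[class_ids]                # map each index back to its label

print(class_ids)
print(sentiments)
```

With the real model, `probabilities` would come from `model.predict` applied to a one-element list of text that has been tokenized and padded to `max_len`.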